Documenting Kafka Topics with AsyncAPI 3.0
The symptom is familiar: a new team joins an event-driven platform, asks which Kafka topic carries order events, what the message key is, how many partitions the topic has, and whether the schema is Avro or JSON — and the answer is “ask Diego, he set it up in 2022.” That tribal knowledge breaks the moment Diego is on holiday, a consumer team onboards remotely, or a schema change goes in without warning. This guide is part of the AsyncAPI for Event-Driven Systems cluster and shows exactly how to encode all of that Kafka-specific metadata — topic name, partitioning, key schema, cleanup policy, and Avro/schema registry wiring — into a single AsyncAPI 3.0 document that can be validated, diffed, and rendered into documentation automatically.
Root Cause: What Goes Wrong Without a Contract
Kafka’s operational model scatters the metadata that a consumer needs across four different surfaces: the broker admin API (partition count, replication factor, retention, cleanup policy), the schema registry (Avro subject, schema version), the producer codebase (key serializer class, key type), and the team wiki (intent, SLA, ownership). None of these surfaces is version-controlled alongside the schema, none of them produces a machine-readable diff when something changes, and none of them blocks a deployment when a breaking change ships.
The concrete failures look like:
- A producer team changes the key from orderId (UUID string) to tenantId:orderId (composite string) to improve partition locality. Consumers hard-coded the UUID assumption and begin silently misrouting records to the wrong business context.
- A compacted topic has three consumers, each of which assumes a different key cardinality. One deletes records it shouldn’t because its key interpretation differs from the producer’s.
- An Avro schema evolves by adding a field without a default. The schema registry rejects it under BACKWARD compatibility (removing a field would have been accepted), but the rejection only surfaces at deploy time because no CI step validated the schema contract beforehand.
- A new team onboards and spends two days reverse-engineering the topic structure from Kafka consumer group offsets and broker logs rather than reading a document.
AsyncAPI 3.0 Kafka bindings encode all of this in one versioned file that asyncapi validate checks structurally and asyncapi diff checks for breaking changes.
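The first failure in the list above is easy to reproduce. The sketch below is a hypothetical consumer-side check, not part of any AsyncAPI tooling: parse_order_key and both sample keys are illustrative names and values, and the helper assumes the consumer treats the key as a bare UUID string.

```python
import uuid


def parse_order_key(raw_key: bytes) -> uuid.UUID:
    """Hypothetical consumer helper: assumes the key is a bare UUID string."""
    return uuid.UUID(raw_key.decode("utf-8"))


# Illustrative keys: the original contract, then the composite format.
old_key = b"3f2b8c1e-9d4a-4f6b-8a1c-2e5d7f9b0c3a"          # orderId only
new_key = b"acme-eu:3f2b8c1e-9d4a-4f6b-8a1c-2e5d7f9b0c3a"  # tenantId:orderId

print(parse_order_key(old_key))  # parses cleanly under the old contract

try:
    parse_order_key(new_key)
except ValueError as err:
    # The composite key is not a UUID, so a strict consumer fails loudly here.
    print(f"key contract violated: {err}")
```

A consumer that skips this kind of check and uses the raw key string as a lookup value does not raise at all, which is the silent misrouting described above.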
Step-by-Step Fix
Step 1 — Declare the Kafka server binding with schema registry
Start with the broker. The servers block in AsyncAPI 3.0 carries a bindings.kafka object that records the schema registry URL and its vendor; the security protocol is declared on the server’s own protocol field. This is connection-level metadata shared by every topic the service uses.
# asyncapi.yaml (AsyncAPI 3.0.0)
asyncapi: 3.0.0
id: urn:com:acme:order-service
info:
  title: Order Service Events
  version: 2.1.0
defaultContentType: application/vnd.apache.avro+json;version=1.9.0
servers:
  production-kafka:
    host: kafka.acme.internal:9092
    protocol: kafka-secure # TLS + SASL
    description: Primary Kafka cluster (us-east-1)
    bindings:
      kafka:
        schemaRegistryUrl: https://registry.acme.internal # Confluent Schema Registry
        schemaRegistryVendor: confluent # also: ibm, aws
# Why this works: asyncapi validate checks the binding schema, and the
# generator uses these values to emit registry-aware producer/consumer code.
The schemaRegistryVendor field is significant: the AsyncAPI generator and tools such as Microcks (used for contract-driven mocking) can use it to select the correct registry client when generating stubs or replaying messages.
Step 2 — Add kafka channel binding for topic-level metadata
Each channel maps to one Kafka topic. The bindings.kafka block on the channel documents the topic configuration that the broker enforces — partition count, replication factor, retention in milliseconds, and cleanup policy. None of this is inferrable from the message schema alone.
channels:
  ordersCreated:
    address: orders.created # exact Kafka topic name
    description: >
      Emitted by order-service when an order transitions to CONFIRMED state.
      Partitioned by customerId. Retained 14 days. Read by billing, fulfillment,
      and analytics services.
    servers:
      - $ref: '#/servers/production-kafka'
    bindings:
      kafka:
        topic: orders.created # redundant with address but explicit
        partitions: 24 # partition count as of schema version 2.x
        replicas: 3 # replication factor (the binding field is named replicas)
        topicConfiguration:
          retention.ms: 1209600000 # 14 days
          cleanup.policy: ["delete"] # array of 'delete' and/or 'compact'; see Edge Cases
          min.insync.replicas: 2
        bindingVersion: "0.5.0"
        # Why this works: teams know partition count before they write consumer group
        # assignment logic; the diff will flag a partition increase as a change so
        # it gets reviewed rather than silently reshuffling consumer assignments.
    messages:
      orderCreated:
        $ref: '#/components/messages/OrderCreated'
Document partitions and replicas (the replication factor) even though they look like ops details. A consumer that implements a custom partitioner or a stream processor that maintains partition-local state depends on partition count being stable. Encoding it here makes any change visible in asyncapi diff output.
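To see why partition count is part of the contract, consider how a key is mapped to a partition: a hash of the key bytes, modulo the partition count. The sketch below is illustrative only. It uses zlib.crc32 as a stand-in hash (Kafka's default partitioner uses murmur2, so real partition numbers will differ), and the customer keys are made up.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition: hash(key bytes) modulo partition count.

    Assumption: zlib.crc32 stands in for the hash. Kafka's default
    partitioner uses murmur2, so real partition numbers will differ.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


keys = [f"customer-{i}" for i in range(1000)]  # hypothetical customer IDs

# Keys whose partition changes if the topic grows from 24 to 48 partitions.
moved = [k for k in keys if partition_for(k, 24) != partition_for(k, 48)]

print(f"{len(moved)} of {len(keys)} keys change partition when 24 -> 48")
```

Any key that moves breaks per-key ordering across the resize and invalidates partition-local state, which is why a partition change should show up in review rather than in production.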
Step 3 — Document the message key schema
This is the most-overlooked field in Kafka documentation. The Kafka message binding’s key property takes a JSON Schema object describing the key’s type and format. Make it explicit even when the key is a simple UUID string — because “simple” is exactly when teams stop documenting it.
components:
  messages:
    OrderCreated:
      name: OrderCreated
      title: Order Created
      summary: An order has been confirmed and payment captured.
      contentType: application/octet-stream # Avro is binary on the wire
      bindings:
        kafka:
          key:
            type: string
            format: uuid
            description: >
              customerId UUID. All events for one customer land on the same
              partition, preserving per-customer ordering. Key is NOT orderId;
              assuming otherwise is a common misconception that causes
              misrouted consumers.
          schemaIdLocation: header # 'header' or 'payload' (magic byte)
          schemaIdPayloadEncoding: "confluent" # used when the schema ID is in the payload
          bindingVersion: "0.5.0"
      headers:
        type: object
        properties:
          correlationId:
            type: string
            format: uuid
            description: Distributed trace ID, propagated from the HTTP request.
          eventVersion:
            type: string
            examples: ["2.1.0"]
      payload:
        # AsyncAPI 3.0 places schemaFormat inside the payload, next to the schema
        schemaFormat: "application/vnd.apache.avro+json;version=1.9.0"
        schema:
          # Avro schema for the message value (the order event itself)
          type: record
          name: OrderCreated
          namespace: com.acme.orders.events
          doc: "Emitted when an order is confirmed. Backward-compatible additions only."
          fields:
            - name: orderId
              type: string
              doc: "UUID v4. Stable identifier for the order."
            - name: customerId
              type: string
              doc: "UUID v4. Used as the partition key; see message binding."
            - name: total
              type:
                type: record
                name: Money
                fields:
                  - name: amount
                    type: string
                    doc: "Decimal string to avoid floating-point representation issues."
                  - name: currency
                    type:
                      type: enum
                      name: Currency
                      symbols: [USD, EUR, GBP]
            - name: occurredAt
              type: string
              doc: "ISO 8601 UTC timestamp."
            - name: lineItems
              type:
                type: array
                items:
                  type: record
                  name: LineItem
                  fields:
                    - name: sku
                      type: string
                    - name: quantity
                      type: int
                    - name: unitPrice
                      type: string
The inline doc fields on Avro record fields are first-class documentation. They appear in the generated HTML event catalog and in IDE tooling when engineers work with the generated model classes. Treat them with the same care as code comments on a public API.
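The schemaIdLocation comment in the message binding mentions a magic byte. As a minimal sketch of what that means for a consumer, assuming the Confluent wire format (one zero byte, then a 4-byte big-endian schema ID, then the Avro body), the prefix can be split off as shown here. The function name, the schema ID 42, and the placeholder body bytes are all illustrative.

```python
import struct

MAGIC_BYTE = 0  # first byte of a Confluent-framed message value


def split_confluent_frame(payload: bytes) -> tuple[int, bytes]:
    """Return (schema_id, avro_body) from a Confluent-framed message value."""
    if len(payload) < 5:
        raise ValueError("payload shorter than the 5-byte Confluent prefix")
    # ">bI": big-endian, one signed byte (magic) plus one unsigned 32-bit int (ID).
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, payload[5:]


# Build a fake frame: magic byte 0, schema ID 42, then placeholder body bytes.
frame = struct.pack(">bI", 0, 42) + b"\x02hi"
schema_id, body = split_confluent_frame(frame)
print(schema_id, body)
```

This only applies when the schema ID travels in the payload. With schemaIdLocation: header, the consumer reads the ID from a message header instead, and the header name is something the document should state.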
Step 4 — Wire the operation and generate docs
Declare the operation that connects the channel to the application’s intent, then validate and generate.
operations:
  publishOrderCreated:
    action: send
    channel:
      $ref: '#/channels/ordersCreated'
    messages:
      - $ref: '#/channels/ordersCreated/messages/orderCreated'
    description: >
      Publishes OrderCreated after payment capture succeeds.
      Guaranteed at-least-once delivery; idempotency key is orderId.
    bindings:
      kafka:
        clientId:
          type: string
          enum: [order-service-producer] # locks down the producer identity
        bindingVersion: "0.5.0"
Now validate and generate:
# Validate the complete document including binding structure
asyncapi validate asyncapi.yaml
# Expected: File asyncapi.yaml is valid!
# Generate a static HTML event catalog
asyncapi generate fromTemplate asyncapi.yaml @asyncapi/html-template@3.0.0 \
-o ./docs/event-catalog \
--param singleFile=false
# Generate TypeScript models from Avro schemas
asyncapi generate models typescript asyncapi.yaml -o ./src/generated/events
# Open the rendered docs
open ./docs/event-catalog/index.html
For Avro schemas, the @asyncapi/html-template renders the Avro record fields, their doc strings, and the message key schema in a structured table. The asyncapi generate models command invokes Modelina with Avro input and produces typed classes: the OrderCreated TypeScript class has orderId, customerId, total, and lineItems fields that mirror the Avro schema in the document, so regenerating after each schema change keeps the models aligned with what is registered.
Before and After
Before — what tribal knowledge looks like in practice:
# asyncapi.yaml (2.x, pre-documentation)
asyncapi: 2.6.0
channels:
  orders.created:
    publish:
      message:
        payload:
          type: object
          properties:
            orderId:
              type: string
            total:
              type: number
# Missing: key schema, partition count, cleanup policy, schema registry,
# Avro format, retention, who produces it, who consumes it.
After — a fully documented AsyncAPI 3.0 contract:
asyncapi: 3.0.0
servers:
  production-kafka:
    host: kafka.acme.internal:9092
    protocol: kafka-secure
    bindings:
      kafka:
        schemaRegistryUrl: https://registry.acme.internal
        schemaRegistryVendor: confluent
channels:
  ordersCreated:
    address: orders.created
    bindings:
      kafka:
        partitions: 24
        replicas: 3
        topicConfiguration:
          retention.ms: 1209600000
          cleanup.policy: ["delete"]
        bindingVersion: "0.5.0"
    messages:
      orderCreated:
        $ref: '#/components/messages/OrderCreated'
components:
  messages:
    OrderCreated:
      bindings:
        kafka:
          key:
            type: string
            format: uuid
            description: customerId, the partition key for per-customer ordering
          schemaIdLocation: header
          bindingVersion: "0.5.0"
      payload:
        schemaFormat: "application/vnd.apache.avro+json;version=1.9.0"
        schema:
          type: record
          name: OrderCreated
          # ... full Avro schema
operations:
  publishOrderCreated:
    action: send
    channel:
      $ref: '#/channels/ordersCreated'
The before version tells you there is a topic with an orderId and a total. The after version tells you the partition key is customerId (not orderId), there are 24 partitions, the schema is Avro registered in Confluent Schema Registry with the schema ID carried in a message header, retention is 14 days, and the topic is not compacted.
If you are migrating from a 2.x document, the AsyncAPI 2 to 3 migration checklist walks through the structural changes systematically before you add Kafka bindings on top.
Verification
Run these checks before merging any change to asyncapi.yaml:
# 1. Structural validation — catches binding schema errors, missing required fields
$ asyncapi validate asyncapi.yaml
File asyncapi.yaml is valid! File asyncapi.yaml and referenced documents
don't have governance problems.
# 2. Breaking-change diff against main branch
$ git show origin/main:asyncapi.yaml > /tmp/base.yaml
$ asyncapi diff /tmp/base.yaml asyncapi.yaml --type breaking
No breaking changes detected.
# 3. Verify generated HTML includes the key schema and Avro fields
$ asyncapi generate fromTemplate asyncapi.yaml @asyncapi/html-template@3.0.0 \
-o /tmp/docs --param singleFile=true
$ grep -c "customerId" /tmp/docs/index.html
# Expected: at least 2 (key description + payload field)
# 4. Confirm generated TypeScript models have the Avro fields
$ asyncapi generate models typescript asyncapi.yaml -o /tmp/models
$ grep "orderId\|customerId\|lineItems" /tmp/models/OrderCreated.ts
# Expected: three property declarations
In CI, wire these four steps into a workflow job that runs on every pull request that touches asyncapi.yaml. A non-zero exit from any step fails the job and blocks the merge.
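One way to wire the gate is a small wrapper script that runs the commands in order and stops at the first failure. This is a sketch under assumptions: run_checks and fake_run are hypothetical names, the grep checks are left out, and it assumes each command signals failure through its exit code. If a tool only reports problems in its output, that output has to be inspected instead.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands copied from the verification steps above (the grep checks are omitted).
CHECKS: list[list[str]] = [
    ["asyncapi", "validate", "asyncapi.yaml"],
    ["asyncapi", "diff", "/tmp/base.yaml", "asyncapi.yaml", "--type", "breaking"],
    ["asyncapi", "generate", "fromTemplate", "asyncapi.yaml",
     "@asyncapi/html-template@3.0.0", "-o", "/tmp/docs", "--param", "singleFile=true"],
    ["asyncapi", "generate", "models", "typescript", "asyncapi.yaml", "-o", "/tmp/models"],
]


def run_checks(
    checks: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        code = run(cmd)
        if code != 0:
            print(f"FAILED ({code}): {' '.join(cmd)}", file=sys.stderr)
            return code
    return 0


# Dry run with a stand-in runner that pretends the diff step failed.
def fake_run(cmd: Sequence[str]) -> int:
    return 1 if cmd[1] == "diff" else 0


print(run_checks(CHECKS, run=fake_run))
```

In a real job the last line would be sys.exit(run_checks(CHECKS)), so the process exit code is what the CI system sees.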
Edge Cases
Avro vs JSON Schema payloads. AsyncAPI 3.0 supports mixed-format documents. If some messages use Avro and others use JSON Schema, set schemaFormat on each message’s payload rather than relying on a document-wide default: defaultContentType only supplies the fallback wire content type and does not select the schema language. The generator selects the right Modelina parser based on the per-message schemaFormat. Do not set an Avro-specific defaultContentType at the document root unless every message in the document is Avro. A payload without an explicit schemaFormat is read as the default AsyncAPI schema dialect, so an Avro record left in that position fails asyncapi validate.
Compacted topics and key cardinality. When cleanup.policy: compact, the message key is no longer just a routing hint — it is the primary record identifier. Every message with the same key replaces the previous value for that key in the compacted log. Document this in the channel’s description and in the key schema’s description. If the key type changes from a single entity ID to a composite key (such as tenantId:entityId), the compaction semantics change for every existing consumer and existing records — this is a breaking change that deserves a major version bump and coordinated migration.
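The effect of a key format change on a compacted topic can be simulated in a few lines. The sketch below models compaction as "keep the latest value per key", which is a simplification: real compaction runs asynchronously and handles tombstones. The keys and values are hypothetical.

```python
def compact(log: list[tuple[str, str]]) -> dict[str, str]:
    """Simulate log compaction: keep only the latest value for each key."""
    latest: dict[str, str] = {}
    for key, value in log:
        latest[key] = value
    return latest


# The same entity, before and after a hypothetical key format change.
log = [
    ("order-17", "status=CONFIRMED"),
    ("order-17", "status=SHIPPED"),         # replaces the record above
    ("acme:order-17", "status=DELIVERED"),  # composite key: treated as a new record
]

state = compact(log)
print(state)
```

After the change the compacted log holds two records for one entity, and a consumer that reads the old key never sees the DELIVERED update.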
Partition key vs message key. Kafka consumers sometimes assume the message key and the Avro record field that shares its name are the same value. They are not always. The key is whatever the producer passes as the key argument of ProducerRecord<K, V>; the producer’s partitioner then maps that key to a partition. Document both: the bindings.kafka.key schema describes what the producer passes as the key argument, and the payload schema describes the Avro value. When a field in the Avro value is also used as the key (a common pattern), add a doc note on that field saying “also used as the Kafka partition key; see message binding.”
Schema registry subject naming. Confluent Schema Registry subjects follow <topic>-key and <topic>-value naming by default under TopicNameStrategy. If your registry uses RecordNameStrategy or TopicRecordNameStrategy, note it in the server binding’s description and in the message’s bindings.kafka block. A consumer that assumes TopicNameStrategy and queries the wrong subject will fail to deserialize. This is the kind of invariant that belongs in the AsyncAPI document, not in a Confluence page.
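The three strategies produce different subject names for the same topic and record, which is why the choice needs to be written down. A minimal sketch, using the topic and record name from this guide; subject_name is a hypothetical helper, not a registry client API.

```python
def subject_name(strategy: str, topic: str, record_fqn: str, is_key: bool = False) -> str:
    """Compute the registry subject for a topic/record under a naming strategy."""
    suffix = "key" if is_key else "value"
    if strategy == "TopicNameStrategy":
        return f"{topic}-{suffix}"
    if strategy == "RecordNameStrategy":
        return record_fqn
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_fqn}"
    raise ValueError(f"unknown strategy: {strategy}")


topic = "orders.created"
record = "com.acme.orders.events.OrderCreated"

for strategy in ("TopicNameStrategy", "RecordNameStrategy", "TopicRecordNameStrategy"):
    print(f"{strategy}: {subject_name(strategy, topic, record)}")
```

A consumer that looks up orders.created-value while the producer registered under the record name queries a subject that does not exist.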
Frequently Asked Questions
What is the kafka channel binding address field in AsyncAPI 3.0?
The address field on the channel sets the Kafka topic name exactly as Kafka sees it. The kafka channel binding adds topic-level metadata — partition count, cleanup policy, retention — that the address alone cannot express. Both live under the channel, and asyncapi validate checks the binding structure against the published binding schema.
How do I document the Kafka message key schema in AsyncAPI 3.0?
Add a bindings.kafka.key JSON Schema object to the message binding. This makes the partitioning key contract explicit and machine-readable. For Avro keys, point the schema at the registry subject using a $ref or embed the Avro JSON schema inline in the key field.
Can AsyncAPI 3.0 reference a Confluent Schema Registry for Avro schemas?
Yes. Set schemaRegistryUrl and schemaRegistryVendor: confluent on the kafka server binding. Then set schemaFormat: application/vnd.apache.avro+json;version=1.9.0 in the message’s payload and provide the Avro schema under payload.schema. The asyncapi validate command resolves and checks the schema structure.
Does asyncapi generate work with Avro payload schemas?
Modelina, which backs asyncapi generate models, supports Avro schemas when the message payload’s schemaFormat is set to the Avro MIME type. Generated TypeScript or Java models reflect the Avro record fields. If you mix Avro and JSON Schema payloads in one document, set schemaFormat per message rather than relying on defaultContentType.
How do I document a compacted Kafka topic in AsyncAPI?
Set cleanup.policy: ["compact"] under topicConfiguration in the kafka channel binding. Also document the key schema carefully in the message binding. Compaction semantics are driven by the key, so an undocumented key schema is especially costly on compacted topics where the latest value per key is the record of truth.
What is the difference between a Kafka server binding and a channel binding in AsyncAPI?
The server binding records broker-wide details, namely the schema registry URL and vendor; the security protocol and SASL mechanism sit beside it on the server’s protocol and security fields. The channel binding configures one specific topic: partition count, replication factor, retention, cleanup policy. Both are kafka-keyed objects but they live at different levels of the document and serve different audiences: platform engineers read server bindings, application engineers read channel bindings.