Documenting Kafka Topics with AsyncAPI 3.0

The symptom is familiar: a new team joins an event-driven platform, asks which Kafka topic carries order events, what the message key is, how many partitions the topic has, and whether the schema is Avro or JSON — and the answer is “ask Diego, he set it up in 2022.” That tribal knowledge breaks the moment Diego is on holiday, a consumer team onboards remotely, or a schema change goes in without warning. This guide is part of the AsyncAPI for Event-Driven Systems cluster and shows exactly how to encode all of that Kafka-specific metadata — topic name, partitioning, key schema, cleanup policy, and Avro/schema registry wiring — into a single AsyncAPI 3.0 document that can be validated, diffed, and rendered into documentation automatically.

Figure: AsyncAPI 3.0 Kafka binding layers. Four nested layers of an AsyncAPI document, each mapped to the corresponding Kafka concept:

  • servers.production-kafka.bindings.kafka (broker-level): schemaRegistryUrl · schemaRegistryVendor · securityProtocol
  • channels.orders-created.bindings.kafka (topic-level): topic (address) · partitions · replicationFactor · cleanupPolicy
  • messages.OrderCreated.bindings.kafka (key schema): key (JSON Schema) · schemaIdLocation · bindingVersion
  • messages.OrderCreated.payload + schemaFormat: Avro record or JSON Schema, the message value contract

Root Cause: What Goes Wrong Without a Contract

Kafka’s operational model scatters the metadata that a consumer needs across four different surfaces: the broker admin API (partition count, replication factor, retention, cleanup policy), the schema registry (Avro subject, schema version), the producer codebase (key serializer class, key type), and the team wiki (intent, SLA, ownership). None of these surfaces is version-controlled alongside the schema, none of them produces a machine-readable diff when something changes, and none of them blocks a deployment when a breaking change ships.

The concrete failures look like:

  • A producer team changes the key from orderId (UUID string) to tenantId:orderId (composite string) to improve partition locality. Consumers hard-coded the UUID assumption and begin silently misrouting records to the wrong business context.
  • A compacted topic has three consumers, each of which assumes a different key cardinality. One deletes records it shouldn’t because its key interpretation differs from the producer’s.
  • An Avro schema evolves by adding a field without a default. The schema registry rejects it under BACKWARD compatibility (removing a field would have been accepted), but the rejection only surfaces at deploy time because no CI step validated the schema contract beforehand.
  • A new team onboards and spends two days reverse-engineering the topic structure from Kafka consumer group offsets and broker logs rather than reading a document.
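The first failure above can be made loud rather than silent. The following is a minimal sketch, not the article's implementation: `parse_order_key` and `try_parse` are hypothetical consumer-side helpers, and the key values are made up for illustration. It shows a consumer that assumes a bare UUID key and what happens when the producer switches to a composite key.

```python
import uuid


def parse_order_key(raw_key: bytes) -> uuid.UUID:
    """Consumer-side parser that assumes the key is a bare UUID string."""
    return uuid.UUID(raw_key.decode("utf-8"))


def try_parse(raw_key: bytes):
    """Return the parsed UUID, or None when the key breaks the assumption."""
    try:
        return parse_order_key(raw_key)
    except ValueError:
        return None


# Hypothetical keys: the original shape and the composite shape.
old_key = b"3f2b8c1e-5d4a-4e6f-9a7b-1c2d3e4f5a6b"            # orderId
new_key = b"tenant-42:3f2b8c1e-5d4a-4e6f-9a7b-1c2d3e4f5a6b"  # tenantId:orderId

print(try_parse(old_key))  # 3f2b8c1e-5d4a-4e6f-9a7b-1c2d3e4f5a6b
print(try_parse(new_key))  # None
```

A consumer that validates the key shape this way fails visibly on the first composite key instead of routing it to the wrong business context.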

AsyncAPI 3.0 Kafka bindings encode all of this in one versioned file that asyncapi validate checks structurally and asyncapi diff checks for breaking changes.

Step-by-Step Fix

Step 1 — Declare the Kafka server binding with schema registry

Start with the broker. The servers block in AsyncAPI 3.0 carries a bindings.kafka object that records the schema registry URL and its vendor; transport security is declared separately, through the server's protocol field. This is connection-level metadata shared by every topic the service uses.

# asyncapi.yaml  —  AsyncAPI 3.0.0
asyncapi: 3.0.0
id: urn:com:acme:order-service
info:
  title: Order Service Events
  version: 2.1.0
defaultContentType: application/vnd.apache.avro+json;version=1.9.0

servers:
  production-kafka:
    host: kafka.acme.internal:9092
    protocol: kafka-secure              # TLS + SASL
    description: Primary Kafka cluster (us-east-1)
    bindings:
      kafka:
        schemaRegistryUrl: https://registry.acme.internal   # Confluent Schema Registry
        schemaRegistryVendor: confluent                      # also: ibm, aws
        # Why this works: asyncapi validate checks the binding schema, and the
        # generator uses these values to emit registry-aware producer/consumer code.

The schemaRegistryVendor field is significant: code generators and tooling such as contract-driven mocking with Microcks can use it to select the correct registry client when generating stubs or replaying messages.

Step 2 — Add kafka channel binding for topic-level metadata

Each channel maps to one Kafka topic. The bindings.kafka block on the channel documents the topic configuration that the broker enforces — partition count, replication factor, retention in milliseconds, and cleanup policy. None of this is inferrable from the message schema alone.

channels:
  ordersCreated:
    address: orders.created             # exact Kafka topic name
    description: >
      Emitted by order-service when an order transitions to CONFIRMED state.
      Partitioned by customerId. Retained 14 days. Read by billing, fulfillment,
      and analytics services.
    servers:
      - $ref: '#/servers/production-kafka'
    bindings:
      kafka:
        topic: orders.created           # redundant with address but explicit
        partitions: 24                  # partition count as of schema version 2.x
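        replicas: 3                     # replication factor; the binding's field name is 'replicas'
        topicConfiguration:
          retention.ms: 1209600000      # 14 days
          cleanup.policy: ["delete"]    # array of 'delete' and/or 'compact'; see Edge Cases
          min.insync.replicas: 2        # not among the binding's documented keys; confirm your validator accepts it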
        bindingVersion: "0.5.0"
    # Why this works: teams know partition count before they write consumer group
    # assignment logic; the diff will flag a partition increase as a change so
    # it gets reviewed rather than silently reshuffling consumer assignments.
    messages:
      orderCreated:
        $ref: '#/components/messages/OrderCreated'

Document partitions and replicas (the replication factor) even though they look like ops details. A consumer that implements a custom partitioner or a stream processor that maintains partition-local state depends on partition count being stable. Encoding it here makes any change visible in asyncapi diff output.
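To see why partition count belongs in the contract, consider how a hash partitioner assigns keys. This is a sketch under stated assumptions: `partition_for` is a hypothetical helper, and it uses `zlib.crc32` as a stand-in hash only because it ships with the standard library (Kafka's default partitioner for keyed records uses murmur2). The effect it illustrates, keys moving when the modulus changes, holds for any hash-mod-N scheme.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in for hash partitioning: hash(key) mod partition count.

    crc32 is an illustrative substitute for Kafka's murmur2 hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical customer keys; compare assignments for 24 vs 32 partitions.
keys = [f"customer-{i}" for i in range(1000)]
moved = sum(1 for k in keys if partition_for(k, 24) != partition_for(k, 32))

print(f"{moved} of {len(keys)} keys map to a different partition after 24 -> 32")
```

Any consumer that keeps partition-local state keyed by customer would find part of that state on the wrong partition after such a change, which is why the change should show up in review.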

Step 3 — Document the message key schema

This is the most-overlooked field in Kafka documentation. The Kafka message binding’s key property takes a JSON Schema object describing the key’s type and format. Make it explicit even when the key is a simple UUID string — because “simple” is exactly when teams stop documenting it.

components:
  messages:
    OrderCreated:
      name: OrderCreated
      title: Order Created
      summary: An order has been confirmed and payment captured.
      contentType: application/octet-stream   # Avro is binary on the wire
      bindings:
        kafka:
          key:
            type: string
            format: uuid
            description: >
              customerId UUID. All events for one customer land on the same
              partition, preserving per-customer ordering. Key is NOT orderId;
              assuming it is causes misrouted consumers.
          schemaIdLocation: header            # 'header' or 'payload' (magic byte)
          schemaIdPayloadEncoding: "confluent"   # consulted only when schemaIdLocation is 'payload'
          bindingVersion: "0.5.0"
      headers:
        type: object
        properties:
          correlationId:
            type: string
            format: uuid
            description: Distributed trace ID, propagated from the HTTP request.
          eventVersion:
            type: string
            examples: ["2.1.0"]
      payload:
        # AsyncAPI 3.0 multi-format schema: schemaFormat sits next to the
        # schema it describes, not on the message itself.
        schemaFormat: "application/vnd.apache.avro+json;version=1.9.0"
        schema:
          # Avro schema for the message value (the order event itself)
          type: record
          name: OrderCreated
          namespace: com.acme.orders.events
          doc: "Emitted when an order is confirmed. Backward-compatible additions only."
          fields:
            - name: orderId
              type: string
              doc: "UUID v4. Stable identifier for the order."
            - name: customerId
              type: string
              doc: "UUID v4. Used as the partition key; see message binding."
            - name: total
              type:
                type: record
                name: Money
                fields:
                  - name: amount
                    type: string
                    doc: "Decimal string to avoid floating-point representation issues."
                  - name: currency
                    type:
                      type: enum
                      name: Currency
                      symbols: [USD, EUR, GBP]
            - name: occurredAt
              type: string
              doc: "ISO 8601 UTC timestamp."
            - name: lineItems
              type:
                type: array
                items:
                  type: record
                  name: LineItem
                  fields:
                    - name: sku
                      type: string
                    - name: quantity
                      type: int
                    - name: unitPrice
                      type: string
The inline doc fields on Avro record fields are first-class documentation. They appear in the generated HTML event catalog and in IDE tooling when engineers work with the generated model classes. Treat them with the same care as code comments on a public API.
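The doc string on the amount field says the value is a decimal string "to avoid floating-point representation issues". A short sketch of what that means for a consumer follows; `line_total` is a hypothetical helper and the prices are made up, not taken from the schema.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2
print(float_total == 0.3)        # False


def line_total(unit_price: str, quantity: int) -> Decimal:
    """Multiply a decimal-string price by an integer quantity exactly."""
    return Decimal(unit_price) * quantity


# Parsing the documented decimal strings keeps the arithmetic exact.
total = line_total("0.10", 1) + line_total("0.20", 1)
print(total == Decimal("0.30"))  # True
print(str(total))                # 0.30
```

Stating this intent in the field's doc string is what lets a consumer team choose a decimal type up front instead of discovering rounding drift later.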

Step 4 — Wire the operation and generate docs

Declare the operation that connects the channel to the application’s intent, then validate and generate.

operations:
  publishOrderCreated:
    action: send
    channel:
      $ref: '#/channels/ordersCreated'
    messages:
      - $ref: '#/channels/ordersCreated/messages/orderCreated'
    description: >
      Publishes OrderCreated after payment capture succeeds.
      Guaranteed at-least-once delivery; idempotency key is orderId.
    bindings:
      kafka:
        clientId:
          type: string
          enum: [order-service-producer]   # locks down the producer identity
        bindingVersion: "0.5.0"

Now validate and generate:

# Validate the complete document including binding structure
asyncapi validate asyncapi.yaml
# Expected: File asyncapi.yaml is valid!

# Generate a static HTML event catalog
asyncapi generate fromTemplate asyncapi.yaml @asyncapi/html-template@3.0.0 \
  -o ./docs/event-catalog \
  --param singleFile=false

# Generate TypeScript models from Avro schemas
asyncapi generate models typescript asyncapi.yaml -o ./src/generated/events

# Open the rendered docs (macOS; use xdg-open on Linux)
open ./docs/event-catalog/index.html

For Avro schemas, the @asyncapi/html-template renders the Avro record fields, their doc strings, and the message key schema in a structured table. The asyncapi generate models command invokes Modelina with Avro input and produces typed classes: the OrderCreated TypeScript class has orderId, customerId, total, and lineItems fields. Because the models are generated from asyncapi.yaml rather than from the registry, they match the schema registry subject only as long as the document and the registered schema are kept identical.

Before and After

Before — what tribal knowledge looks like in practice:

# asyncapi.yaml (2.x, pre-documentation)
asyncapi: 2.6.0
channels:
  orders.created:
    publish:
      message:
        payload:
          type: object
          properties:
            orderId:
              type: string
            total:
              type: number
# Missing: key schema, partition count, cleanup policy, schema registry,
# Avro format, retention, who produces it, who consumes it.

After — a fully documented AsyncAPI 3.0 contract:

asyncapi: 3.0.0
servers:
  production-kafka:
    host: kafka.acme.internal:9092
    protocol: kafka-secure
    bindings:
      kafka:
        schemaRegistryUrl: https://registry.acme.internal
        schemaRegistryVendor: confluent
channels:
  ordersCreated:
    address: orders.created
    bindings:
      kafka:
        partitions: 24
        replicas: 3
        topicConfiguration:
          retention.ms: 1209600000
          cleanup.policy: ["delete"]
        bindingVersion: "0.5.0"
    messages:
      orderCreated:
        $ref: '#/components/messages/OrderCreated'
components:
  messages:
    OrderCreated:
      bindings:
        kafka:
          key:
            type: string
            format: uuid
            description: customerId, the partition key for per-customer ordering
          schemaIdLocation: header
          bindingVersion: "0.5.0"
      payload:
        schemaFormat: "application/vnd.apache.avro+json;version=1.9.0"
        schema:
          type: record
          name: OrderCreated
          # ... full Avro schema
operations:
  publishOrderCreated:
    action: send
    channel:
      $ref: '#/channels/ordersCreated'

The before version tells you there is a topic with an orderId and a total. The after version tells you the partition key is customerId (not orderId), there are 24 partitions, the value schema is Avro and registered in Confluent Schema Registry, the schema ID travels in a message header, retention is 14 days, and the topic is not compacted.
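The "14 days" reading of retention.ms is worth checking rather than trusting a comment. A small sketch of the arithmetic, where `days_to_ms` is a hypothetical helper for going the other way:

```python
from datetime import timedelta

RETENTION_MS = 1_209_600_000  # retention.ms from the channel binding

retention = timedelta(milliseconds=RETENTION_MS)
print(retention.days)                   # 14
print(retention == timedelta(days=14))  # True


def days_to_ms(days: int) -> int:
    """Convert whole days to milliseconds for a retention.ms value."""
    return days * 24 * 60 * 60 * 1000


print(days_to_ms(14))                   # 1209600000
```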

If you are migrating from a 2.x document, the AsyncAPI 2 to 3 migration checklist walks through the structural changes systematically before you add Kafka bindings on top.

Verification

Run these checks before merging any change to asyncapi.yaml:

# 1. Structural validation — catches binding schema errors, missing required fields
$ asyncapi validate asyncapi.yaml
File asyncapi.yaml is valid! File asyncapi.yaml and referenced documents
don't have governance problems.

# 2. Breaking-change diff against main branch
$ git show origin/main:asyncapi.yaml > /tmp/base.yaml
$ asyncapi diff /tmp/base.yaml asyncapi.yaml --type breaking
No breaking changes detected.

# 3. Verify generated HTML includes the key schema and Avro fields
$ asyncapi generate fromTemplate asyncapi.yaml @asyncapi/html-template@3.0.0 \
    -o /tmp/docs --param singleFile=true
$ grep -o "customerId" /tmp/docs/index.html | wc -l
# Expected: at least 2 (key description + payload field); -o counts occurrences, not lines

# 4. Confirm generated TypeScript models have the Avro fields
$ asyncapi generate models typescript asyncapi.yaml -o /tmp/models
$ grep -E "orderId|customerId|lineItems" /tmp/models/OrderCreated.ts
# Expected: matches for all three field names

In CI, wire these four steps into a workflow job that runs on every pull request that touches asyncapi.yaml. A non-zero exit from any step blocks the merge.
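One way to get the fail-fast behaviour is a small wrapper that a workflow step invokes. This is a sketch, not a prescribed setup: `run_checks` and `CHECKS` are hypothetical names, the commands are copied from the checks above (without the grep steps), and it assumes each command exits non-zero on failure and that /tmp/base.yaml was already written by the git show step.

```python
import subprocess
import sys

# Commands copied from the verification steps; paths are the ones used there.
CHECKS = [
    ["asyncapi", "validate", "asyncapi.yaml"],
    ["asyncapi", "diff", "/tmp/base.yaml", "asyncapi.yaml", "--type", "breaking"],
    ["asyncapi", "generate", "fromTemplate", "asyncapi.yaml",
     "@asyncapi/html-template@3.0.0", "-o", "/tmp/docs",
     "--param", "singleFile=true"],
    ["asyncapi", "generate", "models", "typescript", "asyncapi.yaml",
     "-o", "/tmp/models"],
]


def run_checks(checks, run=subprocess.run) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        result = run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0


# In a CI step, end the script with: sys.exit(run_checks(CHECKS))
```

The `run` parameter exists so the wrapper can be exercised with a fake runner in tests without the CLI installed.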

Edge Cases

Avro vs JSON Schema payloads. AsyncAPI 3.0 supports mixed-format documents. If some messages use Avro and others use JSON Schema, declare schemaFormat on each payload. defaultContentType only describes the wire encoding of messages; it does not tell tooling how to parse a schema, so it cannot stand in for schemaFormat. A payload without schemaFormat is parsed as an AsyncAPI Schema Object, so an Avro record that omits it is likely to be rejected by asyncapi validate. The generator selects the Modelina input processor from the per-payload schemaFormat.

Compacted topics and key cardinality. When cleanup.policy: compact, the message key is no longer just a routing hint — it is the primary record identifier. Every message with the same key replaces the previous value for that key in the compacted log. Document this in the channel’s description and in the key schema’s description. If the key type changes from a single entity ID to a composite key (such as tenantId:entityId), the compaction semantics change for every existing consumer and existing records — this is a breaking change that deserves a major version bump and coordinated migration.
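The effect of a key change on a compacted topic can be modelled in a few lines. This is a deliberately simplified sketch: `compact` is a hypothetical function that keeps only the latest value per key and treats a None value as a tombstone, whereas real compaction runs lazily in the background. The keys and values are made up.

```python
def compact(log):
    """Model log compaction: keep only the latest value for each key.

    A None value acts as a tombstone and removes the key.
    """
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest


# Records written before and after a hypothetical key change.
log = [
    ("order-1", {"status": "CONFIRMED"}),       # old key: entityId
    ("order-1", {"status": "SHIPPED"}),         # replaces the record above
    ("acme:order-1", {"status": "DELIVERED"}),  # new key: tenantId:entityId
]

state = compact(log)
print(sorted(state))     # ['acme:order-1', 'order-1']
print(state["order-1"])  # {'status': 'SHIPPED'}
```

After the key change the old record is never superseded: the same entity now has two "latest" values, and a consumer rebuilding state from the topic sees both.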

Partition key vs message key. Kafka consumers sometimes assume the message key and the Avro record field that shares its name hold the same value. They do not always. The key is whatever the producer passes as the key argument of ProducerRecord<K, V>; the producer's partitioner then maps that key to a partition. Document both: the bindings.kafka.key schema describes what the producer passes as the key argument, and the payload schema describes the Avro value. When a field in the Avro value is also used as the key (a common pattern), add a doc note on that field saying “also used as the Kafka partition key; see message binding.”

Schema registry subject naming. Confluent Schema Registry subjects follow <topic>-key and <topic>-value naming by default under TopicNameStrategy. If your registry uses RecordNameStrategy or TopicRecordNameStrategy, note it in the server binding’s description and in the message’s bindings.kafka block. A consumer that assumes TopicNameStrategy and queries the wrong subject will fail to deserialize. This is the kind of invariant that belongs in the AsyncAPI document, not in a Confluence page.
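The three strategies produce different subject names for the same topic and record, which is why the choice needs to be written down. The sketch below reflects the naming conventions as commonly documented for Confluent Schema Registry; `subject_name` is a hypothetical helper, not a registry client call.

```python
def subject_name(strategy: str, topic: str, record_fqn: str,
                 is_key: bool = False) -> str:
    """Derive the registry subject for a topic/record under a naming strategy."""
    if strategy == "TopicNameStrategy":
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "RecordNameStrategy":
        return record_fqn
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_fqn}"
    raise ValueError(f"unknown strategy: {strategy}")


topic = "orders.created"
record = "com.acme.orders.events.OrderCreated"

print(subject_name("TopicNameStrategy", topic, record))
# orders.created-value
print(subject_name("RecordNameStrategy", topic, record))
# com.acme.orders.events.OrderCreated
print(subject_name("TopicRecordNameStrategy", topic, record))
# orders.created-com.acme.orders.events.OrderCreated
```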

Frequently Asked Questions

What is the kafka channel binding address field in AsyncAPI 3.0?

The address field on the channel sets the Kafka topic name exactly as Kafka sees it. The kafka channel binding adds topic-level metadata — partition count, cleanup policy, retention — that the address alone cannot express. Both live under the channel, and asyncapi validate checks the binding structure against the published binding schema.

How do I document the Kafka message key schema in AsyncAPI 3.0?

Add a bindings.kafka.key JSON Schema object to the message binding. This makes the partitioning key contract explicit and machine-readable. For Avro keys, point the schema at the registry subject using a $ref or embed the Avro JSON schema inline in the key field.

Can AsyncAPI 3.0 reference a Confluent Schema Registry for Avro schemas?

Yes. Set schemaRegistryUrl and schemaRegistryVendor: confluent on the kafka server binding. Then declare the payload as a multi-format schema: set payload.schemaFormat to application/vnd.apache.avro+json;version=1.9.0 and put the Avro schema under payload.schema. The asyncapi validate command checks the structure of the schema embedded in the document.
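When a message binding uses schemaIdLocation: payload with Confluent encoding, the schema ID is carried in a short prefix ahead of the Avro bytes. The sketch below shows that framing as commonly documented for Confluent serializers: one magic byte (0), a 4-byte big-endian schema ID, then the Avro binary body. `split_confluent_frame` is a hypothetical helper and the sample bytes are made up. It does not apply when the ID travels in a header.

```python
import struct

MAGIC_BYTE = 0  # first byte of a Confluent-framed message


def split_confluent_frame(message: bytes):
    """Split a Confluent-framed value into (schema_id, avro_body)."""
    if len(message) < 5:
        raise ValueError("message shorter than the 5-byte framing prefix")
    # '>' = big-endian, no padding; 'b' = 1 signed byte; 'I' = 4-byte unsigned int
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]


# Hypothetical frame: schema ID 42 followed by two bytes of Avro data.
frame = struct.pack(">bI", MAGIC_BYTE, 42) + b"\x02\x04"
print(split_confluent_frame(frame))  # (42, b'\x02\x04')
```

A consumer uses the extracted ID to fetch the writer schema from the registry before decoding the remaining bytes.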

Does asyncapi generate work with Avro payload schemas?

Modelina, which backs asyncapi generate models, supports Avro schemas when the payload's schemaFormat is set to the Avro MIME type. Generated TypeScript or Java models reflect the Avro record fields. If you mix Avro and JSON Schema payloads in one document, set schemaFormat on each payload; defaultContentType does not select the schema format.

How do I document a compacted Kafka topic in AsyncAPI?

Set cleanup.policy to compact under topicConfiguration in the kafka channel binding. Also document the key schema carefully in the message binding. Compaction semantics are driven by the key, so an undocumented key schema is especially costly on compacted topics where the latest value per key is the record of truth.

What is the difference between a Kafka server binding and a channel binding in AsyncAPI?

The server binding records connection-level registry metadata: the schema registry URL and vendor. Transport security is declared on the server itself, through its protocol and security fields. The channel binding configures one specific topic: partition count, replica count, retention, cleanup policy. Both are kafka-keyed objects but they live at different levels of the document and serve different audiences: platform engineers read server bindings, application engineers read channel bindings.