Scope
- Use cases
- Mirror operational tables (e.g., users, orders) into the data lake with full history.
- Build near-real-time dimensional data without polling.
- Non-use cases
- Stream event enrichment (see S004).
Common steps
Build context
- Identify source database and tables (e.g., MySQL orders).
- Ensure binlog/WAL is enabled and accessible.
Implementation notes
- Define a primary key on the sink so that changes are applied with upsert semantics.
- Use debezium-json or connector-specific formats that carry operation types.
- Consider compaction settings for frequent updates.
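To illustrate why the operation type and a primary key matter together, here is a minimal sketch of upsert semantics driven by Debezium-style change events. It assumes the standard Debezium envelope fields (op, before, after); the in-memory dict standing in for the sink table, the function name apply_change, and the sample orders rows are hypothetical and only for illustration.

```python
import json


def apply_change(table, event, key="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, and update all carry the new row in "after"
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # deletes carry the old row in "before"; "after" is null
        row = event["before"]
        table.pop(row[key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return table


# Hypothetical change stream for an "orders" table
events = [
    '{"op": "c", "before": null, "after": {"id": 1, "status": "new"}}',
    '{"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "status": "new"}}',
    '{"op": "d", "before": {"id": 2, "status": "new"}, "after": null}',
]

orders = {}
for raw in events:
    apply_change(orders, json.loads(raw))

print(orders)  # {1: {'id': 1, 'status': 'paid'}}
```

Without a primary key the sink could only append, so the update and delete above would leave stale rows behind instead of converging on the current state.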
RESINK.AI recommendations
Example
Variations
- Postgres CDC via the postgres-cdc connector.
- Capture deletes explicitly by enabling row-level delete support.
Troubleshooting
Missed changes after connector restart
Ensure checkpointing and a state backend are configured so the connector resumes from its last committed offset. Verify that binlog retention is long enough to still cover that saved offset.
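The retention check can be sketched as a comparison between the saved offset and the oldest binlog file the source still holds. This is a rough illustration only: it assumes MySQL-style binlog file names with a numeric suffix (such as mysql-bin.000012), and the function names and sample file lists are hypothetical. In practice the retained file list would come from the source database and the saved offset from the connector's checkpoint.

```python
def binlog_seq(name):
    """Extract the numeric sequence from a name like 'mysql-bin.000012'."""
    return int(name.rsplit(".", 1)[1])


def check_resume(saved_file, retained_files):
    """Report whether the saved offset's binlog file is still retained."""
    oldest = min(retained_files, key=binlog_seq)
    if binlog_seq(saved_file) < binlog_seq(oldest):
        return (
            f"gap: offset {saved_file} predates oldest retained {oldest}; "
            "changes were purged and a re-snapshot is needed"
        )
    return "ok: saved offset is within retained binlogs"


retained = ["mysql-bin.000012", "mysql-bin.000013", "mysql-bin.000014"]
print(check_resume("mysql-bin.000010", retained))
print(check_resume("mysql-bin.000013", retained))
```

A "gap" result means changes between the saved offset and the oldest retained file cannot be replayed, which is the typical cause of missed changes after a long outage.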
High update rate causes small files
Tune Iceberg compaction (rewrite data files) and consider clustering.
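To give a feel for what compaction does with small files, the following is a simplified sketch of bin-packing: files below a target size are grouped so that each group can be rewritten as roughly one target-sized file. It is an illustration of the general idea, not Iceberg's actual planner; the function name plan_compaction, the 128 MB target, and the sample sizes are assumptions.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedy first-fit-decreasing grouping of files smaller than the target."""
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    groups = []
    for size in small:
        for group in groups:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # no existing group has room, so start a new one
            groups.append([size])
    # a group with a single file gains nothing from being rewritten
    return [g for g in groups if len(g) > 1]


sizes = [4, 8, 130, 60, 60, 2, 100]  # hypothetical data file sizes in MB
print(plan_compaction(sizes))  # [[100, 8, 4, 2], [60, 60]]
```

In this example six small files collapse into two rewrite groups, while the 130 MB file is left alone because it already meets the target.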

