MongoDB
CDC source setup for MongoDB
Sling supports Change Data Capture from MongoDB by using Change Streams, which are built on MongoDB's oplog (operations log). Each run reads document-level inserts, updates, replaces, and deletes from the change stream and merges them into the target table.
For general CDC concepts, the two-phase process, and all available options, see the Change Capture overview.
Prerequisites
1. Replica Set Required
MongoDB Change Streams require a replica set. Standalone MongoDB instances are not supported. A single-node replica set is sufficient for development and testing.
To convert a standalone instance to a single-node replica set:
Add the following to your mongod.conf:
replication:
  replSetName: "rs0"
Restart mongod, then initialize the replica set:
// In mongosh
rs.initiate()
Verify the replica set is running:
rs.status()
Change Streams are not available on standalone MongoDB instances. If your MongoDB is standalone, convert it to a single-node replica set before enabling CDC.
MongoDB Atlas: Change Streams work out of the box on M10+ dedicated clusters (which are always deployed as replica sets). No special setup is needed — just use the standard Atlas connection string. Note that Atlas Flex Clusters do not support Change Streams.
2. User Permissions
The MongoDB user needs the read role on the source database. This role includes both the find and changeStream privileges required for CDC.
To watch multiple databases, grant the read role on each, or use readAnyDatabase on the admin database for deployment-wide access.
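As a minimal sketch of the grant described above, the snippet below adds the built-in read role to an existing user from mongosh. The user name sling_cdc, the database name shop, and the choice of admin as the database where the user is defined are all placeholders for illustration, not names taken from Sling.

```javascript
// Placeholder names: adjust to your deployment.
const cdcUser = "sling_cdc"; // hypothetical user that Sling connects as
const sourceDb = "shop";     // hypothetical source database to capture

// The built-in read role carries both find and changeStream on sourceDb.
const cdcRoles = [{ role: "read", db: sourceDb }];

// `db` only exists inside mongosh; the guard keeps the snippet loadable elsewhere.
if (typeof db !== "undefined") {
  // Assumes the user was created in the admin database.
  db.getSiblingDB("admin").grantRolesToUser(cdcUser, cdcRoles);
}
```

For deployment-wide access, the roles array could instead hold a single readAnyDatabase entry on the admin database.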
3. Post-Images (Recommended)
MongoDB 6.0+ supports Change Stream post-images, which provide the complete document state after every update. This ensures full row data for UPDATE events without an additional database read.
Post-images are enabled per collection by running the collMod command with the changeStreamPreAndPostImages option.
Post-images are optional. On MongoDB versions below 6.0, or on collections without post-images enabled, Sling falls back to updateLookup mode, which performs a separate read to fetch the current document on each UPDATE event. For most workloads this is sufficient, but enabling post-images provides stronger consistency guarantees and avoids the extra read.
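A minimal sketch of enabling post-images from mongosh, assuming MongoDB 6.0+ and a collection named orders (a placeholder, not a name from this guide):

```javascript
// collMod command that turns on pre- and post-image recording for one collection.
// "orders" is a placeholder collection name.
const enablePostImages = {
  collMod: "orders",
  changeStreamPreAndPostImages: { enabled: true },
};

// `db` only exists inside mongosh; the guard keeps the snippet loadable elsewhere.
if (typeof db !== "undefined") {
  // Run against the source database that owns the collection.
  printjson(db.runCommand(enablePostImages));
}
```

The command applies to a single collection, so it would be repeated for each collection captured by CDC.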
4. Version Requirements
Feature: Minimum version
Change Streams (collection-level): MongoDB 3.6
Change Streams (database-level): MongoDB 4.0
Post-images (changeStreamPreAndPostImages): MongoDB 6.0
Sling requires MongoDB 4.0 or later for database-level change stream watching.
Quick Start
No special CDC connection properties are needed for MongoDB. Sling uses the standard MongoDB connection to open Change Streams. The primary_key defaults to [_id] since every MongoDB document has an _id field.
Examples
Large Tables with Custom Chunk Size
For very large collections, adjust the chunk size to control memory usage and checkpointing frequency during the initial snapshot.
Time-Bounded Snapshots for Very Large Tables
For collections with hundreds of millions of documents, the initial snapshot can take hours. Use snapshot_run_duration to cap how long each run spends on the snapshot. The next run automatically resumes from the last completed chunk.
High-Throughput Workloads
For collections with heavy write activity, increase run_max_events and run_max_duration so each run captures more changes.
Soft Deletes
Keep deleted documents in the target instead of physically removing them. Deleted rows are marked with _sling_synced_op = 'D'. Useful for audit trails or when downstream queries need to detect deletions.
When a document is deleted in MongoDB, the target row is preserved with _sling_synced_op set to 'D' and _sling_synced_at updated to the current timestamp. A subsequent re-insert of the same _id restores the row with the appropriate operation type.
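The soft-delete behavior described above can be illustrated with a small in-memory sketch. This is not Sling's implementation: the Map standing in for the target table and the function name applyChange are hypothetical, and the op codes 'I' and 'U' for inserts and updates are assumptions (only 'D' is stated in this guide). The event fields operationType, documentKey, and fullDocument are the standard MongoDB change event fields.

```javascript
// Applies one change event to an in-memory "target table" keyed by _id.
// Assumes fullDocument is present on insert/update/replace events
// (via post-images or updateLookup).
function applyChange(target, event, now = new Date()) {
  const key = String(event.documentKey._id);
  switch (event.operationType) {
    case "insert":
    case "update":
    case "replace": {
      // 'I' and 'U' are assumed op codes, used here only for illustration.
      const op = event.operationType === "insert" ? "I" : "U";
      target.set(key, { ...event.fullDocument, _sling_synced_op: op, _sling_synced_at: now });
      break;
    }
    case "delete": {
      // Soft delete: keep the last known row and only mark it as deleted.
      const existing = target.get(key) ?? { _id: event.documentKey._id };
      target.set(key, { ...existing, _sling_synced_op: "D", _sling_synced_at: now });
      break;
    }
  }
  return target;
}
```

Applying an insert, then a delete, then another insert for the same _id leaves a single row whose op marker moves from 'I' to 'D' and back to 'I', while the column values survive the delete.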
Mixed Streams with Per-Stream Overrides
Different collections can have different CDC options.
Replay / Backfill from a Point in Time
If target data becomes inconsistent, you can replay changes from an earlier position. The replay_from value is applied exactly once per unique value.
After the replay run completes, remove or change the replay_from value. Leaving it unchanged has no effect (it is only applied once).
Replay Formats
The replay_from option accepts this MongoDB-specific position format:
RFC 3339 timestamp: 2025-06-01T00:00:00Z, resolved to a MongoDB operationTime (ClusterTime)
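To make the timestamp resolution concrete, here is a hedged sketch of how an RFC 3339 string maps onto the seconds and increment pair of a MongoDB cluster time. The function name toClusterTime is hypothetical, and how Sling performs this step internally is not documented here; the sketch only shows the arithmetic.

```javascript
// Converts an RFC 3339 timestamp into the { t, i } shape of a BSON Timestamp:
// t is whole seconds since the Unix epoch, i is the increment within that second.
function toClusterTime(rfc3339) {
  const shape = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
  const millis = shape.test(rfc3339) ? Date.parse(rfc3339) : NaN;
  if (Number.isNaN(millis)) {
    throw new Error(`not a valid RFC 3339 timestamp: ${rfc3339}`);
  }
  return { t: Math.floor(millis / 1000), i: 0 };
}

const replayPoint = toClusterTime("2025-06-01T00:00:00Z");
// In mongosh, a comparable manual check could open a stream from that point:
//   db.orders.watch([], { startAtOperationTime: Timestamp(replayPoint) })
// ("orders" is a placeholder collection name.)
```

A date-only value such as 2025-06-01 is rejected by this sketch because it lacks the time and offset components.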
Oplog Retention
MongoDB's oplog is a capped collection. If the oplog wraps around and removes entries that Sling hasn't processed, the Change Stream will be invalidated.
Check the current oplog size and retention window with rs.printReplicationInfo() in mongosh.
Ensure the oplog window is longer than the maximum gap between CDC runs. If it is not, resize the oplog with the replSetResizeOplog admin command.
If the oplog wraps past Sling's saved position, the next run will detect the invalidated Change Stream and automatically perform a fresh initial snapshot.
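A minimal mongosh sketch of checking and resizing the oplog. The 16000 MB size is an arbitrary placeholder; choose a value that covers the longest expected gap between CDC runs.

```javascript
// replSetResizeOplog takes the new size in megabytes.
// 16000 is a placeholder value, not a recommendation.
const resizeOplog = { replSetResizeOplog: 1, size: 16000 };

// `db` and `rs` only exist inside mongosh; the guard keeps the snippet loadable elsewhere.
if (typeof db !== "undefined") {
  // Prints the configured oplog size and the time span between first and last entry.
  rs.printReplicationInfo();
  // Run on each replica set member whose oplog should be resized.
  printjson(db.adminCommand(resizeOplog));
}
```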
Document Flattening
MongoDB documents are nested JSON. Sling flattens nested documents into columns by joining the path segments with a double underscore: for example, the nested field addressed as address.city in dot notation becomes the column address__city in the target table. Arrays and deeply nested objects beyond the flattening depth are serialized as JSON strings.
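The flattening rule can be sketched as a small recursive function. This is an illustration, not Sling's code: the function name flattenDocument and the maxDepth default of 2 are assumptions, since the actual flattening depth is not stated here.

```javascript
// Flattens nested objects into columns joined by "__".
// Arrays, and objects nested deeper than maxDepth, become JSON strings.
function flattenDocument(doc, maxDepth = 2, separator = "__") {
  const row = {};
  const isPlainObject = (v) =>
    v !== null && typeof v === "object" && !Array.isArray(v) && !(v instanceof Date);

  const walk = (value, path, depth) => {
    for (const [key, child] of Object.entries(value)) {
      const column = path ? path + separator + key : key;
      if (isPlainObject(child) && depth < maxDepth) {
        walk(child, column, depth + 1); // descend one more level
      } else if (child !== null && typeof child === "object" && !(child instanceof Date)) {
        row[column] = JSON.stringify(child); // arrays and too-deep objects
      } else {
        row[column] = child; // scalars, null, dates
      }
    }
  };

  walk(doc, "", 1);
  return row;
}

const flatRow = flattenDocument({
  _id: 1,
  address: { city: "Lyon", geo: { lat: 45.76 } },
  tags: ["a", "b"],
});
// flatRow has the columns _id, address__city, address__geo, and tags,
// with address__geo and tags holding JSON strings.
```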
Troubleshooting
"CDC not supported for <type>"
Ensure your source connection is configured as type=mongodb.
"change stream not supported"
MongoDB Change Streams require a replica set. Standalone instances are not supported. Convert the instance to a single-node replica set as described under Prerequisites (Replica Set Required).
"not authorized to run changeStream"
The MongoDB user needs the read role on the source database, which includes the changeStream privilege. See User Permissions under Prerequisites.
Initial snapshot keeps restarting
Ensure SLING_STATE is configured. Without state persistence, Sling cannot track that the snapshot completed and will restart it on every run.
"could not resolve replay_from position"
The replay_from value must be a valid RFC 3339 timestamp.
Missing fields in UPDATE events
Enable post-images (MongoDB 6.0+) for complete document state on updates, as described under Post-Images in Prerequisites.
Without post-images, Sling uses updateLookup which reads the current document state. This works for most cases but may miss intermediate values in high-write scenarios.
"change stream invalidated"
The oplog wrapped past the saved resume position. Sling will automatically re-snapshot on the next run. To prevent this, increase the oplog size or run CDC more frequently (see Oplog Retention).