MongoDB

CDC source setup for MongoDB

Sling supports Change Data Capture (CDC) from MongoDB through Change Streams, which are built on MongoDB's oplog (operations log). Each run reads document-level inserts, updates, replaces, and deletes from the change stream and merges them into the target table.

For general CDC concepts, the two-phase process, and all available options, see the Change Capture overview.

Prerequisites

1. Replica Set Required

MongoDB Change Streams require a replica set. Standalone MongoDB instances are not supported. A single-node replica set is sufficient for development and testing.

To convert a standalone instance to a single-node replica set:

  1. Add the following to your mongod.conf:

    replication:
      replSetName: "rs0"
  2. Restart mongod, then initialize the replica set:

    // In mongosh
    rs.initiate()
  3. Verify the replica set is running:

    rs.status()
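
The verification step can also be scripted. As a minimal sketch, the function below inspects a document shaped like the output of rs.status() and reports whether a primary has been elected. The function name and the sample document are hypothetical; the ok, members, and stateStr fields are part of the standard rs.status() output.

```javascript
// Returns true when the status document reports success and at least one
// member is in the PRIMARY state. Change Streams need an elected primary.
function replicaSetReady(status) {
  const members = Array.isArray(status.members) ? status.members : [];
  return status.ok === 1 && members.some((m) => m.stateStr === "PRIMARY");
}

// Hypothetical, trimmed-down sample of what rs.status() returns.
const sampleStatus = {
  ok: 1,
  set: "rs0",
  members: [{ name: "localhost:27017", stateStr: "PRIMARY" }],
};
console.log(replicaSetReady(sampleStatus));

// In mongosh, pass the live result instead: replicaSetReady(rs.status())
```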

MongoDB Atlas: Change Streams work out of the box on M10+ dedicated clusters (which are always deployed as replica sets). No special setup is needed — just use the standard Atlas connection string. Note that Atlas Flex Clusters do not support Change Streams.

2. User Permissions

The MongoDB user needs the read role on the source database. This role includes both the find and changeStream privileges required for CDC:
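
A minimal sketch of such a grant: the JavaScript below builds the document that mongosh's db.createUser() accepts. The user name sling_cdc, the placeholder password, and the database name sales are hypothetical; the read role and the createUser document shape are standard MongoDB.

```javascript
// Build a createUser document granting the built-in "read" role on each
// source database. All names passed in are illustrative placeholders.
function buildCdcUser(username, password, databases) {
  return {
    user: username,
    pwd: password,
    roles: databases.map((db) => ({ role: "read", db })),
  };
}

const cdcUser = buildCdcUser("sling_cdc", "change-me", ["sales"]);
console.log(JSON.stringify(cdcUser, null, 2));

// In mongosh, the same document can be passed to createUser, for example:
//   db.getSiblingDB("admin").createUser(cdcUser)
```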

To watch multiple databases, grant the read role on each, or use readAnyDatabase on the admin database for deployment-wide access.

3. Post-Images (Recommended)

MongoDB 6.0+ supports Change Stream post-images, which provide the complete document state after every update. This ensures full row data for UPDATE events without an additional database read.

To enable post-images on a collection:
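
As a sketch, the collMod command with the changeStreamPreAndPostImages option turns post-images on for an existing collection. The helper name and the collection name orders are hypothetical; the command fields are standard MongoDB 6.0+.

```javascript
// Build the collMod command that enables change stream pre- and post-images
// for one collection (MongoDB 6.0+). The collection name is a placeholder.
function buildEnablePostImages(collection) {
  return {
    collMod: collection,
    changeStreamPreAndPostImages: { enabled: true },
  };
}

const enablePostImages = buildEnablePostImages("orders");
console.log(JSON.stringify(enablePostImages));

// In mongosh, run it against the source database:
//   db.runCommand(enablePostImages)
```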

Post-images are optional. On MongoDB versions below 6.0, or on collections without post-images enabled, Sling falls back to updateLookup mode, which performs a separate read to fetch the current document on each UPDATE event. For most workloads this is sufficient, but enabling post-images provides stronger consistency guarantees and avoids the extra read.
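
The fallback can be pictured as a choice of the change stream's fullDocument option. The sketch below is an illustration only: "required" and "updateLookup" are real MongoDB option values, but the selection rule and the function name are assumptions, not Sling's actual implementation.

```javascript
// Pick a fullDocument mode for a change stream.
// Assumption: post-images are used only on MongoDB 6.0+ collections that
// have them enabled; everything else falls back to updateLookup.
function fullDocumentMode(serverMajorVersion, postImagesEnabled) {
  return serverMajorVersion >= 6 && postImagesEnabled
    ? "required"
    : "updateLookup";
}

console.log(fullDocumentMode(7, true));
console.log(fullDocumentMode(5, false));

// In mongosh, the mode is passed when opening the stream, for example:
//   db.orders.watch([], { fullDocument: fullDocumentMode(7, true) })
```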

4. Version Requirements

Feature                                       Minimum Version
Change Streams (collection-level)             MongoDB 3.6
Change Streams (database-level)               MongoDB 4.0
Post-images (changeStreamPreAndPostImages)    MongoDB 6.0

Sling requires MongoDB 4.0 or later for database-level change stream watching.

Quick Start

No special CDC connection properties are needed for MongoDB. Sling uses the standard MongoDB connection to open Change Streams. The primary_key defaults to [_id] since every MongoDB document has an _id field.

Examples

Large Tables with Custom Chunk Size

For very large collections, adjust the chunk size to control memory usage and checkpointing frequency during the initial snapshot.

Time-Bounded Snapshots for Very Large Tables

For collections with hundreds of millions of documents, the initial snapshot can take hours. Use snapshot_run_duration to cap how long each run spends on the snapshot. The next run automatically resumes from the last completed chunk.
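
The resume behavior can be sketched as a loop that copies chunks until a time budget runs out and then reports where the next run should start. This is a conceptual model under stated assumptions: the function, its parameters, and the injected clock are hypothetical, and Sling's real chunking and checkpointing are not shown on this page.

```javascript
// Copy chunks starting at startIndex until the time budget is spent.
// Returns the index of the next chunk to copy, which a later run passes
// back in as startIndex. `clock` is injectable so the loop can be tested.
function runSnapshot(chunks, startIndex, budgetMs, copyChunk, clock = Date.now) {
  const startedAt = clock();
  let next = startIndex;
  while (next < chunks.length && clock() - startedAt < budgetMs) {
    copyChunk(chunks[next]);
    next += 1; // checkpoint: this chunk is complete
  }
  return { nextChunk: next, done: next >= chunks.length };
}

// Simulated run: each chunk "takes" 40 ms against a 100 ms budget.
let fakeNow = 0;
const fakeClock = () => fakeNow;
const fakeCopy = () => { fakeNow += 40; };
const chunkList = ["c0", "c1", "c2", "c3", "c4"];

const firstRun = runSnapshot(chunkList, 0, 100, fakeCopy, fakeClock);
const secondRun = runSnapshot(chunkList, firstRun.nextChunk, 100, fakeCopy, fakeClock);
console.log(firstRun, secondRun);
```

A chunk that is already in progress when the budget expires is allowed to finish, so a run can exceed the budget by at most one chunk in this model.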

High-Throughput Workloads

For collections with heavy write activity, increase run_max_events and run_max_duration so each run captures more changes.

Soft Deletes

Keep deleted documents in the target instead of physically removing them. Deleted rows are marked with _sling_synced_op = 'D'. Useful for audit trails or when downstream queries need to detect deletions.

When a document is deleted in MongoDB, the target row is preserved with _sling_synced_op set to 'D' and _sling_synced_at updated to the current timestamp. A subsequent re-insert of the same _id restores the row with the appropriate operation type.
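
The semantics above can be modeled with an in-memory map standing in for the target table. This is a sketch, not Sling's merge logic: the event shape, the function name, and the 'I' operation code are assumptions; only the 'D' marker and the _sling_synced_op and _sling_synced_at columns come from this page.

```javascript
// Apply one change event to a Map keyed by _id.
// With softDelete on, a delete keeps the row and marks it with 'D';
// with softDelete off, the row is physically removed.
function applyEvent(target, event, softDelete, now = new Date()) {
  const id = event._id;
  if (event.op === "D") {
    if (softDelete && target.has(id)) {
      target.set(id, { ...target.get(id), _sling_synced_op: "D", _sling_synced_at: now });
    } else {
      target.delete(id);
    }
    return target;
  }
  // Inserts and updates upsert the full document ('I' is an assumed code).
  target.set(id, { ...event.doc, _id: id, _sling_synced_op: event.op, _sling_synced_at: now });
  return target;
}

const table = new Map();
applyEvent(table, { op: "I", _id: 1, doc: { name: "Ada" } }, true);
applyEvent(table, { op: "D", _id: 1 }, true);
const afterDelete = { ...table.get(1) };
applyEvent(table, { op: "I", _id: 1, doc: { name: "Ada" } }, true);
const afterReinsert = { ...table.get(1) };
console.log(afterDelete._sling_synced_op, afterReinsert._sling_synced_op);
```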

Mixed Streams with Per-Stream Overrides

Different collections can have different CDC options.

Replay / Backfill from a Point in Time

If target data becomes inconsistent, you can replay changes from an earlier position. The replay_from value is applied exactly once per unique value.

After the replay run completes, remove or change the replay_from value. Leaving it unchanged has no effect (it is only applied once).

Replay Formats

The replay_from option accepts this MongoDB-specific position format:

  • RFC 3339 timestamp: 2025-06-01T00:00:00Z — resolved to a MongoDB operationTime (ClusterTime)
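
A MongoDB cluster time is a BSON Timestamp whose first component is seconds since the Unix epoch. The sketch below validates an RFC 3339 string and converts it to that seconds value. How Sling performs the resolution internally is not documented here, so the function name and the strictness of the pattern are assumptions.

```javascript
// Accepts timestamps such as 2025-06-01T00:00:00Z or
// 2025-06-01T02:00:00+02:00 and rejects date-only or free-form strings.
const RFC3339 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function rfc3339ToEpochSeconds(value) {
  if (!RFC3339.test(value)) {
    throw new Error(`not an RFC 3339 timestamp: ${value}`);
  }
  const millis = Date.parse(value);
  if (Number.isNaN(millis)) {
    throw new Error(`not a valid date: ${value}`);
  }
  // Cluster time has one-second resolution, so drop the milliseconds.
  return Math.floor(millis / 1000);
}

console.log(rfc3339ToEpochSeconds("2025-06-01T00:00:00Z"));
```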

Oplog Retention

MongoDB's oplog is a capped collection. If the oplog wraps around and removes entries that Sling hasn't processed, the Change Stream will be invalidated.

Check the current oplog size and retention window:

Ensure the oplog window is longer than the maximum gap between CDC runs. To resize the oplog:
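
In mongosh, rs.printReplicationInfo() reports the configured oplog size and the times of the first and last oplog entries, and the replSetResizeOplog admin command changes the size. The sketch below computes the window from those two times, compares it with the longest gap between runs, and builds the resize command. The helper names and sample values are hypothetical.

```javascript
// Hours of history covered by the oplog, from its first and last entry times.
function oplogWindowHours(firstEntry, lastEntry) {
  return (lastEntry.getTime() - firstEntry.getTime()) / 3600000;
}

// The window must be longer than the longest gap between CDC runs.
function coversRunGap(windowHours, maxGapHours) {
  return windowHours > maxGapHours;
}

// Build the resize command; size is in megabytes.
function buildResizeOplog(sizeMB) {
  return { replSetResizeOplog: 1, size: sizeMB };
}

const windowHours = oplogWindowHours(
  new Date("2025-06-01T00:00:00Z"),
  new Date("2025-06-02T12:00:00Z")
);
console.log(windowHours, coversRunGap(windowHours, 24));
console.log(JSON.stringify(buildResizeOplog(16384)));

// In mongosh: db.adminCommand(buildResizeOplog(16384))
// MongoDB's documentation wraps the size as Double(16384) in mongosh.
```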

Document Flattening

MongoDB documents are nested JSON. Sling flattens nested documents into columns, joining the field path with a double underscore: for example, the nested field address.city becomes the column address__city in the target table. Arrays, and nested objects beyond the flattening depth, are serialized as JSON strings.
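
The flattening rule can be sketched as a small recursive function. The function name and the maxDepth parameter are hypothetical (this page does not state Sling's actual flattening depth); the double-underscore column naming and the JSON-string fallback follow the description above.

```javascript
// Flatten a document into a single-level row.
// Nested objects become columns joined with "__" until maxDepth is reached;
// arrays and deeper objects are stored as JSON strings.
function flattenDocument(doc, maxDepth = 1, prefix = "", depth = 0) {
  const row = {};
  for (const [key, value] of Object.entries(doc)) {
    const column = prefix ? `${prefix}__${key}` : key;
    const isPlainObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      !(value instanceof Date);
    if (Array.isArray(value)) {
      row[column] = JSON.stringify(value);
    } else if (isPlainObject && depth < maxDepth) {
      Object.assign(row, flattenDocument(value, maxDepth, column, depth + 1));
    } else if (isPlainObject) {
      row[column] = JSON.stringify(value);
    } else {
      row[column] = value;
    }
  }
  return row;
}

const flatRow = flattenDocument({
  _id: 1,
  address: { city: "Lima", geo: { lat: 1 } },
  tags: ["a"],
});
console.log(flatRow);
```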

Troubleshooting

"CDC not supported for <type>"

Ensure your source connection is configured as type=mongodb.

"change stream not supported"

MongoDB Change Streams require a replica set. Standalone instances are not supported. Convert to a single-node replica set by following the steps under "Replica Set Required" in Prerequisites.

"not authorized to run changeStream"

The MongoDB user needs the read role on the source database, which includes the changeStream privilege. See "User Permissions" in Prerequisites.

Initial snapshot keeps restarting

Ensure SLING_STATE is configured. Without state persistence, Sling cannot track that the snapshot completed and will restart it on every run.

"could not resolve replay_from position"

The replay_from value must be a valid RFC 3339 timestamp, such as 2025-06-01T00:00:00Z.

Missing fields in UPDATE events

Enable post-images (MongoDB 6.0+) for complete document state on updates, as described under "Post-Images" in Prerequisites.

Without post-images, Sling uses updateLookup, which reads the document as it exists at lookup time rather than at the time of the update. This works for most cases but may miss intermediate values in high-write scenarios.

"change stream invalidated"

The oplog wrapped past the saved resume position. Sling will automatically re-snapshot on the next run. To prevent this, increase the oplog size or run CDC more frequently, as described under "Oplog Retention".
