Version: 2.2.0

Frequently Asked Questions

This document answers some frequently asked questions about the MSR (Multi-Session Replay) module.

Configuration and Planning

How do I choose the right chunk interval for my system?

The chunk_time_interval is a critical developer-level setting that determines how TimescaleDB partitions your CDC data. This decision significantly impacts query performance, memory usage, and operational complexity.

Why This Matters:

TimescaleDB loads chunk metadata into memory. With too many chunks, you can exhaust available memory and degrade performance.

TimescaleDB's Recommended Approach:

  1. Start with the default: 7 days (TimescaleDB's recommended starting point)
  2. Monitor and adjust: Observe your data patterns and query performance
  3. Target chunk count: Aim for 10-20 chunks per hypertable in memory at any given time
  4. Memory calculation: Number of chunks × Size of chunk × 4 = Required RAM (see the query sketch after this list)
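To see how your current deployment compares against the 10-20 chunk target, a minimal sketch using TimescaleDB's information views (assuming TimescaleDB 2.x):

-- Count current chunks for the CDC hypertable
SELECT count(*) AS chunk_count
FROM timescaledb_information.chunks
WHERE hypertable_schema = 'msr'
  AND hypertable_name = 'cdc_event';

-- Total on-disk size across all chunks
SELECT pg_size_pretty(hypertable_size('msr.cdc_event')) AS total_size;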

Choosing Your Interval:

  • Calculate expected data volume: How many events per day/week/month?
  • Consider query patterns: Do you query recent data (hours) or historical data (days)?
  • Evaluate available RAM: How much memory can you dedicate to chunk data?
  • Balance chunk size: Not too small (memory overhead) or too large (query inefficiency)

Practical Examples:

-- High-frequency CDC (millions of events/day)
-- Smaller chunks for better query targeting
chunk_time_interval => INTERVAL '1 hour'
-- ~720 chunks/month

-- Medium-frequency CDC (hundreds of thousands/day)
-- Balanced approach
chunk_time_interval => INTERVAL '1 day'
-- ~30 chunks/month

-- Low-frequency CDC (thousands/day)
-- Larger chunks, less overhead
chunk_time_interval => INTERVAL '7 days'
-- ~4 chunks/month

MSR Default Configuration:

The MSR schema uses INTERVAL '1 hour' by default, optimized for high-volume command & control systems. This may need adjustment for your specific use case.
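To confirm the interval actually in effect on a running deployment, a minimal sketch (assuming the TimescaleDB 2.x timescaledb_information.dimensions view is available):

-- Inspect the chunk interval currently configured for msr.cdc_event
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions
WHERE hypertable_schema = 'msr'
  AND hypertable_name = 'cdc_event';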

Location: Set in schema.sql during initial deployment:

SELECT create_hypertable(
    'msr.cdc_event',
    'event_timestamp',
    chunk_time_interval => INTERVAL '1 hour', -- CRITICAL: Configure before data ingestion
    migrate_data => TRUE
);

Changing After Deployment:

Changing chunk intervals after data exists is extremely difficult and risky:

  1. Requires converting hypertable back to a regular PostgreSQL table
  2. Must export all existing data
  3. Recreate hypertable with new interval
  4. Re-import all data (can take hours/days for large datasets)
  5. High risk of data loss if process is interrupted
  6. Downtime required during migration

For production systems with significant data, changing the chunk interval is often not feasible in practice.

Cannot Be Changed Easily

The chunk interval is effectively permanent once you have production data. Take time to calculate the right value before deployment. Consider:

  • Expected data growth over 1-2 years
  • Available database server RAM
  • Query performance requirements
  • Storage retention policies

When in doubt, start with TimescaleDB's default of 7 days, then adjust for future deployments based on observed patterns.

How does MAX_PLAYBACK_RANGE affect my system?

MAX_PLAYBACK_RANGE controls the historical window available for replay and has several important impacts:

Storage Retention:

  • CDC events older than NOW - MAX_PLAYBACK_RANGE days are automatically deleted during maintenance
  • A snapshot is maintained at the cutoff boundary to preserve historical state
  • Example: With MAX_PLAYBACK_RANGE = 7, data beyond 7 days ago is cleaned up daily (configurable CRON expression)
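To confirm that cleanup is keeping data within the configured window, a quick check against the CDC table (table and column names as used in the MSR schema shown elsewhere in this document):

-- Oldest CDC event still retained after cleanup
SELECT min(event_timestamp) AS oldest_retained_event
FROM msr.cdc_event;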

Available Date Selection:

  • Users can select any start time within the past MAX_PLAYBACK_RANGE days
  • The date picker UI automatically enforces this limit (with configurable buffers)
  • Combining with EARLIEST_VALID_TIMESTAMP provides an absolute minimum boundary

Data Flow Timeline:

DEPLOYMENT ──→ OLD_DATA ──→ CUTOFF_POINT ←── MAX_PLAYBACK_RANGE days ──→ NOW
     ↑          (Cleanup)    (Snapshot)        (User can select)       (Current)
EARLIEST_VALID_TIMESTAMP

Configuration Impact:

  • Longer ranges (14-30 days) = More storage required, potentially slower queries, better historical access
  • Shorter ranges (3-7 days) = Less storage, faster queries, limited historical access
  • Typical values: 7 days (production), 3 days (resource-constrained), 14-30 days (long-term analysis)

Related Settings (Auto-Calculated):

  • CAGG retention: MAX_PLAYBACK_RANGE + 1 day
  • CAGG columnstore compression: CAGG retention + 1 day
  • All calculations are dynamic - changing MAX_PLAYBACK_RANGE automatically updates dependent policies

Modifying at Runtime:

-- Safe to change at any time, takes effect on next maintenance cycle
UPDATE msr.configuration
SET value = '14' -- Increase to 14 days
WHERE config_key = 'MAX_PLAYBACK_RANGE';
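
To verify the change, a quick read-back against the same msr.configuration table (a sketch):

-- Read back the current values
SELECT config_key, value
FROM msr.configuration
WHERE config_key IN ('MAX_PLAYBACK_RANGE', 'EARLIEST_VALID_TIMESTAMP');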

When should I set EARLIEST_VALID_TIMESTAMP?

EARLIEST_VALID_TIMESTAMP should be set for new deployments to prevent users from selecting replay dates before the system had any data.

Use cases:

  • Fresh deployments: Set to your deployment date/time to prevent empty replays
  • System migrations: Set to when CDC was first enabled
  • Testing environments: Keep default (1900-01-01T00:00:00Z) to allow any date
  • Data backfill scenarios: Set to the earliest date of valid historical data

Example for production deployment on Jan 15, 2024:

UPDATE msr.configuration
SET value = '2024-01-15T00:00:00Z' -- ISO 8601 format with timezone
WHERE config_key = 'EARLIEST_VALID_TIMESTAMP';

Behavior:

  • Takes precedence over MAX_PLAYBACK_RANGE calculations
  • Frontend date picker respects this as the absolute minimum selectable date
  • Prevents empty replays from dates before data existed
  • Users attempting to select earlier dates will see them as disabled in the UI

How do I optimize performance for high-frequency replay?

For systems with high event rates or frequent replays, consider these optimizations:

Frontend Performance Options:

const highPerformanceOptions = {
  pollWindowMs: 1000, // Faster polling (1s windows) for high-frequency data
  frameRate: 60, // Smooth 60 FPS playback
  maxBufferSize: 2000000, // Larger buffer (2M events) for bursty data
  minPollingInterval: 50, // Reduced minimum interval (50ms)
};

<MultiSessionReplay
  lobbyTitle="High-Performance Replay"
  lobbyDescription="Optimized for high-frequency data"
  lobbyButtonText="Start Session"
  performanceOptions={highPerformanceOptions}
/>;

Backend Database Tuning:

-- Increase work memory for complex queries (256MB → 512MB)
UPDATE msr.configuration
SET value = '512'
WHERE config_key = 'QUERY_WORK_MEM_MB';

-- Adjust columnstore compression timing if needed
UPDATE msr.configuration
SET value = '2' -- Compress after 2 days instead of 1
WHERE config_key = 'COLUMNSTORE_COMPRESSION_AGE_DAYS';

System Resources:

  • Database connections: Ensure PostgreSQL max_connections accommodates concurrent replays
  • Database memory: Allocate sufficient shared_buffers (25% of RAM recommended)
  • TimescaleDB compression: Monitor compression ratios (e.g., with TimescaleDB's hypertable_compression_stats() function)
  • Read replicas: Consider read replicas for very high concurrent replay load (>10 simultaneous users)
  • Network bandwidth: Ensure adequate bandwidth between backend and frontend (especially for remote access)

Monitoring:

  • Track Web Worker memory usage in browser dev tools
  • Monitor backend query performance with PostgreSQL's pg_stat_statements
  • Watch for buffer overflows in worker logs ([MSR Worker] Buffer size limit reached)
  • Check TimescaleDB chunk statistics: SELECT * FROM timescaledb_information.chunks;
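For the pg_stat_statements check above, a minimal sketch (assumes the pg_stat_statements extension is installed and PostgreSQL 13+, where the timing columns are total_exec_time/mean_exec_time):

-- Top 10 statements by total execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1) AS mean_ms,
       left(query, 80) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;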

Deployment Issues

PostgreSQL WAL Level Not Set

Symptom: Debezium connector fails with "logical decoding requires wal_level >= logical"

Solution:

  1. Set wal_level = logical in postgresql.conf or via startup command:
    postgres -c wal_level=logical
  2. Restart PostgreSQL server
  3. Verify with:
    SHOW wal_level;
    -- Must return: 'logical'

REPLICA IDENTITY Not Configured

Symptom: Missing or incomplete data in CDC events, especially for UPDATE operations

Solution:

  1. Set REPLICA IDENTITY FULL for all tracked tables:
    ALTER TABLE schema.table REPLICA IDENTITY FULL;
  2. Restart Debezium connector to pick up changes

Replication Slot Issues

Symptom: "Replication slot already exists" or WAL accumulation

Solution:

-- List existing replication slots
SELECT * FROM pg_replication_slots;

-- Drop unused slot if necessary
SELECT pg_drop_replication_slot('slot_name');

Connector Configuration Problems

Symptom: Connector fails to start or doesn't capture changes

Checklist:

  • ✓ Database user has REPLICATION privilege
  • ✓ Table names in table.include.list are correct
  • ✓ plugin.name is set to pgoutput (recommended)
  • ✓ Signal configuration matches for ad-hoc snapshots
  • ✓ Topics in sink connector match source connector output

Change Data Capture Issues

My database changes are not being captured by Debezium. What's wrong?

The most common causes are:

  1. WAL level not set to logical: PostgreSQL must have wal_level=logical for CDC to work
  2. REPLICA IDENTITY not FULL: Tables need REPLICA IDENTITY FULL for complete change capture

Solution:

ALTER TABLE <schema>.<table_name> REPLICA IDENTITY FULL;

Examples:

ALTER TABLE gis.geo_entity REPLICA IDENTITY FULL;
ALTER TABLE gis.bookmark REPLICA IDENTITY FULL;
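
To audit which tables still lack FULL replica identity, a sketch against the PostgreSQL catalogs (the gis schema from the examples is assumed; adjust to your tracked schema):

-- Tables in the gis schema that do not yet use REPLICA IDENTITY FULL
-- (relreplident: 'f' = full, 'd' = default/primary key, 'n' = nothing, 'i' = index)
SELECT c.relname AS table_name, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'gis'
  AND c.relkind = 'r'
  AND c.relreplident <> 'f';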

I see "slot does not exist" errors in Kafka Connect logs

This usually means the replication slot name is already in use or wasn't created properly. Each source connector needs a unique slot.name across the entire Kafka Connect cluster.

Solution:

  • Ensure slot.name is unique for each connector
  • Check PostgreSQL for existing slots: SELECT * FROM pg_replication_slots;
  • Drop unused slots if necessary: SELECT pg_drop_replication_slot('slot_name');

No data is appearing in the MSR CDC events table

This could be due to several issues:

  1. Source connector not running: Check Kafka Connect status at http://localhost:8083/connectors
  2. Sink connector misconfigured: Verify the sink connector is consuming from the correct topics
  3. Transform errors: Check Kafka Connect logs for JSLT transformation errors
  4. Topic permissions: Ensure Kafka Connect has permission to read/write topics

I'm getting "permission denied" errors when setting up connectors

Verify that:

  • Database user has replication permissions: ALTER USER admin REPLICATION;
  • Database user can access the required tables
  • Kafka Connect can reach both source and target databases
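
To check the first two points from the database side, a minimal sketch (the admin user and gis.geo_entity table from the examples above are assumptions; substitute your own):

-- Does the connector user have REPLICATION and LOGIN?
SELECT rolname, rolreplication, rolcanlogin
FROM pg_roles
WHERE rolname = 'admin';

-- Can it read one of the tracked tables?
SELECT has_table_privilege('admin', 'gis.geo_entity', 'SELECT');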

Disk Space Management

How do I check disk space usage for the MSR database?

To monitor disk space usage for the MSR database, use these commands to get an overview of storage consumption. Note that MSR uses TimescaleDB hypertables, so standard PostgreSQL table size queries won't show accurate results for partitioned data.

Check Individual Hypertable Sizes

-- Get size of the main CDC events hypertable (includes all chunks)
SELECT pg_size_pretty(hypertable_size('msr.cdc_event')) as cdc_events_size;

-- Get size of the continuous aggregate for entity states
SELECT pg_size_pretty(hypertable_size('msr.entity_last_states')) as entity_states_cagg_size;

Check All MSR Tables

-- See size breakdown of all MSR tables
SELECT
    CASE
        WHEN h.hypertable_name IS NOT NULL THEN t.tablename || ' (hypertable)'
        ELSE t.tablename
    END AS table_name,
    CASE
        WHEN h.hypertable_name IS NOT NULL THEN pg_size_pretty(hypertable_size('msr.' || t.tablename))
        ELSE pg_size_pretty(pg_total_relation_size('msr.' || t.tablename))
    END AS size
FROM pg_tables t
LEFT JOIN timescaledb_information.hypertables h
    ON h.hypertable_schema = t.schemaname
    AND h.hypertable_name = t.tablename
WHERE t.schemaname = 'msr'
ORDER BY
    CASE
        WHEN h.hypertable_name IS NOT NULL THEN hypertable_size('msr.' || t.tablename)
        ELSE pg_total_relation_size('msr.' || t.tablename)
    END DESC;

Notes:

  • The msr.cdc_event table will typically use the most space, since it stores all historical change events. Because it is a hypertable, its size is distributed across many chunks in the _timescaledb_internal schema (see the per-chunk query after this list)
  • Space usage grows based on your MAX_PLAYBACK_RANGE configuration and CDC event volume
  • The msr.entity_last_states continuous aggregate provides pre-computed daily snapshots and is usually much smaller
  • Regular tables like msr.session, msr.configuration, and snapshot tables use minimal space
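
For a per-chunk breakdown of where that space is going, a sketch assuming TimescaleDB 2.x's chunks_detailed_size() function:

-- Per-chunk size breakdown for the CDC hypertable (largest first)
SELECT chunk_name,
       pg_size_pretty(total_bytes) AS chunk_size
FROM chunks_detailed_size('msr.cdc_event')
ORDER BY total_bytes DESC
LIMIT 20;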

Performance Issues

CDC processing is slow or lagging behind

Consider these optimizations:

  • Increase batch.size in sink connector configuration
  • Adjust snapshot.fetch.size for source connectors
  • Monitor database work_mem settings: SET work_mem = '256MB';
  • Check network latency between components
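
Lag is often easiest to confirm at the replication slot itself; a sketch using standard PostgreSQL views (PostgreSQL 10+):

-- How far each replication slot is behind the current WAL position
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots;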

High memory usage in Kafka Connect containers

This is often caused by:

  • Large batch sizes processing too much data at once
  • JSLT transformations on very large JSON payloads
  • Insufficient memory allocation to Kafka Connect JVM

Solution: Tune JVM heap size and connector batch configurations.

Configuration Issues

My JSLT transformation is failing with parsing errors

Common JSLT issues include:

  • Incorrect JSON escaping in connector configuration
  • Missing fields in the source data structure
  • Type mismatches between expected and actual data types

Debug by:

  1. Testing JSLT expressions separately
  2. Checking source data structure in Kafka topics
  3. Validating JSON syntax in connector configuration

Connector keeps restarting or failing

Check these common causes:

  • Database connection timeouts
  • Insufficient database connection limits
  • Missing required permissions
  • Network connectivity issues between services

Cloud Production Issues

I'm getting 502 Bad Gateway errors when loading initial state in production. What's wrong?

This is a common issue in cloud production environments when loading the initial replay state. MSR's architecture requires transmitting potentially large amounts of historical data (via /replay/state) to the frontend during session initialization. The 502 Bad Gateway error typically indicates that one or more components in your infrastructure chain are timing out or running out of resources while processing this large payload.

Why This Happens:

The /replay/state endpoint reconstructs the complete state of all tracked entities at a specific timestamp. Depending on your data volume, this response can be:

  • Large in size: Hundreds of MB to several GB for systems with many entities
  • Slow to generate: Complex queries with many JOIN operations and JSON serialization
  • Memory-intensive: Buffering large responses in memory before transmission

Common Causes:

  1. Ingress Controller Timeouts: Default timeout too low (often 60s or less)
  2. Load Balancer Timeouts: Cloud provider LB with aggressive timeouts (e.g., 2-30s)
  3. Proxy/API Gateway Limits: Request/response size limits or timeout settings
  4. Backend Application Memory: MSR service OOM (Out of Memory) while building response
  5. Web Server Memory: Frontend web server (nginx, etc.) buffering limits
  6. Database Connection Limits: Connection pool exhaustion during expensive queries
  7. Network Bandwidth: Insufficient bandwidth between components
  8. Kubernetes Node Resources: Node CPU/memory saturation

Systematic Troubleshooting:

Step 1: Identify the Failing Component

Check logs in order from frontend to backend:

# 1. Check ingress controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller | grep 502
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller | grep "upstream timed out"

# 2. Check load balancer access logs (cloud provider specific)
# AWS ALB: Check CloudWatch Logs for TargetResponseTime
# GCP: Check Cloud Logging for load balancer logs

# 3. Check web server (frontend) logs
kubectl logs -n <namespace> deployment/web-base | grep "replay/state"
kubectl logs -n <namespace> deployment/web-base | grep "502\|504\|timeout"

# 4. Check MSR backend logs
kubectl logs -n <namespace> deployment/msr | grep "replay/state"
kubectl logs -n <namespace> deployment/msr | grep "OOM\|memory\|timeout"

# 5. Check database logs
kubectl logs -n <namespace> statefulset/postgres | grep "canceling statement"
kubectl logs -n <namespace> statefulset/postgres | grep "out of memory"

# 6. Check resource usage
kubectl top pods -n <namespace>
kubectl describe pod <msr-pod> -n <namespace> | grep -A 10 "Conditions:"
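
If connection pool exhaustion (cause 6 above) is suspected, a quick check on the database side (a sketch using standard PostgreSQL views):

-- Check connection headroom on the database
SHOW max_connections;

SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active_queries
FROM pg_stat_activity;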

Step 2: Measure Request Characteristics

Test the endpoint directly to understand payload size and response time:

# Time the request and measure response size
curl -w "\nTime: %{time_total}s\nSize: %{size_download} bytes\n" \
  -H "Authorization: Bearer $TOKEN" \
  "https://msr.yourcompany.com/replay/state?timestamp=2024-06-15T10:00:00Z" \
  -o /tmp/replay_state.json

# Check the actual payload size
ls -lh /tmp/replay_state.json

# Count entities in response
jq '.data | length' /tmp/replay_state.json

Step 3: Increase Timeouts

Increase timeouts progressively through the entire request chain:

A. Ingress Controller (nginx-ingress):

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: msr-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-body-size: "500m"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
spec:
  rules:
    - host: msr.yourcompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: msr
                port:
                  number: 8080

B. Load Balancer (cloud provider specific):

# AWS ALB
apiVersion: v1
kind: Service
metadata:
  name: msr
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: msr
---
# GCP Backend Config
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: msr-backendconfig
spec:
  timeoutSec: 300
  connectionDraining:
    drainingTimeoutSec: 60

C. Frontend Web Server (nginx):

# nginx.conf
http {
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;

    # Increase buffer sizes for large responses
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;

    client_max_body_size 500m;
    client_body_buffer_size 128k;
}

Step 4: Increase Memory Limits

Increase memory allocations for services handling large payloads:

A. MSR Backend Service:

# msr-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: msr
spec:
  template:
    spec:
      containers:
        - name: msr
          image: msr:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi" # Increase for large datasets
              cpu: "2000m"

B. Frontend Web Server:

# web-base-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-base
spec:
  template:
    spec:
      containers:
        - name: web-base
          image: web-base:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi" # Increase if proxying large responses
              cpu: "1000m"

C. Database (PostgreSQL/TimescaleDB):

# postgres-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  template:
    spec:
      containers:
        - name: postgres
          image: timescale/timescaledb:latest-pg15
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "8Gi" # Increase for complex queries
              cpu: "4000m"

Step 5: Check Node Resources

If individual pods are hitting node resource limits:

# Check node resource utilization
kubectl top nodes
kubectl describe nodes | grep -A 10 "Allocated resources"

# Check if pods are being evicted
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep "Evicted\|OOMKilled"

Consider:

  • Increasing node pool instance sizes (e.g., from m5.large to m5.xlarge)
  • Adding more nodes to the cluster
  • Using dedicated node pools for memory-intensive workloads

Step 6: Use Filtering to Reduce Payload Size

The most effective solution is to reduce the initial payload by filtering stale data:

<script>
  import dayjs from 'dayjs';

  // Filter out entities older than 30 days
  const stateFilterConfig = {
    tables: ['patients', 'medications', 'appointments'],
    getMinTimestamp: ({ targetTimestamp }) => {
      return dayjs(targetTimestamp).subtract(30, 'day').toDate();
    }
  };
</script>

<MultiSessionReplay
  lobbyTitle="Replay"
  lobbyDescription="Historical system replay"
  lobbyButtonText="Start"
  {stateFilterConfig}
/>

This approach:

  • Reduces response payload size significantly
  • Decreases database query time
  • Lowers memory requirements across all components
  • Improves overall user experience with faster load times

Testing Changes:

After each change, verify the fix:

# 1. Apply the change
kubectl apply -f <modified-resource>.yaml

# 2. Wait for rollout
kubectl rollout status deployment/msr -n <namespace>

# 3. Test the endpoint
time curl -H "Authorization: Bearer $TOKEN" \
  "https://msr.yourcompany.com/replay/state?timestamp=2024-06-15T10:00:00Z" \
  -o /dev/null

# 4. Check for errors
kubectl logs -f deployment/msr -n <namespace> | grep -i "error\|timeout"

Recommended Production Configuration:

For a production system with moderate data volume (100K-1M entities):

  • Ingress timeouts: 300s (5 minutes)
  • Load balancer timeouts: 300s
  • MSR backend memory: 2-4Gi
  • Database memory: 8-16Gi
  • Frontend web server memory: 1-2Gi
  • Use filtering: Filter entities older than 30-90 days

For high-volume systems (>1M entities):

  • Increase all timeout limits to 600s (10 minutes)
  • MSR backend memory: 4-8Gi
  • Database memory: 16-32Gi
  • Aggressive filtering: Filter entities older than 7-30 days
  • Consider dedicated high-memory node pools

Monitoring Recommendations:

Set up alerts for:

# Prometheus alert examples
groups:
  - name: msr_production
    rules:
      - alert: MSRHighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{endpoint="/replay/state"}[5m])) > 60
        annotations:
          summary: "MSR /replay/state responses taking >60s at p95"

      - alert: MSRHighMemoryUsage
        expr: container_memory_usage_bytes{pod=~"msr-.*"} / container_spec_memory_limit_bytes{pod=~"msr-.*"} > 0.9
        annotations:
          summary: "MSR pod using >90% of memory limit"

      - alert: MSRFrequent502Errors
        expr: rate(http_requests_total{status="502",service="msr"}[5m]) > 0.1
        annotations:
          summary: "MSR returning frequent 502 errors"

Development and Debugging

How do I add logs to the MSR Web Worker?

The MSR Web Worker uses a custom logging system that sends all log messages to the main browser thread for proper console output. The worker includes a workerLog object with different log levels:

// Available log levels in the Web Worker
workerLog.debug("Debug message", { additionalData });
workerLog.info("Info message", { sessionData });
workerLog.warn("Warning message", { errorDetails });
workerLog.error("Error message", { errorContext });

To see Web Worker logs in your browser console:

  1. Open browser Developer Tools (F12)
  2. Go to the Console tab
  3. Worker logs will appear with [MSR Worker] prefix
  4. Use console filtering to show only MSR logs: [MSR Worker]

Common log messages to look for:

  • "Session created successfully" - Confirms session initialization
  • "Loaded entities into worker state" - Shows initial data loading
  • "Starting data polling" - Indicates background polling started
  • "Buffering changes" - Shows CDC data being processed
  • "Buffer size limit reached" - Warning about memory usage