Data Pipeline Migration

Databricks → AWS (RDS PostgreSQL + ECS Fargate + S3)

Current Architecture — Databricks (retiring)
API — Camelot OData · 3PL · 18 endpoints
QB — QuickBooks · Creme + Sage orgs
GS — Google Sheets · SWEED mapping
Nb — PySpark Notebooks · OData client + batch processing → Delta tables
Bronze — Raw Delta tables · 18 tables
Silver — dbt SQL models · 17 models
Gold — dbt SQL + Python · 7 models
Job — Databricks Jobs · 2 daily · UI-managed
WH — SQL Warehouse · Serves queries
Sec — Databricks Secrets · creme-scope
Δ — Delta Lake · Unity Catalog
DB — Creme Dashboard · databricks-sql-connector
RP — Reporting Portal · SQLite sync
T — Tableau · Being retired
Target Architecture — AWS
API — Camelot OData · 3PL · 18 endpoints
QB — QuickBooks · Creme + Sage orgs
GS — Google Sheets · SWEED mapping
EB — EventBridge · Cron: daily 3:00 AM EST
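One wrinkle worth noting: EventBridge evaluates cron expressions in UTC, so "daily 3:00 AM EST" has to be written as 08:00 UTC (EST is UTC-5; with a fixed offset the job fires at 4:00 AM local during daylight saving unless the rule is adjusted). A minimal sketch of the conversion — the rule name in the commented usage is hypothetical:

```python
def est_cron(hour_est: int, minute: int = 0) -> str:
    """Convert a fixed-offset EST wall-clock time (UTC-5) to an EventBridge
    cron expression, which EventBridge always evaluates in UTC."""
    return f"cron({minute} {(hour_est + 5) % 24} * * ? *)"

# Example wiring (needs AWS credentials; rule name is a placeholder):
#   import boto3
#   boto3.client("events").put_rule(
#       Name="creme-daily-pipeline",
#       ScheduleExpression=est_cron(3),  # daily 3:00 AM EST -> cron(0 8 * * ? *)
#   )
```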
ECS Fargate Task (Docker container from ECR)
1. Ingest — Python scripts (requests + pandas) fetch data → write to RDS PostgreSQL bronze schema
2. Transform — dbt run (dbt-core + dbt-postgres) builds silver → gold
3. Validate — dbt test runs data quality + business logic checks
4. Notify — Slack webhook reports success or failure
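The Ingest step can be sketched as below. The production scripts use requests + pandas; this sketch uses stdlib urllib for the HTTP helper so the pure parts stay dependency-free, and imports pandas/SQLAlchemy lazily for the load. The entity name, DSN, and `if_exists` strategy are assumptions — only the `bronze_camelot` schema comes from the design above.

```python
import json
import urllib.request


def records_from_odata(payload: dict) -> list[dict]:
    """OData responses wrap result rows in a top-level "value" array."""
    return payload.get("value", [])


def fetch_endpoint(base_url: str, entity: str, token: str) -> list[dict]:
    """GET one OData entity set and return its rows as plain dicts."""
    req = urllib.request.Request(
        f"{base_url}/{entity}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return records_from_odata(json.load(resp))


def load_bronze(rows: list[dict], table: str, dsn: str) -> None:
    """Land rows in the bronze schema via pandas + SQLAlchemy (imported lazily
    so the helpers above need only the standard library)."""
    import pandas as pd
    from sqlalchemy import create_engine

    pd.DataFrame(rows).to_sql(table, create_engine(dsn), schema="bronze_camelot",
                              if_exists="replace", index=False)

# Example wiring (needs real credentials; names are placeholders):
#   rows = fetch_endpoint("https://example.invalid/odata", "Inventory", token)
#   load_bronze(rows, "inventory", "postgresql+psycopg2://user:pw@host/db")
```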
PG — RDS PostgreSQL · All schemas: bronze, silver, gold · Single source of truth
bronze_camelot — Raw ingested data · 18 tables
silver_camelot — dbt SQL models · 17 models
gold_camelot — dbt SQL + Python · 7 models
S3 — S3 Bucket · Raw file staging + Parquet backups
SM — Secrets Manager · API credentials
DB — Creme Dashboard · psycopg2 / asyncpg → RDS
RP — Reporting Portal · Direct PostgreSQL queries
Migration Path
1. Export Bronze — Databricks bronze tables → S3 (Parquet) → COPY into RDS PostgreSQL
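Plain PostgreSQL COPY cannot read from S3 directly, so the Parquet export either goes through the RDS aws_s3 extension (CSV only) or is staged through the task, as in this sketch. Bucket, key, and table names are placeholders; only the `bronze_camelot` schema is from the plan above.

```python
import io


def build_copy_sql(schema: str, table: str, columns: list[str]) -> str:
    """COPY statement for a CSV stream with a header row."""
    return (f"COPY {schema}.{table} ({', '.join(columns)}) "
            f"FROM STDIN WITH (FORMAT csv, HEADER true)")


def import_parquet_from_s3(bucket: str, key: str, schema: str,
                           table: str, dsn: str) -> None:
    """Stage one Parquet export through the container, convert to CSV,
    and COPY it into RDS. Third-party imports are kept local."""
    import boto3
    import pandas as pd
    import psycopg2

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_parquet(io.BytesIO(body))

    buf = io.StringIO()
    df.to_csv(buf, index=False)
    buf.seek(0)

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.copy_expert(build_copy_sql(schema, table, list(df.columns)), buf)
```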
2. Rebuild Layers — dbt run rebuilds silver + gold in PostgreSQL from imported bronze
3. Validate — Compare row counts + key aggregates between Databricks and PostgreSQL outputs
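The row-count half of this check reduces to comparing two per-table count maps; a small sketch, with the count collection (one `SELECT count(*)` per side) left to the caller:

```python
def compare_counts(databricks: dict[str, int],
                   postgres: dict[str, int]) -> dict[str, tuple]:
    """Return {table: (databricks_count, postgres_count)} for every mismatch,
    including tables present on only one side (missing side reported as None)."""
    mismatches = {}
    for table in sorted(set(databricks) | set(postgres)):
        a, b = databricks.get(table), postgres.get(table)
        if a != b:
            mismatches[table] = (a, b)
    return mismatches
```

The same pattern extends to key aggregates: swap the integer counts for tuples of SUM/MIN/MAX values per table.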
4. Switch Consumers — Dashboard + Portal swap connectors to psycopg2 → RDS PostgreSQL
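The consumer-side swap amounts to replacing databricks-sql-connector calls with psycopg2 against RDS. A minimal sketch — the hostname and the `daily_sales` model in the commented usage are placeholders; only the `gold_camelot` schema comes from the design above:

```python
def pg_dsn(host: str, dbname: str, user: str, password: str,
           port: int = 5432) -> str:
    """libpq keyword/value connection string accepted by psycopg2.connect()."""
    return f"host={host} port={port} dbname={dbname} user={user} password={password}"


def fetch_gold(dsn: str, sql: str) -> list[tuple]:
    """Run one read-only query against RDS; psycopg2 is imported lazily."""
    import psycopg2

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()

# Example (hostname and model name are placeholders):
#   rows = fetch_gold(pg_dsn("creme-db.example.rds.amazonaws.com", "creme", "app", "***"),
#                     "SELECT * FROM gold_camelot.daily_sales LIMIT 10")
```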
5. Decommission — Shut down Databricks SQL Warehouse, archive notebooks, cancel subscription