I'm Victor, a Data/Analytics Engineer specializing in building scalable data pipelines and lakehouse solutions. I've implemented Medallion Architecture in Databricks, orchestrated ETL/ELT pipelines with dbt and Airflow, and delivered data products that empower business teams.
My tech stack includes Python, PySpark, SQL, dbt, Azure, and Databricks. This portfolio highlights some of my side projects and technical explorations.
A high-performance data ingestion project built with the Python dlt library. It moves data from PostgreSQL to Databricks using Change Data Capture (CDC) for efficient synchronization. Orchestrated natively by Databricks Lakeflow Jobs, the project serves as a robust blueprint for enterprise data replication.
Technical Assessment Projects
Projects developed as part of technical assessment processes, demonstrating problem-solving ability and hands-on technical skills.
An EL (Extract, Load) pipeline that fetches market data (price, volume, and market cap) for BTC, ETH, and LTC from the CoinMarketCap API and stores it in DuckDB. A minimal sketch of the pipeline appears after the project details below.
Features
Automated data extraction from CoinMarketCap API
Error handling and logging system
Data storage in DuckDB database
Tech Stack
Python
dlt
DuckDB
CoinMarketCap API
Skills Applied
API integration
Data pipeline development
SQL querying
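For illustration, here is a minimal sketch of the extract-and-load step, assuming dlt and the requests library: a resource calls CoinMarketCap's quotes/latest endpoint and a pipeline appends the rows to a local DuckDB file. The CMC_API_KEY variable, resource name, and response field names are assumptions based on CoinMarketCap's public API documentation, not this project's actual code.

```python
import os

import dlt
import requests

CMC_URL = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/quotes/latest"
SYMBOLS = "BTC,ETH,LTC"


@dlt.resource(name="crypto_quotes", write_disposition="append")
def crypto_quotes():
    """Yield the latest USD quote for each tracked symbol."""
    resp = requests.get(
        CMC_URL,
        params={"symbol": SYMBOLS, "convert": "USD"},
        headers={"X-CMC_PRO_API_KEY": os.environ["CMC_API_KEY"]},  # key name is illustrative
        timeout=30,
    )
    resp.raise_for_status()
    for symbol, entry in resp.json()["data"].items():
        quote = entry["quote"]["USD"]
        yield {
            "symbol": symbol,
            "price": quote["price"],
            "volume_24h": quote["volume_24h"],
            "market_cap": quote["market_cap"],
            "last_updated": quote["last_updated"],
        }


if __name__ == "__main__":
    # dlt creates (or reuses) a local DuckDB file and appends each run's rows.
    pipeline = dlt.pipeline(
        pipeline_name="coinmarketcap_el",
        destination="duckdb",
        dataset_name="crypto",
    )
    print(pipeline.run(crypto_quotes()))
```

Each run appends a timestamped snapshot, so price history accumulates in DuckDB and can be queried with plain SQL.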
Turns the static Chinook dataset into a living, chaotic OLTP simulator. It generates not only new sales (INSERTs) but also simulates data corrections (UPDATEs) and cancellations (DELETEs), creating a realistic data source for testing advanced pipelines (CDC, SCD Type 2). A sketch of the batching approach appears after the project details below.
Features
Simulates the full data lifecycle: INSERT, UPDATE, and DELETE
Models UPDATEs/DELETEs as late-arriving changes within a 90-day window
Ensures ACID compliance (all-or-nothing commits) for each D-1 (previous-day) batch
Includes a verification script to audit simulation logs against the DB state
Tech Stack
Python
PostgreSQL
Neon
uv
Skills Applied
Data Generation
SQL Functions
System Automation
CLI Integration
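The all-or-nothing guarantee for a D-1 batch boils down to running the whole day's changes inside one transaction. Below is a minimal sketch assuming psycopg2 and the Chinook Invoice table; the function name, DATABASE_URL variable, and row shapes are illustrative rather than the project's actual code.

```python
import os

import psycopg2


def apply_d1_batch(inserts, updates, deletes):
    """Apply one simulated day's INSERTs/UPDATEs/DELETEs as a single transaction.

    The targets of UPDATEs/DELETEs are assumed to be chosen upstream from
    rows no older than the 90-day late-arrival window.
    """
    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # e.g. a Neon DSN
    try:
        with conn:  # commits on clean exit, rolls back on any exception
            with conn.cursor() as cur:
                for customer_id, invoice_date, total in inserts:  # new sales
                    cur.execute(
                        'INSERT INTO "Invoice" ("CustomerId", "InvoiceDate", "Total") '
                        "VALUES (%s, %s, %s)",
                        (customer_id, invoice_date, total),
                    )
                for invoice_id, new_total in updates:  # data corrections
                    cur.execute(
                        'UPDATE "Invoice" SET "Total" = %s WHERE "InvoiceId" = %s',
                        (new_total, invoice_id),
                    )
                for invoice_id in deletes:  # cancellations
                    cur.execute(
                        'DELETE FROM "Invoice" WHERE "InvoiceId" = %s',
                        (invoice_id,),
                    )
    finally:
        conn.close()
```

Because psycopg2's connection context manager commits only if the block exits cleanly, a failure in any single statement rolls back the entire batch.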
A high-performance data ingestion project built with the Python dlt library, moving data from PostgreSQL to Databricks using CDC for efficient synchronization. A sketch of the loading pattern appears after the project details below.
Features
High-performance ingestion using Python dlt
Change Data Capture (CDC) synchronization
PostgreSQL to Databricks replication
Orchestration via Databricks Lakeflow Jobs
Tech Stack
Python
dlt
PostgreSQL
Databricks
Skills Applied
CDC / Data Replication
Data Ingestion
Lakeflow Orchestration
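As a rough illustration of the loading pattern, the sketch below uses dlt's built-in sql_database source with an incremental cursor and a merge write disposition, which approximates CDC by upserting changed rows on the primary key. The real project relies on log-based CDC; the table and column names (invoice, updated_at, invoice_id) are hypothetical.

```python
import dlt
from dlt.sources.sql_database import sql_table

# Read only rows whose cursor column advanced since the last run.
invoices = sql_table(
    credentials=dlt.secrets["sources.sql_database.credentials"],
    table="invoice",  # hypothetical table name
    incremental=dlt.sources.incremental("updated_at"),  # hypothetical cursor column
)
# Merge on the primary key so late corrections upsert instead of duplicating rows.
invoices.apply_hints(primary_key="invoice_id", write_disposition="merge")

pipeline = dlt.pipeline(
    pipeline_name="pg_to_databricks",
    destination="databricks",
    dataset_name="bronze",
)
print(pipeline.run(invoices))
```

Incremental merge covers INSERTs and UPDATEs; propagating DELETEs end-to-end is exactly what the log-based CDC approach in the actual project handles.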