A complete archive of data engineering work — pipelines, batch systems, cloud architectures, and analytics platforms.
8 Projects
Production-grade retail analytics pipeline using synthetic e-commerce data and a full Medallion architecture — from raw ingestion to Looker Studio dashboards.
Fully automated pipeline extracting Kaggle sales data via dlt, loading into BigQuery, and transforming to a business-ready Gold layer with dbt — orchestrated by Airflow.
Automated batch data pipeline that ingests S&P 500 ETF data from yfinance into AWS S3, organizes it in Medallion layers, transforms it with Apache Spark, and visualizes it in Superset.
Daily batch pipeline that ingests sector ETF data, computes financial KPIs (moving averages, returns) with Spark, stores the results on S3 in Medallion layers, and visualizes them live.
Big data ETL pipeline over Kaggle startup funding datasets, loaded into PostgreSQL and surfaced via an interactive Streamlit dashboard — fully containerized with Docker Compose.
Real-time CRM data pipeline integrating the Affinity API to ingest deals and interactions, compute relationship warmth scores, and surface actionable CRM insights.
Data pipeline that ingests YouTube channel and video metadata, processes content signals, and surfaces intelligence for content strategy and audience growth analysis.
Cloud-native weather data pipeline on Google Cloud Platform, streaming live meteorological data through automated ingestion and transformation workflows.
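
The financial KPIs named in the sector ETF project (moving averages, returns) can be illustrated with a minimal sketch. It is written in plain Python rather than Spark, the function names `simple_returns` and `moving_average` are hypothetical, and the closing prices are made-up sample values, so it shows the arithmetic only and is not the project's actual code.

```python
from typing import List, Optional


def simple_returns(closes: List[float]) -> List[Optional[float]]:
    """Day-over-day simple return: close_t / close_{t-1} - 1.

    The first day has no previous close, so its return is None.
    """
    if not closes:
        return []
    out: List[Optional[float]] = [None]
    for prev, cur in zip(closes, closes[1:]):
        out.append(cur / prev - 1.0)
    return out


def moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average over `window` days.

    Days with fewer than `window` observations so far yield None.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            # Drop the price that just left the trailing window.
            running -= closes[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# Made-up closing prices, for illustration only.
sample_closes = [100.0, 110.0, 99.0, 108.9]
sample_returns = simple_returns(sample_closes)
sample_ma2 = moving_average(sample_closes, window=2)
```

In a Spark job the same quantities would typically be expressed with window functions over a date-ordered partition per ticker; the loop above only makes the per-row calculation explicit.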