AWS & GCP Certified Data Engineer

Hi, I'm Mesum.

With 2+ years of professional experience, I design and build production-grade data pipelines, data models, and transformation workflows. I deliver clean, analytics-ready data through warehouses, dashboards, and APIs that power real business decisions.

Services & Expertise

End-to-end data engineering: from raw ingestion to dashboards, APIs, and ML-ready data products.

ETL & ELT Pipelines

Production-grade ingestion pipelines with data validation, incremental loads, backfills, Reverse ETL, and schema evolution. From raw sources to warehouse-ready tables, reliably. A minimal incremental-load sketch follows the tags.

  • Python
  • dlt
  • Airflow
  • Kafka
  • Reverse ETL
  • CDC
  • Incremental Loads
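
A flavor of what this looks like in practice: a minimal sketch of the incremental-load pattern with dlt. The endpoint, cursor field, and dataset names below are hypothetical placeholders, not a real source.

```python
# Minimal dlt incremental-load sketch. The endpoint, field names, and
# dataset are hypothetical placeholders.
import dlt
from dlt.sources.helpers import requests


@dlt.resource(primary_key="order_id", write_disposition="merge")
def orders(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")
):
    # Fetch only rows newer than the last stored cursor value.
    response = requests.get(
        "https://api.example.com/orders",
        params={"updated_since": updated_at.last_value},
    )
    response.raise_for_status()
    yield response.json()


pipeline = dlt.pipeline(
    pipeline_name="orders_ingest",
    destination="bigquery",
    dataset_name="raw",
)
print(pipeline.run(orders))
```

Because dlt persists the cursor in pipeline state and merges on the primary key, reruns pick up where the last load stopped and stay idempotent.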

Data Modeling & Transformation

Dimensional models, star schemas, and Medallion-layered transformations (Bronze to Silver to Gold), converting raw data into query-optimized, analytics-ready assets with full lineage tracking. A layered-run sketch follows the tags.

  • dbt
  • Ad-Hoc SQL
  • Dimensional Modeling
  • Star Schema
  • Data Quality
  • Data Lineage
  • Data Governance
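
Driving those layered runs from Python can look roughly like this: a minimal sketch using dbt's documented programmatic interface (dbt-core 1.5+). The medallion tags and the selector are hypothetical project conventions, not from a specific repo.

```python
# Programmatic dbt invocation (dbt-core >= 1.5). Assumes a dbt project in
# the working directory; the medallion tags are a hypothetical convention.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Run the silver layer and everything downstream of it.
res: dbtRunnerResult = dbt.invoke(["run", "--select", "tag:silver+"])

# Inspect per-model outcomes.
for r in res.result:
    print(f"{r.node.name}: {r.status}")
```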

Cloud Data Warehousing & Dashboards

Scalable cloud warehouses with partitioning, clustering, and row-level security. Connected to live dashboards and self-serve BI layers built for non-technical stakeholders. A partitioning sketch follows the tags.

  • BigQuery
  • Snowflake
  • Redshift
  • Looker Studio
  • Apache Superset
  • Dashboards
  • Query Optimization
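
As an illustration, here is a minimal sketch of creating a day-partitioned, clustered table with the official google-cloud-bigquery client; the project, dataset, and column names are placeholders.

```python
# Partitioned + clustered BigQuery table, minimal sketch.
# "my-project.analytics.fct_orders" and all fields are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("order_id", "STRING"),
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("channel", "STRING"),
    bigquery.SchemaField("order_date", "DATE"),
    bigquery.SchemaField("amount", "NUMERIC"),
]

table = bigquery.Table("my-project.analytics.fct_orders", schema=schema)
# Partition by day so queries scan only the dates they actually touch.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="order_date",
)
# Cluster within each partition to cut scanned bytes on common filters.
table.clustering_fields = ["customer_id", "channel"]

client.create_table(table)
```

Partitioning bounds what a query scans; clustering sorts rows within each partition so filters on customer_id or channel prune even further.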

Automation, APIs & ML Integration

REST and webhook APIs, autonomous agents, and ML model integration layers that operationalize machine learning outputs into downstream data pipelines and business systems. A webhook sketch follows the tags.

  • REST APIs
  • FastAPI
  • ML Models
  • LangChain
  • OpenAI APIs
  • Docker
  • Webhooks
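
A minimal webhook receiver sketched with FastAPI; the event schema and route are hypothetical, and a production version would verify the sender's signature before trusting the payload.

```python
# Hypothetical webhook receiver. In production, verify a signature header
# and hand the event to a queue, warehouse load, or model-scoring step.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class PaymentEvent(BaseModel):
    event_id: str
    customer_id: str
    amount: float
    status: str


@app.post("/webhooks/payments")
async def receive_payment_event(event: PaymentEvent) -> dict:
    # Pydantic has already validated the payload shape at this point.
    return {"received": event.event_id}
```

Served locally with `uvicorn main:app --reload` (assuming the file is named main.py), it accepts a JSON POST and rejects malformed payloads with a 422.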

Skills & Tools

The full stack I use to build production data systems, from ingestion to dashboards.

Languages

Python, SQL, JavaScript, TypeScript, Java, C

Data Engineering

ETL / ELT, Reverse ETL, Data Modeling, Dimensional Modeling, Transformation, dbt, dlt, Airflow, Kafka, Apache Spark, CDC, Orchestration

Warehouses & Analytics

BigQuery, Snowflake, Redshift, PostgreSQL, Star Schema, Data Lineage, Data Quality, Query Optimization, Partitioning, Clustering

Dashboards & APIs

Looker Studio, Apache Superset, Streamlit, REST APIs, FastAPI, Webhooks, ML Models, LangChain, OpenAI

Cloud & Governance

GCP, AWS S3, Docker, Data Governance, Data Observability, Data Catalog, CI/CD, Row-level Security

Featured Work

Real-world projects spanning end-to-end pipelines, data modeling, transformation layers, dashboards, and cloud architectures.

Retail Payment Intelligence Pipeline

Production-grade retail analytics pipeline using synthetic e-commerce data and a full Medallion architecture — from raw ingestion to Looker Studio dashboards.

  • dlt
  • dbt
  • BigQuery
  • Airflow
  • Looker Studio
View on GitHub →

E-Commerce Sales Analytics Pipeline

Fully automated pipeline extracting Kaggle sales data via dlt, loading it into BigQuery, and transforming it to a business-ready Gold layer with dbt, orchestrated end to end by Airflow.

  • dlt
  • dbt
  • BigQuery
  • Airflow
  • Python
View on GitHub →

AWS Spark Batch

Automated batch data pipeline ingesting S&P 500 ETF data from yfinance into AWS S3 using a Medallion architecture, transformed with Apache Spark and visualized in Superset.

  • Apache Spark
  • Airflow
  • AWS S3
  • PostgreSQL
  • Superset
View on GitHub →

ETF Batch Pipeline — Apache Spark, Airflow & Superset

Daily batch pipeline ingesting sector ETF data, computing financial KPIs (moving averages, returns) with Spark, stored on S3 in Medallion layers and visualized live in Superset. A sketch of the KPI step follows.

  • Apache Spark
  • Airflow
  • AWS S3
  • Apache Superset
  • yfinance
View on GitHub →
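
Not lifted from the repo, but a minimal sketch of how the KPI step can work: daily returns and a trailing 20-day moving average with Spark window functions. The S3 paths and column names are placeholders.

```python
# Hypothetical sketch: daily returns and a 20-day moving average with
# Spark window functions. S3 paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("etf_kpis").getOrCreate()

prices = spark.read.parquet("s3a://example-bucket/silver/etf_prices/")

by_ticker = Window.partitionBy("ticker").orderBy("trade_date")
last_20 = by_ticker.rowsBetween(-19, 0)  # current row plus 19 preceding

kpis = (
    prices
    # Daily return: today's close vs. the prior trading day's close.
    .withColumn("daily_return", F.col("close") / F.lag("close").over(by_ticker) - 1)
    # Trailing 20-row moving average of the close within each ticker.
    .withColumn("ma_20", F.avg("close").over(last_20))
)

kpis.write.mode("overwrite").parquet("s3a://example-bucket/gold/etf_kpis/")
```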

Ready to build a reliable data foundation?

Let's discuss how a customized data architecture can accelerate your business.