bala@portfolio ~ ~/bay-area $ whoami

Nitturi
Balasubramanyam

→data_engineer.py

I build reliable ETL and streaming systems, optimize pipeline performance, and deliver analytics-ready datasets. Strong focus on orchestration, data quality, and production troubleshooting.

view_projects() github.open() ↗ linkedin.open() ↗

✉ nitturi.balasubramanyam@gmail.com | ✆ +1 657-532-0248 | Bay Area, CA

airflow spark python sql aws kafka data_quality warehousing azure


35% ETL runtime reduced

20% compute cost cut

25% accuracy improved

// projects

featured_work

Production-style projects with clear architecture, tradeoffs, and measurable outcomes.

⚡

// featured

airflow spark sql redshift docker

automated_etl_pipeline

End-to-end ETL with orchestration, validation, Spark transformations, and warehouse loading. Includes performance tuning and failure recovery patterns.

Reduced pipeline runtime by 35% via query + Spark tuning
Cut compute cost by 20% via partitioning, join optimization, and caching
Improved accuracy by 25% with automated validation checks

view_details() → github() ↗
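
The card above mentions failure recovery patterns without naming them. One common pattern is retrying a failed step with exponential backoff; the sketch below illustrates that idea only and is not taken from the project. The names `run_with_retries`, `max_attempts`, and `base_delay` are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical helper: the delay doubles after each failed attempt
    (base_delay, 2 * base_delay, ...). The last failure is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting. An orchestrator such as Airflow has its own task-level retry settings, which would normally be preferred over hand-rolled loops.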

🤖

// ai-powered

airflow claude_api aws_s3 docker slack

llm_dq_monitor

AI-augmented pipeline that detects anomalies in transaction data and delivers plain-English root cause analysis to Slack automatically, with no manual log triage.

Integrated Claude API across 9 DQ rules, returning RCA in under 2 seconds
Full pipeline: S3 ingest → validation → LLM → Slack, runs daily
Replaced raw JSON alerts with AI-generated summaries

github() ↗
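
The validation → LLM → Slack flow described above could be wired roughly as follows. This is a minimal sketch, not the project's code: the Claude API call and the Slack delivery are deliberately left out, and `summarize` is a hypothetical stand-in for whatever function sends the prompt to the model. `RuleResult`, `build_rca_prompt`, and `alert_text` are also illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class RuleResult:
    """Outcome of one data quality rule (hypothetical shape)."""
    rule: str
    passed: bool
    detail: str


def build_rca_prompt(results: Iterable[RuleResult]) -> str:
    """Turn failed rules into a prompt asking for a root cause summary."""
    failures = [r for r in results if not r.passed]
    bullet_lines = "\n".join(f"- {r.rule}: {r.detail}" for r in failures)
    return (
        "These data quality rules failed on the latest transaction batch:\n"
        f"{bullet_lines}\n"
        "Explain the most likely root cause in plain English."
    )


def alert_text(
    results: Iterable[RuleResult],
    summarize: Callable[[str], str],
) -> Optional[str]:
    """Return the message to post, or None when every rule passed.

    `summarize` is an assumption: it stands in for the LLM call.
    """
    results = list(results)
    if all(r.passed for r in results):
        return None
    return summarize(build_rca_prompt(results))
```

Keeping the model call behind a plain callable means the prompt-building logic can be tested without network access or credentials.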

🌊

kafka spark_streaming lakehouse monitoring

realtime_financial_pipeline

Streaming pipeline for transaction events with event-time handling, deduplication, and checkpoint-based recovery.

Handled late/out-of-order events using event-time logic
Prevented duplicates via idempotent writes
Recovered reliably using checkpoints

github() ↗
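
The event-time and deduplication ideas above can be illustrated in plain Python. This is a sketch of the general technique (a watermark plus a set of seen event IDs), not the project's Kafka or Spark code; `WatermarkDeduplicator` and `allowed_lateness` are hypothetical names, and timestamps are assumed to be integer seconds.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WatermarkDeduplicator:
    """Accept each event at most once, tolerating bounded lateness."""
    allowed_lateness: int                      # seconds an event may lag behind
    max_event_time: int = 0                    # highest event time seen so far
    seen: Dict[str, int] = field(default_factory=dict)  # event_id -> event_time

    def accept(self, event_id: str, event_time: int) -> bool:
        # The watermark trails the newest event time by the allowed lateness.
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.allowed_lateness
        if event_time < watermark:
            return False  # too late: older than the watermark
        if event_id in self.seen:
            return False  # duplicate delivery
        self.seen[event_id] = event_time
        # IDs older than the watermark can be dropped: a replay of them
        # would be rejected as late anyway, so state stays bounded.
        self.seen = {k: t for k, t in self.seen.items() if t >= watermark}
        return True


dedup = WatermarkDeduplicator(allowed_lateness=10)
dedup.accept("a", 100)   # accepted
dedup.accept("b", 95)    # out of order but within lateness: accepted
dedup.accept("a", 100)   # duplicate: rejected
```

The eviction step assumes a redelivered event carries the same event time as the original; that assumption is what lets the watermark bound the deduplication state.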

👤

ingestion incremental spark warehouse

customer_360_platform

Unified customer model built from multiple systems with incremental processing and quality gates.

Unified CRM + payments + support into one customer model
Improved runtime using incremental processing strategy
Quality checks: null / uniqueness / freshness

github() ↗
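
The three quality checks listed above (null, uniqueness, freshness) can be expressed as a small gate over a batch of records. This is a minimal sketch in plain Python rather than Spark, with hypothetical column names and a hypothetical `run_quality_gate` function; the thresholds the project actually uses are not stated in the source.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def run_quality_gate(
    rows: List[Row],
    key: str,
    required: Sequence[str],
    ts_column: str,
    now: datetime,
    max_age: timedelta,
) -> Dict[str, bool]:
    """Return pass/fail per check for one batch of customer records."""
    keys = [row.get(key) for row in rows]
    timestamps = [row[ts_column] for row in rows if row.get(ts_column) is not None]
    return {
        # null: every required column is populated in every row
        "null": all(row.get(col) is not None for row in rows for col in required),
        # uniqueness: no key value appears more than once
        "uniqueness": len(keys) == len(set(keys)),
        # freshness: the newest record is no older than max_age
        "freshness": bool(timestamps) and now - max(timestamps) <= max_age,
    }
```

A caller could block the warehouse load whenever any value in the returned dictionary is `False`.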

📡

dq_checks metrics alerts dashboards

data_observability_framework

Framework to detect "pipeline succeeded but data is wrong" using anomaly signals and freshness metrics.

Row-count, freshness, and schema drift monitoring
Alerting workflow for fast root cause analysis
Reduced time-to-detect for data issues

github() ↗
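
Two of the signals named above, row-count anomalies and schema drift, can be sketched with the standard library alone. The z-score threshold and the function names `row_count_anomaly` and `schema_drift` are assumptions for illustration; the framework's actual detection rules are not described in the source.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def row_count_anomaly(history: Sequence[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count if it is far from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is suspicious
    return abs(today - mu) / sigma > z_threshold


def schema_drift(expected: Dict[str, str], observed: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare column -> type mappings and report what changed."""
    shared = expected.keys() & observed.keys()
    return {
        "added": sorted(observed.keys() - expected.keys()),
        "removed": sorted(expected.keys() - observed.keys()),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }
```

Both checks run on metadata only, so they can catch a job that exits successfully while loading far fewer rows or a changed column type.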

// skills

tech_stack

Focused skill set aligned to modern data engineering roles.

skills.sh — zsh

$ list_skills --category "core"

python sql spark / pyspark airflow kafka etl / elt data_quality data_modeling

$ list_skills --category "cloud"

aws_s3 athena ec2 redshift azure_data_factory azure_synapse azure_data_lake

$ list_skills --category "devops"

docker ci_cd monitoring git bash

$ list_skills --category "ml_and_ai"

claude_api llm_integration ml_workflows training_datasets

$ _

// experience

work_history

Impact-focused work with production ownership.

2024 – present

Intuit

California, US

// current

data_engineer()

Designed and maintain scalable data pipelines supporting machine learning and analytics use cases. Work primarily with Python and Spark to ingest, clean, and process large volumes of structured, semi-structured, and unstructured data used in supervised learning workflows. Responsibilities include preparing training datasets, implementing data validation and quality checks, and ensuring datasets remain consistent and reproducible across model iterations. Applied LLM-assisted analysis to improve pipeline debugging and data issue triage, enabling faster human review and more reliable production workflows.

2023 – 2024

Ameriprise Financial

Minnesota, US

data_engineer()

Built and scaled enterprise data pipelines supporting analytics and downstream ML use cases. Designed and implemented ETL workflows using Azure Data Factory and Python to ingest, transform, and standardize data from multiple sources. Focused on data cleansing, validation, and monitoring to ensure datasets were reliable and ready for analytical consumption. Optimized SQL transformations to improve pipeline performance and data availability.

2019 – 2022

Michael Page

Hyderabad, India

data_engineer()

Developed and maintained Spark-based ETL pipelines to ingest and transform data from Oracle, SQL Server, and Teradata into HDFS, supporting large-scale analytics workloads. Eliminated 17+ hours per week of manual reporting by translating business requirements into analytics-ready datasets. Implemented reliable database ingestion using Sqoop and orchestrated workflows with Oozie. Tuned Spark jobs for large-scale transformations and prepared curated datasets for BI dashboards used by cross-functional teams.

