bala@portfolio ~ ~/bay-area $ whoami

Nitturi
Balasubramanyam

data_engineer.py

I build reliable ETL and streaming systems, optimize pipeline performance, and deliver analytics-ready datasets. Strong focus on orchestration, data quality, and production troubleshooting.

airflow spark python sql aws kafka data_quality warehousing azure
35% ETL runtime reduced
20% compute cost cut
25% accuracy improved

// projects

featured_work

Production-style projects with clear architecture, tradeoffs, and measurable outcomes.

🤖
// ai-powered
airflow claude_api aws_s3 docker slack

llm_dq_monitor

AI-augmented pipeline that detects anomalies in transaction data and delivers plain-English root cause analysis to Slack — automatically, without manual log triage.

  • Integrated Claude API across 9 DQ rules — RCA in under 2 seconds
  • Full pipeline: S3 ingest → validation → LLM → Slack, runs daily
  • Replaced raw JSON alerts with AI-generated summaries
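The validation-to-LLM handoff can be sketched roughly as follows. This is a minimal illustration, not the project's code: the three rules, the field names, and the helpers `run_rules` and `build_rca_prompt` are all hypothetical, and the actual Claude API call and Slack delivery are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RuleResult:
    rule: str
    failed_rows: int


# Hypothetical rules for illustration; the project's nine rules are not listed here.
RULES: dict[str, Callable[[dict], bool]] = {
    "amount_not_null": lambda r: r.get("amount") is not None,
    "amount_non_negative": lambda r: r.get("amount") is None or r["amount"] >= 0,
    "currency_is_iso3": lambda r: isinstance(r.get("currency"), str)
    and len(r["currency"]) == 3,
}


def run_rules(rows: list[dict]) -> list[RuleResult]:
    """Apply every rule to every row and keep only the rules that failed."""
    results = []
    for name, check in RULES.items():
        failed = sum(1 for row in rows if not check(row))
        if failed:
            results.append(RuleResult(name, failed))
    return results


def build_rca_prompt(results: list[RuleResult], total_rows: int) -> str:
    """Turn rule failures into a compact prompt an LLM could summarize."""
    lines = [
        f"- {r.rule}: {r.failed_rows} of {total_rows} rows failed" for r in results
    ]
    header = "Explain the likely root cause of these data quality failures in plain English:"
    return header + "\n" + "\n".join(lines)


rows = [
    {"amount": 10.0, "currency": "USD"},
    {"amount": None, "currency": "USD"},
    {"amount": -5.0, "currency": "US"},
]
failures = run_rules(rows)
prompt = build_rca_prompt(failures, len(rows))
print(prompt)
```

Sending aggregated failure counts rather than raw rows keeps the prompt small and avoids passing transaction data to the model.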
🌊
kafka spark_streaming lakehouse monitoring

realtime_financial_pipeline

Streaming pipeline for transaction events with event-time handling, deduplication, and checkpoint-based recovery.

  • Handled late/out-of-order events using event-time logic
  • Prevented duplicates via idempotent writes
  • Recovered reliably using checkpoints
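The event-time and deduplication ideas above can be illustrated in plain Python. This is a sketch under stated assumptions, not the pipeline itself: `StreamState`, `process`, the 60-second `ALLOWED_LATENESS`, and the event fields are hypothetical. In Spark Structured Streaming the same roles are typically played by `withWatermark` and `dropDuplicates`.

```python
from dataclasses import dataclass, field

ALLOWED_LATENESS = 60  # seconds; hypothetical threshold


@dataclass
class StreamState:
    max_event_time: int = 0                    # highest event time seen (epoch seconds)
    sink: dict = field(default_factory=dict)   # keyed by event_id, so rewrites are idempotent
    late: list = field(default_factory=list)   # ids of events that arrived behind the watermark


def process(state: StreamState, event: dict) -> None:
    """Advance the watermark, then either store the event or record it as late."""
    state.max_event_time = max(state.max_event_time, event["event_time"])
    watermark = state.max_event_time - ALLOWED_LATENESS
    if event["event_time"] < watermark:
        state.late.append(event["event_id"])
        return
    # Writing by key means a redelivered event overwrites itself instead of duplicating.
    state.sink[event["event_id"]] = event


events = [
    {"event_id": "a", "event_time": 100, "amount": 5},
    {"event_id": "a", "event_time": 100, "amount": 5},  # duplicate delivery
    {"event_id": "b", "event_time": 200, "amount": 7},
    {"event_id": "c", "event_time": 150, "amount": 1},  # out of order, within lateness
    {"event_id": "d", "event_time": 120, "amount": 9},  # behind the watermark
    {"event_id": "b", "event_time": 200, "amount": 7},  # redelivery after a retry
]

state = StreamState()
for e in events:
    process(state, e)

print(sorted(state.sink), state.late)
```

Because the state is a small plain object, it could be serialized periodically and reloaded after a failure, which is the essence of checkpoint-based recovery; that step is omitted here.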
👤
ingestion incremental spark warehouse

customer_360_platform

Unified customer model built from multiple systems with incremental processing and quality gates.

  • Unified CRM + payments + support into one customer model
  • Improved runtime using incremental processing strategy
  • Quality checks: null / uniqueness / freshness
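A high-water-mark approach is one common way to implement incremental processing with a quality gate. The sketch below assumes that approach; the column names `customer_id` and `updated_at` and the helpers `incremental_batch` and `quality_gate` are hypothetical and not taken from the project.

```python
from datetime import datetime, timedelta


def incremental_batch(rows, last_watermark):
    """Select only rows changed since the previous run and compute the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


def quality_gate(rows, now, max_age):
    """Run null, uniqueness, and freshness checks on a batch."""
    ids = [r["customer_id"] for r in rows]
    return {
        "no_null_ids": all(i is not None for i in ids),
        "unique_ids": len(ids) == len(set(ids)),
        "fresh": bool(rows) and now - max(r["updated_at"] for r in rows) <= max_age,
    }


source = [
    {"customer_id": "c1", "updated_at": datetime(2024, 1, 1)},
    {"customer_id": "c2", "updated_at": datetime(2024, 1, 3)},
    {"customer_id": "c3", "updated_at": datetime(2024, 1, 4)},
]

batch, watermark = incremental_batch(source, last_watermark=datetime(2024, 1, 2))
checks = quality_gate(batch, now=datetime(2024, 1, 4, 12), max_age=timedelta(days=1))

# Only load the batch downstream if every check passed.
ok_to_load = all(checks.values())
print(len(batch), watermark, checks, ok_to_load)
```

Storing the returned watermark after a successful load lets the next run skip rows that were already processed, which is where the runtime saving comes from.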
📡
dq_checks metrics alerts dashboards

data_observability_framework

Framework to detect "pipeline succeeded but data is wrong" using anomaly signals and freshness metrics.

  • Row-count, freshness, and schema drift monitoring
  • Alerting workflow for fast root cause analysis
  • Reduced time-to-detect for data issues
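The three signals listed above can each be expressed as a small check. The following is an illustrative sketch only: the z-score threshold, the function names, and the sample schema are assumptions, and the framework's real detection logic may differ.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def row_count_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it sits far from the recent mean (simple z-score)."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold


def is_stale(last_loaded_at, now, max_lag):
    """Flag a table whose latest load is older than the allowed lag."""
    return now - last_loaded_at > max_lag


def schema_drift(expected, observed):
    """Compare column-to-type mappings and report what changed."""
    shared = set(expected) & set(observed)
    return {
        "missing": sorted(set(expected) - set(observed)),
        "unexpected": sorted(set(observed) - set(expected)),
        "type_changed": sorted(c for c in shared if expected[c] != observed[c]),
    }


history = [1000, 1020, 980, 1010, 990]
count_alert = row_count_anomaly(history, today=400)
count_ok = row_count_anomaly(history, today=1005)

stale = is_stale(
    last_loaded_at=datetime(2024, 1, 1, 0, 0),
    now=datetime(2024, 1, 2, 6, 0),
    max_lag=timedelta(hours=24),
)

drift = schema_drift(
    expected={"id": "int", "amount": "float", "ts": "timestamp"},
    observed={"id": "int", "amount": "string", "channel": "string"},
)
print(count_alert, count_ok, stale, drift)
```

None of these checks depends on the job's exit status, which is what lets them catch runs that finish successfully but produce wrong data.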

// skills

tech_stack

Focused skill set aligned to modern data engineering roles.

skills.sh — zsh
$ list_skills --category "core"
python sql spark / pyspark airflow kafka etl / elt data_quality data_modeling
$ list_skills --category "cloud"
aws_s3 athena ec2 redshift azure_data_factory azure_synapse azure_data_lake
$ list_skills --category "devops"
docker ci_cd monitoring git bash
$ list_skills --category "ml_and_ai"
claude_api llm_integration ml_workflows training_datasets
$ _

// experience

work_history

Impact-focused work with production ownership.

2024 – present

Intuit

California, US

// current

data_engineer()

Design and maintain scalable data pipelines supporting machine learning and analytics use cases. Work primarily with Python and Spark to ingest, clean, and process large volumes of structured, semi-structured, and unstructured data used in supervised learning workflows. Responsibilities include preparing training datasets, implementing data validation and quality checks, and keeping datasets consistent and reproducible across model iterations. Apply LLM-assisted analysis to pipeline debugging and data issue triage, enabling faster human review and more reliable production workflows.

2023 – 2024

Ameriprise Financial

Minnesota, US

data_engineer()

Built and scaled enterprise data pipelines supporting analytics and downstream ML use cases. Designed and implemented ETL workflows using Azure Data Factory and Python to ingest, transform, and standardize data from multiple sources. Focused on data cleansing, validation, and monitoring to ensure datasets were reliable and ready for analytical consumption. Optimized SQL transformations to improve pipeline performance and data availability.

2019 – 2022

Michael Page

Hyderabad, India

data_engineer()

Developed and maintained Spark-based ETL pipelines to ingest and transform data from Oracle, SQL Server, and Teradata into HDFS, supporting large-scale analytics workloads. Eliminated 17+ hours per week of manual reporting by translating business requirements into analytics-ready datasets. Implemented reliable database ingestion using Sqoop and orchestrated workflows with Oozie. Tuned Spark jobs for large-scale transformations and prepared curated datasets for BI dashboards used by cross-functional teams.

// certifications

credentials[]

Verified credentials.

// contact

get_in_touch()

contact.sh — new message

let's_connect()

Actively looking for Data Engineer roles in the Bay Area. Best way to reach me is email — I respond quickly.