Sumeet Parekh

Senior Data Engineer

6+ years of experience designing and maintaining data pipelines for ML/AI applications and business intelligence. Specialized in scalable data infrastructure and real-time analytics for production environments.

About Me

I'm a Senior Data Engineer with a passion for building scalable data infrastructure that powers AI-ready systems and real-time analytics. With over 6 years of experience, I specialize in translating complex technical solutions into measurable business value.

My expertise spans designing automated ETL pipelines, optimizing data processing performance, and architecting distributed systems that handle billions of data points daily. I've collaborated with cross-functional teams across Engineering, Operations, and Product organizations to deliver impactful solutions.

25+ ETL Pipelines Built
60% Performance Improvement
99.9% System Uptime
$100k+ Cost Savings

Key Achievements

  • Built 25+ automated ETL pipelines for real-time insights
  • Architected hybrid distributed data platform with 10-node infrastructure
  • Reduced processing time by 60% for millions of hourly data points
  • Achieved 99.9% uptime across California and Michigan operations

Career Timeline

A visual journey through my professional and academic milestones

Senior Data Engineer

Natron Energy

Jun 2023 - Present

Santa Clara, CA

Data Operations Engineer

Velodyne Lidar Inc.

Jun 2021 - Apr 2023

San Jose, CA

Space Software Engineer

XY Health Inc.

Jul 2020 - May 2021

Cambridge, MA

Software Engineer - ML Co-op

Zylotech

Jun 2019 - Dec 2019

Boston, MA

Master of Science in Computer Science

Northeastern University

Jan 2018 - May 2020

Boston, MA • GPA: 3.6

Data Analyst

Performics

Oct 2016 - Oct 2017

Mumbai, India

Bachelor of Engineering in Computer Engineering

University of Mumbai

Aug 2012 - May 2016

Mumbai, India • GPA: 3.5

Work Experience

Senior Data Engineer

Natron Energy, Santa Clara, CA

Jun 2023 - Present
Technologies: Python, Docker, Docker Swarm, DBT, Dagster, GCP, SQL, ETL, Tableau, Apache Spark, Fivetran
  • Built 25+ automated ETL pipelines to ingest ERP, IoT, and manufacturing data into Postgres, MySQL, and Oracle, then transformed the data using DBT and loaded it into BigQuery, enabling real-time insights for Product & Operations teams.
  • Optimized ingestion performance using multiprocessing and SQL tuning, reducing processing time by 60% for millions of hourly data points.
  • Architected a hybrid distributed data platform across California and Michigan with 10-node infrastructure, achieving 99.9% uptime by deploying Dagster both on-prem and in the cloud with Docker Swarm.
  • Leveraged Apache Spark for distributed processing of time series data from multiple pieces of equipment across California and Michigan, accelerating the transformation layer for billions of daily data points.

Data Operations Engineer

Velodyne Lidar Inc., San Jose, CA

Jun 2021 - Apr 2023
Technologies: Python, ROS, AWS, Airflow, MLflow, PostgreSQL, Multiprocessing, ETL
  • Engineered an end-to-end ML data pipeline using Python, Airflow, SQL, and AWS infrastructure, delivering datasets 3x faster to ML teams and clients and accelerating AI model development.
  • Developed an automated data filtering system by implementing YOLO object detection, reducing external vendor costs by $100k+ annually.
  • Integrated MLflow model versioning with containerized AWS deployment, reducing model training setup time by 70% and accelerating overall development of object detection and segmentation models.

Space Software Engineer

XY Health Inc., Cambridge, MA

Jul 2020 - May 2021
Technologies: Python, GCP, GIS, ETL, Docker, Postgres
  • Designed a geospatial processing system using custom algorithms and optimized data retrieval, achieving 2-minute satellite data access for any US location.
  • Implemented geospatial processing scripts using GDAL and Rasterio libraries for image manipulation and tiling, reducing manual processing time by 80% and enabling automated image preparation workflows.

Software Engineer - Machine Learning Co-op

Zylotech, Boston, MA

Jun 2019 - Dec 2019
Technologies: Python, PyTest, GCP, Kafka, Docker, BigQuery, Multiprocessing, ETL
  • Integrated ML models with Kafka streaming using Docker containerization and GCP deployment architecture, enabling real-time predictions with sub-second latency for live customer data, supporting AI applications.
  • Optimized feature engineering pipeline by implementing Python multiprocessing across model training stages, reducing overall training time by 85% and enabling faster model iteration cycles for improved business responsiveness.

Data Analyst

Performics, Mumbai, India

Oct 2016 - Oct 2017
Technologies: Google Analytics, MS Excel, SQL, Google AdWords, Facebook Business, LinkedIn Ads, Twitter Ads
  • Conducted in-depth analysis of performance data from multiple marketing platforms, Lead Management Systems, and competitive intelligence using MS Excel, Google Analytics, and SQL to extract actionable business insights that directly influenced strategic decisions.
  • Enhanced the performance of SEM and Paid Social campaigns through data-driven optimization strategies, leading to a 35% monthly budget increase based on demonstrable ROI improvements and campaign effectiveness metrics.

Technical Skills

A comprehensive overview of my technical expertise and tools

Data Engineering & Orchestration

Python, SQL, ETL/ELT, Dagster, Airflow, Docker, Multiprocessing

Cloud & Infrastructure

GCP, AWS, Docker Swarm, Kubernetes, AWS S3, GCP Buckets

Databases

PostgreSQL, MySQL, AWS RDS, GCP Cloud SQL, SQL Server, Supabase

Data Streaming & Integration

Kafka, Debezium CDC, Fivetran, REST APIs, Multi-source data integration

Big Data & Storage

Apache Spark, BigQuery, Delta Lake, Parquet

Dashboards & CI/CD

Tableau, Power BI, Apache Superset, PostHog, CircleCI, Docker Hub, GitHub Actions

Featured Projects

HireCopilot (hirecopilot.io)

Full-Stack Job Application Tracking Platform

Technologies: Next.js, FastAPI, PostgreSQL, Gmail API, Google Gemini AI, OAuth2, JWT, CircleCI

Key Features

  • Built a full-stack job application tracking platform used by users worldwide for real-time hiring insights
  • Engineered an intelligent email processing system with async tasks, rate limiting, and real-time updates, syncing emails every 2 minutes and processing 5K+ historical emails per user during onboarding
  • Implemented OAuth2 authentication with Google, JWT-based session security, and optimized SQLAlchemy ORM for high-performance data storage
  • Designed an AI-driven agentic workflow using Gemini API for automated email categorization, hiring data extraction, and personalized recommendations

Technical Highlights

  • Established CI/CD pipeline using CircleCI for automated testing and deployment, ensuring reliable code delivery and platform stability
  • Real-time email synchronization with intelligent rate limiting
  • AI-powered data extraction and categorization
  • Scalable architecture supporting global user base

Articles & Publications

Generating Map Tiles at Different Zoom Levels Using GDAL2Tiles in Python

Medium • Geek Culture

April 10, 2021 • 4 min read

A comprehensive guide to using the GDAL2Tiles library in Python to generate map tiles at different zoom levels. This article covers the challenges of handling high-resolution raster data, memory optimization, and bandwidth considerations when working with geospatial imagery from commercial satellites.

Key Topics Covered:

GDAL2Tiles, Python, Geospatial Data, Map Tiling, Satellite Imagery, GIS

Technical Highlights:

  • Step-by-step implementation of GDAL2Tiles library for map tile generation
  • Memory optimization techniques for high-resolution raster data processing
  • Practical examples using OpenMapTiles data with 20m/px spatial resolution
  • Configuration options for different zoom levels and tile sizes
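
As a rough illustration of how zoom level relates to raster resolution, the sketch below estimates which zoom level matches a source raster such as the 20m/px data mentioned above. It assumes the standard Web Mercator tiling scheme with 256-pixel tiles measured at the equator; the function names are hypothetical and are not taken from the article or from the GDAL2Tiles library.

```python
import math

# Assumption: Web Mercator tiling on the WGS84 equatorial radius, 256 px tiles.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0
TILE_SIZE_PX = 256


def resolution_at_zoom(zoom: int, tile_size: int = TILE_SIZE_PX) -> float:
    """Return metres per pixel at the equator for the given zoom level."""
    return EARTH_CIRCUMFERENCE_M / (tile_size * 2 ** zoom)


def max_useful_zoom(source_resolution_m: float, tile_size: int = TILE_SIZE_PX) -> int:
    """Return the lowest zoom whose pixels are at least as fine as the source raster."""
    zoom = math.log2(EARTH_CIRCUMFERENCE_M / (tile_size * source_resolution_m))
    return max(0, math.ceil(zoom))


if __name__ == "__main__":
    zoom = max_useful_zoom(20.0)
    print(f"zoom {zoom}: {resolution_at_zoom(zoom):.1f} m/px")
```

Generating tiles beyond the zoom level this returns only upsamples the source imagery, so a bound like this is one way to choose the upper end of a zoom range.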

Article Performance:

10.7K Views
6.4K Reads
2 responses

Education

Master of Science in Computer Science

Northeastern University, Boston, MA

Jan 2018 - May 2020

GPA: 3.6

Bachelor of Engineering in Computer Engineering

University of Mumbai, India

Aug 2012 - May 2016

GPA: 3.5