Hemish Veeraboina

DevOps / SRE Engineer

DevOps and SRE engineer with 4+ years of experience building large-scale data pipelines, Kubernetes deployments, CI/CD automation, observability platforms, cloud infrastructure, and MLOps workflows across AWS, Azure, and on-prem. Hands-on with Docker, Kubernetes, Helm, Terraform, Jenkins, GitLab CI, Prometheus, Grafana, MLflow, BentoML, and distributed Python systems.

I focus on reliability, automation, ML platform engineering, and GPU-based LLM inference infrastructure—shipping platforms that stay healthy in production.

Skills

Core tooling across data platforms, Kubernetes, cloud infrastructure, observability, and ML systems.

Languages

Python, SQL, PySpark, Go

Data engineering

Dask, Databricks, Hadoop, HDFS

Orchestration

Prefect, Airflow, Mage

Kubernetes

K8s, K3s, Minikube, GKE, AKS, Helm, CRDs, controllers, operators

Backend

FastAPI, Django, Pydantic, SQLAlchemy 2.0, REST APIs

Cloud & DevOps

AWS, Azure, Docker, Terraform, Ansible, Git, GitOps, Argo CD, GitLab CI, Jenkins

Observability

Prometheus, Grafana, PromQL, CloudWatch, ELK

Databases & storage

PostgreSQL, MongoDB, MySQL, SQLite, S3, ADLS Gen2

ML / AI systems

MLflow, BentoML, vLLM, Triton Inference Server, Ray Serve

Experience

DevOps, data engineering, and SRE work across research, product, and enterprise environments.

National Internet Observatory, Northeastern University

Data Engineer / DevOps Engineer • MA, USA

Oct 2024 — Present

Built distributed data pipelines in Python using Marimo notebooks, Prefect, and Dask to migrate 5–10 million records per minute from MongoDB to PostgreSQL, with Pydantic for schema validation and SQLAlchemy 2.0 for ORM-based ingestion.
Containerized applications with Docker and deployed them to Kubernetes using Helm, Git, and GitLab CI, standardizing daily CI/CD releases for distributed pipelines across cluster environments.
Implemented Kubernetes-based monitoring and observability with Prometheus, Grafana, and PromQL, building metrics-driven dashboards for pipeline health, throughput, failures, resource utilization, and traffic across workloads processing 5B+ records.
Built a research-facing visualization platform on a DMZ-hosted VM, with Django for authentication and invite-based provisioning, FastAPI/Pydantic APIs, and Polars-powered live queries over PostgreSQL, providing secure self-service analytics for 100+ research users.

Adobe

Python Data Engineer • CA, USA

Aug 2024 — Oct 2024

Architected Project AJAX, an end-to-end event-driven Python pipeline that extracts content from Adobe Experience Manager (AEM) via Solr, processing 5,000+ pages in under 10 minutes to generate ML-ready data.
Extended the pipeline with automated preparation for incremental ingestion, historical archiving, multi-format exports, and on-demand filtering, delivering training-ready datasets to AI assistant teams across Acrobat, Photoshop, Lightroom, Firefly, and VEGA in under 5 minutes.

Cloud Data Works

Data Engineer Intern • TX, USA

Oct 2023 — Dec 2023

Engineered Azure Data Factory pipelines with Databricks and PySpark to ingest and transform Supabase data into ADLS Gen2, enabling ~10-second refresh cycles for near real-time student performance analytics.
Modeled analytics in Azure Synapse using SQL and CETAS over ADLS Gen2 data processed in Databricks/PySpark; the features derived from Supabase sources cut custom report prep time by 65% for coaches and students.

Deloitte Touche Tohmatsu Limited

Solution Delivery Associate / Site Reliability Engineer • Hyderabad, India

Jan 2020 — Aug 2022

Built a serverless AWS security pipeline (Lambda, Step Functions, S3, Glue, CloudWatch) that ingests daily Security Hub findings and catalogs GuardDuty, Inspector, and Macie alerts for risk visibility in QuickSight.
Developed Terraform templates and Ansible playbooks to automate remediation across New York Life cloud infrastructure, improving security posture by ~22% through scalable changes, incident response, and RCA aligned to SLA/SLO targets.
Engineered Jenkins CI/CD on RHEL/EC2 with blue-green deployments, cutting deployment downtime by ~40% and enabling safer releases and faster rollbacks for First American enterprise infrastructure.

Projects & research

Selected work on GPU inference, MLOps, and platform engineering.

Benchmarking GPU-based LLM inference on Kubernetes (vLLM, Triton, Ray Serve)

Evaluated throughput, p50/p95 latency, time-to-first-token, concurrency, autoscaling, and GPU utilization using Docker, Helm, Prometheus, Grafana, and DCGM telemetry.

End-to-end MLOps for automated review workflows

Built an MLOps pipeline with Python, MLflow, BentoML, Prometheus, and Grafana, covering experiment tracking, model registration, serving, monitoring, and version upgrades.

Publications

Peer-reviewed and research writing on Spark, deep learning, and NLP.

Estimation in Deregulated Environments with Spark and Big Data for Power Tracing

Apr 2023

Research on Apache Spark pipelines for energy tracing across deregulated grids with large telemetry volumes.

Facial Emotional Recognition Using Deep Convolutional Neural Networks

Sep 2021

Deep learning for accurate, real-time facial emotion detection with convolutional neural networks.

Learning Based Approach for Hindi Sentiment Analysis Using Naive Bayes Classifier

Aug 2020

Hindi sentiment analysis with Naive Bayes and classical NLP for low-resource settings.

Certifications

Cloud and infrastructure credentials aligned to how I design and run systems.

AWS Certified Solutions Architect – Associate

HashiCorp Certified: Terraform Associate

Education

Graduate and undergraduate training in computer science.

MS in Computer Science

San Jose State University

San Jose, CA

Aug 2022 — May 2024

BE in Computer Science and Engineering

M.V.S.R Engineering College

Telangana, India

Jul 2016 — Aug 2020

Contact

Let's connect

Open to DevOps, SRE, and platform engineering roles focused on reliability, Kubernetes, CI/CD, and data or ML infrastructure. Reach out by email or grab my résumé (PDF).

hemish.veer@gmail.com

Phone

(951) 316-0972
