Developer-turned-SRE with 6+ years of experience building and operating production-grade distributed systems. Currently ensuring platform reliability at Solidigm (SK Hynix), managing infrastructure, observability, and incident response across mission-critical environments.
I started my career as a software engineer at a high-growth startup, building backend services and RESTful APIs with Node.js and PostgreSQL. Shipping features under tight deadlines shaped my ownership mentality and drew me toward reliability engineering.
Today, I combine strong backend engineering skills in Node.js, Java, and Python with deep operational expertise to ensure systems stay up, perform well, and scale gracefully. I specialize in CI/CD automation, incident management, capacity planning, and production readiness for event-driven microservices.
I hold a Master of Science (Summa Cum Laude) from the Virginia Institute of Science and Technology and a B.Tech in Computer Science from JNTUH, Hyderabad.
Infrastructure at Scale
Managing multi-tenant enterprise platforms on AWS with Docker, Kubernetes, and Terraform across production environments.
Full-Stack Observability
Building monitoring strategies with Splunk, Dynatrace, Prometheus, and Grafana covering 50+ API endpoints with SLI/SLO tracking.
AWS Certified
AWS DevOps Associate certified with deep expertise across EC2, ECS, Lambda, RDS, S3, SQS, SNS, and CloudWatch.
02. Where I've Worked
Site Reliability Engineer @ Solidigm
June 2024 - Present · San Jose, California
Own end-to-end production reliability for a multi-tenant enterprise platform, managing infrastructure provisioning, CI/CD pipelines, observability, and incident response.
Architected a comprehensive monitoring strategy using Splunk, Dynatrace, CloudWatch, Prometheus, and Grafana with 20+ dashboards covering API latency (p50/p95/p99) and error budgets.
Designed zero-downtime deployment pipelines using GitHub Actions, reducing failed deployments by 70% and cutting deployment time from 25 to 8 minutes.
Automated operational toil using AWS Lambda, EventBridge, and Python/Bash scripts, eliminating 15+ hours/week of manual effort.
Led incident management for production outages, reducing recurring incidents by 55% through preventive automation and blameless post-mortems.
Architected a comprehensive monitoring strategy with 20+ dashboards covering infrastructure health, API latency (p50/p95/p99), error budgets, and resource utilization. Integrated Splunk, Dynatrace, CloudWatch, Prometheus, and Grafana for full-stack visibility across a multi-tenant platform.
Splunk, Dynatrace, Prometheus, Grafana, CloudWatch, Python
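To make the dashboard metrics above concrete, here is a minimal sketch of the arithmetic behind latency percentiles and error budgets. It is an illustration, not the production tooling: the function names, the nearest-rank percentile method, and the sample numbers are all assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative once the SLO is missed)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


# Hypothetical data: latencies of 1..100 ms and a 99.9% availability SLO.
latencies_ms = list(range(1, 101))
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
budget_left = error_budget_remaining(0.999, 1_000_000, 250)
```

With a 99.9% target over one million requests, roughly 1,000 failures are allowed, so 250 failures leave about three quarters of the budget unspent.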
Zero-Downtime CI/CD Pipeline
Solidigm
Designed deployment pipelines using GitHub Actions with parallel jobs, health checks, atomic release switching, and automated rollback. Reduced failed deployments by 70% and cut deployment time from 25 minutes to 8 minutes with blue-green and canary strategies.
GitHub Actions, Docker, AWS ECS, Terraform, Nginx
Event-Driven Order Pipeline
ValueLabs
Engineered a Kafka-based order processing pipeline handling 50K+ daily orders with 8 topic partitions and 3 consumer groups. Implemented dead-letter queues, exactly-once delivery semantics, and a Redis caching layer that cut p99 latency from 450ms to 85ms.
Apache Kafka, Redis, Node.js, PostgreSQL, AWS SQS
Serverless Notification Service
ValueLabs
Built a high-throughput notification microservice processing 100K+ daily events via AWS SQS FIFO queues with exponential backoff retry logic. Implemented fan-out dispatch using SNS message filtering across email (SES), SMS, and push channels.
AWS Lambda, SQS, SNS, SES, EventBridge, Python
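The exponential backoff retry logic mentioned for the notification service can be sketched roughly as follows. This is a generic illustration rather than the service's actual code: `backoff_delay`, `send_with_retry`, the base and cap values, and the "full jitter" choice are all assumptions.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, jitter=True):
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Double the wait on every attempt, but never exceed the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point below the ceiling so that many
    # workers failing together do not all retry at the same instant.
    return random.uniform(0, ceiling) if jitter else ceiling


def send_with_retry(send, event, max_attempts=5, sleep=time.sleep, **backoff):
    """Call send(event); on failure wait and retry, re-raising after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return send(event)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, **backoff))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A real consumer would likely catch only errors known to be transient instead of every `Exception`.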
Infrastructure as Code Platform
Solidigm
Engineered repeatable provisioning of EC2, RDS, S3, Lambda, EventBridge, and IAM resources using Terraform and AWS CDK. Maintained environment parity across dev/staging/prod with least-privilege IAM access controls and automated compliance checks.
Terraform, AWS CDK, CloudFormation, IAM, Python, Bash
Production Toil Automation
Solidigm
Automated 15+ hours/week of operational toil using AWS Lambda and EventBridge for nightly maintenance: usage quota recalculation, stale session cleanup, metrics aggregation, and certificate rotation across production environments.
AWS Lambda, EventBridge, Python, Bash, CloudWatch
04. Skills & Technologies
Languages
Python
Node.js
JavaScript
Java
Go
Bash
Cloud & Infra
AWS
Docker
Kubernetes (EKS)
Terraform
Linux
Nginx
Observability
Splunk
Dynatrace
Prometheus
Grafana
Datadog
OpenTelemetry
Backend & APIs
Express.js
Spring Boot
GraphQL
REST APIs
Prisma
Sequelize
Databases
PostgreSQL
MySQL
MongoDB
Redis
CI/CD & DevOps
GitHub Actions
Jenkins
GitOps
SonarQube
Blue-Green
Canary
SRE Practices
SLI/SLO/SLA
Incident Mgmt
RCA
Runbooks
On-Call
Error Budgets
Messaging
Apache Kafka
AWS SQS/SNS
EventBridge
Dead-Letter Queues
05. What's Next?
Get In Touch
I'm always open to discussing new opportunities, interesting projects, or just connecting with fellow engineers. Whether you have a question or just want to say hi, feel free to reach out.