Work Experience

Zendesk

Applied AI Engineer II, Berlin, Germany

Generative AI, LLM, RAG Systems

  • System Reliability & Performance: Led a major initiative to stabilize and accelerate uGPT imports, achieving 3x faster processing and 2x load capacity. Reduced retrieval latency by up to 75% for small accounts (2 knowledge bases) and by up to 10x for large accounts (10-30 knowledge bases) through optimizations including removal of cosine similarity operations, query optimization, and caching strategies. Responded to critical incidents and eliminated recurring import failures.
  • Code Quality & Infrastructure: Introduced comprehensive code quality initiatives including pre-commit hooks, PyTest framework, MyPy type checking, and linting/formatting checks in CI (adopted across teams). Optimized CI/CD workflows, reducing Docker image size from 4.1 GB to 370 MB and build times from 20-25 minutes to 3-6 minutes through migration to the uv package manager and workflow consolidation.
  • Feature Development & Model Rollouts: Fixed critical chunking bugs that increased the Bot understood rate and decreased "not understood" responses. Supported the custom instructions implementation and coordinated GPT-4o A/B testing with prompt migration. Led the Text-Embedding-3-Large rollout, resulting in a higher Bot understood rate and a lower escalation rate. Added batching for embedding computation, achieving 2x faster imports (sketched below).
  • Monitoring & Observability: Built comprehensive monitoring infrastructure using Datadog, Sentry, and Grafana with dashboards, alerts, and Prometheus metrics for imports and latency. Refined Sentry alert rules to eliminate false positives and accelerate triage. Established monitoring as reference implementation for other teams.
  • Infrastructure & Scalability: Drove the ZOS migration (85% complete) and contributed to internal libraries (language-utils, db-utils, service-utils). Identified major scalability risks and designed a mitigation strategy of adding an OpenSearch cluster for new customers. Prevented excessive sharding through automated cleanup of orphaned indexes, reducing costs and improving cluster health. Removed secondary chunks and embeddings, reducing OpenSearch storage by 38% and improving import speed by 26%.
  • Documentation & Knowledge Sharing: Authored comprehensive end-to-end documentation on indexing/chunking, staged releases, A/B testing setup, and import debugging. Updated AI/ML onboarding documentation. Delivered multiple knowledge transfer sessions to different teams on import processes, A/B testing, and other topics.
  • Collaboration & Leadership: Resolved a high volume of support tickets, ensuring smooth operations. Evaluated 20+ coding assignments and led or assisted in 10+ technical interviews for hiring across internal teams. Worked closely with cross-functional teams including research scientists, product managers, and engineers to deliver production-ready AI solutions.
Python LLMs RAG OpenSearch MongoDB Datadog Sentry Grafana GitHub Actions Google Cloud PyTest Docker
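
A minimal sketch of the embedding batching mentioned above, assuming a generic `embed_batch` callable that wraps the embedding API; the function names and batch size are illustrative, not Zendesk internals.

```python
from typing import Iterator

BATCH_SIZE = 64  # assumed; tune against the embedding API's rate and size limits

def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive fixed-size slices of `items`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_all(chunks: list[str], embed_batch) -> list[list[float]]:
    """Embed all chunks with one API call per batch instead of per chunk."""
    vectors: list[list[float]] = []
    for batch in batched(chunks, BATCH_SIZE):
        vectors.extend(embed_batch(batch))  # hypothetical batch-embedding call
    return vectors
```
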
Hugging Face

Cloud Machine Learning Engineer, Berlin, Germany

Google Cloud, LLM, MLOps

  • Spearheaded the development of custom containers to simplify developers’ fine-tuning and deployment experience on Google Cloud’s Vertex AI and GKE platforms, making efficient use of GPUs and TPUs.
  • Developed examples and use cases demonstrating container usage, with a particular focus on fine-tuning and deploying LLMs (a launch sketch follows below).
Python PyTorch Hugging Face Google Cloud Vertex AI
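
As a rough illustration of launching such a custom container for training on Vertex AI, here is a sketch using the google-cloud-aiplatform SDK; the project, image URI, and machine settings are placeholders, not the actual configuration.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west4")  # placeholders

job = aiplatform.CustomContainerTrainingJob(
    display_name="llm-finetune",  # hypothetical job name
    container_uri="europe-docker.pkg.dev/my-project/trainers/finetune:latest",  # placeholder image
)
job.run(
    replica_count=1,
    machine_type="a2-highgpu-1g",          # one A100 GPU; TPU setups differ
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
)
```
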
ML6

Machine Learning Engineer, ML in Production, Berlin

Generative AI, LLM, MLOps

  • Created an intelligent recipe recommendation API utilizing large language models (LLMs) to offer personalized recipe suggestions based on user queries for a US retail giant. The API considers the user query along with additional factors such as location and festive occasions; the LLM dynamically structures each query (sketched below) and retrieves a curated recipe list from the enriched database.
  • Developed an Azure ML pipeline that enriches the recipe database with generated attributes such as cuisine type, dietary restrictions, and cooking time.
Python Microsoft Azure Terraform LangSmith OpenAI GPT-4
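
A minimal sketch of the LLM-driven query structuring described above, using the OpenAI Python SDK; the prompt, JSON keys, and model name are assumptions for illustration, not the production setup.

```python
from openai import OpenAI  # assumes the v1+ OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Extract recipe-search filters from the user's request. "
    "Reply as JSON with keys: dish, cuisine, occasion, location."
)  # illustrative prompt

def structure_query(user_query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # the model family named in the stack above
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
    )
    return response.choices[0].message.content

# e.g. structure_query("easy vegan mains for Thanksgiving dinner in Texas")
```
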

Computer Vision, MLOps

  • Developed a scalable Vertex AI pipeline for training a state-of-the-art semantic segmentation model to detect fungus on leaves for a Swiss-Chinese AgroTech company. Successfully deployed the quantized TFLite model on smartphones, enabling efficient in-field inference with a latency of 300 ms.
  • Developed a second scalable Vertex AI pipeline for the same company to count pods while employing instance and semantic segmentation to distinguish healthy from diseased parts within each pod. The models were optimized through quantization, converted to TFLite (conversion sketched below), and deployed on smartphones for real-time inference.
Python TensorFlow Semantic Segmentation Instance Segmentation Google Cloud
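
The TFLite conversion step referenced above can be sketched as standard post-training quantization with TensorFlow; the SavedModel path and output filename are placeholders.

```python
import tensorflow as tf

# Post-training quantization of a trained segmentation model.
converter = tf.lite.TFLiteConverter.from_saved_model("exported/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable weight quantization
tflite_model = converter.convert()

with open("segmenter.tflite", "wb") as f:  # hypothetical filename
    f.write(tflite_model)
```
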

Generative AI, Computer Vision, NLP

  • Finetuned Text2Img and Image Variations Stable Diffusion models for generating stickers, print designs, and artistic inspirations. Deployed them as a scalable service on AWS EC2 instances and integrated the models into the company’s (Creative Fabrica) e-commerce platform, allowing users to generate custom artistic images from text and image prompts (inference sketched below). [Text2Img Model Link, Image Variations Model Link]
  • Finetuned a deep learning model (ISNet) to remove solid backgrounds from images generated by the Stable Diffusion models. Used as a post-processing step, it enables the creation of images with transparent backgrounds.
  • More than 3 million images were generated in the first month. [TechCrunch Article Link]
Python PyTorch HuggingFace Diffusers AWS FastAPI Pillow
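
A minimal inference sketch with Hugging Face Diffusers for a fine-tuned text-to-image checkpoint; the model path and prompt are placeholders, not the production models linked above.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a fine-tuned checkpoint (placeholder path).
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/finetuned-sticker-model", torch_dtype=torch.float16
).to("cuda")

image = pipe("a cute fox sticker, flat vector style").images[0]
image.save("sticker.png")
```
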

Recommender System, MLOps, Data Engineering

  • Developed highly scalable, automated, and robust ETL pipelines using Apache Beam to ingest large volumes of catalog (100k+ items) and user-event (1.2M+ events) data every day from the client’s (a German pan-European retail chain) FTP server into Google Cloud Retail API storage (a minimal pipeline is sketched below).
  • Used the established pipelines to build and deploy Similar Items and Frequently Bought Together recommendation models with the Google Cloud Retail API for the retailer’s e-commerce platform, resulting in a €300k/week revenue increase driven by a 40% increase in conversion rate.
  • Used Terraform and GitHub Actions to set up the infrastructure required for the functionality.
Python Apache Beam Google Cloud GitHub Actions Terraform CI/CD
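
A stripped-down sketch of such an ingestion pipeline in Apache Beam; the paths, schema, and validation rule are illustrative, and the actual Retail API import step is omitted.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_event(line: str) -> dict:
    """Parse one raw user-event record; the real schema is client-specific."""
    return json.loads(line)

with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://bucket/events/*.json")  # placeholder path
        | "Parse" >> beam.Map(parse_event)
        | "Validate" >> beam.Filter(lambda e: "visitorId" in e)        # illustrative rule
        | "Format" >> beam.Map(json.dumps)
        | "Write" >> beam.io.WriteToText("gs://bucket/clean/events")   # placeholder sink
    )
```
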

MLOps, NLP

Python Apache Beam Google Cloud PyTorch TensorFlow HuggingFace Transformers NVIDIA Triton

MLOps, Computer Vision, NLP

  • Developed multiple end-to-end GCP Vertex AI ML pipelines for tasks including text classification, semantic segmentation, and others, easily adaptable to other machine learning tasks (a skeleton is sketched below).
  • The generic pipeline has been presented at industry meetups and used in multiple projects.
Python PyTorch Vertex AI KubeFlow Google Cloud HuggingFace Transformers CI/CD
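
A skeleton of a generic KFP v2 pipeline of the kind described above; the component body and names are placeholders for task-specific code.

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.10")
def train(dataset_uri: str) -> str:
    # Placeholder step; real components wrap task-specific training code.
    print(f"training on {dataset_uri}")
    return "gs://bucket/model"  # placeholder model URI

@dsl.pipeline(name="generic-ml-pipeline")
def pipeline(dataset_uri: str):
    train(dataset_uri=dataset_uri)

compiler.Compiler().compile(pipeline, "pipeline.json")
```
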
Bosch Center for Artificial Intelligence

Master Thesis, Robust Deep Learning, Renningen, Germany

Master Thesis, Computer Vision

  • Tackled the problem of label noise in semantic segmentation. Developed a generic two-stage framework for reducing the amount of noise using semi-supervised learning; the framework can be easily extended to handle label noise in other computer vision tasks such as object detection and instance segmentation.
  • The framework reduced the label noise from 100% to 33.33%, thereby improving the mean Intersection over Union (mIoU) by 9% on the corrupted Cityscapes validation dataset.
  • Discovered that both discarding and pseudo-labeling noisy pixels are effective strategies against asymmetric label noise (a discarding variant is sketched below), and that semi-supervised learning is more robust to noise than supervised learning.
Python PyTorch Matplotlib NumPy Pandas
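
One simple variant of the discarding strategy can be sketched as a confidence-masked cross-entropy in PyTorch; the threshold and masking rule are illustrative, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def masked_ce_loss(logits: torch.Tensor, labels: torch.Tensor,
                   conf_threshold: float = 0.9) -> torch.Tensor:
    """Cross-entropy over an (N, C, H, W) logit map that ignores pixels
    where the model is not confident (likely noisy labels)."""
    with torch.no_grad():
        confidence = F.softmax(logits, dim=1).max(dim=1).values   # (N, H, W)
        keep = (confidence >= conf_threshold).float()
    per_pixel = F.cross_entropy(logits, labels, reduction="none")  # (N, H, W)
    return (per_pixel * keep).sum() / keep.sum().clamp(min=1.0)
```
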
Max Planck Institute for Intelligent Systems

Research Assistant, Bethge Lab, Tübingen, Germany

Deep Learning, Computer Vision

  • Worked on multiple projects in the field of computer vision, with a particular focus on topics such as invariant representation learning and pruning to make neural networks more efficient and faster.
  • Contributed to project ideation and hypothesis development, as well as developing robust codebases to validate research findings.
  • Proposed Generalized Invariant Risk Minimization (GIRM) [NeurIPS Workshop Paper Link], a technique that takes a pre-specified adaptation mechanism and aims to find invariant representations that (a) perform well across multiple different training environments and (b) cannot be improved through adaptation to individual environments (a standard IRM penalty is sketched below for context).
  • Worked with Steffen Schneider under the guidance of Wieland Brendel and Prof. Matthias Bethge.
Python PyTorch PyTorch Lightning Matplotlib Docker Slurm
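
For context, a sketch of the standard IRMv1 gradient penalty (Arjovsky et al., 2019) that invariant-representation methods like GIRM relate to; this is background, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Squared gradient of the per-environment risk with respect to a
    dummy classifier scale; zero when the representation is invariant."""
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    loss = F.cross_entropy(logits * scale, targets)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

# Training objective: mean risk over environments + lambda * mean penalty.
```
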
Samsung Research

Applied Research Engineer, On Device Intelligent Search, Bangalore

Information Retrieval, Deep Learning, NLP

  • Developed a sophisticated deep learning-based model for keyword extraction, utilizing application descriptions from the AppStore. Successfully commercialized the model for Samsung's flagship smartphones, where it was triggered daily to generate keywords for newly developed apps (an illustrative heuristic is sketched below).
  • The extracted keywords were stored in the search index, leading to a significant 25% increase in recall for application search on mobile devices (e.g., four keywords for Uber Eats are Food, Delivery, Order, and Restaurant).
  • The project involved designing and training the model, optimizing its performance through continuous experimentation, and integrating it with the Samsung Search app.
  • Published two research papers showcasing innovative approaches in the field. Presented a novel method for app clustering, classification, and retrieval using app embeddings at CICLing 2019 [Paper Link], improving the end-user experience with mobile apps. Also published a paper at NLDB 2019 [Paper Link] on a multi-task neural architecture that predicts categorical parameters such as app category and ratings by jointly modeling app descriptions and reviews.
Python TensorFlow Keras NLTK Spacy Java
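
The production model was a custom deep network; as a rough illustration of ranking keyword candidates by embedding similarity to a description, here is a spaCy-based heuristic. The model name and scoring rule are stand-ins.

```python
import spacy

nlp = spacy.load("en_core_web_md")  # medium model ships word vectors

def extract_keywords(description: str, k: int = 4) -> list[str]:
    """Rank noun lemmas by vector similarity to the whole description."""
    doc = nlp(description)
    candidates = {t.lemma_.lower() for t in doc
                  if t.pos_ in {"NOUN", "PROPN"} and t.has_vector}
    return sorted(candidates,
                  key=lambda w: nlp.vocab[w].similarity(doc),
                  reverse=True)[:k]
```
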

Information Retrieval, Machine Learning, NLP

  • Developed and integrated a machine learning model using Apache OpenNLP for Named Entity Recognition and Stanford CoreNLP for processing temporal expressions within the Gallery App, enabling natural-language query search (e.g. "Photos of me and Nikhil from last year in Paris"; a Python analogue is sketched below).
  • Through rigorous experimentation and optimization, delivered a feature that enhances the user experience and transforms the way users search for their photos.
  • Awarded the Best Demo prize at Samsung’s Annual Technical Event for the Proof of Concept feature developed within the Gallery App.
Python Apache OpenNLP Stanford CoreNLP NLTK Spacy Java
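
A Python/spaCy analogue of the entity-extraction step (the original pipeline used Apache OpenNLP and Stanford CoreNLP in Java):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("Photos of me and Nikhil from last year in Paris")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Typically yields PERSON ("Nikhil"), DATE ("last year"), GPE ("Paris");
# these entity types then map onto search filters over the photo index.
```
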
Samsung Research

Deep Learning Intern, Voice Assistant Team, Bangalore

Deep Learning, NLP, Text Classification

  • Designed and implemented a Proof of Concept (POC) text classification model for detecting hate speech using LSTM and Word2Vec embeddings.
  • The developed model resulted in a 15% increase in the F1 score for hate speech detection compared to the baseline, paving the way for improved decision-making capabilities in a variety of applications.
  • The model was then deployed in a demo app using TFLite (a minimal model definition is sketched below). The deployment process involved optimizing the model for mobile devices and ensuring compatibility with the app's existing infrastructure.
Python TensorFlow Keras NLTK Spacy Scikit-Learn
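
A minimal Keras definition of an LSTM text classifier of the kind described above; the hyperparameters are illustrative, and in practice the embedding layer would be initialized from pretrained Word2Vec vectors.

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM = 20_000, 100  # illustrative hyperparameters

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # init from Word2Vec in practice
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # hate speech vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```
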
IMT Atlantique, France

Research Intern, LUSSI Lab

Machine Learning, Unsupervised Learning, Clustering

  • Conducted an end-to-end machine learning project under the guidance of Prof. Nicolas Jullien and Prof. Romain Billot, aimed at identifying common behaviors among Wikipedia's online contributors.
  • Performed a large-scale study of contributors' editing patterns using different clustering algorithms and Principal Component Analysis on hand-engineered features such as contribution frequency and time gaps between contributions (see the sketch below).
  • Used ANOVA and Student's t-test to examine statistically significant differences among clusters, leading to a poster presentation at OpenSym 2018, the 14th International Symposium on Open Collaboration [Paper Link].
Python R NumPy Scikit-Learn Matplotlib
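
A compact sketch of the clustering-plus-ANOVA analysis with scikit-learn and SciPy; the synthetic data stands in for the real hand-engineered contributor features.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the contributor feature matrix
# (rows = contributors, columns = features such as edit frequency).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))

X_reduced = PCA(n_components=3).fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_reduced)

# One-way ANOVA: does the first feature differ significantly across clusters?
groups = [X[labels == c, 0] for c in range(4)]
stat, p = f_oneway(*groups)
print(f"F={stat:.2f}, p={p:.3f}")
```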