I am a Software Engineer specializing in LLMs, Generative AI, and Data Science, currently working at CVS Health (Aetna Insurance) in New York City. I'm passionately curious and enjoy building large, scalable AI and LLM systems, with a keen interest in inference optimization, GPUs, HPC, and parallelism. Drawing on a strong background in AI, machine learning, and software engineering, I focus on building scalable AI solutions and large language model applications. I hold an MS in Applied Data Science from the University of Southern California and have extensive experience developing and deploying AI systems that scale to millions of users.
Languages: Python, SQL, C, C++, CUDA (Intermediate), OpenMP (Intermediate)
Tools & Technologies: Analytics, Deep Learning, Machine Learning, A/B Testing, GCP Vertex AI, AWS, Kubernetes (K8s), Docker, Git, Informatica ETL, Web Scraping/Automation, AI Research, Parallel Programming, NumPy, Pandas, scikit-learn, LangChain, NVIDIA DGX Server, Nsight Profiling, NVIDIA NeMo, LLMOps, RAG, Vector DBs, LLM Training and Inference Techniques, Hugging Face, FastAPI, RESTful APIs, Responsible AI, Linux, MongoDB, Selenium, Hadoop HDFS, PySpark, Power BI/Tableau, LLMs, Generative AI, MapReduce, Azure, Google Cloud Platform, BigQuery, HiveQL, Horovod, ArcGIS, MLOps, Jenkins CI/CD, NLP, Forecasting, Unsupervised ML, Generative Models
An in-depth exploration of LLM inference optimization techniques and GPU serving strategies for large language models. Read Article
Inside AI data centers: what happens when an LLM request hits a GPU cluster—queuing, batching, distributed execution, and inference across NVIDIA-dominated hardware/software stacks. Read Article
June 2024 - Present
February 2024 - May 2024
May 2023 - August 2023
January 2022 - July 2022
November 2020 - June 2021
August 2022 - May 2024
GPA: 3.7/4.0
Coursework: Machine Learning for Data Science and AI, Applications of Data Mining, Predictive Analytics, Fairness, Security and Privacy in AI
August 2018 - June 2022
GPA: 3.95/4.0
Coursework: Data Structures and Algorithms, Data Mining, Machine Learning, Distributed & High-Performance Computing, Cloud Computing
Understanding and Coding the KV Cache in LLMs from Scratch — A practical guide to KV caches for efficient LLM inference. Read Article
Understanding LLM System with 3-layer Abstraction — A three-layer model to reason about large language model systems. Read Article
GPT-3 Implementation from Scratch — Implementing core transformer components (multi-head attention, FFN) to deeply understand the architecture. View on GitHub