About Me

Rakshith Mathad

I am a Software Engineer specializing in Generative AI and Data Science, currently working at CVS Health in New York City. With a strong background in AI, Machine Learning, and Software Engineering, I focus on building scalable AI solutions and large language model applications. I hold an MS in Applied Data Science from the University of Southern California and have extensive experience developing and deploying AI systems that impact millions of users.

Technical Skills

Languages

Python, SQL, C, C++, CUDA (Intermediate), OpenMP (Intermediate)

Tools & Technologies

Analytics, Deep Learning, Machine Learning, A/B Testing, GCP Vertex AI, AWS, Docker, Git, Informatica ETL, OpenMP (Intermediate), Web Scraping/Automation, AI Research, Parallel Programming, NumPy, Pandas, scikit-learn, LangChain, NVIDIA DGX Server, Nsight Profiling, NVIDIA NeMo, LLMOps, RAG, VectorDBs, LLM Training and Inference techniques, HuggingFace, FastAPI, RESTful API, Responsible AI, Linux, MongoDB, Selenium, Hadoop HDFS, PySpark, PowerBI/Tableau, LLMs, Generative AI, MapReduce, Azure, Google Cloud Platform, BigQuery, HiveQL, Horovod, ArcGIS, MLOps, Jenkins CI/CD, NLP, Forecasting, Unsupervised ML, Generative Models

Currently Cooking 🍳

CUDA + OpenMP Hybrid Vector Addition Project

Working on a high-performance hybrid vector addition project that combines CPU parallelism (OpenMP) with GPU parallelism (CUDA). The implementation runs 8.8x faster than a single-CUDA-stream baseline by driving parallel CUDA streams from 4 OpenMP threads, so that memory transfers overlap and the load stays balanced across streams.
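
The partitioning idea behind this can be sketched in plain Python. This is a CPU-only stand-in, not the project's CUDA/OpenMP code: the function names (`chunk_bounds`, `add_chunk`, `hybrid_style_add`) are hypothetical, and each Python worker only plays the role of one OpenMP thread driving its own CUDA stream.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous, near-equal (start, end) slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        # The first `extra` workers take one additional element each.
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def add_chunk(a, b, out, start, end):
    # Stand-in for one worker's copy-in, kernel launch, and copy-out
    # on its own stream (assumption: the real project does this in CUDA).
    for i in range(start, end):
        out[i] = a[i] + b[i]


def hybrid_style_add(a, b, workers=4):
    """Add two equal-length vectors, one contiguous chunk per worker."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out = [0.0] * len(a)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(add_chunk, a, b, out, s, e)
            for s, e in chunk_bounds(len(a), workers)
        ]
        for f in futures:
            f.result()  # re-raise any worker exception
    return out
```

Each worker writes to a disjoint slice of `out`, so no locking is needed. Chunk sizes differ by at most one element, which is the simplest form of static load balancing.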

GPT-3 Implementation from Scratch

Building a GPT-3 implementation from scratch to understand the transformer architecture and attention mechanisms. The project implements the core components of the GPT model, including multi-head attention, feed-forward networks, and the full transformer stack.
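
As a rough illustration of the attention component (a minimal sketch, not the project's code), causal multi-head self-attention can be written in dependency-free Python. The function names are hypothetical, and the learned query/key/value/output projections are omitted (treated as identity) to keep the example short.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Scaled dot-product attention for one head.

    q, k, v are lists of token vectors; position t attends only to 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)  # causal mask: ignore future positions
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


def multi_head_attention(x, num_heads):
    """Split each token vector into heads, attend per head, concatenate.

    Assumption: projections are identity, so Q = K = V = the head's slice of x.
    """
    d_model = len(x[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        heads.append(causal_attention(part, part, part))
    # Concatenate the per-head outputs back to d_model per token.
    return [
        sum((heads[h][t] for h in range(num_heads)), [])
        for t in range(len(x))
    ]
```

With the causal mask, the first token can only attend to itself, so its output equals its input when the projections are identity.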

Experience

Software Engineer - Generative AI

CVS Health, New York City, NY

June 2024 - Present

  • Working on the Conversational AI customer service team to build a large-scale, complex RAG and rule-based FastAPI chat application on Google Cloud (GCP) AI Platform
  • Implemented async/multithreaded/parallel semantic search using OpenAI GPT PTUs and Gemini LLMs, Feature Store and Matching Engine, improving latency and caching strategy
  • Scaled the production AI system to ~20,000 requests/hour, serving 300k customers daily, using Google Kubernetes Engine (GKE) and Vertex AI
  • Performed complex data ingestion and preprocessing on GCP BigQuery and BigTable, including chunking and embedding strategies
  • Worked with Airflow, K8s, Jenkins CI/CD, LLM Evaluation framework, feature flags, and AI safety guardrails

Data Engineer Intern

AEG Entertainment Group, Los Angeles, CA

February 2024 - May 2024

  • Built complex Azure Data Factory pipelines for large data processing
  • Implemented PySpark in Databricks for batch API processing
  • Handled Dynamics CRM data migration via OData REST APIs
  • Managed Parquet/Avro for streaming and financial data reporting

Analytics Engineering Intern

CVS Health, New York City, NY

May 2023 - August 2023

  • Optimized Hadoop HDFS big data pipelines, processing 100M+ rows with HiveQL
  • Reduced latency by 40% through optimization techniques
  • Designed scalable ETL workflows and customer campaign strategies
  • Leveraged Tableau, advanced OLAP, data blending, and predictive modeling

AI Software Engineer Intern

AlphaICs Corporation, India

January 2022 - July 2022

  • Developed deep learning applications for Object Detection and Visual Attention
  • Implemented Recommendation Systems using Matrix Factorization and Collaborative Filtering
  • Optimized AI Inference on custom AI processor

AI Research Intern

Samsung R&D Institute, India

November 2020 - June 2021

  • Led a team of 3 on state-of-the-art generative AI research in PyTorch
  • Implemented a Conditional GAN with spatially adaptive normalization for image manipulation
  • Performed semantic segmentation with DeepLab-V2 to curate a 20k-image dataset
  • Trained generative models on an NVIDIA DGX cluster

Education

Master of Science in Applied Data Science

University of Southern California, Los Angeles

August 2022 - May 2024

GPA: 3.7/4.0

Coursework: Machine Learning for Data Science and AI, Applications of Data Mining, Predictive Analytics, Fairness, Security and Privacy in AI

Bachelor of Engineering in Computer Science

KLE Technological University, India

August 2018 - June 2022

GPA: 3.95/4.0

Coursework: Data Structures and Algorithms, Data Mining, Machine Learning, Distributed & High-Performance Computing, Cloud Computing

Projects & Certifications

Parallelism for LLM Inference on GPUs

  • Deployed and served BERT Transformer on RTX GPU using PyTorch, ONNX, and NVIDIA TensorRT runtimes
  • Profiled performance bottlenecks using NVIDIA Nsight
  • Built a CUDA kernel for optimized inference on a custom neural network
  • Deployed async TinyLlama using Ray Serve and FastAPI
  • Implemented FSDP/DDP training simulations using Ray/DeepSpeed

RAG-based LLM with Guardrails

  • Built RAG-based chatbot using LangChain and Llama 2
  • Integrated Pinecone VectorDB and HuggingFace Sentence Transformer
  • Implemented NVIDIA NeMo Guardrails and LoRA LLM fine-tuning
  • Developed an end-to-end pipeline for LoRA fine-tuning and quantization
  • Improved inference using NVIDIA Triton Inference Server

Research: Distributed Deep Learning

  • Published paper on "Performance Analysis of Distributed Deep Learning using Horovod on Image Classification" at IEEE ICICCS
  • Developed ML privacy-preserving techniques for Federated Learning systems
  • Implemented anomaly detection for adversary client nodes
  • Applied Homomorphic Encryption in distributed learning
View Publication

Certifications

NVIDIA CUDA Computing

View Certificate

Juniper Networks Certified Associate, Junos (JNCIA-Junos)

View Certificate

Collection of Good Reads 📚

Understanding and Coding the KV Cache in LLMs from Scratch

A comprehensive guide to implementing KV caches for efficient LLM inference, covering the fundamental concepts and practical code implementation.

Read Article

Understanding LLM System with 3-layer Abstraction

An in-depth exploration of LLM system architecture using a three-layer abstraction model for better understanding of large language model systems.

Read Article

Contact Me