CV
Education
Thapar Institute of Engineering & Technology
- B.E. in Computer Engineering, Jul 2015 - Jun 2019
Virginia Tech
- M.S. in Computer Science (Thesis Track), Jan 2021 - Dec 2022
Work experience
Cadence Solutions
Data Analytics & ML Engineering Intern
May 2022 - Present
Automated data extraction from patients’ clinical reports, saving 50 work hours/week for clinicians
- Engineered the service to convert PDFs into plain text (OCR), redact protected health information (Named Entity Recognition) and extract data from unstructured text using Transformer-based neural question answering.
- Replaced manual perusal of reports, saving an estimated 50 hours of clinician time per week.
- Accelerated run time by implementing asynchronous batch processing, building a session-management cache with Redis, using S3 as an intermediate data store and surfacing final results in Snowflake.
Virginia Tech
Research Assistant
Aug 2021 - Dec 2022
A multi-lingual dataset to benchmark Generative AI for Code
- Curated 1M data points spanning 7 programming languages and their natural language descriptions to support code translation, natural-language-to-code generation, code summarization and natural-language-to-code search (GitHub, HuggingFace).
- Benchmarked state-of-the-art Transformer models such as T5 and BERT on all supported tasks within our novel, large, multilingual dataset.
Execution-based Code Generation using Deep Reinforcement Learning
- Developed an evaluation framework that measures code translation quality by the compilability and executability of the generated code.
- Conceptualized a reinforcement learning (RL) algorithm in PyTorch that uses code compilation signals, syntax trees and data flow graphs as feedback; outperformed benchmark models by 13.4% in code translation.
Unilever
Project Lead
Jul 2019 - Jun 2020
- Led a team of 3 software developers to build ETL data pipelines that automate the processing of financial documents (vendor invoices, shipment logs, tax documents, etc.), saving ∼$1M annually.
- Coordinated end-to-end project management: planning and leading sprints, identifying and removing bottlenecks, negotiating costs and balancing resource allocation.
Software Development Intern
Aug 2018 - Dec 2018
- Developed an end-to-end ETL pipeline for automated attendance verification using Microsoft Face API and deployed it to AWS Lambda, reducing verification time by 75% and cost by ∼$50K annually.
Skills
- Languages & Tools: Python, C++, SQL, PyTorch, Git, OpenCV, Pandas, AWS
- Courses: Data Structures & Algorithms, Advanced ML, Deep Learning, NLP, Data Analytics, Computer Vision, DBMS
Publications
Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni and Chandan K. Reddy. (2023). "Execution-based Code Generation using Deep Reinforcement Learning." arXiv.
Ming Zhu, Aneesh Jain, Karthik Suresh, Roshan Ravindran, Sindhu Tipirneni and Chandan K. Reddy. (2022). "XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence." ICLR 2023 Workshop on DL4C.