Hello World

:)

I'm Ayush Pandey

Software Developer & Data Science Enthusiast

About Me

I am a former Full Stack Software Engineer. I’ve led development of components for an analytics and machine learning pipeline, and I have also built portals and UIs. I am comfortable with Java, Python, JavaScript, and SQL, and I am familiar with AWS technologies such as EC2, SQS, and Lambda.

Analytical professional with hands-on experience in software design and development. Skilled in managing the full software development life cycle, including requirement engineering, design, coding, testing, debugging, and maintenance. Proficient in Applied Algorithms, Artificial Intelligence, Cloud Computing, Big Data, Advanced Database Concepts, Applied Distributed Systems, and Machine Learning techniques such as regularization, feature engineering, principal component analysis, and model evaluation.

Resume

Here is my education, work experience, & some skills I've got.

Experience

Apache Beam

Google Summer of Code Contributor

May 2024 - Present

Spearheaded the development of Apache Beam pipelines for building out Retrieval-Augmented Generation (RAG) applications in collaboration with Google Cloud’s Beam Team targeted at increasing user acquisition. Implemented two key pipelines: one for ingesting and indexing text corpora into a vector database, and another for semantic search, enriching user queries with relevant text chunks from the knowledge base. Demonstrated proficiency in MLTransform for preprocessing data, generating embeddings, and enhancing data with Beam's Enrichment transform, contributing to the advancement of semantic search-based applications in the ML community.

Apache CloudStack

Google Summer of Code Contributor

May 2023 - September 2023

Worked on Apache CloudStack (IaaS at the intersection of cloud computing, virtualization, and DevOps) to extend export/import of unmanaged instances for the KVM hypervisor. The work involved understanding VMware specifics and then adding functionality in the management server and KVM plugin to provide API and UI support for listing, unmanaging, and importing external VMs.

Sopra Steria

Senior Software Engineer

December 2020 - August 2022

Led a team of four engineers on full stack development with Jersey-based web APIs, a ReactJS frontend, and Cron/Apache Spark jobs for log analytics. Migrated an old transaction warehouse from SQL to AWS S3.

Tech Mahindra

Software Engineer

April 2018 - December 2020

Worked on full stack applications using technologies such as React, J2EE-based Struts, Hibernate, Spring MVC, and Maven across several projects with direct client collaboration.

Indian Institute of Science, Bangalore, India

Summer Intern

May 2016 - July 2016

Developed simulation techniques and C++ code for experiments on carburetors. The work required cross-disciplinary collaboration to run simulations on IISc’s HPC system.

Education

University of Colorado, Boulder

MS in Data Science

August 2022 - August 2024

Enrolled in ML and Big Data focused courses as part of my curriculum to expand my knowledge in these fields. Working as a Course Facilitator for the online Coursera Data Science program since August 2022, I help manage and tutor various courses with over 100 students in the MSDS program, currently focused on distributed computing, big data architecture, machine learning, probability theory, generalized linear models, and data mining.

Dr. A. P. J. Abdul Kalam Technical University

Bachelor of Technology

August 2013 - May 2017

Completed my bachelor's degree in Mechanical and Computer Engineering, where I explored various domains in Software Engineering and Electronics Engineering.

Skills & Expertise

Languages and Operating Systems

  • Python
  • Java
  • JavaScript
  • Linux
  • Ubuntu
  • Windows

Database Technologies

  • MySQL
  • PostgreSQL
  • MongoDB
  • SQLite
  • Redis
  • Oracle

Web Development

  • HTML5
  • CSS3
  • React
  • NodeJS

DevOps

  • Heroku
  • AWS
  • Google Cloud Platform

Visualization

  • Seaborn
  • Matplotlib
  • Plotly
  • WordCloud
  • D3.js
  • Tableau
  • PowerBI

Tools and Frameworks

  • NumPy
  • Pandas
  • PyTorch
  • Keras
  • scikit-learn
  • OpenCV

Selected Projects

Selected Projects I Have Worked On.

GSoC 2024

Implement Retrieval-Augmented Generation (RAG) Pipeline in Beam

Spearheaded the development of Apache Beam pipelines for building out Retrieval-Augmented Generation (RAG) applications in collaboration with Google Cloud’s Beam Team targeted at increasing user acquisition. Implemented two key pipelines: one for ingesting and indexing text corpora into a vector database, and another for semantic search, enriching user queries with relevant text chunks from the knowledge base. Demonstrated proficiency in MLTransform for preprocessing data, generating embeddings, and enhancing data with Beam's Enrichment transform, contributing to the advancement of semantic search-based applications in the ML community.
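The two-pipeline structure described above can be illustrated with a minimal, library-free sketch. This is not the Beam implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and every name here (`ingest`, `enrich`, `chunk`, the sample corpus) is hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumed stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk(text, size=8):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def ingest(corpus, size=8):
    """Pipeline 1: chunk each document, embed each chunk, store in an index."""
    index = []  # in-memory list standing in for a vector database
    for doc in corpus:
        for piece in chunk(doc, size):
            index.append((piece, embed(piece)))
    return index


def enrich(query, index, top_k=1):
    """Pipeline 2: attach the top_k most similar chunks to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return {"query": query, "context": [text for text, _ in ranked[:top_k]]}


corpus = [
    "Apache Beam is a unified model for batch and streaming data processing",
    "A vector database stores embeddings for similarity search",
]
index = ingest(corpus)
result = enrich("what stores embeddings", index)
print(result["context"])
```

In a real pipeline the chunking and embedding steps would run as distributed transforms and the index would live in an external vector store; the sketch only shows how the ingestion and query-enrichment stages relate.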

GSoC 2023

Import-Export KVM Hypervisor

Worked on this project as a Google Summer of Code 2023 contributor. Worked on Apache CloudStack (IaaS at the intersection of cloud computing, virtualization, and DevOps) to extend export/import of unmanaged instances for the KVM hypervisor. The work involved understanding VMware specifics and then adding functionality in the management server and KVM plugin to provide API and UI support for listing, unmanaging, and importing external VMs.

Question answering using RAG on class notes

NotesBot

The Q-A Bot project aims to create an advanced Question-Answering (QA) system using Retrieval-Augmented Generation (RAG) techniques. This system is designed to answer user queries accurately by leveraging external knowledge bases.


SafeMap

Recommends the safest and fastest route.


Netflix Dashboard in Tableau

Netflix is one of the most popular media and video streaming platforms. This visualization supports data-driven decision making in the entertainment industry and helps in understanding viewer preferences and behaviors to improve content recommendations and enhance user experience.

Sequence Classification in NLP

BrainTeaser

The BRAINTEASER task centers on examining elements of speech that are not immediately apparent. The input is a question and a list of choices. What is notable about each question is how it takes advantage of unconscious assumptions and irrelevant information. The dataset is in English and is provided by the task organizers. The task has two subtasks: sentence puzzle and word puzzle. Sentence puzzles focus on unstated, typical, and wrong assumptions about the question/answer, while word puzzles defy common sense because they hinge on a word's alphabetical makeup rather than its denotation. We chose to focus exclusively on the sentence puzzle. We used BERT, XLNet, BART, T5, YOSO, and RoBERTa models, specifically the base Hugging Face implementation of each model.

Multilabel Classification

Multilabel Toxicity Classification Challenge

In this project, I experimented with various NLP-based classifiers and used softmax thresholds to identify one of many toxic attributes in text (toxic, severe_toxic, obscene, threat, insult, identity_hate). I built and benchmarked several models: vanilla RNN, LSTM, GRU, and DistilBERT. I also attempted fine-tuning GPT-3 models through the OpenAI API. The best result, 98.3%, came from the fine-tuned GPT-3 (Ada base) model.
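As a rough illustration of the softmax-threshold idea mentioned above, the sketch below converts raw model scores into probabilities and reports a label only when the top probability clears a cutoff. The threshold value of 0.5, the example logits, and the `predict` helper are assumptions made for illustration; they are not taken from the project itself.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, threshold=0.5):
    """Return the most likely label if its probability meets the threshold, else None.

    The 0.5 default is a hypothetical cutoff, not the project's tuned value.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best] if probs[best] >= threshold else None


# One clearly dominant score, and one case where all scores are equal.
confident = predict([0.1, 0.2, 4.0, 0.0, 0.3, 0.1])
uncertain = predict([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
print(confident, uncertain)
```

Raising the threshold trades recall for precision: fewer texts receive a label, but those that do are flagged with higher confidence.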

Sentiment Analysis

Tweets Sentiment Analysis on Abortion Rights

Regression Analysis

Regression Analysis of Greenhouse Gas Emissions

Monitoring greenhouse gas emissions can be a time- and resource-consuming task for many industries. Both periodic measurements and continuous emission monitoring systems can be costly, because the sensors used to measure gas emissions require constant monitoring. Research published in the Turkish Journal of Electrical Engineering and Computer Sciences tackles this issue by proposing a predictive emission monitoring system (PEMS) that predicts CO and NOx emissions using calibrated equipment established on site. The researchers set a benchmark for the predictive models they built, and this project is an attempt to apply statistical methods to that data.