Registration: 15.07.2025

Nikita Kandpal

Specialization: Data Analyst

Skills

Python
R
SQL
NoSQL
PySpark
PostgreSQL
Snowflake
BigQuery
ETL
Data Warehousing
Spark
Hadoop
Kafka
Databricks
TensorFlow
PyTorch
Scikit-learn
NumPy
Pandas
Keras
Deep Learning
NLP
EDA
AWS EC2
AWS Lambda
AWS S3
AWS RDS
AWS Redshift
AWS API Gateway
GCP
Microsoft Azure
Docker
Kubernetes
Tableau
Power BI
Matplotlib
Seaborn
Airflow
Terraform
Jenkins
Git
CI/CD
Agile
Microservices

Work experience

Data Analyst
08.2023 - present |IU Libraries
CSV, JSON, XML, Python, Pandas, NumPy, SQLite, Excel
● Accelerated data migration of digital library records and unstructured data from CSV, JSON, and XML using Python (Pandas, NumPy, SQLite), migrating 40% more records in 75% less time than expected.
● Identified data patterns and trends; built Python pipelines for bulk imports, reducing manual effort by 25%.
● Applied rule-based validation and cleaning with Python and Excel to improve data accuracy and reduce cataloging errors.
Data Science Research Assistant
01.2025 - 05.2025 |Indiana University
LLM, BERT, RoBERTa, GPT
● Built an LLM-powered pipeline to scan and filter 10,000+ research papers, achieving 85% accuracy in surfacing high-quality studies using transformer models (BERT, RoBERTa, GPT), analogous to object detection in vision tasks.
● Fine-tuned models and optimized data ingestion workflows, boosting theme classification accuracy by 30% and reducing runtime by 40%, showcasing scalable deep learning for high-throughput NLP pipelines.
ML Engineer
05.2024 - 08.2024 |Shure Incorporated
Random Forest, XGBoost, LSTM, Python, Airflow, Tableau, Power Apps
● Developed ML models (Random Forest, XGBoost, LSTM) for supplier trend forecasting and risk classification, boosting accuracy by 20% and accelerating issue resolution by 35%.
● Built automated data pipelines using Python and Airflow to deliver clean, timely training data from drop tests and quality logs, improving pipeline reliability by 40%.
● Integrated ML outputs into interactive Tableau dashboards to visualize failure patterns and risk scores, enabling QA teams to prioritize high-impact supplier issues.
● Enhanced the Power Apps-based Quality Lab system by integrating bulk upload and new test configurations, streamlining global lab processes and reducing manual data entry by 60%.
Data Analyst
09.2020 - 08.2023 |Standard Chartered Bank
PySpark, Scikit-learn, XGBoost, AWS, Airflow, Redshift, S3, Kafka, AWS Lambda, Tableau, CI/CD, Jenkins, Docker, Terraform
● Contributed to fraud detection and revenue forecasting models using PySpark, Scikit-learn, and XGBoost, improving signal accuracy by 20% and enabling data-driven financial planning.
● Managed scalable ML pipelines on AWS using Airflow, Redshift, S3, and containerized PySpark, ensuring reliable training and inference workflows.
● Supported real-time fraud detection via ML-driven anomaly pipelines using Kafka, AWS Lambda, and Tableau, improving risk visibility and reducing detection lag by 15%.
● Streamlined CI/CD workflows for ML model training and deployment using Jenkins, Docker, and Terraform, reducing release time by 65% and improving system reliability.
Data Scientist
05.2019 - 08.2019 |Rebel Foods
AdaBoost, CART, LDA, R
● Devised a predictive model for order prep time using kitchen load, staff availability, and order history, improving wait time estimates and boosting customer satisfaction by 80%.
● Enhanced kitchen energy efficiency by integrating AdaBoost, CART, and LDA to predict peak operational hours, cutting energy use by 40%, and deploying a real-time monitoring dashboard via Shiny in R.

Educational background

Data Science (Master’s Degree)
Till 2025
Indiana University
Information Technology (Bachelor’s Degree)
Till 2020
SRM Institute of Science and Technology

Languages

English: Proficient