Data Analytics & BI Engineering – LatAm E-commerce Platform:
● Led development of a full ETL pipeline to analyze three years of sales and logistics data.
● Integrated millions of fragmented records from APIs and flat files using Airflow, Pandas, and SQL.
● Enabled real-time KPI dashboards on revenue and customer behavior, improving BI reporting and decision-making.
Credit Risk Modeling & Deployment – Fintech:
● Built a supervised learning pipeline on 300K+ records for loan default prediction.
● Tackled imbalanced and noisy data through advanced preprocessing and feature engineering.
● Tuned ensemble models (Random Forest, XGBoost) to achieve ROC AUC > 0.7.
● Deployed the model via API, reinforcing knowledge in credit scoring and ML systems integration.
MLOps for Computer Vision:
● Deployed a real-time image classification system (1,000+ classes) using TensorFlow, FastAPI, Redis, and Docker.
● Reduced latency by 50% and exposed the model through a Streamlit web UI.
● Ensured system robustness via unit and load testing with Locust, deepening skills in MLOps and scalable ML microservices.
Automated Vehicle Recognition:
● Trained a CNN to classify 25 vehicle types from images with over 80% accuracy.
● Managed GPU-based training pipelines in Colab, overcoming challenges in data quality and model generalization.
● Gained hands-on experience in multi-class classification and computer vision workflows.
Sentiment Analysis with NLP & Deep Learning:
● Built a sentiment classifier using TF-IDF, Word2Vec, RNNs, and BERT on 50K movie reviews.
● Focused on preprocessing, feature engineering, and robust evaluation (confusion matrix, F1-score).
● Developed a modular pipeline applicable to real-world NLP tasks.
Conversational AI Agent Development:
● Developed a customer support agent using LangChain and lightweight LLMs (Mistral via Ollama).
● Designed prompt templates and integrated tools for real-time lookups of product and purchase status by DNI (national ID number) or order code.
● Used LangGraph memory to handle session context, advancing skills in LLM orchestration and prompt engineering.
NYC Taxi Trip Predictor:
● Created a predictive pipeline to estimate taxi fare and trip duration using NYC TLC data (3.5M+ records).
● Applied geospatial feature engineering and regression models, improving transparency for users and operational planning.
● Strengthened capabilities in real-time ML pipelines and data wrangling.
Personal Projects:
● Diabetes Predictor: Developed and deployed a diabetes prediction model using Scikit-learn, Streamlit, and FastAPI, covering full EDA, feature engineering, and a classification pipeline.
Projects in Production:
● Hospitalization Risk Prediction (2025): ML model on the MHAS dataset.
● Antimicrobial Peptide Discovery (2025): Deep learning on curated AMP datasets.