Selim Incekara
Portfolio
Datathon 2024 – Entrepreneurship Foundation Score Prediction
• Developed a machine learning model to predict evaluation scores for candidates applying to the Entrepreneurship Foundation using historical data (2014-2023).
• Processed a dataset of over 10,000 applications, performing data cleaning, outlier detection, and feature engineering to enhance model performance.
• Implemented and compared multiple algorithms (Linear Regression, Random Forest, XGBoost, LightGBM), with LightGBM achieving the highest score (a minimal training sketch follows below).
• Obtained a Kappa evaluation metric of 0.76 and an accuracy of 84.2%, outperforming baseline models.
• Conducted SHAP analysis to interpret feature importance, ensuring model transparency and fairness.
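A minimal sketch of the LightGBM training and SHAP workflow outlined above, assuming a preprocessed tabular dataset; the file name and column names (applications_2014_2023.csv, evaluation_score) are hypothetical placeholders, not the competition's actual schema.

```python
# Sketch only: file and column names are hypothetical placeholders.
import pandas as pd
import lightgbm as lgb
import shap
from sklearn.model_selection import train_test_split

df = pd.read_csv("applications_2014_2023.csv")   # hypothetical path
X = df.drop(columns=["evaluation_score"])        # features assumed already numeric
y = df["evaluation_score"]                       # hypothetical target column

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05, random_state=42)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(50)])    # stop when validation score stalls

# SHAP exposes per-feature contributions for transparency and fairness checks
explainer = shap.TreeExplainer(model)
shap.summary_plot(explainer.shap_values(X_val), X_val)
```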
SMS Spam Detection – NLP Classification
• Developed a spam classification model to filter malicious SMS messages using the SMS Spam Collection Dataset (5,574 messages).
• Preprocessed text with tokenization, lemmatization, and stop-word removal, improving text clarity and standardization.
• Converted SMS messages into numerical features using TF-IDF and Bag of Words (BoW) techniques, optimizing feature representation.
• Built models using Logistic Regression, Naive Bayes, and Support Vector Machines (SVM), achieving the highest accuracy with Naive Bayes (sketched below).
• Evaluated model performance using accuracy (92.1%), precision, recall, and F1-score, ensuring reliable spam detection.
• Generated word-cloud visualizations and analyzed misclassified messages, improving feature engineering strategies.
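A minimal sketch of the TF-IDF + Naive Bayes variant described above. The tab-separated layout of the SMS Spam Collection file is an assumption, and the lemmatization step is omitted for brevity.

```python
# Sketch only: dataset path and column layout are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

df = pd.read_csv("SMSSpamCollection", sep="\t", names=["label", "text"])

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)

# TfidfVectorizer handles tokenization and stop-word removal in one step
clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
clf.fit(X_train, y_train)

# accuracy, precision, recall, and F1 per class
print(classification_report(y_test, clf.predict(X_test)))
```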
Digit Recognizer
• Developed a Convolutional Neural Network (CNN) model on the MNIST dataset (60,000 training images, 10,000 test images) to recognize handwritten digits (a minimal Keras sketch follows below).
• Achieved 93.7% accuracy by optimizing the model through data augmentation, dropout regularization, and hyperparameter tuning.
• Improved model robustness by implementing batch normalization and adaptive learning-rate schedules.
• Visualized predictions and misclassified images using Matplotlib and Seaborn, gaining deeper insight into model performance.
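A minimal Keras sketch of the CNN described above, assuming TensorFlow/Keras as the framework; the layer sizes are illustrative choices and the data-augmentation step is omitted for brevity.

```python
# Sketch only: layer sizes and epoch count are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0  # add channel dim, scale

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.BatchNormalization(),      # batch normalization for robustness
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),              # dropout regularization
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Adaptive learning rate: halve the LR when validation loss plateaus
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2)
model.fit(x_train, y_train, epochs=5, validation_split=0.1, callbacks=[lr_schedule])
print(model.evaluate(x_test, y_test))
```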
IMDB 50K Movie Reviews – Sentiment Analysis
• Developed a Natural Language Processing (NLP) model to classify movie reviews as positive or negative using the IMDB 50K dataset (25,000 training, 25,000 test samples).
• Preprocessed text data with tokenization, lemmatization, and stop-word removal, reducing noise and improving model interpretability.
• Converted text data into numerical representations using TF-IDF and Word2Vec embeddings, enhancing feature extraction.
• Trained and compared Logistic Regression, Naive Bayes, and Support Vector Machine (SVM) models, identifying SVM as the best-performing classifier (sketched below).
• Evaluated model performance using accuracy (86.4%), F1-score, precision, and recall, ensuring a balanced classification approach.
• Visualized sentiment distribution and misclassified samples with Matplotlib and Seaborn, enhancing model interpretability.
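A minimal sketch of the TF-IDF + SVM variant identified above as best-performing; the CSV path and the review/sentiment column names are assumptions.

```python
# Sketch only: file and column names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("imdb_50k.csv")   # hypothetical path with 'review'/'sentiment' columns

X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.5,
    stratify=df["sentiment"], random_state=42)   # 25K/25K split, as in the dataset

clf = make_pipeline(TfidfVectorizer(stop_words="english", max_features=50_000),
                    LinearSVC())                 # linear SVM scales well to sparse TF-IDF
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(accuracy_score(y_test, pred),
      f1_score(y_test, pred, pos_label="positive"))  # assumes string class labels
```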
Adult Census Income – Income Prediction Model
• Data Preprocessing & Feature Engineering
  o Analyzed the US Census Bureau Adult dataset (32,561 samples, 15 features) to predict whether individuals earn over $50K per year.
  o Identified significant features such as age, education level, occupation, and marital status through correlation analysis and chi-square tests for categorical variables.
  o Handled missing values using mean imputation for numerical features and mode imputation for categorical features, avoiding data loss.
  o Encoded categorical variables using One-Hot Encoding and Label Encoding to make them compatible with machine learning algorithms.
• Model Development & Optimization
  o Implemented and compared Logistic Regression, Decision Tree, and Random Forest algorithms, with Random Forest the best performer at over 85% accuracy (a minimal pipeline sketch follows below).
  o Tuned hyperparameters using GridSearchCV and RandomizedSearchCV, improving precision, recall, and F1-score.
  o Applied data normalization (Min-Max Scaling and Standardization) to ensure model stability and avoid bias from differing feature scales.
• Performance Evaluation & Results
  o The final tuned model achieved 89% accuracy, predicting income classes with high precision and recall.
  o Evaluated model performance using a confusion matrix, precision-recall curve, and ROC-AUC, ensuring balanced performance across both income classes.
  o Delivered comprehensive results in a detailed report, providing insights into the socioeconomic factors influencing income levels.
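A minimal sketch of the Random Forest pipeline with GridSearchCV described above; the CSV path and the income target column are assumptions following common Adult-dataset conventions.

```python
# Sketch only: path and column names are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("adult.csv")                      # hypothetical path
X, y = df.drop(columns=["income"]), df["income"]   # assumed target column

categorical = X.select_dtypes(include="object").columns.tolist()
pre = ColumnTransformer([("ohe", OneHotEncoder(handle_unknown="ignore"), categorical)],
                        remainder="passthrough")   # numeric features pass through

pipe = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=42))])

# Small illustrative grid; the real search would cover more parameters
grid = GridSearchCV(pipe, {"rf__n_estimators": [200, 500],
                           "rf__max_depth": [10, 20, None]}, cv=5)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```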
Heart Failure Prediction – Advanced Machine Learning Model
• Exploratory Data Analysis (EDA) & Feature Engineering
  o Analyzed the Heart Failure Clinical Records dataset (299 samples, 13 clinical features) to understand correlations between features and the target variable.
  o Identified key features affecting patient survival, such as ejection fraction, serum creatinine, and age, using Pearson correlation and feature-importance scores.
  o Visualized the data using box plots, pair plots, and correlation heatmaps to detect patterns, class imbalance, and multicollinearity.
  o Handled missing values using imputation techniques and engineered new features such as BMI categories and risk levels to enhance model performance.
• Outlier Detection & Data Preprocessing
  o Detected and treated outliers in numerical features using the IQR (Interquartile Range) method and Z-score analysis, reducing model bias.
  o Encoded categorical variables using One-Hot Encoding (OHE) and Label Encoding for algorithm compatibility.
  o Scaled numerical features using Min-Max Scaling and Standardization, ensuring consistent performance across multiple models.
• Model Development & Hyperparameter Tuning
  o Trained and compared Logistic Regression, Random Forest, and XGBoost to identify the most accurate classifier.
  o XGBoost outperformed the other models with 92% accuracy after hyperparameter tuning with GridSearchCV and RandomizedSearchCV.
  o Evaluated model robustness using cross-validation, fine-tuning XGBoost parameters such as learning rate, number of estimators, and max depth.
• Model Fairness & Explainability
  o Assessed fairness across demographic groups (age, gender, smoking status, etc.) to ensure unbiased predictions.
  o Applied SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) to interpret feature contributions to model decisions.
  o Calibrated model probabilities using Platt Scaling and Isotonic Regression, improving predicted risk estimates for patients (a calibration sketch follows below).
• Performance Evaluation & Results
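A minimal sketch of the XGBoost training and isotonic probability calibration referenced above; the dataset file name and the DEATH_EVENT target column follow the public Kaggle version of the dataset and are assumptions here.

```python
# Sketch only: file name and target column are assumptions.
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("heart_failure_clinical_records_dataset.csv")  # assumed file name
X, y = df.drop(columns=["DEATH_EVENT"]), df["DEATH_EVENT"]      # assumed target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

xgb = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                    eval_metric="logloss", random_state=42)

# Isotonic regression maps raw scores to calibrated risk probabilities
clf = CalibratedClassifierCV(xgb, method="isotonic", cv=5)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test)[:5, 1])  # calibrated per-patient risk estimates
```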
Housing Prices Prediction – Regression Model
• Exploratory Data Analysis & Feature Selection
  o Developed a predictive model to estimate housing prices using Kaggle's Housing Prices dataset (1,460 samples, 81 features).
  o Performed data cleaning by handling missing values, detecting outliers (using IQR and Z-score methods), and encoding categorical features with Label Encoding and One-Hot Encoding.
  o Selected important features such as overall quality, square footage, neighborhood, and number of rooms using feature-importance scores from decision-tree-based algorithms.
• Model Building & Hyperparameter Tuning
  o Trained and compared multiple regression models, including Linear Regression, Lasso, Ridge, XGBoost, and LightGBM, to predict house sale prices (a minimal XGBoost sketch follows below).
  o XGBoost performed best, reaching an R² of 0.90 after tuning hyperparameters such as learning rate, max depth, and number of estimators with GridSearchCV and RandomizedSearchCV.
  o Addressed multicollinearity and overfitting by applying Ridge and Lasso regularization, improving model generalization.
• Performance Evaluation & Results
  o Evaluated model performance using regression metrics: R-squared (0.90), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE).
  o Generated visual reports comparing predicted vs. actual house prices using scatter plots and error-distribution histograms.
  o The final model predicted house prices with high precision, providing valuable insights for real-estate market analysis.
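A minimal sketch of the XGBoost regression workflow described above; the train.csv file and SalePrice target follow Kaggle's House Prices competition layout, and the single get_dummies step stands in for the fuller preprocessing listed above.

```python
# Sketch only: assumes Kaggle's House Prices train.csv layout.
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("train.csv")
X = pd.get_dummies(df.drop(columns=["SalePrice"]))  # one-hot encode categoricals
y = df["SalePrice"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# XGBoost handles remaining NaNs natively, so no explicit imputation here
model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_val)
print(f"R^2 = {r2_score(y_val, pred):.2f}, MAE = {mean_absolute_error(y_val, pred):,.0f}")
```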