Registration: 18.02.2025

Selim Incekara

Specialization: Data Scientist
Data Scientist with expertise in machine learning, data analysis, and model development using tools like TensorFlow and Scikit-Learn. Through hands-on experience in industry and academic projects, I have tackled real-world challenges in price prediction, classification, and image recognition.

At ESCON Energy, I contributed to mobile app development and data analysis, improving machine tracking efficiency and supporting data-driven decision-making for engineers. At Odeon Technology, I worked on cost-profit analysis, customer segmentation, and classification tasks, applying end-to-end data science workflows from data preprocessing to model optimization.

With a strong problem-solving mindset, I continuously expand my expertise through self-driven learning and practical applications. I am now eager to bring my analytical skills and technical proficiency to a dynamic team, delivering data-driven solutions and business impact.

Portfolio

Datathon 2024 – Entrepreneurship Foundation Score Prediction

• Developed a machine learning model to predict evaluation scores for candidates applying to the Entrepreneurship Foundation using historical data (2014-2023).
• Processed a dataset containing over 10,000 applications, performing data cleaning, outlier detection, and feature engineering to enhance model performance.
• Implemented and compared multiple algorithms (Linear Regression, Random Forest, XGBoost, LightGBM), with LightGBM achieving the highest score.
• Obtained a Kappa evaluation metric of 7.6 and an accuracy of 84.2%, outperforming baseline models.
• Conducted SHAP analysis to interpret feature importance, supporting model transparency and fairness.

Digit Recognizer

• Developed a Convolutional Neural Network (CNN) model using the MNIST dataset (60,000 training images, 10,000 test images) to recognize handwritten digits.
• Achieved 93.7% accuracy by optimizing the model through data augmentation, dropout regularization, and hyperparameter tuning.
• Improved model robustness by implementing batch normalization and adaptive learning rate schedules.
• Visualized predictions and misclassified images using Matplotlib and Seaborn, gaining deeper insights into model performance.

IMDB 50K Movie Reviews – Sentiment Analysis

• Developed a Natural Language Processing (NLP) model to classify movie reviews as positive or negative using the IMDB 50K dataset (25,000 training, 25,000 test samples).
• Preprocessed text data with Tokenization, Lemmatization, and Stop Word Removal, improving model interpretability and reducing noise.
• Converted text data into numerical representations using TF-IDF and Word2Vec embeddings, enhancing feature extraction.
• Trained and compared Logistic Regression, Naive Bayes, and Support Vector Machines (SVM) models, identifying SVM as the best-performing classifier.
• Evaluated model performance using Accuracy (86.4%), F1-score, Precision, and Recall, ensuring a balanced classification approach.
• Visualized sentiment distribution and misclassified samples with Matplotlib and Seaborn, enhancing model interpretability.

Adult Census Income – Income Prediction Model

• Data Preprocessing & Feature Engineering
  o Analyzed the US Census Bureau Adult dataset containing 32,561 samples and 15 features to predict whether individuals earn over $50K per year.
  o Identified significant features such as age, education level, occupation, and marital status through correlation analysis and chi-square tests for categorical variables.
  o Handled missing values using mean imputation for numerical features and mode imputation for categorical data, so that no records were dropped.
  o Encoded categorical variables using One-Hot Encoding and Label Encoding to make them compatible with machine learning algorithms.
• Model Development & Optimization
  o Implemented and compared Logistic Regression, Decision Tree, and Random Forest algorithms, achieving over 85% accuracy with Random Forest as the best performer.
  o Tuned hyperparameters using GridSearchCV and RandomizedSearchCV for model optimization, leading to improved precision, recall, and F1-score.
  o Applied data normalization techniques (Min-Max Scaling and Standardization) to ensure model stability and avoid bias due to different feature scales.
• Performance Evaluation & Results
  o The final model achieved 89% accuracy, predicting income classes with high precision and recall scores.
  o Evaluated model performance using confusion matrix, precision-recall curve, and ROC-AUC, checking for balanced performance across both income classes.
  o Delivered comprehensive results in a detailed report, providing insights into the socioeconomic factors influencing income levels.

Heart Failure Prediction – Advanced Machine Learning Model

• Exploratory Data Analysis (EDA) & Feature Engineering
  o Analyzed Heart Failure Clinical Records dataset (299 samples, 13 clinical features) to understand correlations between features and target variable.
  o Identified key features affecting patient survival, such as ejection fraction, serum creatinine, and age, using Pearson correlation and feature importance scores.
  o Visualized the data using box plots, pair plots, and correlation heatmaps to detect patterns, class imbalance, and multicollinearity.
  o Handled missing values using imputation techniques and engineered new features like BMI categories and risk levels to enhance model performance.
• Outlier Detection & Data Preprocessing
  o Detected and treated outliers in numerical features using IQR (Interquartile Range) method and Z-score analysis, reducing model bias.
  o Categorical variables were encoded using One-Hot Encoding (OHE) and Label Encoding for algorithm compatibility.
  o Scaled numerical features using Min-Max Scaling and Standardization, ensuring optimal performance across multiple models.
• Model Development & Hyperparameter Tuning
  o Trained and compared Logistic Regression, Random Forest, and XGBoost to identify the most accurate classifier.
  o XGBoost outperformed other models with 92% accuracy, optimized using GridSearchCV and RandomizedSearchCV for hyperparameter tuning.
  o Evaluated model robustness using cross-validation and fine-tuned parameters such as learning rate, number of estimators, and max depth for XGBoost.
• Model Fairness & Explainability
  o Assessed fairness across different demographic groups (age, gender, smoking status, etc.) to ensure unbiased predictions.
  o Applied SHAP (SHapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) to interpret feature contributions in model decisions.
  o Calibrated model probabilities using Platt Scaling and Isotonic Regression, improving predicted risk estimations for patients.
• Performance Evaluation & Results

SMS Spam Detection – NLP Classification

• Developed a spam classification model to filter malicious SMS messages using the SMS Spam Collection Dataset (5,574 messages).
• Preprocessed text with Tokenization, Lemmatization, and Stop Word Removal, improving text clarity and standardization.
• Converted SMS messages into numerical format using TF-IDF and Bag of Words (BoW) techniques, optimizing feature representation.
• Built models using Logistic Regression, Naive Bayes, and Support Vector Machines (SVM), achieving the highest accuracy with Naive Bayes.
• Evaluated model performance using Accuracy (92.1%), Precision, Recall, and F1-score, ensuring high spam detection reliability.
• Generated word cloud visualizations and analyzed misclassified messages, improving feature engineering strategies.

Skills

Power BI
Tableau
Matplotlib
Seaborn
Keras
Artificial Intelligence
Data Modeling
Data Cleansing
TensorFlow
PostgreSQL

Work experience

Data Scientist
07.2024 - 08.2024 | Escon Energy
Data Modeling, Machine Learning
● Gained hands-on experience in data analysis, predictive modeling, and mobile application development, contributing to key projects in industrial equipment management.
● Data Processing & Filtering: Cleaned and filtered machine performance data, applying EDA techniques to detect anomalies and trends for predictive maintenance.
● Mobile App Development & IoT Integration: Developed a real-time machine tracking application, integrating GPS-based location tracking to enhance asset management and reduce equipment loss.
● Automation & Documentation: Optimized reporting templates in Microsoft Word, improving documentation efficiency and reducing manual workload.
● These projects strengthened my technical expertise in data-driven problem solving, predictive modeling, and real-world application development.
Data Scientist
07.2023 - 10.2023 | Odeon Technology
Data Modeling, Machine Learning
● Gained hands-on experience in cost-profit analysis, customer segmentation, and classification modeling, applying data-driven strategies to optimize business decisions. Managed end-to-end data science workflows, including:
● Exploratory Data Analysis (EDA): Performed data cleaning, handled missing values, and conducted outlier detection to ensure data integrity.
● Data Visualization: Leveraged Matplotlib, Seaborn, and Power BI to generate insightful visualizations, identifying key patterns and business trends.
● Feature Engineering & Selection: Applied techniques like One-Hot Encoding, Label Encoding, and Principal Component Analysis (PCA) to enhance model performance.
● Model Development & Optimization: Built and fine-tuned machine learning models using Logistic Regression, Random Forest, XGBoost, and LightGBM, optimizing hyperparameters with GridSearchCV and RandomizedSearchCV.
● Model Interpretability & Explainability: Utilized SHAP and LIME to enhance transparency, helping business stakeholders understand model predictions and decision-making processes.
● Tackled complex challenges by leveraging platforms like GitHub and Kaggle, continuously improving analytical and problem-solving skills.
● Active team collaboration and problem-solving were essential for both personal and professional growth.

Educational background

Computer Engineering (Bachelor’s Degree)
2019 - 2023
Istanbul Gelisim University

Languages

English: Intermediate