Portfolio

Jan Stüwe
Data Scientist

As a Data Scientist with a strong foundation in mathematics and statistics, I specialize in developing interpretable models and applying rigorous analytical methods to uncover meaningful insights from complex data. I am passionate about bridging theoretical knowledge with practical applications, delivering impactful solutions to real-world problems.

Work Experience

Internship – Data Science

BMW AG, München

02/2025 – 04/2025

I worked on clustering and statistical analysis of automotive data, using partitioning, hierarchical, and density-based clustering techniques. I evaluated the models with both internal and external validation metrics to assess their robustness and accuracy. To keep the data representative, I applied a stratified sampling strategy. I also conducted descriptive and inferential statistical analyses to validate the results and confirm their statistical reliability.

(Some) Projects

Fake News Classification

In my second semester, I worked on a project classifying news articles as fake or real using the Kaggle Fake and Real News Dataset, applying both classical machine learning classifiers and modern deep learning approaches. The project required extensive data preprocessing, feature engineering, and careful model evaluation. The final model achieved an accuracy of approximately 99%, with strong performance on both classes. The project received the highest possible grade of 1.0.

GitHub

Time-to-Sell Prediction with Audi

As part of a team of four, I worked on a project for Audi to predict how long cars take to sell. We developed a deep learning model with a margin of error of 9 days, compared with approximately 20 days for the baseline. For interpretability, we also fitted a Generalized Additive Model (GAM), which gave us insight into the key factors influencing sales times. The project emphasized handling high-dimensional data and comprehensive feature preprocessing. Our team presented the findings at Audi several times. The project received the top grade of 1.0.

Education

Bachelor of Science, Data Science

Catholic University Eichstätt-Ingolstadt

10/2022 – present

  • Specialization in Applied Mathematics and Scientific Computing
  • Skills: Python, R, MATLAB, Git
  • Focus on Mathematical Aspects of Data Science
  • Current GPA: 1.3 (on German scale; 1.0 = best)

Skills

Python

MATLAB

R

Git

Matplotlib

scikit-learn

TensorFlow

Keras

PyTorch

NumPy

pandas

pytest

SQL

MySQL

HTML

CSS

GitHub

AWS

Linux

TeX

Socials

Contact Me