I'm a student from Czechia who is passionate about math, statistics, and data science. This repo holds some of my work and shows how I approach data analysis and coding in general. If you want to learn more about me, feel free to visit my website.
- Tool: RMarkdown
- Packages: readr, dplyr, forcats, skimr, ggplot2, glue, stringr, tidytext
- Output: Written analysis
Pet Box Subscription analysis is a descriptive analysis of a pet store, completed for my Data Analyst Associate certification. It aims to identify pet owners who are likely to make purchases every month (food, toys, medical supplies, and so on). The data is read with readr and wrangled with dplyr. Because most of the variables are factors, I relied heavily on forcats to simplify the work. Data visualization is done with ggplot2, and summary statistics with skimr. When working with text, I used glue for string interpolation and stringr for text manipulation. For more advanced graphs, I used tidytext's facet helper functions.
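A minimal sketch of what tidytext's facet helpers do, assuming the goal is to sort bars separately inside each facet. The pet-store data is not part of this README, so ggplot2's built-in `mpg` data is used as a hypothetical stand-in: `reorder_within()` orders the categories within each facet, and `scale_y_reordered()` cleans up the axis labels afterwards.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical stand-in data: number of car models per manufacturer for each drive type
counts <- mpg %>%
  count(drv, manufacturer)

p <- ggplot(counts, aes(x = n, y = reorder_within(manufacturer, n, drv))) +
  geom_col() +
  # reorder_within() appends "___<drv>" to each label; scale_y_reordered() strips it again
  scale_y_reordered() +
  # free_y lets each facet show only its own manufacturers, in their own order
  facet_wrap(~ drv, scales = "free_y") +
  labs(x = "Number of models", y = "Manufacturer")

print(p)
```

Without `reorder_within()`, a plain `reorder()` would sort the manufacturers once across the whole data set, so the bars inside each facet would not be in descending order.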
My final submission consisted of a written report for data scientists at DataCamp, who reviewed my proposal and confirmed that the analysis meets current industry standards. You can view it in my DataCamp workspace.
- Tool: RMarkdown
- Packages: dplyr, tidyr, ggplot2, patchwork, gtsummary
- Output: Oral presentation with PowerPoint slides
I earned my second Data Analyst certification with an analysis of a made-up insurance company. The analysis mainly aims to identify which customers buy insurance and what their characteristics are. Coding and data interpretation are done in R Markdown. The data is wrangled and transformed using dplyr and tidyr, the visualizations are put together with ggplot2 and patchwork, and the final tables are formatted with gtsummary.
The analysis was presented orally to a data scientist from DataCamp, who reviewed my presentation and verbal communication. The video of my presentation is not available; however, the PowerPoint slides can be downloaded from my GitHub repo.
- Tool: DataCamp Notebook
- Packages: readr, dplyr, glue, ggplot2, tidymodels
- Output: Written submission
To receive the Data Scientist Associate certification, I created a report that first reads (readr) and wrangles (dplyr, glue) data about a made-up fitness center. After the given domain restrictions are validated and applied, the data is explored using ggplot2. To predict the number of people in a fitness class, I used several packages from the tidymodels family.
The first model is a ridge regression fitted with the glmnet package; alpha was selected using 10-fold cross-validation. The second model uses a random forest to predict the number of customers, with its hyperparameters tuned via tune() and 10-fold cross-validation. The final submission can be viewed in my DataCamp workspace.
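A minimal sketch of the two tuning setups described above, assuming a standard tidymodels workflow. The fitness-center data is not included here, so the built-in `mtcars` data (predicting `mpg`) is a hypothetical stand-in; the seed, grid sizes, number of trees, and the ranger engine for the random forest are also assumptions, and both the glmnet and ranger packages need to be installed. In glmnet's own arguments, `alpha` is the ridge/lasso mixing weight (`alpha = 0` is pure ridge, exposed in tidymodels as `mixture = 0`), so this sketch fixes the mix at ridge and tunes `penalty` (glmnet's lambda).

```r
library(tidymodels)

set.seed(123)

# Hypothetical stand-in data: predict mpg from the other mtcars columns
folds <- vfold_cv(mtcars, v = 10)  # 10-fold cross-validation splits

# Ridge regression: mixture = 0 gives a pure L2 penalty in glmnet; the penalty is tuned
ridge_spec <- linear_reg(penalty = tune(), mixture = 0) %>%
  set_engine("glmnet")

ridge_wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(ridge_spec)

ridge_res <- tune_grid(
  ridge_wf,
  resamples = folds,
  grid = grid_regular(penalty(), levels = 20)  # 20 penalty values on a log scale
)

best_ridge <- select_best(ridge_res, metric = "rmse")
ridge_fit <- finalize_workflow(ridge_wf, best_ridge) %>% fit(data = mtcars)

# Random forest: mtry and min_n are tuned over 10 candidate combinations
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("regression")

rf_wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(rf_spec)

rf_res <- tune_grid(rf_wf, resamples = folds, grid = 10)

best_rf <- select_best(rf_res, metric = "rmse")
rf_fit <- finalize_workflow(rf_wf, best_rf) %>% fit(data = mtcars)

# Compare the best cross-validated RMSE of the two tuned models
show_best(ridge_res, metric = "rmse", n = 1)
show_best(rf_res, metric = "rmse", n = 1)
```

Reusing the same `folds` object in both `tune_grid()` calls means the two models are compared on identical resamples.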