martinmarroyo/README.md

Hi, I'm Martin. Welcome to my GitHub

Data Engineer @Honda Research Institute/99P Labs. Data Analytics Instructor @COOP Careers. Curious human.

About Me

I am a data engineer and a data analytics instructor with experience across various industries, including higher education and automotive.

At Honda Research Institute/99P Labs, I develop streaming data pipelines using open-source technologies such as Apache Spark and Kafka. At COOP Careers, I teach fundamental skills in Excel, SQL, Tableau, and Python to aspiring data analysts and provide guidance to junior instructors. My previous experiences range from higher education to web development, content editing, and software quality assurance.

I am passionate about technology, education, and helping people who want to break into the tech industry by making instruction in the basics accessible and easy to digest. Check out some of the work that my friends and I have created to help spread free tech education at The Freestack Initiative.

Click here to learn a little more about my work.

Projects

Here are some projects that I'm particularly proud of (WIP = Work in Progress):

DataLab (WIP)

DataLab is a curated, extensible data analytics environment that helps you get hands-on practice with common industry tools. This project was born out of a desire to practice the skills that I consistently saw in data and analytics engineering positions, as well as wanting to learn Docker. Think of it as a data laboratory in a box: using Docker, I created containers for a Python environment, a database server, a database administration tool, and a visualization tool. Using this library, you can avoid the headache of setting up your own database server and getting all these pieces interconnected and talking to one another. Or, if you're curious, you can dig into what I did and make it your own. This project is still under development.


teachdb (WIP)

teachdb is an in-memory micro relational database, powered by DuckDB. It was made with two types of users in mind: instructors who want to teach SQL concepts, and students who want to learn and practice the fundamentals. Combined with a Jupyter Notebook, teachdb provides a database that can be used to demonstrate fundamental SQL concepts such as select queries, filtering, aggregations, and joins. It can even be used to introduce more advanced topics such as analytical/window functions, common table expressions (CTEs), and data definition language (DDL) commands.

For students, it provides a safe environment to learn and experiment with a SQL database without the need to set up their own server or download additional software.
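To give a feel for the kind of lesson an in-memory database makes possible, here is a minimal sketch. It does not use teachdb's own API, which is not shown here. It uses Python's built-in `sqlite3` module as a stand-in, and the table names and rows are made up for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk and no server is needed.
# NOTE: sqlite3 is a stand-in here; teachdb itself is powered by DuckDB.
conn = sqlite3.connect(":memory:")

# Hypothetical sample tables and data for a lesson.
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (student_id INTEGER, quiz TEXT, score INTEGER);
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO scores VALUES (1, 'joins', 90), (1, 'ctes', 80), (2, 'joins', 70);
""")

# A CTE that aggregates scores, joined back to students and then filtered.
query = """
    WITH avg_scores AS (
        SELECT student_id, AVG(score) AS avg_score
        FROM scores
        GROUP BY student_id
    )
    SELECT s.name, a.avg_score
    FROM students AS s
    JOIN avg_scores AS a ON a.student_id = s.id
    WHERE a.avg_score >= 75
    ORDER BY s.name;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the database lives entirely in memory, students can drop tables or run broken queries and simply re-run the setup cell to start over.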


I recently worked with COOP Careers to revise their introductory SQL curriculum for the Fall 2023 semester. The final deliverable includes approximately 6 hours' worth of material that is designed to take learners with little to no experience with SQL and get them ready for technical interviews. There is a crash course on database theory, a short course on combining data in SQL, and three Jupyter Notebooks that include interactive SQL lessons utilizing my teachdb library.

By leveraging teachdb and Google Colab, we are able to set up a basic database environment within a notebook that can be used in the browser. This means that all students need in order to learn and work with a real database is an internet connection, with no configuration required, which was really important for the COOP community.


This is one of my favorite projects because I was fortunate to be able to have a positive impact on a great organization while sharpening my own skills. In Fall 2021, I worked with a non-profit organization called Celebrate Dyslexia to analyze raw text data to help them create a curriculum for young children with dyslexia.

I created a web scraper to grab the data, used an NLP model to help me extract verbs, analyzed the results to find the most common verbs across each subject, and created a dashboard that would allow the team to interactively look through the results to help craft their curriculum.
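The counting step of that pipeline can be sketched roughly as follows. This is an illustration, not the project's actual code: the scraping and part-of-speech tagging are assumed to have already happened, and the `tagged` triples and the `top_verbs` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (subject, word, part_of_speech) triples, standing in for
# the output of a part-of-speech tagger run over the scraped text.
tagged = [
    ("Math", "solve", "VERB"),
    ("Math", "problem", "NOUN"),
    ("Math", "solve", "VERB"),
    ("Math", "compare", "VERB"),
    ("Science", "observe", "VERB"),
    ("Science", "record", "VERB"),
    ("Science", "observe", "VERB"),
]


def top_verbs(triples, n=1):
    """Return the n most common verbs for each subject."""
    counts = defaultdict(Counter)
    for subject, word, pos in triples:
        if pos == "VERB":  # keep verbs only, drop every other tag
            counts[subject][word.lower()] += 1
    return {subject: c.most_common(n) for subject, c in counts.items()}


result = top_verbs(tagged)
print(result)
```

A per-subject table like `result` is the kind of summary a dashboard could then display interactively.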

Overall, the project was a great success and my work had a positive impact on the organization.

Pinned

  1. DataLab Public template

    A curated, open source, data analytics laboratory

    Shell

  2. freestackinitiative/teachingdb Public

    A free, in-memory database for facilitating hands-on, basic SQL instruction in a notebook environment

    Jupyter Notebook

  3. freestackinitiative/coop_sql_notebooks Public

    Notebooks used for COOP SQL Lessons

    Jupyter Notebook 4 1

  4. teks_pos_analysis Public

    Scraping and Part of Speech Analysis of the Texas Essential Knowledge and Skills (TEKS)

    Python

  5. exoplanet_analysis Public

    An ETL Pipeline and analysis of Exoplanet data from The Extrasolar Planets Encyclopaedia

    Python

  6. myanimelist Public

    A pipeline that extracts read-only anime data from MyAnimeList using the Jikan API

    Python 1