Martin Arroyo martinmarroyo

Hi, I'm Martin. Welcome to my GitHub.

Data Engineer @Honda Research Institute/99P Labs. Data Analytics Instructor @COOP Careers. Curious human.

About Me

I am a data engineer and a data analytics instructor with experience across various industries, including higher education and automotive.

At Honda Research Institute/99P Labs, I develop streaming data pipelines using open-source technologies such as Apache Spark and Kafka. At COOP Careers, I teach fundamental skills in Excel, SQL, Tableau, and Python to aspiring data analysts and provide guidance to junior instructors. My previous experiences range from higher education to web development, content editing, and software quality assurance.

I am passionate about technology, education, and helping people who want to break into the tech industry by making instruction in the basics accessible and easy to digest. Check out some of the work that my friends and I have created to help spread free tech education at The Freestack Initiative.

Click here to learn a little more about my work.

Projects

Here are some projects that I'm particularly proud of (WIP = Work in Progress):

DataLab (WIP)

DataLab is a curated, extensible data analytics environment that helps you get hands-on practice with common industry tools. This project was born out of a desire to practice the skills that I consistently saw in data and analytics engineering positions, as well as wanting to learn Docker. Think of it as a data laboratory in a box - using Docker, I created containers for a Python environment, a database server, a database administration tool, and a visualization tool. Using DataLab, you can avoid the headache of setting up your own database server and getting all these pieces connected and talking to one another. Or, if you're curious, you can dig into what I did and make it your own. This project is still under development.

teachdb (WIP)

teachdb is an in-memory micro relational database, powered by DuckDB. It was made with two types of users in mind: instructors who want to teach SQL concepts, and students who want to learn and practice the fundamentals. Combined with a Jupyter Notebook, teachdb provides a database that can be used to demonstrate fundamental SQL concepts such as select queries, filtering, aggregations, and joins. It can even be used to introduce more advanced topics such as analytical/window functions, common table expressions (CTEs), and data definition language (DDL) commands.

For students, it provides a safe environment to learn and experiment with a SQL database without the need to set up their own server or download additional software.
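To give a feel for the kinds of queries mentioned above, here is a minimal sketch. It does not use teachdb's own API; as a stand-in, it assumes Python's built-in sqlite3 module with an in-memory database, and the tables, names, and rows are made-up sample data for illustration only. The window function query assumes SQLite 3.25 or newer.

```python
import sqlite3

# Stand-in for an in-memory teaching database (NOT teachdb's API).
# The schema and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, cohort TEXT);
    CREATE TABLE scores (student_id INTEGER, quiz TEXT, points INTEGER);
    INSERT INTO students VALUES (1, 'Ana', 'A'), (2, 'Ben', 'A'), (3, 'Cy', 'B');
    INSERT INTO scores VALUES (1, 'q1', 8), (1, 'q2', 10), (2, 'q1', 6), (3, 'q1', 9);
""")

# Filtering + join + aggregation: total points per student in cohort A.
totals = conn.execute("""
    SELECT s.name, SUM(sc.points) AS total
    FROM students AS s
    JOIN scores AS sc ON sc.student_id = s.id
    WHERE s.cohort = 'A'
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()

# CTE + window function: rank all students by their total points.
ranked = conn.execute("""
    WITH totals AS (
        SELECT student_id, SUM(points) AS total
        FROM scores
        GROUP BY student_id
    )
    SELECT s.name, t.total, RANK() OVER (ORDER BY t.total DESC) AS rnk
    FROM totals AS t
    JOIN students AS s ON s.id = t.student_id
    ORDER BY rnk, s.name
""").fetchall()

print(totals)
print(ranked)
conn.close()
```

Because everything lives in memory, the database disappears when the connection closes, which is what makes this style of setup safe for experimenting in a notebook.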

COOP SQL Courses

I recently worked with COOP Careers to revise their introductory SQL curriculum for the Fall 2023 semester. The final deliverable includes approximately 6 hours' worth of material designed to take learners with little to no SQL experience and get them ready for technical interviews. There is a crash course on database theory, a short course on combining data in SQL, and three Jupyter Notebooks that include interactive SQL lessons utilizing my teachdb library.

By leveraging teachdb and Google Colab, we are able to set up a basic database environment within a notebook that can be used in the browser. This means that all students need in order to learn and work with a real database is an internet connection - no configuration required - which was really important for the COOP community.

Texas Essential Knowledge and Skills (TEKS) Parts of Speech Analysis

This is one of my favorite projects because I was fortunate to be able to have a positive impact on a great organization while sharpening my own skills. In Fall 2021, I worked with a non-profit organization called Celebrate Dyslexia to analyze raw text data to help them create a curriculum for young children with dyslexia.

I created a web scraper to grab the data, used an NLP model to help me extract verbs, analyzed the results to find the most common verbs across each subject, and created a dashboard that would allow the team to interactively look through the results to help craft their curriculum.

Overall, the project was a great success and my work had a positive impact on the organization.
