Pihole-project

I started this project to get hands-on with AWS analytics services such as AWS Glue, Amazon Athena, and Amazon QuickSight.

I find it's always better to have a real project to work with instead of just clicking through the services, so I bought myself a Raspberry Pi 4 with the intention of running Pi-hole on it.

Pi-hole is an amazing open-source project which you can install on a Raspberry Pi and then route all your DNS requests to. It subscribes to community-maintained deny lists and blocks all ad requests at the DNS level, basically acting as a DNS black-hole service.

I chose to install Pi-hole as a container using docker-compose. From the beginning, my idea was to pull the data out of Pi-hole and run some analytics on it, so I made sure the project had an API, wrote myself an API client (in Ruby), packaged it up as a container using this Dockerfile, and added it to docker-compose as "upload". This container connects to the API and uploads the data into an S3 bucket in my AWS account. I've deliberately kept the data in the incredibly hard-to-work-with format it arrives in: a JSON structure with a single key called "data" containing a nested array. For example (a sketch of the upload step follows the sample):

{"data":[["1613433600","PTR","27.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","3.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","32.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","28.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","22.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","24.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","29.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433600","PTR","1.22.22.172.in-addr.arpa","localhost","3","0","0","0","N/A","-1","N/A"],["1613433608","AAAA","cdn-0.nflximg.com","firetv.sk.lan","2","0","0","0","N/A","-1","router.sk.lan#53"],["1613433612","A","cdn-0.nflximg.com","firetv.sk.lan","2","0","0","0","N/A","-1","router.sk.lan#53"]]}

The reason I kept the data in this challenging format is that I wanted to learn more about PySpark, one of the languages supported on AWS Glue (alongside plain Python and Scala). This gave me the chance to really dive into PySpark and come up with this AWS Glue PySpark job. It wasn't straightforward, and it took me some time to get my head around how it works, but eventually I got it running, AWS Glue Bookmarks and all! In short, the job reads any data from my source S3 bucket that hasn't been processed yet, applies some transformations, and writes it out in Parquet format with Snappy compression to my target S3 bucket.
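The Glue job in this repo is the reference; below is only a trimmed-down sketch of the same pattern, assuming hypothetical bucket paths and naming only the first few columns of the Pi-hole export. The key pieces are the `transformation_ctx` on the read (that's what the bookmark tracks), the `explode` of the nested "data" array, and the Parquet/Snappy write.

```python
# Sketch of an AWS Glue PySpark job with bookmarks: read only unseen raw JSON,
# flatten the nested "data" array, and write partitioned Parquet with Snappy.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # enables job bookmarks

# Read only the raw JSON objects the bookmark hasn't seen yet.
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-raw-bucket/raw/"], "recurse": True},
    format="json",
    transformation_ctx="raw",  # the bookmark is keyed on this context name
)

# Each record is {"data": [[ts, type, domain, client, status, ...], ...]},
# so explode the outer array and map positions to named columns.
columns = ["ts", "query_type", "domain", "client", "status"]  # first few only
df = (
    raw.toDF()
    .select(F.explode("data").alias("row"))
    .select([F.col("row").getItem(i).alias(name) for i, name in enumerate(columns)])
    .withColumn("ts", F.from_unixtime(F.col("ts").cast("long")).cast("timestamp"))
)

# Write partitioned Parquet with Snappy compression to the target bucket.
(
    df.withColumn("dt", F.to_date("ts"))
    .write.mode("append")
    .partitionBy("dt")
    .option("compression", "snappy")
    .parquet("s3://my-curated-bucket/queries/")
)

job.commit()  # advances the bookmark past the objects just processed
```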

From there I can use Amazon Athena for ad-hoc analysis of the data, or Amazon QuickSight to build dashboards and explore it further.
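As an illustration of the kind of ad-hoc question Athena can answer once the Parquet data is catalogued, here is a hedged example run via boto3; the `pihole.queries` table, database name, and results bucket are hypothetical.

```python
# Example ad-hoc Athena query: top queried domains over the last week.
# Table/database/bucket names are placeholders for whatever the Glue catalog holds.
import boto3

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="""
        SELECT domain, count(*) AS hits
        FROM pihole.queries
        WHERE dt >= date_add('day', -7, current_date)
        GROUP BY domain
        ORDER BY hits DESC
        LIMIT 20
    """,
    QueryExecutionContext={"Database": "pihole"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```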


NOTE: Since the Raspberry Pi is always on anyway, I've started adding other projects to it, such as OpenVPN and a DynDNS service.
