
# EKS Pulumi (Python + Scrapy + Celery)

Utilising Pulumi to create an EKS infrastructure in minutes and deploy a scraping project that uses a distributed message queue.
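For orientation, here is a minimal sketch of the kind of Pulumi program involved, assuming `pulumi`, `pulumi-awsx`, and `pulumi-eks` from the table below are installed; the resource names and instance sizes are illustrative, not this repo's actual values:

```python
# A minimal sketch, not this repo's actual program: stands up a VPC and an
# EKS cluster with pulumi-awsx and pulumi-eks. Names and sizes are assumptions.
import pulumi
import pulumi_awsx as awsx
import pulumi_eks as eks

# Dedicated VPC for the cluster (hypothetical resource name).
vpc = awsx.ec2.Vpc("scraper-vpc")

# EKS cluster with a small autoscaling node group.
cluster = eks.Cluster(
    "scraper-cluster",
    vpc_id=vpc.vpc_id,
    public_subnet_ids=vpc.public_subnet_ids,
    private_subnet_ids=vpc.private_subnet_ids,
    instance_type="t3.medium",
    desired_capacity=2,
    min_size=1,
    max_size=3,
)

# Export the kubeconfig so kubectl/Helm can target the new cluster.
pulumi.export("kubeconfig", cluster.kubeconfig)
```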

## Library/Technology

| Library/Technology | Purpose |
| --- | --- |
| Pulumi | Infrastructure-as-code tool for defining, deploying, and managing cloud infrastructure using familiar programming languages. |
| pulumi-awsx | Higher-level abstraction for working with AWS infrastructure resources in Pulumi, simplifying the process of defining and managing AWS resources. |
| pulumi-eks | Abstractions for working with Amazon EKS clusters in Pulumi, simplifying the deployment and management of Kubernetes clusters on AWS. |
| Celery | Managing concurrent requests efficiently for web scraping. |
| Docker | Containerization of the Scrapy project for portability and ease of deployment. |
| Kubernetes | Orchestration platform for deploying and managing containerized applications. |
| psycopg2-binary | Optional: PostgreSQL adapter for interacting with PostgreSQL databases if storing scraped data. |
| python-dotenv | Loading environment variables for managing configuration settings securely. |
| Scrapy | Web scraping framework for extracting data from IMDb's Top 250 movies list. |
| SQLAlchemy | Optional: ORM library for interacting with relational databases if storing scraped data. |
| AWS EKS | Kubernetes service on AWS for deploying the Dockerized Scrapy scraper with scalability and reliability. |
| AWS Secrets Manager | Securely handling sensitive configurations like API keys or database credentials (see the sketch below the table). |
| GitHub Actions | Manages CI/CD and automatic deployment on the main branch. |
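To illustrate the AWS Secrets Manager row, here is a minimal sketch of reading a database credential at runtime with boto3, the AWS SDK for Python; the secret name and region are assumptions, not values from this repo:

```python
# Hedged sketch: fetch a JSON secret from AWS Secrets Manager with boto3.
# The secret name "scraper/postgres" and the region are hypothetical.
import json

import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")
response = client.get_secret_value(SecretId="scraper/postgres")
credentials = json.loads(response["SecretString"])  # e.g. {"user": ..., "password": ...}
```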


## Instructions

### 1. Set up infrastructure

#### 1.1 Pulumi setup

```bash
cd resources/iaac
```

If the venv has not been created yet:

```bash
python -m venv venv
```

Activate the venv and install dependencies:

```bash
source venv/bin/activate
pip install -r requirements.txt
```

#### 1.2 Set up AWS

Get your access key ID and secret access key from AWS IAM, then set up your credentials in the CLI. See the AWS CLI setup instructions.

```bash
aws configure
```

#### 1.3 Spin up Pulumi

```bash
pulumi up
```

#### 1.4 Spin down

```bash
pulumi destroy
```

### 2. Scraper project instructions

Prerequisites: PostgreSQL and Redis (see the configuration sketch after the setup steps below).

#### Setup

```bash
cd scraper
python -m venv venv
source venv/bin/activate
```
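Since the table above lists python-dotenv for configuration, here is a minimal sketch of how the PostgreSQL and Redis settings for the prerequisites might be loaded; the variable names and default URLs are assumptions, not the repo's actual values:

```python
# A minimal sketch using python-dotenv (listed in the table above).
# Variable names and default URLs are assumptions, not the repo's values.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from a .env file into the environment

POSTGRES_URL = os.getenv(
    "POSTGRES_URL", "postgresql://user:password@localhost:5432/scraper"
)
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0")
```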

#### 2.1 Scraper project

Crawls the IMDb Top 250 site, visits each movie page, and sends the data to the queue.

```bash
scrapy crawl imdb
```
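For reference, a minimal sketch of what such a spider could look like; the CSS selectors and item fields are assumptions (IMDb's markup may differ), not the repo's actual code:

```python
# A minimal sketch of an "imdb" spider; selectors and item fields are
# assumptions and IMDb's markup may differ. Not the repo's actual code.
import scrapy


class ImdbSpider(scrapy.Spider):
    name = "imdb"
    start_urls = ["https://www.imdb.com/chart/top/"]

    def parse(self, response):
        # Follow each movie link on the Top 250 chart page.
        for href in response.css("td.titleColumn a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_movie)

    def parse_movie(self, response):
        # Yield a small item; a pipeline (or Celery task) would enqueue it.
        yield {
            "title": response.css("h1::text").get(),
            "url": response.url,
        }
```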

#### 2.2 Worker project

Receives data from the scraper and upserts it into the database.

```bash
celery -A scraper worker --loglevel=info
```
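A hedged sketch of the kind of Celery task that could implement this upsert with SQLAlchemy and psycopg2 (both listed in the table above); the broker URL, table schema, and connection string are assumptions, not the repo's actual code:

```python
# Hedged sketch: a Celery task that upserts scraped movies via SQLAlchemy.
# App name, broker URL, schema, and connection string are hypothetical.
from celery import Celery
from sqlalchemy import Column, String, create_engine
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session, declarative_base

app = Celery("scraper", broker="redis://localhost:6379/0")

Base = declarative_base()


class Movie(Base):
    __tablename__ = "movies"
    url = Column(String, primary_key=True)
    title = Column(String)


engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/scraper")
Base.metadata.create_all(engine)


@app.task
def save_movie(item: dict) -> None:
    # Upsert on the URL primary key so re-scrapes update existing rows.
    stmt = insert(Movie).values(**item)
    stmt = stmt.on_conflict_do_update(
        index_elements=[Movie.url],
        set_={"title": stmt.excluded.title},
    )
    with Session(engine) as session:
        session.execute(stmt)
        session.commit()
```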