A content aggregator that collects metadata about articles from newspapers, journals, blogs, etc. The scraper uses information about the structure of the targeted pages, together with regular expressions, to scrape selectively and filter the results. To make it easier to start a new project, the data required by the scraper has been exported to files in the fixtures directory.
Inspired by AllTop.
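For illustration, Django fixtures are JSON lists of objects keyed by model, primary key, and fields. Only the model label (articles.source) is taken from the commands further below; the field names and values in this sketch are made up, and fixtures/sources.json is authoritative:

[
  {
    "model": "articles.source",
    "pk": 1,
    "fields": {
      "name": "Example News",
      "url": "https://news.example.com",
      "regex": "articles/[0-9]+"
    }
  }
]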
Make sure Docker is installed on your system.
Clone the repository into a directory of your choice:
mkdir MYAPPDIR
git clone https://github.com/pi-sigma/nous-aggregator.git MYAPPDIR
Inside the new directory, create a file for the environment variables:
touch .env
Open the file with the editor of your choice and set the environment variables. See env-sample for instructions.
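For orientation only, a .env for a Django/Postgres setup often contains entries along these lines; the variable names here are illustrative, and env-sample is authoritative:

# Illustrative values only; env-sample lists the variables this project actually expects
DEBUG=True
SECRET_KEY=replace-with-a-long-random-string
POSTGRES_DB=nous_aggregator
POSTGRES_USER=nous
POSTGRES_PASSWORD=replace-me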
Build the Docker image:
docker-compose build
Start the web container in detached mode, apply the migrations, and initialize the database:
docker-compose up -d web
docker-compose run web python manage.py migrate
docker-compose run web python manage.py loaddata fixtures/sources.json
Create a superuser for the Django app:
docker-compose run web python manage.py createsuperuser
Stop the containers:
docker-compose stop web
docker-compose stop db
Start the Docker containers:
docker-compose up
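To verify that the services are up, list the running containers:

docker-compose ps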
You can access the app at one of the following addresses:
http://0.0.0.0:8000
http://127.0.0.1:8000
http://localhost:8000
If all went well, you should see the homepage of the app with a list of news sources arranged in a grid. The grid is empty to begin with and fills up once the Celery workers start scraping (the timing depends on the schedule in scraper.tasks).
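For context, periodic Celery tasks in a Django project are commonly registered via a beat schedule. The sketch below is for illustration only: the task name, path, and interval are assumptions, and the real schedule is defined in scraper.tasks:

# Sketch of a Celery beat schedule (hypothetical task path and interval)
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "scrape-sources": {
        "task": "scraper.tasks.scrape",  # hypothetical task path
        "schedule": crontab(minute=0),   # e.g. run at the top of every hour
    },
}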
To export the data about the sources from the database, use the following command while the web container is running (the commands for the other tables are analogous):
docker-compose run web python manage.py dumpdata articles.source --indent 2 > fixtures/sources.json
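For example, assuming the articles are stored in a model labeled articles.article (hypothetical; check the app's models for the exact label), the corresponding export would be:

docker-compose run web python manage.py dumpdata articles.article --indent 2 > fixtures/articles.json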