Dev #134

Merged
merged 65 commits into from
Jan 9, 2025
c5cd286
Comment out uvicorn command
maxachis Jan 6, 2025
66df3c1
Add instructions for setting up uvicorn server
maxachis Jan 6, 2025
66d8faf
Fix bug in `/batch` query parameter filtering
maxachis Jan 6, 2025
18d198f
Add catch for CKAN "Not Found" error
maxachis Jan 7, 2025
a3489a7
Add additional manual test for CKAN collector
maxachis Jan 7, 2025
223a2f5
Expand descriptions for CKAN Config DTOs
maxachis Jan 7, 2025
1d88e20
Expand descriptions for CKAN Config DTOs
maxachis Jan 7, 2025
ed1f753
Add logic and test for updating DB batch status with error on Collect…
maxachis Jan 7, 2025
a7de69e
Add try-catch for get_organization CKAN Not Found error
maxachis Jan 7, 2025
7f4e216
Update url retrieval method in `run_migrations_offline` to use existi…
maxachis Jan 7, 2025
60eab0e
Add alembic.ini
maxachis Jan 7, 2025
7bec9b6
Move alembic.ini to root
maxachis Jan 7, 2025
56bf603
Update `target_metadata` to reference `Base` in `models.py`
maxachis Jan 7, 2025
c3a26cf
Add `DEV` environment variable
maxachis Jan 7, 2025
576782c
DBClient: Add alembic setup logic
maxachis Jan 7, 2025
b3b0568
Add conditional setup logic depending on if DEV or production environ…
maxachis Jan 7, 2025
868efb8
Move old log deletion to 10 minutes after setup.
maxachis Jan 7, 2025
75aace1
Add alembic to requirements.txt
maxachis Jan 7, 2025
4530afc
Add initial revision
maxachis Jan 7, 2025
15176a1
Move Alembic migration to Dockerfile and outside FastAPI app
maxachis Jan 8, 2025
73cc243
Migrate alembic logic to `apply_migrations.py`
maxachis Jan 8, 2025
0be5108
Migrate alembic logic to `apply_migrations.py`
maxachis Jan 8, 2025
bf95c8b
Clean up Dockerfile
maxachis Jan 8, 2025
0028d88
Update to pull connection string from ENV
maxachis Jan 8, 2025
67ab983
Attempt to fix import error
maxachis Jan 8, 2025
41d3804
Move apply_migrations to root
maxachis Jan 8, 2025
1ab5552
Move apply_migrations to root
maxachis Jan 8, 2025
3fc21e2
Move apply_migrations to root
maxachis Jan 8, 2025
0613273
Move apply_migrations to root
maxachis Jan 8, 2025
d1d2dcb
Move apply_migrations to root
maxachis Jan 8, 2025
5692a20
Move apply_migrations to root
maxachis Jan 8, 2025
f93add9
Correct grammar error in comment
maxachis Jan 8, 2025
648955e
Add info on Alembic directory contents and how to make migration
maxachis Jan 8, 2025
c2be66e
Update db_client_test to include alembic upgrades/downgrades in setup…
maxachis Jan 8, 2025
802d11d
Cleanup and add `update_url` method
maxachis Jan 8, 2025
d709fca
Add `updated_at` entry to URLs
maxachis Jan 8, 2025
89b001b
Add `updated_at` entry to URLs
maxachis Jan 8, 2025
05c96a3
Flesh out README.md
maxachis Jan 8, 2025
02bba59
Flesh out README.md
maxachis Jan 8, 2025
55a939e
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
f2533a7
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
cf8d8fc
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
d28df47
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
aad8599
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
240eac0
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
aad9c86
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
47425eb
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
7f9417b
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
deb1cf4
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
01ffdc6
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
03a31ed
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
878a92c
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
64c459d
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
103c8f2
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
5f009ce
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
33caf2b
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
d46b10f
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
62f28d3
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
8c207bf
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
e2d7cac
Adjust tests to account for alembic migration
maxachis Jan 9, 2025
4ca2724
Adjust tests to account for alembic migration
maxachis Jan 9, 2025
8d7140b
Adjust tests to account for alembic migration
maxachis Jan 9, 2025
00001e0
Try to correct bug in GitHub Action
maxachis Jan 8, 2025
bb039e0
Adjust tests to account for alembic migration
maxachis Jan 8, 2025
7e9d01b
Merge remote-tracking branch 'origin/dev' into dev
maxachis Jan 9, 2025
60 changes: 48 additions & 12 deletions .github/workflows/test_app.yml
Expand Up @@ -3,16 +3,52 @@
name: Test Source Collector App
on: pull_request

#jobs:
# build:
# runs-on: ubuntu-latest
# steps:
# - name: Checkout repository
# uses: actions/checkout@v4
# - name: Run docker-compose
# uses: hoverkraft-tech/[email protected]
# with:
# compose-file: "docker-compose.yml"
# - name: Execute tests in the running service
# run: |
# docker ps -a && docker exec data-source-identification-app-1 pytest /app/tests/test_automated

jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Run docker-compose
uses: hoverkraft-tech/[email protected]
with:
compose-file: "docker-compose.yml"
- name: Execute tests in the running service
run: |
docker exec data-source-identification-app-1 pytest /app/tests/test_automated
container-job:
runs-on: ubuntu-latest
container: python:3.12.8

services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: postgres
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5

steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Run tests
run: |
pytest tests/test_automated
pytest tests/test_alembic
env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST
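
The `POSTGRES_*` variables in the `env` block above are presumably what `get_postgres_connection_string()` reads, although that helper's body is not part of this diff. A minimal sketch of how such a helper might assemble the URL, assuming the variable names from the workflow (`build_postgres_url` is a hypothetical name, not the project's function):

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=None):
    """Hypothetical helper: assemble a PostgreSQL URL from POSTGRES_* variables."""
    env = os.environ if env is None else env
    # Escape credentials so characters such as "@" or ":" cannot break the URL.
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    host = env["POSTGRES_HOST"]
    port = env["POSTGRES_PORT"]
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Values mirror the workflow's `env` block (environment values are strings).
ci_env = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "postgres",
    "POSTGRES_DB": "postgres",
    "POSTGRES_HOST": "postgres",
    "POSTGRES_PORT": "5432",
}
ci_url = build_postgres_url(ci_env)
```

Passing the mapping in explicitly keeps the sketch easy to exercise without touching the real process environment.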
3 changes: 1 addition & 2 deletions Dockerfile
Expand Up @@ -14,5 +14,4 @@ RUN pip install --no-cache-dir -r requirements.txt
# Expose the application port
EXPOSE 80

# Run FastAPI app with uvicorn
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "80"]
RUN chmod +x execute.sh
1 change: 1 addition & 0 deletions ENV.md
Expand Up @@ -15,3 +15,4 @@ Please ensure these are properly defined in a `.env` file in the root directory.
|`POSTGRES_HOST` | The host for the test database | `127.0.0.1` |
|`POSTGRES_PORT` | The port for the test database | `5432` |
|`DS_APP_SECRET_KEY`| The secret key used for decoding JWT tokens produced by the Data Sources App. Must match the secret token that is used in the Data Sources App for encoding. |`abc123`|
|`DEV`| Set to any value to run the application in development mode. |`true`|
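
Because `DEV` is a presence-only flag, a value such as `false` would still select development mode. A small sketch of that behavior, assuming an empty value counts as unset (`is_dev_mode` is a hypothetical helper, not the project's `get_from_env`):

```python
import os


def is_dev_mode(env=None):
    """Hypothetical check: True when DEV is present with any non-empty value."""
    env = os.environ if env is None else env
    # Only presence matters; the value itself is never interpreted.
    return bool(env.get("DEV"))


dev_true = is_dev_mode({"DEV": "true"})
dev_false_string = is_dev_mode({"DEV": "false"})  # still development mode
dev_missing = is_dev_mode({})
```

To run in production mode under this scheme, the variable has to be left out of `.env` entirely rather than set to a falsy-looking string.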
6 changes: 6 additions & 0 deletions README.md
Expand Up @@ -45,6 +45,12 @@ This can be done via the following command:
docker compose up -d
```

After the containers are up, start the uvicorn server with the following command:

```bash
docker exec data-source-identification-app-1 uvicorn api.main:app --host 0.0.0.0 --port 80
```

Note that while the container may mention the web app running on `0.0.0.0:8000`, the actual host may be `127.0.0.1:8000`.

To access the API documentation, visit `http://{host}:8000/docs`.
Expand Down
117 changes: 117 additions & 0 deletions alembic.ini
@@ -0,0 +1,117 @@
# A generic, single database configuration.

[alembic]
# path to migration scripts
# Use forward slashes (/) also on windows to provide an os agnostic path
script_location = collector_db/alembic

# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .

# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python>=3.9 or backports.zoneinfo library.
# Any required deps can installed by adding `alembic[tz]` to the pip requirements
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =

# max length of characters to apply to the "slug" field
# truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions

# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
# version_path_separator = newline
version_path_separator = os # Use os.pathsep. Default configuration used for new projects.

# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

sqlalchemy.url = postgresql://test_source_collector_user:[email protected]:5432/source_collector_test_db


[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# lint with attempts to fix using "ruff" - use the exec runner, execute a binary
# hooks = ruff
# ruff.type = exec
# ruff.executable = %(here)s/.venv/bin/ruff
# ruff.options = --fix REVISION_SCRIPT_FILENAME

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers = console
qualname =

[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
11 changes: 11 additions & 0 deletions api/main.py
Expand Up @@ -9,12 +9,14 @@
from collector_db.DatabaseClient import DatabaseClient
from core.CoreLogger import CoreLogger
from core.SourceCollectorCore import SourceCollectorCore
from util.helper_functions import get_from_env


@asynccontextmanager
async def lifespan(app: FastAPI):
# Initialize shared dependencies
db_client = DatabaseClient()
await setup_database(db_client)
source_collector_core = SourceCollectorCore(
core_logger=CoreLogger(
db_client=db_client
Expand All @@ -34,6 +36,15 @@
pass


async def setup_database(db_client):

# In a dev environment (DEV is set), create the tables directly.
# Otherwise do nothing here: migrations are applied outside the app by apply_migrations.py.
try:
get_from_env("DEV")
db_client.init_db()
except Exception:

return


app = FastAPI(
title="Source Collector API",
description="API for collecting data sources",
Expand Down
14 changes: 14 additions & 0 deletions apply_migrations.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
from alembic import command

from alembic.config import Config

from collector_db.helper_functions import get_postgres_connection_string

if __name__ == "__main__":
print("Applying migrations...")
alembic_config = Config("alembic.ini")
alembic_config.set_main_option(
"sqlalchemy.url",
get_postgres_connection_string()
)
command.upgrade(alembic_config, "head")
print("Migrations applied.")

2 changes: 2 additions & 0 deletions collector_db/DTOs/URLInfo.py
@@ -1,3 +1,4 @@
import datetime

from typing import Optional

from pydantic import BaseModel
Expand All @@ -11,3 +12,4 @@
url: str
url_metadata: Optional[dict] = None
outcome: URLOutcome = URLOutcome.PENDING
updated_at: Optional[datetime.datetime] = None
16 changes: 11 additions & 5 deletions collector_db/DatabaseClient.py
Expand Up @@ -16,7 +16,6 @@
from collector_db.helper_functions import get_postgres_connection_string
from collector_db.models import Base, Batch, URL, Log, Duplicate
from collector_manager.enums import CollectorType
from core.DTOs.BatchStatusInfo import BatchStatusInfo
from core.enums import BatchStatus


Expand All @@ -32,10 +31,12 @@
url=db_url,
echo=ConfigManager.get_sqlalchemy_echo(),
)
Base.metadata.create_all(self.engine)
self.session_maker = scoped_session(sessionmaker(bind=self.engine))
self.session = None

def init_db(self):

Base.metadata.create_all(self.engine)

def session_manager(method):
@wraps(method)
def wrapper(self, *args, **kwargs):
Expand Down Expand Up @@ -214,13 +215,13 @@
# Get only the batch_id, collector_type, status, and created_at
limit = 100
query = (session.query(Batch)
.order_by(Batch.date_generated.desc())
.limit(limit)
.offset((page - 1) * limit))
.order_by(Batch.date_generated.desc()))
if collector_type:
query = query.filter(Batch.strategy == collector_type.value)
if status:
query = query.filter(Batch.status == status.value)
query = (query.limit(limit)
.offset((page - 1) * limit))
batches = query.all()
return [BatchInfo(**batch.__dict__) for batch in batches]
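
The hunk above moves `limit` and `offset` after the optional filters. A plain-Python sketch of why that ordering matters, using an in-memory list instead of the ORM (`FakeBatch` and `paginate` are illustrative names, not project code):

```python
from dataclasses import dataclass


@dataclass
class FakeBatch:
    id: int
    status: str


def paginate(rows, page, limit):
    """Return one page of rows, mirroring .limit(limit).offset((page - 1) * limit)."""
    start = (page - 1) * limit
    return rows[start:start + limit]


# Six batches: odd ids are "complete", even ids are "error".
batches = [FakeBatch(i, "complete" if i % 2 else "error") for i in range(1, 7)]
limit = 2

# Old order: take a page first, then filter it. Matches outside the page are lost.
page_then_filter = [b.id for b in paginate(batches, 1, limit) if b.status == "error"]

# New order: filter first, then take a page of the matching rows.
errors = [b for b in batches if b.status == "error"]
filter_then_page = [b.id for b in paginate(errors, 1, limit)]
```

With the old order the first page holds a single matching batch; with the new order it holds a full page of matches.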

Expand Down Expand Up @@ -274,6 +275,11 @@
Log.created_at < datetime.now() - timedelta(days=1)
).delete()

@session_manager
def update_url(self, session, url_info: URLInfo):

url = session.query(URL).filter_by(id=url_info.id).first()
url.url_metadata = url_info.url_metadata

if __name__ == "__main__":
client = DatabaseClient()
print("Database client initialized.")
30 changes: 28 additions & 2 deletions collector_db/alembic/README.md
@@ -1,4 +1,30 @@
Generic single-database configuration.
Alembic is a lightweight Python library that helps manage database migrations.

## Files and Directories

The following files are in this directory or are related to it:
- `script.py.mako`: A Mako template used to generate new migration scripts in `versions/`. Editing it controls the structure of each migration file, including the standard imports and the layout of the `upgrade()` and `downgrade()` functions.
- `env.py`: The main script that sets up the migration environment.
- `alembic.ini`: The `alembic` configuration file, located in the root of the repository.
- `/versions`: The directory that contains the migration scripts.
- `apply_migrations.py`: A Python script, located in the root directory, that applies any outstanding migrations to the database.
- `execute.sh`: A shell script in the root directory that runs the `apply_migrations.py` script. Called by DigitalOcean when deploying the application.

## Generating a Migration

To generate a new migration, run the following command from the root directory:

```bash
alembic revision --autogenerate -m "Description for migration"
```

Then, locate the new revision script in `/versions` and modify the `upgrade()` and `downgrade()` functions as needed.

Once you have generated a new migration, you can upgrade and downgrade the database using the `alembic` command line tool (for example, `alembic upgrade head` or `alembic downgrade -1`).

Finally, make sure to commit your changes to the repository.

## How does Alembic Work?

As long as new migrations are generated and stored in the `/versions` directory, `apply_migrations.py` will apply any outstanding ones to the production database, in revision order, when the application is deployed.
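
Each revision script records its own `revision` identifier and the `down_revision` it builds on, and that chain determines the order in which migrations run. The following is a simplified illustration of walking such a chain from base to head, assuming a linear history. It is not Alembic's implementation, and the revision ids are made up:

```python
def order_revisions(revisions):
    """Order revision ids base-to-head by following down_revision links.

    `revisions` maps each revision id to its down_revision (None for the base).
    Assumes a single linear history with no branches.
    """
    # Invert the mapping: parent id -> the child that names it as down_revision.
    child_of = {down: rev for rev, down in revisions.items()}
    ordered = []
    current = child_of.get(None)  # the base revision has down_revision None
    while current is not None:
        ordered.append(current)
        current = child_of.get(current)
    return ordered


# Hypothetical revision ids, deliberately listed out of order.
revisions = {
    "c3d4": "b2c3",
    "a1b2": None,
    "b2c3": "a1b2",
}
upgrade_order = order_revisions(revisions)
```

The order comes from the links between scripts, so renaming or re-sorting files in `/versions` does not change it.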

7 changes: 5 additions & 2 deletions collector_db/alembic/env.py
Expand Up @@ -5,6 +5,9 @@

from alembic import context

from collector_db.helper_functions import get_postgres_connection_string
from collector_db.models import Base

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
Expand All @@ -18,7 +21,7 @@
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = None
target_metadata = Base.metadata

# other values from the config, defined by the needs of env.py,
# can be acquired:
Expand All @@ -38,7 +41,7 @@ def run_migrations_offline() -> None:
script output.

"""
url = config.get_main_option("sqlalchemy.url")
url = get_postgres_connection_string()
context.configure(
url=url,
target_metadata=target_metadata,
Expand Down