
Commit

chore: fix misc grammar
joaodiaslobo committed Sep 7, 2024
1 parent b34cd81 commit 5fb398e
Showing 4 changed files with 12 additions and 12 deletions.
10 changes: 5 additions & 5 deletions scraper/README.md
@@ -35,7 +35,7 @@ sudo pacman -S geckodriver firefox # Arch

| package | usage |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
- | requests    | To download previous commits files from our GitHub page and scrape subjects short names |
+ | requests    | To download previous commit files from our GitHub page and scrape subjects' short names |
| unidecode   | To create short names for subjects (that weren't scraped), removing accents from characters. E.g.: Álgebra Linear para a Engenharia -> ÁLE -> ALE |
| selenium    | Used to scrape the webpage. In this case it is impossible to use libraries like `beautifulsoup` due to the web stack used by UMinho |
| geckodriver | A selenium dependency to interact with browsers |
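The short-name idea from the table above can be sketched as follows. This is a minimal illustration, not the scraper's actual code; it uses the stdlib `unicodedata` module in place of the `unidecode` package, which gives the same result for Latin characters:

```python
import unicodedata

def short_name(full_name: str) -> str:
    """Build a short name from a subject's initials, then strip accents.
    Example: "Álgebra Linear para a Engenharia" -> "ÁLE" -> "ALE"."""
    # Take the initial of each capitalised word (lowercase connectives are skipped)
    initials = "".join(w[0] for w in full_name.split() if w[0].isupper())
    # Decompose accented characters and drop the combining marks
    return unicodedata.normalize("NFKD", initials).encode("ascii", "ignore").decode("ascii")

print(short_name("Álgebra Linear para a Engenharia"))  # -> ALE
```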
@@ -51,17 +51,17 @@ $ python scraper/main.py

##### Subjects Short Names

- [Calendarium](https://calendario.cesium.di.uminho.pt/) use some short names to easily identify some subjects. This names were chosen on previous versions of `filters.json`. The scrap can be done combining the files `data/filter.json` and `data/shifts.json` from a specific commit (when this files were a manual scrap) from [Calendarium Github Page](https://github.com/cesium/calendarium).
+ [Calendarium](https://calendario.cesium.di.uminho.pt/) uses some short names to easily identify some subjects. These names were chosen in previous versions of `filters.json`. The scrape can be done by combining the files `data/filter.json` and `data/shifts.json` from a specific commit (when these files were a manual scrape) from the [Calendarium GitHub page](https://github.com/cesium/calendarium).

- If not founded, `scraper/subjects_short_names.json` will be generated by the schedule scraper. Read more at [subjects short names](./modules/README.md#subjects_short_names).
+ If not found, `scraper/subjects_short_names.json` will be generated by the schedule scraper. Read more at [subjects short names](./modules/README.md#subjects_short_names).

###### You can manually add names to this list

##### Subject IDs and Filter IDs

- [Calendarium](https://calendario.cesium.di.uminho.pt/) use a subject ID and a filterID. On UMinho Courses pages, a list of all subjects, ordered first by year/semesters and next by alphabetic order, and the subject IDs are given. This is everything we need to complete `shifts.json` and generate a basic `filters.json` to Calendarium.
+ [Calendarium](https://calendario.cesium.di.uminho.pt/) uses a subject ID and a filter ID. The UMinho course pages provide a list of all subjects, ordered first by year/semester and then alphabetically, along with their subject IDs. This is everything we need to complete `shifts.json` and generate a basic `filters.json` for Calendarium.

- If not founded, `scraper/subjects.json` will be generated by the schedule scraper. Read more at [subjects scraper documentation](./modules/README.md#subject-id-and-a-filter-id-scraper).
+ If not found, `scraper/subjects.json` will be generated by the schedule scraper. Read more at the [subjects scraper documentation](./modules/README.md#subject-id-and-a-filter-id-scraper).

###### You can manually add subjects to this list

10 changes: 5 additions & 5 deletions scraper/modules/README.md
@@ -2,15 +2,15 @@

##### (subjects_short_names_scraper.py)

- [Calendarium](https://calendario.cesium.di.uminho.pt/) use some short names to easily identify some subjects. This names were chosen on previous versions of `filters.json`.
+ [Calendarium](https://calendario.cesium.di.uminho.pt/) uses some short names to easily identify some subjects. These names were chosen in previous versions of `filters.json`.

### Scraping these values

- The scrap can be done combining the files `data/filter.json` and `data/shifts.json` from a specific commit (when this files were a manual scrap) from [Calendarium Github Page](https://github.com/cesium/calendarium).
+ The scrape can be done by combining the files `data/filter.json` and `data/shifts.json` from a specific commit (when these files were a manual scrape) from the [Calendarium GitHub page](https://github.com/cesium/calendarium).
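The combination step can be sketched as below. Note that the field names (`id`, `name`, `filterId`, `shortname`) are illustrative assumptions for this sketch, not the real schema of `filter.json`/`shifts.json`:

```python
# Hypothetical sketch: join filter.json entries with shifts.json entries
# by subject id to recover each subject's short name.
# The field names here are assumptions for illustration only.
filters = [  # would be parsed from data/filter.json at a specific commit
    {"id": 101, "name": "Algebra Linear"},
    {"id": 102, "name": "Calculo"},
]
shifts = [  # would be parsed from data/shifts.json at the same commit
    {"filterId": 101, "shortname": "AL"},
]

short_by_id = {s["filterId"]: s["shortname"] for s in shifts}
# Subjects missing from shifts.json get None and must be named manually
names = {f["name"]: short_by_id.get(f["id"]) for f in filters}
print(names)  # -> {'Algebra Linear': 'AL', 'Calculo': None}
```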

#### Adding manual values

- If for some reason you want add some subjects (a new one) to this scrap, you can edit the dictionary `manual_subject_names` at `scraper/modules/subjects_short_names_scraper.py` file. Follow the next schema:
+ If for some reason you want to add a subject (a new one) to this scrape, you can edit the dictionary `manual_subject_names` in the `scraper/modules/subjects_short_names_scraper.py` file. Follow this schema:

```python
manual_subject_names = {
Expand All @@ -23,7 +23,7 @@ manual_subject_names = {

#### Output

- If not founded, `scraper/subjects_short_names.json` will be generated by the schedule scraper.
+ If not found, `scraper/subjects_short_names.json` will be generated by the schedule scraper.

## Subject ID and a Filter ID Scraper

@@ -35,7 +35,7 @@ If not founded, `scraper/subjects_short_names.json` will be generated by the sch

```python
filterId = f"{university_year}{university_semester}{subject_code}"
```

- Where the `subject code` is the position of the subject in an alphabetic ordered list. For example:
+ Where the `subject_code` is the position of the subject in an alphabetically ordered list. For example:

```python
# 1st year & 1st semester subjects:
```
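The numbering scheme above can be sketched as follows. The subject names here are made up for illustration; the real scraper reads them from the UMinho course pages:

```python
# Hypothetical sketch of the filterId scheme: year, semester, then the
# subject's 1-based position in the alphabetically ordered subject list.
subjects = sorted(["Calculo", "Algebra Linear", "Programacao Funcional"])
university_year = 1
university_semester = 1

filter_ids = {}
for subject_code, name in enumerate(subjects, start=1):
    # e.g. year 1, semester 1, third subject alphabetically -> "113"
    filter_ids[name] = f"{university_year}{university_semester}{subject_code}"

print(filter_ids)  # -> {'Algebra Linear': '111', 'Calculo': '112', 'Programacao Funcional': '113'}
```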
2 changes: 1 addition & 1 deletion scraper/modules/schedule_scraper.py
@@ -12,7 +12,7 @@ def schedule_scraper(driver: WebDriver, subject_codes: list[dict[str, int]]):
Parameters
----------
driver : WebDriver
-     The selenium driver. Need have the schedule ready
+     The selenium driver. Needs to have the schedule ready
subject_codes : list[dict[str, int]]
    Every subject has its subject ID and filter ID. These IDs are stored in a list of dicts with the format:
2 changes: 1 addition & 1 deletion scraper/modules/subjects_short_names_scraper.py
@@ -95,7 +95,7 @@ def get_subjects_short_names_scraper():

names = {}

- print("Not founded info on `shifts.json` about:")
+ print("Couldn't find info on `shifts.json` about:")

for subject in filters:
filter_id = subject["id"]
