Write to stdout instead of a file
In the README, explain how to redirect the output to a file.
tpwo committed Jul 10, 2024
1 parent 4046eec commit e8ef193
Showing 2 changed files with 52 additions and 44 deletions.
63 changes: 43 additions & 20 deletions README.md
````diff
@@ -29,38 +29,61 @@ This command:
 - creates virtual environment in the project directory: `./venv`
 - installs all dependencies required to run the app and tests
 
-## CLI
+## How to use?
 
+### CLI
+
 Use `--help` to see available options.
 
 ```bash
 python -m event_scrapper_srt --help
 ```
 
-### Sample usage
+### Saving output to a file
 
+All log messages go to `stderr`, and the scraped events go to `stdout`. Use `>` to redirect `stdout` to a file:
+
+```bash
+python -m event_scrapper_srt > output.json
+```
+
+To save the logs too, redirect `stdout` and `stderr` to separate files with `1>` and `2>`:
+
+```bash
+python -m event_scrapper_srt 1> output.json 2> log.txt
+```
+
+Use `2>>` to append to the log file instead of overwriting it:
+
+```bash
+python -m event_scrapper_srt 1> output.json 2>> log.txt
+```
+
+### Sample output
+
 ```con
-$ python -m event_scrapper_srt --output-path output.json
-2024-07-10 21:44:20,521 - INFO - Found 69 events in the sitemap
-2024-07-10 21:44:20,522 - INFO - Extracted 7 events from the sitemap
-2024-07-10 21:44:20,739 - WARNING - No end time found for the date `<p><strong>12 lipca 2024</strong> 20:00<hr/></p>`, setting to None
-2024-07-10 21:44:20,739 - WARNING - No end time found for the date `<p><strong>16 sierpnia 2024</strong> 20:00</p>`, setting to None
-2024-07-10 21:44:20,945 - INFO - [Summertime Jump Party | Impreza na zakończenie sezonu] No date and time information found
-2024-07-10 21:44:21,824 - INFO - [Trening Performance & Show | SPOTKANIE INFORMACYJNE] No date and time information found
-2024-07-10 21:44:22,055 - INFO - Extracted details for 7 events
-2024-07-10 21:44:22,055 - INFO - [SWING NA PERONIE | potańcówka & live music] Prepared 2 events for Gancio
-2024-07-10 21:44:22,055 - INFO - [Summertime Jump Party | Impreza na zakończenie sezonu] No Gancio events created: no future `date_times` found
-2024-07-10 21:44:22,055 - INFO - [Sunday Summer Night | CONIEDZIELNA POTAŃCÓWKA] Prepared 7 events for Gancio
-2024-07-10 21:44:22,055 - INFO - [Practice & CHILL] Prepared 3 events for Gancio
-2024-07-10 21:44:22,055 - INFO - [Practice & CHILL | Wersja FAST FEET] Prepared 3 events for Gancio
-2024-07-10 21:44:22,055 - INFO - [Trening Performance & Show | SPOTKANIE INFORMACYJNE] No Gancio events created: no future `date_times` found
-2024-07-10 21:44:22,055 - INFO - [Lindy Hop dla początkujacych | intensywne warsztaty] Prepared 1 events for Gancio
-2024-07-10 21:44:22,058 - INFO - Saved 16 events to `/home/tpwo/ws/event-scrapper-srt/output.json`
+$ python -m event_scrapper_srt > output.json
+2024-07-10 22:26:44,296 - INFO - Found 69 events in the sitemap
+2024-07-10 22:26:44,298 - INFO - Extracted 7 events from the sitemap
+2024-07-10 22:26:44,509 - WARNING - No end time found for the date `<p><strong>12 lipca 2024</strong> 20:00<hr/></p>`, setting to None
+2024-07-10 22:26:44,509 - WARNING - No end time found for the date `<p><strong>16 sierpnia 2024</strong> 20:00</p>`, setting to None
+2024-07-10 22:26:44,730 - INFO - [Summertime Jump Party | Impreza na zakończenie sezonu] No date and time information found
+2024-07-10 22:26:45,580 - INFO - [Trening Performance & Show | SPOTKANIE INFORMACYJNE] No date and time information found
+2024-07-10 22:26:45,817 - INFO - Extracted details for 7 events
+2024-07-10 22:26:45,817 - INFO - [SWING NA PERONIE | potańcówka & live music] Prepared 2 events for Gancio
+2024-07-10 22:26:45,817 - INFO - [Summertime Jump Party | Impreza na zakończenie sezonu] No Gancio events created: no future `date_times` found
+2024-07-10 22:26:45,817 - INFO - [Sunday Summer Night | CONIEDZIELNA POTAŃCÓWKA] Prepared 7 events for Gancio
+2024-07-10 22:26:45,817 - INFO - [Practice & CHILL] Prepared 3 events for Gancio
+2024-07-10 22:26:45,817 - INFO - [Practice & CHILL | Wersja FAST FEET] Prepared 3 events for Gancio
+2024-07-10 22:26:45,817 - INFO - [Trening Performance & Show | SPOTKANIE INFORMACYJNE] No Gancio events created: no future `date_times` found
+2024-07-10 22:26:45,817 - INFO - [Lindy Hop dla początkujacych | intensywne warsztaty] Prepared 1 events for Gancio
+2024-07-10 22:26:45,817 - INFO - In total prepared 7 events for Gancio
+2024-07-10 22:26:45,817 - INFO - Dumping output to stdout...
 ```
 
-### Output file structure
+### Generated structure
 
-Output file is [Newline Delimited JSON](https://github.com/ndjson/ndjson-spec) format which means. Each line has the following structure:
+Scraped events written to `stdout` are in [Newline Delimited JSON](https://github.com/ndjson/ndjson-spec) format: each line is one JSON object with the following structure:
 
 ```json
 {"title": "Lindy Hop dla początkujacych | intensywne warsztaty", "description": "<p>Daj się zarazić swingowym bakcylem...<snipped>", "place_name": "Studio Swing Revolution Trójmiasto", "place_address": "Łąkowa 35/38, Gdańsk", "online_locations": ["https://swingrevolution.pl/warsztaty-lindy-hop-od-podstaw/"], "start_datetime": 1722074400, "end_datetime": 1722085200, "multidate": 1, "tags": ["swing"], "image_url": "https://swingrevolution.pl/wp-content/uploads/2022/04/351150267_646835474155254_2037209978322475013_n.jpg"}
````
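Because every line of the output is a standalone JSON object, a consumer can read `output.json` line by line instead of loading one large document. The sketch below is a minimal illustration that is not part of the project: `read_events` is a hypothetical helper, and the sample records only reuse field names from the README example. The second record, with `end_datetime` set to `null`, is made up to mirror the "setting to None" warning in the logs, assuming such values are serialized as `null`.

```python
import io
import json
from collections.abc import Iterator
from typing import TextIO


def read_events(stream: TextIO) -> Iterator[dict]:
    """Yield one event dict per NDJSON line, skipping blank lines."""
    for line_number, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate empty lines, e.g. at the end of the file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Add the line number so a broken record is easy to locate.
            raise ValueError(f'line {line_number}: invalid JSON ({exc})') from exc


if __name__ == '__main__':
    # In-memory stand-in for open('output.json', encoding='utf-8').
    sample = io.StringIO(
        '{"title": "Lindy Hop dla początkujacych | intensywne warsztaty", '
        '"start_datetime": 1722074400, "end_datetime": 1722085200, "tags": ["swing"]}\n'
        '{"title": "Example event", "start_datetime": 1722074400, '
        '"end_datetime": null, "tags": ["swing"]}\n'
    )
    for event in read_events(sample):
        print(event['title'], event['end_datetime'])
```

With a real file, pass the open handle instead of the buffer: `with open('output.json', encoding='utf-8') as f:` and iterate over `read_events(f)`. Since records are processed one at a time, memory use stays flat regardless of how many events were scraped.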
33 changes: 9 additions & 24 deletions event_scrapper_srt/main.py
```diff
@@ -3,9 +3,8 @@
 import argparse
 import json
 import logging
+import sys
 from dataclasses import asdict
-from datetime import datetime
-from pathlib import Path
 
 from event_scrapper_srt import gancio
 from event_scrapper_srt import scrapper
@@ -23,37 +22,23 @@ def main(argv: list[str] | None = None) -> int:
         default=SITEMAP_URL,
         help='events are scrapped from here (default: %(default)s)',
     )
-    parser.add_argument(
-        '--output-path',
-        default=f'output/{datetime.now().isoformat()}.json',
-        help='NDJSON with scrapped events is saved there (default: %(default)s)',
-    )
     args = parser.parse_args(argv)
 
     events = scrapper.get_events(sitemap.get_urls(args.sitemap_url))
     gancio_events = gancio.create_events(events)
+    logging.info(f'In total prepared {len(events)} events for Gancio')
+    logging.info('Dumping output to stdout...')
 
-    dump_events_to_json(gancio_events, output_path=args.output_path)
+    dump_events_to_json(gancio_events)
 
     return 0
 
 
-def dump_events_to_json(events: list[GancioEvent], output_path: str) -> None:
-    """Saves scrapped events to Newline Delimited JSON."""
-    path = Path(output_path)
-
-    if path.exists():
-        raise SystemExit(f'Error: `{path.absolute()}` already exists')
-
-    if not path.parent.exists():
-        path.parent.mkdir(parents=True)
-        logging.info(f'Created folder `{path.parent.absolute()}`')
-
-    with open(path, 'w', encoding='utf-8') as file:
-        for event in events:
-            json.dump(asdict(event), file, indent=None, ensure_ascii=False, default=str)
-            file.write('\n')
-    logging.info(f'Saved {len(events)} events to `{path.absolute()}`')
+def dump_events_to_json(events: list[GancioEvent]) -> None:
+    """Dump scraped events to stdout as Newline Delimited JSON."""
+    for event in events:
+        json.dump(asdict(event), sys.stdout, indent=None, ensure_ascii=False, default=str)
+        print()
 
 
 if __name__ == '__main__':
```
