Commit

Update README/docs
rmoralespp committed Oct 30, 2024
1 parent 645b758 commit 66bc9a2
Showing 2 changed files with 82 additions and 3 deletions.
45 changes: 42 additions & 3 deletions README.md
@@ -11,9 +11,10 @@

 ## About

-**jsonl** is a Python library designed to simplify working with JSON Lines data, adhering to the [JSON Lines format](https://jsonlines.org/).
+**jsonl** is a Python library designed to simplify working with JSON Lines data, adhering to
+the [JSON Lines format](https://jsonlines.org/).

-### Key Features
+### Features

 - 🌎 Provides an API similar to Python's standard `json` module.
 - 🚀 Supports custom serialization/deserialization callbacks, with the standard `json` module as the default.
@@ -57,9 +58,47 @@ iterable = jsonl.load("file.jsonl")
 print(tuple(iterable))
 ```

+**Incremental Writing to Multiple JSON Lines Files**
+
+This example uses `jsonl.dump_fork` to incrementally write daily temperature data for multiple cities to separate JSON
+Lines files, exporting records for the first days of specified years.
+It efficiently manages data by creating individual files for each city, optimizing memory usage.
+
+```python
+import datetime
+import itertools
+import random
+
+import jsonl
+
+
+def get_temperature_by_city():
+    """
+    Generates files for each city with daily temperature data for the initial days of
+    the specified years.
+    """
+
+    years = [2023, 2024]
+    first_days = 10
+    cities = ["New York", "Los Angeles", "Chicago"]
+
+    for year, city in itertools.product(years, cities):
+        start = datetime.datetime(year, 1, 1)
+        dates = (start + datetime.timedelta(days=day) for day in range(first_days))
+        daily_temperature = (
+            {"date": date.isoformat(), "city": city, "temperature": round(random.uniform(-10, 35), 2)}
+            for date in dates
+        )
+        yield (f"{city}.jsonl", daily_temperature)
+
+# Write the generated data to files in JSON Lines format
+jsonl.dump_fork(get_temperature_by_city())
+```
+
 ## Documentation

-For more detailed information and usage examples, refer to the project [documentation](https://rmoralespp.github.io/jsonl/)
+For more detailed information and usage examples, refer to the
+project [documentation](https://rmoralespp.github.io/jsonl/)

 ## Development

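The README text above describes `dump_fork` as writing each city's records incrementally to its own file. A minimal standard-library sketch of that idea follows, assuming `dump_fork` receives `(path, iterable)` pairs, writes one JSON document per line, and keeps appending when a path repeats (the example yields each city once per year, so six iterables land in three files). `dump_fork_sketch` and its `dumps` parameter are hypothetical names: the parameter only mirrors the README's note that serialization is pluggable with `json` as the default, and the real keyword in jsonl may differ. This is not the library's implementation.

```python
import json
import os
import tempfile
from typing import Any, Callable, Iterable, Tuple


def dump_fork_sketch(
    pairs: Iterable[Tuple[str, Iterable[Any]]],
    dumps: Callable[[Any], str] = json.dumps,  # hypothetical serializer hook
) -> None:
    """Hypothetical stand-in for jsonl.dump_fork (not the library's code).

    Writes every item as one JSON document per line. A path is truncated the
    first time it appears and appended to afterwards, so a repeated path
    accumulates records instead of being overwritten.
    """
    seen = set()
    for path, items in pairs:
        mode = "a" if path in seen else "w"
        seen.add(path)
        with open(path, mode, encoding="utf-8") as fp:
            # Pull one record at a time and finish this iterable before
            # requesting the next (path, items) pair.
            for item in items:
                fp.write(dumps(item) + "\n")


# Usage: path_a appears twice, as "New York.jsonl" does in the README example.
tmp_dir = tempfile.mkdtemp()
path_a = os.path.join(tmp_dir, "a.jsonl")
path_b = os.path.join(tmp_dir, "b.jsonl")
dump_fork_sketch([
    (path_a, [{"n": 1}]),
    (path_b, [{"n": 2}]),
    (path_a, ({"n": i} for i in (3, 4))),
])
with open(path_a, encoding="utf-8") as fp:
    lines_a = fp.read().splitlines()
print(lines_a)  # ['{"n": 1}', '{"n": 3}', '{"n": 4}']

# A custom serializer producing compact separators.
compact_path = os.path.join(tmp_dir, "compact.jsonl")
dump_fork_sketch(
    [(compact_path, [{"n": 5}])],
    dumps=lambda obj: json.dumps(obj, separators=(",", ":")),
)
with open(compact_path, encoding="utf-8") as fp:
    compact_line = fp.readline().rstrip("\n")
```

Opening the file once per pair keeps at most one handle open at a time; the library may instead hold handles open across pairs. Finishing each iterable before advancing also matters for the README example itself: its generator expressions read `city` lazily from the enclosing generator, so a records iterable left unconsumed until the outer generator moved on could pick up a later city's name.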
40 changes: 40 additions & 0 deletions docs/dump_fork.md
@@ -5,6 +5,46 @@ Dump multiple iterables incrementally to the specified jsonlines file paths, opt
 The files can be compressed using `gzip`, `bzip2`, or `xz` formats. If the file extension is not recognized, it will be
 dumped to a text file.

+**Example #1**
+
+This example uses `jsonl.dump_fork` to incrementally write daily temperature data for multiple cities to separate JSON
+Lines files, exporting records for the first days of specified years.
+It efficiently manages data by creating individual files for each city, optimizing memory usage.
+
+```python
+import datetime
+import itertools
+import random
+
+import jsonl
+
+
+def get_temperature_by_city():
+    """
+    Generates files for each city with daily temperature data for the initial days of
+    the specified years.
+    """
+
+    years = [2023, 2024]
+    first_days = 10
+    cities = ["New York", "Los Angeles", "Chicago"]
+
+    for year, city in itertools.product(years, cities):
+        start = datetime.datetime(year, 1, 1)
+        dates = (start + datetime.timedelta(days=day) for day in range(first_days))
+        daily_temperature = (
+            {"date": date.isoformat(), "city": city, "temperature": round(random.uniform(-10, 35), 2)}
+            for date in dates
+        )
+        yield (f"{city}.jsonl", daily_temperature)
+
+# Write the generated data to files in JSON Lines format
+jsonl.dump_fork(get_temperature_by_city())
+```
+
+**Example #2**
+
+This example demonstrates how to dump data using different JSON libraries.
 You can install `orjson` and `ujson` to run the following example.

 ```console
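The docs paragraph above says output can be compressed with `gzip`, `bzip2`, or `xz`, and that an unrecognized extension produces a plain text file. A minimal sketch of that extension-based dispatch with the standard library follows; the `.gz`/`.bz2`/`.xz` suffixes and the helper `open_by_extension` are assumptions for illustration, not the library's documented mapping.

```python
import bz2
import gzip
import json
import lzma
import os
import tempfile

# Assumed suffix-to-opener mapping; any other suffix falls back to plain text.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Hypothetical helper: open `path` in text mode, compressed according to its suffix."""
    suffix = os.path.splitext(path)[1].lower()
    opener = OPENERS.get(suffix, open)
    return opener(path, mode, encoding="utf-8")


records = [{"city": "Chicago", "temperature": 1.5}]
tmp_dir = tempfile.mkdtemp()
roundtrips = {}
for name in ("temps.jsonl", "temps.jsonl.gz", "temps.jsonl.bz2", "temps.jsonl.xz"):
    path = os.path.join(tmp_dir, name)
    # Write one JSON document per line, then read the file back.
    with open_by_extension(path, "wt") as fp:
        for record in records:
            fp.write(json.dumps(record) + "\n")
    with open_by_extension(path, "rt") as fp:
        roundtrips[name] = [json.loads(line) for line in fp]

# gzip output starts with the two-byte magic number 1f 8b.
with open(os.path.join(tmp_dir, "temps.jsonl.gz"), "rb") as fp:
    gz_header = fp.read(2)
```

Because every opener accepts the same positional mode and `encoding` keyword in text mode, the writing loop stays identical regardless of compression.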
