Skip to content

Commit

Permalink
chore(docs): Update README
Browse files Browse the repository at this point in the history
  • Loading branch information
esloch committed Sep 15, 2023
1 parent e54d450 commit 0db7d4a
Showing 1 changed file with 87 additions and 1 deletion.
88 changes: 87 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1 +1,87 @@
# py-blast-dbf
# pyreaddbc

**pyreaddbc** is a Python library for working with DBC (Dengue Brazilian Case Registry) and DBF (dBase) files. It is an extension used in the PySUS package, which provides access to Brazilian public health data. For more information, you can visit the [PySUS repository](https://github.com/AlertaDengue/pysus).

## Installation

You can install **pyreaddbc** using pip:

```bash
pip install pyreaddbc
```

## Usage

### Reading DBC Files

To read a DBC file and convert it to a pandas DataFrame, use the `read_dbc` function:

```python
import pyreaddbc

dfs = pyreaddbc.read_dbc("LTPI2201.dbc", encoding="iso-8859-1")
```

### Exporting to CSV.GZ

To export a DataFrame to a compressed CSV file (CSV.GZ), you can use pandas:

```python
import pyreaddbc

df = pyreaddbc.read_dbc("./LTPI2201.dbc", encoding="iso-8859-1")
df.to_csv("LTPI2201.csv.gz", compression="gzip", index=False)
```

### Exporting to Parquet

To export a DataFrame to a Parquet file, you can use the `pyarrow` library:

```python
import pyreaddbc
import pyarrow.parquet as pq
import pandas as pd
from pathlib import Path

# Read DBC file and convert to DataFrame
df = pyreaddbc.read_dbc("./LTPI2201.dbc", encoding="iso-8859-1")

# Export to CSV.GZ
df.to_csv("LTPI2201.csv.gz", compression="gzip", index=False)

# Export to Parquet
pq.write_table(pa.Table.from_pandas(df), "parquets/LTPI2201.parquet")

# Read the Parquet files and decode DataFrame columns
parquet_dir = Path("parquets")
parquets = parquet_dir.glob("*.parquet")

chunks_list = [
pd.read_parquet(str(f), engine='fastparquet') for f in parquets
]

# Concatenate DataFrames
df_parquet = pd.concat(chunks_list, ignore_index=True)

```

## License

[GNU Affero General Public License (AGPL-3.0)](./LICENSE)

This license ensures that the software remains open-source and free to use, modify, and distribute while requiring that any changes or enhancements made to the codebase are also made available to the community under the same terms.

<span>
<b>Acknowledge</b></br>
============
</span>


This program decompresses .dbc files to .dbf. This code is based on the work
of Mark Adler <[email protected]> (zlib/blast), Pablo Fonseca
(https://github.com/eaglebh/blast-dbf).

[PySUS](https://github.com/AlertaDengue/PySUS) has further extended and adapted this code to
create **pyreaddbc**. The original work of Mark Adler and Pablo Fonseca is much appreciated for its contribution to this project.

**Note**: *pyreaddbc* is maintained with funding from [AlertaDengue](https://github.com/AlertaDengue).

0 comments on commit 0db7d4a

Please sign in to comment.