xlsx-is-not-a-database

a little code to help Tony survive grad school

system requirements

terminal

uname -v
# Darwin Kernel Version 21.6.0: Wed Aug 10 14:25:27 PDT 2022; root:xnu-8020.141.5~2/RELEASE_X86_64

python3 --version
# Python 3.9.13

note

ideally you would be using jupyter inside of a virtual environment so that there is less risk with dependencies drifting over time

conda create -n jl python=3.10
conda activate jl

python -m pip install jupyterlab
pip freeze > requirements.txt

excel as a database :(

Bad News: there are lots of libraries that have attempted to make it easy for python to interact with Excel files
Good News: pandas is amazing and has created a wrapper that combines these libraries
- xlwt for xls files
- xlsxwriter for xlsx files if xlsxwriter is installed otherwise openpyxl
- odswriter for ods files

This way you only need to understand how to use pandas.ExcelWriter

I'm going to assume you are using .xlsx files

Note

if your only goal is only to write data to an Excel file then pandas.DataFrame.to_excel is an easy to use option. the following dives into ExcelWriter because it gives you greater flexibility around writing and appending data to existing files

getting started

setup

create directory & files
install pandas
start a notebook

mkdir myfiles
touch myfiles/test-data.xlsx

# the "3" in python3 assumes you aren't using a virtual environment
python3 -m pip install pandas
python3 -m pip install XlsxWriter
python3 -m pip install openpyxl

jupyter lab

CRUD

see getting-started.ipynb so that you can execute each step

create
read
update
delete

example code

here is some code I use in production

def export_to_excel(outputs: dict, export_file_name: str, REPORT_FILE_PATH: str):
    writer = pd.ExcelWriter(f"{REPORT_FILE_PATH}/{export_file_name}.xlsx")
    if "table_of_contents" in list(outputs):
        df = outputs["table_of_contents"]
        df.to_excel(writer, sheet_name="table_of_contents")
        outputs.pop("table_of_contents")

    for table in outputs:
        df = outputs[table]
        df.to_excel(writer, sheet_name=table)
    writer.save()
    return f"export to {export_file_name}.xlsx complete"

this for loop takes a dictionary input containing data and metadata -- these two operations in particular are useful:
- pd.pivot_table
- df.groupby

for table_name, inputs in input_dict.items():
    print(inputs["type"], f"len df:{str(len(df))}")

    if inputs["type"] == "data_filter":
        df = df.loc[inputs["bool_op"](df[inputs["column"]], inputs["bool_arg"])]

    elif inputs["type"] == "data_reset":
        df = raw_df

    elif inputs["type"] == "pivot_table":
        df.loc[:, inputs["values"]] = df.loc[:, inputs["values"]].astype(float)
        table = pd.pivot_table(
            df,
            values=inputs["values"],
            index=inputs["index"],
            columns=inputs["columns"],
            aggfunc=np.sum,
        )
        table.columns = [j for i, j in list(table.columns)]
        if len(table) > 0:
            outputs[table_name] = table

    elif inputs["type"] == "groupby_table":
        headers = {i: inputs["aggfuncs"] for i in inputs["values"]}
        table = df.groupby(inputs["index"]).agg(headers)
        if len(table) > 0:
            outputs[table_name] = table

    elif inputs["type"] == "sum_on_previous_table":
        tmp = pd.DataFrame(table.sum(axis=inputs["axis"]))
        tmp.columns = ["sum"]
        outputs[table_name] = tmp

    else:
        print("invalid input type")

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
images		images
.DS_Store		.DS_Store
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
getting-started-script.py		getting-started-script.py
getting-started.ipynb		getting-started.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

xlsx-is-not-a-database

system requirements

note

excel as a database :(

Note

getting started

setup

CRUD

example code

About

Releases

Packages

Languages

License

will-wright-eng/xlsx-is-not-a-database

Folders and files

Latest commit

History

Repository files navigation

xlsx-is-not-a-database

system requirements

note

excel as a database :(

Note

getting started

setup

CRUD

example code

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages