
Dev #67

Open: wants to merge 26 commits into main


gnogueda

Beginning of the pull request into main.

@gnogueda (author)

Great work, @cgwhall! I just have a couple of suggestions for lib/merge_to_final_dataframe.py. Most of them aim to make the code more structured and easier to read for future users. All the functions I am suggesting still need docstrings. Also, please don't forget to briefly describe both of your .py files in README.md (name, how each one is used, and at which step of the pipeline it runs).

import os
import zipfile
import pandas as pd
import numpy as np

def load_csv_or_zip(path: str) -> pd.DataFrame:
    '''
    Definition
    Input
    Output
    '''
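    # Read the CSV directly if it has already been extracted
    if os.path.exists(path):
        return pd.read_csv(path)
    # Otherwise extract "<path>.zip" into the CSV's folder, then read the CSV
    with zipfile.ZipFile(f"{path}.zip", 'r') as zip_ref:
        zip_ref.extractall(os.path.dirname(path))
    return pd.read_csv(path)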

def rename_columns(df: pd.DataFrame, column_map: dict) -> pd.DataFrame:
    '''
    Definition
    Input
    Output
    '''
    return df.rename(columns=column_map)

def calculate_health_indexes(df: pd.DataFrame, hlth_indx_vars: list, inverse_vars: list) -> pd.DataFrame:
    '''
    Definition
    Input
    Output
    '''
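    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    for col in [c for c in df.columns if c in hlth_indx_vars]:
        df[col] = df[col] / 100  # percentage (0-100) -> proportion (0-1)
        if col in inverse_vars:
            df[f'{col}_inv'] = 1 - df[col]  # complement of the proportion
    return df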

def main():
    '''
    Definition
    Input
    Output
    '''
    os.chdir('/Users/chandlerhall/Desktop/Github/broadbandequity') # Try using a relative path instead; see https://stackoverflow.com/questions/918154/relative-paths-in-python
    file_paths = [
        'data/CDC_PLACES/500_Cities__Local_Data_for_Better_Health__2019_release.csv',
        'data/standard_dataframes/standard_acs_censustract_df_2017.csv',
        'data/Social Vulnerability Index/SVI2016_US.csv'
    ]
    cdc_clean = load_csv_or_zip(file_paths[0])

    # Rest of your code, using the functions above

if __name__ == "__main__": # main() runs only when this file is executed as a script, not when it is imported
    main()
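As a sketch only, the Definition/Input/Output placeholders might be filled in along these lines, with the hard-coded `os.chdir` path replaced by one resolved relative to the script. This assumes the script lives one directory below the repository root (as lib/merge_to_final_dataframe.py does) and that each .zip archive holds its CSV at the top level under the same file name. `repo_root_from` is a hypothetical helper, the demo column names are illustrative, and unlike the code above this `calculate_health_indexes` returns a modified copy instead of changing the input in place.

```python
import os
import zipfile
from pathlib import Path

import pandas as pd


def repo_root_from(script_path: str) -> Path:
    '''
    Definition: Hypothetical helper that resolves the repository root from a
        script's location, instead of hard-coding an absolute path.
    Input: script_path, path of a script one level below the root (e.g. __file__).
    Output: pathlib.Path of the repository root.
    '''
    return Path(script_path).resolve().parent.parent


def load_csv_or_zip(path: str) -> pd.DataFrame:
    '''
    Definition: Read a CSV file; if it is missing, first extract it from
        "<path>.zip" into the same folder.
    Input: path, path to the CSV (the archive is assumed to sit at path + ".zip"
        and to contain the CSV at its top level under the same file name).
    Output: pandas DataFrame with the CSV contents.
    '''
    if not os.path.exists(path):
        with zipfile.ZipFile(f"{path}.zip", 'r') as zip_ref:
            # "or '.'" covers a bare file name, whose dirname is ""
            zip_ref.extractall(os.path.dirname(path) or ".")
    return pd.read_csv(path)


def calculate_health_indexes(df: pd.DataFrame, hlth_indx_vars: list, inverse_vars: list) -> pd.DataFrame:
    '''
    Definition: Rescale health-index columns from percentages (0-100) to
        proportions (0-1) and add a "<name>_inv" complement column for the
        columns listed in inverse_vars.
    Input: df, the DataFrame; hlth_indx_vars, columns to rescale;
        inverse_vars, the subset that also gets an _inv column.
    Output: a new DataFrame; the input df is left unchanged.
    '''
    df = df.copy()
    for col in [c for c in df.columns if c in hlth_indx_vars]:
        df[col] = df[col] / 100
        if col in inverse_vars:
            df[f'{col}_inv'] = 1 - df[col]
    return df


if __name__ == "__main__":
    # Illustrative column names only; real ones come from the CDC PLACES file
    demo = pd.DataFrame({"OBESITY": [30.0, 40.0], "CHECKUP": [70.0, 80.0]})
    print(calculate_health_indexes(demo, ["OBESITY", "CHECKUP"], ["OBESITY"]))
```

With a helper like this, main() could join `repo_root_from(__file__)` with the relative paths already listed in `file_paths`, so no `os.chdir` call is needed.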


@gnogueda left a comment


Hi, @vkielb! Great work! I have no comments for now.

4 participants