diff --git a/README.md b/README.md
index e11a864..7d5a765 100644
--- a/README.md
+++ b/README.md
@@ -6,34 +6,42 @@ Gives basic information, e.g. number of buildings and point types, as well as a
 Tells user the available types of buildings and asks which one the user wants to study
 --------------------------------------------------------------------------------------------------------
 more_info(metadata , bdgs_selected)
+
 Tells the user the top 20 sensor types of the buildings the user has selected along with their count.
 If a user prompts, they can see the whole list.

 dict_builder(metadata, bdgs_selected)
+
 Asks user which type of sensor they want to study in the chosen building type
 Builds a dictionary of the desired point/building combination sensor data, where keys are building names and values are dataframes. This eases indexing

 df_builder (metadata, dict_data)
+
 Determine the first and last timestamp of data in the built dictionary
 Make a dataframe with a timestamp range, spanning between the two dates found with a frequency of 5 minutes
 Take the dataframe of every building within dictionary, and merge it with the previous dataframe until a final dataframe is constructed that includes all data points of the sensor-building combination
 In case of too many data points or NANs, resample the data to a larger period
 For each sensor within the dataframe, determine start and end times of recorded data and write them to a separate file named startendtimes, for later access

-nan_detector (df, startendtimes, max_nan = 0.99)
+nan_detector (df, startendtimes, max_nan = 0.99)
+
 Adds the sensor to a list, if the data recorded have more NANs than a threshold
 The threshold for maximum NAN content is 0.99 by default but user replaceable

 constant_detector (df, startendtimes, max_constant = 0.99)
+
 For each sensor, constructs the differential dataframe for data, and if there are more zeros than the max_constant threshold, adds them to a list

-negligible_detector (df, startendtimes, min_days = 1)
+negligible_detector (df, startendtimes, min_days = 1)
+
 Adds the sensors with less than min_days of data points to a list

 outlier_detector ( df, startendtimes, min_outlier=0.5, upr_parameter=1.5, lwr_parameter=1.5 )
+
 Utilizes the IQR method to determine an acceptable upper and lower bound for data using given parameters
 Adds the sensors that have outlier data for more than 0.5 times of all their data points to a list

 show_results(df, startendtimes, filtered_list, constant_list, negligible_list, outlier_list)
+
 Combines all previously created lists into one dataframe that shows problematic sensors and the reason for them being in the list
 Plots 5 of the problematic sensors' data at random, to give the user a better visual understanding of the issue
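
The detector descriptions in this README can be illustrated with a minimal pandas sketch. It is not the repository's implementation: the function names `nan_fraction_flags`, `constant_flags` and `iqr_outlier_flags` are hypothetical, the `startendtimes` argument is left out, and the sketch assumes `df` holds one column per sensor on a 5-minute timestamp index. Only the threshold names and default values are taken from the signatures above.

```python
import numpy as np
import pandas as pd


def nan_fraction_flags(df, max_nan=0.99):
    """Return columns whose share of NaN values exceeds max_nan."""
    nan_share = df.isna().mean()
    return nan_share[nan_share > max_nan].index.tolist()


def constant_flags(df, max_constant=0.99):
    """Return columns whose first differences are zero more often than max_constant."""
    flagged = []
    for name in df.columns:
        # Difference consecutive recorded values; the first entry is NaN and is dropped.
        diffs = df[name].dropna().diff().dropna()
        if len(diffs) > 0 and (diffs == 0).mean() > max_constant:
            flagged.append(name)
    return flagged


def iqr_outlier_flags(df, min_outlier=0.5, upr_parameter=1.5, lwr_parameter=1.5):
    """Return columns where the share of points outside the IQR bounds exceeds min_outlier."""
    flagged = []
    for name in df.columns:
        values = df[name].dropna()
        if values.empty:
            continue
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        lower = q1 - lwr_parameter * iqr
        upper = q3 + upr_parameter * iqr
        outside = (values < lower) | (values > upper)
        if outside.mean() > min_outlier:
            flagged.append(name)
    return flagged


# Hypothetical demo data: three made-up sensors sampled every 5 minutes.
index = pd.date_range("2021-01-01", periods=20, freq="5min")
demo = pd.DataFrame(
    {
        "mostly_nan": [np.nan] * 19 + [1.0],
        "stuck": [21.0] * 20,
        "spiky": [float(v) for v in range(18)] + [1000.0, -1000.0],
    },
    index=index,
)

nan_list = nan_fraction_flags(demo, max_nan=0.9)
constant_list = constant_flags(demo, max_constant=0.9)
outlier_list = iqr_outlier_flags(demo, min_outlier=0.05)
```

The demo passes thresholds lower than the defaults so that each made-up sensor trips exactly one check. With the default `min_outlier=0.5`, none of the demo columns is flagged as an outlier.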