Hina/refactoring - Final Branch of work #5

hinakhadim · 2022-09-19T05:25:32Z

This is the final branch. It contains the code for every task, and along with the solution of each task I have also refactored it.


Khuzaima-bashir · 2022-09-20T11:26:01Z

report_generator.py

+    def generate_report_for(self, flag, value, data_folder):
+        match flag:
+            case '-e':
+                self.generateReportHighLowTempHumidity(value, data_folder)
+            case '-a':
+                self.generateReportAverageTempHumidity(value, data_folder)
+            case '-c':
+                self.generateTemperatureChartReport(value, data_folder)


Follow one naming convention for methods (generate_report_for is snake_case, but generateReportHighLowTempHumidity is camelCase).

Khuzaima-bashir · 2022-09-20T11:26:43Z

report_generator.py

+            case _:
+                print("No flag sent")


In case of wrong input, validate it and raise an exception.

Khuzaima-bashir · 2022-09-20T11:27:50Z

report_generator.py

+        statistics_calculator = WeatherStatisticsCalculator(
+            matched_file_paths_with_date)


Refactor according to PEP 8.

Khuzaima-bashir · 2022-09-20T11:28:36Z

report_generator.py

+        report_template_of = ReportTemplate()
+        report_template_of.highest_lowest_temp_and_humidity(report_data)
+
+    # date = year/month


Code should be readable enough that there is no need for comments.

Khuzaima-bashir · 2022-09-20T11:58:49Z

report_template.py

+        print(
+            f"Highest Temperature: {highest_temp.max_temp}°C on "
+            f"{self.format_date_from(highest_temp.date)}")


Check .format() for adding data to strings.
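A sketch of the same line written with str.format(); named placeholders keep the template readable when several values are inserted. format_highest_temperature is a hypothetical helper, and the date text is assumed to be already formatted (for example by format_date_from).

```python
def format_highest_temperature(max_temp, formatted_date):
    # Named placeholders make it obvious which value goes where.
    template = "Highest Temperature: {temp}°C on {date}"
    return template.format(temp=max_temp, date=formatted_date)


print(format_highest_temperature(45, "June 23"))
# Highest Temperature: 45°C on June 23
```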

Khuzaima-bashir · 2022-09-20T12:00:09Z

report_template.py

+        _, month, year = self.get_date_from(report_data[0].date)
+        print("================== Temperature Charts"


Use dictionaries so that data is accessed through keys instead of by position.
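A minimal sketch of returning a dictionary instead of a tuple, so callers write date_parts["month"] rather than unpacking `_, month, year` by position. get_date_parts is a hypothetical replacement for get_date_from, and the "YYYY-M-D" input format is assumed from the PKT column.

```python
def get_date_parts(date_text):
    """Split a 'YYYY-M-D' date string (format assumed) into named parts."""
    year, month, day = date_text.split("-")
    return {"year": int(year), "month": int(month), "day": int(day)}


date_parts = get_date_parts("2012-6-23")
print(date_parts["month"], date_parts["year"])  # 6 2012
```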

Khuzaima-bashir · 2022-09-20T12:01:50Z

report_template.py

+
+class colors:
+    BLUE = '\033[96m'
+    RED = '\033[91m'
+    ENDC = '\033[0m'


Make a separate file to hold all constants.
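A minimal sketch of such a constants module. The file name constants.py is an assumption; the class is renamed from colors to Colors to follow the PEP 8 CapWords convention for classes, and the month list is the Months constant from year_month_matched_file_paths_provider.py moved here. colored is a hypothetical helper, kept in the same block only so the usage runs.

```python
# constants.py (hypothetical module name): shared constants live in one place.
class Colors:
    """ANSI escape codes used by the chart reports."""
    BLUE = '\033[96m'
    RED = '\033[91m'
    ENDC = '\033[0m'


MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


# Usage elsewhere, e.g. in report_template.py after `from constants import Colors`:
def colored(text, color):
    """Wrap text in an ANSI color code and reset the color afterwards."""
    return color + text + Colors.ENDC
```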

Khuzaima-bashir · 2022-09-22T06:35:15Z

main.py

+            type=str,
+            metavar="year/mm",


In type, you can pass a custom function that validates the input.
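argparse calls whatever is passed as type= with the raw string and stores its return value; raising argparse.ArgumentTypeError makes argparse print a usage error and exit. A sketch for the year/mm argument; year_month is a hypothetical function name.

```python
import argparse


def year_month(value):
    """argparse type= callable: accept 'year/mm' (e.g. '2012/6') and return (year, month)."""
    year_text, separator, month_text = value.partition("/")
    if not (separator and year_text.isdecimal() and month_text.isdecimal()):
        raise argparse.ArgumentTypeError(
            "expected year/mm such as 2012/6, got {!r}".format(value))
    month = int(month_text)
    if not 1 <= month <= 12:
        raise argparse.ArgumentTypeError("month must be 1-12, got {}".format(month))
    return int(year_text), month


parser = argparse.ArgumentParser()
parser.add_argument("-c", type=year_month, metavar="year/mm")
args = parser.parse_args(["-c", "2012/6"])
print(args.c)  # (2012, 6)
```

With this in place, the rest of the program receives an already-validated (year, month) tuple.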

Khuzaima-bashir · 2022-09-22T06:40:29Z

report_generator.py

+        print(
+            f"Highest Temperature: {highest_temp.max_temp}°C on "
+            f"{self.format_date_from(highest_temp.date)}"
+        )


Check the .format() method for building strings dynamically.

Khuzaima-bashir · 2022-09-22T06:45:44Z

main.py

+        print(e)
+        print(traceback.format_exc())


raise the exception instead of printing it
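A sketch of re-raising: a bare raise inside except propagates the original exception with its traceback intact. If the traceback still needs to be recorded, logger.exception() from the standard logging module does that in place of the two print calls. run_report is a hypothetical wrapper.

```python
import logging

logger = logging.getLogger(__name__)


def run_report(generate_report):
    """Run a report callable; log a failure, then re-raise instead of swallowing it."""
    try:
        return generate_report()
    except Exception:
        # logger.exception records the message plus the full traceback,
        # replacing print(e) and print(traceback.format_exc()).
        logger.exception("Report generation failed")
        raise  # bare raise re-raises the same exception object
```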

Khuzaima-bashir · 2022-09-22T06:51:24Z

weather_data_analyzer.py

+import csv
+import sys
+from weather_record import WeatherRecord
+from report_types \


You can group imports using (); a backslash (\) continuation is not a very good approach.
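The names imported from report_types are cut off in the review, so this sketch uses standard-library names to show the parenthesized form.

```python
# Parentheses let one import statement span several lines without a backslash.
from os.path import (
    basename,
    join,
    splitext,
)

print(splitext(basename(join("data", "report.txt"))))  # ('report', '.txt')
```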

Khuzaima-bashir · 2022-09-22T06:52:26Z

weather_data_analyzer.py

+        for filepath in filepaths:
+            self.file_data_reader_and_add_to_file_records(filepath)


call this loop inside of a function.

Khuzaima-bashir · 2022-09-22T06:53:15Z

weather_data_analyzer.py

+dummy_record_for_comparison = {
+    'PKT': "2022-08-16",
+    'Mean TemperatureC': None,
+    'Max TemperatureC': -sys.maxsize,
+    'Min TemperatureC': sys.maxsize,
+    ' Min Humidity': sys.maxsize,
+    'Max Humidity': -sys.maxsize,
+    ' Mean Humidity': None
+}


What do we need the dummy record for?

For comparison, to generate report 1 (highest max_temp, lowest min_temp, highest max_humidity).
It is similar to this logic:

```python
import sys

def find_min(array):
    smallest = sys.maxsize  # start above any real value
    for num in array:
        if num < smallest:
            smallest = num
    return smallest
```

To generate report 1, I need the max and min of each column. When comparing the file records, I set the dummy_record as the initial max_record/min_record, just like smallest = sys.maxsize in the code above, and then do the comparison.

I could set the first record from the file as max_record/min_record, but the file data is not consistent: in some files, the first row's Max TemperatureC may be empty/None, which then raises an error: can't compare 'int' with None.

Could you please suggest a better way to handle this logic?
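One possible way to drop the dummy record, sketched under assumptions: filter out the missing values first, then let max() pick the record by key. Record is a simplified, hypothetical stand-in for WeatherRecord with max_temp already converted to int or None. The default= argument of max() covers a file in which every value is missing, so no sentinel such as -sys.maxsize is needed; min() works the same way for the lowest min_temp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Simplified stand-in for WeatherRecord (assumed fields).
    date: str
    max_temp: Optional[int]


def highest_max_temp_record(records):
    """Return the record with the highest max_temp, skipping missing values."""
    valid = (r for r in records if r.max_temp is not None)
    # default=None is returned when no record has a max_temp at all.
    return max(valid, key=lambda r: r.max_temp, default=None)


records = [Record("2012-6-1", None), Record("2012-6-2", 38), Record("2012-6-3", 36)]
print(highest_max_temp_record(records))  # Record(date='2012-6-2', max_temp=38)
```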

Khuzaima-bashir · 2022-09-22T06:55:32Z

year_month_matched_file_paths_provider.py

+                    os.path.join(self.data_folder_path, file_name)
+                )
+                break
+                # since the given month of the given year has only 1 file


avoid comments in code

Khuzaima-bashir · 2022-09-22T06:57:01Z

year_month_matched_file_paths_provider.py

+    def get_matched_file_paths(self):
+        """
+        Returns the filepaths matched with the user given year_month
+
+        :return: List[FilePath]
+        """
+        return self.matched_file_paths_with_year_month


This method is not required: the attribute is already accessible, so a method just to return that data is unnecessary.

Okay, but I added it because accessing the variable directly creates tight coupling:

"Client code will be coupled to the names of your instance variables. Client code should not be coupled to anything internal. Internal things could change in the future. External interface ( contract ) should not be affected by changed internals."

Should I replace the method with direct access to the instance variable, or leave it unchanged?
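In Python, a common middle ground is a read-only @property: callers use plain attribute syntax, while the internal attribute can still be renamed later without changing the public name. A minimal sketch; the class and attribute names are hypothetical, trimmed-down versions of the ones under review.

```python
class YearMonthMatchedFilePathsProvider:
    # Hypothetical, trimmed-down version of the class under review.
    def __init__(self, file_paths):
        # The leading underscore marks this attribute as internal.
        self._matched_file_paths = list(file_paths)

    @property
    def matched_file_paths(self):
        """Public, read-only access; returns a copy so callers can't mutate internals."""
        return list(self._matched_file_paths)


provider = YearMonthMatchedFilePathsProvider(["data/file_2012_Jun.txt"])
print(provider.matched_file_paths)  # ['data/file_2012_Jun.txt']
```

Because the property has no setter, assigning provider.matched_file_paths = [...] raises AttributeError.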

qasimgulzar · 2022-09-22T07:08:29Z

validations.py

+    """
+
+    if not month_number_string.isdigit():
+        raise Exception("Month name should be a valid Number")


You can also raise a custom exception.

qasimgulzar · 2022-09-22T07:09:38Z

validations.py

+
+    month = int(month_number_string)
+    if month < 1 or month > 12:
+        raise Exception("Month name should be valid number from 1 - 12")


Define custom exceptions to handle the invalid-value and missing-value messages.
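A minimal sketch of custom exceptions for the two cases. The exception names are hypothetical, and this version of check_month_is_valid_number also returns the parsed month, which the original may not do. Subclassing ValueError keeps existing except ValueError handlers working.

```python
class MissingMonthError(ValueError):
    """Raised when the month part of year/month is absent."""


class InvalidMonthError(ValueError):
    """Raised when the month part is not a number from 1 to 12."""


def check_month_is_valid_number(month_number_string):
    if not month_number_string:
        raise MissingMonthError("Month is required")
    if not month_number_string.isdecimal():
        raise InvalidMonthError("Month should be a valid number")
    month = int(month_number_string)
    if not 1 <= month <= 12:
        raise InvalidMonthError("Month should be a number from 1 to 12")
    return month
```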

qasimgulzar · 2022-09-22T07:10:23Z

validations.py

+def is_month_exists(split_year_month):
+    """
+    Checks whether the given splitted year_month has month or not
+
+    :param split_year_month:
+    :return: bool
+    """
+


You can use a regex to validate the string pattern.

qasimgulzar · 2022-09-22T07:11:15Z

validations.py

+    """
+    Check whether the given year_month does not contain month
+
+    :param year_month:
+    :return: bool
+    """
+
+    split_year_month = year_month.split("/")
+    return len(split_year_month) <= 1


It's fine and will do the job, but try using a regex instead of splitting strings.
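A sketch of the regex approach: one pattern both validates the input and tells whether a month is present, replacing the split("/") length check. YEAR_MONTH_PATTERN and parse_year_month are hypothetical names.

```python
import re

# A 4-digit year, optionally followed by "/" and a 1- or 2-digit month.
YEAR_MONTH_PATTERN = re.compile(r"(?P<year>\d{4})(?:/(?P<month>\d{1,2}))?")


def parse_year_month(year_month):
    """Return (year, month); month is None when only a year was given."""
    match = YEAR_MONTH_PATTERN.fullmatch(year_month)
    if match is None:
        raise ValueError("Expected YYYY or YYYY/MM, got {!r}".format(year_month))
    month = match.group("month")
    return int(match.group("year")), (int(month) if month else None)


print(parse_year_month("2012"))    # (2012, None)
print(parse_year_month("2012/6"))  # (2012, 6)
```

fullmatch() requires the whole string to match, so trailing junk such as "2012/6x" is rejected.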

qasimgulzar · 2022-09-22T07:11:57Z

weather_data_analyzer.py

+    'PKT': "2022-08-16",
+    'Mean TemperatureC': None,
+    'Max TemperatureC': -sys.maxsize,
+    'Min TemperatureC': sys.maxsize,
+    ' Min Humidity': sys.maxsize,
+    'Max Humidity': -sys.maxsize,
+    ' Mean Humidity': None


Suggested change (remove the leading spaces from the humidity keys):

```diff
     'PKT': "2022-08-16",
     'Mean TemperatureC': None,
     'Max TemperatureC': -sys.maxsize,
     'Min TemperatureC': sys.maxsize,
-    ' Min Humidity': sys.maxsize,
+    'Min Humidity': sys.maxsize,
     'Max Humidity': -sys.maxsize,
-    ' Mean Humidity': None
+    'Mean Humidity': None,
```

We can't do this, because the keys must match the file headers. The files have the headers ' Min Humidity' and ' Mean Humidity' (with a leading space).

One way is to correct the headers of all the files first, but if I correct the headers of the files on my laptop, it will not work for you or any other user.

Then I would say: let's add a data-cleaning step to your script.
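A minimal sketch of such a cleaning step with the standard csv module: csv.DictReader.fieldnames can be reassigned after the header is read, so stripping the names once makes ' Mean Humidity' available as 'Mean Humidity' in every row. read_clean_records is a hypothetical helper name.

```python
import csv
import io


def read_clean_records(file_obj):
    reader = csv.DictReader(file_obj)
    # Data-cleaning step: strip stray spaces from headers such as ' Mean Humidity'.
    if reader.fieldnames:
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return list(reader)


sample = io.StringIO("PKT,Max TemperatureC, Mean Humidity\n2012-6-1,38,\n")
print(read_clean_records(sample))
# [{'PKT': '2012-6-1', 'Max TemperatureC': '38', 'Mean Humidity': ''}]

# With a real file, the csv docs recommend opening it with newline='':
#     with open(filepath, newline='') as file:
#         records = read_clean_records(file)
```

Note that an empty field comes back as '' (not None), and all values are strings until converted.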

qasimgulzar · 2022-09-22T07:15:59Z

weather_data_analyzer.py

+        with open(filepath, 'r') as file:
+            records_list = csv.DictReader(file)
+
+            for row in records_list:
+                record = WeatherRecord(row)
+                self.file_records.append(record)
+


It's fine, but the next step could be using pandas to read and clean the data from the source files.

Thanks. I will try with pandas.

qasimgulzar · 2022-09-22T07:18:25Z

weather_data_analyzer.py

+        """
+        Calculates the average of mean_humidity from the file_records list
+
+        :return: average - float
+        """
+
+        total_sum_of_mean_humidity = 0
+
+        for record in self.file_records:
+
+            if record.mean_humidity:
+                total_sum_of_mean_humidity += int(record.mean_humidity)
+
+        return total_sum_of_mean_humidity / len(self.file_records)


You can make it more Pythonic using a list comprehension.

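Nice. Something like below. We could use mean from Python's statistics library, but the problem is that max_temp in a record can be None instead of an integer, so the missing values are filtered out first:

```python
# Skip missing values (None or ''); convert the CSV strings to int, as the original method does.
max_temps = [int(r.max_temp) for r in self.file_records if r.max_temp]
# Divide by the number of values actually summed, not by the number of records.
average = sum(max_temps) / len(max_temps)
```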

Now it depends on what you want to do with the None values: either remove them in a data-cleaning step, or assign a default with r.get('max_temp', 0) or 0 (for a dict record) or r.max_temp or 0 (for an object).
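If the None values are kept in the records, statistics.mean from the standard library can still be used once they are filtered out. A minimal sketch; average_of is a hypothetical helper name, and it assumes the values are already numbers rather than raw CSV strings.

```python
from statistics import mean


def average_of(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    # statistics.mean raises StatisticsError on an empty list, so guard it.
    return mean(present) if present else None


print(average_of([38, None, 36]))  # 37
```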

qasimgulzar · 2022-09-22T07:21:36Z

weather_record.py

-        self.min_temp = record_item['Mean TemperatureC']
+        self.min_temp = record_item['Min TemperatureC']
        self.max_temp = record_item['Max TemperatureC']
-        self.avg_temp = record_item['Min Temperature']
+        self.mean_temp = record_item['Mean TemperatureC']
        self.min_humidity = record_item[' Min Humidity']
        self.max_humidity = record_item['Max Humidity']


Try to keep variable names consistent with the key names. You can also use pandas for column operations.

Okay, sure. The confusion was that the task document uses the word "average" while the data files use "mean". In this case, which one should I use?

It might be the mean temperature of a day, but I'm not sure.

qasimgulzar · 2022-09-22T07:24:32Z

year_month_matched_file_paths_provider.py

+from validations import check_month_is_valid_number, is_month_exists
+from utils import get_month_year_from
+
+Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
+          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


Suggested change:

```diff
 from validations import check_month_is_valid_number, is_month_exists
 from utils import get_month_year_from
+import calendar
 
-Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
-          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
+Months = list(calendar.month_name)
```

I have tried the library. In the calendar library, the month names are in this format:
['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']. I need them in the three-character format: ['Jan', 'Feb', ...].
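The calendar module also provides calendar.month_abbr, which holds the three-character names in the same layout as month_name (an empty string at index 0). A sketch; the names follow the current locale, which is English under Python's default C locale.

```python
import calendar

# month_abbr, like month_name, has '' at index 0 so that month numbers 1-12
# index it directly; slicing it off leaves exactly 12 entries.
MONTHS = list(calendar.month_abbr)[1:]

print(MONTHS[:3])              # ['Jan', 'Feb', 'Mar']
print(calendar.month_abbr[6])  # Jun
```

Indexing calendar.month_abbr[month] directly may remove the need for a separate list altogether.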

qasimgulzar · 2022-09-22T07:25:23Z

year_month_matched_file_paths_provider.py

+        elif self.year:
+            self.store_file_paths_matched_with_year()
+        else:
+            raise Exception("Year/Month must be Required")


Use a custom exception here.

qasimgulzar · 2022-09-22T07:28:07Z

year_month_matched_file_paths_provider.py

+    def store_file_paths_matched_with_year_and_month(self):
+        """
+        Add the filepaths in the list if filename contains the given month and
+        year
+        """
+
+        for file_name in os.listdir(self.data_folder_path):
+            file_month_year = get_month_year_from(file_name)
+            file_month, file_year = file_month_year.month, file_month_year.year
+
+            if self.month == file_month and self.year == file_year:
+                self.matched_file_paths_with_year_month.append(
+                    os.path.join(self.data_folder_path, file_name)
+                )
+                break
+                # since the given month of the given year has only 1 file


There shouldn't be a need to store a list of the files on disk; you can probably use os.path.exists() to check whether the specified file name exists on disk.

Can you please elaborate on this?
I am storing the file paths that match the user-given year and month in a list. os.path.join is used to store the full path (folder path plus file name) instead of just the file name. Is there a better way to do this?

It seems like you are maintaining a list of file names. My only suggestion: if you know the pattern of the file names, you can generate the name on the fly.
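A sketch of the "generate it on the fly" idea: build the expected file name from the year and month, then check it with os.path.exists() instead of scanning the folder. The "<prefix>_<year>_<Mon>.txt" pattern and the file_prefix parameter are assumptions, since the actual file names are not shown in this review. os.path.abspath makes the returned path absolute, resolved against the current working directory.

```python
import calendar
import os


def find_month_file(data_folder, file_prefix, year, month):
    """Return the path of the file for year/month, or None if it doesn't exist.

    The '<prefix>_<year>_<Mon>.txt' naming pattern is an assumption.
    """
    file_name = "{}_{}_{}.txt".format(file_prefix, year, calendar.month_abbr[month])
    file_path = os.path.abspath(os.path.join(data_folder, file_name))
    return file_path if os.path.exists(file_path) else None
```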

No, no. Actually, the function receives as a parameter the name of the folder that contains all the files.
Then I iterate over those file names to check whether each name contains the same month and year as the user-given self.month and self.year. If it matches, I store the file's full path in the list (since the relative path of the file does not work). Some other class then reads the data from these paths.


mataurrehman · 2022-09-27T02:34:37Z

main.py

+    )
+
+    arg_parser.add_argument(
+        '-c',


Names should be meaningful and show intent; c, cs, e, and a should be changed.

Yes, but these are command-line arguments, and command-line arguments usually consist of one letter,
like python3 main.py -c 2012/6. On Linux, we also pass one-letter arguments, e.g. ls -a.
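argparse does not force a choice between the two: one argument can have both a short flag and a descriptive long name (GNU ls likewise accepts both -a and --all). A sketch; --chart is a hypothetical long name. When a long option is given, argparse derives the attribute name from it, so the code reads args.chart instead of args.c.

```python
import argparse

parser = argparse.ArgumentParser()
# Keep the short flag for quick typing and add a long name that shows intent;
# argparse stores the value under the long name ("chart").
parser.add_argument("-c", "--chart", metavar="year/mm")

args = parser.parse_args(["-c", "2012/6"])
print(args.chart)  # 2012/6
```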

mataurrehman · 2022-09-27T02:39:18Z

main.py

-    file_path_matcher = FilePathMatcherWithDate(weather_data_files_path)
-    file_path_matcher.setDate(year)
-    matched_file_paths_with_year = file_path_matcher.get_files_path()
+    if args.cs:


Args c and cs are identical until get_chart_data is called, so this logic should be combined: e.g., make a list [args.c, args.cs], loop over it, and if the value is not None, do the rest.

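Like this:

```python
# report_generator is assumed to be the ReportGenerator instance that draws the charts.
# Pairing each argument with its chart method avoids comparing values (args.c == value),
# which would draw both charts if -c and -cs were given the same date.
chart_requests = [
    (args.c, report_generator.gen_report_high_low_temperature_charts),
    (args.cs, report_generator.gen_report_high_low_temperature_single_line_chart),
]
for value, draw_chart in chart_requests:
    if value is not None:
        data_provider = ReportDataProvider(value, weather_data_folder_path)
        report_data = data_provider.get_chart_data()
        draw_chart(report_data)
```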

mataurrehman · 2022-09-27T02:42:42Z

report_generator.py

+        for record in report_data:
+            self.draw_2_charts_of_high_low_temp_for_1_day(record)
+
+    def draw_2_charts_of_high_low_temp_for_1_day(self, record: WeatherRecord):


Instead of writing 1 and 2 in function names, use one and two.

mataurrehman · 2022-09-27T02:43:00Z

report_generator.py

+        for record in report_data:
+            self.draw_1_chart_of_high_low_temp_for_1_day(record)
+
+    def draw_1_chart_of_high_low_temp_for_1_day(self, record: WeatherRecord):


Change the function name.

mataurrehman · 2022-09-27T02:50:13Z

report_generator.py

+        """
+
+        date = self.get_date_from(record.date)
+        formatted_day = self.append_zero_to_start_in_day(date.day)


The method should not be specific to days; add_leading_zero is one suggestion for the name.

mataurrehman · 2022-09-27T02:50:54Z

report_generator.py

+        print("+" * self.get_integer_value_from(temperature_string), end="")
+
+    def append_zero_to_start_in_day(self, day):
+        return '0' + day if int(day) < 10 else day


Check str.zfill().
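A sketch combining both suggestions: a general add_leading_zero (the suggested name) built on str.zfill, which pads a string with zeros on the left up to the given width and leaves longer strings unchanged. The width parameter is an addition for illustration.

```python
def add_leading_zero(number, width=2):
    """Pad a number (or numeric string) with leading zeros, e.g. '6' -> '06'."""
    return str(number).zfill(width)


print(add_leading_zero("6"))   # 06
print(add_leading_zero("23"))  # 23
```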

mataurrehman · 2022-09-27T02:57:26Z

weather_data_analyzer.py

+        )
+
+    def remove_none_values_of_max_temp(self):
+        """


These multiple methods can be converted into a single one: pass it the relevant data, and it should remove the None values from it.

I think they can't be converted into a single loop. Our data looks like this:

| max_temp | min_temp | max_humidity | mean_humidity |
|----------|----------|--------------|---------------|
| None     | 3        | None         | 20            |
| 38       | None     | None         | None          |
| None     | None     | 100          | None          |
| 36       | 2        | 95           | 50            |

```python
ans = []
for rec in table_records:
    if rec.max_temp and rec.min_temp and rec.max_humidity and rec.mean_humidity:
        ans.append(rec)
```

If I use a single loop with a single list ans to remove None values, only one row of the table is left: the last row, in which no column is None (| 36 | 2 | 95 | 50 |).

To calculate the highest max_temp, I need all valid values of the max_temp column, i.e. [38, 36]. The single-loop technique above drops the max_temp 38 (2nd row) because the other columns in that row are None.

The other option is a single loop with four different lists:

```python
max_temp, min_temp, max_humid, mean_humid = [], [], [], []
for rec in table_records:
    if rec.max_temp:
        max_temp.append(rec)
    if rec.min_temp:
        min_temp.append(rec)
    # ...
```

That will clutter the code, and four lists will have to be returned (in a tuple/dict), so that's why I have used four different functions. Let me know if the single loop with four lists is better in some way; then I will go for it.
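A third option, close to what the review suggests: keep one filtering pass per column, but write it once and pass the column name as a parameter. A sketch; remove_none_values mirrors the name in the discussion, and SimpleNamespace stands in for WeatherRecord with the table above as data. Each report calls it with the column it needs, so no row is dropped because of a None in a different column.

```python
from types import SimpleNamespace


def remove_none_values(records, field_name):
    """Keep the records whose `field_name` attribute is present (not None)."""
    return [rec for rec in records if getattr(rec, field_name) is not None]


table_records = [
    SimpleNamespace(max_temp=None, min_temp=3, max_humidity=None, mean_humidity=20),
    SimpleNamespace(max_temp=38, min_temp=None, max_humidity=None, mean_humidity=None),
    SimpleNamespace(max_temp=None, min_temp=None, max_humidity=100, mean_humidity=None),
    SimpleNamespace(max_temp=36, min_temp=2, max_humidity=95, mean_humidity=50),
]

valid_max_temps = remove_none_values(table_records, "max_temp")
print([rec.max_temp for rec in valid_max_temps])  # [38, 36]
```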

mataurrehman · 2022-09-27T02:59:40Z

weather_data_analyzer.py

+        """
+
+        valid_max_temps = [
+            rec for rec in self.file_records if rec.max_temp


An alternative way to remove None values is:

```python
return list(filter(lambda rec: rec.max_temp is not None, self.file_records))
```


hinakhadim added 7 commits September 15, 2022 18:45

feat : Refactored and complete Task1

5701c50

feat : generate average temperature and humidity report

e0f7285

refactor : Refactor main.py

14f6530

fix : fixes bugs of functionality in task1 and task2

2893374

feat: added single and double chart for temperature

0fb0b94

feat : improve logic of task 1 : convert the sorting O(nlogn) to single loop O(n)

386547e

feat: write readme.md, add error handling cases if date is alphabets, remove errors by testing with different cases

a4ddf1c

Khuzaima-bashir suggested changes Sep 20, 2022


hinakhadim added 3 commits September 20, 2022 20:06

fix: improving arguments validation using argparser

315d096

fix : refactor the requested review, rename files, classes, functions

a04bcb8

fix: Add docStrings in functions

54a51f0

Khuzaima-bashir suggested changes Sep 22, 2022


qasimgulzar reviewed Sep 22, 2022


hinakhadim added 3 commits September 22, 2022 20:10

fix : Refactored the code - changes string to .format(), improve imports

6c6967e

feat : Add custom exceptions, list comprehension, regex searching;

92152f2

fix: improve min/max logic / refactored high_mex_temp, low_min_temp, high_max_humidity

6d81919

mataurrehman reviewed Sep 27, 2022


hinakhadim added 2 commits September 27, 2022 16:35

refactor: architect the report generator file again, remove none function rewritten, add python cleaner file to trim spaces of headers

4282fe9

fix : refactored remove_none_values func and average functions

0f2bb82


hinakhadim commented Sep 19, 2022
