Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hina/refactoring - Final Branch of work #5

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

hinakhadim
Copy link
Owner

This is the last branch. In this branch, all the codes for every question is presented. Along with solution of every task, I have done refactoring of task too.

Comment on lines 9 to 16
def generate_report_for(self, flag, value, data_folder):
match flag:
case '-e':
self.generateReportHighLowTempHumidity(value, data_folder)
case '-a':
self.generateReportAverageTempHumidity(value, data_folder)
case '-c':
self.generateTemperatureChartReport(value, data_folder)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow a common approach for Method naming

Comment on lines 20 to 21
case _:
print("No flag sent")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of wrong input use validation and return an exception

Comment on lines 29 to 30
statistics_calculator = WeatherStatisticsCalculator(
matched_file_paths_with_date)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactoring accroding to pep-8

report_template_of = ReportTemplate()
report_template_of.highest_lowest_temp_and_humidity(report_data)

# date = year/month

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code should be readable enough that there is not need of code

Comment on lines 30 to 32
print(
f"Highest Temperature: {highest_temp.max_temp}°C on "
f"{self.format_date_from(highest_temp.date)}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check .format() to add data to strings

Comment on lines 75 to 76
_, month, year = self.get_date_from(report_data[0].date)
print("================== Temperature Charts"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use dictionaries to get data through keys

Comment on lines 138 to 142

class colors:
BLUE = '\033[96m'
RED = '\033[91m'
ENDC = '\033[0m'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a seperate file to handle all constants

main.py Outdated
Comment on lines 53 to 54
type=str,
metavar="year/mm",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in type you can add custom validation to validate input

Comment on lines 34 to 37
print(
f"Highest Temperature: {highest_temp.max_temp}°C on "
f"{self.format_date_from(highest_temp.date)}"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check .format() method for dynamically editing strings

main.py Outdated
Comment on lines 105 to 106
print(e)
print(traceback.format_exc())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise the exception instead of printing it

import csv
import sys
from weather_record import WeatherRecord
from report_types \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can couple import using (), \ is not a very good approach

Comment on lines 28 to 29
for filepath in filepaths:
self.file_data_reader_and_add_to_file_records(filepath)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this loop inside of a function.

Comment on lines 7 to 15
dummy_record_for_comparison = {
'PKT': "2022-08-16",
'Mean TemperatureC': None,
'Max TemperatureC': -sys.maxsize,
'Min TemperatureC': sys.maxsize,
' Min Humidity': sys.maxsize,
'Max Humidity': -sys.maxsize,
' Mean Humidity': None
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for what do we need dummy record for?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for comparison. To generate report 1 (highest max_temp, lowest min_temp, highest max_humidity)
Actually, It is similar to logic:

min = sys.maxsize

for num in array:
       if num < min: 
             min = num

return min

To generate report 1, I need max, min of columns. When comparing with the file records, i set the dummy_record as the first/initial max_record/min_record as min = sys.maxsize in the above code example and then doing the comparison.

I can set the first record from the file as max_record/min_record, but files data are not consistent. Means, It may possible that for a file, the first row Max TemperatureC is empty/None. which then gives errorError: can't compare 'int' with None.

Can you please tell me another way to improve that thing/logic?

os.path.join(self.data_folder_path, file_name)
)
break
# since the given month of the given year has only 1 file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid comments in code

Comment on lines +81 to +87
def get_matched_file_paths(self):
"""
Returns the filepaths matched with the user given year_month

:return: List[FilePath]
"""
return self.matched_file_paths_with_year_month

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not method is not required, the class variable is already accessible within class, so making a method to get that data is not required

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. but I have this in mind because getting value through variable directly provides tight coupling.

"Client code will be coupled to the names of your instance variables. Client code should not be coupled to anything internal. Internal things could change in the future. External interface ( contract ) should not be affected by changed internals."

Do I have to replace the method with instance variable or have to go with no changing?

validations.py Outdated
"""

if not month_number_string.isdigit():
raise Exception("Month name should be a valid Number")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also raise custom exception

validations.py Outdated

month = int(month_number_string)
if month < 1 or month > 12:
raise Exception("Month name should be valid number from 1 - 12")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define custom exception to handle invalid and missing value messages.

validations.py Outdated
Comment on lines 31 to 38
def is_month_exists(split_year_month):
"""
Checks whether the given splitted year_month has month or not

:param split_year_month:
:return: bool
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use Regex to validate string pattern

validations.py Outdated
Comment on lines 43 to 51
"""
Check whether the given year_month does not contain month

:param year_month:
:return: bool
"""

split_year_month = year_month.split("/")
return len(split_year_month) <= 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its fine it will do the job, but try using Regex to except of splitting strings

Comment on lines 8 to 14
'PKT': "2022-08-16",
'Mean TemperatureC': None,
'Max TemperatureC': -sys.maxsize,
'Min TemperatureC': sys.maxsize,
' Min Humidity': sys.maxsize,
'Max Humidity': -sys.maxsize,
' Mean Humidity': None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'PKT': "2022-08-16",
'Mean TemperatureC': None,
'Max TemperatureC': -sys.maxsize,
'Min TemperatureC': sys.maxsize,
' Min Humidity': sys.maxsize,
'Max Humidity': -sys.maxsize,
' Mean Humidity': None
'PKT': "2022-08-16",
'Mean TemperatureC': None,
'Max TemperatureC': -sys.maxsize,
'Min TemperatureC': sys.maxsize,
'Min Humidity': sys.maxsize,
'Max Humidity': -sys.maxsize,
'Mean Humidity': None,

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this. Because the fields are set according to the files header. The files has header ' Min TemperatureC', ' Mean Humidity'.

One way is to correct the headers of all files first but If i correct the header of files on my laptop, then it will not work for you or any other user.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I would say let's add data cleaning step in your script.

Comment on lines 38 to 44
with open(filepath, 'r') as file:
records_list = csv.DictReader(file)

for row in records_list:
record = WeatherRecord(row)
self.file_records.append(record)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its fine but the next step can be of using pandas to read and clean data from source files.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will try with pandas.

Comment on lines 176 to 189
"""
Calculates the average of mean_humidity from the file_records list

:return: average - float
"""

total_sum_of_mean_humidity = 0

for record in self.file_records:

if record.mean_humidity:
total_sum_of_mean_humidity += int(record.mean_humidity)

return total_sum_of_mean_humidity / len(self.file_records)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make it more pythonic using array comprehension

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Something like below:
We can use the 'mean' from python lib statistics. but the problem is that max_temp in a record can be None instead of integer.

       max_temps = [r.max_temp for r in self.file_records if r.max_temp]
       total = sum(max_temps)
       return total / len(self.file_records)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it depends what you want to do with None records, either remove these values in data cleaning otherwise you can use r.get('max_temp',0) or 0 to assign default values.

Comment on lines 6 to 10
self.min_temp = record_item['Mean TemperatureC']
self.min_temp = record_item['Min TemperatureC']
self.max_temp = record_item['Max TemperatureC']
self.avg_temp = record_item['Min Temperature']
self.mean_temp = record_item['Mean TemperatureC']
self.min_humidity = record_item[' Min Humidity']
self.max_humidity = record_item['Max Humidity']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to keep variable and keys naming convention same. and you can also use pandas for column operations

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sure. the confusion was that the task document use the word "average" and data files has the word 'mean'. So, in this case which one i have to use?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be mean temperature of a day but not sure

Comment on lines 3 to 7
from validations import check_month_is_valid_number, is_month_exists
from utils import get_month_year_from

Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from validations import check_month_is_valid_number, is_month_exists
from utils import get_month_year_from
Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
from validations import check_month_is_valid_number, is_month_exists
from utils import get_month_year_from
import calendar
Months = list(calendar.month_name)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried the library. The months name are in this format in Calendar Library:
['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'] . I need this in month_name(3 chars) format : ['Jan', 'Feb'.....].

elif self.year:
self.store_file_paths_matched_with_year()
else:
raise Exception("Year/Month must be Required")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom exception

Comment on lines 64 to 79
def store_file_paths_matched_with_year_and_month(self):
"""
Add the filepaths in the list if filename contains the given month and
year
"""

for file_name in os.listdir(self.data_folder_path):
file_month_year = get_month_year_from(file_name)
file_month, file_year = file_month_year.month, file_month_year.year

if self.month == file_month and self.year == file_year:
self.matched_file_paths_with_year_month.append(
os.path.join(self.data_folder_path, file_name)
)
break
# since the given month of the given year has only 1 file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there shouldn't be a need to store list of files on disk probably you can use os.exists() to check if specified filename exists on disk.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate this?
I am storing file paths (which matched with user given year and month) in an array. The os.path.join is using to store the full (absolute) file path instead of filename. Is there better way to do this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like you are maintaining a list of filenames, the only suggestion is if you know the pattern for filenames you can generate it on the go

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nahi nahi. Actually, I am getting folder name as param in func which has all files.
Then iterating on those file names to check whether tthe file name contains the same month and year as the user given self.month and self.year . If matched, then i store the absolute path of file in tthe array (since relative path of file does not work.). from these absolute file paths, i am reading data in some other class.

main.py Outdated
)

arg_parser.add_argument(
'-c',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name should be meaningful, it should show the intent. c, cs, e, a these should be changed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but these are arguments given from command line. Mostly cmd arguments consists of one letter.
like : python3 main.py -c 2012/6. In linux, we also give arguments in one letter ls -a.

main.py Outdated
file_path_matcher = FilePathMatcherWithDate(weather_data_files_path)
file_path_matcher.setDate(year)
matched_file_paths_with_year = file_path_matcher.get_files_path()
if args.cs:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args c and cs are identical till calling get_chart_data, this logic should be combined, e.g make a list [args.c, args.cs] and loop over it and check if the value is not None then do the rest.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this:

for value in [args.c, args.cs]:

     if value is not None:
           data_provider = ReportDataProvider( value , weather_data_folder_path )
           report_data = data_provider.get_chart_data()
    
          if args.c == value:
                report_data.gen_report_high_low_temperature_charts( report_data)
          if args.cs == value:
               report_data.gen_report_high_low_temperature_single_line_chart( report_data )
   

for record in report_data:
self.draw_2_charts_of_high_low_temp_for_1_day(record)

def draw_2_charts_of_high_low_temp_for_1_day(self, record: WeatherRecord):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of writing 1 and 2 in function names, it should be one and two

for record in report_data:
self.draw_1_chart_of_high_low_temp_for_1_day(record)

def draw_1_chart_of_high_low_temp_for_1_day(self, record: WeatherRecord):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change function name.

"""

date = self.get_date_from(record.date)
formatted_day = self.append_zero_to_start_in_day(date.day)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method should not be specific to day, add_leading_zero is a suggestion.

print("+" * self.get_integer_value_from(temperature_string), end="")

def append_zero_to_start_in_day(self, day):
return '0' + day if int(day) < 10 else day

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check zfill

)

def remove_none_values_of_max_temp(self):
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these multiple methods can be converted to single, pass the relevant data and it should remove the None values from it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think They can't be converted into single loop. Our data is like :

| max_temp | min_temp | max_humidity | mean_humidity |
|----------|----------|--------------|---------------|
| None     | 3        | None         | 20            |
| 38       | None     | None         | None          |
| None     | None     | 100          | None          |
| 36       | 2        | 95           | 50            |
ans = []

for rec in table_records:
     if rec.max_temp and rec.min_temp and rec.max_humidity and rec.mean_humidity:
           ans.append(rec)

If i use a single loop with single array ans to remove None values, then I will get the single row left from table (the last row in which all values of each column are not None) | 36 | 2 | 95 | 50 |.

To calculate the highest max_temp, I need all valid values of max_temp column i.e. [38, 36]. The above single loop technique is removing the max_temp 38 - 2nd row because other columns values in this row are None.

The other option is that use single loop with four different arrays:

max_temp = [], min_temp = [], max_humid = [], mean_humid = []

for rec in table_records:
     if rec.max_temp:
          max_temp.append(rec)

    if rec.min_temp:
          min_temp.append(rec)

...........

It will mess up the code and four arrays will return (in tuple/dict). so that's why i have used four different functions. Let me know, if the above four array with single loop is good in a way, then i will go for it.

"""

valid_max_temps = [
rec for rec in self.file_records if rec.max_temp

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternate way to remove None can be like this

return list(filter(lambda rec: rec.max_temp is not None, self. file_records))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants