-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hina/refactoring - Final Branch of work #5
base: main
Are you sure you want to change the base?
Conversation
… remove errors by testing with different cases
report_generator.py
Outdated
def generate_report_for(self, flag, value, data_folder): | ||
match flag: | ||
case '-e': | ||
self.generateReportHighLowTempHumidity(value, data_folder) | ||
case '-a': | ||
self.generateReportAverageTempHumidity(value, data_folder) | ||
case '-c': | ||
self.generateTemperatureChartReport(value, data_folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow a common approach for Method naming
report_generator.py
Outdated
case _: | ||
print("No flag sent") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of wrong input use validation and return an exception
report_generator.py
Outdated
statistics_calculator = WeatherStatisticsCalculator( | ||
matched_file_paths_with_date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactoring accroding to pep-8
report_generator.py
Outdated
report_template_of = ReportTemplate() | ||
report_template_of.highest_lowest_temp_and_humidity(report_data) | ||
|
||
# date = year/month |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code should be readable enough that there is not need of code
report_template.py
Outdated
print( | ||
f"Highest Temperature: {highest_temp.max_temp}°C on " | ||
f"{self.format_date_from(highest_temp.date)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check .format() to add data to strings
report_template.py
Outdated
_, month, year = self.get_date_from(report_data[0].date) | ||
print("================== Temperature Charts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use dictionaries to get data through keys
report_template.py
Outdated
|
||
class colors: | ||
BLUE = '\033[96m' | ||
RED = '\033[91m' | ||
ENDC = '\033[0m' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a seperate file to handle all constants
main.py
Outdated
type=str, | ||
metavar="year/mm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in type you can add custom validation to validate input
report_generator.py
Outdated
print( | ||
f"Highest Temperature: {highest_temp.max_temp}°C on " | ||
f"{self.format_date_from(highest_temp.date)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check .format() method for dynamically editing strings
main.py
Outdated
print(e) | ||
print(traceback.format_exc()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise the exception instead of printing it
weather_data_analyzer.py
Outdated
import csv | ||
import sys | ||
from weather_record import WeatherRecord | ||
from report_types \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can couple import using (), \ is not a very good approach
weather_data_analyzer.py
Outdated
for filepath in filepaths: | ||
self.file_data_reader_and_add_to_file_records(filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this loop inside of a function.
weather_data_analyzer.py
Outdated
dummy_record_for_comparison = { | ||
'PKT': "2022-08-16", | ||
'Mean TemperatureC': None, | ||
'Max TemperatureC': -sys.maxsize, | ||
'Min TemperatureC': sys.maxsize, | ||
' Min Humidity': sys.maxsize, | ||
'Max Humidity': -sys.maxsize, | ||
' Mean Humidity': None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for what do we need dummy record for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for comparison. To generate report 1 (highest max_temp, lowest min_temp, highest max_humidity)
Actually, It is similar to logic:
min = sys.maxsize
for num in array:
if num < min:
min = num
return min
To generate report 1, I need max, min of columns. When comparing with the file records, i set the dummy_record
as the first/initial max_record/min_record
as min = sys.maxsize
in the above code example and then doing the comparison.
I can set the first record from the file as max_record/min_record
, but files data are not consistent. Means, It may possible that for a file, the first row Max TemperatureC is empty/None
. which then gives errorError: can't compare 'int' with None.
Can you please tell me another way to improve that thing/logic?
os.path.join(self.data_folder_path, file_name) | ||
) | ||
break | ||
# since the given month of the given year has only 1 file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid comments in code
def get_matched_file_paths(self): | ||
""" | ||
Returns the filepaths matched with the user given year_month | ||
|
||
:return: List[FilePath] | ||
""" | ||
return self.matched_file_paths_with_year_month |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not method is not required, the class variable is already accessible within class, so making a method to get that data is not required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. but I have this in mind because getting value through variable directly provides tight coupling.
"Client code will be coupled to the names of your instance variables. Client code should not be coupled to anything internal. Internal things could change in the future. External interface ( contract ) should not be affected by changed internals."
Do I have to replace the method with instance variable or have to go with no changing?
validations.py
Outdated
""" | ||
|
||
if not month_number_string.isdigit(): | ||
raise Exception("Month name should be a valid Number") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also raise custom exception
validations.py
Outdated
|
||
month = int(month_number_string) | ||
if month < 1 or month > 12: | ||
raise Exception("Month name should be valid number from 1 - 12") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define custom exception to handle invalid and missing value messages.
validations.py
Outdated
def is_month_exists(split_year_month): | ||
""" | ||
Checks whether the given splitted year_month has month or not | ||
|
||
:param split_year_month: | ||
:return: bool | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use Regex to validate string pattern
validations.py
Outdated
""" | ||
Check whether the given year_month does not contain month | ||
|
||
:param year_month: | ||
:return: bool | ||
""" | ||
|
||
split_year_month = year_month.split("/") | ||
return len(split_year_month) <= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its fine it will do the job, but try using Regex to except of splitting strings
weather_data_analyzer.py
Outdated
'PKT': "2022-08-16", | ||
'Mean TemperatureC': None, | ||
'Max TemperatureC': -sys.maxsize, | ||
'Min TemperatureC': sys.maxsize, | ||
' Min Humidity': sys.maxsize, | ||
'Max Humidity': -sys.maxsize, | ||
' Mean Humidity': None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'PKT': "2022-08-16", | |
'Mean TemperatureC': None, | |
'Max TemperatureC': -sys.maxsize, | |
'Min TemperatureC': sys.maxsize, | |
' Min Humidity': sys.maxsize, | |
'Max Humidity': -sys.maxsize, | |
' Mean Humidity': None | |
'PKT': "2022-08-16", | |
'Mean TemperatureC': None, | |
'Max TemperatureC': -sys.maxsize, | |
'Min TemperatureC': sys.maxsize, | |
'Min Humidity': sys.maxsize, | |
'Max Humidity': -sys.maxsize, | |
'Mean Humidity': None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't do this. Because the fields are set according to the files header. The files has header ' Min TemperatureC'
, ' Mean Humidity'
.
One way is to correct the headers of all files first but If i correct the header of files on my laptop, then it will not work for you or any other user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I would say let's add data cleaning step in your script.
weather_data_analyzer.py
Outdated
with open(filepath, 'r') as file: | ||
records_list = csv.DictReader(file) | ||
|
||
for row in records_list: | ||
record = WeatherRecord(row) | ||
self.file_records.append(record) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its fine but the next step can be of using pandas to read and clean data from source files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will try with pandas.
weather_data_analyzer.py
Outdated
""" | ||
Calculates the average of mean_humidity from the file_records list | ||
|
||
:return: average - float | ||
""" | ||
|
||
total_sum_of_mean_humidity = 0 | ||
|
||
for record in self.file_records: | ||
|
||
if record.mean_humidity: | ||
total_sum_of_mean_humidity += int(record.mean_humidity) | ||
|
||
return total_sum_of_mean_humidity / len(self.file_records) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make it more pythonic using array comprehension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Something like below:
We can use the 'mean' from python lib statistics. but the problem is that max_temp in a record can be None instead of integer.
max_temps = [r.max_temp for r in self.file_records if r.max_temp]
total = sum(max_temps)
return total / len(self.file_records)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it depends what you want to do with None records, either remove these values in data cleaning otherwise you can use r.get('max_temp',0) or 0
to assign default values.
weather_record.py
Outdated
self.min_temp = record_item['Mean TemperatureC'] | ||
self.min_temp = record_item['Min TemperatureC'] | ||
self.max_temp = record_item['Max TemperatureC'] | ||
self.avg_temp = record_item['Min Temperature'] | ||
self.mean_temp = record_item['Mean TemperatureC'] | ||
self.min_humidity = record_item[' Min Humidity'] | ||
self.max_humidity = record_item['Max Humidity'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to keep variable and keys naming convention same. and you can also use pandas for column operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay sure. the confusion was that the task document use the word "average" and data files has the word 'mean'. So, in this case which one i have to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be mean temperature
of a day but not sure
from validations import check_month_is_valid_number, is_month_exists | ||
from utils import get_month_year_from | ||
|
||
Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', | ||
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from validations import check_month_is_valid_number, is_month_exists | |
from utils import get_month_year_from | |
Months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', | |
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] | |
from validations import check_month_is_valid_number, is_month_exists | |
from utils import get_month_year_from | |
import calendar | |
Months = list(calendar.month_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried the library. The months name are in this format in Calendar Library:
['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
. I need this in month_name(3 chars) format : ['Jan', 'Feb'.....].
elif self.year: | ||
self.store_file_paths_matched_with_year() | ||
else: | ||
raise Exception("Year/Month must be Required") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
custom exception
def store_file_paths_matched_with_year_and_month(self): | ||
""" | ||
Add the filepaths in the list if filename contains the given month and | ||
year | ||
""" | ||
|
||
for file_name in os.listdir(self.data_folder_path): | ||
file_month_year = get_month_year_from(file_name) | ||
file_month, file_year = file_month_year.month, file_month_year.year | ||
|
||
if self.month == file_month and self.year == file_year: | ||
self.matched_file_paths_with_year_month.append( | ||
os.path.join(self.data_folder_path, file_name) | ||
) | ||
break | ||
# since the given month of the given year has only 1 file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there shouldn't be a need to store list of files on disk probably you can use os.exists() to check if specified filename exists on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate this?
I am storing file paths (which matched with user given year and month) in an array. The os.path.join is using to store the full (absolute) file path instead of filename. Is there better way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like you are maintaining a list of filenames, the only suggestion is if you know the pattern for filenames you can generate it on the go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nahi nahi. Actually, I am getting folder name as param in func which has all files.
Then iterating on those file names to check whether tthe file name contains the same month and year as the user given self.month and self.year . If matched, then i store the absolute path of file in tthe array (since relative path of file does not work.). from these absolute file paths, i am reading data in some other class.
main.py
Outdated
) | ||
|
||
arg_parser.add_argument( | ||
'-c', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name should be meaningful, it should show the intent. c, cs, e, a
these should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but these are arguments given from command line. Mostly cmd arguments consists of one letter.
like : python3 main.py -c 2012/6
. In linux, we also give arguments in one letter ls -a
.
main.py
Outdated
file_path_matcher = FilePathMatcherWithDate(weather_data_files_path) | ||
file_path_matcher.setDate(year) | ||
matched_file_paths_with_year = file_path_matcher.get_files_path() | ||
if args.cs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
args c
and cs
are identical till calling get_chart_data
, this logic should be combined, e.g make a list [args.c, args.cs]
and loop over it and check if the value is not None then do the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like this:
for value in [args.c, args.cs]:
if value is not None:
data_provider = ReportDataProvider( value , weather_data_folder_path )
report_data = data_provider.get_chart_data()
if args.c == value:
report_data.gen_report_high_low_temperature_charts( report_data)
if args.cs == value:
report_data.gen_report_high_low_temperature_single_line_chart( report_data )
report_generator.py
Outdated
for record in report_data: | ||
self.draw_2_charts_of_high_low_temp_for_1_day(record) | ||
|
||
def draw_2_charts_of_high_low_temp_for_1_day(self, record: WeatherRecord): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of writing 1
and 2
in function names, it should be one
and two
report_generator.py
Outdated
for record in report_data: | ||
self.draw_1_chart_of_high_low_temp_for_1_day(record) | ||
|
||
def draw_1_chart_of_high_low_temp_for_1_day(self, record: WeatherRecord): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change function name.
report_generator.py
Outdated
""" | ||
|
||
date = self.get_date_from(record.date) | ||
formatted_day = self.append_zero_to_start_in_day(date.day) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the method should not be specific to day, add_leading_zero
is a suggestion.
report_generator.py
Outdated
print("+" * self.get_integer_value_from(temperature_string), end="") | ||
|
||
def append_zero_to_start_in_day(self, day): | ||
return '0' + day if int(day) < 10 else day |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check zfill
weather_data_analyzer.py
Outdated
) | ||
|
||
def remove_none_values_of_max_temp(self): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these multiple methods can be converted to single, pass the relevant data and it should remove the None values from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think They can't be converted into single loop. Our data is like :
| max_temp | min_temp | max_humidity | mean_humidity |
|----------|----------|--------------|---------------|
| None | 3 | None | 20 |
| 38 | None | None | None |
| None | None | 100 | None |
| 36 | 2 | 95 | 50 |
ans = []
for rec in table_records:
if rec.max_temp and rec.min_temp and rec.max_humidity and rec.mean_humidity:
ans.append(rec)
If i use a single loop with single array ans
to remove None values, then I will get the single row left from table (the last row in which all values of each column are not None) | 36 | 2 | 95 | 50 |
.
To calculate the highest max_temp
, I need all valid values of max_temp
column i.e. [38, 36]. The above single loop technique is removing the max_temp 38 - 2nd row
because other columns values in this row are None.
The other option is that use single loop with four different arrays:
max_temp = [], min_temp = [], max_humid = [], mean_humid = []
for rec in table_records:
if rec.max_temp:
max_temp.append(rec)
if rec.min_temp:
min_temp.append(rec)
...........
It will mess up the code and four arrays will return (in tuple/dict). so that's why i have used four different functions. Let me know, if the above four array with single loop is good in a way, then i will go for it.
weather_data_analyzer.py
Outdated
""" | ||
|
||
valid_max_temps = [ | ||
rec for rec in self.file_records if rec.max_temp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an alternate way to remove None can be like this
return list(filter(lambda rec: rec.max_temp is not None, self. file_records))
…tion rewritten, add python cleaner file to trim spaces of headers
This is the last branch. In this branch, all the codes for every question is presented. Along with solution of every task, I have done refactoring of task too.