-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/extract timespan #37
base: dev
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #37 +/- ##
=====================================
Coverage ? 0.00%
=====================================
Files ? 65
Lines ? 16825
Branches ? 0
=====================================
Hits ? 0
Misses ? 16825
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
29e2bd3
to
37abb70
Compare
|
||
def extract_timespan_en(text, time_unit=TimespanUnit.TIMEDELTA, replace_token=""): | ||
""" | ||
Convert an english phrase into a number of seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just extract_duration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but it changes return type and adds the replace token, its mainly a compatibility thing so extract_duration always means the same (mycroft) thing
@coderabbitai review |
Actions performedReview triggered.
|
WalkthroughThe recent changes introduce new functions and updates across multiple language modules to enhance text parsing capabilities in Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 16
Outside diff range, codebase verification and nitpick comments (4)
lingua_franca/lang/parse_de.py (1)
Line range hint
1267-1269
: Refactor to useGermanNormalizer
.The TODO comment indicates that the function should use
GermanNormalizer
for normalization.- def normalize_de(text, remove_articles=True): - """ German string normalization """ - # TODO return GermanNormalizer().normalize(text, remove_articles) - words = text.split() # this also removed extra spaces - normalized = "" - for word in words: - if remove_articles and word in ["der", "die", "das", "des", "den", - "dem"]: - continue - - # Expand common contractions, e.g. "isn't" -> "is not" - contraction = ["net", "nett"] - if word in contraction: - expansion = ["nicht", "nicht"] - word = expansion[contraction.index(word)] - - # Convert numbers into digits, e.g. "two" -> "2" - - if word in _DE_NUMBERS: - word = str(_DE_NUMBERS[word]) - - normalized += " " + word - - return normalized[1:] # strip the initial space + def normalize_de(text, remove_articles=True): + return GermanNormalizer().normalize(text, remove_articles)Tools
Ruff
1267-1267: Redefinition of unused
normalize_de
from line 1218(F811)
lingua_franca/lang/parse_en.py (1)
1741-1757
: Suggestion: Improve the text normalization comment.The comment about the text normalization hack can be improved for clarity.
- # NOTE: above is a nasty hack, _convert_words_to_numbers normalized the - # string so we can do this, but this is essentially incorrect since each - # replaced number word should be replaced with a single replace_token + # NOTE: The above is a workaround. _convert_words_to_numbers normalizes the + # string, allowing us to replace two words (number and unit) with a single + # replace_token. Ideally, each number word should be replaced with a single + # replace_token.test/unittests/test_parse_en.py (2)
600-606
: Ensure consistency in test cases.The
test_extract_duration
method already covers some cases for months and years. Ensure consistency and avoid redundancy.- self.assertEqual(extract_duration("1 month"), - (timedelta(days=DAYS_IN_1_MONTH), "")) - self.assertEqual( - extract_duration("1 month"), - (timedelta(days=DAYS_IN_1_MONTH), ""))
Line range hint
2521-2527
: Consider localization for gender tests.The gender tests are not localized. Consider whether localization is needed for English.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- lingua_franca/lang/parse_de.py (6 hunks)
- lingua_franca/lang/parse_en.py (4 hunks)
- lingua_franca/lang/parse_es.py (2 hunks)
- lingua_franca/lang/parse_pt.py (2 hunks)
- lingua_franca/parse.py (3 hunks)
- lingua_franca/res/text/de-de/normalize.json (1 hunks)
- lingua_franca/time.py (1 hunks)
- test/unittests/test_parse_de.py (7 hunks)
- test/unittests/test_parse_en.py (3 hunks)
- test/unittests/test_parse_es.py (3 hunks)
- test/unittests/test_parse_pt.py (3 hunks)
Files not summarized due to errors (2)
- test/unittests/test_parse_de.py: Error: Message exceeds token limit
- test/unittests/test_parse_en.py: Error: Message exceeds token limit
Additional context used
Ruff
lingua_franca/parse.py
16-16:
datetime
imported but unusedRemove unused import:
datetime
(F401)
test/unittests/test_parse_es.py
24-24:
lingua_franca.parse.get_gender
imported but unusedRemove unused import:
lingua_franca.parse.get_gender
(F401)
24-24: Redefinition of unused
extract_datetime
from line 22Remove definition:
extract_datetime
(F811)
24-24: Redefinition of unused
extract_number
from line 21Remove definition:
extract_number
(F811)
24-24: Redefinition of unused
normalize
from line 21Remove definition:
normalize
(F811)
24-24: Redefinition of unused
yes_or_no
from line 22Remove definition:
yes_or_no
(F811)
25-25:
lingua_franca.time.to_local
imported but unusedRemove unused import:
lingua_franca.time.to_local
(F401)
test/unittests/test_parse_pt.py
23-23:
lingua_franca.time.to_local
imported but unusedRemove unused import:
lingua_franca.time.to_local
(F401)
lingua_franca/lang/parse_es.py
1128-1128: Function definition does not bind loop variable
unit_en
(B023)
1131-1131:
re
may be undefined, or defined from star imports(F405)
1139-1139: Function definition does not bind loop variable
unit_en
(B023)
1141-1141: Function definition does not bind loop variable
unit_en
(B023)
1143-1143: Function definition does not bind loop variable
unit_en
(B023)
1145-1145: Function definition does not bind loop variable
unit_en
(B023)
1147-1147: Function definition does not bind loop variable
unit_en
(B023)
1152-1152:
re
may be undefined, or defined from star imports(F405)
lingua_franca/lang/parse_pt.py
1156-1156: Function definition does not bind loop variable
unit_en
(B023)
1167-1167: Function definition does not bind loop variable
unit_en
(B023)
1169-1169: Function definition does not bind loop variable
unit_en
(B023)
1171-1171: Function definition does not bind loop variable
unit_en
(B023)
1173-1173: Function definition does not bind loop variable
unit_en
(B023)
1175-1175: Function definition does not bind loop variable
unit_en
(B023)
lingua_franca/lang/parse_de.py
144-144: Function definition does not bind loop variable
unit_en
(B023)
1267-1267: Redefinition of unused
normalize_de
from line 1218(F811)
lingua_franca/lang/parse_en.py
605-605: Function definition does not bind loop variable
unit_en
(B023)
610-610: Function definition does not bind loop variable
unit_en
(B023)
612-612: Function definition does not bind loop variable
unit_en
(B023)
614-614: Function definition does not bind loop variable
unit_en
(B023)
616-616: Function definition does not bind loop variable
unit_en
(B023)
618-618: Function definition does not bind loop variable
unit_en
(B023)
Additional comments not posted (87)
lingua_franca/res/text/de-de/normalize.json (5)
1-8
: Configuration settings look good.The initial configuration settings for text normalization in German are correctly defined. They include options for lowercase conversion, number-to-digit conversion, contraction expansion, symbol removal, accent removal, article removal, and stopword removal.
9-25
: Contractions mapping is comprehensive.The contractions mapping for German contractions is well-defined and covers common contractions.
26-29
: Word replacements are appropriate.The word replacements for "m" to "meter" and "km" to "kilometer" are correctly defined.
30-63
: Number replacements are thorough.The number replacements for German words to digits are comprehensive and cover numbers from 0 to 90.
64-72
: Stopwords and articles are correctly listed.The stopwords list is empty, and the articles list includes common German articles.
lingua_franca/time.py (2)
21-38
: EnumTimespanUnit
is well-defined.The
TimespanUnit
enum is comprehensive and includes various time units for timespan calculations.
42-43
: Constants for time calculations are correctly defined.The constants
DAYS_IN_1_YEAR
andDAYS_IN_1_MONTH
are accurately defined for time calculations.lingua_franca/parse.py (2)
29-29
: Imports for time calculations are appropriate.The imports for
TimespanUnit
,DAYS_IN_1_MONTH
, andDAYS_IN_1_YEAR
fromlingua_franca.time
are correctly added.
132-203
: Functionextract_timespan
is well-defined but has a TODO comment.The
extract_timespan
function is comprehensive and handles various time units. However, there is a TODO comment indicating that the remainder is imperfect and ignores thereplace_token
.Ensure that the TODO comment is addressed in future updates.
test/unittests/test_parse_es.py (18)
16-17
: Imports fortimedelta
andrelativedelta
are appropriate.The imports for
timedelta
fromdatetime
andrelativedelta
fromdateutil.relativedelta
are correctly added.
22-22
: Importextract_duration
is appropriate.The import for
extract_duration
fromlingua_franca.parse
is correctly added.
249-267
: Tests forextract_duration
are comprehensive.The tests for the
extract_duration
function cover various scenarios and time units.
268-291
: Tests for non-standard units inextract_duration
are thorough.The tests for non-standard units like months, years, decades, centuries, and millennia are well-defined and cover various scenarios.
308-327
: Tests forextract_timespan
are comprehensive.The tests for the
extract_timespan
function cover various scenarios and time units.
329-367
: Tests forextract_timespan
withRELATIVEDELTA
are thorough.The tests for the
extract_timespan
function with theRELATIVEDELTA
time unit are well-defined and cover various scenarios.
368-396
: Tests forextract_timespan
withTOTAL_MICROSECONDS
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_MICROSECONDS
time unit are well-defined and cover various scenarios.
397-425
: Tests forextract_timespan
withTOTAL_MILLISECONDS
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_MILLISECONDS
time unit are well-defined and cover various scenarios.
426-450
: Tests forextract_timespan
withTOTAL_SECONDS
are thorough.The tests for the
extract_timespan
function with theTOTAL_SECONDS
time unit are well-defined and cover various scenarios.
451-471
: Tests forextract_timespan
withTOTAL_MINUTES
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_MINUTES
time unit are well-defined and cover various scenarios.
472-493
: Tests forextract_timespan
withTOTAL_HOURS
are thorough.The tests for the
extract_timespan
function with theTOTAL_HOURS
time unit are well-defined and cover various scenarios.
494-516
: Tests forextract_timespan
withTOTAL_DAYS
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_DAYS
time unit are well-defined and cover various scenarios.
517-538
: Tests forextract_timespan
withTOTAL_WEEKS
are thorough.The tests for the
extract_timespan
function with theTOTAL_WEEKS
time unit are well-defined and cover various scenarios.
539-561
: Tests forextract_timespan
withTOTAL_MONTHS
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_MONTHS
time unit are well-defined and cover various scenarios.
562-583
: Tests forextract_timespan
withTOTAL_YEARS
are thorough.The tests for the
extract_timespan
function with theTOTAL_YEARS
time unit are well-defined and cover various scenarios.
584-604
: Tests forextract_timespan
withTOTAL_DECADES
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_DECADES
time unit are well-defined and cover various scenarios.
605-625
: Tests forextract_timespan
withTOTAL_CENTURIES
are thorough.The tests for the
extract_timespan
function with theTOTAL_CENTURIES
time unit are well-defined and cover various scenarios.
626-646
: Tests forextract_timespan
withTOTAL_MILLENNIUMS
are comprehensive.The tests for the
extract_timespan
function with theTOTAL_MILLENNIUMS
time unit are well-defined and cover various scenarios.test/unittests/test_parse_pt.py (17)
260-287
: Ensure proper handling of imperfect remainders.The TODO comments indicate that the handling of imperfect remainders is not yet implemented. Ensure these cases are handled correctly in future updates.
376-396
: Tests forextract_timespan
look good!The tests cover various time units and edge cases.
397-435
: Tests forextract_timespan_delta
look good!The tests cover various time units using
TimespanUnit.RELATIVEDELTA
.
436-464
: Tests forextract_timespan_microsegundos
look good!The tests cover various time units in microseconds.
465-493
: Tests forextract_timespan_milisegundos
look good!The tests cover various time units in milliseconds.
494-518
: Tests forextract_timespan_segundos
look good!The tests cover various time units in seconds.
519-539
: Tests forextract_timespan_minutos
look good!The tests cover various time units in minutes.
540-561
: Tests forextract_timespan_horas
look good!The tests cover various time units in hours.
562-584
: Tests forextract_timespan_dias
look good!The tests cover various time units in days.
585-606
: Tests forextract_timespan_semanas
look good!The tests cover various time units in weeks.
607-629
: Tests forextract_timespan_months
look good!The tests cover various time units in months.
630-651
: Tests forextract_timespan_years
look good!The tests cover various time units in years.
652-672
: Tests forextract_timespan_decades
look good!The tests cover various time units in decades.
673-693
: Tests forextract_timespan_centuries
look good!The tests cover various time units in centuries.
694-714
: Tests forextract_timespan_millennia
look good!The tests cover various time units in millennia.
Line range hint
320-348
:
Tests forget_gender
look good!The tests cover various cases for gender extraction in Portuguese.
Line range hint
351-374
:
Tests foryes_or_no
look good!The tests cover various cases for yes/no extraction in Portuguese.
lingua_franca/lang/parse_es.py (1)
Line range hint
1159-1177
:
Ensureget_gender_es
handles context correctly.The function now has a default value for the
context
parameter. Ensure it handles context correctly.Tools
Ruff
1128-1128: Function definition does not bind loop variable
unit_en
(B023)
1131-1131:
re
may be undefined, or defined from star imports(F405)
1139-1139: Function definition does not bind loop variable
unit_en
(B023)
1141-1141: Function definition does not bind loop variable
unit_en
(B023)
1143-1143: Function definition does not bind loop variable
unit_en
(B023)
1145-1145: Function definition does not bind loop variable
unit_en
(B023)
1147-1147: Function definition does not bind loop variable
unit_en
(B023)
1152-1152:
re
may be undefined, or defined from star imports(F405)
lingua_franca/lang/parse_pt.py (2)
23-23
: LGTM! Imports are necessary and correctly used.The updated imports for
timedelta
,now_local
,DAYS_IN_1_MONTH
, andDAYS_IN_1_YEAR
are necessary for the new functionality in the code.Also applies to: 31-31
1102-1121
: Well-documented function.The docstring provides a clear and comprehensive explanation of the function's purpose, arguments, and return values.
lingua_franca/lang/parse_de.py (3)
117-117
: Ensure the correctness of the regex pattern and replacement logic.The regex pattern and replacement logic for converting German phrases to duration should be thoroughly tested to ensure they cover all edge cases.
154-154
: Ensure the correctness of the regex pattern and replacement logic.The regex pattern and replacement logic for converting German phrases to timespan should be thoroughly tested to ensure they cover all edge cases.
1263-1265
: Ensure the correctness of JSON loading.The usage of
resolve_resource_file
and JSON loading should be thoroughly tested to ensure the configuration is loaded correctly.test/unittests/test_parse_de.py (15)
703-746
: Ensure comprehensive test coverage forextract_timespan
with days.The new test cases for
extract_timespan
with days should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with days.The test cases for
extract_timespan
with days intest_parse_de.py
andtest_parse_en.py
cover various scenarios, including edge cases and common use cases. The provided test cases appear to be comprehensive.
test/unittests/test_parse_de.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with days. # Test: Search for test cases related to `extract_timespan` with days. rg --type python -A 5 $'extract_timespan.*days'Length of output: 79
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with days. # Test: Search for test cases related to `extract_timespan` with days. rg --glob '*.py' -A 5 'extract_timespan.*days'Length of output: 3102
504-563
: Ensure comprehensive test coverage forextract_timespan
with milliseconds.The new test cases for
extract_timespan
with milliseconds should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with milliseconds.The test cases for
extract_timespan
with milliseconds intest/unittests/test_parse_de.py
cover various scenarios, including edge cases and common use cases. The test cases appear to be well-implemented and thorough.
test/unittests/test_parse_de.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with milliseconds. # Test: Search for test cases related to `extract_timespan` with milliseconds. rg --type python -A 5 $'extract_timespan.*milliseconds'Length of output: 87
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with milliseconds. # Test: Search for test cases related to `extract_timespan` with milliseconds. rg --type py -A 5 $'extract_timespan.*milliseconds'Length of output: 1025
658-702
: Ensure comprehensive test coverage forextract_timespan
with hours.The new test cases for
extract_timespan
with hours should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Comprehensive test coverage for
extract_timespan
with hours verified.The test cases for
extract_timespan
with hours are well-covered in both the German (test_parse_de.py
) and English (test_parse_en.py
) test files. No further actions are required.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with hours. # Test: Search for test cases related to `extract_timespan` with hours. rg --type python -A 5 $'extract_timespan.*hours'Length of output: 80
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with hours. # Test: Search for test cases related to `extract_timespan` with hours. rg --type py -A 5 $'extract_timespan.*hours'Length of output: 2984
614-657
: Ensure comprehensive test coverage forextract_timespan
with minutes.The new test cases for
extract_timespan
with minutes should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with minutes.The test cases for
extract_timespan
with minutes in bothtest_parse_de.py
andtest_parse_en.py
provide comprehensive coverage, including various scenarios and edge cases. No further action is needed.
test/unittests/test_parse_de.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with minutes. # Test: Search for test cases related to `extract_timespan` with minutes. rg --type python -A 5 $'extract_timespan.*minutes'Length of output: 82
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with minutes. # Test: Search for test cases related to `extract_timespan` with minutes. rg -A 5 'extract_timespan.*minutes'Length of output: 5779
566-613
: Ensure comprehensive test coverage forextract_timespan
with seconds.The new test cases for
extract_timespan
with seconds should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with seconds.The test cases for
extract_timespan
with seconds intest_parse_de.py
cover various scenarios, including edge cases and common use cases. Similar tests are also present intest_parse_en.py
, indicating comprehensive coverage.
test/unittests/test_parse_de.py
:test_extract_timespan_seconds_de
test/unittests/test_parse_en.py
:test_extract_timespan_seconds_en
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with seconds. # Test: Search for test cases related to `extract_timespan` with seconds. rg --type python -A 5 $'extract_timespan.*seconds'Length of output: 82
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with seconds. # Test: Search for test cases related to `extract_timespan` with seconds. rg --type py -A 5 $'extract_timespan.*seconds'Length of output: 6865
462-503
: Ensure comprehensive test coverage forextract_timespan
with microseconds.The new test cases for
extract_timespan
with microseconds should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.
19-25
: Ensure new imports are used.The new imports for
extract_duration
,extract_timespan
, andTimespanUnit
should be used in the file. Verify their usage in the test methods.Verification successful
Ensure new imports are used.
The new imports for
extract_duration
,extract_timespan
, andTimespanUnit
are correctly used in the file.
extract_duration
is used in multiple test cases such astest_extract_duration_de
.extract_timespan
is used in multiple test cases such astest_extract_timespan_de
.TimespanUnit
is used in multiple test cases such astest_extract_timespan_ambiguous
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of new imports in the file. # Test: Search for the usage of `extract_duration`, `extract_timespan`, and `TimespanUnit`. rg --type python -A 5 $'extract_duration|extract_timespan|TimespanUnit'Length of output: 103
Script:
#!/bin/bash # Description: Verify the usage of new imports in the file. # Test: Search for the usage of `extract_duration`, `extract_timespan`, and `TimespanUnit`. rg --glob '*.py' -A 5 'extract_duration|extract_timespan|TimespanUnit'Length of output: 157997
Line range hint
220-250
:
Ensure comprehensive test coverage forextract_duration
.The new test cases for
extract_duration
should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Test coverage for
extract_duration
is comprehensive.The test cases for
extract_duration
cover a wide range of scenarios and languages, ensuring robust validation of the function.
test/unittests/test_parse_de.py
test/unittests/test_parse_ru.py
test/unittests/test_parse_pt.py
test/unittests/test_parse_nl.py
test/unittests/test_parse_pl.py
test/unittests/test_parse_cs.py
test/unittests/test_parse_fr.py
test/unittests/test_parse_sv.py
test/unittests/test_parse_az.py
test/unittests/test_parse_es.py
test/unittests/test_parse_fa.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_duration`. # Test: Search for test cases related to `extract_duration`. rg --type python -A 5 $'extract_duration'Length of output: 71
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_duration`. # Test: Search for test cases related to `extract_duration`. rg --type py -A 5 $'extract_duration'Length of output: 57945
747-788
: Ensure comprehensive test coverage forextract_timespan
with weeks.The new test cases for
extract_timespan
with weeks should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with weeks.The test cases for
extract_timespan
with weeks intest_parse_de.py
are comprehensive and cover various scenarios, including different time units, edge cases, and phrases with additional text. No further actions are necessary.
test/unittests/test_parse_de.py
: Comprehensive test cases forextract_timespan
with weeks.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with weeks. # Test: Search for test cases related to `extract_timespan` with weeks. rg --type python -A 5 $'extract_timespan.*weeks'Length of output: 80
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with weeks. # Test: Search for test cases related to `extract_timespan` with weeks. rg --glob '*.py' -A 5 $'extract_timespan.*weeks'Length of output: 3148
260-289
: Ensure comprehensive test coverage forextract_timespan
.The new test cases for
extract_timespan
should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Comprehensive test coverage for
extract_timespan
verified.The test cases for
extract_timespan
are extensive and cover various scenarios, including different time units and edge cases. The function is well-tested across multiple languages.
- Files and lines with test cases:
test/unittests/test_parse_de.py
test/unittests/test_parse_pt.py
test/unittests/test_parse_es.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan`. # Test: Search for test cases related to `extract_timespan`. rg --type python -A 5 $'extract_timespan'Length of output: 71
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan`. # Test: Search for test cases related to `extract_timespan`. rg --type py -A 5 $'extract_timespan'Length of output: 81947
789-833
: Ensure comprehensive test coverage forextract_timespan
with months.The new test cases for
extract_timespan
with months should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with months.The new test cases for
extract_timespan
with months intest_parse_de.py
contribute to a broader set of tests across multiple languages, ensuring comprehensive coverage.
test/unittests/test_parse_pt.py
test/unittests/test_parse_es.py
test/unittests/test_parse_de.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with months. # Test: Search for test cases related to `extract_timespan` with months. rg --type python -A 5 $'extract_timespan.*months'Length of output: 81
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with months. # Test: Search for test cases related to `extract_timespan` with months. rg --type py -A 5 $'extract_timespan.*months'Length of output: 5467
877-915
: Ensure comprehensive test coverage forextract_timespan
with decades.The new test cases for
extract_timespan
with decades should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with decades.The test cases for
extract_timespan
with decades are present across multiple language files, indicating comprehensive coverage.
test/unittests/test_parse_pt.py
test/unittests/test_parse_es.py
test/unittests/test_parse_en.py
test/unittests/test_parse_de.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with decades. # Test: Search for test cases related to `extract_timespan` with decades. rg --type python -A 5 $'extract_timespan.*decades'Length of output: 82
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with decades. # Test: Search for test cases related to `extract_timespan` with decades. rg -A 5 'extract_timespan.*decades'Length of output: 2961
956-975
: Ensure comprehensive test coverage forextract_timespan
with millennia.The new test cases for
extract_timespan
with millennia should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Comprehensive test coverage for
extract_timespan
with millennia verified.The test cases for
extract_timespan
with millennia are well-covered across multiple language-specific test files, ensuring thorough validation of the function.
test/unittests/test_parse_de.py
test/unittests/test_parse_es.py
test/unittests/test_parse_pt.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with millennia. # Test: Search for test cases related to `extract_timespan` with millennia. rg --type python -A 5 $'extract_timespan.*millennia'Length of output: 84
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with millennia. # Test: Search for test cases related to `extract_timespan` with millennia. rg --type py -A 5 'extract_timespan.*millennia'Length of output: 1960
834-876
: Ensure comprehensive test coverage forextract_timespan
with years.The new test cases for
extract_timespan
with years should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Ensure comprehensive test coverage for
extract_timespan
with years.The test cases for
extract_timespan
with years are well-covered across multiple language-specific test files, includingtest_parse_de.py
,test_parse_es.py
,test_parse_pt.py
, andtest_parse_en.py
. The variety of test cases intest_parse_de.py
confirms thorough testing for different durations and edge cases.
test/unittests/test_parse_de.py
test/unittests/test_parse_es.py
test/unittests/test_parse_pt.py
test/unittests/test_parse_en.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with years. # Test: Search for test cases related to `extract_timespan` with years. rg --type python -A 5 $'extract_timespan.*years'Length of output: 80
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with years. # Test: Search for test cases related to `extract_timespan` with years. rg --type py -A 5 'extract_timespan.*years'Length of output: 2918
916-955
: Ensure comprehensive test coverage forextract_timespan
with centuries.The new test cases for
extract_timespan
with centuries should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.Verification successful
Comprehensive test coverage for
extract_timespan
with centuries verified.The test cases for
extract_timespan
with centuries cover a wide range of scenarios, including different time units and edge cases across multiple languages. No further action is needed.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with centuries. # Test: Search for test cases related to `extract_timespan` with centuries. rg --type python -A 5 $'extract_timespan.*centuries'Length of output: 84
Script:
#!/bin/bash # Description: Verify the test coverage for `extract_timespan` with centuries. # Find all Python files and search for test cases related to `extract_timespan` with centuries. fd --type f --extension py --exec rg 'extract_timespan.*centuries' {}Length of output: 369
Script:
#!/bin/bash # Description: Extract the contents of test functions related to `extract_timespan` with centuries. # Extract the contents of the test functions found in the previous step. rg --type py -A 20 'def test_extract_timespan_centuries'Length of output: 6477
lingua_franca/lang/parse_en.py (6)
18-18
: LGTM! Import changes are necessary and appropriate.The addition of the
math
module and the modifications to thefrom lingua_franca.time
import statement are necessary for the new functionality.Also applies to: 31-31
1713-1713
: LGTM! The new functionnormalize_en
is necessary and appropriate.The addition of the
normalize_en
function for English string normalization is well-implemented.
1715-1737
: LGTM! The new functionextract_timespan_en
is comprehensive and well-implemented.The function handles converting English phrases into durations and includes comprehensive handling for various time units and conversions.
1769-1806
: LGTM! The handling of time units forTIMEDELTA
is comprehensive.The function correctly handles various time units for
TIMEDELTA
and includes appropriate conversions.
1822-1858
: LGTM! The handling of time units forRELATIVEDELTA
is comprehensive.The function correctly handles various time units for
RELATIVEDELTA
and includes appropriate conversions.
1892-1965
: LGTM! The handling of time units for otherTimespanUnit
values is comprehensive.The function correctly handles various time units for other
TimespanUnit
values and includes appropriate conversions.test/unittests/test_parse_en.py (16)
1934-1949
: Ensure accurate extraction of microseconds.The extraction of microseconds should be accurate and handle small floating point errors.
1977-1989
: Ensure accurate extraction of milliseconds.The extraction of milliseconds should be accurate and handle small floating point errors.
2046-2057
: Ensure accurate extraction of seconds.The extraction of seconds should be accurate and handle small floating point errors.
2098-2109
: Ensure accurate extraction of minutes.The extraction of minutes should be accurate and handle small floating point errors.
2146-2157
: Ensure accurate extraction of hours.The extraction of hours should be accurate and handle small floating point errors.
2194-2205
: Ensure accurate extraction of days.The extraction of days should be accurate and handle small floating point errors.
2242-2253
: Ensure accurate extraction of weeks.The extraction of weeks should be accurate and handle small floating point errors.
2288-2299
: Ensure accurate extraction of months.The extraction of months should be accurate and handle small floating point errors.
2340-2351
: Ensure accurate extraction of years.The extraction of years should be accurate and handle small floating point errors.
2387-2398
: Ensure accurate extraction of decades.The extraction of decades should be accurate and handle small floating point errors.
2430-2441
: Ensure accurate extraction of centuries.The extraction of centuries should be accurate and handle small floating point errors.
2474-2485
: Ensure accurate extraction of millennia.The extraction of millennia should be accurate and handle small floating point errors.
Line range hint
2530-2567
: Ensure comprehensive yes/no tests.The yes/no tests cover a wide range of cases. Ensure that all possible edge cases are covered.
Line range hint
2569-2581
: Ensure accurate language code extraction.The language code extraction tests should ensure accurate results with high confidence.
Line range hint
2583-2596
: Ensure accurate language code extraction for specific cases.The tests for specific language codes should ensure accurate results with high confidence.
1698-1711
: Ensure accurate normalization in tests.The normalization of "fifty seven and a half" to "57.5" may not be accurate. Consider reviewing the normalization logic.
time_units[unit_en] = 0 | ||
|
||
def repl(match): | ||
time_units[unit_en] += float(match.group(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the loop variable binding issue.
The function definitions inside the loop do not bind the loop variable unit_en
. This can lead to unexpected behavior.
- def repl(match):
- time_units[unit_en] += float(match.group(1))
+ def repl(unit):
+ def _repl(match):
+ time_units[unit] += float(match.group(1))
+ return _repl
- text = re.sub(unit_pattern, repl, text)
+ text = re.sub(unit_pattern, repl(unit_en), text)
- def repl_non_std(match):
- val = float(match.group(1))
- if unit_en == "months":
- val = DAYS_IN_1_MONTH * val
- if unit_en == "years":
- val = DAYS_IN_1_YEAR * val
- if unit_en == "decades":
- val = 10 * DAYS_IN_1_YEAR * val
- if unit_en == "centurys":
- val = 100 * DAYS_IN_1_YEAR * val
- if unit_en == "millenniums":
- val = 1000 * DAYS_IN_1_YEAR * val
- time_units["days"] += val
- return ''
+ def repl_non_std(unit):
+ def _repl_non_std(match):
+ val = float(match.group(1))
+ if unit == "months":
+ val = DAYS_IN_1_MONTH * val
+ if unit == "years":
+ val = DAYS_IN_1_YEAR * val
+ if unit == "decades":
+ val = 10 * DAYS_IN_1_YEAR * val
+ if unit == "centurys":
+ val = 100 * DAYS_IN_1_YEAR * val
+ if unit == "millenniums":
+ val = 1000 * DAYS_IN_1_YEAR * val
+ time_units["days"] += val
+ return ''
+ return _repl_non_std
- text = re.sub(unit_pattern, repl_non_std, text)
+ text = re.sub(unit_pattern, repl_non_std(unit_en), text)
Also applies to: 1167-1176
Tools
Ruff
1156-1156: Function definition does not bind loop variable
unit_en
(B023)
def extract_duration_pt(text): | ||
""" | ||
Convert an portuguese phrase into a number of seconds | ||
Convert things like: | ||
"10 Minutos" | ||
"3 dias 8 horas 10 Minutos e 49 Segundos" | ||
into an int, representing the total number of seconds. | ||
The words used in the duration will be consumed, and | ||
the remainder returned. | ||
As an example, "set a timer for 5 minutes" would return | ||
(300, "set a timer for"). | ||
Args: | ||
text (str): string containing a duration | ||
Returns: | ||
(timedelta, str): | ||
A tuple containing the duration and the remaining text | ||
not consumed in the parsing. The first value will | ||
be None if no duration is found. The text returned | ||
will have whitespace stripped from the ends. | ||
""" | ||
if not text: | ||
return None | ||
|
||
text = text.lower() | ||
time_units = { | ||
'microseconds': 'microsegundos', | ||
'milliseconds': 'milisegundos', | ||
'seconds': 'segundos', | ||
'minutes': 'minutos', | ||
'hours': 'horas', | ||
'days': 'dias', | ||
'weeks': 'semanas' | ||
} | ||
non_std_un = { | ||
"months": "meses", | ||
"years": "anos", | ||
'decades': "decadas", | ||
'centurys': "seculos", | ||
'millenniums': "milenios" | ||
} | ||
|
||
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?" | ||
|
||
text = text.replace("mês", "meses").replace("é", "e") | ||
text = text.replace("segundo", "_s_") # HACK - segundo (second) will be replaced with 2 | ||
text = PortugueseNormalizer().numbers_to_digits(text) | ||
text = text.replace("_s_", "segundo") # undo HACK | ||
|
||
for (unit_en, unit_pt) in time_units.items(): | ||
unit_pattern = pattern.format( | ||
unit=unit_pt[:-1]) # remove 's' from unit | ||
time_units[unit_en] = 0 | ||
|
||
def repl(match): | ||
time_units[unit_en] += float(match.group(1)) | ||
return '' | ||
|
||
text = re.sub(unit_pattern, repl, text) | ||
|
||
for (unit_en, unit_pt) in non_std_un.items(): | ||
unit_pattern = pattern.format( | ||
unit=unit_pt[:-1]) # remove 's' from unit | ||
|
||
def repl_non_std(match): | ||
val = float(match.group(1)) | ||
if unit_en == "months": | ||
val = DAYS_IN_1_MONTH * val | ||
if unit_en == "years": | ||
val = DAYS_IN_1_YEAR * val | ||
if unit_en == "decades": | ||
val = 10 * DAYS_IN_1_YEAR * val | ||
if unit_en == "centurys": | ||
val = 100 * DAYS_IN_1_YEAR * val | ||
if unit_en == "millenniums": | ||
val = 1000 * DAYS_IN_1_YEAR * val | ||
time_units["days"] += val | ||
return '' | ||
|
||
text = re.sub(unit_pattern, repl_non_std, text) | ||
|
||
text = text.strip() | ||
duration = timedelta(**time_units) if any(time_units.values()) else None | ||
|
||
return (duration, text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding unit tests for extract_duration_pt
.
To ensure the correctness of this function, consider adding unit tests that cover various edge cases and typical use cases.
Do you want me to generate the unit testing code or open a GitHub issue to track this task?
Tools
Ruff
1156-1156: Function definition does not bind loop variable
unit_en
(B023)
1167-1167: Function definition does not bind loop variable
unit_en
(B023)
1169-1169: Function definition does not bind loop variable
unit_en
(B023)
1171-1171: Function definition does not bind loop variable
unit_en
(B023)
1173-1173: Function definition does not bind loop variable
unit_en
(B023)
1175-1175: Function definition does not bind loop variable
unit_en
(B023)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JarbasAl, here are the unit tests for the extract_duration_pt
function:
import unittest
from datetime import timedelta
from lingua_franca.lang.parse_pt import extract_duration_pt
class TestExtractDurationPT(unittest.TestCase):
def test_single_unit(self):
self.assertEqual(extract_duration_pt("10 segundos"), (timedelta(seconds=10), ""))
self.assertEqual(extract_duration_pt("5 minutos"), (timedelta(minutes=5), ""))
self.assertEqual(extract_duration_pt("2 horas"), (timedelta(hours=2), ""))
self.assertEqual(extract_duration_pt("3 dias"), (timedelta(days=3), ""))
self.assertEqual(extract_duration_pt("1 semana"), (timedelta(weeks=1), ""))
def test_multiple_units(self):
self.assertEqual(extract_duration_pt("3 dias 8 horas 10 minutos e 49 segundos"),
(timedelta(days=3, hours=8, minutes=10, seconds=49), ""))
self.assertEqual(extract_duration_pt("2 semanas 3 dias 4 horas"),
(timedelta(weeks=2, days=3, hours=4), ""))
self.assertEqual(extract_duration_pt("1 ano 2 meses 3 dias"),
(timedelta(days=365.2425 + 2 * 30.42 + 3), ""))
def test_non_standard_units(self):
self.assertEqual(extract_duration_pt("1 mês"), (timedelta(days=30.42), ""))
self.assertEqual(extract_duration_pt("1 ano"), (timedelta(days=365.2425), ""))
self.assertEqual(extract_duration_pt("1 década"), (timedelta(days=10 * 365.2425), ""))
self.assertEqual(extract_duration_pt("1 século"), (timedelta(days=100 * 365.2425), ""))
self.assertEqual(extract_duration_pt("1 milênio"), (timedelta(days=1000 * 365.2425), ""))
def test_mixed_units(self):
self.assertEqual(extract_duration_pt("1 ano 2 meses 3 dias 4 horas 5 minutos 6 segundos"),
(timedelta(days=365.2425 + 2 * 30.42 + 3, hours=4, minutes=5, seconds=6), ""))
def test_edge_cases(self):
self.assertEqual(extract_duration_pt(""), (None, ""))
self.assertEqual(extract_duration_pt("sem tempo"), (None, "sem tempo"))
self.assertEqual(extract_duration_pt("10"), (None, "10"))
if __name__ == '__main__':
unittest.main()
These tests cover various scenarios including single time units, multiple time units combined, non-standard time units like months and years, and edge cases such as empty strings and invalid input. This should help ensure the robustness of the extract_duration_pt
function.
Learnings added
Learnt from: JarbasAl
PR: OpenVoiceOS/ovos-lingua-franca#37
File: lingua_franca/lang/parse_pt.py:1102-1185
Timestamp: 2024-07-19T00:19:09.766Z
Learning: Generate unit tests for the `extract_duration_pt` function in `lingua_franca/lang/parse_pt.py` to cover various scenarios including single time units, multiple time units combined, non-standard time units, and edge cases.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
@@ -122,7 +127,7 @@ def extract_duration_de(text): | |||
} | |||
|
|||
# Einzahl und Mehrzahl | |||
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[ne]?" | |||
pattern = r"(?P<value>\d+(?:[.,]?\d+)?|\bein[e]?[nsm]?\b)(?:\s+|\-)(?P<unit>{unit}[nes]?[sn]?\b)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder: Implement text-to-number conversion.
The TODO comment indicates that the text-to-number conversion is missing.
Do you want me to help implement the text-to-number conversion function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JarbasAl, I'll implement the text-to-number conversion function and integrate it into the extract_duration_de
function. Here is the updated code:
def _convert_words_to_numbers_de(text):
"""
Convert German number words in the text to digits.
Args:
text (str): The input text containing number words.
Returns:
str: The text with number words converted to digits.
"""
words = text.split()
for i, word in enumerate(words):
if word in de_numbers:
words[i] = str(de_numbers[word])
return ' '.join(words)
def extract_duration_de(text):
"""
Convert a German phrase into a number of seconds.
Convert things like:
"10 Minuten"
"3 Tage 8 Stunden 10 Minuten und 49 Sekunden"
into an int, representing the total number of seconds.
The words used in the duration will be consumed, and
the remainder returned.
As an example, "set a timer for 5 minutes" would return
(300, "set a timer for").
Args:
text (str): string containing a duration
Returns:
(timedelta, str):
A tuple containing the duration and the remaining text
not consumed in the parsing. The first value will
be None if no duration is found. The text returned
will have whitespace stripped from the ends.
"""
if not text:
return None
text = GermanNormalizer().numbers_to_digits(text.lower())
text = _convert_words_to_numbers_de(text) # Convert number words to digits
# die time_unit values werden für timedelta() mit dem jeweiligen Wert überschrieben
time_units = {
'microseconds': 'mikrosekunden',
'milliseconds': 'millisekunden',
'seconds': 'sekunden',
'minutes': 'minuten',
'hours': 'stunden',
'days': 'tage',
'weeks': 'wochen'
}
# Einzahl und Mehrzahl
pattern = r"(?P<value>\d+(?:[.,]?\d+)?|\bein[e]?[nsm]?\b)(?:\s+|\-)(?P<unit>{unit}[nes]?[sn]?\b)"
for (unit_en, unit_de) in time_units.items():
unit_pattern = pattern.format(
unit=unit_de[:-1]) # remove 'n'/'e' from unit
time_units[unit_en] = 0
def repl(match):
value = match.group(1).replace(",",".").replace("einem", "1") \
.replace("eines", "1").replace("einen", "1") \
.replace("eine", "1").replace("ein", "1")
time_units[unit_en] += float(value)
return ''
text = re.sub(unit_pattern, repl, text)
text = text.strip()
duration = timedelta(**time_units) if any(time_units.values()) else None
return (duration, text)
This code includes the _convert_words_to_numbers_de
function to convert German number words to digits and integrates it into the extract_duration_de
function.
Learnings added
Learnt from: JarbasAl
PR: OpenVoiceOS/ovos-lingua-franca#37
File: lingua_franca/lang/parse_de.py:130-130
Timestamp: 2024-07-19T00:18:56.842Z
Learning: The `extract_duration_de` function in `lingua_franca/lang/parse_de.py` requires a text-to-number conversion function to handle German number words.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
# TODO German version of _convert_words_to_numbers is missing completely | ||
# Have to port my implementation, up until then -> Normalizer + replace | ||
text = GermanNormalizer().numbers_to_digits(text.lower()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder: Implement text-to-number conversion.
The TODO comment indicates that the text-to-number conversion is missing.
Do you want me to help implement the text-to-number conversion function?
print("[WARNING] relativedelta requires {unit} to be an " | ||
"integer".format(unit=unit)) | ||
# fallback to timedelta resolution / raw tokens text with no flags | ||
return extract_timespan_de(text, TimespanUnit.TIMEDELTA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid self-recursive calls without base case.
The call to extract_timespan_de
within itself can cause infinite recursion if not handled properly. Ensure there's a base case to prevent this.
from lingua_franca.lang.parse_es import extract_datetime_es, is_fractional_es | ||
from lingua_franca.time import default_timezone | ||
from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan | ||
from lingua_franca.time import default_timezone, to_local, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import.
The to_local
import from lingua_franca.time
is not used in the file.
- from lingua_franca.time import default_timezone, to_local, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
+ from lingua_franca.time import default_timezone, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from lingua_franca.time import default_timezone, to_local, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit | |
from lingua_franca.time import default_timezone, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit |
Tools
Ruff
25-25:
lingua_franca.time.to_local
imported but unusedRemove unused import:
lingua_franca.time.to_local
(F401)
from lingua_franca.lang.parse_es import extract_datetime_es, is_fractional_es | ||
from lingua_franca.time import default_timezone | ||
from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused and redefined imports.
The imports for get_gender
, extract_datetime
, extract_number
, normalize
, and yes_or_no
are either unused or redefined.
- from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan
+ from lingua_franca.parse import extract_timespan
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan | |
from lingua_franca.parse import extract_timespan |
Tools
Ruff
24-24:
lingua_franca.parse.get_gender
imported but unusedRemove unused import:
lingua_franca.parse.get_gender
(F401)
24-24: Redefinition of unused
extract_datetime
from line 22Remove definition:
extract_datetime
(F811)
24-24: Redefinition of unused
extract_number
from line 21Remove definition:
extract_number
(F811)
24-24: Redefinition of unused
normalize
from line 21Remove definition:
normalize
(F811)
24-24: Redefinition of unused
yes_or_no
from line 22Remove definition:
yes_or_no
(F811)
def test_extract_timespan_ambiguous(self): | ||
self.assertRaises(ValueError, extract_timespan, "1.3 months", | ||
time_unit=TimespanUnit.RELATIVEDELTA) | ||
self.assertRaises(ValueError, extract_timespan, "1.3 months", | ||
time_unit=TimespanUnit.RELATIVEDELTA_STRICT) | ||
self.assertEqual( | ||
extract_timespan("1.3 months", | ||
time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK), | ||
(timedelta(days=1.3 * DAYS_IN_1_MONTH), "")) | ||
|
||
|
||
# NOTE: for some reason this test fails with | ||
# (relativedelta(months=+1, days=+9.126), '') != (relativedelta(months=+1, days=+9.126), '') | ||
# correct result is being returned | ||
|
||
#self.assertEqual( | ||
# extract_timespan("1.3 months", | ||
# resolution=TimespanUnit.RELATIVEDELTA_APPROXIMATE), | ||
# (relativedelta(months=1, days=0.3 * DAYS_IN_1_MONTH), "")) | ||
|
||
self.assertEqual( | ||
extract_timespan("1.3 months", | ||
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE | ||
)[0].months, 1) | ||
self.assertAlmostEquals( | ||
extract_timespan("1.3 months", | ||
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE | ||
)[0].days, 0.3 * DAYS_IN_1_MONTH) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle ambiguous timespan extraction gracefully.
The tests for ambiguous timespan extraction should handle exceptions gracefully and ensure accurate results.
- self.assertRaises(ValueError, extract_timespan, "1.3 months",
- time_unit=TimespanUnit.RELATIVEDELTA)
- self.assertRaises(ValueError, extract_timespan, "1.3 months",
- time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
- self.assertEqual(
- extract_timespan("1.3 months",
- time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
- (timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))
+ with self.assertRaises(ValueError):
+ extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA)
+ with self.assertRaises(ValueError):
+ extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
+ self.assertEqual(
+ extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
+ (timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def test_extract_timespan_ambiguous(self): | |
self.assertRaises(ValueError, extract_timespan, "1.3 months", | |
time_unit=TimespanUnit.RELATIVEDELTA) | |
self.assertRaises(ValueError, extract_timespan, "1.3 months", | |
time_unit=TimespanUnit.RELATIVEDELTA_STRICT) | |
self.assertEqual( | |
extract_timespan("1.3 months", | |
time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK), | |
(timedelta(days=1.3 * DAYS_IN_1_MONTH), "")) | |
# NOTE: for some reason this test fails with | |
# (relativedelta(months=+1, days=+9.126), '') != (relativedelta(months=+1, days=+9.126), '') | |
# correct result is being returned | |
#self.assertEqual( | |
# extract_timespan("1.3 months", | |
# resolution=TimespanUnit.RELATIVEDELTA_APPROXIMATE), | |
# (relativedelta(months=1, days=0.3 * DAYS_IN_1_MONTH), "")) | |
self.assertEqual( | |
extract_timespan("1.3 months", | |
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE | |
)[0].months, 1) | |
self.assertAlmostEquals( | |
extract_timespan("1.3 months", | |
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE | |
)[0].days, 0.3 * DAYS_IN_1_MONTH) | |
with self.assertRaises(ValueError): | |
extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA) | |
with self.assertRaises(ValueError): | |
extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_STRICT) | |
self.assertEqual( | |
extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK), | |
(timedelta(days=1.3 * DAYS_IN_1_MONTH), "")) |
self.assertEqual(extract_duration("3 months"), | ||
(timedelta(days=DAYS_IN_1_MONTH * 3), "")) | ||
self.assertEqual(extract_duration("a year"), | ||
(timedelta(days=DAYS_IN_1_YEAR), "")) | ||
self.assertEqual(extract_duration("1 year"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 1), "")) | ||
self.assertEqual(extract_duration("5 years"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 5), "")) | ||
self.assertEqual(extract_duration("a decade"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 10), "")) | ||
self.assertEqual(extract_duration("1 decade"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 10), "")) | ||
self.assertEqual(extract_duration("5 decades"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 10 * 5), "")) | ||
self.assertEqual(extract_duration("1 century"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 100), "")) | ||
self.assertEqual(extract_duration("a century"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 100), "")) | ||
self.assertEqual(extract_duration("5 centuries"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 100 * 5), "")) | ||
self.assertEqual(extract_duration("1 millennium"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 1000), "")) | ||
self.assertEqual(extract_duration("5 millenniums"), | ||
(timedelta(days=DAYS_IN_1_YEAR * 1000 * 5), "")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidate similar test cases.
The test cases for months and years can be consolidated to avoid redundancy and improve maintainability.
- self.assertEqual(extract_duration("3 months"),
- (timedelta(days=DAYS_IN_1_MONTH * 3), ""))
- self.assertEqual(extract_duration("a year"),
- (timedelta(days=DAYS_IN_1_YEAR), ""))
- self.assertEqual(extract_duration("1 year"),
- (timedelta(days=DAYS_IN_1_YEAR * 1), ""))
- self.assertEqual(extract_duration("5 years"),
- (timedelta(days=DAYS_IN_1_YEAR * 5), ""))
- self.assertEqual(extract_duration("a decade"),
- (timedelta(days=DAYS_IN_1_YEAR * 10), ""))
- self.assertEqual(extract_duration("1 decade"),
- (timedelta(days=DAYS_IN_1_YEAR * 10), ""))
- self.assertEqual(extract_duration("5 decades"),
- (timedelta(days=DAYS_IN_1_YEAR * 10 * 5), ""))
- self.assertEqual(extract_duration("1 century"),
- (timedelta(days=DAYS_IN_1_YEAR * 100), ""))
- self.assertEqual(extract_duration("a century"),
- (timedelta(days=DAYS_IN_1_YEAR * 100), ""))
- self.assertEqual(extract_duration("5 centuries"),
- (timedelta(days=DAYS_IN_1_YEAR * 100 * 5), ""))
- self.assertEqual(extract_duration("1 millennium"),
- (timedelta(days=DAYS_IN_1_YEAR * 1000), ""))
- self.assertEqual(extract_duration("5 millenniums"),
- (timedelta(days=DAYS_IN_1_YEAR * 1000 * 5), ""))
+ test_cases = [
+ ("3 months", timedelta(days=DAYS_IN_1_MONTH * 3)),
+ ("a year", timedelta(days=DAYS_IN_1_YEAR)),
+ ("1 year", timedelta(days=DAYS_IN_1_YEAR)),
+ ("5 years", timedelta(days=DAYS_IN_1_YEAR * 5)),
+ ("a decade", timedelta(days=DAYS_IN_1_YEAR * 10)),
+ ("1 decade", timedelta(days=DAYS_IN_1_YEAR * 10)),
+ ("5 decades", timedelta(days=DAYS_IN_1_YEAR * 10 * 5)),
+ ("1 century", timedelta(days=DAYS_IN_1_YEAR * 100)),
+ ("a century", timedelta(days=DAYS_IN_1_YEAR * 100)),
+ ("5 centuries", timedelta(days=DAYS_IN_1_YEAR * 100 * 5)),
+ ("1 millennium", timedelta(days=DAYS_IN_1_YEAR * 1000)),
+ ("5 millenniums", timedelta(days=DAYS_IN_1_YEAR * 1000 * 5))
+ ]
+ for text, expected in test_cases:
+ self.assertEqual(extract_duration(text), (expected, ""))
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
self.assertEqual(extract_duration("3 months"), | |
(timedelta(days=DAYS_IN_1_MONTH * 3), "")) | |
self.assertEqual(extract_duration("a year"), | |
(timedelta(days=DAYS_IN_1_YEAR), "")) | |
self.assertEqual(extract_duration("1 year"), | |
(timedelta(days=DAYS_IN_1_YEAR * 1), "")) | |
self.assertEqual(extract_duration("5 years"), | |
(timedelta(days=DAYS_IN_1_YEAR * 5), "")) | |
self.assertEqual(extract_duration("a decade"), | |
(timedelta(days=DAYS_IN_1_YEAR * 10), "")) | |
self.assertEqual(extract_duration("1 decade"), | |
(timedelta(days=DAYS_IN_1_YEAR * 10), "")) | |
self.assertEqual(extract_duration("5 decades"), | |
(timedelta(days=DAYS_IN_1_YEAR * 10 * 5), "")) | |
self.assertEqual(extract_duration("1 century"), | |
(timedelta(days=DAYS_IN_1_YEAR * 100), "")) | |
self.assertEqual(extract_duration("a century"), | |
(timedelta(days=DAYS_IN_1_YEAR * 100), "")) | |
self.assertEqual(extract_duration("5 centuries"), | |
(timedelta(days=DAYS_IN_1_YEAR * 100 * 5), "")) | |
self.assertEqual(extract_duration("1 millennium"), | |
(timedelta(days=DAYS_IN_1_YEAR * 1000), "")) | |
self.assertEqual(extract_duration("5 millenniums"), | |
(timedelta(days=DAYS_IN_1_YEAR * 1000 * 5), "")) | |
test_cases = [ | |
("3 months", timedelta(days=DAYS_IN_1_MONTH * 3)), | |
("a year", timedelta(days=DAYS_IN_1_YEAR)), | |
("1 year", timedelta(days=DAYS_IN_1_YEAR)), | |
("5 years", timedelta(days=DAYS_IN_1_YEAR * 5)), | |
("a decade", timedelta(days=DAYS_IN_1_YEAR * 10)), | |
("1 decade", timedelta(days=DAYS_IN_1_YEAR * 10)), | |
("5 decades", timedelta(days=DAYS_IN_1_YEAR * 10 * 5)), | |
("1 century", timedelta(days=DAYS_IN_1_YEAR * 100)), | |
("a century", timedelta(days=DAYS_IN_1_YEAR * 100)), | |
("5 centuries", timedelta(days=DAYS_IN_1_YEAR * 100 * 5)), | |
("1 millennium", timedelta(days=DAYS_IN_1_YEAR * 1000)), | |
("5 millenniums", timedelta(days=DAYS_IN_1_YEAR * 1000 * 5)) | |
] | |
for text, expected in test_cases: | |
self.assertEqual(extract_duration(text), (expected, "")) |
def extract_duration_es(text): | ||
""" | ||
Convert an spanish phrase into a number of seconds | ||
Convert things like: | ||
"10 Minutos" | ||
"3 dias 8 horas 10 Minutos e 49 Segundos" | ||
into an int, representing the total number of seconds. | ||
The words used in the duration will be consumed, and | ||
the remainder returned. | ||
As an example, "set a timer for 5 minutes" would return | ||
(300, "set a timer for"). | ||
Args: | ||
text (str): string containing a duration | ||
Returns: | ||
(timedelta, str): | ||
A tuple containing the duration and the remaining text | ||
not consumed in the parsing. The first value will | ||
be None if no duration is found. The text returned | ||
will have whitespace stripped from the ends. | ||
""" | ||
if not text: | ||
return None | ||
|
||
text = text.lower().replace("í", "i") | ||
time_units = { | ||
'microseconds': 'microsegundos', | ||
'milliseconds': 'milisegundos', | ||
'seconds': 'segundos', | ||
'minutes': 'minutos', | ||
'hours': 'horas', | ||
'days': 'dias', | ||
'weeks': 'semanas' | ||
} | ||
non_std_un = { | ||
"months": "mes", | ||
"years": "anos", | ||
'decades': "decadas", | ||
'centurys': "siglos", | ||
'millenniums': "milenios" | ||
} | ||
|
||
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?" | ||
|
||
text = text.replace("í", "i").replace("é", "e").replace("ñ", "n").replace("meses", "mes") | ||
text = SpanishNormalizer().numbers_to_digits(text) | ||
|
||
for (unit_en, unit_es) in time_units.items(): | ||
unit_pattern = pattern.format( | ||
unit=unit_es[:-1]) # remove 's' from unit | ||
time_units[unit_en] = 0 | ||
|
||
def repl(match): | ||
time_units[unit_en] += float(match.group(1)) | ||
return '' | ||
|
||
text = re.sub(unit_pattern, repl, text) | ||
|
||
for (unit_en, unit_es) in non_std_un.items(): | ||
unit_pattern = pattern.format( | ||
unit=unit_es[:-1]) # remove 's' from unit | ||
|
||
def repl_non_std(match): | ||
val = float(match.group(1)) | ||
if unit_en == "months": | ||
val = DAYS_IN_1_MONTH * val | ||
if unit_en == "years": | ||
val = DAYS_IN_1_YEAR * val | ||
if unit_en == "decades": | ||
val = 10 * DAYS_IN_1_YEAR * val | ||
if unit_en == "centurys": | ||
val = 100 * DAYS_IN_1_YEAR * val | ||
if unit_en == "millenniums": | ||
val = 1000 * DAYS_IN_1_YEAR * val | ||
time_units["days"] += val | ||
return '' | ||
|
||
text = re.sub(unit_pattern, repl_non_std, text) | ||
|
||
text = text.strip() | ||
duration = timedelta(**time_units) if any(time_units.values()) else None | ||
|
||
return (duration, text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure re
is imported.
The re
module is used but not imported.
+ import re
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def extract_duration_es(text): | |
""" | |
Convert an spanish phrase into a number of seconds | |
Convert things like: | |
"10 Minutos" | |
"3 dias 8 horas 10 Minutos e 49 Segundos" | |
into an int, representing the total number of seconds. | |
The words used in the duration will be consumed, and | |
the remainder returned. | |
As an example, "set a timer for 5 minutes" would return | |
(300, "set a timer for"). | |
Args: | |
text (str): string containing a duration | |
Returns: | |
(timedelta, str): | |
A tuple containing the duration and the remaining text | |
not consumed in the parsing. The first value will | |
be None if no duration is found. The text returned | |
will have whitespace stripped from the ends. | |
""" | |
if not text: | |
return None | |
text = text.lower().replace("í", "i") | |
time_units = { | |
'microseconds': 'microsegundos', | |
'milliseconds': 'milisegundos', | |
'seconds': 'segundos', | |
'minutes': 'minutos', | |
'hours': 'horas', | |
'days': 'dias', | |
'weeks': 'semanas' | |
} | |
non_std_un = { | |
"months": "mes", | |
"years": "anos", | |
'decades': "decadas", | |
'centurys': "siglos", | |
'millenniums': "milenios" | |
} | |
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?" | |
text = text.replace("í", "i").replace("é", "e").replace("ñ", "n").replace("meses", "mes") | |
text = SpanishNormalizer().numbers_to_digits(text) | |
for (unit_en, unit_es) in time_units.items(): | |
unit_pattern = pattern.format( | |
unit=unit_es[:-1]) # remove 's' from unit | |
time_units[unit_en] = 0 | |
def repl(match): | |
time_units[unit_en] += float(match.group(1)) | |
return '' | |
text = re.sub(unit_pattern, repl, text) | |
for (unit_en, unit_es) in non_std_un.items(): | |
unit_pattern = pattern.format( | |
unit=unit_es[:-1]) # remove 's' from unit | |
def repl_non_std(match): | |
val = float(match.group(1)) | |
if unit_en == "months": | |
val = DAYS_IN_1_MONTH * val | |
if unit_en == "years": | |
val = DAYS_IN_1_YEAR * val | |
if unit_en == "decades": | |
val = 10 * DAYS_IN_1_YEAR * val | |
if unit_en == "centurys": | |
val = 100 * DAYS_IN_1_YEAR * val | |
if unit_en == "millenniums": | |
val = 1000 * DAYS_IN_1_YEAR * val | |
time_units["days"] += val | |
return '' | |
text = re.sub(unit_pattern, repl_non_std, text) | |
text = text.strip() | |
duration = timedelta(**time_units) if any(time_units.values()) else None | |
return (duration, text) | |
def extract_duration_es(text): | |
""" | |
Convert an spanish phrase into a number of seconds | |
Convert things like: | |
"10 Minutos" | |
"3 dias 8 horas 10 Minutos e 49 Segundos" | |
into an int, representing the total number of seconds. | |
The words used in the duration will be consumed, and | |
the remainder returned. | |
As an example, "set a timer for 5 minutes" would return | |
(300, "set a timer for"). | |
Args: | |
text (str): string containing a duration | |
Returns: | |
(timedelta, str): | |
A tuple containing the duration and the remaining text | |
not consumed in the parsing. The first value will | |
be None if no duration is found. The text returned | |
will have whitespace stripped from the ends. | |
""" | |
if not text: | |
return None | |
text = text.lower().replace("í", "i") | |
time_units = { | |
'microseconds': 'microsegundos', | |
'milliseconds': 'milisegundos', | |
'seconds': 'segundos', | |
'minutes': 'minutos', | |
'hours': 'horas', | |
'days': 'dias', | |
'weeks': 'semanas' | |
} | |
non_std_un = { | |
"months": "mes", | |
"years": "anos", | |
'decades': "decadas", | |
'centurys': "siglos", | |
'millenniums': "milenios" | |
} | |
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?" | |
text = text.replace("í", "i").replace("é", "e").replace("ñ", "n").replace("meses", "mes") | |
text = SpanishNormalizer().numbers_to_digits(text) | |
for (unit_en, unit_es) in time_units.items(): | |
unit_pattern = pattern.format( | |
unit=unit_es[:-1]) # remove 's' from unit | |
time_units[unit_en] = 0 | |
def repl(match): | |
time_units[unit_en] += float(match.group(1)) | |
return '' | |
text = re.sub(unit_pattern, repl, text) | |
for (unit_en, unit_es) in non_std_un.items(): | |
unit_pattern = pattern.format( | |
unit=unit_es[:-1]) # remove 's' from unit | |
def repl_non_std(match): | |
val = float(match.group(1)) | |
if unit_en == "months": | |
val = DAYS_IN_1_MONTH * val | |
if unit_en == "years": | |
val = DAYS_IN_1_YEAR * val | |
if unit_en == "decades": | |
val = 10 * DAYS_IN_1_YEAR * val | |
if unit_en == "centurys": | |
val = 100 * DAYS_IN_1_YEAR * val | |
if unit_en == "millenniums": | |
val = 1000 * DAYS_IN_1_YEAR * val | |
time_units["days"] += val | |
return '' | |
text = re.sub(unit_pattern, repl_non_std, text) | |
text = text.strip() | |
duration = timedelta(**time_units) if any(time_units.values()) else None | |
return (duration, text) |
Tools
Ruff
1128-1128: Function definition does not bind loop variable
unit_en
(B023)
1131-1131:
re
may be undefined, or defined from star imports(F405)
1139-1139: Function definition does not bind loop variable
unit_en
(B023)
1141-1141: Function definition does not bind loop variable
unit_en
(B023)
1143-1143: Function definition does not bind loop variable
unit_en
(B023)
1145-1145: Function definition does not bind loop variable
unit_en
(B023)
1147-1147: Function definition does not bind loop variable
unit_en
(B023)
1152-1152:
re
may be undefined, or defined from star imports(F405)
continuation of HelloChatterbox#27
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores