Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/extract timespan #37

Draft
wants to merge 12 commits into
base: dev
Choose a base branch
from
Draft

Feat/extract timespan #37

wants to merge 12 commits into from

Conversation

JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Nov 26, 2022

continuation of HelloChatterbox#27

Summary by CodeRabbit

  • New Features

    • Introduced functions to extract and normalize time durations from text in German, English, Spanish, and Portuguese.
    • Added support for new time duration units (months, years, decades, centuries, and millennia).
  • Bug Fixes

    • Improved accuracy in converting phrases to durations by updating regex patterns and number normalization.
  • Tests

    • Added comprehensive tests for all new duration extraction functions across multiple languages.
  • Chores

    • Updated documentation and configuration files to support new features and improvements.

@JarbasAl JarbasAl added the enhancement New feature or request label Nov 26, 2022
@codecov
Copy link

codecov bot commented Nov 26, 2022

Codecov Report

❗ No coverage uploaded for pull request base (dev@135d8c5). Click here to learn what that means.
The diff coverage is n/a.

@@          Coverage Diff          @@
##             dev     #37   +/-   ##
=====================================
  Coverage       ?   0.00%           
=====================================
  Files          ?      65           
  Lines          ?   16825           
  Branches       ?       0           
=====================================
  Hits           ?       0           
  Misses         ?   16825           
  Partials       ?       0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

lingua_franca/lang/parse_en.py Show resolved Hide resolved

def extract_timespan_en(text, time_unit=TimespanUnit.TIMEDELTA, replace_token=""):
"""
Convert an english phrase into a number of seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just extract_duration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but it changes return type and adds the replace token, its mainly a compatibility thing so extract_duration always means the same (mycroft) thing

lingua_franca/lang/parse_en.py Show resolved Hide resolved
lingua_franca/lang/parse_en.py Show resolved Hide resolved
@JarbasAl JarbasAl marked this pull request as draft January 4, 2023 12:55
@JarbasAl
Copy link
Member Author

@coderabbitai review

Copy link

coderabbitai bot commented Jul 18, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

coderabbitai bot commented Jul 18, 2024

Walkthrough

The recent changes introduce new functions and updates across multiple language modules to enhance text parsing capabilities in lingua_franca. Major additions include new functions for extracting durations and timespans in German, English, Spanish, and Portuguese, along with normalization functions and extensive updates to unit tests. These improvements enable more accurate and versatile time-related text processing.

Changes

Files Change Summary
lingua_franca/lang/parse_de.py Added imports, new extract_timespan_de function, modified extract_duration_de, updated GermanNormalizer, added normalize_de function.
lingua_franca/lang/parse_en.py Added imports, new extract_timespan_en function, modified extract_duration_en, added normalize_en function.
lingua_franca/lang/parse_es.py, lingua_franca/lang/parse_pt.py Added imports, new extract_duration_es and extract_duration_pt functions.
lingua_franca/parse.py Added new extract_timespan function.
lingua_franca/res/text/de-de/normalize.json Introduced new text normalization rules for German.
lingua_franca/time.py Added TimespanUnit enum, constants DAYS_IN_1_YEAR, DAYS_IN_1_MONTH.
test/unittests/test_parse_es.py, test/unittests/test_parse_pt.py Added imports, new tests for extract_duration, extract_timespan, and related functions.

Poem

Amidst the code, a change did bloom,
Enhancing time from dawn to gloom.
From days to years, in tongues so bright,
Durations clear, from morn to night.
With new commands, the code does flow,
Making time parse, a skill to show.
🌟⏳✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Outside diff range, codebase verification and nitpick comments (4)
lingua_franca/lang/parse_de.py (1)

Line range hint 1267-1269: Refactor to use GermanNormalizer.

The TODO comment indicates that the function should use GermanNormalizer for normalization.

- def normalize_de(text, remove_articles=True):
-     """ German string normalization """
-     # TODO return GermanNormalizer().normalize(text, remove_articles)
-     words = text.split()  # this also removed extra spaces
-     normalized = ""
-     for word in words:
-         if remove_articles and word in ["der", "die", "das", "des", "den",
-                                         "dem"]:
-             continue
- 
-         # Expand common contractions, e.g. "isn't" -> "is not"
-         contraction = ["net", "nett"]
-         if word in contraction:
-             expansion = ["nicht", "nicht"]
-             word = expansion[contraction.index(word)]
- 
-         # Convert numbers into digits, e.g. "two" -> "2"
- 
-         if word in _DE_NUMBERS:
-             word = str(_DE_NUMBERS[word])
- 
-         normalized += " " + word
- 
-     return normalized[1:]  # strip the initial space
+ def normalize_de(text, remove_articles=True):
+     return GermanNormalizer().normalize(text, remove_articles)
Tools
Ruff

1267-1267: Redefinition of unused normalize_de from line 1218

(F811)

lingua_franca/lang/parse_en.py (1)

1741-1757: Suggestion: Improve the text normalization comment.

The comment about the text normalization hack can be improved for clarity.

- # NOTE: above is a nasty hack, _convert_words_to_numbers normalized the
- # string so we can do this, but this is essentially incorrect since each
- # replaced number word should be replaced with a single replace_token
+ # NOTE: The above is a workaround. _convert_words_to_numbers normalizes the
+ # string, allowing us to replace two words (number and unit) with a single
+ # replace_token. Ideally, each number word should be replaced with a single
+ # replace_token.
test/unittests/test_parse_en.py (2)

600-606: Ensure consistency in test cases.

The test_extract_duration method already covers some cases for months and years. Ensure consistency and avoid redundancy.

-        self.assertEqual(extract_duration("1 month"),
-                         (timedelta(days=DAYS_IN_1_MONTH), ""))
-        self.assertEqual(
-            extract_duration("1 month"),
-            (timedelta(days=DAYS_IN_1_MONTH), ""))

Line range hint 2521-2527: Consider localization for gender tests.

The gender tests are not localized. Consider whether localization is needed for English.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 135d8c5 and dead6ff.

Files selected for processing (11)
  • lingua_franca/lang/parse_de.py (6 hunks)
  • lingua_franca/lang/parse_en.py (4 hunks)
  • lingua_franca/lang/parse_es.py (2 hunks)
  • lingua_franca/lang/parse_pt.py (2 hunks)
  • lingua_franca/parse.py (3 hunks)
  • lingua_franca/res/text/de-de/normalize.json (1 hunks)
  • lingua_franca/time.py (1 hunks)
  • test/unittests/test_parse_de.py (7 hunks)
  • test/unittests/test_parse_en.py (3 hunks)
  • test/unittests/test_parse_es.py (3 hunks)
  • test/unittests/test_parse_pt.py (3 hunks)
Files not summarized due to errors (2)
  • test/unittests/test_parse_de.py: Error: Message exceeds token limit
  • test/unittests/test_parse_en.py: Error: Message exceeds token limit
Additional context used
Ruff
lingua_franca/parse.py

16-16: datetime imported but unused

Remove unused import: datetime

(F401)

test/unittests/test_parse_es.py

24-24: lingua_franca.parse.get_gender imported but unused

Remove unused import: lingua_franca.parse.get_gender

(F401)


24-24: Redefinition of unused extract_datetime from line 22

Remove definition: extract_datetime

(F811)


24-24: Redefinition of unused extract_number from line 21

Remove definition: extract_number

(F811)


24-24: Redefinition of unused normalize from line 21

Remove definition: normalize

(F811)


24-24: Redefinition of unused yes_or_no from line 22

Remove definition: yes_or_no

(F811)


25-25: lingua_franca.time.to_local imported but unused

Remove unused import: lingua_franca.time.to_local

(F401)

test/unittests/test_parse_pt.py

23-23: lingua_franca.time.to_local imported but unused

Remove unused import: lingua_franca.time.to_local

(F401)

lingua_franca/lang/parse_es.py

1128-1128: Function definition does not bind loop variable unit_en

(B023)


1131-1131: re may be undefined, or defined from star imports

(F405)


1139-1139: Function definition does not bind loop variable unit_en

(B023)


1141-1141: Function definition does not bind loop variable unit_en

(B023)


1143-1143: Function definition does not bind loop variable unit_en

(B023)


1145-1145: Function definition does not bind loop variable unit_en

(B023)


1147-1147: Function definition does not bind loop variable unit_en

(B023)


1152-1152: re may be undefined, or defined from star imports

(F405)

lingua_franca/lang/parse_pt.py

1156-1156: Function definition does not bind loop variable unit_en

(B023)


1167-1167: Function definition does not bind loop variable unit_en

(B023)


1169-1169: Function definition does not bind loop variable unit_en

(B023)


1171-1171: Function definition does not bind loop variable unit_en

(B023)


1173-1173: Function definition does not bind loop variable unit_en

(B023)


1175-1175: Function definition does not bind loop variable unit_en

(B023)

lingua_franca/lang/parse_de.py

144-144: Function definition does not bind loop variable unit_en

(B023)


1267-1267: Redefinition of unused normalize_de from line 1218

(F811)

lingua_franca/lang/parse_en.py

605-605: Function definition does not bind loop variable unit_en

(B023)


610-610: Function definition does not bind loop variable unit_en

(B023)


612-612: Function definition does not bind loop variable unit_en

(B023)


614-614: Function definition does not bind loop variable unit_en

(B023)


616-616: Function definition does not bind loop variable unit_en

(B023)


618-618: Function definition does not bind loop variable unit_en

(B023)

Additional comments not posted (87)
lingua_franca/res/text/de-de/normalize.json (5)

1-8: Configuration settings look good.

The initial configuration settings for text normalization in German are correctly defined. They include options for lowercase conversion, number-to-digit conversion, contraction expansion, symbol removal, accent removal, article removal, and stopword removal.


9-25: Contractions mapping is comprehensive.

The contractions mapping for German contractions is well-defined and covers common contractions.


26-29: Word replacements are appropriate.

The word replacements for "m" to "meter" and "km" to "kilometer" are correctly defined.


30-63: Number replacements are thorough.

The number replacements for German words to digits are comprehensive and cover numbers from 0 to 90.


64-72: Stopwords and articles are correctly listed.

The stopwords list is empty, and the articles list includes common German articles.

lingua_franca/time.py (2)

21-38: Enum TimespanUnit is well-defined.

The TimespanUnit enum is comprehensive and includes various time units for timespan calculations.


42-43: Constants for time calculations are correctly defined.

The constants DAYS_IN_1_YEAR and DAYS_IN_1_MONTH are accurately defined for time calculations.

lingua_franca/parse.py (2)

29-29: Imports for time calculations are appropriate.

The imports for TimespanUnit, DAYS_IN_1_MONTH, and DAYS_IN_1_YEAR from lingua_franca.time are correctly added.


132-203: Function extract_timespan is well-defined but has a TODO comment.

The extract_timespan function is comprehensive and handles various time units. However, there is a TODO comment indicating that the remainder is imperfect and ignores the replace_token.

Ensure that the TODO comment is addressed in future updates.

test/unittests/test_parse_es.py (18)

16-17: Imports for timedelta and relativedelta are appropriate.

The imports for timedelta from datetime and relativedelta from dateutil.relativedelta are correctly added.


22-22: Import extract_duration is appropriate.

The import for extract_duration from lingua_franca.parse is correctly added.


249-267: Tests for extract_duration are comprehensive.

The tests for the extract_duration function cover various scenarios and time units.


268-291: Tests for non-standard units in extract_duration are thorough.

The tests for non-standard units like months, years, decades, centuries, and millennia are well-defined and cover various scenarios.


308-327: Tests for extract_timespan are comprehensive.

The tests for the extract_timespan function cover various scenarios and time units.


329-367: Tests for extract_timespan with RELATIVEDELTA are thorough.

The tests for the extract_timespan function with the RELATIVEDELTA time unit are well-defined and cover various scenarios.


368-396: Tests for extract_timespan with TOTAL_MICROSECONDS are comprehensive.

The tests for the extract_timespan function with the TOTAL_MICROSECONDS time unit are well-defined and cover various scenarios.


397-425: Tests for extract_timespan with TOTAL_MILLISECONDS are comprehensive.

The tests for the extract_timespan function with the TOTAL_MILLISECONDS time unit are well-defined and cover various scenarios.


426-450: Tests for extract_timespan with TOTAL_SECONDS are thorough.

The tests for the extract_timespan function with the TOTAL_SECONDS time unit are well-defined and cover various scenarios.


451-471: Tests for extract_timespan with TOTAL_MINUTES are comprehensive.

The tests for the extract_timespan function with the TOTAL_MINUTES time unit are well-defined and cover various scenarios.


472-493: Tests for extract_timespan with TOTAL_HOURS are thorough.

The tests for the extract_timespan function with the TOTAL_HOURS time unit are well-defined and cover various scenarios.


494-516: Tests for extract_timespan with TOTAL_DAYS are comprehensive.

The tests for the extract_timespan function with the TOTAL_DAYS time unit are well-defined and cover various scenarios.


517-538: Tests for extract_timespan with TOTAL_WEEKS are thorough.

The tests for the extract_timespan function with the TOTAL_WEEKS time unit are well-defined and cover various scenarios.


539-561: Tests for extract_timespan with TOTAL_MONTHS are comprehensive.

The tests for the extract_timespan function with the TOTAL_MONTHS time unit are well-defined and cover various scenarios.


562-583: Tests for extract_timespan with TOTAL_YEARS are thorough.

The tests for the extract_timespan function with the TOTAL_YEARS time unit are well-defined and cover various scenarios.


584-604: Tests for extract_timespan with TOTAL_DECADES are comprehensive.

The tests for the extract_timespan function with the TOTAL_DECADES time unit are well-defined and cover various scenarios.


605-625: Tests for extract_timespan with TOTAL_CENTURIES are thorough.

The tests for the extract_timespan function with the TOTAL_CENTURIES time unit are well-defined and cover various scenarios.


626-646: Tests for extract_timespan with TOTAL_MILLENNIUMS are comprehensive.

The tests for the extract_timespan function with the TOTAL_MILLENNIUMS time unit are well-defined and cover various scenarios.

test/unittests/test_parse_pt.py (17)

260-287: Ensure proper handling of imperfect remainders.

The TODO comments indicate that the handling of imperfect remainders is not yet implemented. Ensure these cases are handled correctly in future updates.


376-396: Tests for extract_timespan look good!

The tests cover various time units and edge cases.


397-435: Tests for extract_timespan_delta look good!

The tests cover various time units using TimespanUnit.RELATIVEDELTA.


436-464: Tests for extract_timespan_microsegundos look good!

The tests cover various time units in microseconds.


465-493: Tests for extract_timespan_milisegundos look good!

The tests cover various time units in milliseconds.


494-518: Tests for extract_timespan_segundos look good!

The tests cover various time units in seconds.


519-539: Tests for extract_timespan_minutos look good!

The tests cover various time units in minutes.


540-561: Tests for extract_timespan_horas look good!

The tests cover various time units in hours.


562-584: Tests for extract_timespan_dias look good!

The tests cover various time units in days.


585-606: Tests for extract_timespan_semanas look good!

The tests cover various time units in weeks.


607-629: Tests for extract_timespan_months look good!

The tests cover various time units in months.


630-651: Tests for extract_timespan_years look good!

The tests cover various time units in years.


652-672: Tests for extract_timespan_decades look good!

The tests cover various time units in decades.


673-693: Tests for extract_timespan_centuries look good!

The tests cover various time units in centuries.


694-714: Tests for extract_timespan_millennia look good!

The tests cover various time units in millennia.


Line range hint 320-348:
Tests for get_gender look good!

The tests cover various cases for gender extraction in Portuguese.


Line range hint 351-374:
Tests for yes_or_no look good!

The tests cover various cases for yes/no extraction in Portuguese.

lingua_franca/lang/parse_es.py (1)

Line range hint 1159-1177:
Ensure get_gender_es handles context correctly.

The function now has a default value for the context parameter. Ensure it handles context correctly.

Tools
Ruff

1128-1128: Function definition does not bind loop variable unit_en

(B023)


1131-1131: re may be undefined, or defined from star imports

(F405)


1139-1139: Function definition does not bind loop variable unit_en

(B023)


1141-1141: Function definition does not bind loop variable unit_en

(B023)


1143-1143: Function definition does not bind loop variable unit_en

(B023)


1145-1145: Function definition does not bind loop variable unit_en

(B023)


1147-1147: Function definition does not bind loop variable unit_en

(B023)


1152-1152: re may be undefined, or defined from star imports

(F405)

lingua_franca/lang/parse_pt.py (2)

23-23: LGTM! Imports are necessary and correctly used.

The updated imports for timedelta, now_local, DAYS_IN_1_MONTH, and DAYS_IN_1_YEAR are necessary for the new functionality in the code.

Also applies to: 31-31


1102-1121: Well-documented function.

The docstring provides a clear and comprehensive explanation of the function's purpose, arguments, and return values.

lingua_franca/lang/parse_de.py (3)

117-117: Ensure the correctness of the regex pattern and replacement logic.

The regex pattern and replacement logic for converting German phrases to duration should be thoroughly tested to ensure they cover all edge cases.


154-154: Ensure the correctness of the regex pattern and replacement logic.

The regex pattern and replacement logic for converting German phrases to timespan should be thoroughly tested to ensure they cover all edge cases.


1263-1265: Ensure the correctness of JSON loading.

The usage of resolve_resource_file and JSON loading should be thoroughly tested to ensure the configuration is loaded correctly.

test/unittests/test_parse_de.py (15)

703-746: Ensure comprehensive test coverage for extract_timespan with days.

The new test cases for extract_timespan with days should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with days.

The test cases for extract_timespan with days in test_parse_de.py and test_parse_en.py cover various scenarios, including edge cases and common use cases. The provided test cases appear to be comprehensive.

  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with days.

# Test: Search for test cases related to `extract_timespan` with days.
rg --type python -A 5 $'extract_timespan.*days'

Length of output: 79


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with days.

# Test: Search for test cases related to `extract_timespan` with days.
rg --glob '*.py' -A 5 'extract_timespan.*days'

Length of output: 3102


504-563: Ensure comprehensive test coverage for extract_timespan with milliseconds.

The new test cases for extract_timespan with milliseconds should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with milliseconds.

The test cases for extract_timespan with milliseconds in test/unittests/test_parse_de.py cover various scenarios, including edge cases and common use cases. The test cases appear to be well-implemented and thorough.

  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with milliseconds.

# Test: Search for test cases related to `extract_timespan` with milliseconds.
rg --type python -A 5 $'extract_timespan.*milliseconds'

Length of output: 87


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with milliseconds.

# Test: Search for test cases related to `extract_timespan` with milliseconds.
rg --type py -A 5 $'extract_timespan.*milliseconds'

Length of output: 1025


658-702: Ensure comprehensive test coverage for extract_timespan with hours.

The new test cases for extract_timespan with hours should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Comprehensive test coverage for extract_timespan with hours verified.

The test cases for extract_timespan with hours are well-covered in both the German (test_parse_de.py) and English (test_parse_en.py) test files. No further actions are required.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with hours.

# Test: Search for test cases related to `extract_timespan` with hours.
rg --type python -A 5 $'extract_timespan.*hours'

Length of output: 80


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with hours.

# Test: Search for test cases related to `extract_timespan` with hours.
rg --type py -A 5 $'extract_timespan.*hours'

Length of output: 2984


614-657: Ensure comprehensive test coverage for extract_timespan with minutes.

The new test cases for extract_timespan with minutes should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with minutes.

The test cases for extract_timespan with minutes in both test_parse_de.py and test_parse_en.py provide comprehensive coverage, including various scenarios and edge cases. No further action is needed.

  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with minutes.

# Test: Search for test cases related to `extract_timespan` with minutes.
rg --type python -A 5 $'extract_timespan.*minutes'

Length of output: 82


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with minutes.

# Test: Search for test cases related to `extract_timespan` with minutes.
rg -A 5 'extract_timespan.*minutes'

Length of output: 5779


566-613: Ensure comprehensive test coverage for extract_timespan with seconds.

The new test cases for extract_timespan with seconds should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with seconds.

The test cases for extract_timespan with seconds in test_parse_de.py cover various scenarios, including edge cases and common use cases. Similar tests are also present in test_parse_en.py, indicating comprehensive coverage.

  • test/unittests/test_parse_de.py: test_extract_timespan_seconds_de
  • test/unittests/test_parse_en.py: test_extract_timespan_seconds_en
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with seconds.

# Test: Search for test cases related to `extract_timespan` with seconds.
rg --type python -A 5 $'extract_timespan.*seconds'

Length of output: 82


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with seconds.

# Test: Search for test cases related to `extract_timespan` with seconds.
rg --type py -A 5 $'extract_timespan.*seconds'

Length of output: 6865


462-503: Ensure comprehensive test coverage for extract_timespan with microseconds.

The new test cases for extract_timespan with microseconds should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.


19-25: Ensure new imports are used.

The new imports for extract_duration, extract_timespan, and TimespanUnit should be used in the file. Verify their usage in the test methods.

Verification successful

Ensure new imports are used.

The new imports for extract_duration, extract_timespan, and TimespanUnit are correctly used in the file.

  • extract_duration is used in multiple test cases such as test_extract_duration_de.
  • extract_timespan is used in multiple test cases such as test_extract_timespan_de.
  • TimespanUnit is used in multiple test cases such as test_extract_timespan_ambiguous.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of new imports in the file.

# Test: Search for the usage of `extract_duration`, `extract_timespan`, and `TimespanUnit`.
rg --type python -A 5 $'extract_duration|extract_timespan|TimespanUnit'

Length of output: 103


Script:

#!/bin/bash
# Description: Verify the usage of new imports in the file.

# Test: Search for the usage of `extract_duration`, `extract_timespan`, and `TimespanUnit`.
rg --glob '*.py' -A 5 'extract_duration|extract_timespan|TimespanUnit'

Length of output: 157997


Line range hint 220-250:
Ensure comprehensive test coverage for extract_duration.

The new test cases for extract_duration should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Test coverage for extract_duration is comprehensive.

The test cases for extract_duration cover a wide range of scenarios and languages, ensuring robust validation of the function.

  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_ru.py
  • test/unittests/test_parse_pt.py
  • test/unittests/test_parse_nl.py
  • test/unittests/test_parse_pl.py
  • test/unittests/test_parse_cs.py
  • test/unittests/test_parse_fr.py
  • test/unittests/test_parse_sv.py
  • test/unittests/test_parse_az.py
  • test/unittests/test_parse_es.py
  • test/unittests/test_parse_fa.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_duration`.

# Test: Search for test cases related to `extract_duration`.
rg --type python -A 5 $'extract_duration'

Length of output: 71


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_duration`.

# Test: Search for test cases related to `extract_duration`.
rg --type py -A 5 $'extract_duration'

Length of output: 57945


747-788: Ensure comprehensive test coverage for extract_timespan with weeks.

The new test cases for extract_timespan with weeks should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with weeks.

The test cases for extract_timespan with weeks in test_parse_de.py are comprehensive and cover various scenarios, including different time units, edge cases, and phrases with additional text. No further actions are necessary.

  • test/unittests/test_parse_de.py: Comprehensive test cases for extract_timespan with weeks.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with weeks.

# Test: Search for test cases related to `extract_timespan` with weeks.
rg --type python -A 5 $'extract_timespan.*weeks'

Length of output: 80


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with weeks.

# Test: Search for test cases related to `extract_timespan` with weeks.
rg --glob '*.py' -A 5 $'extract_timespan.*weeks'

Length of output: 3148


260-289: Ensure comprehensive test coverage for extract_timespan.

The new test cases for extract_timespan should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Comprehensive test coverage for extract_timespan verified.

The test cases for extract_timespan are extensive and cover various scenarios, including different time units and edge cases. The function is well-tested across multiple languages.

  • Files and lines with test cases:
    • test/unittests/test_parse_de.py
    • test/unittests/test_parse_pt.py
    • test/unittests/test_parse_es.py
    • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan`.

# Test: Search for test cases related to `extract_timespan`.
rg --type python -A 5 $'extract_timespan'

Length of output: 71


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan`.

# Test: Search for test cases related to `extract_timespan`.
rg --type py -A 5 $'extract_timespan'

Length of output: 81947


789-833: Ensure comprehensive test coverage for extract_timespan with months.

The new test cases for extract_timespan with months should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with months.

The new test cases for extract_timespan with months in test_parse_de.py contribute to a broader set of tests across multiple languages, ensuring comprehensive coverage.

  • test/unittests/test_parse_pt.py
  • test/unittests/test_parse_es.py
  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with months.

# Test: Search for test cases related to `extract_timespan` with months.
rg --type python -A 5 $'extract_timespan.*months'

Length of output: 81


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with months.

# Test: Search for test cases related to `extract_timespan` with months.
rg --type py -A 5 $'extract_timespan.*months'

Length of output: 5467


877-915: Ensure comprehensive test coverage for extract_timespan with decades.

The new test cases for extract_timespan with decades should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with decades.

The test cases for extract_timespan with decades are present across multiple language files, indicating comprehensive coverage.

  • test/unittests/test_parse_pt.py
  • test/unittests/test_parse_es.py
  • test/unittests/test_parse_en.py
  • test/unittests/test_parse_de.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with decades.

# Test: Search for test cases related to `extract_timespan` with decades.
rg --type python -A 5 $'extract_timespan.*decades'

Length of output: 82


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with decades.

# Test: Search for test cases related to `extract_timespan` with decades.
rg -A 5 'extract_timespan.*decades'

Length of output: 2961


956-975: Ensure comprehensive test coverage for extract_timespan with millennia.

The new test cases for extract_timespan with millennia should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Comprehensive test coverage for extract_timespan with millennia verified.

The test cases for extract_timespan with millennia are well-covered across multiple language-specific test files, ensuring thorough validation of the function.

  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_es.py
  • test/unittests/test_parse_pt.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with millennia.

# Test: Search for test cases related to `extract_timespan` with millennia.
rg --type python -A 5 $'extract_timespan.*millennia'

Length of output: 84


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with millennia.

# Test: Search for test cases related to `extract_timespan` with millennia.
rg --type py -A 5 'extract_timespan.*millennia'

Length of output: 1960


834-876: Ensure comprehensive test coverage for extract_timespan with years.

The new test cases for extract_timespan with years should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Ensure comprehensive test coverage for extract_timespan with years.

The test cases for extract_timespan with years are well-covered across multiple language-specific test files, including test_parse_de.py, test_parse_es.py, test_parse_pt.py, and test_parse_en.py. The variety of test cases in test_parse_de.py confirms thorough testing for different durations and edge cases.

  • test/unittests/test_parse_de.py
  • test/unittests/test_parse_es.py
  • test/unittests/test_parse_pt.py
  • test/unittests/test_parse_en.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with years.

# Test: Search for test cases related to `extract_timespan` with years.
rg --type python -A 5 $'extract_timespan.*years'

Length of output: 80


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with years.

# Test: Search for test cases related to `extract_timespan` with years.
rg --type py -A 5 'extract_timespan.*years'

Length of output: 2918


916-955: Ensure comprehensive test coverage for extract_timespan with centuries.

The new test cases for extract_timespan with centuries should cover various scenarios, including edge cases and common use cases. Verify that all relevant scenarios are tested.

Verification successful

Comprehensive test coverage for extract_timespan with centuries verified.

The test cases for extract_timespan with centuries cover a wide range of scenarios, including different time units and edge cases across multiple languages. No further action is needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with centuries.

# Test: Search for test cases related to `extract_timespan` with centuries.
rg --type python -A 5 $'extract_timespan.*centuries'

Length of output: 84


Script:

#!/bin/bash
# Description: Verify the test coverage for `extract_timespan` with centuries.

# Find all Python files and search for test cases related to `extract_timespan` with centuries.
fd --type f --extension py --exec rg 'extract_timespan.*centuries' {}

Length of output: 369


Script:

#!/bin/bash
# Description: Extract the contents of test functions related to `extract_timespan` with centuries.

# Extract the contents of the test functions found in the previous step.
rg --type py -A 20 'def test_extract_timespan_centuries' 

Length of output: 6477

lingua_franca/lang/parse_en.py (6)

18-18: LGTM! Import changes are necessary and appropriate.

The addition of the math module and the modifications to the from lingua_franca.time import statement are necessary for the new functionality.

Also applies to: 31-31


1713-1713: LGTM! The new function normalize_en is necessary and appropriate.

The addition of the normalize_en function for English string normalization is well-implemented.


1715-1737: LGTM! The new function extract_timespan_en is comprehensive and well-implemented.

The function handles converting English phrases into durations and includes comprehensive handling for various time units and conversions.


1769-1806: LGTM! The handling of time units for TIMEDELTA is comprehensive.

The function correctly handles various time units for TIMEDELTA and includes appropriate conversions.


1822-1858: LGTM! The handling of time units for RELATIVEDELTA is comprehensive.

The function correctly handles various time units for RELATIVEDELTA and includes appropriate conversions.


1892-1965: LGTM! The handling of time units for other TimespanUnit values is comprehensive.

The function correctly handles various time units for other TimespanUnit values and includes appropriate conversions.

test/unittests/test_parse_en.py (16)

1934-1949: Ensure accurate extraction of microseconds.

The extraction of microseconds should be accurate and handle small floating point errors.


1977-1989: Ensure accurate extraction of milliseconds.

The extraction of milliseconds should be accurate and handle small floating point errors.


2046-2057: Ensure accurate extraction of seconds.

The extraction of seconds should be accurate and handle small floating point errors.


2098-2109: Ensure accurate extraction of minutes.

The extraction of minutes should be accurate and handle small floating point errors.


2146-2157: Ensure accurate extraction of hours.

The extraction of hours should be accurate and handle small floating point errors.


2194-2205: Ensure accurate extraction of days.

The extraction of days should be accurate and handle small floating point errors.


2242-2253: Ensure accurate extraction of weeks.

The extraction of weeks should be accurate and handle small floating point errors.


2288-2299: Ensure accurate extraction of months.

The extraction of months should be accurate and handle small floating point errors.


2340-2351: Ensure accurate extraction of years.

The extraction of years should be accurate and handle small floating point errors.


2387-2398: Ensure accurate extraction of decades.

The extraction of decades should be accurate and handle small floating point errors.


2430-2441: Ensure accurate extraction of centuries.

The extraction of centuries should be accurate and handle small floating point errors.


2474-2485: Ensure accurate extraction of millennia.

The extraction of millennia should be accurate and handle small floating point errors.


Line range hint 2530-2567: Ensure comprehensive yes/no tests.

The yes/no tests cover a wide range of cases. Ensure that all possible edge cases are covered.


Line range hint 2569-2581: Ensure accurate language code extraction.

The language code extraction tests should ensure accurate results with high confidence.


Line range hint 2583-2596: Ensure accurate language code extraction for specific cases.

The tests for specific language codes should ensure accurate results with high confidence.


1698-1711: Ensure accurate normalization in tests.

The normalization of "fifty seven and a half" to "57.5" may not be accurate. Consider reviewing the normalization logic.

time_units[unit_en] = 0

def repl(match):
time_units[unit_en] += float(match.group(1))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the loop variable binding issue.

The function definitions inside the loop do not bind the loop variable unit_en. This can lead to unexpected behavior.

-        def repl(match):
-            time_units[unit_en] += float(match.group(1))
+        def repl(unit):
+            def _repl(match):
+                time_units[unit] += float(match.group(1))
+            return _repl

-        text = re.sub(unit_pattern, repl, text)
+        text = re.sub(unit_pattern, repl(unit_en), text)

-        def repl_non_std(match):
-            val = float(match.group(1))
-            if unit_en == "months":
-                val = DAYS_IN_1_MONTH * val
-            if unit_en == "years":
-                val = DAYS_IN_1_YEAR * val
-            if unit_en == "decades":
-                val = 10 * DAYS_IN_1_YEAR * val
-            if unit_en == "centurys":
-                val = 100 * DAYS_IN_1_YEAR * val
-            if unit_en == "millenniums":
-                val = 1000 * DAYS_IN_1_YEAR * val
-            time_units["days"] += val
-            return ''
+        def repl_non_std(unit):
+            def _repl_non_std(match):
+                val = float(match.group(1))
+                if unit == "months":
+                    val = DAYS_IN_1_MONTH * val
+                if unit == "years":
+                    val = DAYS_IN_1_YEAR * val
+                if unit == "decades":
+                    val = 10 * DAYS_IN_1_YEAR * val
+                if unit == "centurys":
+                    val = 100 * DAYS_IN_1_YEAR * val
+                if unit == "millenniums":
+                    val = 1000 * DAYS_IN_1_YEAR * val
+                time_units["days"] += val
+                return ''
+            return _repl_non_std

-        text = re.sub(unit_pattern, repl_non_std, text)
+        text = re.sub(unit_pattern, repl_non_std(unit_en), text)

Also applies to: 1167-1176

Tools
Ruff

1156-1156: Function definition does not bind loop variable unit_en

(B023)

Comment on lines +1102 to +1185
def extract_duration_pt(text):
"""
Convert an portuguese phrase into a number of seconds
Convert things like:
"10 Minutos"
"3 dias 8 horas 10 Minutos e 49 Segundos"
into an int, representing the total number of seconds.
The words used in the duration will be consumed, and
the remainder returned.
As an example, "set a timer for 5 minutes" would return
(300, "set a timer for").
Args:
text (str): string containing a duration
Returns:
(timedelta, str):
A tuple containing the duration and the remaining text
not consumed in the parsing. The first value will
be None if no duration is found. The text returned
will have whitespace stripped from the ends.
"""
if not text:
return None

text = text.lower()
time_units = {
'microseconds': 'microsegundos',
'milliseconds': 'milisegundos',
'seconds': 'segundos',
'minutes': 'minutos',
'hours': 'horas',
'days': 'dias',
'weeks': 'semanas'
}
non_std_un = {
"months": "meses",
"years": "anos",
'decades': "decadas",
'centurys': "seculos",
'millenniums': "milenios"
}

pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?"

text = text.replace("mês", "meses").replace("é", "e")
text = text.replace("segundo", "_s_") # HACK - segundo (second) will be replaced with 2
text = PortugueseNormalizer().numbers_to_digits(text)
text = text.replace("_s_", "segundo") # undo HACK

for (unit_en, unit_pt) in time_units.items():
unit_pattern = pattern.format(
unit=unit_pt[:-1]) # remove 's' from unit
time_units[unit_en] = 0

def repl(match):
time_units[unit_en] += float(match.group(1))
return ''

text = re.sub(unit_pattern, repl, text)

for (unit_en, unit_pt) in non_std_un.items():
unit_pattern = pattern.format(
unit=unit_pt[:-1]) # remove 's' from unit

def repl_non_std(match):
val = float(match.group(1))
if unit_en == "months":
val = DAYS_IN_1_MONTH * val
if unit_en == "years":
val = DAYS_IN_1_YEAR * val
if unit_en == "decades":
val = 10 * DAYS_IN_1_YEAR * val
if unit_en == "centurys":
val = 100 * DAYS_IN_1_YEAR * val
if unit_en == "millenniums":
val = 1000 * DAYS_IN_1_YEAR * val
time_units["days"] += val
return ''

text = re.sub(unit_pattern, repl_non_std, text)

text = text.strip()
duration = timedelta(**time_units) if any(time_units.values()) else None

return (duration, text)
Copy link

@coderabbitai coderabbitai bot Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding unit tests for extract_duration_pt.

To ensure the correctness of this function, consider adding unit tests that cover various edge cases and typical use cases.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Tools
Ruff

1156-1156: Function definition does not bind loop variable unit_en

(B023)


1167-1167: Function definition does not bind loop variable unit_en

(B023)


1169-1169: Function definition does not bind loop variable unit_en

(B023)


1171-1171: Function definition does not bind loop variable unit_en

(B023)


1173-1173: Function definition does not bind loop variable unit_en

(B023)


1175-1175: Function definition does not bind loop variable unit_en

(B023)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JarbasAl, here are the unit tests for the extract_duration_pt function:

import unittest
from datetime import timedelta
from lingua_franca.lang.parse_pt import extract_duration_pt

class TestExtractDurationPT(unittest.TestCase):
    
    def test_single_unit(self):
        self.assertEqual(extract_duration_pt("10 segundos"), (timedelta(seconds=10), ""))
        self.assertEqual(extract_duration_pt("5 minutos"), (timedelta(minutes=5), ""))
        self.assertEqual(extract_duration_pt("2 horas"), (timedelta(hours=2), ""))
        self.assertEqual(extract_duration_pt("3 dias"), (timedelta(days=3), ""))
        self.assertEqual(extract_duration_pt("1 semana"), (timedelta(weeks=1), ""))

    def test_multiple_units(self):
        self.assertEqual(extract_duration_pt("3 dias 8 horas 10 minutos e 49 segundos"), 
                         (timedelta(days=3, hours=8, minutes=10, seconds=49), ""))
        self.assertEqual(extract_duration_pt("2 semanas 3 dias 4 horas"), 
                         (timedelta(weeks=2, days=3, hours=4), ""))
        self.assertEqual(extract_duration_pt("1 ano 2 meses 3 dias"), 
                         (timedelta(days=365.2425 + 2 * 30.42 + 3), ""))

    def test_non_standard_units(self):
        self.assertEqual(extract_duration_pt("1 mês"), (timedelta(days=30.42), ""))
        self.assertEqual(extract_duration_pt("1 ano"), (timedelta(days=365.2425), ""))
        self.assertEqual(extract_duration_pt("1 década"), (timedelta(days=10 * 365.2425), ""))
        self.assertEqual(extract_duration_pt("1 século"), (timedelta(days=100 * 365.2425), ""))
        self.assertEqual(extract_duration_pt("1 milênio"), (timedelta(days=1000 * 365.2425), ""))

    def test_mixed_units(self):
        self.assertEqual(extract_duration_pt("1 ano 2 meses 3 dias 4 horas 5 minutos 6 segundos"), 
                         (timedelta(days=365.2425 + 2 * 30.42 + 3, hours=4, minutes=5, seconds=6), ""))
    
    def test_edge_cases(self):
        self.assertEqual(extract_duration_pt(""), (None, ""))
        self.assertEqual(extract_duration_pt("sem tempo"), (None, "sem tempo"))
        self.assertEqual(extract_duration_pt("10"), (None, "10"))

if __name__ == '__main__':
    unittest.main()

These tests cover various scenarios including single time units, multiple time units combined, non-standard time units like months and years, and edge cases such as empty strings and invalid input. This should help ensure the robustness of the extract_duration_pt function.


Learnings added
Learnt from: JarbasAl
PR: OpenVoiceOS/ovos-lingua-franca#37
File: lingua_franca/lang/parse_pt.py:1102-1185
Timestamp: 2024-07-19T00:19:09.766Z
Learning: Generate unit tests for the `extract_duration_pt` function in `lingua_franca/lang/parse_pt.py` to cover various scenarios including single time units, multiple time units combined, non-standard time units, and edge cases.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@@ -122,7 +127,7 @@ def extract_duration_de(text):
}

# Einzahl und Mehrzahl
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[ne]?"
pattern = r"(?P<value>\d+(?:[.,]?\d+)?|\bein[e]?[nsm]?\b)(?:\s+|\-)(?P<unit>{unit}[nes]?[sn]?\b)"
Copy link

@coderabbitai coderabbitai bot Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: Implement text-to-number conversion.

The TODO comment indicates that the text-to-number conversion is missing.

Do you want me to help implement the text-to-number conversion function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JarbasAl, I'll implement the text-to-number conversion function and integrate it into the extract_duration_de function. Here is the updated code:

def _convert_words_to_numbers_de(text):
    """
    Convert German number words in the text to digits.
    Args:
        text (str): The input text containing number words.
    Returns:
        str: The text with number words converted to digits.
    """
    words = text.split()
    for i, word in enumerate(words):
        if word in de_numbers:
            words[i] = str(de_numbers[word])
    return ' '.join(words)

def extract_duration_de(text):
    """
    Convert a German phrase into a number of seconds.
    Convert things like:
        "10 Minuten"
        "3 Tage 8 Stunden 10 Minuten und 49 Sekunden"
    into an int, representing the total number of seconds.
    The words used in the duration will be consumed, and
    the remainder returned.
    As an example, "set a timer for 5 minutes" would return
    (300, "set a timer for").
    Args:
        text (str): string containing a duration
    Returns:
        (timedelta, str):
                    A tuple containing the duration and the remaining text
                    not consumed in the parsing. The first value will
                    be None if no duration is found. The text returned
                    will have whitespace stripped from the ends.
    """
    if not text:
        return None

    text = GermanNormalizer().numbers_to_digits(text.lower())
    text = _convert_words_to_numbers_de(text)  # Convert number words to digits
    # die time_unit values werden für timedelta() mit dem jeweiligen Wert überschrieben
    time_units = {
        'microseconds': 'mikrosekunden',
        'milliseconds': 'millisekunden',
        'seconds': 'sekunden',
        'minutes': 'minuten',
        'hours': 'stunden',
        'days': 'tage',
        'weeks': 'wochen'
    }

    # Einzahl und Mehrzahl
    pattern = r"(?P<value>\d+(?:[.,]?\d+)?|\bein[e]?[nsm]?\b)(?:\s+|\-)(?P<unit>{unit}[nes]?[sn]?\b)"

    for (unit_en, unit_de) in time_units.items():
        unit_pattern = pattern.format(
            unit=unit_de[:-1])  # remove 'n'/'e' from unit
        time_units[unit_en] = 0

        def repl(match):
            value = match.group(1).replace(",",".").replace("einem", "1") \
                          .replace("eines", "1").replace("einen", "1") \
                          .replace("eine", "1").replace("ein", "1")
            time_units[unit_en] += float(value)
            return ''
        text = re.sub(unit_pattern, repl, text)

    text = text.strip()
    duration = timedelta(**time_units) if any(time_units.values()) else None

    return (duration, text)

This code includes the _convert_words_to_numbers_de function to convert German number words to digits and integrates it into the extract_duration_de function.


Learnings added
Learnt from: JarbasAl
PR: OpenVoiceOS/ovos-lingua-franca#37
File: lingua_franca/lang/parse_de.py:130-130
Timestamp: 2024-07-19T00:18:56.842Z
Learning: The `extract_duration_de` function in `lingua_franca/lang/parse_de.py` requires a text-to-number conversion function to handle German number words.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +193 to +195
# TODO German version of _convert_words_to_numbers is missing completely
# Have to port my implementation, up until then -> Normalizer + replace
text = GermanNormalizer().numbers_to_digits(text.lower())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: Implement text-to-number conversion.

The TODO comment indicates that the text-to-number conversion is missing.

Do you want me to help implement the text-to-number conversion function?

print("[WARNING] relativedelta requires {unit} to be an "
"integer".format(unit=unit))
# fallback to timedelta resolution / raw tokens text with no flags
return extract_timespan_de(text, TimespanUnit.TIMEDELTA)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid self-recursive calls without base case.

The call to extract_timespan_de within itself can cause infinite recursion if not handled properly. Ensure there's a base case to prevent this.

from lingua_franca.lang.parse_es import extract_datetime_es, is_fractional_es
from lingua_franca.time import default_timezone
from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan
from lingua_franca.time import default_timezone, to_local, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused import.

The to_local import from lingua_franca.time is not used in the file.

- from lingua_franca.time import default_timezone, to_local, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
+ from lingua_franca.time import default_timezone, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from lingua_franca.time import default_timezone, to_local, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
from lingua_franca.time import default_timezone, DAYS_IN_1_YEAR, DAYS_IN_1_MONTH, TimespanUnit
Tools
Ruff

25-25: lingua_franca.time.to_local imported but unused

Remove unused import: lingua_franca.time.to_local

(F401)

from lingua_franca.lang.parse_es import extract_datetime_es, is_fractional_es
from lingua_franca.time import default_timezone
from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused and redefined imports.

The imports for get_gender, extract_datetime, extract_number, normalize, and yes_or_no are either unused or redefined.

- from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan
+ from lingua_franca.parse import extract_timespan
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from lingua_franca.parse import get_gender, extract_datetime, extract_number, normalize, yes_or_no, extract_timespan
from lingua_franca.parse import extract_timespan
Tools
Ruff

24-24: lingua_franca.parse.get_gender imported but unused

Remove unused import: lingua_franca.parse.get_gender

(F401)


24-24: Redefinition of unused extract_datetime from line 22

Remove definition: extract_datetime

(F811)


24-24: Redefinition of unused extract_number from line 21

Remove definition: extract_number

(F811)


24-24: Redefinition of unused normalize from line 21

Remove definition: normalize

(F811)


24-24: Redefinition of unused yes_or_no from line 22

Remove definition: yes_or_no

(F811)

Comment on lines +1712 to +1740
def test_extract_timespan_ambiguous(self):
self.assertRaises(ValueError, extract_timespan, "1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA)
self.assertRaises(ValueError, extract_timespan, "1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
self.assertEqual(
extract_timespan("1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
(timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))


# NOTE: for some reason this test fails with
# (relativedelta(months=+1, days=+9.126), '') != (relativedelta(months=+1, days=+9.126), '')
# correct result is being returned

#self.assertEqual(
# extract_timespan("1.3 months",
# resolution=TimespanUnit.RELATIVEDELTA_APPROXIMATE),
# (relativedelta(months=1, days=0.3 * DAYS_IN_1_MONTH), ""))

self.assertEqual(
extract_timespan("1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE
)[0].months, 1)
self.assertAlmostEquals(
extract_timespan("1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE
)[0].days, 0.3 * DAYS_IN_1_MONTH)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle ambiguous timespan extraction gracefully.

The tests for ambiguous timespan extraction should handle exceptions gracefully and ensure accurate results.

-        self.assertRaises(ValueError, extract_timespan, "1.3 months",
-                          time_unit=TimespanUnit.RELATIVEDELTA)
-        self.assertRaises(ValueError, extract_timespan, "1.3 months",
-                          time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
-        self.assertEqual(
-            extract_timespan("1.3 months",
-                             time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
-            (timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))
+        with self.assertRaises(ValueError):
+            extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA)
+        with self.assertRaises(ValueError):
+            extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
+        self.assertEqual(
+            extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
+            (timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_extract_timespan_ambiguous(self):
self.assertRaises(ValueError, extract_timespan, "1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA)
self.assertRaises(ValueError, extract_timespan, "1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
self.assertEqual(
extract_timespan("1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
(timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))
# NOTE: for some reason this test fails with
# (relativedelta(months=+1, days=+9.126), '') != (relativedelta(months=+1, days=+9.126), '')
# correct result is being returned
#self.assertEqual(
# extract_timespan("1.3 months",
# resolution=TimespanUnit.RELATIVEDELTA_APPROXIMATE),
# (relativedelta(months=1, days=0.3 * DAYS_IN_1_MONTH), ""))
self.assertEqual(
extract_timespan("1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE
)[0].months, 1)
self.assertAlmostEquals(
extract_timespan("1.3 months",
time_unit=TimespanUnit.RELATIVEDELTA_APPROXIMATE
)[0].days, 0.3 * DAYS_IN_1_MONTH)
with self.assertRaises(ValueError):
extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA)
with self.assertRaises(ValueError):
extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_STRICT)
self.assertEqual(
extract_timespan("1.3 months", time_unit=TimespanUnit.RELATIVEDELTA_FALLBACK),
(timedelta(days=1.3 * DAYS_IN_1_MONTH), ""))

Comment on lines +607 to +631
self.assertEqual(extract_duration("3 months"),
(timedelta(days=DAYS_IN_1_MONTH * 3), ""))
self.assertEqual(extract_duration("a year"),
(timedelta(days=DAYS_IN_1_YEAR), ""))
self.assertEqual(extract_duration("1 year"),
(timedelta(days=DAYS_IN_1_YEAR * 1), ""))
self.assertEqual(extract_duration("5 years"),
(timedelta(days=DAYS_IN_1_YEAR * 5), ""))
self.assertEqual(extract_duration("a decade"),
(timedelta(days=DAYS_IN_1_YEAR * 10), ""))
self.assertEqual(extract_duration("1 decade"),
(timedelta(days=DAYS_IN_1_YEAR * 10), ""))
self.assertEqual(extract_duration("5 decades"),
(timedelta(days=DAYS_IN_1_YEAR * 10 * 5), ""))
self.assertEqual(extract_duration("1 century"),
(timedelta(days=DAYS_IN_1_YEAR * 100), ""))
self.assertEqual(extract_duration("a century"),
(timedelta(days=DAYS_IN_1_YEAR * 100), ""))
self.assertEqual(extract_duration("5 centuries"),
(timedelta(days=DAYS_IN_1_YEAR * 100 * 5), ""))
self.assertEqual(extract_duration("1 millennium"),
(timedelta(days=DAYS_IN_1_YEAR * 1000), ""))
self.assertEqual(extract_duration("5 millenniums"),
(timedelta(days=DAYS_IN_1_YEAR * 1000 * 5), ""))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidate similar test cases.

The test cases for months and years can be consolidated to avoid redundancy and improve maintainability.

-        self.assertEqual(extract_duration("3 months"),
-                         (timedelta(days=DAYS_IN_1_MONTH * 3), ""))
-        self.assertEqual(extract_duration("a year"),
-                         (timedelta(days=DAYS_IN_1_YEAR), ""))
-        self.assertEqual(extract_duration("1 year"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 1), ""))
-        self.assertEqual(extract_duration("5 years"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 5), ""))
-        self.assertEqual(extract_duration("a decade"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 10), ""))
-        self.assertEqual(extract_duration("1 decade"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 10), ""))
-        self.assertEqual(extract_duration("5 decades"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 10 * 5), ""))
-        self.assertEqual(extract_duration("1 century"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 100), ""))
-        self.assertEqual(extract_duration("a century"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 100), ""))
-        self.assertEqual(extract_duration("5 centuries"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 100 * 5), ""))
-        self.assertEqual(extract_duration("1 millennium"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 1000), ""))
-        self.assertEqual(extract_duration("5 millenniums"),
-                         (timedelta(days=DAYS_IN_1_YEAR * 1000 * 5), ""))
+        test_cases = [
+            ("3 months", timedelta(days=DAYS_IN_1_MONTH * 3)),
+            ("a year", timedelta(days=DAYS_IN_1_YEAR)),
+            ("1 year", timedelta(days=DAYS_IN_1_YEAR)),
+            ("5 years", timedelta(days=DAYS_IN_1_YEAR * 5)),
+            ("a decade", timedelta(days=DAYS_IN_1_YEAR * 10)),
+            ("1 decade", timedelta(days=DAYS_IN_1_YEAR * 10)),
+            ("5 decades", timedelta(days=DAYS_IN_1_YEAR * 10 * 5)),
+            ("1 century", timedelta(days=DAYS_IN_1_YEAR * 100)),
+            ("a century", timedelta(days=DAYS_IN_1_YEAR * 100)),
+            ("5 centuries", timedelta(days=DAYS_IN_1_YEAR * 100 * 5)),
+            ("1 millennium", timedelta(days=DAYS_IN_1_YEAR * 1000)),
+            ("5 millenniums", timedelta(days=DAYS_IN_1_YEAR * 1000 * 5))
+        ]
+        for text, expected in test_cases:
+            self.assertEqual(extract_duration(text), (expected, ""))
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self.assertEqual(extract_duration("3 months"),
(timedelta(days=DAYS_IN_1_MONTH * 3), ""))
self.assertEqual(extract_duration("a year"),
(timedelta(days=DAYS_IN_1_YEAR), ""))
self.assertEqual(extract_duration("1 year"),
(timedelta(days=DAYS_IN_1_YEAR * 1), ""))
self.assertEqual(extract_duration("5 years"),
(timedelta(days=DAYS_IN_1_YEAR * 5), ""))
self.assertEqual(extract_duration("a decade"),
(timedelta(days=DAYS_IN_1_YEAR * 10), ""))
self.assertEqual(extract_duration("1 decade"),
(timedelta(days=DAYS_IN_1_YEAR * 10), ""))
self.assertEqual(extract_duration("5 decades"),
(timedelta(days=DAYS_IN_1_YEAR * 10 * 5), ""))
self.assertEqual(extract_duration("1 century"),
(timedelta(days=DAYS_IN_1_YEAR * 100), ""))
self.assertEqual(extract_duration("a century"),
(timedelta(days=DAYS_IN_1_YEAR * 100), ""))
self.assertEqual(extract_duration("5 centuries"),
(timedelta(days=DAYS_IN_1_YEAR * 100 * 5), ""))
self.assertEqual(extract_duration("1 millennium"),
(timedelta(days=DAYS_IN_1_YEAR * 1000), ""))
self.assertEqual(extract_duration("5 millenniums"),
(timedelta(days=DAYS_IN_1_YEAR * 1000 * 5), ""))
test_cases = [
("3 months", timedelta(days=DAYS_IN_1_MONTH * 3)),
("a year", timedelta(days=DAYS_IN_1_YEAR)),
("1 year", timedelta(days=DAYS_IN_1_YEAR)),
("5 years", timedelta(days=DAYS_IN_1_YEAR * 5)),
("a decade", timedelta(days=DAYS_IN_1_YEAR * 10)),
("1 decade", timedelta(days=DAYS_IN_1_YEAR * 10)),
("5 decades", timedelta(days=DAYS_IN_1_YEAR * 10 * 5)),
("1 century", timedelta(days=DAYS_IN_1_YEAR * 100)),
("a century", timedelta(days=DAYS_IN_1_YEAR * 100)),
("5 centuries", timedelta(days=DAYS_IN_1_YEAR * 100 * 5)),
("1 millennium", timedelta(days=DAYS_IN_1_YEAR * 1000)),
("5 millenniums", timedelta(days=DAYS_IN_1_YEAR * 1000 * 5))
]
for text, expected in test_cases:
self.assertEqual(extract_duration(text), (expected, ""))

Comment on lines +1076 to +1157
def extract_duration_es(text):
"""
Convert an spanish phrase into a number of seconds
Convert things like:
"10 Minutos"
"3 dias 8 horas 10 Minutos e 49 Segundos"
into an int, representing the total number of seconds.
The words used in the duration will be consumed, and
the remainder returned.
As an example, "set a timer for 5 minutes" would return
(300, "set a timer for").
Args:
text (str): string containing a duration
Returns:
(timedelta, str):
A tuple containing the duration and the remaining text
not consumed in the parsing. The first value will
be None if no duration is found. The text returned
will have whitespace stripped from the ends.
"""
if not text:
return None

text = text.lower().replace("í", "i")
time_units = {
'microseconds': 'microsegundos',
'milliseconds': 'milisegundos',
'seconds': 'segundos',
'minutes': 'minutos',
'hours': 'horas',
'days': 'dias',
'weeks': 'semanas'
}
non_std_un = {
"months": "mes",
"years": "anos",
'decades': "decadas",
'centurys': "siglos",
'millenniums': "milenios"
}

pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?"

text = text.replace("í", "i").replace("é", "e").replace("ñ", "n").replace("meses", "mes")
text = SpanishNormalizer().numbers_to_digits(text)

for (unit_en, unit_es) in time_units.items():
unit_pattern = pattern.format(
unit=unit_es[:-1]) # remove 's' from unit
time_units[unit_en] = 0

def repl(match):
time_units[unit_en] += float(match.group(1))
return ''

text = re.sub(unit_pattern, repl, text)

for (unit_en, unit_es) in non_std_un.items():
unit_pattern = pattern.format(
unit=unit_es[:-1]) # remove 's' from unit

def repl_non_std(match):
val = float(match.group(1))
if unit_en == "months":
val = DAYS_IN_1_MONTH * val
if unit_en == "years":
val = DAYS_IN_1_YEAR * val
if unit_en == "decades":
val = 10 * DAYS_IN_1_YEAR * val
if unit_en == "centurys":
val = 100 * DAYS_IN_1_YEAR * val
if unit_en == "millenniums":
val = 1000 * DAYS_IN_1_YEAR * val
time_units["days"] += val
return ''

text = re.sub(unit_pattern, repl_non_std, text)

text = text.strip()
duration = timedelta(**time_units) if any(time_units.values()) else None

return (duration, text)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure re is imported.

The re module is used but not imported.

+ import re
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def extract_duration_es(text):
"""
Convert an spanish phrase into a number of seconds
Convert things like:
"10 Minutos"
"3 dias 8 horas 10 Minutos e 49 Segundos"
into an int, representing the total number of seconds.
The words used in the duration will be consumed, and
the remainder returned.
As an example, "set a timer for 5 minutes" would return
(300, "set a timer for").
Args:
text (str): string containing a duration
Returns:
(timedelta, str):
A tuple containing the duration and the remaining text
not consumed in the parsing. The first value will
be None if no duration is found. The text returned
will have whitespace stripped from the ends.
"""
if not text:
return None
text = text.lower().replace("í", "i")
time_units = {
'microseconds': 'microsegundos',
'milliseconds': 'milisegundos',
'seconds': 'segundos',
'minutes': 'minutos',
'hours': 'horas',
'days': 'dias',
'weeks': 'semanas'
}
non_std_un = {
"months": "mes",
"years": "anos",
'decades': "decadas",
'centurys': "siglos",
'millenniums': "milenios"
}
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?"
text = text.replace("í", "i").replace("é", "e").replace("ñ", "n").replace("meses", "mes")
text = SpanishNormalizer().numbers_to_digits(text)
for (unit_en, unit_es) in time_units.items():
unit_pattern = pattern.format(
unit=unit_es[:-1]) # remove 's' from unit
time_units[unit_en] = 0
def repl(match):
time_units[unit_en] += float(match.group(1))
return ''
text = re.sub(unit_pattern, repl, text)
for (unit_en, unit_es) in non_std_un.items():
unit_pattern = pattern.format(
unit=unit_es[:-1]) # remove 's' from unit
def repl_non_std(match):
val = float(match.group(1))
if unit_en == "months":
val = DAYS_IN_1_MONTH * val
if unit_en == "years":
val = DAYS_IN_1_YEAR * val
if unit_en == "decades":
val = 10 * DAYS_IN_1_YEAR * val
if unit_en == "centurys":
val = 100 * DAYS_IN_1_YEAR * val
if unit_en == "millenniums":
val = 1000 * DAYS_IN_1_YEAR * val
time_units["days"] += val
return ''
text = re.sub(unit_pattern, repl_non_std, text)
text = text.strip()
duration = timedelta(**time_units) if any(time_units.values()) else None
return (duration, text)
def extract_duration_es(text):
"""
Convert an spanish phrase into a number of seconds
Convert things like:
"10 Minutos"
"3 dias 8 horas 10 Minutos e 49 Segundos"
into an int, representing the total number of seconds.
The words used in the duration will be consumed, and
the remainder returned.
As an example, "set a timer for 5 minutes" would return
(300, "set a timer for").
Args:
text (str): string containing a duration
Returns:
(timedelta, str):
A tuple containing the duration and the remaining text
not consumed in the parsing. The first value will
be None if no duration is found. The text returned
will have whitespace stripped from the ends.
"""
if not text:
return None
text = text.lower().replace("í", "i")
time_units = {
'microseconds': 'microsegundos',
'milliseconds': 'milisegundos',
'seconds': 'segundos',
'minutes': 'minutos',
'hours': 'horas',
'days': 'dias',
'weeks': 'semanas'
}
non_std_un = {
"months": "mes",
"years": "anos",
'decades': "decadas",
'centurys': "siglos",
'millenniums': "milenios"
}
pattern = r"(?P<value>\d+(?:\.?\d+)?)(?:\s+|\-){unit}[s]?"
text = text.replace("í", "i").replace("é", "e").replace("ñ", "n").replace("meses", "mes")
text = SpanishNormalizer().numbers_to_digits(text)
for (unit_en, unit_es) in time_units.items():
unit_pattern = pattern.format(
unit=unit_es[:-1]) # remove 's' from unit
time_units[unit_en] = 0
def repl(match):
time_units[unit_en] += float(match.group(1))
return ''
text = re.sub(unit_pattern, repl, text)
for (unit_en, unit_es) in non_std_un.items():
unit_pattern = pattern.format(
unit=unit_es[:-1]) # remove 's' from unit
def repl_non_std(match):
val = float(match.group(1))
if unit_en == "months":
val = DAYS_IN_1_MONTH * val
if unit_en == "years":
val = DAYS_IN_1_YEAR * val
if unit_en == "decades":
val = 10 * DAYS_IN_1_YEAR * val
if unit_en == "centurys":
val = 100 * DAYS_IN_1_YEAR * val
if unit_en == "millenniums":
val = 1000 * DAYS_IN_1_YEAR * val
time_units["days"] += val
return ''
text = re.sub(unit_pattern, repl_non_std, text)
text = text.strip()
duration = timedelta(**time_units) if any(time_units.values()) else None
return (duration, text)
Tools
Ruff

1128-1128: Function definition does not bind loop variable unit_en

(B023)


1131-1131: re may be undefined, or defined from star imports

(F405)


1139-1139: Function definition does not bind loop variable unit_en

(B023)


1141-1141: Function definition does not bind loop variable unit_en

(B023)


1143-1143: Function definition does not bind loop variable unit_en

(B023)


1145-1145: Function definition does not bind loop variable unit_en

(B023)


1147-1147: Function definition does not bind loop variable unit_en

(B023)


1152-1152: re may be undefined, or defined from star imports

(F405)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants