the big date extract rewrite #34

Draft
wants to merge 1 commit into
base: dev

Conversation

JarbasAl
Member

@JarbasAl JarbasAl commented Nov 26, 2022

this is mainly a code backup of a WIP rewrite of the date extractor HelloChatterbox#26

Summary by CodeRabbit

  • New Features

    • Introduced extensive vocabulary for weekdays, months, hemispheres, seasons, and named eras in English.
    • Added functionalities for managing default locations and coordinates.
    • Enhanced date and time manipulation with new functions for extracting calendar durations, dates, times, and named dates and eras.
    • Improved date range and ordinal date calculations.
  • Bug Fixes

    • Improved fallback mechanisms for date extraction in multiple languages (Spanish, French, Italian, Portuguese).
  • Tests

    • Added comprehensive test cases for date-related functionalities and fallback mechanisms in multiple languages.
  • Dependencies

    • Added new dependencies: python-dateutil, holidays, and dateparser.

get_ordinal -> get_date_ordinal

test exceptions

fix other lang tests (fallback parser)

exception error message

more tests

fixes

eras fix

more eras

get_ordinal unittests

before present

negative eras - count years backwards from {reference_date}

add new_year + valentines_day holidays

fix week parsing (weeks start at sundays)

fix fallback parser

unittests - get_XXX_range

datetime fallback fix

upcoming {X}

strict parser fallback - extract_time

parser fallback - anchorDate param

parser fallback

fallback parser tests

strict parser fallback - extract_datetime

more date resolutions - Enum only

strict parser fallback

DatetimeResolution

eras as reference_dates not absolute_dates

named eras

version pinning

active_location

location based holidays test

location based holidays

"easter"

"mid summer"

named holidays support

timedelta fallback test

relative delta

extract_time in follow up PR

duplicate

rebase

use average number of days in 1 year/month for duration calculations

handle "the"

utterance remainder

comments + pep8

better year parsing

duplicate code

BC vs infinity error message

comments / readability

add location_markers for location parsing

handle "standalone" {season}

remove duplicate code

boiler plate cleanup

more tests

location aware seasons

extract_date start

typo

extract_calendar_duration

handle relativedelta constraints

relativedelta default

add "replace_token"

more tests

more tests

more tests

more resolutions, + tests

DurationResolution

average number of days in month/year

revert brain fart

use weeks instead of days

long duration
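
Several of the commits above switch duration calculations to the average number of days in a year or month. A minimal sketch of that idea follows. The constant names mirror DAYS_IN_1_YEAR and DAYS_IN_1_MONTH from the test imports, but the values (the mean Gregorian year and one twelfth of it) and the helper approximate_duration are assumptions for illustration, not the PR's actual code.

```python
from datetime import timedelta

# Assumed values: mean length of a Gregorian year and month, in days.
DAYS_IN_1_YEAR = 365.2425
DAYS_IN_1_MONTH = DAYS_IN_1_YEAR / 12  # about 30.44 days


def approximate_duration(years=0, months=0, weeks=0, days=0):
    """Convert calendar units to a timedelta using average unit lengths.

    Hypothetical helper: a timedelta cannot represent "1 month" exactly,
    so months and years are approximated by their average length.
    """
    total_days = (years * DAYS_IN_1_YEAR
                  + months * DAYS_IN_1_MONTH
                  + weeks * 7
                  + days)
    return timedelta(days=total_days)
```

The trade-off is that the result is only approximate: adding it to a date will not always land on the same day of the month, which is presumably why the PR also works with relativedelta for calendar-aware arithmetic.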
@JarbasAl JarbasAl added the enhancement New feature or request label Nov 26, 2022
@JarbasAl
Member Author

@coderabbitai review


coderabbitai bot commented Jul 18, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


coderabbitai bot commented Jul 18, 2024

Walkthrough

The changes introduce new functionalities and data structures for improved handling of dates, times, locations, and seasons across multiple languages. This includes adding various enums, new functions for extracting and parsing dates, location management, and corresponding unit tests to ensure the accuracy of these enhancements.

Changes

| Files/Groups | Change Summary |
|---|---|
| lingua_franca/lang/common_data_en.py | Added mappings for weekdays, months, hemispheres, seasons, and named eras in English. |
| lingua_franca/lang/parse_common.py | Introduced enums DurationResolution, DateTimeResolution, and Season for enhanced time and season representations. |
| lingua_franca/location.py | Added functionality to manage default locations and coordinates, including hemisphere determination. |
| lingua_franca/parse.py | Enhanced date extraction functions and added support for new date, time, and named era parsing functions. |
| lingua_franca/time.py | Added new functions for date manipulation, including determining week numbers, ordinal dates, and seasons. |
| requirements.txt | Added dependencies for python-dateutil, holidays, and dateparser. |
| test/test_time_utils.py | Added tests for date range calculations, ordinal dates, and season range determinations. |
| test/unittests/test_parse_*.py (es, fr, it, pt) | Added and updated tests for date extraction in multiple languages, including fallback to dateparser. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant lingua_franca.location
    participant lingua_franca.parse
    User->>lingua_franca.location: set_active_location(latitude, longitude)
    User->>lingua_franca.parse: extract_date(text, lang, location)
    lingua_franca.parse->>lingua_franca.location: get_active_location()
    lingua_franca.location->>lingua_franca.parse: return active_location
    lingua_franca.parse->>User: return parsed_date
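
The flow in the diagram can be sketched as a small self-contained model: a module-level active location, and a resolver that prefers an explicit location and otherwise falls back to the active one. Everything here is a stand-in written for illustration. The default coordinates, the enum values, the three-argument set_active_location signature, and the helper resolve_hemisphere are assumptions, not the code in lingua_franca/location.py.

```python
from enum import Enum


class Hemisphere(Enum):
    # Enum values are assumed; the review only states that NORTH and SOUTH exist.
    NORTH = 0
    SOUTH = 1


# Hypothetical default: (ISO country code, latitude, longitude).
_active_location = ("US", 38.97, -95.26)


def set_active_location(code, latitude, longitude):
    """Replace the module-wide default location."""
    global _active_location
    _active_location = (code, latitude, longitude)


def get_active_location():
    """Return the default (latitude, longitude) pair."""
    return _active_location[1], _active_location[2]


def resolve_hemisphere(location=None):
    """Pick the hemisphere from an explicit location or the active one."""
    if location is not None:
        _code, lat, _lon = location
    else:
        lat, _lon = get_active_location()
    # Negative latitudes are south of the equator.
    return Hemisphere.SOUTH if lat < 0 else Hemisphere.NORTH
```

A caller that never passes a location gets the module default, while a per-call location overrides it without touching global state.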

Poem

In code where dates and seasons weave,
Enhanced functions now achieve,
With languages vast and times precise,
Locations set with mere device.
Parsing moments, far and near,
Bringing clarity, no fear!
🐰✨



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 29

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 12ec4f8 and 484e5fe.

Files selected for processing (13)
  • lingua_franca/lang/common_data_en.py (8 hunks)
  • lingua_franca/lang/parse_common.py (2 hunks)
  • lingua_franca/lang/parse_en.py (7 hunks)
  • lingua_franca/location.py (1 hunks)
  • lingua_franca/parse.py (5 hunks)
  • lingua_franca/time.py (2 hunks)
  • requirements.txt (1 hunks)
  • test/test_time_utils.py (1 hunks)
  • test/unittests/test_parse_en.py (4 hunks)
  • test/unittests/test_parse_es.py (2 hunks)
  • test/unittests/test_parse_fr.py (3 hunks)
  • test/unittests/test_parse_it.py (2 hunks)
  • test/unittests/test_parse_pt.py (2 hunks)
Files not summarized due to errors (2)
  • lingua_franca/lang/parse_en.py: Error: Message exceeds token limit
  • test/unittests/test_parse_en.py: Error: Message exceeds token limit
Files skipped from review due to trivial changes (1)
  • requirements.txt
Additional context used
Ruff
test/unittests/test_parse_es.py

16-16: datetime.time imported but unused

Remove unused import: datetime.time

(F401)


268-268: Undefined name extract_date

(F821)


273-273: Undefined name extract_date

(F821)

test/unittests/test_parse_pt.py

23-23: Redefinition of unused get_gender from line 21

Remove definition: get_gender

(F811)


24-24: Redefinition of unused extract_datetime from line 21

Remove definition: extract_datetime

(F811)


26-26: Redefinition of unused extract_number from line 21

Remove definition: extract_number

(F811)


27-27: Redefinition of unused normalize from line 21

Remove definition: normalize

(F811)

lingua_franca/parse.py

28-28: lingua_franca.location.get_active_hemisphere imported but unused

Remove unused import: lingua_franca.location.get_active_hemisphere

(F401)


31-31: dateparser imported but unused

Remove unused import: dateparser

(F401)


216-216: Undefined name lang_code

(F821)


362-365: Use ternary operator hemisphere = Hemisphere.SOUTH if lat < 0 else Hemisphere.NORTH instead of if-else-block

Replace if-else-block with hemisphere = Hemisphere.SOUTH if lat < 0 else Hemisphere.NORTH

(SIM108)


368-368: Undefined name extract_date_en

(F821)


374-374: Undefined name _log_unsupported_language

(F821)


388-388: Local variable remainder is assigned to but never used

Remove assignment to unused variable remainder

(F841)


436-436: Local variable code is assigned to but never used

Remove assignment to unused variable code

(F841)


441-441: Undefined name _log_unsupported_language

(F821)


451-451: Local variable remainder is assigned to but never used

Remove assignment to unused variable remainder

(F841)


463-466: Use ternary operator year = anchor_date.year if anchor_date else now_local().year instead of if-else-block

Replace if-else-block with year = anchor_date.year if anchor_date else now_local().year

(SIM108)


476-476: Undefined name get_named_dates_en

(F821)


479-479: Undefined name _log_unsupported_language

(F821)


497-497: Undefined name get_named_eras_en

(F821)


498-498: Undefined name date

(F821)

lingua_franca/time.py

20-20: Redefinition of unused datetime from line 16

Remove definition: datetime

(F811)


132-132: Multiple isinstance calls for month, merge into a single call

Merge isinstance calls for month

(SIM101)


145-145: Multiple isinstance calls for weekday, merge into a single call

Merge isinstance calls for weekday

(SIM101)

test/test_time_utils.py

21-21: lingua_franca.time.get_week_number imported but unused

Remove unused import: lingua_franca.time.get_week_number

(F401)

test/unittests/test_parse_it.py

17-17: datetime.date imported but unused

Remove unused import

(F401)


17-17: datetime.timedelta imported but unused

Remove unused import

(F401)

lingua_franca/lang/parse_en.py

33-33: Redefinition of unused timedelta from line 18

Remove definition: timedelta

(F811)


33-33: Redefinition of unused datetime from line 18

Remove definition: datetime

(F811)


33-33: Redefinition of unused time from line 18

Remove definition: time

(F811)


35-35: Redefinition of unused invert_dict from line 30

Remove definition: invert_dict

(F811)


36-36: Redefinition of unused ReplaceableNumber from line 30

Remove definition: ReplaceableNumber

(F811)


36-36: Redefinition of unused partition_list from line 30

Remove definition: partition_list

(F811)


36-36: Redefinition of unused tokenize from line 30

Remove definition: tokenize

(F811)


36-36: Redefinition of unused Token from line 30

Remove definition: Token

(F811)


36-36: Redefinition of unused Normalizer from line 30

Remove definition: Normalizer

(F811)


37-37: Redefinition of unused is_numeric from line 29

Remove definition: is_numeric

(F811)


37-37: Redefinition of unused look_for_fractions from line 29

Remove definition: look_for_fractions

(F811)


38-38: Redefinition of unused _ARTICLES_EN from line 23

Remove definition: _ARTICLES_EN

(F811)


38-38: lingua_franca.lang.common_data_en._NUM_STRING_EN imported but unused

Remove unused import: lingua_franca.lang.common_data_en._NUM_STRING_EN

(F401)


39-39: Redefinition of unused _LONG_ORDINAL_EN from line 23

Remove definition: _LONG_ORDINAL_EN

(F811)


39-39: Redefinition of unused _LONG_SCALE_EN from line 23

Remove definition: _LONG_SCALE_EN

(F811)


39-39: Redefinition of unused _SHORT_SCALE_EN from line 23

Remove definition: _SHORT_SCALE_EN

(F811)


39-39: Redefinition of unused _SHORT_ORDINAL_EN from line 24

Remove definition: _SHORT_ORDINAL_EN

(F811)


42-42: Redefinition of unused re from line 17

Remove definition: re

(F811)


43-43: Redefinition of unused json from line 16

Remove definition: json

(F811)


45-45: Redefinition of unused resolve_resource_file from line 22

(F811)


50-50: Redefinition of unused now_local from line 31

Remove definition: now_local

(F811)


52-52: lingua_franca.location.get_active_location imported but unused

Remove unused import: lingua_franca.location.get_active_location

(F401)


2010-2010: Loop control variable dt not used within loop body

Rename unused dt to _dt

(B007)


2018-2018: Loop control variable dt not used within loop body

Rename unused dt to _dt

(B007)


2025-2025: Loop control variable dt not used within loop body

Rename unused dt to _dt

(B007)


2709-2709: Local variable wordPrevPrev is assigned to but never used

Remove assignment to unused variable wordPrevPrev

(F841)


2714-2714: Local variable wordNextNextNext is assigned to but never used

Remove assignment to unused variable wordNextNextNext

(F841)


2757-2760: Use ternary operator _delta = 7 - _w + int_week if int_week < _w else int_week - _w instead of if-else-block

Replace if-else-block with _delta = 7 - _w + int_week if int_week < _w else int_week - _w

(SIM108)


3295-3298: Use ternary operator method = 1 if _year.year < 1583 else 3 instead of if-else-block

Replace if-else-block with method = 1 if _year.year < 1583 else 3

(SIM108)


3308-3311: Use ternary operator method = 1 if _year.year < 1583 else 3 instead of if-else-block

Replace if-else-block with method = 1 if _year.year < 1583 else 3

(SIM108)


3318-3321: Use ternary operator method = 1 if _year < 1583 else 3 instead of if-else-block

Replace if-else-block with method = 1 if _year < 1583 else 3

(SIM108)


3328-3331: Use ternary operator method = 1 if ref_date.year < 1583 else 3 instead of if-else-block

Replace if-else-block with method = 1 if ref_date.year < 1583 else 3

(SIM108)


3516-3523: Use ternary operator _year = _base + _year if _delta > _year else _base - 100 + _year instead of if-else-block

(SIM108)
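
As an illustration of the SIM108 finding for lines 2757-2760, the sketch below shows the suggested ternary next to an equivalent modulo form. It assumes that _w is the current weekday and int_week the target weekday, both in the range 0-6. The function names are hypothetical and the surrounding PR code is not shown here.

```python
def days_until_weekday(current, target):
    """Days from `current` to the next `target` weekday, as a ternary.

    Mirrors the expression Ruff proposes:
    7 - _w + int_week if int_week < _w else int_week - _w
    """
    return 7 - current + target if target < current else target - current


def days_until_weekday_mod(current, target):
    """Same result using modular arithmetic, with no branch at all."""
    return (target - current) % 7
```

Both forms return 0 when the target weekday equals the current one, so a caller that wants "next Monday" to skip today would still need to handle that case separately.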

test/unittests/test_parse_en.py

35-35: Redefinition of unused DAYS_IN_1_YEAR from line 31

Remove definition: DAYS_IN_1_YEAR

(F811)


35-35: Redefinition of unused DAYS_IN_1_MONTH from line 31

Remove definition: DAYS_IN_1_MONTH

(F811)


39-39: Redefinition of unused datetime from line 17

Remove definition: datetime

(F811)


39-39: Redefinition of unused timedelta from line 17

Remove definition: timedelta

(F811)


2946-2946: simple_NER imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Additional comments not posted (36)
lingua_franca/location.py (4)

22-25: Enum Hemisphere implementation looks good.

The enum is correctly implemented with two values: NORTH and SOUTH.


27-33: Function get_active_location implementation looks good.

The function correctly returns the default latitude and longitude.


55-62: Function get_active_location_code implementation looks good.

The function correctly returns the default location ISO code.


17-19: Global variables initialization looks good.

The global variables __default_location, __latitude, and __longitude are correctly initialized.

lingua_franca/lang/common_data_en.py (7)

17-20: Imports look good.

The new imports are necessary for the new functionality and correctly implemented.


295-303: Dictionary _WEEKDAY_EN implementation looks good.

The dictionary correctly maps integer keys to weekday names.


305-318: Dictionary _MONTH_EN implementation looks good.

The dictionary correctly maps integer keys to month names.


320-328: Dictionary _WEEKDAY_SHORT_EN implementation looks good.

The dictionary correctly maps integer keys to short weekday names.


330-343: Dictionary _MONTH_SHORT_EN implementation looks good.

The dictionary correctly maps integer keys to short month names.


345-348: Dictionary _HEMISPHERES_EN implementation looks good.

The dictionary correctly maps Hemisphere enum values to lists of corresponding names.


350-355: Dictionary _SEASONS_EN implementation looks good.

The dictionary correctly maps Season enum values to lists of corresponding names.

test/unittests/test_parse_pt.py (1)

320-338: Test method test_fallback_parser implementation looks good.

The test method correctly checks the date extraction functionality in Portuguese.

lingua_franca/parse.py (2)

Line range hint 126-150:
Ensure correct integration of new parameters.

The new parameters resolution and replace_token are correctly integrated. Ensure that the function calls are updated to include these parameters.


102-123: Add docstring details and verify function usage.

The function extract_calendar_duration wraps extract_duration with a specific resolution. Ensure that the docstring provides enough detail and verify the function usage in the codebase.

However, ensure that all function calls to extract_calendar_duration are correctly implemented.

Verification successful

No issues found with the function usage.

The function extract_calendar_duration is not called anywhere else in the codebase, so there are no existing usages to verify against the new implementation.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `extract_calendar_duration` match the new implementation.

# Test: Search for the function usage. Expect: Only occurances of the new implementation.
rg --type python -A 5 $'extract_calendar_duration'

Length of output: 80


Script:

#!/bin/bash
# Description: Verify all function calls to `extract_calendar_duration` match the new implementation.

# Test: Search for the function usage. Expect: Only occurrences of the new implementation.
rg --glob '*.py' -A 5 $'extract_calendar_duration'

Length of output: 362

lingua_franca/time.py (6)

117-122: LGTM!

The function int_to_month is correctly implemented.


124-129: LGTM!

The function int_to_weekday is correctly implemented.


158-203: LGTM!

The functions for calculating date ranges are correctly implemented.


205-475: LGTM!

The function get_date_ordinal is correctly implemented.


477-650: LGTM!

The functions for handling seasons are correctly implemented.


652-654: LGTM!

The function get_week_number is correctly implemented.

test/unittests/test_parse_fr.py (3)

Line range hint 17-28:
LGTM!

The imports and the new test class TestExtractDate are correctly implemented.


30-42: LGTM!

The test method test_fallback_parser is correctly implemented and covers necessary cases.


30-42: LGTM!

The modifications to the existing test cases are correctly implemented.

test/test_time_utils.py (2)

27-77: LGTM!

The test cases in the TestRanges class are comprehensive and well-structured.


451-519: LGTM!

The test cases in the TestHemisphere class are comprehensive and well-structured.

lingua_franca/lang/parse_common.py (3)

466-484: LGTM!

The DurationResolution Enum is well-defined and each constant uniquely represents a specific time resolution.


486-953: LGTM!

The DateTimeResolution Enum is comprehensive and well-defined, with each constant uniquely representing a specific date or time resolution.


956-960: LGTM!

The Season Enum is straightforward and well-defined, with each constant uniquely representing a season.

test/unittests/test_parse_it.py (2)

25-26: LGTM!

The now_local and extract_date imports are necessary for the new test method.


40-47: LGTM!

The new test method test_fallback_parser is well-structured and necessary for testing fallback behavior to the dateparser module.

lingua_franca/lang/parse_en.py (6)

222-227: LGTM!

The changes to _extract_fraction_with_text_en are correct.


272-277: LGTM!

The changes to _extract_decimal_with_text_en are correct.


Line range hint 586-836:
LGTM!

The changes to extract_duration_en are correct.


1921-1943: LGTM!

The changes to get_named_dates_en are correct.


1946-1959: LGTM!

The changes to get_named_eras_en are correct.


2034-3560: LGTM!

The changes to extract_date_en are correct.

Comment on lines +17 to +18
from datetime import datetime, time, date, timedelta
from dateutil.relativedelta import relativedelta

Remove unused imports.

The date and timedelta imports are unused and should be removed to clean up the code.

- from datetime import datetime, time, date, timedelta
+ from datetime import datetime, time
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from datetime import datetime, time, date, timedelta
from dateutil.relativedelta import relativedelta
from datetime import datetime, time
from dateutil.relativedelta import relativedelta
Tools
Ruff

17-17: datetime.date imported but unused

Remove unused import

(F401)


17-17: datetime.timedelta imported but unused

Remove unused import

(F401)

@@ -28,6 +28,15 @@
from lingua_franca.time import default_timezone, to_local
from lingua_franca.parse import extract_langcode
from lingua_franca.parse import yes_or_no
from lingua_franca.time import DAYS_IN_1_YEAR, DAYS_IN_1_MONTH

Remove the duplicate import of DAYS_IN_1_YEAR and DAYS_IN_1_MONTH.

Ruff (F811) reports that line 35 redefines both names, which are already imported on line 31. The two import statements are identical, so one of them is redundant and can be dropped.

- from lingua_franca.time import DAYS_IN_1_YEAR, DAYS_IN_1_MONTH

from lingua_franca.lang.parse_common import DateTimeResolution, Season
from lingua_franca.location import Hemisphere
from lingua_franca.lang.parse_en import extract_date_en
from datetime import date, datetime, timedelta

Remove the redefinition of datetime and timedelta.

Line 39 re-imports datetime and timedelta, which are already imported on line 17.

- from datetime import date, datetime, timedelta
+ from datetime import date

Suggested change
from datetime import date, datetime, timedelta
from datetime import date
Tools
Ruff

39-39: Redefinition of unused datetime from line 17

Remove definition: datetime

(F811)


39-39: Redefinition of unused timedelta from line 17

Remove definition: timedelta

(F811)

Season.SPRING)

try:
import simple_NER

Replace the unused import simple_NER with an availability check.

simple_NER is imported but never referenced. Because the import sits inside a try block, it appears to act as an availability probe, and deleting that line alone would leave the try block without a body. As Ruff suggests below, importlib.util.find_spec can test for the package without importing it.
Tools
Ruff

2946-2946: simple_NER imported but unused; consider using importlib.util.find_spec to test for availability

(F401)
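
A minimal sketch of the find_spec approach that Ruff mentions is shown below. The helper name is_available and the flag HAS_SIMPLE_NER are hypothetical. importlib.util.find_spec is part of the standard library and returns None when a top-level module cannot be found.

```python
import importlib.util


def is_available(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Evaluated once at import time; no unused-import warning is triggered.
HAS_SIMPLE_NER = is_available("simple_NER")
```

In a test module, such a flag could then gate optional tests, for example with @unittest.skipUnless(HAS_SIMPLE_NER, "simple_NER not installed").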

Comment on lines +80 to +110
class TestOrdinals(unittest.TestCase):
    def test_first_day(self):
        ref_date = date(day=27, month=2, year=4567)
        week_start, week_end = get_week_range(ref_date)
        weekend_start, weekend_end = get_weekend_range(ref_date)

        self.assertEqual(get_date_ordinal(1, ref_date,
                                          DateTimeResolution.DAY),
                         date(day=1, month=1, year=1))
        # TODO not implemented yet
        # self.assertEqual(get_ordinal(1, ref_date,
        #                              DateTimeResolution.DAY_OF_WEEK),
        #                  week_start)
        # self.assertEqual(get_ordinal(1, ref_date,
        #                              DateTimeResolution.DAY_OF_WEEKEND),
        #                  weekend_start)
        self.assertEqual(get_date_ordinal(1, ref_date,
                                          DateTimeResolution.DAY_OF_MONTH),
                         ref_date.replace(day=1))
        self.assertEqual(get_date_ordinal(1, ref_date,
                                          DateTimeResolution.DAY_OF_YEAR),
                         ref_date.replace(day=1, month=1))
        self.assertEqual(get_date_ordinal(1, ref_date,
                                          DateTimeResolution.DAY_OF_DECADE),
                         date(day=1, month=1, year=4560))
        self.assertEqual(get_date_ordinal(1, ref_date,
                                          DateTimeResolution.DAY_OF_CENTURY),
                         date(day=1, month=1, year=4500))
        self.assertEqual(get_date_ordinal(1, ref_date,
                                          DateTimeResolution.DAY_OF_MILLENNIUM),
                         date(day=1, month=1, year=4000))

LGTM! But implement the missing tests.

The test cases in the TestOrdinals class are comprehensive and well-structured. However, there are TODO comments indicating missing tests.

Do you want me to implement the missing tests or open a GitHub issue to track this task?
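
For context on the disabled DAY_OF_WEEK assertion, which expects the first day of the week to equal week_start: the commit list notes that weeks start on Sundays. The sketch below computes such a Sunday-to-Saturday range with the standard library only. sunday_week_range is a hypothetical stand-in and not the PR's get_week_range implementation.

```python
from datetime import date, timedelta


def sunday_week_range(ref_date):
    """Return (start, end) of the Sunday-to-Saturday week containing ref_date."""
    # date.weekday() counts Monday=0 ... Sunday=6, so shift by one
    # to get the number of days elapsed since the most recent Sunday.
    days_since_sunday = (ref_date.weekday() + 1) % 7
    start = ref_date - timedelta(days=days_since_sunday)
    return start, start + timedelta(days=6)
```

A test for the first day of the week could then compare the ordinal result against the start of this range once that resolution is implemented.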

Comment on lines +213 to +225
# hard-parse, fallback to dateparser
# this brings "free support" for many languages
print("No dates found, falling back to strict parser")
_dates = search_dates(text, languages=[lang_code],
                      settings={'RELATIVE_BASE': anchorDate,
                                'STRICT_PARSING': True})
if _dates is not None:
    # return first datetime only
    # TODO extract_datetimes
    # TODO extract_datetime_range
    date_str, extracted_date = _dates[0]
    remainder = text.replace(date_str, "")
    return extracted_date, remainder

Fix undefined variable lang_code.

The variable lang_code is used but not defined in the function.

- _dates = search_dates(text, languages=[lang_code],
+ lang_code = get_primary_lang_code(lang)
+ _dates = search_dates(text, languages=[lang_code],
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# hard-parse, fallback to dateparser
# this brings "free support" for many languages
print("No dates found, falling back to strict parser")
_dates = search_dates(text, languages=[lang_code],
settings={'RELATIVE_BASE': anchorDate,
'STRICT_PARSING': True})
if _dates is not None:
# return first datetime only
# TODO extract_datetimes
# TODO extract_datetime_range
date_str, extracted_date = _dates[0]
remainder = text.replace(date_str, "")
return extracted_date, remainder
# hard-parse, fallback to dateparser
# this brings "free support" for many languages
print("No dates found, falling back to strict parser")
lang_code = get_primary_lang_code(lang)
_dates = search_dates(text, languages=[lang_code],
settings={'RELATIVE_BASE': anchorDate,
'STRICT_PARSING': True})
if _dates is not None:
# return first datetime only
# TODO extract_datetimes
# TODO extract_datetime_range
date_str, extracted_date = _dates[0]
remainder = text.replace(date_str, "")
return extracted_date, remainder
Tools
Ruff

216-216: Undefined name lang_code

(F821)
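
The fix above can be sketched in isolation as follows. This is a minimal sketch, not the PR's code: `primary_lang_code` is a hypothetical stand-in for `get_primary_lang_code`, and the search callable is injected so the example runs without `dateparser` installed. The keyword arguments mirror the `search_dates(text, languages=..., settings=...)` call quoted in the comment.

```python
from datetime import datetime


def primary_lang_code(lang):
    """Hypothetical stand-in for lingua_franca's get_primary_lang_code."""
    return lang.split("-")[0].lower()


def fallback_extract(text, lang, anchor_date, search):
    """Resolve lang_code before the fallback call, then return the first hit.

    `search` is any callable shaped like dateparser's search_dates: it
    returns a list of (matched_substring, datetime) tuples, or None.
    """
    lang_code = primary_lang_code(lang)  # defined before its first use
    matches = search(text, languages=[lang_code],
                     settings={"RELATIVE_BASE": anchor_date,
                               "STRICT_PARSING": True})
    if not matches:
        return None
    date_str, extracted = matches[0]
    return extracted, text.replace(date_str, "").strip()


def fake_search(text, languages=None, settings=None):
    """Toy search used only to exercise the sketch; not a real parser."""
    needle = "2022-11-26"
    if needle in text:
        return [(needle, datetime(2022, 11, 26))]
    return None
```

In real use the `search` argument would presumably be `dateparser.search.search_dates` itself; injecting it also makes the fallback path easy to unit-test.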

Comment on lines +299 to +391
@localized_function()
def extract_date(text, anchor_date=None, lang="", location=None):
    """
    Extracts date information from a sentence. Parses many of the
    common ways that humans express dates, including relative dates
    like "5 days from today", "tomorrow", and "Tuesday".

    Vague terms are given arbitrary values, accounting for
    geographic location, like:
        - summer = XXX
        - spring = XXX
        - winter = XXX

    Args:
        text (str): the text to be interpreted
        anchor_date (:obj:`datetime`, optional): the date to be used for
            relative dating (for example, what does "tomorrow" mean?).
            Defaults to the current local date/time.
        lang (str): the BCP-47 code for the language to use, None uses default
        location (str, float, float): ISO code, lat, lon of reference
            location, used for holidays and seasons

    Returns:
        [:obj:`date`, :obj:`str`]: 'date' is the extracted date
            as a date object in the user's local timezone.
            'leftover_string' is the original phrase with all date
            related keywords stripped out. See examples for further
            clarification

            Returns 'None' if no date related text is found.

    Examples:

        >>> extract_date(
        ... "What is the weather like the day after tomorrow?",
        ... date(2017, 6, 30)
        ... )
        [datetime.date(2017, 7, 2), 'what is weather like']

        >>> extract_date(
        ... "Set up an appointment 2 weeks from Sunday",
        ... date(2016, 2, 19)
        ... )
        [datetime.date(2016, 3, 6), 'set up appointment']

        >>> extract_date(
        ... "Set up an appointment",
        ... date(2016, 2, 19)
        ... )
        None
    """

    lang_code = get_primary_lang_code(lang)

    if not anchor_date:
        anchor_date = now_local()

    if location is not None:
        code, lat, lon = location
    else:
        code = get_active_location_code()
        lat, lon = get_active_location()

    if lat < 0:
        hemisphere = Hemisphere.SOUTH
    else:
        hemisphere = Hemisphere.NORTH

    if lang_code == "en":
        extracted_date = extract_date_en(text, anchor_date,
                                         hemisphere=hemisphere,
                                         location_code=code)
    else:
        extracted_date = None
        # TODO: extract_date for other languages
        _log_unsupported_language(lang_code, ['en'])

    if extracted_date is None:
        # hard-parse, fallback to dateparser
        # this brings "free support" for many languages
        print("No dates found, falling back to strict parser")
        _dates = search_dates(text, languages=[lang_code],
                              settings={'RELATIVE_BASE': anchor_date,
                                        'STRICT_PARSING': True})
        if _dates:
            # return first date only
            # TODO extract_dates
            # TODO extract_date_range
            date_str, extracted_datetime = _dates[0]
            remainder = text.replace(date_str, "")
            extracted_date = extracted_datetime.date()

    return extracted_date

Fix undefined variables and unused assignments.

The names extract_date_en and _log_unsupported_language are used but not defined or imported in this module (F821). The local variable remainder is assigned but never used (F841). Note that code is used: it is passed to extract_date_en as location_code.

- date_str, extracted_datetime = _dates[0]
- remainder = text.replace(date_str, "")
+ _, extracted_datetime = _dates[0]
+ # Define or import `extract_date_en` and `_log_unsupported_language`

Committable suggestion was skipped due to low confidence.

Tools
Ruff

362-365: Use ternary operator hemisphere = Hemisphere.SOUTH if lat < 0 else Hemisphere.NORTH instead of if-else-block

Replace if-else-block with hemisphere = Hemisphere.SOUTH if lat < 0 else Hemisphere.NORTH

(SIM108)


368-368: Undefined name extract_date_en

(F821)


374-374: Undefined name _log_unsupported_language

(F821)


388-388: Local variable remainder is assigned to but never used

Remove assignment to unused variable remainder

(F841)
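
The two style findings above (SIM108 and F841) can be illustrated with a small self-contained sketch. The `Hemisphere` enum here is an assumed two-member stand-in for the one used in the PR, and `hemisphere_for` and `first_date` are hypothetical helper names introduced only for this example.

```python
from datetime import datetime
from enum import Enum


class Hemisphere(Enum):
    """Assumed two-member enum; the real one is defined elsewhere in the PR."""
    NORTH = 0
    SOUTH = 1


def hemisphere_for(lat):
    # SIM108: one conditional expression replaces the if/else block.
    return Hemisphere.SOUTH if lat < 0 else Hemisphere.NORTH


def first_date(matches):
    """Return the date of the first (substring, datetime) match, or None."""
    if not matches:
        return None
    # F841: the matched substring is not needed here, so bind it to `_`
    # instead of computing an unused `remainder`.
    _, extracted_datetime = matches[0]
    return extracted_datetime.date()
```

If the leftover string is meant to be returned, as the docstring of extract_date suggests, the alternative fix would be to keep `remainder` and return it alongside the date rather than discard it.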

@@ -15,6 +15,16 @@
#
from datetime import datetime
from dateutil.tz import gettz, tzlocal
from dateutil.relativedelta import relativedelta
from lingua_franca.lang import get_primary_lang_code
from datetime import timedelta, datetime, date

Remove redefined import.

The import datetime is redefined.

- from datetime import timedelta, datetime, date
+ from datetime import timedelta, date
Committable suggestion

Suggested change
from datetime import timedelta, date
Tools
Ruff

20-20: Redefinition of unused datetime from line 16

Remove definition: datetime

(F811)

Comment on lines +144 to +155
def weekday_to_int(weekday, lang=None):
    if isinstance(weekday, int) or isinstance(weekday, float):
        return int(weekday)
    lang_code = get_primary_lang_code(lang)
    inv_map = {}
    if lang_code.startswith("en"):
        inv_map = {v: k for k, v in _WEEKDAY_SHORT_EN.items()}

    for short in inv_map:
        if weekday.startswith(short):
            return inv_map[short]
    return None

Merge isinstance calls.

Merge multiple isinstance calls for weekday.

- if isinstance(weekday, int) or isinstance(weekday, float):
+ if isinstance(weekday, (int, float)):
Committable suggestion

Suggested change
def weekday_to_int(weekday, lang=None):
    if isinstance(weekday, (int, float)):
        return int(weekday)
    lang_code = get_primary_lang_code(lang)
    inv_map = {}
    if lang_code.startswith("en"):
        inv_map = {v: k for k, v in _WEEKDAY_SHORT_EN.items()}
    for short in inv_map:
        if weekday.startswith(short):
            return inv_map[short]
    return None
Tools
Ruff

145-145: Multiple isinstance calls for weekday, merge into a single call

Merge isinstance calls for weekday

(SIM101)
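
A runnable sketch of the merged-isinstance version is given below. The contents of `_WEEKDAY_SHORT_EN` are an assumption (Monday = 0, three-letter lower-case names); the PR's actual table may differ. To stay self-contained, the sketch takes a `lang_code` directly instead of calling `get_primary_lang_code`.

```python
# Assumed shape of the short-name table; the PR's actual table may differ.
_WEEKDAY_SHORT_EN = {0: "mon", 1: "tue", 2: "wed", 3: "thu",
                     4: "fri", 5: "sat", 6: "sun"}


def weekday_to_int(weekday, lang_code="en"):
    # SIM101: one isinstance call with a tuple of types.
    if isinstance(weekday, (int, float)):
        return int(weekday)
    inv_map = {}
    if lang_code.startswith("en"):
        inv_map = {v: k for k, v in _WEEKDAY_SHORT_EN.items()}
    for short, number in inv_map.items():
        if weekday.startswith(short):
            return number
    return None
```

As in the quoted code, the prefix match is case-sensitive, so callers would need to lower-case input such as "Tuesday" before the lookup.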

Comment on lines +131 to +141
def month_to_int(month, lang=None):
    if isinstance(month, int) or isinstance(month, float):
        return int(month)
    lang_code = get_primary_lang_code(lang)
    inv_map = {}
    if lang_code.startswith("en"):
        inv_map = {v: k for k, v in _MONTH_SHORT_EN.items()}
    for short in inv_map:
        if month.startswith(short):
            return inv_map[short]
    return None

Merge isinstance calls.

Merge multiple isinstance calls for month.

- if isinstance(month, int) or isinstance(month, float):
+ if isinstance(month, (int, float)):
Committable suggestion

Suggested change
def month_to_int(month, lang=None):
    if isinstance(month, (int, float)):
        return int(month)
    lang_code = get_primary_lang_code(lang)
    inv_map = {}
    if lang_code.startswith("en"):
        inv_map = {v: k for k, v in _MONTH_SHORT_EN.items()}
    for short in inv_map:
        if month.startswith(short):
            return inv_map[short]
    return None
Tools
Ruff

132-132: Multiple isinstance calls for month, merge into a single call

Merge isinstance calls for month

(SIM101)
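
Since month_to_int and weekday_to_int share the same prefix-lookup logic, one possible follow-up is to factor it into a helper. The sketch below is only an illustration: `_prefix_lookup` is a hypothetical name, the contents of `_MONTH_SHORT_EN` are assumed (January = 1, three-letter lower-case names), and `lang_code` is passed in directly rather than derived through `get_primary_lang_code`.

```python
def _prefix_lookup(name, short_names):
    """Return the key whose short name is a prefix of `name`, else None."""
    for number, short in short_names.items():
        if name.startswith(short):
            return number
    return None


# Assumed shape of the short-name table; the PR's actual table may differ.
_MONTH_SHORT_EN = {1: "jan", 2: "feb", 3: "mar", 4: "apr",
                   5: "may", 6: "jun", 7: "jul", 8: "aug",
                   9: "sep", 10: "oct", 11: "nov", 12: "dec"}


def month_to_int(month, lang_code="en"):
    # Numeric input passes straight through (merged isinstance, SIM101).
    if isinstance(month, (int, float)):
        return int(month)
    if lang_code.startswith("en"):
        return _prefix_lookup(month, _MONTH_SHORT_EN)
    return None
```

Iterating the forward table avoids building an inverted dictionary on every call, while keeping the same case-sensitive prefix semantics as the quoted code.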

Labels
enhancement New feature or request