Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #88 and #93: multiple decimal places and multiple decimal numbers #91

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ChanceNCounter
Copy link
Contributor

@ChanceNCounter ChanceNCounter commented Mar 16, 2020

  • extract_number(), extract_numbers(), and all
    helper functions gain a keyword parameter
    decimal_places (for helpers, just places)
    which does what it sounds like, using builtin
    round().

  • avoid capturing non-adjacent numbers as decimal
    places

  • avoid capturing already-used decimal places as
    separate numbers in extract_numbers()

  • add a few tests for the above


closes #88, #93

@devs-mycroft devs-mycroft added the CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) label Mar 16, 2020
@ChanceNCounter
Copy link
Contributor Author

found bug: multiple instances of "point" or "dot" in the input string confuse the parser.

 * extract_number(), extract_numbers(), and all
   helper functions gain a keyword parameter
   `decimal_places` (for helpers, just `places`)
   which does what it sounds like, using builtin
   round().

 * avoid capturing non-adjacent numbers as decimal
   places

 * avoid capturing already-used decimal places as
   separate numbers in extract_numbers()

 * add a few tests for the above
@ChanceNCounter ChanceNCounter changed the title fix #88: multiple decimal places fix #88 and #93: multiple decimal places and multiple decimal numbers Mar 24, 2020
Copy link
Contributor

@krisgesling krisgesling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Chance, loving all your work on this stuff.

A few questions and suggestions here, but push back if you disagree :)

lingua_franca/parse.py Outdated Show resolved Hide resolved
lingua_franca/lang/parse_en.py Show resolved Hide resolved
lingua_franca/lang/parse_en.py Show resolved Hide resolved
if "." not in str(decimal.text):
return number.value + float('0.' + str(decimal.value)), \
number.tokens + partitions[1] + decimal.tokens
if "." not in str(numbers2[0].text):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be updated to check for the new _DECIMAL_MARKER

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this point, _DECIMAL_MARKER has done its job. This bit is easier to look at in a debugger than it is to explain, but numbers2 is a list of ReplaceableNumber objects corresponding to the digits of our decimal part. It's going to be joined, then appended to a decimal point, then cast to a float and summed with the whole number part of our result.

The if statement here goes back to Core, and it makes sure the first element of numbers2 doesn't already have a decimal point. It's hypothetically possible, but I can't figure out how, and removing the if clause doesn't break any test cases. Still, I left it there, just removed a superfluous variable is all.

I'll see if I can work backwards through git blame and ask whoever wrote the if statement. I was assuming it had something to do with the way the tokenizer handles certain input. If it's just a relic, I'm all for removing line 318 entirely.

Copy link
Collaborator

@JarbasAl JarbasAl Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it doesnt break test cases, i say let's remove it or find a test case for it

lingua_franca/lang/parse_en.py Outdated Show resolved Hide resolved
lingua_franca/lang/parse_en.py Outdated Show resolved Hide resolved
Comment on lines 322 to 323
if return_value == int(return_value):
return_value = int(return_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I was calling an extract_decimal function I'd expect a float is returned. What's the benefit of returning an integer instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be and not places, good catch! That is, this is fundamentally a helper invoked by extract_number(). Unless the user has asked for decimal places, I'd figured no rounding means a whole number is just that. On the other hand, you're right, that doesn't work either: "one" is a whole number. "one point zero" equals a whole number.

What if decimal_places=None does no rounding, decimal_places=0 always returns an int, and decimal_places > 0 does what it does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Yeah that sounds intuitive to me

lingua_franca/lang/parse_en.py Show resolved Hide resolved
lingua_franca/lang/parse_en.py Outdated Show resolved Hide resolved
- more visible funcs now have more explicitly-named parameter for places
- `extract_number(decimal_places=...)` now has several options:
  - `decimal_places=n` will round to `n` places
  - `decimal_places=0` will round up to nearest int, equiv. ceil(result)
  - `decimal_places=-1` will round down to int, equiv. floor(result)
  - expanded comments and docstrings
- remove old commented-out code
@@ -1487,11 +1542,12 @@ def extract_numbers_en(text, short_scale=True, ordinals=False):
is now common in most English speaking countries.
See https://en.wikipedia.org/wiki/Names_of_large_numbers
ordinals (bool): consider ordinal numbers, e.g. third=3 instead of 1/3
decimal_places (int or False): rounds to # decimal places. uses builtin round()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more int or False

Comment on lines 97 to 103
decimal_places (int or None): Positive value will round to X places.
Val of 0 will round up to nearest int,
equivalent to `math.ceil(result)`
Val of -1 will round down to nearest int,
equivalent to `math.floor(result)`
Val of None will perform no rounding,
potentially returning a very long string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a previous idea for implementation? Assuming this will be the new behaviour you described where 0 is a regular rounding and there is no handling of a -1 input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I can't decide which way is better. There are three things at play:

  • If these functions don't round, their output could be a string of stupendous length
  • You don't always want to round the same way
  • Sometimes you feel like an int, sometimes you don't

@ChanceNCounter
Copy link
Contributor Author

The status box on this PR seems to be broken. If I click on Travis, it shows all the builds completed successfully.

@forslund
Copy link
Collaborator

It happens sometimes with the checks, if there's a communication error on the webhook the status update can be lost.

@JarbasAl JarbasAl added bug Something isn't working en relates to english language labels Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) en relates to english language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extract_number does not directly handle multiple decimal places
5 participants