Improve item scanner #11

Merged: 11 commits, Nov 14, 2023


reaganchisholm
Contributor

As discussed in Discord, this PR improves item scanner reliability (#10) and includes a few other updates. It should now recognize most expansion items except Artifacts (I think we can fix this now that we get the item name consistently).

PR includes:

  • Update Tesseract to 5.0
  • Add an image processing step to more reliably extract the text we need for identifying items
  • Add some optimizations to the identifying function (get the full item name, determine rarity, cut out specific keywords for English)
  • Add tests for the item scanner against a set of English sample images, laying the groundwork for adding samples for other languages as well

The biggest change is that images now go through a processing step after being uploaded and before being passed to the identifying function. We apply grayscale and a curve effect, then draw black squares over specific parts of the image to hide visual noise that confuses Tesseract.
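For readers who want a concrete picture of that processing step, here is a minimal sketch, not the code from this PR. It works on a raw RGBA pixel buffer (the layout used by canvas `ImageData`). The names `Rect`, `toneCurve`, and `preprocess`, the gamma-style curve, and the mask coordinates are all assumptions made for illustration; the actual curve and masked regions in the PR may differ.

```typescript
// Hypothetical helper types and names; none of these come from the PR itself.
interface Rect {
  x: number
  y: number
  width: number
  height: number
}

// Simple gamma-style curve (an assumption): gamma > 1 darkens mid-tones,
// which pushes a dim background further away from bright text.
function toneCurve(value: number, gamma: number): number {
  return Math.round(255 * Math.pow(value / 255, gamma))
}

// Converts an RGBA buffer to grayscale, applies the curve, then paints
// the given rectangles black so the OCR engine never sees those regions.
function preprocess(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  masks: Rect[],
  gamma = 2.0,
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba.length)
  for (let i = 0; i < rgba.length; i += 4) {
    // Luminance-weighted grayscale (ITU-R BT.601 weights).
    const gray = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]
    const v = toneCurve(gray, gamma)
    out[i] = v
    out[i + 1] = v
    out[i + 2] = v
    out[i + 3] = 255 // force opaque output
  }
  for (const mask of masks) {
    // Clamp each mask to the image bounds before painting.
    const x0 = Math.max(0, mask.x)
    const y0 = Math.max(0, mask.y)
    const x1 = Math.min(width, mask.x + mask.width)
    const y1 = Math.min(height, mask.y + mask.height)
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        const i = (y * width + x) * 4
        out[i] = 0
        out[i + 1] = 0
        out[i + 2] = 0
      }
    }
  }
  return out
}
```

In a browser, the buffer could come from `ctx.getImageData(...)` and the result could be written back with `ctx.putImageData(...)` before handing the canvas to the OCR step.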

Example of what this looks like:
(Image: importer-solution-example)

Some areas that can be further improved:

  • There is currently special handling that cuts "of the" suffixes from English names. This seems to improve results, but I'm not sure of the best approach for translations. I guess we would need a list of every suffix and check against it depending on the language.
  • Artifacts still don't get recognized. I tried Endless Thirst and Tumbler Feetwraps: the data returned from the function seems accurate, but the item search step doesn't return the correct item even though we should have an exact name match.
  • Sometimes the "Magnify" stat line includes the word "highest:", which tricks the filter and ends up in the perk list. I've added a fix for English, but it wouldn't be reliable for other languages.
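To make the two English-only workarounds in the list above concrete, here is a rough sketch of how they might be keyed by language. Everything in it is hypothetical: `SUFFIX_MARKERS`, `stripSuffix`, and `dropMagnifyLines` are illustrative names, only the English entries reflect what is described above, and the assumption that the filter is confused by the colon in "highest:" is a guess.

```typescript
// Hypothetical per-language suffix markers. Only the English entry mirrors
// the "of the" handling described above; other languages would need their own lists.
const SUFFIX_MARKERS: Record<string, string[]> = {
  en: [' of the '],
}

// Cuts everything from the last suffix marker onward, leaving prefix + base name.
// Limitation: a base name that itself contains the marker would also be cut.
function stripSuffix(name: string, lang: string): string {
  const markers = SUFFIX_MARKERS[lang] ?? []
  const lower = name.toLowerCase()
  for (const marker of markers) {
    const idx = lower.lastIndexOf(marker)
    if (idx > 0) {
      return name.slice(0, idx).trim()
    }
  }
  return name.trim()
}

// English-only heuristic: drop OCR lines that contain "highest:" so the
// Magnify stat line is not mistaken for a perk line.
function dropMagnifyLines(lines: string[]): string[] {
  return lines.filter((line) => !/highest:/i.test(line))
}
```

A table keyed by language keeps the special cases in one place, but it still has to be filled in by hand for every locale, which is the maintenance cost raised above.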

Even with those areas that could be improved, I think this should help improve the consistency for users. Let me know if you'd like any changes or adjustments, or feel free to change anything needed.

@giniedp
Owner

giniedp commented Nov 14, 2023

Great job. I'm going to merge later today.
Magnify is indeed a problem and is subject to change. I think on the PTR the wording has changed, yet the colon still stays.
For the item name, we can solve this for all languages as follows:

  • the item name is a combination of PREFIX + NAME + SUFFIX
  • PREFIX is given by GEM perk
  • NAME is given by item
  • SUFFIX is given by Attribute perk

Prefix and suffix translations are already collected in the getPerkData function. Inside the buildItem function we can use that data to generate the localized item name, unless the item has IgnoreNameChanges set to true. Using that name, we should get a better match.
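The PREFIX + NAME + SUFFIX rule above could be sketched roughly like this. The `NameParts` shape and `composeItemName` helper are hypothetical; the real getPerkData and buildItem signatures are not shown in this thread, and joining the parts with a single space is an assumption that may not hold for every language.

```typescript
// Hypothetical input shape; the real data returned by getPerkData is not shown here.
interface NameParts {
  name: string // localized base item name
  prefix?: string // localized prefix contributed by the gem perk, if any
  suffix?: string // localized suffix contributed by the attribute perk, if any
  ignoreNameChanges?: boolean // mirrors the item's IgnoreNameChanges flag
}

// Builds the display name as PREFIX + NAME + SUFFIX, skipping missing parts.
// Items flagged with ignoreNameChanges keep their base name untouched.
function composeItemName(parts: NameParts): string {
  if (parts.ignoreNameChanges) {
    return parts.name
  }
  return [parts.prefix, parts.name, parts.suffix]
    .filter((part): part is string => !!part && part.trim().length > 0)
    .map((part) => part.trim())
    .join(' ')
}
```

The composed name can then be compared with the OCR result directly, which would avoid maintaining per-language suffix lists on the scanner side.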

giniedp merged commit c94e78f into giniedp:stage on Nov 14, 2023