Improve item scanner #11

reaganchisholm · 2023-11-14T04:04:57Z

As discussed in Discord, this PR improves item scanner reliability (#10) and includes a few other updates. It should now recognize most expansion items except Artifacts (I think we can fix that now that we get the item name consistently).

PR includes:

Update tesseract to 5.0
Add an image processing step to more reliably extract the text we need for identifying items
Add some optimizations to the identifying function (get the full item name, determine rarity, cut out specific keywords for English)
Add tests for the item scanner against a set of English sample images, laying the groundwork for adding samples for other languages as well

The biggest change is that uploaded images now go through a processing step before being passed to the identifying function. We apply grayscale and a curve effect, then draw black squares over specific parts of the image to hide visual noise that confuses tesseract.
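For readers following along, here is a minimal sketch of what such a processing step could look like. It is not the code from this PR: the `RawImage` and `Rect` types, the function names, the luminance weights, and the curve threshold are all assumptions chosen for illustration, and the real mask positions depend on the tooltip layout.

```typescript
// Hypothetical raw RGBA buffer, laid out like the browser's ImageData.
interface RawImage {
  width: number
  height: number
  data: Uint8ClampedArray // 4 bytes per pixel: R, G, B, A
}

// Hypothetical rectangle in pixel coordinates.
interface Rect {
  x: number
  y: number
  w: number
  h: number
}

// Replace each pixel's color with its luminance (Rec. 601 weights, an assumption).
function grayscale(img: RawImage): void {
  const d = img.data
  for (let i = 0; i < d.length; i += 4) {
    const y = 0.299 * d[i] + 0.587 * d[i + 1] + 0.114 * d[i + 2]
    d[i] = d[i + 1] = d[i + 2] = y
  }
}

// A simple stand-in for a "curve" effect: values at or below the threshold
// become black, the rest are stretched linearly to the full 0..255 range.
function applyCurve(img: RawImage, threshold = 96): void {
  const d = img.data
  for (let i = 0; i < d.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      const v = d[i + c]
      d[i + c] = v <= threshold ? 0 : ((v - threshold) * 255) / (255 - threshold)
    }
  }
}

// Paint a black rectangle, clipped to the image bounds. Alpha is left untouched.
function blackout(img: RawImage, r: Rect): void {
  const x0 = Math.max(0, r.x)
  const y0 = Math.max(0, r.y)
  const x1 = Math.min(img.width, r.x + r.w)
  const y1 = Math.min(img.height, r.y + r.h)
  for (let y = y0; y < y1; y++) {
    for (let x = x0; x < x1; x++) {
      const i = (y * img.width + x) * 4
      img.data[i] = img.data[i + 1] = img.data[i + 2] = 0
    }
  }
}

// Run the three steps on a copy so the uploaded image stays unmodified.
function preprocess(img: RawImage, masks: Rect[]): RawImage {
  const out: RawImage = { ...img, data: new Uint8ClampedArray(img.data) }
  grayscale(out)
  applyCurve(out)
  for (const mask of masks) {
    blackout(out, mask)
  }
  return out
}
```

In a browser, the `data` buffer would typically come from a canvas `getImageData` call, and the result would be drawn back to a canvas before being handed to tesseract.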

Example of what this looks like:

Some areas that can be further improved:

We currently have special handling to cut out "of the" suffixes for English. This seems to improve results, but I'm not sure of the best way to approach it for translations. I guess we would need a list of every suffix and check for it depending on the language.
Artifacts still don't seem to get recognized. I tried Endless Thirst and Tumbler Feetwraps: the data returned from the function seems accurate, but the item searching step doesn't return the correct item even though we should have an exact name match.
Sometimes the "Magnify" stat line includes the word "highest:", which tricks the filter and ends up in the perk list. I've added a fix for English, but this wouldn't be reliable for other languages.
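One way the per-language idea above could be structured is a small rules table keyed by language. This is only a sketch under assumptions: the `LanguageRules` shape, the `RULES` table, and both helper names are hypothetical and not taken from this PR, and only the English entries ("of the", "highest:") come from the description above.

```typescript
// Hypothetical per-language rules; other languages would need their own entries.
interface LanguageRules {
  suffixMarkers: string[] // text that starts an attribute suffix in an item name
  statLineKeywords: string[] // text that marks a stat line, not a perk
}

const RULES: Record<string, LanguageRules> = {
  en: { suffixMarkers: [' of the '], statLineKeywords: ['highest:'] },
}

// Cut the item name at the last suffix marker, if the language has one.
// Names in languages without rules are returned unchanged.
function stripSuffix(name: string, lang: string): string {
  const rules = RULES[lang]
  if (!rules) return name
  const lower = name.toLowerCase()
  for (const marker of rules.suffixMarkers) {
    const idx = lower.lastIndexOf(marker)
    if (idx > 0) return name.slice(0, idx).trim()
  }
  return name
}

// Drop scanned lines that contain a known stat-line keyword so they
// do not end up in the perk list.
function dropStatLines(lines: string[], lang: string): string[] {
  const keywords = RULES[lang]?.statLineKeywords ?? []
  return lines.filter((line) => {
    const lower = line.toLowerCase()
    return !keywords.some((k) => lower.includes(k))
  })
}
```

A caveat with this approach: an item whose base name itself contains "of the" would be cut too short, so an exact match against known suffixes would be safer than a generic marker.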

Even with those areas that could be improved, I think this should help improve the consistency for users. Let me know if you'd like any changes or adjustments, or feel free to change anything needed.

giniedp · 2023-11-14T08:07:05Z

Great job. I'm going to merge later today.
Magnify is indeed a problem and is subject to change. I think on the PTR the wording has changed, yet the colon still stays.
For the item name, we can solve this for all languages as follows:

the item name is a combination of PREFIX + NAME + SUFFIX
PREFIX is given by GEM perk
NAME is given by item
SUFFIX is given by Attribute perk
Prefix and suffix translations are already collected in the getPerkData function. Inside the buildItem function we can use that data to generate the localized item name, unless the item has IgnoreNameChanges set to true. Using that name, we should get a better match when searching by name.
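The PREFIX + NAME + SUFFIX scheme could be sketched roughly as below. The `ItemRecord` and `PerkAffix` shapes and the `buildLocalizedName` helper are assumptions for illustration; the actual data returned by getPerkData and consumed by buildItem may look different. Only the IgnoreNameChanges flag and the three-part composition come from the comment above.

```typescript
// Hypothetical shapes; the real item and perk records may differ.
interface ItemRecord {
  name: string // already localized base name
  IgnoreNameChanges?: boolean
}

interface PerkAffix {
  prefix?: string // localized prefix, expected from the gem perk
  suffix?: string // localized suffix, expected from the attribute perk
}

// Compose the display name as PREFIX + NAME + SUFFIX, skipping missing parts.
// Items flagged with IgnoreNameChanges keep their base name.
function buildLocalizedName(
  item: ItemRecord,
  gemPerk?: PerkAffix,
  attributePerk?: PerkAffix,
): string {
  if (item.IgnoreNameChanges) return item.name
  return [gemPerk?.prefix, item.name, attributePerk?.suffix]
    .filter((part): part is string => !!part)
    .join(' ')
}
```

Joining with a single space is itself an assumption; some languages may need a different separator or word order, in which case the translation data would have to carry that information.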

reaganchisholm added 11 commits November 11, 2023 12:29

Update tesseract to 5.0

8020d52

More work on improving the item scanner

582ab4c

wip: test

7b198bd

Revert test progress

02e4a32

Add more sample images, rename some

181c4e3

cleaning up file

7c40838

Load all languages

1773e04

Merge stage

1af1f37

more work on tests, changing up samples, updating recognize item

aa46fef

more work on item scanner test

7ab1638

Remove imbued logic

627d7c8

giniedp merged commit c94e78f into giniedp:stage Nov 14, 2023
