Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Correct search count #259

Merged
merged 3 commits into from
Jul 13, 2023
Merged

🐛 Correct search count #259

merged 3 commits into from
Jul 13, 2023

Conversation

kirkkwang
Copy link
Contributor

@kirkkwang kirkkwang commented Jul 13, 2023

🐛 Correct search count

A funky bug was observed when on a parent work of a PDF, the count was
off by one. It registers OCR hits as both an OCR hit and a metadata
hit. This is likely because of adding snippets in the search since all
the file sets' texts need to be indexed on the parent work as well. The
parent work essentially would double all the texts found. This commit
is a bit hacky but it removes that extra hit while keeping functionality
for both OCR hits and metadata hits. This is a bit of future proofing
since it only would happen in applications with snippets enabled.

Before image
After image

♻️ Refactor and fix annotation count

This commit will refactor #annotation_list by breaking it down a bit and
also adjusts and compensates the json for removing the invalid hit.

IIIF Search JSON image

A funky bug was observed when on a parent work of a PDF, the count was
off by one.  It registers OCR hits as both an OCR hit and a metadata
hit.  This is likely because of adding snippets in the search since all
the file sets' texts need to be indexed on the parent work as well.  The
parent work essentially would double all the texts found.  This commit
is a bit hacky but it removes that extra hit while keeping functionality
for both OCR hits and metadata hits.  This is a  bit of future proofing
since it only would happen in applications with snippets enabled.
This commit will refactor #annotation_list by breaking it down a bit and
also adjusts and compensates the json for removing the invalid hit.
def check_coords_json_and_properties(coords_json, sanitized_query)
return if coords_json && coords_json['coords']

properties = @document.keys.select { |key| key.ends_with? "_tesim" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning_value = INVALID_MATCH_TEXT

@document.each_pair do |key, value|
   next if key.ends_with?("_tesim")
   next unless value.join.downcase.include?(sanitized_query)

   returning_value = default_coords
   break
  end
end

returning_value

return default_coords unless coords_json && coords_json['coords']
sanitized_query = query.match(additional_query_terms_regex)[1].strip

coords_check_result = check_coords_json_and_properties(coords_json, sanitized_query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about moving the guard condition of the original method outside of the original method, as it has a surprising return value (based on the given method name).

return derived_coords_json_and_properties(coords_json, sanitized_query) unless coords_json['coords']

Copy link
Contributor

@jeremyf jeremyf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left comments that are non-blocking.

This commit will bring in review changes and make the method clearer to
read.
@kirkkwang kirkkwang merged commit b61644c into main Jul 13, 2023
9 checks passed
@kirkkwang kirkkwang deleted the ocr-count-correction branch July 13, 2023 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants