🐛 Correct search count
A funky bug was observed on the parent work of a PDF: the search count was
off by one because an OCR hit registered as both an OCR hit and a metadata
hit.  This is likely a side effect of adding snippets to the search, since
the text of every file set must also be indexed on the parent work, so the
parent work effectively duplicated every text match.  This commit is a bit
hacky, but it removes the extra hit while keeping both OCR hits and
metadata hits working.  It is partly future-proofing, since the bug only
occurs in applications with snippets enabled.
kirkkwang committed Jul 12, 2023
1 parent fbe418a commit ec4b03c
Showing 2 changed files with 28 additions and 2 deletions.
4 changes: 4 additions & 0 deletions app/models/iiif_print/iiif_search_response_decorator.rb
@@ -6,6 +6,10 @@ def annotation_list
json_results = super
resources = json_results&.[]('resources')

resources&.delete_if do |resource|
resource["on"].include?(IiifPrint::BlacklightIiifSearch::AnnotationDecorator::INVALID_MATCH_TEXT)
end

resources&.each do |result_hit|
next if result_hit['resource'].present?
result_hit['resource'] = {
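The filtering step in `annotation_list` can be tried in isolation. The following is a minimal sketch, not the gem's code: `drop_invalid_hits` and the sample response are hypothetical, and only the sentinel string is taken from the commit. It assumes each hit is a hash with an `"on"` target string, and it tolerates a missing `resources` list or a hit without `"on"`.

```ruby
# Sentinel copied from the commit; everything else here is illustrative.
INVALID_MATCH_TEXT = "xywh=INVALID,INVALID,INVALID,INVALID"

# Hypothetical helper: return only the hits whose "on" target does not
# carry the sentinel. Returns [] when the response or its list is missing.
def drop_invalid_hits(json_results)
  resources = json_results && json_results["resources"]
  return [] if resources.nil?

  resources.reject { |resource| resource["on"].to_s.include?(INVALID_MATCH_TEXT) }
end

# Made-up response: one real OCR hit and one duplicate tagged as invalid.
sample = {
  "resources" => [
    { "@id" => "hit-1", "on" => "https://example.org/canvas/1#xywh=10,20,300,40" },
    { "@id" => "hit-2", "on" => "https://example.org/canvas/1" + INVALID_MATCH_TEXT }
  ]
}

puts drop_invalid_hits(sample).length
```

Unlike `delete_if` in the diff, `reject` leaves the input untouched, which makes the sketch easier to test; the in-place variant behaves the same for counting purposes.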
26 changes: 24 additions & 2 deletions lib/iiif_print/blacklight_iiif_search/annotation_decorator.rb
@@ -2,6 +2,7 @@
module IiifPrint
module BlacklightIiifSearch
module AnnotationDecorator
INVALID_MATCH_TEXT = "xywh=INVALID,INVALID,INVALID,INVALID".freeze
##
# Create a URL for the annotation
# use a Hyrax-y URL syntax:
@@ -28,16 +29,23 @@ def canvas_uri_for_annotation
# @return [String]
def coordinates
return default_coords if query.blank?
- coords_json = fetch_and_parse_coords
- return default_coords unless coords_json && coords_json['coords']

sanitized_query = query.match(additional_query_terms_regex)[1].strip
coords_json = fetch_and_parse_coords

coords_check_result = check_coords_json_and_properties(coords_json, sanitized_query)
return coords_check_result if coords_check_result

query_terms = sanitized_query.split(' ').map(&:downcase)

matches = coords_json['coords'].select do |k, _v|
k.downcase =~ /(#{query_terms.join('|')})/
end
return default_coords if matches.blank?

coords_array = matches.values.flatten(1)[hl_index]
return default_coords unless coords_array

"#xywh=#{coords_array.join(',')}"
end
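
The term matching in `coordinates` interpolates the query terms straight into a regular expression, so a term containing regex metacharacters (for example `c++`) could raise or match unexpectedly. A minimal sketch of the same selection using `Regexp.union`, which escapes each term, is shown here. `nth_match_fragment` and the sample data are hypothetical, and the `word => [[x, y, w, h], ...]` shape is assumed from how the diff reads `coords_json['coords']`.

```ruby
# Hypothetical helper mirroring the selection logic in `coordinates`:
# find OCR words containing any query term, then pick the hl_index-th box.
def nth_match_fragment(words, query, hl_index)
  terms = query.downcase.split
  return nil if terms.empty?

  pattern = Regexp.union(terms) # escapes regex metacharacters in each term
  matches = words.select { |word, _boxes| word.downcase.match?(pattern) }
  box = matches.values.flatten(1)[hl_index]
  box && "#xywh=#{box.join(',')}"
end

# Made-up OCR data: each word maps to a list of [x, y, w, h] boxes.
words = {
  "Cat" => [[10, 20, 30, 40], [50, 60, 30, 40]],
  "dog" => [[5, 5, 20, 10]],
  "c++" => [[1, 2, 3, 4]]
}

puts nth_match_fragment(words, "cat", 1)
```

The sketch returns `nil` where the diff falls back to `default_coords`, since that method's body is not shown in this hunk.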

@@ -54,6 +62,20 @@ def fetch_and_parse_coords
end
end

# This is a bit hacky: when there are no OCR coords, check whether any metadata
# property (a *_tesim field) contains the query term.
# If coords are present, return nil so the caller continues with OCR matching.
# If there are no coords but a metadata property matches, return the default coords.
# Otherwise return an invalid match text that is stripped out at a later point.
# @see IiifPrint::IiifSearchResponseDecorator#annotation_list
def check_coords_json_and_properties(coords_json, sanitized_query)
return if coords_json && coords_json['coords']

properties = @document.keys.select { |key| key.ends_with? "_tesim" }
properties.each { |property| return default_coords if @document[property].join.downcase.include?(sanitized_query.downcase) }

INVALID_MATCH_TEXT
end

##
# a default set of coordinates
# @return [String]
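
The three outcomes of `check_coords_json_and_properties` can be summarized in a standalone sketch. This is an illustration under assumptions, not the gem's implementation: `classify_hit` is a hypothetical name, the document is a plain hash rather than a Solr document, `DEFAULT_COORDS` is a placeholder because the body of `default_coords` is not shown in the diff, and core Ruby's `end_with?` stands in for ActiveSupport's `ends_with?`.

```ruby
# Sentinel copied from the commit.
INVALID_MATCH_TEXT = "xywh=INVALID,INVALID,INVALID,INVALID"
# Assumed placeholder; the real default_coords value is not shown in the diff.
DEFAULT_COORDS = "#xywh=0,0,0,0"

# Hypothetical stand-in for check_coords_json_and_properties:
#   nil                -> OCR coords exist, caller continues with OCR matching
#   DEFAULT_COORDS     -> no coords, but a *_tesim field contains the query
#   INVALID_MATCH_TEXT -> neither; the hit is tagged for later removal
def classify_hit(coords_json, document, query)
  return nil if coords_json && coords_json["coords"]

  needle = query.downcase
  metadata_match = document.any? do |key, value|
    key.end_with?("_tesim") && Array(value).join.downcase.include?(needle)
  end
  metadata_match ? DEFAULT_COORDS : INVALID_MATCH_TEXT
end

# Made-up document with one indexed metadata field.
doc = { "title_tesim" => ["Moby Dick"], "id" => "abc" }

puts classify_hit(nil, doc, "Moby").inspect
```

Both sides of the comparison are downcased here, so a query typed as `Moby` still matches the lowercased field text.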
