
Commit

Fixed KeyError: already_indexed_links on processing web feed for an incremental index when more than one site is concurrently indexed
m-i-l committed Jul 9, 2023
1 parent cf3edac commit c31629c
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/indexing/search_my_site_scheduler.py
@@ -159,11 +159,11 @@
site_to_crawl['contents'] = contents
# already_indexed_links, i.e. pages on this domain which have already been indexed.
# This is only set if it is needed, i.e. for an incremental index.
- if site['full_index'] == False:
+ if site_to_crawl['full_index'] == False:
already_indexed_links = get_already_indexed_links(site_to_crawl['domain'])
no_of_already_indexed_links = len(already_indexed_links)
indexing_page_limit = site_to_crawl['indexing_page_limit']
- if no_of_already_indexed_links == indexing_page_limit:
+ if no_of_already_indexed_links >= indexing_page_limit:
# if the indexing_page_limit was reached in the last index then abandon this index
# update the status in the database so that it isn't selected again until the next scheduled full or incremental reindex
sites_to_crawl.remove(site_to_crawl)
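
The two changed lines can be illustrated in isolation. The following is a minimal sketch, not the scheduler's actual code: the helper name `abandon_incremental_index`, the sample domains, and the sample data are hypothetical, and the reason a count might exceed the limit (the limit being lowered after an earlier index) is an assumed scenario. It shows the decision reading `full_index` from the site currently being processed, and using `>=` so that a count above the limit is still caught.

```python
def abandon_incremental_index(site_to_crawl, already_indexed_links):
    """Return True if an incremental index should be skipped for this site.

    Hypothetical helper: it mirrors the condition in the diff above but is
    not part of search_my_site_scheduler.py.
    """
    # Read full_index from the site being processed, not from another variable.
    if site_to_crawl['full_index']:
        return False
    # >= rather than ==: a count above the limit (assumed possible, e.g. if
    # the limit was lowered after an earlier index) should also abandon.
    return len(already_indexed_links) >= site_to_crawl['indexing_page_limit']


# Hypothetical sample data: two sites indexed in the same run.
sites_to_crawl = [
    {'domain': 'a.example', 'full_index': True, 'indexing_page_limit': 3},
    {'domain': 'b.example', 'full_index': False, 'indexing_page_limit': 3},
]
indexed = {'b.example': ['/1', '/2', '/3', '/4']}

decisions = {
    s['domain']: abandon_incremental_index(s, indexed.get(s['domain'], []))
    for s in sites_to_crawl
}
print(decisions)
```

With the old `==` comparison, `b.example` (4 links against a limit of 3) would not have been abandoned.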

