Harvester not removing content from geoportal that has been removed from source WAF #188

Open
MikeRoyer-NOAA opened this issue Sep 30, 2022 · 5 comments

@MikeRoyer-NOAA

MikeRoyer-NOAA commented Sep 30, 2022

Harvester 2.6.4
A harvester task is set up to pull from a WAF, and some XML files that have been removed from the source WAF are not being removed from the geoportal. The harvester history for the task reports that it acted upon 14537 XML files, while the geoportal reports 14852 items for that source of origin (i.e. the harvester task). The task is not run incrementally. Isn't harvester supposed to remove anything that is not in the source WAF from the geoportal when the task runs?
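
For reference, the gap between the two counts is 14852 - 14537 = 315 items. A minimal sketch of how the leftover items could be listed, assuming the file names currently in the WAF and the identifiers the geoportal attributes to this task can each be exported as a plain list. Both inputs and the function name `find_stale_items` are hypothetical; nothing here calls a harvester or geoportal API.

```python
def find_stale_items(waf_files, geoportal_items):
    """Return geoportal items whose source file is no longer in the WAF."""
    waf = set(waf_files)
    # Anything the geoportal still holds that the WAF no longer lists is stale.
    return sorted(item for item in set(geoportal_items) if item not in waf)


# Hypothetical sample data, not taken from the real WAF or geoportal.
waf_files = ["a.xml", "b.xml"]
geoportal_items = ["a.xml", "b.xml", "c.xml"]

stale = find_stale_items(waf_files, geoportal_items)
print(stale)          # ['c.xml']

# Size of the discrepancy reported in this issue.
print(14852 - 14537)  # 315
```

If such a listing shows exactly the files that were deleted from the WAF, that would point at cleanup not running rather than at duplicate or mismatched identifiers.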

@mhogeweg
Member

mhogeweg commented Oct 5, 2022

Have you set the Geoportal output broker to 'perform cleanup'? That setting determines whether the harvester will attempt to remove existing items from Geoportal.

@MikeRoyer-NOAA
Author

Yes, the Harvester output broker has the "Perform cleanup" option checked. Does it perform cleanup every time a Harvester task is run, or only at some frequency?

@mhogeweg
Member

mhogeweg commented Oct 5, 2022

It should do it every time it runs a task. Is your WAF public? If so, I can do some testing on my end.

@MikeRoyer-NOAA
Author

I'm checking with my user base on whether the WAF is public.

In the meantime, can you tell me what the Failed (in/out) column on the history page means? Does "Failed in" mean that the XML is not well formed, and "Failed out" mean that there is some issue with the content within the XML, with neither case being loaded into the output broker?

@mhogeweg
Member

mhogeweg commented Oct 8, 2022

I added some info on the history page in the wiki: https://github.com/Esri/geoportal-server-harvester/wiki/Tasks#history
