Can this be used to crawl a specific website and serve as a "website search"? #216

Open
drwankingstein opened this issue Aug 20, 2024 · 2 comments

Comments

@drwankingstein

I was wondering whether this can be used as a website-specific search, in place of the "powered by Google" search you often see. If so, what would the process of setting it up look like? I tried to look into it, but I'm not sure how to set up the crawler to crawl specific website(s).

@mikkeldenker
Member

Stract's crawler can't be limited to specific sites, but the index is built from plain .warc files, so other crawlers such as Nutch and Heritrix should also work. I don't have experience with them, so I don't know whether they can be limited to specific sites, but they might.

As far as I know, the 'powered by Google' search actually just sends the search {query} site:{site} to Google, which would very much be possible to build on top of Stract's API as well.
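
A rough sketch of the {query} site:{site} idea, in case it helps. The endpoint URL, the JSON payload shape ({"query": ...}), and the helper names build_site_query, build_request, and site_search are assumptions for illustration and are not confirmed anywhere in this thread, so check the API docs before relying on them.

```python
import json
import urllib.request

# Assumption: this endpoint and the {"query": ...} body are illustrative only;
# verify both against the API documentation.
SEARCH_ENDPOINT = "https://stract.com/beta/api/search"


def build_site_query(query: str, site: str) -> str:
    """Append a site: restriction to the user's query."""
    return f"{query.strip()} site:{site.strip()}"


def build_request(query: str, site: str,
                  endpoint: str = SEARCH_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the restricted query."""
    body = json.dumps({"query": build_site_query(query, site)}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def site_search(query: str, site: str, endpoint: str = SEARCH_ENDPOINT):
    """Send the request and return the decoded JSON response."""
    request = build_request(query, site, endpoint)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A search box on a site would then call something like site_search(user_input, "example.com") and render whatever results come back. Keeping the query building separate from the network call makes it easy to swap in a different endpoint, for example a self-hosted instance.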

@satonotdead

This will be a must! Was discussed before :)
