Getting started #124

Open
yonas opened this issue Feb 4, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@yonas

yonas commented Feb 4, 2024

I've compiled stract via cargo build --release. What do I do next?

How much disk space is required?

I can run the indexer / crawler / scraper via stract indexer, stract crawler and stract autosuggest-scrape.

  • Do you need to run the crawler first?

I can run the search servers via stract search-server and stract entity-search-server.

I can run the API server via stract api.

@mikkeldenker
Member

mikkeldenker commented Feb 5, 2024

Hi!
Yeah, I really need to write a proper getting started guide and provide some data that can bootstrap the index. You can get an idea of how to run the engine once the index has been built by studying scripts/run_dev.py and the corresponding config files in configs/.

To build the index, you would need to perform the following main steps:

  • (Optionally) run the crawler to crawl some pages and save them in .warc files. The current crawler architecture requires a crawl plan to be built before the crawl can be executed. Common Crawl distributes a giant dataset of these files, so you can skip running Stract's own crawler entirely, which makes it a lot easier to get started.
  • Build the webgraph using the stract webgraph create command. The config file you want to look at here is located at configs/webgraph/create.toml.
  • Calculate the harmonic centrality for each page/host using stract centrality.
  • Build the index using stract indexer search. The config file configs/indexer/create.toml should help get you started.
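
The non-optional steps above could be strung together roughly as in the sketch below. It only uses the subcommand names and config paths mentioned in this comment. Passing the config path as a trailing positional argument is an assumption, and stract centrality almost certainly needs arguments that aren't spelled out here, so check the CLI's help output before running anything. The helper names build_steps and run_steps are made up for illustration, and the script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Config paths are the ones named in the comment above. Passing them as a
# trailing positional argument is an ASSUMPTION, not confirmed in this thread.
WEBGRAPH_CONFIG = "configs/webgraph/create.toml"
INDEXER_CONFIG = "configs/indexer/create.toml"


def build_steps(binary="stract"):
    """Return the index-building commands in the order listed above."""
    return [
        # 1. Build the webgraph from the .warc files.
        [binary, "webgraph", "create", WEBGRAPH_CONFIG],
        # 2. Harmonic centrality. Its arguments are not given in this
        #    thread, so none are guessed here.
        [binary, "centrality"],
        # 3. Build the search index.
        [binary, "indexer", "search", INDEXER_CONFIG],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(build_steps())
```

Keeping the steps as plain argument lists makes it easy to adjust one command once its real arguments are known, without touching the others.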

This should create an index which you can run and execute searches against. Unfortunately, I don't have a neat overview of the available fields in each config file, but all of them are defined in crates/core/src/config/mod.rs.
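
Until such an overview exists, one way to get oriented is to list the example config files and then look up their fields in crates/core/src/config/mod.rs. The snippet below is a small sketch of that first step. It assumes it is run from the repository root and that the configs use the .toml extension, as the two paths mentioned above do. The function name list_configs is hypothetical.

```python
from pathlib import Path


def list_configs(root="configs"):
    """Return every .toml file under root, sorted, as paths relative to root."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*.toml"))


if __name__ == "__main__":
    for name in list_configs():
        print(name)
```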

I'll keep this issue open until I have created a proper getting started page.

@mikkeldenker mikkeldenker added the documentation Improvements or additions to documentation label Feb 5, 2024