Found this issue when analysing the output for the page Diffraction (ID: 8603). In the section "Patterns" there are three bullet points:
The angular spacing of the features... ...
These bullet points are ignored and not included in the final cleaned text. I think it is because of the asterisk.
To replicate:
I extracted the page with extractPage, then created a new file containing the single page from its output, and then ran WikiExtractor on that file:
python -m wikiextractor.extractPage --id 8603 enwiki-latest-pages-articles-multistream.xml.bz2
python -m wikiextractor.WikiExtractor page_8603.xml --json -o teste
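To confirm which bullet points are dropped, the raw wikitext can be compared against the cleaned text. The following is only a minimal sketch: `bullet_items` and `missing_bullets` are hypothetical helper names, not part of wikiextractor, and it assumes the bullets are plain text (a bullet containing links or templates would not match verbatim after cleaning).

```python
import re


def bullet_items(wikitext):
    """Return the text of every line that starts with one or more '*'."""
    items = []
    for line in wikitext.splitlines():
        match = re.match(r"^\*+\s*(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items


def missing_bullets(wikitext, cleaned_text):
    """List bullet items from the wikitext that are absent from cleaned_text."""
    return [item for item in bullet_items(wikitext) if item not in cleaned_text]


if __name__ == "__main__":
    # Toy input for illustration only, not the real Diffraction page.
    raw = "Intro.\n* First point\n* Second point\nOutro."
    cleaned = "Intro.\nOutro."
    for item in missing_bullets(raw, cleaned):
        print("missing:", item)
```

Here `wikitext` would be the page body taken from page_8603.xml and `cleaned_text` the text produced by the WikiExtractor run above.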