Can't dump biografias.bcn.cl #314
* http://biografias.bcn.cl/api.php does not like the data to be POSTed. Just use URL parameters. Some wikis had anti-spam protections which made us POST everything, but for most wikis this should be fine.
* If the index is not defined, don't fail.
* Use only the base api.php URL, not parameters, in domain2prefix.
#314
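The first and third points of the commit message can be sketched roughly as follows. This is only an illustration written against the Python 3 standard library, not the actual dumpgenerator.py change; the helper names build_get_url and base_api_url are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_get_url(api_url, params):
    """Return api_url with params in the query string, suitable for a GET request."""
    parts = urlsplit(api_url)
    query = urlencode(params)
    if parts.query:
        # Keep any query string that was already present on the URL.
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def base_api_url(api_url):
    """Drop query string and fragment, keeping only the bare api.php URL."""
    parts = urlsplit(api_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical request parameters, used only for illustration.
url = build_get_url(
    "http://biografias.bcn.cl/api.php",
    {"action": "query", "meta": "siteinfo", "format": "json"},
)
```

Passing the parameters in the URL avoids sending a POST body at all, which is what this server seems to reject.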
They have some weird webserver rule which breaks all the requests to index.php with the normal parameters: http://biografias.bcn.cl/index.php?title=Especial:Exportar&e=1 works, but http://biografias.bcn.cl/index.php?title=Especial:Exportar is a 404. We have just introduced the However, it still fails: If you know the website maintainer, please ask them to inspect their webserver configuration. I see they're redirecting a lot of requests to some other CMS; they probably made some mistake in the middle of those rules.
Hi Nemo, It's a weird wiki setup. If you visit http://biografias.bcn.cl/wiki/Especial:Exportar, it also works. The website is maintained by employees of the Library of the National Congress of Chile, and they seem to have created another website (e.g. https://www.bcn.cl/historiapolitica/resenas_parlamentarias/wiki/Jos%C3%A9_Manuel_Isla_Hevia) which "forks" the wiki articles into another layout. I'm writing them an email but I don't think they'll make a change :(
Diego Grez-Cañete, 17/05/2018 09:58:
It's a weird wiki setup. If you visit
http://biografias.bcn.cl/wiki/Especial:Exportar, it also works.
Yeah. It's a relatively small wiki, you can also try and export the XML
by hand. Just paste the page titles (which dumpgenerator.py manages to
produce) in Especial:Exportar in groups of 5000, 1000, 500... whatever
doesn't break. Then compress the different XML files together and upload
to Internet Archive.
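The batching step described above could be sketched like this. It only builds the form payloads and makes no network request; "pages" is the Special:Export field that takes one title per line, while the helper names and the sample titles are hypothetical.

```python
def chunked(titles, size):
    """Yield consecutive groups of at most `size` titles."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def export_payloads(titles, size):
    """Build one Special:Export form payload per group of titles."""
    for group in chunked(titles, size):
        # One title per line, as in the text box of the export form.
        # Other form fields (history options, etc.) are deliberately left out.
        yield {"pages": "\n".join(group)}


# Hypothetical titles; in practice they would come from the titles file
# that dumpgenerator.py manages to produce.
payloads = list(export_payloads(["A", "B", "C"], 2))
```

If a group of a given size makes the export fail, the same list can be rebuilt with a smaller size and retried.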
Yeah, I have done that already (dumping, I mean; I just haven't uploaded it to the IA). But I haven't been able to download the images.
Never mind, now it's downloading images! Thanks!
It had some trouble with file names such as "Abdón". I added the following at the beginning of dumpgenerator and it went smoothly afterwards:
The biografias.bcn.cl dump is downloadable here: https://archive.org/details/BCN.7z
OK, I don't think we can/should do that sys.setdefaultencoding change: https://stackoverflow.com/a/3828742 (but we should definitely fix any Unicode issues sooner or later...). I think we can't do much more here; reopen if this wiki has some issue shared with other wikis.
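One common alternative to changing the interpreter-wide default encoding is to convert between bytes and text explicitly wherever a title or file name crosses that boundary. The following is a minimal sketch of that idea, assuming UTF-8 data; to_text and to_bytes are hypothetical helpers and not part of dumpgenerator.py.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it only if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# "Abdón" as UTF-8 bytes, decoded explicitly instead of relying on a default.
name = to_text(b"Abd\xc3\xb3n")
```

Explicit conversions keep the encoding decision local to the code that handles file names, rather than changing the behaviour of every implicit conversion in the process.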
I'm having real trouble trying to dump http://biografias.bcn.cl/api.php
They're using MediaWiki 1.17.0, and not even the older versions of dumpgenerator.py will work with it!
I've tried dumping other wikis and they work, but this one won't.
If someone could help me, it would be much appreciated!
C:\Python27>python "C:\Users\diego\Desktop\Diccionario\wikiteam-master\dumpgenerator.py" --api=http://biografias.bcn.cl/api.php --xml --images
\n\n\n'Please install the wikitools 1.3+ module if you want to use --xmlrevisions.
Checking API... http://biografias.bcn.cl/api.php
u'\n\n\n\t<title>MediaWiki API</title>\n\n\n
MediaWiki API returned data we could not parse
API not available. Trying with index.php only.
Checking index.php... http://biografias.bcn.cl/index.php
Traceback (most recent call last):
  File "C:\Users\diego\Desktop\Diccionario\wikiteam-master\dumpgenerator.py", line 2195, in <module>
    main()
  File "C:\Users\diego\Desktop\Diccionario\wikiteam-master\dumpgenerator.py", line 2140, in main
    config, other = getParameters(params=params)
  File "C:\Users\diego\Desktop\Diccionario\wikiteam-master\dumpgenerator.py", line 1533, in getParameters
    index = '/'.join(index.split('/')[:-1])
AttributeError: 'NoneType' object has no attribute 'split'
C:\Python27>
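The traceback shows that index was None when line 1533 called index.split. A guard along the following lines would avoid that particular crash. This is only a sketch of the idea ("if the index is not defined, don't fail"), not the actual patch, and index_base is a hypothetical helper.

```python
def index_base(index):
    """Return the index URL without its last path segment, or None if index is unset."""
    if index is None:
        # Nothing to split: the index.php URL could not be determined.
        return None
    return "/".join(index.split("/")[:-1])
```

The caller still has to decide what to do when no index.php URL is available; the guard only prevents the AttributeError.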