Crash when converting a MediaWiki 1.15.5-1 to a fresh "Detritus" Dokuwiki #38
Comments
This seems like it should work. The MediaWiki API is returning some results (as shown by the version query and the "list of pages..." step not returning errors). Maybe there is something in the older MediaWiki install that causes it to fail. I don't know if updating it to a newer version is an option. Unfortunately I'm not able to provide much support for yamdwe any more; as advertised on the front page, I'm looking for a new maintainer. Good luck finding the problem though! Angus |
I'm getting this same error on mediawiki 1.21.1. |
I seem to have fixed it by changing the URL I give to the script. I was running
    python yamdwe.py http://localhost/mediawiki-1.21.1/ /var/www/dokuwiki/
and after changing it to
    python yamdwe.py http://localhost/mediawiki-1.21.1/api.php /var/www/dokuwiki/
it worked. I'm not sure whether I missed something in the documentation telling you to add api.php, whether it wasn't documented, or whether newer versions of MediaWiki don't need api.php in the URL. |
(As per @colinsauze's issue in comments of #38, and other previously reported problems.)
@colinsauze Thanks for following up. The docs do say to use /api.php (and in the original issue post from @malfonsi you'll see that /api.php was in the URL, so it's a different issue with a similar symptom). However, people seem to miss this point in the docs a lot (which is fair enough), so I just added a warning message if the URL doesn't end in api.php, and also a clearer error message if a non-JSON response comes back. @malfonsi I don't think any of this will help fix the problem you're seeing, unfortunately. For some reason the api.php on your wiki sent back JSON for the first few requests, then returned some kind of non-JSON message. |
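The two checks described in the comment above (a warning when the URL does not end in api.php, and a clearer error for a non-JSON response) could look roughly like the following. This is a minimal Python 3 sketch and not yamdwe's actual code: the function names `check_api_url` and `parse_api_response` and the message wording are assumptions made for illustration.

```python
import json
import warnings


def check_api_url(url):
    """Return True if the URL ends in api.php, otherwise warn and return False.

    Hypothetical helper: the name and the warning text are illustrative only.
    """
    looks_ok = url.endswith("api.php")
    if not looks_ok:
        warnings.warn(
            "URL %r does not end in api.php; the MediaWiki API may not answer here" % url
        )
    return looks_ok


def parse_api_response(body):
    """Parse an API response body, raising a readable error if it is not JSON."""
    try:
        return json.loads(body)
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError.
        raise RuntimeError(
            "Expected JSON from the MediaWiki API but got: %r" % body[:200]
        ) from err
```

With the two URLs quoted earlier in the thread, `check_api_url` would warn for `http://localhost/mediawiki-1.21.1/` and stay silent for `http://localhost/mediawiki-1.21.1/api.php`.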
@malfonsi I just added a (It may be a lot of output depending on what the wiki is doing!) |
@projectgus Thanks for the extra command option. It helped to narrow down the source of the problem. Basically the script crashes on the first page that contains this special sequence of characters (the extra information on the use of these characters comes from googling): Raw encoding (hex) | UTF-8 encoding | HTML entity (sorry if the table above does not show right) As an aside, the funny thing is that this sequence is really present in the wiki source (annoyingly invisible in simple text editors, but I "unveiled" it by copying & pasting the text into emacs and changing to ascii encoding). I have no clue how we inserted this special sequence in our pages ... maybe by copying & pasting from other web pages. Anyway, is there any option to catch and digest (ignoring would be fine for me) this or similar sequences? I have only basic knowledge of Python syntax and I am not familiar with any of the modules used, but maybe you can point me to the module that does the conversion and I can try to go through the code (I have found a few suggestions on stackoverflow by googling the error message). Thanks in advance for any additional hint. P.S. I attach below the error message in case my interpretation was completely wrong:
Hi @malfonsi, That's a very helpful update. Weird that the error message seems to have changed totally now, but well done tracking down the specific problem with that Unicode sequence. I've just made an update that may solve this problem by allowing link filenames and captions to be Unicode instead of just ASCII. (Python 2 is really painful in its ASCII vs Unicode handling. I wish we could use Python 3 for yamdwe, but the MediaWiki library mwlib doesn't support it yet.) Please let me know if that update improves things. Angus |
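The request earlier in the thread, to simply ignore invisible sequences like this one, can be sketched as a filter over Unicode categories. This is a Python 3 illustration under stated assumptions, not the update described above: `strip_format_chars` is a hypothetical helper, and it assumes the offending character is an invisible formatting mark such as U+200E (category Cf).

```python
import unicodedata


def strip_format_chars(text):
    """Drop every character whose Unicode category is Cf (format characters).

    Hypothetical helper for cleaning media filenames; not part of yamdwe.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# U+200E is invisible, so the two strings look the same but differ in length.
dirty = "Stupid string\u200e"
clean = strip_format_chars(dirty)
print(len(dirty), len(clean))
```

Category Cf also covers characters such as the soft hyphen and the zero-width joiner, which carry meaning in some scripts, so a blanket filter like this is only reasonable for one-off cleanup of filenames.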
Dear @projectgus, sorry for the long silence, but I tried something by myself. The patch does not work, but I am now convinced that there is something wrong in my virtualenv (or in the Debian 8 environment as a whole). In fact I added these two lines to the main script "yamdwe.py", just after the main def:
    def main():
        print("Start of the program")
        print("Stupid sequence \342\200\216")
and I get the error:
I made a shorter script "test.py":
    #from __future__ import print_function, unicode_literals, absolute_import, division
    #import argparse, sys, codecs, locale, getpass, datetime
    #from pprint import pprint
    #import mediawiki, dokuwiki, wikicontent
    def main():
        print("Start")
        print(u"Stupid string\u200e")
        stupid = "Stupid string\342\200\216"
        stupid2 = stupid.decode("utf-8")
        print(stupid2[13])
        print(len(stupid2.replace(u"\u200e", "")))
    main()
which works as long as I keep the imports of all the modules commented out. You can see there my attempt to get rid of this special HTML entity, because it cannot be part of a filename (if you go back to my previous post you will see that the initial problem occurred while getting the name of the uploaded media). The error seems to come from one of the modules (again, I am not a Python expert, I can only guess):
narrowing down the problem, it seems connected to the importation of the symbol Anyway: Thanks again, |
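For readers who want to see what the byte sequence in the test script actually is, here is a small Python 3 sketch (the thread itself uses Python 2, where `str` and `unicode` behave differently). The helper name `describe_bytes` is made up for this example.

```python
import unicodedata


def describe_bytes(raw):
    """Decode UTF-8 bytes and list (code point, name, category) per character."""
    text = raw.decode("utf-8")
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# The octal escapes \342\200\216 are the three bytes 0xE2 0x80 0x8E.
for row in describe_bytes(b"\342\200\216"):
    print(row)
# prints ('U+200E', 'LEFT-TO-RIGHT MARK', 'Cf')
```

The three bytes decode to a single invisible character, which is why `stupid2[13]` in the test script prints nothing visible and why the string is one character longer than it looks.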
Something to add: I found this page that can maybe be useful: but with my python experience I need some time to digest it... |
I managed to solve the issue. Basically:
I think that this is something very specific to my case - and I admit that I have not really understood the real source of the problem, but rather I have just found a workaround for a tool that I need only once. |
Hey. I had the same error with the ascii codec stuff. |
(This is more a "help request" rather than an issue report, but I don't know how to communicate with the authors otherwise)
I am trying to convert an old Mediawiki (detected as v. 1.15.5-1) to the latest Dokuwiki release, "Detritus", on a Debian 8 system. I am not sure if I correctly followed the directions in the README.
The Dokuwiki instance was freshly installed, using the "install.php" script as suggested. I kept it publicly open and I checked that content can be added without problems (I basically edited the start page without logging in). I have write access (I mean as a unix user) to these directories.
To run yamdwe, I set up a virtual environment as described. However, the script crashes after the message "Query page revisions (this may take a while)..."
You can see below the output:
Please let me know if you need additional information or you want to perform other tests