Script throws AttributeError when parsing NCX navList #27

malsagulo · 2019-06-08T00:51:36Z

Not sure if you're even interested in coming back to this... but just for the record, I'm getting an AttributeError thrown when it tries to parse my NCX file for my fixed-layout book. I have a navList in there acting as a sub-index for images/illustrations (which I got out of this part of the specification; Kindle Previewer 3 recognizes it). It looks something like this:

<!-- on same level as navMap toc -->
<navList>
  <navLabel><text>Images</text></navLabel>
  <navTarget id="i1">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg1.html" />
  </navTarget>
  <navTarget id="i2">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg3.html" />
  </navTarget>
  <navTarget id="i3">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg7.html" />
  </navTarget>
  <!-- so on and so forth -->
</navList>

If this section is in the NCX file, the script throws the following error:

Traceback (most recent call last):
  File "KindleUnpack/lib/kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "KindleUnpack/lib/kindleunpack.py", line 1008, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 923, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "KindleUnpack/lib/kindleunpack.py", line 530, in processMobi8
    [junk1, junk2, junk3, fid, junk4, off] = ncxmap['pos_fid'].split(':')
AttributeError: 'NoneType' object has no attribute 'split'

Here's how I'm running the script:

python KindleUnpack/lib/kindleunpack.py -s fixed.mobi tmp

And here's the full verbose console output on the off chance it's of use to you:

KindleUnpack v0.82
   Based on initial mobipocket version Copyright © 2009 Charles M. Hannum <[email protected]>
   Extensive Extensions and Improvements Copyright © 2009-2014 
       by:  P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 143 sections.
Unpacking a Combination M8/KF8 book...
First Image, last Image 42 72
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Malsagulo's Book
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00042.jpeg from section 42
Extracting image: image00043.jpeg from section 43
Extracting image: image00044.jpeg from section 44
Extracting image: image00045.jpeg from section 45
Extracting image: image00046.jpeg from section 46
Extracting image: image00047.jpeg from section 47
Extracting image: image00048.jpeg from section 48
Extracting image: image00049.jpeg from section 49
Extracting image: image00050.jpeg from section 50
Extracting image: image00051.jpeg from section 51
Extracting image: image00052.jpeg from section 52
Extracting image: image00053.jpeg from section 53
Extracting image: image00054.jpeg from section 54
Extracting image: image00055.jpeg from section 55
Extracting image: image00056.jpeg from section 56
Extracting image: image00057.jpeg from section 57
Extracting image: image00058.jpeg from section 58
Extracting image: image00059.jpeg from section 59
Extracting image: image00060.jpeg from section 60
Extracting image: image00061.jpeg from section 61
Extracting image: image00062.jpeg from section 62
Extracting image: image00063.jpeg from section 63
Extracting image: image00064.jpeg from section 64
Extracting image: image00065.jpeg from section 65
Extracting image: image00066.jpeg from section 66
Extracting image: image00067.jpeg from section 67
Extracting image: image00068.jpeg from section 68
Extracting image: cover00069.jpeg from section 69
Extracting image: image00071.jpeg from section 71
Extracting Page Map Information
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Malsagulo's Book
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting Page Map Information
Unpacking raw markup language
Processing ncx / toc
Traceback (most recent call last):
  File "KindleUnpack/lib/kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "KindleUnpack/lib/kindleunpack.py", line 1008, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 923, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "KindleUnpack/lib/kindleunpack.py", line 530, in processMobi8
    [junk1, junk2, junk3, fid, junk4, off] = ncxmap['pos_fid'].split(':')
AttributeError: 'NoneType' object has no attribute 'split'

Otherwise, thanks for the useful tool. It's been a big help -- and still will be, so long as I remember to comment out that part of the NCX. :)

kevinhendricks · 2019-06-08T02:23:19Z

Fixed layouts are not part of the epub2 spec. So you should be feeding kindlegen an epub3 file with the page-list properly formatted in the nav and not a toc.ncx. Either that or use Sigil to create a proper ncx from a properly formatted nav. You should also be unpacking the mobi specifically for epub3, not 2. Happy to debug this, but I will need a copy of the source epub3 and the resulting kindlegen-generated mobi so that I can see what is going on. The error you are getting just means the mobi8 index used for the toc info is missing required fields.

malsagulo · 2019-06-08T04:51:31Z

I'm, er, reluctant to send the entire project, but I can maybe see about copying and pasting some kind of test case together for you. This is my first Kindle project, so it is entirely possible I've done something (or many things) wrong somewhere. But if I did, I'm not seeing it.

> You should also be unpacking the mobi specifically for epub3, not 2.

Yeah, I gave that a shot a little while after I posted. The error still occurs with the --epub_version=3 flag set. I also did a bit of cursory debugging on my own. The problem seems to be coming from the ncxExtract.parseNCX method in mobi_ncx.py -- in fact, specifically from inside this particular if statement. Since no value is set unless tag has a certain value, tmp['pos_fid'] defaults to None. But this method over in kindleunpack.py seems to expect that property to be a string. You might be able to get around the entire problem just by checking for None in or around that line.
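
To make that suggestion concrete, here's a rough sketch of the kind of None check I mean. This is not the actual KindleUnpack code: split_pos_fid is a made-up helper name, the entry dicts are stand-ins for whatever parseNCX returns, and the sample pos_fid value is only illustrative (all I know from the traceback is that the string is split into six ':'-separated fields).

```python
def split_pos_fid(entry):
    """Return (fid, off) for an NCX entry dict, or None if it cannot be resolved.

    Hypothetical helper: 'entry' stands in for one item of the parsed NCX
    data, where 'pos_fid' may be None for entries such as navList targets.
    """
    pos_fid = entry.get('pos_fid')
    if pos_fid is None:
        # No position info was decoded for this entry, so there is no link.
        return None
    parts = pos_fid.split(':')
    if len(parts) != 6:
        # Unexpected shape; treat it the same as a missing value.
        return None
    _, _, _, fid, _, off = parts
    return fid, off


# Illustrative entries only; the pos_fid string is an assumed example value.
entries = [
    {'text': 'Chapter 1', 'pos_fid': 'kindle:pos:fid:0001:off:0000000000'},
    {'text': 'image caption', 'pos_fid': None},
]

for e in entries:
    target = split_pos_fid(e)
    if target is None:
        print('skipping entry without pos_fid: %s' % e['text'])
    else:
        print('%s -> fid=%s off=%s' % (e['text'], target[0], target[1]))
```

That would only stop the crash. The navList entries would still be dropped from the unpacked output, since there's nothing to build a link from.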

> Either that or use Sigil to create a proper ncx from a properly formatted nav.

I don't believe the problem is with how my nav is formatted. Not only was I working directly from the spec, I checked the whole thing against the original DTD. I even went back and added the playOrder attribute to everything, just because the DTD said it was required (even though it doesn't actually do anything anymore). The error still occurred. So I don't think we're looking at a validation or formatting problem here.

kevinhendricks · 2019-06-08T17:32:39Z

Without pos:fid there is no link. My guess is your extra ncx entries outside of the navMap simply have not been seen before and will need some work to decode them from test cases. These things are typically referred to now as LandMarks in epub3 and are normally part of the guide inside the opf in epub2. If you can create a test case that shows this issue, I will try to reverse engineer what the new index tags are that are being generated by your extra piece of ncx and see if we can grok it. Kevin

malsagulo · 2019-06-19T22:29:40Z

> TypeError: ord() expected a character, but string of length 0 found

@duzhor -- why are you commenting in this issue? This sounds like an entirely different bug. If you want him to address it, open up a new issue.

kevinhendricks · 2019-06-20T14:55:28Z

There are many tools to "borkify" the text of an epub to make it meaningless. I believe there is a plugin for Sigil that does this, and calibre has this capability as well.

So if at all possible (for me to debug this and add support), I really need a working test case.

So please consider creating a simple standalone test case from a copy of the original epub that has had all of its text changed and posting it here so that I can use it to reverse out how navList items are encoded in mobis and add support for that to KindleUnpack.
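
For anyone who would rather see what such scrambling amounts to before reaching for a tool, here is a minimal sketch. It is not the Sigil plugin or the calibre feature, just an assumed approach: split the markup on tags and character entities, then replace every letter in the text in between with a random letter. The names scramble_text and scramble_markup are made up for this example.

```python
import random
import re
import string

# Tags and character entities are kept verbatim; only the text between them changes.
_MARKUP = re.compile(r'(<[^>]+>|&[#\w]+;)')


def scramble_text(text, rng):
    """Replace each letter with a random ASCII letter, keeping everything else."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Digits, punctuation and whitespace are left as they are.
            out.append(ch)
    return ''.join(out)


def scramble_markup(source, seed=0):
    """Scramble the text of an (X)HTML string while leaving its tags intact."""
    rng = random.Random(seed)
    parts = _MARKUP.split(source)
    # With one capturing group, even indices are text and odd indices are markup.
    for i in range(0, len(parts), 2):
        parts[i] = scramble_text(parts[i], rng)
    return ''.join(parts)


sample = '<p class="c">Hello &amp; bye, page 7</p>'
print(scramble_markup(sample))
```

Note the limits of this sketch: attribute values (alt text, titles) and file names are left untouched, and the contents of script or style elements would be scrambled along with the body text, so a dedicated tool is the safer choice for a real book.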

Repository owner deleted a comment from duzhor Jun 20, 2019
