Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rescue extract-xiso from oblivion #80

Open
wants to merge 59 commits into
base: master
Choose a base branch
from

Conversation

rapperskull
Copy link
Contributor

I think that this tool still has a lot of potential, and doesn't deserve to be abandoned, so I decided to rewrite traverse_xiso, the most difficult function to read and understand in the code, and split it into two pieces to make it easier to understand.

Judging from the changelog on top of the file, it appears that it once used recursion, but it has been rewritten as the spaghetti code is it now to avoid sporadic stack overflows with highly unoptimized ISOs written by some tools.

It looks like whomever is making xiso creation tools out there
has never heard of a binary tree and is sticking every
directory entry off the right subnode (at least on all the iso's
I've found so far). This causes extremely deep recursion for
iso's with lots of files (and also causes these iso's, when
burned to DVD, to behave as a linked list for file lookups, thus
providing worst case lookup performance at all times).

Not only do I find this annoying and extremely bad programming,
I've noticed that it is causing sporadic stack overflows with
my (formerly) otherwise good tree traversal code.

In order to combat such bad implementations, I've re-implemented
the traverse_xiso() routine to get rid of any non-directory
recursion. Additionally, I've made a few extra tweaks to
conserve even more memory. I can see now that I'm going to need
to write xiso creation as well and do it right. (rev to v1.5 beta)
NOT FOR RELEASE

I honestly don't think the cause of the overflows was the recursion: each iteration uses very little memory and the same ISOs are processed on the OG Xbox anyways. Probably it was some other bug that manifested itself only in some cases.

I tried to keep the memory usage as low as possible, by freeing up unused memory early, but it will probably use more memory than before. I guess that's the tradeoff for code legibility.

Anyways, the code has been tested to work and produce functionally identical ISOs. They are not hash-identical to the previous ones because of the order the files are processed. As as side note, it looks like that now it generates node trees identical to retail, so I guess it is more accurate? Anyways, if someone knows what tools produce such linked list ISOs, it would be awesome to test them and see if the stack overflow problem is fixed.

Of course it needs extensive testing, but I think that with traverse_xiso rewritten, it would be much easier to update the tool and/or solve bugs.

The next step would be to review all the open issues, and see if they still apply. In particular, I'm using 64-bit builds and they produce files identical to 32-bit ones, so maybe this particular issue is solved.

extract-xiso.c Outdated Show resolved Hide resolved
extract-xiso.c Outdated Show resolved Hide resolved
extract-xiso.c Outdated Show resolved Hide resolved
@JayFoxRox
Copy link
Member

the same ISOs are processed on the OG Xbox anyways

They are processed differently though.

My understanding (but it has been a while since I've looked at the XISO filesystem driver):

  • For file searches (such as opening an explicit filepath), the tree is used.
  • For listing files (such as opendir / readdir style APIs), the directory is processed like an array.

@rapperskull
Copy link
Contributor Author

I converted the PR to draft because I want to make some more checks just to make sure, and eventually solve some other problems.

@rapperskull rapperskull marked this pull request as ready for review March 5, 2023 23:02
@rapperskull
Copy link
Contributor Author

I think I'm done for now

@rapperskull
Copy link
Contributor Author

the same ISOs are processed on the OG Xbox anyways

They are processed differently though.

My understanding (but it has been a while since I've looked at the XISO filesystem driver):

* For file searches (such as opening an explicit filepath), the tree is used.

* For listing files (such as `opendir` / `readdir` style APIs), the directory is processed like an array.

I implemented what I called the "discover strategy", basically the linear listing of files in a directory. It is used when listing or extracting by default.

The rewrite mode, instead uses both strategies: first processes the tree, then lists the directories sequentially, hoping to find "hidden" files.

I wanted a single strategy for all cases, but unfortunately it's not possible unless we generate the tree also for listing and extracting.

The rewriting is done in two separate steps: first tree discovery, then linear search. Doing both at the same time would fail spectacularly on malformed ISOs, and introduce a lot of overhead even in normal circumstances.

This means that:

  • If there are hidden files after a node that's already beyond the end of a directory, it won't be found.
  • Every tree starting from an hidden node is now unveiled, but only processed linearly.

I don't know if you like it, or if changing the default strategy for file listing and extraction is the right thing to do, or even if it is useful at all. Let me know your feedback.

Fix file size test in verify_xiso() broken since 715d798
Move to 'softprops/action-gh-release@v1' action
Update 'actions/checkout' to v4
Replace invalid CP1252 characters with question mark
Symlinks are not followed below the root, to avoid potential recursions

Windows is an exception, since we don't have an 'lstat' equivalent
The new code is licensed under BSD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants