Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Organisation of subprojects #3818

Open
YoshiRulz opened this issue Oct 31, 2023 · 17 comments
Open

Organisation of subprojects #3818

YoshiRulz opened this issue Oct 31, 2023 · 17 comments
Labels
Meta Relating to code organisation or to things that aren't code

Comments

@YoshiRulz
Copy link
Member

YoshiRulz commented Oct 31, 2023

tl;dr: I'm calling for a moratorium on copying third-party code into the repo, and I'm proposing that we move a few dirs around.

We all hate code duplication. So why is it okay for us to needlessly copy entire codebases?
snip (Cut out a bunch of back-and-forth here since I feel none of you need to be convinced this is a good idea, only that it's worth the effort to do it. edit: Apparently natt did need convincing. Scroll down for arguments.)

So let's assess the damage:

In total, the upper bound is 917 kLOC of source code, severed from its trunks and waiting to bitrot. (For reference, just the BizHawk solution is ~450 kLOC, and Mesen is ~325 kLOC.)
Not listed are several binaries in /Assets/dll which are rundeps for unmanaged cores. I'll be looking at these as part of my Nix experiments. I also discounted dependencies of these projects whenever I noticed e.g. .../vendor near the bottom of the filesize list—but they are nonetheless checked-in to this repo.

I propose that from today we only use Git submodules for this purpose:

  • if possible, pulling in the upstream and only checking-in build scripts and patches, or else having a mirror/fork in the TASEmulators org (TODO copy and expand 2af6bf3#commitcomment-140199736); and
  • placing submodules in /submodules, and supplemental files like Makefiles or shell scripts in one of /ExternalProjects, /ExternalCoreProjects, or /waterbox when those are necessary.

I also propose we migrate all subprojects, starting with those listed above, to the same scheme.
Adding a submodule and removing the checked-in copy can be squished into a single commit.

Moved here from my personal notes #118; see also #2423, my personal notes #11, and #2312.

@YoshiRulz YoshiRulz added the Meta Relating to code organisation or to things that aren't code label Oct 31, 2023
@vadosnaprimer
Copy link
Contributor

Bonus points if you manage to not lose our change history to those projects when moving them to submodule. Splitting folder history is possible in git, but re-applying changes from a different tree (even if the code matches) may be harder.

@CasualPokePlayer
Copy link
Member

CasualPokePlayer commented Nov 1, 2023

For many of those cores/projects, they're checked in as they have been forked and modified, some fairly lightly (e.g. tic80, which mostly just has waterbox interfacing added and minor adjustments and unneeded files excised) and some very heavily (e.g. virtualjaguar, much of which is unrecognizable compared to upstream), not to mention just ones extremely outdated from upstream (e.g. gpgx). Most of them will require forking in any case and can't just have upstream submoduled.

fork in the TASEmulators org

To be clear this is also mainly the reason I don't go immediately fork in TASEmulators org for projects
image
And getting someone to move a repo to the org is annoying. Way eaiser to just clone into the repo and call it a day, and no downsides (besides BizHawk clone size) if upstream is dead.

@vadosnaprimer
Copy link
Contributor

I didn't know lack of permissions was a roadblock for more cool work. I think that can be arranged.

@Morilli
Copy link
Collaborator

Morilli commented Nov 2, 2023

To be clear this is also mainly the reason I don't go immediately fork in TASEmulators org for projects
[image showing insufficient permissions]

This would also be a blocker for me. I do see the advantage in terms of organization if more stuff was in submodules, but it does require all maintainers to have access to all the relevant projects.

Cleaning up the root folder of the bizhawk repo is independent from that though, and I'd definitely approve moving stuff to a unified folder. There is ExternalCoreProjects, why isn't it used more?

@vadosnaprimer
Copy link
Contributor

If people tell me what to fork under tasemus, I can do that and invite you guys as admins to those forks. I may also discuss adding more admins to the org itself...

@YoshiRulz
Copy link
Member Author

YoshiRulz commented Nov 3, 2023

Well, most of the repos I linked should be forked, but pragmatically the larger ones are far more important, and as CPP said some projects have had more extensive changes than others. IMO forks can be made "as needed" i.e. the next time someone wants to update a core.

@nattthebear
Copy link
Contributor

What's the objective here; making it easier to merge upstream changes? In that case, the difficulty comes chiefly from how much the code has diverged from upstream, not the mechanism by which they're stored.

It's a tradeoff we've made different ways different times: Heavy modifications to source lets us get off the ground with something that works for us more quickly. Light modifications to source mean we have to do work that's sometimes more awkward and difficult to integrate with the existing code better, but we can take upstream changes more easily.

I don't think there's one clear answer. The cores that sync with upstream seem to be the better long-term bet, and we can see that especially in cases where we've done it both ways and can compare the two (e.g., early Mednafen efforts vs the Nyma system.) But we also have to consider that some of these ported cores might not even exist if we hadn't taken the "easy" way out.

I'm against any specific requirements here, because we're all volunteer and the most important thing is being useful for the person who will actually work on the core. I think core porters should consider the value of various approaches, but if they want to get hacking, and they're the only one who will do it, let them.

@YoshiRulz
Copy link
Member Author

To be clear, this issue is not arguing against the heavily-modified path. I'm just against adding hundreds of files directly to this repo.

@nattthebear
Copy link
Contributor

I don't understand why that's worth discussion. Apart from concerns on implementing the cores, and concerns about keeping the cores synced with upstream, why does it matter?

@YoshiRulz
Copy link
Member Author

YoshiRulz commented Nov 11, 2023

For one, having an extra million LOC in the repo drastically increases the time it takes to clone. That affects both humans and CI.
It also makes grep -r, find, etc. take longer and give more false positives.

@nattthebear
Copy link
Contributor

I guess; aren't most devs going to check out most sobmodules anyway?

@Morilli
Copy link
Collaborator

Morilli commented Nov 11, 2023

aren't most devs going to check out most submodules anyway?

I for one only ever checkout the submodules I require. That said, I don't think the C/C++ cores in the repo contribute much to the clone size; it's probably going to be the assets folder for the most part.

@YoshiRulz
Copy link
Member Author

YoshiRulz commented Nov 12, 2023

I guess; aren't most devs going to check out most sobmodules anyway?

Seeing as you don't need the submodules cloned to build the solution, and we don't really advertise their existence, I'd imagine no, most devs aren't cloning the submodules.
Like Morilli, I never bother to outside of the rare occasions I need to work with them.

That said, I don't think the C/C++ cores in the repo contribute much to the clone size; it's probably going to be the assets folder for the most part.

Downloading edf5f15 as a zip (not the same as cloning, but similar, and easier to measure):
Screenshot_20231112_183427
So your guess was broadly correct. (IMO that is also a problem; Citra and MAME together make up more than half of /Assets' filesize. See #3505.)
But if you omit /Assets, the remaining filesize is split at about a quarter /waterbox, a quarter /src, a quarter /libmupen64plus, and the long tail.
Notably, some of those largest white segments are .exes, > 18 MiB combined? Those sound like easy wins.

@Morilli
Copy link
Collaborator

Morilli commented Nov 12, 2023

Well if we're doing this, here's a probably at least somewhat accurate and representative list of what contributes most to the .git size:

Git object size list
Root folder Total object size
Assets 868.6MB
output 749.5MB
BizHawk.MultiClient 524.1MB
BizHawk.Client.EmuHawk 418.5MB
src 253.2MB
BizHawk.Emulation.Cores 172.9MB
ppsspp 108.4MB
waterbox 75.3MB
BizHawk.Emulation 66.3MB
References 61.4MB
BizHawk.Client.Common 56.8MB
libmupen64plus 54.7MB
output64 48.4MB
BizHawk.Client.EtoHawk 45.7MB
mednafen 30.4MB
23.2MB
psx 13.5MB
yabause 13.4MB
libgambatte 10.9MB
genplus-gx 10.6MB
citra 10.1MB
Dist 9.7MB
vbanext 8.5MB
LuaInterface 8.4MB
BizHawk.Emulation.Common 7.9MB
libHawk 7.7MB
Tools 7.3MB
BizHawk.Emulation.DiscSystem 5.0MB
libsnes 4.8MB
libmednahawk 4.2MB
Bizware 4.1MB
ExternalCoreProjects 4.1MB
BizHawk.Common 3.8MB
BizHawk.Client.MultiHawk 3.1MB
ExternalToolProjects 2.1MB
Version 1.9MB
genplus-gx32 1.6MB
CpuCoreGenerator 1.6MB
attic 1.5MB
BizHawk.Client.ApiHawk 1.4MB
quicknes 1.3MB
lynx 1.2MB
ExternalProjects 1.2MB
wonderswan 1.1MB

Initial list was generated using git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --round=nearest > objects.txt and then grouped by root folder.

@nattthebear
Copy link
Contributor

we don't really advertise their existence

They're in .gitmodules, what other advertising would be needed?

Anyway, that download size is awful, but the problem there is mostly unrelated to submodules and this ticket, as it's about prebuilt binaries. We'll need to get all of our crusty binary build setups into an easily reproducible build system.

@CasualPokePlayer
Copy link
Member

CasualPokePlayer commented Dec 9, 2023

placing submodules in /submodules, and supplemental files like Makefiles or shell scripts in one of /ExternalProjects, /ExternalCoreProjects, or /waterbox when those are necessary.

A minor problem with doing this specifically is cmake being used in places, and cmake does not allow including stuff in parent directories, you can only include stuff in the current directory and subdirectories. So unless you end up also having the cmakelists also be within the /submodules folder alongside the submodule, or possibly in the BizHawk root directory itself, you're screwed.

@YoshiRulz
Copy link
Member Author

YoshiRulz commented Dec 9, 2023

That's dumb. And I assume you need CMake on Windows too, so you can't just use a symlink, bind mount, etc. to work around it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta Relating to code organisation or to things that aren't code
Projects
None yet
Development

No branches or pull requests

5 participants