Tag Generation Time Questions #9

Open
QuentinTorg opened this issue Mar 18, 2022 · 9 comments

@QuentinTorg

I am attempting to generate new TagStandard41h* and TagStandard52h* families and I am seeing extremely long generation time estimates. I have attempted to generate some of the larger Standard families on my laptop without success after 72 hours. I am now moving to a larger server with 128 threads to see how much it helps.

Running `java -cp april.jar april.tag.TagFamilyGenerator standard_9 12` (tagStandard41h12) on a 128-thread machine shows estimated completion times of 400+ hours. The checked-in version of TagStandard41h12.java shows a completion time of ~12 hours.

Running the tagStandard52h13 family shows estimates of 100,000+ hours or even "infinity" until completion, compared to another ~12 hour completion time shown in TagStandard52h13.java.

I suspect that either the TagStandard41h12 and TagStandard52h13 families were generated by a different version of the library that was much faster, they were exited early using one of the progress steps instead of the final output, or that a significantly faster computer/cluster was used.

@mkrogius, I see you checked in these files and they have not been changed since. Is there any more information about how these larger families were generated originally? What kind of CPU was used? If exiting early, how was the decision made when to exit?

@mkrogius
Contributor

You are right that tagStandard52h13 was exited early. One issue is that, because of the choice of variable types for the id in the C program, it cannot handle more than ~65k tags in a family.
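To illustrate the ~65k limit, here is a minimal sketch of what happens when an id is stored in an unsigned 16-bit field. The 16-bit width is an assumption inferred from the "~65k" figure; the thread does not name the actual C type, and `TagIdWidthDemo` and `storeAsUint16` are hypothetical names.

```java
// Sketch only: simulates storing a tag id in an unsigned 16-bit field.
// ASSUMPTION: the id field is 16 bits wide (the thread only says "~65k").
final class TagIdWidthDemo {
    // Number of distinct values a 16-bit field can hold.
    static final int MAX_IDS = 1 << 16;

    // Keeps only the low 16 bits, as an unsigned 16-bit store would.
    static int storeAsUint16(int id) {
        return id & 0xFFFF;
    }

    public static void main(String[] args) {
        int lastSafe = MAX_IDS - 1; // largest id that survives unchanged
        int tooLarge = 70000;       // wraps around and collides with a smaller id
        System.out.println(lastSafe + " -> " + storeAsUint16(lastSafe));
        System.out.println(tooLarge + " -> " + storeAsUint16(tooLarge));
    }
}
```

Under that assumption, any family with more than 65536 tags would have ids that alias earlier ones.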

My memory is that I did run the tagStandard41h12 to completion, but I doubt that I would have spent more than a week of a 24 core (and pretty slow cores at that) machine on the problem. I can think of a few possibilities:
a) Did I somehow run the program faster?
b) Has the program gotten slower?
c) Does using 128 cores hurt more than it helps? This one at least is easy to test out since you can try lowering the number of cores used on your server to 24 and see if it gives a speed-up.
d) Is the time estimate inaccurate? I think there is a good chance that it is an overestimate early in the run when it is still discovering lots of new tags.
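For option d, a rough sketch of how a simple linear time estimate behaves may help. This is not taken from TagFamilyGenerator; it assumes the estimate extrapolates from elapsed time and the fraction of work completed, and `EtaSketch` and `etaHours` are hypothetical names. It shows why such an estimate can read as "infinity" before any progress is recorded, and why a slow early phase inflates it.

```java
// Sketch only: a naive linear ETA, NOT the generator's actual formula.
final class EtaSketch {
    // remaining = elapsed * (total - done) / done
    // With done == 0 and elapsed > 0, double division yields Infinity.
    static double etaHours(double elapsedHours, double done, double total) {
        return elapsedHours * (total - done) / done;
    }

    public static void main(String[] args) {
        // No measurable progress yet: the estimate is infinite.
        System.out.println(etaHours(0.5, 0, 100));
        // A quarter of the work done in one hour: three hours remain
        // if the rate stays constant, fewer if the run speeds up later.
        System.out.println(etaHours(1.0, 25, 100));
    }
}
```

If the real estimate works along these lines, early numbers say more about the current rate than about the final run time.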

@mkrogius
Contributor

I scanned through TagFamilyGenerator and it looks how I remember it looking, so I feel comfortable saying options a and b are unlikely.

@QuentinTorg
Author

QuentinTorg commented Mar 18, 2022

I don't see any significant commits to TagFamilyGenerator since the initial commit in 2018.

After just 45 minutes with 128 cores, it has already generated a TagStandard41h12 family with 2265 codes, 150 more than the current 41h12 family. I'm waiting for the next progress output, but it's reporting 2309 tags at the moment. I think running with 128 cores is giving a significant speed-up. Unless I'm misunderstanding the code, I think this means the checked-in TagStandard41h12 family was also stopped early. Is that correct?

TagStandard41h12.zip

If I forced the program to exit once 2115 tags have been generated, would you expect it to have the exact same codes as the checked-in version of TagStandard41h12? I will run this test next, but that's my understanding of how it would work.

@mkrogius
Contributor

mkrogius commented Mar 18, 2022 via email

@QuentinTorg
Author

Saving at 2115 tags did not give me output matching the checked-in family. I haven't yet checked whether the codes are just ordered differently. Given that the codes are stored in an array, I'm assuming this is unlikely.
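One way to check the ordering question is to compare the two code arrays while ignoring order. The sketch below assumes the codes from each generated file have been copied into `long[]` arrays; `CodeSetCompare` and `sameCodesIgnoringOrder` are hypothetical names, and the values in `main` are placeholders, not real tag codes.

```java
import java.util.Arrays;

// Sketch only: order-insensitive comparison of two tag code arrays.
final class CodeSetCompare {
    // True if both arrays hold the same codes (with the same multiplicities),
    // regardless of the order in which they appear.
    static boolean sameCodesIgnoringOrder(long[] a, long[] b) {
        if (a.length != b.length) {
            return false;
        }
        long[] sortedA = a.clone(); // sort copies so the inputs stay untouched
        long[] sortedB = b.clone();
        Arrays.sort(sortedA);
        Arrays.sort(sortedB);
        return Arrays.equals(sortedA, sortedB);
    }

    public static void main(String[] args) {
        long[] checkedIn = {0x3L, 0x1L, 0x2L};   // placeholder values
        long[] regenerated = {0x1L, 0x2L, 0x3L}; // placeholder values
        System.out.println(sameCodesIgnoringOrder(checkedIn, regenerated));
    }
}
```

If this returns false for the two 2115-code files, the families differ in content and not just in order.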

TagStandard41h12_2115codes.zip

@mkrogius
Contributor

Yes, the codes are probably not just in a different order. The one thing I'm noticing between what you've got and the checked-in version of tagStandard41h12 is that the checked-in version says "minimum complexity 10" at the end of line 31. I guess there has been some code change to the tag generation between now and when the original was added. I can try to figure out what changed later.

@QuentinTorg
Author

QuentinTorg commented Mar 18, 2022

Thanks for your help @mkrogius. I want to generate new custom families, but I want to verify that the library is working correctly before I depend on newly generated families. It gives me pause that none of the checked-in Tag*.java files appear to have been generated by this code. Without knowing about the original generation tool, it's difficult to have confidence in whether this version is better, worse, or equivalent.

@mkrogius
Contributor

I tracked down what happened here, and the summary is that this code is the same code that generated the tag families.

On line 349 of TagFamilyGenerator, an initial value V0 is chosen via a seeded call to Random(). I believe that this value can be different depending on which JVM implementation is used, although I have not actually confirmed that this is true. What I did confirm is that the code generates the correct tag family if you set V0 to the correct value. You can do this by changing this line to `long V0 = 0x1bd8a64ad10L - PRIME` for tagStandard41h12. Or, if you want to generate other classes, replace `0x1bd8a64ad10L` with the first code from that class (although you might need to subtract PRIME more than once; experimentation required).

As for the question of speed, the checked-in version of TagStandard41h12 says that it was generated in 44384s (~12hrs), and I now see no reason why this code shouldn't be able to reproduce that result.
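The "subtract PRIME more than once" experiment can be sketched as follows. This assumes the generator walks candidates by repeatedly adding PRIME and masking to the number of tag bits, so that a code reached after k steps started from `code - k * PRIME`. The `V0Candidates` class and its methods are hypothetical, and the PRIME value below is a stand-in; use the constant actually defined in TagFamilyGenerator.

```java
// Sketch only: candidate V0 values that would reach a known first code.
// ASSUMPTION: candidates advance as v = (v + PRIME) & mask, with mask = 2^nbits - 1.
final class V0Candidates {
    // V0 such that k steps of "+ prime, then mask" land on firstCode.
    static long v0For(long firstCode, long prime, int nbits, int k) {
        long mask = (1L << nbits) - 1;
        return (firstCode - k * prime) & mask;
    }

    // One step of the assumed candidate walk.
    static long step(long v, long prime, int nbits) {
        long mask = (1L << nbits) - 1;
        return (v + prime) & mask;
    }

    public static void main(String[] args) {
        long firstCode = 0x1bd8a64ad10L; // first code of tagStandard41h12, from the thread
        long prime = 982451653L;         // ASSUMPTION: stand-in, check PRIME in the source
        int nbits = 41;                  // tagStandard41h12 uses 41 bits
        for (int k = 1; k <= 3; k++) {
            System.out.printf("k=%d  V0=0x%xL%n", k, v0For(firstCode, prime, nbits, k));
        }
    }
}
```

Each printed value is a V0 to try in turn until the regenerated family starts with the expected code. As a side note, the `java.util.Random` Javadoc specifies the generator algorithm for a given seed, so it may be worth printing the seeded value on your own JVM before assuming it differs.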

@mkrogius
Contributor

You can also try testing out the branch maxkrogius/reorder_checks, which may be faster (it seems to be substantially faster in the early stages of generation, at least).
