Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved benchmarking of compression algorithms #225

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oshadura
Copy link
Collaborator

@oshadura oshadura commented May 20, 2021

I added a new CMake option: experiment-datafiles - that allows to download experiments files (CMS, ATLAS) that are bigger then 1.5 GB

I also used these benchmarks for testing FLZMA2 (suggestions are welcomed!)
(this pr replaces #216)

I added a new CMake option: experiment-datafiles - that allows to download experiments files (CMS, ATLAS) that are bigger then 1.5 GB
@oshadura oshadura requested a review from eguiraud May 20, 2021 13:23
Copy link
Member

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Oksana, looks good except the questions below.

I can't comment on whether the pattern

auto newtree = oldtree->CloneTree();
timer.Start();
newfile->Write();
timer.Stop()

does what we want, but you know better.

The RDF benchmarks look ok as long as we remember that they measure both the writing and the data generation, differently from the other benchmarks.

double rtime = timer.RealTime();
double ctime = timer.CpuTime();
// For Run2012B_DoubleMuParked.root:
float size_mb = 774.619423;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these hardcoded values are a bit "scary", can we get them programmatically?

# We need to enable download of datafiles from oot.cern.ch
set(rootbench-datafiles ON CACHE BOOL "Download files from root.cern.ch" FORCE)
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need a separate option? other files downloaded if rootbench-datafiles is on are not much smaller, and now we have two options that we need to set to get all benchmarks.

if we need the option, why does it imply root-benchdatafiles=ON? it would be simpler to keep them orthogonal - one turns on some benchmarks, the other turns on some other.

if we need the second option, it needs at least to be mentioned in the README.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants