Unable to use EOS tokens with RDataFrame since 6.32 #16475

Open
1 task done
chrisburr opened this issue Sep 19, 2024 · 4 comments

@chrisburr
Member

Check duplicate issues.

  • Checked for duplicates

Description

EOS tokens no longer work with RDataFrame in 6.32.04. In 6.30.08 everything is fine:

$ python3
Python 3.9.18 (main, Aug 23 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> url = 'root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=' + open("token.txt").read().strip()
>>> ROOT.TFile.Open(url).ls()
TNetXNGFile**		root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root	Demo ROOT file with histograms
 TNetXNGFile*		root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root	Demo ROOT file with histograms
  KEY: TH1F	hpx;1	This is the px distribution
  KEY: TH2F	hpxpy;1	py vs px
  KEY: TProfile	hprof;1	Profile of pz versus px
  KEY: TNtuple	ntuple;1	Demo ntuple
>>> df = ROOT.RDataFrame("ntuple", url)
>>>

Reproducer

On lxplus:

$ source /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.04/x86_64-almalinux9.4-gcc114-opt/bin/thisroot.sh
$ cp /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.32.04/x86_64-almalinux9.4-gcc114-opt/tutorials/hsimple.root /eos/user/c/cburr/hsimple.root
$ EOS_MGM_URL=root://eoshome-c.cern.ch eos token --path /eos/user/c/cburr/hsimple.root --permission=rx --expires=$(date +%s -d "30 minutes") > token.txt
$ kdestroy
$ python3
Python 3.9.18 (main, Aug 23 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> url = 'root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root?xrd.wantprot=unix&authz=' + open("token.txt").read().strip()
>>> ROOT.TFile.Open(url).ls()
TNetXNGFile**		root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root	Demo ROOT file with histograms
 TNetXNGFile*		root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root	Demo ROOT file with histograms
  KEY: TH1F	hpx;1	This is the px distribution
  KEY: TH2F	hpxpy;1	py vs px
  KEY: TProfile	hprof;1	Profile of pz versus px
  KEY: TNtuple	ntuple;1	Demo ntuple
>>> df = ROOT.RDataFrame("ntuple", url)
Error in <TNetXNGSystem::GetDirEntry>: Unable to give access - user access restricted - unauthorized identity used ; Permission denied
 *** Break *** segmentation violation

ROOT version

6.32.04

Installation method

sft.cern.ch

Operating system

Linux (lxplus)

Additional context

No response

@chrisburr chrisburr added the bug label Sep 19, 2024
@vepadulano
Member

Dear @chrisburr ,

Thank you for reaching out and for the reproducer. I am on it. Meanwhile, I just wanted to point out that in the first case with 6.30, just calling ROOT.RDataFrame will not attempt to open the file, whereas 6.32 opens the file at construction time (to homogenise the way different data formats are processed). Just as a confirmation, could you try running any operation that needs to read data from the file in the first case with 6.30?

@chrisburr
Member Author

Thanks! This definitely used to be working (with 6.28 IIRC). If I find a minute I'll check with 6.30.

@vepadulano
Member

The problem is that RDF tries to open the file to check that it's valid. The logic for the file opening is in

std::unique_ptr<TFile> OpenFileWithSanityChecks(std::string_view fileNameGlob)

In particular, because of the presence of the ? character, the string is parsed as a glob. In many cases that would be harmless apart from a tiny overhead (it would just return the same file name to open), but in this particular case it triggers faulty behaviour. The glob parsing attempts to traverse the remote xrootd directory (see here), but since the token grants permission only for the single file and not for the entire directory, it leads to the "user access restricted" error you posted above.
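
To make the failure mode concrete, here is a minimal Python sketch of a wildcard-character heuristic. It is not ROOT's actual implementation: `looks_like_glob`, the set of wildcard characters, and the `<TOKEN>` placeholder are all assumptions made for illustration.

```python
# Hypothetical sketch, not ROOT's code: a naive check that treats any
# wildcard character as evidence that the name is a glob.
GLOB_CHARS = "*?["  # assumed set of wildcard characters


def looks_like_glob(name: str) -> bool:
    """Return True if the name contains any of the assumed wildcard characters."""
    return any(ch in name for ch in GLOB_CHARS)


# Placeholder token; a real EOS token would replace <TOKEN>.
plain_url = "root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root"
url = plain_url + "?xrd.wantprot=unix&authz=<TOKEN>"

# The '?' that introduces the query string is enough to trip the check,
# even though the URL names exactly one file.
print(looks_like_glob(url))
print(looks_like_glob(plain_url))
```

With a check of this kind, the tokenised URL is sent down the glob-expansion path, which is where the directory listing (and hence the permission error) would come from.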

Now, I believe the most sane course of action would be to refine the logic that checks whether the input file name is a glob. I could simply add a check for the xrd.wantprot token, but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob. This probably includes not only xrootd tokens but also anything https-related. Or we could adopt a different strategy for glob detection altogether. Thoughts @dpiparo @pcanal ?

@chrisburr
Member Author

Ah, that makes sense. Extending the definition of strings to add metadata to paths (globbing, the # syntax in TFile::Open, ...) is always going to be error-prone.

but probably we want to have a more authoritative list of all the tokens that should make the file name not be parsed as a glob

This feels like an impossible task to define.

Maybe a simpler solution would be to not support ? when globbing and only apply globbing to the text before the query string? Or maybe just have a dedicated method (or argument type) for creating an RDataFrame from a glob rather than relying on heuristics?
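
The first suggestion (apply globbing only to the text before the query string, and drop ? as a wildcard) could look roughly like the Python sketch below. It is an illustration under stated assumptions rather than a proposed patch: `split_query` and `path_is_glob` are hypothetical helpers, `<TOKEN>` is a placeholder, and the wildcard set deliberately leaves out `?`.

```python
from typing import Tuple

# Assumed wildcard set; '?' is intentionally excluded so that it can only
# ever mean "start of the query string".
PATH_GLOB_CHARS = "*["


def split_query(name: str) -> Tuple[str, str]:
    """Split a name at the first '?' into (path, query including the '?')."""
    path, sep, query = name.partition("?")
    return path, sep + query


def path_is_glob(name: str) -> bool:
    """Check for wildcard characters in the path part only."""
    path, _ = split_query(name)
    return any(ch in path for ch in PATH_GLOB_CHARS)


# Placeholder token; a real EOS token would replace <TOKEN>.
url = ("root://eosuser.cern.ch//eos/user/c/cburr/hsimple.root"
       "?xrd.wantprot=unix&authz=<TOKEN>")
glob_url = "root://eosuser.cern.ch//eos/user/c/cburr/*.root?authz=<TOKEN>"

print(path_is_glob(url))       # single file: no wildcard before the query
print(path_is_glob(glob_url))  # wildcard in the path part
```

Keeping the query string as a separate piece would also allow it to be re-appended unchanged to every file name that a real glob expands to.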
