Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30365 Add XREF Sasha service to K8s #18798

Draft
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

jackdelv
Copy link
Contributor

@jackdelv jackdelv commented Jun 21, 2024

These changes add Xref functionality to the containerized version of the platform. There were a couple changes because thor file parts were striped across multiple directories.

I'm not sure that I correctly added the new XRef pod because everything still runs on the ECL Watch pod.

What is the best way to make this toggleable? An additional #ifdef? Should I still be using _CONTAINERIZED?

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@jackdelv jackdelv requested a review from jakesmith June 21, 2024 11:46
Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-30365

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jackdelv, it's looking good, I've given it a first pass, and made some comments. Please take a look.

@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm
StringBuffer fullname;
if (findPathSepChar(name)==NULL)
addPathSepChar(fullname.append(partdir));
#ifdef _CONTAINERIZED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, using isContainerized() vs the #define is preferrable (so that the code is scanned/compiled by developers routinely etc.)
And, if both can be avoided and the code made general (even if the use case is not currently used in bare-metal), that is most preferrable.

This is a case in point, we should cope with this even in the bare-metal case in general utility classes/methods, even though it is not configurable in BM at the moment.

But also, it looks like this exposes a wide problem(!). This function is used not just by XREF, it's also used by dadfs.cpp removePhysicalFiles

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, just noticed that removePhysicalFiles is not used by anything - we should delete it avoid confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect constructPartFilename should be deprecated (probably removed too).
In XREF's case, it needs a way to get a physical part + endpoint given the meta data it's collected.

The 'lnentry' (CLogicalNameEntry) it has built from meta data, should perhaps have a method to built it, it should know (from the meta data) if the file is striped and if it has dirPerPart.

Btw, dirPerPart (a flag in the meta data) means it has a directory extension that includes the file part number (i.e. each part is in it's own directory)
'striping' is different, and an additional directory not per part, but enumerated over N striped disks. The stripNum is calculated from a hash (@lfnHash) stored in the file meta. Originally calculated and set there by hashing the logicalfilename.

See makePhysicalPartName and it's callers (and uses of lfnHash and calcStripeNumber).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed remotePhysicalFiles and added a constructPartFilename function to CLogicalNameEntry. I had some trouble finding the dirPerPart flag in the metadata. The only examples I was able to find was using queryDirPerPart() on a StoragePlane, but when trying that it seemed the metadata was missing and it was defaulting to the return value of isContainerized().

Added lfnHash to the CLogicalNameEntry and used calcStripeNumber on the part number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old constructPartFilename function had some tests for it, so I couldn't delete the function without also removing the tests. Should there now be tests for the new constructPartFilename if it does get added to CLogicalNameEntry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should remove the old constructPartFilename, that is now only used by tests, and if not covered already, add tests that exercise makePhysicalPartName.

But for now, for this PR, I would just add comments that it is deprecated, but test code uses it - and open a separate JIRA to revisit sometime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the new CLogicalNameEntry::constructPartFilename, I think you can remove the changes here right? i.e.:

#ifdef _CONTAINERIZED
    if (partmax>1)
        addPathSepChar(fullname.append(partno+1)); // If there are multiple files they will be striped by part number
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the _Containerized changes and added a comment to the old constructPartFilename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created HPCC-33139 to remove deprecatedConstructPartFilename.

StringBuffer stripeMask;
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname));
// Replace part number in stripe directory with mask
mname.replaceString(stripeDir, stripeMask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure what this is for? (let's discuss)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the XREF code is trying to match all the parts of an orphan file and each part is in its own directory, it incorrectly creates separate entries for each part when they should be combined. Adding a part mask to the directory for each part allows them to be matched properly. Sorry for the confusion if I caused any because I was using the term striped incorrectly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typical path on a striped storage that also has dirPerPart=true, will look like:
/var/lib/HPCCSystems/hpcc-data/d8/somescope/otherscope/35/thelfnname._35_of_400

where 'd8' is the stripe sub-directory, and '35' is the dir-per-part sub-directory.
When parsing a filename like this, the dirPerPart format can be deduced if the directory before the filename, is a numeric only (which is invalid for a scope/filename, which must begin with an alpha character), and that it matches the part # of the filename.
The fact that it's striped is less obvious, but could be deduced by the fact that the leading part of the filename '/var/lib/HPCCSystems/hpcc-data' is a prefix of one of the planes, and the plane specs will say numDevices>1

From the code here, it looks like it's dealing with the dir-per-part #, but code referencing 'stripe'.
I suspect this needs looking at closely, and testing with files that are on striped storage, and dirPerPlane, and ones that aren't - would be good to add some CPPUNIT tests here of various paths to make sure it is giving expected results.

I haven't looked in detail how the info is being used by the callers of parseFileName, but I suspect it may need to pass back, the deduced plane, and perhaps the stripe and/or flags like dirPerPart.. not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have restructured how it gets the dir-per-part number and added checks for the stripe number. I was able to test by changing the string before passing into the function, but I am not sure how to add to the CPPUNIT tests.

dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
@@ -2600,7 +2616,11 @@ class CXRefManager: public CXRefManagerBase
Owned<IGroup> g;
unsigned j;
if (!nclusters) {
#ifdef _CONTAINERIZED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preferable to use isCotainerized()

In fact, I think it would be worth just changing the message to the planes version.
In BM, internally, we generate planes for all clusters, such that methods like getStoragePlane/getPlanesIterator work with clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to planes message.

dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
for (int i = 0; i < numdirs; i++)
{
Owned<IPropertyTree> storagePlane = getStoragePlane(clusters[i]);
dirs[i] = storagePlane->queryProp("@prefix");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it makes sense to be enumerating the directories into 'dirs' and passing them xrefmanager.process.

In BM, the dirs are not different cluster dirs, but the primary/replicate dirs, which are consistent across all clusters.

I think we should keep it simple in cloud version, and assert that we are only dealing with 1 plane at a time (at least initially), the dirs should contain a single directory of the plane prefix being processes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to only pass in a single directory through dirs.

context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied.");

#ifdef _CONTAINERIZED
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)");
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This k8s code may work for BM too.. would be good to unify if so.
BM has "data" planes automatically generated from the clusters, so it should in theory..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the k8s code on BM and there were additional storage planes that didn't work. In BM when iterating through the available storage planes there were mythor, myroxie (which both worked normally), mythor_mirror, hthor_hthor, and hthor_myeclagent (which all returned "Not Found" when trying to generate the results). Additionally, there was no storage plane for SuperFiles. Should this still be added manually?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, did you manage to consolidate it such that BM works using common code (e.g. getPlanesIterator("data". .) ?

Additionally, there was no storage plane for SuperFiles. Should this still be added manually?

hm not sure - superfiles don't have their own plane, they could even contain files that were on different planes.
Not sure how XREF handles them tbh - will have to look more closely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the BM code now iterates through the 'clusters' using getPlanesIterator("data", nullptr). It partially worked where the Thor and Roxie clusters showed up properly and could be run, but additional 'clusters' (mythor_mirror, hthor_hthor, etc) were found that failed when trying to generate results.

I have removed the DBGLOG with a containerized message from the generalized code to avoid confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mythor_mirror, hthor_hthor, etc

I think the (BM only) mirror planes should probably be ignored here. It would best if they were ignored not by name, but by 'copy', which currently isn't recorded when the plane IPropertyTree is created, but could be.

The hthor_hthor plane .. what was the error? (and can you check where "hthor_hthor" is being constructed?)
Hthor wasn't available as an option before in BM - I don't know why specifically it wasn't supported, but I think we can ignore for now this plane type, with a big fat comment saying we should revisit this functionality.
(but like ignoring the mirror, we should ignore by type, not by name, which may need a flag recording in the IPT).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to find where the hthor_hthor plane is created. I don't even see it in the list of planes anymore. I'm really not sure what happened with that. I was able to see the error when hthor__hthor did show up. When I clicked generate, the status returned "Not Found".

@jakesmith
Copy link
Member

I'm not sure that I correctly added the new XRef pod because everything still runs on the ECL Watch pod.

that's fine for now. We can move it to a separate esp service pod if we really want to.

Copy link
Contributor Author

@jackdelv jackdelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I left some responses to your comments and questions. Back to you.

context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied.");

#ifdef _CONTAINERIZED
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)");
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the k8s code on BM and there were additional storage planes that didn't work. In BM when iterating through the available storage planes there were mythor, myroxie (which both worked normally), mythor_mirror, hthor_hthor, and hthor_myeclagent (which all returned "Not Found" when trying to generate the results). Additionally, there was no storage plane for SuperFiles. Should this still be added manually?

dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
StringBuffer stripeMask;
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname));
// Replace part number in stripe directory with mask
mname.replaceString(stripeDir, stripeMask);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the XREF code is trying to match all the parts of an orphan file and each part is in its own directory, it incorrectly creates separate entries for each part when they should be combined. Adding a part mask to the directory for each part allows them to be matched properly. Sorry for the confusion if I caused any because I was using the term striped incorrectly.

@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm
StringBuffer fullname;
if (findPathSepChar(name)==NULL)
addPathSepChar(fullname.append(partdir));
#ifdef _CONTAINERIZED
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed remotePhysicalFiles and added a constructPartFilename function to CLogicalNameEntry. I had some trouble finding the dirPerPart flag in the metadata. The only examples I was able to find was using queryDirPerPart() on a StoragePlane, but when trying that it seemed the metadata was missing and it was defaulting to the return value of isContainerized().

Added lfnHash to the CLogicalNameEntry and used calcStripeNumber on the part number.

@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm
StringBuffer fullname;
if (findPathSepChar(name)==NULL)
addPathSepChar(fullname.append(partdir));
#ifdef _CONTAINERIZED
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old constructPartFilename function had some tests for it, so I couldn't delete the function without also removing the tests. Should there now be tests for the new constructPartFilename if it does get added to CLogicalNameEntry?

@jackdelv jackdelv requested a review from jakesmith July 12, 2024 14:15
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - looks good overall.
Please see comments and questions.

@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm
StringBuffer fullname;
if (findPathSepChar(name)==NULL)
addPathSepChar(fullname.append(partdir));
#ifdef _CONTAINERIZED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should remove the old constructPartFilename, that is now only used by tests, and if not covered already, add tests that exercise makePhysicalPartName.

But for now, for this PR, I would just add comments that it is deprecated, but test code uses it - and open a separate JIRA to revisit sometime.

@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm
StringBuffer fullname;
if (findPathSepChar(name)==NULL)
addPathSepChar(fullname.append(partdir));
#ifdef _CONTAINERIZED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the new CLogicalNameEntry::constructPartFilename, I think you can remove the changes here right? i.e.:

#ifdef _CONTAINERIZED
    if (partmax>1)
        addPathSepChar(fullname.append(partno+1)); // If there are multiple files they will be striped by part number
#endif

Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true);
const char * prefix = plane->queryPrefix();
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices());
bool dirPerPart = max>1?plane->queryDirPerPart():false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not make any practical difference - but whether a file is dirPerPart is a flag in the logical file meta data.
In theory, a logical file may have this flag set, but the plane doesn't or vice versa. That could only happen if the plane spec. had changed since files were created.

You can determine it from the file from "@flags". See CFileDescriptor::getFlags()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added file to new constructPartFilename parameters to get dirPerPart from the flags.

}
expandMask(partName, pmask, partNo, max);

StringBuffer fullname;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/style: better to co-locate closer to where used ~(i.e. just before makePhysicalPartName) for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

StringBuffer stripeMask;
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname));
// Replace part number in stripe directory with mask
mname.replaceString(stripeDir, stripeMask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typical path on a striped storage that also has dirPerPart=true, will look like:
/var/lib/HPCCSystems/hpcc-data/d8/somescope/otherscope/35/thelfnname._35_of_400

where 'd8' is the stripe sub-directory, and '35' is the dir-per-part sub-directory.
When parsing a filename like this, the dirPerPart format can be deduced if the directory before the filename, is a numeric only (which is invalid for a scope/filename, which must begin with an alpha character), and that it matches the part # of the filename.
The fact that it's striped is less obvious, but could be deduced by the fact that the leading part of the filename '/var/lib/HPCCSystems/hpcc-data' is a prefix of one of the planes, and the plane specs will say numDevices>1

From the code here, it looks like it's dealing with the dir-per-part #, but code referencing 'stripe'.
I suspect this needs looking at closely, and testing with files that are on striped storage, and dirPerPlane, and ones that aren't - would be good to add some CPPUNIT tests here of various paths to make sure it is giving expected results.

I haven't looked in detail how the info is being used by the callers of parseFileName, but I suspect it may need to pass back, the deduced plane, and perhaps the stripe and/or flags like dirPerPart.. not sure.

dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
{
DBGLOG("CONTAINERIZED(runXRef)");
numdirs = 1;
Owned<IPropertyTree> storagePlane = getStoragePlane(clusters[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think xrefmanger.process() will now (in containerized) handle >1 cluster, processing them in series,
but getting the dir (prefix) from the 1st here, won't be correct if they different.

I think this directory getting code would probably make more sense inside the process() method, where in BM, it can do as it does now and assume same dir for all, but in containerized, it can get the correct dir for each 'cluster' (plane) each time in the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it makes more sense this way. I have moved the directory getting code into process.

context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied.");

#ifdef _CONTAINERIZED
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)");
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, did you manage to consolidate it such that BM works using common code (e.g. getPlanesIterator("data". .) ?

Additionally, there was no storage plane for SuperFiles. Should this still be added manually?

hm not sure - superfiles don't have their own plane, they could even contain files that were on different planes.
Not sure how XREF handles them tbh - will have to look more closely.

Copy link
Contributor Author

@jackdelv jackdelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I have responded to your comments and questions.

I think the new constructPartFilename function should be added to the CPPUNIT tests, but I am not sure how they work or the best way to extend them.

@@ -246,6 +246,10 @@ RemoteFilename &constructPartFilename(IGroup *grp,unsigned partno,unsigned partm
StringBuffer fullname;
if (findPathSepChar(name)==NULL)
addPathSepChar(fullname.append(partdir));
#ifdef _CONTAINERIZED
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the _Containerized changes and added a comment to the old constructPartFilename.

}
expandMask(partName, pmask, partNo, max);

StringBuffer fullname;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true);
const char * prefix = plane->queryPrefix();
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices());
bool dirPerPart = max>1?plane->queryDirPerPart():false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added file to new constructPartFilename parameters to get dirPerPart from the flags.

StringBuffer stripeMask;
stripeMask.append(getPathSepChar(mname)).append("$P$").append(getPathSepChar(mname));
// Replace part number in stripe directory with mask
mname.replaceString(stripeDir, stripeMask);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have restructured how it gets the dir-per-part number and added checks for the stripe number. I was able to test by changing the string before passing into the function, but I am not sure how to add to the CPPUNIT tests.

dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
{
DBGLOG("CONTAINERIZED(runXRef)");
numdirs = 1;
Owned<IPropertyTree> storagePlane = getStoragePlane(clusters[0]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it makes more sense this way. I have moved the directory getting code into process.

context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied.");

#ifdef _CONTAINERIZED
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)");
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the BM code now iterates through the 'clusters' using getPlanesIterator("data", nullptr). It partially worked where the Thor and Roxie clusters showed up properly and could be run, but additional 'clusters' (mythor_mirror, hthor_hthor, etc) were found that failed when trying to generate results.

I have removed the DBGLOG with a containerized message from the generalized code to avoid confusion.

@jackdelv jackdelv requested a review from jakesmith August 9, 2024 18:19
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices());

// Get dir-per-part # from file metadata or storage plan info
FileDescriptorFlags flags = static_cast<FileDescriptorFlags>(file.getPropInt("Attr/@flags"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I added file to the parameters for constructPartFilename, but I think that may be incorrect since it uses file in the CLogicalNameEntry constructor. Which is considered best practice, initializing CLogicalNameEntry with the dir-per-part/stripe-number and adding additional members to the class or passing file into constructPartFilename and getting the information right when we need it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think best to follow same pattern as for other info. extracted from the IPT file now, and pull them out and store them in CLogicalNameEntry as members. So, Attr/@flags can be read there once and used here. See previous command above. Otherwise, there is a lot of repeated work re-calling things for every part of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved IPT file extraction to the CLogicalNameEntry constructor to avoid re-calling things too many times. Added info as members of CLogicalNameEntry.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - apologies for taking an age to get back to this.

Please see comments.
It's definitely getting closer.

@@ -508,7 +508,8 @@ sasha:
#disabled: true
#switchMinTime: 1
#queues: "*"


xref: {} # NB: no properties defined, use defaults
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with others, it would be good to list (commented out) common xref config options here.

Copy link
Contributor Author

@jackdelv jackdelv Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some options that don't really mean anything. Should they be usable options or just placeholders? I don't think any would be used even if they were defined right now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be..
saserver should be loading the component configuration (line 331 of saserver.cpp) into 'serverConfig',
and various points in saxref.cpp looks for properties in it.
Some may not be relevant for containerized.
Some will need reworking, e.g. xrefMaxMemory can prob. remain as an option, but it should by default use all available memory (as resourced for the container), e.g. 90% of container memory. Probably similarly with maxThreads.
The schedule ones are still relevant.
As the code stands, it's going to be trying to pick up properties from "DFUXRef/" because in BM, the sasha service configs are subtrees within 1 config. In Containerized it should assume the root is the component config
( e.g. see createSashaFileExpiryServer's way of conditionally treating the config depending on whether containerized or not ).

It may be okay as it is for now, it should default to semi ok default, and then we can open a separate JIRA to revisit.

esp/src/src-react/components/Xrefs.tsx Show resolved Hide resolved
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied.");

#ifdef _CONTAINERIZED
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)");
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mythor_mirror, hthor_hthor, etc

I think the (BM only) mirror planes should probably be ignored here. It would best if they were ignored not by name, but by 'copy', which currently isn't recorded when the plane IPropertyTree is created, but could be.

The hthor_hthor plane .. what was the error? (and can you check where "hthor_hthor" is being constructed?)
Hthor wasn't available as an option before in BM - I don't know why specifically it wasn't supported, but I think we can ignore for now this plane type, with a big fat comment saying we should revisit this functionality.
(but like ignoring the mirror, we should ignore by type, not by name, which may need a flag recording in the IPT).


BoolHash uniqueProcesses;
BoolHash uniquePlanes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addUniqueXRefNode and it's use of a HT to track unique, doesn't seem useful now (and not sure it was).
i.e. each plane name is guarantee to be unique anyway..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the use of addUniqueXRef node. Should the function be deleted entirely if it isn't useful anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, but either way, we need to be careful this unification on planes doesn't break XREF in BM.

expandMask(partName, pmask, partNo, max);

// Get stripeNum from storage plane
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially the plane may not exist.

This an other errors at this point should abort processing this file further I think, and somehow be part of the report of logical files which are 'bad'.

in fact it's calling this per file part.. we should minimize the work..
it looks like this and prefix (and anything else that is at the file level), should have been called/extracted once, probably in processFile, when the lnentry is created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think constructPartFilename should be moved outside of xref and into dadfs.cpp (where the deprecated version is), and extended to take any necessary parameters it needs.

To avoid confusion with the deprecated, but not yet completely deleted old method, we should rename the old method to e.g. deprecatedConstructPartFilename for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check for plane. When an error occurs at this point, it is logged with CXRefManagerBase::error.

Moved file level info to lnentry creation and move constructPartFilename to dadfs.cpp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should pass in required=false, otherwise getDataStoragePlane will throw an exception, instead of the more contextual error message below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to pass in required=false.

for (;;) {
char c=*name;
if (!c)
break;
if ((c=='d')&&(isdigit(name[1])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This routine should really parse the filename, looking for a plane prefix match first.
Reject if no match. All files should exist in a plane.

Then check next segment for striping, and check if corresponding plane has striping, if it hasn't then complain/reject.
Then parse remainder of path, and check penultimate segment for dir-per-part pattern (i.e. only check path segment before the filename).

Because this is called a lot, it does need to be efficient.
So I would collect the data plane prefixes upfront, then initially check to see if the name has a prefix of one of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to check prefixes of data storage planes at the beginning. If no match is found throw an error. If a storage plane prefix is matched then it checks numDevices>1 and looks for striping in the next segment.

Skips what has already been added, parses the remainder of the path, and avoids copying until the filename is found. Once the file name is found it checks the last segment for a dir-per-part number. If a number is found and matches the part number of the path then the mask is added to mname.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay, but given how many times (maybe a billion times perhaps on prod), we may want to revisit, to cache the plane prefixes, to avoid iterating over the IPT each time.

Also, given it's fairly opaque functionality and critical, it would be good to add a CPPUNIT test with a list of paths to validate it behaves as expected for each. Could you add ?

dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
Copy link
Contributor Author

@jackdelv jackdelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I responded to all your comments and left a couple small questions. Back to you.

expandMask(partName, pmask, partNo, max);

// Get stripeNum from storage plane
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check for plane. When an error occurs at this point, it is logged with CXRefManagerBase::error.

Moved file level info to lnentry creation and move constructPartFilename to dadfs.cpp

dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
unsigned stripeNum = calcStripeNumber(partNo+1, lfnHash, plane->numDevices());

// Get dir-per-part # from file metadata or storage plan info
FileDescriptorFlags flags = static_cast<FileDescriptorFlags>(file.getPropInt("Attr/@flags"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved IPT file extraction to the CLogicalNameEntry constructor to avoid re-calling things too many times. Added info as members of CLogicalNameEntry.

unsigned n;
unsigned d;
mspec.calcPartLocation(partNo, max, copy, grp?grp->ordinality():max, n, d);
setReplicateFilename(fullname, d);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added !isContainerized().

@@ -923,17 +964,60 @@ static bool parseFileName(const char *name,StringBuffer &mname,unsigned &num,uns
name = nonrepdir.str();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added !isContainerized().

dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
context.ensureFeatureAccess(XREF_FEATURE_URL, SecAccess_Read, ECLWATCH_DFU_XREF_ACCESS_DENIED, "WsDfuXRef::DFUXRefList: Permission denied.");

#ifdef _CONTAINERIZED
DBGLOG("CONTAINERIZED(CWsDfuXRefEx::onDFUXRefList)");
Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to find where the hthor_hthor plane is created. I don't even see it in the list of planes anymore. I'm really not sure what happened with that. I was able to see the error when hthor__hthor did show up. When I clicked generate, the status returned "Not Found".


BoolHash uniqueProcesses;
BoolHash uniquePlanes;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the use of addUniqueXRef node. Should the function be deleted entirely if it isn't useful anymore?

esp/src/src-react/components/Xrefs.tsx Show resolved Hide resolved
@@ -508,7 +508,8 @@ sasha:
#disabled: true
#switchMinTime: 1
#queues: "*"


xref: {} # NB: no properties defined, use defaults
Copy link
Contributor Author

@jackdelv jackdelv Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some options that don't really mean anything. Should they be usable options or just placeholders? I don't think any would be used even if they were defined right now?

@jackdelv jackdelv requested a review from jakesmith November 8, 2024 21:01
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jackdelv.
See follow up comments/replies.

Is the mirror plane being handled correctly now (being ignored) in BM?

@@ -508,7 +508,8 @@ sasha:
#disabled: true
#switchMinTime: 1
#queues: "*"


xref: {} # NB: no properties defined, use defaults
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be..
saserver should be loading the component configuration (line 331 of saserver.cpp) into 'serverConfig',
and various points in saxref.cpp looks for properties in it.
Some may not be relevant for containerized.
Some will need reworking, e.g. xrefMaxMemory can prob. remain as an option, but it should by default use all available memory (as resourced for the container), e.g. 90% of container memory. Probably similarly with maxThreads.
The schedule ones are still relevant.
As the code stands, it's going to be trying to pick up properties from "DFUXRef/" because in BM, the sasha service configs are subtrees within 1 config. In Containerized it should assume the root is the component config
( e.g. see createSashaFileExpiryServer's way of conditionally treating the config depending on whether containerized or not ).

It may be okay as it is for now, it should default to semi ok default, and then we can open a separate JIRA to revisit.


BoolHash uniqueProcesses;
BoolHash uniquePlanes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, but either way, we need to be careful this unification on planes doesn't break XREF in BM.

expandMask(partName, pmask, partNo, max);

// Get stripeNum from storage plane
Owned<IStoragePlane> plane = getDataStoragePlane(grpname, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should pass in required=false, otherwise getDataStoragePlane will throw an exception, instead of the more contextual error message below.

dali/base/dadfs.cpp Show resolved Hide resolved
IGroup *grp = lnentry->queryGroup();
if (!grp) {
manager.warn(lnentry->lname.get(),"No group found, ignoring logical file");
return;
}
constructPartFilename(grp,partno,numparts,partname,partmask,partdir,replicate,replicateoffset,rfn);
lnentry->constructPartFilename(partno, replicate, rfn, file);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added replicate?1:0 because that was how it was handled in the deprecated version, but is that correct? What does copy mean?

I think good enough for now. It is assuming 2 copies (in BM), but theoretically there could be more.

@@ -1581,33 +1648,32 @@ void loadFromDFS(CXRefManagerBase &manager,IGroup *grp,unsigned numdirs,const ch
if (lnentry->outsidenodes.find(rep)==NotFound)
lnentry->outsidenodes.append(rep);
}
if (isContainerized())
{
UWARNLOG("Replication not supported in containerized version");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to generate a very large number of of logged warnings.
replication should really be a property of the plane.
For now, let's just turn this into a comment. In containerized we not expect planes to be replicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about replication being a property of the plane and commented out the logged warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this has not been commented out yet.
Change into :

// NB: Replication no supported in containerized version

IGroup *grp = lnentry->queryGroup();
if (!grp) {
manager.warn(lnentry->lname.get(),"No group found, ignoring logical file");
return;
}
constructPartFilename(grp,partno,numparts,partname,partmask,partdir,replicate,replicateoffset,rfn);
lnentry->constructPartFilename(partno, replicate, rfn, file);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added if (isContainerized()) to break out of loop and warn the user. Is the user the correct audience?

this will be hit very frequently. Should just be a comment. See other PR comment near code.

for (;;) {
char c=*name;
if (!c)
break;
if ((c=='d')&&(isdigit(name[1])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay, but given how many times (maybe a billion times perhaps on prod), we may want to revisit, to cache the plane prefixes, to avoid iterating over the IPT each time.

Also, given it's fairly opaque functionality and critical, it would be good to add a CPPUNIT test with a list of paths to validate it behaves as expected for each. Could you add ?

name++;
while (*name&&isdigit(*name))
name++;
mname.append("d$P$");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why d$P$?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what the mask for the stripe number should be, so I just replaced the number with the same characters used for the part and dir-per-part numbers. Since the 'd' character never changes I added it to the mask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$P$ should be left as meaning the part # only.

Horrible, but if you look at how the result parseFilename 'mname' is used by the caller (i.e. 'orphanname'), it is expecting it to map it against logical file entries (lines 1702-1705), based on CLogicalNameEntry objects created by parseFile (line 1485) whose key value (in manager.logicaldirmap) is 'dirpartmask'. And, 'dirpartmask' is based on the logical files @Directory and it's @partmask only.

IOW, it has no concept of stripe or partPerDir at this stage.
It is trying to track physical files found that, in containerized, likely will have stripe and dir per part onto logical file entries that (where stripe nor dirperpart are part specific and the logical file is abstract from that).

So, parseFilename should not be trying to insert stripe or dirperpart into into the returned mname here. It should parse it only.
It may be useful to capture the info. and return the strip # and dir-per-part info as separate info (for future use maybe), but adding to the mname will cause every found part here, to fail to match up with the logical file it corresponds to.

Copy link
Contributor Author

@jackdelv jackdelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I addressed your comments and added some tests for parseFilename to datest. I also removed calls to deprecatedConstructPartFilename in a few places to test the new function.

For the constructPartFilename tests, I am not sure how to properly judge whether or not they pass. There weren't any exceptions in the constructPartFilename calls, but also nothing validating the output (maybe it is and I just don't see it). They passed on the first run but would cause errors on a second. Is this expected?

Also, to respond to your earlier question, no, the mirror planes are not ignored at the moment. Should there be a flag added so we can ignore by type?

name++;
while (*name&&isdigit(*name))
name++;
mname.append("d$P$");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what the mask for the stripe number should be, so I just replaced the number with the same characters used for the part and dir-per-part numbers. Since the 'd' character never changes I added it to the mask.

@@ -1581,33 +1648,32 @@ void loadFromDFS(CXRefManagerBase &manager,IGroup *grp,unsigned numdirs,const ch
if (lnentry->outsidenodes.find(rep)==NotFound)
lnentry->outsidenodes.append(rep);
}
if (isContainerized())
{
UWARNLOG("Replication not supported in containerized version");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about replication being a property of the plane and commented out the logged warning.

dali/base/dadfs.cpp Show resolved Hide resolved
@jackdelv jackdelv requested a review from jakesmith November 21, 2024 14:47
@jakesmith
Copy link
Member

For the constructPartFilename tests, I am not sure how to properly judge whether or not they pass. There weren't any exceptions in the constructPartFilename calls, but also nothing validating the output (maybe it is and I just don't see it). They passed on the first run but would cause errors on a second. Is this expected?

Unit tests should validate the output - using CPPUNIT_ASSERT_MESSAGE.

Also, to respond to your earlier question, no, the mirror planes are not ignored at the moment. Should there be a flag added so we can ignore by type?

We should not be performing XREF on the mirror planes (only relevant for BM), at least for now (as per how it used to be behave),
and since it is now standarizing on planes, it must be careful it doesn't XREF them unintentionally.
I previously made a comment that it could add a flag when mirror planes are created, so that XREF could 'see' them and ignore them. I think that still should be done.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jackdelv , please see comments. Happy to discuss any if that would help.

esp/services/ws_dfu/ws_dfuXRefService.cpp Outdated Show resolved Hide resolved
@@ -1581,33 +1648,32 @@ void loadFromDFS(CXRefManagerBase &manager,IGroup *grp,unsigned numdirs,const ch
if (lnentry->outsidenodes.find(rep)==NotFound)
lnentry->outsidenodes.append(rep);
}
if (isContainerized())
{
UWARNLOG("Replication not supported in containerized version");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this has not been commented out yet.
Change into :

// NB: Replication no supported in containerized version

dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
dali/datest/datest.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
while (*tailSlash&&*tailSlash!='/')
tailSlash--;
const char *d = tailSlash-1;
for (int i=1;*(d)&&isdigit(*d);i*=10,d--)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to comment above, should never be possible to hit a null terminator (*d shoudl not be 0), but could in theory walk back to beginning of 'name'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

if (mname.isEmpty())
throw makeStringExceptionV(-1, "Could not find matching prefix in plane definition for file %s", filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the caller handle exceptions? I think not, or it looks like it will cause it to bail out of too much just because of 1 malformed physical filename.

Also, this function can now either throw exception or return false - which begs question how should they be considered differently? I think throwing an exception is fine if consistent.

I would remove the boolean result, throw an exception instead of the return false, and have the caller catch, log, skip further processing of this part, and continue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: The alternative would be to CLogicalNameEntry create a dirpartmask that encapsulated an abstract stripe # and dirperpart, e.g. using $S for stripe.
e.g. so it filled dirpartmask with something like "/var/lib/HPCCSystems/hpcc-data/d$S$/mydata/sub/$P$/myfile._$P$of$N$" , and for parseFilename to do the same..

That would be a better approach I suspect, but requires more changes to processFile.

@jackdelv - happy to discuss further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the boolean returns, and made exception throwing more consistent. The caller will now catch any error from parseFileName and log it with the manager.

I think it would fairly easy to build up a dirpartmask and return that alongside mname. Should that be done here, or in a follow up PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave for a subsequent PR. Can you open a separate JIRA to follow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created HPCC-33138 to follow up with this.

name++;
while (*name&&isdigit(*name))
name++;
mname.append("d$P$");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$P$ should be left as meaning the part # only.

Horrible, but if you look at how the result parseFilename 'mname' is used by the caller (i.e. 'orphanname'), it is expecting it to map it against logical file entries (lines 1702-1705), based on CLogicalNameEntry objects created by parseFile (line 1485) whose key value (in manager.logicaldirmap) is 'dirpartmask'. And, 'dirpartmask' is based on the logical files @Directory and it's @partmask only.

IOW, it has no concept of stripe or partPerDir at this stage.
It is trying to track physical files found that, in containerized, likely will have stripe and dir per part onto logical file entries that (where stripe nor dirperpart are part specific and the logical file is abstract from that).

So, parseFilename should not be trying to insert stripe or dirperpart into into the returned mname here. It should parse it only.
It may be useful to capture the info. and return the strip # and dir-per-part info as separate info (for future use maybe), but adding to the mname will cause every found part here, to fail to match up with the logical file it corresponds to.

@jackdelv jackdelv changed the base branch from candidate-9.6.x to master December 10, 2024 17:14
jackdelv and others added 17 commits December 10, 2024 12:15
- Not sure if any options are used
- Change deprecated constructPartFilename to deprecatedConstructPartFilename
- Keeps the constructPartFilename method simply for passing members to the function
- Move file level info to creation of lnentry rather than get it for each part
- Put guards on variables that can be null/empty
- Find match first and throw an error if one is not found
- Check next segment for striping if numDevices>1
- Only check last segment before filename for dir-per-part number
- avoid copying single chars and copy chunks after we look for dir-per-part
- Added the tests for constructPartFilename back and removed deprecated calls
- Add copy flag to mirror storage planes and avoid displaying them in xref
- Added the tests for constructPartFilename back and removed deprecated calls
@jackdelv
Copy link
Contributor Author

@jakesmith I added a flag "@copy" to the mirror storage planes. They are now ignored if copy is true.

@jackdelv jackdelv requested a review from jakesmith December 10, 2024 17:45
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - looks very close, please see new minor comments.

dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved

// Standard file name with multiple parts
parseFileName("/var/lib/HPCCSystems/hpcc-data/test/myname._1_of_3", mname.clear(), num, max, stripeNum, dirPerPart, replicate);
CPPUNIT_ASSERT_EQUAL_STR("/var/lib/HPCCSystems/hpcc-data/test/myname._$P$_of_3",mname.str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: (no need to change) - maybe slight overkill for new macro that converts each arg. to std::string's I think.
Could could be:

CPPUNIT_ASSERT(strsame("/var/lib/HPCCSystems/hpcc-data/test/myname._$P$_of_3", mname));

dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
}

if (mname.isEmpty())
throw makeStringExceptionV(-1, "Could not find matching prefix in plane definition for file %s", filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave for a subsequent PR. Can you open a separate JIRA to follow?

- Fixed style issues
Copy link
Contributor Author

@jackdelv jackdelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith I fixed the minor comments you left and opened a JIRA to follow up on one of the comments. Should I open any additional JIRAs to follow on with this?

dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
dali/dfuXRefLib/dfuxreflib.cpp Outdated Show resolved Hide resolved
}

if (mname.isEmpty())
throw makeStringExceptionV(-1, "Could not find matching prefix in plane definition for file %s", filename);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created HPCC-33138 to follow up with this.

@jackdelv jackdelv requested a review from jakesmith December 17, 2024 17:07
- Redoing this because it must have been lost in the rebase
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv - looks good. Please squash (but don't rebase) for a final look. Can also convert from Draft PR

Have you done some testing of XREF with this PR in bare-metal, to ensure it hasn't introduced any issues?

else
replicate = false; // Replicate dir is not supported in containerized environment

Owned<IPropertyTreeIterator> planesIter = getPlanesIterator("data", nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should be revisited after this PR is merged. I've opened and linked : https://hpccsystems.atlassian.net/browse/HPCC-33151

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants