Ignore metadata subtrees #43

psss · 2018-06-27T14:10:40Z

When exploring directories, fmf should probably ignore other metadata trees nested in the directory tree. Yesterday we discussed the following real-life scenario, which seems to support this approach:

Standard Test Roles support fetching tests from remote repositories. At the same time there can be tests stored directly in dist-git. When these two options are mixed we should still be able to gather reasonable metadata for both.

Here's a simple example directory structure:

.
├── fetched
│   ├── fetched1
│   ├── fetched2
│   ├── fetched3
│   ├── .fmf
│   │   └── version
│   └── main.fmf
├── .fmf
│   └── version
├── main.fmf
├── stored1
├── stored2
└── stored3

Current output in the main tests directory (note the modified node identifiers):

/stored1
/stored2
/stored3
/fetched/fetched1
/fetched/fetched3
/fetched/fetched2

Current output for the fetched test cases in tests/fetched:

/fetched1
/fetched3
/fetched2

Expected output for the main directory:

/stored1
/stored2
/stored3
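
For illustration only, the expected behaviour could be sketched roughly like this. It is not fmf's actual implementation: `list_nodes` is a hypothetical helper, and the sketch assumes that every test is a plain directory and that any directory with its own `.fmf` subdirectory starts a separate tree.

```python
import os
import tempfile


def list_nodes(root):
    """List node identifiers under ``root``, ignoring nested metadata trees.

    Assumptions of this sketch: every directory is a node, and a directory
    with its own ``.fmf`` subdirectory is the root of another tree.
    """
    nodes = []
    for current, dirs, _files in os.walk(root):
        nested = current != root and ".fmf" in dirs
        # Prune nested trees completely; otherwise skip only the ``.fmf``
        # marker directory and keep the traversal order stable
        dirs[:] = [] if nested else sorted(d for d in dirs if d != ".fmf")
        if nested or current == root:
            continue
        relative = os.path.relpath(current, root)
        nodes.append("/" + relative.replace(os.sep, "/"))
    return nodes


# Recreate the example layout from above in a temporary directory
with tempfile.TemporaryDirectory() as tests:
    for path in (".fmf", "stored1", "stored2", "stored3", "fetched/.fmf",
                 "fetched/fetched1", "fetched/fetched2", "fetched/fetched3"):
        os.makedirs(os.path.join(tests, path))
    main_nodes = list_nodes(tests)
    fetched_nodes = list_nodes(os.path.join(tests, "fetched"))

print(main_nodes)     # ['/stored1', '/stored2', '/stored3']
print(fetched_nodes)  # ['/fetched1', '/fetched2', '/fetched3']
```

Pruning `dirs` in place is what keeps `os.walk` from descending into the nested tree, so the fetched tests only show up when the walk starts in `fetched` itself.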

Guys, what are your thoughts on this?

jkrysl · 2018-06-27T15:37:26Z

I like the solution we discussed some time ago: placing all trees on the same level (and creating a forest by doing so). So the output in the main directory would look like this:

/stored1
/stored2
/stored3
/fetched1
/fetched2
/fetched3

This is great for merging downstream and upstream: you can place one of them in a subdirectory of the other and the metadata will seamlessly merge together. One use case is keeping credentials downstream but the test itself upstream.
But the main disadvantage is that it breaks path_in_fmf_tree == path_in_filesystem. Each attribute would need a source property telling where it came from. So, for example, if the attribute is a test file, I can access this property to figure out where the file is. Easy to do with the Python API, not sure about the CLI (or whether it is needed there at all...)
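
A minimal sketch of what such a source property might look like. Everything here is hypothetical (the `Attribute` class, `build_forest`, `locate`, the `test` key and the `runtest.sh` value); it only shows how a file could still be located once the node name no longer matches the filesystem path.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Attribute:
    value: str
    source: str  # filesystem root of the tree which defined the value


def build_forest(trees):
    """Place the nodes of all trees on the same level.

    ``trees`` maps a filesystem root to ``{node name: {key: value}}``.
    """
    forest = {}
    for root, nodes in trees.items():
        for name, data in nodes.items():
            node = forest.setdefault(name, {})
            for key, value in data.items():
                node[key] = Attribute(value, root)
    return forest


def locate(forest, name, key):
    """Return the filesystem path of a file referenced by an attribute."""
    attribute = forest[name][key]
    return posixpath.normpath(
        posixpath.join(attribute.source, name.lstrip("/"), attribute.value))


forest = build_forest({
    ".": {"/stored1": {"test": "runtest.sh"}},
    "fetched": {"/fetched1": {"test": "runtest.sh"}},
})
print(locate(forest, "/fetched1", "test"))  # fetched/fetched1/runtest.sh
```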

Another approach is to have a name specified in the .fmf directory, which would replace the name of the root directory. This way the output would look like this:

/store_name/stored1
/store_name/stored2
/store_name/stored3
/fetched_name/fetched1
/fetched_name/fetched2
/fetched_name/fetched3

This has the same advantages / disadvantages as above, just stronger. It lets the user say whether they want to merge, and it breaks path_in_filesystem for everything if the name differs for the main directory too.

A third way could be having a 'switch' in the .fmf directory to say whether we want to merge the subtrees or ignore them. This is probably the smallest change of these three and the easiest to do. Something like a
.fmf/config file with ignore_subtrees=1
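
A rough sketch of how such a switch could be read. No such file exists in fmf today; the plain `key=value` format, the helper names and the default used when the file or option is missing are all assumptions.

```python
import os


def parse_config(text):
    """Parse simple ``key=value`` lines, skipping blanks and ``#`` comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def ignore_subtrees(tree_root, default=True):
    """Tell whether nested trees should be ignored for the given tree.

    Falls back to ``default`` (an assumption of this sketch) when the
    hypothetical ``.fmf/config`` file or the option itself is missing.
    """
    path = os.path.join(tree_root, ".fmf", "config")
    try:
        with open(path, encoding="utf-8") as handle:
            options = parse_config(handle.read())
    except FileNotFoundError:
        return default
    if "ignore_subtrees" not in options:
        return default
    return options["ignore_subtrees"] == "1"
```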

The fourth way is not giving a choice at all and ignoring them right away...

For now I am in favour of the last approach, because it is the smallest one, gives the user a choice and does not prevent implementing one of the first two in the future (I like the 2nd better). This has only one issue: implementing one of the first two would introduce a breaking change.

Hope that helps :)

AloisMahdal · 2018-06-27T17:11:46Z

I think good practice is to mimic the filesystem model, i.e. file/directory. Then looking at how common utilities work could answer some questions. For example:

fmf ls lists tests and test suites from $PWD. That is, it acknowledges the existence of any subtrees.

This raises the question: how do we distinguish between tests and test suites? It turns out the case of 'ls' is actually the trickiest one.
```
$ fmf ls
/fetched/
/stored1
/stored2
/stored3
```
fmf show shows tests in the current test suite. That is, it ignores the existence of any subtrees, and fails if $PWD is not a tree.
```
$ fmf show
/stored1
/stored2
/stored3
$ fmf show fetched
/fetched1
/fetched3
/fetched2
```
fmf stat shows the meta-data of a particular item. That is, it already mandates specifying exactly one item, so subtrees are not relevant.
```
$ fmf stat fetched/fetched1
/fetched1
{ 'timeout': 30 }
```
fmf find finds everything from $PWD down:
```
$ fmf find
./
./stored1
./stored2
./stored3
./fetched/
./fetched/fetched1
./fetched/fetched3
./fetched/fetched2
```
- fmf find -type s finds all test suites (.fmf trees).
```
./
./fetched/
```
- fmf find -type t finds all tests including ones from deeper suites.
```
./stored1
./stored2
./stored3
./fetched/fetched1
./fetched/fetched3
./fetched/fetched2
```
Notice that the examples above use the file path as the default output so that the output can be re-used for fmf stat.
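
The proposed `fmf find` could be approximated along these lines. This is only a sketch of the idea, not an existing command: the `find` helper and its `kind` argument are hypothetical stand-ins for the `-type` option, and every directory without a `.fmf` subdirectory is assumed to be a test.

```python
import os
import tempfile


def find(top, kind=None):
    """Walk down from ``top``; ``kind`` is "s" (suites), "t" (tests) or None."""
    found = []
    for current, dirs, _files in os.walk(top):
        is_suite = ".fmf" in dirs
        # Do not report or descend into the ``.fmf`` marker directory
        dirs[:] = sorted(d for d in dirs if d != ".fmf")
        relative = os.path.relpath(current, top).replace(os.sep, "/")
        path = "./" if relative == "." else "./" + relative
        if is_suite:
            # Suites are printed with a trailing slash, like directories
            path = path.rstrip("/") + "/"
        if kind is None or kind == ("s" if is_suite else "t"):
            found.append(path)
    return found


# Recreate the example layout in a temporary directory
with tempfile.TemporaryDirectory() as tests:
    for path in (".fmf", "stored1", "stored2", "stored3", "fetched/.fmf",
                 "fetched/fetched1", "fetched/fetched2", "fetched/fetched3"):
        os.makedirs(os.path.join(tests, path))
    suites = find(tests, kind="s")
    cases = find(tests, kind="t")

print(suites)  # ['./', './fetched/']
```

Note that the order differs from the listings above, since this sketch sorts directory names during the walk.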

AloisMahdal · 2018-06-27T17:16:10Z

@jkrysl I'm not sure about the part on merging data. IMO the meta-data never merges on its own; that was kinda the point of #26. The example with credentials is not convincing; I would not really see that as good practice (credentials are a property of provisioning the test environment, not of the test).

jscotka · 2018-06-27T17:16:47Z

Hi,
my preferred solution is to always find the top-most .fmf directory and take it as the root, and to leave the tree structure as it is, ignoring that there are other .fmf dirs. This means that
fmf ls --path . and fmf ls --path fetched both lead to:

./stored1
./stored2
./stored3
./fetched/fetched1
./fetched/fetched3
./fetched/fetched2

I don't know why, but I find this the most intuitive :-)
Moving these subtrees into a flat structure is interesting, but a very tricky solution from my perspective.
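
Finding the top-most .fmf directory could look roughly like the sketch below. The `topmost_root` helper and its `stop` argument are hypothetical; `stop` is only there to limit how far up the lookup climbs.

```python
import os
import tempfile


def topmost_root(path, stop=None):
    """Return the highest ancestor of ``path`` (inclusive) containing ``.fmf``.

    Climbing ends at ``stop`` if given, otherwise at the filesystem root.
    Returns None when no metadata tree is found on the way up.
    """
    current = os.path.abspath(path)
    stop = os.path.abspath(stop) if stop else None
    found = None
    while True:
        if os.path.isdir(os.path.join(current, ".fmf")):
            # Keep overwriting so that the highest match wins
            found = current
        parent = os.path.dirname(current)
        if current == stop or parent == current:
            return found
        current = parent


# Both the main directory and ``fetched`` resolve to the same root
with tempfile.TemporaryDirectory() as tests:
    os.makedirs(os.path.join(tests, ".fmf"))
    os.makedirs(os.path.join(tests, "fetched", ".fmf"))
    expected = os.path.abspath(tests)
    from_main = topmost_root(tests, stop=tests)
    from_fetched = topmost_root(os.path.join(tests, "fetched"), stop=tests)
```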

jkrysl · 2018-06-28T07:21:31Z

@AloisMahdal By credentials I mean the user / password for a server and its address. For example, I have a test that tests a thing. It runs on an already provisioned clean server, but it has to connect to different servers to do its thing. Some of these servers are downstream (private), some upstream. It does not make much sense to copy this test multiple times and split it into upstream / downstream tests, as the code is the same and only the credentials change. IMO a much better approach is to let the layer above (e.g. the test suite) handle it by providing this metadata to the test and running it multiple times with different metadata. Adding a new server to this testing is just a matter of adding a single leaf to FMF, no code change needed. Basically I use metadata not only to describe the test but also to give it values that alter its behaviour. So the test is defined not only by its code, but also by its metadata.
Because not all metadata is private and I hate duplicating, I can put almost everything upstream and leave just the private metadata downstream. And merge it with FMF.
I can either do this merge myself (giving the suite the locations of downstream and upstream and letting it merge with the FMF API) or let FMF do it for me seamlessly by simply cloning the downstream as a subtree of upstream. Or I can keep doing it as I am doing now: copying downstream into upstream when setting up the test environment, but I have to be very careful to use different files:

/test/main.fmf (upstream, copied from location X)
/test.fmf (downstream, copied from location Y)

So I would prefer to see it like this:

filesystem:
/test/main.fmf (upstream, name=X)
/downstream/test/main.fmf (downstream, name=X)

FMF:
/test
  data_from_upstream
  data_from_downstream

But for this to happen we need a much broader discussion and to implement some sort of metadata of metadata, so the user knows where each attribute value came from. Other nice examples of such metadata are the type of the attribute, its description...
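
To make the idea a bit more concrete, here is a small sketch of merging one node from two trees while remembering the origin of each value. The `merge` helper, the keys and the example values are made up for illustration, and "later layers win" is just one possible precedence rule.

```python
def merge(layers):
    """Merge metadata of one node coming from several trees.

    ``layers`` is a list of ``(origin, data)`` pairs; later layers win.
    Returns the merged data and, for each key, the origin of its value.
    """
    merged, origins = {}, {}
    for origin, data in layers:
        for key, value in data.items():
            merged[key] = value
            origins[key] = origin
    return merged, origins


# Hypothetical content of /test in the upstream and downstream trees
data, origins = merge([
    ("upstream", {"test": "runtest.sh", "server": "public.example.com"}),
    ("downstream", {"server": "private.example.com", "user": "tester"}),
])
print(origins["test"], origins["server"])  # upstream downstream
```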

Regarding #26: its point was to have the same metadata no matter where you are in the tree, not to alter merging.

psss · 2018-07-12T11:34:34Z

Guys, thank you very much for your detailed opinions and many good points. I've read the comments carefully and here's my summary:

The support for merging metadata trees might come in handy but, as Jakub mentioned, would need some more detailed discussion.
From the options drafted by Jakub I find the "merging-switch" probably the most promising for the future, but agree that the last option (ignoring) is probably the best choice for now. Later we can implement a new config option to enable merging, defaulting to ignore, which would be backward compatible.
I also like the fmf find functionality outlined by Alois; we could use it for easy discovery of all trees present under a given filesystem directory (and then handle them one by one if desired). Filed #45 (Implement fmf find) to implement this feature.

I think that the solution should definitely align with the following statement:

Object name/identifier should be unique, constant, clearly predictable.
It should not be affected by the merging strategy or filesystem location.

That's why finding the top-most .fmf directory and prefixing the object identifier does not seem to be the right way. Finally, to sum up: for now we're going to ignore any metadata subtrees and we'll investigate merging strategies in the future.

psss mentioned this issue Jul 12, 2018

Implement fmf find #45

Open

psss self-assigned this Jul 12, 2018

psss added the enhancement New feature or request label Jul 12, 2018

psss added a commit that referenced this issue Jul 12, 2018

Ignore metadata subtrees [fix #43]

a6f2b95

psss closed this as completed Jul 16, 2018
