Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle "extras" when multiple plates are concatenated #37

Open
hillierdani opened this issue Feb 10, 2023 · 5 comments
Open

Handle "extras" when multiple plates are concatenated #37

hillierdani opened this issue Feb 10, 2023 · 5 comments

Comments

@hillierdani
Copy link

hillierdani commented Feb 10, 2023

Hi Kale,

Thanks for all your efforts in maintaining this pioneer project. I suspect this is more of a feature request than a bug report:

I have an aggregator.toml file:

extra_main = 1

[meta.concat]
'5h' = '20220901_nluc_AAV9.AB_B22_5h.toml'
'1h' = '20220901_nluc_AAV9.AB_B22_1h.toml'

and the two other toml files. Each of those also contains extras. When doing
wellmap.load(aggregator.toml, extras=True)

only the extra_main=1 is returned but none of the "extras" defined in the actual layout TOML files. Is this intended? Should it not return a dict of dicts with the top level aggregator dict having keys defined under [meta.concat]?

The use case I am building aims to link data descriptors to routines that plot the data processed from the wells. E.g. in this case incubation time (1h vs 5h) should show up in subplot titles. I feel such annotative pieces of text have the right place in TOML files, this way plotting routines can remain much more generic and reusable.

@hillierdani
Copy link
Author

hillierdani commented Feb 10, 2023

I cooked a workaround script as I am not 100% sure what would be the best place to add this in wellmap.file:

def parse_multiplate_extras(toml_path: Path):
    """
    Parses extra parameters from each plate into a dict. 
    
    Parameters
    ----------
    toml_path: relative or absolute path to aggregator TOML that lists plate TOML files under [concat]

    Returns
    -------
    dict or dicts
    """
    import tomllib
    with open(toml_path, 'rb') as agg:
        plates = tomllib.load(agg)
    if 'meta' not in plates or 'concat' not in plates['meta']:
        return
    pathp = toml_path.parent  # individual plate TOML files are assumed to be specified relative to same directory
    extras = []
    for pk1, pv1 in plates['meta']['concat'].items():
        extras[pk1] = wellmap.config_from_toml(pathp/pv1)[3]
    return extras

@kalekundert
Copy link
Owner

Thanks for the feature request! This should definitely be supported; I think I just overlooked this combination of features. In your specific case, though, I'd probably recommend specifying the incubation times in the layout itself (e.g. expt.incubation_time or plate.incubation_time) instead of using extras. You could then make the subplots you want by grouping-by the incubation_time column in the resulting data frame. My thinking is that incubation time is a legitimate experimental parameter, so it belongs with all the other experimental parameters. I use extras more for things that only matter aesthetically, e.g. what colors to use for each sample, how to arrange subplots, etc.

Going back to the feature request, it's a bit awkward to try to fit all the concatenated extras into a single dictionary. The issue is that the aggregator file can also have extras, and there needs to be a way to distinguish those keys and the keys identifying the concatenated files. If they both go at the top-level of the dictionary, they would be indistinguishable. If the aggregator file gets it's own special key in the top level dictionary, then the format of the extras dictionary would change depending on whether the file uses meta.concat or not, and that would make it hard to write general-purpose scripts.

I think a better solution is to return two extras dictionaries: one for the main file, and then another for the concatenated (and maybe included?) files. That way, both of these dictionaries would always have the same structure, and none of the keys would be ambiguous. Scripts that want to handle concatenated input files would have to check both sets of extras, and merge them as necessary. I'd rather not make scripts go through this effort, but I think it's unavoidable since there's no general purpose way to merge arbitrary data structures like these.

The signature for load() is already a mess because there are so many optional return values, and this idea would just add another one. See #38 for how I'm thinking about handling that.

@hillierdani
Copy link
Author

I think a better solution is to return two extras dictionaries: one for the main file, and then another for the concatenated (and maybe included?) files. That way, both of these dictionaries would always have the same structure, and none of the keys would be ambiguous. Scripts that want to handle concatenated input files would have to check both sets of extras, and merge them as necessary. I'd rather not make scripts go through this effort, but I think it's unavoidable since there's no general purpose way to merge arbitrary data structures like these.

Thanks very much for sharing your thoughts. What I ended up doing goes along what you suggested:

def parse_multiplate_extras(toml_path: Path):
    """
    Parses extra parameters from each well into a dict.

    Parameters
    ----------
    toml_path: relative or absolute path to aggregator TOML that lists plate TOML files under [concat]

    Returns
    -------
    dict or dicts
    """
    import tomllib
    with open(toml_path, 'rb') as agg:
        plates = tomllib.load(agg)
    if 'meta' not in plates or 'concat' not in plates['meta']:
        # not multiplate, return dict with plate name as key
        cfg, dummy, dummy, extras1, dummy = wellmap.config_from_toml(toml_path)
        if 'expt' not in cfg or 'plate' not in cfg['expt']:
            raise wellmap.LayoutError(f"TOML {toml_path} must have an ['expt'] plate = 'name' section defined.")
        return {cfg['expt']['plate']: extras1}
    pathp = Path(toml_path).parent  # individual plate TOML files are assumed to be specified relative to same directory
    cfg, dummy, dummy, extras1, dummy = wellmap.config_from_toml(toml_path)
    extras = {'concat': extras1}  # store extras specified in the aggregator file
    for pk1, pv1 in plates['meta']['concat'].items():
        extras[pk1] = wellmap.config_from_toml(pathp/pv1)[3]
    return extras

Not sure how elegant this is (not very much) but I have built downstream analysis based on this approach. This becomes a convention in my plate TOML files: if something is defined in the aggregator then it applies to all plates.

 # If multiplate: clone settings from aggregator to plate extras dicts
    extra_keys_clone = ['basic_stats', 'plot']  # normalize can be defined both in aggregator TOML or in each plate, leave that in extras['concat']
    if 'concat' in extras:
        for pk1 in plates:
            if 'normalize' in extras[pk1]:
                raise ValueError(
                    f"Multiplate TOML files must define normalize.groupby in the aggregator TOML file, not in individual plate TOML files. {extras}")
            else:
                extras[pk1].update({ek1: ev1 for ek1, ev1 in extras['concat'].items() if ek1 in extra_keys_clone})

In summary, I adopted this solution to avoid repeating the same parameters across plates, such repeats are always source of evil. I hope this is somewhat instructive to others. This approach still allows defining different parameters in the individual plates. A consistency check of "if defined in aggregator and plate TOML -> throw error" is left to the user for now.

@hillierdani
Copy link
Author

I close this as it became stale and I can live with my own solution sketched above.

@kalekundert
Copy link
Owner

I'm going to keep this issue open, because I do want to eventually handle concatenated extras in the way outlined above.

@kalekundert kalekundert reopened this Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants