[proposal] Towards using JQ only for complicated cases #241

Closed
lquerel opened this issue Jul 13, 2024 · 0 comments · Fixed by #246
Labels
enhancement New feature or request

lquerel commented Jul 13, 2024

Status: This proposal is abandoned in favor of #246

Rationale

JQ is an expressive, general-purpose language for filtering, projecting, sorting, grouping, and computing over structured objects. However, it has a significant learning curve, which is hard to justify for the simplest processing tasks on semantic convention registries. This GitHub issue proposes an alternative that can replace JQ for simple use cases and complement JQ expressions for complex ones.

Proposal

This alternative takes the following form:

registry_processing:
  stages:
    - retain_groups_if_any: 
        id_matches: regex                        # optional
        types_in: [GroupType]                    # optional
        is_deprecated: bool                      # optional
        stability_in: [Stability]                # optional
        has_no_attributes: bool                  # optional
    - remove_groups_if_any: 
        id_matches: regex                        # optional
        types_in: [GroupType]                    # optional
        is_deprecated: bool                      # optional
        stability_in: [Stability]                # optional
        has_no_attributes: bool                  # optional
    - retain_attributes_if_any:
        name_matches: regex                      # optional
        is_deprecated: bool                      # optional
        stability_in: [Stability]                # optional
    - remove_attributes_if_any: 
        name_matches: regex                      # optional
        is_deprecated: bool                      # optional
        stability_in: [Stability]                # optional
    - sort_groups_by:
        field: id
        order_by: asc|desc                       # default = asc

  group_by:
    field: namespace

This sequential format is simple, reuses semconv concepts, and is free of control structures such as if/then/else and loops. For a detailed description, see the following section.

Note: Any of the field values described in the YAML document above can be either a constant value or a dynamic value defined in the params section of the weaver.yaml. As a reminder, the params section can be updated via the command line, providing a simple mechanism to drive the generation directly from the CLI.

Explanation:

Registry Processing Pipeline

The registry processing pipeline provides a structured and straightforward way to filter, retain, remove, sort, and group semantic convention groups inside a registry without the need for complex JQ expressions. It is designed to simplify common processing tasks for semantic convention registries, making it more accessible for users who may not be familiar with JQ.

Stages

The pipeline consists of multiple stages that are executed sequentially. Each stage specifies conditions for retaining or removing groups and attributes based on various criteria. The conditions within each stage use OR logic, meaning that if any condition in a stage is met, the action (retain or remove) is applied. The stages themselves are linked with AND logic, meaning that all stages are applied in sequence to the registry groups.

  • retain_groups_if_any: Retains groups if any of the specified conditions are met. Conditions can include matching the group's ID with a regex, being of certain types, being deprecated, having specific stability levels, or having no attributes.
  • remove_groups_if_any: Removes groups if any of the specified conditions are met. The same conditions as for retain_groups_if_any can be specified.
  • retain_attributes_if_any: Retains attributes if any of the specified conditions are met. Conditions can include matching the attribute's name with a regex, being deprecated, or having specific stability levels.
  • remove_attributes_if_any: Removes attributes if any of the specified conditions are met. The same conditions as for retain_attributes_if_any can be specified.
  • sort_groups_by: Specifies the field by which the groups are sorted. The field key defines the field to sort by (e.g., id), and the order_by key specifies the sort order (asc for ascending, desc for descending). The default order is ascending. Multiple sort_groups_by stages can be applied sequentially; the sort is stable, so groups that compare equal keep their relative order from the previous stage.
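The stage semantics described above (OR within a stage, stages applied one after another) can be sketched in Python. This is only an illustration of the proposal, not Weaver's implementation: the function names, the dictionary shape of groups and attributes, the use of an unanchored regex search, and the reading of is_deprecated as "a deprecated field is present" are all assumptions made for the example.

```python
import re

def group_matches_any(group, cond):
    """True if the group meets at least one listed condition (OR logic)."""
    checks = []
    if "id_matches" in cond:  # assumption: unanchored regex search
        checks.append(re.search(cond["id_matches"], group["id"]) is not None)
    if "types_in" in cond:
        checks.append(group["type"] in cond["types_in"])
    if "is_deprecated" in cond:  # assumption: deprecated == field is present
        checks.append(("deprecated" in group) == cond["is_deprecated"])
    if "stability_in" in cond:
        checks.append(group.get("stability") in cond["stability_in"])
    if "has_no_attributes" in cond:
        checks.append((not group.get("attributes")) == cond["has_no_attributes"])
    return any(checks)

def attribute_matches_any(attr, cond):
    """Same OR logic, for the attribute-level conditions."""
    checks = []
    if "name_matches" in cond:
        checks.append(re.search(cond["name_matches"], attr["name"]) is not None)
    if "is_deprecated" in cond:
        checks.append(("deprecated" in attr) == cond["is_deprecated"])
    if "stability_in" in cond:
        checks.append(attr.get("stability") in cond["stability_in"])
    return any(checks)

def filter_attributes(group, keep):
    """Copy of the group keeping only attributes for which keep(attr) is True."""
    return {**group, "attributes": [a for a in group.get("attributes", []) if keep(a)]}

def run_stages(groups, stages):
    """Apply stages in order; each stage sees the output of the previous one."""
    for stage in stages:
        (name, arg), = stage.items()  # each stage is a single-key mapping
        if name == "retain_groups_if_any":
            groups = [g for g in groups if group_matches_any(g, arg)]
        elif name == "remove_groups_if_any":
            groups = [g for g in groups if not group_matches_any(g, arg)]
        elif name == "retain_attributes_if_any":
            groups = [filter_attributes(g, lambda a: attribute_matches_any(a, arg)) for g in groups]
        elif name == "remove_attributes_if_any":
            groups = [filter_attributes(g, lambda a: not attribute_matches_any(a, arg)) for g in groups]
        elif name == "sort_groups_by":
            descending = arg.get("order_by", "asc") == "desc"
            # sorted() is stable, also with reverse=True
            groups = sorted(groups, key=lambda g: g[arg["field"]], reverse=descending)
        else:
            raise ValueError(f"unknown stage: {name}")
    return groups

# Made-up sample data, only to exercise the sketch.
groups = [
    {"id": "metric.http.b", "type": "metric",
     "attributes": [{"name": "http.method", "stability": "stable"}]},
    {"id": "metric.http.a", "type": "metric",
     "attributes": [{"name": "http.x", "stability": "experimental"}]},
    {"id": "span.http", "type": "span",
     "attributes": [{"name": "http.method", "stability": "stable"}]},
    {"id": "metric.db.a", "type": "metric",
     "attributes": [{"name": "db.system", "stability": "stable"}]},
]
stages = [
    {"retain_groups_if_any": {"types_in": ["metric"]}},
    {"remove_attributes_if_any": {"stability_in": ["experimental"]}},
    {"remove_groups_if_any": {"has_no_attributes": True}},
    {"sort_groups_by": {"field": "id"}},
]
result = run_stages(groups, stages)
print([g["id"] for g in result])  # ['metric.db.a', 'metric.http.b']
```

Because the sort is stable, two sort_groups_by stages in a row (first by id, then by another field) would order groups by the second field with ties broken by id.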

Grouping

After filtering, retaining, removing, and sorting the groups and attributes, the registry can be further organized by grouping:

  • group_by: Specifies a field for grouping the registry items. For example, grouping by namespace organizes the registry items based on their namespace.
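As a rough sketch of what group_by could produce, the Python below buckets groups by a field while preserving the order established by the earlier stages. The proposal does not specify the output shape or how a group's namespace is derived, so the mapping returned here and the explicit namespace key on each group are assumptions for illustration.

```python
def group_by(groups, field):
    """Bucket groups by the value of `field`.

    Buckets appear in first-seen order, and groups keep their incoming
    (already filtered and sorted) order inside each bucket.
    """
    buckets = {}
    for group in groups:
        buckets.setdefault(group[field], []).append(group)
    return buckets

# Made-up sample data; the `namespace` key is assumed to be present.
groups = [
    {"id": "registry.db", "namespace": "db"},
    {"id": "registry.http.client", "namespace": "http"},
    {"id": "registry.http.server", "namespace": "http"},
]
grouped = group_by(groups, "namespace")
for namespace, members in grouped.items():
    print(namespace, [g["id"] for g in members])
```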

Combining with JQ Expressions

JQ expressions can still be used to manage complex processing that goes beyond the capabilities of this YAML-based pipeline. It is also possible to combine the simple registry_processing with JQ expressions. If both registry_processing and JQ expressions are present, the registry_processing is applied first, followed by the JQ expression. This allows for a flexible and powerful processing pipeline that leverages the simplicity of YAML for common tasks and the power of JQ for more complex scenarios.
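The order of application can be pictured with a small Python sketch. Both steps are represented by plain callables standing in for the real YAML pipeline and JQ engine; the function names and the {"groups": [...]} document shape passed to the JQ step are assumptions, not Weaver APIs.

```python
def process_template_input(groups, registry_processing=None, jq_filter=None):
    """Run the YAML pipeline first, then hand its output to the JQ step."""
    if registry_processing is not None:
        groups = registry_processing(groups)
    document = {"groups": groups}
    return jq_filter(document) if jq_filter is not None else document

# Stand-in for a registry_processing pipeline that retains metric groups.
def keep_metrics(groups):
    return [g for g in groups if g["type"] == "metric"]

# Stand-in for a JQ expression such as `.groups | map(.id)`.
def project_ids(document):
    return [g["id"] for g in document["groups"]]

groups = [
    {"id": "metric.a", "type": "metric"},
    {"id": "span.a", "type": "span"},
]
output = process_template_input(groups, keep_metrics, project_ids)
print(output)  # ['metric.a']
```

Either step can be omitted: with no JQ expression the filtered groups are passed through unchanged, and with no registry_processing the JQ step sees the full set of groups.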

Below is an example of a projection performed with JQ (which cannot be defined with registry_processing), combined with filtering expressed using registry_processing.

templates:
  - pattern: metrics/metrics.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [metric]
        - remove_attributes_if_any:
            stability_in: [experimental]
        - remove_groups_if_any:
            has_no_attributes: true
        - sort_groups_by:
            field: id
    filter: >
      .groups
      | map({
          prefix: .[0].id | split(".") | .[1],
          groups: .
      })
    application_mode: each

Top-level registry_processing by default

In most cases, the user wants to control the generation through parameters: producing a registry with or without experimental items, with or without certain groups, and sorted alphabetically. To support this, the proposal defines the following default value for the top-level registry_processing:

registry_processing:
  stages:
    - remove_attributes_if_any:
        stability_in: $exclude_stability
    - remove_groups_if_any:
        has_no_attributes: true
    - remove_groups_if_any:
        id_matches: $exclude_group_ids
    - sort_groups_by:
        field: id

If the params $exclude_stability and $exclude_group_ids are not set, then the registry_processing will be equivalent to:

registry_processing:
  stages:
    - remove_groups_if_any:
        has_no_attributes: true
    - sort_groups_by:
        field: id
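The reduction above can be sketched as a parameter-resolution pass over the stages. This is a guess at the mechanics rather than a description of Weaver: the resolve_params helper, the treatment of any string starting with `$` as a parameter reference, and the rule that a stage left with no conditions is dropped are all assumptions.

```python
def resolve_params(stages, params):
    """Substitute `$name` values from params; drop conditions whose param is unset."""
    resolved = []
    for stage in stages:
        (name, arg), = stage.items()  # each stage is a single-key mapping
        kept = {}
        for key, value in arg.items():
            if isinstance(value, str) and value.startswith("$"):
                param_name = value[1:]
                if param_name not in params:
                    continue  # unset parameter: the condition disappears
                value = params[param_name]
            kept[key] = value
        if kept:  # a stage with no remaining conditions is omitted
            resolved.append({name: kept})
    return resolved

default_stages = [
    {"remove_attributes_if_any": {"stability_in": "$exclude_stability"}},
    {"remove_groups_if_any": {"has_no_attributes": True}},
    {"remove_groups_if_any": {"id_matches": "$exclude_group_ids"}},
    {"sort_groups_by": {"field": "id"}},
]

# No parameters set: only the two parameter-free stages remain.
print(resolve_params(default_stages, {}))

# One parameter set (for example from the command line): its stage is kept.
print(resolve_params(default_stages, {"exclude_stability": ["experimental"]}))
```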

Benefits

This processing pipeline definition format is simpler to understand and master than JQ. It covers most common needs, and JQ remains available as a fallback for the most complex cases.

Defining a default pipeline at the top level allows for further simplification. In most cases, users will only need to define the type of group they want to inject into a Jinja template (see example below).

# Insert here definition of text_maps, annotation, ...
    
templates:
  - pattern: attributes.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [attribute_group]
    application_mode: each
  - pattern: metrics.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [metric]     
    application_mode: each
  - pattern: span.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [span]     
    application_mode: each

With this configuration file, the three templates will be populated with the corresponding groups. Depending on the parameters passed to the command line, these groups may or may not contain experimental attributes, some of these groups may be excluded, and the groups will be sorted.

At any time, the user can redefine the default processing pipeline and even combine it with JQ if necessary.

Examples

An example weaver.yaml that configures the inputs of two templates:

# Common operations applied to all templates
# - Experimental attributes are removed.
# - Groups without attributes are removed.
# - Groups are sorted by id.
registry_processing:
  stages:
    - remove_attributes_if_any:
        stability_in: [experimental]
    - remove_groups_if_any:
        has_no_attributes: true
    - sort_groups_by:
        field: id
    
templates:
  - pattern: attributes/attributes.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [attribute_group]
        - retain_groups_if_any:
            id_matches: "registry\\."       
    application_mode: each
  - pattern: metrics/metrics.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [metric]     
    application_mode: each

The same configuration, with the groups additionally grouped by namespace:

# Common operations applied to all templates
# - Experimental attributes are removed.
# - Groups without attributes are removed.
# - Groups are sorted by id.
registry_processing:
  stages:
    - remove_attributes_if_any:
        stability_in: [experimental]
    - remove_groups_if_any:
        has_no_attributes: true
    - sort_groups_by:
        field: id
    
templates:
  - pattern: attributes/attributes.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [attribute_group]
        - retain_groups_if_any:
            id_matches: "registry\\."
      group_by:
        field: namespace        
    application_mode: each
  - pattern: metrics/metrics.j2
    registry_processing:
      stages:
        - retain_groups_if_any:
            types_in: [metric]
      group_by:
        field: namespace        
    application_mode: each
@lquerel lquerel added the enhancement New feature or request label Jul 13, 2024
@lquerel lquerel moved this to Todo in OTel Weaver Project Jul 13, 2024
@lquerel lquerel linked a pull request Jul 17, 2024 that will close this issue
@lquerel lquerel moved this from Todo to In Progress in OTel Weaver Project Jul 17, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in OTel Weaver Project Jul 22, 2024