feat(scatter): jittering for category data #19941

Open: wants to merge 8 commits into base branch next

Contributor

@Ovilia Ovilia commented May 16, 2024

Brief Information

This pull request is in the type of:

  • bug fixing
  • new feature
  • others

What does this PR do?

This PR proposes a jittering effect for category axes and single axes, solving #18432 as well as providing a foundation to support the violin series.


It also provides a jitterOverlap option to support a beeswarm-like effect in which scatter points try not to overlap each other.

Screenshot 2024-05-29 at 19 06 23

Fixed issues

#18432

API Changes

New options on the axis. The main one is axis.jitter, a number giving the maximum jitter range in pixels.

axis
    jitter: number, the maximum jitter range in pixels; a point is displaced by at most jitter / 2 on either side of its original position
    jitterOverlap: boolean, whether scatter points may overlap each other
    jitterMargin: number, the minimum gap between scatter points when jitterOverlap is false
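
As a rough illustration of how these options might be combined, the sketch below builds an option object for a scatter series on a category axis. The option names jitter, jitterOverlap and jitterMargin come from the description above; the JitterAxisOption type, the sample data and the chosen values are assumptions made for this example and are not taken from the PR.

```typescript
// Hypothetical, minimal typing for the axis options proposed in this PR.
interface JitterAxisOption {
    type: 'category' | 'value';
    data?: string[];
    jitter?: number;         // max jitter range in pixels
    jitterOverlap?: boolean; // whether points may overlap each other
    jitterMargin?: number;   // gap between points when jitterOverlap is false
}

const xAxis: JitterAxisOption = {
    type: 'category',
    data: ['A', 'B', 'C'],
    jitter: 40,              // assumed value, chosen only for illustration
    jitterOverlap: false,    // ask for the beeswarm-like layout
    jitterMargin: 2
};

const yAxis: JitterAxisOption = { type: 'value' };

const option = {
    xAxis,
    yAxis,
    series: [{
        type: 'scatter',
        // Several points share category 'A', which is where jitter helps.
        data: [['A', 1], ['A', 1.1], ['A', 1.05], ['B', 2], ['C', 2.5]]
    }]
};
```

The resulting object would be passed to chart.setOption(option) in a build that includes this PR.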

Details

Jittering is useful in scatter plots for:

  • Overlapping Data Points: Prevents points with the same or similar values from overlapping.
  • Categorical Axes: Distinguishes points within the same category.
  • Small Sample Sizes: Separates closely clustered data points.
  • Discrete Data: Helps with data naturally falling into specific values (e.g., counts).

Document Info

One of the following should be checked.

  • This PR doesn't relate to document changes
  • The document should be updated later
  • The document changes have been made in apache/echarts-doc@5b6f4ec

Misc

ZRender Changes

  • This PR depends on ZRender changes (ecomfe/zrender#xxx).

Related test cases or examples to use the new APIs

N.A.

Others

Merging options

  • Please squash the commits into a single one when merging.

Other information


echarts-bot bot commented May 16, 2024

Thanks for your contribution!
The community will review it ASAP. In the meantime, please check out the coding standard and the Wiki page on how to make a pull request.

The pull request is marked as PR: author is committer because you are a committer of this project.

Document changes are required in this PR. Please also make a PR to apache/echarts-doc for document changes and update the issue id in the PR description. When the doc PR is merged, the maintainers will remove the PR: awaiting doc label.

@Ovilia Ovilia added this to the 6.0.0 milestone May 16, 2024
@Ovilia Ovilia linked an issue May 16, 2024 that may be closed by this pull request
Contributor

github-actions bot commented May 16, 2024

The changes brought by this PR can be previewed at: https://echarts.apache.org/examples/editor?version=PR-19941@e9fdf5f

Ovilia added a commit to apache/echarts-doc that referenced this pull request May 17, 2024
@echarts-bot echarts-bot bot added PR: doc ready and removed PR: awaiting doc labels May 17, 2024
Ovilia added a commit to apache/echarts-examples that referenced this pull request May 20, 2024
Ovilia added a commit to apache/echarts-doc that referenced this pull request May 20, 2024
@Ovilia Ovilia marked this pull request as draft May 23, 2024 06:44

xyy7260 commented May 28, 2024

@Ovilia How long will it be until this is released?

Contributor Author

Ovilia commented May 29, 2024

@xyy7260 This feature is planned for ECharts 6.0, which is expected to be released in the first quarter of 2025. If you are interested in using it before then, you may fork the repository after this PR is merged and use your own build locally.

@pull-request-size pull-request-size bot added size/L and removed size/S labels May 29, 2024
@Ovilia Ovilia marked this pull request as ready for review May 30, 2024 11:36

xyy7260 commented Jun 5, 2024

@Ovilia I have never tried merging something ahead of a release. If I want to merge it early, where can I find this fork of yours?

Contributor Author

Ovilia commented Jun 5, 2024

> @Ovilia I have never tried merging something ahead of a release. If I want to merge it early, where can I find this fork of yours?

Step 1: Clone the repository or update your local repository with the latest changes.

git pull origin next

Step 2: Switch to the base branch of the pull request.

git checkout next

Step 3: Merge the head branch into the base branch.

git merge feat-scatter


xyy7260 commented Jun 6, 2024

@Ovilia OK

const baseAxis = coordinateSystem.getBaseAxis();
const { type: scaleType } = baseAxis.scale;
const seriesValid = coordType === 'cartesian2d'
&& (scaleType === 'category' || scaleType === 'ordinal')
Member

scaleType can never be 'category'.
Only 'ordinal', 'interval', 'log' and 'time' are possible.


export function needFixJitter(seriesModel: SeriesModel, axis: Axis): boolean {
const { coordinateSystem } = seriesModel;
const { type: coordType } = coordinateSystem;
Member

A series instance does not necessarily have a coordinateSystem instance.
It would be better to add a null check here.
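
The two review comments above (no 'category' scale type, and a possibly missing coordinateSystem) could be addressed along the following lines. This is only a sketch: the SeriesLike, CoordSysLike, AxisLike and ScaleLike interfaces are simplified stand-ins invented for the example, the function name is hypothetical, and it covers only the cartesian2d branch quoted in the diff.

```typescript
// Simplified stand-ins for the ECharts model classes (assumptions for this sketch).
interface ScaleLike { type: string; }
interface AxisLike { scale: ScaleLike; }
interface CoordSysLike { type: string; getBaseAxis?: () => AxisLike; }
interface SeriesLike { coordinateSystem?: CoordSysLike | null; }

function needFixJitterSketch(seriesModel: SeriesLike): boolean {
    const coordSys = seriesModel.coordinateSystem;
    // Null check: a series does not necessarily have a coordinate system.
    if (!coordSys || !coordSys.getBaseAxis) {
        return false;
    }
    const baseAxis = coordSys.getBaseAxis();
    // Scale types are 'ordinal', 'interval', 'log' or 'time';
    // a category axis uses an 'ordinal' scale, so only that is tested.
    return coordSys.type === 'cartesian2d' && baseAxis.scale.type === 'ordinal';
}
```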

}
}
return y;
}
Member

Considering the bundle size, should we introduce this kind of feature on demand rather than including it by default?
That is, let users import it manually.
I'm not sure yet 🤔.
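
One possible shape for such an opt-in design is sketched below. Everything here is hypothetical: installJitter, applyJitter and the JitterFn type are names made up for the illustration and are not ECharts APIs. The idea is simply that the core code path calls through a hook that does nothing unless the user has installed the jitter implementation.

```typescript
// Hypothetical hook type: maps a coordinate and a jitter range to a new coordinate.
type JitterFn = (coord: number, jitter: number) => number;

// Not installed by default, so the implementation can be left out of the core bundle.
let jitterImpl: JitterFn | null = null;

// Called by users who explicitly opt in to the feature.
function installJitter(impl: JitterFn): void {
    jitterImpl = impl;
}

// Core code path: leaves the coordinate untouched when nothing is installed.
function applyJitter(coord: number, jitter: number): number {
    return jitterImpl ? jitterImpl(coord, jitter) : coord;
}
```

With this layout, a bundler can drop the jitter code when installJitter is never imported, at the cost of one extra step for users who want the feature.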

minFloat = fixJitterIgnoreOverlaps(floatCoord, jitter);
}

items.push({ fixedCoord, floatCoord: minFloat, r: radius });
Member

@100pah 100pah Nov 11, 2024

I think that implicitly attaching a large array to the input fixedAxis and pushing data items into it is probably not good practice and is error-prone.

The fixedAxis instance happens to be recreated on each pass through the rendering pipeline, but that is not guaranteed. If someone changes that behavior and reuses the axis instance, this implementation will produce errors that are hard to track down.

In theory, a util function should be a pure function (one that does not modify its inputs), especially when its name conventionally implies that. (A cache can sometimes break this principle, but the memory cost must be handled very carefully.) (edit at Nov 13, 2024)
In this scenario I think this code structure has another drawback: it makes the time complexity of the algorithm O(n^2) (caused by `placeJitterOnDirection`), whereas I think it is supposed to be O(n). (edit at Nov 13, 2024)
So I think the jitter fixing should be performed in the layout phase rather than in the final render phase. That is, it should not be called in `SymbolDraw` but somewhere like `layout/points.ts`. In that layout phase, we can perform the jitter in or around the loop logic, preparing an auxiliary array before the loop that calls `placeJitterOnDirection`, without needing to make it persistent. (edit at Nov 13, 2024)

(edit at Nov 13, 2024)
I misunderstood the algorithm earlier.
With multiple series in play, an axis-based store is needed, as currently implemented, and O(n) does not seem possible.

But there are suggestions about the code structure (in my personal opinion):

  • Instead of using inner to mount a store on the axis instance, declare the store property on the axis explicitly to make it noticeable, and add a comment saying that the lifetime of the store is a concern if modifications are needed in the future. For example, create a TS interface like

    interface JitterStorable {
        // some comment about the lifetime.
        jitterStore: JitterData[]
    }
    class Axis2D extends Axis implements JitterStorable { /* ... */ }
    class SingleAxis extends Axis implements JitterStorable { /* ... */ }
  • Prefer adding a new processor with the priority registers.PRIORITY.VISUAL.POST_CHART_LAYOUT to perform the jitter, instead of doing it in SymbolDraw as it is now, which is literally and conventionally not the place for layout work. Also, if other components need the layout data, it is hard for them to get the final accurate data if SymbolDraw modifies it afterwards.
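
To make the lifetime concern concrete, here is a minimal sketch of an explicitly declared store that is reset at the start of every layout pass. AxisSketch, runJitterLayout and the place callback are hypothetical names used only for this illustration; how such a pass would actually be registered as an ECharts processor is not shown.

```typescript
interface JitterData { fixedCoord: number; floatCoord: number; r: number; }

// Explicit, visible store property instead of a hidden inner() store.
interface JitterStorable {
    // Lifetime: valid only within a single layout pass; must be reset before each pass.
    jitterStore: JitterData[];
}

// Stand-in for Axis2D / SingleAxis (assumption for this sketch).
class AxisSketch implements JitterStorable {
    jitterStore: JitterData[] = [];
}

// Hypothetical layout-phase pass: clear each store, then place the points on that axis.
function runJitterLayout(axes: AxisSketch[], place: (axis: AxisSketch) => void): void {
    for (let i = 0; i < axes.length; i++) {
        axes[i].jitterStore = []; // drop stale data even if the axis instance is reused
        place(axes[i]);
    }
}
```

Because the reset is part of the pass itself, the result no longer depends on the axis instance being recreated for every render.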

@@ -81,6 +81,9 @@ export interface AxisBaseOptionCommon extends ComponentOption,
*/
max?: ScaleDataValue | 'dataMax' | ((extent: {min: number, max: number}) => ScaleDataValue);

jitter?: number;
Member

@100pah 100pah Nov 13, 2024

  • (Readability issue) The meaning of the jitter option cannot be inferred from its name, and there is no comment here.

  • I think we had better support automatic calculation of the jitter boundary (radius) based on the tick span of the axis. Otherwise, if the chart is resizable, or if the category axis has many values, users may not be able to find an appropriate pixel value for the boundary radius. With a wrong jitter value, some of the resulting coords might be inappropriate (e.g. negative, or outside the cartesian and therefore not drawn). If auto calculation is supported, users can simply configure it as

    • xAxis: { jitter: true } to get a perfect result, where jitter radius is auto calculated by tick span.
    • xAxis: { jitter: { radius: 400, overlap: false }} to set as 400px.
    • xAxis: { jitter: { radius: '80%', overlap: false }} to get 80% of the tick span.
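
A resolver for the option shapes proposed above might look like the following sketch. The JitterOption and ResolvedJitter types and the resolveJitter function are hypothetical; in particular, using the full tick span for jitter: true and defaulting overlap to true are assumptions of this example, not decisions made in the thread.

```typescript
// Hypothetical option shape following the suggestion above.
type JitterOption = boolean | number | { radius?: number | string; overlap?: boolean };

interface ResolvedJitter { radius: number; overlap: boolean; }

// tickSpan: pixel distance between two adjacent ticks of the category axis.
function resolveJitter(opt: JitterOption | undefined, tickSpan: number): ResolvedJitter | null {
    if (opt == null || opt === false) {
        return null; // jitter disabled
    }
    if (opt === true) {
        // Assumed default: derive the radius from the tick span.
        return { radius: tickSpan, overlap: true };
    }
    if (typeof opt === 'number') {
        return { radius: opt, overlap: true }; // plain pixel value
    }
    let radius = tickSpan;
    const raw = opt.radius;
    if (typeof raw === 'number') {
        radius = raw;
    }
    else if (typeof raw === 'string' && raw.slice(-1) === '%') {
        // Percentages are taken relative to the tick span.
        radius = parseFloat(raw) * tickSpan / 100;
    }
    return { radius, overlap: opt.overlap !== false };
}
```

Resolving against the tick span at layout time means the effective radius follows the chart when it is resized.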

}

function fixJitterIgnoreOverlaps(floatCoord: number, jitter: number): number {
return floatCoord + (Math.random() - 0.5) * jitter;
Member

I think the result also needs to be clamped to the coordinate system boundary; otherwise, invalid output coords will cause some points not to be drawn, which is hard for developers to notice. The same goes for the overlap processing.
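
A clamped variant of the random jitter could be sketched as follows. The function name jitterClamped, the extent parameter and the injectable rand argument are assumptions added for the example; the PR's fixJitterIgnoreOverlaps takes only floatCoord and jitter.

```typescript
// extent: pixel range of the coordinate system along the jittered direction.
// rand is injectable so that the function can be tested deterministically.
function jitterClamped(
    floatCoord: number,
    jitter: number,
    extent: [number, number],
    rand: () => number = Math.random
): number {
    // Pixel extents may be given in either order, so normalize them first.
    const lo = Math.min(extent[0], extent[1]);
    const hi = Math.max(extent[0], extent[1]);
    // Same displacement as in the diff: uniform in [-jitter / 2, jitter / 2).
    const moved = floatCoord + (rand() - 0.5) * jitter;
    // Keep the result inside the drawable area.
    return Math.min(hi, Math.max(lo, moved));
}
```

Note that plain clamping piles points up on the boundary; shrinking the jitter range near the edges would be an alternative, at the cost of a less uniform spread.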

const newY = item.floatCoord + Math.sqrt(r * r - dx * dx) * direction;
if (direction > 0 && newY > y || direction < 0 && newY < y) {
y = newY;
i = 0; // Back to check from the first item.
Member

I think that if the items are ordered by floatCoord, the backtracking (i = 0) here is not necessary. With this backtracking, the entire algorithm could degrade to O(n^3) in the worst case; without it, it stays at O(n^2).

Here is an attempt at modifying it (with a linked list sorted in ascending order by floatCoord):

export type JitterData = {
    fixedCoord: number;
    floatCoord: number;
    r: number;
    next: JitterData | null;
    prev: JitterData | null;
};

// Items is a circular linked list, in the ascending order by floatCoord.
const inner = makeInner<{ items: JitterData }, Axis2D | SingleAxis>();

function fixJitterAvoidOverlaps(
    fixedAxis: Axis2D | SingleAxis,
    fixedCoord: number,
    floatCoord: number,
    radius: number,
    jitter: number,
    margin: number
): number {
    const store = inner(fixedAxis);
    if (!store.items) {
        store.items = {
            fixedCoord: -1,
            floatCoord: -1,
            r: -1,
            next: null, // head of a link list
            prev: null, // tail of a link list
        };
        store.items.next = store.items;
        store.items.prev = store.items;
    }
    const items = store.items;

    const overlapA = placeJitterOnDirection(items, fixedCoord, floatCoord, radius, jitter, margin, 1);
    const overlapB = placeJitterOnDirection(items, fixedCoord, floatCoord, radius, jitter, margin, -1);
    const overlapResult = Math.abs(overlapA.resultCoord - floatCoord) < Math.abs(overlapB.resultCoord - floatCoord)
        ? overlapA : overlapB;
    let minFloat = overlapResult.resultCoord;
    if (Math.abs(minFloat - floatCoord) > jitter / 2) {
        // If the new item is moved too far, then give up.
        // Fall back to random jitter.
        minFloat = fixJitterIgnoreOverlaps(floatCoord, jitter);
    }

    // Insert to store
    const insertBy = overlapResult.insertBy;
    const resultDirection = overlapResult.direction;
    const pointer1 = resultDirection > 0 ? 'next' : 'prev';
    const pointer2 = resultDirection > 0 ? 'prev' : 'next';
    const newItem: JitterData = {
        fixedCoord: fixedCoord,
        floatCoord: minFloat, // store the coordinate that is actually returned (may be the random fallback)
        r: radius,
        next: null,
        prev: null,
    };
    newItem[pointer1] = insertBy[pointer1];
    newItem[pointer2] = insertBy;
    insertBy[pointer1][pointer2] = newItem;
    insertBy[pointer1] = newItem;

    return minFloat;
}

function placeJitterOnDirection(
    items: JitterData,
    fixedCoord: number,
    floatCoord: number,
    radius: number,
    jitter: number,
    margin: number,
    direction: 1 | -1
): {
    resultCoord: number;
    insertBy: JitterData;
    direction: 1 | -1;
} {
    // Check for overlap with previous items.
    let y = floatCoord;
    const pointer1 = direction > 0 ? 'next' : 'prev';
    let insertBy = items;
    let item = items[pointer1];

    while (item !== items) {
        const dx = fixedCoord - item.fixedCoord;
        const dy = y - item.floatCoord;
        const d2 = dx * dx + dy * dy;
        const r = radius + item.r + margin;
        if (d2 < r * r) {
            // Overlap. Try to move the new item along otherCoord direction.
            y = item.floatCoord + Math.sqrt(r * r - dx * dx) * direction;
            insertBy = item;

            if (Math.abs(y - floatCoord) > jitter / 2) {
                // If the new item is moved too far, then give up.
                // Fall back to random jitter.
                return {resultCoord: Number.MAX_VALUE, insertBy, direction};
            }
        }

        item = item[pointer1];
    }

    return {resultCoord: y, insertBy, direction};
}
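
The displacement step inside placeJitterOnDirection is plain circle geometry: the new point is moved along the float direction until the distance between the two centres equals the sum of the radii plus the margin. The helper below isolates that step for illustration; pushOut and the Placed type are hypothetical names, and the formula is the one used in the code above.

```typescript
interface Placed { fixedCoord: number; floatCoord: number; r: number; }

// Returns the float coordinate at which a new circle just touches `item`
// (with `margin` between them) on the side given by `direction`.
function pushOut(
    item: Placed,
    fixedCoord: number,
    radius: number,
    margin: number,
    direction: 1 | -1
): number {
    const dx = fixedCoord - item.fixedCoord;
    const r = radius + item.r + margin;
    // Callers must have detected an overlap first, which guarantees |dx| < r;
    // otherwise the square root would be NaN.
    return item.floatCoord + Math.sqrt(r * r - dx * dx) * direction;
}
```

For example, with an existing item of radius 3 at (0, 0) and a new point of radius 2 at fixed coordinate 3, the required centre distance is 5, so the new point lands at float coordinate 4 or -4 depending on the direction.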

Projects
Status: Almost Done
Development

Successfully merging this pull request may close these issues.

[Feature] add jitter option for scatter plots
3 participants