Find Distance at Which Value Changes by X%? #125

Open
patrickcgray opened this issue Feb 23, 2020 · 5 comments

@patrickcgray

Hello, I've been really impressed with the pysal library and especially the documentation I've found in the ongoing Geographic Data Science Book: https://geographicdata.science/book/.

But I have a question that hasn't been addressed in the resources I've found. I have a set of polygons, and each polygon has a continuous value from 0 to 1. I'm looking to find the average distance at which this value typically changes by a certain amount, let's say 10%. Is there a clean way to do this in pysal? If pysal doesn't have this functionality, is there a name for this type of problem so that I can research it further?

Basically, what I'm hoping to determine is: even if I don't have data at a given location, how far away can a polygon that does have data be for its value to still count as the value at that location?

@sjsrey (Member) commented Feb 23, 2020

Thanks for the kind words on PySAL.

On your question, I think there are a couple of things to explore. If you generate a representative point for each polygon [1], those points can be used to measure distances between each pair of polygons. With the pairwise distances in hand, you can then measure the pairwise differences in the attribute values and plot the differences as a function of distance.

One thing to keep in mind is that the spatial support you are starting with is areal units, in which case the attribute value applies to the unit as a whole. In other words, your attribute values are not continuous in space, so plotting differences as a function of distance might give the misleading impression that the attributes are measured in continuous space. With that qualification, I think the approach above can get you started toward an answer.

1: http://darribas.org/gds_scipy16/ipynb_md/01_data_processing.html
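A minimal sketch of this approach, using synthetic coordinates and values in place of real data (for a GeoDataFrame, `gdf.representative_point()` would supply the points):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Synthetic stand-ins: 50 representative points and a 0-1 attribute.
rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(50, 2))
values = rng.uniform(0, 1, size=50)

# Pairwise distances between representative points, and the matching
# pairwise absolute differences in the attribute (a "variogram cloud").
distances = pdist(points)
differences = pdist(values[:, None], metric="cityblock")

# e.g. the average separation among pairs that differ by more than 10%:
changed = differences > 0.10
print(distances[changed].mean())
```

Plotting `differences` against `distances` (or binning the distances and averaging within bins, as in an empirical variogram) shows how quickly the attribute typically changes with separation.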

@ljwolf (Member) commented Mar 2, 2020

This pattern, computing a function centered at i that reduces over some d-neighbourhood around i (possibly conditional on the region between that neighbourhood and a smaller one), is a really common problem, but we don't have anything for it directly....

Maybe we should have something like that? I did this here, but never contributed it back up. For the percentage change, you'd get a different curve for each observation?
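As an illustration of the pattern (not an existing esda API; the names here are hypothetical), one curve per observation can be sketched with scipy's KDTree by reducing over each d-neighbourhood at a range of radii:

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic points and a 0-1 attribute.
rng = np.random.default_rng(1)
points = rng.uniform(0, 10, size=(30, 2))
values = rng.uniform(0, 1, size=30)
tree = cKDTree(points)

def neighbourhood_curve(i, radii, reduce=np.mean):
    """Apply `reduce` over the values in each d-neighbourhood of
    observation i, one radius at a time, yielding i's curve."""
    return np.array([
        reduce(values[tree.query_ball_point(points[i], r)])
        for r in radii
    ])

radii = np.linspace(0.5, 5.0, 10)
curve = neighbourhood_curve(0, radii)  # the curve for observation 0
```

Swapping `reduce` for another function (say, `np.median`) gives a different family of curves with no change to the neighbourhood machinery.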

@sjsrey (Member) commented Jul 3, 2020

Relocating to esda

@sjsrey sjsrey transferred this issue from pysal/pysal Jul 3, 2020
@ljwolf (Member) commented Jul 19, 2023

Reflecting on this problem, I think there may be some data structures we can adapt to provide this functionality easily. I also wonder whether the conga line could be adapted here. The basic implementation would be something like:

```python
import numpy

def spatial_cumulant(locations, values, threshold, return_values=True):
    """
    For each location, find the distance `d` at which the sum
    of `values` at `locations` closer than `d` exceeds `threshold`.
    """
    # hypothetical incremental nearest-neighbour structure
    tree = build_incremental_knn_structure(locations)
    complete = numpy.zeros(len(values), dtype=bool)
    k_vector = numpy.zeros(len(values), dtype=int)
    cumulant = numpy.zeros_like(values)
    d_near = numpy.zeros(len(values))
    while not complete.all():
        # distance and index of each location's k_vector-th neighbour
        d_next, ix_near = tree.next_nearest(locations, k=k_vector)
        active = ~complete  # only accumulate where the threshold is unmet
        cumulant[active] += values[ix_near[active]]
        d_near[active] = d_next[active]
        complete |= cumulant > threshold
        k_vector[active] += 1
    if return_values:
        return d_near, cumulant
    return d_near
```

where the next_nearest() function returns the distance and index of just the k_vector-th neighbour. If a cached_query() function were available that returned all k neighbours efficiently (i.e. without re-doing the work for the first k-1), we could then implement arbitrary reductions (like numpy.median()), rather than only sums.

So... with a suitable data structure, this should be very easy to do!
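Pending such a structure, the sketch can be approximated with scipy's existing KDTree by querying one additional neighbour per round. This redoes query work at each k, unlike a true incremental search, but shows the intended semantics:

```python
import numpy as np
from scipy.spatial import cKDTree

def spatial_cumulant(locations, values, threshold):
    """For each location, the distance at which the running sum of
    `values` over increasingly distant neighbours first exceeds
    `threshold` (approximated by repeated k-nearest queries)."""
    tree = cKDTree(locations)
    n = len(values)
    cumulant = np.zeros(n)
    d_near = np.zeros(n)
    complete = np.zeros(n, dtype=bool)
    for k in range(1, n + 1):  # k=1 is each point itself
        # query only the k-th neighbour of every location
        d, ix = tree.query(locations, k=[k])
        d, ix = d.ravel(), ix.ravel()
        active = ~complete
        cumulant[active] += values[ix[active]]
        d_near[active] = d[active]
        complete |= cumulant > threshold
        if complete.all():
            break
    return d_near, cumulant

# Synthetic demonstration data.
rng = np.random.default_rng(2)
locations = rng.uniform(0, 1, size=(20, 2))
values = rng.uniform(0, 1, size=20)
d_near, cumulant = spatial_cumulant(locations, values, threshold=1.0)
```

Passing `k=[k]` to `cKDTree.query` returns only the k-th neighbour's distance and index, which is what `next_nearest()` is imagined to do above, just without the caching.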

@ljwolf (Member) commented Jul 19, 2023

Such a data structure is in CGAL, which includes SWIG bindings that cover incremental spatial search, but I don't see these wrapped by scikit-geometry yet...
