In the chapter Point Pattern Analysis, you suggest using `seaborn.kdeplot` to do KDE over a point pattern based on coordinates. However, the result is wrong. The issue is not noticeable only because the bounding box of the dataframe is roughly square; as soon as the dataframe becomes more elongated, the KDE gets stretched along the longer dimension.
The alternative is to use sklearn's `KernelDensity` implementation instead, but that means building the whole thing yourself rather than making a single call to seaborn.
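To give a sense of what "building the whole thing yourself" involves, here is a minimal sketch using `sklearn.neighbors.KernelDensity`. The helper name `isotropic_kde`, the `gridsize` parameter, and the choice to evaluate over the bounding box of the points are my own assumptions for illustration, not something taken from the book. The bandwidth is in the units of the coordinates and has to be chosen by the user.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def isotropic_kde(x, y, bandwidth, gridsize=100):
    """Evaluate a Gaussian KDE that uses one bandwidth for both axes.

    Hypothetical helper: `bandwidth` is in coordinate units and is not
    rescaled per variable, so the kernel stays circular.
    """
    coords = np.column_stack([x, y])
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(coords)

    # Regular grid covering the bounding box of the points.
    xs = np.linspace(coords[:, 0].min(), coords[:, 0].max(), gridsize)
    ys = np.linspace(coords[:, 1].min(), coords[:, 1].max(), gridsize)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.column_stack([xx.ravel(), yy.ravel()])

    # score_samples returns log-density, so exponentiate it.
    density = np.exp(kde.score_samples(grid)).reshape(xx.shape)
    return xx, yy, density
```

The returned arrays can be passed to something like `matplotlib`'s `contourf(xx, yy, density)`. Setting an equal aspect ratio on the axes keeps the plot itself from distorting distances.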
I noticed this reported on Mastodon and figured out what is happening in the following thread. The issue has been reported to seaborn (mwaskom/seaborn#3472), but it boils down to the behavior of scipy's `gaussian_kde`, which estimates the bandwidth independently for each variable. That is wrong in our case, where the bandwidth is linked to the Euclidean distance between the points.
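One way to see the effect is to inspect the kernel that `scipy.stats.gaussian_kde` ends up using on an elongated pattern. The sketch below uses synthetic uniform points (an assumption for illustration, not data from the book) and reads the kernel covariance from the `covariance` attribute, which scipy derives from the data covariance scaled by `factor**2`.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic elongated point pattern: x spans 100 times the range of y.
x = rng.uniform(0, 1000, 500)
y = rng.uniform(0, 10, 500)

# gaussian_kde expects an array of shape (n_dims, n_points).
kde = gaussian_kde(np.vstack([x, y]))

# Kernel standard deviation along each axis, in coordinate units.
kernel_sd = np.sqrt(np.diag(kde.covariance))

# A ratio far from 1 means the kernel is an ellipse stretched along x,
# rather than a circle based on Euclidean distance.
ratio = kernel_sd[0] / kernel_sd[1]
print(kernel_sd, ratio)
```

Because the kernel follows the spread of the data along each axis, the smoothing distance in x is much larger than in y for a pattern like this, which matches the stretching described above.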
I think that until there is a change in scipy that is exposed in seaborn, we should not use or suggest using seaborn.kdeplot on point patterns.