Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function _semicircle_integral in class CrystalNN gives unwanted results #3973

Open
nisse3000 opened this issue Aug 5, 2024 · 2 comments
Open
Labels

Comments

@nisse3000
Copy link
Contributor

Python version

Python 3.11.5

Pymatgen version

2024.7.18

Operating system version

No response

Current behavior

Thanks to Dr. Wagner from MPI for Chemical Physics of Solids in Dresden I became aware of the issue that the function _semicircle_integral in class CrystalNN does not correctly calculate the normalized areas of circular segments of a unit quadrant. Dr. Wagner discovered the issue by trying to reproduce results for coordination probabilities of BCC sites that were published in Reference 1. Therefore, such sites will be used in the following to illustrate the problem.

As described in References 1 und 2, CrystalNN determines coordination probabilities by projecting weights from Voronoi neighbor finding (normalized solid angles) onto a unit quadrant in descending order. The probability of coordination number CN is then determined as the area of the quadrant segment between the weight of CN and the weight of the next larger CN in the set. For BCC, this means, for example, that the probability of a coordination number of eight (nearest neighbors) is determined as the segment between the weight of CN=8 and the weight associated with CN=14 (next-nearest neighbors). Finally, the resulting area is normalized by the area of the unit quadrant (pi/4).

The current _semicircle_integral yields a probability of p(CN=8)=0.55, whereas it should be 0.76 using the Voronoi weights of 1 for nearest neighbors and 0.36 for next-nearest neighbors. The normalized weight of the next-nearest neighbors was already published by O'Keeffe in Acta Cryst., 1979, and agrees with VoronoiNN results from pymatgen.

Using the formula from https://en.wikipedia.org/wiki/Circular_segment for the area of a circular segment, rearranging the expression under the square root to "2Rh - h**2", and then dividing the entire expression by two to account for the fact that we are aiming for a quadrant I obtain the value of 0.76 (see Figure below), which Dr. Wagner also indicated.

cn_weights_nn

When I systematically compare the results for two sets of neighbors (nearest and next-nearest neighbors) with varying weight for the next-nearest neighbors (the weight of nearest neighbors is always 1), I get the curves displayed in the Figure above. The currently implemented method thus gives consistently lower coordination probabilities than results with the quadrant area function that should be mathematically correct.


References

[1] Local structure order parameters and site fingerprints for quantification of coordination environment and crystal structure similarity; Zimmermann, Jain, RSC Advances, 10, 6063-6081, 2020

[2] Benchmarking coordination number prediction algorithms on inorganic crystal structures; Pan, Ganose, Horton, Aykol, Persson, Zimmermann, Jain, Inorganic Chemistry, 60, 1590-1603, 2021

Expected Behavior

I expected the semicircle function to calculate circular segments of a unit quadrant, which it apparently did not do correctly.

Minimal example

import math

def semicircle_integral(dist_bins, idx):
    radius = 1

    x1 = dist_bins[idx]
    x2 = dist_bins[idx + 1]

    if dist_bins[idx] == 1:
        area1 = 0.25 * math.pi * radius**2
    else:
        area1 = 0.5 * (
            (x1 * math.sqrt(radius**2 - x1**2)) + (radius**2 * math.atan(x1 / math.sqrt(radius**2 - x1**2)))
        )

    area2 = 0.5 * ((x2 * math.sqrt(radius**2 - x2**2)) + (radius**2 * math.atan(x2 / math.sqrt(radius**2 - x2**2))))

    return (area1 - area2) / (0.25 * math.pi * radius**2)



def quadrant_integral(dist_bins, idx):
    radius = 1

    x1 = dist_bins[idx]
    x2 = dist_bins[idx + 1]

    areaquarter = 0.25 * math.pi * radius**2

    area1 = areaquarter - 0.5 * (radius**2 * math.acos(1.0 - x1 / radius) - (radius - x1) * math.sqrt(2.0 * radius * x1 - x1**2))
    area2 = areaquarter - 0.5 * (radius**2 * math.acos(1.0 - x2 / radius) - (radius - x2) * math.sqrt(2.0 * radius * x2 - x2**2))

    return (area2 - area1) / areaquarter


for i in range(1, 500):
    x = 1.0 - float(i) / 500.0
    print(i, x, quadrant_integral([1, x], 0), semicircle_integral([1, x], 0))

Relevant files to reproduce this bug

No response

@janosh
Copy link
Member

janosh commented Aug 5, 2024

thanks @nisse3000 for the detailed issue and for your PR! 👍 not my area of expertise but i'll review the code and wait for @mkhorton to weigh on the discrepancy between quadrant and semicircle functions

@computron
Copy link
Member

Hi all,

Thanks for your work in this PR. Do we have a sense for whether the new function (which is now the default with no option to change to the old function as far as I can tell) gives better results than the old function? At the end of the day, users will just want to use the scheme that provides more accurate results, especially sine the scheme is more or less just invented without much theoretical basis. Since I believe we benchmarked the coordination scheme using the old (and "incorrect") function, I fear that users may actually be less satisfied with the more correct function.

@nisse3000 do you have a sense for how the "reasonableness" of the coordination numbers produced by CrystalNN change with this update?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants