Update Katz_FD #36

Open · PiethonProgram wants to merge 13 commits into master
Conversation

PiethonProgram (Author)

Fixed the Katz_FD implementation by using Euclidean distances. Also modified the test cases to cover the new method.

PiethonProgram marked this pull request as ready for review September 29, 2024 03:42
raphaelvallat self-assigned this Sep 30, 2024
raphaelvallat added the enhancement label Sep 30, 2024
raphaelvallat (Owner)

Thank you @PiethonProgram — I'll aim to review the PR later this week. In the meantime, please make sure that the CI tests and lint tests are all passing.

PiethonProgram (Author)

Applied formatting changes; the lint checks should now pass.

Changed self.assertEqual(np.round(katz_fd(x_k), 3), VALUE) from VALUE = 5.783 to VALUE = 1.503 to reflect the formula change.
Please double-check the calculation above.

raphaelvallat linked an issue on Oct 4, 2024 that may be closed by this pull request
Comment on lines 188 to 214
# Euclidean distance calculation
euclidean_distance = np.sqrt(1 + np.square(np.diff(x, axis=axis)))

# total and average path lengths
total_path_length = euclidean_distance.sum(axis=axis)
average_path_length = euclidean_distance.mean(axis=axis)

# max distance from the first point to all others
horizontal_diffs = np.arange(1, x.shape[axis])
vertical_diffs = np.take(x, indices=np.arange(1, x.shape[axis]), axis=axis) - np.take(
x, indices=[0], axis=axis
)

if axis == 1: # reshape if needed
horizontal_diffs = horizontal_diffs.reshape(1, -1)
elif axis == 0:
horizontal_diffs = horizontal_diffs.reshape(-1, 1)

# Euclidean distance and max distance
distances = np.sqrt(np.square(horizontal_diffs) + np.square(vertical_diffs))
max_distance = np.max(distances, axis=axis)

# Katz Fractal Dimension Calculation
full_distance = np.log10(total_path_length / average_path_length)
kfd = np.squeeze(full_distance / (full_distance + np.log10(max_distance / total_path_length)))

# ensure scalar output
raphaelvallat (Owner)

Hi @PiethonProgram,

I think that this implementation can be simplified, for example by following the proposed new implementation in #34, or by leveraging the NeuroKit2 implementation (which currently gives the same output as Antropy): https://github.com/neuropsychology/NeuroKit/blob/45c9ad90d863ebf4e9d043b975a10d9f8fdeb06b/neurokit2/complexity/fractal_katz.py#L6
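
(For context, a minimal single-channel sketch of the classic Katz formulation, roughly what the linked implementations compute; this is an illustration only, not NeuroKit2's exact code.)

import numpy as np

def katz_fd_1d(x):
    # Sketch of the classic single-channel Katz FD (absolute successive differences).
    # Illustration only; not NeuroKit2's exact code.
    x = np.asarray(x, dtype=float)
    dists = np.abs(np.diff(x))        # successive amplitude differences
    L = dists.sum()                   # total curve length
    a = dists.mean()                  # mean step length
    d = np.max(np.abs(x - x[0]))      # farthest distance from the first point
    n = L / a
    return np.log10(n) / (np.log10(n) + np.log10(d / L))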

PiethonProgram (Author)

Sounds good, I will make the adjustments.

PiethonProgram (Author)

Hello, I have taken a look at the implementations you mentioned. Are you sure it can be simplified? The implementations you pointed to are shorter because they are single-channel only.
If you only want to offer single-channel feature extraction then I can make the changes, but otherwise, unless you want to try to reduce runtime with Numba, I don't think there is much I can "simplify."

raphaelvallat (Owner)

Good point about the support for ND arrays. I have not yet found the time to do a deep dive into this, but can we just take the existing implementation in Antropy (see below) and replace the distance calculation with the Euclidean distance?

dists = np.abs(np.diff(x, axis=axis))           # absolute successive differences
ll = dists.sum(axis=axis)                       # total curve length L
ln = np.log10(ll / dists.mean(axis=axis))       # log10(L / a)
aux_d = x - np.take(x, indices=[0], axis=axis)  # offsets from the first sample
d = np.max(np.abs(aux_d), axis=axis)            # farthest distance from the first sample
kfd = np.squeeze(ln / (ln + np.log10(d / ll)))  # Katz fractal dimension

or is there more to your implementation that I'm missing?

Thank you

PiethonProgram (Author)

I think the essence of the code is the same, but some additional bits are needed when using the Euclidean distance in n dimensions: the farthest distance from the first point has to account for the index (time) offset as well as the amplitude difference.
If we were only working in one dimension, then yes, we could simply replace the distance calculation line.
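
(To make the "additional bits" concrete, a minimal single-channel sketch that treats the signal as the points (i, x[i]); this is an illustration, not code from the PR.)

import numpy as np

def katz_fd_euclidean_1d(x):
    # Sketch only: Katz FD with Euclidean distances between consecutive points (i, x[i]).
    x = np.asarray(x, dtype=float)
    # Path increments: the horizontal step is always 1 sample, the vertical step is diff(x)
    steps = np.sqrt(1.0 + np.diff(x) ** 2)
    total_length = steps.sum()           # L
    avg_step = steps.mean()              # a
    # The farthest distance from the first point also needs the index offset
    idx = np.arange(1, x.size)
    d = np.max(np.sqrt(idx ** 2 + (x[1:] - x[0]) ** 2))
    n = total_length / avg_step
    return np.log10(n) / (np.log10(n) + np.log10(d / total_length))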

PiethonProgram (Author)

Should I be concerned about these failing checks:
Python tests / build (macos-latest, 3.8) (pull_request)
Python tests / build (macos-latest, 3.9) (pull_request)
Python tests / build (macos-latest, 3.10) (pull_request)

It seems the issue is related to the GitHub Actions setup, not the code itself.

raphaelvallat (Owner)

Yeah, don't worry about the CI failures; I need to make some upgrades to the GitHub Actions workflow. Thanks!

PiethonProgram (Author)

Reformatted and "simplified" code.

Note: Black formatting caused line 208 in fractal.py to expand into 7 lines (likely due to nested parentheses). No impact on functionality, just aesthetics.

raphaelvallat (Owner)

Hey,

Thanks again for the implementation and the PR.

  1. I was looking at the output of the proposed implementation, and it seems that, regardless of the complexity of the input time series, the output values stay very close to 1 (between 1 and about 1.04 in the example below). The original paper described that the value should be ~1.5 for random signals:

Under this formulation, fractal dimensions range from D = 1.0, for straight lines, through approximately D = 1.15 for random-walk waveforms, to D approaching 1.5 for the most convoluted waveforms.

import numpy as np
import stochastic.processes.noise as sn

rng = np.random.default_rng(seed=42)

X = np.vstack([
    np.arange(1000),
    np.sin(2 * np.pi * 1 * np.arange(1000) / 100),
    sn.FractionalGaussianNoise(hurst=0.1, rng=rng).sample(1000),
    sn.FractionalGaussianNoise(hurst=0.9, rng=rng).sample(1000),
    rng.random(1000),
    rng.random(1000)]
)
katz_new(X)  # katz_new = the implementation proposed in this PR
array([1.        , 1.00014255, 1.03727155, 1.00000014, 1.01079616, 1.01072551])
  2. I also found this paper, which uses the same definition as the current implementation in Antropy:

fractalfract-08-00009.pdf

[screenshot: Katz FD definition from the paper]
  3. I also took a stab at improving the current implementation, based on the NeuroKit2 implementation:
def katz(x, axis=-1):
    # Total length of the curve (sum of absolute successive differences)
    dists = np.abs(np.diff(x, axis=axis))
    length = np.sum(dists, axis=axis)

    # Average distance between successive points
    a = np.mean(dists, axis=axis)

    # Farthest distance between the starting point and any other point
    d = np.max(np.abs(x.T - x[..., 0]).T, axis=axis)
    return np.log10(length / a) / np.log10(d / a)
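
(A hypothetical usage sketch of the function above, assuming NumPy is imported as np and X is the (6, 1000) test array built earlier; not output from the PR itself.)

kfd_per_row = katz(X, axis=-1)  # one Katz FD value per row/channel
print(kfd_per_row.shape)        # (6,)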

PiethonProgram (Author)

Hi @raphaelvallat, thanks for the catch.

  1. I will look into possible issues.
  2. I believe the implementation in the paper refers to single-dimension time series.
  3. I will build and test on this. Solid implementation.

PiethonProgram (Author) commented Nov 9, 2024

Hi @raphaelvallat ,

  1. I wasn’t able to replicate the exact output from your implementation, but I’ve adjusted it to ensure it functions as expected. It now produces consistent results.
  2. It appears that the approach described in the paper differs from our current implementation, though both seem to yield the same output when tested.

Please let me know your thoughts.
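
(Side note on point 2: the two formulations are algebraically equivalent. Writing n = L/a, the current Antropy form log10(n) / (log10(n) + log10(d/L)) equals the log10(L/a) / log10(d/a) form used in the new katz() sketch, because log10(d/a) = log10(d/L) + log10(L/a). A quick numerical check, as a sketch assuming NumPy and a random test signal:)

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)

dists = np.abs(np.diff(x))
L = dists.sum()               # total curve length
a = dists.mean()              # mean step length
d = np.max(np.abs(x - x[0]))  # farthest distance from the first point
n = L / a

kfd_antropy = np.log10(n) / (np.log10(n) + np.log10(d / L))
kfd_alt = np.log10(L / a) / np.log10(d / a)
print(np.isclose(kfd_antropy, kfd_alt))  # True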

raphaelvallat (Owner) left a comment

Thank you for looking into this! Two very minor comments but otherwise I think we are good to merge.

To clarify for others: this PR is just a simplification of the existing implementation. It does not update the distance calculation to use the Euclidean distance, as originally discussed in #34.

@@ -181,17 +182,22 @@ def katz_fd(x, axis=-1):
>>> x = np.arange(1000)
>>> print(f"{ant.katz_fd(x):.4f}")
1.0000
euclidean_distance = np.sqrt(1 + np.square(np.diff(x, axis=axis)))
raphaelvallat (Owner)

remove extra line

a = np.mean(dists, axis=axis)

# Compute the farthest distance between starting point and any other point
# d = np.max(np.abs(x.T - x[..., 0]).T, axis=axis)
raphaelvallat (Owner)

feel free to remove this commented line

PiethonProgram (Author)

Hi, the changes have been made. Please let me know if there is anything else, and thank you for your help as well.

Labels: enhancement
Linked issue (may be closed by this pull request): Calculation in katz_fd seems to be incorrect
2 participants