Merge pull request #2 from kyegomez/master
Catching up 202312291935
evelynmitchell authored Dec 30, 2023
2 parents df6172b + 28ed38a commit 0498d1e
Showing 48 changed files with 4,090 additions and 50 deletions.
109 changes: 109 additions & 0 deletions docs/zeta/ops/_matrix_inverse_root_newton.md
@@ -0,0 +1,109 @@
# _matrix_inverse_root_newton


The inverse square root of a matrix is a vital operation in fields such as computer graphics, machine learning, and numerical analysis. The `_matrix_inverse_root_newton` function in `zeta.ops` provides an efficient way to compute the inverse root of a matrix, an operation central to techniques such as whitening transformations and principal component analysis (PCA).

### Purpose and Importance

Newton's iteration is valued here for its convergence properties: it typically reaches a precise result in fewer steps than more direct numerical methods. Given a matrix `A` and a root `p`, `_matrix_inverse_root_newton` computes a matrix `X ≈ A^(-1/p)`, i.e., a matrix satisfying `X^p @ A ≈ I`. This is instrumental in algorithms that rely on matrix normalization steps for stability and convergence.

### Architecture and Class Design

The `_matrix_inverse_root_newton` function does not belong to a class; it is a standalone function. It operates on PyTorch tensors, so it benefits from GPU acceleration and batched operations and remains compatible with the wider PyTorch ecosystem.

## Function Definition

The `_matrix_inverse_root_newton` function is formulated as follows:

```python
def _matrix_inverse_root_newton(
    A,
    root: int,
    epsilon: float = 0.0,
    max_iterations: int = 1000,
    tolerance: float = 1e-6,
) -> Tuple[Tensor, Tensor, NewtonConvergenceFlag, int, Tensor]:
    ...
```

### Parameters and Returns

| Argument | Type | Default Value | Description |
|------------------|----------|---------------|--------------------------------------------------------------------------------|
| `A`              | Tensor   | Required      | The input matrix of interest.                                                    |
| `root`           | int      | Required      | The required root; for an inverse square root, this is 2.                       |
| `epsilon` | float | 0.0 | Regularization term added to the matrix before computation. |
| `max_iterations` | int | 1000 | Maximum number of iterations allowed for the algorithm. |
| `tolerance` | float | 1e-6 | Convergence criterion based on the error between iterations. |

#### Returns:

| Returns | Type | Description |
|-----------------------|--------------------------|-------------------------------------------------|
| `A_root` | Tensor | The inverse root of the input matrix `A`. |
| `M` | Tensor | The matrix after the final iteration. |
| `termination_flag` | NewtonConvergenceFlag | Convergence flag indicating the result status. |
| `iteration` | int | Number of iterations performed. |
| `error` | Tensor | The final error between `M` and the identity. |

### Usage and Examples

#### Example 1: Basic Usage

```python
import torch
from zeta.ops import _matrix_inverse_root_newton

# Defining the input matrix A
A = torch.randn(3, 3)
A = A @ A.T # Making A symmetric positive-definite

# Computing the inverse square root of A
A_root, M, flag, iters, err = _matrix_inverse_root_newton(A, root=2)
```
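
Assuming the iteration converged, a quick sanity check is that `A_root` behaves like `A^(-1/2)`:

```python
# A_root ≈ A^(-1/2), so A_root @ A @ A_root should be close to the identity.
print(torch.allclose(A_root @ A @ A_root, torch.eye(3), atol=1e-4))
```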

#### Example 2: Custom Tolerance and Iterations

```python
import torch
from zeta.ops import _matrix_inverse_root_newton

# Defining the input matrix A
A = torch.randn(5, 5)
A = A @ A.T # Making A symmetric positive-definite

# Computing the inverse square root with custom tolerance and max_iterations
A_root, M, flag, iters, err = _matrix_inverse_root_newton(
    A, root=2, epsilon=0.001, max_iterations=500, tolerance=1e-8
)
```

#### Example 3: Handling Outputs and Convergence

```python
import torch
from zeta.ops import _matrix_inverse_root_newton, NewtonConvergenceFlag

# Defining the input matrix A
A = torch.randn(4, 4)
A = A @ A.T # Making A symmetric positive-definite

# Computing the inverse square root and handling convergence
A_root, M, flag, iters, err = _matrix_inverse_root_newton(A, root=2)

# Check if the iteration has converged
if flag == NewtonConvergenceFlag.CONVERGED:
    print(f"Converged in {iters} iterations with an error of {err}")
else:
    print("Reached maximum iterations without convergence")
```

## Explanation of the Algorithm

The `_matrix_inverse_root_newton` function calculates the inverse root of a matrix using an iterative Newton's method. The key idea is to generate a sequence of matrices that progressively approaches the inverse root of the given matrix. This matters in practice because training deep neural networks involves numerous matrix operations, such as multiplications, inversions, and factorizations, and computing them efficiently and stably is essential for good performance and numerical robustness.

After initializing the iterates, the function enters an iterative loop that runs until the convergence criterion is met or the maximum number of iterations is reached. Each iteration updates the estimate of the matrix's inverse root and checks the error against `tolerance` to decide whether to continue.
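
For intuition, here is a minimal sketch of one common coupled-Newton formulation for the inverse `root`-th power of a symmetric positive-definite matrix. The function name and initialization constant are illustrative assumptions; the actual `zeta.ops` implementation may differ in details such as scaling, convergence flags, and batching.

```python
import torch


def inverse_root_newton_sketch(A, root, tolerance=1e-6, max_iterations=1000):
    """Coupled Newton iteration for A^(-1/root) -- an illustrative sketch."""
    identity = torch.eye(A.shape[0], dtype=A.dtype)
    # Scale A so the initial residual M starts close to the identity,
    # which aids convergence.
    z = (root + 1) / (2 * torch.linalg.matrix_norm(A))
    X = (z ** (1.0 / root)) * identity  # running estimate of A^(-1/root)
    M = z * A                           # residual matrix; converges to I
    error = torch.dist(M, identity, p=float("inf"))
    iteration = 0
    while error > tolerance and iteration < max_iterations:
        iteration += 1
        M_p = (identity * (root + 1) - M) / root  # Newton update factor
        X = X @ M_p                               # refine the root estimate
        M = torch.linalg.matrix_power(M_p, root) @ M
        error = torch.dist(M, identity, p=float("inf"))
    return X, M, iteration, error
```

The loop maintains the invariant `M = X^root @ A`, so once `M` is within `tolerance` of the identity, `X` approximates `A^(-1/root)`.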

## Additional Information and Tips

- Regularization (`epsilon`): adding a small value to the matrix helps prevent numerical issues when `A` is close to singular or ill-conditioned.
- Convergence: the `max_iterations` and `tolerance` parameters are crucial for achieving convergence. You may need to adjust them depending on your specific problem and the properties of the matrix.

117 changes: 117 additions & 0 deletions docs/zeta/ops/_matrix_root_eigen.md
@@ -0,0 +1,117 @@
# _matrix_root_eigen


The principal function within the zeta.ops library is `_matrix_root_eigen`, which computes the (inverse) root of a given symmetric positive (semi-)definite matrix using eigendecomposition. The computation is based on the relation `A = Q * L * Q^T`, where `A` is the initial matrix, `Q` is a matrix of eigenvectors, and `L` is a diagonal matrix with eigenvalues. This function is particularly useful in applications such as signal processing, quantum mechanics, and machine learning, where matrix root computations are often required.


The `_matrix_root_eigen` function is the cornerstone of the zeta.ops library. Its purpose is to calculate the root or inverse root of a matrix by decomposing it into its eigenvectors and eigenvalues, modifying the eigenvalues as per the desired operation (root or inverse root), and then reconstructing the matrix.
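
Concretely, for root `p` the function forms `A^(1/p) = Q * L^(1/p) * Q^T` (or `A^(-1/p) = Q * L^(-1/p) * Q^T` for the inverse root): only the diagonal eigenvalue matrix is raised to the desired power, while the eigenvectors are left unchanged.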

## Architecture of `_matrix_root_eigen`

The `_matrix_root_eigen` function is built upon PyTorch's linear algebra capabilities and follows a clear sequence of steps:

1. Verify if the root is a positive integer.
2. Calculate the power to which the eigenvalues need to be raised (`alpha`).
3. Perform eigendecomposition on the input matrix `A`.
4. Modify the eigenvalues to ensure they are positive if the `make_positive_semidefinite` flag is set.
5. Add a small `epsilon` value if necessary to ensure numerical stability.
6. Compute the (inverse) root matrix using the modified eigenvalues and the eigenvectors.

This architecture ensures that even matrices that might have numerical stability issues or slightly negative eigenvalues due to floating-point errors can be handled gracefully.
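
As a rough illustration, the steps above can be sketched in a few lines of PyTorch. The function name is hypothetical, and details are simplified relative to the real implementation (for example, the exact perturbation applied by `make_positive_semidefinite`, the `exponent_multiplier` handling, and the double-precision retry are omitted):

```python
import torch


def matrix_root_eigen_sketch(A, root, epsilon=0.0, inverse=True,
                             make_positive_semidefinite=True):
    """Eigendecomposition-based (inverse) matrix root -- an illustrative sketch."""
    alpha = -1.0 / root if inverse else 1.0 / root
    # Step 3: symmetric eigendecomposition A = Q diag(L) Q^T
    L, Q = torch.linalg.eigh(A)
    if make_positive_semidefinite:
        # Step 4: clamp tiny negative eigenvalues caused by floating-point error
        L = torch.clamp(L, min=0.0)
    # Step 5: add epsilon for numerical stability
    L = L + epsilon
    # Step 6: reconstruct the (inverse) root from the modified eigenvalues
    X = Q @ torch.diag(L.pow(alpha)) @ Q.T
    return X, L, Q
```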

## `_matrix_root_eigen`: Method Signature

Below is the method signature for the `_matrix_root_eigen` function, alongside an explanation of its arguments and returned values:

| Argument | Type | Default Value | Description |
|----------------------------|-----------|-----------------------|-------------------------------------------------------------------------------------|
| A | Tensor | Required | The square matrix of interest. |
| root | int | Required | The root of interest, which should be a natural number. |
| epsilon | float | 0.0 | A small value added to the matrix to avoid numerical instability. |
| inverse | bool | True | If set to True, the function returns the inverse root matrix; otherwise, the root. |
| exponent_multiplier | float | 1.0 | A multiplier applied to the eigenvalue exponent in the root calculation. |
| make_positive_semidefinite | bool | True | Perturbs eigenvalues to ensure the matrix is positive semi-definite. |
| retry_double_precision | bool | True | Retries eigendecomposition with higher precision if initial attempt fails. |

Returns:

| Returned Value | Type | Description |
|----------------|---------|-------------------------------------------------------------------------------------|
| X | Tensor | The computed (inverse) root of matrix A. |
| L | Tensor | Eigenvalues of matrix A. |
| Q | Tensor | Orthogonal matrix consisting of eigenvectors of matrix A. |

## Usage Examples

In the following sections, we'll look at three different ways to use the `_matrix_root_eigen` function from the zeta.ops library, along with the required imports and full example code.

### Example 1: Basic Matrix Root Calculation

In this example, we'll calculate the square root of a 2x2 symmetric positive definite matrix.

```python
import torch
from zeta.ops import _matrix_root_eigen

# Define a 2x2 symmetric positive definite matrix
A = torch.tensor([[2.0, 1.0], [1.0, 2.0]])

# Calculate the square root of the matrix
# (`inverse` defaults to True per the table above, so disable it here)
X, L, Q = _matrix_root_eigen(A, root=2, inverse=False)

print("Matrix A:\n", A)
print("Square Root of A:\n", X)
```
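
Since `X` approximates the principal square root, a quick sanity check is that `X @ X` reproduces `A`:

```python
# X ≈ A^(1/2), so X @ X should recover A up to floating-point error.
print(torch.allclose(X @ X, A, atol=1e-5))
```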

### Example 2: Matrix Inverse Root with Epsilon Perturbation

In this example, an `epsilon` perturbation is added for numerical stability, and the inverse square root is calculated.

```python
import torch
from zeta.ops import _matrix_root_eigen

# Define a 3x3 symmetric positive definite matrix
A = torch.tensor([[4.0, 2.0, 0.0], [2.0, 4.0, 1.0], [0.0, 1.0, 3.0]])

# Calculate the inverse square root of the matrix, adding epsilon for stability
X, L, Q = _matrix_root_eigen(A, root=2, epsilon=1e-5, inverse=True)

print("Matrix A:\n", A)
print("Inverse Square Root of A with Epsilon:\n", X)
```

### Example 3: High-Precision Calculation with Positive Semi-Definite Guarantee

This example demonstrates a more robust usage where the calculation is attempted in high precision, and the function ensures the matrix is positive semi-definite before computing its root.

```python
import torch
from zeta.ops import _matrix_root_eigen

# Define a 3x3 symmetric positive semi-definite matrix with potential numerical issues
A = torch.tensor([[1e-5, 0.0, 0.0], [0.0, 5.0, 4.0], [0.0, 4.0, 5.0]])

# Calculate the square root (inverse=False), ensuring positive
# semi-definiteness and retrying in double precision if needed
X, L, Q = _matrix_root_eigen(
    A, root=2, inverse=False,
    make_positive_semidefinite=True, retry_double_precision=True
)

print("Matrix A:\n", A)
print("Square Root with Positive Semi-Definite Guarantee:\n", X)
```

## Additional Remarks

When using the `_matrix_root_eigen` function, keep in mind that it assumes the input matrix `A` is symmetric; if it is not, the results will not be valid. Also, choose the `epsilon` value with care so that it stabilizes the computation without distorting the computed root more than necessary.

## Conclusion

The zeta.ops library, specifically the `_matrix_root_eigen` function, is a powerful tool for scientific computation, providing advanced functionality for matrix root operations using eigendecomposition. By understanding the parameters and utilizing the provided examples, users can effectively leverage this functionality for their research or computational needs.

## References and Further Reading

To learn more about the mathematical operations used in this library, consult the following resources:

- "Numerical Linear Algebra" by Lloyd N. Trefethen and David Bau, III.
- "Matrix Analysis" by Rajendra Bhatia.
- PyTorch Documentation: https://pytorch.org/docs/stable/index.html

94 changes: 94 additions & 0 deletions docs/zeta/ops/channel_shuffle_new.md
@@ -0,0 +1,94 @@
# channel_shuffle_new


The `channel_shuffle_new` function is a utility within the `zeta.ops` library designed to rearrange the channels of a 4D tensor that typically represents a batch of images with multiple channels. This operation is particularly useful in the context of neural networks that handle convolutional layers, where shuffling channels can allow for better cross-channel information flow and model regularization.

Channel shuffling is an operation popularized by ShuffleNet, a family of efficient convolutional neural network architectures designed for mobile and computationally resource-limited environments. By strategically shuffling channels, these networks maintain information flow between grouped convolutional layers while reducing computational complexity.

## `channel_shuffle_new` Function Definition

Here is a breakdown of the `channel_shuffle_new` function parameters:

| Parameter | Type | Description |
|-----------|------------|----------------------------------------------------------------------------------------------------------|
| `x` | Tensor | The input tensor with shape `(b, c, h, w)` where `b` is the batch size, `c` is the number of channels, `h` is the height, and `w` is the width. |
| `groups` | int | The number of groups to divide the channels into for shuffling. |

## Functionality and Usage

The function `channel_shuffle_new` works by reorganizing the input tensor's channels. Specifically, given an input tensor `x` with a certain number of channels, the channels are divided into `groups`, and the channels' order within each group is shuffled.

The rearrangement pattern `"b (c1 c2) h w -> b (c2 c1) h w"` indicates that `x` is reshaped such that:

- `b` remains the batch size,
- `c1` and `c2` are dimensions used to split the original channel dimension, with `c1` corresponding to the number of groups (`groups` parameter) and `c2` being the quotient of the original channels divided by the number of groups,
- `h` and `w` remain the height and width of the image tensor, respectively.

Here, `rearrange` is assumed to be a function (such as the one from the `einops` library) that allows advanced tensor manipulation using pattern strings.
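
Under that assumption, a minimal sketch of the operation looks like this (the wrapper name is hypothetical; `zeta.ops.channel_shuffle_new` itself may differ in details):

```python
import torch
from einops import rearrange


def channel_shuffle_sketch(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Split channels into `groups`, then swap the group and within-group axes."""
    # c1 = groups, c2 = channels // groups; swapping c1 and c2 in the output
    # pattern interleaves channels across groups.
    return rearrange(x, "b (c1 c2) h w -> b (c2 c1) h w", c1=groups)
```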

### Examples

#### Example 1: Shuffle Channels in a 3-Channel Image

This basic usage example demonstrates how to use `channel_shuffle_new` for a single image with 3 RGB channels.

```python
import torch
from einops import rearrange
from zeta.ops import channel_shuffle_new


# Create a sample tensor to represent a single RGB image (batch size = 1)
x = torch.randn(1, 3, 64, 64) # Shape (b=1, c=3, h=64, w=64)

# Shuffle the channels with groups set to 1
# (a single group leaves the channel order unchanged)
shuffled_x = channel_shuffle_new(x, groups=1)
```

This example does not produce an actual shuffle: with `groups=1` there is only one group, so the rearrangement is the identity (the same holds when `groups` equals the number of channels, since each group then contains a single channel).

#### Example 2: Shuffle Channels for a Batch of Images with 4 Channels

In this example, we shuffle the channels of a batch of images with 4 channels each, into 2 groups.

```python
import torch
from einops import rearrange
from zeta.ops import channel_shuffle_new

# Create a sample tensor to represent a batch of images with 4 channels each
x = torch.randn(20, 4, 64, 64) # Shape (b=20, c=4, h=64, w=64)

# Shuffle the channels with groups set to 2
shuffled_x = channel_shuffle_new(x, groups=2)
# The channels are now interleaved across the two groups
```
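
To see the permutation concretely, here is a tiny hypothetical example with four channels holding the values 0-3:

```python
# Channels [0, 1, 2, 3] split into groups [0, 1] and [2, 3]
# are interleaved to [0, 2, 1, 3].
x = torch.arange(4).reshape(1, 4, 1, 1).float()
print(channel_shuffle_new(x, groups=2).flatten().tolist())  # [0.0, 2.0, 1.0, 3.0]
```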

#### Example 3: Shuffle Channels for a Large Batch of High-Channel Images

For a more complex scenario, we shuffle the channels of a large batch of images with 32 channels, using 8 groups.

```python
import torch
from einops import rearrange
from zeta.ops import channel_shuffle_new


# Create a sample tensor to represent a large batch of high-channel images
x = torch.randn(50, 32, 128, 128) # Shape (b=50, c=32, h=128, w=128)

# Shuffle the channels with groups set to 8
shuffled_x = channel_shuffle_new(x, groups=8)
# The channels are now interleaved across the eight groups
```

## Additional Information and Tips

- The number of groups (`groups`) must evenly divide the number of channels in the input tensor `x`; otherwise the reshape fails with a shape-mismatch error (a simple guard is sketched below).
- Channel shuffling can lead to performance improvements in certain network architectures, but it should be used thoughtfully. It might not always yield benefits and could lead to loss of information if not used correctly.
- The `einops` library provides powerful tensor manipulation features that can be combined with PyTorch for flexible operations like channel shuffling.
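
A minimal, hypothetical pre-check before shuffling:

```python
# Validate divisibility up front to fail with a clear message.
b, c, h, w = x.shape
assert c % groups == 0, f"groups ({groups}) must evenly divide channels ({c})"
```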

## References

- "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices." Ma, Ningning, et al. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- `einops` documentation: [EinOps - flexible and powerful tensor operations for readable and reliable code](https://einops.rocks/)
