Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synthetic data smoke test. #75

Merged
merged 29 commits into from
Nov 1, 2024
Merged

Synthetic data smoke test. #75

merged 29 commits into from
Nov 1, 2024

Conversation

jf514
Copy link
Contributor

@jf514 jf514 commented Oct 25, 2024

#80

  • Commit minimal implementation of synthetic generation model (mjcf + yamls + sample data).
  • NOTE: Does not generate data as yet.
  • Added single frame of synthetic data NOT generated by this model for sake of utility as smoke test.
  • Add it to CI for smoke test. (Runs in 20 seconds.)
  • Add ROOT_OPTIMIZATION_KEYPOINT to model yaml files to alleviate necessity of code changes per model.
  • Update all model yamls.
  • Updated some comments.

Work is to be continued in #81.

jf514 and others added 6 commits October 8, 2024 16:25
* fix offset shape when saving

* update demo config

* linter

* fix demo.yaml

* update configs

* update config test

---------

Co-authored-by: Charles Zhang <[email protected]>
* run stac experiment

* Fixed yaml

* Test fail with bad input.

* Should fail.

* Corrected input - test should pass.

* Update demo.yaml - enable ik_only()

* Revert update demo.yaml
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

🧹 Outside diff range and nitpick comments (11)
configs/stac/stac_fly_treadmill.yaml (1)

Line range hint 9-9: Consider parameterizing GPU selection.

The GPU selection is hardcoded to '1'. Consider making this configurable through environment variables or command-line arguments for better flexibility across different environments.

-gpu: '1'
+gpu: ${GPU_ID:-1}  # Default to GPU 1 if not specified
configs/stac/stac_fly_tethered.yaml (1)

Line range hint 3-7: Consider adding data source documentation.

The comments about unpublished data and potential sources are helpful. Consider moving this documentation to a more permanent location (e.g., README.md or CONTRIBUTING.md) to make it more discoverable.

Would you like me to help draft the documentation for the data sources?

🧰 Tools
🪛 yamllint

[error] 5-5: trailing spaces

(trailing-spaces)

tests/unit/test_main.py (1)

Line range hint 8-22: Consider enhancing test coverage for synthetic model configurations.

Given that this PR introduces synthetic model support (changing from 'rodent' to 'synth'), consider adding additional assertions to validate synthetic model-specific configuration values.

Example additions:

 def test_load_configs(config):
     # Check that utils.params is not defined before loading
     with pytest.raises(AttributeError):
         io.params

     # Call the function
     cfg = main.load_configs(config)

     # Assert that the configs are the correct type
     assert isinstance(cfg, DictConfig)

     # Assert that the resulting configs contain the expected data
     assert cfg.stac.fit_offsets_path == "fit.p"
     assert cfg.model.N_FRAMES_PER_CLIP == 360
+    # Validate synthetic model specific configs
+    assert cfg.model.name == "synth"
+    assert hasattr(cfg.model, "synthetic_params")
configs/model/synth.yaml (2)

45-56: Consolidate duplicate documentation.

The documentation for SITES_TO_REGULARIZE and M_REG_COEF is duplicated and fragmented. Consider consolidating these related configurations and their documentation in a single section.

# If you have reason to believe that the initial offsets are correct for particular keypoints, 
# you can regularize those sites using M_REG_COEF. 
SITES_TO_REGULARIZE: 
  - part_0

RENDER_FPS: 200

N_SAMPLE_FRAMES: 1

-# If you have reason to believe that the initial offsets are correct for particular keypoints, 
-# you can regularize those sites using _SITES_TO_REGULARIZE. 
M_REG_COEF: 1
🧰 Tools
🪛 yamllint

[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)


1-58: Fix YAML formatting issues.

The YAML file contains multiple trailing spaces that should be removed for consistency.

Affected lines: 7, 13, 16, 29, 30, 36, 45, 46, 47, 54, 55

🧰 Tools
🪛 yamllint

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 29-29: trailing spaces

(trailing-spaces)


[error] 30-30: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)

.github/workflows/ci.yml (1)

103-107: Fix YAML formatting issues.

There are two minor formatting issues:

  1. Line 103 contains trailing spaces
  2. File is missing a newline at the end

Apply this diff to fix the formatting:

-      
+
       # Test probably delete this
       - name: Smoke Test
         shell: bash -l {0}
-        run: python run_stac.py
\ No newline at end of file
+        run: python run_stac.py
+
🧰 Tools
🪛 yamllint

[error] 103-103: trailing spaces

(trailing-spaces)


[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

stac_mjx/stac.py (1)

393-393: LGTM! Consider adding shape validation.

The reshape operation correctly ensures that offsets are in a consistent 2D array format with 3 columns for x,y,z coordinates.

Consider adding a validation check to ensure self._offsets length is a multiple of 3:

+            if len(self._offsets) % 3 != 0:
+                raise ValueError(f"Expected offsets length to be multiple of 3, got {len(self._offsets)}")
             offsets = self._offsets.reshape((-1, 3))
demos/create_synth_data.ipynb (1)

40-91: Remove commented out model definition.

Instead of keeping the old model definition in comments, consider using version control to track changes. This will improve code readability while maintaining history.

Remove lines 40-65 and document the model changes in the commit message or notebook markdown.

Mat-to-Nwb-Synth-Data.ipynb (3)

41-43: Consider specifying dependencies outside the notebook

Installing packages within the notebook using !pip install is suitable for quick tests, but for reproducibility and environment management, consider specifying dependencies in a requirements.txt file or managing them through a conda environment.


69-70: Ensure data availability and accessibility

Downloading data using gdown from Google Drive is convenient, but it may lead to issues if the link changes or becomes inaccessible. Consider hosting the data in a more stable location or providing instructions for data acquisition to enhance the robustness of the notebook.


278-346: Complete the TODOs in the code

There are TODO comments indicating that the actual node names need to be replaced and edges added for visualization. Completing these TODOs will enhance the clarity and functionality of the NWB file.

Would you like assistance in implementing the actual node names and adding edges for visualization? I can help generate the code or open a GitHub issue to track these tasks.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 11819ee and 7ed9f25.

📒 Files selected for processing (16)
  • .github/workflows/ci.yml (1 hunks)
  • Mat-to-Nwb-Synth-Data.ipynb (1 hunks)
  • configs/config.yaml (1 hunks)
  • configs/model/synth.yaml (1 hunks)
  • configs/stac/demo.yaml (1 hunks)
  • configs/stac/stac.yaml (1 hunks)
  • configs/stac/stac_fly_tethered.yaml (1 hunks)
  • configs/stac/stac_fly_treadmill.yaml (1 hunks)
  • configs/stac/stac_mouse.yaml (1 hunks)
  • demos/create_synth_data.ipynb (1 hunks)
  • models/synth_model.xml (1 hunks)
  • stac_mjx/io.py (1 hunks)
  • stac_mjx/main.py (3 hunks)
  • stac_mjx/stac.py (1 hunks)
  • tests/configs/stac/test_stac.yaml (1 hunks)
  • tests/unit/test_main.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • models/synth_model.xml
🧰 Additional context used
🪛 yamllint
.github/workflows/ci.yml

[error] 103-103: trailing spaces

(trailing-spaces)


[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

configs/model/synth.yaml

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 29-29: trailing spaces

(trailing-spaces)


[error] 30-30: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)

🪛 Gitleaks
demos/create_synth_data.ipynb

124-124: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🔇 Additional comments (19)
configs/config.yaml (1)

3-3: LGTM! Verify synthetic model config.

The change from rodent to synth model aligns with the PR's objective of implementing synthetic E2E testing.

Let's verify the existence of the synthetic model config:

✅ Verification successful

Synthetic model config verified successfully

The synth.yaml configuration file exists at the expected location configs/model/synth.yaml, confirming proper setup for the synthetic model change.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the synthetic model config exists
# Expected: Should find configs/model/synth.yaml

fd --type f "synth.yaml" configs/model/

Length of output: 64

configs/stac/demo.yaml (3)

6-8: LGTM: Processing control flags are properly configured.

The new skip flags (skip_fit_offsets and skip_ik_only) are consistently set to False, which ensures all processing steps will be executed. This is appropriate for a demo configuration.


Line range hint 10-13: LGTM: MuJoCo solver settings are appropriate.

The solver settings (newton solver with 1 iteration and 4 line search iterations) appear reasonable for a demo configuration.


1-3: Verify the existence of referenced data files.

The configuration now points to synthetic data files. Let's verify these paths exist to prevent runtime failures.

#!/bin/bash
# Description: Check if the referenced files exist in the repository

# Check for the pickle files in the current directory
echo "Checking for pickle files..."
ls -l synth_fit.p synth_ik_only.p 2>/dev/null || echo "Warning: Pickle files not found in current directory"

# Check for the NWB file in tests/data
echo "Checking for NWB file..."
ls -l tests/data/save_data_AVG_2.nwb 2>/dev/null || echo "Warning: NWB file not found in tests/data"
configs/stac/stac_mouse.yaml (2)

1-1: LGTM! Key renamed for consistency.

The renaming of fit_path to fit_offsets_path aligns with the standardization across configuration files.

Let's verify this change is consistent across the codebase:

#!/bin/bash
# Description: Verify the key renaming across all YAML configs and its usage in Python files

# Test 1: Check if any YAML files still use the old key name
echo "Checking for any remaining usage of 'fit_path' in YAML files..."
rg "fit_path:" -g "*.yaml" -g "*.yml"

# Test 2: Check Python files for the new key name usage
echo "Checking Python files using the new key name..."
rg "fit_offsets_path" -g "*.py"

# Test 3: Check for any potential missed references to the old key
echo "Checking for any references to the old key name..."
rg "fit_path" -g "*.py"

Line range hint 1-11: Verify test data path exists.

The configuration references test data at tests/data/test_mouse_mocap_3600_frames.h5. Since this PR introduces synthetic E2E testing, we should verify this test data file exists and is appropriate for synthetic testing.

✅ Verification successful

The provided output seems unrelated to verifying the existence of the test data file. Let me generate a new script to check this.


Test data file exists and is properly referenced

The test data file test_mouse_mocap_3600_frames.h5 exists in the expected location tests/data/ and is consistently referenced in both the configuration (configs/stac/stac_mouse.yaml) and test fixtures (tests/fixtures/datasets.py). The file size of 2.76MB suggests it contains appropriate motion capture data for synthetic testing.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test data file exists and check its format

# Test 1: Check if the test data file exists
echo "Checking if test data file exists..."
test -f "tests/data/test_mouse_mocap_3600_frames.h5" && echo "File exists" || echo "File missing"

# Test 2: List all test data files to see available options
echo "Listing available test data files..."
fd -e h5 . tests/data

Length of output: 1230


Script:

#!/bin/bash
# Check if the test data file exists and explore available test data
echo "Checking if test data directory exists..."
test -d "tests/data" && echo "Directory exists" || echo "Directory missing"

echo -e "\nListing contents of tests/data directory..."
ls -l tests/data/

echo -e "\nSearching for h5 files in the repository..."
fd -e h5 .

echo -e "\nSearching for references to test_mouse_mocap in the codebase..."
rg "test_mouse_mocap" -g '!*.h5'

Length of output: 1318

configs/stac/stac.yaml (1)

1-4: Verify test data path compatibility with synthetic tests.

Since this PR introduces synthetic E2E testing, we should verify if the default test data path is appropriate for synthetic testing scenarios.

#!/bin/bash
# Check if the test data file exists and examine other data paths in use
test_data="tests/data/test_rodent_mocap_1000_frames.nwb"
if [ -f "$test_data" ]; then
    echo "Test data file exists"
else
    echo "Warning: Test data file not found"
fi

# Look for other synthetic test data paths that might be more appropriate
rg "synth.*\.nwb" --type yaml
configs/stac/stac_fly_treadmill.yaml (2)

3-5: Well-documented external data dependency.

Excellent documentation of the external data source with both the dataset DOI and direct download link. This helps users understand where to obtain the large data file that isn't committed to the repository.


1-2: Verify path renaming consistency across codebase.

The renaming of configuration paths looks good, but let's ensure all references are updated consistently.

✅ Verification successful

Path renaming has been consistently updated across the codebase

The verification shows that:

  • All active YAML configurations use the new naming convention (fit_offsets_path and ik_only_path)
  • The only occurrences of old names (fit_path and transform_path) are in commented-out lines in configs/stac/stac.yaml
  • The new naming is consistently used across all configuration files including:
    • Main configs (stac.yaml, stac_mouse.yaml, stac_fly_treadmill.yaml, stac_fly_tethered.yaml)
    • Demo config (demo.yaml)
    • Test configs (test_stac.yaml)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining old path references
echo "Checking for any remaining old 'fit_path' references:"
rg "fit_path:" --type yaml

echo "Checking for any remaining old 'transform_path' references:"
rg "transform_path:" --type yaml

echo "Verifying new path naming consistency:"
rg "fit_offsets_path:|ik_only_path:" --type yaml

Length of output: 1319

configs/stac/stac_fly_tethered.yaml (1)

1-2: LGTM! Verify file paths exist.

The key renaming from fit_path to fit_offsets_path and transform_path to ik_only_path improves clarity by better describing their purposes.

Let's verify these files exist in the repository:

tests/unit/test_main.py (1)

21-21: LGTM! The assertion update aligns with the config schema changes.

The change from fit_path to fit_offsets_path correctly reflects the updated configuration structure.

configs/model/synth.yaml (2)

16-18: Consider expanding keypoint configuration.

The model currently defines only a single keypoint ('part_0'). This seems limited for a synthetic model. Consider if additional keypoints are needed for proper model functionality.

#!/bin/bash
# Check keypoint configurations in other model configs for comparison
rg -g "*.yaml" "KP_NAMES:" configs/model/
🧰 Tools
🪛 yamllint

[error] 16-16: trailing spaces

(trailing-spaces)


37-43: Verify scale factors match data units.

Both SCALE_FACTOR and MOCAP_SCALE_FACTOR are set to 1. Please verify these values are appropriate for your data units. The comment suggests MOCAP_SCALE_FACTOR should be 0.001 if mocap data is in millimeters.

#!/bin/bash
# Check for any mocap data files to verify units
fd -e nwb -e c3d . | head -n 5
.github/workflows/ci.yml (1)

102-102: LGTM! Secure token implementation.

The codecov token is properly sourced from GitHub secrets.

stac_mjx/main.py (3)

59-59: LGTM! Variable renaming improves clarity

The renaming from fit_path to fit_offsets_path better describes the variable's purpose and maintains consistency with configuration file changes.

Also applies to: 72-73


78-78: LGTM! Return statement correctly updated

The return statement properly reflects the variable renaming while maintaining the expected tuple return type.


86-86: LGTM! File loading path correctly updated

The pickle load operation correctly uses the renamed path variable.

stac_mjx/stac.py (1)

Line range hint 411-425: Verify type compatibility with h5 file format.

There's a commented section indicating type errors with h5 file format. Consider documenting the type requirements or adding type conversion if needed.

Let's check for h5 file usage in the codebase:

Mat-to-Nwb-Synth-Data.ipynb (1)

272-274: Address the warning about data and timestamps length mismatch

A warning is raised indicating that the length of the data does not match the length of the timestamps in PoseEstimationSeries. This suggests a potential issue with the dimensions of the data arrays.

After applying the suggested fixes to the data slicing and timestamps, rerun the notebook to ensure that the warning is resolved.

tests/configs/stac/test_stac.yaml Outdated Show resolved Hide resolved
configs/stac/demo.yaml Outdated Show resolved Hide resolved
configs/stac/stac.yaml Outdated Show resolved Hide resolved
configs/model/synth.yaml Outdated Show resolved Hide resolved
configs/model/synth.yaml Outdated Show resolved Hide resolved
Comment on lines +9 to +18
"import mujoco\n",
"import mediapy as media\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"import imageio_ffmpeg\n",
"\n",
"!command -v ffmpeg >/dev/null || (apt update && apt install -y ffmpeg)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider using Python's subprocess module for dependency checks.

Instead of using shell command directly with !, consider using Python's subprocess module for better error handling and cross-platform compatibility.

-!command -v ffmpeg >/dev/null || (apt update && apt install -y ffmpeg)
+import subprocess
+import sys
+
+def check_ffmpeg():
+    try:
+        subprocess.run(['ffmpeg', '-version'], capture_output=True, check=True)
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        print("ffmpeg not found. Please install it first.")
+        sys.exit(1)
+
+check_ffmpeg()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"import mujoco\n",
"import mediapy as media\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"import imageio_ffmpeg\n",
"\n",
"!command -v ffmpeg >/dev/null || (apt update && apt install -y ffmpeg)"
"import mujoco\n",
"import mediapy as media\n",
"import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"import imageio_ffmpeg\n",
"import subprocess\n",
"import sys\n",
"\n",
"def check_ffmpeg():\n",
" try:\n",
" subprocess.run(['ffmpeg', '-version'], capture_output=True, check=True)\n",
" except (subprocess.CalledProcessError, FileNotFoundError):\n",
" print(\"ffmpeg not found. Please install it first.\")\n",
" sys.exit(1)\n",
"\n",
"check_ffmpeg()"

demos/create_synth_data.ipynb Show resolved Hide resolved
Mat-to-Nwb-Synth-Data.ipynb Show resolved Hide resolved
Mat-to-Nwb-Synth-Data.ipynb Show resolved Hide resolved
Mat-to-Nwb-Synth-Data.ipynb Show resolved Hide resolved
charles-zhng and others added 4 commits October 25, 2024 17:05
* fix offset shape when saving

* update demo config

* linter

* fix demo.yaml

* update configs

* update config test

---------

Co-authored-by: Charles Zhang <[email protected]>
* fix offset shape when saving

* update demo config

* linter

* fix demo.yaml

* update configs

* update config test

---------

Co-authored-by: Charles Zhang <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7ed9f25 and 935c922.

📒 Files selected for processing (1)
  • configs/stac/demo.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
configs/stac/demo.yaml

[error] 4-4: duplication of key "fit_offsets_path" in mapping

(key-duplicates)


[error] 5-5: duplication of key "ik_only_path" in mapping

(key-duplicates)


[error] 6-6: duplication of key "data_path" in mapping

(key-duplicates)

🔇 Additional comments (2)
configs/stac/demo.yaml (2)

8-8: The n_fit_frames: 1 value remains unusually low

This comment was previously raised and remains valid. The value is significantly lower than other configurations (test_stac.yaml uses 42 frames, while production configs use 250-1800 frames). This could lead to underfitting issues.


4-6: Verify the existence of synthetic data files

The configuration now references new synthetic data files. Let's verify these paths exist in the repository.

#!/bin/bash
# Check if the new synthetic data files exist
echo "Checking for synthetic data files..."
test -f "tests/data/save_data_AVG_2.nwb" && echo "✓ NWB file exists" || echo "✗ NWB file missing"

# Look for related synthetic data generation scripts
echo -e "\nRelated synthetic data files:"
rg -l "save_data_AVG" --type py --type ipynb
🧰 Tools
🪛 yamllint

[error] 4-4: duplication of key "fit_offsets_path" in mapping

(key-duplicates)


[error] 5-5: duplication of key "ik_only_path" in mapping

(key-duplicates)


[error] 6-6: duplication of key "data_path" in mapping

(key-duplicates)

configs/stac/demo.yaml Outdated Show resolved Hide resolved
configs/stac/demo.yaml Outdated Show resolved Hide resolved
Copy link

codecov bot commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 36.36364% with 7 lines in your changes missing coverage. Please review.

Project coverage is 39.86%. Comparing base (11819ee) to head (dc881c3).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
stac_mjx/stac.py 36.36% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
- Coverage   39.93%   39.86%   -0.07%     
==========================================
  Files          10       10              
  Lines         576      582       +6     
==========================================
+ Hits          230      232       +2     
- Misses        346      350       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
configs/model/synth_model.yaml (3)

8-11: Address TODO and document tolerance values

  1. The TODO indicates that these tolerance values are not yet implemented in the optimizer loops.
  2. Consider documenting the expected impact of each tolerance value:
    • FTOL: 5.0e-03 (full model)
    • ROOT_FTOL: 1.0e-05 (root optimization)
    • LIMB_FTOL: 1.0e-06 (limb optimization)

Would you like me to help implement the optimizer loops using these tolerance values?


33-43: Enhance documentation for scale parameters

While MOCAP_SCALE_FACTOR is well documented, consider adding similar explanatory comments for:

  1. SCALE_FACTOR: Document its impact on the registration process
  2. KEYPOINT_COLOR_PAIRS: Document the RGBA color format
🧰 Tools
🪛 yamllint

[error] 36-36: trailing spaces

(trailing-spaces)


1-56: Fix YAML formatting issues

Multiple trailing spaces were detected. Consider running a YAML formatter to clean up these issues.

🧰 Tools
🪛 yamllint

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 29-29: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)

stac_mjx/stac.py (1)

Line range hint 116-121: LGTM! Consider adding type hints.

The change improves error handling by explicitly checking for key existence instead of value, preventing potential AttributeError. The empty list default is a sensible fallback when no individual part optimization is configured.

Consider adding type hints for better code maintainability:

-    def part_opt_setup(self):
+    def part_opt_setup(self) -> List[jp.ndarray]:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2e6e01c and f14910b.

📒 Files selected for processing (5)
  • configs/model/fly_tethered.yaml (0 hunks)
  • configs/model/fly_treadmill.yaml (0 hunks)
  • configs/model/synth_model.yaml (1 hunks)
  • configs/stac/demo.yaml (1 hunks)
  • stac_mjx/stac.py (1 hunks)
💤 Files with no reviewable changes (2)
  • configs/model/fly_tethered.yaml
  • configs/model/fly_treadmill.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • configs/stac/demo.yaml
🧰 Additional context used
🪛 yamllint
configs/model/synth_model.yaml

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 29-29: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)

🔇 Additional comments (3)
configs/model/synth_model.yaml (3)

4-5: Verify if single frame per clip is intended

Setting N_FRAMES_PER_CLIP: 1 seems unusually low for animation data. Please verify if this is intended for your synthetic E2E test case.


13-14: Verify if single iteration is sufficient

Setting N_ITERS: 1 might not provide enough iterations for the pose and offset optimization to converge. Consider increasing this value or making it configurable based on convergence criteria.

🧰 Tools
🪛 yamllint

[error] 13-13: trailing spaces

(trailing-spaces)


16-18: Expand keypoint configuration for synthetic model

The current keypoint configuration seems minimal with only part_0. Consider:

  1. Adding more keypoints for a realistic synthetic model
  2. Documenting the purpose of each keypoint
  3. Validating if the base offset (0 0 0.01) is sufficient

Also applies to: 20-22, 24-26

🧰 Tools
🪛 yamllint

[error] 16-16: trailing spaces

(trailing-spaces)

configs/model/synth_model.yaml Outdated Show resolved Hide resolved
configs/model/synth_model.yaml Outdated Show resolved Hide resolved
configs/model/synth_model.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
configs/model/synth_model.yaml (2)

8-8: TODO: Implementation needed for optimizer tolerances.

The comment indicates that the optimizer loops need to be updated to use these tolerance values.

Would you like me to help implement the optimizer loops using these tolerance values?


16-22: Consider using more descriptive keypoint names.

The current naming part_0 is generic. Consider using anatomically meaningful names that reflect the actual body parts being tracked (e.g., head, spine, tail_base).

🧰 Tools
🪛 yamllint

[error] 16-16: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f14910b and 6f7da79.

📒 Files selected for processing (5)
  • configs/config.yaml (1 hunks)
  • configs/model/synth_model.yaml (1 hunks)
  • configs/stac/demo.yaml (1 hunks)
  • configs/stac/stac.yaml (1 hunks)
  • stac_mjx/stac.py (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • configs/stac/stac.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • configs/config.yaml
  • configs/stac/demo.yaml
  • stac_mjx/stac.py
🧰 Additional context used
🪛 yamllint
configs/model/synth_model.yaml

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 29-29: trailing spaces

(trailing-spaces)


[error] 36-36: trailing spaces

(trailing-spaces)


[error] 45-45: trailing spaces

(trailing-spaces)


[error] 46-46: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 54-54: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)

🔇 Additional comments (2)
configs/model/synth_model.yaml (2)

37-43: Verify scale factor values.

Both SCALE_FACTOR and MOCAP_SCALE_FACTOR are set to 1. Please verify:

  1. If the model and mocap data are indeed in the same units
  2. If the animal size matches the model size without scaling
#!/bin/bash
# Search for usage of scale factors to understand their impact
rg "MOCAP_SCALE_FACTOR|SCALE_FACTOR" -A 5

13-14: Verify if single iteration is sufficient.

N_ITERS=1 might not provide enough iterations for the alternating optimization to converge. Consider increasing this value or making it configurable based on convergence criteria.

🧰 Tools
🪛 yamllint

[error] 13-13: trailing spaces

(trailing-spaces)

configs/model/synth_model.yaml Outdated Show resolved Hide resolved
configs/model/synth_model.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
configs/model/synth_model.yaml (3)

8-11: Address TODO and document tolerance values.

The TODO indicates incomplete implementation of optimizer loops. Additionally, please document:

  1. The meaning of each tolerance value
  2. How these values were determined
  3. What range of values is acceptable

Would you like me to help implement the optimizer loops using these tolerance values?


16-34: Expand keypoint configuration for better model representation.

The current configuration uses only a single keypoint (part_0), which might be insufficient for:

  1. Accurate pose estimation
  2. Realistic motion capture
  3. Proper model registration

Consider adding more keypoints to better represent the synthetic model's structure.

Would you like assistance in designing a more comprehensive keypoint configuration?

🧰 Tools
🪛 yamllint

[error] 16-16: trailing spaces

(trailing-spaces)


[error] 31-31: trailing spaces

(trailing-spaces)


1-58: Fix YAML formatting issues.

The file has multiple formatting issues:

  1. Trailing spaces on multiple lines
  2. Extra blank lines at the start

Consider using a YAML formatter to clean up these issues automatically.

🧰 Tools
🪛 yamllint

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 31-31: trailing spaces

(trailing-spaces)


[error] 38-38: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 49-49: trailing spaces

(trailing-spaces)


[error] 56-56: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

stac_mjx/compute_stac.py (1)

17-17: Update docstring to document the new parameter.

The root_kp_idx parameter has been added but is not documented in the function's docstring. Please add a description of this parameter in the Args section.

Add this to the Args section of the docstring:

    Args:
        mjx_model (mjx.Model): MJX Model
        mjx_data (mjx.Data): MJX Data
        kp_data (jp.Array): Keypoint data
+       root_kp_idx (int): Index of the keypoint to use as root for optimization
        lb (jp.ndarray): Array of lower bounds for corresponding qpos elements
stac_mjx/stac.py (3)

83-93: Remove debug print statements and fix typo.

The code contains several debug print statements and a typo that should be cleaned up before merging:

  1. "fouind" should be "found"
  2. Debug print statements should be replaced with proper logging
-        if "ROOT_OPTIMIZATION_KEYPOINT" in self.cfg.model:
-            print("fouind root opt key")
-            self._root_kp_idx = self._kp_names.index(
-                self.cfg.model.ROOT_OPTIMIZATION_KEYPOINT
-            )
-        else:
-            print("NOOOO ROOT KP")
-            self._root_kp_idx = -1
-
-        print(self.cfg.model.keys())
-        print("Using root_kp_idx = ", self._root_kp_idx)
+        self._root_kp_idx = (
+            self._kp_names.index(self.cfg.model.ROOT_OPTIMIZATION_KEYPOINT)
+            if "ROOT_OPTIMIZATION_KEYPOINT" in self.cfg.model
+            else -1
+        )
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 84-85: stac_mjx/stac.py#L84-L85
Added lines #L84 - L85 were not covered by tests


359-363: Maintain consistency with fit_offsets improvements.

The same improvements suggested for fit_offsets should be applied here:

  1. Use proper logging instead of print
  2. Add test coverage for this condition
-            print(
-                "Missing or invalid ROOT_OPTIMIZATION_KEYPOINT, skipping root_optimization()"
-            )
+            logger.info("Missing or invalid ROOT_OPTIMIZATION_KEYPOINT, skipping root_optimization()")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 359-360: stac_mjx/stac.py#L359-L360
Added lines #L359 - L360 were not covered by tests


[warning] 363-363: stac_mjx/stac.py#L363
Added line #L363 was not covered by tests


Line range hint 83-363: Consider architectural improvements for better maintainability.

Several architectural improvements could enhance the code:

  1. Add proper logging configuration instead of using print statements
  2. Update class docstring to document the ROOT_OPTIMIZATION_KEYPOINT behavior
  3. Consider extracting the root optimization logic into a separate method to reduce code duplication between fit_offsets and ik_only
  4. Add comprehensive test coverage for the new root optimization handling logic
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 84-85: stac_mjx/stac.py#L84-L85
Added lines #L84 - L85 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 6f7da79 and 2405815.

📒 Files selected for processing (4)
  • configs/model/rodent.yaml (4 hunks)
  • configs/model/synth_model.yaml (1 hunks)
  • stac_mjx/compute_stac.py (2 hunks)
  • stac_mjx/stac.py (4 hunks)
🧰 Additional context used
🪛 yamllint
configs/model/synth_model.yaml

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 31-31: trailing spaces

(trailing-spaces)


[error] 38-38: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 49-49: trailing spaces

(trailing-spaces)


[error] 56-56: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

🪛 GitHub Check: codecov/patch
stac_mjx/stac.py

[warning] 84-85: stac_mjx/stac.py#L84-L85
Added lines #L84 - L85 were not covered by tests


[warning] 239-240: stac_mjx/stac.py#L239-L240
Added lines #L239 - L240 were not covered by tests


[warning] 243-243: stac_mjx/stac.py#L243
Added line #L243 was not covered by tests


[warning] 359-360: stac_mjx/stac.py#L359-L360
Added lines #L359 - L360 were not covered by tests


[warning] 363-363: stac_mjx/stac.py#L363
Added line #L363 was not covered by tests

🔇 Additional comments (8)
configs/model/synth_model.yaml (3)

39-45: Validate scale factors for your use case.

Both scale factors are set to 1, which assumes:

  1. The synthetic model size matches the desired registration size
  2. The mocap data is already in the same units as the model (meters)

Please verify these assumptions for your specific use case.


52-54: Review rendering configuration for consistency.

The current configuration has:

  • High render FPS (200)
  • Single sample frame

This combination might not provide meaningful output. Consider:

  1. Increasing N_SAMPLE_FRAMES to capture more motion
  2. Adjusting RENDER_FPS based on your motion capture frequency
#!/bin/bash
# Search for typical rendering configurations
rg "RENDER_FPS|N_SAMPLE_FRAMES" configs/

13-14: Verify if single iteration is sufficient for convergence.

Setting N_ITERS to 1 might not allow sufficient iterations for the pose and offset optimization to converge. Consider:

  1. Increasing the number of iterations
  2. Adding a convergence check
🧰 Tools
🪛 yamllint

[error] 13-13: trailing spaces

(trailing-spaces)

configs/model/rodent.yaml (2)

92-93: LGTM: Good choice for root optimization keypoint.

Setting SpineL as the root optimization keypoint is appropriate since it's:

  1. A stable, central point on the trunk
  2. Already part of trunk optimization keypoints
  3. Has a neutral initial offset [0, 0, 0]

4-4: Verify the significant reduction in sampling parameters.

The following parameters have been drastically reduced:

  • N_FRAMES_PER_CLIP: 250 → 25 (90% reduction)
  • N_ITERS: 6 → 1 (83% reduction)
  • N_SAMPLE_FRAMES: 100 → 1 (99% reduction)

While these reductions might be intentional for synthetic E2E testing, such dramatic decreases could impact the quality of pose estimation in production scenarios.

Let's check if these parameters are specifically overridden for testing:

Also applies to: 13-13, 194-194

✅ Verification successful

Parameter reductions are consistent with synthetic testing patterns

The reduced parameters in rodent.yaml (N_FRAMES_PER_CLIP: 25, N_ITERS: 1, N_SAMPLE_FRAMES: 1) align with synthetic testing patterns, as evidenced by:

  • synth_model.yaml uses identical minimal values (1/1/1)
  • Test configs and production models use higher values:
    • Test configs: 360/6/100
    • Fly models: 300-581/6/100
    • Mouse model: 360/6/50

These reductions are intentional and appropriate for synthetic E2E testing purposes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any overrides of these parameters in test configurations
rg -l "N_FRAMES_PER_CLIP:|N_ITERS:|N_SAMPLE_FRAMES:" "tests/" "configs/"

Length of output: 435


Script:

#!/bin/bash
# Check the values of these parameters in test configurations and other model configs
echo "=== Test configurations ==="
rg "N_FRAMES_PER_CLIP:|N_ITERS:|N_SAMPLE_FRAMES:" "tests/configs/model/"

echo -e "\n=== Other model configurations ==="
rg "N_FRAMES_PER_CLIP:|N_ITERS:|N_SAMPLE_FRAMES:" "configs/model/" --glob '!rodent.yaml'

Length of output: 1857

stac_mjx/compute_stac.py (1)

54-55: 🛠️ Refactor suggestion

Add input validation for root_kp_idx.

The comments indicate that root_kp_idx varies by model type (e.g., 3*18 for rodent.xml, 3*5 for mouse model, 0 for FLY_MODEL). Consider adding validation to ensure the provided index is valid for the current model.

Let's verify the keypoint indices across different models:

Consider adding validation at the start of the function:

def root_optimization(...):
    # Validate root keypoint index
    n_keypoints = kp_data.shape[1] // 3  # Assuming xyz coordinates
    if not (0 <= root_kp_idx < kp_data.shape[1]):
        raise ValueError(
            f"root_kp_idx {root_kp_idx} is out of bounds. "
            f"Must be between 0 and {kp_data.shape[1]-1}"
        )
    if root_kp_idx % 3 != 0:
        raise ValueError(
            f"root_kp_idx {root_kp_idx} must be a multiple of 3 "
            "to align with xyz coordinates"
        )
stac_mjx/stac.py (2)

Line range hint 128-135: LGTM! More pythonic key existence check.

The change to check for key existence rather than None value is a good improvement.


239-243: Replace print with logging and add test coverage.

The root optimization skip condition is a good addition but needs some improvements:

  1. Use proper logging instead of print
  2. Add test coverage for this condition
-            print(
-                "Missing or invalid ROOT_OPTIMIZATION_KEYPOINT, skipping root_optimization()"
-            )
+            logger.info("Missing or invalid ROOT_OPTIMIZATION_KEYPOINT, skipping root_optimization()")
#!/bin/bash
# Check test coverage for root optimization skipping
rg -A 5 "def test.*root_optimization" tests/
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 239-240: stac_mjx/stac.py#L239-L240
Added lines #L239 - L240 were not covered by tests


[warning] 243-243: stac_mjx/stac.py#L243
Added line #L243 was not covered by tests

stac_mjx/stac.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (6)
configs/stac/stac_synth_data.yaml (1)

1-3: Consider improving path configuration for better flexibility and security.

  1. Use environment variables or configuration inheritance for base paths to make the configuration more portable.
  2. Be consistent with relative paths across all entries.
  3. Consider documenting the expected file formats, especially for pickle files which may have security implications.

Example structure:

base_path: ${STAC_DATA_DIR}  # Environment variable
paths:
  fit_offsets: "${base_path}/synth_fit.p"
  ik_only: "${base_path}/synth_ik_only.p"
  test_data: "${base_path}/tests/data/test_synth_1_frames.nwb"
configs/model/synth_data.yaml (3)

8-11: Address TODO comment regarding optimizer loops

The comment indicates that the tolerance parameters are not yet implemented in the optimizer loops. This could affect the optimization accuracy.

Would you like me to help implement the tolerance-based optimization loops or create a GitHub issue to track this task?


16-18: Consider using more descriptive keypoint names

The keypoint name part_0 is generic and might not clearly indicate its anatomical or functional purpose. Consider using more descriptive names that reflect the actual body parts or landmarks being tracked.

KP_NAMES: 
-  - part_0
+  - head_center  # Example of a more descriptive name

-ROOT_OPTIMIZATION_KEYPOINT: part_0
+ROOT_OPTIMIZATION_KEYPOINT: head_center

Also applies to: 19-20

🧰 Tools
🪛 yamllint

[error] 16-16: trailing spaces

(trailing-spaces)


56-58: Remove duplicate comment and improve documentation

The comment for SITES_TO_REGULARIZE is duplicated. Additionally, the impact of M_REG_COEF's value should be better documented.

-# If you have reason to believe that the initial offsets are correct for particular keypoints, 
-# you can regularize those sites using _SITES_TO_REGULARIZE. 
-M_REG_COEF: 1
+# Regularization coefficient for sites listed in SITES_TO_REGULARIZE
+# Higher values (>1) increase regularization strength, enforcing initial offsets more strictly
+# Lower values (<1) allow more deviation from initial offsets
+M_REG_COEF: 1.0
🧰 Tools
🪛 yamllint

[error] 56-56: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

demos/create_synth_data.ipynb (2)

40-65: Remove commented out code.

The commented out XML model definition should be removed to improve code readability and maintainability.

Apply this diff to remove the commented code:

-# chaotic_pendulum = \"\"\"\n",
-# <mujoco>\n",
-#   <option timestep=\".001\">\n",
-#     <flag energy=\"enable\" contact=\"disable\"/>\n",
-#   </option>\n",
-# \n",
-#   <default>\n",
-#     <joint type=\"hinge\" axis=\"0 -1 0\"/>\n",
-#     <geom type=\"capsule\" size=\".02\"/>\n",
-#   </default>\n",
-# \n",
-#   <worldbody>\n",
-#     <light pos=\"0 -.4 1\"/>\n",
-#     <camera name=\"fixed\" pos=\"0 -1 0\" xyaxes=\"1 0 0 0 0 1\"/>\n",
-#     <body name=\"0\" pos=\"0 0 .2\">\n",
-#       <joint name=\"root\"/>\n",
-#       <geom fromto=\"0 0 0 0 0 -.25\" rgba=\"1 1 0 1\"/>\n",
-#       <body name=\"1\" pos=\"0 0 -.25\">\n",
-#          <joint/>\n",
-#          <geom fromto=\"0 0 0 0 0 -.2\" rgba=\"0 0 1 1\"/>\n",
-#          <site name=\"end_eff\" type=\"sphere\" size=\".3\" rgba=\"1 0 0 0\" pos=\"0 0 -.2\"/>\n",
-#       </body>\n",
-#     </body>\n",
-#   </worldbody>\n",
-# </mujoco>\n",
-# \"\"\"\n",

213-222: Remove commented out plotting code.

The commented out plotting lines should be removed to improve code readability.

Apply this diff to remove the commented code:

-  #angle[i] = data.joint('root').qpos
  x0[i] = data.body('0').xpos
-  #x1[i] = data.body('1').xpos
-  #print(data.body('1').xpos[0])

# plot
ax[0].plot(sim_time, x0[:,2])
ax[1].plot(sim_time, x1[:,2])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2405815 and 86d333e.

📒 Files selected for processing (7)
  • .github/workflows/ci.yml (1 hunks)
  • configs/config_synth_data.yaml (1 hunks)
  • configs/model/fly_treadmill.yaml (1 hunks)
  • configs/model/synth_data.yaml (1 hunks)
  • configs/stac/stac_synth_data.yaml (1 hunks)
  • demos/create_synth_data.ipynb (1 hunks)
  • models/synth_model.xml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • configs/config_synth_data.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • configs/model/fly_treadmill.yaml
  • models/synth_model.xml
🧰 Additional context used
🪛 yamllint
.github/workflows/ci.yml

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

configs/model/synth_data.yaml

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 31-31: trailing spaces

(trailing-spaces)


[error] 38-38: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 49-49: trailing spaces

(trailing-spaces)


[error] 56-56: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

🪛 Gitleaks
demos/create_synth_data.ipynb

119-119: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🔇 Additional comments (5)
configs/stac/stac_synth_data.yaml (2)

9-12: Review solver parameters for convergence reliability.

The current MuJoCo solver configuration might not ensure reliable convergence:

  1. iterations: 1 seems too low for a Newton solver to converge properly
  2. Missing important parameters like tolerances and convergence criteria

Let's check the MuJoCo documentation and similar configurations:

#!/bin/bash
# Search for other MuJoCo solver configurations in the codebase
rg -g "*.yaml" -g "*.yml" -B 2 -A 4 "solver:\s*newton"

5-7: Verify processing parameters and add documentation.

  1. The value n_fit_frames: 1 seems unusually low for a fitting process. Is this intentional?
  2. Consider adding documentation comments to explain:
    • The expected range for n_fit_frames
    • The implications of the skip flags
    • The relationship between these parameters

Let's check if there are any similar configurations in the codebase for reference:

configs/model/synth_data.yaml (2)

13-14: Verify if single iteration is sufficient

Setting N_ITERS: 1 might not allow enough iterations for the pose and offset optimization to converge. Consider increasing this value or making it configurable based on convergence criteria.

#!/bin/bash
# Search for the optimization loop implementation to understand iteration usage
ast-grep --pattern 'for $_ in range(N_ITERS):
  $$$'
🧰 Tools
🪛 yamllint

[error] 13-13: trailing spaces

(trailing-spaces)


4-5: Verify if single frame per clip is intended

Setting N_FRAMES_PER_CLIP: 1 seems unusual for a transform operation. Please verify if this is intentional, as typically multiple frames would be processed for meaningful motion analysis.

✅ Verification successful

Based on the code context, I can now provide a final response:

Single frame per clip is intentional for synthetic data transformation

The N_FRAMES_PER_CLIP: 1 setting in synth_data.yaml is appropriate for this specific configuration. The codebase shows that this parameter is used to divide the input data into chunks for processing, and for synthetic data transformation, processing one frame at a time is a valid approach. This differs from other configs in the repository (like rodent/mouse configs that use larger values) because those deal with motion capture data requiring temporal context.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of N_FRAMES_PER_CLIP to understand the context
rg -A 5 "N_FRAMES_PER_CLIP"

Length of output: 54


Script:

#!/bin/bash
# Search for any references to frames per clip in the codebase
rg -i "frames.?per.?clip"

# Search for transform-related configurations
fd "transform" --type f --exec rg -l "frame"

# Look for any yaml files that might contain related configuration
fd "\.ya?ml$" --type f --exec cat {} \;

Length of output: 48134


Script:

#!/bin/bash
# Search for code that processes frames or uses N_FRAMES_PER_CLIP
rg -A 5 "N_FRAMES_PER_CLIP" "stac_mjx/"

# Look for transform-related code
ast-grep --pattern 'def transform($$$): 
  $$$'

Length of output: 887

.github/workflows/ci.yml (1)

106-107: ⚠️ Potential issue

Improve smoke test implementation.

The smoke test implementation still needs improvement:

  1. The comment "Test probably delete this" creates uncertainty about the test's permanence
  2. The script path should be workspace-relative
  3. No error handling or result verification
  4. Missing documentation about what this synthetic E2E test verifies

Apply this diff to improve the implementation:

-      # Test probably delete this
-      - name: Smoke Test
+      - name: Run Synthetic E2E Test
         shell: bash -l {0}
-        run: python run_stac.py --config-name config_synth_data.yaml
+        run: |
+          # Run synthetic E2E test with config and verify results
+          python ./tests/smoke/run_stac.py --config-name config_synth_data.yaml
+          if [ $? -ne 0 ]; then
+            echo "Synthetic E2E test failed!"
+            exit 1
+          fi

Let's verify the script location and implementation:

🧰 Tools
🪛 yamllint

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

configs/model/synth_data.yaml Show resolved Hide resolved
configs/model/synth_data.yaml Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (3)
tests/configs/model/test_rodent_no_kp_names.yaml (1)

61-62: Consider adding documentation.

Consider adding a comment explaining:

  • The purpose of ROOT_OPTIMIZATION_KEYPOINT
  • Why SpineL was chosen as the root keypoint
  • How this parameter affects the optimization process

Add documentation above the parameter:

+# Specifies the keypoint used as the root for pose optimization.
+# SpineL is chosen as it corresponds to the pelvis and provides a stable reference point.
ROOT_OPTIMIZATION_KEYPOINT: SpineL
tests/configs/model/test_rodent.yaml (1)

83-84: Add documentation for the new parameter.

The addition of ROOT_OPTIMIZATION_KEYPOINT is appropriate, using SpineL as the root point makes anatomical sense since it maps to the pelvis in the model. Consider adding a comment explaining its purpose and impact on the optimization process.

+# Specifies the keypoint to use as the root for pose optimization.
+# This keypoint serves as the reference point for calculating relative positions.
 ROOT_OPTIMIZATION_KEYPOINT: SpineL
tests/configs/model/test_mouse.yaml (1)

Line range hint 133-189: Consider documenting or cleaning up commented configurations.

The INDIVIDUAL_PART_OPTIMIZATION section contains extensive commented-out configurations. Consider either:

  1. Removing if these configurations are no longer needed
  2. Adding documentation explaining when these configurations should be uncommented
  3. Moving to a separate example configuration file if they serve as templates
🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 86d333e and d9fe782.

📒 Files selected for processing (9)
  • configs/model/mouse.yaml (1 hunks)
  • configs/model/rodent.yaml (1 hunks)
  • configs/stac/stac_fly_treadmill.yaml (1 hunks)
  • stac_mjx/stac.py (4 hunks)
  • tests/configs/model/test_mouse.yaml (1 hunks)
  • tests/configs/model/test_rodent.yaml (1 hunks)
  • tests/configs/model/test_rodent_label3d.yaml (1 hunks)
  • tests/configs/model/test_rodent_less_kp_names.yaml (1 hunks)
  • tests/configs/model/test_rodent_no_kp_names.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • configs/model/rodent.yaml
🧰 Additional context used
🪛 GitHub Check: codecov/patch
stac_mjx/stac.py

[warning] 88-88: stac_mjx/stac.py#L88
Added line #L88 was not covered by tests


[warning] 234-235: stac_mjx/stac.py#L234-L235
Added lines #L234 - L235 were not covered by tests


[warning] 238-238: stac_mjx/stac.py#L238
Added line #L238 was not covered by tests


[warning] 354-355: stac_mjx/stac.py#L354-L355
Added lines #L354 - L355 were not covered by tests


[warning] 358-358: stac_mjx/stac.py#L358
Added line #L358 was not covered by tests

🔇 Additional comments (9)
configs/stac/stac_fly_treadmill.yaml (1)

3-5: Excellent documentation of data source!

The comments providing the Dryad repository link and direct file download URL are very helpful for reproducibility.

tests/configs/model/test_rodent_no_kp_names.yaml (1)

61-62: Verify root keypoint selection.

The selection of "SpineL" as the root optimization keypoint appears appropriate since:

  1. It maps to the "pelvis" in KEYPOINT_MODEL_PAIRS
  2. It's included in SITES_TO_REGULARIZE, indicating it's a stable reference point

Let's verify this is consistent across related files:

✅ Verification successful

Root optimization keypoint selection is consistent

The selection of "SpineL" as the ROOT_OPTIMIZATION_KEYPOINT is verified to be consistent across all rodent-related configurations:

  • All rodent configs (rodent.yaml, test_rodent*.yaml) use "SpineL"
  • Each animal model has its appropriate root keypoint:
    • Mouse configs use "Trunk"
    • Fly treadmill uses "head"
    • Synthetic data uses "part_0"

The keypoint configurations show that SpineL is used in conjunction with other stable points (HandL, HandR, SpineF) in SITES_TO_REGULARIZE, confirming its role as a reliable reference point for optimization.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check ROOT_OPTIMIZATION_KEYPOINT usage across config files
# Expected: Consistent usage of SpineL for rodent configs

# Search for ROOT_OPTIMIZATION_KEYPOINT in all YAML files
rg "ROOT_OPTIMIZATION_KEYPOINT:" -A 1 "\.ya?ml$"

Length of output: 97


Script:

#!/bin/bash
# Let's try a different approach to find YAML files first and then search within them
# First, find all yaml/yml files
fd "\.(yaml|yml)$" | while read -r file; do
    echo "=== $file ==="
    grep -A 1 "ROOT_OPTIMIZATION_KEYPOINT:" "$file" 2>/dev/null
done

# Also search for keypoint-related configurations to understand the context
echo -e "\nChecking keypoint configurations:"
fd "\.(yaml|yml)$" | while read -r file; do
    echo "=== $file ==="
    grep -A 3 "KEYPOINT_MODEL_PAIRS\|SITES_TO_REGULARIZE" "$file" 2>/dev/null
done

Length of output: 10588

tests/configs/model/test_rodent_label3d.yaml (1)

63-64: LGTM! Well-chosen root optimization keypoint.

The addition of SpineL as the ROOT_OPTIMIZATION_KEYPOINT is anatomically sound since:

  1. It maps to the pelvis in the model
  2. It's already part of the regularized sites
  3. It's integrated with trunk optimization
configs/model/mouse.yaml (2)

Line range hint 4-6: Consider addressing the TODO comment alongside this change.

The addition of ROOT_OPTIMIZATION_KEYPOINT might be related to the TODO about re-implementing optimizer loops to use tolerance parameters. Consider whether these changes should be coordinated.

Would you like help in implementing the optimizer loops to properly utilize these tolerance parameters?

🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)


125-126: Verify the root optimization keypoint selection.

The selection of "Trunk" as the root optimization keypoint appears reasonable as it:

  1. Maps to the thoracic vertebrae (mouse__Thoracic1) which is central to the skeletal structure
  2. Is already included in the TRUNK_OPTIMIZATION_KEYPOINTS list
  3. Has defined mappings in all required configuration sections

However, please verify if this selection impacts the TODO comment about re-implementing optimizer loops, as both features involve optimization parameters.

tests/configs/model/test_mouse.yaml (2)

Line range hint 7-9: Please clarify the status of optimizer loop implementation.

There's a TODO comment about re-implementing optimizer loops to use the tolerance values. Given the addition of ROOT_OPTIMIZATION_KEYPOINT, it would be helpful to understand:

  1. Is this TODO still relevant?
  2. How does the new root optimization keypoint interact with these tolerances?

Let's check for related implementation files:

#!/bin/bash
# Description: Look for optimizer implementation files
# Search for files containing optimizer-related code
rg -l "optimizer.*loop|optimization.*loop" --type py
🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)


125-126: LGTM! The ROOT_OPTIMIZATION_KEYPOINT configuration is properly defined.

The "Trunk" keypoint is well-defined in the configuration with proper mappings and is consistently used across the configuration file.

Let's verify the consistency of this configuration across other files:

stac_mjx/stac.py (2)

83-88: Test coverage needed for root keypoint initialization.

The root keypoint initialization logic looks correct, but lacks test coverage as noted in previous reviews.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 88-88: stac_mjx/stac.py#L88
Added line #L88 was not covered by tests


123-123: LGTM: Improved configuration handling.

The change to check for key existence instead of None is more robust and follows better configuration handling practices.

configs/stac/stac_fly_treadmill.yaml Show resolved Hide resolved
tests/configs/model/test_rodent.yaml Show resolved Hide resolved
stac_mjx/stac.py Show resolved Hide resolved
stac_mjx/stac.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d9fe782 and 418a1e9.

📒 Files selected for processing (2)
  • .github/workflows/ci.yml (1 hunks)
  • stac_mjx/stac.py (4 hunks)
🧰 Additional context used
🪛 yamllint
.github/workflows/ci.yml

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

🪛 GitHub Check: codecov/patch
stac_mjx/stac.py

[warning] 88-88: stac_mjx/stac.py#L88
Added line #L88 was not covered by tests


[warning] 234-235: stac_mjx/stac.py#L234-L235
Added lines #L234 - L235 were not covered by tests


[warning] 238-238: stac_mjx/stac.py#L238
Added line #L238 was not covered by tests


[warning] 354-355: stac_mjx/stac.py#L354-L355
Added lines #L354 - L355 were not covered by tests


[warning] 358-358: stac_mjx/stac.py#L358
Added line #L358 was not covered by tests

🔇 Additional comments (3)
.github/workflows/ci.yml (1)

104-107: ⚠️ Potential issue

Improve smoke test implementation and documentation.

The smoke test step needs several improvements:

  1. The comment suggests this might be temporary - please clarify the intention
  2. The script path should be workspace-relative
  3. The config file path needs to be workspace-relative
  4. Missing error handling and test result verification

Apply this diff to improve the implementation:

-      # Smoke test. Shows end to end run with out crashing.
-      - name: Smoke Test
+      # Synthetic E2E smoke test to verify the complete pipeline
+      # This test:
+      # 1. Loads synthetic data configuration
+      # 2. Runs STAC pipeline end-to-end
+      # 3. Verifies output structure and basic metrics
+      - name: Run Synthetic E2E Test
         shell: bash -l {0}
-        run: python run_stac.py --config-name config_synth_data.yaml
+        run: |
+          # Run with workspace-relative paths and proper error handling
+          python ./tests/smoke/run_stac.py --config-dir ./configs --config-name config_synth_data.yaml
+          if [ $? -ne 0 ]; then
+            echo "Synthetic E2E test failed!"
+            exit 1
+          fi

Additionally, as mentioned in the previous review, please ensure that run_stac.py implements proper error handling, documentation, and exit codes.

🧰 Tools
🪛 yamllint

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

stac_mjx/stac.py (2)

234-238: 🛠️ Refactor suggestion

Extract duplicated root optimization logic.

The root optimization logic and error handling is duplicated between fit_offsets and ik_only. This makes maintenance harder and increases the chance of inconsistencies.

Extract this into a helper method as suggested in the past review:

+def _handle_root_optimization(self, mjx_model, mjx_data, kp_data, *args):
+    if self._root_kp_idx == -1:
+        print("Missing or invalid ROOT_OPTIMIZATION_KEYPOINT, skipping root_optimization()")
+    elif self._mj_model.jnt_type[0] == mujoco.mjtJoint.mjJNT_FREE:
+        return compute_stac.root_optimization(
+            mjx_model, mjx_data, kp_data, self._root_kp_idx,
+            self._lb, self._ub, self._body_site_idxs, self._trunk_kps
+        )
+    return mjx_data

Then use it in both methods:

mjx_data = self._handle_root_optimization(mjx_model, mjx_data, kp_data)

Also applies to: 354-358

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 234-235: stac_mjx/stac.py#L234-L235
Added lines #L234 - L235 were not covered by tests


[warning] 238-238: stac_mjx/stac.py#L238
Added line #L238 was not covered by tests


83-88: ⚠️ Potential issue

Add error handling for invalid ROOT_OPTIMIZATION_KEYPOINT value.

The initialization should handle the case where the configured keypoint name is not found in _kp_names.

Apply this diff to add error handling:

 if "ROOT_OPTIMIZATION_KEYPOINT" in self.cfg.model:
+    try:
         self._root_kp_idx = self._kp_names.index(
             self.cfg.model.ROOT_OPTIMIZATION_KEYPOINT
         )
+    except ValueError:
+        print(f"Warning: Invalid ROOT_OPTIMIZATION_KEYPOINT '{self.cfg.model.ROOT_OPTIMIZATION_KEYPOINT}'. Using default.")
+        self._root_kp_idx = -1
 else:
     self._root_kp_idx = -1
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 88-88: stac_mjx/stac.py#L88
Added line #L88 was not covered by tests

stac_mjx/stac.py Show resolved Hide resolved
@talmolab talmolab deleted a comment from coderabbitai bot Oct 30, 2024
@charles-zhng
Copy link
Collaborator

When I run the synthetic data, I get this pynwb warning: UserWarning: PoseEstimationSeries 'part_0': Length of data does not match length of timestamps. Your data may be transposed. Time should be on the 0th dimension warn("%s '%s': Length of data does not match length of timestamps. Your data may be transposed. "
Is this fine?

Copy link
Collaborator

@charles-zhng charles-zhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good, just some questions about the data generation--we'll probably extend it for more tests later so important we set this up well

run: python run_stac.py --config-name config_synth_data.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use python run_stac.py stac=stac_synth_data model=synth_data instead to avoid creating the additional config file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Pushed!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was the data saved from this notebook? And were there any offsets applied to the synthetic keypoint? I wanted to test whether it finds the ground truth offset when given different initial offsets in the model.yaml file, but it always returns the initial offset without any changes.

This notebook doesn't run as is; the rendering is cell 3 is different for me from the one shared, and the last cell throw an error: KeyError: "Invalid name '0'. Valid names: ['base', 'world']"

Copy link
Contributor Author

@jf514 jf514 Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's just not ready for that. I changed to the title to reflect that it's just a smoke test... ie it runs with out crashing.

This PR isn't to provide that level of functionality, sadly. I've created #81 to track the next steps. Please feel free to add any specific requests to that.

As for the data generation, I created a single frame of fake data (not even collected from an actual Mujoco run) just to give an output.

Also, if you think there are any comments that need to be added to the code to make this clear, also lmk.

@@ -224,11 +231,16 @@ def fit_offsets(self, kp_data):

# Begin optimization steps
# Skip root optimization if model is fixed (no free joint at root)
if self._mj_model.jnt_type[0] == mujoco.mjtJoint.mjJNT_FREE:
if self._root_kp_idx == -1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense 👍

stac_mjx/stac.py Show resolved Hide resolved
@jf514 jf514 changed the title Synthetic E2E test. Synthetic data smoke test. Nov 1, 2024
Copy link
Contributor

coderabbitai bot commented Nov 1, 2024

Warning

Rate limit exceeded

@jf514 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 18 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between dc26558 and dc881c3.

Walkthrough

The pull request introduces multiple changes across various configuration files, Jupyter notebooks, and a CI workflow. Key modifications include enhancements to the CI setup with the addition of a smoke test, the introduction of new configuration files for synthetic data generation, and updates to existing model configurations to specify root optimization keypoints. The changes improve the flexibility and functionality of the optimization processes while maintaining the integrity of existing configurations.

Changes

File Path Change Summary
.github/workflows/ci.yml Modified CI workflow to include a smoke test and updated job specifications for linting and testing.
Mat-to-Nwb-Synth-Data.ipynb Introduced a notebook for processing MATLAB data into NWB format, including data loading and validation.
configs/model/fly_tethered.yaml Updated MJCF_PATH and removed TIME_BINS parameter.
configs/model/fly_treadmill.yaml Added ROOT_OPTIMIZATION_KEYPOINT and removed TIME_BINS.
configs/model/mouse.yaml Added ROOT_OPTIMIZATION_KEYPOINT: Trunk.
configs/model/rodent.yaml Added ROOT_OPTIMIZATION_KEYPOINT: SpineL.
configs/model/synth_data.yaml New configuration file defining parameters for synthetic models.
configs/stac/stac_fly_treadmill.yaml Updated data_path from absolute to relative path.
configs/stac/stac_synth_data.yaml New configuration file for synthetic data generation with various parameters.
demos/create_synth_data.ipynb New notebook for simulating a chaotic pendulum using MuJoCo.
models/synth_model.xml New XML configuration for a MuJoCo simulation model.
stac_mjx/compute_stac.py Updated root_optimization function to accept root_kp_idx as a parameter.
stac_mjx/stac.py Enhanced Stac class to handle root optimization keypoints more robustly.
tests/configs/model/test_mouse.yaml Added ROOT_OPTIMIZATION_KEYPOINT: Trunk.
tests/configs/model/test_rodent.yaml Added ROOT_OPTIMIZATION_KEYPOINT: SpineL.
tests/configs/model/test_rodent_label3d.yaml Added ROOT_OPTIMIZATION_KEYPOINT: SpineL.
tests/configs/model/test_rodent_less_kp_names.yaml Added ROOT_OPTIMIZATION_KEYPOINT: SpineL.
tests/configs/model/test_rodent_no_kp_names.yaml Added ROOT_OPTIMIZATION_KEYPOINT: SpineL.

Possibly related PRs

  • CI workflow #22: Modifies the CI workflow in .github/workflows/ci.yml, directly related to the changes in this PR.
  • Creating .mat and .nwb data loaders #23: Adds a warning filter to the pytest command in the CI workflow, relevant to the testing job.
  • Smoke test #74: Introduces a new job for a smoke test in the CI workflow, complementing the enhancements made in this PR.

Suggested reviewers

  • talmo
  • charles-zhng

Poem

🐇 In the garden where changes bloom,
New paths emerge, dispelling gloom.
With smoke tests and models so bright,
We hop along, in pure delight!
Configs refined, like a gentle breeze,
Optimization flows with such ease.
Hooray for progress, let’s dance and play! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (13)
configs/stac/stac_fly_treadmill.yaml (1)

3-6: Enhance documentation for data setup process

While the comments provide download links, the setup process could be clearer. Consider adding:

  1. Expected file size
  2. Setup instructions (e.g., where to place the downloaded file)
  3. Validation steps to confirm correct setup

Add these details to the comments:

 # File is too large to commit
 # DL from: https://datadryad.org/stash/dataset/doi:10.5061/dryad.mpg4f4r73
 # Actual file: https://datadryad.org/stash/downloads/file_stream/3361804
+# Setup instructions:
+# 1. Download file (size: <specify_size>)
+# 2. Place in tests/data/ directory as wt_berlin_linear_treadmill_dataset.csv
+# 3. Verify setup by running: <specify_validation_command>
models/synth_model.xml (1)

10-12: Review lighting and camera setup for synthetic data quality

The current setup might affect synthetic data quality:

  1. Single light source could create strong shadows affecting feature detection
  2. Fixed camera position might limit the variety of synthetic data generated

Consider:

  1. Adding multiple light sources for more even illumination
  2. Adding multiple camera views or a moving camera for diverse data generation
configs/model/synth_data.yaml (2)

8-11: Implement tolerance parameters in optimizer loops

There's a TODO comment indicating that these tolerance parameters (FTOL, ROOT_FTOL, LIMB_FTOL) are not yet implemented in the optimizer loops.

Would you like me to help implement these tolerance parameters in the optimizer loops? I can assist in creating a new issue to track this implementation.


56-57: Remove duplicate comment about SITES_TO_REGULARIZE

The comment explaining SITES_TO_REGULARIZE is duplicated. The first instance appears at lines 47-48.

-# If you have reason to believe that the initial offsets are correct for particular keypoints, 
-# you can regularize those sites using _SITES_TO_REGULARIZE. 
🧰 Tools
🪛 yamllint

[error] 56-56: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

.github/workflows/ci.yml (1)

104-107: Add newline at end of file

Add a newline character at the end of the file to comply with POSIX standards.

         run: python run_stac.py stac=stac_synth_data model=synth_data
+
🧰 Tools
🪛 yamllint

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

tests/configs/model/test_rodent_no_kp_names.yaml (1)

Line range hint 6-7: Consider enhancing parameter documentation.

While the configuration is well-structured, some parameters could benefit from clearer documentation:

  • KP_NAMES: Purpose of having it empty in this test configuration
  • M_REG_COEF: Impact and recommended values for site regularization

Add descriptive comments like:

 # No KP_NAMES
 KP_NAMES:
+# Empty KP_NAMES for testing keypoint detection without predefined names
 
 # If you have reason to believe that the initial offsets are correct for particular keypoints, 
 # you can regularize those sites using _SITES_TO_REGULARIZE. 
-M_REG_COEF: 1
+# Regularization coefficient for site constraints (1.0 = strong regularization)
+M_REG_COEF: 1

Also applies to: 155-159

tests/configs/model/test_rodent_label3d.yaml (1)

Based on the search results, I can now provide a conclusive response about the frame count inconsistency issue:

Confirmed: Data length mismatch is caused by timestamp generation

The search results reveal that there is indeed a mismatch between data and timestamp lengths in the PoseEstimationSeries:

  • The data shape is (999, 3) while timestamps shape is (360000,)
  • This mismatch is causing the warning "Length of data does not match length of timestamps"
  • The issue appears in the NWB file generation process where timestamps are generated with a different length than the actual data

The configuration parameters are used correctly in the codebase:

  • N_FRAMES_PER_CLIP (360) is used for chunking data in stac.py
  • RENDER_FPS (50) is used for video rendering
  • N_SAMPLE_FRAMES (100) is used for optimization
🔗 Analysis chain

Line range hint 4-4: Investigate potential frame count inconsistency.

The configuration defines different frame counts:

  • N_FRAMES_PER_CLIP: 360
  • N_SAMPLE_FRAMES: 100
  • RENDER_FPS: 50

This might be related to the reported warning about data length not matching timestamp length. Please verify that these values are correctly used in the synthetic data generation process.

Let's check the usage of these parameters:

Also applies to: 142-142, 144-144


Let me try another search strategy to find the relevant code.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find code that uses these frame-related parameters

# Search for frame count parameter usage
echo "=== Frame count parameter usage ==="
rg -B 2 -A 2 "N_FRAMES_PER_CLIP|N_SAMPLE_FRAMES|RENDER_FPS" --type python

# Search for PoseEstimationSeries usage
echo "=== PoseEstimationSeries usage ==="
ast-grep --pattern 'PoseEstimationSeries($$$)'

Length of output: 315


Script:

#!/bin/bash
# Search for frame-related parameters without file type restriction
echo "=== Frame count parameter usage ==="
rg -B 2 -A 2 "N_FRAMES_PER_CLIP|N_SAMPLE_FRAMES|RENDER_FPS"

# Search for timestamp generation or length mismatch related code
echo "=== Timestamp related code ==="
rg -B 2 -A 2 "timestamps|length.*match|data.*length"

# List all Python files that might contain relevant code
echo "=== Python files ==="
fd ".py$"

Length of output: 131072

tests/configs/model/test_rodent.yaml (1)

Line range hint 106-110: Remove duplicate comment about SITES_TO_REGULARIZE.

The explanation for SITES_TO_REGULARIZE appears twice in the configuration. Consider removing the duplicate comment at the end and moving M_REG_COEF closer to SITES_TO_REGULARIZE for better organization.

 # If you have reason to believe that the initial offsets are correct for particular keypoints, 
 # you can regularize those sites using this with M_REG_COEF. 
 SITES_TO_REGULARIZE:
 - HandL
 - HandR
 - SpineF
 - SpineL
 - SpineM
+
+# Regularization coefficient for sites listed in SITES_TO_REGULARIZE
+M_REG_COEF: 1

 RENDER_FPS: 50
 
 N_SAMPLE_FRAMES: 100
 
 # Tolerance for the optimizations of the full model, limb, and root. 
 FTOL: 1.0e-02
 ROOT_FTOL: 1.0e-05
 LIMB_FTOL: 1.0e-06
 
 # Number of alternating pose and offset optimization rounds. 
 N_ITERS: 6
-
-# If you have reason to believe that the initial offsets are correct for particular keypoints, 
-# you can regularize those sites using _SITES_TO_REGULARIZE. 
-M_REG_COEF: 1

Also applies to: 182-184

configs/model/rodent.yaml (1)

Based on the search results, I can now generate the final response:

The tolerance parameters are defined but not currently used in the optimization implementation

The codebase search reveals that while FTOL, ROOT_FTOL, and LIMB_FTOL parameters are defined in various configuration files (rodent.yaml, mouse.yaml, fly_treadmill.yaml, etc.), they are not actually used in the optimization functions (root_optimization, offset_optimization, pose_optimization) in stac_mjx/compute_stac.py. This confirms that the TODO comment about re-implementing optimizer loops to use these tolerance parameters is still relevant and should be addressed.

  • stac_mjx/compute_stac.py: The optimization functions need to be updated to utilize the tolerance parameters for convergence criteria.
🔗 Analysis chain

Line range hint 8-11: Verify if the TODO comment is still relevant.

The comment indicates that optimizer loops need to be re-implemented to use the tolerance parameters (FTOL, ROOT_FTOL, LIMB_FTOL). Since this might affect the optimization process, could you clarify if this is still relevant or if it should be addressed as part of this PR?

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of these tolerance parameters in the codebase
echo "Searching for tolerance parameter usage..."
rg "FTOL|ROOT_FTOL|LIMB_FTOL" --type python

Length of output: 177


Script:

#!/bin/bash
# Let's try a more comprehensive search without file type restriction
echo "Searching for tolerance parameter usage..."
rg "FTOL|ROOT_FTOL|LIMB_FTOL"

# Also search for optimizer-related files that might contain these parameters
echo -e "\nSearching for optimizer-related files..."
fd "optim|solver" 

Length of output: 5014


Script:

#!/bin/bash
# Let's search for Python files that might contain the optimization implementation
echo "Searching for Python files with optimization-related code..."
rg -t py "def.*optim|class.*optim" -A 5

# Also search for any TODO comments related to optimization
echo -e "\nSearching for optimization-related TODOs..."
rg -t py "TODO.*optim"

Length of output: 1115


Script:

#!/bin/bash
# Let's examine the content of these optimization functions to see if they use the tolerance parameters
echo "Checking optimization function implementations..."
rg -A 20 "def root_optimization|def offset_optimization|def pose_optimization" stac_mjx/compute_stac.py

# Also search for any usage of these specific tolerance parameters in the implementation
echo -e "\nChecking for tolerance parameter usage in compute_stac.py..."
rg "config\[.*FTOL|config\[.*ROOT_FTOL|config\[.*LIMB_FTOL" stac_mjx/compute_stac.py

Length of output: 2544

configs/model/mouse.yaml (1)

Line range hint 128-133: Consider documenting the purpose of commented sections

The INDIVIDUAL_PART_OPTIMIZATION section contains extensive commented-out configurations for different body parts. This could indicate:

  1. Work in progress features
  2. Deprecated configurations
  3. Example configurations

Consider either:

  • Removing if deprecated
  • Adding a comment explaining why these are commented out
  • Moving to a separate example configuration file if they serve as templates

Also applies to: 135-190

🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)

tests/configs/model/test_mouse.yaml (2)

Line range hint 127-132: Fix inconsistent keypoint names in TRUNK_OPTIMIZATION_KEYPOINTS.

There's a naming mismatch between TRUNK_OPTIMIZATION_KEYPOINTS and KP_NAMES:

  • "HipL" and "HipR" in TRUNK_OPTIMIZATION_KEYPOINTS
  • "Hip_L" and "Hip_R" in KP_NAMES

This inconsistency could cause keypoint lookup failures.

Apply this fix:

 TRUNK_OPTIMIZATION_KEYPOINTS:
  - "Trunk"
- - "HipL"
- - "HipR"
+ - "Hip_L"
+ - "Hip_R"
  - "ShoulderL"
  - "ShoulderR"
🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)


Based on the search results, I can now identify the issue with the timestamp length mismatch warning. The code shows that:

  1. The PoseEstimationSeries data has shape (999, 3) while timestamps have shape (360000,) in the NWB files
  2. The configuration parameters are used in different contexts:
    • N_FRAMES_PER_CLIP (360) is used for chunking data in stac.py
    • RENDER_FPS (30) is used for video rendering
    • N_SAMPLE_FRAMES (50) is used for sampling frames during optimization

Timing parameters mismatch confirmed - data and timestamps have different lengths

The warning occurs because:

  • The PoseEstimationSeries data arrays have 999 frames
  • But the corresponding timestamp arrays have 360,000 frames
  • This significant mismatch (999 vs 360000) is causing the warning

This appears to be an actual issue in how the timestamps are being generated or aligned with the pose data, rather than just a configuration problem.

🔗 Analysis chain

Line range hint 2-2: Verify timing parameters to address PoseEstimationSeries warning.

The warning about timestamp length mismatch might be related to these timing parameters:

N_FRAMES_PER_CLIP: 360  # Line 2
RENDER_FPS: 30          # Line 213
N_SAMPLE_FRAMES: 50     # Line 215

Please verify:

  1. Are timestamps generated using RENDER_FPS (30 fps)?
  2. Does N_SAMPLE_FRAMES (50) match the expected data length?
  3. Is the full clip length (N_FRAMES_PER_CLIP: 360) being properly handled?

Also applies to: 213-213, 215-215


Let me search for the relevant code to understand how these parameters are used in timestamp generation and frame sampling. The previous command might have failed due to path issues, so I'm using more targeted searches with ast-grep and rg.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for timestamp generation and frame sampling logic
rg -l "PoseEstimationSeries|N_FRAMES_PER_CLIP|RENDER_FPS|N_SAMPLE_FRAMES" | \
  xargs rg -A 5 "def.*generate.*|def.*sample.*|def.*create.*series"

Length of output: 143


Script:

#!/bin/bash
# Search for PoseEstimationSeries class and its usage
ast-grep --pattern 'class PoseEstimationSeries {
  $$$
}'

# Search for timestamp/frame related code
rg -A 5 "PoseEstimationSeries|N_FRAMES_PER_CLIP|RENDER_FPS|N_SAMPLE_FRAMES"

# Search for any frame sampling or timestamp generation logic
rg -A 5 "(generate|create)_timestamps|sample_frames|frame_indices"

Length of output: 124399

🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)

stac_mjx/compute_stac.py (1)

54-56: Consider enhancing the documentation of valid keypoint indices.

While the comments provide historical context, they could be more helpful by documenting:

  • Valid ranges for root_kp_idx
  • How to determine the appropriate keypoint for new models
  • Any constraints or considerations when choosing the root keypoint
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 11819ee and 19a30e0.

📒 Files selected for processing (18)
  • .github/workflows/ci.yml (1 hunks)
  • Mat-to-Nwb-Synth-Data.ipynb (1 hunks)
  • configs/model/fly_tethered.yaml (0 hunks)
  • configs/model/fly_treadmill.yaml (1 hunks)
  • configs/model/mouse.yaml (1 hunks)
  • configs/model/rodent.yaml (1 hunks)
  • configs/model/synth_data.yaml (1 hunks)
  • configs/stac/stac_fly_treadmill.yaml (1 hunks)
  • configs/stac/stac_synth_data.yaml (1 hunks)
  • demos/create_synth_data.ipynb (1 hunks)
  • models/synth_model.xml (1 hunks)
  • stac_mjx/compute_stac.py (2 hunks)
  • stac_mjx/stac.py (4 hunks)
  • tests/configs/model/test_mouse.yaml (1 hunks)
  • tests/configs/model/test_rodent.yaml (1 hunks)
  • tests/configs/model/test_rodent_label3d.yaml (1 hunks)
  • tests/configs/model/test_rodent_less_kp_names.yaml (1 hunks)
  • tests/configs/model/test_rodent_no_kp_names.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • configs/model/fly_tethered.yaml
🧰 Additional context used
🪛 yamllint
.github/workflows/ci.yml

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

configs/model/synth_data.yaml

[warning] 1-1: too many blank lines

(1 > 0) (empty-lines)


[error] 7-7: trailing spaces

(trailing-spaces)


[error] 13-13: trailing spaces

(trailing-spaces)


[error] 16-16: trailing spaces

(trailing-spaces)


[error] 31-31: trailing spaces

(trailing-spaces)


[error] 38-38: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 48-48: trailing spaces

(trailing-spaces)


[error] 49-49: trailing spaces

(trailing-spaces)


[error] 56-56: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

🪛 Gitleaks
demos/create_synth_data.ipynb

119-119: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🔇 Additional comments (35)
configs/stac/stac_synth_data.yaml (3)

5-7: Review smoke test performance impact

The configuration shows:

  • Only 1 fitting frame (n_fit_frames: 1)
  • No skipping of processing steps (skip_fit_offsets: False, skip_ik_only: False)

This setup needs verification to ensure it meets the 20-second smoke test requirement mentioned in the PR objectives while still providing meaningful test coverage.

Consider:

  1. Adding a comment documenting why 1 frame is sufficient for smoke testing
  2. Adding timing metrics to validate the 20-second requirement

9-12: Review Mujoco solver configuration

The Mujoco settings appear to be optimized for speed over accuracy:

  • Using Newton solver
  • Minimal iterations (1)
  • Limited line search iterations (4)

This configuration might be related to the warning about PoseEstimationSeries data length mismatch mentioned in the PR comments. Let's verify if similar warnings occur in the logs:

Consider:

  1. Adding comments explaining the rationale behind these specific solver settings
  2. Documenting the trade-offs between speed and accuracy for these values
  3. If these settings are specifically tuned for smoke tests, consider moving them to a separate smoke test configuration section

1-3: Verify data paths exist in the test environment

The specified paths need to be verified to ensure they exist in the test environment, particularly for CI:

  • synth_fit.p
  • synth_ik_only.p
  • tests/data/test_synth_1_frames.nwb
configs/stac/stac_fly_treadmill.yaml (1)

Line range hint 11-15: Verify solver settings meet CI time constraints

Given the PR's requirement for CI execution within 20 seconds, please verify these solver settings provide optimal performance without compromising accuracy.

Let's check if these settings are consistent across synthetic and real data configs:

✅ Verification successful

Solver settings are consistent across all configurations

The solver settings in stac_fly_treadmill.yaml match exactly with all other STAC configurations, including test configurations:

  • newton solver
  • 1 iteration
  • 4 line search iterations

These consistent settings across all configs, including test files, suggest they have been validated to meet performance requirements while maintaining accuracy. The uniformity indicates this is a proven configuration for the project's needs, including CI time constraints.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Compare solver settings across configs
# Expected: Consistent settings for similar performance characteristics

echo "Checking solver settings across config files:"
rg -A 4 "solver:|iterations:" "configs/stac/*.yaml"

Length of output: 214


Script:

#!/bin/bash
# Let's first find all yaml config files in the repository
echo "Finding all yaml config files:"
fd -e yaml -e yml

# Then check solver settings in these files
echo -e "\nChecking solver settings in yaml files:"
rg -A 4 "solver:|iterations:" 

Length of output: 1913

models/synth_model.xml (3)

1-18: ⚠️ Potential issue

Address reported PoseEstimationSeries warning

The current model structure might be contributing to the reported warning about mismatched data lengths. The combination of:

  1. A free root joint
  2. Minimal geometry
  3. Single fixed camera view
    Could lead to inconsistent pose estimation data generation.
#!/bin/bash
# Search for pose estimation data handling
echo "Searching for pose estimation data processing:"
rg -i "pose.*estimation.*series|data.*length|timestamps" .

Consider:

  1. Adding constraints to the root joint for more predictable motion
  2. Defining explicit keypoints for consistent pose estimation
  3. Documenting expected data dimensions in comments

1-4: Verify timestep compatibility with smoke test duration requirement

The timestep of 0.001s is reasonable for physics simulation accuracy. However, please verify that this granularity allows the smoke test to complete within the required 20-second window.


13-17: ⚠️ Potential issue

Enhance base body configuration for synthetic data generation

The current base body configuration seems minimal:

  1. The line geometry might not provide enough features for synthetic data generation
  2. The free joint at the root might relate to the reported warning about PoseEstimationSeries data

Consider:

  1. Adding more complex geometry for better feature detection
  2. Adding explicit keypoints or markers for pose estimation
✅ Verification successful

Let me analyze the output from the search. The codebase shows extensive use of pose estimation with detailed keypoint tracking including joints like ankles, elbows, shoulders, etc. The warning about PoseEstimationSeries timestamps mismatch appears to be a data processing issue rather than a model geometry issue.

Let me search for more context about the model's usage:


Let me search for any additional model configuration files to understand the full context:


Based on the search results and analysis, I can now provide a final response:

Base body configuration is appropriate for synthetic data generation

The current base body configuration with a free joint and simple geometry is actually well-suited for the pose estimation workflow, as evidenced by:

  1. The codebase shows extensive PoseEstimationSeries data processing with 23 tracked keypoints (ankles, elbows, shoulders, etc.)
  2. The warning about timestamps is a data processing issue, not a model geometry issue
  3. The simple base geometry serves as a reference frame for the pose estimation keypoints, which are the primary features being tracked

The free joint at root is intentional and necessary for:

  • Allowing unrestricted motion capture
  • Supporting the PoseEstimation framework's coordinate system
  • Enabling proper keypoint tracking relative to the base frame
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for pose estimation related configurations
echo "Searching for pose estimation configurations:"
rg -i "pose.*estimation|keypoint" configs/

Length of output: 106490


Script:

#!/bin/bash
# Search for model usage and configuration
echo "Searching for model configuration and usage:"
rg -g "!*.ipynb" "base.*body|geom.*fromto|joint.*free" .

Length of output: 23594


Script:

#!/bin/bash
# Search for model configuration and usage
echo "Searching for model configuration files:"
fd -e xml -e yaml -e json . models/

Length of output: 362

configs/model/synth_data.yaml (4)

39-45: Verify scale factor settings for synthetic data

Both SCALE_FACTOR and MOCAP_SCALE_FACTOR are set to 1. While this means no scaling is applied, please verify:

  1. If this matches the scale of your synthetic model (SCALE_FACTOR)
  2. If your mocap data is already in the correct units (MOCAP_SCALE_FACTOR)
#!/bin/bash
# Check scale factors in other configs and model files
echo "Checking scale factors in other configs:"
rg "SCALE_FACTOR:" configs/
echo "\nChecking units in model file:"
rg -A 5 "unit" models/synth_model.xml

52-54: Verify rendering settings for smoke test

The configuration specifies a high RENDER_FPS (200) but only one sample frame. Consider:

  1. If 200 FPS is necessary for smoke testing
  2. If one sample frame is sufficient for meaningful rendering tests
#!/bin/bash
# Check rendering settings in other configs
echo "Checking render settings in other configs:"
rg "RENDER_FPS:" configs/
rg "N_SAMPLE_FRAMES:" configs/

16-24: Investigate PoseEstimationSeries warning for part_0

The keypoint configuration for 'part_0' might be related to the warning reported in the PR comments:

UserWarning: PoseEstimationSeries 'part_0': Length of data does not match length of timestamps

This could indicate a potential mismatch between the synthetic data generation and the expected data format.

#!/bin/bash
# Search for PoseEstimationSeries usage and data handling
echo "Checking PoseEstimationSeries implementation:"
rg -A 5 "PoseEstimationSeries"
🧰 Tools
🪛 yamllint

[error] 16-16: trailing spaces

(trailing-spaces)


4-5: Verify if single frame is sufficient for smoke testing

N_FRAMES_PER_CLIP is set to 1. While this aligns with the PR's goal of quick smoke tests (20-second target), verify if a single frame provides adequate coverage for detecting potential issues.

configs/model/fly_treadmill.yaml (2)

53-54: LGTM: ROOT_OPTIMIZATION_KEYPOINT configuration looks good.

The new parameter is correctly set to 'head', which is a valid keypoint defined in KP_NAMES. This addition aligns with the PR objective to reduce code modifications for each model.


Line range hint 1-100: Verify impact of TIME_BINS removal on timestamp handling.

The removal of the TIME_BINS parameter might be related to the warning about timestamp mismatches reported in the PR comments:

UserWarning: PoseEstimationSeries 'part_0': Length of data does not match length of timestamps

Let's check if other configuration files still use TIME_BINS and how it affects the data generation:

Please clarify:

  1. How are timestamps being handled after removing TIME_BINS?
  2. Is there a new mechanism for time series synchronization?
🧰 Tools
🪛 yamllint

[error] 55-55: trailing spaces

(trailing-spaces)

.github/workflows/ci.yml (1)

104-107: Verify warning about PoseEstimationSeries

The PR comments mention a warning about mismatched timestamps in PoseEstimationSeries. Let's verify if this is related to the synthetic data generation.

✅ Verification successful

Based on the search results, I can now provide a clear response about the PoseEstimationSeries warning:

Warning about mismatched timestamps is a known issue with synthetic data

The warning appears in the synthetic data generation notebook where there's a mismatch between data and timestamps shapes:

  • Data shape is (999, 3) while timestamps shape is (360000)
  • This mismatch triggers the warning: "Length of data does not match length of timestamps. Your data may be transposed."

The warning is expected behavior when working with synthetic test data and doesn't indicate a functional issue that needs fixing.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for timestamp handling in synthetic data generation

# Look for timestamp-related code in synthetic data generation
echo "Checking for timestamp handling in synthetic data generation:"
rg -A 5 "timestamps|PoseEstimationSeries" 

# Look for related test cases
echo -e "\nChecking for related test cases:"
fd -t f "test.*synth.*\.py" tests/

Length of output: 120411

🧰 Tools
🪛 yamllint

[error] 107-107: no new line character at the end of file

(new-line-at-end-of-file)

tests/configs/model/test_rodent_no_kp_names.yaml (2)

Line range hint 2-4: Verify frame count parameters to address reported warning.

The warning about PoseEstimationSeries data length mismatch might be related to the frame count parameters:

  • N_FRAMES_PER_CLIP: 360 for IK-only processing
  • N_SAMPLE_FRAMES: 100 for general sampling

Please verify these parameters are being used correctly in the data generation process.

Let's check the usage of these parameters:

#!/bin/bash
# Description: Find where these frame count parameters are used
# Expected: Locate the data generation code to understand potential transposition

# Search for frame count parameter usage
echo "Searching for N_FRAMES_PER_CLIP usage:"
rg "N_FRAMES_PER_CLIP" -A 5
echo -e "\nSearching for N_SAMPLE_FRAMES usage:"
rg "N_SAMPLE_FRAMES" -A 5

Also applies to: 147-147


61-62: LGTM: Root optimization keypoint is well-configured.

The choice of SpineL as the root optimization keypoint is appropriate as it:

  • Maps to the pelvis, which serves as a stable central point
  • Has zero initial offset
  • Is already included in the regularized sites list

Let's verify this configuration is consistent across other rodent model files:

✅ Verification successful

Let me generate another script to specifically check for ROOT_OPTIMIZATION_KEYPOINT across rodent configs:

This will help us verify if the ROOT_OPTIMIZATION_KEYPOINT is consistently set to SpineL across rodent configs or if there are any inconsistencies we need to flag.


Let me try a more precise search without glob pattern:


Verification complete: ROOT_OPTIMIZATION_KEYPOINT is consistently configured

The verification confirms that SpineL is consistently set as the root optimization keypoint across all rodent test configuration files:

  • tests/configs/model/test_rodent.yaml
  • tests/configs/model/test_rodent_no_kp_names.yaml
  • tests/configs/model/test_rodent_label3d.yaml
  • tests/configs/model/test_rodent_less_kp_names.yaml
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify ROOT_OPTIMIZATION_KEYPOINT consistency across rodent configs
# Expected: All rodent model configs should use SpineL as root keypoint

rg -l "ROOT_OPTIMIZATION_KEYPOINT.*SpineL" "tests/configs/model/test_rodent*.yaml"

Length of output: 15870


Script:

#!/bin/bash
# Search for ROOT_OPTIMIZATION_KEYPOINT in all rodent config files
rg "ROOT_OPTIMIZATION_KEYPOINT" "tests/configs/model/test_rodent*.yaml" -A 1

Length of output: 154


Script:

#!/bin/bash
# Search for ROOT_OPTIMIZATION_KEYPOINT in all rodent config files
echo "Searching in test_rodent.yaml:"
rg "ROOT_OPTIMIZATION_KEYPOINT" "tests/configs/model/test_rodent.yaml" -A 1

echo -e "\nSearching in test_rodent_no_kp_names.yaml:"
rg "ROOT_OPTIMIZATION_KEYPOINT" "tests/configs/model/test_rodent_no_kp_names.yaml" -A 1

echo -e "\nSearching in test_rodent_label3d.yaml:"
rg "ROOT_OPTIMIZATION_KEYPOINT" "tests/configs/model/test_rodent_label3d.yaml" -A 1

echo -e "\nSearching in test_rodent_less_kp_names.yaml:"
rg "ROOT_OPTIMIZATION_KEYPOINT" "tests/configs/model/test_rodent_less_kp_names.yaml" -A 1

Length of output: 843

tests/configs/model/test_rodent_less_kp_names.yaml (2)

62-63: Investigate: Potential connection to PoseEstimationSeries warning

The PR comments mention a warning about PoseEstimationSeries data length mismatch. Given that this file intentionally has fewer keypoint names, investigate if this configuration could trigger that warning.

#!/bin/bash
# Look for PoseEstimationSeries usage with this config
rg -C5 "PoseEstimationSeries.*Length.*data" "tests/"

# Check if this config is used in synthetic data generation
rg -l "test_rodent_less_kp_names.yaml" "tests/"

62-63: Verify: ROOT_OPTIMIZATION_KEYPOINT uses keypoint not in KP_NAMES

The configuration sets ROOT_OPTIMIZATION_KEYPOINT: SpineL, but 'SpineL' is not included in KP_NAMES which only contains 'Snout'. While this might be intentional for testing purposes (given the filename), please verify that:

  1. This mismatch is expected behavior for this test configuration
  2. The optimization pipeline correctly handles keypoints that aren't in KP_NAMES
tests/configs/model/test_rodent_label3d.yaml (2)

63-64: Verify optimization priority between root and regularization.

SpineL is both the root optimization keypoint and a regularized site. While this provides stability, it could create competing optimization objectives. Please verify that:

  1. The root optimization takes precedence over site regularization
  2. The M_REG_COEF value (currently 1) is appropriate for this dual role

Let's check if this is a common pattern in other configurations:

#!/bin/bash
# Description: Check if other configs also use the root keypoint in SITES_TO_REGULARIZE

# Find files containing both patterns
rg -l "ROOT_OPTIMIZATION_KEYPOINT:" configs/ tests/configs/ | \
  xargs -I {} sh -c 'echo "=== {} ==="; rg -A 10 "SITES_TO_REGULARIZE:" {}'

Also applies to: 147-151


63-64: LGTM: SpineL is a suitable choice for root optimization.

The selection of SpineL as the root optimization keypoint is appropriate as it:

  • Maps to the pelvis, providing a stable central reference point
  • Has zero initial offset
  • Is included in regularized sites
  • Aligns with trunk optimization keypoints

Let's verify the consistency of this configuration across other model files:

✅ Verification successful

SpineL is correctly configured as the root optimization keypoint

The verification confirms that:

  1. SpineL is properly mapped to the pelvis in KEYPOINT_MODEL_PAIRS
  2. SpineL has zero initial offset [0, 0, 0] in KEYPOINT_INITIAL_OFFSETS
  3. SpineL is included in SITES_TO_REGULARIZE for stability
  4. The configuration is consistent across rodent model files
  5. Different animal models (mouse, fly) appropriately use different root keypoints based on their anatomy
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if ROOT_OPTIMIZATION_KEYPOINT is consistently set across model configs
# and verify that the keypoint exists in KEYPOINT_MODEL_PAIRS

rg -U "ROOT_OPTIMIZATION_KEYPOINT:.*\n.*KEYPOINT_MODEL_PAIRS:" configs/model/
rg -U "ROOT_OPTIMIZATION_KEYPOINT:.*\n.*KEYPOINT_MODEL_PAIRS:" tests/configs/model/

Length of output: 1750


Script:

#!/bin/bash
# Let's try a different approach to verify the configuration consistency

# First, let's find all files that contain ROOT_OPTIMIZATION_KEYPOINT
echo "=== Files containing ROOT_OPTIMIZATION_KEYPOINT ==="
rg -l "ROOT_OPTIMIZATION_KEYPOINT" configs/ tests/configs/

# Then, for each file found, let's examine the context
echo -e "\n=== Content of ROOT_OPTIMIZATION_KEYPOINT and related configurations ==="
rg -l "ROOT_OPTIMIZATION_KEYPOINT" configs/ tests/configs/ | while read -r file; do
    echo -e "\n--- $file ---"
    # Show ROOT_OPTIMIZATION_KEYPOINT and surrounding context
    rg -A 5 -B 5 "ROOT_OPTIMIZATION_KEYPOINT" "$file"
done

# Also check for SpineL in KEYPOINT_MODEL_PAIRS
echo -e "\n=== KEYPOINT_MODEL_PAIRS containing SpineL ==="
rg -U "KEYPOINT_MODEL_PAIRS:(?s).*SpineL.*" configs/ tests/configs/

Length of output: 79037

tests/configs/model/test_rodent.yaml (2)

83-84: LGTM! Configuration maintains consistency.

The ROOT_OPTIMIZATION_KEYPOINT (SpineL) is consistently configured across all relevant sections:

  • Listed in SITES_TO_REGULARIZE for proper regularization
  • Mapped correctly in KEYPOINT_MODEL_PAIRS
  • Included in TRUNK_OPTIMIZATION_KEYPOINTS group

Also applies to: 92-95, 106-110, 164-168


83-84: Verify if ROOT_OPTIMIZATION_KEYPOINT affects time series alignment.

The addition of SpineL as the root optimization keypoint is anatomically sound. However, given the warning reported about PoseEstimationSeries length mismatch, we should verify if this parameter affects the time series alignment.

configs/model/rodent.yaml (3)

92-93: LGTM! Well-structured addition of ROOT_OPTIMIZATION_KEYPOINT.

The SpineL keypoint is a logical choice for root optimization as it's mapped to the pelvis and is already part of the trunk optimization keypoints. This addition aligns with the PR objective of reducing code modifications for each model.


Line range hint 14-23: Configuration for ROOT_OPTIMIZATION_KEYPOINT is well-defined.

The SpineL keypoint is properly configured across all relevant sections:

  • Listed in KP_NAMES
  • Has appropriate initial offset
  • Included in SITES_TO_REGULARIZE
  • Has defined visualization color

This comprehensive configuration supports its role as the root optimization keypoint.

Also applies to: 92-93, 196-202


Line range hint 2-3: Clarify the relationship between frame count parameters.

Given the warning about PoseEstimationSeries data length mismatch in the PR comments, could you clarify:

  1. The relationship between N_FRAMES_PER_CLIP (250) and N_SAMPLE_FRAMES (100)?
  2. How these parameters interact with the synthetic data generation process?
  3. Whether these values could contribute to the timestamp mismatch warning?
#!/bin/bash
# Search for usage of these parameters in the codebase
echo "Searching for frame parameter usage..."
rg "N_FRAMES_PER_CLIP|N_SAMPLE_FRAMES" --type python -A 5

Also applies to: 187-189

configs/model/mouse.yaml (3)

Line range hint 1-275: Address potential data transposition warning

Based on the PR comments, there's a warning about data length mismatch in PoseEstimationSeries. This configuration file defines various keypoints and their mappings, which could be related to the data structure causing the warning.

The warning suggests that time should be on the 0th dimension, but the data might be transposed. This could be related to how the synthetic data is generated using these keypoint configurations.

Let's check for related code:

#!/bin/bash
# Search for PoseEstimationSeries usage
rg "PoseEstimationSeries" --type python -A 5

# Look for timestamp-related code
rg "timestamps?" --type python -A 5

Consider adding validation in the synthetic data generation pipeline to ensure:

  1. Consistent data shape across all keypoints
  2. Proper time dimension alignment
  3. Matching lengths between data and timestamps
🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)


125-126: Verify keypoint existence and consistency

The newly added ROOT_OPTIMIZATION_KEYPOINT: Trunk references a keypoint that should exist in the KP_NAMES list and have a corresponding mapping in KEYPOINT_MODEL_PAIRS. Let's verify this:

  1. ✓ "Trunk" exists in KP_NAMES
  2. ✓ "Trunk" has a mapping to "mouse__Thoracic1" in KEYPOINT_MODEL_PAIRS
  3. ✓ "Trunk" is also listed in TRUNK_OPTIMIZATION_KEYPOINTS

The configuration appears consistent.

Let's verify this configuration across other model files:

#!/bin/bash
# Check if ROOT_OPTIMIZATION_KEYPOINT is consistently defined across all model configs
rg "ROOT_OPTIMIZATION_KEYPOINT:" "configs/model/"

Line range hint 6-8: TODO comment needs addressing

The TODO comment about re-implementing optimizer loops to use the tolerance parameters suggests there might be a disconnect between the configuration and the actual implementation.

Let's verify if these tolerance parameters are actually being used:

Would you like me to help implement the optimizer loops to properly utilize these tolerance parameters?

🧰 Tools
🪛 yamllint

[warning] 128-128: wrong indentation: expected 2 but found 1

(indentation)

tests/configs/model/test_mouse.yaml (1)

125-126: LGTM: ROOT_OPTIMIZATION_KEYPOINT is properly configured.

The "Trunk" keypoint is well-defined and appears in all necessary sections:

  • Listed in KP_NAMES
  • Mapped to "mouse__Thoracic1" in KEYPOINT_MODEL_PAIRS
  • Included in TRUNK_OPTIMIZATION_KEYPOINTS
stac_mjx/compute_stac.py (2)

17-17: LGTM: Well-structured parameter addition.

The new root_kp_idx parameter is properly typed and documented, making the function more configurable as intended.


Line range hint 58-58: Verify keypoint data shape to address the warning.

The code assumes keypoint data is in shape (frames, keypoints, 3), but there's a warning about mismatched data lengths in the PR comments. Consider adding shape validation to prevent silent failures.

 def root_optimization(...):
+    # Validate keypoint data shape
+    if len(kp_data.shape) != 2 or kp_data.shape[1] % 3 != 0:
+        raise ValueError(
+            f"Expected keypoint data shape (frames, keypoints*3), got {kp_data.shape}"
+        )
+    if root_kp_idx + 3 > kp_data.shape[1]:
+        raise ValueError(
+            f"root_kp_idx {root_kp_idx} is out of bounds for keypoint data shape {kp_data.shape}"
+        )
     q0.at[:3].set(kp_data[frame, :][root_kp_idx : root_kp_idx + 3])

This validation would help catch data shape issues early and provide clear error messages.

stac_mjx/stac.py (4)

123-123: LGTM: Improved configuration handling.

The change from checking None to checking key existence is a more robust approach for handling optional configuration.


234-238: LGTM: Root optimization handling is correct.

The code properly handles the root optimization case and correctly passes the root keypoint index to the optimization function.

Also applies to: 243-243


354-358: LGTM: Correct vmap configuration for root optimization.

The vmap axes configuration correctly handles the new root keypoint index parameter, maintaining batch processing capabilities.

Also applies to: 361-361, 367-367


83-88: Verify impact on PoseEstimationSeries warning.

The warning about data length mismatch might be related to how root optimization handles the data. Let's verify this:

Also applies to: 234-238, 354-358

models/synth_model.xml Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
stac_mjx/stac.py Show resolved Hide resolved
Comment on lines 192 to 194
"PERTURBATION = 1e-7\n",
"SIM_DURATION = 10.0 # seconds\n",
"\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add documentation and make simulation parameters configurable.

The simulation parameters should be documented and configurable.

Apply this diff to improve documentation and configurability:

+# Configuration parameters for the simulation
+# PERTURBATION: Small random perturbation added to initial velocities
+# SIM_DURATION: Total duration of the simulation in seconds
PERTURBATION = 1e-7
SIM_DURATION = 10.0  # seconds

+# Calculate number of simulation steps based on model timestep
n_steps = int(SIM_DURATION / model.opt.timestep)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"PERTURBATION = 1e-7\n",
"SIM_DURATION = 10.0 # seconds\n",
"\n",
# Configuration parameters for the simulation
# PERTURBATION: Small random perturbation added to initial velocities
# SIM_DURATION: Total duration of the simulation in seconds
PERTURBATION = 1e-7
SIM_DURATION = 10.0 # seconds
# Calculate number of simulation steps based on model timestep
n_steps = int(SIM_DURATION / model.opt.timestep)

demos/create_synth_data.ipynb Show resolved Hide resolved
"trx = mat[\"pred\"]\n",
"print(\"trx shape\", trx.shape)\n",
"trx_new = trx[0:1, :, :]\n",
"trx_new[0,:,0] = [0,0,-.25]\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid hardcoding data modifications

Directly modifying trx_new with hardcoded indices and values, as in trx_new[0,:,0] = [0,0,-.25], can lead to confusion and reduce code flexibility. Consider parameterizing this operation or providing a detailed comment explaining its purpose.

You can enhance readability and maintainability by parameterizing the adjustment:

# Parameters for adjusting the position of a specific node
frame_index = 0
node_index = 0
adjusted_position = [0, 0, -0.25]

# Apply the adjustment
trx_new[frame_index, :, node_index] = adjusted_position

Alternatively, if this adjustment is for testing purposes, include an explanatory comment:

# Manually adjust the position of node 0 in the first frame for testing
trx_new[0, :, 0] = [0, 0, -0.25]

Mat-to-Nwb-Synth-Data.ipynb Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
demos/create_synth_data.ipynb (2)

44-44: Address the TODO comment about file reading.

The TODO comment indicates that the model should be read from a file instead of being hardcoded. This would improve maintainability and reusability.

Would you like me to help create a function to read the model from an MJCF file?


46-65: Add documentation for model parameters.

The MJCF model lacks documentation explaining the purpose and units of key parameters like:

  • timestep (0.001)
  • joint axis (0 -1 0)
  • geom size (0.02)

Add comments to explain these parameters and their physical significance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 19a30e0 and dc26558.

📒 Files selected for processing (1)
  • demos/create_synth_data.ipynb (1 hunks)
🧰 Additional context used
🪛 Gitleaks
demos/create_synth_data.ipynb

98-98: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

demos/create_synth_data.ipynb Show resolved Hide resolved
Copy link
Collaborator

@charles-zhng charles-zhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to push once create_synth_data.ipynb is runnable 👍

@jf514 jf514 merged commit f3980e4 into main Nov 1, 2024
5 checks passed
@jf514 jf514 deleted the synthetic-data branch November 1, 2024 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants