Preprocessing refactoring #119

Open
katiekly opened this issue Dec 7, 2024 · 3 comments
Labels
refactor Improve internal software structure without changing observable outcomes

Comments

katiekly (Collaborator) commented Dec 7, 2024

Goal: To clarify the intended preprocessing algorithm for the refactor (starting with DLC outputs and ending right before creation of the training set). Feedback on the approach below from Luiz and others is encouraged.

Create a new main function, preprocess_posedata(), in a new script, preprocessing.py.
This function should have four main processing steps, plus a final fifth step that saves the cleaned data to file and optionally outputs a before-and-after figure for the user to QC. Some of these functions currently exist in align_egocentrical.py and create_trainset.py.

  1. lowconf_cleaning(): Load the DeepLabCut CSV, which contains three columns per DLC-tracked body part (x pixel coordinate, y pixel coordinate, DLC confidence p value). For each body part, nan out any frames whose confidence is below confidence_threshold, for that body part only. Linearly interpolate over the nan-ed frames. (A sketch of steps 1 and 2 follows this list.)
  2. egocentrically_align_and_center(): Using the two reference key points (usually belly and tailbase), find and apply the rotation matrix required to align all body parts egocentrically. Shift the points so the belly (or whichever centered reference point is chosen) sits at (0, 0), with the orientation reference point always pointing down along the -y axis (x always zero). <Suggest adding the belly centering to crop_and_flip(), renaming it crop_flip_center().> Save PE-seq.npy, which is the ego-aligned and belly-centered poses BEFORE any IQR cleaning. Consider renaming PE-seq.npy to egoaligned_pose_estimation.npy.
    -- Are the units of PE-seq.npy pixels? I think so, yes.
  3. outlier_cleaning(): For each body part of a given session, z-score the aligned coordinates and, using IQR_val, identify outliers; nan them out and interpolate over the nans. Repeat for all body parts. Recompute the z-score to remove the bias introduced by the now-removed outliers.
  4. savgol_filtering(): If Savitzky-Golay filtering is desired, apply it now.
  5. save_and_visualize_cleaned_egoaligned_poses(): Save PE-seq_clean.npy. Consider renaming PE-seq_clean to cleaned_egoaligned_pose_estimation.npy. Plot a visualization showing the raw DLC time series, followed by the results of steps 1, 2, 3, and 4 for all body parts, for a sample one-minute window of one session.

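A minimal Python sketch of steps 1 and 2, assuming a (frames × body parts × 2) coordinate array, a matching confidence array, and reference points already resolved to column indices. The array layout, helper names, and edge handling of the interpolation are assumptions for illustration, not the final implementation:

```python
import numpy as np


def lowconf_cleaning(coords, confidence, confidence_threshold):
    """Step 1 sketch: nan out low-confidence frames per body part, then linearly interpolate.

    coords: (n_frames, n_bodyparts, 2) array of x/y pixel coordinates.
    confidence: (n_frames, n_bodyparts) array of DLC confidence values.
    """
    cleaned = coords.astype(float).copy()
    cleaned[confidence < confidence_threshold] = np.nan  # masks both x and y for that frame/body part
    frames = np.arange(cleaned.shape[0])
    for bp in range(cleaned.shape[1]):
        for dim in range(2):
            series = cleaned[:, bp, dim]
            good = ~np.isnan(series)
            if good.any():
                cleaned[:, bp, dim] = np.interp(frames, frames[good], series[good])
    return cleaned


def egocentrically_align_and_center(coords, centered_idx, orientation_idx):
    """Step 2 sketch: center on the centered reference point and rotate each frame so
    the orientation reference point lies straight down the -y axis (x = 0)."""
    centered = coords - coords[:, centered_idx:centered_idx + 1, :]
    vec = centered[:, orientation_idx, :]                 # vector to the orientation point
    norm = np.linalg.norm(vec, axis=1, keepdims=True)
    norm[norm == 0] = 1.0                                 # avoid division by zero on degenerate frames
    u = vec / norm                                        # unit vector (ux, uy)
    # Per-frame rotation matrix that sends (ux, uy) to (0, -1): R = [[-uy, ux], [-ux, -uy]]
    rot = np.stack([np.stack([-u[:, 1],  u[:, 0]], axis=-1),
                    np.stack([-u[:, 0], -u[:, 1]], axis=-1)], axis=-2)
    return np.einsum('fij,fbj->fbi', rot, centered)
```

outlier_cleaning() (step 3) could follow the same nan-and-interpolate pattern as lowconf_cleaning(), just with the mask derived from the z-scored coordinates and IQR_val rather than from the confidence column.
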
Anything described above that currently exists in create_trainset(), such as the IQR cleaning and Savitzky-Golay filtering, should be removed from create_trainset(), which should only create the train and test data sets, not transform the data.

katiekly added the refactor label on Dec 7, 2024
DrSRMiller commented

Users should specify the reference body parts by name string, not by index number. The two reference points should be passed as arguments or entered into the config as separate parameters for clarity, named "centered_reference_point" and "orientation_reference_point".
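
To illustrate, one way the name strings could be resolved against the DLC body part list; the body part names, config structure, and error handling here are placeholders, not the actual config schema:

```python
# Illustrative only: resolving the two reference names to column indices.
config = {
    "centered_reference_point": "belly",
    "orientation_reference_point": "tailbase",
}
bodyparts = ["snout", "forepaw_L", "forepaw_R", "belly", "hindpaw_L", "hindpaw_R", "tailbase"]

for key in ("centered_reference_point", "orientation_reference_point"):
    if config[key] not in bodyparts:
        raise ValueError(f"{key} '{config[key]}' is not a tracked body part: {bodyparts}")

centered_idx = bodyparts.index(config["centered_reference_point"])
orientation_idx = bodyparts.index(config["orientation_reference_point"])
```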

katiekly (Collaborator, Author) commented

In the next minor or patch version, users could be given the option to exclude body parts from training in create_trainset().

DrSRMiller commented Dec 10, 2024

[Figure: example preprocessing QC visualization from the paper cited below]
Source: https://www.cell.com/cell-reports/fulltext/S2211-1247(24)01221-X?uuid=uuid%3Ae8ff0a33-5137-4f70-9418-9db062db3694#fig2

Rather than creating a visual like the one above, let's instead save a table that reports, for each session (rows), the percentage of frames removed and interpolated over due to low confidence (column 1) and the percentage of frames removed and interpolated over due to IQR outlier detection (column 2).
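
A rough sketch of what building that table could look like, assuming each cleaning step records a boolean mask of the frames it nan-ed (here a frame counts as interpolated if any body part was nan-ed in it; the mask plumbing, key names, and output filename are assumptions):

```python
import pandas as pd


def build_cleaning_report(sessions, out_path="cleaning_report.csv"):
    """sessions: dict mapping session name -> dict of boolean masks
    'lowconf_nan' and 'iqr_nan', each shaped (n_frames, n_bodyparts)."""
    rows = []
    for name, masks in sessions.items():
        rows.append({
            "session": name,
            "pct_frames_lowconf_interpolated": 100.0 * masks["lowconf_nan"].any(axis=1).mean(),
            "pct_frames_iqr_interpolated": 100.0 * masks["iqr_nan"].any(axis=1).mean(),
        })
    report = pd.DataFrame(rows).set_index("session")
    report.to_csv(out_path)
    return report
```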
