Preprocessing refactoring #119

Open
katiekly opened this issue Dec 7, 2024 · 3 comments
Labels
refactor Improve internal software structure without changing observable outcomes

Comments

katiekly (Collaborator) commented Dec 7, 2024

Goal: To clarify the intended preprocessing algorithm for the refactor (starting with DLC outputs and ending right before creation of the training set). Feedback on the approach below from Luiz and others is encouraged.

Create a new main function, preprocess_posedata(), in a new script, preprocessing.py.
This function should have four main processing steps, plus a final fifth step that saves the cleaned data to file and optionally outputs a before-and-after figure for the user to QC. Some of these functions currently exist in align_egocentrical.py and create_trainset.py.

  1. lowconf_cleaning(): Load the DeepLabCut CSV, which contains three columns per DLC-tracked body part (x pixel coordinate, y pixel coordinate, DLC confidence p value). For each body part, nan out any frames whose confidence is below confidence_threshold, for that body part only. Linearly interpolate over the nan-ed frames. (A sketch of steps 1 and 2 follows this list.)
  2. egocentrically_align_and_center(): Using the two reference key points (usually belly and tailbase), find and apply the rotation matrix required to align all body parts egocentrically. Shift the points so the belly (or whichever centered reference point is chosen) sits at (0, 0), with the orientation reference point always pointing down along the -y axis (x always zero). <Suggest adding the belly centering to crop_and_flip(), renaming it crop_flip_center().> Save PE-seq.npy, which is the ego-aligned and belly-centered poses BEFORE any IQR cleaning. Consider renaming PE-seq.npy to egoaligned_pose_estimation.npy.
    -- Are the units of PE-seq.npy pixels? I think so, yes.
  3. outlier_cleaning(): For each body part of a given session, z-score the aligned coordinates and, using IQR_val, identify outliers; nan them out and interpolate over the nans. Repeat for all body parts. Recompute the z-score to remove the bias introduced by the now-removed outliers.
  4. savgol_filtering(): If Savitzky-Golay filtering is desired, apply it now.
  5. save_and_visualize_cleaned_egoaligned_poses(): Save PE-seq_clean.npy. Consider renaming PE-seq_clean to cleaned_egoaligned_pose_estimation.npy. Plot a visualization showing the raw DLC time series, followed by the results of steps 1, 2, 3, and 4 for all body parts, for a sample one-minute window of one session.

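A minimal Python sketch of steps 1 and 2, assuming a (frames × body parts × 2) coordinate array, a matching confidence array, and reference points already resolved to column indices. The array layout, helper names, and edge handling of the interpolation are assumptions for illustration, not the final implementation:

```python
import numpy as np


def lowconf_cleaning(coords, confidence, confidence_threshold):
    """Step 1 sketch: nan out low-confidence frames per body part, then linearly interpolate.

    coords: (n_frames, n_bodyparts, 2) array of x/y pixel coordinates.
    confidence: (n_frames, n_bodyparts) array of DLC confidence values.
    """
    cleaned = coords.astype(float).copy()
    cleaned[confidence < confidence_threshold] = np.nan  # masks both x and y for that frame/body part
    frames = np.arange(cleaned.shape[0])
    for bp in range(cleaned.shape[1]):
        for dim in range(2):
            series = cleaned[:, bp, dim]
            good = ~np.isnan(series)
            if good.any():
                cleaned[:, bp, dim] = np.interp(frames, frames[good], series[good])
    return cleaned


def egocentrically_align_and_center(coords, centered_idx, orientation_idx):
    """Step 2 sketch: center on the centered reference point and rotate each frame so
    the orientation reference point lies straight down the -y axis (x = 0)."""
    centered = coords - coords[:, centered_idx:centered_idx + 1, :]
    vec = centered[:, orientation_idx, :]                 # vector to the orientation point
    norm = np.linalg.norm(vec, axis=1, keepdims=True)
    norm[norm == 0] = 1.0                                 # avoid division by zero on degenerate frames
    u = vec / norm                                        # unit vector (ux, uy)
    # Per-frame rotation matrix that sends (ux, uy) to (0, -1): R = [[-uy, ux], [-ux, -uy]]
    rot = np.stack([np.stack([-u[:, 1],  u[:, 0]], axis=-1),
                    np.stack([-u[:, 0], -u[:, 1]], axis=-1)], axis=-2)
    return np.einsum('fij,fbj->fbi', rot, centered)
```

outlier_cleaning() (step 3) could follow the same nan-and-interpolate pattern as lowconf_cleaning(), just with the mask derived from the z-scored coordinates and IQR_val rather than from the confidence column.
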
Anything described above that currently exists in create_trainset(), such as the IQR cleaning and Savitzky-Golay filtering, should be removed from create_trainset(), which should only create the train and test data sets, not transform the data.

katiekly added the refactor label on Dec 7, 2024
DrSRMiller commented

Users should specify the reference body parts by name string, not by index number. The two reference points should be passed as arguments or entered into the config as separate parameters for clarity, named "centered_reference_point" and "orientation_reference_point".
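
To illustrate, one way the name strings could be resolved against the DLC body part list; the body part names, config structure, and error handling here are placeholders, not the actual config schema:

```python
# Illustrative only: resolving the two reference names to column indices.
config = {
    "centered_reference_point": "belly",
    "orientation_reference_point": "tailbase",
}
bodyparts = ["snout", "forepaw_L", "forepaw_R", "belly", "hindpaw_L", "hindpaw_R", "tailbase"]

for key in ("centered_reference_point", "orientation_reference_point"):
    if config[key] not in bodyparts:
        raise ValueError(f"{key} '{config[key]}' is not a tracked body part: {bodyparts}")

centered_idx = bodyparts.index(config["centered_reference_point"])
orientation_idx = bodyparts.index(config["orientation_reference_point"])
```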

katiekly (Collaborator, Author) commented

In the next minor or patch version, users could be given the option to exclude body parts from training in create_trainset().

DrSRMiller commented Dec 10, 2024

[Figure: example preprocessing QC visualization from the paper cited below]
Source: https://www.cell.com/cell-reports/fulltext/S2211-1247(24)01221-X?uuid=uuid%3Ae8ff0a33-5137-4f70-9418-9db062db3694#fig2

Rather than creating a visual like the one above, let's instead save a table that reports, for each session (rows), the percentage of frames removed and interpolated over due to low confidence (column 1) and the percentage of frames removed and interpolated over due to IQR outlier detection (column 2).
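
A rough sketch of what building that table could look like, assuming each cleaning step records a boolean mask of the frames it nan-ed (here a frame counts as interpolated if any body part was nan-ed in it; the mask plumbing, key names, and output filename are assumptions):

```python
import pandas as pd


def build_cleaning_report(sessions, out_path="cleaning_report.csv"):
    """sessions: dict mapping session name -> dict of boolean masks
    'lowconf_nan' and 'iqr_nan', each shaped (n_frames, n_bodyparts)."""
    rows = []
    for name, masks in sessions.items():
        rows.append({
            "session": name,
            "pct_frames_lowconf_interpolated": 100.0 * masks["lowconf_nan"].any(axis=1).mean(),
            "pct_frames_iqr_interpolated": 100.0 * masks["iqr_nan"].any(axis=1).mean(),
        })
    report = pd.DataFrame(rows).set_index("session")
    report.to_csv(out_path)
    return report
```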
