Enhance input validation, CLI functionality, and testing for directLFQ #39

Merged
merged 12 commits into from
Aug 13, 2024
2 changes: 2 additions & 0 deletions .github/workflows/build_windows_package.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Master version bumped
id: master_version_bumped
shell: bash -l {0}
Expand Down Expand Up @@ -52,6 +53,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/nbdev_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down
6 changes: 6 additions & 0 deletions .github/workflows/publish_and_release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Master version bumped
id: master_version_bumped
shell: bash -l {0}
Expand Down Expand Up @@ -51,6 +52,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down Expand Up @@ -83,6 +85,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down Expand Up @@ -115,6 +118,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down Expand Up @@ -150,6 +154,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down Expand Up @@ -185,6 +190,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/quick_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand All @@ -45,6 +46,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/quick_tests_ubuntu.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/unused/all_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ jobs:
with:
auto-update-conda: true
python-version: ${{ matrix.python-version }}
miniconda-version: latest
- name: Conda info
shell: bash -l {0}
run: conda info
Expand Down
2 changes: 1 addition & 1 deletion directlfq/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@


__project__ = "directlfq"
__version__ = "0.2.19"
__version__ = "0.2.20"
__license__ = "Apache"
__description__ = "An open-source Python package of the AlphaPept ecosystem"
__author__ = "Mann Labs"
Expand Down
2 changes: 1 addition & 1 deletion directlfq/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,7 @@ def gui():
@click.option("--filename_suffix", "-fs", type=str, default="", help="A suffix to add to the output file name.")
@click.option("--num_cores", "-nc", type = int, default = None, help="The number of cores to use (default is to use multiprocessing).")
@click.option("--deactivate_normalization", "-dn", type = bool, default = False, help="If you want to deactivate the normalization step, you can set this flag to True.")
@click.option("--filter_dict", "-dn", type = bool, default = False, help="In case you want to define specific filters in addition to the standard filters, you can add a yaml file where the filters are defined (see GitHub docu for example).")
@click.option("--filter_dict", "-fd", type = str, default = None, help="In case you want to define specific filters in addition to the standard filters, you can add a yaml file where the filters are defined (see GitHub docu for example).")

def run_directlfq(**kwargs):
print("starting directLFQ")
Expand Down
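The one-line change to `--filter_dict` fixes two things: the short flag `-dn` was already used by `--deactivate_normalization`, and a path to a YAML file cannot sensibly be passed through a `bool` option. The sketch below illustrates both points with the standard library's `argparse` instead of `click`. That substitution is an assumption made to keep the example dependency-free, and `click` may handle a reused short flag differently. `build_parser` is a hypothetical helper and is not part of directLFQ.

```python
import argparse


def build_parser(filter_short_flag: str = "-fd") -> argparse.ArgumentParser:
    """Hypothetical argparse stand-in for the two options touched by the diff."""
    parser = argparse.ArgumentParser(prog="directlfq-sketch")
    parser.add_argument("--deactivate_normalization", "-dn", action="store_true")
    # A file path is a string; None means "no additional filters requested".
    parser.add_argument("--filter_dict", filter_short_flag, type=str, default=None)
    return parser


# With a distinct short flag, the YAML path arrives unchanged.
args = build_parser().parse_args(["-fd", "filters.yaml"])
print(args.filter_dict, args.deactivate_normalization)

# Reusing "-dn" for a second option is rejected by argparse when it is defined.
try:
    build_parser(filter_short_flag="-dn")
except argparse.ArgumentError as error:
    print(f"conflict: {error}")
```

Defaulting to `None` rather than `False` lets downstream code distinguish "no filter file given" from a real path with a simple `is None` check.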
1 change: 1 addition & 0 deletions directlfq/lfq_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ def run_lfq(input_file, columns_to_add = [], selected_proteins_file :str = None
input_df = lfqutils.import_data(input_file=input_file, input_type_to_use=input_type_to_use, filter_dict=filter_dict)

input_df = lfqutils.sort_input_df_by_protein_id(input_df)
input_df = lfqutils.remove_potential_quant_id_duplicates(input_df)
input_df = lfqutils.index_and_log_transform_input_df(input_df)
input_df = lfqutils.remove_allnan_rows_input_df(input_df)

Expand Down
7 changes: 7 additions & 0 deletions directlfq/normalization.py
Original file line number Diff line number Diff line change
Expand Up @@ -259,10 +259,15 @@ def __init__(self, complete_dataframe, num_samples_quadratic = 100):
self.normalization_function = None

def _run_normalization(self):
self._check_that_there_are_no_duplicate_rows()
if len(self.complete_dataframe.index) <= self._num_samples_quadratic:
self._normalize_complete_input_quadratic()
else:
self._normalize_quadratic_and_linear()

def _check_that_there_are_no_duplicate_rows(self):
if self.complete_dataframe.index.duplicated().any():
raise ValueError("There are duplicate rows in the input dataframe. Ensure that there are no duplicate quant_id/ion values.")

def _normalize_complete_input_quadratic(self):
self.complete_dataframe = self.normalization_function(self.complete_dataframe)
Expand Down Expand Up @@ -295,6 +300,8 @@ def _shift_remaining_dataframe_to_reference_sample(self):
linear_shifted_dataframe = SampleShifterLinear(linear_subset_dataframe, self._merged_reference_sample).ion_dataframe
self.complete_dataframe.loc[ self._linear_subset_rows, :] = linear_shifted_dataframe



@staticmethod
@njit
def _get_num_nas_in_row(row):
Expand Down
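The new `_check_that_there_are_no_duplicate_rows` guard fails fast before normalization when the dataframe index contains repeated labels. A minimal pure-Python sketch of the same idea follows; it works on a plain sequence of index labels rather than a pandas index, and both function names are hypothetical. Listing the offending labels in the message is an addition for illustration and goes beyond what the PR's error message reports.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def find_duplicate_labels(index_labels: Iterable[Hashable]) -> List[Hashable]:
    """Return every label that occurs more than once, in first-seen order."""
    counts = Counter(index_labels)
    return [label for label, count in counts.items() if count > 1]


def check_no_duplicate_rows(index_labels: Iterable[Hashable]) -> None:
    """Raise ValueError if any index label is repeated (illustrative guard)."""
    duplicates = find_duplicate_labels(index_labels)
    if duplicates:
        raise ValueError(
            "There are duplicate rows in the input. "
            f"Repeated quant_id/ion values: {duplicates}"
        )


# Unique labels pass silently; a repeated label stops the run early.
check_no_duplicate_rows(["ion_1", "ion_2", "ion_3"])
try:
    check_no_duplicate_rows(["ion_1", "ion_2", "ion_1"])
except ValueError as error:
    print(error)
```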
24 changes: 24 additions & 0 deletions directlfq/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,30 @@ def index_and_log_transform_input_df(data_df):
def remove_allnan_rows_input_df(data_df):
return data_df.dropna(axis = 0, how = 'all')

def remove_potential_quant_id_duplicates(data_df : pd.DataFrame):
"""
Remove duplicate entries from a DataFrame based on the QUANT_ID column.

This function removes duplicate rows from the input DataFrame, keeping only the first
occurrence of each unique QUANT_ID. It also logs a warning message if any duplicates
are found and removed.

Args:
data_df (pd.DataFrame): dataframe in directLFQ format

Returns:
pd.DataFrame: dataframe in directLFQ format with duplicate QUANT_ID entries removed.
"""
before_drop = len(data_df)
data_df = data_df.drop_duplicates(subset=config.QUANT_ID, keep='first')
after_drop = len(data_df)
if before_drop != after_drop:
entries_removed = before_drop - after_drop
Comment on lines +329 to +331
(nit) This could be simplified with an assignment expression (Python 3.8+):
if (entries_removed := before_drop - after_drop):

LOGGER.warning(f"Duplicate quant_ids detected. {entries_removed} rows removed from input df.")

return data_df


def sort_input_df_by_protein_id(data_df):
return data_df.sort_values(by = config.PROTEIN_ID,ignore_index=True)

Expand Down
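`remove_potential_quant_id_duplicates` keeps the first row for each quant ID and logs a warning when rows were dropped. The sketch below shows the same keep-first behaviour on a list of dictionaries instead of a pandas DataFrame, and it uses the assignment expression suggested in the review comment. The function name and the `"quant_id"` key are assumptions for illustration; the actual column name comes from `config.QUANT_ID`. Note that the assignment expression requires Python 3.8 or newer, while `settings.ini` in this PR lists `min_python = 3.6`.

```python
import logging
from typing import Dict, List

LOGGER = logging.getLogger(__name__)


def drop_duplicate_quant_ids(
    rows: List[Dict[str, object]], quant_id_key: str = "quant_id"
) -> List[Dict[str, object]]:
    """Keep only the first row seen for each quant ID (illustrative helper)."""
    seen = set()
    deduplicated = []
    for row in rows:
        quant_id = row[quant_id_key]
        if quant_id in seen:
            continue  # a later duplicate is discarded
        seen.add(quant_id)
        deduplicated.append(row)

    # Assignment expression: compute the difference and test it in one step.
    if (entries_removed := len(rows) - len(deduplicated)):
        LOGGER.warning(
            "Duplicate quant_ids detected. %d rows removed from input.",
            entries_removed,
        )
    return deduplicated


example_rows = [
    {"quant_id": "ion_1", "intensity": 10.0},
    {"quant_id": "ion_2", "intensity": 20.0},
    {"quant_id": "ion_1", "intensity": 30.0},  # duplicate, dropped
]
print(drop_duplicate_quant_ids(example_rows))
```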
2 changes: 1 addition & 1 deletion misc/bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.2.19
current_version = 0.2.20
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<build>\d+))?
Expand Down
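The `parse` pattern in `misc/bumpversion.cfg` splits a version string into named parts, with an optional release suffix. The following sketch applies that exact pattern with Python's `re` module to show which groups the bumped version populates. `parse_version` is a hypothetical helper, the use of `re.fullmatch` is a choice made here and not necessarily how the bump tool applies the pattern, and `0.2.20-dev1` is an invented example string.

```python
import re

# Pattern copied from the `parse` entry in misc/bumpversion.cfg.
PARSE_PATTERN = (
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\-(?P<release>[a-z]+)(?P<build>\d+))?"
)


def parse_version(version: str) -> dict:
    """Split a version string into its named parts (illustrative helper)."""
    match = re.fullmatch(PARSE_PATTERN, version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return match.groupdict()


# The release and build groups stay None when there is no suffix.
print(parse_version("0.2.20"))
print(parse_version("0.2.20-dev1"))
```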
1,182 changes: 94 additions & 1,088 deletions nbdev_nbs/02_normalization.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion release/one_click_linux_gui/control
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Package: directlfq
Version: 0.2.19
Version: 0.2.20
Architecture: all
Maintainer: Mann Labs <[email protected]>
Description: directlfq
Expand Down
2 changes: 1 addition & 1 deletion release/one_click_linux_gui/create_installer_linux.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ python setup.py sdist bdist_wheel
# Setting up the local package
cd release/one_click_linux_gui
# Make sure you include the required extra packages and always use the stable or very-stable options!
pip install "../../dist/directlfq-0.2.19-py3-none-any.whl[stable, gui]"
pip install "../../dist/directlfq-0.2.20-py3-none-any.whl[stable, gui]"

# Creating the stand-alone pyinstaller folder
pip install pyinstaller==4.10
Expand Down
4 changes: 2 additions & 2 deletions release/one_click_macos_gui/Info.plist
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,9 @@
<key>CFBundleIconFile</key>
<string>alpha_logo.icns</string>
<key>CFBundleIdentifier</key>
<string>directlfq.0.2.19</string>
<string>directlfq.0.2.20</string>
<key>CFBundleShortVersionString</key>
<string>0.2.19</string>
<string>0.2.20</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
Expand Down
4 changes: 2 additions & 2 deletions release/one_click_macos_gui/create_installer_macos.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ python setup.py sdist bdist_wheel

# Setting up the local package
cd release/one_click_macos_gui
pip install "../../dist/directlfq-0.2.19-py3-none-any.whl[stable, gui]"
pip install "../../dist/directlfq-0.2.20-py3-none-any.whl[stable, gui]"

# Creating the stand-alone pyinstaller folder
pip install pyinstaller==4.10
Expand All @@ -40,5 +40,5 @@ cp ../../LICENSE Resources/LICENSE
cp ../logos/alpha_logo.png Resources/alpha_logo.png
chmod 777 scripts/*

pkgbuild --root dist/directlfq --identifier de.mpg.biochem.directlfq.app --version 0.2.19 --install-location /Applications/directlfq.app --scripts scripts directlfq.pkg
pkgbuild --root dist/directlfq --identifier de.mpg.biochem.directlfq.app --version 0.2.20 --install-location /Applications/directlfq.app --scripts scripts directlfq.pkg
productbuild --distribution distribution.xml --resources Resources --package-path directlfq.pkg dist/directlfq_gui_installer_macos.pkg
2 changes: 1 addition & 1 deletion release/one_click_macos_gui/distribution.xml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<installer-script minSpecVersion="1.000000">
<title>directlfq 0.2.19</title>
<title>directlfq 0.2.20</title>
<background mime-type="image/png" file="alpha_logo.png" scaling="proportional"/>
<welcome file="welcome.html" mime-type="text/html" />
<conclusion file="conclusion.html" mime-type="text/html" />
Expand Down
2 changes: 1 addition & 1 deletion release/one_click_windows_gui/create_installer_windows.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ python setup.py sdist bdist_wheel
# Setting up the local package
cd release/one_click_windows_gui
# Make sure you include the required extra packages and always use the stable or very-stable options!
pip install "../../dist/directlfq-0.2.19-py3-none-any.whl[stable, gui]"
pip install "../../dist/directlfq-0.2.20-py3-none-any.whl[stable, gui]"

# Creating the stand-alone pyinstaller folder
pip install pyinstaller==4.10
Expand Down
2 changes: 1 addition & 1 deletion release/one_click_windows_gui/directlfq_innoinstaller.iss
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
; SEE THE DOCUMENTATION FOR DETAILS ON CREATING INNO SETUP SCRIPT FILES!

#define MyAppName "directlfq"
#define MyAppVersion "0.2.19"
#define MyAppVersion "0.2.20"
#define MyAppPublisher "Max Planck Institute of Biochemistry and the University of Copenhagen, Mann Labs"
#define MyAppURL "https://github.com/MannLabs/directlfq"
#define MyAppExeName "directlfq_gui.exe"
Expand Down
2 changes: 1 addition & 1 deletion settings.ini
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ author = Constantin Ammar
author_email = [email protected]
copyright = fast.ai
branch = master
version = 0.2.19
version = 0.2.20
min_python = 3.6
audience = Developers
language = English
Expand Down
1 change: 1 addition & 0 deletions tests/run_quicktests.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,5 +4,6 @@ python download_testfiles.py quicktest
cd quicktests
jupyter nbconvert --to script run_pipeline_w_different_input_formats.ipynb
python run_pipeline_w_different_input_formats.py
directlfq lfq -i ../../test_data/system_tests/quicktests/diann/shortened_input.tsv
cd ..
conda deactivate