relatednessFinder v0.3.0:

Purpose:

CLI tool that helps identify estimated relatedness values for individuals in the BioVU European cohort. This tool was mainly designed to interact with the database on the server below.

Cloning the repository from github:

If you wish to use this program, first clone the repository using the following command:

git clone https://github.com/jtb324/relatednessFinder.git

You also have to initialize the submodules, so change into the relatednessFinder directory and then run the following two commands:

git submodule init
git submodule update

Once you have cloned the repository, you will need to create a virtual environment to install the dependencies. Using conda to create the virtual environment is recommended. The cloned repository contains an environment.yml file, so the following commands will create a virtual environment called "relatednessFinder":

cd relatednessFinder

conda env create -f environment.yml

Alternatives: You could also create a virtual environment using virtualenv or pipenv. There is a requirements.txt file in the cloned directory, so you can install the dependencies with the command:

pip install -r requirements.txt

Just make sure that you are using Python version >= 3.10. If you are familiar with poetry, you could also use that. There is a pyproject.toml file in the cloned directory, so you could use the command:

poetry install
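
The Python >= 3.10 requirement can be checked from inside the activated environment before installing. The following is a minimal sketch and not part of relatednessFinder; the function name python_is_supported is hypothetical.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old, need >= 3.10"
    print(f"Python {found}: {status}")
```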

Once you have created the environment, you will need to activate it. How you do this depends on whether you are using conda or a virtualenv. Once the environment is activated, you can call the program from the cloned directory using the command:

python3 relatednessFinder/relatedness_finder.py --help

This command should show a help message listing all of the required and optional arguments.

Inputs for both commands:

The next sections break down the two commands of the relatednessFinder program: determine-relatedness and gather-distributions.

determine-relatedness

This command is used to determine the relatedness between individuals in a list. It returns a text file where each row is a pair of individuals with the estimated relatedness between the pair. You can see the arguments for this command by running:

python3 relatedness_finder.py determine-relatedness --help

Required Inputs:

grid file - This argument is represented by either the -g or --grid-file flag. The argument is the filepath to a tab-separated text file that has two columns. The first column should be the grid IDs and the second column should be the phenotype status. For this command you can label all of the IDs as having a phenotype status of 1, because we are not comparing ratios between cases and controls. The following table shows an example of how the file should look. A header line is shown here for display, but the grid file itself should not have a header line.

GRID ID	PHENOTYPE STATUS
Patient 1	1
Patient 2	1
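
A file in this layout could be generated from a list of IDs with a few lines of Python. This is a minimal sketch and not part of relatednessFinder; the function name write_grid_file, the example IDs, and the file name grids.txt are hypothetical.

```python
import csv
from pathlib import Path


def write_grid_file(grid_ids, path):
    """Write one 'grid_id<TAB>1' row per ID, with no header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for grid_id in grid_ids:
            # Every ID gets phenotype status 1, since determine-relatedness
            # does not compare cases against controls.
            writer.writerow([grid_id, 1])
    return Path(path)


if __name__ == "__main__":
    # Hypothetical IDs and output file name, for illustration only.
    write_grid_file(["Patient 1", "Patient 2"], "grids.txt")
```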

database_path - This argument is represented by either the -d or --database-path flag. This is the filepath to the database on the server.
table_name - This argument is represented by either the -t or --table-name flag. This is the name of the table within the database. You can find the table name by running the following commands:

sqlite3 {path to database}

sqlite>.table
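
The same lookup can be done from Python with the standard-library sqlite3 module. This is a minimal sketch, assuming the database is a regular SQLite file; the function name list_tables is hypothetical and not part of relatednessFinder. Unlike the sqlite3 shell command, it lists tables only, not views.

```python
import sqlite3
from pathlib import Path


def list_tables(database_path):
    """Return the names of all tables in the SQLite database, sorted."""
    # Open read-only so the lookup cannot create or modify the database.
    uri = Path(database_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling list_tables with the path to the database returns a list of names, any of which can be passed to -t.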

output_path - This argument is represented by either the -o or --output flag. This is just the path to write the output to. It should be a full filepath that ends in .txt. By default the program writes to ./test.txt.

Optional Inputs:

loglevel - This optional argument is represented by the --loglevel flag. It sets the log level to 'warning', 'verbose', or 'debug'. These levels go from least informative to most informative, respectively. 'warning' only reports which parameters were passed to the program, while 'debug' writes more information about the whole process.
log_to_console - This flag is represented by --log-to-console. If the user provides this flag, output is passed to the console through stdout. Otherwise, output is only written to a log file.
log filename - This optional argument is represented by the --log-filename flag. This flag allows the user to create a custom filename for the output log file. By default the program writes log output to test_determine_relatedness.log.

An example of this command is:

python3 relatedness_finder.py determine-relatedness -g {grid_file} -d {database_path} -t {table_name} --output {output_path} --log-filename {log_filename} --loglevel verbose --log-to-console
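
When the command is launched from another script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the flags documented above; the function names are hypothetical, and it assumes relatedness_finder.py is reachable from the current working directory. It uses the current interpreter (sys.executable) rather than a literal python3.

```python
import subprocess
import sys

# Log levels listed in this README.
LOG_LEVELS = {"warning", "verbose", "debug"}


def build_determine_relatedness_args(grid_file, database_path, table_name,
                                     output_path="./test.txt", loglevel=None,
                                     log_filename=None, log_to_console=False):
    """Return the argument list for a determine-relatedness call."""
    if loglevel is not None and loglevel not in LOG_LEVELS:
        raise ValueError(f"loglevel must be one of {sorted(LOG_LEVELS)}")
    args = [sys.executable, "relatedness_finder.py", "determine-relatedness",
            "-g", grid_file, "-d", database_path, "-t", table_name,
            "--output", output_path]
    if loglevel is not None:
        args += ["--loglevel", loglevel]
    if log_filename is not None:
        args += ["--log-filename", log_filename]
    if log_to_console:
        args.append("--log-to-console")
    return args


def run_determine_relatedness(*args, **kwargs):
    """Run the command and raise CalledProcessError if it exits non-zero."""
    command = build_determine_relatedness_args(*args, **kwargs)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.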

gather-distributions

This command is used to compare the distributions between two sets of IDs (typically cases and controls). Output will be written to two histograms. You can see the arguments for this command by running:

python3 relatedness_finder.py gather-distributions --help

Required Inputs:

case_control_file - The argument is the filepath to a tab-separated text file that has two columns. The first column should be the grid IDs and the second column should be the phenotype status, where 0 represents controls and 1 represents cases. The following table shows an example of how the file should look. A header line is shown here for display, but the case_control_file itself should not have a header line.

GRID ID	PHENOTYPE STATUS
Patient 1	1
Patient 2	0
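
Before running the command, it can be useful to confirm that the file matches this layout. The following is a minimal sketch and not part of relatednessFinder; the function name count_phenotypes is hypothetical. It rejects any line that is not exactly an ID, a tab, and a 0 or 1, which also catches a header line left in by mistake.

```python
from collections import Counter


def count_phenotypes(path):
    """Validate the two-column layout and count cases (1) and controls (0)."""
    counts = Counter()
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) != 2 or fields[1] not in {"0", "1"}:
                raise ValueError(
                    f"line {line_number}: expected 'grid_id<TAB>0|1', got {line!r}"
                )
            counts[fields[1]] += 1
    return {"cases": counts["1"], "controls": counts["0"]}
```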

output - This is just the path to write the output to. It should be a full filepath without a file suffix. The program will append _case.png or _controls.png to the returned histograms.
database_path - This is the filepath to the database on the server.
table_name - This is the name of the table within the database. You can find the table name by running the following commands:

sqlite3 {path to database}

sqlite>.table

Optional Inputs:

loglevel - This optional argument is represented by the --loglevel flag. It sets the log level to 'warning', 'verbose', or 'debug'. These levels go from least informative to most informative, respectively. 'warning' only reports which parameters were passed to the program, while 'debug' writes more information about the whole process.
log_to_console - This flag is represented by --log-to-console. If the user provides this flag, output is passed to the console through stdout. Otherwise, output is only written to a log file.
log filename - This optional argument is represented by the --log-filename flag. This flag allows the user to create a custom filename for the output log file. By default the program writes log output to test_determine_relatedness.log.

An example of this command with all the optional arguments is shown below:

python3 relatedness_finder.py gather-distributions {case_control_file} {output_path} {database_path} {table_name} --log-filename {log_filename} --loglevel verbose --log-to-console
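
The required inputs and the resulting histogram names can be sketched in Python as follows. This is an illustration only, not part of relatednessFinder: the function names are hypothetical, the positional order is taken from the example command above, and the file suffixes are taken from the description of the output argument. The optional logging flags are omitted for brevity.

```python
import sys


def histogram_paths(output_prefix):
    """Return the two histogram paths derived from the output prefix."""
    return (f"{output_prefix}_case.png", f"{output_prefix}_controls.png")


def build_gather_distributions_args(case_control_file, output_prefix,
                                    database_path, table_name):
    """Return the argument list for a gather-distributions call."""
    # The output argument is a prefix, so reject an image suffix.
    if str(output_prefix).lower().endswith(".png"):
        raise ValueError("output should be a path without a file suffix")
    return [sys.executable, "relatedness_finder.py", "gather-distributions",
            case_control_file, output_prefix, database_path, table_name]


if __name__ == "__main__":
    # Hypothetical prefix, for illustration only.
    for path in histogram_paths("results/run1"):
        print(path)
```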
