diff --git a/Jupyter/images/Input_trajectories.png b/Jupyter/images/Input_trajectories.png index 8a6a3b49f..5b876a0a6 100644 Binary files a/Jupyter/images/Input_trajectories.png and b/Jupyter/images/Input_trajectories.png differ diff --git a/Jupyter/spatiotemporal-colab.ipynb b/Jupyter/spatiotemporal-colab.ipynb index c2541b1d9..ef788b2db 100644 --- a/Jupyter/spatiotemporal-colab.ipynb +++ b/Jupyter/spatiotemporal-colab.ipynb @@ -484,11 +484,11 @@ "\n", "# Trajectory modeling step 1: gathering of information\n", "\n", - "We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize heterogeneity models, snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.\n", + "We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.\n", "\n", "\n", "\n", - "Heterogeneity models inform the possible compositional states at each time point and measure how well a compositional state agrees with input information. Snapshot models provide structural models for each heterogeneity model and measure how well those structural models agree with input information about their structure. Physical theories of macromolecular dynamics inform transitions between states. SAXS data informs the size and shape of the assembling complex and is left for validation.\n", + "Snapshot models inform transitions between sampled time points, and their scores inform trajectory scores. Physical theories of macromolecular dynamics inform transitions between sampled time points. SAXS data informs the size and shape of the assembling complex and is left for validation.\n", "\n", "# Trajectory modeling step 2: representation, scoring function, and search process\n", "\n", @@ -504,10 +504,10 @@ "To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form:\n", "\n", "$$\n", - "W(\\chi) \\propto \\displaystyle\\prod^{T}_{t=0} P( X_{N,t}, N_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1}),\n", + "W(\\chi) \\propto \\displaystyle\\prod^{T}_{t=0} P( X_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1}),\n", "$$\n", "\n", - "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{N,t}, N_{t} | D_{t})$ is the snapshot model score; and $W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n", + "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n", "\n", "### Searching for good scoring models\n", "\n", diff --git a/Jupyter/spatiotemporal.ipynb b/Jupyter/spatiotemporal.ipynb index 9761fb706..e63880465 100644 --- a/Jupyter/spatiotemporal.ipynb +++ b/Jupyter/spatiotemporal.ipynb @@ -61,7 +61,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": null, "id": "1ff4d3e5-04de-4092-8cfc-2ad3018675da", "metadata": {}, "outputs": [], @@ -81,18 +81,10 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": null, "id": "6b78a015-32e5-4701-8c6f-df14863ba9ce", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Successfully calculated the most likely configurations, and saved them to configuration and topology files.\n" - ] - } - ], + "outputs": [], "source": [ "main_dir = os.getcwd()\n", "os.chdir(main_dir)\n", @@ -468,11 +460,11 @@ "\n", "# Trajectory modeling step 1: gathering of information\n", "\n", - "We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize heterogeneity models, snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.\n", + "We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles.\n", "\n", "\n", "\n", - "Heterogeneity models inform the possible compositional states at each time point and measure how well a compositional state agrees with input information. Snapshot models provide structural models for each heterogeneity model and measure how well those structural models agree with input information about their structure. Physical theories of macromolecular dynamics inform transitions between states. SAXS data informs the size and shape of the assembling complex and is left for validation.\n", + "Snapshot models inform transitions between sampled time points, and their scores inform trajectory scores. Physical theories of macromolecular dynamics inform transitions between sampled time points. SAXS data informs the size and shape of the assembling complex and is left for validation.\n", "\n", "# Trajectory modeling step 2: representation, scoring function, and search process\n", "\n", @@ -488,10 +480,10 @@ "To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form:\n", "\n", "$$\n", - "W(\\chi) \\propto \\displaystyle\\prod^{T}_{t=0} P( X_{N,t}, N_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1}),\n", + "W(\\chi) \\propto \\displaystyle\\prod^{T}_{t=0} P( X_{t} | D_{t}) \\cdot \\displaystyle\\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1}),\n", "$$\n", "\n", - "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{N,t}, N_{t} | D_{t})$ is the snapshot model score; and $W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n", + "where $t$ indexes times from 0 until the final modeled snapshot ($T$); $P(X_{t} | D_{t})$ is the snapshot model score; and $W(X_{t+1} | X_{t},D_{t,t+1})$ is the transition score. Trajectory model weights ($W(\\chi)$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics.\n", "\n", "### Searching for good scoring models\n", "\n", @@ -503,26 +495,10 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": null, "id": "bb887efe-0630-47b6-9bf1-85fa216a6816", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - ".config files are copied\n", - ".csv stoichiometry files are copied\n", - "Scores for snapshot1_0min have been merged and saved\n", - "Scores for snapshot2_0min have been merged and saved\n", - "Scores for snapshot3_0min have been merged and saved\n", - "Scores for snapshot1_1min have been merged and saved\n", - "Scores for snapshot2_1min have been merged and saved\n", - "Scores for snapshot3_1min have been merged and saved\n", - "Scores for snapshot1_2min have been merged and saved\n" - ] - } - ], + "outputs": [], "source": [ "def merge_scores(fileA, fileB, outputFile):\n", " \"\"\"\n", @@ -673,25 +649,10 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": null, "id": "819bb205-fa52-42f9-a4ab-d2f7c3cff0ad", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Initialing graph...\n", - "Done.\n", - "Calculation composition likelihood...\n", - "Done.\n", - "Scoring directed acycling graph...\n", - "Done.\n", - "Writing output...\n", - "Done.\n" - ] - } - ], + "outputs": [], "source": [ "nodes, graph, graph_prob, graph_scores = spatiotemporal.create_DAG(state_dict, out_pdf=True, npaths=3,\n", " input_dir=input, scorestr='_scores.log',\n", @@ -737,54 +698,10 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": null, "id": "f1436e33-81b6-400c-bf9c-7b667455265a", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - ".config files are copied\n", - ".csv stoichiometry files are copied\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot1_0min/good_scoring_models/scoresA.txt to ./data/1_0min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot1_0min/good_scoring_models/scoresB.txt to ./data/1_0min_scoresB.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot2_0min/good_scoring_models/scoresA.txt to ./data/2_0min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot2_0min/good_scoring_models/scoresB.txt to ./data/2_0min_scoresB.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot3_0min/good_scoring_models/scoresA.txt to ./data/3_0min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot3_0min/good_scoring_models/scoresB.txt to ./data/3_0min_scoresB.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot1_1min/good_scoring_models/scoresA.txt to ./data/1_1min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot1_1min/good_scoring_models/scoresB.txt to ./data/1_1min_scoresB.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot2_1min/good_scoring_models/scoresA.txt to ./data/2_1min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot2_1min/good_scoring_models/scoresB.txt to ./data/2_1min_scoresB.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot3_1min/good_scoring_models/scoresA.txt to ./data/3_1min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot3_1min/good_scoring_models/scoresB.txt to ./data/3_1min_scoresB.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot1_2min/good_scoring_models/scoresA.txt to ./data/1_2min_scoresA.log\n", - "Copied ../modeling/Snapshots/Snapshots_Modeling/snapshot1_2min/good_scoring_models/scoresB.txt to ./data/1_2min_scoresB.log\n", - "Initialing graph...\n", - "Done.\n", - "Calculation composition likelihood...\n", - "Done.\n", - "Scoring directed acycling graph...\n", - "Done.\n", - "Writing output...\n", - "Done.\n", - "Initialing graph...\n", - "Done.\n", - "Calculation composition likelihood...\n", - "Done.\n", - "Scoring directed acycling graph...\n", - "Done.\n", - "Writing output...\n", - "Done.\n", - "Temporal precision between ../output_modelA/labeled_pdf.txt and ../output_modelB/labeled_pdf.txt:\n", - "1.0\n", - "Step 1: calculation of temporal precision IS COMPLETED\n", - "\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "## 1 - calculation of temporal precision\n", "\n", @@ -930,22 +847,10 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": null, "id": "d3669407-4ddc-4e1f-b8ba-c8638024cd56", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Precision of ./output/labeled_pdf.txt\n", - "1.0\n", - "Step 2: calculation of precision of the model IS COMPLETED\n", - "\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "## 2 - calculation of precision of the model\n", "\n", @@ -991,20 +896,10 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": null, "id": "f3f7c451-2b04-4e8b-b39c-16c493e8a1a6", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Step 3b: copy number validation IS COMPLETED\n", - "\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "def read_labeled_pdf(pdf_file):\n", " \"\"\"\n", @@ -1156,89 +1051,10 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": null, "id": "d5cc728a-6159-4317-94eb-525dccc780aa", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Finishing: snapshot1_0min.pdb\n", - "Finishing: snapshot2_0min.pdb\n", - "Finishing: snapshot3_0min.pdb\n", - "Finishing: snapshot1_1min.pdb\n", - "Finishing: snapshot2_1min.pdb\n", - "Finishing: snapshot3_1min.pdb\n", - "Finishing: snapshot1_2min.pdb\n", - "All .dat files have been copied successfully...\n", - "...lets proceed to FoXS\n", - "begin read_pdb:\n", - " WARNING No atoms were read from snapshot1_0min.pdb; perhaps it is not a PDB file.\n", - "end read_pdb\n", - "WARNING can't parse input file snapshot1_0min.pdb\n", - "FoXS for 0min is calculated and ready to create a plot. Nr of states is: 3\n", - "Plot 0min_FoXS.png is created\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "line 0: Cannot load input from 'fit.plt'\n", - "\n", - "/usr/local/bin/mv: cannot stat 'fit.plt': No such file or directory\n", - "/usr/local/bin/mv: cannot stat 'fit.png': No such file or directory\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "begin read_pdb:\n", - " WARNING No atoms were read from snapshot1_1min.pdb; perhaps it is not a PDB file.\n", - "end read_pdb\n", - "WARNING can't parse input file snapshot1_1min.pdb\n", - "FoXS for 1min is calculated and ready to create a plot. Nr of states is: 3\n", - "Plot 1min_FoXS.png is created\n", - "There is only one state in 2min\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "line 0: Cannot load input from 'fit.plt'\n", - "\n", - "/usr/local/bin/mv: cannot stat 'fit.plt': No such file or directory\n", - "/usr/local/bin/mv: cannot stat 'fit.png': No such file or directory\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "begin read_pdb:\n", - " WARNING No atoms were read from snapshot1_2min.pdb; perhaps it is not a PDB file.\n", - "end read_pdb\n", - "WARNING can't parse input file snapshot1_2min.pdb\n", - "FoXS for 2min is calculated and ready to create a plot. Nr of states is: 1\n", - "Step 4a: SAXS validation IS COMPLETED\n", - "\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "line 0: Cannot load input from 'snapshot1_2min_2min_exp.plt'\n", - "\n", - "/usr/local/bin/mv: cannot stat 'snapshot1_2min_2min_exp.plt': No such file or directory\n", - "/usr/local/bin/mv: cannot stat 'snapshot1_2min_2min_exp.png': No such file or directory\n" - ] - } - ], + "outputs": [], "source": [ "# 4a - SAXS\n", "\"\"\"\n", @@ -1468,7 +1284,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.15" + "version": "3.11.7" } }, "nbformat": 4, diff --git a/doc/images/Input_trajectories.png b/doc/images/Input_trajectories.png index 8a6a3b49f..5b876a0a6 100644 Binary files a/doc/images/Input_trajectories.png and b/doc/images/Input_trajectories.png differ diff --git a/doc/trajectory.md b/doc/trajectory.md index a670b1493..7c568cff7 100644 --- a/doc/trajectory.md +++ b/doc/trajectory.md @@ -5,11 +5,11 @@ Here, we describe the final modeling problem in our composite workflow, how to b # Trajectory modeling step 1: gathering of information {#trajectories1} -We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize heterogeneity models, snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles. +We begin trajectory modeling with the first step of integrative modeling, gathering information. Trajectory modeling utilizes dynamic information about the bimolecular process. In this case, we utilize snapshot models, physical theories, and synthetically generated small-angle X-ray scattering (SAXS) profiles. \image html Input_trajectories.png width=600px -Heterogeneity models inform the possible compositional states at each time point and measure how well a compositional state agrees with input information. Snapshot models provide structural models for each heterogeneity model and measure how well those structural models agree with input information about their structure. Physical theories of macromolecular dynamics inform transitions between states. SAXS data informs the size and shape of the assembling complex and is left for validation. +Snapshot models inform transitions between sampled time points, and their scores inform trajectory scores. Physical theories of macromolecular dynamics inform transitions between sampled time points. SAXS data informs the size and shape of the assembling complex and is left for validation. # Trajectory modeling step 2: representation, scoring function, and search process {#trajectories2} @@ -26,10 +26,10 @@ We choose to represent dynamic processes as a trajectory of snapshot models, wit To score trajectory models, we incorporate both the scores of individual snapshot models, as well as the scores of transitions between them. Under the assumption that the process is Markovian (*i.e.* memoryless), the weight of a trajectory model takes the form: \f[ -W(\chi) \propto \displaystyle\prod^{T}_{t=0} P( X_{N,t}, N_{t} | D_{t}) \cdot \displaystyle\prod^{T-1}_{t=0} W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1}), +W(\chi) \propto \displaystyle\prod^{T}_{t=0} P( X_{t} | D_{t}) \cdot \displaystyle\prod^{T-1}_{t=0} W(X_{t+1} | X_{t},D_{t,t+1}) \f] -where \f$t\f$ indexes times from 0 until the final modeled snapshot (\f$T\f$); \f$P(X_{N,t}, N_{t} | D_{t})\f$ is the snapshot model score; and \f$W(X_{N,t+1},N_{t+1} | X_{N,t},N_{t}, D_{t,t+1})\f$ is the transition score. Trajectory model weights (\f$W(\chi)\f$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics. +where \f$t\f$ indexes times from 0 until the final modeled snapshot (\f$T\f$); \f$P(X_{t} | D_{t})\f$ is the snapshot model score; and \f$W(X_{t+1} | X_{t}, D_{t,t+1})\f$ is the transition score. Trajectory model weights (\f$W(\chi)\f$) are normalized so that the sum of all trajectory models' weights is 1.0. Transition scores are currently based on a simple metric that either allows or disallows a transition. Transitions are only allowed if all proteins in the first snapshot model are included in the second snapshot model. In the future, we hope to include more detailed transition scoring terms, which may take into account experimental information or physical models of macromolecular dynamics. ### Searching for good scoring models {#trajectory_searching}