How do I extract similar BGC distance values from the interactive output #72
Comments
Hi, I am not associated with BiG-SLiCE but I ran into similar issues.
Now you have the interactive visualization, which shows information about your BGCs. You're specifically interested in the distance from each of your BGCs to its closest GCF, which is marked in red. So, for example, your BGC1 has a distance of 1864 to GCF_008711465.1/NZ_VXKQ01000005.region001, and you want to get the lowest distance value for each of your 1000 BGCs to its closest GCF?
Yes, my current situation and what I want to do is just as you described.
I generated this data in Python 3.7.12. The relevant libraries were sqlite3, pandas, and numpy. Here's the code I used to scrape the relevant information in Python.
This should get you on the right path. I had to hardcode how the folder names are parsed, but the general idea of how to access the data in the SQLite database is there. To figure out how to access things in the database (where the data is stored), I used this script.
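The script referred to above is not preserved in this thread. As a stand-in, here is a minimal sketch of how one could list the tables and columns of any SQLite file using only the standard `sqlite3` module. The function name `describe_database` is hypothetical, and nothing here is specific to BiG-SLiCE.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column_name, ...]} for every table in the database."""
    # sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 holds the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema
```

To use it, open the database with `sqlite3.connect(...)`, pass the connection to `describe_database`, and print the resulting dictionary to see which tables might hold the distance values.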
This way, I could open up Hope this helps! |
Hi, ialas. Thank you for your reply. I used sqlitebrowser to open the SQLite database file. If I am right, the data extracted by the first script is not what I want. The data I need is the distance value under the "Similar BGCs" module, because my final goal is to judge the novelty of my BGCs based on these distance values. A query BGC whose minimum distance value (d) is greater than 900 will be considered novel. I hope I'm wrong, so that I can use your helpful script to solve this problem soon.
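Once the minimum distance per BGC is available, the novelty rule described above (d greater than 900) is a one-line comparison. The sketch below assumes the minimum distances have already been collected into a dictionary; the function name `flag_novel` and the second example entry are hypothetical, while the value 1864 for BGC1 is the one quoted earlier in this thread.

```python
NOVELTY_THRESHOLD = 900  # cutoff quoted in this thread: d > 900 means novel


def flag_novel(min_distances, threshold=NOVELTY_THRESHOLD):
    """Map each BGC name to True when its minimum distance exceeds the threshold."""
    return {bgc: distance > threshold for bgc, distance in min_distances.items()}


# BGC1's distance comes from the thread; BGC2 is a made-up value for illustration.
example = {"BGC1": 1864.0, "BGC2": 412.5}
novel = flag_novel(example)
```

The comparison is strict, so a BGC with a minimum distance of exactly 900 is not flagged as novel.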
Hi, I opened the output results using the Flask server script and noticed that all BGCs are being uniformly assigned to the same GCF model (GCF_0001) with identical distances (199.xxxx). While I'm not familiar with Python, it seems that this issue might be tied to the extraction of values from "gcf_membership", as discussed in your previous conversation with @ialas. I'm seeking guidance on how to address this specific problem. Could you provide insights into potential missteps in my execution or configuration of BiG-SLiCE? Any assistance in resolving this would be greatly appreciated.
Hi, |
@ChrisC610 |
Thanks! Your help means a lot to me!
@boykawang
Do you know what's wrong with BiG-SLiCE? Could you please help me use the query mode to assess the novelty of BGCs against the BiG-FAM database?
Hi, it appears that your command did not add an output directory argument. |
Hi, Thanks a lot for your reply!
Could you tell me which parameters you used in antiSMASH so that BiG-SLiCE runs successfully?
Hi. In fact, I used antiSMASH-6.1.1. |
Did you solve this problem? I downloaded this file http://bioinformatics.nl/~kauts001/ltr/bigslice/paper_data/data/full_run_result.zip and used it as the output folder, but it still gives an error: "Can't find a matching HMM library in the database!" This is very confusing to me. |
@htaohan Running download_bigslice_hmmdb may solve the problem.
@boykawang Hi, sorry to bother you. I would like to ask some questions about the results from BiG-SLiCE (v1.1.0). Could you please explain how to turn the report.db results into a visualized web interface? Additionally, my goal is also to analyze the distances between my BGCs and the reference database (BigSlice_1.2M_database/full_run_result) to assess their novelty. Do you have a solution for obtaining the minimum distance to the reference GCFs in bulk? Thank you! Here is my command and result:
I remember a table named 'gcf_membership', which records the distances to different GCFs. Specifically, it includes the columns of 'gcf_membership'.
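For anyone who wants the minimum distance per BGC in bulk, a grouped query against that table is one way to try. This is only a sketch under stated assumptions: the column names `bgc_id`, `gcf_id`, and `membership_value` are assumed here and should be checked against your own database first, and this thread does not confirm that these values match the distances shown under "Similar BGCs". The function name `min_distance_per_bgc` is hypothetical.

```python
import sqlite3


def min_distance_per_bgc(conn):
    """Return (bgc_id, gcf_id, min_distance) rows, one per BGC.

    Assumes a table gcf_membership(gcf_id, bgc_id, membership_value, ...);
    these column names are an assumption, not confirmed by the thread.
    """
    query = """
        SELECT bgc_id, gcf_id, MIN(membership_value) AS min_distance
        FROM gcf_membership
        GROUP BY bgc_id
        ORDER BY bgc_id
    """
    # SQLite-specific behavior: with a single MIN() aggregate, the bare column
    # gcf_id is taken from the row that holds the minimum value in each group.
    return conn.execute(query).fetchall()
```

The returned rows can be written to a CSV file or loaded into pandas for sorting, which avoids paging through the web interface one BGC at a time.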
Hi, I had some problems extracting data from interactive output pages.
I successfully ran the query module to analyze the novelty of my BGCs (more than 1000), and here is the command I used:
bigslice --query ./my_BGC_gbk_files --n_ranks 3 ~/BigSlice_1.2M_database/full_run_result
I opened the output results using the Flask server script according to the instructions, and the results were displayed on the web page. The important data I need are the distance values between my BGCs and the reference BGCs (marked by the red box in the following figure). Because of the huge number of reference BGCs, the distance values for a single BGC are spread over many web pages, and the table on the web page cannot be sorted by distance. My goal is to obtain the minimum distance value between each of my BGCs and the reference BGCs.
I have tried to find the distance values I need in both the source code of the interactive web pages and the output SQLite database files, but they are not directly visible in either.
If you have any solutions or instructions, please contact me as soon as possible. Thanks!