Missing Data: mixed.qtl.Hwang_King_2016 #206

Open
jd-campbell opened this issue May 17, 2024 · 6 comments

@jd-campbell

Data is missing from the mixed.qtl.Hwang_King_2016 directory: the *qtl.tsv file contains only 2 QTLs, but the SoyBase MySQL database lists 27.

#qtl_identifier	trait_name	genetic_map	linkage_group	start	end	peak
mqCanopy wilt-019	Canopy wilt	GmComposite2003	A1	0.98	2.98	1.98
mqCanopy wilt-021	Canopy wilt	GmComposite2003	D2	46.8	48.8	47.8
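
A quick way to confirm how many QTL records a file like this holds is to parse it and count the non-header rows. The following is only a minimal sketch: the function name `read_qtl_ids` is hypothetical, and it assumes the file is tab-delimited with a `#`-prefixed header line, as in the excerpt above.

```python
import csv
import io


def read_qtl_ids(tsv_text):
    """Return the QTL identifiers (first column) found in qtl.tsv-style text."""
    ids = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and the '#'-prefixed header line.
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        ids.append(row[0].strip())
    return ids


# The two records quoted in this issue, used as sample input.
sample = (
    "#qtl_identifier\ttrait_name\tgenetic_map\tlinkage_group\tstart\tend\tpeak\n"
    "mqCanopy wilt-019\tCanopy wilt\tGmComposite2003\tA1\t0.98\t2.98\t1.98\n"
    "mqCanopy wilt-021\tCanopy wilt\tGmComposite2003\tD2\t46.8\t48.8\t47.8\n"
)

ids = read_qtl_ids(sample)
print(len(ids), ids)
```

The resulting count can then be compared against the number of matching rows reported by the database.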

@jd-campbell will review the paper and the SoyBase MySQL database to ensure all the data is in the datastore (DS).

@adf-ncgr
Contributor

@jd-campbell not 100% sure, but this sounds potentially related to some other issues that I'm guessing stem from the scripts Sam wrote to generate the soybean QTL files from the info in the SoyBase MySQL database. At present I have no clue where those scripts are, but I'll send up a flare to Sam and see if he remembers where he put them.

@adf-ncgr
Contributor

Sam was super-fast and helpful in his response. The scripts are here:
https://github.com/sammyjava/SoyBase
He did say that the direct outputs were subjected to ad hoc munging due to naming conflicts and the like, but it seems like a good place to start (provided I can actually figure out how to run the scripts, which he said require some SSH tunneling to the MySQL db). Anyway, this may also be relevant for #205, so I'll hopefully be able to make some headway on it.

@jd-campbell
Author

@adf-ncgr Thanks for the info. This helps in my work. Please send my thanks to Sam also!

@adf-ncgr
Contributor

@jd-campbell not sure this one is quite ready to be closed, but here's an update. I got Sam's code to run and for this dataset it seems to have produced 26 QTLs, although one of them (mqCanopy wilt-013) looks like it may be problematic without location info:

mqCanopy wilt-014       Canopy wilt     GmComposite2003_D1b     50.11   52.61   51.36
mqCanopy wilt-019       Canopy wilt     GmComposite2003_A1      0.98    2.98    1.98
mqCanopy wilt-021       Canopy wilt     GmComposite2003_D2      46.8    48.8    47.8
mqCanopy wilt-013       Canopy wilt
mqCanopy wilt-008       Canopy wilt     GmComposite2003_A1      16.16   18.16   17.16
mqCanopy wilt-012       Canopy wilt     GmComposite2003_D2      124.0   126.0   125.0
mqCanopy wilt-015       Canopy wilt     GmComposite2003_D2      114.97  124.02  119.5
mqCanopy wilt-007       Canopy wilt     GmComposite2003_A1      2.54    4.54    3.54
mqCanopy wilt-023       Canopy wilt     GmComposite2003_D1b     47.69   49.69   48.69
mqCanopy wilt-005       Canopy wilt     GmComposite2003_D1b     83.04   85.04   84.04
mqCanopy wilt-011       Canopy wilt     GmComposite2003_D2      56.07   58.07   57.07
mqCanopy wilt-022       Canopy wilt     GmComposite2003_D1b     33.42   35.42   34.42
mqCanopy wilt-016       Canopy wilt     GmComposite2003_D1b     3.79    6.54    5.17
mqCanopy wilt-026       Canopy wilt     GmComposite2003_L       47.2    49.2    48.2
mqCanopy wilt-002       Canopy wilt     GmComposite2003_D1b     0.0     1.0     0.5
mqCanopy wilt-017       Canopy wilt     GmComposite2003_D1b     4.51    6.51    5.51
mqCanopy wilt-024       Canopy wilt     GmComposite2003_B1      33.25   35.25   34.25
mqCanopy wilt-006       Canopy wilt     GmComposite2003_D2      51.4    53.4    52.4
mqCanopy wilt-010       Canopy wilt     GmComposite2003_B1      54.8    56.8    55.8
mqCanopy wilt-027       Canopy wilt     GmComposite2003_D2      125.5   127.5   126.5
mqCanopy wilt-001       Canopy wilt     GmComposite2003_D1b     11.58   13.58   12.58
mqCanopy wilt-009       Canopy wilt     GmComposite2003_B1      75.1    77.1    76.1
mqCanopy wilt-003       Canopy wilt     GmComposite2003_D1b     51.61   53.61   52.61
mqCanopy wilt-020       Canopy wilt     GmComposite2003_B1      64.82   85.59   75.21
mqCanopy wilt-018       Canopy wilt     GmComposite2003_D1b     84.04   85.59   84.82
mqCanopy wilt-025       Canopy wilt     GmComposite2003_L       81.9    83.9    82.9

In any case, I'm not sure why the datastore file would only have 2 QTLs since this one seems more complete (though maybe still not entirely complete?). I'll try to explore a little more but wanted to let you know there's at least some progress on this.
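
Records like mqCanopy wilt-013 that lack location info can be flagged automatically before a generated file replaces the datastore copy. This is a rough sketch, not part of Sam's scripts: the function name `find_incomplete` is hypothetical, and it assumes the generated output is tab-delimited with six fields per record (identifier, trait, map and linkage group, start, end, peak).

```python
def find_incomplete(lines, expected_fields=6):
    """Return identifiers of records with missing or empty fields."""
    incomplete = []
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        # Skip blank lines and '#'-prefixed header lines.
        if not fields[0] or fields[0].startswith("#"):
            continue
        too_short = len(fields) < expected_fields
        has_empty = any(f == "" for f in fields[:expected_fields])
        if too_short or has_empty:
            incomplete.append(fields[0])
    return incomplete


# Three records from the output above, re-typed with tab separators.
sample_lines = [
    "mqCanopy wilt-014\tCanopy wilt\tGmComposite2003_D1b\t50.11\t52.61\t51.36",
    "mqCanopy wilt-013\tCanopy wilt",
    "mqCanopy wilt-008\tCanopy wilt\tGmComposite2003_A1\t16.16\t18.16\t17.16",
]

print(find_incomplete(sample_lines))
```

On these three records, only mqCanopy wilt-013 is reported.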

adf-ncgr reopened this May 24, 2024
@adf-ncgr
Contributor

OK, it looks like the issue with that one QTL without location info is probably a data issue, and not the fault of the code. mqCanopy wilt-013 is one of 38 QTLs without an entry in the qtl_position_table:

select QTLID, QTLName from qtl_table where QTLID not in (select QTLID from qtl_position_table);
+-------+-----------------------------+
| QTLID | QTLName                     |
+-------+-----------------------------+
|    18 | Chlorimuron sensitivity 1-4 |
|    19 | Chlorimuron sensitivity 1-5 |
|    20 | Chlorimuron sensitivity 1-6 |
|    27 | Chlorimuron sensitivity 2-2 |
|  1208 | cqSeed protein-002          |
|    72 | Fe effic 2-1                |
|   163 | Leaflet ash 1-6             |
|  1410 | Leaflet shape 9-5           |
|   175 | Lodging 4-1                 |
|  4291 | mqCanopy wilt-013           |
|   440 | Plant height 11-4           |
|  4072 | Plant height 37-7           |
|   393 | Plant height 4-3            |
|   408 | Plant height 5-14           |
|   418 | Plant height 6-10           |
|   421 | Plant height 6-13           |
|   417 | Plant height 6-9            |
|   425 | Plant height 7-3            |
|   464 | Pod dehiscence 1-11         |
|   465 | Pod dehiscence 1-12         |
|  2534 | Sclero 8-4                  |
|   736 | SCN 10-2                    |
|   732 | SCN 9-4                     |
|   733 | SCN 9-5                     |
|   950 | SDS 8-4                     |
|   554 | Seed protein 5-5            |
|   555 | Seed protein 5-6            |
|   976 | Seed sucrose 1-11           |
|   977 | Seed sucrose 1-12           |
|   979 | Seed sucrose 1-14           |
|   980 | Seed sucrose 1-15           |
|   981 | Seed sucrose 1-16           |
|   982 | Seed sucrose 1-17           |
|   826 | Seed weight 3-7             |
|   828 | Seed weight 3-9             |
|  1193 | Seed yield 15-14            |
|   894 | Seed yield 3-3              |
|   965 | Stem length, main 1-1       |
+-------+-----------------------------+
38 rows in set (0.0682 sec)
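
The same anti-join pattern can be tried locally without access to the SoyBase MySQL database. The sketch below uses Python's built-in `sqlite3` module with an in-memory toy database: the table and column names `qtl_table`, `qtl_position_table`, `QTLID`, and `QTLName` come from the query above, while the two "toy QTL" rows and their IDs are made up for illustration. It also shows a general SQL caveat: `NOT IN` matches nothing if the subquery yields a NULL, whereas a `LEFT JOIN ... IS NULL` form is unaffected. The query above returned 38 rows, so that caveat evidently did not bite here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE qtl_table (QTLID INTEGER PRIMARY KEY, QTLName TEXT);
    CREATE TABLE qtl_position_table (QTLID INTEGER);
    -- IDs 1 and 2 are placeholders; 4291 is the ID shown in the thread.
    INSERT INTO qtl_table VALUES
        (1, 'toy QTL A'), (2, 'toy QTL B'), (4291, 'mqCanopy wilt-013');
    -- The NULL row is deliberate, to demonstrate the NOT IN caveat.
    INSERT INTO qtl_position_table VALUES (1), (2), (NULL);
    """
)

# NULL-safe anti-join: QTLs with no matching position row.
anti_join = conn.execute(
    """
    SELECT q.QTLID, q.QTLName
    FROM qtl_table AS q
    LEFT JOIN qtl_position_table AS p ON p.QTLID = q.QTLID
    WHERE p.QTLID IS NULL
    ORDER BY q.QTLID
    """
).fetchall()

# Same shape as the query in the thread; returns nothing once a NULL is present.
not_in = conn.execute(
    """
    SELECT QTLID, QTLName FROM qtl_table
    WHERE QTLID NOT IN (SELECT QTLID FROM qtl_position_table)
    """
).fetchall()

print(anti_join)
print(not_in)
```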

Also note that the db seems to have only 26 QTLs for this dataset, not 27 (at least, per select count(*) from qtl_table where QTLName like 'mqCanopy wilt%'), so I think the version I got out of running the code is probably close to correct. Let me know if you think the one QTL missing a position can be fixed in the db; otherwise I'll just replace the datastore file with the new one.

@maxglycine

@adf-ncgr @jd-campbell Since the paper says that mqCanopy wilt-013 (QTL name 5-2) is associated only with Satt229, the start, end, and peak values should be 92.88, 94.88, and 93.88. That is 1 cM on each side of Satt229, which the database places at 93.88 on LG L (Gm19). I am not sure why it was left out, but this record was problematic and had to be adjusted after the data was originally entered by an undergraduate student worker. I have inserted mqCanopy wilt-013 into both the stage and production MySQL databases.
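
The arithmetic behind those values is simple enough to sketch. The helper name `flanking_window` is hypothetical, and the default 1 cM half-width is taken from the comment above rather than from any documented SoyBase rule.

```python
def flanking_window(marker_cm, half_width_cm=1.0):
    """Return (start, end, peak) for a QTL placed at a single marker."""
    # Round to two decimals to match the precision used in the QTL files.
    start = round(marker_cm - half_width_cm, 2)
    end = round(marker_cm + half_width_cm, 2)
    return start, end, marker_cm


# Satt229 at 93.88 cM, per the comment above.
print(flanking_window(93.88))
```

For Satt229 at 93.88 cM this gives start 92.88, end 94.88, and peak 93.88, matching the values inserted into the database.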


4 participants