xbrl / xml documents embedded in .txt submission #6
The file listed in the SEC's index (the filename attribute on the Index
That string can be transformed into the human-readable index of all those
But I don't know how, other than parsing that HTML, to automatically get
But as you can see on that sample index page, there's the main html file,
In any case, all of that is to permit the possibility of text analysis of
On Tue, Jun 4, 2013 at 12:56 AM, Jacob Fenton [email protected]:
Ok, at some point I'll look more closely at the SEC's spec for this kind of submission (I think it's there). If I'm following, there's a unique sequence_number for each individual piece. My sense is the way to handle this is with a separate model entirely--call it index_document. When a filing is downloaded, I'd populate index_document. If it's a normal 10K and the xbrl is found, then index_document is just the xbrl file; if it's a giant mishmash of assorted formats tagged as text, then some (all? only the xbrl?) are extracted, saved to the local file system, and entered into index_document. Not sure this is the best path, but I find it appealing because it gives the possibility of including other files for later analysis. Also, it may have some bearing on the 8K's referenced here: #3 -- but I'm not sure.
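To make the index_document idea a bit more concrete, here is a rough, framework-neutral sketch of what one record might hold and how an extracted piece could be saved locally. Everything in it is hypothetical: the `IndexDocument` class, its field names, and the `save_piece` helper are illustrations only, not existing code in this project, and a real version would presumably be a Django model rather than a dataclass.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class IndexDocument:
    """Hypothetical record for one piece of a filing (not an existing model)."""
    accession_number: str   # identifies the filing the piece belongs to
    sequence_number: int    # position of the piece within the submission
    filename: str           # filename reported for the piece
    doc_type: str           # e.g. "10-Q" or "EX-101.INS"
    local_path: str         # where the extracted piece was saved


def save_piece(root, accession_number, sequence_number, filename, doc_type, body):
    """Write one extracted piece to disk and return its IndexDocument record."""
    target_dir = Path(root) / accession_number
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so the filename cannot escape target_dir.
    target = target_dir / Path(filename).name
    target.write_text(body, encoding="utf-8")
    return IndexDocument(
        accession_number=accession_number,
        sequence_number=int(sequence_number),
        filename=filename,
        doc_type=doc_type,
        local_path=str(target),
    )
```

With something like this, a normal 10K would end up with a single record pointing at the xbrl file, while a mixed .txt submission would get one record per extracted piece.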
In Models.xbrl, xbrl_localpath() assumes the xbrl filename has an .xml extension. But in some cases the xml/xbrl documents appear to have been included in a larger text submission. For example, see here: http://www.sec.gov/Archives/edgar/data/320193/0001193125-12-023398.txt (there are several distinct xbrl files included there). It looks like the text file also includes a binary zip file within a block. All inside a .txt file. Which is, uh, odd.
I came across this while looking into handling 10-Q filings--perhaps this isn't an issue for 10-K's. What do you think is the best way to handle this? Should parsing xml from within the .txt file be part of the download() step? Or is there another file location that these should be pulled from?
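For discussion, here is a minimal sketch of what pulling the xml out of the .txt file could look like. It assumes (my reading of the format, not verified against the SEC spec) that each piece is wrapped in `<DOCUMENT>` ... `</DOCUMENT>`, with `<TYPE>`, `<SEQUENCE>`, and `<FILENAME>` header lines followed by a `<TEXT>` block, and that xbrl content may be wrapped again in `<XBRL>` tags. The function names `split_submission` and `xml_documents` are hypothetical, and the sketch does not try to decode the embedded binary zip.

```python
import re
from typing import Dict, List

# Assumed layout of a full-submission .txt file; see the caveats above.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
HEADER_RE = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
XBRL_RE = re.compile(r"<XBRL>(.*?)</XBRL>", re.DOTALL)


def split_submission(raw: str) -> List[Dict[str, str]]:
    """Split a submission into one dict per embedded document."""
    docs = []
    for match in DOCUMENT_RE.finditer(raw):
        block = match.group(1)
        text_match = TEXT_RE.search(block)
        # Header lines are only looked for before the <TEXT> block.
        header = block[:text_match.start()] if text_match else block
        doc = {key.lower(): value.strip() for key, value in HEADER_RE.findall(header)}
        body = text_match.group(1) if text_match else ""
        # Unwrap the inner <XBRL> wrapper when one is present.
        xbrl_match = XBRL_RE.search(body)
        if xbrl_match:
            body = xbrl_match.group(1)
        doc["body"] = body.strip()
        docs.append(doc)
    return docs


def xml_documents(raw: str) -> List[Dict[str, str]]:
    """Return only the embedded documents whose filename ends in .xml."""
    return [
        doc for doc in split_submission(raw)
        if doc.get("filename", "").lower().endswith(".xml")
    ]
```

Whether this belongs in download() or in a separate step is still the open question; the sketch only shows that the xml can be recovered from the text file without fetching anything else.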