10Q's #8
Comments
I don't have any plans to, but you are welcome to and I will merge it in.
Two issues here helped resolve the 10-K problem, both in xbrl.py:
if "http://fasb.org/us-gaap/2013-01-31" in self.EntireInstanceDocument or "http://xbrl.us/us-gaap/2013-01-31" in self.EntireInstanceDocument:
and pushed line 106 with:
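The namespace check quoted in this comment hard-codes one taxonomy release date. As a rough sketch only (this is not code from pysec, and `find_us_gaap_namespace` is a hypothetical helper), a regular expression could detect the us-gaap namespace for any release date, assuming the URI always follows the two patterns quoted above:

```python
import re

# Assumption: us-gaap namespaces always look like one of the two URIs quoted
# in the comment, differing only in the YYYY-MM-DD release date.
US_GAAP_NS = re.compile(
    r"http://(?:fasb\.org|xbrl\.us)/us-gaap/(\d{4}-\d{2}-\d{2})"
)


def find_us_gaap_namespace(instance_text):
    """Return (namespace_uri, release_date) for the first match, or None."""
    match = US_GAAP_NS.search(instance_text)
    if match is None:
        return None
    return match.group(0), match.group(1)


# Made-up one-line sample standing in for the full instance document text.
sample = '<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2012-01-31">'
found = find_us_gaap_namespace(sample)
print(found)
```

The returned release date could then be used to pick the matching taxonomy, instead of adding one `in` test per year.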
Thanks! I think this coincidentally got fixed by @blacksburg98's simplifications, which removed the section that you modified entirely. If you pull the code after the merge I did earlier today, here's what I get when I download a 10-Q; it appears to work.

wget http://www.sec.gov/Archives/edgar/data/899426/000143774912013311/cbmc-20120930.xml
from pysec import xbrl
In [13]: x.fields

However, the automatic downloading and local access of files, as bound to the Index Django model, doesn't work for 10-Qs. This was discussed by @jsfenten on the issues page under "xbrl / xml documents embedded in .txt submission". I think he's right that we should move to downloading the .txt files (which combine various XBRL, HTML and even binary image documents), extract the main XBRL, and save it, perhaps in the database rather than on the filesystem. I don't have time to do this myself right now, so maybe someone will jump in!
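For anyone picking this up, here is a minimal sketch of the "extract the main XBRL from the .txt" step. It is not pysec code. It assumes the full-submission file wraps each document in `<DOCUMENT>...</DOCUMENT>` with `<TYPE>` and `<FILENAME>` header lines followed by a `<TEXT>` body, and that the instance document has type `EX-101.INS`; the function name and the sample text are made up for illustration.

```python
import re

# Assumption: every embedded file sits in its own <DOCUMENT>...</DOCUMENT> block.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)


def extract_instance_document(submission_text, doc_type="EX-101.INS"):
    """Return (filename, xml_text) for the first document of doc_type, or None."""
    for block in DOCUMENT_RE.findall(submission_text):
        type_match = re.search(r"<TYPE>([^\s<]+)", block)
        if type_match is None or type_match.group(1) != doc_type:
            continue
        text_match = re.search(r"<TEXT>(.*?)</TEXT>", block, re.DOTALL)
        if text_match is None:
            continue
        body = text_match.group(1)
        # Assumption: the XML may be wrapped in an extra <XBRL> element; unwrap it.
        inner = re.search(r"<XBRL>(.*?)</XBRL>", body, re.DOTALL)
        xml_text = (inner.group(1) if inner else body).strip()
        name_match = re.search(r"<FILENAME>([^\s<]+)", block)
        filename = name_match.group(1) if name_match else None
        return filename, xml_text
    return None


# Made-up miniature submission, only to exercise the function.
sample_submission = """<DOCUMENT>
<TYPE>10-Q
<SEQUENCE>1
<FILENAME>main.htm
<TEXT>
<html>report</html>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-101.INS
<SEQUENCE>2
<FILENAME>example-20120930.xml
<TEXT>
<XBRL>
<xbrl>facts</xbrl>
</XBRL>
</TEXT>
</DOCUMENT>
"""

result = extract_instance_document(sample_submission)
print(result)
```

The extracted text could then be saved to a database field or a file and handed to the existing xbrl.py class, which would keep the parsing code independent of how the filing was fetched.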
Is this resolved? After modifying xbrl_link() to treat "10-K" and "10-Q" as the same,
@chrisspen, would you be up for doing a little testing to see if everything works OK if I simply modify line 20 of models.py to be: if self.form.startswith('10-K') or self.form.startswith('10-Q'): ? That is, confirm that URLs to download 10-Qs are constructed correctly and consistently (there's some pattern to generating the URLs to raw files from their ID numbers, but I seem to recall some variance based on form type or other information), that they don't overwrite 10-Ks, and that it's OK to store them in the Index model. Even better, if it needs more than this one-line change, feel free to make whatever changes it needs and I will merge them in.

As discussed above, the xbrl.py Python class for extracting values from a file should definitely work well for both 10-Ks and 10-Qs. But the way we download files and store them on the local filesystem, using the Django Index model to bind them, evolved for a somewhat specific use case, and I am not sure whether it needs to be generalized or otherwise improved. If we can confirm that the Django model, the underlying database and the local filesystem structure work with 10-Ks and 10-Qs with simple changes, then that's great and we should certainly just do that! Thanks for your input.
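A small sketch of the proposed one-line check, written as a standalone function so it can be tried outside Django. `has_xbrl_financials` and `XBRL_FORM_PREFIXES` are hypothetical names, not part of pysec; the only thing relied on is that `str.startswith` accepts a tuple of prefixes.

```python
# Hypothetical helper, not part of pysec: form prefixes treated as XBRL filings.
XBRL_FORM_PREFIXES = ("10-K", "10-Q")


def has_xbrl_financials(form):
    """Return True if the form type starts with 10-K or 10-Q."""
    # Passing a tuple checks all prefixes in one call.
    return form.startswith(XBRL_FORM_PREFIXES)


for form in ("10-K", "10-Q", "10-K/A", "8-K"):
    print(form, has_xbrl_financials(form))
```

Note that a prefix test also accepts amended forms such as 10-K/A, which is worth keeping in mind when checking whether downloads could overwrite each other.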
Yes, that's exactly what I did. I'm not an expert on the arrangement of Edgar files, and I'm using your code as a reference, but an Index record represents a unique combination of company+form+date, and the ID from the Index record's filename is used to construct the XBRL URL, which again appears to be unique to the company and form type. e.g. which appear to be completely unique, so I don't see any problem with them conflicting.

However, you're right that the bigger problem is with xbrl.py not handling it. For my app, I need some way to bulk read all attributes in the file by namespace, and you have currently only implemented a method to extract one at a time. So I implemented a method to iterate over the results of an xpath search. I only tested extracting the elements in the us-gaap namespace, as that seems to be the most commonly used.

The only really weird thing I ran into was in the actual data itself. In the period dates for duration contexts, some date ranges don't make sense. e.g. for one context with an id called "D2013Q1", the start and end dates were for Q1 of 2012. I'm not sure if this is bad data, or if this just means that the numbers for 2013 Q1 were compiled from the previous year. But I guess this is a different problem.

This is a great project and it's saved me a lot of effort, but since I need quite a bit more functionality, I've forked it at https://github.com/chrisspen/django-sec. Feel free to borrow from it as you wish. I'm currently using it to import 10-K and 10-Q data in a dev environment. I'll publish it on PyPI once I feel it's ready for prime time.
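To illustrate the bulk read by namespace described in this comment, here is a rough sketch. It is not the code from the fork: it swaps the xpath search for a plain walk over the tree with the standard library's `xml.etree.ElementTree`, and `iter_facts`, the namespace constant and the sample document are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the us-gaap namespace URI varies with the taxonomy release date.
US_GAAP = "http://fasb.org/us-gaap/2013-01-31"


def iter_facts(root, namespace):
    """Yield (local_name, contextRef, text) for every element in the namespace."""
    prefix = "{%s}" % namespace
    for element in root.iter():
        # ElementTree stores qualified tags as "{namespace-uri}LocalName".
        if isinstance(element.tag, str) and element.tag.startswith(prefix):
            yield element.tag[len(prefix):], element.get("contextRef"), element.text


# Made-up miniature instance document, only to exercise the function.
sample = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2013-01-31">
  <us-gaap:Assets contextRef="I2013Q1" unitRef="USD" decimals="0">1000</us-gaap:Assets>
  <us-gaap:Revenues contextRef="D2013Q1" unitRef="USD" decimals="0">250</us-gaap:Revenues>
</xbrl>"""

root = ET.fromstring(sample)
facts = list(iter_facts(root, US_GAAP))
for name, context_ref, value in facts:
    print(name, context_ref, value)
```

Keeping the `contextRef` next to each value makes it possible to look up the period of the matching context afterwards, which is where oddities like the "D2013Q1" dates would show up.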
Just checked out your fork--awesome work!! So the zip file (xbrl_link), html_link and index_link all work for ...
I have been confused by the context dates too before. I assumed it was ...
How (where in your code) are you doing the 'bulk read all attributes ...
Also, with the xbrl_fundamentals, the point was to overcome ...
On Tue, Jan 21, 2014 at 7:44 PM, Chris Spencer wrote:
Thanks. Yeah, I believe xbrl_link() is the only one that really matters. I should actually probably disable the html download, because I'm not doing anything with that, so it's just wasted bandwidth. I implemented a new management command,
The library does not work with quarterly reports (10-Qs). When I download the .xml file directly, GetBaseInformation in xbrl.py fails.
Any plans to look at this issue?