10Q's #8

ghost · 2013-09-12T11:38:45Z

Library does not work with quarterly reports (10Q's). Using .xml file download directly. GetBaseInformation in xbrl.py fails.
Any plans to look at this issue?

lukerosiak · 2013-09-12T14:31:27Z

I don't have any plans to, but you are welcome to and I will merge it in.

ghost · 2013-09-12T15:47:58Z

Two issues here helped resolve the 10K problem, both in xbrl.py:

Lines 96 and 100: Invest_TaxonomyVersion is set to year 2019. This is probably unrelated to the problem; I just happened to catch it. I changed each line to the appropriate year.
To make 10K's work, the 2013 taxonomy needs to be set. I simply added the following to GetBaseInformation (at the top of the "if"):

if "http://fasb.org/us-gaap/2013-01-31" in self.EntireInstanceDocument or "http://xbrl.us/us-gaap/2013-01-31" in self.EntireInstanceDocument:
    # This IS the 2013 US GAAP taxonomy
    self.fields['USGAAP_TaxonomyVersion'] = "http://fasb.org/us-gaap/2013-01-31"
    self.fields['Invest_TaxonomyVersion'] = "http://xbrl.sec.gov/invest/2013-01-31"

and pushed line 106 with:
# DEI Taxonomy
if "http://xbrl.sec.gov/dei/2013-01-31" in self.EntireInstanceDocument:
    self.fields['DEI_TaxonomyVersion'] = "http://xbrl.sec.gov/dei/2013-01-31"
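
For readers who want to try the check outside the class, here is a minimal standalone sketch of the same logic. The function name `detect_2013_taxonomies` and the plain-dict return value are assumptions made for illustration; the original code sets `self.fields` inside GetBaseInformation instead.

```python
def detect_2013_taxonomies(document_text):
    """Return taxonomy fields for a 2013 filing, based on namespace URIs.

    Hypothetical standalone version of the snippet above: it takes the raw
    instance document as a string and returns a dict instead of mutating
    self.fields.
    """
    fields = {}
    if ("http://fasb.org/us-gaap/2013-01-31" in document_text
            or "http://xbrl.us/us-gaap/2013-01-31" in document_text):
        # Same assignments as the snippet quoted in the thread.
        fields["USGAAP_TaxonomyVersion"] = "http://fasb.org/us-gaap/2013-01-31"
        fields["Invest_TaxonomyVersion"] = "http://xbrl.sec.gov/invest/2013-01-31"
    if "http://xbrl.sec.gov/dei/2013-01-31" in document_text:
        fields["DEI_TaxonomyVersion"] = "http://xbrl.sec.gov/dei/2013-01-31"
    return fields
```

A document that declares neither namespace yields an empty dict, so the caller can fall through to the checks for older taxonomy years.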

lukerosiak · 2013-09-12T16:34:32Z

Thanks! I think this coincidentally got fixed by @blacksburg98 's simplifications, which removed the section that you modified entirely. If you pull the code after the merge I did earlier today, here's what I get when I download a 10-Q... it appears to work.

wget http://www.sec.gov/Archives/edgar/data/899426/000143774912013311/cbmc-20120930.xml

from pysec import xbrl
x = xbrl.XBRL('~/research/sec/pysec/cbmc-20120930.xml')

In [13]: x.fields
Out[13]:
{'Assets': 1816000.0,
'BalanceSheetDate': '2012-09-30',
'CommitmentsAndContingencies': 0,
'ComprehensiveIncome': -788000.0,
'ComprehensiveIncomeAttributableToNoncontrollingInterest': 0,
'ComprehensiveIncomeAttributableToParent': -788000.0,
'ContextForDurations': 'c6_From1Jan2012To30Sep2012',
'ContextForInstants': 'c0_AsOf30Sep2012',
'CostOfRevenue': 124000.0,.....

However, the automatic downloading and local access of files as bound to the Index django model doesn't work for 10-Qs. This was discussed by @jsfenten on the issues page under "xbrl / xml documents embedded in .txt submission". I am thinking he's right: we should move to downloading the .txt files (which are a combination of various XBRL, HTML and even images in binary form), extract the main XBRL, and save it, perhaps in the database rather than on the filesystem. I don't have time to do this myself right now, so maybe someone will jump in!

chrisspen · 2014-01-20T00:04:56Z

Is this resolved? After modifying xbrl_link() to treat "10-K" and "10-Q" as the same, download() seems to work just fine and I'm able to parse the XBRL and extract the same attributes that are in the 10-K.

lukerosiak · 2014-01-21T21:14:34Z

@chrisspen, would you be up for doing a little testing to see if everything works OK if I simply modify line 20 of models.py to be:

if self.form.startswith('10-K') or self.form.startswith('10-Q'): ?

That is, confirm that URLs to download 10-Qs are constructed correctly and consistently (there's some pattern to generating the URLs to raw files from their ID numbers, but I seem to recall some variance based on form type or other information), that they don't overwrite 10-Ks, and that it's OK to store them in the Index model?

Even better, if it needs any more than this one-line change, feel free to make any changes it needs and I will merge them in.
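
The one-line condition proposed above can be tried in isolation. This is only a sketch: `has_xbrl_download` and `XBRL_FORM_PREFIXES` are hypothetical names, and it relies on the fact that `str.startswith` accepts a tuple of prefixes.

```python
# Hypothetical names; the real check lives inline in models.py.
XBRL_FORM_PREFIXES = ("10-K", "10-Q")


def has_xbrl_download(form):
    """Return True if the form type should get an XBRL download link."""
    # str.startswith accepts a tuple, which is equivalent to chaining
    # the two startswith() calls with "or".
    return form.startswith(XBRL_FORM_PREFIXES)
```

Because the test is a prefix match, amended forms such as "10-K/A" pass as well, which is also how the two chained `startswith` calls behave.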

As discussed above, the xbrl.py Python class for extracting values from a file should definitely work well for 10-Ks and 10-Qs. However, the way that we download files and store them on the local filesystem, using the Django Index model to bind them, evolved for a somewhat specific use case, and I am not sure whether it needs to be generalized or otherwise improved. If we can confirm that the Django model and the underlying database and local filesystem structure work with 10-Ks and 10-Qs with simple changes, then that's great and we should certainly just do that!

Thanks for your input.

chrisspen · 2014-01-22T00:44:35Z

Yes, that's exactly what I did.

I'm not an expert on the arrangement of Edgar files, and I'm using your code as a reference, but an Index record represents a unique combination of company+form+date and the URL uses the ID from the Index record's filename to construct the XBRL URL, which again appears to be unique to the company and form type.

e.g.
For cik 1750, the 10-Q and 10-K URLs are respectively:
http://www.sec.gov/Archives/edgar/data/1750/000110465913072128/0001104659-13-072128-xbrl.zip
http://www.sec.gov/Archives/edgar/data/1750/000104746913007797/0001047469-13-007797-xbrl.zip

which appear to be completely unique, so I don't see any problem with them conflicting.
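
The pattern visible in those two URLs can be written down as a small function. This is a sketch inferred only from the examples above, not a statement of how EDGAR names every archive; `xbrl_zip_url` is a hypothetical name, and the second argument is assumed to be the dashed ID taken from the Index record's filename.

```python
def xbrl_zip_url(cik, accession_number):
    """Build the XBRL zip URL from a CIK and a dashed filing ID.

    Pattern inferred from the two example URLs in this thread: the folder
    name is the ID with the dashes removed, and the file name is the
    dashed ID followed by "-xbrl.zip".
    """
    folder = accession_number.replace("-", "")
    return (
        f"http://www.sec.gov/Archives/edgar/data/{cik}/"
        f"{folder}/{accession_number}-xbrl.zip"
    )
```

Since the ID differs per filing, the 10-Q and 10-K for the same company land in different folders and cannot overwrite each other.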

However, you're right in that the bigger problem is with xbrl.py not handling it. For my app, I need some way to bulk read all attributes in the file by namespace, and you have currently only implemented a method to extract one at a time. So I implemented a method to iterate over the results of an XPath search. I only tested extracting elements in the us-gaap namespace, as that seems to be the most commonly used.
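
To illustrate what reading a file "by namespace" can look like, here is a minimal sketch. It is not the code from the fork: it uses the standard library's `xml.etree.ElementTree` and a tag-prefix filter rather than an XPath query, `iter_facts` is a hypothetical name, and the sample document is made up (the values and context ids are borrowed from the output earlier in the thread).

```python
import xml.etree.ElementTree as ET

US_GAAP_NS = "http://fasb.org/us-gaap/2013-01-31"

# Made-up sample instance document, for illustration only.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2013-01-31">
  <us-gaap:Assets contextRef="c0_AsOf30Sep2012" unitRef="USD">1816000</us-gaap:Assets>
  <us-gaap:CostOfRevenue contextRef="c6_From1Jan2012To30Sep2012" unitRef="USD">124000</us-gaap:CostOfRevenue>
</xbrl>"""


def iter_facts(root, namespace):
    """Yield (name, contextRef, text) for every element in the namespace."""
    # ElementTree stores qualified tags as "{namespace-uri}LocalName".
    prefix = "{%s}" % namespace
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith(prefix):
            yield element.tag[len(prefix):], element.get("contextRef"), element.text


root = ET.fromstring(SAMPLE)
facts = list(iter_facts(root, US_GAAP_NS))
```

Each fact keeps its `contextRef`, because the same element name usually appears several times in one filing under different contexts.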

The only really weird thing I ran into was in the actual data itself. In the period dates for duration contexts, some date ranges don't make sense. e.g. for one context with an id called "D2013Q1", the start and end dates were for Q1 of 2012. I'm not sure if this is bad data, or if this just means that the numbers for 2013 Q1 were compiled from the previous year. But I guess this is a different problem.
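
One way to survey how often this happens is to list each duration context next to its dates and flag ids whose embedded year matches neither date. The sketch below is only a diagnostic heuristic under stated assumptions: context ids are labels chosen by the filer, so a mismatch is not proof of bad data. The function names and the sample document are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"

# Made-up sample reproducing the situation described above.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance">
  <context id="D2013Q1">
    <entity><identifier scheme="http://www.sec.gov/CIK">0000000000</identifier></entity>
    <period><startDate>2012-01-01</startDate><endDate>2012-03-31</endDate></period>
  </context>
</xbrl>"""


def duration_contexts(root):
    """Map context id -> (startDate, endDate) for duration contexts."""
    result = {}
    for context in root.iter(XBRLI + "context"):
        start = context.findtext(f"{XBRLI}period/{XBRLI}startDate")
        end = context.findtext(f"{XBRLI}period/{XBRLI}endDate")
        if start and end:  # instant contexts have no start/end pair
            result[context.get("id")] = (start, end)
    return result


def ids_with_unmatched_year(contexts):
    """Return ids that contain a four-digit year matching neither date."""
    flagged = []
    for context_id, (start, end) in contexts.items():
        years = re.findall(r"(?:19|20)\d{2}", context_id)
        if years and not any(y in (start[:4], end[:4]) for y in years):
            flagged.append(context_id)
    return flagged
```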

This is a great project and it's saved me a lot of effort, but since I'm in need of quite a bit more functionality, I've forked it at https://github.com/chrisspen/django-sec. Feel free to borrow from it as you wish. I'm currently using it to import 10-K and 10-Q data in a dev environment. I'll publish it on PyPI once I feel it's ready for prime-time.

lukerosiak · 2014-01-22T20:23:54Z

Just checked out your fork--awesome work!!

So the zip file (xbrl_link), html_link and index_link all work for 10-Qs? I guess the zip file is the only thing that matters. If I recall (I haven't used edgar in a while) I'm not sure the other link generators will work for non-10 filings.

I have been confused by the context dates too before. I assumed it was filer error.

How (where in your code) are you doing the 'bulk read all attributes in the file by namespace'? That's great if you can do it. Just as background for how my code came to be the way it is, for me, it seemed like XBRL was such a complex and bloated format, and that by stripping it down to whatever values were important, it could become more accessible and less scary, as well as "flat" so it could be stored in a database and have stats run on it.

Also, with the xbrl_fundamentals, the point was to overcome inconsistencies in the way people file reports, often by extending XBRL with custom tags unnecessarily or using weird ones for basic accounting terms, by implementing a manual logic that will check several places if necessary to get a basic value. (I would have no idea how to do that if it weren't for accountant Charles Hoffman). So in that way it was more useful than just turning, essentially, XBRL into JSON/a python dict.
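
The "check several places" idea can be sketched in a few lines. This is not the logic in xbrl_fundamentals: `first_available` is a hypothetical helper, and the candidate tag list is only an illustrative guess at tags a filer might use for revenue.

```python
def first_available(facts, candidate_tags, default=None):
    """Return the value of the first candidate tag present in facts.

    facts is assumed to be a flat dict of tag name -> value, such as the
    x.fields dict shown earlier in this thread.
    """
    for tag in candidate_tags:
        value = facts.get(tag)
        if value is not None:  # a reported value of 0 still counts
            return value
    return default


# Illustrative candidates only, ordered from most to least preferred.
REVENUE_CANDIDATES = ("Revenues", "SalesRevenueNet", "SalesRevenueGoodsNet")
```

For example, `first_available({"SalesRevenueNet": 5.0}, REVENUE_CANDIDATES)` returns the value filed under the second candidate, because the first one is missing.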


chrisspen · 2014-01-22T21:04:31Z

Thanks. Yeah, I believe xbrl_link() is the only one that really matters. I should actually probably disable the html download, because I'm not doing anything with that, so it's just wasted bandwidth.

I implemented a new management command, sec_import_attrs, to load attributes from the xbrl into the Attribute and AttributeValue models. I can see what you're trying to do with xbrl_fundamentals.py, but I wanted to have an easily browsable admin interface showing all the attributes in current use and which are most popular. For my app, I'm not too concerned with normalizing multiple attributes into one logical attribute just yet. Also, the logic may not be 100% accurate. After aggregating usage for thousands of attributes, I've found that several, like "InterestExpense" and "EarningsPerShareDiluted", are some of the most common but aren't included in the list.
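
The kind of usage tally described here can be sketched with `collections.Counter`. This is not the sec_import_attrs command or its models: `tag_usage` is a hypothetical function, and each filing is assumed to be available as a plain list of tag names.

```python
from collections import Counter


def tag_usage(filings):
    """Count how many filings use each tag.

    filings is an iterable of tag-name lists, one list per filing. A tag
    is counted once per filing, however many times it repeats inside it.
    """
    counts = Counter()
    for tags in filings:
        counts.update(set(tags))
    return counts
```

Calling `tag_usage(...).most_common(20)` then gives the most widely used tags, which can be compared against the list that the fundamentals logic covers.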
