Big Data is here, unless you’re thinking about financial reporting. In that case, it’s not Big Data, it’s Bushy Data.
Big Data doesn’t fit on a thumb drive, but a database of all XBRL financial reports filed with the SEC does. To date, and for the foreseeable future, financial reporting data isn’t Big Data.
So what is Bushy Data? Bushy Data is data where the ratio of metadata to data is very high. If we think of a typical CSV file generated by a spreadsheet program, there is one row of header data (just the column names) and a large number of data rows. That’s a ratio of metadata (the headers) to data that is very low, 1/10,000 perhaps. And the metadata itself in this example is very simple: the names of the headers and the column breaks, at best. No type, constraint, or validation rules are included.
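To make the CSV ratio concrete, here is a minimal Python sketch. The function name `metadata_ratio` and the sample columns are made up for illustration, and it counts header cells against data cells, which is only one reasonable way to measure the ratio.

```python
import csv
import io


def metadata_ratio(csv_text: str) -> float:
    """Return header cells divided by data cells, assuming one header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data_rows = rows[0], rows[1:]
    metadata_cells = len(header)
    data_cells = sum(len(row) for row in data_rows)
    return metadata_cells / data_cells


# Hypothetical sample: 3 column names followed by 10,000 data rows.
sample = "id,account,amount\n" + "".join(
    f"{i},cash,{i * 10}\n" for i in range(10_000)
)

ratio = metadata_ratio(sample)
print(f"{ratio:.4f}")  # 3 header cells / 30,000 data cells -> 0.0001
```

Counting rows instead of cells (1 header row against 10,000 data rows) gives the same 1/10,000 figure for this sample.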
On the other hand, in XBRL financial reporting data the amount of metadata per data item is very high. In the XBRL instance document, every fact points to a context. While a context can be shared by many facts, current highly dimensional reporting architectures drive towards almost a single fact per context, due to the hypercubes involved.
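As a rough illustration of the fact-to-context relationship, the sketch below parses a tiny, made-up XBRL instance with Python's standard library and counts contexts and facts. The `ex:` taxonomy namespace, the entity identifier, and the values are invented, and the instance is heavily simplified (no schemaRef, no dimensions), so treat it as a toy rather than a real filing.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"

# Hypothetical, simplified instance: 2 contexts, 1 unit, 3 facts.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">0001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2012-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:context id="c2">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">0001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2012-01-01</xbrli:startDate>
      <xbrli:endDate>2012-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="usd"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Cash contextRef="c1" unitRef="usd" decimals="0">1000</ex:Cash>
  <ex:Revenue contextRef="c2" unitRef="usd" decimals="0">5000</ex:Revenue>
  <ex:Expenses contextRef="c2" unitRef="usd" decimals="0">3000</ex:Expenses>
</xbrli:xbrl>"""

root = ET.fromstring(INSTANCE)

# Contexts are xbrli:context children of the root element.
contexts = root.findall(f"{{{XBRLI}}}context")

# Treat any top-level element carrying a contextRef attribute as a fact.
facts = [el for el in root if "contextRef" in el.attrib]

facts_per_context = len(facts) / len(contexts)
print(len(contexts), len(facts), facts_per_context)  # 2 3 1.5
```

Even in this toy case each context takes several elements to describe the single value it qualifies, and the closer a real filing gets to one fact per context, the more of the instance is metadata.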
And that is just the instance document. The attached taxonomies bring in even more metadata, especially the semantic network of how all the accounting concepts relate to each other and to the external world (references and labels, for you XBRL cognoscenti).
The next layer of complexity in the XBRL data is all of the text captured in paragraphs and sections. XBRL as used by the SEC is extremely helpful in allowing a query to be targeted at just the parts of the filing that are of interest, therefore avoiding a lot of ‘false positives’ – clutter in your search results.
SEC filings also have each data point in the text broken out as a separate fact, so a ‘tick and tie’ is possible. This constraint on numbers in text is in effect more metadata.
As vendors rush to hype and sell products as Big Data tools, keep in mind that Big Data isn’t always Bushy Data, and vice versa.