The search for big data knows no limits. As Apple was releasing its latest iPhone, alternative big data mining providers were scraping social media reaction to feed into trading algorithms (the stock sold off after the product announcement). Now new analysis from Macquarie examines the value of mining email receipt data, pointing to the ability to predict Amazon’s retail sales, with varying degrees of success, before the quarterly earnings announcement. The key, however, is not only technical: applying the human touch makes the difference.
Big Data Mining learns a lot by studying consumer emails
For Macquarie, diving into the big data pool was an eye-opening experience. What they ultimately found was that quantitative investment methods using alternative data work best when combined with fundamental analysis, not as a replacement for human analysis.
For their study, Macquarie worked with email receipt data provided by Quandl, a platform for financial data. The company scans numerous niche information sources, including millions of email inboxes where consumers’ receipts can be found. This provides insight into transaction-level information such as product description, taxes paid and shipping accounts, all drawn from artificial intelligence scans of those email accounts. The data is updated weekly and covers a wide variety of e-commerce platforms.
From Macquarie’s standpoint, transforming the raw transaction data into tradable information was no small task, but the exercise proved meaningful.
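To make the idea of turning transaction-level receipts into a sales figure concrete, here is a minimal sketch in Python. It is not Macquarie's or Quandl's method: the record layout, the sample values and the function names are all hypothetical, and it assumes calendar quarters rather than each company's fiscal quarters.

```python
from collections import defaultdict
from datetime import date

# Hypothetical receipt records: (merchant, purchase date, order total in USD).
# These values are invented for illustration only.
receipts = [
    ("Amazon", date(2017, 1, 15), 42.50),
    ("Amazon", date(2017, 2, 3), 19.99),
    ("Walmart", date(2017, 2, 20), 63.10),
    ("Amazon", date(2017, 4, 9), 120.00),
]


def quarter_of(d):
    """Return (year, quarter) for a calendar date (assumes calendar quarters)."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_panel_sales(records):
    """Sum order totals per (merchant, year, quarter)."""
    totals = defaultdict(float)
    for merchant, purchased_on, amount in records:
        year, quarter = quarter_of(purchased_on)
        totals[(merchant, year, quarter)] += amount
    return dict(totals)


panel = quarterly_panel_sales(receipts)
for key in sorted(panel):
    print(key, round(panel[key], 2))
```

Totals like these only describe the panel of scanned inboxes. Scaling them up to a company-wide sales estimate is a separate modelling step that this sketch does not attempt.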
Massive computer power required to mine massive vertical niches
In a September 11 report titled “Big is beautiful: How email receipt data can help predict company sales,” Macquarie recounts the challenges it encountered firsthand.
While the data they used covered only three listed companies (Amazon, Walmart and H&M), the absolute size of the dataset made it a challenge to conduct “even the simplest queries” using standard database tools.
Macquarie then turned to Amazon Redshift, a data warehouse solution that makes standard SQL analysis quick and cost-effective. This became their preferred solution, allowing analytical processing through standard SQL queries used as-is or with slight modifications.
“Redshift stores database table information in a compressed form by column rather than by row and this reduces the number of disk Input/Output requests and the amount of data load from disk particularly when dealing with a large number of columns,” Macquarie explained.
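The row-versus-column point in that quote can be illustrated with a toy model. The Python sketch below is not how Redshift is implemented. It counts how many stored values a query has to touch when summing a single field under each layout, using made-up records and hypothetical function names, and it ignores compression entirely.

```python
# Row-oriented layout: every field of a record is stored together.
# The records are invented for illustration only.
rows = [
    {"order_id": 1, "merchant": "Amazon", "amount": 42.5, "tax": 3.5},
    {"order_id": 2, "merchant": "Walmart", "amount": 20.0, "tax": 1.5},
    {"order_id": 3, "merchant": "Amazon", "amount": 63.25, "tax": 5.0},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(records, field):
    """Sum one field from the row layout; every field of every row is read."""
    total, touched = 0.0, 0
    for record in records:
        touched += len(record)  # the whole record is loaded to reach one field
        total += record[field]
    return total, touched


def sum_from_columns(columns, field):
    """Sum one field from the column layout; only that column is read."""
    values = columns[field]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_from_rows(rows, "amount")
col_total, col_touched = sum_from_columns(columns, "amount")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts give the same total, but the column layout touches one value per record instead of one value per field per record. That gap widens as the number of columns grows, which is the situation the quote singles out.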
Quantitative methods were “at least as accurate as analyst consensus,” but Macquarie found success when applying the human touch
While the technical challenges were important, it was in the output that the investment insight emerged. Macquarie discovered that big data mining can be used to generate quarterly sales forecasts, but on its own the accuracy was not optimal; the firm described it as “at least as accurate as analyst consensus.”
The adjustment that made the difference was combining the big data output with a fundamental analysis overlay, which resulted in “the best results in terms of forecasting accuracy.”
They found “evidence” that their method of analysis, with a fundamental overlay, was able “to accurately predict the direction of sales surprises at earnings time.”
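One simple way to express "predicting the direction of sales surprises" is a hit rate: how often a forecast lands on the same side of consensus as the reported figure. The Python sketch below is an assumption about how such a check could be scored, not Macquarie's methodology. The function names are hypothetical and the numbers are invented, not results from the report.

```python
def sign(x):
    """Return 1 for positive, -1 for negative and 0 for zero."""
    return (x > 0) - (x < 0)


def direction_hit_rate(forecasts, consensus, actuals):
    """Share of quarters where the forecast and the reported sales
    fall on the same side of analyst consensus."""
    hits = 0
    for forecast, expected, reported in zip(forecasts, consensus, actuals):
        predicted_direction = sign(forecast - expected)
        realized_direction = sign(reported - expected)
        if predicted_direction == realized_direction:
            hits += 1
    return hits / len(forecasts)


# Invented quarterly sales figures (arbitrary units), for illustration only.
forecasts = [102, 98, 110, 95]
consensus = [100, 100, 105, 97]
actuals = [103, 99, 104, 96]

print(direction_hit_rate(forecasts, consensus, actuals))
```

With only a handful of quarters available, a hit rate like this is a very noisy statistic, which ties in with the sample-size concerns raised later in the article.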
There has been much work of late by algorithmic traders using big data mining to glean insight into earnings announcements and trade ahead of the news. Macquarie, for its part, found that it wasn’t just about the data, but also about understanding the business models of the companies in question.
They described the work that led to success as a “quantamental approach,” combining quantitative and fundamental knowledge to make the data useful.
When dealing with big data mining, they discovered that size, in fact, matters. Big data is typically massive in terms of vertical depth, but doesn’t offer much heft in terms of horizontal reach.
Quant strategies in past years have relied on several variables to deliver success. The first is a large and deep sample size, which isn’t always available with alternative data. For instance, email receipt monitoring doesn’t have a long history, which makes validating a strategy through backtesting challenging.
Quant strategies also lean on “breadth of a signal,” a wide range of data that can provide correlation analysis and corroborate strategies. These are just some of the challenges data scientists face when turning big data into big returns.