Using Twitter to predict financial markets

A University of California, Riverside professor and several other researchers have developed a model that uses data from Twitter to help predict the traded volume and value of a stock the following day.

A trading strategy based on the model, which Vagelis Hristidis, an associate professor at the Bourns College of Engineering, created with one of his graduate students and three researchers at Yahoo! in Spain, outperformed other baseline strategies by between 1.4 and nearly 11 percentage points and also did better than the Dow Jones Industrial Average during a four-month simulation.

“These findings have the potential to have a big impact on market investors,” said Hristidis, who specializes in data mining research, which focuses on discovering patterns in large data sets. “With so much data available from social media, many investors are looking to sort it out and profit from it.”

Hristidis and his co-authors, his graduate student Eduardo J. Ruiz and Yahoo! Research Barcelona researchers Carlos Castillo, Aristides Gionis and Alejandro Jaimes, presented the findings last month at the Fifth ACM International Conference on Web Search & Data Mining in Seattle.

Hristidis and his co-authors set out to study how activity on Twitter is correlated with stock prices and traded volume. While past research has looked at the sentiment, positive or negative, of tweets to predict stock prices, little research has focused on the volume of tweets and the ways that tweets are linked to other tweets, topics or users. Further, past work has mostly studied overall stock market indexes, not individual stocks.

They obtained the daily closing price and the number of trades from Yahoo! Finance for 150 randomly selected companies in the S&P 500 Index for the first half of 2010.

Then, they developed filters to select only relevant tweets for those companies during that time period. For example, if they were looking at Apple, they needed to exclude tweets that focused on the fruit.

They expected the number of trades to be correlated with the number of tweets. Surprisingly, the number of trades is slightly more correlated with the number of what they call “connected components”: separate clusters of linked posts, each about a distinct topic related to one company. For example, using Apple again, there might be separate networks of posts regarding Apple’s new CEO, a new product it released and its latest earnings report.
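To make the idea of “connected components” concrete, here is a minimal sketch that treats each tweet as a node and each link between two tweets as an edge, then counts the separate clusters with a union-find structure. The rule for what counts as a link (for example, a reply or retweet) is a hypothetical assumption, not the researchers’ definition, and the function name is illustrative.

```python
def count_connected_components(tweet_ids, links):
    """Count clusters of linked tweets using union-find.

    tweet_ids: iterable of hashable tweet identifiers (the graph's nodes).
    links: iterable of (a, b) pairs meaning tweets a and b are related.
    What makes two tweets "related" is a hypothetical choice here.
    """
    parent = {t: t for t in tweet_ids}

    def find(x):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    # Each distinct root is one connected component (one topic cluster).
    return len({find(t) for t in parent})


# Six tweets: {1, 2, 3} discuss one topic, {4, 5} another, 6 stands alone.
print(count_connected_components(range(1, 7), [(1, 2), (2, 3), (4, 5)]))
```

With six tweets and links only inside the first two groups, the count is three, matching the intuition of three separate conversations about the same company.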

They also found stock price is slightly correlated with the number of connected components.
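The article does not say which correlation measure the study used. As an assumption, the sketch below uses the Pearson coefficient, which could be applied to a daily series such as connected-component counts against a daily series of trade counts. Values near 1 indicate a strong positive linear relationship, and values near 0 indicate little linear relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Illustrative made-up series, not data from the study.
components_per_day = [4, 7, 5, 9, 6]
trades_per_day = [1200, 1900, 1500, 2300, 1600]
print(pearson(components_per_day, trades_per_day))
```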

For the study, the researchers simulated a series of investments between March 1, 2010 and June 30, 2010 and analyzed performance using several investment strategies. During that time frame, the Dow Jones Industrial Average fell 4.2 percent.

In two variants of an autoregression model, which each day buys stocks on the assumption that a stock’s price is a function of its prices over the last few days, losses were 8.9 percent and 13.1 percent.
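An autoregressive model predicts tomorrow’s price as a weighted sum of the last few days’ prices plus a constant. The sketch below fits such a model with ordinary least squares via NumPy; the number of lags `p` and the idea of buying when the predicted price exceeds today’s price are assumptions for illustration, not the two variants the researchers actually tested.

```python
import numpy as np


def fit_ar(prices, p):
    """Least-squares fit of price_t = c + w_1*price_{t-1} + ... + w_p*price_{t-p}.

    Returns [c, w_1, ..., w_p].
    """
    y = np.asarray(prices, dtype=float)
    n = len(y)
    # Row for target day t holds [1, y[t-1], ..., y[t-p]].
    lags = [y[p - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef


def predict_next(prices, coef):
    """Predict the next day's price from the most recent p prices."""
    p = len(coef) - 1
    recent = np.asarray(prices[-p:], dtype=float)[::-1]  # most recent first
    return float(coef[0] + coef[1:] @ recent)


history = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5]  # illustrative prices
coef = fit_ar(history, p=2)
forecast = predict_next(history, coef)
# Hypothetical trading rule: buy today if the model expects a rise.
print("buy" if forecast > history[-1] else "skip", round(forecast, 3))
```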

In the random model, in which a random set of stocks is bought every day, sold at the end of the day and the process is repeated the next day, the average loss was 5.5 percent.
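The random baseline can be sketched as a daily loop: pick a random set of stocks, hold them for one day, and compound the result. This version assumes equal weighting and one fractional return per stock per day (for example, 0.01 for a 1 percent gain); the article’s “average loss” suggests averaging over many such runs, which the usage lines below illustrate with different seeds.

```python
import random


def random_strategy_return(daily_returns, k, seed=None):
    """Compound the return of buying k random stocks each day and selling them that day.

    daily_returns: dict mapping ticker -> list of per-day fractional returns.
    All lists are assumed to have the same length. Equal weighting is assumed.
    """
    rng = random.Random(seed)
    tickers = sorted(daily_returns)
    n_days = len(daily_returns[tickers[0]])
    wealth = 1.0
    for day in range(n_days):
        picks = rng.sample(tickers, k)  # a fresh random set every day
        day_return = sum(daily_returns[t][day] for t in picks) / k
        wealth *= 1.0 + day_return
    return wealth - 1.0


# Illustrative returns for three hypothetical tickers over three days.
toy = {"AAA": [0.01, -0.02, 0.00], "BBB": [0.00, 0.01, -0.01], "CCC": [-0.01, 0.00, 0.02]}
runs = [random_strategy_return(toy, k=2, seed=s) for s in range(100)]
print(sum(runs) / len(runs))  # average over 100 random runs
```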

In the fixed model, which involves buying a set of stocks that have the best combination of market cap, company size and total debt and holding them for the entire simulation, the average loss was 3.8 percent.
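Unlike the random model, the fixed model buys once and holds. A minimal sketch of its return, assuming equal dollar amounts are invested in each chosen stock at the first closing price and sold at the last one, is shown below. How the stocks are selected from market cap, company size and total debt is not specified in the article, so the selection step is left out.

```python
def buy_and_hold_return(prices):
    """Equal-weight buy-and-hold return.

    prices: dict mapping ticker -> list of daily closing prices
    (first entry = purchase day, last entry = end of the simulation).
    """
    # Growth factor of each position, e.g. 1.1 for a 10 percent gain.
    growth = [series[-1] / series[0] for series in prices.values()]
    return sum(growth) / len(growth) - 1.0


# Illustrative prices for two hypothetical holdings.
print(buy_and_hold_return({"AAA": [100.0, 104.0, 110.0], "BBB": [50.0, 47.0, 45.0]}))
```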

The model the researchers developed using Twitter data lost on average 2.4 percent.
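The reported range of outperformance can be checked from the figures above. The gaps appear to be simple differences in percentage points between the Twitter-based strategy’s return and each alternative; the labels below are descriptive names chosen here, not the study’s own.

```python
# Returns over the March 1 to June 30, 2010 simulation, in percent (from the article).
returns = {
    "Twitter-based model": -2.4,
    "Fixed model": -3.8,
    "Dow Jones Industrial Average": -4.2,
    "Random model": -5.5,
    "Autoregression (8.9% loss variant)": -8.9,
    "Autoregression (13.1% loss variant)": -13.1,
}


def margins_over(returns, name):
    """Percentage-point gap between `name` and every other strategy."""
    base = returns[name]
    return {k: round(base - v, 1) for k, v in returns.items() if k != name}


for label, gap in margins_over(returns, "Twitter-based model").items():
    print(f"{label}: {gap:+.1f} points")
```

The smallest gap (1.4 points, over the fixed model) and the largest (10.7 points, over the weaker autoregression variant) match the “between 1.4 percent and nearly 11 percent” range quoted earlier.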

Hristidis notes several potential weaknesses in the study.

First, the trading strategy worked in a period when the Dow Jones dropped, but it may not produce the same results when the Dow Jones is rising. The results are also sensitive to how long the strategy runs: in the simulation, it took 30 days to start outperforming the Dow Jones.
