Analyzing Twitter to Predict the Market

Michael Haederle lives in New Mexico. He has written for the Los Angeles Times, People Magazine, Tricycle: The Buddhist Review and many other publicat...

January 27, 2011

Trying to divine the mood of a group of people is hard and requires trust in their answers. A new method has researchers whistling a happier tune.

Social scientists seeking to assess the collective mood of large groups of people traditionally have relied on slow, laborious sampling methods that usually entail some form of self-reporting.

Peter Dodds and Chris Danforth, mathematicians at the University of Vermont, dreamed up an ingenious way to sample the feelings of many more people much more quickly.

They downloaded the lyrics to 232,000 popular songs composed between 1960 and 2007 and calculated how often emotion-laden words like “love,” “hate,” “pain” and “baby” occurred in each.

Then they graphed their results, averaging over the emotional valence of individual words. A clearly negative trend emerged over the 47-year period, from bright and happy (think Pat Boone) to dark and depressive (death metal and industrial music come to mind).

The pair has used similar methods to analyze millions of sentences downloaded from blogs, as well as the text of every U.S. State of the Union address and a vast trove of Twitter tweets.

They see distinctive patterns emerging in how collective moods shift over time. The Internet, with its ability to transmit vast amounts of data, is the key.

“People have been trying to take a picture of what’s happening on the Web in real time and feed it into essentially another dial, like the Consumer Confidence Index or the gross domestic product,” Danforth explains. “That would help decision-makers decide what it is that people are feeling at the moment or how well social programs are working.”

Other researchers are onto the same idea. A team at Indiana University has shown that how calm the public mood is, as measured by the language used in millions of 140-character Twitter tweets, accurately predicts how well the stock market will do in the following few days.

Recently, scientists have even shown they could predict movie box office receipts based solely on Twitter chatter and the number of theaters in which a film is showing.

This new field of looking for hidden patterns in vast quantities of text or other user-generated information, variously called “sociotechnical data mining” or “computational social science,” rests on a deceptively simple method: add together the numerical values assigned to the emotionally positive or negative words in a text and take their average.

The method starts with established lists of commonly used words that have been ranked according to their emotional valence.

For the song lyrics experiment, Danforth and Dodds used the Affective Norms for English Words list, developed from a 1999 study in which participants graded their reactions to 1,034 words on a 1-9 scale (in which 9 is “happy, pleased, satisfied [and] contented”). On this scale, “triumphant” scores an 8.82, for example, while “suicide” comes in at 1.25.
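The averaging step can be sketched in a few lines of Python. This is a minimal illustration rather than the researchers' actual code: the `VALENCE` table and the `average_valence` function are hypothetical names, and only the scores for “triumphant” and “suicide” come from the figures quoted above. The other two scores are placeholders invented for the example.

```python
import re

# Valence scores on the 1-9 scale described above.
# "triumphant" and "suicide" use the values quoted in the article;
# "love" and "hate" are made-up placeholders, not real list entries.
VALENCE = {
    "triumphant": 8.82,
    "suicide": 1.25,
    "love": 8.0,   # hypothetical score
    "hate": 2.0,   # hypothetical score
}


def average_valence(text, lexicon=VALENCE):
    """Return the mean valence of the scored words in `text`.

    Words missing from the lexicon are ignored. Returns None when the
    text contains no scored words at all.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(average_valence("A triumphant kind of love"))
    print(average_valence("No scored words here"))
```

Because unscored words are skipped, a long text with only one or two listed words gets a score resting on very little evidence, which is one reason the approach is more trustworthy over thousands of texts than over any single one.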

Song lyrics, which presumably reflect audience taste, were analyzed mostly to prove that the data-mining technique worked, Danforth says. In breaking out the results, he and Dodds also classified lyrics by genre and individual artist. Not surprisingly, gospel music ranked as the genre having the most positive lyrics.

“One of the things that had surprised us was that we had expected rap and hip-hop to be down near the bottom, but it’s really not, it’s actually sort of in the middle,” Danforth says. “It’s metal, industrial music and punk at the bottom, at least in the lyrics.”

The method may not accurately characterize the meaning of a given text. For example, The Beatles’ “Maxwell’s Silver Hammer” recounts acts of violence (“Bang! Bang! Maxwell’s silver hammer came down upon her head/Bang! Bang! Maxwell’s silver hammer made sure that she was dead.”), but most listeners would understand the song’s lyrics to be comical. Yet when the technique is applied to thousands of song lyrics, differences in intended meaning tend to average out, Danforth says.

Meanwhile, he says, the method avoids the pitfalls of self-reporting in social science studies, which typically are performed on freshman psychology majors. Chief among those pitfalls is the tendency to tailor one’s responses to please the interviewer.

Still, popular song lyrics provide only limited insight into society’s emotional state. After proving their sampling concept with song lyrics (and song titles), Danforth and Dodds examined nearly 10 million blog sentences starting with “I feel . . . ” downloaded from the website http://www.wefeelfine.org.

Graphing the results for the period from 2005 to 2009, they detected an annual uptick in positive sentiments as Thanksgiving and Christmas approached. They also saw dips that corresponded with the anniversaries of 9/11 and the onset of the economic crisis in 2008.

The pair has expanded the research to encompass some 2 billion Twitter tweets, classifying them by where the tweeters live to see whether location affects the emotional content of the words they use.

People using the happiest language also use a less-diverse suite of words, they have found. “That tends to be in places that voted Republican in the last election,” Danforth says. “So, happier tweets, less diversity of thought, and predominantly those were in the Republican counties.”

The reverse was true in Democratic counties, but he hastens to add, “It’s purely observational. There’s no causal story here. We’re not trying to predict the next election.”

They are also generating their own list of words and their valences by collecting the most frequently used words from the Twitter and New York Times databases and posting them on Amazon’s Mechanical Turk website. Each user receives a random list of 50 words drawn from the 10,000 words in the database and is asked to assign valence scores.
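The sampling scheme described here, in which each rater scores a random 50-word slice of a 10,000-word vocabulary and the scores are then pooled, might look like the sketch below. It is an assumption-laden illustration: `draw_word_list` and `mean_ratings` are hypothetical helper names, the placeholder vocabulary stands in for the real word list, and nothing here reflects how the Mechanical Turk task was actually built.

```python
import random
from collections import defaultdict


def draw_word_list(vocabulary, k=50, rng=random):
    """Pick k distinct words at random for one rater to score."""
    return rng.sample(vocabulary, k)


def mean_ratings(ratings):
    """Pool (word, score) pairs from many raters into one mean score per word."""
    scores_by_word = defaultdict(list)
    for word, score in ratings:
        scores_by_word[word].append(score)
    return {word: sum(scores) / len(scores)
            for word, scores in scores_by_word.items()}


if __name__ == "__main__":
    # Placeholder vocabulary standing in for the 10,000 most frequent words.
    vocabulary = [f"word{i}" for i in range(10_000)]
    task = draw_word_list(vocabulary, rng=random.Random(0))
    print(len(task), "words assigned to this rater")

    # Hypothetical scores from three raters on the 1-9 scale.
    print(mean_ratings([("happy", 8), ("happy", 9), ("rain", 4)]))
```

With random assignment, each word ends up scored by a different set of raters, so averaging across raters is what turns individual opinions into a single valence per word.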

“We want to get a better picture between individual words,” Danforth says. “We’re going to try to build the language up from scratch and see whether it’s a reasonable thing to assign a happiness score to sentences based on a few words we find in them.”

At Indiana University, computer scientist Johan Bollen also wanted to see whether Twitter posts could accurately gauge the public mood. He decided to test his data against the daily fluctuations of the Dow Jones Industrial Average to see whether there was any correlation.

His team started with a collection of 9.8 million tweets gathered between February and December 2008. Using a Google-generated list of 964 words, they applied an existing six-dimensional psychological measure of mood states: Calm, Alert, Sure, Vital, Kind and Happy.

When they graphed the fluctuating mood states individually, they noticed a remarkable correlation between peaks in the “Calm” category and improvements in the stock market.
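One simple way to check whether a mood series moves ahead of the market is to correlate mood on one day with the market's change a few days later. The sketch below uses a plain lagged Pearson correlation, which is a stand-in chosen for illustration and not necessarily the analysis the Indiana team ran. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(mood, market, lag):
    """Correlate mood on day t with the market change on day t + lag."""
    if len(mood) != len(market):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(mood) - 1:
        raise ValueError("lag must leave at least two overlapping days")
    return pearson(mood[:len(mood) - lag], market[lag:])


if __name__ == "__main__":
    # Toy data only: a daily "Calm" score and a daily market change.
    calm = [0.2, 0.5, 0.1, 0.7, 0.4, 0.6, 0.3]
    market = [0.0, 0.3, 0.8, 0.1, 1.1, 0.5, 0.9]
    for lag in range(4):
        print(lag, round(lagged_correlation(calm, market, lag), 3))
```

A correlation that is stronger at a positive lag than at lag zero is what a "mood leads market" pattern would look like, though correlation at any lag does not by itself establish that one series drives the other.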

“We had that eureka moment when we looked at our results,” he says. “We thought, ‘We can actually do that? That’s amazing.’”

Showing that calmness was a precondition for stock market rallies was “a big shock,” because it reversed the presumed order of things. “We assumed that if the markets do well, people are happy,” he says. “We were expecting that happiness or sadness would be driven by the markets.”

Most people would be tempted to use a crystal ball for the stock market to make a killing, but Bollen remains focused on the scientific implications of his research.

“People have been talking to me about this,” he says. “My students have been trying to convince me that we should put some money where our collective mouths are. If the results hold up it could be worth quite a bit of money.”

Noting that more than 500 million people use Facebook and 140 million are on Twitter, Bollen says these data collection methods have the potential to revolutionize social science.

“We’re talking about environments that have more users than you have inhabitants in most industrialized nations on Earth,” he says. “You could never ever get a sample like that in any other way.”
