A team of computer scientists at the USC Viterbi School of Engineering set out to develop new tools to automate and organize social science data. What did they use as their data sets? Twitter posts from two coastal capitals, New York City and Los Angeles.
The researchers found that they could identify similar tweets that do not share hashtags by using natural language processing and neural networks to group them into clusters. “It’s using AI to create a map of similar tweets,” says Mayank Kejriwal, a Research Assistant Professor in the USC Department of Industrial and Systems Engineering and a Research Lead at the USC Information Sciences Institute, who is an expert in knowledge graphs.
The new tool uses a neural network to convert each tweet into a mathematical vector, a point in space, so that you can measure how similar any two tweets are to one another. Tweets whose points lie close together form clusters of related posts. For example, the AI is able to tell that tweets such as ‘Lounging on the beach, thinking about my life’ and ‘Today it’s surfing, sunbathing and mindfulness’ are very similar, even though they have almost no words in common.
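To make the idea concrete, here is a minimal sketch in Python of how vectors can be compared and grouped. It is not the researchers' code. The three-number vectors are made-up stand-ins for what a neural network would produce, the third tweet is a hypothetical example, and the names `TOY_EMBEDDINGS`, `cosine_similarity`, and `cluster` are invented for illustration. Cosine similarity and the simple greedy grouping are assumptions as well, since the article does not say which similarity measure or clustering method the team used.

```python
import math

# Hypothetical 3-dimensional vectors standing in for neural-network output.
# A real model would produce vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "Lounging on the beach, thinking about my life": [0.9, 0.8, 0.1],
    "Today it's surfing, sunbathing and mindfulness": [0.8, 0.9, 0.2],
    "Hiring a line cook in midtown, apply today": [0.1, 0.2, 0.9],  # invented tweet
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster(embeddings, threshold=0.9):
    """Greedy grouping: a tweet joins the first cluster whose first member
    is at least `threshold` similar to it; otherwise it starts a new cluster."""
    clusters = []
    for text, vector in embeddings.items():
        for group in clusters:
            if cosine_similarity(vector, embeddings[group[0]]) >= threshold:
                group.append(text)
                break
        else:
            clusters.append([text])
    return clusters


if __name__ == "__main__":
    for group in cluster(TOY_EMBEDDINGS):
        print(group)
```

With these toy numbers, the two beach tweets land in one cluster and the job posting in another, even though the beach tweets share almost no words.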
In comparing Angelenos to New Yorkers, they found some interesting trending posts.
No, Angelenos did not talk about the 405 any more than New Yorkers talked about the LIE. And neither group talked about real estate as a major topic.
The top cluster of tweets in LA? Around the topic of “jobs.”
There is often a perception that LA is vain. Yet “New York City” and “New York” are among the top five things New Yorkers tweet about.
What about art? Art seems to dominate the conversation for Los Angeles more so than in New York. But divide by population, and the average New Yorker turns out to be more artsy.
Contrary to popular perception, Angelenos online tended to talk about healthcare, not wellness.
New York, the data sample reveals, may in fact be “The City that never sleeps,” as the researchers say its inhabitants come alive at night. But Angelenos, believe it or not, have bigger mouths: they post more on social media. During business hours, Angelenos tend to post 30 to 60 percent more tweets than New Yorkers. After 8 pm, there’s no difference between the two.
In both LA and New York, higher-income areas tweet more.
The dataset was culled from October 2016. The research was documented in the Journal of Computational Social Science.
“Studies like this allow us to understand the nuances of everyday 21st century life. We live in such an outrage, celebrity-driven culture today that it’s easy to forget that most of us are just focused on going about our business: eating, commuting, working, complaining. We document more of it on social media than we realize. This technology allows us to read the story of our society – as it’s being written, right before our eyes,” says Kejriwal.