In an online conversation involving millions of people, how can we measure the disruptive impact of a single event?
Let’s consider the case of #BlackLivesMatter. What started off as a hashtag has blossomed into a full-fledged movement, and the American Dialect Society recently selected #BlackLivesMatter as Word of the Year for 2014 — a testament to the impact it has had in shaping public discussion around social justice. An analysis of social media interactions using natural language processing provides rich insight into important conversations about race in America.
The single event we’re looking at here took place on Dec. 3, 2014, when a grand jury failed to indict NYPD Officer Daniel Pantaleo in the death of Eric Garner. #BlackLivesMatter predates this event: it came into use in 2013 following the acquittal of George Zimmerman in the shooting of Trayvon Martin, and rose to prominence following the August 2014 shooting of Michael Brown in Ferguson, MO. By November 2014, #BlackLivesMatter had come to embody many facets of the discussion surrounding race, violence, and justice in America, and was not tied only to discussions of specific incidents. After the Dec. 3 decision, the conversation crystallized around this new story.
To see how #BlackLivesMatter and discussions of social justice changed after Dec. 3, let’s begin by setting the stage with how #BlackLivesMatter was being used the month before.
One very useful way to explore the themes in large amounts of data is to use topic modeling — an automatic method of extracting recurring themes in a set of documents based on patterns of words that tend to occur near each other (for more details, see our recent blog post on themes in Dr. King’s sermons and speeches). We took a sample of about 350,000 tweets from the month of November and automatically identified 30 topics in this data. For example, one topic captures discussions of Michael Brown’s death in the context of other recent police shootings, including key terms such as: justiceformikebrown, ferguson, tamir rice, trayvon martin, justice, family, and rip.
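The post does not say which topic model or toolkit was used. As a minimal sketch of the general idea, assuming latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, the pure-Python example below groups a handful of toy token lists into two topics. The function names, hyperparameters, and toy documents are all illustrative assumptions; an analysis of hundreds of thousands of tweets would use an optimized library instead.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample: topics that are common in this document and
                # that already favor this word get more weight.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_terms(topic_word, n=5):
    """Return up to n of the most frequent words assigned to each topic."""
    return [
        [word for word, count in counts.most_common(n) if count > 0]
        for counts in topic_word
    ]


# Toy "tweets", already tokenized. These are invented for illustration,
# reusing key terms quoted in the post; they are not real data.
docs = [
    ["justiceformikebrown", "ferguson", "justice", "family", "rip"],
    ["ferguson", "justice", "family", "justiceformikebrown"],
    ["hoodsoff", "anonymous", "kkk", "identities"],
    ["anonymous", "hoodsoff", "kkk", "campaign"],
]
doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, terms in enumerate(top_terms(topic_word)):
    print(k, terms)
```

The key terms listed for each topic in this post correspond to what `top_terms` returns here: the words most often assigned to a topic. With so little toy data the grouping can vary with the seed, so treat the output as a demonstration of the mechanics rather than a result.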
Of the 30 topics, only 15 pointed directly to conversations around the Michael Brown shooting.
The rest of the topics we found related to other things — either specific events or social justice more generally. One topic focuses on “hoods off,” the campaign by the hacktivist group Anonymous to reveal the identities of KKK members around the country. Another captures the discussion surrounding the #AllLivesMatter hashtag, which is viewed by many as expressing opposition to the #BlackLivesMatter movement:
— Arthur Chu (@arthur_affect) November 27, 2014
These topics are well-defined but they are individually much less frequent than general discussions of #BlackLivesMatter and social justice. The circles in the graph below represent the relative number of tweets corresponding to each topic. The large blue circle represents general discussions while the red and green circles represent #AllLivesMatter and “hoods off” respectively:
In the month of #BlackLivesMatter tweets following Officer Pantaleo’s non-indictment on Dec. 3, the picture changes significantly: 27 out of 30 automatically-identified topics directly reference Eric Garner, and all 30 topics reflect the hashtag #ICantBreathe, which references Garner’s last words and for a period of time nearly matched #BlackLivesMatter in its frequency of use. This shows just how much this single event crystallized the #BlackLivesMatter discussion — while people continued to talk about many things beyond the death of Eric Garner, that subject immediately became important to every facet of the discussion. It is also the case that the events of Dec. 3 added fuel to the fire of protests and social justice movements. Discussions of #BlackLivesMatter peaked around the time of the Ferguson protests in August, and then began to recede somewhat. The non-indictment of Officer Pantaleo represented the second, and to many people even more glaring, instance of a police officer not being brought to trial in a questionable death, and these events then revived and for a time dominated the discussion.
The types of topics change as well: they become much more oriented toward specific issues. One topic captures the discussion of #BlackLivesMatter at the federal level, including not only stories about Congressional staffers walking out in protest of the Garner and Brown grand jury decisions, but also connections to international human rights issues, with frequent mentions of Gaza, which many authors introduced as another instance of systematic identity-based human rights abuses. Another captures the story of LeBron James wearing an “I Can’t Breathe” shirt during a pre-game warmup. Another clearly relates to discussions around the specific circumstances of Garner’s death, including key terms such as video, chokehold, banned, coroner, and homicide. This tweet is characteristic of much of that discussion and was retweeted numerous times in modified form including #BlackLivesMatter:
— Chris Rock (@ozchrisrock) December 4, 2014
While in December we do see the emergence of many more specific topics, discussions of social justice in general are still very common as well. The largest of our 30 topics for the December data is the one that captures the broader discussion of social justice, including key terms like: black, white, ericgarner, ferguson, police, racism, and america. What we see in December is that general social justice conversations coalesce into one single well-defined topic in the model, as opposed to in November when this discussion was spread across many topics:
As can be seen in the figure above, there were about the same number of general social justice tweets in December as in November. However, because overall activity increased significantly in December, they represent a substantially smaller percentage of the overall conversation: in November, general discussions were nearly a full third of all uses of #BlackLivesMatter, but in December their share was only about half that. This is further evidence of how a single significant event can alter the landscape of online discussion about a topic.
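The arithmetic behind this point is simple: a topic's share is its tweet count divided by the total. The sketch below uses invented round numbers (not the counts from this analysis) to show how a roughly constant count can fall from about a third of the conversation to about a sixth when the total grows.

```python
def topic_share(topic_count, total_count):
    """Fraction of all tweets that belong to one topic."""
    return topic_count / total_count


# Hypothetical counts, chosen only to mirror the proportions described
# in the text. They are not figures from the actual data.
november = {"general": 100_000, "total": 300_000}
december = {"general": 100_000, "total": 600_000}

nov_share = topic_share(november["general"], november["total"])
dec_share = topic_share(december["general"], december["total"])
print(f"November: {nov_share:.1%}, December: {dec_share:.1%}")
# prints "November: 33.3%, December: 16.7%"
```

Under these assumptions, an unchanged count combined with a halved share implies that overall volume roughly doubled.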
For a final exploration into this data, let’s zoom in a bit and look at the ebb and flow of a few selected topics over time, pointing out the events that led to these changes in the conversation. To begin, let’s look at overall trends in #BlackLivesMatter use day over day. The first thing we do is transform our counts of tweets per day to log counts, a standard practice with frequency data that helps keep extreme values from obscuring patterns in the rest of the data. This turns out to be important here because on one day in particular (Nov. 25), we see a spike in activity that is 100 times the rate of the slowest day; on Nov. 25 there were very widespread protests, which led to this surge in activity. We see another spike following the Dec. 3 non-indictment of Pantaleo, and following these events #BlackLivesMatter is used more frequently for the rest of December:
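To make the log transform described above concrete, here is a small sketch. The post does not say which base was used; base 10 is assumed here, and the daily counts are hypothetical values chosen so that the busiest day is 100 times the slowest.

```python
import math

# Hypothetical tweets-per-day counts (not real data). The busiest day is
# a 100x spike over the slowest day, like the one described for Nov. 25.
daily_counts = {
    "2014-11-23": 1_000,
    "2014-11-24": 4_000,
    "2014-11-25": 100_000,
    "2014-11-26": 20_000,
}

# Base-10 log counts: each factor of 10 in volume adds 1 on this scale.
log_counts = {day: math.log10(count) for day, count in daily_counts.items()}

spread_raw = max(daily_counts.values()) / min(daily_counts.values())
spread_log = max(log_counts.values()) - min(log_counts.values())
print(f"raw ratio: {spread_raw:.0f}x, log10 difference: {spread_log:.2f}")
```

On the raw scale the spike dwarfs every other day by a factor of 100; on the log scale it sits only 2 units above the slowest day, so variation among ordinary days stays visible. Days with zero tweets would need separate handling, since the logarithm of zero is undefined.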
These two upticks in activity are apparent in many of the 30 topics we identified. However, individual topics also exhibit their own unique behavior relative to other key events. Let’s look at three topics here, related to very different aspects of the conversation: one focused on protests in the San Francisco Bay Area, one related to the video of Eric Garner’s death, and one related to several NBA players (most prominently LeBron James) appearing on the court wearing “I Can’t Breathe” t-shirts. Here, because we’re mainly interested in when these values spike, we’ll look at actual counts rather than log counts as above:
The trends visible in the data on a topic-by-topic basis are compelling. Different facets of the conversation rise to prominence and fade as key events take place. In fact, there is even more information in the above chart than we could label for readability reasons:
- Note the first peak of the “Protest” topic — this is on November 25, when protests were occurring nationwide. This is a huge signal in our overall data, but less so here because this topic is really capturing events specific to the Bay Area.
- The “Protest” topic also rises significantly for a period starting on Dec. 4 — this again captures a major series of protests in Berkeley and Oakland.
- The “Basketball” topic peaks on December 7 as well — reflecting another basketball-related event when an “I Can’t Breathe” shirt was worn by Derrick Rose of the Chicago Bulls.
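A per-topic view like the one discussed above can be built by counting tweets per day within each topic and then looking for the days where each count is highest. The sketch below assumes each tweet has already been labeled with a single topic (for example, its most likely topic under the model); the labels, dates, and helper names are invented for illustration and are not the actual data.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (day, topic) pairs, one per tweet. Not real data.
labeled_tweets = [
    (date(2014, 11, 25), "Protest"),
    (date(2014, 11, 25), "Protest"),
    (date(2014, 12, 4), "Protest"),
    (date(2014, 12, 4), "Protest"),
    (date(2014, 12, 4), "Protest"),
    (date(2014, 12, 4), "Video"),
    (date(2014, 12, 7), "Basketball"),
    (date(2014, 12, 7), "Basketball"),
    (date(2014, 12, 8), "Basketball"),
]


def daily_counts_by_topic(tweets):
    """Map each topic to a Counter of tweets per day."""
    counts = defaultdict(Counter)
    for day, topic in tweets:
        counts[topic][day] += 1
    return counts


def peak_day(day_counts):
    """Day with the highest count; the earliest day wins ties."""
    return max(sorted(day_counts), key=lambda day: day_counts[day])


counts = daily_counts_by_topic(labeled_tweets)
for topic, day_counts in counts.items():
    print(topic, peak_day(day_counts), day_counts[peak_day(day_counts)])
```

Reporting only the single highest day hides secondary peaks such as the Nov. 25 one noted above, so in practice one would plot the full daily series for each topic.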
Here we’ve seen a number of different ways in which we can unravel the complex stories contained in large-scale online discussions. Using topic modeling we are able to see at a high level how the nature of discussion changes following a significant event, and we are then able to zoom in and look at individual topics to get an even more fine-grained picture of how the conversation evolves. This is, of course, not limited to just the study of social justice hashtags — any serious inquiry into social media conversations can benefit from this type of nuanced examination. Conversations are complex, and topic modeling is one way that we can better understand the details of what people are saying, both over the course of a long discussion and at a moment in time.
— Nick Gaylord (@texastacos)
We’d also like to thank two of our recent externs, Nader Helmy and Zion Mengesha, for their contributions at several stages of this project.