#BlackLivesMatter: How events change conversations

Posted 02.24.2015

In an online conversation involving millions of people, how can we measure the disruptive impact of a single event?

Let’s consider the case of #BlackLivesMatter. What started off as a hashtag has blossomed into a full-fledged movement, and the American Dialect Society recently selected #BlackLivesMatter as Word of the Year for 2014 — a testament to the impact it has had in shaping public discussion around social justice. An analysis of social media interactions using natural language processing provides rich insight into important conversations about race in America.

Protests in New York following the death of Eric Garner. (Source: www.ibtimes.co.uk)

The single event we’re looking at here took place on Dec. 3, 2014, when a grand jury failed to indict NYPD Officer Daniel Pantaleo in the death of Eric Garner. #BlackLivesMatter predates this event: it came into use in 2013 following the acquittal of George Zimmerman in the shooting of Trayvon Martin, but really rose to prominence following the August 2014 shooting of Michael Brown in Ferguson, MO. By November 2014, #BlackLivesMatter had come to embody many facets of the discussion surrounding race, violence, and justice in America, and was not just tied to discussions of specific incidents. Following the Dec. 3 decision, the conversation crystallized around this new story.

To see how #BlackLivesMatter and discussions of social justice changed after Dec. 3, let’s begin by setting the stage with how #BlackLivesMatter was being used the month before.

One very useful way to explore the themes in large amounts of data is to use topic modeling — an automatic method of extracting recurring themes in a set of documents based on patterns of words that tend to occur near each other (for more details, see our recent blog post on themes in Dr. King’s sermons and speeches). We took a sample of about 350,000 tweets from the month of November and automatically identified 30 topics in this data. For example, one topic captures discussions of Michael Brown’s death in the context of other recent police shootings, including key terms such as: justiceformikebrown, ferguson, tamir rice, trayvon martin, justice, family, and rip.
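To make the idea of topic modeling concrete, here is a minimal sketch of one common approach, latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling. This is an assumption for illustration: the post does not say which algorithm or settings were actually used, and the `fit_lda` and `top_words` helpers, the parameter values, and the handful of token lists below are all hypothetical stand-ins rather than real tweets.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        current = []
        for w in doc:
            k = rng.randrange(num_topics)
            current.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(current)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample a topic: favor topics already common in this
                # document and topics that already contain this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=5):
    """Most frequent words per topic, skipping words with a zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

# Hypothetical token lists standing in for tweets (not real data).
tweets = [
    ["justiceformikebrown", "ferguson", "justice", "family"],
    ["ferguson", "justice", "rip", "family"],
    ["hoodsoff", "anonymous", "kkk"],
    ["anonymous", "kkk", "hoodsoff", "identities"],
]

doc_topic, topic_word = fit_lda(tweets, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(k, words)
```

Each topic comes back as a ranked list of words, which is the form the key terms in this post take. On a real sample of hundreds of thousands of tweets you would normally use an established library instead of a hand-written sampler.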

Of the 30 topics, only 15 pointed directly to conversations around the Michael Brown shooting.

The rest of the topics we found related to other things — either specific events or social justice more generally. One topic focuses on “hoods off,” the campaign by the hacktivist group Anonymous to reveal the identities of KKK members around the country. Another captures the discussion surrounding the #AllLivesMatter hashtag, which is viewed by many as expressing opposition to the #BlackLivesMatter movement:

These topics are well-defined but they are individually much less frequent than general discussions of #BlackLivesMatter and social justice. The circles in the graph below represent the relative number of tweets corresponding to each topic. The large blue circle represents general discussions while the red and green circles represent #AllLivesMatter and “hoods off” respectively:


In the month of #BlackLivesMatter tweets following Officer Pantaleo’s non-indictment on Dec. 3, the picture changes significantly: 27 out of 30 automatically-identified topics directly reference Eric Garner, and all 30 topics reflect the hashtag #ICantBreathe, which references Garner’s last words and for a period of time nearly matched #BlackLivesMatter in its frequency of use. This shows just how much this single event crystallized the #BlackLivesMatter discussion — while people continued to talk about many things beyond the death of Eric Garner, that subject immediately became important to every facet of the discussion. It is also the case that the events of Dec. 3 added fuel to the fire of protests and social justice movements. Discussions of #BlackLivesMatter peaked around the time of the Ferguson protests in August, and then began to recede somewhat. The non-indictment of Officer Pantaleo represented the second, and to many people even more glaring, instance of a police officer not being brought to trial in a questionable death, and these events then revived and for a time dominated the discussion.

The types of topics change as well — they become much more oriented towards specific issues. One topic captures the discussion of #BlackLivesMatter at the federal level — including not only stories about Congressional staffers walking out in protest of the Garner and Brown verdicts, but also connections to international human rights issues with frequent mentions of Gaza, introduced by many authors as another instance of systematic identity-based human rights abuses. Another captures the story of LeBron James wearing an “I Can’t Breathe” shirt during a pre-game warmup. Another clearly relates to discussions around the specific circumstances of Garner’s death, including key terms such as video, chokehold, banned, coroner, and homicide. This tweet is characteristic of much of that discussion and was retweeted numerous times in modified form including #BlackLivesMatter:

While in December we do see the emergence of many more specific topics, discussions of social justice in general are still very common as well. The largest of our 30 topics for the December data is the one that captures the broader discussion of social justice, including key terms like: black, white, ericgarner, ferguson, police, racism, and america. What we see in December is that general social justice conversations coalesce into one single well-defined topic in the model, as opposed to in November when this discussion was spread across many topics:


As can be seen in the figure above, there were about the same number of general social justice tweets in December as in November. However, due to a significant increase in activity in December they represent a substantially smaller percentage of the overall conversation — in November general discussions were nearly a full third of all uses of #BlackLivesMatter, but in December they were only about half that common. This is further evidence of how a single significant event can alter the landscape of online discussion about a topic.
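The arithmetic behind that comparison can be checked with a short calculation. This is a rough sketch that treats "nearly a full third" as exactly one third and "about half that common" as exactly half; the tweet count is a hypothetical placeholder, since the post gives no figure, and it cancels out of the result.

```python
from fractions import Fraction

november_share = Fraction(1, 3)       # "nearly a full third", rounded
december_share = november_share / 2   # "about half that common"

# Hypothetical count of general social justice tweets, assumed equal in
# both months ("about the same number"). The exact value does not matter.
general_tweets = 1000

november_total = general_tweets / november_share
december_total = general_tweets / december_share
growth = december_total / november_total

print(growth)  # 2
```

Under these rounded assumptions, a constant number of general tweets whose share halves implies that overall volume roughly doubled from November to December.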

For a final exploration into this data, let’s zoom in a bit and look at the ebb and flow of a few selected topics over time, pointing out the events that led to these changes in the conversation. To begin, let’s look at overall trends in #BlackLivesMatter use day over day. The first thing we do is transform our counts of tweets per day to log counts, a standard practice with frequency data that helps keep extreme values from obscuring patterns in the rest of the data. This turns out to be important here because on one day in particular (Nov. 25), we see a spike in activity that is 100 times the rate of the slowest day: on Nov. 25 there were very widespread protests, which led to this surge in activity. We see another spike following the Dec. 3 non-indictment of Pantaleo, and following these events #BlackLivesMatter is used more frequently for the rest of December:
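A small sketch shows what the log transform does to a spike of this size. The counts are made up for illustration, and base 10 is an assumption, since the post does not say which base was used.

```python
import math

# Hypothetical tweets-per-day values, not real data. The busiest day is
# 100 times the slowest, mirroring the ratio described in the post.
daily_counts = [1_000, 3_000, 100_000, 20_000, 5_000]

log_counts = [math.log10(n) for n in daily_counts]

slowest, busiest = min(daily_counts), max(daily_counts)
ratio = busiest / slowest                 # 100x on the raw scale
log_gap = max(log_counts) - min(log_counts)  # about 2 on the log10 scale

for raw, logged in zip(daily_counts, log_counts):
    print(f"{raw:>7} -> {logged:.2f}")
```

On the raw scale the spike dwarfs every other day, while on the log scale it sits only two units above the slowest day, so smaller day-to-day changes stay visible. A day with zero tweets would need special handling, because the logarithm of zero is undefined.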


These two upticks in activity are apparent in many of the 30 topics we identified. However, individual topics also exhibit their own unique behavior relative to other key events. Let’s look at 3 topics here, related to very different aspects of the conversation: one focused on protests in the San Francisco Bay Area, one related to the video of Eric Garner’s death, and one related to several NBA players (most prominently LeBron James) appearing on the court wearing “I Can’t Breathe” t-shirts. Here, because we’re mainly interested in when these values spike, we’ll look at actual counts as opposed to log counts like above:


The trends visible in the data on a topic-by-topic basis are compelling. Different facets of the conversation rise to prominence and fade as key events take place. In fact, there is even more information in the above chart than we could label for readability reasons:

  • Note the first peak of the “Protest” topic — this is on November 25, when protests were occurring nationwide. This is a huge signal in our overall data, but less so here because this topic is really capturing events specific to the Bay Area.
  • The “Protest” topic also rises significantly for a period starting on Dec. 4 — this again captures a major series of protests in Berkeley and Oakland.
  • The “Basketball” topic peaks on December 7 as well — reflecting another basketball-related event when an “I Can’t Breathe” shirt was worn by Derrick Rose of the Chicago Bulls.
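Peaks like the ones in the list above can be found by counting tweets per topic per day and taking the busiest day for each topic. The sketch below assumes each tweet has already been given a single topic label, which is a simplification; the post does not describe how tweets were assigned to topics. The day labels, the data, and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (day, topic) pairs, one per tweet. Not real data.
labeled_tweets = [
    ("day-01", "Protest"), ("day-02", "Protest"), ("day-02", "Protest"),
    ("day-02", "Video"), ("day-03", "Video"), ("day-03", "Video"),
    ("day-03", "Basketball"), ("day-04", "Basketball"),
    ("day-04", "Basketball"), ("day-04", "Basketball"),
]

def daily_counts_by_topic(pairs):
    """Map each topic to a Counter of tweets per day."""
    counts = defaultdict(Counter)
    for day, topic in pairs:
        counts[topic][day] += 1
    return counts

def peak_day(day_counts):
    """Day with the most tweets; ties go to the earliest day label."""
    return max(sorted(day_counts), key=day_counts.get)

counts = daily_counts_by_topic(labeled_tweets)
for topic in sorted(counts):
    print(topic, peak_day(counts[topic]), dict(counts[topic]))
```

Raw counts are used here rather than log counts, matching the choice made for the per-topic chart, because the goal is to see when each topic spikes.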

Here we’ve seen a number of different ways in which we can unravel the complex stories contained in large-scale online discussions. Using topic modeling we are able to see at a high level how the nature of discussion changes following a significant event, and we are then able to zoom in and look at individual topics to get an even more fine-grained picture of how the conversation evolves. This is, of course, not limited to just the study of social justice hashtags — any serious inquiry into social media conversations can benefit from this type of nuanced examination. Conversations are complex, and topic modeling is one way that we can better understand the details of what people are saying, both over the course of a long discussion and at a moment in time.

— Nick Gaylord (@texastacos)

We’d also like to thank two of our recent externs, Nader Helmy and Zion Mengesha, for their contributions at several stages of this project.


3 Things Learned from a Strata Startup Finalist

Posted 02.20.2015

Idibon was featured as a “Startup Showcase Finalist” at Strata this year. Strata/Hadoop World is the biggest conference for data-focused companies like ours, so it was a great opportunity to get the pulse of the industry.

Our mission is to bring language technologies to all the world’s languages. In this year’s competition, we were the only text analytics company and the only machine-learning company among the finalists. Both are very hot areas right now. At GigaOm’s Structure Data conference last year, the biggest East Coast conference for our industry, we were awarded the “Best Text Analysis” Startup. We appreciate the recognition and are glad that the industry recognizes the importance of Idibon’s mission.

Demoing Idibon


For two hours on Wednesday night, we demoed Idibon to hundreds of people at our table at the Startup Showcase. Despite the recognition that Idibon has received, there were some fundamental properties of our industry and markets that repeatedly surprised people during our demo. Here are the three that stood out the most for me:

1. English is a minority language

Only 5% of the world’s conversations are in English at any time. Many people who came up to us at the showcase were surprised by this fact when I shared it with them. It is not new information – I shared this fact when presenting at Strata in 2013 on “How the world communicates”.

There are now more digital communications in Chinese than English, and as more of the world comes online, English will also come down to something closer to 5% of our digital communications. It will still occupy a privileged place in some areas, like the media, the entertainment industry, and scientific publications, but how often are you publishing in these areas relative to chatting with your friends, family and colleagues? To understand information expressed globally, we need smart language technologies for as many of the world’s languages as possible.

2. Many people’s first technology will contain Artificial Intelligence

Across the world, more people have access to a cellphone than to fresh drinking water. In a few more years, more people will have access to a smartphone than to fresh drinking water. These phones will ship with AI-powered language capabilities and connect their users to other AI-powered services like search engines. But both the devices and the search engines will fail to process the majority of languages spoken by these users.

As with the case of English, many people who came up to our demo seemed surprised that the US only has about 5% of the world’s population, and the usage patterns for big data technology outside the US (or for that matter outside Silicon Valley) will be very different to those here.

3. We still expect talking robots

On the lighter side, a few people joked that they expected Idibon to create their first talking robot. For as long as humanity has discovered new building techniques and technologies, we have dreamed of creating intelligence that we can interact with.

We have certainly helped some cellphone manufacturers develop smarter language technologies for their devices, but for the time being Idibon is focused on the interaction between people and language processing at scale. See also my post/video on “Where’s my talking robot”.

I look forward to seeing more globally-focused technologies and talks at future Strata conferences!

Rob Munro
Feb 20, 2015


Idibon Supports UNICEF to Provide Natural Language Processing to SMS-Based Social Monitoring Systems in Africa

Posted 02.09.2015

Today, Idibon announced an ongoing collaboration with UNICEF’s U-report project. UNICEF will use Idibon’s NLP technology to better understand broad themes in SMS messages sent by U-reporters across Africa.


When I worked at a health clinic in rural Burundi, I was surprised to find that the majority of the patients we treated had access to a flip phone. Most of these villagers didn’t even have electricity in their homes. They lived in very isolated places, unable to reach out to a broader community except by phone. According to recent surveys, around 2/3 of households in sub-Saharan Africa have at least one mobile phone.

UNICEF’s U-report is an amazing example of an innovative product designed specifically for an audience that lacks internet access but has access to a cell phone. U-report sends out poll questions via SMS to get opinion data from U-reporters – people who have signed up to send UNICEF their insights, questions, and local updates.

Idibon has recently committed to processing unsolicited messages and classifying them by language, category, and urgency. Our intention is to give underserved populations an outlet to say what’s on their mind and to amplify voices within the crowd that have specific and actionable information to offer. Right now, there are thousands of these messages sent every day.

Non-profits sometimes struggle to align themselves with the interests and desires of the populations they serve. We feel lucky to work with UNICEF Innovation Labs as they build technology that allows them to reach out directly to individuals. It is my hope that with more sophisticated text analysis, U-report will be able to listen and respond to more and more voices. Because together, we can build an ever more connected world.

– Jessica Long

Idibon Supports UNICEF to Provide Natural Language Processing to SMS-Based Social Monitoring Systems in Africa

San Francisco, CA – February 9, 2015

Idibon announces a collaboration with UNICEF to provide scalable natural language processing and analytics to their U-report applications. U-report applications leverage UNICEF’s open source RapidPro platform, which empowers governments to deliver valuable real-time information and connect communities to available services via text (SMS) messages.

Text messages are the most widely used digital communication for many of the world’s poorest people, for whom cell phone ownership is more widespread than access to fresh drinking water, education or basic health. “Processing text messages from anywhere in the world goes to the heart of Idibon’s mission,” said Idibon’s CEO Robert Munro. “Helping organizations process communications in any language gives a voice to the most overlooked people in our connected world — the more than 50 percent of the world who do not speak English, Chinese, Spanish or any other dominant language.”

In the collaboration, Idibon is helping UNICEF process messages sent from citizens to their analysts in potentially hundreds of languages. “We are excited to work with Idibon to explore new ways of engaging with the young people that we serve — in their own local languages and dialects. Thanks to Idibon’s unique natural language processing technology, we’ll be able to better understand and empower marginalized communities that are often excluded due to language barriers,” said Evan Wheeler, CTO of UNICEF’s Global Innovation Centre.

“Our machine learning systems can help UNICEF understand the most pressing needs of the populations they serve, at a scale not currently available to social development organizations,” explained Munro. Prior to Idibon, Munro ran the largest uses of text messaging for disaster response and wrote his Stanford PhD thesis about the use of machine learning for this purpose. Overseeing the project within Idibon is Jessica Long. Her unique experience combines an MS in Natural Language Processing at Stanford with work in health information systems in rural Burundi.

Idibon was founded with the mission of bringing smart language processing to all the world’s languages and has deep expertise in social development engagements. Idibon’s investors include Khosla Ventures, where Vinod Khosla has been a thought leader on income disparity and machine learning, and Morningside Ventures, run by the Chan family, who recently gave $350m to the Harvard School of Public Health, the University’s largest ever gift. Idibon has been actively involved in processing unstructured data and text classification in disaster and development contexts, supporting damage assessments following Hurricane Sandy in New York, helping track epidemics globally, and partnering with organizations like MIT’s Humanitarian Response Lab.

About Idibon:
Idibon helps companies understand their language data. Using cutting-edge natural language processing and data science, Idibon takes unstructured data like social media, emails, and websites, and provides structured answers to key business intelligence questions.

Press Contact:
Robert Munro
