The Science of Memes: Dan Zarrella's Quest for Social Media Hard Data


by Simon Owens

In response to the announcement that the Washington Post plans to sell Newsweek, Seth Godin, one of the most widely linked and retweeted marketing bloggers, fired off a post yesterday morning in which he lectures his readers as to where the future of profitable publishing is headed. After quickly dismissing the idea of monetizing blogging as “awfully difficult to do,” he identifies where the money truly is: micro-magazines. “There’s room in the market for 100,000 profitable micro-magazines,” he states matter-of-factly. “ … Don’t expect overnight successes in this form of media, but certainly expect that once someone figures out how to be the voice of a tribe, the revenue will take care of itself.”

But before media executives begin throwing their money into “micro-magazines” (a quick Google search indicates that Godin seemingly made the term up), they should consider the fact that the blogger doesn’t back up this assertion with any hard data. No focus groups indicating a strong interest in this medium. No charts detailing a steep rise in subscriptions. No statistical comparisons between micro-magazines and regular (macro?) magazines. The post seems to consist solely of his gut reaction to the Newsweek announcement. The only “evidence” he offers is the anecdote of a single magazine, which he predicts will reach “100,000 people with 2 employees.” In fact, many of Godin’s blog posts follow this pattern. They offer interesting ideas that may make you think about an issue in a different way, but if you pressed someone to name a directly applicable idea in the post, they’d come up empty-handed.

When I interviewed Dan Zarrella I didn’t ask him his opinion on Seth Godin or this specific blog post, but I got the sense that this kind of “gut reaction” advice is the antithesis of his own widely circulated work. On his website Zarrella describes himself as a “social media scientist,” but on the phone he called himself a “half marketer, half developer.”

“When I got into social media and saw the analysis out there, it tends to be soft focus unicorns and rainbows,” he told me. “Things like ‘love your customers’ … and that kind of stuff. It’s stuff that’s hard to disagree with and it sounds nice, but it’s not really based on anything more substantial than ‘feel-good.’ So when you sit around a conference room and say, ‘let’s go make something viral,’ it’s a silly conversation. And if you ask somebody why something goes viral, why ideas spread, the first thing a lot of them will say is, ‘because it was good’ or ‘because it was funny,’ which is easily proven false. There are certainly good ideas that don’t spread and bad ideas that do. And I’m convinced that there’s some kind of other factor, or set of factors or criteria or what have you that lead to ideas and concepts of how things go viral.”

So Zarrella, who works for HubSpot, a marketing software company, set out to answer these questions using a method ignored by most self-described “social media gurus”: gathering hard, empirical data. Using the open APIs of several major social media platforms, he’s gathered reams of user output that he has then analyzed in depth. One of his first forays in this field focused on the social-news site Digg, an early example of crowds voting on and curating what kind of news should become “popular.” For this particular study, he “created a database of 33,000 of the 39,000 stories that made Digg’s homepage in 2007 and, using Yahoo!’s API, … tracked how many external links each URL had pointing to it.” This way, he could determine which kinds of stories on Digg’s front page drove the largest number of links from other websites, including blogs. He broke down the stories into categories and then keywords, seemingly creating a scientific playbook that you could use if you wanted to maximize your impact on Digg.
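To make the idea of a category breakdown concrete, here is a minimal Python sketch. It is only an illustration, not Zarrella’s actual code: the story records, category names, and link counts are invented, and the function names `average_links_by_category` and `rank_categories` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records: (category, inbound link count).
# These values are invented for illustration, not taken from the study.
STORIES = [
    ("technology", 120),
    ("technology", 80),
    ("politics", 40),
    ("politics", 60),
    ("science", 200),
]


def average_links_by_category(stories):
    """Return {category: mean inbound links} for (category, links) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for category, links in stories:
        totals[category] += links
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


def rank_categories(stories):
    """Return (category, mean links) pairs, highest average first."""
    averages = average_links_by_category(stories)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for category, mean_links in rank_categories(STORIES):
        print(f"{category}: {mean_links:.1f}")
```

The same grouping could be repeated with keywords in place of categories, which is roughly the second level of breakdown the study describes.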

But while Digg provided an interesting look into the mass appeal of content, only a relatively small number of links make it to the front page each day, and the overwhelming majority of the stories submitted to Digg receive fewer than three diggs. Twitter users, on the other hand, produce millions of tweets each day, many of which get retweeted by dozens or even hundreds of other users. In terms of a potential sample size to study the power of viral distribution, Twitter is a goldmine.

“Retweets are this perfect viral mechanism,” Zarrella told me. “People have been telling each other information and spreading ideas for thousands of years, but the retweets to me were the first example that allowed marketers to gather millions of these conversations, this memetic conversation, and analyze what is it that made them go viral.”

Over time, Zarrella was able to amass a database of over 100 million retweets, and once he had done so he began sifting through the data to see what kinds of words, phrases, and themes got the most traction. It was because of this approach that he was able to make assertions like “you should Tweet your links in afternoons, evenings and on weekends” to get more clicks. Those who want more click-throughs should also know that “the more frequently you Tweet links, the fewer clicks you’ll get.” Each study he publishes is typically accompanied by a series of graphs, allowing readers to visualize the data, perhaps making it more digestible. Unlike Godin’s advice, Zarrella’s is based on sound statistical reasoning.
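One simple way to look for words that get “traction” is to compare how often a word shows up in retweeted messages against how often it shows up in tweets overall. The Python sketch below assumes that approach, which may differ from the method Zarrella actually used. The helper names `word_counts` and `retweet_lift` are hypothetical, and any sample tweets fed to them would be stand-ins for real API data.

```python
from collections import Counter
import re


def word_counts(texts):
    """Count lowercase word occurrences across a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def retweet_lift(retweeted, all_tweets, min_count=1):
    """Return {word: share among retweets / share among all tweets}.

    A value above 1.0 means the word is over-represented in retweets.
    Words seen fewer than min_count times in all_tweets, or never seen
    in a retweet, are skipped.
    """
    rt_counts = word_counts(retweeted)
    base_counts = word_counts(all_tweets)
    rt_total = sum(rt_counts.values())
    base_total = sum(base_counts.values())
    lift = {}
    for word, base in base_counts.items():
        if base < min_count or word not in rt_counts:
            continue
        rt_share = rt_counts[word] / rt_total
        base_share = base / base_total
        lift[word] = rt_share / base_share
    return lift
```

With a database the size Zarrella describes, raising `min_count` would keep rare words from dominating the results, since a word seen only once or twice can show a large ratio by chance.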

Given the nearly infinite ways you can splice the data, Zarrella was able to divide his results into dozens of posts, each one newsworthy, and potentially viral, in its own right. It reminded me of the stories of how several journalism schools are creating joint programs that try to mash up journalism and computer programming. “One of the most interesting areas to apply it would be to a large content producer, any publisher who’s doing any content and wants to get more social media traffic,” he told me. “And this data gets better when applied more specifically. For my purposes, I’m trying to generalize best practices. If I were doing it for a specific newspaper, they’d be able to get insights that are much more targeted.”

Perhaps more importantly, this data can potentially be considered the world’s biggest focus group. Most polls and focus groups are limited to relatively small sample sizes, dozens (in the case of focus groups) or maybe a few thousand people (in the case of polls and surveys), but many of these major social media sites have millions of users. “One of the people in that regard that I’m most inspired by, and I may not always agree with the outcome of his work but you’ve got to respect him for his methods, is [Republican pollster] Frank Luntz. I think beyond being a pollster, he’s done interesting things with video sessions; he’ll take a group of people and show them a video, and they have this dial and they’ll turn it one way if they agree and the other if they disagree. And so he’s found things like if you call it an ‘estate tax’ then everybody’s for it, but if you call it a ‘death tax’ they’re against it, or if you call it ‘oil drilling’ or ‘deep sea oil rigs’, they’re against it. If you call it ‘energy exploration’ then everybody is for it. I’m very inspired by the work he does, but what he does is really, really expensive. What he does is a little bit different than what I do, but if you do split testing online you can get much similar results. In fact, I’ve worked on political campaigns in the past, and we were really able to fine-tune messages for very little investment for things that in years past would have cost an arm and a leg.”

Lately, Zarrella has been tackling a much larger behemoth: Facebook. But he approached the site somewhat differently, focusing on 20 top online news sources that use a Facebook “share” button, one that happens to have an API. For these studies, he not only grabs the headline of what is shared, but also the articles themselves so he can create a much deeper analysis of the content’s viral aspects. “Twitter is really easy, there’s this simple elegance to it that’s very attractive,” he explained. “Facebook is much more mainstream, and so something going really, really viral on Facebook is way more important than going really viral on Twitter. When I’m studying Twitter I’m tending to be looking at social media geeks. When I’m looking at Facebook I’m looking at a much larger section of the general public. That’s a little bit more interesting in that regard, but it’s a fair bit more complex in that Facebook is a more robust system.” Though some of his results aren’t surprising (did you know that sex sells on Facebook?), other posts, like the breakdown of the “most-sharable words,” aren’t so obvious.

Zarrella told me that his ultimate goal is to make social media marketing a more “technical” field. So far, most of this kind of research has been conducted in the academic arena, what he called “collecting data for data’s sake.” But his hope is that marketers and users will be able to apply these methods. “You can inform what you’re doing and come up with best practices that are based in more than just unicorns and rainbows.”

But what about the thousands of self-described “social media gurus” on Twitter, the ones who try to get clients to pay them thousands of dollars for consultation simply because they’ve created a Twitter account and read Mashable every day? Do their self-administered accolades annoy Zarrella?

“What I would say is if you’re looking for someone to give social media advice, when you hear that advice and hear supposed best practices, ask yourself what data is backing that up. There’s a difference between empirical data and anecdotal. Ashton Kutcher can get a lot of followers and retweets, but that’s anecdotal evidence, not empirical. The empirical tells a lot more than the exception that is Ashton Kutcher.”

Simon Owens is a journalist and social media consultant. You can follow him on Twitter, read his blog, or email him at [email protected].
