After two years, I summoned up the courage to leave my 18-month-old toddler at home with my husband (he had a day off) and take up a professional assignment. All three of us did fine. I delivered a talk about how data and journalism mix at Navrachana University in Vadodara. Here it is:
I am a writer, designer and educator, and one of the things that interests me is tracking trends in the media. Today, I am going to talk about data journalism, which seems like a new and upcoming field but actually dates back to the time of Napoleon in the 1800s. But we're not going to go that far back today. We're talking about it now because what we have today is big data.
Let's start by breaking up the words. What is data? (Though data is the plural of datum, I am going to treat it as singular here.) Anything that stimulates the senses – words, numbers, images and sound – could be data.
Suppose I ask you to do an assignment about the traffic situation at Genda Circle. You'll probably go there three to four times a day, every day, for a week. Some of you might take pictures, some might ask drivers and pedestrians for quotes, others might get vehicle counts from the traffic department and create graphs to depict the story, and some others might record the sounds and note the decibel levels. Some of you might bring all this data together to tell your story. Why do you do that? To bring objectivity to your story. Everyone knows what it feels like to be stuck in peak traffic. But how bad it is can be told through people's accounts, vehicle counts, noise levels in decibels, etc.
Now let's look at Facebook. You upload a picture to share with your family and friends. Your friends like it. You get comments and you respond to them. Then, after a few hours, you upload something else, and then something else. Slowly, the act of putting the first picture up fades from your memory. But not from Facebook's. They remind you again and again. They have the data. You are data. What you do is data. What your friends do is data. How you respond to your friends is data. Your choices, your relationships, the matter of your existence are all data. Even if you delete your account, they still have your data. Anything on the internet stays forever.
When we look at journalism, it comprises three parties:
The source of the information
The collector, processor and disseminator of the information
The consumer of the information
How do you find a story? If you flip through news channels, or the pages of newspapers and magazines from any part of the world, you'll find that the stories broadly talk about trends – social, political, business, economic, health, sports, lifestyle, fashion, entertainment, etc. What is trending? So you either spot a trend, track a trend or forecast a trend. You can do all of these if you have the data. Then you can tell the story in the classic inverted pyramid format that you use in the media: the conclusion of the story in the first paragraph, followed by the support for your inference in the form of quotes, statistics and references, ending with suggestions or speculations.
Before the advent of smartphones, media houses were powerful organizations. You owned the information you got, protected your sources, analysed it your way and chose the pace of disseminating the information. Objectivity has always been a questionable issue but it is also a relative term. You tried to put out the facts in a way that would be in the interest of the public. There was a wide space for advocacy and enlightenment.
After smartphones and social networks came into existence, everybody with a phone became a creator and disseminator of information. Media houses have lost that advantage. There are too many sources to fact-check and there is little you can do to protect a source. You can't choose to sit on a story to get all the angles in. The spread happens as soon as you get the news. What now happens is that traditional media houses release a piece of information and an army of 'fact-checkers' out there sets about verifying it. If the information is wrong, you get trolled. And unfortunately, instead of just apologising and moving on, many media houses get into spats with their trolls in the digital space. Reactions, hate speech, verbal diarrhoea and finger-pointing follow in many cycles. There are cries that journalism is under attack.
So how do you move forward in this environment? You work with data. The more there is, the greater the chances of objectivity. You see this in research: a sample of 30 and a sample of 30,000 can tell different stories about the same population. Let's look at the sources. Who your source is determines the bias in your story. Multiple sources that corroborate the facts help diminish the bias. There are paid and unpaid sources of information, as you can see, and you negotiate with them to get information.
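To see why a sample of 30 and a sample of 30,000 can tell different stories, here is a minimal Python sketch. Everything in it is an assumption made for illustration: a hypothetical population of 100,000 people, 40% of whom would answer "yes" to some question, and a helper called sample_share that is not from any library. It draws many samples of each size and compares how much the results swing.

```python
import random

def sample_share(population, n, rng):
    """Share of 'yes' answers (1s) in a simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

def spread(values):
    """Gap between the highest and the lowest result."""
    return max(values) - min(values)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 people, 40% of whom answer "yes" (1).
population = [1] * 40_000 + [0] * 60_000

# Repeat each survey 50 times to see how much the answer moves around.
small = [sample_share(population, 30, rng) for _ in range(50)]
large = [sample_share(population, 30_000, rng) for _ in range(50)]

print("sample of 30:     results swing by", round(spread(small), 3))
print("sample of 30,000: results swing by", round(spread(large), 3))
```

The small surveys land far apart from one another, while the large ones cluster tightly around the true 40%. Two reporters each polling 30 people could honestly file opposite stories.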
A lot of information comes as text and graphics. Here are some of the formats you get: the good ones are Excel, CSV, XML and spreadsheets in general, which a computer can read row by row; the bad ones are Word, PDF, HTML and PowerPoint, which are laid out for human eyes.
Let's say you've been given a few chapters of a novel and you have to guess its genre. You don't have time to read all the chapters. So what do you do? You could flip through the pages, look for keywords such as romance, love, murder, mystery, fight, siblings, business, etc., and make a guess. Or you could feed some of these keywords into a program which basically says that if there are 10 instances of the word 'love' in those pages, the book could lean towards romance as a genre. It could be a murder mystery with romance in it, too. This is not foolproof. But it cuts out some of the options. When you have large data-sets, you need to sort and filter. Here's what you typically do in a media house: you go to the computer geek in the department and ask them to sift through large files in, say, Excel.
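The keyword idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists, the genre names and the function names (genre_scores, guess_genres) are all made up for this example, and the threshold of 10 simply mirrors the "10 instances" rule of thumb from the talk.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real desk would tune these by hand.
GENRE_KEYWORDS = {
    "romance": {"love", "romance"},
    "mystery": {"murder", "mystery"},
    "family saga": {"siblings", "business"},
}

def genre_scores(text):
    """Count how often each genre's keywords appear in the text."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        genre: sum(words[keyword] for keyword in keywords)
        for genre, keywords in GENRE_KEYWORDS.items()
    }

def guess_genres(text, threshold=10):
    """Return the genres whose keyword count reaches the threshold, strongest first."""
    scores = genre_scores(text)
    hits = [genre for genre, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda genre: -scores[genre])

# Stand-in for the chapters you were handed.
sample = "love " * 12 + "murder " * 3
print(genre_scores(sample))
print(guess_genres(sample))
```

Lowering the threshold lets the murder mystery show up alongside the romance, which is exactly the "not foolproof" caveat: the counts narrow the options, and a human still makes the call.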
This is what happened with the Wikileaks release in 2010. Wikileaks leaked a huge US military database from Afghanistan to media around the world. The Guardian, a newspaper in the UK, picked through the 92,000-odd rows of data, sorted and cleaned them, and then discovered certain keywords like IED attacks, ambush attacks, etc. With that in hand, they broke stories about how the war in Afghanistan was bleeding the NATO troops. However, they never compromised the security of the troops.
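The clean, sort and filter routine described above looks roughly like this in Python. This is a sketch of the general technique, not a reconstruction of how any newsroom actually worked: the rows are invented placeholders rather than real records, and the column names (date, category, summary) and function names are assumptions.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows, NOT real records; the column names are assumptions.
RAW = """date,category,summary
2020-03-02, IED Attack ,placeholder report A
2020-01-15,ambush,placeholder report B
2020-03-02, IED Attack ,placeholder report A
2020-02-20,Patrol,placeholder report C
2020-02-11,ied attack,placeholder report D
"""

def load_clean(text):
    """Parse CSV text, tidy stray spaces and capitals, drop exact duplicate rows."""
    seen, rows = set(), []
    for row in csv.DictReader(io.StringIO(text)):
        row = {key: value.strip() for key, value in row.items()}
        row["category"] = row["category"].lower()
        fingerprint = tuple(row.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            rows.append(row)
    return rows

def filter_by_keywords(rows, keywords):
    """Keep only the rows whose category mentions one of the keywords."""
    return [row for row in rows if any(k in row["category"] for k in keywords)]

rows = load_clean(RAW)
attacks = sorted(filter_by_keywords(rows, ["ied", "ambush"]), key=lambda row: row["date"])
counts = Counter(row["category"] for row in attacks)

for row in attacks:
    print(row["date"], row["category"])
print(counts)
```

Cleaning comes first because " IED Attack " and "ied attack" would otherwise be counted as two different things, and the duplicate row would inflate the total.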
One of my favorite people who worked with data is the late Dr Hans Rosling. Do look up Gapminder and his book, Factfulness. If you spend your days scrolling through news and social media sites, you'll feel the world is pretty awful, with hate, greed, climate change, poverty, wars, crimes, deaths, pollution, etc. Dr Rosling shows, with data, that there has never been a better time in human history to be alive, or to live this long. He uses maps, graphs, interactive bubbles, etc., to tell stories through videos about population, public health, economies, growth and climate change. He compares the lives of people who earn $1 (level 1), $4 (level 2), $16 (level 3) and $32 (level 4) a day (which your parents must be earning, otherwise you wouldn't be studying here), showing how different they are at each level, yet how similar they are across the world. Out of a world population of 7 billion, only 1 billion live like us! Yet, this is better than all of human history for tens of thousands of years. Pictures of people in different income groups from around the world have also been collected to create what is called Dollar Street. Do have a look.
When you talk about data journalism in India, even among the media, there is only a vague sense of what it actually entails. Still, one of the best data journalists we have around is Rukmini Shrinivasan. I encourage you to go through her bylines in Huffington Post and The Hindu. There are also websites like Factly.in, Newslaundry and IndiaSpend that actively work with data and put together visualisations to make it easier for the lay person to understand the story. Do check them out.
My effort during this talk has been merely to introduce you to the idea of what data journalism is and how data stories are put together. It's a complex science of storytelling, requiring a combination of skills in statistics, design, artificial intelligence and communications. I hope you do get to scratch the surface some time.