Discussing the ethics, challenges, and best practices of machine learning in journalism

This article was originally published on the Data Journalism Awards Medium Publication managed by the Global Editors Network. You can find the original version right here.

___________________________________________________________________________________________________________________

 

Peter Aldhous of BuzzFeed News and Simon Rogers of the Google News Initiative discuss the power of machine learning in journalism, and tell us more about the groundbreaking work they’ve done in the field, dispensing some tips along the way.

 

Machine learning is a subset of AI and one of the biggest technology revolutions hitting the news industry right now. Many journalists are getting excited about it because of the amount of work they could get done using machine learning algorithms (to scrape, analyse or track data for example). These algorithms enable them to do tasks they couldn’t do before, but their use also raises a lot of questions about ethics and the ‘reliance on robots’.

 

BuzzFeed’s ‘Hidden Spy Planes’

 

Peter Aldhous is the brain behind BuzzFeed News’s machine learning project ‘Hidden Spy Planes’. The investigation revealed how US airspace is buzzing with surveillance aircraft operated for law enforcement and the military, from planes tracking drug traffickers to those testing new spying technology. Simon Rogers is data editor for Google (who’s also been contributing to some great work on machine learning, including ProPublica’s Documenting Hate project which provides trustworthy facts on the details and frequency of hate crimes).

We asked both of them to sit down for a chat on the Data Journalism Awards Slack team.

 

What is it about AI that gets journalists so interested? How can it be used in data journalism?

Peter Aldhous: I think the term AI is used way too widely, and is mostly used because it sounds very impressive. When you say ‘intelligence’, mostly people think of higher human cognitive functions like holding a conversation, and sci-fi style androids.

But as reporters, we’re often interested in finding the interesting things from a mass of data, text, or images that’s too big to go through manually. That’s something that computers, trained in the right way, can do well.

And I think machine learning is a much more descriptive and less pretentious label for that than AI.

Simon Rogers: There is a big gap between what we’ve been doing and the common perception of self-aware machines. I look at it as getting algorithms to do some of the more tedious work.

 

Why and when should journalists use machine learning?

P.A.: As a reporter, only when it’s the right tool for the job — which likely means not very often. Rachel Shorey of The New York Times was really good on this in our panel on machine learning at the NICAR conference in Chicago in March 2018.

She listed things that have solved some problems almost as well as machine learning, in a fraction of the time:

– Making a collection of text easily searchable;

– Asking a subject area expert what they actually care about and building a simple filter or keyword alert;

– Using standard statistical sampling techniques.

 

What kind of ethical/security issues does the use of machine learning in journalism raise?

P.A.: I’m very wary of using machine learning for predictions of future events. I think data journalism got its fingers burned in the 2016 election, failing to stress the uncertainty around the predictions being made.

There’s maybe also a danger that we get dazzled by machine learning, and want to use it because it seems cool, and forget our role as watchdogs reporting on how companies and government agencies are using these tools.

I see much more need for reporting on algorithmic accountability than for reporters using machine learning themselves (although being able to do something makes it easier to understand, and possible to reverse engineer.)

I’m also wary of the black box aspect of some machine learning approaches, especially neural nets. If you can’t explain how your algorithm works to an editor or to your audience, then I think there’s a fundamental problem with transparency.

S.R.: I agree with this — we’re playing in quite an interesting minefield at the moment. It has lots of attractions but we are only really scratching the surface of what’s possible.

But I do think the ethics of what we’re doing at this level are different to, say, developing a machine that can make a phone call to someone.

 

‘This Shadowy Company Is Flying Spy Planes Over US Cities’ by BuzzFeed News

 

 

What tools out there would you recommend in order to run a machine learning project?

P.A.: I work in R. There are also good libraries in Python, if that’s your religion. But the more difficult part was thinking about how to process the data to give the algorithm more to work with. This was key for my planes project. I calculated variables including turning rates, area of bounding box around flights, and then worked with the distribution of these for each plane, broken into bins. So I actually had 8 ‘steer’ variables.

This ‘feature engineering’ is often the difference between something that works, and something that fails, according to real experts (I don’t claim to be one of those). More explanation of what I did can be found on Github.
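To make the idea of feature engineering a little more concrete, here is a minimal sketch in Python of the kind of variables described above: turning rates, the area of a bounding box around a flight, and the distribution of a variable broken into bins. It is not the code from the project (which was written in R). The function names, the bin edges and the sample track are all assumptions made up for illustration, and the bounding box is measured in square degrees for simplicity.

```python
def turning_rates(headings, timestamps):
    """Signed turning rate (degrees per second) between consecutive track points."""
    rates = []
    for i in range(1, len(headings)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            continue  # skip duplicate or out-of-order points
        # Wrap the heading change into [-180, 180) so 350 -> 10 counts as +20, not -340.
        delta = (headings[i] - headings[i - 1] + 180) % 360 - 180
        rates.append(delta / dt)
    return rates


def bounding_box_area(points):
    """Area of the latitude/longitude bounding box, in square degrees (a simplification)."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (max(lats) - min(lats)) * (max(lons) - min(lons))


def binned_shares(values, edges):
    """Share of values in each bin: below edges[0], between edges, and at or above edges[-1]."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[sum(1 for edge in edges if value >= edge)] += 1
    total = len(values)
    return [count / total if total else 0.0 for count in counts]


# Hypothetical track: headings in degrees, timestamps in seconds, positions as (lat, lon).
headings = [0, 10, 20, 350]
timestamps = [0, 10, 20, 30]
positions = [(34.0, -118.5), (34.2, -118.1), (34.1, -118.3)]

rates = turning_rates(headings, timestamps)
features = binned_shares(rates, edges=[-2, 0, 2]) + [bounding_box_area(positions)]
```

Each plane ends up as one row of numbers like `features`, which is what a classifier is trained on. A plane that circles a lot will have most of its turning rates in the same few bins.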

 

There is simply no reliable national data on hate crimes in the US. So ProPublica created the Documenting Hate project.

 

S.R.: This is the big change in AI — the way it has become so much easier to use. So, Google hat on, we have some tools. And you can get journalist credits for them.

This is what we used for the Documenting Hate project:

 

 

It also supports a tonne of languages:

 

 

With Documenting Hate, we were concerned about having too much confidence in machine learning, so we restricted what we were looking for to make sure it was correct.

ProPublica’s Scott Klein referred to it as an ‘over eager student’, selecting things that weren’t right. That’s why our focus is on locations and names, even though we could potentially widen that out significantly.

P.A.: I don’t think I would ever want to rely on machine learning for reporting. To my mind, its classifications need to be ground-truthed. I saw the random forest model used in the ‘Hidden Spy Planes’ story as a quick screen for interesting planes, which then required extensive reporting with public records and interviews.

 

What advice do you have for people who’d like to use machine learning in their upcoming data journalism projects?

P.A.: Make sure that it is the right tool for the job. Put time into the feature engineering, and consult with experts.

You may or may not need a subject matter expert; at this point, I probably know more about spy planes than most people who will talk about them, so I didn’t need that. I meant an expert in processing data to give an algorithm more to work with.

Don’t do machine learning because it seems cool.

Use an algorithm that you understand, and that you can explain to your editors and audience.

Right tool for the job? Much of the time, it isn’t.

Don’t do this because it seems cool. Chase Davis was really good in the NICAR 2018 panel on when machine learning is the right tool:

  • Is our task repetitive and boring?
  • Could an intern do it?
  • If you actually asked an intern to do it, would you feel an overwhelming sense of guilt and shame?
  • If so, you might have a classification problem. And many hard problems in data journalism are classification problems in disguise.

We need to do algorithmic accountability reporting on ourselves! ProPublica has been great on this:

 

But as we use the same techniques, we need to hold ourselves to account.

S.R.: Yep — this is the thing that could become the biggest issue in working with machine learning.

 

What would you say is the biggest challenge when working on a machine learning project: the building of the algorithm, the checking of the results to make sure they’re correct, the reporting around it, or something else?

 

P.A.: Definitely not building the algorithm. But all of the other stuff, plus feature engineering.

S.R.: We made a list:

  • We wanted to be sure, so we cut stuff out.
  • We still need to manually delete things that don’t fit.
  • Critical when thinking about projects like this — the map is not the territory! Easy to conflate amount of coverage with amount of hate crimes. Be careful.
  • Always important to have stop words. Entity extractors are like overeager A students and grab things like ‘person: Man’ and ‘thing: Hate Crime’ which might be true but aren’t useful for readers.
  • Positive thing: it isn’t just examples of hate crimes; it also pulls in news about groups that combat hate crimes and support vandalized mosques, etc.
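The point about stop words can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(type, name)` tuple format, the `STOP_ENTITIES` set and the sample entities are hypothetical, not the output format of the tool used for Documenting Hate.

```python
# Hypothetical stop list: entities that may be correct but are not useful to readers.
STOP_ENTITIES = {("person", "man"), ("thing", "hate crime")}


def drop_stop_entities(entities, stop_entities=STOP_ENTITIES):
    """Keep only extracted entities that are not on the stop list (case-insensitive)."""
    kept = []
    for entity_type, name in entities:
        key = (entity_type.lower(), name.strip().lower())
        if key not in stop_entities:
            kept.append((entity_type, name))
    return kept


# Made-up extractor output for one article.
extracted = [
    ("person", "Man"),
    ("location", "Springfield"),
    ("thing", "Hate Crime"),
    ("organization", "ProPublica"),
]
filtered = drop_stop_entities(extracted)
```

A stop list like this is blunt, so it still needs a human to review what gets through and what gets dropped.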

It’s just a start: more potential around say, types of crimes.

 

Hopes & wishes for the future of machine learning in news?

P.A.: I hope we’re going to see great examples of algorithmic accountability reporting, working out how big tech and government are using AI to influence us by reverse engineering what they’re doing.

Julia Angwin and Jeff Larson’s new startup will be one to watch on this:

 

 

I fear we may see media companies use it as a tool to cut costs by replacing reporters with computers that will do some, but not all, of what a good reporter can do, and to further enforce the filter bubbles in which consumers of news find themselves.

Here’s a provocative article on subject matter experts versus dumb algorithms:

 

 

 

Peter Aldhous tells us the story behind his project ‘Hidden Spy Planes’:

‘Back in 2016 we published a story documenting four months of flights by surveillance planes operated by FBI and Dept of Homeland Security.

I wondered what else was out there, looking down on us. And I realised that I could use aspects of flight patterns to train an algorithm on the known FBI and DHS planes to look for others. It found a lot of interesting stuff, a grab bag of which is mentioned in this story.

But also, US Marshals hunting drug cartel kingpins in Mexico, and a military contractor flying an NSA-built cell phone tracker.’

 

Should all this data be made public?

Interestingly, the military were pretty responsive to us, and made no arguments that we should not publish. Certain parts of the Department of Justice were less pleased. But the information I used was all in the public domain, and could have been masked from the main flight tracking sites. (Actually DEA does this.)

US Marshals operations in Mexico are very controversial. We strongly feel that highlighting this was in the public interest.

 

About the random forest model used in BuzzFeed’s project:

Random forest is basically a consensus of decision tree statistical classifiers. The data journalism team was me, and all of the software was free and open source. So it was just my time.

The machine learning part is trivial. Just a few lines of code.
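For readers wondering what ‘a consensus of decision tree classifiers’ looks like in practice, here is a toy sketch in Python using only the standard library. It is not the model from the project, which was built in R with an existing open source package. To keep it short, each ‘tree’ here is a single split (a stump) trained on a random resample of the data and one randomly chosen feature, and the forest predicts by majority vote. The feature values and labels are invented for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """Fit a one-split tree on one feature, choosing the threshold with the fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] < threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] >= threshold]
        left_label = majority(left) if left else majority(labels)
        right_label = majority(right)  # never empty: the threshold is a value in the data
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] < threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature))
    return forest


def predict_forest(forest, row):
    """The consensus: every stump votes and the most common label wins."""
    return majority([predict_stump(stump, row) for stump in forest])


# Invented training data: (share of time spent turning, bounding box area).
rows = [(0.8, 0.05), (0.7, 0.10), (0.9, 0.08), (0.75, 0.04),
        (0.1, 2.0), (0.2, 3.5), (0.05, 1.5), (0.15, 2.8)]
labels = ["surveillance"] * 4 + ["other"] * 4

forest = train_forest(rows, labels)
circling_plane = predict_forest(forest, (0.95, 0.01))
straight_flyer = predict_forest(forest, (0.01, 5.0))
```

In real use the library does all of this for you, which is why the modelling step is only a few lines. The effort goes into the features and into checking the output through reporting.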

 

 

If you had had a team to help with this, what kinds of people would you have included?

Get someone with experience to advise. I had excellent advice from an academic data scientist who preferred not to be acknowledged. I did all the analysis, but his insights into how to go about feature engineering were crucial.


Marianne Bouchart is the founder and director of HEI-DA, a nonprofit organisation promoting news innovation, the future of data journalism and open data. She runs data journalism programmes in various regions around the world as well as HEI-DA’s Sensor Journalism Toolkit project and manages the Data Journalism Awards competition.

Before launching HEI-DA, Marianne spent 10 years in London where she worked as a web producer, data journalism and graphics editor for Bloomberg News, amongst others. She created the Data Journalism Blog in 2011 and gives lectures at journalism schools, in the UK and in France.

The future of news is not what you think and no, you might not be getting ready for it the right way

This article was originally published on the Data Journalism Awards Medium Publication managed by the Global Editors Network. You can find the original version right here.

_______________________________________________________________________________________________________________________

 

Editors, reporters, and anyone in news today: how prepared are you for what is coming? Really. There is a lot of talk right now about new practices and new technologies that may or may not shape the future of journalism, but are we all really getting ready properly? Esra Dogramaci, member of the Data Journalism Awards 2017 jury and now working as Senior Editor on Digital Initiatives at DW in Berlin, Germany, thinks we are not. The Data Journalism Awards 2017 submission deadline is on 10 April.

 

Esra Dogramaci, Senior Editor on Digital Initiatives at DW, Photo: Krisztian Juhasz

 

Before joining DW, Esra Dogramaci worked at the BBC in London and Al Jazeera English, amongst others. She discusses here the preconceived ideas people have about the future of journalism and how we might be getting it all wrong. She also shares some good tips on how to better prepare for the journalism practices of the future, as well as her vision of how the world of news could learn from the realm of television entertainment.

 

What do you think most people get wrong when describing the future of journalism?

 

There are plenty of people happy to ruminate on the future of journalism: some are highly qualified, such as the Reuters Institute and the Tow Center, who make annual predictions and reports based on data and patterns, while others go with much less than that. Inevitably, people get giddy about technology: what can we do with virtual reality (VR), augmented reality (AR), artificial intelligence (AI), personalisation (not being talked about so much anymore), chatbots, the future of mobile and so on. However, with all this looking forward to where journalism is headed (or rather, how technology is evolving and how journalism can keep pace with it), are we actually setting ourselves and journalism students up with all that is needed for this digital future? I think the answer is no.

 

What is, according to you, a more adequate description (or prediction) of the future of news?

 

If we’re talking about a digital future, the journalists of tomorrow are not equipped with the digital currency they will need.

Technology definitely matters but it’s not so useful when you don’t have people who understand it or can build and implement appropriate strategy to bridge journalism in a digital age. Middle or senior management types for instance, are less likely to know how to approach Snapchat, which they would be less likely to use, than a high school teenager who is using it as a social sharing tool or their primary source of news.

So if we aren’t actually:

1. Listening to our audience and knowing who they are and how they use these technologies, and

2. Bringing in people who know how to use these tools that speak to and with the audience,

…the efforts are going to be laughable at worst and dismissed at best.

In essence, technology and those who know how to use, develop and iterate it go together. That’s the future of news. We should be looking forward with technology, but we’ve also got to look back at the people coming through the system that will inherit and step into the – hopefully relevant – foundations we’re building now.

 

“Are we actually setting ourselves and journalism students up with all that is needed for this digital future?”

 

When looking at the evolution of journalism practices over the past few years, which ones fascinate you the most?

 

There are two things that stand out. The first is analytics and the second is the devolution of power; the two are interrelated.

Data analytics have really transformed non-linear journalism. It’s instantly measurable, helping people make editorial decisions but also question and understand why content you thought would perform doesn’t. Data allows us to really understand our audience, and to come up with content that not only resonates with them but is also packaged in a way they will engage with. For instance, a website audience is not going to be the same as your TV audience (TV is typically older and watches longer content but again the data will tell specifics), so clipping a TV package and sticking it on Facebook or YouTube isn’t optimal and suggests to your audience that you don’t understand these platforms and, more importantly, them. They will go to another news provider that does.

An example of this was a project where it was traditionally assumed [in one of my previous teams] that the audience was very interested in the Palestinian-Israeli conflict and so a lot of stories were delivered about it. However, we discovered through the numbers, on a consistent basis, that the audience wasn’t as interested as assumed; rather, people were more into the conflicts in Syria and Yemen, as well as Morocco and Algeria stories. These stories and audiences may not have traditionally registered on top of the editorial agenda because of what was historically thought to be in the audience’s interest, but our data was suggesting we needed to pay more attention to the coverage in these areas.

Now, that being said, it’s still stunning to see how little analytics are used day to day. There still seems to be a monopoly on the numbers rather than integration into newsrooms. There are a plethora of tools available for making informed editorial or data decisions, but generally editors don’t understand them, or follow metrics that are not useful because they don’t know how to interrogate the data, or we hear things like ‘I’m an editor, I’ve been doing this for x years, I know better.’

Fortunately though, about 80–90% of editors I find are keen to understand this data-driven decision-making world and once you sit down and explain things, they become great advocates. Ian Katz at BBC Newsnight and Carey Clark at BBC HardTalk are two editors who embody this.

The second area is devolving power. The best performing digital teams are those where not all decision-making is consolidated at the top, and you really give people time and space to figure out problems and test new ideas without the pressure always to publish. That’s a very different model to traditional hierarchical or vertical journalism structures. It’s an area of change and letting go of power. But empowering the team empowers leaders as well.

An example of this is a team I worked with where all decisions and initiatives went through a social media editor. As a result, there was a bottleneck, frustration at things not being done, and the team was generally late to the mark on delivering stories and being relevant on platform as competitors were overtaking. What we did was decentralise control: we asked the team what platforms they’d like to take responsibility for (in addition to day to day tasks) and together came up with objectives and a proposition to deliver on those. The result? Significant growth across the board, an increase in engagement but, perhaps most importantly, a happier team. That’s what most people are looking for: recognition, responsibility, autonomy. If you can keep your team happy, they are going to be motivated and the results will follow.

 

Global Headaches: the 10 biggest issues facing Donald Trump, by CNN

 

 

Do you have any stories in mind that represent best what you think the future of newsmaking will look like?

 

CNN digital did this great Global Headaches project ahead of the US elections last year.

The project was on site (meaning that traffic was coming to the site and not a third party platform), made for mobile which would presumably reflect an audience coming mainly from mobile, used broadcast journalists and personalities as well as regular newsgathering, with an element of gamification. Each scenario had an onward journey which then takes your reader out of the game element and into the story.

 

Example from the “onward journey” with the CNN “Global Headaches” project

 

This isn’t a crazy high tech innovation but it is something that would have been much harder to pull off say 5 years ago. This example is multifaceted and making use of the tools we have available today in a smart way. It demonstrates that CNN can speak to the way their audience is consuming content while fulfilling its journalistic remit.

Examples like this don’t mean we should be abandoning long form text, for instance, and going purely for video driven or interactive stories. The Reuters Institute found last year (in their report The Future of Online News Video) that there is oversaturation of video in publishing and that text is still relevant. So, I would caution against throwing the text baby out with the bathwater, which then comes down to two things:

  1. Know your audience and do so by bringing analytics into the newsroom (it’s still slightly mind boggling the number of newsrooms who do not have any analytics in the editorial process)
  2. Come up with a product that you love and that works. The best of these innovations are multidisciplinary and do something simple using the relevant tools we have, that are accessible today. There’s no use investing in a VR project if the majority of your audiences lack the headsets to experience it.

 

Do you think news organisations are well equipped for this digital future?

 

Yes and no. There are the speedboats like Quartz, AJ+, NowThis, Vox, who can pivot quickly and innovate versus the bigger media tankers that turn very slowly. One question I get asked quite a bit is “what’s the most important element in digital change”. The answer is leadership. There needs to be someone(s) who understands, supports and pushes change, otherwise everyone down the ranks will continue to struggle and face resistance.

I truly believe in looking at the people who are on the ground, rolling up their sleeves and getting the work done, trying, failing, succeeding, and who keep persevering — versus always deferring to editors who have been in place for say 10 years to lead the way. Those people in the trenches are the ones we should be shining the light on and listening to. They are much closer to the audience and can give you usable insights that also go beyond numbers.

If I could name a few, people like Carol Olona and Maryam Ghanbarzadeh at the BBC, Alaa Batayneh and Fatma Naib at Al Jazeera, and Jacqui Maher at Conde Nast need to be paid attention to. You may not see them at conferences or showcased much but by having people like them in place, news organisations are well equipped for a digital future.

 

Do you see some places in the world (some specific organisations maybe?) that are actually doing better than others on that front?

 

The World Economic Forum wouldn’t traditionally be thought of as a digital media organisation, but a few years ago they started to invest in social media and develop an audience that normally would not be interested in them. They take data and make it relevant and accessible for low cost, bite size social consumption.

Take this recent video for example:

 

Your brain without exercise, a video by the World Economic Forum
And also this related one:

 

Best of 2016 social video by the World Economic Forum

 

There is also this NYT video of Simone Biles made ahead of the 2016 summer Olympics which then has the option of taking you to an onward site journey.

The Financial Times hasn’t been afraid of digital either. You see them taking interesting risks which might go over a lot of people’s heads but the point is they’re trying. Like in their project “Build your own Kraft Heinz takeover”.

 

 

Then there are the regular suspects — AJ+ isn’t trying to do everything, they’re trying to be relevant for a defined audience on the platforms that audience uses. Similarly, Channel 4 News isn’t pumping out every story they do on social, but deliberately going for emotionally charged stories rather than straight reporting as well as some play with visualising data.

 

What would you like to see more of in newsrooms today which would actually prepare staff better for what’s coming?

 

When you’re hiring new staff, assign them digital functions and projects rather than putting them on the traditional newsroom treadmill. A lot of organisations have entry level schemes and this could easily be incorporated into that model. That demonstrates that digital is a priority from the outset. You could also create in house lightning attachments, say a six-week rotation at the end of which you’re expected to deliver something ready for publishing, driven by digital. My City University students were able to come up with a data visualization in less than an hour, and put together a social video made on mobile in 45 minutes (social or mobile video wasn’t even on the course but I snuck it in). Six weeks in a newsroom is plenty of time for something substantial.

Also, have the right tools in place and ensure that everyone is educated on the numbers. Reach and views, for instance, get thrown around a lot: they are big easy numbers to capture and comprehend, but we need to make a distinction between what is good for PR and what is an actionable metric in the newsroom. As more people clue into what matters, I do think we will see (and already see in certain places, like Newswhip for instance) success being based on engagement, interactions and watchtime rather than views, impressions or reach.

Finally and obviously, it’s devolution of power and more risk taking. Make people better by empowering them: that means carving out the time and space to experiment without the pressure to deliver or publish. When you are continually driving staff against deadlines, creativity suffers. Fortunately there are so many third party tools and analytics that will very quickly tell you what’s working and what’s not, contributing to a much more efficient newsroom and freeing up valuable time to think and experiment. Building multidisciplinary teams is a good step in this direction. DW is experimenting with a “lab like” concept bringing together editorial, technical and digital folks in an effort to bring the best of all worlds together and see what magic they come up with.

 

From your experience teaching social and digital journalism at City University London, what can you say about the way the younger generation of journalists is being trained for the future? Do they realise what’s at stake?

 

At the beginning of term, I heard quite a few students say that digital didn’t matter, it wasn’t “real journalism” and that they were taking the class merely because it was perceived as an “easy pass”. That’s because the overall coursework emphasized magazine and newspaper journalism. At the end of the term, and almost on a weekly basis since, my former students write to me about either digital projects they have done, digital jobs they are going for or how something we went over in the class has led to another opportunity.

There remains a major emphasis on traditional broadcast journalism (TV, radio, print), but very little on digital. That’s not something to fault students on. Digital is changing constantly but teaching staff mainly reflect the expertise of the industry, and that expertise is traditional. While there are a lot of digital professionals, it does not come close to the level of expertise and experience currently on offer at institutions training the next journalist generation. That being said, organisations like Axel Springer have journalism academies where all of their instructors are working full time in media and can translate the day to day relevance into the classroom. That’s more of the kind of thing we need to have.

The students, I think, do realise what’s at stake because a lot of the journalism jobs they’re applying for require some level of digital literacy. Sure, everyone might watch a YouTube video, but what happens when an editor asks you why a news video has been uploaded and monetised by other users elsewhere? Would you know what to do?

 

What could be done to improve the educational system in the UK and beyond? Simply make journalism courses more digitally focussed?

 

There is nothing that will compel places to change but reputation. If students are leaving institutions because what they are learning is not preparing them to meet the demands of the industry they’re choosing to go into, word will spread sooner rather than later. There will surely be visionary institutions who ‘get it’ and adapt; some are there already.

‘Smart’ places will build in digital basics so students can have the confidence to hit the ground running. I see this in a lot of digital job requirements. It’s a given that anyone starting in journalism in 2017 has basic social media literacy. Beyond that everything is a bonus: how can you file from a mobile phone, can you interpret complex data and tell a story with it? Then, are you paying attention to analytics?

As Chris Moran (Guardian) had pointed out:

 

“staff blame the stupid internet for low page views on a piece…but credit the quality of the journalism when one hits the jackpot.”

We need a much more sophisticated understanding beyond yes/no answers to points like these.

A lot of media houses have academies or training centres that are also expected to bridge digital gaps. The caution there is that, in the training they offer, any digital knowledge beyond the CMS, uploading video, etc. seems to fall into the “nice to know” rather than the “you need this” category. The best thing is to find the in-house talents who know what they’re talking about and get them to lead the way.

 

Another recurrent question when talking about our digital future is the question of business models for news organisations. As the latter are under continual financial strain, you actually think we should get inspiration from the entertainment industry. Can you elaborate on this idea?

 

Yes. The entertainment industry always has a much larger creative capacity and funding so they are able to take more risks with less at stake. That’s where we should be looking and seeing what the obvious news applications could be rather than trying to build our own innovations all the time. Most news houses just cannot compete with entertainment budgets. Jimmy Fallon showcased Google Tilt Brush in January 2016:

 

 

https://www.youtube.com/watch?time_continue=2&v=Dzy7ydbEyIk

 

 

I then saw it in November 2016 at a Google News event but have yet to see anyone use it in a meaningful news application. It doesn’t necessarily mean that all these things will be picked up on, but it does mean we should keep a finger on the pulse of what’s possible. Matt Danzico, now setting up a Digital News Studio at NBC is in a unique position. He’s in the same building as Late Night, SNL, and others. That means he has access to all the funky things entertainment is coming up with and can think about news applications for it.

Similarly, how can news organisations think about teaming up with Amazon or Netflix for instance and start to make their content more accessible? These media giants have the capacity to push creative boundaries and invest, and news organisations have their journalistic expertise to offer in that relationship. That’s very relevant in this time of “fake news”.

 

You have recently been appointed Senior Editor of Digital at DW in Berlin. Can you tell us more about what this position entails and the type of projects you’ll be doing? How different is it from what you’ve done in the past at the BBC and Al Jazeera for example?

 

DW is in a position familiar to many broadcasters, and that is a slight shift away from linear broadcasting to a considerable foray into digital. The difference is that DW is not starting from zero, with plenty of good (and bad) examples around to learn from. The first thing is to set a good digital foundation — getting the right tools in house and bringing people along on the digital journey — in a nutshell increasing literacy and comfort with digital. Once that is done I think you’ll see a very sharp learning curve and a lot more ambitious digital projects and initiatives coming from DW.

We’re very lucky that we have a new Editor in Chief, Ines Pohl, and a new head of news, Richard Walker, both infused with ideas and energy for making a great digital leap. Complementary to that, we have a new digital strategy coming from the DG’s office, which I’ve been involved with, in addition to a new DW “lab like” concept, as I mentioned before. A lot of people might not know how big DW is: there are 30 language services and English is the largest of those, so getting all systems firing digitally is no small task.

Compared to the BBC or AJ, the scope and scale of the task is of course much bigger. At AJ we had a lot of free rein in the beginning because no one was doing what we did; at the BBC there was much more process involved and less risk taking. Based on those experiences, DW is somewhere in the middle, a good balance. 2017 could be the year where the stars align for DW. There are approximately 12 parliamentary or national elections in Europe and DW knows this landscape well. So bringing together the news opportunities, a willingness to evolve and invest in something new, along with leadership that can really drive it, I think DW will be turning heads soon.

 



Marianne Bouchart is the founder and director of HEI-DA, a nonprofit organisation promoting news innovation, the future of data journalism and open data. She runs data journalism programmes in various regions around the world as well as HEI-DA’s Sensor Journalism Toolkit project and manages the Data Journalism Awards competition.

Before launching HEI-DA, Marianne spent 10 years in London where she worked as a web producer, data journalism and graphics editor for Bloomberg News, amongst others. She created the Data Journalism Blog in 2011 and gives lectures at journalism schools, in the UK and in France.

 

How three women are influencing data journalism and what you can learn from them

This article was originally published on the Data Journalism Awards Medium Publication managed by the Global Editors Network. You can find the original version right here.

________________________________________________________________________________________________________________________

 

Stephanie Sy of Thinking Machines (Philippines), Yolanda Ma of Data Journalism China and Esra Dogramaci of Deutsche Welle, formerly Al Jazeera (Germany), new members of the Data Journalism Awards jury, talk innovation, data journalism in Asia and the Middle East, and women in news.

left to right: Yolanda Ma (Data Journalism China), Esra Dogramaci (Deutsche Welle, formerly BBC and Al Jazeera), and Stephanie Sy (Thinking Machines) join DJA Jury

 

We welcomed three new members to the Data Journalism Awards jury last year (pictured above). They are all women, strong-willed and inspiring women, and they represent two regions that are often overlooked in the world of data journalism: Asia and the Middle East.

What was your first project in data journalism or interactive news and what memory do you keep from it?

Esra Dogramaci: In 2012, Invisible Children launched a campaign to seek out Lord’s Resistance Army (LRA) leader Joseph Kony and highlight the exploitation of child soldiers. Then, at Al Jazeera, we wanted to see what people in northern Uganda, who lived in one of the areas affected by the LRA, actually had to say about it. They would ‘speak to tweet’ and we would map their reactions on Ushahidi using a Google Fusion table in the background.

 
Uganda Speaks by Al Jazeera

 

Although Al Jazeera had started doing this kind of project back in 2009 during the war on Gaza (the experiment’s page of the Al Jazeera Lab website has now disappeared but can be viewed through WebArchive.org), it picked up steam during Egypt’s 2011 Arab Spring where, due to a lack of broadcast media coverage, protesters were using social media to bring attention to what was happening.

Interactive story by Thinking Machines

 

Stephanie Sy: Our first data journalism project as a team at Thinking Machines was a series of interactive stories on traffic accidents in Metro Manila. We cleaned and analysed a set of Excel sheets of 90,000 road accidents spanning 10 years.

It was the first project we worked on as a mixed team of journalists, designers, and data scientists, and the first time we tried to build something from scratch with d3.js! I worked on the d3 charts, and remember being in utter despair at how hard it was to get the interactive transitions to render nicely across different browser types. It was surprisingly well received by the local civic community, and that positive feedback emboldened us to keep working.

 
Connected China, Thomson Reuters

 

Yolanda Ma: One of my first projects was Connected China for Thomson Reuters, which tracked and visualised the people, institutions and relationships that form China’s elite power structure (learn more about it here).

This project taught me the importance of facts: every piece of data in it (thousands, if not millions in total) went through a rigorous fact-checking process (by human beings, not machines, unfortunately). I learned by doing that facts are the bones of data journalism, not fancy visualisations, even though this project turned out to be fancy and cool, which is good too.

 

Now, what was the latest project you worked on and how do the two compare?

 

ED: Towards the end of last year, I taught a data journalism module to City University London Master’s students, who were able to pull together their own data visualisation projects in the space of an hour. The biggest difference is how vastly the interfaces have improved and how quick and intuitive the design and interactive software is now. A lot more companies are switched on to storytelling beyond TV or text. With all that knowledge combined, how do you stand out in the world of online news?

Complementary to that, Al Jazeera was always a front runner because they were willing to take risks and try something new when no one else was. In the newsrooms I’ve worked at or seen since, there is still a general aversion to risk taking and a preference for safety, though everyone knows that to survive and thrive in this digital media landscape, it’s risk taking and innovation that are going to push those boundaries and really get you places.

SS: Our latest related data story is a piece we put together visualising traffic jams across Metro Manila during the holiday rush season. This time we were looking at gigabytes of Waze jams data that we accessed through the Waze API. It definitely grew out of our early work in transit data stories, but reflects a huge amount of growth in our ability to handle complex data, and in our understanding of what appeals to our audience.

One big piece of learning we got from this is that our audience in the Philippines mainly interacts with the news through mobile phones and via Facebook, so complex d3 interactives don’t work for them. What we do now is to build gifs on top of the interactives, which we then share on Facebook. You can see an example of that in the linked story. That gets us a tremendous amount of reach, as we’re able to communicate complex results in a format that’s friendly for our audience.

YM: I’ve mostly been doing data journalism training in the past few years and helping others do their data projects, so nothing comparable really. The latest project I worked on is this Data Journalism MOOC with HKU in partnership with Google News Lab. It is tailor-made for practitioners in Asia, and it’s restarting soon (begins March 6), so go on and register before it’s too late!

 

What excites you about the future of data journalism and interactive news?

 

ED: The ability to tell stories in a cleaner, more engaging way. Literally everything can be turned into a story just by interrogating the data, being curious and asking questions. The digital news world has always been driven by data and it’s exciting to see how “traditional” journalism is embracing this more. I love this example from Berliner Morgenpost, where they charted a bus line in Berlin and combined dash cam footage with various data such as demographics and voting. It’s an ingenious way of taking complex data and presenting it in a meaningful, engaging way rather than as pie charts.

M29 from Berliner Morgenpost

 

SS: There are tremendous amounts of data being generated in this digital age, and I think data journalism is a very natural evolution of the field. Investigative journalists should be able to use computer science skills to find their way through messy datasets and big data. It’s absolutely reasonable to expect that a news organization might get a 1 terabyte dump of files from a source.

YM: It excites me because it is the future. We live in the age of data, and the inevitably increasing amount of data available means there is huge and growing potential for data journalism. People’s news consumption is also changing and I believe personalisation is one of the key characteristics for the new generation of consumers, which means interactive news, interactive in many different ways, will thrive.

 

How are Asian and Middle Eastern media organisations (depending on your experience) doing in terms of data journalism and interactive news compared to the rest of the world?

 

ED: I think Al Jazeera has always been a pioneer in this. They have a great interactive team that drew together people from various disciplines within the organisation — coders, video people, designers, journalists — before everyone else was doing it and they’ve been able to shed light on stories that wouldn’t usually be picked up on by mainstream media radars.

Example that illustrates my point: The project “Broken homes, a record year of home demolitions in occupied East Jerusalem” by Al Jazeera

“Broken homes, a record year of home demolitions in occupied East Jerusalem” by Al Jazeera

 

SS: We have a few media organisations like the Philippine Center for Investigative Journalism, Rappler, and Inquirer who have been integrating data analysis into their reporting, but there isn’t anyone regularly producing complex data journalism pieces.

Our key problem is the lack of useful datasets. A huge amount of work goes into acquiring, cleaning, and triple checking the raw data. Analysis is “garbage in, garbage out” and we can’t create good data journalism without good data. This is where the European and North American media organisations have an edge. Their governments and civil society organisations follow open data standards, and citizens can request data [via FOIA]! The Philippine government has been making serious progress towards more open data sharing, and I hope they’re able to sustain that commitment.

Example that illustrates my point: PCIJ’s Money Politics project is a great example of an organisation doing the data janitorial work of acquiring and validating hard-to-find data. During our last presidential elections in 2016, GMA News Network and Rappler both created hugely popular live election-tracking data stories.

PCIJ’s Money Politics

 

YM: Media organisations in Asia are catching up on data journalism and interactive news. There are some challenges of course: for example, a lack of data in less developed countries, a lack of skills and talent (and limited training opportunities), and even poor infrastructure or unstable internet, especially in rural areas, that would limit the presentation of news stories. Despite the difficulties, we do see good work emerging, though not necessarily in English. Check out some of the stories from GIJN’s last Investigative Journalism Conference held in Nepal and you’ll get an idea.

Example that illustrates my point: This Caixin Media data story analysed and visualised the property market in China for the past few years.

 

Another New Normal, Caixin Media

 

What view do you have on the role of women in the world of news today? How is it being a woman in your respective work environment? Do you feel it makes a difference? If so, which one and why?

 

ED: Women are underrepresented not just in news coverage but in leadership positions too. I have to admit though that being at Deutsche Welle, I see a lot more women in senior management and it feels like a much more egalitarian working environment. However, looking at my overall experience as a woman in news, you do face a lot of sexism and prejudice. Every woman I know has a story to tell, and when the latest story about Uber came out, a lot of my female colleagues around me were nodding their heads.

What got me through challenging times is having a fantastic network of female role models and mentors who are there to support you. That was one piece of advice I gave to prior teams: get a mentor. A lot of women feel isolated or feel the way they are treated is normal, but it’s not. Women should also be aware that there is a real risk you will be punished if you speak up, challenge the status quo and don’t toe the party line. If this happens, it’s an environment or team you probably shouldn’t be in anyway.

SS: It’s alarming to see parties around the world trying to stifle the voices of anyone who doesn’t belong and calling any news that doesn’t flatter them “fake news”. It’s important for us to speak up as women, and to practice intersectionality when it comes to other marginalised communities. As people who work with data, we can see past the aggregates and look at the complex, messy truth. We must be able to communicate that complexity in order for our work to make a difference.

YM: Most of the data journalism teams in China are led by women, and I think they are doing really well 🙂

 

What do you think makes a great data journalism project? What will you be looking for when marking projects for the Data Journalism Awards this year?

 

ED: Simplicity. It’s easy to get lost in data and try to do too much, but it’s often about taking something complex and making it accessible for a wider audience, getting them to think about something they haven’t, or perhaps to consider it in a different way. I’ll be looking for the why: why does this matter, does this story or project make a dent in the universe?

After all, isn’t that what telling stories is about? The other thing that comes through is passion. It may sound obvious, but you can tell when a person or team has cared and really invested in the work, versus projects being rolled off a conveyor belt.

SS: A great data journalism project involves three things: novel data, clever analytical methods, and great communication through the project’s medium of choice. I’m hoping to see a wide variety of mediums this year!

Will someone be submitting an audio data journalism project? With all the very exciting advances in the field of artificial intelligence this year, I’m also hoping to see projects that incorporate machine learning and artificial intelligence.

YM: I believe data journalism is after all journalism: it has to reveal truth and tell stories, based on or driven by data. I’ll be looking for stories that do make an impact in one way or another.

 

If you had one piece of advice for people applying for the Data Journalism Awards competition, what would it be?

 

ED: Don’t be intimidated by the competition or past award winners. Focus on what you do best. I say this especially for those applying for the first time, as I see a lot of hesitation and negative self-talk of ‘I’m not good enough’ etc. In every experience there’s something to learn, so don’t hesitate.

SS: Don’t forget to tell a story! With data science methods, it’s easy to get lost in fancy math and lose track of the narrative.

YM: Tell us a bit about the story behind your story — say, we may not know how hard it might be to get certain data in your country.

 

What was the best piece of advice you were ever given in your years of experience in the media industry?

 

ED: Take every opportunity. That’s related to a quote that has been coming up over and over again for the past week or so, “success is when preparation meets opportunity.”

SS: One of my best former bosses told me to imagine that a hungover, unhappy man with a million meetings that day was the only reader of my work. He haunts me to this day.

YM: I started my career with the ambition (like many idealistic young people) to change China. My first (and second) boss Reg Chua once said to me, don’t worry about changing China but focus on making small changes and work with a long-term vision. Sounds cliche.

He said that to me in 2012. The next year, together with two other friends, I started DJChina.org, which began in 2013 as a small blog and has now grown to be one of the best educational platforms for data journalism practitioners in China. The year after, in 2014, Open Data China was launched (using the domain name I registered a few years back), and signalled a bottom-up movement to push for more open data, which was incorporated into national policy within a year. So I guess all this proved that Reg was right, and it could be applied anywhere, or to anything. Think big, act small, one story (or project) at a time, and changes will happen.

 


left to right: Yolanda Ma (Data Journalism China), Esra Dogramaci (Deutsche Welle, formerly BBC and Al Jazeera), and Stephanie Sy (Thinking Machines)

 

Stephanie Sy is the founder of Thinking Machines, a data science and data engineering team based in the Philippines. She brings to the jury her expertise in data science, engineering and storytelling.

Yolanda Ma is the co-founder of Data Journalism China, one of the best educational platforms for data journalism practitioners in China. Not only does she represent the biggest country in Asia, she also has experience teaching data skills to journalists and a great knowledge of data journalism in her region.

Esra Dogramaci has now joined Deutsche Welle and formerly worked with the BBC, Al Jazeera in Qatar and Turkey, as well as the UN Headquarters and UNICEF. She brings to the DJA jury significant experience in digital transformation across news and current affairs, particularly in social video and off platform growth and development.

 


The Data Journalism Awards are the first international awards recognising outstanding work in the field of data journalism worldwide. Started in 2012, the competition is organised by the Global Editors Network, with support from the Google News Lab, the John S. and James L. Knight Foundation, and in partnership with Chartbeat. More info about cash prizes, categories and more, can be found on the DJA 2017 website.

