Data-driven journalism and structuring information at Circa
The assumed output of a reporter is the “article.” That’s what reporters are supposed to produce during their workday, and it’s the default unit by which journalists organize their data. There’s plenty of information in the text that’s produced, but how much of that information is structured? In a typical content management system (CMS) you’ll find a headline field, a main text field, information about the article’s creator, the date it was created and maybe a field for some meta-tags (usually basic nouns), included as an afterthought, often for SEO purposes.

The value of journalism comes from filtering things out of the flow of information and serving them up to readers. But those basic CMS fields fail to capture much of the information a reporter gathers during the reporting process. If you asked a reporter about the information in an article, you’d get specifics: it contains a quote from the mayor, some statistics about government spending, the announcement of a new zoning permit, a description of a local event, and so on. But that information is adrift inside the article. Without structure it’s lost, except for the ability to search for a string of words in Google.
At Circa we do things differently. The process of creating a story requires the writer to tag information in a structured way. If we insert a quote, we have two extra fields for the name of the person quoted and an alias — their working title. As a result, I can ask our chief technology officer to search our database for all the quotes we have from, say, Eric Holder. I can also ask to have that search refined by date(s) or topics: “Give me all the Eric Holder quotes from the last six months that are associated with the IRS. Also, I’d like all the aliases we’ve used for him.”
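To make the idea of a structured quote concrete, here is a minimal sketch, assuming a quote is stored as its own record with speaker, alias, publication date and topic tags. The class and function names (`Quote`, `find_quotes`, `aliases_for`) are hypothetical and are not Circa's actual schema; the quote texts and dates in the sample archive are invented placeholders.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Quote:
    """One quote stored as a structured record, not loose text in a body field."""
    text: str
    speaker: str    # name of the person quoted
    alias: str      # the speaker's working title when quoted
    published: date
    topics: set[str] = field(default_factory=set)


def find_quotes(quotes, speaker, since=None, topic=None):
    """Quotes by `speaker`, optionally limited to a start date and a topic."""
    return [
        q for q in quotes
        if q.speaker == speaker
        and (since is None or q.published >= since)
        and (topic is None or topic in q.topics)
    ]


def aliases_for(quotes, speaker):
    """Every distinct alias recorded for `speaker`, in the order first seen."""
    return list(dict.fromkeys(q.alias for q in quotes if q.speaker == speaker))


# Invented sample archive: quote texts and dates are placeholders.
archive = [
    Quote("(placeholder quote 1)", "Eric Holder", "Attorney General", date(2013, 5, 15), {"IRS"}),
    Quote("(placeholder quote 2)", "Eric Holder", "AG", date(2013, 3, 2), {"immigration"}),
    Quote("(placeholder quote 3)", "Eric Holder", "Attorney General", date(2012, 6, 20), {"IRS"}),
]

# A fixed "today" keeps the six-month window reproducible.
as_of = date(2013, 8, 1)
six_months_ago = as_of - timedelta(days=182)

recent_irs = find_quotes(archive, "Eric Holder", since=six_months_ago, topic="IRS")
print([q.text for q in recent_irs])         # ['(placeholder quote 1)']
print(aliases_for(archive, "Eric Holder"))  # ['Attorney General', 'AG']
```

Because speaker, alias, date and topics are separate fields, the editor's request becomes a simple filter rather than a full-text hunt through article bodies.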
In a newsroom where data is unstructured this task would be incredibly time-consuming if not impossible. But because our content is structured, at Circa it’s simply a matter of asking.
The CMS or platform that a news organization uses to create content isn’t neutral. Decisions made in building or configuring that CMS define the way news is displayed later. If an input field for the “location” of an event doesn’t exist, then the only way to surface all events that took place at a specific location is to conduct a painstaking search through the blobs of words that exist in the main content field of articles.
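The difference a location field makes can be sketched in a few lines, assuming a hypothetical `Event` record with an optional structured `location` field alongside the body text. The place names and articles below are invented. The structured index returns exactly the events held at a place, while scanning bodies both picks up a mere mention and misses an article that never spells out the name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    headline: str
    body: str
    location: str  # hypothetical structured field filled in by the reporter


def index_by_location(events):
    """Group headlines by the structured location field."""
    index = defaultdict(list)
    for e in events:
        index[e.location].append(e.headline)
    return dict(index)


def search_bodies(events, phrase):
    """Without a location field: scan every body for the phrase."""
    return [e.headline for e in events if phrase.lower() in e.body.lower()]


# Invented sample articles.
events = [
    Event("Farmers market opens", "The market opened Saturday at Civic Plaza.", "Civic Plaza"),
    Event("Council debates parking", "Members said Civic Plaza needs more spaces.", "City Hall"),
    Event("Jazz night draws crowd", "Hundreds gathered at the plaza downtown.", "Civic Plaza"),
]

print(index_by_location(events)["Civic Plaza"])
# ['Farmers market opens', 'Jazz night draws crowd']
print(search_bodies(events, "Civic Plaza"))
# ['Farmers market opens', 'Council debates parking']
```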
The job of a reporter is to collect, filter, organize and then deliver information. Shouldn’t a CMS capture the level of detail that we invest in that process from the start?
Why do we always invoke the idea of narrative structure over structured data?
Here’s something Ezra Klein wrote in discussing his move to his new venture at Vox: “The software newsrooms have adopted in the digital age has too often reinforced a workflow built around the old medium. We’ve made the news faster, more beautiful, and more accessible. But in doing so we’ve carried the constraints of an old technology over to a new one.” As Steve Buttry leads “Project Unbolt,” I suspect one of the barriers Digital First Media will need to confront is that its CMS is designed to produce articles, an increasingly arcane way of structuring information.
Data-driven journalism is, of course, a growing movement. The best-understood example of data-driven journalism is the crime map: we collect the location/type of crimes and then overlay that information on a map. Because there’s structure to the information, we can surface greater meaning from it.
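The crime-map example rests on the data having structure before it reaches the map. A minimal sketch, assuming each report carries latitude, longitude and a crime type: it buckets reports into a square grid and counts them per cell and type, which is the kind of table a mapping layer could then plot. The grid size, the `CrimeReport` type and the coordinates are all hypothetical; no actual mapping library is used here.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class CrimeReport:
    lat: float
    lon: float
    kind: str  # e.g. "burglary", "theft"


CELL_SIZE = 0.01  # degrees; hypothetical grid resolution (about 1.1 km north-south)


def cell_for(lat, lon, size=CELL_SIZE):
    """Integer grid cell containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def tally(reports):
    """Count reports per (grid cell, crime type)."""
    counts = Counter()
    for r in reports:
        counts[(cell_for(r.lat, r.lon), r.kind)] += 1
    return counts


# Invented sample reports.
reports = [
    CrimeReport(41.6525, -83.5375, "burglary"),
    CrimeReport(41.6531, -83.5372, "burglary"),
    CrimeReport(41.6533, -83.5361, "theft"),
    CrimeReport(41.6712, -83.6001, "burglary"),
]

counts = tally(reports)
print(counts.most_common(1))  # [(((4165, -8354), 'burglary'), 2)]
```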
Maybe this is a use for tagging at the paragraph level. “Paratags” is what I called this function in my Parula code that powers ToledoTalk.com. But in this code, Junco, I can place hashtags anywhere. And maybe these hashtags should display within the article for the user, but in a slightly smaller, subdued font.
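Paragraph-level hashtags could work roughly like this. It is a minimal sketch, not the Parula or Junco code: it assumes paragraphs are separated by blank lines and that a hashtag is `#` followed by word characters. It records which paragraphs carry each tag and wraps tags in a `<span>` so a stylesheet can render them smaller and muted. The `hashtag` CSS class and the function names are hypothetical.

```python
import re
from collections import defaultdict

# "#" plus word characters, not preceded by a word character (skips "abc#def").
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def paragraph_tags(text):
    """Map each lowercase hashtag to the indices of paragraphs containing it."""
    index = defaultdict(list)
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        # dict.fromkeys drops repeat tags within one paragraph, keeping order.
        for tag in dict.fromkeys(t.lower() for t in HASHTAG.findall(para)):
            index[tag].append(i)
    return dict(index)


def render_paragraph(para):
    """Keep hashtags visible but wrap them so CSS can shrink and mute them."""
    return HASHTAG.sub(lambda m: f'<span class="hashtag">#{m.group(1)}</span>', para)


text = ("Council approved the budget. #budget #council\n\n"
        "The mayor said taxes will not rise. #budget #mayor")

print(paragraph_tags(text))
print(render_paragraph("Vote set for May. #council"))
```

Storing the tag-to-paragraph map alongside the article is what would let a query pull out individual paragraphs, rather than whole articles, for a given tag.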
By JR