Natural Language Generation Technology is Having an Increasing Impact on Journalism


By law, public companies in the United States report their earnings every quarter. By tradition, reporters work through those reports each quarter to extract financial data for business news articles. This approach has two fundamental limitations. First, there are currently more than 4,000 publicly traded companies in the U.S. and almost 40,000 around the world. With its existing resources, even an organization as large as the Associated Press (AP) could produce only about 300 such stories per quarter in a timely fashion, leaving thousands of potential earnings stories unwritten.

Second, even though these stories are relatively easy to write, they involve a lot of repetitive copying and pasting of numbers, something even trained reporters find difficult to do with perfect accuracy. The work also forces highly skilled reporters to spend their time on formulaic stories when their talents could be put to better use elsewhere.

Earnings reports are not the only formulaic stories in media. Most sports journalism centers on scores and other game statistics. In American college football alone, the 128 teams in the NCAA Football Bowl Subdivision (FBS) each play an average of 12 games per regular season. Together that is 1,536 team-games; because a game between two FBS teams counts once for each side, the number of distinct games is smaller (768 if every opponent were also an FBS team), and semifinal and championship games are not included. The FBS is also only one part of Division I. Add the other football divisions and other NCAA sports such as basketball, baseball, and hockey. Add the levels at which these sports are played beyond college, including professional, masters, high school, and youth leagues. Add teams and sports from around the world. Finally, add fantasy football and other synthetic sports leagues, and the number of potential games to cover is enormous. Interest in these stories is high, but with so much to write about, most games go unreported.

To produce earnings stories more quickly, comprehensively, and accurately, the AP turned to natural language generation (NLG) technology from Automated Insights, based in Durham, North Carolina, to automate the writing of formulaic stories while ensuring that each one met AP’s exacting standards. After a successful initial project with earnings reports, the AP extended its use of Automated Insights’ NLG platform to sports reporting.

Automated Insights is the creator of Wordsmith, an NLG tool that turns raw data into human-friendly prose so that data can be adapted into news stories and reports. The structured data can include both numerical data and text data, such as proper nouns or search engine optimization (SEO) keywords. The data, in the form of rows and columns, is sent to Wordsmith, which maps each row onto a narrative structure containing rules for how the story should vary with the data. The data analysis and writing style of each narrative depend on the use case and are determined in advance by the person designing the narrative. Some of these variations concern color and tone; for example, losses can be described in negative or even pejorative terms, and gains in positive or even glowing terms. Wordsmith chooses the appropriate text according to this programmed logic. As a result, hundreds or thousands of stories can be produced in rapid succession, each one different.
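The workflow described above, where each row of structured data is matched to a narrative whose wording branches on the data, can be illustrated with a minimal Python sketch. This is not Wordsmith’s API or Automated Insights’ implementation; the `EarningsRow` fields, the helper names, and the 20% tone thresholds are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class EarningsRow:
    """One row of structured input; field names are hypothetical."""
    company: str
    quarter: str
    eps: float        # earnings per share this quarter
    eps_prior: float  # earnings per share in the year-ago quarter


def dollars(value: float) -> str:
    """Format a per-share amount, putting any minus sign before the $."""
    return f"-${-value:.2f}" if value < 0 else f"${value:.2f}"


def change_verb(pct: float) -> str:
    """Pick a verb whose tone matches the direction and size of the change.

    The 20% thresholds are assumptions a narrative designer might choose.
    """
    if pct >= 20:
        return "surged"
    if pct > 0:
        return "rose"
    if pct == 0:
        return "were unchanged"
    if pct > -20:
        return "fell"
    return "plunged"


def render(row: EarningsRow) -> str:
    """Fill the narrative template for one row of structured data."""
    lead = f"{row.company} said its {row.quarter} earnings"
    if row.eps_prior == 0:
        # A percentage change is undefined here, so fall back to a plain statement.
        return f"{lead} were {dollars(row.eps)} per share."
    pct = (row.eps - row.eps_prior) / abs(row.eps_prior) * 100
    verb = change_verb(pct)
    if verb == "were unchanged":
        return f"{lead} {verb} at {dollars(row.eps)} per share."
    return f"{lead} {verb} {abs(pct):.0f}% to {dollars(row.eps)} per share."


if __name__ == "__main__":
    rows = [
        EarningsRow("Acme Corp.", "third-quarter", 1.25, 1.00),
        EarningsRow("Globex Inc.", "third-quarter", 0.45, 0.60),
    ]
    for row in rows:
        print(render(row))
```

Run as a script, this prints "Acme Corp. said its third-quarter earnings surged 25% to $1.25 per share." followed by "Globex Inc. said its third-quarter earnings plunged 25% to $0.45 per share." The same template yields different tone for each row because the verb is chosen by rules fixed in advance, which is the kind of programmed logic the paragraph above describes.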

The AP saw so much value in using Wordsmith that it invested in Automated Insights in 2014, prior to the company’s acquisition by Vista Equity Partners and STATS in February 2015.

Automating repetitive writing tasks is as cost-effective as automating any other repetitive task. For media, financial, and e-commerce companies, for example, NLG allows staff to concentrate on tasks that require human participation, such as sales and customer service. NLG lets data be disseminated at very high speed, eliminates errors caused by human boredom and fatigue, and can vary its wording so that stories read as fresh rather than mechanical. Producing a large volume of highly customized articles on a variety of obscure events becomes cost-effective.
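One way to keep thousands of generated stories from reading identically is to let the narrative choose among equivalent phrasings. The following Python sketch is an assumption about how such variation could work, not a description of Wordsmith: the `VARIANTS` table, the `pick` and `opening_sentence` helpers, and the game fields are all hypothetical.

```python
import hashlib

# Hypothetical phrasing variants for one slot of a game-recap narrative.
# The wording and field names are illustrative assumptions.
VARIANTS = {
    "opening": [
        "{team} beat {opponent} {score} on {day}.",
        "{team} defeated {opponent} {score} on {day}.",
        "{team} topped {opponent} {score} on {day}.",
    ],
}


def pick(slot: str, story_key: str) -> str:
    """Choose one variant for a slot, deterministically for each story.

    Hashing the story key instead of calling random.choice means that
    regenerating the same story reproduces the same wording, while
    different stories are spread across the available phrasings.
    """
    options = VARIANTS[slot]
    digest = hashlib.sha256(f"{slot}:{story_key}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(options)
    return options[index]


def opening_sentence(game: dict) -> str:
    """Render the opening sentence of a recap for one game record."""
    key = f"{game['date']}|{game['team']}|{game['opponent']}"
    return pick("opening", key).format(**game)


if __name__ == "__main__":
    games = [
        {"date": "2015-09-05", "day": "Saturday", "team": "State",
         "opponent": "Tech", "score": "24-17"},
        {"date": "2015-09-12", "day": "Saturday", "team": "State",
         "opponent": "College", "score": "31-10"},
    ]
    for game in games:
        print(opening_sentence(game))
```

Seeding a shared random generator would also work, but hashing each story’s own key avoids shared state and keeps a story’s wording stable if it is regenerated after a data correction.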

Staff accustomed to the stress and boredom of producing routine reports on a short timeline also benefit from being able to focus on more satisfying projects. Customers benefit by receiving information tailored to their needs and expectations. With NLG, it’s possible to produce content on a massive scale where each individual piece is personalized to a specific user.

Several challenges still stand in the way of wider NLG adoption, however. The data must be accurate. It must also be structured so that it maps onto the copy that will be produced. And the narrative structure must be designed correctly to produce the desired content for the intended audience.
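The first two requirements, accurate data and data structured to match the copy, suggest checking each row before it reaches the narrative. The following Python sketch is a hypothetical illustration, not part of any NLG product: the field names, type rules, and plausibility checks are assumptions, and rows that fail are set aside for a human editor rather than published automatically.

```python
from typing import Any

# Hypothetical schema: the fields a game-recap narrative consumes and the
# type each must have. Field names are illustrative assumptions.
REQUIRED_FIELDS = {
    "team": str,
    "opponent": str,
    "points_for": int,
    "points_against": int,
}


def validate(row: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the row is usable."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if row.get(field) is None:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            problems.append(
                f"{field} should be {expected.__name__}, "
                f"got {type(row[field]).__name__}"
            )
    # Simple plausibility checks; real checks would depend on the data feed.
    for field in ("points_for", "points_against"):
        value = row.get(field)
        if isinstance(value, int) and value < 0:
            problems.append(f"{field} is negative: {value}")
    if row.get("team") and row.get("team") == row.get("opponent"):
        problems.append("team and opponent are the same")
    return problems


def split_rows(
    rows: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate rows safe to publish automatically from rows needing review."""
    ready: list[dict[str, Any]] = []
    needs_review: list[dict[str, Any]] = []
    for row in rows:
        (needs_review if validate(row) else ready).append(row)
    return ready, needs_review
```

Returning a list of problems, rather than raising on the first one, lets an editor see everything wrong with a row at once.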

Writing in general, and sports writing in particular, is something people enjoy doing. Cultural resistance to having NLG write the sports stories that people have historically written is therefore fairly high. Resistance to NLG producing content that people have traditionally not written, such as SEO-related copy, is much lower.
