Financial markets lie at the heart of macro- and microeconomic prosperity. The lifeblood of financial markets is information, and the vast majority of financially relevant information involves textual content of some form (published annual reports, press releases, web pages, analysts’ research, regulatory intervention, and financial media).

Textual content can range from a few words – such as a newspaper headline or a Tweet – to hundreds of pages: Barclays’ December 2013 Annual Report & Accounts comprises 436 pages, the majority of which are text. The volume of data is vast; it is also expanding rapidly in response to increasing business complexity, the growing importance of intangible assets, evolving regulations, and new communication methods (e.g., social media). Resource allocation decisions, economic competitiveness and social welfare are increasingly reliant on financial market participants’ ability to understand and analyse these data.

Translating qualitative information contained in financial narratives and commentaries into quantitative outputs such as earnings forecasts, investment recommendations, economic and social impact assessments, etc. has always posed a challenge for users and in many scenarios there may be a temptation to discount this content in favour of hard numbers. Emerging evidence, however, suggests that this is a dangerous strategy.

Researchers, investment professionals, and regulators interested in understanding the role of narrative information in financial markets are increasingly turning to techniques from computer science and corpus linguistics to help analyze these data. The basic approach involves developing computer algorithms that allow users to harvest large samples of textual information from various sources (such as annual reports, web sites, and social media) and then measure a range of key properties including readability, general tone (positive or negative), primary themes, and extent of uncertainty. These properties may then be used in a variety of applications, such as predicting future performance or providing a red flag in relation to current activities.
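To make the basic approach concrete, the following is a minimal sketch in Python of how tone, uncertainty, and a crude readability proxy might be measured for a passage of text. The word lists, the function names, and the sample sentence are all illustrative assumptions rather than anything drawn from the studies discussed here; published research typically relies on much larger, finance-specific dictionaries and more refined readability measures.

```python
import re

# Tiny illustrative word lists (assumptions for this sketch only);
# real studies use far larger, domain-specific dictionaries.
POSITIVE = {"growth", "improved", "strong", "profit", "gain"}
NEGATIVE = {"loss", "decline", "weak", "impairment", "litigation"}
UNCERTAIN = {"may", "might", "could", "uncertain", "approximately"}


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty fragments."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def narrative_metrics(text):
    """Return simple tone, uncertainty, and sentence-length measures."""
    words = tokenize(text)
    sentences = split_sentences(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    unc = sum(w in UNCERTAIN for w in words)
    total = len(words)
    return {
        "word_count": total,
        # Net tone ranges from -1 (all negative) to +1 (all positive).
        "net_tone": (pos - neg) / (pos + neg) if pos + neg else 0.0,
        # Share of all words that signal uncertainty.
        "uncertainty_share": unc / total if total else 0.0,
        # Average words per sentence, a rough readability proxy.
        "avg_sentence_length": total / len(sentences) if sentences else 0.0,
    }


sample = "Revenue growth was strong. Litigation may lead to a loss and a decline."
metrics = narrative_metrics(sample)
print(metrics)
```

In practice, scores like these would be computed for each document in a large sample and then related to outcomes of interest, such as subsequent returns or earnings. Simple word counting of this kind ignores negation and context, which is one reason researchers often combine it with other techniques.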

The emerging evidence suggests that financial narratives contain material information above and beyond that contained in stock prices and accounting data. For example, research shows that negative words in financial news stories convey negative information about the corresponding firm’s earnings above and beyond stock analysts’ forecasts and historical accounting data, and that this can be used to predict share price over short intervals such as 24 hours. More recent research finds that views expressed in articles and commentaries on the specialist social media platform SeekingAlpha.com are also useful for predicting future stock returns and earnings surprises.

Other research teams have focused on studying whether financial narratives help to identify deceptive or fraudulent behavior by senior management, with promising results. For example, the language used by management of U.S. listed firms in their written performance commentaries (e.g., the Management Discussion & Analysis section of form 10-K) has been shown to discriminate between truthful and fraudulent reports, even after taking financial statement information into account. Similarly, the words management use during conference calls with financial analysts can also help to identify violations of accounting rules and other questionable reporting practices.

These promising results have prompted regulators to explore the benefits of using similar techniques as part of their market surveillance methods. The U.S. Securities and Exchange Commission (SEC), for example, is now using linguistic tools to screen registrants’ financial statement data as a means of identifying potential fraud cases. Traditionally, the SEC employed models based solely on quantitative data to flag potential violators, but with only limited success. Recent developments incorporating information extracted from firms’ qualitative disclosures suggest that a significant improvement in detection rates is possible.

An important factor affecting the extent to which these automated methods of textual analysis can be applied to financial narratives is the form and format in which such information is presented to financial market users. In this regard the U.S. is well ahead of most other jurisdictions (at least where company-produced information is concerned) as a result of the SEC’s Electronic Data Gathering, Analysis and Retrieval (EDGAR) system. EDGAR is specifically designed to facilitate automated downloading and analysis of vast datasets of company financial information.

In contrast, financial reporting procedures in many other countries create significant barriers to analyzing textual information. Take for example the U.K., where companies’ annual reports are published as PDF documents. Not only do these documents make direct, automated access to text problematic, but the lack of any common structure also makes it very difficult to identify and analyze consistent disclosures across different firms or for the same firm over time. Recently, however, a team of researchers from the universities of Lancaster and Manchester (funded by the Economic and Social Research Council) have developed a freely available web-based software tool designed to analyze large samples of U.K. annual report content (http://ucrel.lancs.ac.uk/cfie/). Developments such as this are helping to improve the accessibility of textual information and unlock its potential as a powerful investment resource.

By Professor Steve Young, Lancaster University Management School, www.lancaster.ac.uk/lums
