Clearlead AI Consulting

Media Analysis and Topic Tracking Platform

A large US research institute needed an automated way to trace how known individuals appear across a wide variety of news outlets, tracking not just the articles in which they are mentioned but also the opinions expressed within those articles.

  • 6 years

    of coverage in the monitored archive

  • 500+

    distinct media sources worldwide

  • 300+

    named individuals in the tracking set

The client

  • Large research institute
  • Media intelligence
  • United States

Overview

The client carries out corporate-affairs and political research in the United States. They needed reliable tracking of named executives and organisations across six years of global news coverage.

Clearlead designed and built an NLP pipeline for entity resolution, disambiguation, and topic tracking at scale.

The result is a production system that turns raw articles into structured mention and topic records analysts can use in briefing workflows.

Methods

  • NLP
  • Entity extraction
  • Topic modelling

Engagement type

  • Design and build

Duration

4 weeks

The situation

The client carries out political and corporate-affairs research in the United States. They required a robust method of matching a set of 300+ individuals of interest across a dataset that spanned over six years and more than 500 global news outlets. In addition to tracking the individuals mentioned, it was also necessary to identify whether a predefined set of topics of interest was being discussed, and whether those topics were being discussed in relation to these individuals.

It was clear that the task was not feasible for human reviewers to complete, given the volume of data to be reviewed and the fact that new articles would also need to be considered as they were published.

They had attempted parts of the work in house. However, people appear under subtly different name forms and unrelated people can share a name, so keyword-style matching did not achieve a high level of accuracy either in identifying the individuals or in tracking the topics of interest in the articles.

They required a more sophisticated mechanism for this matching and tracking, one that could work at scale.
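
The two failure modes described above (one person under several name forms, and several people under one name) can be illustrated with a minimal Python sketch. It is not the production method, which is not described in this case study. The alias table, the context cues, and the names `ALIASES`, `CONTEXT_CUES`, and `resolve_mentions` are all hypothetical and chosen only for illustration.

```python
import re
import unicodedata

# Hypothetical alias table: every known name form maps to one canonical ID.
ALIASES = {
    "jane a. smith": "person:jane-smith-acme",
    "jane smith": "person:jane-smith-acme",
    "j. smith": "person:jane-smith-acme",
}

# Hypothetical context cues used to tell apart people who share a name.
CONTEXT_CUES = {
    "person:jane-smith-acme": {"acme", "chief executive", "ceo"},
}


def normalise(text: str) -> str:
    """Lower-case, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase occurs as whole words in already-normalised text."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text) is not None


def resolve_mentions(article_text: str) -> set[str]:
    """Return canonical IDs whose alias AND at least one context cue appear."""
    text = normalise(article_text)
    found = set()
    for alias, entity_id in ALIASES.items():
        if not contains_phrase(text, alias):
            continue
        cues = CONTEXT_CUES.get(entity_id, set())
        if any(contains_phrase(text, cue) for cue in cues):
            found.add(entity_id)
    return found


print(resolve_mentions("Acme CEO Jane A. Smith announced a merger."))
print(resolve_mentions("Local baker Jane Smith won a regional prize."))
```

A plain keyword search for "Jane Smith" would miss the first article's "Jane A. Smith" form unless every variant were listed, and would wrongly count the second article. Requiring a supporting context cue is one simple way to reduce that second kind of error; it is far cruder than the NLP techniques named elsewhere in this case study.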

The engagement

Clearlead led the technical design and build with a staged approach. The aim was a solution that could scale with the archive.

  • Ingestion and normalisation so article text could be processed at volume without losing source metadata needed for outlet- and time-level comparisons.

  • Entity- and topic-level NLP covering executive and organisation mentions, semantic signals for public-issue themes, and disambiguation. This was the key aspect of the solution and required significant experimentation with a variety of the latest techniques in Natural Language Processing (NLP), benchmarked against a significant set of human-reviewed documents.

  • Reporting and visualisation aligned to briefing workflows: trends by time, publication, and issue category.

  • Analyst-controlled configuration for entities, themes, and match behaviour so the research agenda could evolve without a full platform rebuild.
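
The stages above can be sketched end to end in a few lines of Python. This is an illustrative outline under stated assumptions, not the delivered system: the `Article` record, the literal phrase lists in `ENTITY_TERMS` and `TOPIC_TERMS`, and the sentence-level co-occurrence rule are all hypothetical stand-ins for the entity resolution and semantic topic signals described in the list.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Normalised article that keeps the metadata needed for comparisons."""
    outlet: str
    published: date
    text: str


# Hypothetical watch lists; literal phrases stand in for real NLP models.
ENTITY_TERMS = {"jane-smith": ["jane smith"]}
TOPIC_TERMS = {"climate": ["emissions", "net zero"]}


def split_sentences(text: str) -> list[str]:
    """Very rough sentence splitter, adequate only for an illustration."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def labels_in(sentence: str, terms: dict[str, list[str]]) -> set[str]:
    """Labels whose phrases occur in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return {label for label, phrases in terms.items()
            if any(p in lowered for p in phrases)}


def extract_records(article: Article) -> list[tuple[str, str, str, str]]:
    """One (month, outlet, entity, topic) record per same-sentence pairing."""
    month = article.published.strftime("%Y-%m")
    records = []
    for sentence in split_sentences(article.text):
        entities = labels_in(sentence, ENTITY_TERMS)
        topics = labels_in(sentence, TOPIC_TERMS)
        for entity in sorted(entities):
            for topic in sorted(topics):
                records.append((month, article.outlet, entity, topic))
    return records


def trend_counts(articles: list[Article]) -> Counter:
    """Aggregate records so trends can be read by time, outlet, and topic."""
    return Counter(r for a in articles for r in extract_records(a))


articles = [
    Article("Outlet A", date(2021, 3, 5),
            "Jane Smith pledged net zero by 2040. Markets rose."),
    Article("Outlet B", date(2021, 3, 9),
            "Jane Smith opened a new office. Emissions rose across the sector."),
]
counts = trend_counts(articles)
print(counts)
```

Only the first article produces a record, because the person and the topic share a sentence there. In the second article both appear but in separate sentences, so the sketch treats the topic as not discussed in relation to the individual. Same-sentence co-occurrence is an assumed, deliberately simple proxy for that relationship.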

Outcomes

  • A robust system to go from raw articles to structured topic and mention records.
  • Analysts could explore how executive-related coverage clusters around defined public issues and how that mix shifts over time.
  • A system with the flexibility to adjust to new individuals of interest, new topics, and new data sources, which an analyst can add via a configuration file.
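
As an illustration of the configuration-driven flexibility in the last point, the sketch below parses and validates a small analyst-editable file. The case study does not state the file format or schema, so the use of JSON, the three section names, and the `TrackingConfig` and `parse_config` names are assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingConfig:
    individuals: dict[str, list[str]]  # canonical ID -> accepted name forms
    topics: dict[str, list[str]]       # topic label -> seed phrases
    sources: list[str]                 # outlet identifiers to ingest


def parse_config(raw: str) -> TrackingConfig:
    """Parse a JSON config string and reject obviously incomplete entries."""
    data = json.loads(raw)
    missing = {"individuals", "topics", "sources"} - set(data)
    if missing:
        raise ValueError(f"config is missing sections: {sorted(missing)}")
    for section in ("individuals", "topics"):
        for label, phrases in data[section].items():
            if not phrases:
                raise ValueError(f"{section} entry {label!r} has no phrases")
    return TrackingConfig(data["individuals"], data["topics"], data["sources"])


# Hypothetical file contents an analyst might maintain.
EXAMPLE = """
{
  "individuals": {"jane-smith": ["Jane Smith", "Jane A. Smith"]},
  "topics": {"climate": ["emissions", "net zero"]},
  "sources": ["outlet-a", "outlet-b"]
}
"""

config = parse_config(EXAMPLE)
print(sorted(config.individuals))
```

Under this design, adding a person, topic, or outlet is a change to data rather than to code, and the validation step catches an empty entry before it reaches the matching stage.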

Similar applications

The core techniques apply wherever structured insight needs to be drawn from large, unstructured text collections across many sources over time.

  • Media monitoring

    Tracing how named executives, organisations, or brands appear across outlets and how the issues discussed alongside them shift over time.

  • Market and competitive intelligence

    Aggregating how competitors, products, and sector themes are discussed across news, trade press, and other curated source collections.

  • Policy and public affairs research

    Measuring how institutions, officials, legislation, or defined thematic priorities show up in public discourse when manual review of every source is not practical.

  • Longitudinal framing analysis

    Tracking how the narrative around a topic, initiative, or organisation shifts over time, distinguishing changes in volume from changes in how something is being discussed.

  • Entity disambiguation across large corpora

    Any collection where the same person, organisation, or concept appears under different names or spellings, and where meaningful analysis requires resolving those references to a consistent identity.
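
The distinction drawn under longitudinal framing analysis, between a change in volume and a change in how something is discussed, can be made concrete with a small calculation. The topic labels and counts below are invented for illustration, and `topic_shares` is a hypothetical helper, not part of any described system.

```python
def topic_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw mention counts per topic into each topic's share of the total."""
    total = sum(counts.values())
    if total == 0:
        return {topic: 0.0 for topic in counts}
    return {topic: n / total for topic, n in counts.items()}


# Invented mention counts for one individual in three periods.
period_1 = {"climate": 20, "governance": 20}
period_2 = {"climate": 40, "governance": 40}  # volume doubles, mix unchanged
period_3 = {"climate": 30, "governance": 10}  # volume unchanged, mix shifts

for name, counts in [("period_1", period_1),
                     ("period_2", period_2),
                     ("period_3", period_3)]:
    print(name, sum(counts.values()), topic_shares(counts))
```

Comparing period 1 with period 2 shows more coverage with the same topic mix, while comparing period 1 with period 3 shows the same amount of coverage with a different mix. Reporting both the total and the shares keeps those two kinds of change apart.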
