In the modern information age, written news content reaches readers at rates never before seen in history. Today, the public can go online even as a newsworthy event is still unfolding and read high-quality, accurate coverage. However, as events constantly evolve, so do the narratives used to convey them. New people and locations can appear in follow-up stories while initial players fade from the ongoing coverage, and a story that is initially reported as one thing can become something else entirely by the time coverage ends. Tracking how these narratives evolve makes it possible to:
- Track how key stories about businesses, people, or other entities important to strategy or national security change over time
- Analyze how a disaster evolves and assess where relief efforts were adequate and where relief operations could be improved
- Compare a story's various narratives and threads to pinpoint where misinformation entered a storyline
This project had the following main steps (illustrative code sketches for the steps follow the list):
1. Scrape real news articles from the data source (Guardian API, 1/1/2016 to 3/3/2021; sections: Politics, US News, UK News, Global Development, World News)
2. Aggregate the articles into clusters (using a combination of keyword-importance measures, BM25F, and the Infomap graph clustering algorithm)
3. Perform event extraction (Giveme5W1H)
4. Perform Named Entity Recognition (spaCy's NER pipeline, applied to People, Geopolitical Entities, Organizations, and Locations)
5. Convert the features returned by the extraction techniques into networks
6. Graph those networks to show how the important elements of stories change over time
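For step 1, here is a minimal sketch of pulling article text from the Guardian's open content API with `requests`. The endpoint, parameter names, and response layout follow the public Guardian API documentation; the section IDs and date range mirror the project description, while the paging loop, page size, and chosen fields are illustrative assumptions.

```python
import requests

API_KEY = "YOUR_GUARDIAN_API_KEY"  # assumption: a developer key from open-platform.theguardian.com
SEARCH_URL = "https://content.guardianapis.com/search"
SECTIONS = ["politics", "us-news", "uk-news", "global-development", "world"]

def fetch_page(section, page):
    """Fetch one page of articles (with body text) for a section and date range."""
    params = {
        "section": section,
        "from-date": "2016-01-01",
        "to-date": "2021-03-03",
        "show-fields": "headline,bodyText",
        "page-size": 50,
        "page": page,
        "api-key": API_KEY,
    }
    resp = requests.get(SEARCH_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["response"]

articles = []
for section in SECTIONS:
    page, pages = 1, 1
    while page <= pages:
        data = fetch_page(section, page)
        pages = data["pages"]
        for item in data["results"]:
            articles.append({
                "id": item["id"],
                "date": item["webPublicationDate"],
                "title": item["webTitle"],
                "text": item.get("fields", {}).get("bodyText", ""),
            })
        page += 1
```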
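For step 2, the sketch below outlines the cluster stage: pairwise article similarity scores feed a weighted graph, and Infomap partitions that graph into story clusters. Two substitutions to note: the `rank_bm25` package used here implements BM25Okapi rather than the fielded BM25F variant named above, and the keyword-importance weighting is omitted, so this is only the graph-then-cluster pattern, not the project's scoring. The `articles` list and the similarity threshold are carried over from the scraping sketch and are assumptions.

```python
from rank_bm25 import BM25Okapi
from infomap import Infomap

docs = [a["text"] for a in articles]          # from the scraping sketch above
tokenized = [d.lower().split() for d in docs]  # naive tokenization for illustration

bm25 = BM25Okapi(tokenized)

# Build a weighted similarity graph: each document queries the corpus,
# and sufficiently similar pairs become edges.
im = Infomap("--two-level --silent")
THRESHOLD = 10.0  # assumption: in practice this would be tuned
for i, query in enumerate(tokenized):
    scores = bm25.get_scores(query)
    for j, score in enumerate(scores):
        if i < j and score > THRESHOLD:
            im.add_link(i, j, float(score))

im.run()
clusters = im.get_modules()  # {article_index: cluster_id}
```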
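For step 3, the call pattern below follows the usage example in the Giveme5W1H README; the library additionally requires a reachable Stanford CoreNLP server, which is not shown here. The article dictionary keys, the questions kept, and the error handling are assumptions for illustration.

```python
from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

extractor = MasterExtractor()  # assumes a CoreNLP backend is running

def extract_5w1h(article):
    """Return the top who/what/when/where answers for one article dict."""
    doc = Document.from_text(article["title"] + " " + article["text"],
                             article["date"])
    doc = extractor.parse(doc)
    answers = {}
    for question in ("who", "what", "when", "where"):
        try:
            answers[question] = doc.get_top_answer(question).get_parts_as_text()
        except (IndexError, KeyError):
            answers[question] = None  # no candidate found for this question
    return answers
```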
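Step 4 is the most direct: spaCy's pretrained pipeline already tags PERSON, GPE, ORG, and LOC entities, so it only needs filtering to those four labels. The model name below is an assumption (the larger `en_core_web_lg` model is a drop-in swap), and `articles` again refers to the scraping sketch.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: the small English model
KEEP_LABELS = {"PERSON", "GPE", "ORG", "LOC"}

def extract_entities(text):
    """Return (entity text, label) pairs restricted to the four labels above."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in KEEP_LABELS]

entities_per_article = {a["id"]: extract_entities(a["text"]) for a in articles}
```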
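Steps 5 and 6 turn the extracted features into entity co-occurrence networks, one per time slice, so that shifts in a story's central entities can be read off the graphs. The sketch below continues from the previous sketches (`articles`, `entities_per_article`) and uses networkx and matplotlib; the monthly binning, the co-occurrence edge weights, and the degree-centrality ranking are illustrative choices rather than the project's exact method.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx
import matplotlib.pyplot as plt

# Group each article's entities into monthly time slices.
slices = defaultdict(list)
for a in articles:
    month = a["date"][:7]  # e.g. "2020-03" from an ISO publication date
    slices[month].append(entities_per_article.get(a["id"], []))

for month, entity_lists in sorted(slices.items()):
    G = nx.Graph()
    for ents in entity_lists:
        names = {text for text, _ in ents}
        for u, v in combinations(sorted(names), 2):
            # Co-occurrence within the same article increments the edge weight.
            w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
            G.add_edge(u, v, weight=w)

    # Track how the most central entities shift from slice to slice.
    top = sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:10]
    print(month, [name for name, _ in top])

    nx.draw_spring(G, node_size=20, with_labels=False)
    plt.title(f"Entity co-occurrence network, {month}")
    plt.savefig(f"network_{month}.png")
    plt.clf()
```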