DAWN MCINTOSH

Member since: Oct 21, 2010, NASA Ames Research Center

Recurring Anomaly Detection System (ReADS)

An algorithm shared by DAWN MCINTOSH, updated on Sep 10, 2010

Summary

Overview:

ReADS analyzes text reports, such as aviation reports and problem or maintenance records. It uses text clustering algorithms to group loosely related reports and documents, which reduces human error and fatigue. ReADS also identifies interconnected reports, automating the discovery of possible recurring anomalies, and provides a visualization of the clusters and recurring anomalies. ReADS has been integrated into a secure web-based search tool to allow users to perform their own text mining.

Recurring Anomaly Identification

ReADS flags reports that mention other reports as recurring anomalies, using regular expressions to search documents for references to other reports by name. ReADS also detects recurring anomalies by measuring the similarity between documents with a cosine distance measure. It then runs a hierarchical clustering algorithm on those distances, and the resulting hierarchical tree is partitioned into clusters by setting a threshold. A low threshold implies that reports must be very similar to be sorted into the same cluster.
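As a rough illustration of the two detection routes described above, the following Python sketch may help. It is not the ReADS implementation. The report-ID pattern (IDs such as "PR-1"), the bag-of-words term counts, the average-linkage rule, and the function names find_references, cosine_distance, and cluster are all assumptions made for this example. Merging stops once the closest pair of clusters is farther apart than the threshold, which plays the role of cutting the hierarchical tree at that height.

```python
import math
import re
from collections import Counter

# Hypothetical report-ID pattern (e.g. "PR-1"); real naming schemes will differ.
REPORT_ID = re.compile(r"\b[A-Z]{2,4}-\d+\b")


def find_references(reports):
    """Map each report ID to the other known report IDs its text mentions."""
    refs = {}
    for rid, text in reports.items():
        mentioned = set(REPORT_ID.findall(text))
        refs[rid] = sorted(m for m in mentioned if m != rid and m in reports)
    return refs


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def cluster(reports, threshold):
    """Average-linkage agglomerative clustering on cosine distance.

    Merging stops when the closest pair of clusters is farther apart
    than `threshold`, so a low threshold keeps only very similar
    reports together.
    """
    ids = sorted(reports)
    # Assumed representation: raw counts of lowercase alphabetic tokens.
    vecs = {r: Counter(re.findall(r"[a-z]+", reports[r].lower())) for r in ids}
    dist = {(i, j): cosine_distance(vecs[i], vecs[j]) for i in ids for j in ids}
    clusters = [[r] for r in ids]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i, j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Made-up sample reports, for illustration only.
reports = {
    "PR-1": "hydraulic pump leak found near left main gear",
    "PR-2": "left main gear hydraulic pump leak again, see PR-1",
    "PR-3": "cabin reading light flickers in row twelve",
}

print(find_references(reports))
print(cluster(reports, threshold=0.5))
```

With these sample reports, PR-2 is linked to PR-1 by explicit reference, and the two also fall into the same cluster at a threshold of 0.5 because their wording overlaps heavily. Lowering the threshold to 0.1 leaves every report in its own cluster.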


The figure below is a screenshot of the clustering results.
