Washington, D.C. — Disease statistics buried within patient records or detailed in newspaper clippings can be sorted and organized to depict geographic patterns, allowing the discovery of trends that were previously overlooked, according to a Penn State geographer.
“The use of interactive maps and graphs, combined with word search interfaces, can lead to greater insight into complex events like the spread of Swine flu,” said Frank Hardisty, research associate, Penn State GeoVISTA Center.
The GeoViz Toolkit is a user-friendly application that combines text mining with geographical mapping. It allows users to search publicly available data to identify and visualize data patterns for their own interests or concerns.
The flexible software package allows someone with no programming experience to navigate the application, while also providing different components and analytical tools for experienced analysts.
“Potential applications range from research in public health — infectious disease dynamics, cancer etiology, surveillance and control — through analysis of socioeconomic and demographic data, to exploration of patterns of incidents related to terrorism or crime,” said Hardisty.
Many sources for disease and crime statistics, newspaper articles for example, are in a semi-structured format that does not clearly present the data in a table or graph, but rather buries it within the text of the document.
To obtain high-quality, relevant information from these documents, researchers use “text analytics” or “text mining,” allowing them to retrieve only applicable information, like the date and description of a disease-related death, from the flood of information usually included in a newspaper clipping.
“An example would be searching a database of H1N1 flu reports for ‘child’ or ‘children’ and seeing if there is spatial clustering in the relative frequency of those reports,” Hardisty told attendees today (April 15) at the 2010 Association of American Geographers Annual Meeting in Washington, D.C.
H1N1 data, provided by RhizaLabs, was used in a GeoViz query. Reports containing “child” or similar terms were mapped, with areas showing a high frequency of reports that mention children highlighted. In general, areas with low population density exhibited a higher proportion of reports containing the search term.
“The hypothesis that this evokes is that rural states have proportionally more transmissions via children, while more densely populated places are more likely to experience other vectors of transmission,” said Hardisty.
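The kind of query described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the GeoViz Toolkit's own code: the function name, the record layout, and the sample reports are all hypothetical, and the sample reports are invented placeholders rather than real H1N1 data.

```python
import re
from collections import defaultdict

# Hypothetical toy records of (region, report text). Invented for
# illustration only; these are not real H1N1 reports.
REPORTS = [
    ("Montana", "A child was hospitalized with flu symptoms."),
    ("Montana", "Two adults tested positive after travel."),
    ("New York", "School closes after children fall ill."),
    ("New York", "Office worker confirmed as new case."),
    ("New York", "Clinic reports three adult cases."),
    ("New York", "Commuter tests positive."),
]

# Matches "child" or "children" as whole words, ignoring case.
TERM = re.compile(r"\bchild(ren)?\b", re.IGNORECASE)


def term_share_by_region(reports, pattern=TERM):
    """Return, per region, the fraction of reports matching the pattern."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for region, text in reports:
        totals[region] += 1
        if pattern.search(text):
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


shares = term_share_by_region(REPORTS)
for region, share in sorted(shares.items()):
    print(f"{region}: {share:.2f}")
```

Using the relative frequency per region, rather than a raw count of matching reports, keeps heavily populated areas from dominating the map simply because they generate more reports. The resulting per-region values are what a mapping tool would then shade to reveal any spatial clustering.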
The GeoViz application allows users to easily manipulate the software to change time and location, as well as how the data is viewed. The user can thus visualize the pattern of how the disease spreads and determine how quickly it progresses from one area to the next.
Visual geographic analysis can identify locations that are more or less susceptible to certain disease, crime, or weather patterns, and researchers might link these occurrences with a cause or trigger. Using the GeoViz Toolkit could improve how people respond to or prevent these incidents.
“First, GeoViz methods can help first responders gain better situational awareness. Second, a better retroactive understanding of clustered patterns like disease incidence and public security incidents will lead to the development of effective control measures,” concluded Hardisty.
The Department of Homeland Security’s VACCINE initiative and the Gates Foundation Vaccine Modeling Initiative supported this work. The GeoViz Toolkit was developed under the leadership of Alan MacEachren, Director, GeoVISTA Center, Penn State. For a complete list of contributors, visit: http://code.google.com/p/geoviz/wiki/GeoVizToolkitContributors.