Healthcare Data Mining for Safety
Expanding Horizons for Pharmacovigilance
Background
Traditional drug safety practices in the pharmaceutical industry rely on collecting and analyzing safety data from clinical trials, and on passive surveillance based on the voluntary reporting of spontaneous adverse events (AEs) for approved products. Analysis of trial data and spontaneous AE reports provides essential information for pharmacovigilance, yet both data sources have well-known limitations. As a result, there has recently been a groundswell of interest in analyzing large collections of electronic healthcare data (i.e., electronic medical records and insurance claims data) to support pharmacovigilance processes.
Electronic healthcare data make it possible to apply data mining techniques and so derive a potentially valuable secondary use from large bodies of data originally created to support healthcare operations (e.g., administration, billing, and clinical charting). The term Healthcare Data Mining for Safety (HDMS) refers to the use of available electronic healthcare data to support drug safety surveillance.
Much of the current interest in HDMS can be traced to the landmark FDA Amendments Act (FDAAA) of 2007, which directs the FDA to implement active adverse event surveillance methods using large-scale observational data sources from the public and private sectors. More specifically, the legislation mandates the creation of “…a postmarket risk identification and analysis system to link and analyze safety data from multiple sources, with the goals of including, in aggregate—at least 25,000,000 patients by July 1, 2010; and at least 100,000,000 patients by July 1, 2012…” FDA CDER Director Dr. Janet Woodcock, in testimony before Congress, described the FDA’s plans for using healthcare data:
"FDA is currently exploring, testing, and developing new methods of signal detection, data mining, and analysis of patient-level electronic healthcare data. These new methods will complement our existing passive postmarket surveillance system by generating hypotheses about and confirming the existence and cause of safety problems."
Using Electronic Healthcare Data for Assessing Product Safety
Figure 1 summarizes the strengths and weaknesses of the three major data sources for assessing product safety. Spontaneous AE reports make adverse events easy to identify through an efficient, timely reporting system, but they are handicapped by significant underreporting. Controlled clinical trials provide high-quality data, but only for small patient populations, and they do not adequately assess the risks of long-term exposure. Electronic healthcare records may make it possible to study a drug's effects in large patient populations, but the data require extensive interpretation, cleaning, coding, and transformation before they can be analyzed effectively, as well as new tools optimized for working with healthcare data.
Figure 1: Main Sources of Safety-related Data

In all three cases, robust data mining and analytical tools can substantially improve understanding of the broader safety profile of therapeutic products. The long-term vision is to leverage the complementary strengths of all three data sources within a comprehensive workbench that meets the rapidly expanding safety assessment needs of the 21st-century safety professional.
Phase Forward's Healthcare Data Mining for Safety Initiative
Phase Forward is presently developing HDMS software to extend the capabilities of its Empirica Safety suite of products to support mining healthcare data. These ongoing efforts are taking place in collaboration with government agencies interested in monitoring product safety, including the FDA, CDC and the Office of the Surgeon General of the Department of Defense.
The screen shot below shows an example of the current HDMS prototype in use:
In this figure, data mining signal scores are shown for several drugs paired with the ICD-9-coded diagnosis "7948:Non-Specific Abnormal Results of Function Study of Liver." Clicking the number of cases for Isoniazid (118) displays a list of those cases, with drill-down to a graphical patient profile of an individual case.
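The article does not say how the prototype computes its signal scores. As a rough illustration only, the sketch below scores drug and diagnosis pairs in patient-level data with a proportional reporting ratio (PRR), a simple disproportionality statistic swapped in here for whatever method the prototype actually uses. The `PatientRecord` structure, the function names, and the toy records are hypothetical; only the diagnosis code string mirrors the figure's label.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical patient-level record: drugs dispensed and diagnosis codes recorded."""
    patient_id: str
    drugs: set[str] = field(default_factory=set)
    diagnoses: set[str] = field(default_factory=set)


def two_by_two(records, drug, diagnosis):
    """Count patients in each cell of the drug x diagnosis contingency table."""
    a = b = c = d = 0
    for r in records:
        exposed = drug in r.drugs
        has_dx = diagnosis in r.diagnoses
        if exposed and has_dx:
            a += 1  # exposed, diagnosis present
        elif exposed:
            b += 1  # exposed, diagnosis absent
        elif has_dx:
            c += 1  # unexposed, diagnosis present
        else:
            d += 1  # unexposed, diagnosis absent
    return a, b, c, d


def prr(a, b, c, d):
    """Proportional reporting ratio: P(dx | drug) / P(dx | no drug)."""
    if a + b == 0 or c + d == 0 or c == 0:
        return float("nan")  # undefined when a margin or the comparator rate is zero
    return (a / (a + b)) / (c / (c + d))


def signal_scores(records, drugs, diagnosis):
    """Score each drug against one diagnosis, returning case counts and PRR."""
    scores = {}
    for drug in drugs:
        a, b, c, d = two_by_two(records, drug, diagnosis)
        scores[drug] = {"cases": a, "prr": prr(a, b, c, d)}
    return scores


# Toy data (hypothetical) for illustration only.
records = [
    PatientRecord("p1", {"isoniazid"}, {"7948"}),
    PatientRecord("p2", {"isoniazid"}, set()),
    PatientRecord("p3", {"aspirin"}, {"7948"}),
    PatientRecord("p4", {"aspirin"}, set()),
    PatientRecord("p5", set(), set()),
    PatientRecord("p6", {"isoniazid"}, {"7948"}),
]
scores = signal_scores(records, ["isoniazid", "aspirin"], "7948")
```

A PRR above 1 means the diagnosis is recorded more often among patients exposed to the drug than among unexposed patients; in the toy data, isoniazid scores 2.0 and aspirin 1.0. This simplified version ignores the timing of exposure relative to diagnosis and any confounding, both of which a real analysis of longitudinal healthcare data would need to address.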
What HDMS Might Mean For You
While Phase Forward's HDMS efforts are still under development, this initiative does have several important implications for our customers:
- Customers can be assured that Phase Forward is extremely active in this important area and is collaborating closely with government health agencies in developing and testing new pharmacovigilance methods.
- Customers can expect new safety products to become available as Phase Forward continues developing the solution.
- Customers will have an opportunity to participate in and provide input to Phase Forward's activities in this area.

