FDA seems to recognize that it has fallen behind in its technology
for handling data. It therefore organized a meeting with representatives from industry
and some external advisors. Due to the COVID-19 crisis, the meeting, originally
envisaged to be public, was held virtually on June 30, 2020. The complete recording (4 hours and 47 minutes!) is however available.
I watched it all and made a lot of notes. Here is my (biased) summary.
The meeting consisted of three main parts, each with a
presentation (by someone from the FDA), followed by a tightly managed panel
discussion, moderated by Cliff Goodman, leaving little opportunity for real
open discussion. My own impression was that none of the panel members
was able, or willing, to really mention or discuss the pain points in the data
handling process at the FDA.
Before the first session, there was a short introduction
by Allison Hoffman, Senior Health Science Advisor at FDA, followed by a
presentation by Amy Abernethy, Principal Deputy Commissioner, acting CIO
at FDA. A few statements from my notes:
- "FDA efficient and modern as possible"
- "We can either be enabler, or bottleneck"
Mentioned were "use of cloud technologies", "share interoperable data", "not PDFs as digital paper", "structured information", "artificial intelligence", and "blockchain for track-and-trace of drugs and food". No real strategies or measures for getting there were mentioned, however.
She also introduced Vid Desai (Chief Technology Officer - CTO) and Ram Iyer (Chief Data Officer - CDO), two new people in two new roles at the FDA.
Some notes I made:
- "FDA efficient and modern as possible"
- "We can either be enabler, or bottleneck"
Mentioned were "use of cloud technologies", "share interoperable data", "not PDFs as digital paper", "structured information", "artificial intelligence", "blockchain for track-and-trace of drugs and food". No real strategies or measures however were mentioned how to get there.
She also introduced Vid Desai (Chief Technology Officer - CTO) and Ram Iyer (Chief Data Officer - CDO), two new persons and functions at the FDA.
Some notes I made:
Vid Desai:
- "Data source, data is very distributed"
- "We can no longer assume, that data will be submitted to us and residing by the FDA…"
- "Rethink technologies and processes, need to invest in people and culture"
Ram Iyer:
- "Two months in the agency - so more questions than answers"
- "Amount, complexity and variety of data, … is going to be growing exponentially"
- "New technologies like AI … require data management skills that we have not invested in"
- "Data source, data is very distributed"
- "We can no longer assume, that data will be submitted to us and residing by the FDA…"
- "Rethink technologies and processes, need to invest in people and culture"
Ram Iyer
- "Two months in the agency - so more questions than answers"
- "Amount, complexity and variety of data, … is going to be growing exponentially"
- "New technologies like AI … require data management skills that we have not invested in"
The first session was on "Data Sharing".
Mary Ann Slack (CDER, Office of Strategic Programs) gave a presentation
on a "use case", "FDA's Opioids Data Warehouse and
Beyond", where data from many (44) different sources is brought together (a
topic that would come back over and over again). The goal is to come to a
cloud-based, secure environment (a data lake), using APIs, that supports the
analytical needs of the FDA. How to get there was not explained.
The panel members were then introduced:
- Deven McGraw – Ciitizen (start-up) – privacy specialist
- Craig Taylor – FDA – data security
- Tim Williams – UCB, representing PHUSE – semantic web, linked data
- Jose Arrieta – HHS CIO
- Mike Fleckenstein – MITRE – data warehouses, architecture
The panel discussions that followed were then not so
interesting, at least not for me: no new information … IMO it doesn't make sense to discuss whether one should
explain to the general public how data lakes work … or whether availability and data protection can go
together at all (of course they can).
The second session was on "Data Exchange".
It started with a presentation by Meredith Chuck (FDA
Oncology, CDER) on the "Replacement of PDFs by real electronic (structured
data) for Safety Reports". FDA receives 40,000-50,000 such reports per
year, all as PDFs, which is of course extremely difficult to handle. She announced
that the new process will be based on sponsor-generated XML "files" with
structured data, submitted through the FDA Gateway into FAERS (the FDA
Adverse Event Reporting System). They are also exploring APIs.
Such a project is of course highly desirable, but isn't it 10
years too late? The whole world is using APIs, RESTful web services, and XML or
JSON for data exchange, but FDA is still using … PDFs.
Also, when we talk about data exchange for clinical studies (but the FDA doesn't),
the current "show stopper" for innovation is surely the 30-year-old SAS
Transport 5 ("XPT") format. SAS Transport 5 was however not mentioned
at all, neither in any of the presentations nor in any of the discussions. How does the
FDA want to start using cloud, AI and ML, bringing data together from different
sources, when it refuses to get rid of a legacy format that is not used by
any other industry anymore? Just as a very simple example: with the XPT format it
is not possible to combine an SDTM record with its source data (e.g. an EHR FHIR resource).
Simply switching to XML (without changing anything else) would already make
this possible.
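As a minimal illustration (the element names, identifiers, and URL below are entirely hypothetical, not an official CDISC schema), an XML-based SDTM-LB record could simply carry a link back to its eSource, something for which the flat, fixed-width XPT format has no place:

```python
# A minimal sketch, NOT an official CDISC schema: an XML-based SDTM-LB
# record carrying a direct link to its eSource, a FHIR Observation in
# an EHR. All identifiers and the URL are hypothetical.
import xml.etree.ElementTree as ET

record = ET.Element("LB")  # hypothetical record element for the LB domain
for name, value in [
    ("USUBJID", "XYZ-123-001"),
    ("LBTESTCD", "GLUC"),
    ("LBORRES", "95"),
    ("LBORRESU", "mg/dL"),
    ("LBLOINC", "2345-7"),  # LOINC: glucose [mass/volume] in serum/plasma
]:
    ET.SubElement(record, name).text = value

# The crucial part: a reference back to the source data in the EHR.
source = ET.SubElement(record, "SourceData")
source.set("href", "https://ehr.example.org/fhir/Observation/12345")

print(ET.tostring(record, encoding="unicode"))
```

In XPT, a fixed-width, column-oriented binary format, there is simply no natural place to express such a link.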
The additional panel members for this session were:
- Mark Bach – J&J
- Danica Marinac-Dabic – FDA/CDRH
- Jeff Allen – Friends of Cancer Research
- Don Rucker - ONC
The panel "discussion" was then moderated as a
"Q&A" session, which leaves little room for real discussions. One
of the questions was "Clarify 'operational data versus reported data' and
the relation to EHRs?". A few notes I made:
- Don Rucker: "In the world of EHRs, APIs changed everything. 'Point in time' data [i.e. real-time data] is also important to the FDA."
- Jonathan Shough: "... not only higher volume of data but also multiple sources of data"
Interesting in this regard is that FDA, at least
for clinical data, does not have any "point in time" data at all.
All data (SDTM, SEND, ADaM) is submitted up to years after the clinical data was collected.
When, a few years ago, I proposed the "1 MB submission", where
sponsors submit e.g. a define.xml even before the study starts, defining what
will be submitted, and provide the API endpoint details with which FDA can
pull the already collected data (in this case categorized into SDTM/SEND) at any
time during the study, I was declared crazy by very many in the
clinical research world. Such a mechanism would not be very difficult to
implement, but it would require a change in mindset. With SAS Transport 5, I
don't think it would work either, as it is not very RESTful-web-services friendly (I will give it a try in the next weeks).
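As a sketch of how such a pull mechanism could work (the sponsor endpoint and its parameters are of course purely hypothetical; no such API exists today):

```python
# Minimal sketch of the "1 MB submission" idea: FDA pulls already-
# collected SDTM data from a sponsor-hosted RESTful endpoint at any
# time during the study. The endpoint URL and query parameters are
# entirely hypothetical - this is the proposed mechanism, not an
# existing FDA or sponsor API.
import requests

SPONSOR_API = "https://sponsor.example.com/studies/XYZ-123"

# The define.xml, submitted before the study even starts, tells the
# reviewer what will become available.
define_xml = requests.get(f"{SPONSOR_API}/define.xml").text

# Pull the LB domain records collected so far - as JSON, not as XPT.
response = requests.get(f"{SPONSOR_API}/sdtm/LB", params={"format": "json"})
response.raise_for_status()
for record in response.json()["records"]:
    print(record["USUBJID"], record["LBTESTCD"], record["LBORRES"])
```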
Another question from the moderator, this time to Jeff
Allen, was "What has been changing in the role of real-world data?".
A few notes I made from the answer of Jeff Allen:
- "Aggregation and curation allows the use in research upfront"
- "Data from EHRs is not meant to immediately replace clinical trials, but …augments that information. Challenge is still how to layer these different datasets together"
With respect to layering the data together, I suspect that
one of the main reasons FDA mandated the use of LOINC codes in SDTM-LB is exactly
this. "Real world" lab data all comes with LOINC codes, and until
recently, most sponsors did not submit the LOINC code as LBLOINC in SDTM, meaning
it was impossible to compare submitted SDTM-LB data with lab data from e.g.
electronic health records.
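A minimal sketch of why this matters (all data values are invented): with LBLOINC populated, submitted lab records and EHR lab results share a join key:

```python
# Minimal sketch: with a LOINC code on each SDTM-LB record, submitted
# trial data and "real world" EHR lab data can be joined on a common
# key. All data values here are invented.
from collections import defaultdict

sdtm_lb = [
    {"USUBJID": "XYZ-123-001", "LBTESTCD": "GLUC",
     "LBLOINC": "2345-7", "LBORRES": "95", "LBORRESU": "mg/dL"},
]
ehr_labs = [  # e.g. extracted from FHIR Observation resources
    {"patient": "patient-12345", "loinc": "2345-7",
     "value": "101", "unit": "mg/dL"},
]

# Group the EHR results by LOINC code ...
ehr_by_loinc = defaultdict(list)
for lab in ehr_labs:
    ehr_by_loinc[lab["loinc"]].append(lab)

# ... and line each submitted SDTM record up with comparable real-world
# results. Without LBLOINC there is no reliable join key at all.
for rec in sdtm_lb:
    for lab in ehr_by_loinc[rec["LBLOINC"]]:
        print(rec["LBTESTCD"], rec["LBORRES"], "vs. RWD:", lab["value"], lab["unit"])
```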
"Real World Data" however also comes with LOINC coding for many other domains. It uses a lot of SNOMED-CT, ICD-10 and other healthcare codes, none of which is used or supported by CDISC.
"Real World Data" however also comes with LOINC coding for many other domains. It uses a lot of SNOMED-CT, ICD-10 and other healthcare codes, none of which is used or supported by CDISC.
By the way, it was also surprising that in the almost 5-hour
recording, the word "CDISC" was not used a single time;
neither were other organizations in the area of standardization mentioned. Were the panel
members forbidden to mention them, or were these found not sufficiently important
to be mentioned? A bit frightening anyway ...
A wording that came back over and over again was
"rolling data flow". For EHRs, this is essentially pretty easy to implement,
with data privacy of course being very important and needing to be taken care of.
For clinical research, see my remark about the "1 MB submission" above.
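For illustration, here is a sketch of such a rolling retrieval against a standard FHIR server, using the standard `_lastUpdated` search parameter (the server URL is fictitious, and authentication/consent handling, essential in practice, is left out):

```python
# Minimal sketch of a "rolling data flow" against a FHIR server:
# repeatedly ask for only the Observations changed since the last
# poll, using the standard FHIR _lastUpdated search parameter.
# The server URL is fictitious; authentication and consent handling,
# essential in practice, are left out for brevity.
import requests

FHIR_BASE = "https://ehr.example.org/fhir"
last_poll = "2020-06-30T00:00:00Z"

bundle = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"_lastUpdated": f"gt{last_poll}"},
    headers={"Accept": "application/fhir+json"},
).json()

for entry in bundle.get("entry", []):
    obs = entry["resource"]
    print(obs["id"], obs.get("code", {}).get("text"))
```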
Also interesting was the remark from Jeff Allen: "The perfect can become the enemy of the good".
Remember that HL7-FHIR is an 80/20 solution, not trying to "boil the ocean". Sometimes I do have the impression that, with every new version of SDTM, CDISC is trying to "cover every possible use case", "patching" the shortcomings of earlier versions.
With respect to this, one remark from Jonathan Shough, about giving subjects back their own results of the study they participated in, was interesting. I presume that SDTM is not the right solution for this (and especially not in the XPT format), as it cannot be combined with the EHRs the patients already have access to (usually as HL7-FHIR).
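Returning, say, a single lab result to a subject in a form that fits next to his/her EHR data would rather mean mapping the SDTM record to a FHIR Observation, sketched here (a real mapping of course needs much more care; all values are invented):

```python
# Minimal sketch: turning one SDTM-LB record into a FHIR Observation
# (JSON), so a subject could receive it alongside his/her existing EHR
# data. A real mapping needs far more care (subject reference, timing,
# provenance, ...); this only shows the principle. Values are invented.
import json

sdtm_record = {"USUBJID": "XYZ-123-001", "LBTESTCD": "GLUC",
               "LBLOINC": "2345-7", "LBORRES": "95", "LBORRESU": "mg/dL"}

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org",
                         "code": sdtm_record["LBLOINC"]}]},
    "valueQuantity": {
        "value": float(sdtm_record["LBORRES"]),
        "unit": sdtm_record["LBORRESU"],
        "system": "http://unitsofmeasure.org",  # UCUM units
        "code": sdtm_record["LBORRESU"],
    },
}
print(json.dumps(observation, indent=2))
```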
The last question in the panel discussion was: "If FDA has
data exchange right in 2025, what would this be?". One answer was
"operational data instead of documents", another "combining data
from different sources".
The third session was on "Data Usage".
The session started with a presentation by Don Prater (FDA Office
of Food Policy and Response) on a pilot using AI/ML for imported food screening,
with shipping information as the primary data source. The current system is
named PREDICT (Predictive Risk-based Evaluation for Dynamic Import Compliance
Targeting), and is a "rules-based" system. The idea is to move more
to AI for predictive analysis, using structured as well as unstructured data,
and to connect it to traceability systems. A proof of concept was started in April
last year, feeding an ML system with historical data from the rules-based system.
Typical issues and limitations of using AI were encountered (like bias). The next
envisaged step is an "operational field pilot". Numbers on how much better the
AI-based system performs than the classic PREDICT system were however
not revealed.
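Purely as an illustration of the described approach (FDA's actual features, labels and model were not revealed; everything below is invented):

```python
# Purely illustrative sketch of the described proof of concept: train
# a classifier on historical shipment records that the rules-based
# PREDICT system already scored. Features, labels and model choice
# are invented; FDA's actual setup was not revealed.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical encoded shipment features (product category, country
# of origin, number of prior violations, ...).
X = [[0, 3, 1], [1, 5, 0], [0, 2, 0], [1, 7, 2], [0, 1, 0], [1, 6, 1]]
# 1 = shipment was flagged for examination. Note: these labels come
# from the old rules-based system, which is exactly where the bias
# mentioned in the presentation creeps in.
y = [1, 0, 0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```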
The panel then consisted of:
- Don Prater (FDA)
- Isaac Kohane (Harvard)
- Frank Yiannas (FDA)
- Andrea Coravos (Elektra Labs, ex-FDA)
- Joe Goodgame (Remarque Systems)
- Ram Iyer (CDO at FDA)
- Peter Lee (Microsoft – ML data science)
Topics were blockchain, ML and AI, structured and unstructured
data, and Electronic Medical Records (EMRs). Comparisons were made with Uber,
Amazon, and nuclear plant monitoring.
Remark that none of these uses SAS Transport 5 – just imagine you would need to download the Amazon product catalog as a SAS Transport 5 file …
One statement from Danica Marinac-Dabic is worth mentioning: "if we are split in our silos, like of clinical trials are using vocabularies that is different from other things … it is really hard to make the data work across …".
In my opinion, it is high time that CDISC starts recognizing and using vocabularies from the healthcare world, like SNOMED-CT, LOINC, UCUM for units, etc. The "mapping" that was done for a small part of LOINC is just a mock solution.
At the end of the panel discussion, each of the panel members
was asked to answer the question "how can FDA lead the way in putting data
to use", in one sentence. Here are a few answers:
- Joe Goodgame: "requesting data from sponsors earlier- not wait until 6 months after start of the trial" [P.S. Jozef: in reality, it is not 6 months, it is years]
- Ram Iyer: "Act as an orchestrator, act as a lighthouse"
- Frank Yiannas: "Improve on traceability"
- Peter Lee: Embrace interoperability standards
It was nice following this meeting, but it was also pretty boring. And just as a manifestation of the current status of IT at the FDA: a transcript of the meeting is, three weeks after the meeting, still not available, although there is very good software for that on the market.
One of the statements of the moderator Cliff Goodman
near the end shocked me a bit. He stated that "we have been discussing a
lot of tools", but essentially not a single tool was discussed at all. What
was discussed were technologies, not tools.
The meeting was then finalized with some closing remarks by Amy
Abernethy, who also presented some results of the different "Precision
FDA Data Challenges", in which externals are challenged to provide innovative
solutions for problems that occur at the FDA. That is of course fine, but it carries
the danger that all that remains are island solutions, and that no real knowledge
transfer to the FDA takes place. The example she gave was essentially not about
data innovation; it was about some newly developed (at least new for the FDA)
statistical methods.
My conclusions
This meeting was essentially NOT about "strategy";
it was about "vision". FDA (as a data-driven organization) is now starting
to develop a vision of what it wants to improve regarding data in the future. That's a good thing. It sees promise in cloud, blockchain, ML and AI, and in APIs for
"rolling data". The meeting did not give us any information about HOW
it wants to try to modernize (the strategy).
The reality at the FDA is still that it is mostly about
"documents", in very many cases PDFs, or about "files" such as
SAS Transport files. Modern methods such as APIs, RESTful web services, and
"rolling data" are all still mostly unknown at the FDA, and surely
not generally implemented.
The meeting did not discuss at all how FDA wants to clean up
what I call the "show stoppers for modernization": SAS Transport 5;
the use of memory sticks or (at best) file servers for storage and transport of
data; outdated equipment and software; each reviewer using his/her own (often
outdated) tools. It did not make any statements on budgets for modernization,
or about the empowerment of the new CTO and CDO. What can they achieve? History
shows us that such people can only do something "when asked" by a department.
Is there a real will for modernization at the departments, or does it depend on
some "island innovators" within them?
In the meeting, the really difficult questions were not asked
and not discussed. Everybody was very nice and very polite to the FDA. But we are used to
that. One can however turn an old building into a modern one without breaking down the good
parts of the old one.
The good news however is that it looks as if, finally, FDA wants to cooperate with industry (also the IT industry, not only pharma) to come to modernization. The meeting was however about "how to run", whereas i.m.o., with respect to modern IT, FDA still needs to learn "to walk".
In a nutshell, for me (personal opinion), the meeting
contents were pretty disappointing. There was a lot of "vision", but
very little "strategy". The moderator did a good job, given the COVID-19-enforced format. The difficult questions were however not asked and not discussed. If
all the nice things discussed remain "vision", I am afraid not much
will happen, and FDA will not be able to catch up in IT. After all, IT itself is also
evolving very rapidly.
My suggestion is that FDA hires people from Uber, Google, and Microsoft, and empowers them, with a large budget, to really get something done. Reviewers should then be educated, and no longer be allowed to stick to their old formats and outdated tools.
Let us see what happens in the following months. In order to pick up speed, FDA should not wait another year before organizing the next, this time "real strategy", public meeting.