FDA seems to recognize that it has fallen behind in its technology
for handling data. It therefore organized a meeting with representatives from industry
and some external advisors. Due to the COVID-19 crisis, the meeting, originally
envisaged to be public, was held virtually on June 30, 2020. The complete recording (4 hours and 47 minutes!) is however available.
I watched it all and made a lot of notes. Here is my (biased) summary.
The meeting consisted of three main parts, each with a
presentation (by someone from the FDA), followed by a tightly managed panel
discussion, moderated by Cliff Goodman, leaving little opportunity for real
open discussion. My own impression was that none of the panel members
was able, or willing, to really mention or discuss the pain points in the data
handling process at the FDA.
Before the first session, there was a short introduction
by Allison Hoffman, Senior Health Science Advisor at FDA, followed by a
presentation by Amy Abernethy, Principal Deputy Commissioner, acting CIO
at FDA. A few statements from my notes:
- "FDA efficient and modern as possible"
- "We can either be enabler, or bottleneck"
Mentioned were "use of cloud technologies", "share interoperable data", "not PDFs as digital paper", "structured information", "artificial intelligence", and "blockchain for track-and-trace of drugs and food". No real strategies or measures for getting there were mentioned, however.
She also introduced Vid Desai (Chief Technology Officer - CTO) and Ram Iyer (Chief Data Officer - CDO), two new people in two new roles at the FDA.
Some notes I made:
- "FDA efficient and modern as possible"
- "We can either be enabler, or bottleneck"
Mentioned were "use of cloud technologies", "share interoperable data", "not PDFs as digital paper", "structured information", "artificial intelligence", "blockchain for track-and-trace of drugs and food". No real strategies or measures however were mentioned how to get there.
She also introduced Vid Desai (Chief Technology Officer - CTO) and Ram Iyer (Chief Data Officer - CDO), two new persons and functions at the FDA.
Some notes I made:
Vid Desai:
- "Data source, data is very distributed"
- "We can no longer assume, that data will be submitted to us and residing by the FDA…"
- "Rethink technologies and processes, need to invest in people and culture"
Ram Iyer:
- "Two months in the agency - so more questions than answers"
- "Amount, complexity and variety of data, … is going to be growing exponentially"
- "New technologies like AI … require data management skills that we have not invested in"
- "Data source, data is very distributed"
- "We can no longer assume, that data will be submitted to us and residing by the FDA…"
- "Rethink technologies and processes, need to invest in people and culture"
Ram Iyer
- "Two months in the agency - so more questions than answers"
- "Amount, complexity and variety of data, … is going to be growing exponentially"
- "New technologies like AI … require data management skills that we have not invested in"
The first session was on "Data Sharing".
Mary Ann Slack (CDER, Office of Strategic Programs) gave a presentation
on a "use case", "FDA's Opioids Data Warehouse and
Beyond", where data from many (44) different sources is brought together (a
topic that would come back over and over again). The goal is to come to a
cloud-based, secure environment (a data lake), using APIs, that supports the
analytical needs of the FDA. How to get there was not explained.
The panel members were then introduced:
- Deven McGraw – Ciitizen (start-up) – privacy specialist
- Craig Taylor – FDA – data security
- Tim Williams – UCB, representing PHUSE – semantic web, linked data
- Jose Arrieta – HHS CIO
- Mike Fleckenstein – MITRE – data warehouses, architecture
The panel discussions that followed were then not so
interesting, at least not for me: no new information … IMO it doesn't make sense to discuss whether one should
explain to the general public how data lakes work … or whether availability and data protection can go
together at all (of course they can).
The second session was on "Data Exchange".
It started with a presentation by Meredith Chuck (FDA
Oncology, CDER) on the "Replacement of PDFs by real electronic (structured
data) for Safety Reports". FDA receives 40,000-50,000 such reports per
year, all as PDFs, which is of course extremely difficult to handle. She announced
that the new process will be based on sponsor-generated XML "files" with
structured data, submitted through the FDA Gateway into FAERS (the FDA
Adverse Event Reporting System). They are also exploring APIs.
Such a project is of course highly desirable, but isn't it 10
years too late? The whole world is using APIs, RESTful web services, and XML or
JSON for data exchange, but FDA is still using … PDFs.
Also, when we talk about data exchange for clinical studies (but the FDA doesn't),
the current "show stopper" for innovation is surely the 30-year-old SAS
Transport 5 ("XPT") format. SAS Transport 5 was however not mentioned
at all, neither in any of the presentations nor in any of the discussions. How does the
FDA want to start using cloud, AI and ML, bringing data together from different
sources, when it refuses to get rid of a legacy format that is not used by
any other industry anymore? Just as a very simple example: with the XPT format it
is not possible to combine an SDTM record with its source data (e.g. an EHR FHIR resource).
Simply switching to XML (without changing anything else) would already make
this possible.
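As a minimal illustration (the element names, identifiers, and URL below are entirely hypothetical, not an official CDISC schema), an XML-based SDTM-LB record could simply carry a link back to its eSource, something for which the flat, fixed-width XPT format has no place:

```python
# A minimal sketch, NOT an official CDISC schema: an XML-based SDTM-LB
# record carrying a direct link to its eSource, a FHIR Observation in
# an EHR. All identifiers and the URL are hypothetical.
import xml.etree.ElementTree as ET

record = ET.Element("LB")  # hypothetical record element for the LB domain
for name, value in [
    ("USUBJID", "XYZ-123-001"),
    ("LBTESTCD", "GLUC"),
    ("LBORRES", "95"),
    ("LBORRESU", "mg/dL"),
    ("LBLOINC", "2345-7"),  # LOINC: glucose [mass/volume] in serum/plasma
]:
    ET.SubElement(record, name).text = value

# The crucial part: a reference back to the source data in the EHR.
source = ET.SubElement(record, "SourceData")
source.set("href", "https://ehr.example.org/fhir/Observation/12345")

print(ET.tostring(record, encoding="unicode"))
```

In XPT, a fixed-width, column-oriented binary format, there is simply no natural place to express such a link.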
The additional panel members for this session were:
- Mark Bach – J&J
- Danica Marinac-Dabic – FDA/CDRH
- Jeff Allen – Friends of Cancer Research
- Don Rucker - ONC
The panel "discussion" was then moderated as a
"Q&A" session, which leaves little room for real discussions. One
of the questions was "Clarify 'operational data versus reported data' and
the relation to EHRs?". A few notes I made:
- Don Rucker: "In the world of EHRs, APIs changed everything. 'Point in time' data [i.e. real-time data] is also important to the FDA."
- Jonathan Shough: "... not only higher volume of data but also multiple sources of data"
Interesting in this regard is that FDA, at least
for clinical data, does not have any "point in time" data at all.
All data (SDTM, SEND, ADaM) is submitted up to years after the clinical data was collected.
When, a few years ago, I proposed the "1 MB submission", where
sponsors submit e.g. a define.xml even before the study starts, defining what
will be submitted, and provide the API endpoint details with which FDA can
pull the already collected data (in this case categorized into SDTM/SEND) at any
time during the study, I was declared crazy by very many in the
clinical research world. Such a mechanism would not be very difficult to
implement, but it would require a change in mindset. With SAS Transport 5, I
don't think it would work either, as it is not very RESTful-web-services friendly (I will give it a try in the next weeks).
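As a sketch of how such a pull mechanism could work (the sponsor endpoint and its parameters are of course purely hypothetical; no such API exists today):

```python
# Minimal sketch of the "1 MB submission" idea: FDA pulls already-
# collected SDTM data from a sponsor-hosted RESTful endpoint at any
# time during the study. The endpoint URL and query parameters are
# entirely hypothetical - this is the proposed mechanism, not an
# existing FDA or sponsor API.
import requests

SPONSOR_API = "https://sponsor.example.com/studies/XYZ-123"

# The define.xml, submitted before the study even starts, tells the
# reviewer what will become available.
define_xml = requests.get(f"{SPONSOR_API}/define.xml").text

# Pull the LB domain records collected so far - as JSON, not as XPT.
response = requests.get(f"{SPONSOR_API}/sdtm/LB", params={"format": "json"})
response.raise_for_status()
for record in response.json()["records"]:
    print(record["USUBJID"], record["LBTESTCD"], record["LBORRES"])
```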
Another question from the moderator, this time to Jeff
Allen, was "What has been changing in the role of real-world data?".
A few notes I made from the answer of Jeff Allen:
- "Aggregation and curation allows the use in research upfront"
- "Data from EHRs is not meant to immediately replace clinical trials, but …augments that information. Challenge is still how to layer these different datasets together"
With respect to layering the data together, I suspect that
one of the main reasons FDA mandated the use of LOINC codes in SDTM-LB is exactly
this. "Real world" lab data all comes with LOINC codes, and until
recently, most sponsors did not submit the LOINC code as LBLOINC in SDTM, meaning
it was impossible to compare submitted SDTM-LB data with lab data from e.g.
electronic health records.
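A minimal sketch of why this matters (all data values are invented): with LBLOINC populated, submitted lab records and EHR lab results share a join key:

```python
# Minimal sketch: with a LOINC code on each SDTM-LB record, submitted
# trial data and "real world" EHR lab data can be joined on a common
# key. All data values here are invented.
from collections import defaultdict

sdtm_lb = [
    {"USUBJID": "XYZ-123-001", "LBTESTCD": "GLUC",
     "LBLOINC": "2345-7", "LBORRES": "95", "LBORRESU": "mg/dL"},
]
ehr_labs = [  # e.g. extracted from FHIR Observation resources
    {"patient": "patient-12345", "loinc": "2345-7",
     "value": "101", "unit": "mg/dL"},
]

# Group the EHR results by LOINC code ...
ehr_by_loinc = defaultdict(list)
for lab in ehr_labs:
    ehr_by_loinc[lab["loinc"]].append(lab)

# ... and line each submitted SDTM record up with comparable real-world
# results. Without LBLOINC there is no reliable join key at all.
for rec in sdtm_lb:
    for lab in ehr_by_loinc[rec["LBLOINC"]]:
        print(rec["LBTESTCD"], rec["LBORRES"], "vs. RWD:", lab["value"], lab["unit"])
```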
"Real World Data" however also comes with LOINC coding for many other domains. It uses a lot of SNOMED-CT, ICD-10 and other healthcare codes, none of which is used or supported by CDISC.
"Real World Data" however also comes with LOINC coding for many other domains. It uses a lot of SNOMED-CT, ICD-10 and other healthcare codes, none of which is used or supported by CDISC.
By the way, it was also surprising that in the almost 5-hour
recording, the word "CDISC" was not used a single time;
neither were other organizations in the area of standardization mentioned. Were the panel
members forbidden to mention them, or were these found not sufficiently important
to be mentioned? A bit frightening anyway ...
A wording that came back over and over again was
"rolling data flow". For EHRs, this is essentially pretty easy to implement,
with data privacy of course being very important and needing to be taken care of.
For clinical research, see my remark about the "1 MB submission" above.
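For illustration, here is a sketch of such a rolling retrieval against a standard FHIR server, using the standard `_lastUpdated` search parameter (the server URL is fictitious, and authentication/consent handling, essential in practice, is left out):

```python
# Minimal sketch of a "rolling data flow" against a FHIR server:
# repeatedly ask for only the Observations changed since the last
# poll, using the standard FHIR _lastUpdated search parameter.
# The server URL is fictitious; authentication and consent handling,
# essential in practice, are left out for brevity.
import requests

FHIR_BASE = "https://ehr.example.org/fhir"
last_poll = "2020-06-30T00:00:00Z"

bundle = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"_lastUpdated": f"gt{last_poll}"},
    headers={"Accept": "application/fhir+json"},
).json()

for entry in bundle.get("entry", []):
    obs = entry["resource"]
    print(obs["id"], obs.get("code", {}).get("text"))
```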
Also interesting was the remark from Jeff Allen: "The perfect can become the enemy of the good".
Remember that HL7-FHIR is an 80/20 solution, not trying to "boil the ocean". Sometimes I do have the impression that, with every new version of SDTM, CDISC is trying to "cover every possible use case", "patching" the shortcomings of earlier versions.
With respect to this, one remark from Jonathan Shough, about giving subjects back their own results of the study they participated in, was interesting. I presume that SDTM is not the right solution for this (and especially not in the XPT format), as it cannot be combined with the EHRs the patients already have access to (usually as HL7-FHIR).
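Returning, say, a single lab result to a subject in a form that fits next to his/her EHR data would rather mean mapping the SDTM record to a FHIR Observation, sketched here (a real mapping of course needs much more care; all values are invented):

```python
# Minimal sketch: turning one SDTM-LB record into a FHIR Observation
# (JSON), so a subject could receive it alongside his/her existing EHR
# data. A real mapping needs far more care (subject reference, timing,
# provenance, ...); this only shows the principle. Values are invented.
import json

sdtm_record = {"USUBJID": "XYZ-123-001", "LBTESTCD": "GLUC",
               "LBLOINC": "2345-7", "LBORRES": "95", "LBORRESU": "mg/dL"}

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org",
                         "code": sdtm_record["LBLOINC"]}]},
    "valueQuantity": {
        "value": float(sdtm_record["LBORRES"]),
        "unit": sdtm_record["LBORRESU"],
        "system": "http://unitsofmeasure.org",  # UCUM units
        "code": sdtm_record["LBORRESU"],
    },
}
print(json.dumps(observation, indent=2))
```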
The last question in the panel discussion was: "If FDA has
data exchange right in 2025, what would this be?". One answer was
"operational data instead of documents", another "combining data
from different sources".
The third session was on "Data Usage".
The session started with a presentation by Don Prater (FDA Office
of Food Policy and Response) on a pilot using AI/ML for imported food screening,
with shipping information as the primary data source. The current system is
named PREDICT (Predictive Risk-based Evaluation for Dynamic Import Compliance
Targeting), and is a "rules-based" system. The idea is to move more
to AI for predictive analysis, using structured as well as unstructured data,
and to connect it to traceability systems. A proof of concept was started in April
last year, feeding an ML system with historical data from the rules-based system.
Typical issues and limitations of using AI were encountered (like bias). The next
envisaged step is an "operational field pilot". Numbers on how much better the
AI-based system performs than the classic PREDICT system were however
not revealed.
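Purely as an illustration of the described approach (FDA's actual features, labels and model were not revealed; everything below is invented):

```python
# Purely illustrative sketch of the described proof of concept: train
# a classifier on historical shipment records that the rules-based
# PREDICT system already scored. Features, labels and model choice
# are invented; FDA's actual setup was not revealed.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical encoded shipment features (product category, country
# of origin, number of prior violations, ...).
X = [[0, 3, 1], [1, 5, 0], [0, 2, 0], [1, 7, 2], [0, 1, 0], [1, 6, 1]]
# 1 = shipment was flagged for examination. Note: these labels come
# from the old rules-based system, which is exactly where the bias
# mentioned in the presentation creeps in.
y = [1, 0, 0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```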
The panel then consisted of:
- Don Prater (FDA)
- Isaac Kohane (Harvard)
- Frank Yiannas (FDA)
- Andrea Coravos (Elektra Labs, ex-FDA)
- Joe Goodgame (Remarque Systems)
- Ram Iyer (CDO at FDA)
- Peter Lee (Microsoft – ML data science)
Topics were blockchain, ML and AI, structured and unstructured
data, and Electronic Medical Records (EMRs). Comparisons were made with Uber,
Amazon, and nuclear plant monitoring.
Remark that none of these uses SAS Transport 5 – just imagine you would need to download the Amazon product catalog as a SAS Transport 5 file …
One statement from Danica Marinac-Dabic is worth mentioning: "if we are split in our silos, like of clinical trials are using vocabularies that is different from other things … it is really hard to make the data work across …".
In my opinion, it is high time that CDISC starts recognizing and using vocabularies from the healthcare world, like SNOMED-CT, LOINC, UCUM for units, etc. The "mapping" that was done for a small part of LOINC is just a mock solution.
At the end of the panel discussion, each of the panel members
was asked to answer the question "how can FDA lead the way in putting data
to use", in one sentence. Here are a few answers:
- Joe Goodgame: "requesting data from sponsors earlier- not wait until 6 months after start of the trial" [P.S. Jozef: in reality, it is not 6 months, it is years]
- Ram Iyer: "Act as an orchestrator, act as a lighthouse"
- Frank Yiannas: "Improve on traceability"
- Peter Lee: Embrace interoperability standards
It was nice following this meeting, but it was also pretty boring. And just as a manifestation of the current status of IT at the FDA: a transcript of the meeting is, three weeks after the meeting, still not available, although there is very good software for that on the market.
One of the statements of the moderator Cliff Goodman
near the end shocked me a bit. He stated that "we have been discussing a
lot of tools", but essentially not a single tool was discussed at all. What
was discussed were technologies, not tools.
The meeting was then finalized with some closing remarks by Amy
Abernethy, who also presented some results of the different "Precision
FDA Data Challenges", in which externals are challenged to provide innovative
solutions for problems that occur at the FDA. That is of course fine, but it carries
the danger that all that remains are island solutions, and that no real knowledge
transfer to the FDA takes place. The example she gave was essentially not about
data innovation; it was about some newly developed (at least new for the FDA)
statistical methods.
My conclusions
This meeting was essentially NOT about "strategy";
it was about "vision". FDA (as a data-driven organization) is now starting
to develop a vision of what it wants to improve regarding data in the future. That's a good thing. It sees promise in cloud, blockchain, ML and AI, and in APIs for
"rolling data". The meeting did not give us any information about HOW
it wants to try to modernize (the strategy).
The reality at the FDA is still that it is mostly about
"documents", in very many cases PDFs, or about "files" such as
SAS Transport files. Modern methods such as APIs, RESTful web services, and
"rolling data" are all still mostly unknown at the FDA, and surely
not generally implemented.
The meeting did not discuss at all how FDA wants to clean up
what I call the "show stoppers for modernization": SAS Transport 5;
the use of memory sticks or (at best) file servers for storage and transport of
data; outdated equipment and software; each reviewer using his/her own (often
outdated) tools. It did not make any statements on budgets for modernization,
or about the empowerment of the new CTO and CDO. What can they achieve? History
shows us that such people can only do something "when asked" by a department.
Is there a real will for modernization at the departments, or does it depend on
some "island innovators" within them?
In the meeting, the really difficult questions were not asked
and not discussed. Everybody was very nice and very polite to the FDA. But we are used to
that. One can however turn an old building into a modern one without breaking down the good
parts of the old one.
The good news however is that it looks as if, finally, FDA wants to cooperate with industry (also the IT industry, not only pharma) to come to modernization. The meeting was however about "how to run", whereas i.m.o., with respect to modern IT, FDA still needs to learn "to walk".
In a nutshell, for me (personal opinion), the meeting
contents were pretty disappointing. There was a lot of "vision", but
very little "strategy". The moderator did a good job, given the COVID-19-enforced format. The difficult questions were however not asked and not discussed. If
all the nice things discussed remain "vision", I am afraid not much
will happen, and FDA will not be able to catch up in IT. After all, IT itself is also
evolving very rapidly.
My suggestion is that FDA hires people from Uber, Google, and Microsoft, and empowers them, with a large budget, to really get something done. Reviewers should then be educated, and no longer be allowed to stick to their old formats and outdated tools.
Let us see what happens in the following months. In order to pick up speed, FDA should not wait another year before organizing the next, this time "real strategy", public meeting.