Saturday, November 25, 2023

The Need for Speed: Why CDISC Dataset-JSON is so important.

For 20 years or more, the CDISC community has suffered under the requirement of the FDA (and of other regulatory authorities following the FDA) to submit datasets in the SAS Transport 5 (XPT) format.
The disadvantages and limitations of XPT are well known: limits of 8, 40 and 200 characters (for variable names, labels and character values, respectively), US-ASCII encoding only, etc. But there is much more. The use of XPT has essentially been a roadblock for innovation at the regulatory authorities all these years.
Therefore, the CDISC Data Exchange Standards Team has developed a modern exchange format, Dataset-JSON, which, as the name states, is based on JSON, currently the most used exchange format worldwide, especially with APIs (Application Programming Interfaces) and RESTful Web Services.
The new exchange format is currently being piloted by the FDA, in cooperation with PHUSE and CDISC.

Unlike XPT, Dataset-JSON is truly vendor-neutral and much, much easier to implement in software. This has already resulted in a large number of applications being developed and showcased during the COSA Dataset-JSON Hackathon. The new format also creates many opportunities that are not yet well recognized by the regulatory authorities.
XPT is limited to storing "tables" in "files", i.e. two-dimensional data. JSON, however, can represent data (and metadata) with many more dimensions and with arbitrary depth. This means that, even if Dataset-JSON is at first still used to exchange "tables", these can be enhanced and extended to also carry audit trails (much wanted by the FDA), source data (e.g. from EHRs, lab transfers) and any type of additional information, at the level of the dataset, the record, or the individual data point.
Furthermore, Dataset-JSON will make it possible to embed images (e.g. X-Rays, EMRs) and digital data like ECGs into the submission data.
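To make the idea of "more dimensions" a bit more tangible, here is a minimal Python sketch of a single record in which each data point can carry its own source reference and audit trail, plus an embedded binary attachment. All key names (`value`, `source`, `audit`, `attachment`, `mediaType`, ...), the helper functions and the sample values are assumptions made up for this illustration; they are not taken from the Dataset-JSON specification, which may well organize such extensions differently.

```python
import base64
import json


# Hypothetical structure for illustration only: none of these keys come from
# the Dataset-JSON specification.
def make_data_point(value, source=None, audit=None):
    """Wrap a single value with optional source information and audit entries."""
    point = {"value": value}
    if source is not None:
        point["source"] = source
    if audit:
        point["audit"] = list(audit)
    return point


def embed_binary(raw_bytes, media_type):
    """Encode binary content (e.g. an image or ECG trace) as base64 text,
    since JSON itself can only carry text."""
    return {
        "mediaType": media_type,
        "encoding": "base64",
        "content": base64.b64encode(raw_bytes).decode("ascii"),
    }


# Placeholder bytes standing in for a real image file (just the PNG signature).
fake_image = b"\x89PNG\r\n\x1a\n"

record = {
    "USUBJID": make_data_point("STUDY1-001"),
    "VSTESTCD": make_data_point("SYSBP"),
    "VSORRES": make_data_point(
        "138",
        source={"system": "EHR", "reference": "hypothetical-observation-id"},
        audit=[
            {"when": "2023-11-01T09:30:00Z", "who": "site-01", "action": "created"},
            {"when": "2023-11-02T14:05:00Z", "who": "site-01", "action": "corrected"},
        ],
    ),
    "attachment": embed_binary(fake_image, "image/png"),
}

# The nested record survives a round trip through plain JSON text.
serialized = json.dumps(record, indent=2)
restored = json.loads(serialized)
```

A flat two-dimensional table has no natural place for the `audit` list or the `attachment`, whereas in JSON they are simply nested objects that a consumer can read or ignore.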

The major advantage of using this modern format, however, lies on another level.

Traditionally, submissions to regulatory authorities are only made after database closure, mapping the data to SDTM, SEND and/or ADaM, etc. This essentially means a delay of often several months after the clinical study has been finalized, and years after the clinical study was started. In the meantime, many patients may have died or been seriously harmed, because the treatment they need is not yet available. This is what we call "the need for speed".

Dataset-JSON can be a game changer here.

Essentially, partial submission datasets can be generated as soon as the first clinical data are received from the sites. The regulatory authorities, however, are not used to starting their review as soon as the first clinical data are available, among other reasons because of their technical infrastructure.
JSON is used worldwide especially with APIs and RESTful web services, meaning that even submission data can be exchanged in real time, as soon as they are created. Although JSON can of course be used with and for "files", its real strength lies in its use for "services". All other industries have moved from files to SOA, "Service-Oriented Architecture".

What does this mean for regulatory submissions? 

Imagine a "regulatory neutral zone" (one can discuss what "neutral" means) between sponsor and regulatory agency, where the sponsor can submit submission records (not necessarily as "files") as soon as they are created, using an API, e.g. based on RESTful Web Services. Using the same API, records can also be updated (or deleted) when necessary, with every change tracked in an audit trail. On the other side, reviewers can query the study information in the repository through the API, not necessarily by downloading "files" (although that remains possible), but by getting answers to questions or requests like "give me all subjects, and their records, with a systolic blood pressure of more than 130 mmHg and a BMI higher than 28".
This "regulatory neutral zone" is surely different from the current "Electronic Submissions Gateway" (which is completely file based), and more closely related to the API-governed repositories used in many other (also regulated) industries such as aviation, finance, etc.
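The repository idea described above can be sketched in a few lines of Python. This is only an in-memory stand-in under stated assumptions: the class `SubmissionRepository`, its methods and the flat per-subject records with `SYSBP` and `BMI` keys are hypothetical and heavily simplified (real SDTM data would keep such results in separate findings records), and a real service would sit behind a RESTful API with authentication rather than being called directly.

```python
from datetime import datetime, timezone


class SubmissionRepository:
    """In-memory stand-in for an API-governed repository (hypothetical)."""

    def __init__(self):
        self._records = {}  # record id -> current record content
        self._audit = []    # append-only list of audit entries

    def _log(self, action, record_id, user):
        self._audit.append({
            "action": action,
            "recordId": record_id,
            "user": user,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def submit(self, record_id, record, user):
        """Create a record, or update it if the id already exists."""
        action = "update" if record_id in self._records else "create"
        self._records[record_id] = dict(record)
        self._log(action, record_id, user)

    def delete(self, record_id, user):
        """Remove a record; the deletion itself stays in the audit trail."""
        if self._records.pop(record_id, None) is not None:
            self._log("delete", record_id, user)

    def query(self, predicate):
        """Return copies of all current records matching the predicate."""
        return [dict(r) for r in self._records.values() if predicate(r)]

    def audit_trail(self, record_id):
        return [e for e in self._audit if e["recordId"] == record_id]


# Sponsor side: records are submitted as soon as they become available.
repo = SubmissionRepository()
repo.submit("001", {"USUBJID": "001", "SYSBP": 142, "BMI": 31.2}, user="sponsor")
repo.submit("002", {"USUBJID": "002", "SYSBP": 118, "BMI": 29.5}, user="sponsor")
repo.submit("003", {"USUBJID": "003", "SYSBP": 135, "BMI": 24.0}, user="sponsor")
# A later correction of the BMI for subject 003 is recorded as an update.
repo.submit("003", {"USUBJID": "003", "SYSBP": 135, "BMI": 28.4}, user="sponsor")

# Reviewer side: ask a question instead of downloading files.
hits = repo.query(lambda r: r["SYSBP"] > 130 and r["BMI"] > 28)
subjects = sorted(r["USUBJID"] for r in hits)
actions_003 = [e["action"] for e in repo.audit_trail("003")]
```

With this made-up data, the reviewer's query returns subjects 001 and 003, and the audit trail for subject 003 shows a "create" followed by an "update". The point of the sketch is the shape of the interaction (submit, correct, query, audit), not the storage technology behind it.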

Essentially, once all this is in place, a regulatory submission could be started as soon as the first data points become available, and finalized much sooner (even months or years sooner) than is currently the case. This could then save the lives of thousands of patients.