Wednesday, November 27, 2019

Using the "CDISC Library API" in the "Smart Submission Dataset Viewer"

The second project of my own that I have been working on in recent days is extending the open-source "Smart Submission Dataset Viewer" with features that use the "CDISC Library API". Essentially, the "CDISC Library" is the CDISC "single source of truth", especially for the metadata of electronic submissions. The "define.xml" file, which must also be submitted to the regulatory authorities, is the "sponsor's truth" about the submission. But of course, the contents of the define.xml (the sponsor's truth) must also comply with the "CDISC truth", and that is exactly what we have the "CDISC Library" for. So, I added some new features to the "Smart Submission Dataset Viewer" that query the CDISC Library and compare the responses with what is in the define.xml.
In the new version, when one clicks the "options" button and then selects the "CDISC Library features" tab, this is what the user will see:


In many cases, you will want to check all checkboxes. The first checkbox ensures that, for each variable in your submission (SDTM, SEND or ADaM, as defined in the define.xml), the software queries the CDISC Library for the properties of that variable and displays them as a tooltip when the user hovers the mouse over the column header:
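For readers who want to try such a lookup themselves, here is a minimal Python sketch of how the query path for one variable could be built and how the response could be turned into tooltip text. It is not the viewer's actual code. The path layout and the JSON field names ("name", "label", "simpleDatatype", "role", "core") are assumptions about what the CDISC Library returns, and the sample values are purely illustrative.

```python
# Hypothetical helpers; path layout and field names are assumptions.

def variable_path(standard, version, dataset, variable):
    """Build the (assumed) CDISC Library path for one variable."""
    return f"/mdr/{standard}/{version}/datasets/{dataset}/variables/{variable}"


def tooltip_text(props, fields=("label", "simpleDatatype", "role", "core")):
    """Turn a variable's properties into a few lines of tooltip text."""
    lines = [props.get("name", "?")]
    for field in fields:
        if field in props:  # skip properties the response does not contain
            lines.append(f"{field}: {props[field]}")
    return "\n".join(lines)


# Illustrative stand-in for a parsed JSON response (not real library output).
sample_response = {
    "name": "VSORRESU",
    "label": "Original Units",
    "simpleDatatype": "Char",
    "role": "Variable Qualifier",
    "core": "Exp",
}

path = variable_path("sdtmig", "3-3", "VS", "VSORRESU")
tooltip = tooltip_text(sample_response)
```

The actual HTTP request (including authentication) is left out on purpose, as it depends on your CDISC Library account.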

The next two checkboxes ensure that the variable properties as defined in the define.xml, and the codelists assigned to the variables, are compared with the information from the CDISC Library. If they do not match, the software generates a "discrepancy" and displays it in the tooltip as well:


We don't use terms like "Error" or "Warning", as we think it is not up to us to judge what such a discrepancy means or how bad (or good) it is. After all, there may have been a very good reason to deviate from the standard!
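To illustrate the idea of such a neutral "discrepancy", here is a minimal Python sketch of a codelist comparison. It is not the viewer's implementation. It assumes that the library response for a variable carries its codelist references under "_links" / "codelist" as a list of objects with an "href" ending in the NCI code; the function names and the href value in the example are hypothetical.

```python
# Hypothetical sketch; the "_links"/"codelist" layout is an assumption.

def codelist_codes(library_variable):
    """Extract the NCI codes from the codelist links of a library response."""
    links = library_variable.get("_links", {}).get("codelist", [])
    # The code is assumed to be the last path segment of each href.
    return [link["href"].rstrip("/").rsplit("/", 1)[-1] for link in links]


def codelist_discrepancy(variable, define_code, library_variable):
    """Return a neutral discrepancy message, or None if nothing differs."""
    expected = codelist_codes(library_variable)
    if not expected or define_code in expected:
        return None  # nothing to compare, or the codelists match
    assigned = define_code or "no codelist"
    return (f"{variable}: define.xml assigns {assigned}, "
            f"CDISC Library references {', '.join(expected)}")


# Illustrative stand-in for a parsed response (href is made up).
library_vsorresu = {
    "name": "VSORRESU",
    "_links": {"codelist": [{"href": "/mdr/root/ct/sdtmct/codelists/C66770"}]},
}

message = codelist_discrepancy("VSORRESU", "C71620", library_vsorresu)
```

Note that the function only reports the difference and attaches no severity, in line with the reasoning above.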
The last checkbox allows you to generate a report (as HTML and/or XML) in addition to the tooltips. An example of such a report is below:

For example, for row 7 (VSORRESU), a discrepancy is found stating that the wrong codelist was assigned to VSORRESU. In fact, the define.xml states that the "UNIT" codelist (C71620) is assigned to VSORRESU, whereas the CDISC Library states that the codelist with NCI code C66770 should be assigned.
But what is the codelist C66770? Initially, when the CDISC Library is queried for VSORRESU, it does not provide any details of the codelist C66770 (not even its name). Instead, it delivers a link (reference) to where that information can be found, i.e. it provides the link for the next possible query to the CDISC Library:

In the report, this reference is provided. So, if the user wants more details about the codelist C66770 (the one the library says should be used for VSORRESU), he/she clicks the hyperlink, and a new query is sent to the CDISC Library using the link that was provided in the prior response.
We call this "chaining". It means that one response contains the links to any other information about that "object", including information about any previous and later versions of that "object". Computer scientists know this as the "HATEOAS" (Hypermedia as the Engine of Application State) principle. It is a great way to generate your own "network" of all the "things" in your submission, which is a partial copy of the "network of things" in the CDISC Library.
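As a rough illustration of how such a "network" could be collected by following the links, here is a small Python sketch. It is only a sketch under assumptions: it expects each response to carry its links under "_links" (either a single object or a list of objects with an "href"), and it takes a hypothetical "fetch" function that does the actual authenticated query. To keep the example self-contained, the "library" below is just an in-memory dictionary with made-up hrefs.

```python
from collections import deque

# Hypothetical sketch of "chaining"; the "_links" layout is an assumption.

def iter_links(resource):
    """Yield (relation, href) for every link found in a response."""
    for relation, target in resource.get("_links", {}).items():
        targets = target if isinstance(target, list) else [target]
        for item in targets:
            if isinstance(item, dict) and "href" in item:
                yield relation, item["href"]


def crawl(start_href, fetch, max_resources=50):
    """Follow links breadth-first and return {href: response}."""
    seen = {}
    queue = deque([start_href])
    while queue and len(seen) < max_resources:
        href = queue.popleft()
        if href in seen:
            continue  # already retrieved, avoid loops
        seen[href] = fetch(href)
        for relation, link in iter_links(seen[href]):
            if relation != "self" and link not in seen:
                queue.append(link)
    return seen


# In-memory stand-in for the library; the hrefs are made up.
FAKE_LIBRARY = {
    "/variables/VSORRESU": {
        "name": "VSORRESU",
        "_links": {
            "self": {"href": "/variables/VSORRESU"},
            "codelist": [{"href": "/codelists/C66770"}],
        },
    },
    "/codelists/C66770": {
        "conceptId": "C66770",
        "_links": {"self": {"href": "/codelists/C66770"}},
    },
}

network = crawl("/variables/VSORRESU", FAKE_LIBRARY.__getitem__)
```

The "max_resources" limit is there because the real network is large; in practice one would probably follow only selected relations (for example, only codelists).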

What are the limitations of using the CDISC Library in the "Smart Submission Dataset Viewer"?
Essentially, the only limitation is creativity! We only added a very small number of features that use the CDISC Library, but there can be so much more. So, if you have ideas about features you would like to see added to the "Smart Submission Dataset Viewer", just drop me an e-mail, and we can implement them. After all, developing and testing new methods using the "CDISC Library API" usually takes only a few hours, as the CDISC Library API is so extremely easy to implement.