Wednesday, June 26, 2019

CDISC Validation: PointCross – Pinnacle21 v.3 comparison: Part 3: hot topics 1

In our third part, we make comparisons between Pinnacle21 Validator 3.0 and MySEND 1.0 for some "hot topics", i.e. topics that were highly problematic in earlier versions of the Pinnacle21 validator (for MySEND, we can't say, as there are no earlier versions).


A "hot topic" that comes back over and over again in the Pinnacle21 user forum is the labels for variables and datasets. The famous message "label mismatch" (35 "hits" in the forum) is very well known… One of the reasons is that in some cases, variable labels have been published that are longer than 40 characters (the limit for SAS-XPT), and that Pinnacle21 then took the freedom to define for itself what the label should be. The most famous example is the variable PESTRESC.
With the recent release of the "CDISC Library" which is (according to CDISC itself) the "CDISC truth", this issue should essentially be resolved. An overview of our test results is given below.

SDTM-IG version | PESTRESC label according to SDTM-IG and CDISC-Library | Validation result MySEND (FDA rules) | Validation result Pinnacle21 v.3.0 (FDA rules)
Character Result/Finding in Std Format
Character Result/Finding in Std Format
Character Result/Finding in Standard Format
SDTM/dataset variable label mismatch
*1 Pinnacle21 v.3.0 seems to compare the variable label with the "ItemDef Description" from the define.xml when a define.xml is provided. If there is a mismatch between them, it gives an error with a clear error message.
If no define.xml is provided, it does not give an error for the label "Character Result/Finding in Standard Format". If the define.xml is present and the label is "Character Result/Finding in Standard Format" in both, no error is thrown.
If the label is completely wrong in both the dataset and the define.xml (e.g. using "test" as the label for PESTRESC), it gives the error "SDTM/dataset variable label mismatch".
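
The behaviour observed above can be summarised as a small decision function. This is only one possible reading of what we saw, not Pinnacle21's actual implementation: the function name, the text of the first message, and the assumption that both the "Std" and the "Standard" spelling are accepted are ours.

```python
# Labels assumed to be accepted for PESTRESC (assumption based on our test results).
ACCEPTED_PESTRESC_LABELS = {
    "Character Result/Finding in Std Format",
    "Character Result/Finding in Standard Format",
}

def check_label(dataset_label, accepted_labels, define_label=None):
    """Return a message describing the problem, or None when nothing is reported.

    define_label is the ItemDef Description from the define.xml,
    or None when no define.xml was provided.
    """
    # Case 1: a define.xml is present and disagrees with the dataset.
    if define_label is not None and dataset_label != define_label:
        # Hypothetical wording; the real message text is not quoted in this post.
        return "dataset label differs from define.xml description"
    # Case 2: the label is not one of the accepted labels at all.
    if dataset_label not in accepted_labels:
        return "SDTM/dataset variable label mismatch"
    return None

# The long label without a define.xml: nothing is reported.
print(check_label("Character Result/Finding in Standard Format", ACCEPTED_PESTRESC_LABELS))
# The label "test" in both dataset and define.xml: label mismatch.
print(check_label("test", ACCEPTED_PESTRESC_LABELS, define_label="test"))
```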

So it looks as if Pinnacle21 has made progress here: it no longer throws an error when the label for PESTRESC does not correspond to what they think it should be. MySEND, on the other hand, still seems to follow what Pinnacle21 did in earlier releases.

Note that the 40-character limitation is an artificial one, caused by the fact that the FDA (and PMDA) still require the completely outdated XPT format. In modern IT, the transport format is independent of the content standard and does not limit what the content can be. XPT is a disaster in this sense. HL7-FHIR, however, shows how it can be done: one standard, three transport formats (XML, JSON, Turtle).
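
To make the 40-character limit concrete, here is a minimal sketch that measures the two PESTRESC labels from the table above. The constant and function names are ours, and the sketch simply counts characters, which is how the limit is described in this post.

```python
XPT_LABEL_LIMIT = 40  # label length limit for SAS-XPT, as stated above

def fits_xpt_label(label: str) -> bool:
    """Return True when the label fits within the XPT label limit."""
    return len(label) <= XPT_LABEL_LIMIT

for label in (
    "Character Result/Finding in Std Format",
    "Character Result/Finding in Standard Format",
):
    # Print the length, whether it fits, and the label itself.
    print(len(label), fits_xpt_label(label), label)
```

The abbreviated "Std" label fits, the spelled-out "Standard" label does not, which is exactly where the label disputes come from.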

Order of variables: EPOCH in SV ("Subject Visits")

Another hot topic that pops up over and over again is the correct order of variables in a dataset, especially when "timing variables" are added to an "observation" dataset.
The correct order for timing variables is:


Searching for "wrong order" on the Pinnacle21 forum leads to 35 entries.
Just as an example, there was a complaint on the forum about the correct order of "EPOCH" in SV (Subject Visits). "EPOCH" is not described for "SV" in the SDTM-IG, but the FDA wants it anyway (omitting it leads to a validation error), so it needs to be inserted. The author of the entry put it after "VISIT" and before "SVSTDTC" ("VISITDY" and "TAETORD" were absent), which seems perfectly OK. He/she still got an "SD1079" error.

Unfortunately, there was no reaction from Pinnacle21 at all.
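
A variable-order check of this kind can be sketched in a few lines. The reference list below is an assumption: it contains only the timing variables relevant to this SV example, in the relative order implied by the forum discussion, and it is not the complete list from the SDTM model. The function name is ours as well.

```python
# Assumed relative order of a few timing variables for SV (partial list, for illustration only).
ASSUMED_TIMING_ORDER = ["VISITNUM", "VISIT", "VISITDY", "TAETORD", "EPOCH", "SVSTDTC", "SVENDTC"]

def out_of_order(columns, reference=ASSUMED_TIMING_ORDER):
    """Return pairs of neighbouring reference variables that appear in the wrong order.

    Variables not in the reference list are ignored, and absent
    reference variables (e.g. VISITDY, TAETORD) are simply skipped.
    """
    rank = {name: position for position, name in enumerate(reference)}
    present = [column for column in columns if column in rank]
    return [(a, b) for a, b in zip(present, present[1:]) if rank[a] > rank[b]]

# EPOCH after VISIT and before SVSTDTC, as in the forum entry: nothing to report.
forum_columns = ["STUDYID", "DOMAIN", "USUBJID", "VISITNUM", "VISIT", "EPOCH", "SVSTDTC", "SVENDTC"]
print(out_of_order(forum_columns))

# EPOCH placed after SVSTDTC: the pair (SVSTDTC, EPOCH) is reported.
misplaced_columns = ["STUDYID", "DOMAIN", "USUBJID", "VISITNUM", "VISIT", "SVSTDTC", "EPOCH", "SVENDTC"]
print(out_of_order(misplaced_columns))
```

Under this reading, the placement chosen by the forum author would not be flagged, which is why the SD1079 error looks like a false positive.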

However: SV is a "special purpose" domain, not a "domain based on the 3 general observation classes", so one may wonder whether the rule is really applicable to SV anyway. There is also no other indication of what the order should be in SV.
So, as a test, we added "EPOCH" after "VISITDY" and before "SVSTDTC" in our test dataset, and looked at what the validators say about it (this was a false positive error in Pinnacle21 2.2.0, see e.g.

We re-generated the SV dataset, NOT using SAS-software. The SV dataset then contained 17 records.
When using Pinnacle21 v.3.0 (using FDA rules, SDTM-IG 3.2, and XPT for the format), we did not get any error or warning regarding the order of the variables. Somewhat surprisingly, we did get errors about "null" values for STUDYID and USUBJID in record 18, even though the dataset has only 17 records. When Dataset-XML is used as the format, this error disappears.

When using MySEND (using FDA rules, SDTM-IG 3.2, and XPT format), we did get the message "Model permissible variable added into standard domain". It does not say, however, whether this is an "info", a "warning" or an "error". This message was a typical warning in Pinnacle21 2.2.0 and led to a lot of confusion, because leaving out "EPOCH" also produced a warning. So, whatever one did (adding EPOCH or leaving it out), one ALWAYS got a warning. This approach seems to have been abandoned in Pinnacle21 3.0.0, but it is still present (though without a severity) in MySEND.

The good news is that, for SV, neither software package generates a false positive error when "EPOCH" is added in the right place.

We must however emphasize that in modern IT, the order of the variables in such datasets is fully irrelevant. Essentially, SDTM is a "View" on a database (the original database being omitted in the submission), and in modern databases the order of the variables (columns) in such a "view" is completely irrelevant. The probable reason for this "order" requirement in SDTM is again the outdated XPT format, and the outdated tools reviewers are using at the FDA, such as the "SAS System Viewer", which is not even supported anymore by SAS itself. In our own open-source "Smart Submission Dataset Viewer", the order is of no importance, as the columns can be moved from one place to another anyway (this is not possible with the SAS System Viewer).

CodeLists: TS

It is always very interesting to see how codelists are treated. Essentially, the define.xml is the "sponsor's truth", so ideally, a validator should check whether the codelist as given in the define.xml (which is very often a subset of the one published) matches the one from CDISC, taking "extensibility" into account, and taking "extended values" into account. When all that is OK, the validator should check the values in the datasets against those in the define.xml.
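
The two-step check described above can be sketched as follows. This is a minimal illustration, not how either validator works internally: the function names are ours, the codelists are plain Python sets rather than parsed define.xml or CDISC Library content, and the CDISC terms shown are only a small illustrative subset of the TSPARMCD codelist.

```python
def check_define_codelist(define_terms, cdisc_terms, extensible, extended_terms=frozenset()):
    """Step 1: return define.xml terms that are not justified by the CDISC codelist.

    A term is accepted when it is a CDISC term, or when the codelist is
    extensible and the term is declared as an extended value.
    """
    problems = []
    for term in sorted(define_terms):
        if term in cdisc_terms:
            continue
        if extensible and term in extended_terms:
            continue
        problems.append(term)
    return problems

def check_dataset_values(values, define_terms):
    """Step 2: return dataset values that are not in the define.xml codelist."""
    return sorted(set(values) - set(define_terms))

# Illustrative subset of the CDISC TSPARMCD codelist (assumption, not the full list).
cdisc_tsparmcd = {"ADDON", "AGEMAX", "AGEMIN"}
# Sponsor codelist from the define.xml: a subset plus the extended value "TEST".
define_tsparmcd = {"AGEMAX", "AGEMIN", "TEST"}

print(check_define_codelist(define_tsparmcd, cdisc_tsparmcd, extensible=True, extended_terms={"TEST"}))
print(check_dataset_values(["AGEMIN", "TEST", "AGEMAX"], define_tsparmcd))
```

With "TEST" properly declared as an extended value, both steps come back empty, so nothing would be reported for it.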

That this does not always work very well is shown in a bug report for Pinnacle21 for the TS dataset, where it was reported that in some cases, values for TSPARMCD are not checked against the controlled terminology.
In the case of Pinnacle21, I kept getting the message "TSPARMCD value not found in 'Trial Summary Parameter Test Code' extensible codelist" for the term "TEST", but now got a warning (in v.3.0) instead of an error (in v.2.2) when "TEST" was not in the codelist at all.
For me, this means that Pinnacle21 did not step down from its definition of a warning as "something that may be unusual". In my opinion, when a term is correctly defined as an extended value in the define.xml, and it appears in the dataset, there should not be any report of it at all.

In the case of MySEND, with "TEST" and "Test Type" defined as extended values in define.xml, we got the messages (without an assigned "severity"): "'TEST' value not found in 'TSPARMCD' extensible codelist" and "'Test Type' value not found in 'TSPARM' extensible codelist".
So it looks as if MySEND is also following the approach used by Pinnacle21, not giving full merit to extended codelist items in the define.xml.


It looks as if Pinnacle21 has made some progress in version 3.0.0. Some rule implementations that mostly led to false positives have been relaxed. What remains problematic, however, is that "warning" still often means that something may be unusual. These messages should be of type "info", which (fortunately) has now been introduced in version 3.0.
For these "hot topics", MySEND seems to follow the approaches Pinnacle21 had in version 2.2. This is a pity, as MySEND does now have the unique opportunity to do it really right. I do however have the impression that they are still trying to mimic the outcomes of the old Pinnacle21. But with short release cycles, this can of course be corrected.
In the next blog entry, we will have a look at some other hot topics such as "database keys" in submission files, and the FDA rule that for each "ORRES", there must be an "ORRESU", which is of course complete nonsense.

Also read:
Part 1: Installation
Part 2: Validation features