Labels: PESTRESC
A "hot topic" that comes back over and over again in the Pinnacle21 user forum is the labels of variables and datasets. The famous message "label mismatch" (35 "hits" in the forum) is very well known. One of the reasons is that in some cases, variable labels have been published that are longer than 40 characters (the limit for SAS-XPT), and that Pinnacle21 then took the freedom to define itself what the label should be. The most famous example is the variable PESTRESC.
With the recent release of the "CDISC Library" which is (according to CDISC itself) the "CDISC truth", this issue should essentially be resolved. An overview of our test results is given below.
| SDTM-IG version | PESTRESC label according to SDTM-IG and CDISC-Library | Validation result MySEND (FDA rules) | Validation result Pinnacle21 v.3.0 (FDA rules) |
|---|---|---|---|
| 3.1.2 | Character Result/Finding in Std Format | OK | OK |
| 3.1.3 | Character Result/Finding in Std Format | OK | OK |
| 3.2 | Character Result/Finding in Standard Format | SDTM/dataset variable label mismatch (FALSE POSITIVE) | OK*1 |
*1 Pinnacle21 v.3.0 seems to compare the variable label with the "ItemDef Description" from the define.xml when a define.xml is provided. If there is a mismatch between them, it gives an error with a clear error message.
If no define.xml is provided, it does not give an error for the label "Character Result/Finding in Standard Format". If the define.xml is present and the label is "Character Result/Finding in Standard Format" in both, no error is thrown either.
If the label is completely wrong in both the dataset and the define.xml (e.g. using "test" as the label for PESTRESC), it gives the error "SDTM/dataset variable label mismatch".
So it looks as if Pinnacle21 has made progress here: it no longer throws an error when the label for PESTRESC does not correspond to what they think it should be, whereas MySEND still seems to follow what Pinnacle21 did in earlier releases.
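The behaviour we observed for Pinnacle21 v.3.0 can be summarized in a few lines of Python. This is only a sketch of what we saw from the outside, not the validator's actual implementation: the function name, the set of accepted labels, and the wording of the first message are our own assumptions.

```python
from typing import Optional

# Assumption: both published PESTRESC labels are treated as acceptable.
ACCEPTED_PESTRESC_LABELS = {
    "Character Result/Finding in Std Format",
    "Character Result/Finding in Standard Format",
}


def check_variable_label(dataset_label: str,
                         accepted_labels: set,
                         define_label: Optional[str] = None) -> list:
    """Return the findings for one variable label (empty list means OK)."""
    findings = []
    # Only compare with the define.xml when one was provided.
    if define_label is not None and dataset_label != define_label:
        # Hypothetical message text; the real wording was not recorded here.
        findings.append("dataset label differs from define.xml description")
    # Independent of the define.xml: is the label one of the published ones?
    if dataset_label not in accepted_labels:
        findings.append("SDTM/dataset variable label mismatch")
    return findings
```

With this sketch, the 3.2 label passes with or without a define.xml, while the label "test" in both dataset and define.xml yields only the "SDTM/dataset variable label mismatch" finding, which matches what we saw.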
Note that the 40-character limitation is an artificial one, caused by the fact that the FDA (and PMDA) still require the completely outdated XPT format to be used. In modern times, the transport format is independent of the content standard and does not limit what the content can be. XPT is a disaster in this sense. HL7-FHIR, however, shows how it can be done: one standard, three transport formats (XML, JSON, Turtle).
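To make the limitation concrete, the following minimal sketch tests the two published PESTRESC labels against the 40-character XPT limit. The constant and function names are ours, and for plain ASCII labels like these we assume that one character equals one byte.

```python
XPT_LABEL_LIMIT = 40  # maximum label length in SAS-XPT, as mentioned above


def fits_xpt_label(label: str) -> bool:
    """True when the label fits within the XPT label limit."""
    return len(label) <= XPT_LABEL_LIMIT


for label in ("Character Result/Finding in Std Format",
              "Character Result/Finding in Standard Format"):
    print(len(label), fits_xpt_label(label), label)
```

The SDTM-IG 3.1.x label fits, whereas the SDTM-IG 3.2 label does not, which is exactly where the trouble starts.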
Order of variables: EPOCH in SV ("Subject Visit")
Another hot topic that pops up over and over again is the correct order of variables in a dataset, especially when "timing variables" are added to an "observation" dataset.
The correct order for timing variables is:
SDTM 1.4 (base for SDTM-IG 3.2):
VISITNUM, VISIT, VISITDY, TAETORD, EPOCH, --DTC, --STDTC, --ENDTC, --DY, --STDY, --ENDY, --DUR, --TPT, --TPTNUM, --ELTM, --TPTREF, --RFTDTC, --STRF, --ENRF, --EVLINT, --EVINTX, --STRTPT, --STTPT, --ENRTPT, --ENTPT, --STINT, --ENINT, --DETECT.
Searching for "wrong order" on the Pinnacle21 forum leads to 35 entries.
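As an illustration, the SDTM 1.4 timing-variable sequence above can be checked with a few lines of Python. This is a sketch under our own assumptions, not how any of the validators does it: the function names are ours, only timing variables are looked at, and "--" is resolved simply by stripping the two-letter domain code from the variable name.

```python
TIMING_ORDER = [
    "VISITNUM", "VISIT", "VISITDY", "TAETORD", "EPOCH", "--DTC", "--STDTC",
    "--ENDTC", "--DY", "--STDY", "--ENDY", "--DUR", "--TPT", "--TPTNUM",
    "--ELTM", "--TPTREF", "--RFTDTC", "--STRF", "--ENRF", "--EVLINT",
    "--EVINTX", "--STRTPT", "--STTPT", "--ENRTPT", "--ENTPT", "--STINT",
    "--ENINT", "--DETECT",
]
RANK = {name: position for position, name in enumerate(TIMING_ORDER)}


def timing_rank(variable: str, domain: str):
    """Position of a timing variable in the model order, or None otherwise."""
    if variable in RANK:  # VISITNUM, VISIT, VISITDY, TAETORD, EPOCH
        return RANK[variable]
    if variable.startswith(domain):  # e.g. SVSTDTC -> --STDTC
        return RANK.get("--" + variable[len(domain):])
    return None


def timing_order_ok(variables: list, domain: str) -> bool:
    """True when the timing variables appear in the model order."""
    ranks = [r for v in variables if (r := timing_rank(v, domain)) is not None]
    return ranks == sorted(ranks)
```

For example, `timing_order_ok(["STUDYID", "DOMAIN", "USUBJID", "VISITNUM", "VISIT", "EPOCH", "SVSTDTC", "SVENDTC"], "SV")` accepts EPOCH between VISIT and SVSTDTC, and rejects the same list when EPOCH is moved behind SVSTDTC.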
Just as an example, there was a complaint on the forum about the correct position of "EPOCH" in SV (Subject Visits). "EPOCH" is not described for "SV" in the SDTM-IG, but the FDA wants it anyway (omitting it leads to a validation error), so it needs to be inserted. The author of the entry put it after "VISIT" and before "SVSTDTC" (VISITDY and "TAETORD" were absent), which seems perfectly OK. He/she still got an "SD1079" error.
Unfortunately, there was no reaction from Pinnacle21 at all.
So, as a test, we added "EPOCH" after "VISITDY" and before "SVSTDTC" in our test dataset, and looked at what the validator says about it (this was a false positive error in Pinnacle21 2.2.0, see e.g. https://www.pinnacle21.com/forum/svepoch-variable-wrong-order).
We regenerated the SV dataset, NOT using SAS software. The SV dataset then contained 17 records. When using Pinnacle21 v.3.0 (with FDA rules, SDTM-IG 3.2, and XPT as the format), we did not get any error or warning regarding the order of the variables. A bit surprisingly, we did get errors stating that there are "null" values for STUDYID and USUBJID in record 18, although we only have 17 records. When using Dataset-XML as the format, this error disappears.
When using MySEND (with FDA rules, SDTM-IG 3.2, and XPT format), we did get the message "Model permissible variable added into standard domain". It does not say, however, whether this is an "info", a "warning" or an "error". This message was a typical warning in Pinnacle21 2.2.0, leading to a lot of confusion, as leaving out "EPOCH" also resulted in a warning. So, whatever one did (adding EPOCH or leaving it out), one ALWAYS got a warning. This approach seems to have been given up in Pinnacle21 3.0.0, but it is still present (without a severity, however) in MySEND.
The good news is that, for SV, neither software package generates a false positive error when "EPOCH" is added in the right place.
We must however emphasize that in modern IT, the order of the variables in such datasets is fully irrelevant. Essentially, SDTM is a "view" on a database (the original database being omitted in the submission), and in modern databases, the order of the variables (columns) in such a "view" is completely irrelevant. The probable reason for this "order" requirement in SDTM is again the outdated XPT format, and the outdated tools reviewers are using at the FDA, such as the "SAS System Viewer", which is not even supported anymore by SAS itself. In our own open-source "Smart Submission Dataset Viewer", the order is not of importance; the columns can be moved from one place to another anyway (this is not possible with the SAS System Viewer).
CodeLists: TS
It is always very
interesting to see how codelists are treated. Essentially, the define.xml is
the "sponsor's truth", so ideally, a validator should check whether
the codelist as given in the define.xml (which is very often a subset of the
one published) matches the one from CDISC, taking "extensibility"
into account, and taking "extended values" into account. When all
that is OK, the validator should check the values in the datasets against those
in the define.xml.
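The two-step approach described above could look like the following Python sketch. It only illustrates the idea and is not taken from any validator: the class and function names are hypothetical, and the CDISC codelist is a tiny illustrative subset rather than the published terminology.

```python
from dataclasses import dataclass, field


@dataclass
class DefineCodeList:
    """A codelist as declared by the sponsor in the define.xml."""
    name: str
    terms: set = field(default_factory=set)     # all terms, including extended
    extended: set = field(default_factory=set)  # terms flagged as extended


def check_define_codelist(define_cl, cdisc_terms, extensible):
    """Step 1: compare the define.xml codelist with the CDISC codelist."""
    findings = []
    for term in sorted(define_cl.terms):
        if term in cdisc_terms:
            continue  # published term, fine
        if extensible and term in define_cl.extended:
            continue  # properly declared extension of an extensible codelist
        findings.append(f"'{term}' is not valid for codelist {define_cl.name}")
    return findings


def check_dataset_values(values, define_cl):
    """Step 2: compare dataset values with the define.xml codelist only."""
    unknown = sorted(set(values) - define_cl.terms)
    return [f"'{v}' not found in define.xml codelist {define_cl.name}"
            for v in unknown]


# Illustrative subset of the published TSPARMCD terms (assumption).
cdisc_tsparmcd = {"AGEMIN", "AGEMAX", "TITLE"}
define_cl = DefineCodeList("TSPARMCD", {"AGEMIN", "TITLE", "TEST"}, {"TEST"})

print(check_define_codelist(define_cl, cdisc_tsparmcd, extensible=True))
print(check_dataset_values(["TITLE", "TEST"], define_cl))
```

In this sketch, a term such as "TEST" that is correctly declared as an extended value produces no finding at all, neither in step 1 nor in step 2.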
That this does not always work very well is shown in a bug report for Pinnacle21 for the TS dataset, where it was reported that in some cases, values for TSPARMCD are not checked against the controlled terminology.
In the case of Pinnacle21, I kept getting the message "TSPARMCD value not found in 'Trial Summary Parameter Test Code' extensible codelist" for the term "TEST", but now got a warning (in v.3.0) instead of an error (in v.2.2) when "TEST" was not in the codelist at all.
For me, this means that Pinnacle21 did not step back from its definition of a warning as "something that may be unusual". In my opinion, when a term is correctly defined as an extended value in the define.xml, and it appears in the dataset, there should not be any report of it at all.
In the case of MySEND, with "TEST" and "Test
Type" defined as extended values in define.xml, we got the messages
(without an assigned "severity"): "'TEST' value
not found in 'TSPARMCD' extensible codelist" and "'Test Type' value
not found in 'TSPARM' extensible codelist".
So it looks as if MySEND is also following the approach used by Pinnacle21, not giving full merit to extended codelist items in the define.xml.
Conclusion
It looks as if Pinnacle21 has made some progress in version 3.0.0. Some rule implementations, mostly those leading to false positives, have been relaxed. Problematic, however, is that "warning" still often means that something may be unusual. These messages should be of type "info", which (fortunately) has now been introduced in version 3.0.
For these "hot topics", MySEND seems to follow the approaches Pinnacle21 had in version 2.2. This is a pity, as MySEND does now have the unique opportunity to do it really right. I do however have the impression that they are still trying to mimic the outcomes of the old Pinnacle21. But with short release cycles, this can of course be corrected.
In the next blog entry, we will have a look at some other hot topics such as "database keys" in submission files, and the FDA rule that for each "ORRES", there must be an "ORRESU", which is of course complete nonsense.
Also read:
Part 1: Installation
Part 2: Validation features