Friday, July 26, 2019

CDISC Validation: PointCross – Pinnacle21 v.3 comparison: Part 4: some more hot topics

In the fourth part of our series, we look at some more "hot topics" that were real problems with the Pinnacle21 validation software in prior versions. We will check whether these were corrected in v.3.0.0 and compare how the new MySEND validation software from PointCross Life Sciences treats these cases.
We will concentrate on two major groups of items: uniqueness of records, and when and how --ORRESU and --STRESU values ("Original Result Units" and "Standardized Result Units") must be populated in SDTM and SEND.

Uniqueness of records based on dataset keys defined in the define.xml

Some time ago, there was a report of false positive errors in Pinnacle21 v.2.2 when people chose keys different from the example keys in the SDTM-IG.
As a test, I changed the definition for AE in a define.xml file and removed the key assignment for AESTDTC. I then added a second "Anxiety" record for the same patient with clearly different start and end dates.
According to the (remaining) keys defined in the define.xml (STUDYID, USUBJID, AEDECOD), this should lead to a "duplicate records" error.
Neither MySEND nor Pinnacle21 v.3.0 reported any issues. So it looks as if both packages ignore the key definitions from the define.xml and use their own ones (whatever these are remains a secret to me…).
The other way around, I duplicated a record in my LB (Laboratory) dataset for "Vitamin B12" and got a "duplicate records" error from both validators. MySEND reported that this was based on the combination of USUBJID, LBTESTCD, LBDTC, LBSPEC, LBMETHOD, VISITDY, LBDY, LBCAT, VISITNUM.
Pinnacle21 reported that it was based on the combination of USUBJID, LBTESTCD, LBCAT, LBSPEC, LBMETHOD, VISITNUM, VISITDY, LBDTC. The only difference between the two is LBDY, which only MySEND includes.
However, I never declared LBDY as a dataset key in my define.xml, nor did I for LBSPEC.
This seems to confirm that in both packages, the "uniqueness keys" from the define.xml are mostly or completely ignored.

I then changed the data to contain LBTPT and LBTPTNUM, gave the two records that were reported to be non-unique different values for LBTPTNUM and LBTPT, and registered LBTPT as a key variable in the define.xml. The error should now go away, and it did in the case of Pinnacle21. However, when I gave both records the same value for LBTPTNUM (making them non-unique again), no issue was reported.
For MySEND, the error also went away when adding LBTPT and LBTPTNUM, assigning a key to LBTPTNUM, and having different values for LBTPTNUM.
When the two records have the same LBTPTNUM, however, no error is thrown either. The reason is probably that I needed to assign different values for LBDTC, as otherwise I got some other errors.
Altogether, it looks as if both MySEND and Pinnacle21 have their own rules for defining what "record uniqueness" is, and ignore the key definitions from the define.xml. This is a fundamentally wrong approach, as the define.xml is "the sponsor's truth" and the keys should be taken solely from the define.xml.
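To make clear what "taking the keys from the define.xml" would mean in practice, here is a minimal sketch in Python. It is not how either validator works internally. It assumes a Define-XML 2.0 style file in which the keys are the ItemRef elements carrying a "KeySequence" attribute; the function names, the stripped-down define.xml fragment and the subject data are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ODM 1.3 namespace, as used by Define-XML 2.0
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

def keys_from_define(define_xml, dataset):
    """Return the key variable names of one dataset, ordered by KeySequence."""
    root = ET.fromstring(define_xml)
    # Map ItemDef OIDs to variable names
    names = {d.get("OID"): d.get("Name")
             for d in root.iterfind(".//odm:ItemDef", ODM_NS)}
    for group in root.iterfind(".//odm:ItemGroupDef", ODM_NS):
        if group.get("Name") == dataset:
            refs = [r for r in group.iterfind("odm:ItemRef", ODM_NS)
                    if r.get("KeySequence")]
            refs.sort(key=lambda r: int(r.get("KeySequence")))
            return [names[r.get("ItemOID")] for r in refs]
    return []

def find_duplicates(records, keys):
    """Return the key value combinations that occur more than once."""
    counts = Counter(tuple(rec.get(k, "") for k in keys) for rec in records)
    return [combo for combo, n in counts.items() if n > 1]

# Hypothetical, heavily stripped-down define.xml: AESTDTC is NOT a key
DEFINE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1"><MetaDataVersion OID="MDV1">
    <ItemGroupDef OID="IG.AE" Name="AE">
      <ItemRef ItemOID="IT.STUDYID" KeySequence="1"/>
      <ItemRef ItemOID="IT.USUBJID" KeySequence="2"/>
      <ItemRef ItemOID="IT.AEDECOD" KeySequence="3"/>
      <ItemRef ItemOID="IT.AESTDTC"/>
    </ItemGroupDef>
    <ItemDef OID="IT.STUDYID" Name="STUDYID"/>
    <ItemDef OID="IT.USUBJID" Name="USUBJID"/>
    <ItemDef OID="IT.AEDECOD" Name="AEDECOD"/>
    <ItemDef OID="IT.AESTDTC" Name="AESTDTC"/>
  </MetaDataVersion></Study>
</ODM>"""

# Two made-up "Anxiety" records for the same subject, different start dates
ae_records = [
    {"STUDYID": "STUDY1", "USUBJID": "STUDY1-001",
     "AEDECOD": "Anxiety", "AESTDTC": "2019-01-10"},
    {"STUDYID": "STUDY1", "USUBJID": "STUDY1-001",
     "AEDECOD": "Anxiety", "AESTDTC": "2019-03-02"},
]

ae_keys = keys_from_define(DEFINE, "AE")
ae_duplicates = find_duplicates(ae_records, ae_keys)
print(ae_keys)
print(ae_duplicates)
```

With the sponsor's keys (STUDYID, USUBJID, AEDECOD), the two records form one duplicate combination, which is the error I expected but did not get from either tool.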

Uniqueness of records based on TESTCD, SPEC, METHOD …

One of the major problems with the use of (post-coordinated) CDISC-CT for laboratory tests is that there is no perfect way to automatically detect uniqueness of records without using the keys from the define.xml, and even then ... Suppose the following situation. A subject is tested for glucose in urine using a dipstick method. The result is "+2" (ordinal). As the test is positive, a second, quantitative test is performed, also using a dipstick method, with a result of 2 mmol/L. Both records in SDTM have the same values for LBTESTCD ("GLUC"), LBSPEC ("URINE"), and LBMETHOD ("DIPSTICK"), and even the same value for LBDTC, as both tests were done on the same sample (LBDTC = "Date/Time of Specimen Collection"). The major difference, however, is the LOINC code. In the first case (ordinal), the LOINC code is 25428-4, whereas in the second case (quantitative), the LOINC code is 22705-8.
How do both the validation tools treat this situation? In our define.xml, we set "LBLOINC" as one of the keys (using "KeySequence") to tell the system (and of course the reviewers) that this was a key in our database. Do the validation tools accept our choices for the uniqueness keys?
Pinnacle21 v.3.0.0 reports a warning for one of the records: 


 
It seems that it overrides our own choices for the uniqueness keys defined in the define.xml (yes, we did indeed add the define.xml using the GUI) with "their own" (i.e. what Pinnacle21 thinks the keys should be).
MySEND also gives an error with a somewhat different message:


It argues that there cannot be two measurements with the same datetime of collection. However, for LB, LBDTC is the datetime of the sample collection (and not of the measurement). So if two measurements were performed on the same sample, these should never be marked as "duplicate records".
So this is again a false positive error, as here too the choice of the uniqueness keys in the define.xml is fully ignored.
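The glucose case can be reproduced with a few lines of Python. This is only a sketch of the principle: the helper name, the subject identifier and the collection datetime are invented, and the "generic" key list is my own guess at a tool-side default, not the list either validator actually uses.

```python
from collections import defaultdict

def duplicate_groups(records, keys):
    """Group records by their key values and return groups with more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)
    return [g for g in groups.values() if len(g) > 1]

# Two results obtained from the same urine sample (values are illustrative)
lb_records = [
    {"USUBJID": "SUBJ-001", "LBTESTCD": "GLUC", "LBSPEC": "URINE",
     "LBMETHOD": "DIPSTICK", "LBDTC": "2019-07-01T08:30",
     "LBLOINC": "25428-4", "LBORRES": "+2", "LBORRESU": ""},
    {"USUBJID": "SUBJ-001", "LBTESTCD": "GLUC", "LBSPEC": "URINE",
     "LBMETHOD": "DIPSTICK", "LBDTC": "2019-07-01T08:30",
     "LBLOINC": "22705-8", "LBORRES": "2", "LBORRESU": "mmol/L"},
]

# Assumed tool-side default keys (hypothetical) versus the sponsor's keys
generic_keys = ["USUBJID", "LBTESTCD", "LBSPEC", "LBMETHOD", "LBDTC"]
sponsor_keys = generic_keys + ["LBLOINC"]

print(len(duplicate_groups(lb_records, generic_keys)))
print(len(duplicate_groups(lb_records, sponsor_keys)))
```

Without LBLOINC the two records collapse into one key combination and are flagged; with the key set declared in the define.xml they are distinct, as they should be.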

Units: "ORRESU when ORRES is provided" and "STRESU when STRESC is provided"

Among the most contested rules are the FDA rules "Missing value for --ORRESU, when --ORRES is provided" (SD0026) and "Missing value for --STRESU, when --STRESC is provided" (SD0043). We all know that there are many cases where there is no unit. Simple examples are "pH", and all tests that provide ordinal or narrative values. So, these rules just don't make sense.
In our test set, the Hematocrit (LBTESTCD=HCT) values are provided as fractions, e.g. "0.43", without unit. For all such records, both MySEND and Pinnacle21 state there is an issue, referencing the above-mentioned rules. Probably, both packages assume that all hematocrit values have to be reported with "%" as the unit, but I have not found such a rule anywhere. Also, in both packages, the rule seems not to be applied when the value of LBSTRESC cannot be converted to a number. So, on what basis is it decided whether the rule is applied? On the value itself? That is, and remains, opaque.
One improvement over the past is that no such errors are thrown anymore when the test is "pH" (LBTESTCD=PH). We all know (or should know) that there is no unit for pH, as it is the logarithm of a ratio.
Essentially, these two rules should not exist: they are nonsense, and there is currently no way to describe with 100% accuracy when a test has a unit and when not, especially not using CDISC coding systems.
The case may, however, be different when using LOINC codes.
Essentially, the LOINC code itself contains indirect information about whether a unit is expected. First of all, one of the five parts of the "LOINC Name" is the "Scale". If the value for "Scale" is "Qn" (meaning "Quantitative"), there is already a good chance that there is a unit. But not always… If the value of the part "Property" is either "VFr" (volume fraction) or "MFr" (mass fraction), there might be a unit or not. LOINC also provides an "example UCUM unit" when one is available, but that does not mean that this unit must be used.
For our "hematocrit" example, the LOINC code 4544-3 ("Hematocrit [Volume Fraction] of Blood by Automated count") has an example (UCUM) unit "%", but given that the "property" is a "fraction", it does not mean that "%" needs to be used. Essentially, it should be possible to develop a set of rules to determine whether LBORRESU must have a value, based on the LOINC code and/or the UCUM notation. For example, "%" as UCUM unit cannot be reduced to a combination of the 7 base units, and neither can "[pH]".
I did not find any indication, however, that either MySEND or Pinnacle21 ever made any attempt to develop such rules. As long as such a clear set of rules is not publicly available, it does not make sense to implement rules SD0026 and SD0043: they should simply be removed.
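Just to illustrate what such a rule set could look like, here is a deliberately naive sketch in Python. It is my own assumption, not an existing CDISC, FDA or LOINC rule: the function name and the decision logic are hypothetical, and the "Scale", "Property" and example unit values passed in would have to be looked up in the LOINC table for the code at hand.

```python
# UCUM notations that cannot be reduced to a combination of base units
# (assumed, incomplete list; "1" is the UCUM notation for "the unity")
DIMENSIONLESS_UCUM = {"%", "[pH]", "1"}

# LOINC "Property" values for which the result may be given with or without a unit
FRACTION_PROPERTIES = {"VFr", "MFr"}

def unit_required(scale, prop, example_ucum=None):
    """Return True only when a missing unit looks like a genuine issue.

    scale        -- LOINC "Scale" part, e.g. "Qn", "Ord", "Nar"
    prop         -- LOINC "Property" part, e.g. "SCnc", "VFr"
    example_ucum -- LOINC example UCUM unit, or None when there is none
    """
    if scale != "Qn":
        return False  # ordinal, nominal or narrative results carry no unit
    if prop in FRACTION_PROPERTIES:
        return False  # a fraction may be "43 %" or just "0.43"
    if not example_ucum or example_ucum in DIMENSIONLESS_UCUM:
        return False  # nothing to require, or a dimensionless notation
    return True

# Part values as I understand them for the examples in this post (please verify)
print(unit_required("Qn", "SCnc", "mmol/L"))  # quantitative urine glucose
print(unit_required("Qn", "VFr", "%"))        # hematocrit
print(unit_required("Ord", "PrThr"))          # ordinal urine glucose
print(unit_required("Qn", "LsCnc", "[pH]"))   # pH
```

A validator using something like this would only raise SD0026 or SD0043 in the first case, and stay silent for hematocrit reported as "0.43".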


Conclusions

Both packages unfortunately seem to ignore the "uniqueness keys" provided by the define.xml (the "sponsor's truth") and have their own (partially opaque) rules for what is understood by "unique". One argument I heard in the past for not taking the keys from the define.xml is that "many define.xml files are not correct". That, however, is "the world upside down" and is like saying: "many people ignore the red traffic light, so let us remove all traffic lights".
Both packages try to implement the FDA rules SD0026 and SD0043, although these rules should not exist, as there is currently no way to correctly find out whether a test has a unit or not. Even with the help of the LOINC code this remains tricky, as the examples with mass fraction or volume fraction show, where "%" is a valid unit, but where the fraction can also be a number between 0 and 1, without unit.

Next time we will report on a "code review" of the Pinnacle21 v.3 CLI source code (PointCross MySEND is not open source).

Prior in the series:
Part 1: Installation
Part 2: Validation features
Part 3: Hot topics 1