Sunday, March 16, 2025

New PMDA rules for CDISC standards published March 2025

PMDA (the Japanese regulatory authority) has recently published new rules for the use of CDISC standards.

The Excel file with the rules can be downloaded from: https://www.pmda.go.jp/files/000274354.zip

Unfortunately, the rules are only published as an Excel file, so not in a "vendor-neutral" format. They are also barely usable, if at all, for execution in software: the file is "text only" and contains no "machine-readable" instructions, not even in a "meta-language". For example, I am missing "precondition" statements (except for "domain") that clearly state, ideally in machine-readable form, when a rule is applicable and when it is not. Preconditions act like filters, and are key in the description of rules that are to be applied in software.
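A minimal sketch of how a precondition could act as a filter in a machine-readable rule. Everything in it is an assumption made for illustration: the `Rule` class, the rule ID "EX0001" and the rule text are hypothetical, and are neither a PMDA rule nor the CDISC CORE format.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, str]  # one dataset row: variable name -> value


@dataclass
class Rule:
    rule_id: str
    description: str
    check: Callable[[Record], bool]  # returns True when the record passes
    # Preconditions act as filters: the check only runs when all of them hold.
    preconditions: list[Callable[[Record], bool]] = field(default_factory=list)

    def applies_to(self, record: Record) -> bool:
        return all(condition(record) for condition in self.preconditions)

    def violations(self, records: list[Record]) -> list[Record]:
        return [r for r in records if self.applies_to(r) and not self.check(r)]


# Hypothetical rule, invented for illustration only (not a PMDA rule).
example_rule = Rule(
    rule_id="EX0001",
    description="LBORRES must be populated unless the test was not done",
    check=lambda r: r.get("LBORRES", "") != "",
    preconditions=[
        lambda r: r.get("DOMAIN") == "LB",
        lambda r: r.get("LBSTAT", "") != "NOT DONE",
    ],
)

records = [
    {"USUBJID": "001", "DOMAIN": "LB", "LBORRES": "5.2", "LBSTAT": ""},
    {"USUBJID": "002", "DOMAIN": "LB", "LBORRES": "", "LBSTAT": ""},
    {"USUBJID": "003", "DOMAIN": "LB", "LBORRES": "", "LBSTAT": "NOT DONE"},
    {"USUBJID": "004", "DOMAIN": "VS", "LBORRES": "", "LBSTAT": ""},
]

failing = [r["USUBJID"] for r in example_rule.violations(records)]
print(failing)
```

Only subject 002 is reported here: subjects 003 and 004 are filtered out by the preconditions before the check is ever evaluated, which is exactly the information a text-only rule description leaves to the implementer.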

With the exception of additional rules, e.g. for SDTMIG-3.4, and some clarifications, there is not much new under the (Japanese) sun; even the format of the Excel worksheets hasn't changed since 2019. One may see this as a good sign of consistency, but I have a different opinion.

Even more problematic is that many of the rules remain open to interpretation, meaning that different implementers (such as software vendors) may have different implementations, leading to different results. That is of course unacceptable. CDISC CORE, as a "reference implementation", does a much better job here.

Furthermore, many of the rules have too vague a description (as was already the case in the past), and do not contain the information any software company needs to implement them. So I ask myself whether these rules have really been developed by the PMDA, or by an external party that has an interest in keeping the rule descriptions vague.

Let's have a look at a few examples:

What does this rule mean? It does not even mention the define.xml, which is the place where value-level conditions are defined, and where one can find the associated codelists for value-level variables. And what is meant by "extensible codelist"? Is the CDISC codelist meant? Or the "extended" (not "extensible") CDISC codelist in the define.xml?
So, enormously open to different interpretations …

Another one that requires more and better explanation is rule SD0003:

 

It says "must conform the ISO 8601 international standard". It does not say which part of that standard … If one takes the rule literally, it allows e.g. P25D (25 days) as a value for e.g. LBDTC. This is probably not what is meant. What is probably meant is that the value must comply with the ISO 8601 notation for dates, datetimes, partial dates and so on, e.g. "2025-03-16T09:22:33". But the rule doesn't say that …
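The difference can be sketched in a few lines of Python. The two regular expressions below are deliberately simplified assumptions, not a full ISO 8601 implementation: they do not range-check months, days or hours, and they ignore time zones and the SDTM convention of hyphens for missing components. The function name `classify` is hypothetical.

```python
import re

# Simplified: dates, datetimes and right-truncated partial dates,
# e.g. "2025", "2025-03", "2025-03-16", "2025-03-16T09:22:33".
DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2}(\.\d+)?)?)?)?)?)?$"
)

# Simplified: durations such as "P25D", "P1Y2M", "PT36H", "P1DT12H".
DURATION = re.compile(
    r"^P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?"
    r"(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$"
)


def classify(value: str) -> str:
    """Return 'datetime', 'duration' or 'invalid' for a candidate value."""
    if DATETIME.match(value):
        return "datetime"
    if DURATION.match(value):
        return "duration"
    return "invalid"


for candidate in ["2025-03-16T09:22:33", "2025-03", "P25D", "16MAR2025"]:
    print(candidate, "->", classify(candidate))
```

A check that only asks "is this ISO 8601?" would accept both the "datetime" and the "duration" results; a check written for a --DTC variable would presumably accept only the first. The rule text does not tell the implementer which of the two is intended.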

Another questionable rule is SD0052:


Does this also apply to "unscheduled" visits? When I have two visits which both have VISIT = "Unscheduled Visit", must they have the same value for VISITNUM? I doubt it, as unscheduled visits must have values like "3.1", "3.2", "4.1". Or should one take care that in such a case the value for VISIT is e.g. "Unscheduled Visit 3.1"? The rule does not say anything about this …
The SDTMIG-3.4 in section 4.4.5 states:

If one follows this, both the cases "left null" and "generic value" would violate PMDA-SD0052, at least when taking the rule literally.
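To make the "literal reading" concrete, here is a small sketch of a check that flags every VISIT value that occurs with more than one VISITNUM. It is only my assumption of how a literal implementation could look; the function name `inconsistent_visits` and the sample records are hypothetical, and the exact wording of the PMDA rule is not reproduced here.

```python
from collections import defaultdict


def inconsistent_visits(records):
    """Map each VISIT that occurs with more than one VISITNUM to those values."""
    visitnums_by_visit = defaultdict(set)
    for record in records:
        visitnums_by_visit[record["VISIT"]].add(record["VISITNUM"])
    return {
        visit: sorted(visitnums)
        for visit, visitnums in visitnums_by_visit.items()
        if len(visitnums) > 1
    }


# Hypothetical records using a generic name for all unscheduled visits.
sample = [
    {"VISIT": "Screening", "VISITNUM": 1},
    {"VISIT": "Week 4", "VISITNUM": 3},
    {"VISIT": "Unscheduled Visit", "VISITNUM": 3.1},
    {"VISIT": "Unscheduled Visit", "VISITNUM": 3.2},
]

flagged = inconsistent_visits(sample)
print(flagged)
```

With a generic visit name, such a literal check reports the unscheduled visits, although nothing is wrong with the data. Whether that is intended is exactly what the rule description leaves open.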

Another one, this time related to the use of SAS Transport 5 ("XPT" format), is rule SD1212:


This is problematic, as the (still mandated) SAS-XPT format stores numbers in the "IBM mainframe notation", which is not compatible with modern computers (that use IEEE), and --STRESN is a number whereas --STRESC is character. So, what is the meaning of "equal"? Is it a violation when e.g. LBSTRESC is ".04" and the visual presentation (e.g. using the SAS Universal Viewer) of it is "0.04"? I have seen lots of what in my opinion are "false positives" in the implementation of one vendor.
It is time that PMDA also moves to the modern CDISC Dataset-JSON format for submissions.
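Coming back to the meaning of "equal": the sketch below contrasts two possible interpretations, a comparison as text and a comparison as numbers with a small tolerance. Both functions and the tolerance value are assumptions for illustration; the rule itself does not prescribe either.

```python
import math


def equal_as_text(stresc: str, stresn: float) -> bool:
    """Literal interpretation: compare the character value with the printed number."""
    return stresc == repr(stresn)


def equal_as_number(stresc: str, stresn: float, rel_tol: float = 1e-12) -> bool:
    """Numeric interpretation: parse the character value and allow for rounding."""
    try:
        return math.isclose(float(stresc), stresn, rel_tol=rel_tol)
    except ValueError:
        # Non-numeric character results (e.g. "NEGATIVE") cannot be equal.
        return False


print(equal_as_text(".04", 0.04))
print(equal_as_number(".04", 0.04))
```

The first interpretation reports ".04" versus 0.04 as a mismatch, the second does not. An implementation based on the first one would produce the kind of result I would call a "false positive".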

What I also found interesting is that in the Excel, "Message" comes before "Description", as if the rule is already predestined to be implemented by a single vendor. It is also questionable whether a vendor-neutral organization like the PMDA should impose on vendors what the message is in a software implementation. If "Message" were replaced by "Example message" and came after the "Rule description" in the Excel, I would already feel better.

Let's now have a look at the Define-XML rules, an area where there have been a lot of inaccuracies in the rules in the past.

Just take the first ones from the Excel: 

 

 

The first rule, DD0001, already astonishes me. The rule description "There is an XML schema validation issue …" is surely not a rule, it is an observation. The rule should read something like: "Any define.xml file must validate without errors against the corresponding XML-schema". Also, the sentence "it is most likely an issue with element declarations" does not belong in a rule description.

Rule DD0008 is essentially not necessary as when the element is not in the correct position, the define.xml will not validate without errors against the corresponding XML-Schema. Again, this is an observation, not a rule.
Also rule DD0002 is not necessary, as when one of the namespaces is missing, the define.xml will not validate against the schema. What is in the "description" is essentially not more than a recommendation, or "requirements lookup".

And what to think of the wording in rule DD0010, "… should generally not …"? What does "generally" mean here? Does it mean that there are exceptions? If so, what are they?

What surprised me (well, not really) is the use of the wording "should". I found it 26 times in the Excel file. Essentially, the word "should" should never appear in a rule. "Should" is an expectation, and expectations do not belong in rule formulations.

I always compare it to saying to your teenage daughter or son: "you should be back home by midnight". Do you think he or she will then really be back at home at 12 am on a Saturday night? If you say "you must be back home by midnight", that already sounds more strict, doesn't it?

I did not check every rule in the Excel in detail. That would be too frustrating …

Unfortunately, such rule definitions open the door to different interpretations, leading to different results when using different implementations, of which some will surely be "false positives", or conflict with each other. This is something we observed in the past with the SDTM rules of one vendor: when you did something "A", you got an error or warning "X", and when you did it "B", the warning/error X disappeared, but you got an error/warning Y. So, whatever you did, you always got an error or warning. Pretty frustrating, isn't it?

Does this publication show improvement when compared with the older versions? I would say "very little", both in the way the rules are published (as non-machine-executable Excel) and in the way they have been written (some seem to be copy-pasted from leaflets in the "complaint box" at the entrance of the PMDA restaurant).

As CDISC CORE will soon also start working on the PMDA rules, I presume CDISC will be in contact with PMDA to discuss every single rule, check it for necessity, reformulate it precisely in cooperation with PMDA coworkers, and then publish the rules again, together with the open-source implementation (which currently is in YAML/JSON code that is both human-readable and machine-readable). The CDISC open implementation can then serve as the official "reference implementation".



 

 


 






 

 
