Monday, November 24, 2014

Follow up to "FDA publishes Study Data Validation Rules"

My good friend and colleague at CDISC, Sam Hume, picked this up, corrected my code, and tested it on real Dataset-XML files. Here is his code:

declare namespace def = "http://www.cdisc.org/ns/def/v2.0";
declare namespace odm="http://www.cdisc.org/ns/odm/v1.3";
for $s in doc('file:/c:/path-here/define.xml')//odm:ItemDef[@Name='ARMCD'] 
    let $oid := $s/@OID
    for $armvalue in doc('DM.xml')//odm:ItemGroupData//odm:ItemData[@ItemOID=$oid]
        where string-length($armvalue/@Value) > 20
            return <error>Invalid value for ARMCD {$armvalue} - it has more than 20 characters</error>

He used oXygen XML Editor and ran the XQuery on a file rather than on a native XML database (I use eXist).
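
Since the same check should work in both environments, one option is to wrap it in a function that takes the two documents as parameters, so that only the doc() calls differ between a file-based run and a database run. The following is only a sketch: the function name local:check-armcd is made up for illustration and is not part of Sam's code, and the doc() locations are simply the ones from the query above.

```xquery
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";

(: hypothetical helper, not part of Sam's code: the same ARMCD length check,
   but with the define.xml and the dataset passed in as documents :)
declare function local:check-armcd(
    $define as document-node(),
    $dataset as document-node()
) as element(error)* {
    for $oid in $define//odm:ItemDef[@Name = 'ARMCD']/@OID
    for $armvalue in $dataset//odm:ItemGroupData//odm:ItemData[@ItemOID = $oid]
    where string-length($armvalue/@Value) > 20
    return <error>Invalid value for ARMCD {data($armvalue/@Value)} - it has more than 20 characters</error>
};

(: file-based call, as in oXygen; in eXist the two arguments would be
   doc() calls on database paths instead :)
local:check-armcd(doc('file:/c:/path-here/define.xml'), doc('DM.xml'))
```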

So I tried another one, rule #175, "Missing value for --STAT, when --REASND is provided", which is described as: "Completion Status (--STAT) should be set to 'NOT DONE', when Reason Not Done (--REASND) is populated". Here is my XQuery (run against the eXist native XML database, into which I loaded the test files):

(: Rule FDAC175 :)
declare namespace def = "http://www.cdisc.org/ns/def/v2.0";
declare namespace odm="http://www.cdisc.org/ns/odm/v1.3";
declare namespace data="http://www.cdisc.org/ns/Dataset-XML/v1.0";
(: get the OID for VSSTAT :)
for $s in doc('/db/fda_submissions/cdisc01/define2-0-0-example-sdtm.xml')//odm:ItemDef[@Name='VSSTAT'][1]
let $vsstatoid := $s/@OID
(: get the OID for VSREASND :)
let $vsreasndoid := $s/../odm:ItemDef[@Name='VSREASND']/@OID
(: select the VSREASND data points  :)
for $record in doc('/db/fda_submissions/cdisc01/vs.xml')//odm:ItemGroupData/odm:ItemData[@ItemOID=$vsreasndoid]
(: get the record number :)
let $recnum := $record/../@data:ItemGroupDataSeq
(: and check whether there is a corresponding VSSTAT :)
let $vsstat := $record/../odm:ItemData[@ItemOID=$vsstatoid]
where empty($vsstat)  (: VSSTAT is missing :)
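(: data() inserts the value as text; an attribute node placed after text content is an error (XQTY0024) in strict processors :)
return <error recordnumber="{$recnum}" rule="FDAC175">Missing value for VSSTAT when VSREASND is provided - VSREASND = {data($record/@Value)}</error>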

I added some comments so that the code is self-explanatory.
Essentially, the FDA rule is not one rule but two: VSSTAT must be present, and its value must be "NOT DONE". So I still need to adapt the code somewhat so that it also checks for the presence of "NOT DONE" in VSSTAT. Here is the corrected part:

where empty($vsstat) or data($vsstat/@Value) != 'NOT DONE'
return <error recordnumber="{$recnum}" rule="FDAC175">Missing or invalid value for VSSTAT when VSREASND is provided - VSREASND = {data($record/@Value)}</error>
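
To try the corrected rule without loading any files, the check can be run against a few inline records. This is only a sketch under assumptions: the OIDs IT.VS.VSSTAT and IT.VS.VSREASND, the ItemGroupOID and the three sample records are made up for illustration. In a real run the OIDs are looked up in the define.xml, as in the query above.

```xquery
(: Sketch of rule FDAC175 on made-up inline data :)
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";
declare namespace data = "http://www.cdisc.org/ns/Dataset-XML/v1.0";

(: hypothetical OIDs; normally retrieved from the define.xml :)
let $vsstatoid := 'IT.VS.VSSTAT'
let $vsreasndoid := 'IT.VS.VSREASND'
(: three made-up records: correct, VSSTAT missing, VSSTAT with a wrong value :)
let $sample :=
  <odm:ClinicalData>
    <odm:ItemGroupData ItemGroupOID="IG.VS" data:ItemGroupDataSeq="1">
      <odm:ItemData ItemOID="IT.VS.VSSTAT" Value="NOT DONE"/>
      <odm:ItemData ItemOID="IT.VS.VSREASND" Value="PATIENT REFUSED"/>
    </odm:ItemGroupData>
    <odm:ItemGroupData ItemGroupOID="IG.VS" data:ItemGroupDataSeq="2">
      <odm:ItemData ItemOID="IT.VS.VSREASND" Value="EQUIPMENT FAILURE"/>
    </odm:ItemGroupData>
    <odm:ItemGroupData ItemGroupOID="IG.VS" data:ItemGroupDataSeq="3">
      <odm:ItemData ItemOID="IT.VS.VSSTAT" Value="DONE"/>
      <odm:ItemData ItemOID="IT.VS.VSREASND" Value="PATIENT REFUSED"/>
    </odm:ItemGroupData>
  </odm:ClinicalData>
(: select the VSREASND data points :)
for $record in $sample//odm:ItemGroupData/odm:ItemData[@ItemOID = $vsreasndoid]
let $recnum := $record/../@data:ItemGroupDataSeq
let $vsstat := $record/../odm:ItemData[@ItemOID = $vsstatoid]
(: VSSTAT is missing, or present with a value other than NOT DONE :)
where empty($vsstat) or data($vsstat/@Value) != 'NOT DONE'
return <error recordnumber="{$recnum}" rule="FDAC175">Missing or invalid value for VSSTAT when VSREASND is provided - VSREASND = {data($record/@Value)}</error>
```

With this sample, the query should flag records 2 and 3 and leave record 1 alone.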

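The data() function retrieves the atomized value of the attribute instead of the attribute as a node. The != comparison would atomize the attribute implicitly as well, so the explicit call here mainly documents the intent.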
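
The difference between an attribute node and its value is easiest to see when both are placed inside a constructed element. A minimal sketch, using a made-up ItemData element (the OID is hypothetical):

```xquery
(: made-up data point, for illustration only :)
let $item := <ItemData ItemOID="IT.VS.VSSTAT" Value="NOT DONE"/>
return
  <demo>
    {(: the attribute node itself is attached to the new element as an attribute :)()}
    <as-node>{$item/@Value}</as-node>
    {(: data() yields the atomic value, which ends up as text content :)()}
    <as-value>{data($item/@Value)}</as-value>
    {(: a general comparison atomizes the attribute on its own :)()}
    <compare>{$item/@Value = 'NOT DONE'}</compare>
  </demo>
```

Here as-node comes back as an empty element carrying a Value attribute, as-value contains the text NOT DONE, and compare contains true.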

In the next few weeks, I will publish more about this nice way of defining the FDA rules extremely precisely (with no room for different interpretations) and in a machine-executable way.
If we can get this done, everybody will be playing by the same rules ... Isn't that wonderful?
