Today, I started gaining experience with the CDISC CORE feature that lets users add their own "custom" rules to the CORE engine. This is a great feature, one that is not offered at all by e.g. Pinnacle21.
How to add custom rules to the CORE engine is explained on the CORE GitHub website, but I still had to experiment a bit to get it all right. That is also the reason for this blog: to give the reader a clear path and an explanation of how this can be accomplished.
As I already had some experience with developing CDISC CORE rules as a volunteer (I was made responsible for the SEND rules and some of the FDA Business Rules), I could get a head start.
So, I first wrote a custom rule stating that no subject in my studies may be older than 65 years, simply using Notepad++.
Important here (as I found out mostly by trial and error) are:
- You can and should define your own "Organization"
- Normally, you will define custom rules as part of your own (set of) standard(s), in this case "XML4Pharma_Standard". Originally, I had "CDISC" here, which caused my rule not to run. In the screenshot, that is already corrected. CROs may have several such sets of standards, e.g. one for each sponsor customer.
- Assign your own ID to the rule, different from any of the CDISC CORE IDs
I also added a second rule (with ID XML4P2) stating that AEDECOD must always be populated.
Whereas the first rule is meant more for specific studies, the second is more of a "Quality Assurance" rule for submissions to regulatory authorities.
I then added the YAML files (one per custom rule) to a directory where we manage all our custom rules. In our case, we used the directory "D:\CDISC_Standards\CDISC_CORE_extensions".
Then, using Windows PowerShell and the CORE Windows distribution, I ran the command:
.\core.exe update-cache --custom_rules_directory D:\CDISC_Standards\CDISC_CORE_extensions
which failed, as the engine requires the "api key". The message was:
Error: Missing option '--apikey'.
So I added it:
.\core.exe update-cache --custom_rules_directory D:\CDISC_Standards\CDISC_CORE_extensions --apikey {apikey}
where {apikey} is replaced by my real API key.
This led to the message:
Added 2 rules: XML4P2, XML4P1
Cache updated successfully.
One also needs to set up a file that links the custom rules with a "custom standard", which is done in a JSON file.
In our case, I defined a "custom standard" named "xml4pharma_standard", with the contents:
{
"xml4pharma_standard/1-0": [
"XML4P1", "XML4P2" ]
}
Note that I used lowercase for "xml4pharma_standard": the value is later treated as case-sensitive when starting a validation. It took me some time to realize that …
The value after the slash is the version of our custom standard. This enables versioning of rules and standards.
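When there are many rules or several custom standards, the JSON file could also be generated rather than typed by hand. The following is a minimal sketch, not something the CORE engine provides: the function names `build_custom_standard` and `write_custom_standard` are hypothetical, and forcing the name to lowercase is only a convention chosen here to avoid case mismatches later on.

```python
import json
from pathlib import Path


def build_custom_standard(name: str, version: str, rule_ids: list[str]) -> dict:
    """Return the mapping '<name>/<version>' -> list of custom rule IDs.

    The name is lowercased on purpose (a convention of this sketch, not an
    engine requirement), so that the same spelling can be reused after "-s".
    """
    key = f"{name.lower()}/{version}"
    return {key: list(rule_ids)}


def write_custom_standard(path: str, name: str, version: str, rule_ids: list[str]) -> dict:
    """Write the mapping as a JSON file and return it."""
    mapping = build_custom_standard(name, version, rule_ids)
    Path(path).write_text(json.dumps(mapping, indent=2), encoding="utf-8")
    return mapping


if __name__ == "__main__":
    # Rule IDs and standard name as used in this post
    mapping = build_custom_standard("XML4Pharma_Standard", "1-0", ["XML4P1", "XML4P2"])
    print(json.dumps(mapping, indent=2))
```

A CRO with one custom standard per sponsor could call `write_custom_standard` once per sponsor, each with its own name and version.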
This JSON file with links between the custom standard and custom rules is then submitted to the engine using:
.\core.exe update-cache --custom_standard D:\CDISC_Standards\CDISC_CORE_extensions\XML4Pharma_Standard.json --apikey {apikey}
Leading to the message:
Added 1 new standards
Cache updated successfully
One can then check whether these are found, using:
.\core.exe list-rules --custom_rules
which shows both the custom rules I defined in JSON format (although I defined them in YAML):
or, leading to the same result, as we only defined one "standard":
.\core.exe list-rules --custom_rules -s xml4pharma_standard -v 1-0
One can imagine, though, that a CRO may want to define several "custom" standards, one for each sponsor customer.
However, the command:
.\core.exe list-rule-sets
does not list the newly defined standard - it only lists the rules defined by CDISC.
For running against the custom rules, I then used:
.\core.exe validate -cs -s xml4pharma_standard -v 1-0 -d D:\MetaDataSubmissionGuide_2_0_JSON_1-1_Files_testing\SDTM_Dataset-JSON
I found that two things are very important to get this to work:
* one really needs to add the "-cs" option
* the standard name, provided immediately after "-s", is case-sensitive.
When one e.g. uses "-s XML4Pharma_Standard", validation executes, but in the resulting Excel report, the tab "Rules_Report" will not list any results. So the name of the standard given after ".\core.exe validate -s" must exactly match what is in the file XML4Pharma_Standard.json:
{
"xml4pharma_standard/1-0": ["XML4P1", "XML4P2" ]
}
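Because a mismatch in case does not produce an error, only an empty "Rules_Report" tab, a small pre-flight check before running the validation can save time. The sketch below is not part of the CORE engine: it simply compares the values intended for "-s" and "-v" with the keys of the custom standard JSON, and the function names are hypothetical.

```python
import json

# Contents of the custom standard file as shown in this post
CUSTOM_STANDARD_JSON = '{"xml4pharma_standard/1-0": ["XML4P1", "XML4P2"]}'


def standard_is_defined(json_text: str, standard: str, version: str) -> bool:
    """True if '<standard>/<version>' is an exact (case-sensitive) key."""
    return f"{standard}/{version}" in json.loads(json_text)


def near_misses(json_text: str, standard: str, version: str) -> list[str]:
    """Keys that differ from the requested one only by upper/lower case."""
    wanted = f"{standard}/{version}"
    return [
        key
        for key in json.loads(json_text)
        if key != wanted and key.lower() == wanted.lower()
    ]


if __name__ == "__main__":
    for name in ("xml4pharma_standard", "XML4Pharma_Standard"):
        ok = standard_is_defined(CUSTOM_STANDARD_JSON, name, "1-0")
        hint = near_misses(CUSTOM_STANDARD_JSON, name, "1-0")
        print(name, "defined:", ok, "did you mean:", hint)
```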
I then noticed that, although my study has participants over 65 years old (DM.AGE, DM.AGEU=YEARS), no issues were reported for DM. The reason was that my rule originally stated that it is an SDTMIG rule, instead of a rule of my custom standard. So I corrected the YAML, and then used "update-cache" with the "--update_custom_rule" option:
.\core.exe update-cache --update_custom_rule D:\CDISC_Standards\CDISC_CORE_extensions\AGE_65.yaml --apikey {apikey}
leading to the message:
Updated rule: XML4P1
Cache updated successfully
I then ran:
.\core.exe validate -cs -s xml4pharma_standard -v 1-0 …
again, and indeed, the engine now reported issues for all the subjects in DM who are over 65 years old.
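The logic that rule XML4P1 expresses can be sketched in plain Python. This is not how the CORE engine evaluates the YAML, and the records below are made-up examples. Whether the rule should also test AGEU is an assumption of this sketch, based on the DM.AGEU=YEARS remark above.

```python
def over_age_limit(dm_records: list[dict], limit: int = 65) -> list[str]:
    """Return the USUBJID of every DM record with AGE above the limit.

    Assumption of this sketch: only records with AGEU == "YEARS" are
    compared, and records without an AGE value are skipped.
    """
    flagged = []
    for record in dm_records:
        age = record.get("AGE")
        if record.get("AGEU") == "YEARS" and age is not None and age > limit:
            flagged.append(record["USUBJID"])
    return flagged


if __name__ == "__main__":
    # Hypothetical DM records, for illustration only
    dm = [
        {"USUBJID": "S-001", "AGE": 72, "AGEU": "YEARS"},
        {"USUBJID": "S-002", "AGE": 41, "AGEU": "YEARS"},
        {"USUBJID": "S-003", "AGE": None, "AGEU": None},
    ]
    print(over_age_limit(dm))
```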
Other important remarks:
* I used the CORE Windows distribution 0.10.0 for adding these custom rules. This will of course work in the same way when using the native Python distribution: one just needs to replace ".\core.exe" by "python core.py". For Linux (and probably for Mac), one should just use "./core" instead of ".\core.exe".
* I haven't yet found out how one can run the CDISC/FDA rules together with the custom rules in a single run. When I do (if it is possible), I will update the blog.
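The sequence of commands used in this post, with the launcher swapped per distribution as described in the first remark, can be collected in a small helper. This is only a sketch: the helper names are hypothetical, it only builds and prints the command lines (it does not run them), and it uses no options other than the ones shown above.

```python
def core_launcher(distribution: str) -> list[str]:
    """Return the start of the command line for a given CORE distribution."""
    launchers = {
        "windows": [".\\core.exe"],       # Windows executable
        "python": ["python", "core.py"],  # native Python distribution
        "linux": ["./core"],              # Linux (and probably Mac)
    }
    return launchers[distribution]


def custom_rule_workflow(distribution, rules_dir, standard_json,
                         standard, version, data_dir, apikey="{apikey}"):
    """Build the four command lines from this post, in the order they were run."""
    core = core_launcher(distribution)
    return [
        core + ["update-cache", "--custom_rules_directory", rules_dir, "--apikey", apikey],
        core + ["update-cache", "--custom_standard", standard_json, "--apikey", apikey],
        core + ["list-rules", "--custom_rules", "-s", standard, "-v", version],
        core + ["validate", "-cs", "-s", standard, "-v", version, "-d", data_dir],
    ]


if __name__ == "__main__":
    commands = custom_rule_workflow(
        "windows",
        r"D:\CDISC_Standards\CDISC_CORE_extensions",
        r"D:\CDISC_Standards\CDISC_CORE_extensions\XML4Pharma_Standard.json",
        "xml4pharma_standard",
        "1-0",
        r"D:\MetaDataSubmissionGuide_2_0_JSON_1-1_Files_testing\SDTM_Dataset-JSON",
    )
    for command in commands:
        print(" ".join(command))
```

The `{apikey}` placeholder is kept as-is on purpose; it has to be replaced by a real API key before the commands are executed.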
Future work:
We recently developed software that uses the CDISC Library and its API to query the CDISC "Dataset specializations".
Essentially, these describe combinations of variables and their properties that belong together for individual cases.
For example, in SC (Subject Characteristics), when SCTESTCD=MARISTAT (Marital Status), it is expected that the value of SCSTRESC (Standardized result) is one of:
[ANNULLED, DIVORCED, DOMESTIC PARTNER, INTERLOCUTORY, LEGALLY SEPARATED, MARRIED,
NEVER MARRIED, POLYGAMOUS, SEPARATED, WIDOWED],
which can be translated into a CORE check:
In the define.xml, this will usually be defined by a "ValueList" (yes, we also have already automated the generation of such valuelists using the CDISC Library API).
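The MARISTAT expectation described above can be sketched in plain Python as follows. This only illustrates the logic and is not the CORE rule itself: the value list is the one quoted above, the function name is hypothetical, and the SC records are made-up examples.

```python
# Allowed SCSTRESC values for SCTESTCD=MARISTAT, as listed in this post
MARISTAT_VALUES = {
    "ANNULLED", "DIVORCED", "DOMESTIC PARTNER", "INTERLOCUTORY",
    "LEGALLY SEPARATED", "MARRIED", "NEVER MARRIED", "POLYGAMOUS",
    "SEPARATED", "WIDOWED",
}


def check_maristat(sc_records: list[dict]) -> list[dict]:
    """Return the SC records where SCTESTCD is MARISTAT but SCSTRESC
    is not one of the allowed values."""
    return [
        record
        for record in sc_records
        if record.get("SCTESTCD") == "MARISTAT"
        and record.get("SCSTRESC") not in MARISTAT_VALUES
    ]


if __name__ == "__main__":
    # Hypothetical SC records, for illustration only
    sc = [
        {"USUBJID": "S-001", "SCTESTCD": "MARISTAT", "SCSTRESC": "MARRIED"},
        {"USUBJID": "S-002", "SCTESTCD": "MARISTAT", "SCSTRESC": "SINGLE"},
        {"USUBJID": "S-003", "SCTESTCD": "EDLEVEL", "SCSTRESC": "SINGLE"},
    ]
    for issue in check_maristat(sc):
        print(issue["USUBJID"], issue["SCSTRESC"])
```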
We are currently setting up a library (as a separate "standard") that we will add to the CORE engine as a set of custom rules. This will be explained in a separate blog.
Another interesting thought is the use of RESTful web services (RWS) to have "on-the-fly" rules, for example to test whether the values of LBTESTCD, LBTEST, LBMETHOD, LBRESTYP, LBRESSCL, …, are in accordance with the value in LBLOINC.
We already have the mappings between LOINC and SDTM for about 10,000 LOINC codes and an API for it, so taking LBTESTCD, LBTEST, LBMETHOD, LBRESTYP, LBRESSCL, ... into account, this would lead to >40,000 rules.
Is that what we want? Probably not …
We will probably want only 5-6 rules: one that takes the value of LBLOINC, calls the RWS, and checks whether the value in LBTESTCD matches, and likewise one each for LBTEST, LBMETHOD, etc.
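The idea of a handful of generic rules instead of tens of thousands of specific ones can be sketched as follows. Everything here is an assumption for illustration: the RWS is replaced by a local lookup function, the single mapping entry is a stand-in for a real service response, and the function names are hypothetical. The CORE engine does not work this way today.

```python
from typing import Callable, Optional

# One generic check per variable, instead of one rule per LOINC code
CHECKED_VARIABLES = ("LBTESTCD", "LBTEST", "LBMETHOD", "LBRESTYP", "LBRESSCL")

# Stand-in for the RESTful web service: LOINC code -> expected SDTM values.
# Illustrative entry only; a real service would return the full mapping.
FAKE_MAPPINGS = {
    "2345-7": {"LBTESTCD": "GLUC", "LBTEST": "Glucose"},
}


def fake_lookup(loinc_code: Optional[str]) -> Optional[dict]:
    """Pretend RWS call: return the expected values, or None if unknown."""
    return FAKE_MAPPINGS.get(loinc_code)


def check_against_loinc(record: dict,
                        lookup: Callable[[Optional[str]], Optional[dict]]) -> list[str]:
    """Return the variables whose value disagrees with the LOINC mapping.

    Variables for which the lookup returns no expectation are not checked.
    """
    expected = lookup(record.get("LBLOINC"))
    if expected is None:
        return []
    return [
        variable
        for variable in CHECKED_VARIABLES
        if variable in expected and record.get(variable) != expected[variable]
    ]


if __name__ == "__main__":
    # Hypothetical LB record with a deliberately wrong LBTEST
    record = {"LBLOINC": "2345-7", "LBTESTCD": "GLUC", "LBTEST": "Sodium"}
    print(check_against_loinc(record, fake_lookup))
```

Passing the lookup in as a function keeps the check independent of where the mappings come from, so the same code could be pointed at a public service for LOINC or at a company-internal one.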
One could also think about having something similar for checking e.g. whether the value in AEDECOD (which usually takes the "preferred term" in MedDRA) and the value in AELLT (MedDRA "Lower Level Term") match. In such a case, however, we cannot use a public RESTful web service due to license restrictions, but we can think of companies having their own (licensed) implementation of MedDRA as a central database, and having CORE rules that query this database, e.g. using an RWS.
Essentially, we are only at the beginning of exploring the opportunities that CORE, as Open Source software, brings us. We can even start thinking about using "Dataset Specializations" and "CORE on the fly" during the development of the mappings. Imagine, for example, someone coding that the standardized result (--STRESC) for LBTESTCD=WEIGHT is "cm": the system would then immediately protest, stating that only "kg" and "LB" are allowed.
The only real limitation is (as so often): lack of imagination ...