Yesterday and today, I worked on implementing the new SDTM oncology and a number of the new draft SDTM-IG 3.1.4 domains in my SDTM-ELT(TM) software. For each new domain, I created a template define.xml file that can be read by the software.
It was so frustrating!
The new domains come as a PDF file with tables and additional information. During creation of the templates I continously needed to switch between the PDF and between the (smart) XML editor I am using. I needed to copy the variable name, role and datatype and then paste it into the template. I then also needed to add whether the variable is mandatory (required/expected) or not (permissible).
The validation whether everything was correctly implemented was purely visual, comparing a view of the machine-readable template with the tables in the PDF file. Another frustrating task...
In earlier days, the SDTM team also published these tables in an Excel file. Although not a vendor-neutral format, this Excel file at least allowed to automate some things. The way I proceeded was as follows:
First I read the Excel file into OpenOffice Calc (a competitor of Excel). Then I exported the tables as an OpenOffice "odf" file, which is essentially a zip file containing a set of XML files. After renaming and unzipping, one needs to take one of these XML files, and transform it to CDISC define.xml using a stylesheet.
Although each new publication of the SDTM team had differently formatted Excel files, meaning that I needed to adapt the stylesheet to transform to define.xml each time, it was a very good way, and I could generate the templates in a few hours.
The CDISC SDTM team does not publish their tables in a machine-readable format anymore. I do not know why, because I cannot imagine that the team is not using any computer tools to help in generating the tables. So why not publish them in a machine-readable format?
SDTM table definitions in a machine-readable format - that would be cool!