Wednesday, January 20, 2016

Airport security, toothpaste and validation rules

Last week, I traveled to Geneva for an E3C (CDISC European Coordination Committee) meeting to prepare the upcoming CDISC Interchange in Vienna.
At the airport, my almost-empty tube of toothpaste was confiscated. Reason: although there was only about 10ml of toothpaste in it, the tube was marked to be 125ml, which was 25ml of non-volume too much.

Conclusion: "10 ml of toothpaste in a 125ml tube is much more dangerous than 99ml of nitroglycerin in an 100ml tube". Logic, isn't it? I also had a 250g pack of filled (with liquid) chocolate in my hand luggage, but that was not confiscated.

I checked the rules, and indeed, you may not take an (even empty) 125ml tube through security.

Why such illogical rules? The only thing I can imagine is that such (stupid) rules can also be handled by stupid people. "Swich off your brain"! Do not force security people to have to think, and surely not do allow them to do risk assessment...

So, what does this all have to do with CDISC validation rules? Everything.

Yesterday,  I validated an SDTM dataset that I consider completely valid (it was published by CDISC and used in the second FDA SDTM pilot). For SUPPLB, I got:


For each record, I got 2 errors, one saying that there are unexpected characters, one saying that the value if IDVARVAL does not point to a record in LB.xpt.

Imagine that you show these error messages to your regulatory people - I think they would go crazy...

So, what happened?

When looking into the validation results details, I found e.g.:


I presume, that IDVARVAL was originally programmed to be an integer, and then transformed to "char", as this is a requirement for IDVARVAL. During this transformation, the text was left-padded with blanks, instead of right-padded (remark that when using Dataset-XML, there would be no padding at all). So the validation tool finds a blank as the first character, and marks this as an error.
In the SAS Viewer however, it is however presented as:


Everything seeming to be OK (the viewer seems to be smarter than the validation tool...).

Even worse is that the consequence of this is that the validator also marks this is a "invalid referenced record", because it isn't smart enough to understand that e.g. "       6" is semantically the same as "6".
So, due to a stupid rule (FDAC216), implemented in a stupid (strict) way, I got 64403 warnings and 64403 errors. Thank you!

Back to our toothpaste and risk assessment. What is the risk that a modern software cannot connect the SUPPLB records to their corresponding LB records because of this padding? Almost zero isn't it? In modern software, the first thing that is done when reading a string from file, is trimming it (removing blanks at start and end). At least I always do so in my software. I presume SAS software or any other modern review software will exactly do the same.

What I mean is that we should start thinking about validation as being risk assessment. It is of course however much easier to develop stupid rules, assuming that people doing clinical research are stupid and that their software tools are dumm. Do we accept such insults?










1 comment:

  1. Hi Jozef, I have to disagree with you. You are saying that the software should remove leading blanks. How about curly quotes? Should they be considered the same as straight quotes? Or non-breaking spaces? I do not think it is the responsibility of importing software to "correct" the incoming data. Even when using Dataset-XML, " 6" will not be the same as "6". It may be semantically the same, but not in XML.

    ReplyDelete