Thursday, August 20, 2015

EPOCH expected in most SDTM domains - why that is nonsense

The FDA complains again and again about the file sizes of SDTM submissions, forcing sponsors to squeeze these files as much as possible. Yet the FDA does not allow us to submit zipped files, and cannot give us a reasonable explanation why not.
On the other hand, we see that each time a new SDTM-IG is published, additional derived variables have been added - all on request of the FDA. So no wonder files become so large ...
Some time ago, a lot of new --DY (study day) variables were added and made "expected", and it is now also expected that "the EPOCH variable is included for every clinical subject-level observation".

Now both --DY and EPOCH are derived. --DY (study day) can easily be calculated from the --DTC (date/time of collection or observation) and the RFSTDTC (reference start date/time) in the DM (demographics) domain. Similarly, EPOCH can easily be calculated from the --DTC and the records for that subject in the SE (Subject Elements) domain.
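To illustrate how little is needed for the --DY part, here is a minimal sketch in Python. It is not the code of the Viewer. The function name study_day is made up for this example, and the RFSTDTC value "2014-01-02" is an assumption chosen so that the result matches the LB example discussed further down. The sketch follows the SDTM convention that there is no study day 0: the day of RFSTDTC is day 1, and the day before it is day -1. Partial dates are simply left without a study day.

```python
from datetime import date
from typing import Optional


def study_day(dtc: str, rfstdtc: str) -> Optional[int]:
    """Derive --DY from a --DTC value and the subject's RFSTDTC.

    Only the date part (first 10 characters, YYYY-MM-DD) is used.
    Returns None when either value is missing or a partial date.
    """
    if len(dtc) < 10 or len(rfstdtc) < 10:
        return None
    obs = date.fromisoformat(dtc[:10])
    ref = date.fromisoformat(rfstdtc[:10])
    delta = (obs - ref).days
    # No day 0: on or after the reference date add 1, before it do not.
    return delta + 1 if delta >= 0 else delta


# Hypothetical reference start date, for illustration only
print(study_day("2014-01-16T13:17", "2014-01-02"))  # 15
print(study_day("2014-01-01", "2014-01-02"))        # -1
print(study_day("2014-01", "2014-01-02"))           # None (partial date)
```

A reviewing tool only needs the subject's RFSTDTC from DM to do this for every --DTC it displays.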

So why does the FDA still insist that both --DY and EPOCH be delivered for each observation, although this blows up the size of the datasets? Can't the FDA reviewers' tools calculate the --DY values and the EPOCH "on the fly"?

Some time ago, we developed the "Smart Dataset-XML Viewer", a free and open-source tool for inspecting SDTM, SEND and ADaM files in the new CDISC Dataset-XML format. The Viewer has a good number of great features for doing exactly what the FDA's tools cannot accomplish. It was then a big surprise to us when we were told that most reviewers chose not to use the viewer during the "FDA Dataset-XML pilot". Most of them preferred the (statistical analysis) tools that they have always been working with.

I recently added the newest feature to the "Smart Dataset-XML Viewer": the (on-the-fly) lookup and display of the EPOCH and the ELEMENT for each --DTC value. It took me just two evenings to implement. For each --DTC value, the tool picks up the USUBJID and then compares the date/time with the SESTDTC (Start Date/Time of Element) and SEENDTC (End Date/Time of Element) of that subject's records in the SE dataset. When the --DTC value falls between the start and end date/time, the element code (ETCD) is retrieved as well as the value for EPOCH.
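The lookup just described can be sketched in a few lines of Python. Again, this is not the Viewer's source code. The function name lookup_element, the subject ID "S-001" and all dates in the SE records below are invented for illustration. How the boundaries are treated is also an assumption: here an element runs from its start up to, but not including, its end, and an empty SEENDTC means the element is still ongoing.

```python
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical SE records for one subject (all values are made up)
SE = [
    {"USUBJID": "S-001", "ETCD": "SCRN", "EPOCH": "SCREENING",
     "SESTDTC": "2013-12-20", "SEENDTC": "2014-01-02"},
    {"USUBJID": "S-001", "ETCD": "PBO", "EPOCH": "BLINDED TREATMENT",
     "SESTDTC": "2014-01-02", "SEENDTC": "2014-06-30"},
    {"USUBJID": "S-001", "ETCD": "FOLO", "EPOCH": "FOLLOW-UP",
     "SESTDTC": "2014-06-30", "SEENDTC": ""},
]


def lookup_element(usubjid: str, dtc: str,
                   se_records: list) -> Optional[Tuple[str, str]]:
    """Return (ETCD, EPOCH) of the element in which dtc falls, or None."""
    try:
        when = datetime.fromisoformat(dtc)
    except ValueError:
        return None  # missing or partial date: nothing to look up
    for rec in se_records:
        if rec["USUBJID"] != usubjid:
            continue
        start = datetime.fromisoformat(rec["SESTDTC"])
        end = datetime.fromisoformat(rec["SEENDTC"]) if rec["SEENDTC"] else None
        # Assumed rule: start inclusive, end exclusive, open end = ongoing
        if start <= when and (end is None or when < end):
            return rec["ETCD"], rec["EPOCH"]
    return None


print(lookup_element("S-001", "2014-01-16T13:17", SE))
```

With these invented SE records, the call returns the element code "PBO" and the epoch "BLINDED TREATMENT" for the given date/time. The same function works for any date/time variable, so it can be applied to a start date, an end date or a collection date alike.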

Here is a screenshot of a result (here for an observation in the LB domain):

with a detail snapshot:

showing that for the date/time of specimen collection "2014-01-16T13:17" (January 16th 2014 at 1:17 pm) the study day (LBDY) is 15 (15th day of the study for this subject), the corresponding Element is "PBO" and the corresponding EPOCH is "BLINDED TREATMENT" (other epochs are "SCREENING" and "FOLLOW-UP"). These values were NOT taken from the dataset; they were calculated "on-the-fly" from the RFSTDTC and the records for this subject in the SE dataset:

This shows that the FDA's requirement to add --DY and EPOCH to each record is nonsense.
Both can easily be derived by viewing or analysis tools. The "on-the-fly" calculation even considerably improves data quality, as no redundant values that might contradict the collected dates are stored in the datasets.

This nice little feature (again, programmed in just two short evenings) has some further implications. For that, let us have a look at the AE (adverse events) domain. The originally captured dates are AESTDTC (start date/time of adverse event) and AEENDTC (end date/time of adverse event). That's it.
The FDA also requires us to submit AESTDY (study day of start of adverse event) and AEENDY (study day of end of adverse event), unnecessarily blowing up the file size again and, even worse, introducing data redundancy. But what about the EPOCH?
Which EPOCH? That of the start of the adverse event? That of the end of it? Or maybe that of the date/time of collection of the AE? It is not possible to add 3 extra columns with EPOCH, or should we add new variables in SUPPAE, for example STAEPOCH and ENDEPOCH? The FDA doesn't tell us.
But of course, for each of them, the "Smart Dataset-XML Viewer" can calculate them "on-the-fly". Here are some screenshots:

First for AESTDTC:


and for AEDTC:

stating that the AE started on study day 3 (element PBO, epoch BLINDED TREATMENT) and ended on study day 199 (element FOLO, epoch FOLLOW-UP) and was captured on day 23 (element PBO, Epoch BLINDED TREATMENT).

So, with this feature, reviewers can easily find out on which study day and in which element and epoch the adverse event started, ended, and was captured in the EDC system, WITHOUT having the --DY and EPOCH variables in the dataset itself.

I will soon upload the updated "Smart Dataset-XML Viewer" program and source code to the SourceForge website, from which anyone can download them.


  1. As a follow-up...
    This week I was working on an SDTM submission for a customer. When validating the submission with the software tool the FDA is using (Pinnacle21), I noticed that when one has EPOCH in a findings domain, one gets the warning SD1076/FDAC031 "Model permissible variable added into standard domain". When one does NOT have EPOCH in a findings domain, one gets the warning SD1077/FDAC021 "FDA Expected variable not found".

    So, whatever one is doing (having EPOCH in or not), one ALWAYS gets a warning. Does this make sense?

  2. This comment has been removed by the author.

  3. Hello Jozef,

    I really admire your work.

    I have a bit of confusion about conmed/AE related data: when an event starts before informed consent and is still ongoing at the end of the study, what will the epoch be? (missing?)

    Similarly, if an event starts before informed consent and ends during study participation (e.g. Treatment epoch), then what will the EPOCH be? (missing?)

    Also, about partial dates: I have seen in many places that people mention that if dates are partial, the EPOCH will be missing.

    Can you direct me to an appropriate source where I can get information related to this topic?

    Thank you.


    1. Dear Suhel,

      In my understanding, the epoch is about the time of the observation or collection of the information, not about when the AE or CM started or ended.
      Whether it is useful for the reviewer to know whether a CM that was started before the study and continued after the study was collected during the "screening" or "treatment" epoch is something I doubt. I think this also resolves the question about partial dates, as these should only occur for events that happened (long) before the study started.