
  • Pregnancy start date

    Variables I have: estimated date of delivery as date with 15 missing values (format %tcDD.NN.YY), gestational length as ga_weeks in weeks (format %8.2f) with 1324 missing values, and gestational age at inclusion (at the time of participation in the study) with 15 missing values (format %8.2f). I want to generate a pregnancy start date in Stata. How can I do that? Help is highly appreciated.

  • #2
    Variables I have: estimated date of delivery as date with 15 missing values (format %tcDD.NN.YY)
    Are you sure that's what you have? If so, when you look at the values of the estimated date of delivery, do they make sense? I raise this because format %tcDD.NN.YY means that what you have is not a date variable but a date and time variable and you have chosen to only display the date part and ignore the time part. This is important, because the code you need will differ depending on which kind of variable you have.
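
    To illustrate why the distinction matters, here is a minimal sketch using a made-up value rather than your data; the variable names as_date and as_datetime are hypothetical. The same calendar day is stored once as a daily date and once as a date and time. Both display identically under a DD.NN.YY format, yet the underlying numbers differ by a factor of 86,400,000 (the number of milliseconds in a day).

    ```stata
    clear
    set obs 1
    * Toy example: one calendar day stored two ways (not the poster's data)
    gen long   as_date     = date("10.01.12", "DM20Y")   // days since 01jan1960
    gen double as_datetime = cofd(as_date)               // milliseconds since 01jan1960 00:00:00
    format as_date     %tdDD.NN.YY
    format as_datetime %tcDD.NN.YY
    list                              // the two columns look the same
    summarize as_date as_datetime     // the raw values do not
    describe as_date as_datetime      // shows storage type and display format
    ```

    On real data, running -describe- and -summarize- on the variable gives the same information: raw values around 19,000 for dates in 2012 suggest a daily date, while values around 1.6e+12 suggest a date and time.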

    To avoid any uncertainty about this, please post back showing example data, using the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    There is another, substantive, question. There are two ways you can try to estimate the onset of pregnancy. On the one hand, the average pregnancy duration is 280 days, so you can just look back 280 days from the estimated date of delivery. On the other hand, you can count back by the number of weeks gestational age at study inclusion from the date of study inclusion. Given the missing values in your data, there probably will be some women for whom only one of these approaches can be undertaken. But for some women all three variables will be non-missing and both approaches will be possible. These may not always give the same results. What do you want to do when they conflict? Is there one you might consider more reliable than the other? Or do you want to take the midpoint between them? Or the earlier of the two? Or the later? Or something else?
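
    To make the choice concrete, here is a rough sketch on three made-up observations, assuming all dates are daily dates; the variable names edd, incl_date and ga_incl are hypothetical stand-ins for your variables. It computes both estimates, measures their disagreement where both exist, and builds three candidate rules (earlier, later, midpoint). It does not say which rule is right for your study.

    ```stata
    clear
    * Made-up values for illustration only
    input str9 edd_s str9 incl_s ga_incl
    "10jan2012" "26sep2011" 24
    "28jan2012" ""          12
    ""          "18jul2011" 12
    end
    gen edd       = date(edd_s, "DMY")     // estimated date of delivery
    gen incl_date = date(incl_s, "DMY")    // date of study inclusion

    gen onset_edd  = edd - 280             // look back 280 days from delivery date
    gen onset_incl = incl_date - 7*ga_incl // look back GA (weeks) from inclusion date

    * Disagreement in days; missing unless both estimates exist
    gen diff_days = onset_edd - onset_incl
    summarize diff_days, detail

    * Candidate rules; these egen functions ignore missing arguments,
    * so a woman with only one estimate simply gets that estimate
    egen onset_earlier = rowmin(onset_edd onset_incl)
    egen onset_later   = rowmax(onset_edd onset_incl)
    egen onset_mid     = rowmean(onset_edd onset_incl)
    format edd incl_date onset_* %td
    list edd incl_date ga_incl onset_* diff_days
    ```

    Looking at the distribution of diff_days before picking a rule shows how often, and by how much, the two approaches actually conflict.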



    • #3
      Thanks, Clyde, for this insightful answer. Unfortunately, I can't use the dataex command to share a data example, as the data are on an external computer.

      Here's sample data:
      Est_date_delivery  GA_weeks  GA_inclusion  Survey_completion_date
      10.01.12           39.00     24.00         26-Sep-2011
      28.01.12           40.00     12.00         18-Jul-2011
      02.02.12           41.00     11.00         02-Aug-2011
      04.02.12           38.00     11.00         14-Aug-2011
      Since there's a lot of missingness in GA_weeks, I totally agree with your suggestion to use Survey_completion_date. Could you kindly guide me on how I can generate a pregnancy start date from GA_weeks and, if it is missing for an observation, from GA_inclusion and Survey_completion_date instead? Many thanks!



      • #4
        The problem with the tableau you have shown for example data is that it is not possible to be certain what type of variable Est_date_delivery is. You have shown what it looks like, but it could, underlyingly, still be either a date variable or a date and time variable, and the code is different. Given that you said that it is formatted %tcDD.NN.YY, if it looks like what you showed, it pretty much has to be a date and time variable, because a date variable with a %tc format would look pretty bizarre. But if I'm wrong, then this code will produce bizarre results. I'm similarly assuming Survey_completion_date is underlyingly a date and time variable.

        Code:
        gen pregnancy_onset_1 = dofc(Est_date_delivery) - 280
        format pregnancy_onset_1 %td
        label var pregnancy_onset_1 "Pregnancy onset estimated from delivery date"
        
        gen pregnancy_onset_2 = dofc(Survey_completion_date) - 7*GA_inclusion
        format pregnancy_onset_2 %td
        label var pregnancy_onset_2 "Pregnancy onset estimated from survey date and GA at survey"
        
        gen pregnancy_onset_3 = dofc(Est_date_delivery) - 7*GA_weeks
        format pregnancy_onset_3 %td
        label var pregnancy_onset_3 "Pregnancy onset estimated from delivery date and GA at delivery"
        
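        * Prefer estimate 3, then fall back to 2, then to 1 when earlier choices are missing
        gen wanted = pregnancy_onset_3
        replace wanted = pregnancy_onset_2 if missing(wanted)
        replace wanted = pregnancy_onset_1 if missing(wanted)
        format wanted %td
        label var wanted "Pregnancy onset (first non-missing of estimates 3, 2, 1)"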
        Note: If this code produces bizarre results (pregnancy onset dates that are clearly off by many decades), then it is likely because Est_date_delivery or Survey_completion_date are not date and time variables as I assumed. If they are date variables instead, then remove dofc() from the commands.
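
        As a quick illustration of that failure mode, here is a small sketch on a single made-up value (the variable names edd_date and edd_clock are hypothetical). Applying dofc() to a variable that is already a daily date divides a number near 19,000 by 86,400,000, so the result collapses to day 0 (01jan1960) and the computed onset lands in 1959.

        ```stata
        clear
        set obs 1
        * Toy value, not the poster's data
        gen long   edd_date  = td(10jan2012)     // stored as a daily date
        gen double edd_clock = cofd(edd_date)    // same day stored as date and time

        gen right_for_clock = dofc(edd_clock) - 280   // correct for a date and time variable
        gen right_for_date  = edd_date - 280          // correct for a daily date variable
        gen wrong_for_date  = dofc(edd_date) - 280    // dofc() misapplied to a daily date

        format right_* wrong_* %td
        list right_for_clock right_for_date wrong_for_date
        ```

        If the listed onset dates in your own data cluster around 1959 like wrong_for_date here, that points to the variables being daily dates, and the version without dofc() is the one to use.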



        • #5
          Thanks a lot! It worked after removing dofc().
