  • Creating time to treat period for event study analysis

    Hello,

    I am having an issue with my data. To explain it in simple language, I have created a simple example data set here:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(Family_id Year had_new_kid)
    1 2000    .
    2 2000    .
    3 2000    .
    4 2000 2000
    5 2000    .
    1 2001 2001
    2 2001    .
    3 2001    .
    4 2001    .
    5 2001    .
    1 2002    .
    2 2002    .
    3 2002    .
    4 2002    .
    5 2002 2002
    end

    Here I have a panel data set of five families covering the years 2000 to 2002. The variable had_new_kid records the year in which a family had its first kid. I want to run an event study analysis, for which I have to create lead and lag periods for each family relative to the year of the first kid. Therefore, I need to create a time-to-treat variable, like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(Family_id Year had_new_kid time_to_treat)
    1 2000    . 2001
    2 2000    .    .
    3 2000    .    .
    4 2000 2000 2000
    5 2000    . 2002
    1 2001 2001 2001
    2 2001    .    .
    3 2001    .    .
    4 2001    . 2000
    5 2001    . 2002
    1 2002    . 2001
    2 2002    .    .
    3 2002    .    .
    4 2002    . 2000
    5 2002 2002 2002
    end
    Can anyone please help me with the code for this? Also, what should I do with the families (like families 2 & 3) for whom the event of having a kid never occurs?

    Thank you!


  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(Family_id Year had_new_kid)
    1 2000    .
    2 2000    .
    3 2000    .
    4 2000 2000
    5 2000    .
    1 2001 2001
    2 2001    .
    3 2001    .
    4 2001    .
    5 2001    .
    1 2002    .
    2 2002    .
    3 2002    .
    4 2002    .
    5 2002 2002
    end
    
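    * Within each family, sort by had_new_kid (missing values sort last)
    * and copy the first sorted value to every row of that family
    bys Family_id (had_new_kid): gen wanted= had_new_kid[1]
    Result: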

    Code:
    . sort Year Family_id
    
    . l, sep(0)
    
         +-------------------------------------+
         | Family~d   Year   had_ne~d   wanted |
         |-------------------------------------|
      1. |        1   2000          .     2001 |
      2. |        2   2000          .        . |
      3. |        3   2000          .        . |
      4. |        4   2000       2000     2000 |
      5. |        5   2000          .     2002 |
      6. |        1   2001       2001     2001 |
      7. |        2   2001          .        . |
      8. |        3   2001          .        . |
      9. |        4   2001          .     2000 |
     10. |        5   2001          .     2002 |
     11. |        1   2002          .     2001 |
     12. |        2   2002          .        . |
     13. |        3   2002          .        . |
     14. |        4   2002          .     2000 |
     15. |        5   2002       2002     2002 |
         +-------------------------------------+
    
    .
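
A possible next step for the event study, sketched here under assumptions: `wanted` holds the event year, so the relative period (leads and lags) could be obtained by subtracting it from `Year`. The variable name `rel_time` is only an illustration and does not appear in the original posts.

```stata
clear
input float(Family_id Year had_new_kid)
1 2000    .
2 2000    .
3 2000    .
4 2000 2000
5 2000    .
1 2001 2001
2 2001    .
3 2001    .
4 2001    .
5 2001    .
1 2002    .
2 2002    .
3 2002    .
4 2002    .
5 2002 2002
end

* Event year per family (same command as above)
bysort Family_id (had_new_kid): gen wanted = had_new_kid[1]

* Relative time (hypothetical name): negative values are leads,
* 0 is the event year, positive values are lags
gen rel_time = Year - wanted

sort Family_id Year
list Family_id Year wanted rel_time, sepby(Family_id) noobs
```

For families 2 and 3, `wanted` is missing, so `rel_time` is missing as well; how to handle them is a separate modelling decision.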

    "Also, what should I do with the families (like families 2 & 3) for whom the event of having kids never occurs."
    I am not sure. What is your outcome variable? If it is the time until having kids, then such observations may be treated as right-censored, since some of these families will eventually have kids (you just have not observed the event). You may want to check what other studies have done in the past.
    Last edited by Andrew Musau; 07 Sep 2023, 07:46.


    • #3
      Thank you so much for the code! It worked perfectly! I really appreciate the help.

      My outcome variable is the family income, and I want to see how it gets affected before and after having a kid. There are some families in my data set, like families 2 & 3, who never have kids throughout the period. Therefore, I wondered if I should drop these families from my data set.

      Thanks!


      • #4
        Dear Andrew Musau,

        I might sound silly, but for my future help, could you please explain why you put 1 in the third bracket in your command?

        Here,

        Code:
         bys Family_id (had_new_kid): gen wanted= had_new_kid[1]
        Thank you!


        • #5
          "My outcome variable is the family income, and I want to see how it gets affected before and after having a kid. There are some families in my data set, like families 2 & 3, who never have kids throughout the period."
          If you were looking at a count, e.g., number of kids, you could include families with no kids. But since you state that the variable is "time to first kid", there is no way to calculate this value for families without kids. You can drop them, but do mention that you have done so and report both the frequency and percentage of such families.

          "could you please explain why you put 1 in the third bracket in your command?"
          The command reads like this:

          bys Family_id (had_new_kid): gen wanted= had_new_kid[1]
          Construct groups of "Family_id" and, within each group, sort by "had_new_kid". As "had_new_kid" is a numerical variable, Stata sorts from smallest to largest, and missing values (.) count as larger than any number, so they are placed last. So with the command

          gen wanted= had_new_kid[1]
          I am instructing Stata, for each Family_id group, to pick the first sorted value of "had_new_kid", which is referenced as "had_new_kid[1]". The second sorted value is "had_new_kid[2]", and the last sorted value is "had_new_kid[_N]". This guarantees that I get the earliest year in which a kid was born in the family, i.e., the first time the family had a kid.
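
To make the subscripting concrete, here is a small sketch on a three-family subset of the example data. The names `first_val`, `last_val` and `first_kid` are made up for illustration. It also shows `egen ... = min()`, which skips missing values, as one possible alternative that should give the same result.

```stata
clear
input float(Family_id Year had_new_kid)
2 2000    .
2 2001    .
2 2002    .
4 2000 2000
4 2001    .
4 2002    .
5 2000    .
5 2001    .
5 2002 2002
end

* [1] is the first value after sorting within family: the smallest
* nonmissing year, or missing if the family never had a kid
bysort Family_id (had_new_kid): gen first_val = had_new_kid[1]

* [_N] is the last sorted value; missing values sort last, so here
* it is missing for every family
bysort Family_id (had_new_kid): gen last_val = had_new_kid[_N]

* Alternative: egen's min() ignores missing values within each family
bysort Family_id: egen first_kid = min(had_new_kid)

* Both approaches agree (missing equals missing in Stata comparisons)
assert first_val == first_kid

list, sepby(Family_id) noobs
```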






          • #6
            Thank you so much for your help, Andrew Musau!


            • #7
              Hello!

              I am having another problem here! Please have a look at the data set below; notice that Family_id 4 had a kid again in 2003. Can the previous code still generate the year when they had their first kid, which would still be 2000? I tried the code mentioned above on this data and it worked. However, I still need to identify families like family 4 here, which had kids more than once.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(Family_id Year had_new_kid)
              1 2000    .
              2 2000    .
              3 2000    .
              4 2000 2000
              5 2000    .
              1 2001 2001
              2 2001    .
              3 2001    .
              4 2001    .
              5 2001    .
              1 2002    .
              2 2002    .
              3 2002    .
              4 2002    .
              5 2002 2002
              1 2003    .
              2 2003    .
              3 2003    .
              4 2003 2003
              5 2003    .
              end


              Could anyone please help?

              Thank you so much!
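
One way this could be approached, offered only as a sketch: count the nonmissing values of `had_new_kid` within each family and flag families where the count exceeds one. The names `first_kid`, `n_kids` and `multi_kid` are hypothetical and not part of the thread.

```stata
clear
input float(Family_id Year had_new_kid)
1 2000    .
2 2000    .
3 2000    .
4 2000 2000
5 2000    .
1 2001 2001
2 2001    .
3 2001    .
4 2001    .
5 2001    .
1 2002    .
2 2002    .
3 2002    .
4 2002    .
5 2002 2002
1 2003    .
2 2003    .
3 2003    .
4 2003 2003
5 2003    .
end

* Year of the first kid: smallest nonmissing value within the family
bysort Family_id: egen first_kid = min(had_new_kid)

* Number of years in which the family had a new kid
bysort Family_id: egen n_kids = total(!missing(had_new_kid))

* Flag families that had kids more than once
gen byte multi_kid = n_kids > 1

sort Family_id Year
list Family_id Year had_new_kid first_kid n_kids multi_kid, sepby(Family_id) noobs
```

In this example only family 4 is flagged, and its `first_kid` stays at 2000.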
