Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating variable in long dataset for first treatment

    Hi, I'm using Stata14.2 and I am struggling to create the variable in the manner that I want it.

    I have a large panel dataset with observations for each account in each month.
    The issue I'm having is that the households were given a 'treatment' but some of them received their treatment in November while others received theirs at other times. There is also a control group, who obviously didn't receive any insert.

    Now I am trying to create a single variable to tell when each household received its first insert.
    Thus, 0 for never received an insert (i.e control), 1 for received insert in November, 2 for December, and so on.

    The code I was using before was this (and obviously it does not produce the correct variable):
    Code:
    bysort acc_no: gen first_insert=0
    bysort acc_no: replace first_insert=1 if insertNOV==1
    bysort acc_no: replace first_insert=2 if insertDEC==1
    bysort acc_no: replace first_insert=3 if insertJAN==1
    bysort acc_no: replace first_insert=4 if insertFEB==1
    bysort acc_no: replace first_insert=5 if insertMAR==1
    bysort acc_no: replace first_insert=6 if insertAPR==1
    bysort acc_no: replace first_insert=7 if insertMAY==1
    Here is an example of the dataset (I can't share account numbers for privacy concerns, but they are all numeric, meter_no is just to show the repeats):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str16 meter_no str8 month float(insertNOV insertDEC insertJAN)
    "CPUP963"  "November" 0 0 0
    "CPUP963"  "December" 0 0 0
    "CPUP963"  "January"  0 0 0
    "CPUP963"  "January"  0 0 1
    "CPUP963"  "February" 0 0 0
    "CPUP963"  "March"    0 0 0
    "CPUP963"  "April"    0 0 0
    "CPUP963"  "May"      0 0 0
    "CMUJ745"  "October"  0 0 0
    "CMUJ745"  "November" 0 0 0
    "CMUJ745"  "December" 0 0 0
    "CMUJ745"  "January"  0 0 0
    "CMUJ745"  "February" 0 0 0
    "CMUJ745"  "May"      0 0 0
    "CMUJ745"  "October"  0 0 0
    "CMUJ745"  "November" 0 0 0
    "CMUJ745"  "December" 0 1 0
    "CMUJ745"  "January"  0 0 1
    "CMUJ745"  "February" 0 0 0
    "CMUJ745"  "March"    0 0 0
    "CMUJ745"  "May"      0 0 0
    "CUWB154"  "December" 0 0 0
    "CUWB154"  "January"  0 0 0
    "CUWB154"  "March"    0 0 0
    "CUWB154"  "April"    0 0 0
    "CUWB154"  "October"  0 0 0
    "CUWB154"  "November" 0 0 0
    "CUWB154"  "December" 0 0 0
    "CUWB154"  "January"  0 0 0
    "CUWB154"  "February" 0 0 0
    "CUWB154"  "May"      0 0 0
    "CRAJ7386" "October"  0 0 0
    "CRAJ7386" "November" 0 0 0
    "CRAJ7386" "December" 0 0 0
    "CRAJ7386" "January"  0 0 0
    "CRAJ7386" "February" 0 0 0
    "CRAJ7386" "March"    0 0 0
    "CRAJ7386" "April"    0 0 0
    "CRAJ7386" "May"      0 0 0
    "CRAJ7386" "October"  0 0 0
    "CRAJ7386" "November" 1 0 0
    "CRAJ7386" "December" 0 1 0
    "CRAJ7386" "January"  0 0 1
    "CRAJ7386" "April"    0 0 0
    "CRAJ7386" "May"      0 0 0
    "211885"   "December" 0 0 0
    "211885"   "January"  0 0 0
    "211885"   "February" 0 0 0
    "211885"   "March"    0 0 0
    "211885"   "April"    0 0 0
    "211885"   "May"      0 0 0
    "211885"   "October"  0 0 0
    "211885"   "December" 0 1 0
    "211885"   "April"    0 0 0
    "211885"   "May"      0 0 0
    "806975"   "November" 0 0 0
    "806975"   "December" 0 0 0
    "806975"   "January"  0 0 0
    "806975"   "February" 0 0 0
    "806975"   "March"    0 0 0
    "806975"   "April"    0 0 0
    "806975"   "May"      0 0 0
    "806975"   "October"  0 0 0
    "806975"   "November" 1 0 0
    "806975"   "December" 0 1 0
    "806975"   "January"  0 0 0
    "806975"   "February" 0 0 0
    "806975"   "March"    0 0 0
    "806975"   "April"    0 0 0
    "806975"   "May"      0 0 0
    "87596"    "May"      0 0 0
    "87596"    "November" 1 0 0
    "87596"    "May"      0 0 0
    "76194"    "October"  0 0 0
    "76194"    "November" 0 0 0
    "76194"    "December" 0 0 0
    "76194"    "January"  0 0 0
    "76194"    "February" 0 0 0
    "76194"    "March"    0 0 0
    "76194"    "April"    0 0 0
    "76194"    "October"  0 0 0
    "76194"    "November" 1 0 0
    "76194"    "December" 0 1 0
    "76194"    "January"  0 0 1
    "76194"    "February" 0 0 0
    "76194"    "March"    0 0 0
    "76194"    "April"    0 0 0
    "76194"    "May"      0 0 0
    "907461"   "October"  0 0 0
    "907461"   "January"  0 0 0
    "907461"   "February" 0 0 0
    "907461"   "March"    0 0 0
    "907461"   "April"    0 0 0
    "907461"   "May"      0 0 0
    "907461"   "February" 0 0 0
    "907461"   "March"    0 0 0
    "907461"   "April"    0 0 0
    "907461"   "May"      0 0 0
    "CMDC605"  "October"  0 0 0
    "CMDC605"  "November" 0 0 0
    end
    Last edited by Max Baard; 16 Sep 2022, 04:50.

  • #2
    Try this. I wasn't sure of the order of months -- do they start in November? or in October? You will need to change the label accordingly.

    Code:
    label define months 1 "November" 2 "December" 3 "January" 4 "February" 5 "March" 6 "April" 7 "May" 8 "June" 9 "July" 10 "August" 11 "September" 12 "October"
    encode month, gen(month_num) label(months)
    bysort meter_no (month_num): gen first_insert = month[1]
    drop month_num
    This of course also assumes that everything happens within a one year period, and month names correspond uniquely to a year. If not, we will need to bring in a year variable, and it may be better to construct a single numeric monthly date variable to do the job.
    Last edited by Hemanshu Kumar; 16 Sep 2022, 05:41.

    Comment


    • #3
      Originally posted by Hemanshu Kumar View Post
      Try this. I wasn't sure of the order of months -- do they start in November? or in October? You will need to change the label accordingly.

      Code:
      label define months 1 "November" 2 "December" 3 "January" 4 "February" 5 "March" 6 "April" 7 "May" 8 "June" 9 "July" 10 "August" 11 "September" 12 "October"
      encode month, gen(month_num) label(months)
      bysort meter_no (month_num): gen first_insert = month[1]
      drop month_num
      Hi Hermanshu, sorry I'm realising I should have been slightly more clear. I have data for multiple years (hence some months appearing to be there more than one time).
      For clarity,

      Code:
      input str16 meter_no float(date insertNOV insertDEC insertJAN)
      "CPUP963"   2 0 0 0
      "CPUP963"   3 0 0 0
      "CPUP963"  12 0 0 1
      "CPUP963"   4 0 0 0
      "CPUP963"  13 0 0 0
      "CPUP963"  14 0 0 0
      "CPUP963"  15 0 0 0
      "CPUP963"  16 0 0 0
      "CMUJ745"  10 0 0 0
      "CMUJ745"   2 0 0 0
      "CMUJ745"   3 0 0 0
      "CMUJ745"  11 0 1 0
      "CMUJ745"   4 0 0 0
      "CMUJ745"  12 0 0 1
      "CMUJ745"  13 0 0 0
      "CMUJ745"   5 0 0 0
      "CMUJ745"  14 0 0 0
      "CMUJ745"  16 0 0 0
      "CMUJ745"   8 0 0 0
      "CMUJ745"   1 0 0 0
      "CMUJ745"   9 0 0 0
      "CUWB154"  10 0 0 0
      "CUWB154"   3 0 0 0
      "CUWB154"  11 0 0 0
      "CUWB154"   4 0 0 0
      "CUWB154"  12 0 0 0
      "CUWB154"  13 0 0 0
      "CUWB154"   6 0 0 0
      "CUWB154"   7 0 0 0
      "CUWB154"  16 0 0 0
      "CUWB154"   9 0 0 0
      "CRAJ7386" 10 1 0 0
      "CRAJ7386"  2 0 0 0
      "CRAJ7386"  3 0 0 0
      "CRAJ7386" 11 0 1 0
      "CRAJ7386" 12 0 0 1
      "CRAJ7386"  4 0 0 0
      "CRAJ7386"  5 0 0 0
      "CRAJ7386"  6 0 0 0
      "CRAJ7386" 15 0 0 0
      "CRAJ7386"  7 0 0 0
      "CRAJ7386"  8 0 0 0
      "CRAJ7386" 16 0 0 0
      "CRAJ7386"  9 0 0 0
      "CRAJ7386"  1 0 0 0
      end
      label values date date
      label def date 1 "2014 OCT", modify
      label def date 2 "2014 NOV", modify
      label def date 3 "2014 DEC", modify
      label def date 4 "2015 JAN", modify
      label def date 5 "2015 FEB", modify
      label def date 6 "2015 MAR", modify
      label def date 7 "2015 APR", modify
      label def date 8 "2015 MAY", modify
      label def date 9 "2015 OCT", modify
      label def date 10 "2015 NOV", modify
      label def date 11 "2015 DEC", modify
      label def date 12 "2016 JAN", modify
      label def date 13 "2016 FEB", modify
      label def date 14 "2016 MAR", modify
      label def date 15 "2016 APR", modify
      label def date 16 "2016 MAY", modify
      label var meter_no "METER_NO" 
      label var date "group(year Month)"

      Unfortunately, this isn't quite what I'm looking to do, as this labels the variable "first_insert" by the month label of the first time that the account appears in the dataset rather than when they receive their first insert (which would be when insertNOV==1 then first_insert==1 when insertNOV==0 and then insertDEC==1 then first_insert==2, etc.).

      I need it as (this is written by hand here for ease of comprehension, as I fear that I may not be explaining it very well):
      Where first_insert==3 for acc_no "1234" because they received their first insert in January, acc_no "9876" is 1 because they received it in November.
      Code:
      meter_no insertNOV insertDEC insertJAN first_insert
      "1234"   0  0  0  3
      "1234"   0  0  0  3
      "1234"   0  0  1  3
      "1234"   0  0  0  3
      "1234"   0  0  0  3
      "9876"   1  0  0  1
      "9876"   0  1  0  1
      "9876"   0  0  1  1
      "9876"   0  0  0  1
      "9876"   0  0  0  1
      "5678"   0  1  0  2
      "5678"   0  0  0  2
      "5678"   0  0  0  2
      "5678"   0  0  0  2
      "5678"   0  0  0  2
      "5678"   0  0  0  2

      Comment


      • #4
        Max Baard this data seems to be a result of some prior reshaping etc. It might actually be easier to work with the original data (including variables on year and month rather than the date which has been created, as well the original variable(s) that were used to create the insert* variables). Could you give us that?

        Comment


        • #5
          With the data in its existing form, see if this does what you want?
          Code:
          sort meter_no
          foreach var of varlist insert* {
              by meter_no: egen _`var' = max(`var') 
          }
          
          gen byte first_insert = cond(_insertNOV == 1,1,cond(_insertDEC == 1,2,cond(_insertJAN == 1,3,0)))
          drop _insert*

          Comment


          • #6
            @Hermanshu Kumar yes, this data has gone through an extensive reshaping process. In its original form, it is split up into roughly 16 datasets for each month and it takes days to regenerate the main dataset (it is huge).
            I am really trying to not have to go back and recreate it all...
            I know that what I am trying to do is tricky since I have been attempting it for the past couple of hours and trying to find information online to no avail.

            I am multiple did regression on this data and I want to check that my ATTs aren't biased (as recent developments from Baker et al. (2022), Callaway and Sant'Anna (2021), & Sun and Abraham (2021) show that they do under specific circumstances).



            Comment


            • #7
              Originally posted by Hemanshu Kumar View Post
              With the data in its existing form, see if this does what you want?
              Code:
              sort meter_no
              foreach var of varlist insert* {
              by meter_no: egen _`var' = max(`var')
              }
              
              gen byte first_insert = cond(_insertNOV == 1,1,cond(_insertDEC == 1,2,cond(_insertJAN == 1,3,0)))
              drop _insert*

              Unfortunately, this produces the exact same result as my original code.

              EDIT:

              This is correct, I made an error when adjusting it to code for every month that an insert was sent out.

              Thank you very much Hermanshu, I greatly appreciate the help.
              Last edited by Max Baard; 16 Sep 2022, 08:21. Reason: I made a mistake

              Comment


              • #8
                Max Baard I'm afraid then I need you to help me help you. I just ran my code on your toy example, and it produces exactly the first_insert column you created. What do you need different?

                Comment

                Working...
                X