
  • How to create multiple dummy variables using loops

    Hi everyone,

    In my dataset I have the date when certain loans were given, and using the date commands I managed to create one variable for each part of the date (variables: day, month, year). Now I need a dummy variable for each combination of month and year (e.g. dummy1 = Month1Year1, dummy2 = Month2Year1, ..., dummy156 = Month12Year13). With the tabulate command I managed to create the dummies for the 12 months and the dummies for the 13 years, but I am having problems creating the 156 month-year combination dummies. Does somebody know how to do it? I've tried it with foreach loops, but I get errors every time.

    Thanks in advance

  • #2
    Code:
    . // I assume that this is the kind of data you started with
    . clear
    
    . input str17 date    y
    
                      date          y
      1. "30 August 2021"    1
      2. "31 August 2021"    3
      3. "01 September 2021" 4
      4. "02 September 2021" 2
      5. "30 August 2022"    4
      6. "31 August 2022"    6
      7. "01 September 2022" 8
      8. "02 September 2022" 7
      9. end
    
    .
    . // you turned it into a Stata date
    . gen statadate = date(date, "DMY")
    
    . format statadate %td
    
    . list
    
         +-----------------------------------+
         |              date   y   statadate |
         |-----------------------------------|
      1. |    30 August 2021   1   30aug2021 |
      2. |    31 August 2021   3   31aug2021 |
      3. | 01 September 2021   4   01sep2021 |
      4. | 02 September 2021   2   02sep2021 |
      5. |    30 August 2022   4   30aug2022 |
         |-----------------------------------|
      6. |    31 August 2022   6   31aug2022 |
      7. | 01 September 2022   8   01sep2022 |
      8. | 02 September 2022   7   02sep2022 |
         +-----------------------------------+
    
    .
    . // What you want is to lose the day information
    . gen mdate = mofd(statadate)
    
    . list
    
         +-------------------------------------------+
         |              date   y   statadate   mdate |
         |-------------------------------------------|
      1. |    30 August 2021   1   30aug2021     739 |
      2. |    31 August 2021   3   31aug2021     739 |
      3. | 01 September 2021   4   01sep2021     740 |
      4. | 02 September 2021   2   02sep2021     740 |
      5. |    30 August 2022   4   30aug2022     751 |
         |-------------------------------------------|
      6. |    31 August 2022   6   31aug2022     751 |
      7. | 01 September 2022   8   01sep2022     752 |
      8. | 02 September 2022   7   02sep2022     752 |
         +-------------------------------------------+
    
    .
    . // add some formatting so it reads more easily
    . format mdate %tm
    
    . list
    
         +--------------------------------------------+
         |              date   y   statadate    mdate |
         |--------------------------------------------|
      1. |    30 August 2021   1   30aug2021   2021m8 |
      2. |    31 August 2021   3   31aug2021   2021m8 |
      3. | 01 September 2021   4   01sep2021   2021m9 |
      4. | 02 September 2021   2   02sep2021   2021m9 |
      5. |    30 August 2022   4   30aug2022   2022m8 |
         |--------------------------------------------|
      6. |    31 August 2022   6   31aug2022   2022m8 |
      7. | 01 September 2022   8   01sep2022   2022m9 |
      8. | 02 September 2022   7   02sep2022   2022m9 |
         +--------------------------------------------+
    
    .
    . // but the formatting only changes what you see, not what the computer sees;
    . // underneath, mdate still holds integers (months since January 1960).
    . // As long as your dates are all from 1960 onward, the integers will be non-negative,
    . // so you can use mdate directly as a factor variable (i.mdate); there is no need
    . // to create the dummies yourself.
    .
    . reg y i.mdate
    
          Source |       SS           df       MS      Number of obs   =         8
    -------------+----------------------------------   F(3, 4)         =      7.26
           Model |      35.375         3  11.7916667   Prob > F        =    0.0428
        Residual |         6.5         4       1.625   R-squared       =    0.8448
    -------------+----------------------------------   Adj R-squared   =    0.7284
           Total |      41.875         7  5.98214286   Root MSE        =    1.2748
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           mdate |
            740  |          1   1.274755     0.78   0.477    -2.539287    4.539287
            751  |          3   1.274755     2.35   0.078    -.5392869    6.539287
            752  |        5.5   1.274755     4.31   0.013     1.960713    9.039287
                 |
           _cons |          2   .9013878     2.22   0.091    -.5026538    4.502654
    ------------------------------------------------------------------------------
    
    .
    . // as an extra you may want to add value labels
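    A minimal sketch of that value-label step, rebuilding the example data above. The label name mdate_lbl is a hypothetical choice; each label's text is taken from the %tm display format, so the regression table should show months like 2021m8 instead of raw integers like 739.

```stata
* Sketch: value labels for the monthly date, so i.mdate output shows months, not integers.
* Rebuilds the example data from this post; mdate_lbl is a hypothetical label name.
clear
input str17 date y
"30 August 2021"    1
"31 August 2021"    3
"01 September 2021" 4
"02 September 2021" 2
"30 August 2022"    4
"31 August 2022"    6
"01 September 2022" 8
"02 September 2022" 7
end
gen mdate = mofd(date(date, "DMY"))
format mdate %tm

* one label per distinct month, its text taken from the %tm display format
capture label drop mdate_lbl
levelsof mdate, local(months)
foreach m of local months {
    local txt = strtrim(string(`m', "%tm"))
    label define mdate_lbl `m' "`txt'", modify
}
label values mdate mdate_lbl

* factor-variable output now uses the value labels
regress y i.mdate
```

    The labels only change what is displayed; the underlying integers, and therefore the estimates, are the same as in the regression above.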
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thank you so much, Maarten, I really appreciate it. If we did want the dummy variables, would it be something like this?

      foreach var of varlist mdate {
      g d`var' = 0
      replace d`var'=1 if mdate==format loandate %tm
      }

      PS: loandate is the original date variable from which we are trying to derive the month-year variables.



      • #4
        #3 is some way from what you want. First, the loop is over one variable, not over its distinct values, so it is legal but pointless. Second, the syntax within the loop will make no sense to Stata: format is a command that sets how a variable is displayed, and it cannot appear inside an if condition.
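        For comparison, a loop over the distinct values of the monthly date (rather than over the single variable mdate) might look like the following minimal sketch. It assumes loandate is a daily Stata date, as described in the PS of #3, and rebuilds it from the dates used in #2; the d_ prefix for the new indicators is a hypothetical naming choice.

```stata
* Sketch: loop over the distinct values of mdate, not over the variable itself.
* loandate is rebuilt from the dates in #2 (assumed to be a daily %td date).
clear
input str17 date
"30 August 2021"
"31 August 2021"
"01 September 2021"
"02 September 2021"
"30 August 2022"
"31 August 2022"
"01 September 2022"
"02 September 2022"
end
gen loandate = date(date, "DMY")
format loandate %td

* daily date -> monthly date; a function does this, not the format command
gen mdate = mofd(loandate)
format mdate %tm

* one 0/1 indicator per distinct month, named like d_2021m8 (hypothetical prefix)
levelsof mdate, local(months)
foreach m of local months {
    local name = "d_" + strtrim(string(`m', "%tm"))
    generate byte `name' = (mdate == `m') if !missing(mdate)
}
describe d_*
```

        Only month-year combinations that actually occur in the data get an indicator, and observations with a missing date get missing rather than 0 in every indicator.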

        As Maarten Buis implies, you don't need the dummy variables to exist for most Stata purposes. If you wanted them, a command could be

        Code:
        tab mdate, gen(mdate) 
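        If you start, as in the question, from separate numeric month and year variables, the ym() function gives the same monthly date without going through a daily date, and the command above then makes one indicator per month-year combination present in the data. A sketch, assuming the variables are named year and month as in the question, and reusing the months from #2:

```stata
* Sketch: monthly date from separate numeric year and month variables
* (names assumed from the question), then one dummy per observed month-year.
clear
input int year byte month
2021 8
2021 8
2021 9
2021 9
2022 8
2022 8
2022 9
2022 9
end
gen mdate = ym(year, month)
format mdate %tm

* creates mdate1, mdate2, ... in ascending date order, one per distinct value
tabulate mdate, generate(mdate)
describe mdate1-mdate4
```

        With 12 months over 13 years this would give up to 156 indicators, though for regression i.mdate remains the simpler route.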
