  • Dummy regression for multiple variables

    I have 50 internet search variables that I want to regress on day-of-the-week and month dummies to remove seasonality from the search terms. How do I do this in Stata in one go rather than separately for each word?

  • #2
    See

    Code:
    help varlist
    which shows that syntax such as the following is allowed:

    Code:
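    clear
    set seed 12132022
    set obs 200
    * create seven 0/1 variables
    foreach var in y x1 x2 z1 b4 g3 r2 {
        gen `var' = rnormal() < 0.5
    }
    * x1-r2 covers every variable from x1 to r2 in dataset order
    regress y i.(x1-r2)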
    Result:

    Code:
    . regress y i.(x1-r2)
    
          Source |       SS           df       MS      Number of obs   =       200
    -------------+----------------------------------   F(6, 193)       =      0.92
           Model |  1.21345568         6  .202242614   Prob > F        =    0.4798
        Residual |  42.3065443       193  .219204893   R-squared       =    0.0279
    -------------+----------------------------------   Adj R-squared   =   -0.0023
           Total |       43.52       199  .218693467   Root MSE        =    .46819
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            1.x1 |  -.0172137   .0709231    -0.24   0.808    -.1570975    .1226702
            1.x2 |   .0252226   .0726676     0.35   0.729     -.118102    .1685472
            1.z1 |   -.116466   .0748482    -1.56   0.121    -.2640915    .0311594
            1.b4 |  -.0016491   .0737578    -0.02   0.982    -.1471239    .1438257
            1.g3 |  -.0869167   .0695176    -1.25   0.213    -.2240284     .050195
            1.r2 |   .0893662   .0725014     1.23   0.219    -.0536305     .232363
           _cons |    .751469   .1282551     5.86   0.000     .4985075    1.004431
    ------------------------------------------------------------------------------

    With that many indicators, it is probably best to install reghdfe from SSC and absorb most of them (those whose coefficients are of no immediate interest).
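    As a minimal sketch of the reghdfe route (reghdfe and its dependency ftools are community-contributed packages on SSC, and the variables are the simulated ones above), the indicators of no immediate interest are absorbed instead of estimated. For these 0/1 variables, the coefficient on x1 should match the one from the full dummy regression up to reghdfe's convergence tolerance.

    ```stata
    * install the community-contributed packages once, if missing
    capture which reghdfe
    if _rc {
        ssc install ftools
        ssc install reghdfe
    }
    clear
    set seed 12132022
    set obs 200
    foreach var in y x1 x2 z1 b4 g3 r2 {
        generate `var' = rnormal() < 0.5
    }
    * full dummy regression, for comparison
    regress y i.(x1-r2)
    scalar b_x1_regress = _b[1.x1]
    * keep x1 and x2 as regressors; absorb the remaining indicators
    reghdfe y x1 x2, absorb(z1 b4 g3 r2)
    scalar b_x1_hdfe = _b[x1]
    display "regress: " b_x1_regress "   reghdfe: " b_x1_hdfe
    ```

    Absorbing only changes what is reported: the absorbed indicators get no coefficient table, which is the point when there are many of them.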

    ADDED IN EDIT:

    I probably misread what you want. If you want to regress each variable on the week and month indicators, and assuming your dataset contains the 50 variables plus two other variables named "week" and "month", then:


    Code:
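    * exclude the regressors and the cluster variable from the list of outcomes
    * (clustervar is a placeholder; list any other non-search variables here too)
    ds week month clustervar, not
    foreach var of varlist `r(varlist)' {
        display "command: regress `var' i.(week month), cluster(clustervar)"
        regress `var' i.(week month), cluster(clustervar)
    }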
    Last edited by Andrew Musau; 13 Dec 2022, 07:13.
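    To see what ds with the not option hands to the loop, here is a minimal sketch on a tiny made-up dataset (search1, search2, and clustervar are hypothetical names): r(varlist) holds every variable except the ones listed, in dataset order.

    ```stata
    clear
    set obs 10
    * hypothetical dataset: two search terms mixed in with the other variables
    foreach v in search1 week search2 month clustervar {
        generate `v' = runiform()
    }
    * list all variables except those named; the result is left in r(varlist)
    ds week month clustervar, not
    local searchvars `r(varlist)'
    display "outcomes to loop over: `searchvars'"
    ```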



    • #3
      I have indicator variables for each month (January to November, with December as my base) and the same for weekdays. Should I substitute these for the "week month" part?



      • #4
        If you read the suggested documentation:

        Code:
        help varlist
        you will see that if these 50 internet search variables are adjacent in the dataset, you can refer to all of them by the first and last names in the sequence. Thus:

        Code:
        foreach var of varlist name1-name50 {
            regress `var' jan-nov monday-saturday, robust
        }
        Just replace the placeholder variable names in the code with your own. If the variables are not in such an order in the dataset, then you have to list the variables to exclude within ds, e.g.,

        Code:
        ds monday tuesday ... sunday january february ... december, not
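        If the search variables are scattered, another option (a sketch with hypothetical variable names) is to move them together with order, after which a first-last range picks up only them. unab expands a range into the full list, so you can check what it covers before running the loop.

        ```stata
        clear
        set obs 3
        * hypothetical search variables, deliberately not adjacent
        generate search_a = 1
        generate monday   = 0
        generate search_b = 2
        generate january  = 0
        * a range follows dataset order, so it also catches monday here
        unab before : search_a-search_b
        * move the search variables to the front, then the range covers only them
        order search_a search_b
        unab after : search_a-search_b
        display "before: `before'"
        display "after:  `after'"
        ```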



        • #5
          Hi Andrew, thanks for the reply. I want to keep the residuals as new variables in my dataset, e.g., a variable called searchterm_residual. Is there a way to do this? Also, for each term the Saturday indicator has been omitted because of collinearity. Is there a way around this?



          • #6
            One day of the week must be omitted; otherwise, the seven dummies would sum to one and be collinear with the intercept. If you omitted Sunday and have no observations for that day in the dataset, then the reference category will have to be one of the remaining days. But the collinearity could also come from some other variable or combination of variables. I cannot follow the question about residuals. The usual way to generate residuals is

            Code:
            regress ...
            predict res, res
            A variable named res will then be created in the dataset. In any case, start a new thread and explain your question in more detail if my reply does not answer it.
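            A minimal sketch of both points on simulated data (all variable names are hypothetical): with a constant in the model, a full set of seven day indicators cannot all be estimated, so regress drops one; factor-variable notation instead picks a base level for you. The last lines store the residuals as a new variable.

            ```stata
            clear
            set seed 2022
            set obs 70
            * hypothetical daily data with day coded 0-6, as dow() codes it
            generate day = mod(_n - 1, 7)
            generate y = rnormal()
            * one indicator per day: d1-d7 sum to one for every observation
            tabulate day, generate(d)
            * with a constant in the model, one of d1-d7 is omitted for collinearity
            regress y d1-d7
            * factor-variable notation: the lowest level (0) is the base by default
            regress y i.day
            * residuals from the last regression, stored as variable res
            predict res, residuals
            summarize res
            ```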



            • #7
              I have data for all the days, but it is Saturday that is being omitted. I used ib7.day to make Sunday my base, but a coefficient is still reported for Sunday. Why?



              • #8
                Show the result of

                Code:
                tab day
                and the full regression commands and output. Place both within CODE delimiters.



                • #9
                  Code:
                  foreach var of varlist ldiffcost_w-ldiffexpense_w {
                      regress `var' ib12.month ib7.day, robust
                  }
                  Code:
                  Linear regression                               Number of obs     =      6,907
                                                                  F(17, 6889)       =     393.45
                                                                  Prob > F          =     0.0000
                                                                  R-squared         =     0.5501
                                                                  Root MSE          =     .45837
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                  ldiffasset_w | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                         month |
                      January  |   .0283948    .032805     0.87   0.387    -.0359131    .0927027
                     February  |  -.0012409   .0318373    -0.04   0.969    -.0636517    .0611699
                        March  |   .0117377   .0309431     0.38   0.704    -.0489203    .0723956
                        April  |   .0064888   .0331068     0.20   0.845    -.0584107    .0713883
                          May  |   .0051116   .0321649     0.16   0.874    -.0579415    .0681647
                         June  |   .0201765   .0296566     0.68   0.496    -.0379595    .0783126
                         July  |   .0053628   .0306117     0.18   0.861    -.0546455    .0653712
                       August  |   .0048855   .0317793     0.15   0.878    -.0574117    .0671826
                    September  |   .0142241   .0305628     0.47   0.642    -.0456885    .0741366
                      October  |   .0192822   .0302018     0.64   0.523    -.0399227    .0784871
                     November  |   .0102481   .0299485     0.34   0.732    -.0484602    .0689564
                               |
                           day |
                       Sunday  |   1.078412   .0266829    40.42   0.000     1.026106    1.130719
                       Monday  |   1.870212    .023487    79.63   0.000      1.82417    1.916254
                      Tuesday  |   1.057486   .0202965    52.10   0.000     1.017699    1.097274
                    Wednesday  |   .9628343   .0197522    48.75   0.000     .9241139    1.001555
                     Thursday  |   .9343056   .0194276    48.09   0.000     .8962215    .9723898
                       Friday  |   .8256083   .0199397    41.41   0.000     .7865204    .8646962
                     Saturday  |          0  (omitted)
                            7  |          0  (empty)
                               |
                         _cons |  -.9721731   .0297401   -32.69   0.000    -1.030473   -.9138734
                  ------------------------------------------------------------------------------
                  note: 6.day omitted because of collinearity.
                  note: 7b.day identifies no observations in the sample.



                  • #10
                    If you reread your threads from yesterday, you will discover that Nick Cox and I stated that Sundays are coded as 0. You were directed to

                    Code:
                    help dow()
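                    There is no day coded 7 in your dataset, which is why the row for 7 is reported as empty. Because the base you requested (7) has no observations, all seven days (0 to 6) entered the model alongside the constant, so Stata dropped one of them (6, Saturday) because of collinearity. To specify Sunday as the base: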

                    Code:
                    regress `var' ib12.month ib0.day
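                    A minimal sketch on made-up daily data (the date range and the outcome y are hypothetical) showing the dow() coding: Sunday is 0 and Saturday is 6, so no observation ever has day == 7. The value labels are only there so that output shows day names; with ib12.month ib0.day the model has 11 month and 6 day coefficients.

                    ```stata
                    clear
                    set seed 12132022
                    * hypothetical daily data covering 2022
                    set obs 365
                    generate date = td(01jan2022) + _n - 1
                    format date %td
                    * dow(): 0 = Sunday, ..., 6 = Saturday; month(): 1-12
                    generate day = dow(date)
                    generate month = month(date)
                    label define dayname 0 "Sunday" 1 "Monday" 2 "Tuesday" 3 "Wednesday" ///
                        4 "Thursday" 5 "Friday" 6 "Saturday"
                    label values day dayname
                    tabulate day
                    generate y = rnormal()
                    * December and Sunday as the base categories
                    regress y ib12.month ib0.day, robust
                    ```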



                    • #11
                      OK, thanks for the correction; I did not see the mistake. As you said before, to keep the residual I can use predict res, res. Is there a way to do this for each variable separately?
                      E.g., possibly:
                      Code:
                      foreach var of varlist ldiffcost_w-ldiffexpense_w {
                      regress `var' ib12.month ib0.day, robust
                      predict `var'res, res }



                      • #12
                        That would do it (with the closing brace on a new line). The only requirement is that the new variable names be distinct.
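                        Putting the pieces together, a minimal sketch on simulated data (the two outcome variables reuse the thread's names but hold random numbers): each pass of the loop fits the regression and saves the residuals under a distinct name, `var'_res. The capture drop line only lets the loop be rerun without an "already defined" error.

                        ```stata
                        clear
                        set seed 13122022
                        * hypothetical daily data standing in for the real search variables
                        set obs 730
                        generate date = td(01jan2021) + _n - 1
                        format date %td
                        generate day = dow(date)
                        generate month = month(date)
                        generate ldiffcost_w = rnormal()
                        generate ldiffexpense_w = rnormal()
                        foreach var of varlist ldiffcost_w-ldiffexpense_w {
                            quietly regress `var' ib12.month ib0.day, robust
                            * one residual variable per search term, e.g. ldiffcost_w_res
                            capture drop `var'_res
                            predict `var'_res, residuals
                        }
                        describe *_res
                        ```

                        Keep in mind that Stata variable names are limited to 32 characters, so very long search-term names may need a shorter suffix.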
