  • Using reghdfe command with if-statements

    Hello, bit of a complex one here:

    I’m currently working as a research assistant, using my supervisor’s code. It uses employee-level data for a firm that “de-trashes” stock coming into its warehouse, i.e., removes transit packaging.
    The code is designed to estimate productivity, measured in units [de-trashed] per minute (upm). It uses the reghdfe command, a linear regression that absorbs multiple layers of fixed effects. It also uses an independent variable called PLANNED_UPH, which is a target: workers who reach it are paid a bonus.
    The fixed effects used in the regression equation are:
    • fe3_j (SKU code i.e., product fixed effects)
    • fe3_i (worker fixed effects)
    • fe3_t (date fixed effects)
    • fe3_dow (day of week fixed effects)
    • fe3_shift (shift type fixed effects i.e., day, early or late shift)
    • fe3_h (hour of the day fixed effects)
    • fe3_handle (handling class fixed effects)
    • fe3_station (warehouse workstation fixed effects)
    • fe3_group (group of workers fixed effects)
    The code is as follows:

    reghdfe uph PLANNED_UPH, ///
        absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow fe3_shift=shift_type fe3_h=HourDay1 ///
        fe3_handle=HANDLING_CLASS fe3_station=STATION_ID fe3_group=GROUP_ID)
    quietly estadd local controls "Yes"
    quietly estadd local FE_t "Yes"
    quietly estadd local FE_i "Yes"
    quietly estadd local FE_j "Yes"
    est store H3

    The output (H3) is as follows:
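    HDFE Linear regression                            Number of obs   =  2,480,900
    Absorbing 9 HDFE groups                           F(1, 2454358)   =       1.66
                                                      Prob > F        =     0.1971
                                                      R-squared       =     0.5447
                                                      Adj R-squared   =     0.5398
                                                      Within R-sq.    =          0
                                                      Root MSE        =     0.2292

    ------------------------------------------------------------------------------
             uph |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     PLANNED_UPH |  -2.25e-06   1.75e-06    -1.29   0.197    -5.68e-06    1.17e-06
           _cons |   .4962852    .002311   214.75   0.000     .4917558    .5008146
    ------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    ----------------+-------------------------------------
        Absorbed FE | Categories   Redundant   Num. Coefs
    ----------------+-------------------------------------
             SKU_ID |      25692           0        25692
          user_code |        567           1          566
          date_code |        232           1          231
                dow |          7           7            0
         shift_type |          3           1            2
           HourDay1 |          9           1            8
     HANDLING_CLASS |          2           2            0
         STATION_ID |         38           1           37
           GROUP_ID |          7           2            5
    ----------------+-------------------------------------
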
    I have been asked to first split the data in half by date (I did this by creating binary dummies called split1 and split2 to represent data from the first and second halves of the year, respectively). I then have to run the same regression again for just the first half and copy the estimated fixed effects into the data subset from the second half. This way, I can look at the coefficient on each of the fixed effects and interpret them more easily.

    To run the regression on the first half of the data, I thought of adding if-statements so that the regression would only use observations with split1==1. Then, for each user ID (worker), I could copy the coefficients from split1 to split2 somehow and run the code only for split2. However, wherever I place the if-statements in the code, Stata returns errors. I’m grateful for any ideas, thanks.
    Last edited by John Poole; 27 Feb 2021, 07:26.

  • #2
    This is actually a simple question, so let me start with some advice.

    With regard to using Stata effectively, I'm sympathetic to you as a new user: there is quite a lot to absorb, and it is even worse if you are under pressure from your supervisor to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

    When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

    All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

    Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

    With regard to using Statalist effectively, please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.
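
    As a minimal sketch of the dataex step, assuming a recent Stata in which dataex is already installed (otherwise ssc install dataex should fetch it) and using the auto dataset purely as a stand-in for your own data:
    Code:
    sysuse auto, clear
    * write out the first five observations of three variables
    * in a form that others can paste straight into Stata
    dataex mpg weight foreign in 1/5
    The output of dataex already includes the [CODE] and [/CODE] delimiters, so it can be pasted into a post as it stands.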

    Section 12.1 is particularly pertinent:

    12.1 What to say about your commands and your problem

    Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
    ...
    Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.
    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    Now, with that out of the way, I expect you have used the wrong version of if. Stata has two, each with its own help file:
    Code:
    help if
    help ifcmd
    You haven't shown us the code that failed or the results it produced, so consider the following example.
    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . if foreign==1 regress mpg weight // this is incorrect syntax
    
    . list foreign in 1, nolabel
    
         +---------+
         | foreign |
         |---------|
      1. |       0 |
         +---------+
    
    . regress mpg weight if foreign==1 // this is correct syntax
    
          Source |       SS           df       MS      Number of obs   =        22
    -------------+----------------------------------   F(1, 20)        =     17.47
           Model |  427.990298         1  427.990298   Prob > F        =    0.0005
        Residual |  489.873338        20  24.4936669   R-squared       =    0.4663
    -------------+----------------------------------   Adj R-squared   =    0.4396
           Total |  917.863636        21  43.7077922   Root MSE        =    4.9491
    
    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          weight |   -.010426   .0024942    -4.18   0.000    -.0156287   -.0052232
           _cons |    48.9183   5.871851     8.33   0.000     36.66983    61.16676
    ------------------------------------------------------------------------------
    
    .
    The first regress does not run because the expression "foreign==1" is evaluated once to decide whether or not to run the command, and foreign is a variable, so what is evaluated is "foreign[1]==1" - using the value of foreign in the first observation. That is zero, so the regress command is bypassed.
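
    To make the distinction concrete, here is a small sketch of what the if command actually tests, followed by a case where it is the right tool: a condition with a single value rather than one value per observation. The local macro name run_it is made up for the illustration, and the lines are assumed to be run together as one do-file so that the local macro stays defined.
    Code:
    sysuse auto, clear
    * the if command looks only at the first observation of a variable
    display foreign[1]
    * so this line behaves like the bypassed command in the example
    if foreign[1]==1 regress mpg weight
    * the if command is meant for single-valued conditions, such as a macro
    local run_it 1
    if `run_it' == 1 {
        regress mpg weight
    }
    Here the last regress runs on all observations, because the condition only decides whether the command is executed, not which observations it uses.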

    The second regress runs, including only those observations for which foreign==1. The hint is in the Syntax section of the output of help regress.
    Code:
    Syntax
    
            regress depvar [indepvars] [if] [in] [weight] [, options]
    The optional (because it is enclosed in brackets) if clause is what you needed. FWIW the optional in clause allows one to restrict the command to a range of observation numbers, as I did with the list command in the example.
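
    Applied to the command in #1, the if qualifier would go after the variable list and before the comma. What follows is an untested sketch, not something I have run on your data: it assumes split1 and the other variables exist as described in #1, and that reghdfe leaves the saved fixed effects missing outside the estimation sample. The egen step is only one possible way to carry each worker's estimate over to the second-half observations, and fe3_i_all is a name made up for the illustration.
    Code:
    * drop fixed-effect variables saved by an earlier run, if any
    capture drop fe3_*
    * estimate on the first half only
    reghdfe uph PLANNED_UPH if split1==1, ///
        absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow fe3_shift=shift_type fe3_h=HourDay1 ///
        fe3_handle=HANDLING_CLASS fe3_station=STATION_ID fe3_group=GROUP_ID)
    * mean() ignores missing values, so every observation of a worker
    * receives that worker's first-half estimate
    egen fe3_i_all = mean(fe3_i), by(user_code)
    Workers who appear only in the second half would still have a missing value, since no fixed effect was estimated for them. The same egen step could be repeated for the other saved fixed effects with the matching by() variable.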
    Last edited by William Lisowski; 27 Feb 2021, 08:22.
