Using reghdfe command with if-statements

John Poole

Join Date: May 2020
Posts: 2

27 Feb 2021, 07:22

Hello, bit of a complex one here:

I’m currently working as a research assistant, using my supervisor’s code. It works with employee-level data from a firm that “de-trashes” stock coming into its warehouse, i.e., removes transit packaging.
The code is designed to estimate productivity, measured in units [de-trashed] per minute (upm). It uses the reghdfe command, which runs a linear regression that absorbs multiple layers of fixed effects. It also uses an independent variable called PLANNED_UPH, a target that earns workers a bonus if they reach it.
The fixed effects used in the regression equation are:

fe3_j (SKU code i.e., product fixed effects)
fe3_i (worker fixed effects)
fe3_t (date fixed effects)
fe3_dow (day of week fixed effects)
fe3_shift (shift type fixed effects i.e., day, early or late shift)
fe3_h (hour of the day fixed effects)
fe3_handle (handling class fixed effects)
fe3_station (warehouse workstation fixed effects)
fe3_group (group of workers fixed effects)

The code is as follows:

reghdfe uph PLANNED_UPH, ///
absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow fe3_shift=shift_type fe3_h=HourDay1 ///
fe3_handle=HANDLING_CLASS fe3_station=STATION_ID fe3_group=GROUP_ID)
quietly estadd local controls "Yes"
quietly estadd local FE_t "Yes"
quietly estadd local FE_i "Yes"
quietly estadd local FE_j "Yes"
est store H3

The output (H3) is as follows:

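HDFE Linear regression                            Number of obs   =  2,480,900
Absorbing 9 HDFE groups                           F(   1,2454358) =       1.66
                                                  Prob > F        =     0.1971
                                                  R-squared       =     0.5447
                                                  Adj R-squared   =     0.5398
                                                  Within R-sq.    =          0
                                                  Root MSE        =     0.2292

------------------------------------------------------------------------------
         uph |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 PLANNED_UPH |  -2.25e-06   1.75e-06    -1.29   0.197    -5.68e-06    1.17e-06
       _cons |   .4962852    .002311   214.75   0.000     .4917558    .5008146
------------------------------------------------------------------------------

Absorbed degrees of freedom:

   Absorbed FE | Categories  - Redundant  = Num. Coefs
---------------+--------------------------------------
        SKU_ID |      25692           0        25692
     user_code |        567           1          566
     date_code |        232           1          231
           dow |          7           7            0
    shift_type |          3           1            2
      HourDay1 |          9           1            8
HANDLING_CLASS |          2           2            0
    STATION_ID |         38           1           37
      GROUP_ID |          7           2            5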

I have been asked to first split the data in half by date. I did this by creating binary dummies called split1 and split2, which mark observations from the first and second halves of the year, respectively. I then have to run the same regression again on just the first half and copy the estimated fixed effects into the second-half subset of the data. This way, I can look at the coefficient on each of the fixed effects and interpret them more easily.

To run the regression on the first half of the data, I thought of adding if-statements so that the regression would only run if split1==1. Then, for each user ID (worker), I could somehow copy the coefficients from split1 to split2 and run the code only for split2. However, wherever I place the if-statements in the code, Stata returns errors. I’m grateful for any ideas, thanks.

Last edited by John Poole; 27 Feb 2021, 07:26.

William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

27 Feb 2021, 08:20

This is actually a simple question, so let me start with some advice.

With regard to using Stata effectively, I'm sympathetic to you as a new user - there is quite a lot to absorb. And even worse if perhaps you are under pressure from your supervisor to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

With regard to using Statalist effectively, please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

Section 12.1 is particularly pertinent:

12.1 What to say about your commands and your problem

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
...
Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

Now, with that out of the way, I expect you have used the wrong version of if. Stata has two, each with its own help file.

Code:

help if
help ifcmd

You haven't shown us the code that failed or the results it produced, so consider the following example.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. if foreign==1 regress mpg weight // this is incorrect syntax

. list foreign in 1, nolabel

     +---------+
     | foreign |
     |---------|
  1. |       0 |
     +---------+

. regress mpg weight if foreign==1 // this is correct syntax

      Source |       SS           df       MS      Number of obs   =        22
-------------+----------------------------------   F(1, 20)        =     17.47
       Model |  427.990298         1  427.990298   Prob > F        =    0.0005
    Residual |  489.873338        20  24.4936669   R-squared       =    0.4663
-------------+----------------------------------   Adj R-squared   =    0.4396
       Total |  917.863636        21  43.7077922   Root MSE        =    4.9491

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   -.010426   .0024942    -4.18   0.000    -.0156287   -.0052232
       _cons |    48.9183   5.871851     8.33   0.000     36.66983    61.16676
------------------------------------------------------------------------------

.

The first regress does not run because the expression "foreign==1" is evaluated once to decide whether or not to run the command, and foreign is a variable, so what is evaluated is "foreign[1]==1" - using the value of foreign in the first observation. That is zero, so the regress command is bypassed.
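The same behavior can be seen without running a regression. The following is a minimal sketch on the auto dataset; the display messages are made up for illustration only.

```stata
sysuse auto, clear

* The if command evaluates its expression once. A bare variable name
* there means the value in the first observation, i.e. foreign[1].
display foreign[1]

if foreign==1 {
    display "foreign[1] is 1, so this branch runs"
}
else {
    display "foreign[1] is 0, so the first branch is skipped"
}

* The if qualifier, by contrast, is checked observation by observation.
count if foreign==1
```

Here the else branch is the one that runs, while count reports the 22 foreign cars that the correctly written regress uses.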

The second regress runs, including only those observations for which foreign==1. The hint is in the Syntax section of the output of help regress.

Code:

Syntax

    regress depvar [indepvars] [if] [in] [weight] [, options]

The optional (because it is enclosed in brackets) if clause is what you needed. FWIW, the optional in clause allows one to restrict the command to a range of observation numbers, as I did with the list command in the example.
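Applied to the regression in post #1, the if qualifier would go after the variable list and before the comma. What follows is an untested sketch, not a definitive solution. It assumes the original dataset is in memory, that split1 is the first-half dummy described in post #1, and that reghdfe is installed. The names H3_split1 and fe3_i_h1 are hypothetical, chosen only for this illustration.

```stata
* Assumption: the poster's data are in memory and split1 marks the first half.
reghdfe uph PLANNED_UPH if split1==1, ///
    absorb(fe3_j=SKU_ID fe3_i=user_code fe3_t=date_code fe3_dow=dow ///
           fe3_shift=shift_type fe3_h=HourDay1 fe3_handle=HANDLING_CLASS ///
           fe3_station=STATION_ID fe3_group=GROUP_ID)
est store H3_split1    // hypothetical name for the stored estimates

* The saved fixed effects should be filled in only for observations used
* in the estimation. One way to copy each worker's first-half estimate to
* all of that worker's rows: mean() ignores missing values, and egen with
* by() writes the result to every observation in the group.
egen fe3_i_h1 = mean(fe3_i), by(user_code)    // hypothetical variable name
```

Workers who do not appear in the first half would be left with a missing value in fe3_i_h1. The same egen step could be repeated for the other saved fixed effects, using the matching identifier in by().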

Last edited by William Lisowski; 27 Feb 2021, 08:22.