Fixed effects probit model

Maria Domingo

Join Date: Apr 2020
Posts: 17

21 Apr 2020, 14:21

Hi everyone:

I would like to ask you the following.
I have a dataset that is the outcome of a field experiment. It consists of cross-sectional data: two interviews took place, in January 2017 and August 2017, and the same individuals participated in both. The January 2017 survey included socio-demographic variables, but apart from that both surveys asked more or less the same questions about adoption habits.

During these months, an intervention took place, and the outcome variable (Y) is adoption and equals 1 if the individual adopted by August 2017 and 0 otherwise.

Five regions (R) were part of this intervention, comprising 32 districts (D). [The treatment was randomized at the district level.] The names of these regions and districts are already in byte format.

Since it is cross-sectional data, I would like to have "region" as the fixed effects level. I also need to cluster the error terms at the district level, since individuals are likely to be more similar within a district than across districts.

The regression is as follows: Y_idr = α + β_0 T_idr + γ_1 X_idr + γ_2 W_dr + R_r + e_idr

Y_idr is the dependent variable (1 or 0), T_idr is the treatment variable (0, 1 or 2), X_idr is a vector of individual-level variables, W_dr controls for commune-level variations and R_r is the region strata fixed effects.

Here I provide an example of my dataset (however, the data is confidential and I cannot provide more details; I hope these variables are enough to explain the problem):

Region	District	Adoption (Y)	Treatment (T)	Sex	Education	Children	Risk aversion (Jan 17)	Risk aversion (Aug 17)	Income (Jan 17)	Income (Aug 17)
1	1	0	0	1	4	3	1	1	1500	1500
1	3	1	1	0	1	0	0	1	1500	1750
1	4	1	1	0	2	1	0	0	2000	1500
2	5	0	1	0	3	2	1	0	1000	1200
2	6	1	0	1	0	1	1	0	700	750
2	8	1	1	0	1	1	0	1	1000	1000
3	10	1	2	1	6	2	1	1	1500	3000
3	11	0	0	0	0	0	0	0	1000	0
3	12	1	2	1	2	2	1	1	1500	1500
4	14	0	1	1	3	3	0	0	0	1500
4	15	0	1	0	4	4	1	1	4000	4000
4	16	1	0	1	5	1	1	1	500	500
5	20	0	2	0	6	1	0	1	1000	1200
5	22	1	2	0	2	2	1	0	1000	1000

My questions are:
(1) Which command should I use in Stata 16 to run the regression above, accounting for Y being 1 or 0 (probit) and also including region fixed effects?
(2) How can I create and include the vector of individual-level variables (X) (e.g. sex, education, children), and how can I add commune-level variations (W)?

I know it is a very long post, but I would be very grateful if someone helps me. I have been struggling a while but I am stuck.

Thanks in advance,
Maria

Tags: cross sectional, fixed effects, logit, probit, regression

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#2

21 Apr 2020, 17:02

Maria:

You actually have two years of panel data, but you've put it in wide rather than long format. It is easy to switch from wide to long, and there are a number of threads here about it. That is not my comparative advantage.
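
For readers following along, a minimal sketch of the wide-to-long step. It assumes a hypothetical individual identifier `id` and wide variables named `Janincome`, `Augincome`, `Janrisk` and `Augrisk`; the real dataset's names will differ.

```stata
* Sketch only: id and the Jan*/Aug* variable names are assumptions, not the poster's data.
* The @ marks where the wave label sits inside each wide variable name.
reshape long @income @risk, i(id) j(wave) string

* wave now holds "Jan" or "Aug"; a numeric period is convenient for later xtset use
gen byte period = (wave == "Aug")
```

After this, each individual has two rows, one per survey wave, with `income` and `risk` as single columns.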

I assume that the treatment occurred between Jan and Aug -- hopefully. I would actually recommend starting off by ignoring that Y is binary and using a two-period fixed effects analysis. Because this is the same as differencing, you can actually do the analysis without making the data set long. Simply define

gen cincome = Augincome - Janincome
gen crisk = Augrisk - Janrisk
reg cincome i.T crisk, vce(robust)

If you want, you can include the time constant variables, but these would drop out of the first differencing.

The differencing removes fixed effects at the district level, so also at the region level.

You should not use region dummies (fixed effects) with probit when you only have a few observations per region. This creates the incidental parameters problem. I could make some suggestions for probit, but you seem to be a beginner. Thus, I would start with differencing followed by OLS.

JW
Maria Domingo

Join Date: Apr 2020

Posts: 17
#3

22 Apr 2020, 01:37

Dear Jeff Wooldridge, thanks a lot for your reply.

I will try to switch from wide to long.
Yes, you are right. The treatment took place between Jan and Aug 17.

My dataset is rather large (2000 individuals and 620 variables). On average, around 200 observations per region. Should I apply the probit instead of OLS then?
Also, can I include variables that only exist for the second survey (not possible for differencing)?
And finally, for clustering errors at the district level, is the code < vce(cluster district) >?

Thanks in advance,
Maria
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

22 Apr 2020, 01:48

Maria:
I can reply to your last question only:
yes, your code for clustering standard errors on district is correct

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#5

23 Apr 2020, 19:15

Maria: I discussed the problems with the phrase "fixed effects" when you are simply adding dummies not at the unit of observation in a recent thread:

https://www.statalist.org/forums/for...tal-parameters

To summarize, adding regional dummies when you have 200 observations per region is not a problem. Put the data in long format, run pooled probit with the region dummies, and cluster at the district level -- provided the treatment was assigned at that level, as appears to be the case.
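
A minimal sketch of what that pooled probit could look like. All variable names here (`adopt`, `treat`, `region`, `district`, `sex`, `educ`, `children`) are hypothetical stand-ins, and `region` and `district` are assumed to be numeric.

```stata
* Sketch only: every variable name below is an assumption for illustration.
* i.region adds one dummy per region; vce(cluster district) clusters the
* standard errors at the level at which treatment was assigned.
probit adopt i.treat sex educ children i.region, vce(cluster district)

* Average marginal effects of each treatment arm on Pr(adopt = 1)
margins, dydx(treat)
```

The `margins` step is optional; it reports effects on the probability scale, which are usually easier to interpret than raw probit coefficients.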

Hope this helps.

JW
Maria Domingo

Join Date: Apr 2020

Posts: 17
#6

26 Apr 2020, 04:09

Dear Jeff Wooldridge, I would like to kindly ask you for some help. I have made the data long format, and I also created new variables for differencing for the original dataset, to compare results.
However, when I run a probit with the long format dataset, some coefficients are 0 and the standard errors empty or omitted.

My long data looks similar to (see picture next post):

producerID month region_code commune_code age sex farmsize agricplotsMay agricplotsDec agricplots adoptionMay adoptionDec adoption incomeDec
3131 Dec Centre-Ouest Bakata 30 1 7.5 3 3 3 5 3 3 15000
3131 May Centre-Ouest Bakata 30 1 7.5 3 3 3 5 3 5 15000
3132 Dec Centre-Ouest Bakata 45 0 4 1 1 1 4 1 1 24000
3132 May Centre-Ouest Bakata 45 0 4 1 1 1 4 1 4 24000
3133 Dec Centre-Ouest Bakata 18 0 2 4 4 4 0 3 3 3000
3133 May Centre-Ouest Bakata 18 0 2 4 4 4 0 3 0 3000
3134 Dec Centre-Ouest Dassa 25 1 11.5 1 1 1 5 0 0 4000
3134 May Centre-Ouest Dassa 25 1 11.5 1 1 1 5 0 5 4000
3135 Dec Centre-Ouest Dassa 31 1 12 2 3 3 7 8 8 12000
3135 May Centre-Ouest Dassa 31 1 12 2 3 2 7 8 7 12000
5241 Dec Sud-Ouest Batie 53 0 5.5 3 0 0 2 2 2 20000
5241 May Sud-Ouest Batie 53 0 5.5 3 0 3 2 2 2 20000
5242 Dec Sud-Ouest Batie 50 0 3 4 4 4 0 2 2 24000
5242 May Sud-Ouest Batie 50 0 3 4 4 4 0 2 0 24000
5243 Dec Sud-Ouest Batie 23 0 1.5 5 7 7 4 1 1 6000
5243 May Sud-Ouest Batie 23 0 1.5 5 7 5 4 1 4 6000

The dataset in which I compute differences is basically the same, but with only one observation per producer and with variables like: agricplotsDiff = agricplotsDec - agricplotsMay

In both datasets, "treat" (treatment) equals 0, 1 or 2; and I created a dummy variable "adoptionDummy"= 1 if adoption >=0, as my dependent variable.

My dependent variables, therefore, are "adoption", "adoptionDec" and adoptionDummy.

I was trying to run this code:

Code:

xtset producerID May
xtreg adoption i.treat age sex farmsize agricplots incomeDec i.region_code, vce(cluster commune_code)

But I am not sure if effects are fixed at the region level. Also, if I add "fe" at the end of this code, then the coefficients become 0.

If I try

Code:

xtset region_code
xtreg SLMPDummyDiff i.treat age sex farmsize agricplots incomeDec, vce(cluster commune_code) fe

I get "panels are not nested within clusters", which makes sense because I have 5 regions and 32 communes, but then I don't know what I should do to get fixed effects at the region level.
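
One way around the nesting error, sketched under assumptions: enter the region dummies directly as regressors in a linear probability model instead of declaring region as the panel variable. The new variables `region_n` and `commune_n` are hypothetical helpers created here.

```stata
* Sketch only: region_n and commune_n are helper variables introduced for illustration.
* egen group() yields numeric ids whether the codes are stored as strings or numbers.
egen region_n  = group(region_code)
egen commune_n = group(commune_code)

* Region dummies enter as ordinary regressors, so no xtset or fe option is needed
* and the "panels nested within clusters" requirement of xtreg, fe does not apply.
regress SLMPDummyDiff i.treat age sex farmsize agricplots incomeDec i.region_n, vce(cluster commune_n)
```

This is a linear model for a binary outcome, so it is a starting point rather than a substitute for the probit.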

If I try then this probit, the constant coefficients become 0 with omitted standard errors. I don't understand what's wrong.

Code:

probit adoptionDummy i.treat age sex farmsize agricplots incomeDec i.region_code, vce(cluster commune_code)

Regarding the dataset with differencing, is it correct to run this code?

Code:

probit adoptionDummy i.treat age sex farmsize agricplotsDiff incomeDec i.region_code, vce(cluster commune_code)

And also, when I take the adoption in December as my dependent variable, am I right if I include the constant variables (e.g., sex, age) and the variables that I have for December (e.g. agricplotsDec, incomeDec) and not the variables that change over the months?

As you can see, I am quite lost. I have never run a fixed effects model myself, so I would greatly appreciate some help or guidance.
I hope I have explained myself.
Thanks in advance,
Maria

Last edited by Maria Domingo; 26 Apr 2020, 04:46.
Maria Domingo

Join Date: Apr 2020

Posts: 17
#7

26 Apr 2020, 04:21

The data looks better here sorry:
Attached Files

Last edited by Maria Domingo; 26 Apr 2020, 04:44.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#8

01 May 2020, 21:58

You should

xtset producerID month

in which case the fixed effects are at the producer level. But then you forgot the fe option, so you did random effects.
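
A minimal sketch of that setup, assuming `month` takes the two values May and Dec and that each producer stays in one commune. `period` and `commune_n` are hypothetical helper variables, and the regressor list is only illustrative.

```stata
* Sketch only: period and commune_n are helpers introduced for illustration.
* xtset needs a numeric time variable; egen group() works for strings and numbers.
egen period    = group(month)
egen commune_n = group(commune_code)
xtset producerID period

* fe requests the within (fixed effects) estimator; without it xtreg defaults to re.
* Regressors that do not change over time for a producer are omitted under fe.
xtreg adoption i.period agricplots, fe vce(cluster commune_n)
```

Note that `group()` numbers the periods in sort order, which is enough for `xtset` but does not by itself say which wave came first.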
Nae Khar

Join Date: Dec 2018

Posts: 56
#9

02 Jul 2020, 11:34

Hello everyone,

Is it possible to run a fixed effects ordered logit model on a pseudo panel? What would be the Stata command for a model like this?

My data consists of 10 countries, with 1500 individuals per country over 4 years. Different individuals are surveyed year by year for the same set of countries.

All my variables (dependent and independent) are ordinal on a scale of 1-5.

Nae Khar

Join Date: Dec 2018
Posts: 56

#10

02 Jul 2020, 11:57

Here is how my data is set up

Individual	country	year	Openness to FDI	Safety	Trust	Education
1	1	2012
2	1	2012
3	1	2012
4,5,…,1500	1	2012
1	1	2014
2	1	2014
3	1	2014
4,5,…,1500	1	2014
1	1	2016
2	1	2016
3	1	2016
4,5,…,1500	1	2016
1	1	2018
2	1	2018
3	1	2018
4,5,…,1500	1	2018
1	2	2012
2	2	2012
3	2	2012
4,5,…,1500	2	2012


joy yakubu

Join Date: Jun 2020
Posts: 11

#11

25 Oct 2020, 14:38

Hello everyone,

I am new to Stata. I am trying to merge a Stata file and an Excel file, and then run a probit with cluster fixed effects on the sample.
The Stata file contains the survey responses, while the Excel file contains climate records by cluster.
For instance, the Excel file is:

cluster	tempreture	precipitation
1	25.6	1.89
2	27	2.4
3	24.44	1.56
4	24.89	2.32

while the Stata file is:

region	cluster	adoption	age	sex
urban	1	1	10	f
rural	1	0	15	m
urban	1	1	14	f
rural	1	0	5	f
urban	2	1	13	m
rural	2	1	6	m
urban	2	0	7	f
rural	3	1	13	f
urban	3	1	4	f
rural	3	0	7	m
urban	3	1	4	m

I imported the Excel file into Stata and saved it as a Stata file, then I tried merging the two files using these commands:

use "C:\survey\A.dta", clear
sort cluster
joinby cluster using "C:\temp\B.dta", unmatched(both)
sortby cluster: probit adoption age i.sex temperature precipitation

I am getting error messages. Kindly guide me please. Thank you.
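
A possible starting point, sketched under assumptions: A.dta is taken to be the survey file (many rows per cluster), B.dta the climate file (one row per cluster), and `sex` a string variable as in the example table. `sex_n` is a hypothetical helper variable. Running a separate probit `by cluster:` cannot estimate climate coefficients, because the climate variables do not vary within a cluster, so the sketch pools all clusters instead.

```stata
* Sketch only: which file holds which data is an assumption based on the post.
use "C:\survey\A.dta", clear

* Many survey rows share one climate row, so a many-to-one merge fits
merge m:1 cluster using "C:\temp\B.dta"
tab _merge
keep if _merge == 3          // keep clusters found in both files

* Factor-variable notation (i.) needs a numeric variable; encode converts the string
encode sex, gen(sex_n)

* The example table spells the variable tempreture; use whatever name the data has
probit adoption age i.sex_n tempreture precipitation, vce(cluster cluster)
```

If `sex` is already numeric, the `encode` line should be dropped and `i.sex` used directly.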
