Too many omitted categories and restrictions with rifreg

Kathrin Schulze

Join Date: Jul 2017

Posts: 7
#1

11 Jul 2017, 06:01

Dear Statalist members,
I have created an example that reflects my data and reproduces the problems I have. To give you a quick overview: My depended variable is net wealth ("wealth"). The data include an indicator called "migrant", equal to 1 for migrants and 0 otherwise. The same applies to "migrantp", which is the partner's immigration status. Based on these two variables I have tried to create the four possible family types: native-only (0), immigrant-only (1), mixed with native-born head (2), mixed with foreign-born head (3). The indicator variable "immigstat" should identify immigrant households.

Code:

clear
set obs 10000

g wealth = rnormal(0,10000)
g migrant = rbinomial(1,0.5)
g migrantp = rbinomial(1,0.5)

gen famtype = .
replace famtype = 0 if migrant == 0 & migrantp == 0 //native hh
replace famtype = 1 if migrant == 1 & migrantp == 1 //immigrant-only hh
replace famtype = 2 if migrant == 0 & migrantp == 1 //mixed with native-born head
replace famtype = 3 if migrant == 1 & migrantp == 0 //mixed with foreign-born head
tab famtype, gen(d_famtype)

gen immigstat =. //dummy = 1 for immigrant hh
replace immigstat = 1 if inrange(famtype,1,3)
replace immigstat = 0 if famtype == 0

reg wealth immigstat i.famtype
rifreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, q(0.5)

I am trying to figure out how to solve the following two problems, related to this code:

1. Even when I run a simple regression with reg, two family types are omitted, which should not be the case. The native-only type should be the reference category, which I am trying to ensure by setting this family type equal to zero, whereas the last category should not be omitted. Do you have any suggestions on this issue?

2. What I actually want is a quantile regression with the rifreg command (available at: http://faculty.arts.ubc.ca/nfortin/datahead.html). For this, I have to restrict the estimated coefficients for the family types to sum to zero. As factor variables are not allowed with this command, I have included the respective dummy variables. I have already tried the constraint command, which does not seem to work with rifreg. Do you have any recommendation on how to impose this restriction?

(I am using Stata version 14.2)

Many thanks in advance for your support!

Kind regards,

Kathrin
Kathrin Schulze

Join Date: Jul 2017

Posts: 7
#2

11 Jul 2017, 08:56

depended variable

I am sorry for my mistake, this should obviously be "dependent variable".
FernandoRios

Join Date: Apr 2014

Posts: 2470
#3

11 Jul 2017, 09:13

Dear Kathrin,
For your first question, at least with regard to the code you sent, two categories of your family type variable are omitted because famtype and immigstat are nested, so their effects cannot be separately identified in the model. In other words, anyone with immigstat=0 is part of famtype=0, and anyone with immigstat=1 belongs to one of famtype 1, 2, or 3. This is basically a perfect multicollinearity problem.
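
The nesting described here can be checked directly. The following is only a minimal sketch: it rebuilds the example data from post #1 (the seed value is an arbitrary assumption, added for reproducibility) and asserts that immigstat equals the sum of the three immigrant family-type dummies.

```stata
* Minimal sketch: rebuild the example data from post #1
* (the seed is an arbitrary assumption, not part of the original example)
clear
set seed 12345
set obs 10000
gen wealth   = rnormal(0, 10000)
gen migrant  = rbinomial(1, 0.5)
gen migrantp = rbinomial(1, 0.5)

* 0 = native-only, 1 = immigrant-only,
* 2 = mixed with native-born head, 3 = mixed with foreign-born head
gen famtype = 0
replace famtype = 1 if migrant == 1 & migrantp == 1
replace famtype = 2 if migrant == 0 & migrantp == 1
replace famtype = 3 if migrant == 1 & migrantp == 0
tab famtype, gen(d_famtype)
gen immigstat = inrange(famtype, 1, 3)

* immigstat is exactly the sum of the three immigrant-type dummies;
* assert stops with an error if this fails for any observation
assert immigstat == d_famtype2 + d_famtype3 + d_famtype4

* Hence only two of the three dummies can be estimated next to
* immigstat and the constant; regress omits one and prints a note
regress wealth immigstat d_famtype2 d_famtype3 d_famtype4
```

Because the assertion holds for every observation, the omission of one more family type is a property of the variable definitions, not of the simulated data.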

For the second question, I would suggest the following approach:

Code:

*Step 1. Estimate the RIF values for the data in hand
rifreg wealth, q(50) retain(q50_wealth)
*Step 2. Use standard regression to estimate your model
reg q50_wealth immigstat d_famtype2 d_famtype3 d_famtype4, robust
** which will give you the same results as the rifreg command
rifreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, q(50)

Hope this helps
Fernando
Kathrin Schulze

Join Date: Jul 2017

Posts: 7
#4

12 Jul 2017, 01:11

Dear Fernando,

thank you very much for your advice, the second issue seems to be solved now.

Regarding the first problem, I am aware that family type 0 cannot appear in my estimation. This is on purpose, as I only want the immigrant family types to appear there. However, I do not understand why family type 3 is omitted as well.

Kind regards,
Kathrin
FernandoRios

Join Date: Apr 2014

Posts: 2470
#5

12 Jul 2017, 06:27

Hi Kathrin,
So, the basic answer is that when you use a set of dummies, you need to drop one for identification. That accounts for the first omitted category (famtype 0 and immigstat 0).
Now, for the rest of the famtype categories (1, 2, 3), you will observe that all of them have immigstat 1. Thus, you need to drop either another famtype category or immigstat.
Your original question, however, may provide an answer to your problem. Constraining the coefficients might allow you to estimate the coefficients for all famtype categories.
Fernando
Kathrin Schulze

Join Date: Jul 2017

Posts: 7
#6

14 Jul 2017, 02:23

Thank you again for the explanation, Fernando. Do you have an example on how the restriction may solve my first problem?

Last edited by Kathrin Schulze; 14 Jul 2017, 02:52.

FernandoRios

Join Date: Apr 2014
Posts: 2470

14 Jul 2017, 04:55

One possibility is working your model as follows:

Code:

constraint 2 1.famtype + 2.famtype + 3.famtype = 0

capture program drop myreg
program myreg
    args lnf xb s
    quietly replace `lnf' = ln(normalden($ML_y1, `xb', exp(`s')))
end

ml model lf myreg (xb: wealth = i.immigstat i.famtype) (s:), col constraints(2)

ml maximize

However, I'm not sure how to do the same using standard OLS regression.
Fernando


Kathrin Schulze

Join Date: Jul 2017
Posts: 7

28 Jul 2017, 08:19

I'm sorry to bother you again, but I was not able to solve the problem of the second omitted category. I thought the imposed restriction would solve it, as Fernando suggested, but somehow it does not. Even when I apply the constraint, d_famtype4 is omitted. To illustrate my concern, here are my code and the relevant results again:

Code:

clear    
set obs 10000
    
g wealth = rnormal(0,10000)
g migrant = rbinomial(1,0.5)
g migrantp = rbinomial(1,0.5)

gen famtype = .
replace famtype = 0 if migrant == 0 & migrantp == 0 //native hh
replace famtype = 1 if migrant == 1 & migrantp == 1 //immigrant-only hh
replace famtype = 2 if migrant == 0 & migrantp == 1 //mixed with native-born head
replace famtype = 3 if migrant == 1 & migrantp == 0 //mixed with foreign-born head
tab famtype, gen(d_famtype)

gen immigstat =. //Dummy = 1 for immigrant hh
replace immigstat = 1 if inrange(famtype,1,3)
replace immigstat = 0 if famtype == 0 

constraint 1 d_famtype2 + d_famtype3 + d_famtype4 = 0

cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, c(1)   

. cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, c(1)
note: d_famtype4 omitted because of collinearity

Constrained linear regression                   Number of obs     =     10,000
                                                F(   2,   9997)   =       0.17
                                                Prob > F          =     0.8471
                                                Root MSE          = 10139.9961

 ( 1)  d_famtype2 + d_famtype3 + o.d_famtype4 = 0
------------------------------------------------------------------------------
      wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   immigstat |  -127.8821    235.475    -0.54   0.587    -589.4605    333.6962
  d_famtype2 |  -27.52445   142.7187    -0.19   0.847    -307.2818    252.2329
  d_famtype3 |   27.52445   142.7187     0.19   0.847    -252.2329    307.2818
  d_famtype4 |          0  (omitted)
       _cons |   45.10751   204.4836     0.22   0.825    -355.7216    445.9366
------------------------------------------------------------------------------

As I can reproduce this problem in this example, I am pretty sure something is wrong with my definitions of famtype and/or immigstat. It may be quite obvious, but somehow I am not able to find the right solution. It would be really helpful if anybody had advice on how to solve it, as I need to include all family types except natives in my regression.

Thank you very much for your help in advance!

Kind regards,
Kathrin


Kathrin Schulze

Join Date: Jul 2017

Posts: 7
#9

24 Aug 2017, 05:53

As I am still trying to solve the above problem, I would be really grateful for any advice! The goal of my analysis is to interpret how the wealth holdings of each immigrant family type deviate from the overall wealth holdings of immigrants, but my attempt using the constraint command does not seem to work.
FernandoRios

Join Date: Apr 2014

Posts: 2470
#10

24 Aug 2017, 06:17

Hi Kathrin,
I think this is your solution for the specific problem you are running into.

Code:

cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, c(1) coll
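
Here coll abbreviates the collinear option, which tells cnsreg to keep collinear variables instead of dropping them before the constraint is applied. The sketch below is one possible way to check the result, not a definitive recipe: it rebuilds the example data (the seed and the scalar names mu_0 to mu_3 are assumptions made for illustration) and compares the constrained coefficients with the group means of wealth.

```stata
* Minimal sketch: rebuild the example data (seed is an arbitrary assumption)
clear
set seed 12345
set obs 10000
gen wealth   = rnormal(0, 10000)
gen migrant  = rbinomial(1, 0.5)
gen migrantp = rbinomial(1, 0.5)
gen famtype = 0
replace famtype = 1 if migrant == 1 & migrantp == 1
replace famtype = 2 if migrant == 0 & migrantp == 1
replace famtype = 3 if migrant == 1 & migrantp == 0
tab famtype, gen(d_famtype)
gen immigstat = inrange(famtype, 1, 3)

* Sum-to-zero constraint on the three immigrant family types
constraint 1 d_famtype2 + d_famtype3 + d_famtype4 = 0

* collinear keeps d_famtype4 in the model so the constraint can identify it
cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, constraints(1) collinear

* The three family-type coefficients should sum to (numerically) zero
display _b[d_famtype2] + _b[d_famtype3] + _b[d_famtype4]

* Cross-check against group means (mu_0 ... mu_3 are hypothetical names)
forvalues k = 0/3 {
    quietly summarize wealth if famtype == `k', meanonly
    scalar mu_`k' = r(mean)
}
display "unweighted immigrant mean minus native mean: " (mu_1 + mu_2 + mu_3)/3 - mu_0
display "coefficient on immigstat:                    " _b[immigstat]
```

Under the sum-to-zero constraint the model is saturated in the four family types, so the coefficient on immigstat should match the unweighted average of the three immigrant-type means minus the native mean, and each d_famtype coefficient is that type's deviation from this unweighted average. The same cnsreg call could presumably be applied to the retained RIF variable from post #3 (q50_wealth) in place of wealth.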

Fernando
Kathrin Schulze

Join Date: Jul 2017

Posts: 7
#11

24 Aug 2017, 07:10

I cannot tell you how grateful I am for your reply, Fernando. This is indeed solving my problem, thank you very much!
Kind regards,
Kathrin
