  • Too many omitted categories and restrictions with rifreg

    Dear Statalist members,
    I have created an example that reflects my data and reproduces the problems I have. To give you a quick overview: my depended variable is net wealth ("wealth"). The data include an indicator called "migrant", equal to 1 for migrants and 0 otherwise. The same applies to "migrantp", which is the partner's immigration status. Based on these two variables I have tried to create four possible family types: native-only (0), immigrant-only (1), mixed with native-born head (2), and mixed with foreign-born head (3). The indicator variable "immigstat" should identify immigrant households.

    Code:
    clear    
    set obs 10000
        
    g wealth = rnormal(0,10000)
    g migrant = rbinomial(1,0.5)
    g migrantp = rbinomial(1,0.5)
    
    gen famtype = .
    replace famtype = 0 if migrant == 0 & migrantp == 0 //native hh
    replace famtype = 1 if migrant == 1 & migrantp == 1 //immigrant-only hh
    replace famtype = 2 if migrant == 0 & migrantp == 1 //mixed with native-born head
    replace famtype = 3 if migrant == 1 & migrantp == 0 //mixed with foreign-born head
    tab famtype, gen(d_famtype)
    
    gen immigstat =. //dummy = 1 for immigrant hh
    replace immigstat = 1 if inrange(famtype,1,3)
    replace immigstat = 0 if famtype == 0 
    
    reg wealth immigstat i.famtype 
    
    rifreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, q(0.5)
    I am trying to figure out how to solve the following two problems, related to this code:

    1. Even when I run a simple regression with reg, two family types are omitted, which should not be the case. The native-only type should be the reference category, which I am trying to ensure by coding this family type as zero, but the last category should not be omitted as well. Do you have any suggestions on this issue?

    2. What I actually want to do is a quantile regression with the rifreg command (available at: http://faculty.arts.ubc.ca/nfortin/datahead.html). For this, I have to restrict the estimated coefficients for the family types to sum to zero. As factor variables are not allowed with this command, I have included the respective dummy variables. I have already tried the constraint command, which does not seem to work with rifreg. Do you have any recommendation on how to impose this restriction?

    (I am using Stata version 14.2)

    Many thanks in advance for your support!

    Kind regards,

    Kathrin

  • #2
    depended variable
    I am sorry for my mistake, this should obviously be "dependent variable".

    • #3
      Dear Kathrin,
      For your first question, at least with regard to the code you sent, two categories of your family type variable are omitted because famtype and immigstat are nested, so their coefficients cannot all be identified in the model. In other words, anyone with immigstat=0 has famtype=0, and anyone with immigstat=1 has famtype 1, 2, or 3. This is basically a multicollinearity problem.
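
      A quick way to see the nesting in the example data is to check that immigstat is exactly the sum of the three immigrant family-type dummies. This is only a sketch and assumes the variables were created as in post #1; the helper variable name sum_types is made up for illustration.
      Code:
      * d_famtype1 is the native-only dummy; d_famtype2-d_famtype4 are the immigrant types
      generate byte sum_types = d_famtype2 + d_famtype3 + d_famtype4
      * assert stops with an error if the equality fails in any observation
      assert immigstat == sum_types
      * the cross-tabulation shows immigstat == 0 only in the famtype == 0 row
      tabulate famtype immigstat
      If the assert passes silently, immigstat is an exact linear combination of d_famtype2, d_famtype3, and d_famtype4, so Stata has to drop one of the four.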

      For the second question, I would suggest the following approach:
      Code:
      * Step 1. Estimate the RIF values for the data at hand and store them
      rifreg wealth, q(50) retain(q50_wealth)
      * Step 2. Use a standard regression to estimate your model
      reg q50_wealth immigstat d_famtype2 d_famtype3 d_famtype4, robust
      * This will give you the same results as the rifreg command:
      rifreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, q(50)
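      Once the RIF is stored as an ordinary variable, any estimation command that accepts constraints can be applied to it. A minimal sketch, assuming Step 1 created q50_wealth and using constraint number 1 only as an example; the collinear option is there to keep all three dummies in the model next to immigstat:
      Code:
      * coefficients of the three immigrant family types must sum to zero
      constraint 1 d_famtype2 + d_famtype3 + d_famtype4 = 0
      * constrained least squares on the retained RIF values
      cnsreg q50_wealth immigstat d_famtype2 d_famtype3 d_famtype4, constraints(1) collinear
      The standard errors from this step may not match those reported by rifreg, so treat them with care.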
      Hope this helps
      Fernando

      • #4
        Dear Fernando,

        thank you very much for your advice, the second issue seems to be solved now.

        Regarding the first problem, I am aware that family type 0 cannot appear in my estimation; I set it up this way on purpose, as I only want the immigrant family types to appear. But I do not understand why family type 3 is omitted as well.

        Kind regards,
        Kathrin

        • #5
          Hi Kathrin,
          So, the basic answer is that when you use a set of dummies, you need to drop one for identification. That accounts for the first omitted category (famtype 0 and immigstat 0).
          Now, for the rest of the famtype categories (1, 2, 3), you will observe that all have immigstat 1. Thus, you need to drop either another category of famtype or immigstat itself.
          Your original question, however, may provide an answer to your problem: constraining the coefficients might allow you to estimate coefficients for all famtype categories.
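          The two unconstrained options could look as follows. This is only a sketch on the example data from post #1; which category to drop in the second variant is an arbitrary choice made here for illustration.
          Code:
          * Option A: drop immigstat; each immigrant type is compared with native-only households
          regress wealth i.famtype
          * Option B: keep immigstat and drop one more family type (here d_famtype2, immigrant-only);
          * d_famtype3 and d_famtype4 are then differences from immigrant-only households
          regress wealth immigstat d_famtype3 d_famtype4
          Both variants fit the same four group means; only the interpretation of the coefficients changes.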
          Fernando

          • #6
            Thank you again for the explanation, Fernando. Do you have an example on how the restriction may solve my first problem?
            Last edited by Kathrin Schulze; 14 Jul 2017, 02:52.

            • #7
              One possibility is working your model as follows:
              Code:
              constraint 2 1.famtype + 2.famtype + 3.famtype = 0

              capture program drop myreg
              program myreg
                  * linear-form log likelihood of a normal model; exp(`s') is the standard deviation
                  args lnf xb s
                  quietly replace `lnf' = ln(normalden($ML_y1, `xb', exp(`s')))
              end
              ml model lf myreg (xb: wealth = i.immigstat i.famtype) (s:), collinear constraints(2)

              ml maximize
              However, I'm not sure how to do the same using standard OLS regression.
              Fernando

              • #8
                I'm sorry to bother you again, but I was not able to solve my problem of the second omitted category. I thought the imposed restriction would solve it, as Fernando suggested, but somehow it does not. Even when I apply the constraint, d_famtype4 is omitted. To illustrate my concern, here are my code and the relevant results again:

                Code:
                clear    
                set obs 10000
                    
                g wealth = rnormal(0,10000)
                g migrant = rbinomial(1,0.5)
                g migrantp = rbinomial(1,0.5)
                
                gen famtype = .
                replace famtype = 0 if migrant == 0 & migrantp == 0 //native hh
                replace famtype = 1 if migrant == 1 & migrantp == 1 //immigrant-only hh
                replace famtype = 2 if migrant == 0 & migrantp == 1 //mixed with native-born head
                replace famtype = 3 if migrant == 1 & migrantp == 0 //mixed with foreign-born head
                tab famtype, gen(d_famtype)
                
                gen immigstat =. //Dummy = 1 for immigrant hh
                replace immigstat = 1 if inrange(famtype,1,3)
                replace immigstat = 0 if famtype == 0 
                
                constraint 1 d_famtype2 + d_famtype3 + d_famtype4 = 0
                
                cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, c(1)   
                
                . cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, c(1)
                note: d_famtype4 omitted because of collinearity
                
                Constrained linear regression                   Number of obs     =     10,000
                                                                F(   2,   9997)   =       0.17
                                                                Prob > F          =     0.8471
                                                                Root MSE          = 10139.9961
                
                 ( 1)  d_famtype2 + d_famtype3 + o.d_famtype4 = 0
                ------------------------------------------------------------------------------
                      wealth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                   immigstat |  -127.8821    235.475    -0.54   0.587    -589.4605    333.6962
                  d_famtype2 |  -27.52445   142.7187    -0.19   0.847    -307.2818    252.2329
                  d_famtype3 |   27.52445   142.7187     0.19   0.847    -252.2329    307.2818
                  d_famtype4 |          0  (omitted)
                       _cons |   45.10751   204.4836     0.22   0.825    -355.7216    445.9366
                ------------------------------------------------------------------------------
                As I can reproduce this problem with this example, I am pretty sure something is wrong with my definitions of famtype and/or immigstat. It may be quite obvious, but somehow I am not able to find the right solution. It would be really helpful if anybody had any advice on how to solve it, as I need to include all family types except natives in my regression.


                Thank you very much for your help in advance!

                Kind regards,
                Kathrin

                • #9
                  As I am still trying to solve the above problem, I would be really grateful for any advice! The goal of my analysis is to interpret the deviation of each immigrant family type's wealth holdings from the overall wealth holdings of immigrants, but my attempt using the constraint command does not seem to work.

                  • #10
                    Hi Kathrin,
                    I think this is your solution for the specific problem you are running into.
                    Code:
                    cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, c(1)  coll
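                    The collinear option tells cnsreg to keep d_famtype4 instead of dropping it before the constraint is applied, so the constraint can do the identifying. A rough sketch of how one might check the result on the example data from post #8 (constraint 1 is assumed to be defined as there):
                    Code:
                    quietly cnsreg wealth immigstat d_famtype2 d_famtype3 d_famtype4, constraints(1) collinear
                    * the three family-type coefficients should sum to (numerically) zero
                    display _b[d_famtype2] + _b[d_famtype3] + _b[d_famtype4]
                    * the fitted value for immigrant-only households should equal their sample mean
                    display _b[_cons] + _b[immigstat] + _b[d_famtype2]
                    summarize wealth if famtype == 1
                    Because the model is saturated, each family-type coefficient is the deviation of that group's mean from the unweighted average of the three immigrant group means, and the immigstat coefficient is that average minus the native-only mean. Note that this is not the same as the pooled immigrant mean when the groups differ in size.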
                    Fernando

                    • #11
                      I cannot tell you how grateful I am for your reply, Fernando. This indeed solves my problem, thank you very much!
                      Kind regards,
                      Kathrin
