  • Fractional Regression (GLM)

    Hi!

    I want to run regressions where the dependent variable is the share of employment within a particular set of industries in a country and the independent variable is the percentage of people with a college degree. Because the employment share is bounded in [0,1], I cannot run a straight OLS. I therefore tried two fractional regressions with fracreg probit, one standard and one heteroskedastic, but I have a question about the marginal effects:

    Code:
     fracreg probit share_tech lhc i.country i.year if tech_intensity==1
    (fracreg probit)

    Code:
     margins, dyex(lhc)
    
    Average marginal effects                        Number of obs     =     10,010
    Model VCE    : Robust
    
    Expression   : Conditional mean of share_tech, predict()
    dy/ex w.r.t. : lhc
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lhc |  -.0060946   .0019015    -3.21   0.001    -.0098215   -.0023677
    ------------------------------------------------------------------------------

    Code:
     fracreg probit share_tech lhc i.country i.year if tech_intensity==1, het(lhc) vce(robust)
    (Heteroskedastic Probit)

    Code:
     margins, dyex(lhc)
    
    Average marginal effects                        Number of obs     =     10,010
    Model VCE    : Robust
    
    Expression   : Conditional mean of share_tech, predict()
    dy/ex w.r.t. : lhc
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lhc |   .0062602   .0023983     2.61   0.009     .0015597    .0109607
    ------------------------------------------------------------------------------

    Why does the sign of the marginal effect flip in the second specification? Any hint?

    Is there a fractional regression that relaxes the distributional assumptions and still gives robust results? This change in the sign of the marginal effects raised an eyebrow.

    dataex country year yr_sch yr_sch_pri yr_sch_sec yr_sch_ter lhc lsc lpc tech_intensity

    ----------------------- copy starting from the next line -----------------------
    Code:
     * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float country double year float(yr_sch yr_sch_pri yr_sch_sec yr_sch_ter lhc lsc lpc tech_intensity)
    4 1975   .97  .622  .259 .089  1.46  .677   .838 0
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 0
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 0
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 0
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 0
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 0
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 0
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2010 3.933 2.683 1.022 .228 4.254 8.638 11.648 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 2015 4.826 3.278 1.266 .282 4.967 9.631 18.218 1
    4 1975   .97  .622  .259 .089  1.46  .677   .838 2
    4 1975   .97  .622  .259 .089  1.46  .677   .838 2
    4 1975   .97  .622  .259 .089  1.46  .677   .838 2
    4 1975   .97  .622  .259 .089  1.46  .677   .838 2
    4 1975   .97  .622  .259 .089  1.46  .677   .838 2
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 2
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 2
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 2
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 2
    4 1980 1.293  .816  .357  .12 1.987 1.052  1.524 2
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 2
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 2
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 2
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 2
    4 1985 1.733 1.162  .421 .151 2.649 1.485  3.154 2
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 2
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 2
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 2
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 2
    4 1990 2.065  1.44  .448 .177 3.172  1.81  5.004 2
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 2
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 2
    4 2005 3.321  2.29  .807 .224  4.16 6.415  7.616 2
    end




    Last edited by Hugo Rocha; 21 Jun 2022, 12:53.

  • #2
    I am not sure my question was clear, my apologies. Why does the sign of the marginal effect change so drastically when I fit a heteroskedastic probit in the fractional model? And what type of model would you suggest in this scenario? Thanks!



    • #3
      Hugo: This isn't an issue of being robust to distributional misspecification. The GLM approach makes no distributional assumptions beyond specifying the mean. So your problem is one of pure conditional mean specification. It is puzzling but not unheard of that different conditional means give average partial effects (APEs) with different signs. I would avoid the heteroskedastic probit in this case until you try some other remedies, such as including squares and interactions among the variables. It will be easier to see what's going on. The hetprobit model can be poorly identified in some cases, although I don't think that's going on in your case. Incidentally, you should show the estimated parameters, not just what happens when you use margins.
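
      A minimal sketch of the steps suggested above, assuming the variables from post #1 (share_tech, lhc, country, year, tech_intensity). The stored-estimate names m_base, m_het and m_sq, and the choice of a squared term in lhc as the first added nonlinearity, are illustrative assumptions rather than anything specified in the thread.

      ```stata
      * Refit both models from post #1 and keep the full parameter estimates
      fracreg probit share_tech lhc i.country i.year if tech_intensity==1, vce(robust)
      estimates store m_base

      fracreg probit share_tech lhc i.country i.year if tech_intensity==1, het(lhc) vce(robust)
      estimates store m_het

      * Show coefficients and standard errors side by side,
      * including the variance-equation terms of the het() model
      estimates table m_base m_het, b(%9.4f) se

      * A more flexible conditional mean without het(): add a squared term.
      * Factor-variable notation lets -margins- account for the square.
      fracreg probit share_tech c.lhc##c.lhc i.country i.year if tech_intensity==1, vce(robust)
      estimates store m_sq
      margins, dydx(lhc)
      ```

      Comparing the average partial effect from the last model with the two reported in post #1 may help show whether the sign change is driven by nonlinearity in lhc that the het(lhc) term was picking up.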



      • #4
        In reply to Jeff Wooldridge (#3):
        Yes, I definitely forgot about the heteroskedastic probit. My issue is the conditional mean...



        • #5
          The heteroskedastic probit is a conditional mean issue. It’s a different model for E(y|x). Nothing more, nothing less.
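
          To make this concrete, here is a sketch of the two conditional means being compared. The notation is an assumption for illustration: x collects lhc and the country and year dummies, w stands for lhc, beta_w is its coefficient in the index, and gamma is its coefficient in the het() equation.

          ```latex
          % Standard fractional probit
          E(y \mid x) = \Phi(x\beta), \qquad
          \frac{\partial E(y \mid x)}{\partial w} = \phi(x\beta)\,\beta_w

          % Heteroskedastic fractional probit with het(w)
          E(y \mid x) = \Phi\!\left(\frac{x\beta}{\exp(\gamma w)}\right), \qquad
          \frac{\partial E(y \mid x)}{\partial w}
            = \phi\!\left(\frac{x\beta}{\exp(\gamma w)}\right)
              \frac{\beta_w - \gamma\, x\beta}{\exp(\gamma w)}
          ```

          In the first model the partial effect always has the sign of beta_w. In the second, its sign is that of beta_w - gamma * x*beta, which varies across observations, so the averaged effect need not share the sign of beta_w or of the effect from the standard model. This is also why the estimated parameters are needed to see what drives the reversal.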
