  • Percentages versus proportions as explanatory variables

    Hi all, I am having some difficulty with regression. My model uses negative binomial regression; the dependent variable is a count and the explanatory variables are proportions. The problem is that I get ridiculously large IRRs when I enter the explanatory variables as proportions (e.g., 0.3445), but when I multiply them by a constant, say 100 to turn them into percentages, I get more "meaningful" output. Is it valid to enter 25% as 25 in Stata instead of 0.25? I hope I am clear enough, and thanks in advance.

  • #2
    In general you want to scale independent variables so that the effect of a one unit change is substantively meaningful and easy to interpret. If you enter the variable as a proportion, then a 1 unit change is a 100 percentage point change -- which can't even happen except for those who start at 0. Multiply by 100 and then a 1 unit change means a 1 percentage point change, which is probably much more meaningful and useful. This applies to other things besides proportions. If you measure income in dollars, the effect of a dollar change may be incredibly small, maybe even so small that you only see zeros reported for the coefficients. Measuring income in thousands of dollars will often work much better. Conversely measuring income in trillions of dollars (at least for individuals) may yield ridiculously large effects. Note that ridiculous does not mean incorrect (unless the scaling is so small or so large that it creates computer precision problems). But a good scaling creates results that are easier to explain and interpret.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam
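Editorial sketch of the rescaling argument in #2. It assumes a made-up coefficient of 4.2 on a proportion-scaled regressor (not a value from this thread) and uses plain Python rather than Stata, only to show the arithmetic: multiplying the regressor by 100 divides the coefficient by 100, so the IRR per percentage point is the 100th root of the IRR per unit of proportion.

```python
import math

# Hypothetical coefficient on a regressor entered as a proportion (0 to 1).
# The value is invented for illustration; it does not come from the thread.
beta_proportion = 4.2

# Entering the same regressor as a percentage (0 to 100) divides the
# coefficient by 100.
beta_percent = beta_proportion / 100

irr_proportion = math.exp(beta_proportion)  # effect of a jump from 0 to 1
irr_percent = math.exp(beta_percent)        # effect of one percentage point

print(f"IRR per unit of proportion: {irr_proportion:.2f}")
print(f"IRR per percentage point:   {irr_percent:.4f}")

# Both scalings imply the same rate ratio for the same real-world change,
# here a move from 25% to 35% (0.25 to 0.35).
ratio_from_proportion = math.exp(beta_proportion * (0.35 - 0.25))
ratio_from_percent = math.exp(beta_percent * (35 - 25))
print(f"Rate ratio for a 10-point rise: {ratio_from_proportion:.4f} "
      f"vs {ratio_from_percent:.4f}")
```

The large IRR and the small one describe the same fitted model; only the size of the "one unit" step differs.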



    • #3
      In general, a regression coefficient tells you the change in the dependent variable (in nbreg, the log of the expected count) for a one-unit change in the explanatory variable. If your explanatory variable is a proportion, a one-unit change is 100 percentage points, which is why the coefficients are so large. Using percentages gives you the effect of a one percentage point change, which is much more interpretable. Both are technically correct (presuming, of course, that the proportion/percent variables adhere to the requirements of your model in the first place), since multiplying an explanatory variable by a constant rescales its coefficient and standard error by the same factor but does not change the significance or the substantive interpretation.
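Editorial sketch of the point in #3 that rescaling leaves significance untouched. As an assumption for simplicity, it uses one-regressor OLS computed by hand on made-up data rather than nbreg; the same invariance holds for the z statistics of a maximum likelihood model, because the coefficient and its standard error are divided by the same constant.

```python
import math

def slope_and_t(x, y):
    """Return the OLS slope and its t statistic for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares and the usual standard error of the slope.
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se

# Made-up data: the regressor as a proportion, and the same values as percent.
proportion = [0.10, 0.25, 0.30, 0.45, 0.60, 0.80]
outcome = [2.0, 3.1, 2.9, 4.2, 5.1, 6.0]
percent = [100 * p for p in proportion]

b_prop, t_prop = slope_and_t(proportion, outcome)
b_pct, t_pct = slope_and_t(percent, outcome)

print(f"proportion scale: slope {b_prop:.4f}, t {t_prop:.4f}")
print(f"percent scale:    slope {b_pct:.6f}, t {t_pct:.4f}")
```

The slope shrinks by a factor of 100 on the percent scale while the t statistic, and therefore the p-value, stays the same.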



      • #4
        Hey Richard and Joe thank you so very much for the thorough explanations and helping me out with this. I really appreciate it!!!



        • #5
          It shouldn't make a difference either way. nbreg is generally more difficult to interpret. If you have a large sample, you could try using OLS as a baseline; the estimates are consistent and easier to interpret.

          The other way is to log the dependent variable, which gives you a log-linear model. Here is a useful resource, if you don't find it too elementary:

          http://www.cazaar.com/ta/econ113/interpreting-beta
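Editorial sketch of how a coefficient from the log-linear model suggested in #5 is usually read. The coefficients below are hypothetical. When the dependent variable is log(y), a one-unit rise in x multiplies y by exp(beta), so the exact percent change is 100 * (exp(beta) - 1); the common shortcut of 100 * beta is only accurate for small coefficients.

```python
import math

def percent_change(beta, delta=1.0):
    """Exact percent change in y implied by a log(y) model when x rises by delta."""
    return 100 * (math.exp(beta * delta) - 1)

# Hypothetical coefficients, chosen to show where the shortcut breaks down.
for beta in (0.02, 0.20, 1.00):
    approx = 100 * beta
    exact = percent_change(beta)
    print(f"beta = {beta:.2f}: shortcut {approx:.1f}%, exact {exact:.1f}%")
```

The gap between the shortcut and the exact figure widens quickly as the coefficient grows, which matters when a regressor is scaled so that one unit is a very large step.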



          • #6
            Kevin, I always thought OLS was not appropriate when the dependent variable is count data or rare events, and I chose nbreg because the distribution of my data did not satisfy the Poisson assumptions. In any case, my model makes more "sense" after converting the explanatory variables from proportions to percentages. I am pretty convinced that the coefficients were correct in either case and that the issue is just one of interpretation. Thanks!



            • #7
              I actually think the exponentiated coefficients in nbreg and other count models are pretty easy to interpret, at least when compared to things like logit and probit. If, say, the exponentiated coefficient for female is 1.5, then you know that the female rate is 50% greater than the male rate.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 17.0 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Folks,

                Perhaps someone could shed some light on the following predicament I have.

                I'm looking to examine the influence bias in the distribution of capital grants. I use two dependent variables. The first is the natural logarithm of the grant that club i received. The second is the grant that club i received as a proportion of the total amount it sought, in essence grant awarded / grant sought.

                My explanatory variables include the natural logarithm of population and the natural logarithm of population per km² (urbanisation). I also have, measured as percentages, the share of the population aged 0-19, the unemployment rate, and the share employed as managers, higher professionals or owners.

                My bias variable is the inverse of the distance (in km) between the Minister of Finance and club i.

                What I'm wondering is whether there is an issue with having explanatory variables in such a variety of forms: logs, percentages and kms.

                I don't see the logic in transforming my distance variables or percentage variables into logs.

                However, running my model under OLS produces some coefficients that are difficult to interpret.

                For example, a 1km decrease in the distance between the hometown of the Minister of Finance and club i increases the level of grant awarded to club i by 1.1889 (coefficient score).

                Surely this isn't correct?

                *Note I originally thought it might have something to do with some of the explanatory variables being highly correlated, however this doesn't appear to be true.

                Code:
                             | loggrant   logpop   pop19p unempl~e logurban highea~r    infin
                -------------+---------------------------------------------------------------
                    loggrant |   1.0000
                      logpop |   0.0435   1.0000
                      pop19p |   0.0486   0.0546   1.0000
                unemployma~e |  -0.0616   0.0667  -0.0424   1.0000
                    logurban |   0.0056   0.6123  -0.4425   0.1901   1.0000
                  highearner |   0.0486   0.2357   0.0060  -0.6915   0.1260   1.0000
                       infin |   0.0262   0.0573   0.0134   0.0356   0.0304  -0.0016   1.0000
                     insport |   0.0181  -0.0023  -0.0332  -0.0261  -0.0387  -0.0061  -0.0037
                       ingaa |   0.0058  -0.0180   0.0090  -0.0131  -0.0259  -0.0097  -0.0006
                      inirfu |   0.0144   0.0847  -0.2072  -0.1222   0.2433   0.1813  -0.0027
                       infai |  -0.0013  -0.0039  -0.0150   0.1040   0.0382  -0.0479  -0.0023
                
                             |  insport    ingaa   inirfu    infai
                -------------+------------------------------------
                     insport |   1.0000
                       ingaa |  -0.0012   1.0000
                      inirfu |  -0.0127  -0.0072   1.0000
                       infai |  -0.0037  -0.0001  -0.0128   1.0000

                Code:
                             |      rec   logpop   pop19p unempl~e logurban highea~r    infin
                -------------+---------------------------------------------------------------
                         rec |   1.0000
                      logpop |   0.0719   1.0000
                      pop19p |  -0.0557   0.0546   1.0000
                unemployma~e |  -0.0415   0.0667  -0.0424   1.0000
                    logurban |   0.1338   0.6123  -0.4425   0.1901   1.0000
                  highearner |   0.0955   0.2357   0.0060  -0.6915   0.1260   1.0000
                       infin |  -0.0213   0.0573   0.0134   0.0356   0.0304  -0.0016   1.0000
                     insport |  -0.0202  -0.0023  -0.0332  -0.0261  -0.0387  -0.0061  -0.0037
                       ingaa |   0.0073  -0.0180   0.0090  -0.0131  -0.0259  -0.0097  -0.0006
                      inirfu |   0.0760   0.0847  -0.2072  -0.1222   0.2433   0.1813  -0.0027
                       infai |  -0.0040  -0.0039  -0.0150   0.1040   0.0382  -0.0479  -0.0023
                
                             |  insport    ingaa   inirfu    infai
                -------------+------------------------------------
                     insport |   1.0000
                       ingaa |  -0.0012   1.0000
                      inirfu |  -0.0127  -0.0072   1.0000
                       infai |  -0.0037  -0.0001  -0.0128   1.0000
                Last edited by Sean O'Connor; 19 May 2016, 06:11.
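Editorial sketch relating to the 1.1889 coefficient quoted in #8. It rests on two assumptions the post does not confirm: that the coefficient belongs to the inverse-distance variable (1/km), and that it comes from the regression with log(grant) as the dependent variable. Under those assumptions a "one unit" change is a change of 1 in 1/distance, not a change of 1 km, so the implied effect of moving 1 km closer depends heavily on the starting distance.

```python
import math

# Coefficient quoted in the post. ASSUMPTION: it is the coefficient on
# inverse distance (1/km) in the log(grant) model.
BETA_INVERSE_DISTANCE = 1.1889

def log_grant_change(d_from_km, d_to_km, beta=BETA_INVERSE_DISTANCE):
    """Change in predicted log(grant) when distance moves from d_from_km to d_to_km."""
    return beta * (1 / d_to_km - 1 / d_from_km)

# Illustrative starting distances (not taken from the data).
for start in (2, 10, 100):
    change = log_grant_change(start, start - 1)
    pct = 100 * (math.exp(change) - 1)
    print(f"{start} km -> {start - 1} km: "
          f"log change {change:.4f}, about {pct:.2f}% more grant")
```

For example, moving from 10 km to 9 km changes 1/distance by only about 0.011, so the implied change in log(grant) is roughly 0.013 rather than 1.1889. Posting the actual regression output would be needed to confirm which reading applies.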



                • #9
                  Anyone have any info on the above?



                  • #10
                    Sean,
                    adding your question to someone else's thread is not that fruitful (unless the topic is the same).
                    Hence, you'd be better off starting a new thread.
                    That said, it would have been better to post your OLS code and output, too.
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #11
                      Originally posted by Carlo Lazzaro:
                      Sean,
                      adding your question to someone else's thread is not that fruitful (unless the topic is the same).
                      Hence, you'd be better off starting a new thread.
                      That said, it would have been better to post your OLS code and output, too.

                      Thank you Carlo, I will do that now.

