Percentages versus proportions as explanatory variables

tellvictor

Join Date: Apr 2014

Posts: 3
#1

23 Apr 2014, 15:19

Hi All, I am having some difficulty with regression. My model employs negative binomial regression; the dependent variable is a count and the explanatory variables are proportions. The problem is that I get ridiculously large IRRs when I treat the explanatory variables as proportions (e.g., 0.3445), but when I multiply by a constant, say 100 to express them as percentages, I get more "meaningful" output. Is it valid or proper to enter 25% as 25 in Stata instead of 0.25? I hope I am clear enough, and thanks in advance.
Richard Williams

Join Date: Apr 2014

Posts: 4829
#2

23 Apr 2014, 15:56

In general you want to scale independent variables so that the effect of a one unit change is substantively meaningful and easy to interpret. If you enter the variable as a proportion, then a 1 unit change is a 100 percentage point change -- which can't even happen except for those who start at 0. Multiply by 100 and then a 1 unit change means a 1 percentage point change, which is probably much more meaningful and useful. This applies to other things besides proportions. If you measure income in dollars, the effect of a dollar change may be incredibly small, maybe even so small that you only see zeros reported for the coefficients. Measuring income in thousands of dollars will often work much better. Conversely measuring income in trillions of dollars (at least for individuals) may yield ridiculously large effects. Note that ridiculous does not mean incorrect (unless the scaling is so small or so large that it creates computer precision problems). But a good scaling creates results that are easier to explain and interpret.
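The rescaling argument can be checked with a little arithmetic. The following is a minimal Python sketch, not Stata output: the coefficient value `beta_proportion` is made up for illustration, and the only fact it relies on is that multiplying a regressor by 100 divides its coefficient by 100.

```python
import math

# Hypothetical negative binomial coefficient for a regressor entered as a
# proportion (0 to 1). The value is invented purely for illustration.
beta_proportion = 4.6

# Multiplying the regressor by 100 divides its coefficient by 100.
beta_percent = beta_proportion / 100

irr_proportion = math.exp(beta_proportion)  # rate ratio for a 100-point change
irr_percent = math.exp(beta_percent)        # rate ratio for a 1-point change

# Both scalings give the same linear predictor for the same observation,
# so the fitted model is identical; only the size of "one unit" differs.
x_proportion = 0.3445
x_percent = 100 * x_proportion
assert math.isclose(beta_proportion * x_proportion, beta_percent * x_percent)

print(f"IRR per unit of the proportion: {irr_proportion:.2f}")
print(f"IRR per percentage point:       {irr_percent:.4f}")
```

Under these assumed numbers the IRR drops from roughly 99 to roughly 1.05, yet raising the smaller IRR to the 100th power recovers the larger one, which is why neither scaling is "wrong".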

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
Stata Version: 17.0 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Joe Canner

Join Date: Mar 2014

Posts: 580
#3

23 Apr 2014, 16:05

In general, a regression coefficient tells you the change in the dependent variable for every one-unit change in the explanatory variable. If your explanatory variable is a proportion, a one-unit change is 100 percentage points, which is why the coefficients are so large. Using percentages gives you the effect of a one percentage point change, which is much more interpretable. Both are technically correct (presuming, of course, that the proportion/percent variables adhere to the requirements of your model in the first place), since multiplying an explanatory variable by a constant does not change the significance or the interpretation, only the units in which the effect is expressed.
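To see that significance is unaffected, here is a small Python sketch. It uses a hand-rolled simple OLS fit rather than nbreg, purely to keep the example self-contained, and the data values and the helper name `ols_slope_and_t` are invented for illustration. The same invariance holds for the z-statistics in a count model, because the coefficient and its standard error are rescaled by the same factor.

```python
import math

def ols_slope_and_t(x, y):
    """Return the slope and its t-statistic from a simple OLS fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the usual standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope

# Made-up data: a regressor measured as a proportion and an outcome.
proportion = [0.10, 0.25, 0.34, 0.50, 0.62, 0.80]
outcome = [2, 3, 5, 6, 9, 11]
percent = [100 * p for p in proportion]

slope_prop, t_prop = ols_slope_and_t(proportion, outcome)
slope_pct, t_pct = ols_slope_and_t(percent, outcome)

print(f"slope (proportion): {slope_prop:.4f}, t = {t_prop:.4f}")
print(f"slope (percent):    {slope_pct:.4f}, t = {t_pct:.4f}")
```

The slope on the percent scale is one hundredth of the slope on the proportion scale, while the t-statistic is the same in both fits.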
tellvictor

Join Date: Apr 2014

Posts: 3
#4

24 Apr 2014, 08:12

Hey Richard and Joe, thank you so very much for the thorough explanations and for helping me out with this. I really appreciate it!!!
Kevin

Join Date: Apr 2014

Posts: 6
#5

24 Apr 2014, 09:47

It shouldn't make a difference either way. nbreg is generally more difficult to interpret. If you have a large sample, you could try using OLS as a baseline; the estimates are consistent and easier to interpret.

The other way is to log the dependent variable, which gives you a log-linear model. Here is a useful resource, if you don't find it too elementary:

http://www.cazaar.com/ta/econ113/interpreting-beta
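As a rough illustration of how a log-linear coefficient is read, the Python sketch below converts a coefficient into the implied percent change in the outcome. The coefficient value and the function name `percent_change_in_y` are hypothetical, chosen only to show the arithmetic.

```python
import math

def percent_change_in_y(beta, delta_x=1.0):
    """Exact percent change in y implied by log(y) = a + beta * x
    when x changes by delta_x."""
    return 100 * (math.exp(beta * delta_x) - 1)

beta = 0.05  # hypothetical coefficient on x in a log(y) regression

exact = percent_change_in_y(beta)   # exact effect of a one-unit change in x
approx = 100 * beta                 # common small-coefficient approximation

print(f"exact:  {exact:.3f}% change in y per unit of x")
print(f"approx: {approx:.3f}% change in y per unit of x")
```

The approximation of 100 times the coefficient is close for small coefficients and drifts away from the exact figure as the coefficient grows.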
tellvictor

Join Date: Apr 2014

Posts: 3
#6

24 Apr 2014, 09:59

Kevin, I always thought OLS was not appropriate for count data/rare events (dependent variable), and I chose nbreg because the distribution of my data did not satisfy Poisson assumptions - but in any case my model makes more "sense" after converting the proportions to percent (explanatory variables). I am pretty convinced that the coefficients were correct in either case and the issue is just one of interpretation. Thanks!
Richard Williams

Join Date: Apr 2014

Posts: 4829
#7

24 Apr 2014, 10:26

I actually think the exponentiated coefficients in nbreg and other count models are pretty easy to interpret, at least when compared to things like logit and probit. If, say, the exponentiated coefficient for female is 1.5, then you know that the female rate is 50% greater than the male rate.
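The reason the exponentiated coefficient reads as a rate ratio can be sketched in one line of algebra, assuming the usual log link of a count model and a 0/1 indicator named female (other covariates held fixed and omitted here for brevity):

```latex
\begin{align*}
\log E[y \mid \text{female}] &= \beta_0 + \beta_1\,\text{female} \\
\frac{E[y \mid \text{female}=1]}{E[y \mid \text{female}=0]}
  &= \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_1}
\end{align*}
```

So with $e^{\beta_1} = 1.5$, the expected count for women is 1.5 times that for men, i.e. 50% greater.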


Sean O'Connor

Join Date: Jun 2014
Posts: 119

19 May 2016, 06:02

Folks,

Perhaps someone could shed some light on the following predicament I have.

I'm looking to examine the influence of bias in the distribution of capital grants. I utilise two dependent variables: the first is the natural logarithm of the grant club i received; the second is the grant club i received as a proportion of the total amount it sought (in essence, grant awarded/grant sought).

My explanatory variables include the natural logarithm of population and the natural logarithm of population per km2 (urbanisation). I also have, measured as percentages, the share of the population in the 0-19 age bracket, the unemployment rate, and the share employed as managers, higher professionals or owners.

My bias variable is the inverse distance (km) between the Minister of Finance and club i.

What I'm wondering is whether there is an issue with having such a variety of explanatory variables in different forms: logs, percentages and kms?

I don't see the logic in transforming my distance variables or percentage variables into logs.

However, running my model under OLS creates some difficult coefficients to analyse.

For example, a 1km decrease in the distance between the hometown of the Minister of Finance and club i increases the level of grant awarded to club i by 1.1889 (coefficient score).

Surely this isn't correct?
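One possible reading, sketched in Python under a loud assumption: if the regressor really is entered as 1/distance (with distance in km), then 1.1889 is the effect of a one-unit change in 1/distance, not of a 1 km change. The function name `change_in_prediction` and the example distances are hypothetical; only the coefficient comes from the post.

```python
def change_in_prediction(beta, km_from, km_to):
    """Change in the linear prediction when distance moves from km_from
    to km_to, assuming the regressor is 1/distance (distance in km)."""
    return beta * (1.0 / km_to - 1.0 / km_from)

beta_inverse_distance = 1.1889  # coefficient quoted in the post

# The same 1 km decrease implies very different effects near and far away.
near = change_in_prediction(beta_inverse_distance, km_from=10, km_to=9)
far = change_in_prediction(beta_inverse_distance, km_from=100, km_to=99)

print(f"10 km -> 9 km:   {near:.5f}")
print(f"100 km -> 99 km: {far:.5f}")
```

Under that assumption, the implied effect of moving 1 km closer depends on the starting distance and is far smaller than 1.1889 at any realistic distance.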

*Note: I originally thought it might have something to do with some of the explanatory variables being highly correlated; however, this doesn't appear to be true.

Code:

             | loggrant   logpop   pop19p unempl~e logurban highea~r    infin
-------------+---------------------------------------------------------------
    loggrant |   1.0000
      logpop |   0.0435   1.0000
      pop19p |   0.0486   0.0546   1.0000
unemployma~e |  -0.0616   0.0667  -0.0424   1.0000
    logurban |   0.0056   0.6123  -0.4425   0.1901   1.0000
  highearner |   0.0486   0.2357   0.0060  -0.6915   0.1260   1.0000
       infin |   0.0262   0.0573   0.0134   0.0356   0.0304  -0.0016   1.0000
     insport |   0.0181  -0.0023  -0.0332  -0.0261  -0.0387  -0.0061  -0.0037
       ingaa |   0.0058  -0.0180   0.0090  -0.0131  -0.0259  -0.0097  -0.0006
      inirfu |   0.0144   0.0847  -0.2072  -0.1222   0.2433   0.1813  -0.0027
       infai |  -0.0013  -0.0039  -0.0150   0.1040   0.0382  -0.0479  -0.0023

             |  insport    ingaa   inirfu    infai
-------------+------------------------------------
     insport |   1.0000
       ingaa |  -0.0012   1.0000
      inirfu |  -0.0127  -0.0072   1.0000
       infai |  -0.0037  -0.0001  -0.0128   1.0000

Code:

             |      rec   logpop   pop19p unempl~e logurban highea~r    infin
-------------+---------------------------------------------------------------
         rec |   1.0000
      logpop |   0.0719   1.0000
      pop19p |  -0.0557   0.0546   1.0000
unemployma~e |  -0.0415   0.0667  -0.0424   1.0000
    logurban |   0.1338   0.6123  -0.4425   0.1901   1.0000
  highearner |   0.0955   0.2357   0.0060  -0.6915   0.1260   1.0000
       infin |  -0.0213   0.0573   0.0134   0.0356   0.0304  -0.0016   1.0000
     insport |  -0.0202  -0.0023  -0.0332  -0.0261  -0.0387  -0.0061  -0.0037
       ingaa |   0.0073  -0.0180   0.0090  -0.0131  -0.0259  -0.0097  -0.0006
      inirfu |   0.0760   0.0847  -0.2072  -0.1222   0.2433   0.1813  -0.0027
       infai |  -0.0040  -0.0039  -0.0150   0.1040   0.0382  -0.0479  -0.0023

             |  insport    ingaa   inirfu    infai
-------------+------------------------------------
     insport |   1.0000
       ingaa |  -0.0012   1.0000
      inirfu |  -0.0127  -0.0072   1.0000
       infai |  -0.0037  -0.0001  -0.0128   1.0000

Last edited by Sean O'Connor; 19 May 2016, 06:11.


Sean O'Connor

Join Date: Jun 2014

Posts: 119
#9

23 May 2016, 02:13

Anyone have any info on the above?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17084
#10

23 May 2016, 03:02

Sean,
queuing up behind someone else's query is not that fruitful (unless the topic is the same).
Hence, you'd be better off starting a new thread.
That said, it would have been better to post your OLS code and output, too.

Kind regards,
Carlo
(Stata 18.0 SE)
Sean O'Connor

Join Date: Jun 2014

Posts: 119
#11

23 May 2016, 03:16

Originally posted by Carlo Lazzaro:

Sean,
queuing up behind someone else's query is not that fruitful (unless the topic is the same).
Hence, you'd be better off starting a new thread.
That said, it would have been better to post your OLS code and output, too.

Thank you Carlo, I will do that now.