OLS or logistic

Ama Perera

Join Date: Mar 2019

Posts: 43
#1

01 Nov 2023, 18:32

Hello everyone,

I'm currently working on cross-sectional regression analyses involving 100 firms during the sample period in 2020. The dependent variable, which measures the quality of compliance, has a range from 0 to 20, where 0 indicates the lowest compliance and 20 signifies the highest compliance. My independent variables include both continuous and dummy variables.

I'm seeking advice on which regression model would be more suitable for these cross-sectional regressions.

01. My initial thought was to use Ordinary Least Squares (OLS) with a simple command like "reg dependent independent controls."

02. However, given that one could argue my dependent variable is a categorical variable (as described above), some may suggest that logistic regression might be a more appropriate model. This would involve a command along the lines of "logit dependent independent controls."

Your insights on this matter would be greatly appreciated.

Thank you.
Tags: categorical, logit, OLS, regression
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

01 Nov 2023, 19:07

First, I'm assuming that when you say your outcome variable ranges from 0 to 20, you mean that it takes on the discrete values 0, 1, 2, 3, ..., 18, 19, 20. If that is not so, then I don't think there is any case for anything logit-like at all.

-logit dependent independent controls- would not treat your outcome as a 21-category discrete variable. -logit- distinguishes only two categories: 0 and non-zero. So you would be analyzing only the lowest level of compliance vs compliance at any other level. It would not distinguish at all among levels 1 through 20. You might be thinking of -ologit-, which would treat it as a 21-category ordinal outcome variable. While this is feasible, the results will be questionable because the proportional odds assumption underlying -ologit- becomes less and less likely to be true the more categories are involved. -mlogit- requires no such assumption, but it treats the 21 categories as if they have no relationship to each other at all. From your description of the variable as a degree of compliance, that seems far off the mark: at the very least it sounds like it represents an ordinal measurement: 15 represents more compliance than 14 and less than 16, etc. So I would only use -mlogit- as a last resort here.
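To make the point about -logit- concrete, here is a minimal sketch. The names y, x1, x2 and anycomp are hypothetical stand-ins (y for the 0-20 score, x1 for a dummy, x2 for a continuous predictor), not variables from the original post.

```stata
* Hypothetical names: y = 0-20 outcome, x1 = dummy, x2 = continuous predictor

* -logit- treats any nonzero, nonmissing value of y as a "success",
* so this fit ...
logit y i.x1 c.x2

* ... is the same model as a fit on an explicit 0 vs. non-zero indicator
generate byte anycomp = (y != 0) if !missing(y)
logit anycomp i.x1 c.x2

* The ordinal alternative keeps all 21 levels but relies on proportional odds
ologit y i.x1 c.x2
```

The first two commands should give identical estimates, which shows that levels 1 through 20 are not distinguished.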

The major concern with using -regress- is whether there is, in fact, a linear relationship between the outcome and some combination of your predictor variables. You can explore that graphically, and if it seems implausible, you can try transforming some of the variables (outcome or predictor) to achieve at least approximate linearity. Having not seen your data myself, I would start with the assumption that OLS is your best bet here.
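One way to do the graphical check described above, as a rough sketch only. The names y, x1, x2, yhat and ehat are hypothetical, and which predictors to plot depends on the data.

```stata
* Hypothetical names: y = 0-20 outcome, x1 = dummy, x2 = continuous predictor
regress y i.x1 c.x2

* Residuals against fitted values; a curved pattern hints at nonlinearity
predict double yhat, xb
predict double ehat, residuals
scatter ehat yhat, yline(0)

* Raw data with a linear fit and a lowess smooth for one continuous predictor
twoway (scatter y x2) (lfit y x2) (lowess y x2)
```

If the lowess curve departs clearly from the fitted line, transforming the outcome or the predictor is one thing to try before abandoning OLS.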
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#3

01 Nov 2023, 20:07

You can either divide the variable by 20 and turn it into a fraction, and then use fractional logit, or use binomial regression -- both with robust standard errors. In the latter case,

Code:

glm y i.x1 c.x2 ... c.xk, fam(bin 20) link(logit) vce(robust)
margins, dydx(*)

In the former case:

Code:

gen w = y/20
glm w i.x1 c.x2 ... c.xk, fam(bin) link(logit) vce(robust)
margins, dydx(*)

or

Code:

gen w = y/20
fracreg logit w i.x1 c.x2 ... c.xk, vce(robust)
margins, dydx(*)

The average partial effects obtained from -margins- can be compared directly with OLS coefficients from a linear model.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#4

01 Nov 2023, 20:08

Another possibility is an ordered logit (or ordered probit), but these are more difficult to summarize because you get effects on the probability of moving from one level to another. My guess is you want a kind of average effect, and the previous commands provide that.
Ama Perera

Join Date: Mar 2019

Posts: 43
#5

01 Nov 2023, 20:30

Thank you very much for your response, Jeff. Below are the results from OLS and from the binomial regression (average partial effects from -margins-), both with robust standard errors. The estimates differ slightly but are largely similar.

OLS:

-------------------------------------------------------------------------------
              |               Robust
      QUALITY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        SCALE |   .5224818   .2461665     2.12   0.037     .0325949    1.012369
      PROFITS |    -2.4408   4.261876    -0.57   0.568     -10.9222    6.040604
          CVE |   1.428233   1.073366     1.33   0.187    -.7078324    3.564299
      ASSURED |   1.417548   .5753828     2.46   0.016     .2724999    2.562596
       GENDER |  -4.751403   2.420745    -1.96   0.053     -9.56884     .066033
    COMMITTEE |   8.378462   .9673264     8.66   0.000     6.453422     10.3035
        _cons |   3.601044   2.538984     1.42   0.160    -1.451696    8.653784
-------------------------------------------------------------------------------

Binomial:

-------------------------------------------------------------------------------
              |            Delta-method
              |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        SCALE |   .5761003   .2600962     2.21   0.027     .0663211    1.085879
      PROFITS |  -2.498156   4.689529    -0.53   0.594    -11.68946    6.693153
          CVE |   1.582947   1.373622     1.15   0.249    -1.109303    4.275197
      ASSURED |   1.334739   .5131989     2.60   0.009     .3288878    2.340591
       GENDER |    -4.5195   2.429773    -1.86   0.063    -9.281769    .2427682
    COMMITTEE |   6.130753   .4899829    12.51   0.000     5.170404    7.091102
-------------------------------------------------------------------------------

Does this mean using either model would be ok?

Thanks.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#6

01 Nov 2023, 21:18

Those findings are within expectations, so you can choose one to report and discuss in the text and the other as a robustness check. I don't think it much matters which you choose for reporting the average effects. However, as you move the explanatory variables towards extreme values -- say, increase profits -- the marginal effects from the binomial will be more plausible. Is one of the variables of more interest than the others?
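A sketch of how the effect of profits at different values could be inspected after the binomial fit. The -glm- specification is an assumption reconstructed from the variable names in the output in #5, and the at() values are placeholders that should be replaced with values inside the observed range of PROFITS.

```stata
* Assumed specification, reconstructed from the output table in #5
glm QUALITY SCALE PROFITS CVE ASSURED GENDER COMMITTEE, ///
    family(binomial 20) link(logit) vce(robust)

* Check the observed range of PROFITS before choosing evaluation points
summarize PROFITS, detail

* Partial effect of PROFITS on the expected score at several PROFITS values
* (placeholder values; replace with values that occur in the data)
margins, dydx(PROFITS) at(PROFITS = (-0.10 0 0.10 0.20))
marginsplot
```

Because the binomial mean is bounded between 0 and 20, these effects can shrink as the predicted score nears either bound, whereas the OLS effect is the same constant everywhere.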

Ama Perera

Join Date: Mar 2019

Posts: 43
#7

01 Nov 2023, 22:13

Thanks again, Jeff. All variables are of interest, with no main focus on a single variable. Could you please also shed some light on interpreting the coefficients (average effects) from the binomial regression?

The variables are defined as follows:

Variable	Variable description

QUALITY (dependent var)	Quality of compliance, value ranges from 0-20
SCALE	Firm size measured by the logarithm of total assets
PROFITS	Return on Assets
CVE	Whether the Firm is defined as a Reporting Entity (this is a dummy variable)
ASSURED	Assurance of quality by a third party (this is a categorical variable with values from 0-2)
GENDER	The proportion of male directors
COMMITTEE	Establishment of a corporate governance Committee (this is a dummy variable)
