
  • OLS or logistic

    Hello everyone,

    I'm currently working on cross-sectional regression analyses involving 100 firms during the sample period in 2020. The dependent variable, which measures the quality of compliance, has a range from 0 to 20, where 0 indicates the lowest compliance and 20 signifies the highest compliance. My independent variables include both continuous and dummy variables.

    I'm seeking advice on which regression model would be more suitable for these cross-sectional regressions.

    01. My initial thought was to use Ordinary Least Squares (OLS) with a simple command like "reg dependent independent controls."

    02. However, given that one could argue my dependent variable is a categorical variable (as described above), some may suggest that logistic regression might be a more appropriate model. This would involve a command along the lines of "logit dependent independent controls."

    Your insights on this matter would be greatly appreciated.


    Thank you.

  • #2
    First, I'm assuming that when you say your outcome variable ranges from 0 to 20, you mean that it takes on the discrete values 0, 1, 2, 3, ..., 18, 19, 20. If that is not so, then I don't think there is any case for anything logit-like at all.

    -logit dependent independent controls- would not treat your outcome as a 21-category discrete variable. -logit- distinguishes only two categories: 0 and non-zero. So you would be analyzing only the lowest level of compliance vs. compliance at any other level. It would not distinguish at all among levels 1 through 20. You might be thinking of -ologit-, which would treat it as a 21-category ordinal outcome variable. While this is feasible, the results will be questionable because the proportional odds assumption underlying -ologit- becomes less and less likely to hold the more categories are involved. -mlogit- requires no such assumption, but it treats the 21 categories as if they have no relationship to each other at all. From your description of the variable as a degree of compliance, that seems far off the mark: at the very least it sounds like it represents an ordinal measurement: 15 represents more compliance than 14 and less than 16, etc. So I would only use -mlogit- as a last resort here.
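
    To make the point about -logit- concrete, here is a minimal sketch on simulated data. The variables y, x1, and x2 and every parameter value are made up for illustration; they are not the poster's data. It fits -logit- on a 0-20 score and then on an explicit 0 vs. non-zero indicator, and the two log likelihoods should coincide:

    ```stata
    clear
    set seed 2020
    set obs 100
    * Hypothetical predictors and a hypothetical 0-20 score with some zeros
    generate byte x1 = runiform() < 0.5
    generate x2 = rnormal()
    generate y = rbinomial(20, invlogit(-2.5 + 0.5*x1 + 0.4*x2))

    * -logit- on the raw score: any non-zero, non-missing value is a "success"
    logit y i.x1 c.x2
    scalar ll_raw = e(ll)

    * The same model with the dichotomy made explicit
    generate byte anycomp = y > 0 if !missing(y)
    logit anycomp i.x1 c.x2
    scalar ll_bin = e(ll)

    display "log likelihoods: " ll_raw "  " ll_bin
    ```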

    The major concern with using -regress- is whether there is, in fact, a linear relationship between the outcome and some combination of your predictor variables. You can explore that graphically, and if it seems implausible, you can try transforming some of the variables (outcome or predictor) to achieve at least approximate linearity. Having not seen your data myself, I would start with the assumption that OLS is your best bet here.
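
    One way the graphical check could look, again as a sketch on simulated data (y, x1, x2 and the parameter values are assumptions for illustration only). It uses two standard postestimation plots after -regress-; substitute your own variable names:

    ```stata
    clear
    set seed 2020
    set obs 100
    * Hypothetical data: one dummy, one continuous predictor, a 0-20 score
    generate byte x1 = runiform() < 0.5
    generate x2 = rnormal()
    generate y = rbinomial(20, invlogit(-0.5 + 0.8*x1 + 0.6*x2))

    regress y x1 x2

    * Residuals against fitted values: look for curvature or a funnel shape
    rvfplot, yline(0)

    * Augmented component-plus-residual plot for the continuous predictor,
    * with a lowess curve to show departures from a straight line
    acprplot x2, lowess
    ```

    If the lowess curve bends clearly away from the fitted line, that is a hint to try a transformation or an added polynomial term before settling on the linear specification.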



    • #3
      You can either divide the variable by 20 and turn it into a fraction, and then use fractional logit, or use binomial regression -- both with robust standard errors. In the latter case,

      Code:
      glm y i.x1 c.x2 ... c.xk, fam(bin 20) link(logit) vce(robust)
      margins, dydx(*)

      In the former case:

      Code:
      gen w = y/20
      glm w i.x1 c.x2 ... c.xk, fam(bin) link(logit) vce(robust)
      margins, dydx(*)

      or

      Code:
      gen w = y/20
      fracreg logit w i.x1 c.x2 ... c.xk, vce(robust)
      margins, dydx(*)

      The average partial effects obtained from -margins- can be compared directly with OLS coefficients from a linear model.
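
      For readers who want to try the comparison end to end, the following is a minimal sketch on simulated data (y, x1, x2 and the parameter values are hypothetical). It assumes -margins- uses its default prediction after each command: the expected score on the 0-20 scale after -glm- with fam(bin 20), and the expected proportion after -fracreg-. Under that assumption, the binomial effects line up with OLS on y, while the fractional effects line up with OLS on w = y/20. -fracreg- needs Stata 14 or later.

      ```stata
      clear
      set seed 2020
      set obs 100
      * Hypothetical data: one dummy, one continuous predictor, a 0-20 score
      generate byte x1 = runiform() < 0.5
      generate x2 = rnormal()
      generate y = rbinomial(20, invlogit(-0.5 + 0.8*x1 + 0.6*x2))
      generate w = y/20

      * Score scale (0-20): OLS coefficients vs. binomial average partial effects
      regress y i.x1 c.x2, vce(robust)
      glm y i.x1 c.x2, family(binomial 20) link(logit) vce(robust)
      margins, dydx(*)

      * Proportion scale (0-1): OLS on w vs. fractional logit average partial effects
      regress w i.x1 c.x2, vce(robust)
      fracreg logit w i.x1 c.x2, vce(robust)
      margins, dydx(*)
      ```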



      • #4
        Another possibility is an ordered logit (or ordered probit), but these are more difficult to summarize because you get effects on the probability of moving from one level to another. My guess is you want a kind of average effect, and the previous commands provide that.
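
        A small sketch of why the ordered model is harder to summarize, using simulated data (y and x2 and the parameter values are made up for illustration). It assumes Stata 14 or later, where -margins- after -ologit- reports results for every observed outcome level by default, so a single predictor produces one average effect per level rather than one number:

        ```stata
        clear
        set seed 2020
        set obs 100
        * Hypothetical continuous predictor and 0-20 score
        generate x2 = rnormal()
        generate y = rbinomial(20, invlogit(-0.5 + 0.6*x2))

        ologit y c.x2, vce(robust)
        display "observed outcome levels: " e(k_cat)

        * One average effect of x2 on Pr(y = level) for each observed level
        margins, dydx(x2)
        ```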



        • #5
          Originally posted by Jeff Wooldridge
          You can either divide the variable by 20 and turn it into a fraction, and then use fractional logit, or use binomial regression -- both with robust standard errors. In the latter case,

          Code:
          glm y i.x1 c.x2 ... c.xk, fam(bin 20) link(logit) vce(robust)
          margins, dydx(*)

          In the former case:

          Code:
          gen w = y/20
          glm w i.x1 c.x2 ... c.xk, fam(bin) link(logit) vce(robust)
          margins, dydx(*)

          or

          Code:
          gen w = y/20
          fracreg logit w i.x1 c.x2 ... c.xk, vce(robust)
          margins, dydx(*)

          The average partial effects obtained from -margins- can be compared directly with OLS coefficients from a linear model.

          Thank you very much, Jeff, for your response. Below are the OLS coefficients and the average partial effects from the binomial regression, both with robust standard errors. The estimates differ slightly but are largely similar.

          OLS:

          ------------------------------------------------------------------------------
                       |               Robust
               QUALITY |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 SCALE |   .5224818   .2461665     2.12   0.037     .0325949    1.012369
               PROFITS |    -2.4408   4.261876    -0.57   0.568     -10.9222    6.040604
                   CVE |   1.428233   1.073366     1.33   0.187    -.7078324    3.564299
               ASSURED |   1.417548   .5753828     2.46   0.016     .2724999    2.562596
                GENDER |  -4.751403   2.420745    -1.96   0.053     -9.56884     .066033
             COMMITTEE |   8.378462   .9673264     8.66   0.000     6.453422     10.3035
                 _cons |   3.601044   2.538984     1.42   0.160    -1.451696    8.653784
          ------------------------------------------------------------------------------


          Binomial:

          ------------------------------------------------------------------------------
                       |            Delta-method
                       |      dy/dx  Std. Err.        z   P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 SCALE |   .5761003   .2600962     2.21   0.027     .0663211    1.085879
               PROFITS |  -2.498156   4.689529    -0.53   0.594    -11.68946    6.693153
                   CVE |   1.582947   1.373622     1.15   0.249    -1.109303    4.275197
               ASSURED |   1.334739   .5131989     2.60   0.009     .3288878    2.340591
                GENDER |    -4.5195   2.429773    -1.86   0.063    -9.281769    .2427682
             COMMITTEE |   6.130753   .4899829    12.51   0.000     5.170404    7.091102
          ------------------------------------------------------------------------------

          Does this mean using either model would be ok?

          Thanks.



          • #6
            Those findings are within expectations, so you can choose one to report and discuss in the text and the other as a robustness check. I don't think it much matters which you choose for reporting the average effects. However, as you move the explanatory variables towards extreme values -- say, increase profits -- the marginal effects from the binomial will be more plausible. Is one of the variables of more interest than the others?
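
            The point about extreme values can be illustrated with a sketch on simulated data (y and x2 and the parameter values are hypothetical, not the poster's data). OLS implies one constant effect, and its fitted values can fall outside the 0-20 range at extreme values of a predictor. The binomial logit keeps fitted values inside 0-20, so its partial effect shrinks toward either extreme:

            ```stata
            clear
            set seed 2020
            set obs 100
            * Hypothetical continuous predictor and 0-20 score
            generate x2 = rnormal()
            generate y = rbinomial(20, invlogit(0.5 + 1.2*x2))

            * OLS: fitted scores at selected values of x2 (check whether any leave 0-20)
            regress y c.x2, vce(robust)
            margins, at(x2 = (-3(1.5)3))

            * Binomial logit: fitted scores and partial effects at the same values
            glm y c.x2, family(binomial 20) link(logit) vce(robust)
            margins, at(x2 = (-3(1.5)3))
            margins, dydx(x2) at(x2 = (-3(1.5)3))
            ```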



            • #7
              Originally posted by Jeff Wooldridge
              Those findings are within expectations, so you can choose one to report and discuss in the text and the other as a robustness check. I don't think it much matters which you choose for reporting the average effects. However, as you move the explanatory variables towards extreme values -- say, increase profits -- the marginal effects from the binomial will be more plausible. Is one of the variables of more interest than the others?

              Thanks again, Jeff. All variables are of interest, with no main focus on a single variable. Could you please also shed some light on interpreting the coefficients (average effects) from the binomial regression?
              Definitions of the variables are as follows:
              Variable                  Description
              QUALITY (dependent var)   Quality of compliance; values range from 0 to 20
              SCALE                     Firm size, measured by the logarithm of total assets
              PROFITS                   Return on assets
              CVE                       Whether the firm is defined as a Reporting Entity (dummy variable)
              ASSURED                   Assurance of quality by a third party (categorical variable with values 0-2)
              GENDER                    The proportion of male directors
              COMMITTEE                 Establishment of a corporate governance committee (dummy variable)
