  • OLS vs logistic regression*

    Hello,

    I was hoping someone could clarify this for me. I have run OLS regressions on the PISA dataset. Most of my outcome variables are categorical, measured on 4-point Likert scales ranging from "Strongly disagree" to "Strongly agree", with observations spread across all answer categories; the one exception is a variable that is just 'yes' or 'no'. My independent variables have been recoded as dummy variables, including 'gender' and others such as whether a school is urban or rural.

    Would I be correct to justify OLS on the grounds that my outcome variables are categorical but non-extreme in the values they take (i.e. not just 0 or 1) and that my independent variables have been recoded as dummies, so that OLS can be used instead of logistic regression and is easily interpretable? All responses in PISA are scaled using item response theory. I have also had no issues with my model.

    Thank you in advance.

  • #2
    Grainne:
    if the regressand is in fact ordered, you should go -ologit-.
    Whether the predictors are categorical or continuous has no bearing on the above.
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Let's suppose that your outcome variables are coded 1 2 3 4. Then using plain linear regression -- which I guess is what you mean by OLS, as OLS is an estimation method and not a model -- is justified only by whether it seems to give sensible results.

      What supervisors, examiners, reviewers -- depending on what comes next in your work -- may well tell you is that in principle linear regression is wrong if outcomes are grades 1, 2, 3, 4, as

      1. It could produce predictions outside [1, 4], although with your kind of data that may be unlikely for the observed data.

      2. It will produce non-integer predictions, as in effect you are treating your outcomes as interval scale measurements in which not only do 1.2 or 3.8 (say) make sense, but also all differences among 1, 2, 3, 4 are exactly equivalent differences on a measured scale.

      The fact that logit regression assumes observed outcomes within an interval applies equally to your own outcomes, just with different bounds.

      As Carlo Lazzaro points out, the nature of your predictors is immaterial to these considerations.

      I don't think there is any "right in principle" here. There may be "acceptable in practice". For example, in my day job we routinely take means of percent marks that would not satisfy a measurement theorist, and so do many universities.
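
      A minimal sketch of points 1 and 2, offered as an illustration rather than anything run on PISA data. It assumes the auto dataset shipped with Stata, where rep78 (an ordered repair record coded 1 to 5) stands in for a Likert item, and foreign and mpg are arbitrary stand-in predictors.

      ```stata
      * Stand-in data: rep78 is an ordered 1-5 outcome, not a PISA item
      sysuse auto, clear

      * Linear regression treats rep78 as an interval scale measurement
      regress rep78 i.foreign mpg
      predict double yhat_lin if e(sample), xb
      * Compare the fitted values with the observed outcome
      summarize yhat_lin rep78 if e(sample)

      * Ordered logit treats rep78 as ordered categories
      ologit rep78 i.foreign mpg
      * One predicted probability per outcome category
      predict double p1 p2 p3 p4 p5 if e(sample), pr
      summarize p1-p5
      ```

      After -regress-, the summary of yhat_lin shows fitted values that are not integers, and its minimum and maximum can be compared with the range of the outcome. After -ologit-, each prediction is a probability for one category, so it lies within [0, 1] by construction.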





      • #4
        Thank you Nick, that has helped make more sense of it.

        I have seen quite a few debates on Likert being categorical or continuous which did throw me.
        Like you said my outcomes are coded 1-4.

        My supervisor has allowed (and advised) linear regression and has just asked for it to be justified briefly over logit regression, which is why I reached out.
        My work then finds the predictive margins for 4 categories: girls in SS schools, girls in co-ed, boys in SS and boys in co-ed.

        From what you have said here, it seems that as my outcomes are coded 1-4, and assuming equal spacing between them, linear regression is justified further by the results and estimates I produce, which I show. On what you said about different bounds: taking as an example an outcome variable that is the degree to which a student likes working with others, I do not see how a measurement of 2.1 would make sense, as you would either (strongly) disagree or (strongly) agree?




          • #6
            I note that #1 was cross-posted at https://stats.stackexchange.com/ques...igt-regression

            Please note our cross-posting policy, explicit at https://www.statalist.org/forums/help#crossposting: you are asked to tell us about it.

            I think there's a sharp distinction between individual Likert items and what you might do by way of combination or summary of several such. I don't see that there can be any debate on what the former are -- they are ordered categories. The question is what you can do with them.

            It seems to me that an ordinal scale being an approximate interval scale is an assumption you're making. I am at a loss to know what would be an independent rationale for that assumption. As I said, the justification could be that it works well enough in practice. Your supervisor seems to have put you between a rock and a hard place here. A check might be that results from ordinal logit or probit are consistent, but there is some risk of circularity in the argument.
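
            A minimal sketch of such a consistency check, under assumptions: it uses the auto dataset shipped with Stata, with rep78 (ordered, coded 1 to 5) standing in for a Likert item and foreign and mpg as arbitrary stand-in predictors. None of these are PISA variables.

            ```stata
            * Stand-in data, not PISA
            sysuse auto, clear

            * Same outcome and predictors under three models
            regress rep78 i.foreign mpg
            ologit  rep78 i.foreign mpg
            oprobit rep78 i.foreign mpg
            ```

            The coefficients are on different scales (outcome units, log odds, and probit index), so the comparison is of signs and of t or z statistics, not of magnitudes.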



            • #7
              "All responses in PISA are scaled using item response theory"
              If you are using a bunch of individual Likert questions as dependent variables, then the theoretically correct thing to use is definitely some type of ordered logistic regression, although once you select that, someone else might also object and ask whether the proportional odds assumption was violated, whether you checked, and if it was, why you didn’t use generalized ordered logit.

              If you are a Masters student, it may be acceptable to use linear regression if you haven’t covered more advanced statistical material. But if you try to publish, I would bet you will get objections. I know I would definitely object in the context of peer-reviewed material, at least in my own field.

              However, you mention IRT. If the individual questions you’re using as DVs were part of a scale measuring attitudes toward something, and PISA estimated an ‘ability’ score for that scale, then why not use that? That score would be scaled to a standard normal distribution, so your coefficients would be like Z scores (unless they transformed it to something else; in medicine many scales would get transformed to T scores, i.e. mean 50, SD 10).
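
              A minimal sketch of the Z score to T score rescaling mentioned above (T = 50 + 10 * Z). It assumes the auto dataset shipped with Stata, with mpg as a purely hypothetical stand-in for a scale score; the new variable names are illustrative only.

              ```stata
              * Stand-in data: mpg plays the role of a scale score
              sysuse auto, clear

              * Z score: mean 0, SD 1 in this sample
              egen double z_mpg = std(mpg)

              * T score: mean 50, SD 10
              generate double t_mpg = 50 + 10 * z_mpg

              summarize z_mpg t_mpg
              ```

              A regression coefficient on the T-score version of a predictor is one tenth of the coefficient on the Z-score version, since the variable has simply been stretched by a factor of 10 and shifted.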

              I was under the (potentially wrong) impression that PISA was an educational ability dataset, so I would pretty much expect there to be verbal and math ability estimates on an IRT scale in there. But you referenced attitudes to something, and I don’t know if those were IRT scaled, so I could be wrong and you might as well ignore me.
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
