  • Significance in logistic regression

    Dear all,

    I am not quite sure whether my variables are significant or not, and therefore whether I can even use my hypotheses. My logistic regression model is:

    logit development i.gender i.age i.studies

    margins, dydx(_all) level(90)
    margins i.gender, level(90)
    margins i.studies, level(90)
    margins i.age, level(90)

    I ran a ranksum test beforehand, and it was significant for "development". Is that the only significance number to consider?
    Or should I be looking at the "Prob > chi2 = 0.0035" of my logistic regression model, the P>|z| of gender, studies and age within that model, the P>|z| of my Average Marginal Effects, or the P>|z| of my Predictive Margins?

    My question is: which significance level should I be looking at?
    Sorry if that sounds a little confusing.

    My second question is: I have often read that people add variables one by one to their logistic regression model. What exactly is the difference between adding them all at once (like I did above) and a step-by-step approach?


    Thank you for your help.

  • #2
    Philipp:
    welcome to this forum.
    See the FAQ on how to share what you typed and what Stata gave you back. Thanks.
    Adding predictors one by one is helpful for checking at which point your regression model starts to run into trouble (say, it does not converge).
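    A minimal sketch of that step-by-step approach, using the variable names from the first post. The model names m1 to m3 are arbitrary labels chosen for illustration, and the likelihood-ratio tests assume that all three models are fit on the same observations (no predictor-specific missing values):

    ```stata
    * Enter the predictors one block at a time and check convergence after each fit.
    logit development i.gender
    display "converged: " e(converged)    // 1 = converged, 0 = did not converge
    estimates store m1

    logit development i.gender i.age
    display "converged: " e(converged)
    estimates store m2

    logit development i.gender i.age i.studies
    display "converged: " e(converged)
    estimates store m3

    * Likelihood-ratio tests of each added block (nested models, same sample).
    lrtest m1 m2
    lrtest m2 m3
    ```

    If a step fails to converge or produces implausibly large standard errors, the block added at that step is the first place to look.
    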
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Philipp:
      post what you typed and what Stata gave you back using CODE delimiters (see the # toggle in the Advanced editor).
      That said, it's hard to believe that -dataex- does not work with your data (by the way: as per the FAQ, you're requested to describe what you mean by "it does not work").
      Please find below a trivial example with -dataex- (directly taken from -help dataex-):
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str18 make int(price mpg rep78)
      "AMC Concord"   4099 22 3
      "AMC Pacer"     4749 17 3
      "AMC Spirit"    3799 22 .
      "Buick Century" 4816 20 3
      "Buick Electra" 7827 15 4
      end
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
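        If I understood correctly, you're misinterpreting the results of -margins- compared to the output of the logistic regression. The p-values presented by -margins- are a different species: they test whether each predictive margin or marginal effect is zero, not whether the model as a whole is significant.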
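        To make the distinction concrete, here is a rough sketch of where each p-value comes from, using the variable names from the first post. It only illustrates which command answers which question and is not a recommendation on which test to report:

        ```stata
        * Whole-model test: Prob > chi2 in the header of the logit output.
        logit development i.gender i.age i.studies
        display "model chi2 = " e(chi2) ", p = " e(p)

        * Joint test of all levels of one categorical predictor (H0: all its coefficients are 0).
        testparm i.studies

        * Predictive margins: P>|z| tests H0 "predicted probability = 0", which is rarely of interest.
        margins studies

        * Average marginal effects: P>|z| tests H0 "difference in predicted probability = 0".
        margins, dydx(studies)
        ```
        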
        Best regards,

        Marcos


        • #5
          Philipp:
          Marcos put you on the right track: the significance of your regression model cannot be investigated via -margins-.
          The p-value of the overall model test (Prob > chi2) obtained after -logistic- is a bit above the arbitrary 5% cut-off that tries to split the world into significant and non-significant information.
          A rigorous frequentist would tell you that your model does no better than an intercept-only model, that is, the mean of -SelbstvertrauenAV-.
          My take is a bit different: with 208 observations and only three (categorical) predictors, your model is probably misspecified.
          Are you sure that, according to the literature in your research field, you gave a fair and true view of the data generating process?
          In addition, the level of education (Studiengang) is far from being significant for both levels, whereas gender (Geschlecht) and year of birth (Geburtsjahr2; by the way: why did you plug it in as a categorical variable?) actually are.
          The first step I would take is trying to clarify why your data give back those coefficients (e.g., is there too little variation in your predictors?).
          Then I would re-run your regression entering year of birth as a continuous regressor.
          Finally, it may also be that you need more predictors and, possibly, a squared term somewhere.
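          A sketch of those steps follows. It assumes a continuous year-of-birth variable, here given the hypothetical name Geburtsjahr (the thread only shows the categorical Geburtsjahr2); gebjahr_c is likewise a name invented for this example. Centering the year before squaring it is a design choice to keep the linear and squared terms from being almost perfectly collinear:

          ```stata
          * Step 1: check the variation of the outcome across the categorical predictors.
          tabulate SelbstvertrauenAV Geschlecht
          tabulate SelbstvertrauenAV Studiengang

          * Step 2: year of birth as a continuous regressor.
          * Geburtsjahr is a HYPOTHETICAL continuous variable; gebjahr_c is its centered version.
          summarize Geburtsjahr, meanonly
          generate double gebjahr_c = Geburtsjahr - r(mean)
          logistic SelbstvertrauenAV i.Geschlecht i.Studiengang c.gebjahr_c

          * Step 3: add a squared term via factor-variable notation.
          logistic SelbstvertrauenAV i.Geschlecht i.Studiengang c.gebjahr_c##c.gebjahr_c
          ```
          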
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Philipp:
            1) correct. You should look at the Prob > chi2 of your logistic regression (the test of the model as a whole) to see whether it's significant;
            2) those outcomes are actually not related to each other. With -ranksum- you test one variable at a time, looking for a (rank) difference between the two groups you're interested in. Conversely, with any regression model, the effect of each predictor on the regressand is adjusted for the other predictors;
            3) I see your point but, as a general rule, categorizing a continuous predictor should be discouraged (see https://www.ncbi.nlm.nih.gov/pubmed/16217841). In addition, categorizing does not allow you to investigate age squared as a predictor;
            4) in order to get a more detailed picture of your regression outcome, you can run -linktest- after -logistic-. If the coefficient of the squared prediction (_hatsq) reaches statistical significance, your regression model is misspecified (i.e., it needs more predictors and/or interactions among them).
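            Point 4 can be sketched as follows, assuming the model uses the three categorical predictors named in this thread (the exact specification posted earlier may differ):

            ```stata
            * Fit the model, then run the specification link test.
            logistic SelbstvertrauenAV i.Geschlecht i.Geburtsjahr2 i.Studiengang
            linktest

            * In the -linktest- output:
            *   _hat   is the linear prediction and should be significant;
            *   _hatsq is its square; a significant P>|z| here hints at misspecification.
            ```
            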
            Kind regards,
            Carlo
            (Stata 19.0)
