Significance in logistic regression

Philipp Dewy

#1


21 Jun 2019, 14:00

Dear all,

I am not quite sure whether my variables are significant or not, and therefore whether I can use my hypotheses at all. My logistic regression model is:

logit development i.gender i.age i.studies

margins, dydx(_all) level(90)
margins i.gender, level(90)
margins i.studies, level(90)
margins i.age, level(90)

I ran a -ranksum- test beforehand, and it was significant for "development". Is that the only significance number to consider?
Or should I be looking at the "Prob > chi2 = 0.0035" of my logistic regression model, the P>|z| of gender, studies and age within that model, the P>|z| of my average marginal effects, or the P>|z| of my predictive margins?

My question is: which significance test am I supposed to look at?
Sorry if that sounds a little confusing.

My second question is: I have often read that people add variables one by one to their logistic regression model. What exactly is the difference between adding them all at once (as I did above) and a step-by-step approach?

Thank you for your help.
Tags: average marginal effects, logistic regression, predictive margins, significance
Carlo Lazzaro

#2

21 Jun 2019, 15:09

Philipp:
welcome to this forum.
See the FAQ on how to share what you typed and what Stata gave you back. Thanks.
Adding predictors one by one is helpful for checking whether or not your regression model starts to gasp (say, does not converge).
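A minimal sketch of the one-at-a-time approach, assuming Stata's shipped example dataset nlsw88 as a stand-in for the poster's data (union is the binary outcome; collgrad, race and age are the predictors; none of this is Philipp's actual model). Each step is kept with -estimates store- and compared with the previous one via -lrtest-; the estimation sample is fixed up front so that the nested models use the same observations.

```stata
* Stand-in data (assumption): Stata's example dataset nlsw88
sysuse nlsw88, clear

* Fix the estimation sample so that all nested models use the same observations
generate byte touse = !missing(union, collgrad, race, age)

* Step 1: a single predictor
logit union i.collgrad if touse
estimates store m1

* Step 2: add a multi-level categorical predictor
logit union i.collgrad i.race if touse
estimates store m2

* Step 3: add age
logit union i.collgrad i.race age if touse
estimates store m3

* Likelihood-ratio test of each nested pair: does the added block improve fit?
lrtest m1 m2
lrtest m2 m3
```

If a step fails to converge, or -logit- starts dropping observations because a predictor predicts the outcome perfectly, the predictor just entered is the first place to look. Entering everything at once gives the same final model; the stepwise route mainly makes such problems easier to locate.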

Kind regards,
Carlo
(Stata 19.0)
Carlo Lazzaro

#3

22 Jun 2019, 11:47

Philipp:
please post what you typed and what Stata gave you back between CODE delimiters (see the # toggle in the Advanced editor).
That said, it's hard to believe that -dataex- does not work with your data (by the way: as per the FAQ, you're requested to describe what you mean by "it does not work").
Please find below a trivial example with -dataex- (directly taken from -help dataex-):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 make int(price mpg rep78)
"AMC Concord" 4099 22 3
"AMC Pacer" 4749 17 3
"AMC Spirit" 3799 22 .
"Buick Century" 4816 20 3
"Buick Electra" 7827 15 4
end

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

#4

22 Jun 2019, 15:15

If I understood correctly, you're misinterpreting the results of -margins- compared with the output of the logistic regression. The p-values presented by -margins- are a different species: they test different null hypotheses from the coefficient table.
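To see why the p-values differ, here is a sketch of the three tables involved, assuming Stata's example dataset nlsw88 as a stand-in for the poster's data. Each table reports a P>|z|, but against a different null: the logit coefficient against a log-odds effect of 0, a predictive margin against a predicted probability of 0, and an average marginal effect against a zero difference in predicted probabilities.

```stata
* Stand-in data (assumption): Stata's example dataset nlsw88
sysuse nlsw88, clear

* Coefficient table: each P>|z| tests H0: coefficient (log-odds scale) = 0
logit union i.collgrad i.race age

* Predictive margins: each P>|z| tests H0: Pr(union = 1) in that group = 0,
* a null hypothesis that is rarely of substantive interest
margins i.collgrad, level(90)

* Average marginal effect: P>|z| tests H0: the average difference in
* Pr(union = 1) between college graduates and non-graduates = 0
margins, dydx(collgrad) level(90)
```

A predictive margin will almost always look "significant" simply because a probability is rarely close to 0, which says nothing about whether the predictor matters.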

Best regards,

Marcos
Carlo Lazzaro

#5

23 Jun 2019, 04:02

Philipp:
Marcos put you on the right track: the significance of your regression model cannot be investigated via -margins-.
The p-value of the model's likelihood-ratio test obtained after -logistic- (Prob > chi2) is a bit above the arbitrary 5% cut-off that tries to split the world into significant and non-significant information.
A rigorous frequentist would tell you that your model does no better than the mean of -SelbstvertrauenAV- (ie, a constant-only model).
My take is a bit different: with 208 observations and only three (categorical) predictors, your model is probably misspecified.
Are you sure that, according to the literature in your research field, you gave a fair and true view of the data-generating process?
In addition, the level of education (Studiengang) is far from significant at both levels, whereas gender (Geschlecht) and year of birth (Geburtsjahr2; by the way: why did you plug it in as a categorical variable?) actually are.
The first step I would take is trying to clarify why your data give back those coefficients (eg, is there too little variation in your predictors?).
Then I would re-run your regression entering year of birth as a continuous regressor.
Eventually, it may also turn out that you need more predictors and, possibly, a squared term somewhere.
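A sketch of the re-specification suggested above, assuming Stata's example dataset nlsw88 as a stand-in for the poster's data: age enters as a continuous regressor together with its square (c.age##c.age), the overall likelihood-ratio test against the constant-only model is read from the stored results e(chi2) and e(p), and -testparm- gives joint tests for a multi-level categorical predictor and for the pair of age terms.

```stata
* Stand-in data (assumption): Stata's example dataset nlsw88
sysuse nlsw88, clear

* Age as a continuous regressor plus its square
logit union i.collgrad i.race c.age##c.age

* Overall model: likelihood-ratio test against the constant-only model
display "LR chi2(" e(df_m) ") = " e(chi2) "   Prob > chi2 = " e(p)

* The pseudo-R2 is descriptive only; no p-value is attached to it
display "Pseudo R2 = " e(r2_p)

* Joint Wald test of all non-base levels of a categorical predictor
testparm i.race

* Joint Wald test of the linear and squared age terms
testparm c.age c.age#c.age
```

The individual P>|z| values for a factor variable only compare each level with the base level; -testparm- answers whether the predictor as a whole contributes.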

Kind regards,
Carlo
(Stata 19.0)
Carlo Lazzaro

#6

25 Jun 2019, 03:28

Philipp:
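1) correct. To see whether your logistic regression as a whole is significant, look at its likelihood-ratio test (LR chi2 and Prob > chi2); the pseudo-R2 itself is a descriptive measure with no p-value attached;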
2) those outcomes are not actually related to each other. With -ranksum- you test one variable at a time, looking for a (rank) difference between the two groups you're interested in. Conversely, in any regression model, the effect of each predictor in explaining variation of the regressand is adjusted for the other predictors;
3) I see your point but, as a general rule, categorizing a continuous predictor should be discouraged (see https://www.ncbi.nlm.nih.gov/pubmed/16217841). In addition, categorizing does not allow you to investigate age squared as a predictor;
4) to get a more detailed picture of your regression outcome, you can run -linktest- after -logistic-. If the squared prediction (_hatsq) reaches statistical significance, your regression model is misspecified (ie, it needs more predictors and/or interactions among them).
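A sketch contrasting points 2) and 4), assuming Stata's example dataset nlsw88 as a stand-in for the poster's data: -ranksum- compares age between union and non-union workers with nothing else taken into account, whereas the -logit- coefficients are adjusted for the other predictors; -linktest- then refits the outcome on the linear prediction (_hat) and its square (_hatsq).

```stata
* Stand-in data (assumption): Stata's example dataset nlsw88
sysuse nlsw88, clear

* Unadjusted two-group comparison: one variable at a time, no other predictors
ranksum age if !missing(union), by(union)

* Adjusted model: each coefficient is conditional on the other predictors
logit union i.collgrad i.race c.age##c.age

* Specification check: a significant _hatsq points to misspecification
linktest
```

Because the two analyses answer different questions, a significant -ranksum- result and a non-significant adjusted coefficient (or vice versa) are not contradictory.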

Kind regards,
Carlo
(Stata 19.0)