Logistic Regression Output with Factor Variable i.Year - Issues with Interpretation

Aria Mendoza

Join Date: Aug 2022
Posts: 2

27 Aug 2022, 02:57

Hi all!

I’m having trouble interpreting the coefficients in my regression output. My model has a dichotomous dependent variable, Education (1 = has a bachelor’s degree, 0 otherwise), and a factor variable, i.Year (for the years 2007 to 2011).
The command:
logistic Education i.Year, coef

Logistic regression

Education    Coef.    St.Err.   t-value   p-value   [95% Conf   Interval]   Sig
2007b        0        .         .         .         .           .
2008         .433     .073      5.95      0         .29         .575        ***
2009         .602     .075      8.00      0         .454        .749        ***
2010         .217     .074      2.92      .004      .071        .362        ***
2011         .225     .077      2.93      .003      .075        .376        ***
Constant     -.381    .052      -7.40     0         -.483       -.28        ***

Mean dependent var    0.477       SD dependent var        0.500
Pseudo r-squared      0.008       Number of obs           7080
Chi-square            75.687      Prob > chi2             0.000
Akaike crit. (AIC)    9734.809    Bayesian crit. (BIC)    9769.134
*** p<.01, ** p<.05, * p<.1

The coefficient for 2008 is 0.433, which I take to mean that the log odds of having a bachelor’s degree changed by 0.433 between 2007 and 2008. The standard interpretation of a logistic coefficient is "for a one-unit change in variable X, we expect the log odds of the outcome to change by *coefficient* units, holding all other variables constant". Since my focus is on the trend in Education between 2007 and 2011, I find it difficult to apply the standard interpretation, and I'm unsure what 0.433 represents in the case of a trend. Would it be correct to interpret the output as "compared with 2007, the log odds of graduating with a bachelor’s degree were 0.433 higher in 2008, 0.602 higher in 2009, etc."?

My data are both nonlinear and non-normal, so the standard trend tests are not suitable. The goal is a table showing the changes in education between 2007 and 2011.

Thank you for your help!

Last edited by Aria Mendoza; 27 Aug 2022, 03:01.

Tags: categorical, data, logit, regression

Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

27 Aug 2022, 03:34

Dear Aria,

This should help you with the interpretation: https://quantifyinghealth.com/interp...-coefficients/.
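
As a quick worked illustration of that interpretation: assuming the coefficients in your table are log odds relative to the 2007 base year (which is what the "b" next to 2007 suggests), exponentiating each one gives an odds ratio versus 2007. The snippet below is only a Python sketch of that arithmetic on the rounded figures from your post, not Stata output, and the variable names are made up for illustration.

```python
import math

# Log-odds coefficients copied from the posted table (rounded to 3 decimals).
# 2007 is the base year, so its coefficient is 0 by construction.
log_odds_vs_2007 = {2007: 0.0, 2008: 0.433, 2009: 0.602, 2010: 0.217, 2011: 0.225}

# exp(coefficient) is the odds ratio of holding a bachelor's degree
# in that year relative to 2007.
odds_ratios = {year: math.exp(b) for year, b in log_odds_vs_2007.items()}

# Year-on-year change in log odds: difference between adjacent coefficients.
years = sorted(log_odds_vs_2007)
adjacent_change = {
    later: log_odds_vs_2007[later] - log_odds_vs_2007[earlier]
    for earlier, later in zip(years, years[1:])
}

for year in years:
    print(year, round(odds_ratios[year], 3), adjacent_change.get(year))
```

Read this way, 0.433 says the odds of having a bachelor's degree in 2008 were roughly 1.54 times the 2007 odds. Each coefficient compares its year with 2007, not with the previous year; a year-on-year change is the difference between two coefficients (for example 0.602 - 0.433 = 0.169 for 2009 versus 2008), and its standard error would have to come from the fitted model, since the table does not report the covariances.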

Two further things:

- Logistic regression is a nonlinear estimation method and assumes a logistic distribution, which allows for fatter tails than the normal distribution. I am not sure whether data themselves can be nonlinear (statisticians may correct me), but models, or equivalently functions, can be. Nonlinear means that the (conditional) means do not lie on a straight line (or at least that's my understanding).

- In nonlinear models, generally speaking, the marginal effect is usually more informative than the coefficient. In linear models such as OLS the two often coincide, provided the model is linear in the variable of interest. In nonlinear models such as logit they are not the same thing, and you may want to run the

Code:

margins

command postestimation to get the marginal effect.
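
To give a feel for what the probability scale looks like here, the figures in the posted table can be converted by hand. This is a rough Python sketch of the arithmetic, assuming the model contains only the constant and the year dummies (as in the posted command) and using the rounded coefficients, so I would expect it to match the `margins` results only up to rounding. The names `inv_logit`, `prob` and `diff_vs_2007` are made up for illustration.

```python
import math


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Rounded estimates copied from the posted table; 2007 is the base year.
constant = -0.381
coef = {2007: 0.0, 2008: 0.433, 2009: 0.602, 2010: 0.217, 2011: 0.225}

# Predicted probability of holding a bachelor's degree in each year.
prob = {year: inv_logit(constant + b) for year, b in coef.items()}

# Discrete change in predicted probability relative to 2007, which is how
# a marginal effect is defined for a categorical regressor.
diff_vs_2007 = {year: p - prob[2007] for year, p in prob.items()}

for year in sorted(prob):
    print(year, round(prob[year], 3), round(diff_vs_2007[year], 3))
```

On these rounded figures the predicted share with a bachelor's degree goes from about 0.41 in 2007 to about 0.51 in 2008 and 0.56 in 2009, then back to about 0.46 in 2010 and 2011. Differences on this scale are usually easier to put in a table of changes than log odds.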