Discriminant Analysis – Stepwise versus Simultaneous

Adam Guerrero

Join Date: Jun 2014

Posts: 69
#1

05 Apr 2017, 13:19

Hi Statalisters,

I am reading “Multivariate Analysis” by Hair et al. to learn discriminant analysis. The authors follow a “stepwise” approach to “build” their model rather than estimating the model with all potential independent variables included simultaneously. The stepwise process they follow is:
Based on an analysis of the significance of group differences in the means of all potential independent variables, the authors first include the independent variable that has the most statistically significant difference in group means.

After that variable is included in the model, the remaining independent variables are evaluated based on their “incremental discriminating ability, that is the group mean differences after the variance associated with the initially chosen independent variable removed.”

Of all the variables remaining, the next most statistically significant variable is included in the model. This process continues until there are no more statistically significant variables remaining.

The authors went from 13 potential variables to only 3. Their logic is that a stepwise approach lets decision-makers home in on the most important variables so that the best policies can be constructed.

My question is whether Stata has this capability. Is it possible to conduct stepwise discriminant analysis in Stata? Specifically, is it possible to evaluate temporarily excluded independent variables by examining their “incremental discriminating ability” while controlling for the independent variables already included?
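For readers who want to try the procedure described above by hand, here is a minimal sketch of forward selection. It rests on several assumptions that are not from Hair et al.: the built-in auto.dta stands in for the real data, `foreign` plays the role of the grouping variable, the candidate list and the 0.05 entry threshold are arbitrary, and the entry criterion is the F test for group differences in a candidate variable after adjusting for the variables already included (equivalent to a partial Wilks' lambda test). The textbook example may use a different criterion.

```stata
* Manual forward selection for discriminant analysis (illustrative sketch).
* Assumptions: auto.dta as stand-in data, foreign as the grouping variable,
* p < 0.05 as the entry threshold.
sysuse auto, clear

local candidates mpg weight length turn displacement gear_ratio
local included
local penter 0.05

local keepgoing 1
while `keepgoing' & "`candidates'" != "" {
    local bestp 1
    local bestvar
    foreach v of local candidates {
        * Regress the candidate on group, adjusting for included variables,
        * then test whether the group means still differ
        quietly regress `v' i.foreign `included'
        quietly testparm i.foreign
        if r(p) < `bestp' {
            local bestp = r(p)
            local bestvar `v'
        }
    }
    if "`bestvar'" != "" & `bestp' < `penter' {
        * Move the best candidate from the candidate list to the model
        local included `included' `bestvar'
        local candidates : list candidates - bestvar
        display as text "entered `bestvar' (p = " %6.4f `bestp' ")"
    }
    else {
        local keepgoing 0
    }
}

display as text "selected: `included'"
if "`included'" != "" {
    discrim lda `included', group(foreign)
}
```

This sketch only adds variables; it never removes one that has become redundant after later entries, which some stepwise implementations do.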

Finally, when I run the discriminant analysis using all 13 independent variables and then enter “estat structure”, the only 3 variables with discriminant loadings greater than 0.4 in absolute value (the threshold recommended by Hair et al.) are the variables that were included in their final model. Is this a coincidence, or can examining the canonical structure matrix help choose variables for inclusion?
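As an illustration of the commands involved, the following sketch fits a canonical linear discriminant analysis and then displays the structure matrix next to the standardized coefficients. The dataset (auto.dta), the variable list, and the grouping variable `foreign` are stand-ins chosen only so the example runs; they are not the data discussed here.

```stata
* Structure matrix versus standardized coefficients (illustrative sketch).
* Assumption: auto.dta and these variables stand in for the real data.
sysuse auto, clear

candisc mpg weight length turn displacement, group(foreign)

* Canonical structure matrix: correlations between each variable
* and the discriminant function(s)
estat structure

* Standardized discriminant function coefficients, which do
* depend on the other variables in the model
estat loadings
```

Structure loadings are simple correlations with the discriminant function, so they do not adjust for the other variables; comparing the two displays can show where the two views of "importance" disagree.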

Thanks for any insight provided.

Best,
Adam

Last edited by Adam Guerrero; 05 Apr 2017, 13:39.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

06 Apr 2017, 17:43

Depending on your objective, many on this listserv would never recommend a stepwise approach. There is a stepwise procedure in Stata, but it doesn't work with discriminant analysis; it does work with logit and probit, which are close. You can always do this manually by running repeated discriminant analyses on residuals.
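A minimal sketch of the stepwise prefix with logit, assuming a two-group outcome. The dataset (auto.dta), the outcome `foreign`, the candidate variables, and the 0.05 threshold are placeholders for illustration only.

```stata
* Forward stepwise selection with logit (illustrative sketch).
* Assumption: auto.dta stands in for the real data; foreign is a 0/1 outcome.
sysuse auto, clear

* pe() sets the p-value for entry (forward selection);
* pr() would instead set the p-value for removal (backward elimination)
stepwise, pe(0.05): logit foreign mpg weight length turn displacement
```

Note that logit handles only two groups, so this is a substitute for a two-group discriminant analysis rather than for one with several groups.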
Adam Guerrero

Join Date: Jun 2014

Posts: 69
#3

10 Apr 2017, 10:51

Thanks so much for your reply, Phil. I will definitely try to run repeated discriminant analyses on residuals. In the example that I am trying to reproduce, the authors recommend a stepwise approach to determine which variables are the best at discriminating customer preferences by region, mainly so that organizational policy-making is more focused (versus getting the most accurate predictions). Also, I found that including all variables provides better predictions, and that I can still pinpoint the most discriminating variables by looking at estat structure's absolute value discriminant loadings, but I just don't know if this is a good way to accomplish the goals of description and prediction. That said, I am really excited to follow your recommendation, so thank you very much! :-)

Best,
Adam
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

10 Apr 2017, 10:55

Adam:
recommendations against (any) -stepwise- procedures can be found here: http://andrewgelman.com/2014/06/02/h...se-regression/.

Kind regards,
Carlo
(Stata 19.0)
Adam Guerrero

Join Date: Jun 2014

Posts: 69
#5

10 Apr 2017, 11:32

Thanks for the link, Carlo. I have seen a lot of arguments for and against stepwise procedures, and I am having a hard time reconciling the arguments against them with their heavy use in business and academia.

Best,
Adam
Adam Guerrero

Join Date: Jun 2014

Posts: 69
#6

10 Apr 2017, 12:16

Phil,

Thanks again for your pointers above... Quick question, how can I estimate the residuals from a discriminant analysis in Stata?

Thanks so much,

Adam
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

11 Apr 2017, 00:22

Adam:
an interesting set of "don't do it" reasons against stepwise selection is reported in: http://www.springer.com/us/book/9783319194240#aboutBook, pages 67-72.

Kind regards,
Carlo
(Stata 19.0)
Adam Guerrero

Join Date: Jun 2014

Posts: 69
#8

11 Apr 2017, 09:45

Thanks again for an additional resource, Carlo. Quick question, what is your recommendation to researchers who are interested in determining the most influential variables in a multivariate analysis (in the presence of high multicollinearity)? In econometrics, I remember learning to use the F-stat to compare an unrestricted to a restricted model to ascertain whether a new variable should be included, but I am having a hard time figuring out how to do something similar in performing discriminant analysis in Stata. Anyway, I do get better predictions when I run the full model, which is interesting, but multicollinearity is making it tricky to determine which variables matter most. Again, thanks for the resource, I will definitely read through it! :-)
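One way to compare the predictive performance of a full and a reduced model on a fairer footing is leave-one-out classification, sketched below. Everything specific here is an assumption for illustration: auto.dta stands in for the real data, `foreign` is the grouping variable, and the reduced variable set is arbitrary.

```stata
* Full versus reduced model, compared by leave-one-out classification
* (illustrative sketch; data and variable sets are placeholders).
sysuse auto, clear

* Full model
discrim lda mpg weight length turn displacement, group(foreign)
estat classtable, looclass

* Reduced model (arbitrary subset chosen only for the example)
discrim lda weight turn, group(foreign)
estat classtable, looclass
```

The default classification table classifies the same observations used to fit the model, which tends to favor larger models; the `looclass` option classifies each observation with it left out of the estimation.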

Best,
Adam
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

11 Apr 2017, 10:09

Adam:
- one approach is to skim through the literature of your research field and see what others did in the past when faced with the same research topic (just to avoid reinventing the wheel!);
- with (I assume) quasi-extreme multicollinearity, the usual choice is to change the model specification.

Kind regards,
Carlo
(Stata 19.0)
Adam Guerrero

Join Date: Jun 2014

Posts: 69
#10

11 Apr 2017, 11:39

Thanks, Carlo. I agree, and I have definitely reviewed work in this area; a very common approach is to run a stepwise discriminant analysis using SAS (or SPSS), in which case a more parsimonious model is specified. Also, I am trying to understand why most stats packages include stepwise features if it is such a bad technique... Even Stata includes a feature for stepwise regression. I suppose it boils down to the nature of the problem and (based on much of what I have read about stepwise pros and cons) preference.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

11 Apr 2017, 11:51

Adam:
good question.
-stepwise- might be justified for exploratory data analysis on a random sample (say, 50%) of a given database, that is, to generate hypotheses or models to be tested on the remaining 50% of the same database.
The objections to -stepwise- focus on its use for selecting the "best" model ex post (which usually amounts to a sort of data make-up).
So it is not the tool that is bad in itself, but the (mis)use of it.
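The split-sample idea could be sketched roughly as follows. This is only an illustration under stated assumptions: auto.dta stands in for the real database (it is far too small for a serious split), `foreign` is a two-group outcome, the seed, the candidate variables, and the 0.05 entry threshold are arbitrary, and the variable names `explore` and `predgroup` are made up for the example.

```stata
* Explore on one random half, check on the other (illustrative sketch).
* Assumptions: auto.dta as stand-in data; explore and predgroup are
* hypothetical variable names introduced here.
sysuse auto, clear
set seed 20170411
generate byte explore = runiform() < 0.5

* Exploratory half: let stepwise suggest a set of variables
stepwise, pe(0.05): logit foreign mpg weight length turn if explore

* Collect the names of the selected variables (drop the constant)
local selected : colnames e(b)
local cons _cons
local selected : list selected - cons

* Fit on the exploratory half, then classify the held-out half
if "`selected'" != "" {
    discrim lda `selected' if explore, group(foreign)
    predict byte predgroup if !explore, classification
    tabulate foreign predgroup if !explore
}
```

The cross-tabulation on the held-out half shows how well the model chosen in the exploratory half classifies observations that played no part in the selection.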

Kind regards,
Carlo
(Stata 19.0)
Adam Guerrero

Join Date: Jun 2014

Posts: 69
#12

11 Apr 2017, 13:11

That makes sense. Thanks so much for additional insight into the problem, Carlo. :-)

Cheers,
Adam