Trouble Identifying Latent Traits Using Dichotomous Data - Tetrachoric Correlation and Factor Analysis

Graham Hulsey

Join Date: Nov 2017

Posts: 2
#1

29 Nov 2017, 20:06

Hi,

This is a general problem, and less about specific code. I am working on a project that uses ANES 2016 Time Series Data. There is a series of questions that ask the respondent if they have used a specific media outlet, such as the New York Times Online, where 0=no and 1=yes. There are about 75 specific media outlets included in the data set. I am attempting to identify the underlying structure of each individual's total media system based on which media outlets they report consuming. I have tried using -tetrachoric- followed by -factormat-, but the data do not seem well suited to this: I keep getting error messages such as "matrix has missing values" even though the matrix has no missing values, and I get more than 10 negative eigenvalues.

I was planning on using factor analysis to create a factor scale that I could use in a probit regression, but can't seem to figure out a way to properly identify the underlying structure of the 75 dichotomous variables that I have. Any help would be greatly appreciated!

Thanks
Joseph Coveney

Join Date: Apr 2014

Posts: 4433
#2

30 Nov 2017, 03:21

I think that it would be better to show the list members more about your problem. The following do-file, which uses -tetrachoric- followed by -factormat- on 75 items, runs without any error message. The number of negative eigenvalues probably isn't so much a problem; you're not likely to keep much more than a few factors, anyway.

Code:

clear *
set seed `=1420222'

tempname Corr
matrix define `Corr' = J(75, 75, 0.5) + I(75) * 0.5

local varlist
forvalues i = 1/75 {
    local varlist `varlist' v`i'
}

drawnorm `varlist', double corr(`Corr') n(1000)

forvalues i = 1/75 {
    quietly replace v`i' = v`i' > 0
}

*
* Begin here
*
quietly tetrachoric v*

tempname Rho
matrix define `Rho' = r(Rho)
factormat `Rho', n(1000)

exit
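
If the negative eigenvalues are a concern, one option is to ask -tetrachoric- for a positive-semidefinite matrix before passing it to -factormat-. The following is a minimal sketch rather than a recommendation: it assumes the simulated v1-v75 from the do-file above are in memory, and the choice of three factors is arbitrary and only for illustration.

```stata
* posdef replaces the estimated matrix with a positive-semidefinite approximation
quietly tetrachoric v*, posdef
matrix R = r(Rho)

* Eigenvalues are returned in the row vector L, sorted from largest to smallest
matrix symeigen X L = R
display "smallest eigenvalue: " L[1, colsof(L)]

* factors(3) is an arbitrary illustrative choice, not a recommendation
factormat R, n(1000) factors(3)
```

-factormat- also accepts a forcepsd option, which makes a similar adjustment at the factoring step instead.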
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

30 Nov 2017, 09:48

Graham, it may be worth thinking about latent class analysis for this topic, and Stata now supports LCA under the gsem command. Examples 50 to 52 deal with LCA. If you don't have Stata 15, Penn State wrote a plugin for LCA.

Say you could extract two factors from an EFA. Imagine that they're conservative media and liberal media. You don't get a sense if people tend to be high on one and low on the other, and not high on both. LCA seems to me like it does that more naturally - you might see that you have one class of people who mainly consume conservative media and one class of people who mainly consume liberal media. Or you might have a third class who consume neither (or both).
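
A two-class model along those lines might be fitted roughly as follows. This is only a sketch: it requires Stata 15 or later, v1-v6 are placeholder names standing in for a handful of the media-use items, and the class counts (2 and 3) are assumptions chosen for illustration.

```stata
* Sketch only: v1-v6 are placeholder names for 0/1 media-use items
gsem (v1 v2 v3 v4 v5 v6 <- ), logit lclass(C 2)

* Estimated share of respondents in each latent class
estat lcprob

* Within each class, the probability of reporting use of each outlet
estat lcmean

* Fit a three-class model and compare the two by AIC and BIC
estimates store c2
gsem (v1 v2 v3 v4 v5 v6 <- ), logit lclass(C 3)
estimates store c3
estimates stats c2 c3
```

With 75 items, models like this can be slow to converge, so starting with a subset of outlets may be more practical.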

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
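
For instance, something like the following would produce a small, postable extract. The variable names are placeholders, and on older Stata releases -dataex- may first need to be installed from SSC.

```stata
* v1-v10 are placeholder names; substitute a few of your own media-use items
dataex v1-v10 in 1/20
```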

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

30 Nov 2017, 11:49

Welcome to Statalist, Graham.

Let me start by agreeing with Weiwen Ng's suggestion that LCA may be a better approach for what you are doing. With that said, I'll proceed with some additional advice about the approach you describe.

With reference to the "matrix has missing values" message, there is a long discussion of this error message, perhaps in a similar context to yours, in posts 19 and 20 of the topic linked below (these posts appear on the second page of the thread).

https://www.statalist.org/forums/for...nary-variables

The tl;dr version is, I think, that there is some combination of your 75 variables that perfectly predicts one of the other variables. Maybe.
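
A rough way to look for that kind of problem before factoring is sketched below. It assumes items named v1-v75 (placeholders for the actual ANES variables), and the 0.02 and 0.98 cutoffs are arbitrary. Note that -_rmcoll- only detects exact linear dependence, which may or may not be the situation discussed in the linked thread.

```stata
* Does the tetrachoric matrix itself contain missing entries? (1 = yes, 0 = no)
quietly tetrachoric v*
matrix R = r(Rho)
display "missing entries in Rho: " matmissing(R)

* Items with almost no variation tend to give unstable tetrachoric correlations
* (the 0.02/0.98 cutoffs are arbitrary, for illustration only)
foreach v of varlist v* {
    quietly summarize `v'
    if r(mean) < 0.02 | r(mean) > 0.98 {
        display "`v' is nearly constant: mean = " %5.3f r(mean)
    }
}

* _rmcoll flags variables that are exact linear combinations of the others
_rmcoll v*
display "variables omitted for collinearity: " r(k_omitted)
```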

With that said, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

Section 12.1 is particularly pertinent:

12.1 What to say about your commands and your problem

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

Do note how the problem was presented in post #19 of the referenced topic.