  • Trouble Identifying Latent Traits Using Dichotomous Data - Tetrachoric Correlation and Factor Analysis

    Hi,

    This is a general problem, and less about specific code. I am working on a project that uses ANES 2016 Time Series Data. There is a series of questions that ask the respondent whether they have used a specific media outlet, such as the New York Times Online, where 0=no and 1=yes. There are about 75 specific media outlets included in the data set. I am attempting to identify the underlying structure of each individual's total media system based on which media outlets they report consuming. I have tried using -tetrachoric- followed by -factormat-, but the data do not seem well suited to this: I keep getting error messages such as "matrix has missing values" when the matrix has no missing values, and I get more than 10 negative eigenvalues.

    I was planning on using factor analysis to create a factor scale that I could use in a probit regression, but can't seem to figure out a way to properly identify the underlying structure of the 75 dichotomous variables that I have. Any help would be greatly appreciated!

    Thanks

  • #2
    I think that it would be better to show the list members more about your problem. The following do-file, which uses -tetrachoric- followed by -factormat- on 75 items, runs without any error message. The number of negative eigenvalues probably isn't so much a problem; you're not likely to keep much more than a few factors, anyway.
    Code:
    clear *
    
    set seed `=1420222'
    
    tempname Corr
    matrix define `Corr' = J(75, 75, 0.5) + I(75) * 0.5
    
    local varlist
    forvalues i = 1/75 {
        local varlist `varlist' v`i'
    }
    
    drawnorm `varlist', double corr(`Corr') n(1000)
    forvalues i = 1/75 {
        quietly replace v`i' = v`i' > 0
    }
    
    *
    * Begin here
    *
    quietly tetrachoric v*
    tempname Rho
    matrix define `Rho' = r(Rho)
    
    factormat `Rho', n(1000)
    
    exit
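
    To follow up on the two symptoms mentioned in the original post, here is a minimal sketch of how one might inspect the tetrachoric matrix before handing it to -factormat-. It assumes the same simulated items v1-v75 as in the do-file above; the matrix names R, X, and lambda are only illustrative.

    ```stata
    * Sketch: inspect a tetrachoric correlation matrix before -factormat-.
    * Assumes the 0/1 items are named v1-v75, as in the simulated example.
    quietly tetrachoric v*
    matrix R = r(Rho)

    * matmissing() returns 1 if any element of the matrix is missing
    display "any missing entries: " matmissing(R)

    * Eigenvalues of the symmetric matrix, stored as a row vector in lambda
    matrix symeigen X lambda = R

    * Count the negative eigenvalues
    local nneg = 0
    forvalues j = 1/`=colsof(lambda)' {
        if lambda[1, `j'] < 0 local ++nneg
    }
    display "negative eigenvalues: `nneg'"
    ```

    If negative eigenvalues turn out to matter for your purposes, -tetrachoric- also has a -posdef- option that adjusts the matrix to be positive semidefinite; whether that adjustment is appropriate for your data is a judgment call.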



    • #3
      Graham, it may be worth thinking about latent class analysis for this topic, and Stata now supports LCA under the gsem command. Examples 50 to 52 deal with LCA. If you don't have Stata 15, Penn State wrote a plugin for LCA.

      Say you could extract two factors from an EFA. Imagine that they're conservative media and liberal media. You don't get a sense if people tend to be high on one and low on the other, and not high on both. LCA seems to me like it does that more naturally - you might see that you have one class of people who mainly consume conservative media and one class of people who mainly consume liberal media. Or you might have a third class who consume neither (or both).
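
      As a rough sketch of what that might look like with -gsem- in Stata 15 or later, assuming just four of the media items and two classes (the variable names nyt, fox, npr, and breitbart are hypothetical, not the actual ANES names):

      ```stata
      * Sketch: two-class latent class model on a few 0/1 items (Stata 15+).
      * nyt, fox, npr, and breitbart are hypothetical variable names.
      gsem (nyt fox npr breitbart <- ), logit lclass(C 2)

      * Estimated share of respondents in each class
      estat lcprob

      * Class-specific probability of using each outlet
      estat lcmean

      * Posterior class-membership probabilities for each respondent
      predict cpost*, classposteriorpr
      ```

      With 75 items you would probably want to start from a subset and compare models with different numbers of classes, since convergence can be difficult with many indicators.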
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



      • #4
        Welcome to Statalist, Graham.

        Let me start by agreeing with Weiwen Ng's suggestion that LCA may be a better approach for what you are doing. With that said, I'll proceed with some additional advice about the approach you describe.

        With reference to the "matrix has missing values" message, there is a long discussion of this error message, perhaps in a similar context to yours, in posts 19 and 20 of the topic linked below (these posts appear on the second page of the thread).

        https://www.statalist.org/forums/for...nary-variables

        The tl;dr version is, I think, that there is some combination of your 75 variables that perfectly predicts one of the other variables. Maybe.
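
        One easy-to-check symptom of that kind of problem is a pair of items whose 2x2 table has an empty cell. The following is only a sketch, assuming the items are named v1-v75 (substitute your own variable names). It looks at pairs only, so it will not detect a case where several items jointly determine another.

        ```stata
        * Sketch: list item pairs whose 2x2 cross-tabulation has an empty cell.
        * Assumes the 0/1 items are named v1-v75 (hypothetical names).
        forvalues i = 1/74 {
            local start = `i' + 1
            forvalues j = `start'/75 {
                foreach a in 0 1 {
                    foreach b in 0 1 {
                        quietly count if v`i' == `a' & v`j' == `b'
                        if r(N) == 0 display "empty cell: v`i'=`a', v`j'=`b'"
                    }
                }
            }
        }
        ```

        Any pair that is listed is worth a closer look before you run -tetrachoric- on the full set.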

        With that said, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

        Section 12.1 is particularly pertinent:

        12.1 What to say about your commands and your problem

        Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

        Do note how the problem was presented in post #19 of the referenced topic.

