  • Predict factor analysis

    Hi all.

    I created a separate post because I think this is a different question.

    What I have understood so far is that predict f1 f2 f3 ... (run after factor) gives you the predicted scores for the first, second, and third factors, respectively.

    But I want to keep the information (explained variance) from the three factors in a single parameter. Would it make sense to sum the three of them? What can I do to have all of the information in a single number?
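
    In case it helps frame the question, here is a minimal sketch of the workflow I am describing (the variable names and the number of factors are just placeholders):

    Code:
    factor v1 v2 v3 v4 v5 v6, factors(3)
    predict f1 f2 f3    // scores for the first, second, and third retained factors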

    Thank you

  • #2
    Fit a higher-order confirmatory factor analysis model and predict the values of that latent variable, or create a scale from your variables with IRT/CFA.
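
    As a rough sketch of the second option, assuming a set of binary items item1-item9 (hypothetical names), an IRT model gives you a single latent score directly:

    Code:
    irt 2pl item1-item9
    predict theta, latent    // one score per observation summarizing all items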



    • #3
      wbuchanan is right: you should apply another factor analysis using your predicted variables, but first you should test whether there is enough correlation between the variables to assume the existence of a higher-order factor. I recommend the Stata Press book Discovering Structural Equation Modeling Using Stata as a reference for the steps to follow to achieve this objective.
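
      For example (a sketch with hypothetical variable names), an oblique rotation lets you inspect how strongly the extracted factors correlate before deciding on a higher-order model:

      Code:
      factor v1-v9, factors(3)
      rotate, promax     // oblique rotation, so the common factors may correlate
      estat common       // correlation matrix of the rotated common factors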



      • #4
        Diego Villacreses I never said to apply another factor analysis using the predicted variables. Doing that is fundamentally problematic because it treats the predicted latents as though they were observed without error.

        The example below is meant to illustrate what the code might look like and to show the difference between a higher-order CFA and the suggestion to use the predicted values. I purposefully chose to frame the example around something that might be familiar to most of us (e.g., taking a math test). In this hypothetical example, you have a set of items being used to estimate the student's math ability. But math is a bit broad, so you devise "sub-tests" intended to capture the student's abilities in specific sub-domains of mathematics (e.g., Algebra, Geometry, and Statistics).

        The problem then is that you have three different subscales that are not necessarily scaled to be equivalent. To find a scale that summarizes the information across the sub-domains, you add a higher-order factor (e.g., Math ability) that "causes" the student to have specific levels of ability in Algebra, Geometry, and Statistics, each of which in turn "causes" the observed responses to the test items.

        If the three factors are orthogonal, and/or if joan marc used an orthogonal rotation, a higher-order model would be uninformative. There is an inherent assumption that the lower-order factors are not completely orthogonal to one another and that there is some shared variance between them that is best summarized by an additional factor.

        Code:
        gsem (Algebra    -> (algitem1 algitem2 algitem3)@a, logit)      ///
             (Geometry   -> (geomitem1 geomitem2 geomitem3)@b, logit)   ///
             (Statistics -> (statsitem1 statsitem2 statsitem3)@c, logit) ///
             (Math -> Algebra Geometry Statistics),                     ///
             var(Algebra@1 Geometry@1 Statistics@1 Math@1)
        Here, the higher-order latent variable Math is estimated using all of the available information (e.g., including the error associated with each of the lower-order factors, the error associated with the manifest variables, etc.). A different approach would be to do something analogous to a bi-factor model:

        Code:
        gsem (Algebra    -> (algitem1 algitem2 algitem3)@a, logit)      ///
             (Geometry   -> (geomitem1 geomitem2 geomitem3)@b, logit)   ///
             (Statistics -> (statsitem1 statsitem2 statsitem3)@c, logit) ///
             (Math -> (algitem1 algitem2 algitem3 geomitem1 geomitem2 geomitem3 statsitem1 statsitem2 statsitem3)@d), ///
             var(Algebra@1 Geometry@1 Statistics@1 Math@1)
        The difference here is the assumption that the observed response to the items is not the sole result of some latent variable we are trying to measure and random error, but that the response is caused by multiple factors simultaneously. In either case, however, using the predicted values in a subsequent analysis will not be appropriate.
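
        If a single summary number per observation is still wanted, it can be obtained from the fitted higher-order model itself rather than from separate factor scores. A sketch, assuming the first gsem model above has been estimated (and keeping in mind the caveat about reusing such predictions as if they were observed):

        Code:
        predict mathscore, latent(Math)    // empirical Bayes means of the higher-order factor
        summarize mathscore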



        • #5
          wbuchanan
          I ran the first piece of code you posted above with my data; however, it does not seem to converge. The last line in the output states:

          Fitting full model:

          Iteration 0: log likelihood = -12978.102 (not concave)

          It does not produce any further output after this. Any suggestions?

          Thanks!
          Nicole



          • #6
            The best I could suggest would be to specify a simpler model and work the complexity up from there until things start to break.
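
            A sketch of what working up might look like, using the hypothetical items from #4: fit the measurement models separately, then combine them, and only then add the higher-order factor (maximize options such as difficult or iterate() are common things to try at the last step):

            Code:
            * step 1: one measurement model at a time
            gsem (Algebra -> (algitem1 algitem2 algitem3)@a, logit), var(Algebra@1)

            * step 2: all measurement models together, no higher-order factor yet
            gsem (Algebra    -> (algitem1 algitem2 algitem3)@a, logit)        ///
                 (Geometry   -> (geomitem1 geomitem2 geomitem3)@b, logit)     ///
                 (Statistics -> (statsitem1 statsitem2 statsitem3)@c, logit), ///
                 var(Algebra@1 Geometry@1 Statistics@1)

            * step 3: add (Math -> Algebra Geometry Statistics) as in #4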

