  • Principal component analysis in panel data

    I have panel data from 2001 to 2012 across states.

    I am trying to build an index of crime against women from 8 variables (assault, insult, cruelty, etc.). Since the variables are highly correlated with each other, a review suggested that I use PCA to construct the index.

    I am using Stata for my analysis.

    Q1. After normalizing the values of these 8 variables, I run PCA.

    Will a simple principal component analysis help me create the index, given that this is panel data? Does Stata automatically treat my data as panel data when running pca commands?

    I have declared the data to be panel data and then run pca:
    tsset states year
    pca variables
    Q2. Or are there any specific commands for PCA with panel data in Stata? If so, please explain. I have searched everywhere.

    Q3. Should I run the PCA separately for each state, separately for each year, or on the whole panel?

  • #2
    Cross-posted at https://stats.stackexchange.com/ques...-against-women

    where you already had one comment by way of answer to Q1. pca pays no attention whatsoever to any panel data structure. Time and even time order have no effect unless perversely you include a time variable in the pca. Different panels are mushed together. That may or may not be fine.
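
    A minimal sketch of that point, assuming the Grunfeld panel that ships with Stata's web datasets as a stand-in (the variables invest, mvalue and kstock are not the poster's crime variables). It runs pca after xtset, shuffles the rows so that panel and time order are destroyed, runs pca again, and compares the loadings:

    ```stata
    * Stand-in panel data: 10 companies observed over 20 years
    webuse grunfeld, clear
    xtset company year

    * PCA on the pooled panel; xtset has no influence on this command
    pca invest mvalue kstock
    matrix L1 = e(L)                 // eigenvectors (loadings)

    * Destroy panel and time ordering by sorting on a random number
    set seed 12345
    generate double u = runiform()
    sort u

    * Same PCA on the shuffled rows
    pca invest mvalue kstock
    matrix L2 = e(L)

    * Relative difference between the two loading matrices
    display mreldif(L1, L2)
    ```

    The displayed difference should be zero or at the level of rounding error, since pca only uses the correlation matrix of the listed variables. For the same reason, standardizing the variables beforehand does not change the default results: pca analyzes the correlation matrix unless the covariance option is specified.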

    • #3
      I'd say this could go in a lot of ways, but comments like "do a PCA" are often not well thought through. As Nick says, PCA doesn't take the panel structure into account. To some extent, you should be guided by what others are doing in your field of study. I often think we like to get too fancy.

      Without knowing what your variables represent or the metric on which they are measured, I'd start by asking: are you simply standardizing the variables to zero mean and unit variance, or are you actually normalizing? If the variables are on different metrics, then standardizing may be useful. And I'm not sure what you would mean by normalizing other than assigning z-scores based on the cumulative probabilities. One of my concerns with either is that it makes replicating the measure in subsequent studies problematic. For example, the standardized score is a function not only of an individual observation's observed value, but also of the variance in the sample of observations, and that variance may differ from sample to sample. So I think working with the observed scores is generally preferable. I've sometimes re-scaled variables so that they are all on some common metric, say 0 to 10 or 0 to 100.

      I might start by running a PCA on each wave separately, but not calculating principal component scores. If all the measures load strongly on a single principal component, then simply sum the items to form an index. You can use the alpha command to estimate internal consistency reliability, and you can even use it to generate the summated index. With this approach you have consistent measurement across waves.

      A pragmatic issue with things like principal component or factor scores is that each measure is weighted, and the weights will depend on the sample used to estimate the model. The weights may be different in sample A than in sample B or, in your case, different across waves if the waves were analyzed separately. One can develop scoring protocols based on those weights. For example, the SF-12 (a quality of life measure used in medical research) has a scoring protocol in which each response category is assigned a physical health score weight and a mental health score weight. But I don't see things like that published as standard scoring protocols very often. And simply using the principal component score will use weights that likely vary somewhat from sample to sample.
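
      The wave-by-wave check and the summated index could be sketched roughly as follows. This is only an illustration: the Grunfeld data and the variables invest, mvalue and kstock are stand-ins so that the code runs as written, and the names index_mean and index_sum are made up; the 8 crime variables would go into the items macro instead.

      ```stata
      * Stand-in panel data; replace with your own data and items
      webuse grunfeld, clear
      local items invest mvalue kstock

      * PCA on each wave separately: inspect the loadings on the
      * first component, but do not predict component scores
      levelsof year, local(waves)
      foreach w of local waves {
          display as text _newline "Wave `w'"
          pca `items' if year == `w', components(1)
      }

      * Internal consistency (Cronbach's alpha) with item-level detail,
      * saving the scale that alpha constructs
      alpha `items', item generate(index_mean)

      * alpha's generate() stores the item sum divided by the number
      * of items; a plain unweighted sum can be built with egen
      egen index_sum = rowtotal(`items'), missing
      ```

      Because the index is an unweighted sum (or mean) of the observed items, it is computed the same way in every wave, whereas component scores would use loadings that can change from wave to wave. Note that alpha reverses items that enter negatively unless the asis option is given, so index_mean and index_sum can differ by more than a scale factor when some item is negatively related to the rest.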



      Taking it to a whole different level, you could use a structural equation model to test for measurement invariance, but my guess is that PCA is just being suggested as a data reduction strategy.
