  • loops to calculate correlations between individuals and the group

    I am very bad at writing loops in Stata. Here is my question:

    I have a dataset with class_id, individual_id, year and test_score. For each individual observation, I would like to calculate the correlation between the individual's score and the scores of everyone else in the same class (the peers) in the same year. How can I do this with a loop? I suppose I could use the command icc?

    Thank you for your help.

    Best,

  • #2
    And the thing I want to generate is a new variable, say, corr, which indicates the correlation between the individual observation and the class score for a given year. Any ideas???



    • #3
      I, at least, am very confused - you appear to want a correlation between two sets of numbers where one of the sets has only one entry - this makes no sense to me; please read the FAQ and post an example of your data (or a realistic simulation) using -dataex- and posting in CODE blocks



      • #4
        I share Rich's uncertainty about the meaning of the question, but I can think of one interpretation in which it would make sense: Perhaps Luc wants to know something about the relationship between the *mean* score for an individual's class (a contextual variable characterizing an individual), and that individual's score. On that construction, one could do this:
        Code:
        egen class_mean  = total(test_score), by(class_id)
        bysort class_id: replace class_mean = (class_mean - test_score)/(_N-1) // leave-one-out mean: without the focal class member
        corr test_score class_mean
        Note that here the correlation is not a variable, but rather a summary description of a relationship in the entire sample, so perhaps this is not what Luc had in mind.



        • #5
          Originally posted by Rich Goldstein
          I, at least, am very confused - you appear to want a correlation between two sets of numbers where one of the sets has only one entry - this makes no sense to me; please read the FAQ and post an example of your data (or a realistic simulation) using -dataex- and posting in CODE blocks

          I just realized that I may have misstated my question. What I am trying to capture is how an individual's scores correlate with the class-level scores across years. It is a notion of co-movement: for each individual id, I would like to know whether the scores move in the same direction as those of the whole class. So for each individual, I think there will be only one correlation coefficient. I will try to post a simulated dataset.



          • #6
            Originally posted by Mike Lacy
            I share Rich's uncertainty about the meaning of the question, but I can think of one interpretation in which it would make sense: Perhaps Luc wants to know something about the relationship between the *mean* score for an individual's class (a contextual variable characterizing an individual), and that individual's score. On that construction, one could do this:
            Code:
            egen class_mean = total(test_score), by(class_id)
            bysort class_id: replace class_mean = (class_mean - test_score)/(_N-1) // leave-one-out mean: without the focal class member
            corr test_score class_mean
            Note that here the correlation is not a variable, but rather a summary description of a relationship in the entire sample, so perhaps this is not what Luc had in mind.
            Thanks for your solution. Indeed, what I need is a new variable, not a correlation matrix.



            • #7
              Originally posted by Rich Goldstein
              I, at least, am very confused - you appear to want a correlation between two sets of numbers where one of the sets has only one entry - this makes no sense to me; please read the FAQ and post an example of your data (or a realistic simulation) using -dataex- and posting in CODE blocks
              Hi, sorry for the late reply. Please find a simplified dataset below. What I would like to generate is a new variable, corr, which captures the correlation between an individual's scores and the class scores over time. For example, for individual 1 in class 1, I would like the correlation coefficient between 98, 64, 71 (the individual's scores in 2012, 2013 and 2014) and the class's average score in each of those years (excluding the individual). So for each individual, there will be one correlation coefficient.

              I have more than 200k observations, so is there a quick way to run this? Thanks.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte(class_id individual_id) int year byte score
              1  1 2012 98
              1  2 2012 85
              1  3 2012 88
              1  4 2012 64
              1  5 2012 60
              1  6 2012 93
              1  7 2012 78
              1  1 2013 64
              1  2 2013 89
              1  3 2013 88
              1  4 2013 73
              1  5 2013 63
              1  6 2013 61
              1  7 2013 70
              1  1 2014 71
              1  2 2014 96
              1  3 2014 89
              1  4 2014 66
              1  5 2014 90
              1  6 2014 69
              1  7 2014 86
              2  7 2012 64
              2  8 2012 62
              2  9 2012 96
              2 10 2012 66
              2 11 2012 86
              2  1 2013 92
              2  2 2013 85
              2  3 2013 65
              2  4 2013 93
              2  5 2013 86
              2  6 2013 89
              2  7 2013 95
              2  8 2013 63
              2  9 2013 69
              2 10 2013 62
              2 11 2013 77
              end



              • #8
                Any thoughts on this issue??



                • #9
                  This is computable, but seems to boil down to lots of correlations based on 3 data points. The code I tried produced lots of correlations that were 1 or -1 because it could only find pairs of points to correlate. At best a correlation involving two points distinct on both variables will produce correlations with absolute value 1 because straight lines fit such cases perfectly. But that seems at least partly a side-effect of a small data example.
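One loop-free way to compute this can be sketched as follows. It is only a sketch under stated assumptions: it expects the -dataex- example from #7 in memory (variables class_id, individual_id, year, score), it treats an individual as "the same person" only within one class, and every new variable name (n_cy, sum_cy, peer_mean, ok, n_pairs, corr and the temporary sums) is made up for illustration. The correlation is built from its defining sums with egen, which should scale to a few hundred thousand observations more comfortably than looping over individuals.

```stata
* Sketch only: assumes the -dataex- example from #7 is in memory
* (variables class_id, individual_id, year, score).
* All new variable names below are illustrative, not from the thread.

* 1. Leave-one-out class mean for each class-year
egen long   n_cy   = count(score), by(class_id year)
egen double sum_cy = total(score), by(class_id year)
gen double peer_mean = (sum_cy - score) / (n_cy - 1) ///
    if n_cy > 1 & !missing(score)

* 2. Number of usable (score, peer_mean) pairs per individual within a class
gen byte ok = !missing(score, peer_mean)
egen long n_pairs = total(ok), by(class_id individual_id)

* 3. Pearson correlation from its defining sums, one value per individual
egen double mx  = mean(score)     if ok, by(class_id individual_id)
egen double my  = mean(peer_mean) if ok, by(class_id individual_id)
egen double sxy = total((score - mx) * (peer_mean - my)) if ok, by(class_id individual_id)
egen double sxx = total((score - mx)^2)                  if ok, by(class_id individual_id)
egen double syy = total((peer_mean - my)^2)              if ok, by(class_id individual_id)

* Division by zero yields missing in Stata, so a constant series gives corr == .
gen double corr = sxy / sqrt(sxx * syy) if n_pairs >= 2

drop n_cy sum_cy mx my sxy sxx syy
```

Keeping n_pairs alongside corr makes it easy to screen out coefficients that rest on only two points, which will have absolute value 1 whenever they are defined, for example by looking only at observations with n_pairs >= 3.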

                  A deeper issue is what you expect correlations to show you. For example, a correlation doesn't measure agreement: cases such as individual = mean + constant would generate perfect correlations without any agreement, and situations approximating that might be common.
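The point about correlation versus agreement can be illustrated with a tiny made-up example. The numbers and variable names here are invented for illustration and are not taken from the dataset in #7: an individual who always scores 15 points above the peer mean is perfectly correlated with it, yet never agrees with it.

```stata
* Illustrative toy data only (hypothetical values, not from the thread)
clear
input int year double peer_mean
2012 70
2013 75
2014 82
2015 78
end

* The individual is always exactly 15 points above the peer mean
gen double score = peer_mean + 15

* Correlation is (numerically) 1 ...
correlate score peer_mean
scalar rho = r(rho)
display "correlation = " rho

* ... although the two series never agree: the gap is a constant 15
gen double diff = score - peer_mean
summarize diff
```

If agreement rather than co-movement is the quantity of interest, the difference score - peer_mean (or a summary of it) may be closer to what is wanted than a correlation.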



                  • #10
                    Luc Abbey, did you find a solution? I'm also facing the same problem.
