Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Spearman Rank Correlation of observations with hypothetical variables

    Dear Experts

    I did just start using Stata and I am facing a problem which I cannot solve right now. I want to calculate the spearman correlation of some variables within an observation with a sequence of hypothetical variables. Let me show you the structure of the data:

    ID V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1 1 2 3 4 5 6 7 8 9 10

    The ID shows the number of the Observation. Now I want to have the spearman correlation between the sequence of Variables (V1 - V10) with the hypothetical values of 1-10. Then I want to create a new variable (V11) which gives the coefficient (in this case = 1). This I want to do with around 150 different observations.

    It was easy in Excel with the command =Correl(rank.avg(V1-V10;V1-V10;1);rank.avg(1-10;1-10;1))

    It seems that I struggle with the way of how Stata handles the variable / observations.

    Any Suggestions?

    Thank you all in advance.
    Best
    Jonas

  • #2
    You are correct that there is a difference in the way that Stata and Excel conceive of variables and observations. I see two ways of structuring your data in Stata to calculate the correlation. What you ultimately want to do with the dataset and results will dictate which approach you use:


    Approach 1: Arrange your dataset so that ID1 is a variable containing the scores of person 1, ID2 is a variable containing the score of person 2, etc. Then create a variable HYPO that contains the hypothetical values you want to compare each set of scores to.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(ID V1 V2 V3 V4 V5 V6 V7 V8 V9 V10)
    1  1 2 3 4 5 6 7 8 9 10
    2 10 9 8 7 6 5 4 3 2  1
    end
    
     drop ID
     xpose, clear varname
     egen hypo=seq(), f(1) t(10) 
    list
    spearman v1 hypo
    spearman v2 hypo
    
    **loop does this too;
    **change the number 2 to the total number of ids (150?) for your dataset
    forvalues i=1/2 {
       spearman v`i' hypo
       }
    In the above output, the new variables v1 v2 will hold the scores for ID1 and ID2. We've created a new variable hypo that is the sequence of hypothetical values (see help egen for more info). Then we run the spearman command (help spearman) for each comparison (this can be routinized in the example loop).



    Approach 2: Arrange your dataset so that all individual scores are in a single variable (stacked) with an indicator for which ID and which V#. Then you can calculate the correlation and record the info in another variable.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(ID V1 V2 V3 V4 V5 V6 V7 V8 V9 V10)
    1  1 2 3 4 5 6 7 8 9 10
    2 10 9 8 7 6 5 4 3 2  1
    end
    
    reshape long V, i(ID) j(scores)
    sort ID V
    egen hypo=seq(), f(1) t(10)
    bysort ID (V): egen sp_rank=corr(scores hypo), spearman
    list
    The above code reshapes that data. Your new variable ID contains the ID number, scores contain the actual scores within the original V#. The variable V contains the V#. We sort the data so they are listed by ID and V (sort id v), then use egen to create a sequence of hypothetical scores. Now we can create another variable sp_rank that is the correlation between scores and hypo for each ID (see help egenmore).
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Dear Carole

      Thank you so much, you just solved my problem perfectly!

      Very nice of you
      Best
      Jonas

      Comment


      • #4
        Hi,
        I am working on a dataset with four variables - country, sectors, ranking 1 and ranking 2. For each country, there are 22 sectors being covered and I want to see how the ranks of the 22 sectors (for each country) changes when one switches from ranking 1 to ranking 2. Basically, I want to compute the spearman rank correlation between ranking 1 and 2 separately for each country. In total, I have 45 countries. Please let me know if there is any way to compute the rank correlation in a command or two? or, when I can just create a new variable giving the rank correlations?
        Thank you so much.

        Comment

        Working...
        X