  • Correlation in Stata

    Hi there,

    I'm confused about the difference between the correlate and pwcorr functions in Stata. The two give identical values for some pairs of variables and only slightly different values for others. How do I decide which to use? Statistically speaking, what type of correlation does each one compute?

    I am also wondering how to compute correlations between a categorical (but NOT ordinal) variable and binary or continuous variables. One of my categorical variables has 4 values, and the other has about 30 (the number depends on the year; some years have more, some fewer).

    Many thanks,
    alyssa

  • #2
    The corr command uses listwise deletion: if a case is missing data on any variable in the list, it is dropped, even if it has data for the other variables.

    pwcorr uses pairwise deletion of missing data: a case is dropped only from those correlations where one or both of the two variables are missing.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam

    • #3
      These are two different commands (strictly, not functions: in Stata functions and commands are disjoint).

      Any differences in results arise because of different treatment of observations with missing values. Here is an example:

      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . corr rep78 mpg price
      (obs=69)
      
                   |    rep78      mpg    price
      -------------+---------------------------
             rep78 |   1.0000
               mpg |   0.4023   1.0000
             price |   0.0066  -0.4559   1.0000
      
      . pwcorr rep78 mpg price, obs
      
                   |    rep78      mpg    price
      -------------+---------------------------
             rep78 |   1.0000 
                   |       69
                   |
               mpg |   0.4023   1.0000 
                   |       69       74
                   |
             price |   0.0066  -0.4686   1.0000 
                   |       69       74       74
                   |
      In the auto data, which has 74 observations, rep78 has 5 missing values. So what do we do when some observations have missing values?

      correlate ignores every observation that has a missing value on any of the variables named, for all the correlations it shows. So in this example it works consistently with the 69 observations that have non-missing values on all three variables.

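      In contrast, pwcorr does what it can to use all the information available. Here the correlation between mpg and price is calculated from all 74 observations in the dataset (-0.4686, compared with -0.4559 from correlate on 69 observations), while the correlations involving rep78 use the 69 observations on which rep78 is non-missing.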
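
      Both commands compute ordinary Pearson (product-moment) correlations, so the differences come only from which observations enter each coefficient. A minimal sketch, assuming the same auto data, that reproduces both mpg-price values with correlate alone by controlling the sample explicitly:

```stata
* Load the 1978 auto data (74 observations; rep78 has 5 missing values)
sysuse auto, clear

* Observations with no missing values on any of the three variables:
* this is the sample correlate uses for every coefficient (69 here)
count if !missing(rep78, mpg, price)

* All 74 observations: matches the mpg-price value reported by pwcorr
correlate mpg price

* Restricted to the complete cases: matches the mpg-price value from the
* three-variable correlate (69 observations)
correlate mpg price if !missing(rep78)
```

      The first correlate call should report -0.4686 and the second -0.4559, the two values seen in the outputs above.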

      Although it's not the default, I recommend showing the number of observations behind each correlation (the obs option) if you use pwcorr.
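
      A sketch of that recommendation, again assuming the auto data: the obs option prints the N for each pair, and pwcorr's listwise option applies the same deletion rule as correlate, which is a convenient way to confirm that the two commands agree once the sample is held fixed. The matrix name C is arbitrary.

```stata
sysuse auto, clear

* Default pairwise deletion, with the N behind each coefficient displayed
pwcorr rep78 mpg price, obs

* listwise drops any observation missing on any listed variable, so every
* N should be 69 and the coefficients should match correlate
pwcorr rep78 mpg price, obs listwise

* Keep the listwise correlation matrix for inspection
matrix C = r(C)
matrix list C
```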

      Note that the correlation matrix produced by pwcorr is not suitable for subsequent multivariate analysis, because its coefficients may be based on different subsets of the observations.
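
      One concrete way this shows up: because the coefficients need not share a common sample, a pairwise matrix is not guaranteed to be positive semidefinite, which many multivariate methods require. A hedged sketch of a simple check, assuming pwcorr leaves the matrix in r(C): compute its eigenvalues and look for negative ones. The names P, V, and ev are arbitrary. In this three-variable example the smallest eigenvalue is positive, so the output illustrates the check rather than a failure; with more variables and patchier missingness it can go negative.

```stata
sysuse auto, clear

* Pairwise correlation matrix (default pairwise deletion)
pwcorr rep78 mpg price
matrix P = r(C)

* Eigenvectors go to V, eigenvalues to the row vector ev (descending order);
* a negative last eigenvalue would mean P is not positive semidefinite
matrix symeigen V ev = P
matrix list ev
```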

      • #4
        Thank you both; this is good to know.
