  • Pearson's correlation coefficient - problems with a variable

    Hi,
    I need some help, please.
    When I add the variable "Investment1" to the correlation matrix, the correlation between "Size1" and "ROA1" changes. How can I identify what kind of problem there is with my "Investment1" data?
    Investment1 is the ratio of capital expenditure to revenue. It can be negative or positive, and it was winsorized with: winsor2 Investment1, replace cuts(1 99)
    . correlate Size1 ROA1
    (obs=1666)

                 |    Size1     ROA1
    -------------+------------------
           Size1 |   1.0000
            ROA1 |   0.3358   1.0000

    . correlate Size1 ROA1 Investment1
    (obs=1419)

                 |    Size1     ROA1 Invest~1
    -------------+---------------------------
           Size1 |   1.0000
            ROA1 |   0.1481   1.0000
     Investment1 |   0.0879  -0.0295   1.0000

  • #2
    correlate excludes all observations that have missing values on any variable mentioned in the command. Otherwise it is a completely bivariate technique, so the only way adding a third variable can influence the correlation between two other variables is through this effect on the sample.
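
    A minimal sketch of this point, assuming the poster's dataset is in memory and uses the variable names shown in the question. Restricting the two-variable command to the observations where Investment1 is nonmissing should reproduce the sample of the three-variable command:

    ```stata
    * Two-variable correlation on the sample the three-variable command uses.
    * Assumes Size1, ROA1 and Investment1 exist in the data in memory.
    correlate Size1 ROA1 if !missing(Investment1)
    ```

    If the sample effect is the whole story, this should report obs=1419 and the same Size1/ROA1 coefficient as the three-variable run.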
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      The number of observations used changed from command to command. 247 observations were not used in the second calculation because Investment1 was missing in these observations.
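
      One way to confirm that count, sketched under the assumption that the poster's data are in memory with these variable names:

      ```stata
      * Observations used by -correlate Size1 ROA1- but lost once Investment1 is added
      count if !missing(Size1, ROA1) & missing(Investment1)

      * Overview of missing values in the three variables
      misstable summarize Size1 ROA1 Investment1
      ```

      The count should equal 1666 - 1419 = 247 if Investment1 is the only source of the difference.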


      • #4
        In this case, what is the correct way to display the correlation matrix? Should I use the "pwcorr" command?


        • #5
          Yes, the -pwcorr- command is probably what you want. It depends on why you are calculating these correlation coefficients. If you are simply interested separately in each pairwise correlation, then that is the way to go.

          On the other hand, if you are thinking about the correlations in terms of how they would play out in a regression analysis, then, since regression models also eliminate all observations with any variable missing, the results of -corr- are what you should be looking at. Also, if your plan is to use the resulting correlation matrix as input to, say, a factor analysis or principal components or -sem-, the output of -pwcorr-, not being a real correlation matrix, may turn out not to be positive definite, and your calculations will fail at that point.
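
          To make the contrast concrete, here is a short sketch using the poster's variable names (assumed to be in memory). The obs option of -pwcorr- prints the number of observations behind each coefficient, which makes the differing samples visible:

          ```stata
          * Pairwise deletion: each coefficient uses every observation available for that pair
          pwcorr Size1 ROA1 Investment1, obs sig

          * Casewise (listwise) deletion: one common sample for all coefficients,
          * the same rule a regression on these variables would apply
          correlate Size1 ROA1 Investment1
          ```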


          • #6
            My aim is to show the correlations among the variables I use, that is, the strength and direction of the linear relationships between them (how they would play out in a regression analysis). So I'll use the -corr- command instead of the -pwcorr- command.


            • #7
              Thanks, Maarten, Nick and Clyde.


              • #8
                If such a small change in sample size leads to such a large change in the correlation, then that suggests outliers to me. So I would start with a simple scatterplot and check whether the correlation is driven by a small number of observations.
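
                A possible way to do that check, sketched with the poster's variable names. The indicator name dropped is hypothetical and is created only for this illustration; the /// line continuations assume the code is run from a do-file:

                ```stata
                * Flag observations that drop out once Investment1 enters the command
                * (dropped is a hypothetical name; it is left missing where Size1 or ROA1 is missing)
                generate byte dropped = missing(Investment1) if !missing(Size1, ROA1)

                * Overlay the retained and the dropped observations in one scatterplot
                twoway (scatter ROA1 Size1 if dropped == 0) ///
                       (scatter ROA1 Size1 if dropped == 1), ///
                       legend(order(1 "Investment1 observed" 2 "Investment1 missing"))

                * Correlation within each group
                correlate Size1 ROA1 if dropped == 0
                correlate Size1 ROA1 if dropped == 1
                ```

                If a handful of extreme points among the dropped observations stand out in the plot, they are the likely reason the coefficient moves so much between the two samples.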
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------
