Pearson's correlation coefficient - problems with a variable

Janio Mossinato

Join Date: Feb 2017
Posts: 15


18 Feb 2017, 07:56

Hi,
Please, I need some help.
When I add the variable "Investment1" to the correlation matrix, the correlation between the variables "Size1" and "ROA1" changes. How can I identify what kind of problem there is with my "Investment1" data?
Investment1 is the ratio of capital expenditure to revenue. It can be negative or positive, and it was winsorized with -winsor2 Investment1, replace cuts(1 99)-.

. correlate Size1 ROA1
(obs=1666)

             |    Size1     ROA1
-------------+------------------
       Size1 |   1.0000
        ROA1 |   0.3358   1.0000

. correlate Size1 ROA1 Investment1
(obs=1419)

             |    Size1     ROA1 Invest~1
-------------+---------------------------
       Size1 |   1.0000
        ROA1 |   0.1481   1.0000
 Investment1 |   0.0879  -0.0295   1.0000


Maarten Buis

Join Date: Mar 2014

Posts: 3467
#2

18 Feb 2017, 08:19

-correlate- excludes all observations that have a missing value on any variable mentioned in the command. Otherwise it is a completely bivariate technique, so the only way adding a third variable can influence the correlation between two other variables is through this effect on the sample.
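
A minimal way to check this, using the variable names from the first post: rerun the two-variable correlation restricted to the observations where Investment1 is not missing. If missing values are the whole story, this should use the same sample as the three-variable command and reproduce its Size1-ROA1 correlation.

```stata
* Two-variable correlation on the sample that the three-variable command uses
correlate Size1 ROA1 if !missing(Investment1)
```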

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35791
#3

18 Feb 2017, 08:20

The number of observations used changed from command to command. 247 observations were not used in the second calculation because Investment1 was missing in these observations.
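
As a rough sketch of how to confirm that count in your own data (variable names taken from the first post), you could count the observations that have both Size1 and ROA1 but lack Investment1, and then look at the missing-value patterns. If the explanation above holds, the count should equal 1666 - 1419 = 247.

```stata
* Observations with Size1 and ROA1 present but Investment1 missing
count if !missing(Size1, ROA1) & missing(Investment1)

* Missing-value patterns across the three variables
misstable patterns Size1 ROA1 Investment1
```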
Janio Mossinato

Join Date: Feb 2017

Posts: 15
#4

18 Feb 2017, 09:52

In this case, what is the correct way to display the correlation matrix? Should I use the "pwcorr" command?
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#5

18 Feb 2017, 09:57

Yes, the -pwcorr- command is probably what you want. It depends on why you are calculating these correlation coefficients. If you are simply interested separately in each pairwise correlation, then that is the way to go.

On the other hand, if you are thinking about the correlations in terms of how they would play out in a regression analysis, then, since regression models also eliminate all observations with any variable missing, the results of -corr- are what you should be looking at. Also, if your plan is to use the resulting correlation matrix as input to, say, a factor analysis or principal components or -sem-, the output of -pwcorr-, not being a real correlation matrix, may turn out not to be positive definite, and your calculations will fail at that point.
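
To make the difference concrete, here is a short sketch with the variable names from the first post. The obs option shows how many observations each pairwise correlation is based on, and the listwise option makes -pwcorr- drop incomplete observations the same way -corr- does.

```stata
* Pairwise deletion: each correlation uses all observations available for that pair
pwcorr Size1 ROA1 Investment1, obs sig

* Listwise deletion: only observations complete on all three variables
pwcorr Size1 ROA1 Investment1, obs listwise
```

With the first command the number of observations can differ from cell to cell, which is why the result need not behave like a proper correlation matrix.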
Janio Mossinato

Join Date: Feb 2017

Posts: 15
#6

18 Feb 2017, 10:21

My aim is to show the correlations among the variables I use, indicating the strength and direction of the linear relationships between them (how they would play out in a regression analysis). So I'll use the -corr- command instead of the -pwcorr- command.
Janio Mossinato

Join Date: Feb 2017

Posts: 15
#7

18 Feb 2017, 10:57

Thanks, Maarten, Nick and Clyde.
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#8

18 Feb 2017, 11:49

If such a small change in sample size leads to such a large change in correlation, then that suggests outliers to me. So I would start with a simple scatterplot and see if that correlation isn't driven by a small number of observations.
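
Something along these lines would do, assuming the variable names from the first post. It plots ROA1 against Size1 and marks separately the observations that drop out once Investment1 is added. The /// line continuations work in a do-file, not when typed line by line in the Command window.

```stata
* Scatterplot of ROA1 against Size1, split by whether Investment1 is observed
twoway (scatter ROA1 Size1 if !missing(Investment1)) ///
       (scatter ROA1 Size1 if missing(Investment1)), ///
       legend(order(1 "Investment1 observed" 2 "Investment1 missing"))
```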

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------