correlation analysis

sandeep kaur

Join Date: Jul 2022

Posts: 60
#1

18 Jul 2022, 21:48

Hi,

For a small study or hypothesis-generating work, correlation analysis is sometimes recommended. As I understand it, for a continuous outcome one can use Pearson correlation.
But:
1) What if the outcome is binary?
2) Which factors decide the steps of correlation analysis?
3) Does the type of predictors dictate the steps of correlation analysis?

Thanks
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17852
#2

18 Jul 2022, 22:49

Sandeep:
1) technically speaking, you can correlate any type of variables: the issue rests on the meaning of the outcome;
2) the goal of a correlation analysis is to investigate the relationship between two (or more) variables; the type of variables has no bearing on the procedure;
3) if you mention predictors you're implicitly switching from correlation to regression.
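To make point 3) concrete, here is a minimal Python sketch (standard library only; the numbers are made up for illustration and do not come from the thread). Pearson's r is symmetric in its two arguments, whereas the least-squares slope of y on x differs from that of x on y, because regression singles out one variable as the outcome. The helper names `pearson` and `slope` are hypothetical.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def slope(y, x):
    """OLS slope from regressing y on x (with intercept): b = r * sd(y) / sd(x)."""
    return pearson(x, y) * stdev(y) / stdev(x)

# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson(x, y)
print(round(r, 4), round(pearson(y, x), 4))  # identical: correlation has no direction
b_yx = slope(y, x)  # y regressed on x
b_xy = slope(x, y)  # x regressed on y
print(round(b_yx, 4), round(b_xy, 4))        # different: regression has a direction
print(abs(b_yx * b_xy - r ** 2) < 1e-12)     # True: the two slopes multiply to r squared
```

The last line shows the link between the two ideas: the correlation is a symmetric summary, while each regression slope rescales it by the ratio of standard deviations in one particular direction.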

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 36054
#3

19 Jul 2022, 03:55

On 1) in #1: Pearson correlation is defined for pairs of binary variables so long as each has some 0s and some 1s, and a little thought shows that Spearman correlation would give the same answer.
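A minimal Python sketch of that point, under stated assumptions: Spearman correlation is computed as Pearson correlation of average ranks (the usual convention for ties), and the data are hypothetical. For a 0/1 variable all 0s share one average rank and all 1s share a higher one, so the ranks are just a positive linear rescaling of the original codes and Pearson's r does not change. For two binary variables the result also equals the phi coefficient of their 2x2 table. If either variable is all 0s or all 1s, the denominator is zero and the coefficient is undefined, which is why each variable needs some 0s and some 1s.

```python
from statistics import mean

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(v):
    """Ranks 1..n, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def phi(x, y):
    """Phi coefficient from the 2x2 table of two 0/1 variables."""
    pairs = list(zip(x, y))
    n11 = pairs.count((1, 1))
    n10 = pairs.count((1, 0))
    n01 = pairs.count((0, 1))
    n00 = pairs.count((0, 0))
    num = n11 * n00 - n10 * n01
    den = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return num / den

# Hypothetical binary variables, each with some 0s and some 1s
x = [0, 0, 1, 1, 1, 0, 1, 0]
y = [0, 1, 1, 1, 0, 0, 1, 0]
print(pearson(x, y), spearman(x, y), phi(x, y))  # all three agree
```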

Correlation between a binary predictor and a non-binary outcome can be useful sometimes.
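Pearson correlation between a 0/1 variable and a continuous one is usually called the point-biserial correlation. A minimal Python sketch, with hypothetical data and hypothetical helper names, checks that it equals a standardized difference in group means, r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the outcome means in the two groups, p and q are the group proportions, and s is the outcome's standard deviation computed with divisor n.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def point_biserial(group, y):
    """Point-biserial r written as a standardized difference in group means."""
    y1 = [b for a, b in zip(group, y) if a == 1]
    y0 = [b for a, b in zip(group, y) if a == 0]
    p = len(y1) / len(y)
    q = 1 - p
    return (mean(y1) - mean(y0)) / pstdev(y) * (p * q) ** 0.5

# Hypothetical binary predictor and continuous outcome
group = [0, 0, 0, 1, 1, 1]
y = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
print(pearson(group, y), point_biserial(group, y))  # same value
```

The formula makes clear why such a correlation can be useful: it is a rescaled version of the difference between the two group means.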
sandeep kaur

#4

19 Jul 2022, 12:24

Carlo Lazzaro: Thanks for your help. My project has a small sample size, and there are no previous studies in the literature using small-sample methodology. Someone recommended starting with correlation analysis. As I understand from your explanation, if we separate variables into predictors and outcome, we are no longer analysing correlation: we are looking at how one variable predicts another. For correlation analysis, we plug in variables to see their relationship without separating them into predictors and outcome.
Basically it's about using the right terminology: instead of talking about predictors, the idea is to plug in variables and look at their correlation.

Last edited by sandeep kaur; 19 Jul 2022, 12:27.
sandeep kaur

#5

19 Jul 2022, 12:28

Nick Cox: Thanks for your reply. Does that mean Pearson correlation can be used for any type of variable? Are there other methods?
Carlo Lazzaro

#6

19 Jul 2022, 13:32

Sandeep:
1) I'd follow Nick's advice as far as correlation is concerned;
2) I do share your distinction between correlation and regression. That said, what I do not understand about the advice you received is "let's start from the correlation". While it could be an approach of exploratory data analysis, the flip side is that you should have at least an idea of the set of statistical analyses that you're going to perform, especially if you're planning to submit your paper to a technical journal of your research field.

Kind regards,
Carlo
(Stata 19.0)
sandeep kaur

#7

19 Jul 2022, 14:21

Carlo Lazzaro: Thanks for the clarification. The plan is to do regression analysis. However, because the study is underpowered, a regression with many covariates might not give answers, and we might end up submitting it as hypothesis-generating work. I guess in that situation at least the correlation between variables should be tested.

Is exploratory data analysis same as descriptive statistics?

Last edited by sandeep kaur; 19 Jul 2022, 14:30.
Carlo Lazzaro

#8

20 Jul 2022, 00:00

Sandeep:
1) I see the issue. But betting all in on correlation may give you an incomplete picture of the data generating process;
2) not quite. See: https://link.springer.com/referencew...87-32833-1_136

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

#9

20 Jul 2022, 00:21

#5: No, as Pearson correlation does not apply to nominal-scale variables.
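Why Pearson correlation breaks down for nominal variables can be sketched in a few lines of Python. The assumption here is a hypothetical three-category variable with no natural order: any numeric codes attached to its categories are arbitrary labels, and two equally arbitrary codings of the same data give different Pearson coefficients (here, even opposite signs).

```python
from statistics import mean

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical nominal variable (unordered categories) and a continuous outcome
region = ["north", "south", "east", "north", "east", "south"]
y = [1.0, 5.0, 3.0, 2.0, 4.0, 6.0]

# Two equally arbitrary numeric codings of the same categories
coding_a = {"north": 1, "south": 2, "east": 3}
coding_b = {"north": 2, "south": 1, "east": 3}

r_a = pearson([coding_a[c] for c in region], y)
r_b = pearson([coding_b[c] for c in region], y)
print(round(r_a, 3), round(r_b, 3))  # same data, opposite signs
```

Because the answer depends on a labelling choice rather than on the data, the coefficient has no meaning for nominal variables.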
sandeep kaur

#10

20 Jul 2022, 11:12

Carlo Lazzaro
Thanks for sharing the link and clarifying the issues.

A) In case correlation is being tested between a predictor and the outcome variable:
1) A high correlation between a predictor and the outcome variable is a good indicator and should be tested further with methods such as regression.

B) In case correlation is being tested between variables that are predictors:
1) Correlation means the presence of collinearity: if two variables are correlated, they are collinear. If more than two variables are involved, it is multicollinearity.
2) Accordingly, a predictor should be dropped.

C) Can there be collinearity without correlation?
Is this where tests like VIF are used?
sandeep kaur

#11

20 Jul 2022, 11:15

Nick Cox

Thanks for the clarification. Is there any resource or article that describes the different tests for correlation? Most of the articles and pages I have been looking at mention mostly Pearson correlation, without much clarity about the types of variables it applies to.
Nick Cox

#12

20 Jul 2022, 11:49

It depends where you look. My impression, for example, is that economists rarely look beyond Pearson correlation, but there is at least one book entirely on rank correlation, and coverage of Spearman and Kendall rank correlations is standard in texts on nonparametric statistics.
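For readers who want to see what the two rank correlations actually compute, here is a minimal Python sketch built from their definitions, with hypothetical data. Spearman's rho is taken as Pearson correlation of average ranks; for Kendall's tau the tie-adjusted tau-b variant is chosen, which is an assumption on my part (tau-a differs only when there are ties). In Stata itself, `spearman` and `ktau` provide these coefficients.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(v):
    """Ranks 1..n, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties."""
    nc = nd = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: counts in neither denominator term
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            nc += 1
        else:
            nd += 1
    return (nc - nd) / ((nc + nd + tied_x_only) * (nc + nd + tied_y_only)) ** 0.5

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman(x, y), kendall_tau_b(x, y))
```

Both coefficients depend only on the ordering of values, so they suit ordinal variables and are unaffected by monotone transformations, unlike Pearson's r.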
sandeep kaur

Join Date: Jul 2022

Posts: 60
#13

20 Jul 2022, 14:28

Nick Cox: Sounds good. Thanks for helping with the queries.
Carlo Lazzaro

#14

20 Jul 2022, 22:38

Sandeep:
A) yes, but as you know, -regress-, unlike correlation, works in one direction only (regressand regressed on regressors), and the effect of each predictor on the conditional mean of the regressand is adjusted for the other predictors;
B) and C): not quite. If you take a look at A. Goldberger's textbook A Course in Econometrics, Chapter 23, you will read that multicollinearity is often an oversold issue. In addition, in case of perfect collinearity, Stata will drop one of the culprits by default. If quasi-extreme multicollinearity does not produce weird standard errors, you can live with it without dropping anything.
That said, you cannot have multicollinearity without correlation.

Last edited by Carlo Lazzaro; 20 Jul 2022, 22:41.

Kind regards,
Carlo
(Stata 19.0)
sandeep kaur

#15

21 Jul 2022, 13:12

Carlo Lazzaro
Thanks for providing the link. To summarize:

1) One cannot have collinearity without correlation

2) But the presence of correlation does not always indicate the presence of collinearity. That is why, as seen in some articles, correlation analysis (Pearson coefficients) is followed by checks such as VIF for collinearity.
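A minimal Python sketch of what a VIF measures, assuming the textbook definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. It uses the standard identity that these VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The data and helper names are hypothetical; in Stata, `estat vif` after `regress` reports these values. The example has predictors whose pairwise correlations are only moderate (roughly 0.7), yet one of them is almost an exact linear combination of the other two, so the VIFs are large.

```python
from statistics import mean

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity among predictors")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def vifs(predictors):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    k = len(predictors)
    corr = [[pearson(predictors[i], predictors[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Hypothetical predictors: x1 and x2 are uncorrelated, x3 is almost x1 + x2
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
noise = [1, -1, -1, 1]
x3 = [a + b + 0.1 * e for a, b, e in zip(x1, x2, noise)]

print([round(pearson(x1, x3), 3), round(pearson(x2, x3), 3)])  # moderate pairwise r
print([round(v, 1) for v in vifs([x1, x2, x3])])               # large VIFs
```

With only two predictors the VIF reduces to 1 / (1 - r^2), so it adds nothing beyond the pairwise correlation; its value shows up with three or more predictors, where near-dependence can involve several variables at once. With perfect collinearity the correlation matrix cannot be inverted, which the sketch reports as an error.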