
  • correlation analysis

    Hi,

    For small studies or hypothesis-generating work, correlation analysis is sometimes recommended. As I understand it, for a continuous outcome one can use Pearson correlation.
    But:
    1) What if the outcome is binary?
    2) Which factors determine the steps of a correlation analysis?
    3) Does the type of predictors dictate the steps of a correlation analysis?

    Thanks

  • #2
    Sandeep:
    1) technically speaking, you can correlate any type of variables: the issue rests on the meaning of the outcome;
    2) the goal of a correlation analysis is to investigate the relationship between two (or more) variables; the type of variables has no bearing on the procedure;
    3) if you mention predictors you're implicitly switching from correlation to regression.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      On 1) in #1: Pearson correlation is defined for pairs of binary variables so long as each has some 0s and some 1s, and a little thought shows that Spearman correlation would give the same answer.

      Correlation between a binary predictor and a non-binary outcome can be useful sometimes.
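To illustrate the point in #3, here is a minimal Python sketch (not Stata code) using made-up 0/1 data. The helper names `pearson`, `midranks`, and `spearman` are written for this example only. Ranking a binary variable with ties averaged is just a linear rescaling of the 0/1 codes, which is why the two coefficients coincide.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def midranks(v):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the current run of tied values
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))

# Made-up binary data: each variable has some 0s and some 1s.
x = [0, 0, 0, 1, 1, 1, 1, 0]
y = [0, 1, 0, 1, 1, 0, 1, 0]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 6), round(r_spearman, 6))
```

For these data both coefficients equal 0.5 up to floating-point rounding. If either variable were constant (all 0s or all 1s), the denominator would be zero and the correlation undefined, which is the condition stated in #3.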



      • #4
        Carlo Lazzaro: Thanks for your help. My project has a small sample size, and there are no previous studies in the literature using small-sample methodology. Someone recommended starting with correlation analysis. As I understand from your explanation, if we separate the variables into predictors and an outcome, we are no longer analysing correlation; we are describing how one variable predicts the other. For correlation analysis we enter the variables to see their relationship without designating any of them as predictor or outcome.
        Basically it's about using the right terminology: instead of saying "predictor", the idea is to enter the variables and look at their correlation.
        Last edited by sandeep kaur; 19 Jul 2022, 12:27.



        • #5
          Nick Cox: Thanks for your reply. Does that mean Pearson correlation can be used for any type of variables? Are there other methods?



          • #6
            Sandeep:
            1) I'd follow Nick's advice as far as correlation is concerned;
            2) I do share your distinction between correlation and regression. That said, what I do not understand about the advice you received is "let's start from the correlation". While it could be an approach of exploratory data analysis, the flip side is that you should have at least an idea of the set of statistical analyses that you're going to perform, especially if you're planning to submit your paper to a technical journal of your research field.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Carlo Lazzaro: Thanks for the clarification. The plan is to do regression analysis. However, because the study is underpowered, regression with many covariates might not give answers, and we might end up submitting it as hypothesis-generating work. I guess in that situation at least the correlation between variables should be tested.

              Is exploratory data analysis the same as descriptive statistics?
              Last edited by sandeep kaur; 19 Jul 2022, 14:30.



              • #8
                Sandeep:
                1) I see the issue. But betting everything on correlation may give you an incomplete picture of the data generating process;
                2) not quite. See: https://link.springer.com/referencew...87-32833-1_136
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  #5: No, as Pearson correlation does not apply to nominal-scale variables.
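A small Python sketch (not Stata code) of why this is so for a nominal variable with more than two categories: the numeric codes are arbitrary labels, and relabelling them changes the coefficient. The data, the category names, and the two codings are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical nominal variable with three unordered categories.
group = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Two equally valid numeric codings of the same categories.
coding_1 = {"a": 1, "b": 2, "c": 3}
coding_2 = {"a": 2, "b": 1, "c": 3}

r1 = pearson([coding_1[g] for g in group], y)
r2 = pearson([coding_2[g] for g in group], y)
print(round(r1, 3), round(r2, 3))  # the two codings give different values
```

With only two categories the problem largely disappears, since swapping the 0 and 1 labels merely flips the sign of the coefficient. That is consistent with #3, where Pearson correlation is described as defined for binary variables.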



                  • #10
                    Carlo Lazzaro
                    Thanks for sharing the link and clarifying the issues.

                    A) In case correlation is being tested between predictor and outcome variable:
                    1) A high correlation between predictor and outcome variables is a good indicator and should be investigated further with methods such as regression.

                    B) In case correlation is being tested for variables that are predictors:
                    1) Correlation means the presence of collinearity. If two variables are correlated, that means they are collinear. If more than two variables are involved, it is multicollinearity.
                    2) Accordingly, a predictor should be dropped.

                    C) Can there be collinearity without correlation?
                    Is this where tests like VIF are used?



                    • #11
                      Nick Cox

                      Thanks for the clarification. Is there any resource or article where different tests for correlation are described? Most of the articles and pages I have looked at mention mostly Pearson correlation, without much clarity about the types of variables it is used for.



                      • #12
                        Depends where you look. My impression, for example, is that economists rarely look beyond Pearson correlation, but there is at least one book entirely on rank correlation, and covering Spearman and Kendall rank correlations is standard in texts on nonparametric statistics.



                        • #13
                          Nick Cox: Sounds good. Thanks for helping with the queries.



                          • #14
                            Sandeep:
                            A) yes, but as you know, -regress-, unlike correlation, goes in one direction only (regressand regressed on regressors), and the effect of each predictor on the conditional mean of the regressand is adjusted for the other predictors;
                            B) and C): not quite. If you take a look at A. Goldberger's textbook A Course in Econometrics, Chapter 23, you will read that multicollinearity is often an oversold issue. In addition, in case of perfect collinearity, Stata will drop one of the culprits by default. If quasi-extreme multicollinearity does not produce weird standard errors, you can live with it without dropping anything.
                            That said, you cannot have multicollinearity without correlation.
                            Last edited by Carlo Lazzaro; 20 Jul 2022, 22:41.
                            Kind regards,
                            Carlo
                            (Stata 19.0)
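On point A) in #14, a minimal Python sketch (not Stata code, and with invented numbers) of the "one direction only" remark for the simplest case of one regressor. The correlation is the same whichever variable comes first, whereas the least-squares slope of y on x differs from the slope of x on y. The adjustment for other predictors that Carlo mentions is not covered here.

```python
from math import sqrt

def centred_sums(x, y):
    """Return (Sxx, Syy, Sxy): sums of squares and cross-products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

sxx, syy, sxy = centred_sums(x, y)

r = sxy / sqrt(sxx * syy)   # symmetric in x and y
slope_y_on_x = sxy / sxx    # simple regression of y on x
slope_x_on_y = sxy / syy    # simple regression of x on y

print(round(r, 4), round(slope_y_on_x, 4), round(slope_x_on_y, 4))
print(round(slope_y_on_x * slope_x_on_y, 4), round(r ** 2, 4))
```

The product of the two slopes equals the squared correlation, so the slopes agree with each other only in special cases, while the correlation coefficient never depends on which variable is labelled the outcome.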



                            • #15
                              Carlo Lazzaro
                              Thanks for providing the link. To summarize:

                              1) One cannot have collinearity without correlation

                              2) But the presence of correlation does not always indicate the presence of collinearity. That is why, as seen in some articles, correlation analysis (Pearson coefficients) is followed by tests such as VIF to check for collinearity.
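As a rough numerical illustration of the summary in #15, here is a Python sketch (not Stata code) for the special case of a model with exactly two predictors. In that case the variance inflation factor of either predictor is 1 / (1 - r^2), where r is their Pearson correlation. The function name `vif_two_predictors` and the chosen values of r are assumptions made for this example.

```python
def vif_two_predictors(r):
    """VIF of either predictor in a model with exactly two predictors.

    r is the Pearson correlation between the two predictors.
    Assumption: no other predictors are in the model.
    """
    if abs(r) >= 1.0:
        raise ValueError("perfect collinearity: VIF is undefined")
    return 1.0 / (1.0 - r * r)

# Illustrative correlations between the two predictors.
vifs = {r: vif_two_predictors(r) for r in (0.0, 0.3, 0.5, 0.9, 0.99)}

for r, v in vifs.items():
    print(f"r = {r:4.2f}  VIF = {v:7.2f}")
```

A moderate correlation such as 0.5 inflates the variance only by a factor of about 1.33, whereas 0.99 gives a factor of about 50. With more than two predictors, the VIF is based on the R-squared from regressing one predictor on all the others, so it can be large even when every pairwise correlation looks moderate, which is one reason a correlation matrix is often followed by a VIF check (in Stata, `estat vif` after `regress`).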
