Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel Data- Correlation

    Greetings,

    I have a panel dataset of 522,000 firms operating in 70 countries, and the timeline in question ranges from 2000 to 2014, where the FIRMID is my panel variable and Year my time variable.
    I want to analyze the relationship between two variables ( say, X (firm-level variable) and Y(country-level variable)). However, before starting with my analysis I want to see the whether in the cross-section Y is correlated to X. I have taken the average of X in each country per year, and also the average of Y per country, per year. However, i am not quite sure how to proceed. As far as I know, I can not use pwcorr, since I have a panel dataset.
    What would be the appropriate approach to this problem ?

    My dependent variable: log(X)
    Independent variable: Y
    This is what I have done:
    Code:
    bys country year: egen average_X= mean(X)
    gen logAverage_X=ln(average_X)]
    
    bys country year: egen average_Y=mean(Y)
    Any help and comments would be greatly appreciated! I could use some enlightenment.
    Thank you in advance for your time!
    Last edited by Daniela Fuji; 16 Jul 2017, 09:24.

  • #2
    I think you are not getting a quick answer here because it's not clear what you mean by "in cross section." The usual definition of "cross section" would mean a correlation between X and Y across all available firms within a given country and year. You could also ignore country and just do a within year correlation. If you mean that, I can't see why you would want to aggregate the data. I'm assuming the data are in long form. If so, bys country year: corr Y X would give you a cross sectional correlation for each country/year combination for which you have data. But your question seems to imply that you want to compare aggregate correlations across, what, countries, years, both? If that is the case, you may need to think about reshaping the data into wide from. But I can't tell.what you mean by "cross section" so you need to explain it a bit more.
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago

    Comment


    • #3
      Dear Sir Campbell,

      Thank you for your reply!
      I do understand your confusion, and I apologize for it. I am new to the terminology and Stata as well.

      Yes, my data is in long format.
      I aggregate the data because I want to have both country-level variables, and want to see how do the two correlate across countries and years.
      e.g., the average Investment at the level of countries per year, and look at the correlation, whether inflation is correlated with the average size of investment.


      Does that make sense?
      Thank you again!

      Comment


      • #4
        If you can give us some more information about how your data are currently arranged it would be easier to give you further guidance. You can use the dataex routine, available from the SSC archive to help with this. I am guessing your data are currently in long form with one line of data per firm per year with a country id on the record. But that may not be correct, and without more information it is hard to give you good advice.
        Richard T. Campbell
        Emeritus Professor of Biostatistics and Sociology
        University of Illinois at Chicago

        Comment


        • #5
          Hello everyone,

          I am new to the forum and I also have a problem related to panel data and correlation.
          My dataset contains among others variables such as IVOL and stock-related characteristics (such as age) of different stocks per month. I would like to generate two variables containing first the monthly (rank) correlation of IVOL with age and second its respective significance level. I have been able to generate a variable containing the correlation coefficient per month with the following command:

          statsby "corr IVOL age" corr = r(rho), by(month_id)

          However, if I try the same command with the significance level, that does not work. If I use the following command instead:

          by month_id: pwcorr IVOL age, sig star(.01)

          that somehow works, but does not generate a new variable.

          My overall goal is to report the share of months for which the (rank) correlation of IVOL and age is significant at the 1% level.

          Any help is highly appreciated. Many thanks in advance.

          Kind regards,
          Guest


          Last edited by sladmin; 24 Jul 2017, 10:07. Reason: anonymize poster

          Comment


          • #6
            You need to show us the output. Telling us something "doesn't work" doesn't provide the reader with enough information to help you.
            Richard T. Campbell
            Emeritus Professor of Biostatistics and Sociology
            University of Illinois at Chicago

            Comment


            • #7
              Excuse me for the inconveniences caused and thank you for your reply.

              I have data in following format:

              _id dscd IVOL size
              348 255956 4 25
              348 313401 14 8
              348 329138 21 2
              348 501264 23 11
              349 501300 19 3
              349 510056 11 17
              349 510057 3 13
              349 255956 4 10
              ...

              Now I would like to calculate for each _id the correlation between IVOL and size and report the results in a new variable. I have been able to do this with the following command:

              statsby "corr IVOL size" corr = r(rho), by(_id)

              However, I am also interested in the respective significance level of the correlation coefficient in order to find out share of _ids for which the correlation is significant at
              the 1% level.

              If i use statsby "corr IVOL size" corr = r(sig), by(month_id)
              - that indicates an empty variable for each observation, such as:

              _id corr
              348 -
              349 -
              350 -
              351 -
              352 -
              353 -

              I know that one can report significance level with the -pwcorr- command, however, that can not be combined with by() - which I need as I have to report by _id. I am also not eager to calculate the significance level manually.

              I hope my problem is a bit clearer now, sorry again for the inconveniences.

              Regards,
              Guest
              Last edited by sladmin; 24 Jul 2017, 10:07. Reason: anonymize poster

              Comment

              Working...
              X