
  • Descriptive Statistics and Correlation Matrix - Panel Data

    Hi all,

    For my thesis, I want to show the correlation between my dependent and independent variables, including several control variables (as in the screenshot below, taken from a comparable study).
    I use an unbalanced panel data set with 15 companies over the past 15 years.

    As I am rather new to Stata, which steps and tests do I have to perform to achieve that? With this analysis, I only want to show the correlation between the variables, without any regression analysis.

    Thank you
    Alex




  • #2
    Alexander:
    welcome to this forum.
    As you might have already read in the FAQ, posting screenshots is discouraged on this forum (because they cannot be elaborated on). On top of that, your screenshot was not attached.
    That said, your query is a bit vague, as correlation estimates can differ considerably depending on the notion of correlation you have in mind.
    The following toy-example can hopefully give you some hints:
    Code:
    . use http://www.stata-press.com/data/r15/invest2
    
    . xtgls invest market stock
    
    Cross-sectional time-series FGLS regression
    
    Coefficients:  generalized least squares
    Panels:        homoskedastic
    Correlation:   no autocorrelation
    
    Estimated covariances      =         1          Number of obs     =        100
    Estimated autocorrelations =         0          Number of groups  =          5
    Estimated coefficients     =         3          Time periods      =         20
                                                    Wald chi2(2)      =     352.19
    Log likelihood             = -624.9928          Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
          invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          market |   .1050854   .0112059     9.38   0.000     .0831223    .1270485
           stock |   .3053655   .0428502     7.13   0.000     .2213806    .3893504
           _cons |  -48.02974   21.15551    -2.27   0.023    -89.49377   -6.565701
    ------------------------------------------------------------------------------
    
    . estat vce, corr
    
    Correlation matrix of coefficients of xtgls model
    
            e(V) |   market     stock     _cons
    -------------+------------------------------
          market |   1.0000                    
           stock |  -0.6116    1.0000          
           _cons |  -0.6329   -0.0074    1.0000
    
    . pwcorr invest market stock
    
                 |   invest   market    stock
    -------------+---------------------------
          invest |   1.0000
          market |   0.8164   1.0000
           stock |   0.7644   0.6116   1.0000
    
    .
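    Hence, much depends on which notion of correlation you're interested in: -estat vce, corr- reports the correlation between the estimated coefficients, whereas -pwcorr- reports the pairwise correlations between the variables themselves.
    Unsolicited advice: you should get familiar with the most frequently used Stata commands (see the .pdf manual that comes with Stata).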

    Last edited by Carlo Lazzaro; 08 Apr 2019, 04:12.
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Carlo Lazzaro Thank you for the fast reply and the warm welcome to this forum.

      I have used the time to get further into the topic and to familiarize myself with the most frequently used Stata commands.

      To clarify what I want to achieve with my thesis: I want to see (in a simplified version) whether there is a correlation between the GDP development of a country and the development in size and current ratio of companies in this country over time. According to my research, companies adapt over time to external factors, and I want to examine that for the aforementioned variables.

      As I want to investigate the effects over time, I declared the data set as panel data.

      As suggested I ran the following command:

      Code:
      pwcorr gdp size cr
      
                   |      gdp     size       cr
      -------------+---------------------------
               gdp |   1.0000
              size |   0.0191   1.0000
                cr |  -0.0600  -0.1297   1.0000
      However, my question is whether "pwcorr" takes the development of the variables over time into account, i.e. if gdp increases by 1 unit, size and cr increase by x units. Is this command also applicable to a panel data set?

      As already mentioned in my previous post, given the small number of companies, I do not want to investigate causality but rather whether there is a correlation between the variables.

      Thank you
      Alexander
      Last edited by Alexander Cortolezis; 15 Apr 2019, 08:54.



      • #4
        Alexander:
        I do not think that -pwcorr- will serve your research goal.
        Hence, I would recommend considering panel data regression (which does not necessarily imply causation).
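
        One way to see why pooled -pwcorr- ignores the time dimension is to split the correlation into a within-company and a between-company part. What follows is only a minimal sketch under assumptions: the panel identifier is taken to be comp_id, the time variable is given the hypothetical name year, and gdp, size and cr are the variables from #3.
        Code:
        * declare the panel (year is an assumed name for the time variable)
        xtset comp_id year

        * pooled correlation: every company-year is treated as an independent observation
        pwcorr gdp size cr, obs sig

        * within correlation: deviations from each company's own mean over time
        foreach v of varlist gdp size cr {
            egen double m_`v' = mean(`v'), by(comp_id)
            generate double w_`v' = `v' - m_`v'
        }
        pwcorr w_gdp w_size w_cr, obs sig

        * between correlation: one observation per company (its time average)
        preserve
        collapse (mean) gdp size cr, by(comp_id)
        pwcorr gdp size cr, obs sig
        restore
        The within part is the one that speaks to co-movement over time. The p-values from -pwcorr- still treat observations as independent, so they are best read as descriptive only. If gdp is measured at country level and all companies are in the same country, the between part for gdp carries little information.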
        Last edited by Carlo Lazzaro; 15 Apr 2019, 09:18.
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Carlo Lazzaro thank you for the answer!

          I tried to get an understanding of regression with panel data. As I understand it, autocorrelation and heteroskedasticity might affect the panel design. Hence, I conducted a Wooldridge test (included below) to test for autocorrelation and a Breusch-Pagan test (included below) to assess heteroskedasticity. Since the tests indicated that autocorrelation should be accounted for, I used -xtgls, force corr(ar1)- to fit the panel data regression (included below).

          Do you have any doubts about the approach? Moreover, as I am mainly interested in the descriptive statistics and correlation matrix of all variables, it is still unclear to me how to derive these from the panel data regression.


          Code:
          . xtserial emp_log_lag brent cr_lag acq_log_lag at_log_lag ceo_lag up_dep_lag, output
          
          Linear regression                               Number of obs     =         58
                                                          F(6, 10)          =       0.57
                                                          Prob > F          =     0.7458
                                                          R-squared         =     0.1206
                                                          Root MSE          =     .31616
          
                                         (Std. Err. adjusted for 11 clusters in comp_id)
          ------------------------------------------------------------------------------
          D.           |               Robust
           emp_log_lag |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 brent |
                   D1. |   .0009274   .0009177     1.01   0.336    -.0011175    .0029722
                       |
                cr_lag |
                   D1. |    .376262   .5338327     0.70   0.497    -.8131914    1.565715
                       |
           acq_log_lag |
                   D1. |   .0232451   .0162145     1.43   0.182    -.0128831    .0593734
                       |
            at_log_lag |
                   D1. |   .0015669   .0110593     0.14   0.890    -.0230748    .0262085
                       |
               ceo_lag |
                   D1. |    .013709   .0817998     0.17   0.870    -.1685523    .1959702
                       |
            up_dep_lag |
                   D1. |  -1.539699   1.318942    -1.17   0.270    -4.478485    1.399086
          ------------------------------------------------------------------------------
          
          Wooldridge test for autocorrelation in panel data
          H0: no first-order autocorrelation
              F(  1,       8) =    340.419
                     Prob > F =      0.0000
          Code:
          . hettest brent cr_lag acq_log_lag at_log_lag ceo_lag up_dep_lag
          
          Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
                   Ho: Constant variance
                   Variables: brent cr_lag acq_log_lag at_log_lag ceo_lag up_dep_lag
          
                   chi2(6)      =     5.79
                   Prob > chi2  =   0.4474
          Code:
          . xtgls emp_log_lag brent cr_lag acq_log_lag at_log_lag ceo_lag up_dep_lag, force corr(ar1)
          
          Cross-sectional time-series FGLS regression
          
          Coefficients:  generalized least squares
          Panels:        homoskedastic
          Correlation:   common AR(1) coefficient for all panels  (0.7983)
          
          Estimated covariances      =         1          Number of obs     =        100
          Estimated autocorrelations =         1          Number of groups  =         13
          Estimated coefficients     =         7          Obs per group:
                                                                        min =          2
                                                                        avg =   7.692308
                                                                        max =         13
                                                          Wald chi2(6)      =      33.02
                                                          Prob > chi2       =     0.0000
          
          ------------------------------------------------------------------------------
           emp_log_lag |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 brent |   .0028673   .0019264     1.49   0.137    -.0009083    .0066429
                cr_lag |   .0068248   .2040249     0.03   0.973    -.3930566    .4067063
           acq_log_lag |   .0281237   .0167628     1.68   0.093    -.0047309    .0609783
            at_log_lag |   .0357351   .0294845     1.21   0.226    -.0220534    .0935235
               ceo_lag |   .0408284   .1007383     0.41   0.685     -.156615    .2382719
            up_dep_lag |  -2.968454   .5930319    -5.01   0.000    -4.130775   -1.806133
                 _cons |   2.692214   .4747354     5.67   0.000      1.76175    3.622679
          ------------------------------------------------------------------------------
          
          .
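
          Note that -hettest- is a postestimation command for -regress- and tests whether the error variance depends on the listed variables; it does not directly test whether the error variance differs across panels. A minimal sketch of one alternative, a likelihood-ratio test comparing -xtgls- with and without panels(heteroskedastic), is given below on the invest2 toy data used in #2. It is an illustration under assumptions rather than a recommendation for the data above: the degrees of freedom are set by hand, and the test relies on the iterated GLS estimates being maximum likelihood estimates.
          Code:
          use http://www.stata-press.com/data/r15/invest2, clear

          * model allowing a different error variance for each panel (iterated GLS)
          xtgls invest market stock, igls panels(heteroskedastic)
          estimates store hetero

          * same model with a common error variance
          xtgls invest market stock
          estimates store homosk

          * extra variance parameters = number of panels - 1
          local df = e(N_g) - 1
          lrtest hetero homosk, df(`df')
          A small p-value would point towards panel-level heteroskedasticity, in which case panels(heteroskedastic) could be added to the -xtgls- call.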



          • #6
            Alexander:
            if you have panel data and you're interested in descriptive statistics and correlation, you may want to try something along the following lines:
            Code:
            . use "http://www.stata-press.com/data/r15/nlswork.dta"
            (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
            
            . xtreg ln_wage age i.race, re
            
            Random-effects GLS regression                   Number of obs     =     28,510
            Group variable: idcode                          Number of groups  =      4,710
            
            R-sq:                                           Obs per group:
                 within  = 0.1026                                         min =          1
                 between = 0.1032                                         avg =        6.1
                 overall = 0.0945                                         max =         15
            
                                                            Wald chi2(3)      =    3242.34
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     age |    .018534    .000331    55.99   0.000     .0178852    .0191828
                         |
                    race |
                  black  |  -.1209428   .0129079    -9.37   0.000    -.1462418   -.0956439
                  other  |   .0981941   .0538424     1.82   0.068    -.0073351    .2037233
                         |
                   _cons |    1.15423   .0118069    97.76   0.000     1.131089    1.177371
            -------------+----------------------------------------------------------------
                 sigma_u |  .36581626
                 sigma_e |  .30349389
                     rho |  .59231394   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            
            . estat vce, corr
            
            Correlation matrix of coefficients of xtreg model
            
                         |                  2.        3.         
                    e(V) |      age      race      race     _cons
            -------------+----------------------------------------
                     age |   1.0000                              
                  2.race |   0.0122    1.0000                    
                  3.race |  -0.0011    0.0680    1.0000          
                   _cons |  -0.8129   -0.3202   -0.0735    1.0000
            
            . xtsum ln_wage age race
            
            Variable         |      Mean   Std. Dev.       Min        Max |    Observations
            -----------------+--------------------------------------------+----------------
            ln_wage  overall |  1.674907   .4780935          0   5.263916 |     N =   28534
                     between |              .424569          0   3.912023 |     n =    4711
                     within  |               .29266  -.4077221    4.78367 | T-bar = 6.05689
                             |                                            |
            age      overall |  29.04511   6.700584         14         46 |     N =   28510
                     between |             5.485756         14         45 |     n =    4710
                     within  |              5.16945   14.79511   43.79511 | T-bar = 6.05308
                             |                                            |
            race     overall |  1.303392   .4822773          1          3 |     N =   28534
                     between |             .4862111          1          3 |     n =    4711
                     within  |                    0   1.303392   1.303392 | T-bar = 6.05689
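
            For a thesis-style table, pooled summary statistics and a correlation matrix with significance stars can be produced along the following lines. This is a minimal sketch on the same nlswork data; the choice of variables (ln_wage age tenure hours) is only illustrative, and the pooled correlations ignore the panel structure.
            Code:
            use "http://www.stata-press.com/data/r15/nlswork.dta", clear

            * pooled descriptive statistics, one row per variable
            tabstat ln_wage age tenure hours, statistics(n mean sd min max) columns(statistics)

            * pairwise correlations with the number of observations; stars mark p < 0.05
            pwcorr ln_wage age tenure hours, obs star(0.05)
            Together with -xtsum-, which adds the between and within standard deviations, this covers the usual descriptive section without any regression.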
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              Thanks a lot, Carlo Lazzaro, you have been a great help!
              Kind regards,
              Alexander
