
  • Autocorrelation Via Transforming Data from Panel to Time Series

    Dear all,

    I have a panel of immigration stocks in the US spanning 15 decadal census years (1870 through 2010) across 3,141 counties. What I'm aiming to do with it is the following:

    I want to see if immigration stocks in counties are serially correlated over time. To that end, I initially reshaped the dataset to wide format so that it resembles a time-series dataset: I ran the -greshape- command, which gave me 15 rows (the 15 decadal census years) and 3,141 columns (the immigration stock variable for each county). Now I need to generate a correlation coefficient for each pair of counties (4,931,370 pairs in total: 3,141*3,140/2) and then detect the counties whose immigration stocks are highly correlated. By high correlation I mean a positive correlation of 0.5 or above, significant at the 5% level. I have tried several scenarios with the -pwcorr- and -xcorr- commands, but they get me nowhere, for two reasons. First, I run into memory problems, since I'm using the SE version of Stata 15.1. Second, I would need to generate new variables indicating whether the correlation coefficients are above 0.5, which again gives me "memory errors."
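
    For reference, the pair count above works out as follows; it can be verified directly in Stata:

    Code:
    display %12.0fc 3141*3140/2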

    Here's an excerpt from my ORIGINAL (not reshaped) dataset, and the command that I have run to reshape it:



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float year double(state_county_code_1990 imm_stock)
    1870 10010    .015000000341773576
    1870 10030                   .098
    1870 10050     .10094674141131689
    1870 10070    .013642883346648886
    1870 10090    .006081119205802678
    1870 10110     .05491357941209572
    1870 10130      .0579797865152359
    1870 10150     .05041185797285288
    1870 10170     .04302016248682049
    1870 10190    .017300943067763003
    1870 10210    .018799061596393586
    1870 10230    .026000000132214436
    1870 10250     .03900000011988913
    1870 10270    .005811569597135531
    1870 10290    .007615265620523132
    1870 10310   .0010011623820755631
    1870 10330     .05905699549615384
    1870 10350    .013584035953506827
    1870 10370   .0030009013253147715
    1870 10390 .000017211103811860084
    1870 10410   .0032932473340188157
    1870 10430     .00757321541942656
    1870 10450   .0026323572993278504
    1870 10470     .31306538736820216
    1870 10490    .017295631527900694
    1870 10510     .04100000000000001
    1870 10530    .028415964488862726
    1870 10550     .02841466610506177
    1870 10570                   .002
    1870 10590    .011000000000000001
    1870 10610   .0017494181394577027
    1870 10630     .05700000057400152
    1870 10650     .09286468018963934
    1870 10670    .002873641550540924
    1870 10690   .0027434206455945968
    1870 10710    .020009471569674586
    1870 10730     .02951887506339699
    1870 10750                   .008
    1870 10770    .047999999999999994
    1870 10790     .03094300425052643
    1870 10810    .038985324926674364
    1870 10830                   .031
    1870 10850      .0570013605214615
    1870 10870     .03207139539450873
    1870 10890     .19400000008748808
    1870 10910                   .086
    1870 10930                   .003
    1870 10950    .009902857542037964
    1870 10970      4.144895888964645
    1870 10990    .020999999999999998
    1870 11010      .7530976727786474
    1870 11030     .07640065670013428
    1870 11050     .07513531780242921
    1870 11070                   .022
    1870 11090     .01673010419635102
    1870 11110     .00897827250233968
    1870 11130     .04495203609474993
    1870 11150   .0065828300644643605
    1870 11170     .06386433302302612
    1870 11190     .07800000058050402
    1870 11210    .061189926147460945
    1870 11230     .00697234958410263
    1870 11250      .2155970461186953
    1870 11270   .0005691434796899558
    1870 11290    .017104084895690902
    1870 11310                   .056
    1870 11330   .0013931106328964233
    1870 40010     .06772000819444657
    1870 40030     .11148425900936128
    1870 40050     .11261727261543275
    1870 40070     .02894948261976242
    1870 40090    .061296866863965994
    1870 40110    .014875692646950484
    1870 40120     .05443518477678299
    1870 40130     .09173458960652352
    1870 40150     .05572529421746731
    1870 40170     .06012048554420472
    1870 40190     .16472811722755432
    1870 40210     .08248884278535844
    1870 40230    .022195598095655442
    1870 40250     .04906348389387131
    1870 40270     .06656481522321701
    1870 50010     .10013019472360611
    1870 50030    .036000000000000004
    1870 50050    .002128363482654095
    1870 50070     .03094930160045624
    1870 50090                   .008
    1870 50110   .0070616710782051085
    1870 50130                   .003
    1870 50150    .006026492448174395
    1870 50170     .08018668889999389
    1870 50190     .00898306369781494
    1870 50210    .004427015572786332
    1870 50230    .003855020858347416
    1870 50250    .030786989554762837
    1870 50270    .010470308661460877
    1870 50290    .019335971677326598
    1870 50310                   .005
    1870 50330     .07560726058250292
    1870 50350      .0196459524193433
    end

    Code:
    greshape wide imm_stock, i(year) j(state_county_code_1990)


    All comments and suggestions are highly appreciated!

    Wolfgang.

  • #2
    Given the way the -corr- and -pwcorr- commands work, I understand why your instinct was to go to a wide layout. But, as you have seen, that approach fails: with this many pairs, limited memory makes it impossible to create a correlation variable for each pair. You can, however, do this in long layout using a different approach. Starting with your un-reshaped data set:

    Code:
    tempfile copy
    save `copy'
    rangejoin state_county_code_1990 1 . using `copy', by(year)
    rangestat (corr) imm_stock imm_stock_U, ///
        by(state_county_code_1990 state_county_code_1990_U) interval(year . .)
    gen t_stat = corr_x*sqrt(corr_nobs-2)/sqrt(1-corr_x^2)
    gen p_value = 2*ttail(n_obs-2, abs(t_stat))
    gen byte select = abs(corr_x) > 0.5 & p_value < 0.05
    should get you what you want.

    Notes:
    1. I am assuming you want to filter your correlations on a two-tailed basis. That is, you are just as interested in a correlation of -0.7 as you are in a correlation of +0.7.
    2. -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
    3. -rangejoin- is written by Robert Picard, and is also available from SSC. Use of -rangejoin- requires installation of -rangestat-.
    4. Because you provide only one year's data in the example, each county-pair has only a single observation, and therefore no correlations can be calculated. Presumably this is not the case in your real data set. But it does mean that this code is untested. Beware of typos or other errors.
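
    For what it's worth, the t statistic in the code is the usual one for testing a zero correlation: r*sqrt(n-2)/sqrt(1-r^2), which under the null hypothesis follows a t distribution with n-2 degrees of freedom, n being the number of paired observations. Once you have multiple years per county, you can spot-check a single pair against Stata's built-in test, starting again from your un-reshaped data, with something along these lines (the two county codes here are just placeholders taken from your excerpt):

    Code:
    preserve
    keep if inlist(state_county_code_1990, 10010, 10030)
    keep year state_county_code_1990 imm_stock
    reshape wide imm_stock, i(year) j(state_county_code_1990)
    * the significance level from -pwcorr, sig- should match p_value for this pair
    pwcorr imm_stock10010 imm_stock10030, obs sig
    restore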



    • #3
      Clyde,

      Thank you so much for your response! I got errors when computing the p-values and generating the variable "select": inside the parentheses of the p-value line, "n_obs-2" should have been "corr_nobs-2". Other than that, your code worked perfectly, and in fact it got me exactly what I wanted.

      Notes:
      1. Yes, indeed, I needed two-tailed tests, and to tease out the correlations of 0.5 and above that are significant at the 5% level.
      2. Even though I already had both -rangestat- and -rangejoin- installed (after reading similar posts here), several attempts with them did not get me what I wanted. I put that down to not knowing how to use those commands properly.
      3. I was able to provide only one year's data as an example because, as I understood it, -dataex- is limited to 100 observations, and those 100 observations happened to fall entirely in one year, 1870. Still, I have 15 decadal census years, and your suggested code did an excellent job of solving my task.

      Thanks again for your great help!

      Wolfgang.



      • #4
        Sorry about that n_obs-2 error. Glad you recognized it should have been corr_nobs-2. And glad it all worked to get you what you wanted.

        -dataex- produces 100 observations by default. But that is not a limit. You can specify however many observations you want by adding the -count(#)- option to your -dataex- command; # can be more or less than 100.
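
        For example, since there are 3,141 counties per census year, a count larger than that is needed before a second year appears; something like this would do it (6,500 is an arbitrary number, roughly two census years' worth):

        Code:
        dataex year state_county_code_1990 imm_stock, count(6500)

        In practice you would probably combine -count()- with an -if- restriction to a handful of counties, so the posted example stays small but still spans all 15 years.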



        • #5
          Clyde,

          Thanks for the great tip on -dataex-. I will keep that in mind for posting in the future.

          Wolfgang.

