Autocorrelation Via Transforming Data from Panel to Time Series

Wolfgang Stein

Join Date: Jun 2022
Posts: 8

Autocorrelation Via Transforming Data from Panel to Time Series

03 Jan 2023, 19:55

Dear all,

I have a panel of immigration stocks in the US spanning over 15 decadal census years (from 1870 through 2010) across 3,141 counties. What I'm aiming to do with it is the following:

I want to see if immigration stocks in counties are serially correlated over time. For this purpose, I have initially reshaped the dataset to the wide format so that it resembles a time-series dataset. In this regard, I have run the -greshape- command, which has given me a format with 15 rows (corresponding to decadal 15 census years) and 3,141 columns (immigration stock variable for each county). Now, I need to generate a correlation coefficient for each pair of counties (in total, I will have 4,931,370 pairs: 3,141*3,140 / 2). Afterwards, I need to detect those counties with immigration stocks that are highly correlated. High correlation in my case refers to a positive correlation of 0.5 and above at 5% level of significance. I have tried several scenarios with -pwcorr- and -xcorr- commands, but it won't take me anywhere due to the following. First, I'm running into memory problems, since I'm using the SE version of Stata 15.1. Second, I will need to generate a new variable(s) indicating whether correlation coefficients are above 0.5, which again would give me "memory errors."

Here's an excerpt from my ORIGINAL (not reshaped) dataset, and the command that I have run to reshape it:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float year double(state_county_code_1990 imm_stock)
1870 10010    .015000000341773576
1870 10030                   .098
1870 10050     .10094674141131689
1870 10070    .013642883346648886
1870 10090    .006081119205802678
1870 10110     .05491357941209572
1870 10130      .0579797865152359
1870 10150     .05041185797285288
1870 10170     .04302016248682049
1870 10190    .017300943067763003
1870 10210    .018799061596393586
1870 10230    .026000000132214436
1870 10250     .03900000011988913
1870 10270    .005811569597135531
1870 10290    .007615265620523132
1870 10310   .0010011623820755631
1870 10330     .05905699549615384
1870 10350    .013584035953506827
1870 10370   .0030009013253147715
1870 10390 .000017211103811860084
1870 10410   .0032932473340188157
1870 10430     .00757321541942656
1870 10450   .0026323572993278504
1870 10470     .31306538736820216
1870 10490    .017295631527900694
1870 10510     .04100000000000001
1870 10530    .028415964488862726
1870 10550     .02841466610506177
1870 10570                   .002
1870 10590    .011000000000000001
1870 10610   .0017494181394577027
1870 10630     .05700000057400152
1870 10650     .09286468018963934
1870 10670    .002873641550540924
1870 10690   .0027434206455945968
1870 10710    .020009471569674586
1870 10730     .02951887506339699
1870 10750                   .008
1870 10770    .047999999999999994
1870 10790     .03094300425052643
1870 10810    .038985324926674364
1870 10830                   .031
1870 10850      .0570013605214615
1870 10870     .03207139539450873
1870 10890     .19400000008748808
1870 10910                   .086
1870 10930                   .003
1870 10950    .009902857542037964
1870 10970      4.144895888964645
1870 10990    .020999999999999998
1870 11010      .7530976727786474
1870 11030     .07640065670013428
1870 11050     .07513531780242921
1870 11070                   .022
1870 11090     .01673010419635102
1870 11110     .00897827250233968
1870 11130     .04495203609474993
1870 11150   .0065828300644643605
1870 11170     .06386433302302612
1870 11190     .07800000058050402
1870 11210    .061189926147460945
1870 11230     .00697234958410263
1870 11250      .2155970461186953
1870 11270   .0005691434796899558
1870 11290    .017104084895690902
1870 11310                   .056
1870 11330   .0013931106328964233
1870 40010     .06772000819444657
1870 40030     .11148425900936128
1870 40050     .11261727261543275
1870 40070     .02894948261976242
1870 40090    .061296866863965994
1870 40110    .014875692646950484
1870 40120     .05443518477678299
1870 40130     .09173458960652352
1870 40150     .05572529421746731
1870 40170     .06012048554420472
1870 40190     .16472811722755432
1870 40210     .08248884278535844
1870 40230    .022195598095655442
1870 40250     .04906348389387131
1870 40270     .06656481522321701
1870 50010     .10013019472360611
1870 50030    .036000000000000004
1870 50050    .002128363482654095
1870 50070     .03094930160045624
1870 50090                   .008
1870 50110   .0070616710782051085
1870 50130                   .003
1870 50150    .006026492448174395
1870 50170     .08018668889999389
1870 50190     .00898306369781494
1870 50210    .004427015572786332
1870 50230    .003855020858347416
1870 50250    .030786989554762837
1870 50270    .010470308661460877
1870 50290    .019335971677326598
1870 50310                   .005
1870 50330     .07560726058250292
1870 50350      .0196459524193433
end

Code:

greshape wide imm_stock, i(year) j(state_county_code_1990)

All comments and suggestions are highly appreciated!

Wolfgang.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#2

03 Jan 2023, 21:18

Given the way the -corr- and -pwcorr- commands work, I understand why your instinct was to go to a wide layout. But, as you have seen, this approach fails due to limited memory in the face of the large number of pairs leading to the impossibility of creating a correlation variable for each pair. But you can do this in long layout using a different approach. Starting with your un-reshaped data set:

Code:

tempfile copy save `copy' rangejoin state_county_code_1990 1 . using `copy', by(year) rangestat (corr) imm_stock imm_stock_U, /// by(state_county_code_1990 state_county_code_1990_U) interval(year . .) gen t_stat = corr_x*sqrt(corr_nobs-2)/sqrt(1-corr_x^2) gen p_value = 2*ttail(n_obs-2, abs(t_stat)) gen byte select =abs(corr_x) > 0.5 & p_value < 0.05

should get you what you want.

Notes:
1. I am assuming you want to filter your correlations on the basis of two-tails. That is, you are just as interested in a correlation of -.7 as you are in a correlation of + .7.
2. -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
3. -rangejoin- is written by Robert Picard, and is also available from SSC. Use of -rangejoin- requires installation of -rangestat-.
4. Because you provide only one year's data in the example, each county-pair has only a single observation, and therefore no correlations can be calculated. Presumably this is not the case in your real data set. But it does mean that this code is untested. Beware of typos or other errors.
Comment
Wolfgang Stein

Join Date: Jun 2022

Posts: 8
#3

04 Jan 2023, 15:26

Clyde,

Thank you so much for your response! I got two errors in computing p-values and generating the variable named "select," which was because instead of "n_obs-2," it should have been "corr_nobs-2" inside the parentheses in computing p-values. Other than that, your codes worked perfectly, and in fact, they got me what I wanted.

Notes:
1. Yes, indeed, I needed to compute correlations on the basis of two-tails, and tease out the ones taking values 0.5 and above at 5% level of significance.
2. Even though I had both -rangestat- and -rangejoin- installed in Stata, which I installed upon reading similar posts here, I didn't obtain what I wanted despite several attempts using them. I owe this to me not knowing how to utilize those commands.
3. I was able to provide only one year's data as an example, since it's my understanding that the number of observations based on -dataex- command is limited to only 100, and those 100 observations happened to correspond to only one year, 1870. Still, I have 15 decadal census years, and your suggested codes did an excellent job in giving me a solution for my task.

Thanks again for your great help!

Wolfgang.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#4

04 Jan 2023, 15:34

Sorry about that n_obs-2 error. Glad you recognized it should have been corr_nobs-2. And glad it all worked to get you what you wanted.

-dataex- produces 100 observations by default. But that is not a limit. You can specify however many observations you want by adding the -count(#)- observation to your -dataex- command. # can be more or less than 100.
Comment
Wolfgang Stein

Join Date: Jun 2022

Posts: 8
#5

04 Jan 2023, 16:30

Clyde,

Thanks for the great tip on -dataex-. I will keep that in mind for posting in the future.

Wolfgang.
Comment

Announcement

Autocorrelation Via Transforming Data from Panel to Time Series

Comment

Comment

Comment

Comment