Dear all,
I have a panel of immigration stocks in the US covering 15 decadal census years (1870 through 2010) and 3,141 counties. Here is what I am trying to do:
I want to see if immigration stocks in counties are serially correlated over time. To that end, I first reshaped the dataset to wide format so that it resembles a time-series dataset. I used the -greshape- command, which gave me 15 rows (one per decadal census year) and 3,141 columns (one immigration stock variable per county). Next, I need a correlation coefficient for each pair of counties (4,931,370 pairs in total: 3,141*3,140 / 2). After that, I need to identify the counties whose immigration stocks are highly correlated. By high correlation I mean a positive correlation of 0.5 or above that is significant at the 5% level. I have tried several approaches with -pwcorr- and -xcorr-, but neither gets me anywhere, for two reasons. First, I run into memory problems, since I am using the SE version of Stata 15.1. Second, I would need to generate one or more new variables indicating whether each correlation coefficient is above 0.5, which again gives me "memory errors."
Here's an excerpt from my ORIGINAL (not reshaped) dataset, and the command that I have run to reshape it:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year double(state_county_code_1990 imm_stock)
1870 10010 .015000000341773576
1870 10030 .098
1870 10050 .10094674141131689
1870 10070 .013642883346648886
1870 10090 .006081119205802678
1870 10110 .05491357941209572
1870 10130 .0579797865152359
1870 10150 .05041185797285288
1870 10170 .04302016248682049
1870 10190 .017300943067763003
1870 10210 .018799061596393586
1870 10230 .026000000132214436
1870 10250 .03900000011988913
1870 10270 .005811569597135531
1870 10290 .007615265620523132
1870 10310 .0010011623820755631
1870 10330 .05905699549615384
1870 10350 .013584035953506827
1870 10370 .0030009013253147715
1870 10390 .000017211103811860084
1870 10410 .0032932473340188157
1870 10430 .00757321541942656
1870 10450 .0026323572993278504
1870 10470 .31306538736820216
1870 10490 .017295631527900694
1870 10510 .04100000000000001
1870 10530 .028415964488862726
1870 10550 .02841466610506177
1870 10570 .002
1870 10590 .011000000000000001
1870 10610 .0017494181394577027
1870 10630 .05700000057400152
1870 10650 .09286468018963934
1870 10670 .002873641550540924
1870 10690 .0027434206455945968
1870 10710 .020009471569674586
1870 10730 .02951887506339699
1870 10750 .008
1870 10770 .047999999999999994
1870 10790 .03094300425052643
1870 10810 .038985324926674364
1870 10830 .031
1870 10850 .0570013605214615
1870 10870 .03207139539450873
1870 10890 .19400000008748808
1870 10910 .086
1870 10930 .003
1870 10950 .009902857542037964
1870 10970 4.144895888964645
1870 10990 .020999999999999998
1870 11010 .7530976727786474
1870 11030 .07640065670013428
1870 11050 .07513531780242921
1870 11070 .022
1870 11090 .01673010419635102
1870 11110 .00897827250233968
1870 11130 .04495203609474993
1870 11150 .0065828300644643605
1870 11170 .06386433302302612
1870 11190 .07800000058050402
1870 11210 .061189926147460945
1870 11230 .00697234958410263
1870 11250 .2155970461186953
1870 11270 .0005691434796899558
1870 11290 .017104084895690902
1870 11310 .056
1870 11330 .0013931106328964233
1870 40010 .06772000819444657
1870 40030 .11148425900936128
1870 40050 .11261727261543275
1870 40070 .02894948261976242
1870 40090 .061296866863965994
1870 40110 .014875692646950484
1870 40120 .05443518477678299
1870 40130 .09173458960652352
1870 40150 .05572529421746731
1870 40170 .06012048554420472
1870 40190 .16472811722755432
1870 40210 .08248884278535844
1870 40230 .022195598095655442
1870 40250 .04906348389387131
1870 40270 .06656481522321701
1870 50010 .10013019472360611
1870 50030 .036000000000000004
1870 50050 .002128363482654095
1870 50070 .03094930160045624
1870 50090 .008
1870 50110 .0070616710782051085
1870 50130 .003
1870 50150 .006026492448174395
1870 50170 .08018668889999389
1870 50190 .00898306369781494
1870 50210 .004427015572786332
1870 50230 .003855020858347416
1870 50250 .030786989554762837
1870 50270 .010470308661460877
1870 50290 .019335971677326598
1870 50310 .005
1870 50330 .07560726058250292
1870 50350 .0196459524193433
end
Code:
greshape wide imm_stock, i(year) j(state_county_code_1990)
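For concreteness, here is a minimal Mata sketch of the calculation described above, offered as an illustration rather than a tested solution. It computes the full correlation matrix in one pass instead of calling -pwcorr- pair by pair, which is one possible way around creating millions of new variables. It rests on several assumptions that are not from the post: the wide variables are named imm_stock<code>, there are no missing values (Mata's correlation() drops any row containing a missing value), and "significant at 5%" means a two-sided t test on Pearson's r with n - 2 degrees of freedom. The names vars, X, R, H, cut, and pairs are hypothetical.

```stata
* Assumes the wide data created by -greshape- are in memory, with one
* variable per county named imm_stock<code>, and no missing values.
unab vars : imm_stock*

mata:
    names = tokens(st_local("vars"))        // county variable names
    X     = st_data(., names)               // years in rows, counties in columns
    n     = rows(X)
    K     = cols(X)
    R     = correlation(X)                  // K x K Pearson correlation matrix

    // Smallest r that is significant at 5% (two-sided) with n observations,
    // obtained by inverting t = r*sqrt((n-2)/(1-r^2)).
    tcrit = invttail(n - 2, 0.025)
    rcrit = tcrit / sqrt(tcrit^2 + n - 2)
    cut   = max((0.5, rcrit))               // both conditions at once

    // Flag qualifying pairs. Missing compares as larger than any number
    // in Mata, so missing correlations are excluded explicitly.
    H = uppertriangle((R :>= cut) :& (R :< .))
    _diag(H, 0)                             // drop each county's pair with itself

    // Convert column-major positions of the flags back to (row, column).
    idx   = selectindex(vec(H))
    ci    = ceil(idx :/ K)
    ri    = idx :- (ci :- 1) :* K
    r     = vec(R)
    pairs = (ri, ci, r[idx])                // county index i, county index j, r

    printf("%g of %g pairs have r >= %5.3f\n", rows(pairs), K*(K-1)/2, cut)
end
```

The first two columns of pairs index into names, so names[pairs[1,1]] and names[pairs[1,2]] would give the two counties in the first flagged pair. Note that with only 15 observations the 5% critical value of r is roughly 0.514, so under these assumptions the significance condition is slightly stricter than the 0.5 threshold and cut ends up being the critical value.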
All comments and suggestions are highly appreciated!
Wolfgang.