
  • Import indented text file

    Hello,
    I am trying to import the following indented text file. Below is how I'd like the data to look in Stata eventually. It's very tricky because the file does not have any proper delimiters. Do you have any suggestions? Thanks!
    02 013 Aleutians East Borough T Ak 145 280
    02 016 Aleutians West Ak 22 15.17 32 11.43
    02 020 Anchorage Borough Ak 18 12.41 43 15.36
    Same State 11 7.59 29 10.36
    Same Region, Diff. State 82 56.55 151 53.93
    Different Region 12 8.28 25 8.93
    02 013 County Non-Migrants 416 1002
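Whatever parsing comes later, a first step is to read each line of the file whole into a single string variable. The -dataex- example in #3 starts from exactly that (v1). Below is a minimal sketch, assuming no line is longer than 80 characters. The temporary file and its three lines are stand-ins for the real file, whose name is not given in this post.

```stata
* Stand-in for the real file: write three sample lines to a temporary file
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' "  02 016 Aleutians West           Ak        22  15.17        32  11.43" _n
file write `fh' "   Same State                               11   7.59        29  10.36" _n
file write `fh' "      02 013   County Non-Migrants         416             1002" _n
file close `fh'

* Read each whole line (columns 1-80) into one string variable
infix str v1 1-80 using "`raw'", clear

* Trim the ends and collapse runs of blanks so the line can be split on single spaces
replace v1 = strtrim(stritrim(v1))
list, noobs
```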

  • #2

    02 020 Anchorage Borough
    Same State
    Same Region, Diff. State
    Different Region
    02 013 County Non-Migrants

    You could fix the misalignment in the first two columns, which arises because some observations lack the two sets of leading digits. However, there is no straightforward way of knowing that the 280 belongs to the 7th column as opposed to the 6th or 8th. Unless the spaces are themselves informative, you will need manual input.
    Last edited by Andrew Musau; 26 Jul 2022, 00:40.



    • #3
      Looking at the data closely, there may be scope to achieve what you want.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str70 v1
      "02 013   Aleutians East Borough T Ak       145              280"      
      "  02 016 Aleutians West           Ak        22  15.17        32  11.43"
      "  02 020 Anchorage Borough        Ak        18  12.41        43  15.36"
      "   Same State                               11   7.59        29  10.36"
      "   Same Region, Diff. State                 82  56.55       151  53.93"
      "   Different Region                         12   8.28        25   8.93"
      "      02 013   County Non-Migrants         416             1002"      
      "02 016   Aleutians West Total Mig Ak       304              535"      
      "  53 033 King                     Wa        41  13.49        61  11.40"
      "  02 020 Anchorage Borough        Ak        21   6.91        41   7.66"
      "  53 053 Pierce                   Wa        16   5.26        22   4.11"
      "   Same State                               23   7.57        47   8.79"
      "   Same Region, Diff. State                151  49.67       272  50.84"
      "   Different Region                         52  17.11        92  17.20"
      "      02 016   County Non-Migrants         991             2185"      
      "02 020   Anchorage Borough Total  Ak      9510            18970"      
      "  02 170 Matanuska-Susitna Boroug Ak       453   4.76       884   4.66"
      "  02 090 Fairbanks North Star Bor Ak       433   4.55       870   4.59"
      "  02 122 Kenai Peninsula Borough  Ak       373   3.92       703   3.71"
      "  53 033 King                     Wa       309   3.25       544   2.87"
      "  57 001 Foreign / Overseas       FR       274   2.88       648   3.42"
      "  04 013 Maricopa                 Az       151   1.59       288   1.52"
      "  06 037 Los Angeles              Ca       151   1.59       287   1.51"
      "  02 261 Valdez-Cordova           Ak       138   1.45       275   1.45"
      "  02 110 Juneau Borough           Ak       125   1.31       216   1.14"
      "  02 150 Kodiak Island Borough    Ak       117   1.23       221   1.16"
      "  53 053 Pierce                   Wa       114   1.20       250   1.32"
      "  48 201 Harris                   Tx        96   1.01       230   1.21"
      "  02 050 Bethel                   Ak        96   1.01       179    .94"
      "  02 180 Nome                     Ak        85    .89       182    .96"
      "  06 073 San Diego                Ca        84    .88       146    .77"
      "  15 003 Honolulu                 Hi        81    .85       144    .76"
      "  53 061 Snohomish                Wa        80    .84       155    .82"
      "  02 188 Northwest Arctic         Ak        75    .79       164    .86"
      "  41 051 Multnomah                Or        71    .75       102    .54"
      "  08 041 El Paso                  Co        63    .66       169    .89"
      "  06 067 Sacramento               Ca        56    .59       114    .60"
      "  53 063 Spokane                  Wa        55    .58       109    .57"
      "  32 003 Clark                    Nv        55    .58       129    .68"
      "  04 019 Pima                     Az        54    .57        97    .51"
      "  06 059 Orange                   Ca        51    .54        81    .43"
      "  41 067 Washington               Or        51    .54       100    .53"
      "  02 185 North Slope Borough      Ak        49    .52        91    .48"
      "  41 039 Lane                     Or        48    .50        83    .44"
      "  06 085 Santa Clara              Ca        47    .49        83    .44"
      "  17 031 Cook                     Il        45    .47        70    .37"
      "  02 016 Aleutians West           Ak        45    .47        95    .50"
      "  48 029 Bexar                    Tx        44    .46        98    .52"
      "  02 070 Dillingham               Ak        43    .45        76    .40"
      "  37 051 Cumberland               NC        43    .45       113    .60"
      "  12 091 Okaloosa                 Fl        41    .43       118    .62"
      "  06 071 San Bernardino           Ca        40    .42        87    .46"
      "  27 053 Hennepin                 Mn        40    .42        55    .29"
      "  02 290 Yukon-Koyukuk            Ak        37    .39        66    .35"
      "  06 029 Kern                     Ca        36    .38        77    .41"
      "  35 001 Bernalillo               NM        35    .37        99    .52"
      "  02 130 Ketchikan Gateway Boroug Ak        35    .37        75    .40"
      "  06 001 Alameda                  Ca        34    .36        55    .29"
      "  48 439 Tarrant                  Tx        34    .36        78    .41"
      "  06 065 Riverside                Ca        34    .36        79    .42"
      "  53 011 Clark                    Wa        33    .35        77    .41"
      "  49 035 Salt Lake                Ut        33    .35        60    .32"
      "  48 113 Dallas                   Tx        33    .35        53    .28"
      "  08 031 Denver                   Co        31    .33        50    .26"
      "  12 025 Dade                     Fl        30    .32        69    .36"
      "  53 067 Thurston                 Wa        30    .32        67    .35"
      "  06 075 San Francisco            Ca        29    .30        41    .22"
      "  02 240 Southeast Fairbanks      Ak        29    .30        53    .28"
      "  48 027 Bell                     Tx        28    .29        82    .43"
      "  02 220 Sitka Borough            Ak        28    .29        61    .32"
      "  12 005 Bay                      Fl        28    .29        69    .36"
      "  32 031 Washoe                   Nv        27    .28        47    .25"
      "  53 035 Kitsap                   Wa        27    .28        59    .31"
      "  06 053 Monterey                 Ca        27    .28        61    .32"
      "  08 059 Jefferson                Co        26    .27        47    .25"
      "  41 047 Marion                   Or        25    .26        48    .25"
      "  41 005 Clackamas                Or        25    .26        44    .23"
      "  53 073 Whatcom                  Wa        25    .26        32    .17"
      "  41 029 Jackson                  Or        24    .25        48    .25"
      "  40 109 Oklahoma                 Ok        24    .25        82    .43"
      "  12 057 Hillsborough             Fl        24    .25        43    .23"
      "  35 035 Otero                    NM        23    .24        60    .32"
      "  30 063 Missoula                 Mt        23    .24        38    .20"
      "  13 215 Muscogee                 Ga        23    .24        57    .30"
      "  06 095 Solano                   Ca        23    .24        50    .26"
      "  08 005 Arapahoe                 Co        23    .24        52    .27"
      "  12 103 Pinellas                 Fl        22    .23        29    .15"
      "  49 049 Utah                     Ut        22    .23        58    .31"
      "  16 001 Ada                      Id        22    .23        39    .21"
      "  25 017 Middlesex                Ma        22    .23        44    .23"
      "  21 093 Hardin                   Ky        22    .23        60    .32"
      "  48 453 Travis                   Tx        21    .22        33    .17"
      "  51 059 Fairfax                  Va        21    .22        56    .30"
      "  06 111 Ventura                  Ca        20    .21        42    .22"
      "  06 081 San Mateo                Ca        20    .21        37    .20"
      "  53 077 Yakima                   Wa        20    .21        26    .14"
      "  30 029 Flathead                 Mt        19    .20        36    .19"
      "  30 111 Yellowstone              Mt        19    .20        31    .16"
      "  26 163 Wayne                    Mi        19    .20        26    .14"
      "  15 001 Hawaii                   Hi        19    .20        32    .17"
      end
      
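      * Trim the ends and collapse runs of blanks to single spaces
      replace v1= strtrim(stritrim(v1))
      * Rows without the two leading codes (e.g. "Same State") get 0000 placeholders
      replace v1= "0000 0000 " + v1 if !ustrregexm(v1, "(^\d+)")
      * Prefix "1:" to "4:" with r so that region labels stay in one token when words are joined below
      replace v1= ustrregexra(v1, "(1:|2:|3:|4:)", "r$1")
      * Rows ending in two counts (no percentages): add 0000 placeholders for the percentages
      replace v1= ustrregexra(v1, "(.*)\s(\d+)\s(\d+)$", "$1 $2 0000 $3 0000") if ustrregexm(v1, "(\d+\s\d+$)")
      * Join the words of each name with underscores (two passes, because matches cannot overlap)
      replace v1= ustrregexra(v1, "([a-zA-Z\,\.\/])\s([a-zA-Z])", "$1_$2",.)
      replace v1= ustrregexra(v1, "([a-zA-Z :])\s([\/a-zA-Z])", "$1_$2",.)
      * Split on single spaces into var1, var2, ...
      split v1, p(" ") g(var)
      keep var*
      * Pull a trailing two-letter state code (e.g. "_Ak") out of the name
      gen state= substr(var3, -2, 2) if ustrregexm(var3, "(\_[A-Z][A-Za-z]$)"), after(var3)
      * Put the spaces back in the name
      replace var3= ustrregexra(var3,"\_", " ")
      * Turn the 0000 placeholders back into blanks
      foreach var of varlist var1 var2 var5 var7 {
          replace `var'="" if `var'=="0000"
      }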
      Result:

      Code:
      . l, sep(0)
      
          +----------------------------------------------------------------------------------+
           | var1   var2                          var3   state   var4    var5    var6    var7 |
           |----------------------------------------------------------------------------------|
        1. |   02    013   Aleutians East Borough T Ak      Ak    145             280         |
        2. |   02    016             Aleutians West Ak      Ak     22   15.17      32   11.43 |
        3. |   02    020          Anchorage Borough Ak      Ak     18   12.41      43   15.36 |
        4. |                                Same State             11    7.59      29   10.36 |
        5. |                  Same Region, Diff. State             82   56.55     151   53.93 |
        6. |                          Different Region             12    8.28      25    8.93 |
        7. |   02    013           County Non-Migrants            416            1002         |
        8. |   02    016   Aleutians West Total Mig Ak      Ak    304             535         |
        9. |   53    033                       King Wa      Wa     41   13.49      61   11.40 |
       10. |   02    020          Anchorage Borough Ak      Ak     21    6.91      41    7.66 |
       11. |   53    053                     Pierce Wa      Wa     16    5.26      22    4.11 |
       12. |                                Same State             23    7.57      47    8.79 |
       13. |                  Same Region, Diff. State            151   49.67     272   50.84 |
       14. |                          Different Region             52   17.11      92   17.20 |
       15. |   02    016           County Non-Migrants            991            2185         |
       16. |   02    020    Anchorage Borough Total Ak      Ak   9510           18970         |
       17. |   02    170   Matanuska-Susitna Boroug Ak      Ak    453    4.76     884    4.66 |
       18. |   02    090   Fairbanks North Star Bor Ak      Ak    433    4.55     870    4.59 |
       19. |   02    122    Kenai Peninsula Borough Ak      Ak    373    3.92     703    3.71 |
       20. |   53    033                       King Wa      Wa    309    3.25     544    2.87 |
       21. |   57    001         Foreign / Overseas FR      FR    274    2.88     648    3.42 |
       22. |   04    013                   Maricopa Az      Az    151    1.59     288    1.52 |
       23. |   06    037                Los Angeles Ca      Ca    151    1.59     287    1.51 |
       24. |   02    261             Valdez-Cordova Ak      Ak    138    1.45     275    1.45 |
       25. |   02    110             Juneau Borough Ak      Ak    125    1.31     216    1.14 |
       26. |   02    150      Kodiak Island Borough Ak      Ak    117    1.23     221    1.16 |
       27. |   53    053                     Pierce Wa      Wa    114    1.20     250    1.32 |
       28. |   48    201                     Harris Tx      Tx     96    1.01     230    1.21 |
       29. |   02    050                     Bethel Ak      Ak     96    1.01     179     .94 |
       30. |   02    180                       Nome Ak      Ak     85     .89     182     .96 |
       31. |   06    073                  San Diego Ca      Ca     84     .88     146     .77 |
       32. |   15    003                   Honolulu Hi      Hi     81     .85     144     .76 |
       33. |   53    061                  Snohomish Wa      Wa     80     .84     155     .82 |
       34. |   02    188           Northwest Arctic Ak      Ak     75     .79     164     .86 |
       35. |   41    051                  Multnomah Or      Or     71     .75     102     .54 |
       36. |   08    041                    El Paso Co      Co     63     .66     169     .89 |
       37. |   06    067                 Sacramento Ca      Ca     56     .59     114     .60 |
       38. |   53    063                    Spokane Wa      Wa     55     .58     109     .57 |
       39. |   32    003                      Clark Nv      Nv     55     .58     129     .68 |
       40. |   04    019                       Pima Az      Az     54     .57      97     .51 |
       41. |   06    059                     Orange Ca      Ca     51     .54      81     .43 |
       42. |   41    067                 Washington Or      Or     51     .54     100     .53 |
       43. |   02    185        North Slope Borough Ak      Ak     49     .52      91     .48 |
       44. |   41    039                       Lane Or      Or     48     .50      83     .44 |
       45. |   06    085                Santa Clara Ca      Ca     47     .49      83     .44 |
       46. |   17    031                       Cook Il      Il     45     .47      70     .37 |
       47. |   02    016             Aleutians West Ak      Ak     45     .47      95     .50 |
       48. |   48    029                      Bexar Tx      Tx     44     .46      98     .52 |
       49. |   02    070                 Dillingham Ak      Ak     43     .45      76     .40 |
       50. |   37    051                 Cumberland NC      NC     43     .45     113     .60 |
       51. |   12    091                   Okaloosa Fl      Fl     41     .43     118     .62 |
       52. |   06    071             San Bernardino Ca      Ca     40     .42      87     .46 |
       53. |   27    053                   Hennepin Mn      Mn     40     .42      55     .29 |
       54. |   02    290              Yukon-Koyukuk Ak      Ak     37     .39      66     .35 |
       55. |   06    029                       Kern Ca      Ca     36     .38      77     .41 |
       56. |   35    001                 Bernalillo NM      NM     35     .37      99     .52 |
       57. |   02    130   Ketchikan Gateway Boroug Ak      Ak     35     .37      75     .40 |
       58. |   06    001                    Alameda Ca      Ca     34     .36      55     .29 |
       59. |   48    439                    Tarrant Tx      Tx     34     .36      78     .41 |
       60. |   06    065                  Riverside Ca      Ca     34     .36      79     .42 |
       61. |   53    011                      Clark Wa      Wa     33     .35      77     .41 |
       62. |   49    035                  Salt Lake Ut      Ut     33     .35      60     .32 |
       63. |   48    113                     Dallas Tx      Tx     33     .35      53     .28 |
       64. |   08    031                     Denver Co      Co     31     .33      50     .26 |
       65. |   12    025                       Dade Fl      Fl     30     .32      69     .36 |
       66. |   53    067                   Thurston Wa      Wa     30     .32      67     .35 |
       67. |   06    075              San Francisco Ca      Ca     29     .30      41     .22 |
       68. |   02    240        Southeast Fairbanks Ak      Ak     29     .30      53     .28 |
       69. |   48    027                       Bell Tx      Tx     28     .29      82     .43 |
       70. |   02    220              Sitka Borough Ak      Ak     28     .29      61     .32 |
       71. |   12    005                        Bay Fl      Fl     28     .29      69     .36 |
       72. |   32    031                     Washoe Nv      Nv     27     .28      47     .25 |
       73. |   53    035                     Kitsap Wa      Wa     27     .28      59     .31 |
       74. |   06    053                   Monterey Ca      Ca     27     .28      61     .32 |
       75. |   08    059                  Jefferson Co      Co     26     .27      47     .25 |
       76. |   41    047                     Marion Or      Or     25     .26      48     .25 |
       77. |   41    005                  Clackamas Or      Or     25     .26      44     .23 |
       78. |   53    073                    Whatcom Wa      Wa     25     .26      32     .17 |
       79. |   41    029                    Jackson Or      Or     24     .25      48     .25 |
       80. |   40    109                   Oklahoma Ok      Ok     24     .25      82     .43 |
       81. |   12    057               Hillsborough Fl      Fl     24     .25      43     .23 |
       82. |   35    035                      Otero NM      NM     23     .24      60     .32 |
       83. |   30    063                   Missoula Mt      Mt     23     .24      38     .20 |
       84. |   13    215                   Muscogee Ga      Ga     23     .24      57     .30 |
       85. |   06    095                     Solano Ca      Ca     23     .24      50     .26 |
       86. |   08    005                   Arapahoe Co      Co     23     .24      52     .27 |
       87. |   12    103                   Pinellas Fl      Fl     22     .23      29     .15 |
       88. |   49    049                       Utah Ut      Ut     22     .23      58     .31 |
       89. |   16    001                        Ada Id      Id     22     .23      39     .21 |
       90. |   25    017                  Middlesex Ma      Ma     22     .23      44     .23 |
       91. |   21    093                     Hardin Ky      Ky     22     .23      60     .32 |
       92. |   48    453                     Travis Tx      Tx     21     .22      33     .17 |
       93. |   51    059                    Fairfax Va      Va     21     .22      56     .30 |
       94. |   06    111                    Ventura Ca      Ca     20     .21      42     .22 |
       95. |   06    081                  San Mateo Ca      Ca     20     .21      37     .20 |
       96. |   53    077                     Yakima Wa      Wa     20     .21      26     .14 |
       97. |   30    029                   Flathead Mt      Mt     19     .20      36     .19 |
       98. |   30    111                Yellowstone Mt      Mt     19     .20      31     .16 |
       99. |   26    163                      Wayne Mi      Mi     19     .20      26     .14 |
      100. |   15    001                     Hawaii Hi      Hi     19     .20      32     .17 |
           +----------------------------------------------------------------------------------+
      
      .
      Last edited by Andrew Musau; 26 Jul 2022, 02:15.
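The listing still shows the state code at the end of var3. Since state has already been extracted on those rows, one way to drop it is to cut the last two characters of var3 wherever state is non-empty. The sketch below runs on a few toy rows copied from the listing; in the code above, the replace line would presumably go right after the line that puts the spaces back in var3.

```stata
* Toy rows mimicking var3 and state from the listing above
clear
input str40 var3 str2 state
"Aleutians East Borough T Ak" "Ak"
"Same State"                  ""
"Foreign / Overseas FR"       "FR"
end

* Where a state code was extracted, drop the last two characters of var3
* (the code itself) and then trim the blank left in front of them
replace var3 = strtrim(substr(var3, 1, strlen(var3) - 2)) if state != ""
list, noobs
```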



      • #4
        Thanks, Andrew. That looks great. I would never have been able to do that myself.
        Two minor issues:
        1. Would it be possible to remove the state from column 3?
        2. If you look at lines 209-212, there is an issue: var3 should read 'Region 1: Northeast', not 'Region'.
        Best,
        Christian



        • #5
          Thanks, I was able to solve issue 2 and issue 1 is no longer relevant.



          • #6
            Given the consistent "columns" in the .txt file, the code below might work. Still, be cautious and recheck the result manually.
            Code:
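            * Read fixed columns: name block (a), state (b), then the numeric columns (c-f)
            infix str a 1-34 str b 35-36 c 45-47 d 48-59 e 60-64 f 65-71 using C9091aki.txt, clear
            
            * Trim the ends and collapse runs of blanks in the name block
            replace a = strtrim(stritrim(a))
            
            * Where a starts with the two numeric codes, split them off into a1 and a2 ...
            split a if ustrregexm(a, "(^\d+)"), limit(2)
            * ... then drop them (the first 7 characters, e.g. "02 013 ") from the name
            replace a = substr(a, 8, .) if a1 != ""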
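To test the fixed-column idea on something checkable before pointing it at the real file, here is a sketch on three lines of the -dataex- sample in #3. The column positions are assumptions counted from that sample: name block 1-34, state 35-36, then the two counts and two percentages right-aligned to end at columns 46, 53, 63 and 70. They differ from the positions in the code above, so verify them against C9091aki.txt itself.

```stata
* Write three lines of the #3 sample to a temporary file
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' "02 013   Aleutians East Borough T Ak       145              280" _n
file write `fh' "  02 016 Aleutians West           Ak        22  15.17        32  11.43" _n
file write `fh' "   Same State                               11   7.59        29  10.36" _n
file close `fh'

* Fixed-column read; double for the percentages avoids float rounding
infix str a 1-34 str b 35-36 c 37-46 double d 47-53 e 54-63 double f 64-70 using "`raw'", clear
replace a = strtrim(stritrim(a))

* Rows that begin with the two numeric codes: split them off the name
split a if ustrregexm(a, "^\d+"), limit(2)
replace a = substr(a, 8, .) if a1 != ""

list a1 a2 a b c d e f, noobs
```

Rows such as the first, where d and f come out missing, are the ones most worth a manual look.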

