
  • Destring randomly generating missing values

    Hello,

    I am trying to destring about 8,000 observations of a variable that contains only numeric characters. Before the destring, msa_code (the variable of interest) is type str5. When I run destring, 3 observations end up with missing values, but each time I run the code, a different 3 observations are affected. There are also no missing values in msa_code before the destring. Can someone help explain why this is happening and how to solve it?

    My data looks like this after the destring:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long msa_code str78 msa int year float realGDP_avg
        . "Cape Girardeau, MO-IL (Metropolitan Statistical Area)"                2001   5282259
        . "Mount Vernon-Anacortes, WA (Metropolitan Statistical Area)"           2002   5388803
    39100 "Poughkeepsie-Newburgh-Middletown, NY (Metropolitan Statistical Area)" 2003  22057436
    44220 "Dayton-Kettering, OH (Metropolitan Statistical Area)"                 2004  22873324
    44220 "Dayton-Kettering, OH (Metropolitan Statistical Area)"                 2005  23237464
    44220 "Dayton-Kettering, OH (Metropolitan Statistical Area)"                 2006  23199988
    39100 "Poughkeepsie-Newburgh-Middletown, NY (Metropolitan Statistical Area)" 2007  23066732
    39150 "Prescott Valley-Prescott, AZ (Metropolitan Statistical Area)"         2008  23178782
    39100 "Poughkeepsie-Newburgh-Middletown, NY (Metropolitan Statistical Area)" 2009  22353260
    39100 "Poughkeepsie-Newburgh-Middletown, NY (Metropolitan Statistical Area)" 2010  22895242
        . "Seattle-Tacoma-Bremerton, WA (C)"                                        .         .
       40 "Abilene, TX (Metropolitan Statistical Area)"                          2001   4922832
       40 "Abilene, TX (Metropolitan Statistical Area)"                          2002   4979332
      120 "Albany, GA (Metropolitan Statistical Area)"                           2001   5500407
      120 "Albany, GA (Metropolitan Statistical Area)"                           2002   5563164
      160 "Albany-Schenectady-Troy, NY (Metropolitan Statistical Area)"          2001  43695244
      160 "Albany-Schenectady-Troy, NY (Metropolitan Statistical Area)"          2002  44491260
      200 "Albuquerque, NM (Metropolitan Statistical Area)"                      2001  30106354
      200 "Albuquerque, NM (Metropolitan Statistical Area)"                      2002  30593344
      220 "Alexandria, LA (Metropolitan Statistical Area)"                       2001   4710211
      220 "Alexandria, LA (Metropolitan Statistical Area)"                       2002   4886542
      240 "Allentown-Bethlehem-Easton, PA-NJ (Metropolitan Statistical Area)"    2002  42711608
      280 "Altoona, PA (Metropolitan Statistical Area)"                          2001   4668334
      280 "Altoona, PA (Metropolitan Statistical Area)"                          2002   4666244
      320 "Amarillo, TX (Metropolitan Statistical Area)"                         2001   9749376
      320 "Amarillo, TX (Metropolitan Statistical Area)"                         2002   9944163
      380 "Anchorage, AK (Metropolitan Statistical Area)"                        2001  17737364
      380 "Anchorage, AK (Metropolitan Statistical Area)"                        2002  18065608
      450 "Anniston-Oxford, AL (Metropolitan Statistical Area)"                  2001   3487094
      450 "Anniston-Oxford, AL (Metropolitan Statistical Area)"                  2002   3585435
      460 "Appleton, WI (Metropolitan Statistical Area)"                         2001   8766341
      460 "Oshkosh-Neenah, WI (Metropolitan Statistical Area)"                   2002   9076860
      480 "Asheville, NC (Metropolitan Statistical Area)"                        2001  13425564
      480 "Asheville, NC (Metropolitan Statistical Area)"                        2002  13811646
      500 "Athens-Clarke County, GA (Metropolitan Statistical Area)"             2001   6632994
      500 "Athens-Clarke County, GA (Metropolitan Statistical Area)"             2002   6711134
      520 "Atlanta-Sandy Springs-Alpharetta, GA (Metropolitan Statistical Area)" 2001 254139984
      520 "Atlanta-Sandy Springs-Alpharetta, GA (Metropolitan Statistical Area)" 2002 257216464
      580 "Auburn-Opelika, AL (Metropolitan Statistical Area)"                   2001   2959582
      580 "Auburn-Opelika, AL (Metropolitan Statistical Area)"                   2002   3114486
      600 "Augusta-Richmond County, GA-SC (Metropolitan Statistical Area)"       2001  19672896
      600 "Augusta-Richmond County, GA-SC (Metropolitan Statistical Area)"       2002  19913300
      640 "Austin-Round Rock-Georgetown, TX (Metropolitan Statistical Area)"     2001  61468832
      640 "Austin-Round Rock-Georgetown, TX (Metropolitan Statistical Area)"     2002  63385296
      680 "Bakersfield, CA (Metropolitan Statistical Area)"                      2001  27651932
      680 "Bakersfield, CA (Metropolitan Statistical Area)"                      2002  31074712
      730 "Bangor, ME (Metropolitan Statistical Area)"                           2001   5628410
      730 "Bangor, ME (Metropolitan Statistical Area)"                           2002   5795993
      740 "Barnstable Town, MA (Metropolitan Statistical Area)"                  2001  11078518
      740 "Barnstable Town, MA (Metropolitan Statistical Area)"                  2002  11185396
      760 "Baton Rouge, LA (Metropolitan Statistical Area)"                      2001  37594352
      760 "Baton Rouge, LA (Metropolitan Statistical Area)"                      2002  40288516
      840 "Beaumont-Port Arthur, TX (Metropolitan Statistical Area)"             2001  23700504
      840 "Beaumont-Port Arthur, TX (Metropolitan Statistical Area)"             2002  22677152
      860 "Bellingham, WA (Metropolitan Statistical Area)"                       2001   8005248
      860 "Bellingham, WA (Metropolitan Statistical Area)"                       2002   8535873
      870 "Niles, MI (Metropolitan Statistical Area)"                            2001   6330695
      870 "Niles, MI (Metropolitan Statistical Area)"                            2002   6453286
      880 "Billings, MT (Metropolitan Statistical Area)"                         2001   6940033
      880 "Billings, MT (Metropolitan Statistical Area)"                         2002   7038226
      920 "Gulfport-Biloxi, MS (Metropolitan Statistical Area)"                  2001  14598994
      920 "Gulfport-Biloxi, MS (Metropolitan Statistical Area)"                  2002  14679242
      960 "Binghamton, NY (Metropolitan Statistical Area)"                       2001   8416136
      960 "Binghamton, NY (Metropolitan Statistical Area)"                       2002   8392930
     1000 "Birmingham-Hoover, AL (Metropolitan Statistical Area)"                2001  46045708
     1000 "Birmingham-Hoover, AL (Metropolitan Statistical Area)"                2002  47574816
    10020 "Lafayette, LA (Metropolitan Statistical Area)"                        2005  19618180
     1010 "Bismarck, ND (Metropolitan Statistical Area)"                         2001   4133824
     1010 "Bismarck, ND (Metropolitan Statistical Area)"                         2002   4248029
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2003   5058616
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2004   5174625
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2005   5295114
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2006   5724189
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2007   5898491
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2008   5973018
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2009   5821603
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2010   6010804
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2011   6061340
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2012   6222396
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2013   6253651
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2014   6335952
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2015   6215814
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2016   6113607
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2017   6253877
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2018   6556129
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2019   6794629
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2020   7068659
    10180 "Abilene, TX (Metropolitan Statistical Area)"                          2021   7312264
     1020 "Bloomington, IN (Metropolitan Statistical Area)"                      2001   4918594
     1020 "Bloomington, IN (Metropolitan Statistical Area)"                      2002   5048856
     1040 "Bloomington, IL (Metropolitan Statistical Area)"                      2001   8680940
     1040 "Bloomington, IL (Metropolitan Statistical Area)"                      2002   8923939
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2003  30443224
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2004  31100064
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2005  31921116
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2006  31636804
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2007  32168886
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2008  31729364
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2009  29800948
    10420 "Akron, OH (Metropolitan Statistical Area)"                            2010  30541932
    end
    In the sample, you can see the three values that were set to missing. Before the destring, none of them differed in format from the other msa_code values.

    My exact destring code is: destring msa_code, replace

    Thank you!

  • #2
    We need to see what the values were before the destring.



    • #3
      To expand on #2:

      There are two stories in competition here. One is that there is a weird bug in destring which you have unearthed and are the first to do so. The other is that there is a misunderstanding about your data. I would put more money on the second.

      I have skin in the game as the original author of destring, but more crucially it's been folded into official Stata for a long while and banged on thousands of times.

      The story of three missing values that surprise you is easy to believe. The idea that they occur in different observations is harder to swallow.
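
      One way to test whether the missing values really move between runs is to apply destring twice to the same saved data and compare the results observation by observation. This is only a sketch on made-up toy data: the variable id, the tempfile names raw and first, and the codes themselves are assumptions for illustration, not the poster's data. Run it as a do-file so the local macros survive.

      ```stata
      * Toy data; the codes and the empty string are made up for illustration
      clear
      input str5 msa_code
      "39100"
      ""
      "44220"
      "10180"
      end
      generate long id = _n          // hypothetical row identifier
      tempfile raw first
      save `raw'

      * First run
      destring msa_code, generate(run1)
      keep id run1
      save `first'

      * Second run, starting again from the same saved data
      use `raw', clear
      destring msa_code, generate(run2)
      merge 1:1 id using `first', assert(match) nogenerate

      * Stops with an error if any observation differs between runs
      assert run1 == run2
      ```

      If the assert passes on the real data as well, the variation between runs is more likely to come from an earlier step in the do-file than from destring itself.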

      My guess is that you have stray non-numeric characters such as letter O not zero and letter l not 1, which are why this variable that should be numeric was string in the first place. How were the data imported and from where?
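
      A minimal sketch of how such stray characters could be flagged in the original string variable, using made-up toy values (letter O and letter l in place of digits). The variable name suspect is an assumption for illustration.

      ```stata
      * Toy data; "4422O", "1O18O" and "3910l" contain letters, not digits
      clear
      input str5 msa_code
      "39100"
      "4422O"
      "1O18O"
      "3910l"
      "120"
      end

      * 1 if the value contains any character other than the digits 0-9
      generate byte suspect = regexm(msa_code, "[^0-9]")

      * Show the offending strings and count them
      list msa_code if suspect, clean
      count if suspect
      ```

      A related check is list msa_code if missing(real(msa_code)), which also catches empty strings, because real() returns missing for anything it cannot read as a number.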



      • #4
        Code:
        destring msa_code, replace
        obliterates the evidence we need. With the original string msa_code you need something more like

        Code:
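        * Keep the original string and put the numeric result in a new variable
        destring msa_code, gen(msa_code2)

        * Show the original string values that became missing
        tab msa_code if missing(msa_code2)

        * chartab (from SSC) tabulates the individual characters in those values
        ssc inst chartab
        chartab msa_code if missing(msa_code2)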
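
        To illustrate on made-up toy data: as far as I know, destring without the force option declines to create the new variable when any value contains nonnumeric characters, whereas empty strings are converted to missing without complaint. The sketch below therefore adds force so that both kinds of value show up as missing in msa_code2 while the original strings stay available for inspection. The values are assumptions, not the poster's data.

        ```stata
        * Toy data: one value with a letter O, one empty string
        clear
        input str5 msa_code
        "39100"
        "4422O"
        ""
        "10180"
        end

        * force treats values with nonnumeric characters as missing
        destring msa_code, generate(msa_code2) force

        * Original strings behind each missing numeric value
        list msa_code msa_code2 if missing(msa_code2), clean
        count if missing(msa_code2)
        ```

        Because msa_code is left untouched, the strings that failed to convert can still be examined afterwards.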
