  • Strange format of county variable in QCEW data

    I'm working with QCEW data. The county variable is in a format I have never seen in any other administrative data from the USA. I've given an example below.

    This is what the data look like when I run: dataex county if county > "C4978"

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 county
    "CS102"
    "CS102"
    "CS102"
    "CS102"
    "CS104"
    "CS104"
    "CS104"
    "CS104"
    "CS112"
    "CS112"
    "CS112"
    "CS112"
    "CS118"
    "CS118"
    "CS118"
    "CS118"
    "CS194"
    "CS194"
    "CS198"
    "CS198"
    "CS198"
    "CS198"
    "CS200"
    "CS200"
    "CS200"
    "CS200"
    end

    Would anyone care to explain how I can destring it? All of these values start with C, and for the counties after 51 the C is followed by an S. I don't know what that means.

    To see why I can't destring the variable with the usual command, I ran the following commands.
    Code:
    replace county = subinstr(county, ",", "", .) 
    (0 real changes made)
    
    tab county if missing(real(county))
    
         county |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          C1002 |          4        0.00        0.00
          C1010 |          4        0.00        0.00
          C1014 |          4        0.00        0.00
          C1018 |      1,216        0.16        0.16
          C1022 |          4        0.00        0.16
          C1026 |          4        0.00        0.16
          C1030 |          4        0.00        0.16
          C1038 |        984        0.13        0.29
          C1042 |      1,924        0.25        0.54
          C1046 |          4        0.00        0.54
          C1050 |      1,024        0.13        0.67
          C1054 |          4        0.00        0.67
          C1058 |      2,924        0.38        1.05
          C1062 |          4        0.00        1.05
          C1066 |          4        0.00        1.05
          C1070 |          4        0.00        1.05
          C1074 |      2,332        0.30        1.36
          C4974 |      1,288        0.17       98.01
          C4978 |          4        0.00       98.01
          CS102 |          4        0.00       98.01
          CS104 |          4        0.00       98.01
          CS112 |          4        0.00       98.01
          CS118 |          4        0.00       98.01
          CS120 |          4        0.00       98.01
          CS122 |          4        0.00       98.01
          CS132 |          4        0.00       98.01
          CS138 |          4        0.00       98.02
          CS140 |          4        0.00       98.02
          CS142 |          4        0.00       98.02
          CS148 |          4        0.00       98.02
          CS154 |          4        0.00       98.02
          CS160 |          4        0.00       98.02
          CS164 |          4        0.00       98.02
          CS172 |          4        0.00       98.02
          CS174 |          4        0.00       98.02
          CS176 |          4        0.00       98.02
          CS178 |          4        0.00       98.02
          CS180 |          4        0.00       98.02
          CS184 |          4        0.00       98.02
          CS188 |          4        0.00       98.02
          CS192 |          4        0.00       98.02
          CS194 |          4        0.00       98.02
          CS198 |          4        0.00       98.02
          CS200 |          4        0.00       98.02
          CS202 |          4        0.00       98.02
          CS204 |          4        0.00       98.02
          CS206 |          4        0.00       98.03
          CS212 |          4        0.00       98.03
          CS214 |          4        0.00       98.03
          CS216 |          4        0.00       98.03
          CS218 |          4        0.00       98.03
          CS220 |          4        0.00       98.03
          CS222 |          4        0.00       98.03
          CS232 |          4        0.00       98.03
          CS242 |          4        0.00       98.03
          CS244 |          4        0.00       98.03
          CS548 |          4        0.00       98.07
          CS554 |          4        0.00       98.07
          CS556 |          4        0.00       98.07
          CS558 |          4        0.00       98.07
          CS564 |          4        0.00       98.07
          CS566 |          4        0.00       98.07
          US000 |     14,820        1.92      100.00
          USCMS |          4        0.00      100.00
          USMSA |          4        0.00      100.00
          USNMS |          4        0.00      100.00
    ------------+-----------------------------------
          Total |    770,412      100.00

  • #2
    Why do you want to destring it? If you need a numeric version, use encode instead. In any case, ignoring non-numeric characters is likely to give spurious duplicates or even missings. Even ignoring C and S isn't the answer, as you have some identifiers that have no numeric characters whatsoever.

    As the original author of destring, I find it a little melancholy to see people wanting to use it when it likely isn't a good idea.
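
    A minimal sketch of the encode route, using a handful of codes copied from post #1. The variable names county_id and county_num are made up for illustration. The destring call at the end is there only to show how stripping letters loses information; it is not a recommendation.

```stata
* Sketch only: a few codes copied from post #1
clear
input str5 county
"C1018"
"C1018"
"CS102"
"US000"
"USMSA"
end

* encode numbers the distinct strings 1, 2, 3, ... in sorted order
* and keeps the original strings as value labels
encode county, generate(county_id)
label list county_id
list county county_id, nolabel

* For contrast only: ignoring letters discards information.
* "USMSA" has no digits at all, so it becomes missing.
destring county, generate(county_num) ignore("ACMSU")
list county county_num if missing(county_num)
```

    With encode, every distinct code keeps its own integer and its label, so nothing is merged or lost.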


    • #3
      Your problem is that you have not read the documentation for the data you are using. When you see something you don't understand in your data, start by reading the documentation.

      I've never before encountered the QCEW, but brief searching took me to the main website

      https://www.bls.gov/cew/

      from which I clicked the QCEW Data dropdown tab and then the link to Databases

      https://www.bls.gov/cew/data.htm

      from which I clicked the link to Guide to Downloadable Data Files

      https://www.bls.gov/cew/about-data/data-files-guide.htm

      from which I clicked the link in the Documentation Access paragraph to the Data File Documentation Guide page

      https://www.bls.gov/cew/about-data/d...tion-guide.htm

      from which I opened the QCEW Area Titles page at

      https://www.bls.gov/cew/classificati...rea-titles.htm

      which explains what you are seeing in your data. Scroll down to the bottom past the usual state-and-county numbers.
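
      Once you know what the prefixes stand for, a sketch like the following may help keep the different kinds of area codes apart before any analysis. It only groups codes by their leading characters. The variable name prefix_group is made up, the "01001" row is a hypothetical example of a usual all-numeric code and is not taken from post #1, and the meaning of each group should be read off the QCEW Area Titles page rather than assumed from this code.

```stata
* Sketch only: codes from post #1 plus one hypothetical all-numeric code
clear
input str5 county
"C1018"
"CS102"
"US000"
"USCMS"
"USMSA"
"USNMS"
"01001"
end

* Group codes by prefix; look up what each group means in the Area Titles page
generate str7 prefix_group = "other"
replace prefix_group = "numeric" if !missing(real(county))
replace prefix_group = "US"      if substr(county, 1, 2) == "US"
replace prefix_group = "CS"      if substr(county, 1, 2) == "CS"
* plain C codes: starts with C but not with CS
replace prefix_group = "C"       if substr(county, 1, 1) == "C" & substr(county, 1, 2) != "CS"

tabulate prefix_group
```

      Anything left in the "other" group is a code this sketch did not anticipate and is worth checking against the documentation.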


      • #4
        Originally posted by Nick Cox (#2)

        Thanks a ton, Mr. Cox! I'll try what you kindly suggested, and I'll do my best to ask a better question next time. Thanks so much for your time!


        • #5
          Originally posted by William Lisowski (#3)

          Mr. Lisowski,

          Much obliged for the kind guidance! It was a mistake on my part; I should have been more careful before posting the problem. I will go through the documentation and do my best to correct my mistake. Thanks again for your time. I really appreciate it!
