Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to map HS 1992 to Naics 2017 (2-digit) data?

    Dear community,

    I'm struggling to find a concordance table to map my data in HS 92 classification (6 digits) to my Naics 2017 (2 digits) data. I already considered the tables, UNSD provides (https://unstats.un.org/unsd/classifi...esp-isic-other) however it was not helpful. The only thing I could do is to map the HS 92 data to ISIC3 data, then to ISIC 3.1 data, then to ISIC 4 and then to Naics, which seems unrealistic to me. Is there anything else that I can do or any Stata command available for problems like this? It is my first time working with sectoral trade data, so I am sorry if I'm overlooking something obvious.

    Best
    Noemi

  • #2
    The only thing I could do is to map the HS 92 data to ISIC3 data, then to ISIC 3.1 data, then to ISIC 4 and then to Naics, which seems unrealistic to me.
    I could be missing something since this is all very particular to this specific data, but I actually think the above might be the best strategy. Remember, you don't have to merge together every one of those datasets, you only need to merge the concordance tables together to get a final mapping. I also think you might run into situations where there is not a perfect correspondence sometimes, so definitely be on the lookout for that as you go.

    Comment


    • #3
      I'm waiting for a data transfer to complete, so I thought I'd try for a proof of concept. It is definitely not straightforward to merge the correspondence tables together and the result is a little messy. Many of the issues I ran into could probably be resolved with better data cleaning. In this proof of concept, I map like this:

      HS92 -> SITC3 -> SITC4 -> CPC2 -> ISIC4 -> NAICS 2017

      The issue with what you have above is that I don't see a table from HS 92 to ISIC3. After downloading all of the data I convert everything to .csv format, then to Stata native format with the following code:

      Code:
      cd "H:\Users\Daniel Schaefer\Coherence Tables"
      foreach f in "CPCv2_SITCr4.txt" "ISIC_4_to_2017_NAICS.csv" "ISIC4-CPC2.txt" ///
                   "Final S4 to S3r.csv" ///
                   "HS1992 to SITC3 Conversion and Correlation Tables.csv" {
          import delimited using "`f'", clear
          save "`f'.dta", replace
      }
      Then I do some data cleaning.

      Code:
      use "Final S4 to S3r.csv.dta", clear
      drop if v1 == "S4"
      rename v1 s4
      rename v3 s3
      drop if s4 == "I" | s4 == "II"
      drop v*
      replace s3 = subinstr(s3,".","",.)
      destring s3, replace
      save "Final S4 to S3r.csv.dta", replace
      
      use "CPCv2_SITCr4.txt.dta", clear
      rename sitc4code s4
      save "CPCv2_SITCr4.txt.dta", replace
      
      use "ISIC_4_to_2017_NAICS.csv.dta", clear
      replace isic40 = subinstr(isic40,",","",.)
      destring isic40, replace
      rename isic40 isic4code
      save "ISIC_4_to_2017_NAICS.csv.dta", replace
      Then finally I merge. These are many to many merges, but fortunately the number of categories doesn't seem to grow very quickly suggesting these are largely 1 to 1 matches with a few exceptions.

      Code:
      use "HS1992 to SITC3 Conversion and Correlation Tables.csv.dta", clear
      merge m:m s3 using "Final S4 to S3r.csv.dta", nogen
      merge m:m s4 using "CPCv2_SITCr4.txt.dta", nogen
      merge m:m cpc2code using "ISIC4-CPC2.txt.dta", nogen force
      merge m:m isic4code using "ISIC_4_to_2017_NAICS.csv.dta", nogen
      Here are the results of the merge commands.

      Code:
      . merge m:m s3 using "Final S4 to S3r.csv.dta", nogen
      
          Result                      Number of obs
          -----------------------------------------
          Not matched                             7
              from master                         2  
              from using                          5  
      
          Matched                             5,276  
          -----------------------------------------
      
      . merge m:m s4 using "CPCv2_SITCr4.txt.dta", nogen
      
          Result                      Number of obs
          -----------------------------------------
          Not matched                             6
              from master                         4  
              from using                          2  
      
          Matched                             5,496  
          -----------------------------------------
      
      . merge m:m cpc2code using "ISIC4-CPC2.txt.dta", nogen force
      (note: variable cpc2part was str3 in the using data, but will be byte now)
      
          Result                      Number of obs
          -----------------------------------------
          Not matched                         1,040
              from master                       128  
              from using                        912  
      
          Matched                             5,419  
          -----------------------------------------
      
      . merge m:m isic4code using "ISIC_4_to_2017_NAICS.csv.dta", nogen
      
          Result                      Number of obs
          -----------------------------------------
          Not matched                           340
              from master                       129  
              from using                        211  
      
          Matched                             6,518  
          -----------------------------------------
      
      .
      It's definitely not ideal but something like this (with careful data cleaning) might be your best bet.

      Comment

      Working...
      X