
  • Creating a new single-country dataset by extracting the ISO country code from a variable

    Hi,

    I would like to save a new dataset, e.g. USA.dta, by extracting from the variable bvd_id_number its first two characters, which are the ISO country code (in this example, 'US'). Ideally, I would like to automate this to capture every country in the master dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 bvd_id_number str3 consolidation_code str21 filing_type
    "PT503338435"     "U1" "Local registry filing"
    "MA37163-81"      "U1" "Local registry filing"
    "JP1011001052846" "LF" "Local registry filing"
    "FI10958757"      "U1" "Local registry filing"
    "US115044924GN"   "LF" "Local registry filing"
    end

    Thanks,

    Ciaran

  • #2

    Code:
    gen wanted = substr(bvd_id_number, 1, 2) 
    levelsof wanted, clean
    Then, if I understand correctly, you need to loop over those levels to save separate datasets.
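A minimal sketch of that first step, run on the -dataex- example from #1 and using only built-in commands. The clean option stores the codes without quotes, and -levelsof- sorts them, so the local should end up holding FI JP MA PT US.

```stata
* Rebuild the -dataex- example from #1 (only the ID variable is needed here)
clear
input str18 bvd_id_number
"PT503338435"
"MA37163-81"
"JP1011001052846"
"FI10958757"
"US115044924GN"
end

* The first two characters of the BvD ID are the ISO country code
generate wanted = substr(bvd_id_number, 1, 2)

* Store the distinct codes in a local macro for a later -foreach- loop
levelsof wanted, local(levels) clean
display "`levels'"
```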


    • #3
      Hi Nick,

      Thanks - this worked perfectly. I'm now trying to create that loop based on some of your code from https://www.stata.com/support/faqs/d...-with-foreach/

      . levelsof wanted, local(levels) . foreach l of local levels { . save if wanted== `l' . }
      The following error is returned:

      . foreach l of local levels {
      foreach command may not result from a macro expansion interactively or in do files


      • #4
        It looks as if you are putting separate commands on the same physical line.

        Your mileage may vary, but I've never wanted to use, say, 200 different datasets rather than one.
        Last edited by Nick Cox; 02 Feb 2023, 06:30.


        • #5
          Hi Nick,

          Thanks! Yes, it would be preferable to work with a single dataset. However, the master dataset is a .csv file of around 400 GB, which I have split into parts using the -chunky- utility. I will then rebuild each country's data in usable sizes using -append-.
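A sketch of the chunk-and-append plan, under stated assumptions: each -chunky- piece has already been imported and saved as a Stata file, here given the hypothetical names chunk1.dta and chunk2.dta, and the country being rebuilt is US. The first part only fakes two such chunks from the #1 example so the sketch runs; the loop then keeps the wanted country from each chunk and appends what has been collected so far. The emptyok option is included so that saving should still work when a chunk has no observations for that country.

```stata
* Demo only: split the #1 example into two hypothetical chunk files.
* Real chunks would come from the -chunky- step instead.
clear
input str18 bvd_id_number
"PT503338435"
"MA37163-81"
"JP1011001052846"
"FI10958757"
"US115044924GN"
end
preserve
keep in 1/3
save "chunk1.dta", replace
restore
keep in 4/5
save "chunk2.dta", replace

* Rebuild one country by filtering each chunk and appending
local cc "US"
forvalues k = 1/2 {
    use "chunk`k'.dta", clear
    keep if substr(bvd_id_number, 1, 2) == "`cc'"
    * From the second chunk on, add the observations collected so far
    if `k' > 1 append using "`cc'.dta"
    save "`cc'.dta", replace emptyok
}
```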


          Apologies for my earlier formatting error:

          My code to extract country data now looks like:
          Code:
          levelsof wanted, local(levels)
          foreach US of local levels {
              save if wanted == `"US"'
          }

          This produces the error:

          foreach US of local levels {
          2. save if wanted == `US'
          3. }

          invalid 'wanted'


          Suggestions? Thanks


          • #6
            Code:
            save if wanted == "`US'"
            That is, in order: double quote, left single quote, macro name, right (normal-looking) single quote, double quote.
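A tiny illustration of that expansion; the local is set by hand here purely for demonstration, whereas in #5 the loop sets it.

```stata
* The loop variable in #5 is a local macro named US; set it by hand here
local US "US"

* Inside double quotes the macro reference is replaced by its contents,
* so Stata sees the string literal "US"
display "`US'"

* Without the double quotes the expansion would be the bare word US,
* which Stata would try to read as a variable name rather than as text
```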


            • #7
              Hi,

              With the corrected code:

              Code:
              levelsof wanted, local(levels)
              foreach US of local levels {
                  save if wanted == "`US'"
              }

              I am still getting the error:

              foreach US of local levels {
              save if wanted == "`US'"
              }

              invalid 'wanted'

              Thanks,

              Ciaran


              • #8
                Yes, sorry. -save- won't accept if qualifiers; that was your bug, but I didn't spot it. You need a -keep- statement before the -save-, which is where the if qualifier belongs. Also, at some point you need to read in the original file, or a subset of it, all over again.
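A minimal sketch of what #8 describes, using the -dataex- example from #1: the if qualifier goes on -keep-, and -preserve-/-restore- brings the full data back for the next country, which is one way of "reading the original file in again" without going back to disk. Files are named after the codes (FI.dta, JP.dta, ..., US.dta); the replace option is an addition here so the loop can be rerun. With a very large file, -use [varlist] if ... using filename- inside the loop would be an alternative that rereads only the subset needed.

```stata
* Rebuild the -dataex- example from #1
clear
input str18 bvd_id_number
"PT503338435"
"MA37163-81"
"JP1011001052846"
"FI10958757"
"US115044924GN"
end

generate wanted = substr(bvd_id_number, 1, 2)
levelsof wanted, local(levels)

foreach c of local levels {
    preserve                        // keep a copy of the full data
    keep if wanted == "`c'"         // the if qualifier belongs on -keep-
    save "`c'.dta", replace         // one file per country code
    restore                         // bring back the full data
}
```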

                Unfortunately https://www.stata.com/support/faqs/d...-with-foreach/ doesn't explain everything you need to know.


                • #9
                  This worked perfectly - thank you


                  • #10
                    #8 Please allow me to quote Joro Kolev's wish here:
                    Originally posted by Joro Kolev
                    The -save- command is fairly primitive: all one can do is save the whole dataset at hand. It would be very helpful if -save- were made as multifunctional as -use [varlist] [if] [in] using filename- is. In particular, I would like to be able to:
                    1. Save only particular variables from my dataset.
                    2. Save only particular observations from my dataset, specified in the "if" and "in" conditions.


                    • #11
                      Hi,

                      The code I ended up using was based on your suggestions, plus the very helpful https://www.stata.com/support/faqs/d...-with-foreach/ support page.

                      Code:
                      egen group = group(countrycode)
                      su group, meanonly

                      foreach i of num 1/`r(max)' {
                          preserve
                          keep if group == `i'
                          save `i'.dta
                          restore
                      }

                      clear

                      This code works perfectly to extract country codes (AU, NZ, GB, etc.) from a large dataset, but it saves them as numbered files (file1.dta, file2.dta, ...). This would be fine, except that the number of countries changes each time, so I can't append them. Is there an addition I can make to

                      Code:
                      keep if group == `i'
                          save `i'.dta
                      so that each file is saved under the country code retained by the keep command, e.g. us.dta?
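One possible addition, sketched on codes taken from the #1 example: the label option of -egen, group()- attaches a value label that maps 1, 2, ... back to the country codes, and the extended macro function : label (group) `i' looks the code up inside the loop. This sketch uses -forvalues- for the loop and adds replace to -save-; looping directly over the levels of countrycode from -levelsof- would be an equally valid alternative.

```stata
* Demo data: country codes taken from the #1 example
clear
input str18 bvd_id_number
"PT503338435"
"MA37163-81"
"JP1011001052846"
"FI10958757"
"US115044924GN"
end
generate countrycode = substr(bvd_id_number, 1, 2)

* The label option maps group numbers back to the codes
egen group = group(countrycode), label
summarize group, meanonly

forvalues i = 1/`r(max)' {
    * Look up the country code that labels this group number
    local cc : label (group) `i'
    preserve
    keep if group == `i'
    save "`cc'.dta", replace
    restore
}
```

Because each file is named after its code, a later -append- step can refer to files by country rather than by a group number that shifts whenever the set of countries changes.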
