
  • How to simplify this code?

    I'd like to simplify the following code:
    gen prefid=00 if prefecture=="all"
    gen countyid=00 if county=="all"
    replace prefid=99 if prefecture=="high"
    replace countyid=99 if county=="high"

    replace prefid=01 if provid==11 & prefecture=="firstintcourt" & county=="firstintcourt"
    replace countyid=99 if provid==11 & prefecture=="firstintcourt" & county=="firstintcourt"

    replace prefid=01 if provid==11 & prefecture=="firstintcourt" & county=="changpingqu"
    replace countyid=01 if provid==11 & prefecture=="firstintcourt" & county=="changpingqu"

    replace prefid=01 if provid==11 & prefecture=="firstintcourt" & county=="haidianqu"
    replace countyid=08 if provid==11 & prefecture=="firstintcourt" & county=="haidianqu"

    replace prefid=01 if provid==11 & prefecture=="firstintcourt" & county=="mentougouqu"
    replace countyid=09 if provid==11 & prefecture=="firstintcourt" & county=="mentougouqu"

    replace prefid=01 if provid==11 & prefecture=="firstintcourt" & county=="shijingshanqu"
    replace countyid=07 if provid==11 & prefecture=="firstintcourt" & county=="shijingshanqu"

    replace prefid=02 if provid==11 & prefecture=="firstintcourt" & county=="yanqingqu" & year<=2015
    replace countyid=29 if provid==11 & prefecture=="firstintcourt" & county=="yanqingqu" & year<=2015
    replace prefid=01 if provid==11 & prefecture=="firstintcourt" & county=="yanqingqu" & year>=2016
    replace countyid=19 if provid==11 & prefecture=="firstintcourt" & county=="yanqingqu" & year>=2016

    replace prefid=01 if provid==11 & prefecture=="secondintcourt" & county=="daxingqu"
    replace countyid=15 if provid==11 & prefecture=="secondintcourt" & county=="daxingqu"

    replace prefid=01 if provid==11 & prefecture=="secondintcourt" & county=="fangshanqu"
    replace countyid=11 if provid==11 & prefecture=="secondintcourt" & county=="fangshanqu"

    replace prefid=01 if provid==11 & prefecture=="secondintcourt" & county=="fengtaiqu"
    replace countyid=06 if provid==11 & prefecture=="secondintcourt" & county=="fengtaiqu"

    replace prefid=01 if provid==11 & prefecture=="secondintcourt" & county=="secondintcourt"
    replace countyid=98 if provid==11 & prefecture=="secondintcourt" & county=="secondintcourt"

    ChatGPT first suggested the following:

    local lookup "11 firstintcourt firstintcourt 01 99" ///
    "11 firstintcourt changpingqu 01 01" ///
    "11 firstintcourt haidianqu 01 08" ///
    "11 firstintcourt mentougouqu 01 09" ///
    "11 firstintcourt shijingshanqu 01 07" ///
    "11 firstintcourt yanqingqu 02 29 if year<=2015" ///
    "11 firstintcourt yanqingqu 01 19 if year>=2016" ///
    "11 secondintcourt daxingqu 01 15" ///
    "11 secondintcourt fangshanqu 01 11" ///
    "11 secondintcourt fengtaiqu 01 06" ///
    "11 secondintcourt secondintcourt 01 98"

    // generate prefid and countyid using lookup table
    gen prefid = ""
    gen countyid = ""
    foreach row of local lookup {
    tokenize `row'
    if provid == `1' & prefecture == "`2'" & county == "`3'" {
    replace prefid = `4' if inlist(prefecture, "all", "high", "`2'")
    replace countyid = `5' if inlist(county, "all", "high", "`3'")
    }
    }

    I got the following error:
    . gen prefid = ""
    (5,049 missing values generated)

    . gen countyid = ""
    (5,049 missing values generated)

    . foreach row of local lookup {
    2. tokenize `row'
    3. if provid == `1' & prefecture == "`2'" & county == "`3'" {
    4. replace prefid = `4' if inlist(prefecture, "all", "high", "`2'")
    5. replace countyid = `5' if inlist(county, "all", "high", "`3'")
    6. }
    7. }
    firstintcourt not found
    r(111);

    end of do-file

    r(111);

    ChatGPT then suggested a second, revised version:
    // define lookup table as a matrix
    matrix lookup = (11 "firstintcourt" "firstintcourt" 01 99 \
    11 "firstintcourt" "changpingqu" 01 01 \
    11 "firstintcourt" "haidianqu" 01 08 \
    11 "firstintcourt" "mentougouqu" 01 09 \
    11 "firstintcourt" "shijingshanqu" 01 07 \
    11 "firstintcourt" "yanqingqu" 02 29 if year<=2015 \
    11 "firstintcourt" "yanqingqu" 01 19 if year>=2016 \
    11 "secondintcourt" "daxingqu" 01 15 \
    11 "secondintcourt" "fangshanqu" 01 11 \
    11 "secondintcourt" "fengtaiqu" 01 06 \
    11 "secondintcourt" "secondintcourt" 01 98)

    // generate prefid and countyid using lookup table
    gen prefid = ""
    gen countyid = ""
    forvalues i = 1/rows(lookup) {
    if provid == lookup[`i',1] & prefecture == lookup[`i',2] & county == lookup[`i',3] {
    replace prefid = lookup[`i',4] if inlist(prefecture, "all", "high", lookup[`i',2])
    replace countyid = lookup[`i',5] if inlist(county, "all", "high", lookup[`i',3])
    }
    }


    However, running this code generates the following error:
    . matrix lookup = (11 "firstintcourt" "firstintcourt" 01 99 \
    too few ')' or ']'
    r(132);

    end of do-file

    r(132);

    Thank you very much!
    Linghui

  • #2
    Please provide a data example using dataex.



    • #3
      My goal is to give each county in each prefecture of each province a numerical id. This does not depend on the data values themselves.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int year byte provid str12 province str16 prefecture str15 county double greexpsec
      2007 11 "beijing" "all"           "all"                 .
      2008 11 "beijing" "all"           "all"                 .
      2009 11 "beijing" "all"           "all"                 .
      2010 11 "beijing" "all"           "all"                 .
      2011 11 "beijing" "all"           "all"                 .
      2012 11 "beijing" "all"           "all"                 .
      2013 11 "beijing" "all"           "all"                 .
      2014 11 "beijing" "all"           "all"                 .
      2015 11 "beijing" "all"           "all"                 .
      2016 11 "beijing" "all"           "all"                 .
      2017 11 "beijing" "all"           "all"                 .
      2018 11 "beijing" "all"           "all"                 .
      2019 11 "beijing" "all"           "all"                 .
      2020 11 "beijing" "all"           "all"           5148516
      2021 11 "beijing" "all"           "all"           5054055
      2007 11 "beijing" "firstintcourt" "changpingqu"         .
      2008 11 "beijing" "firstintcourt" "changpingqu"         .
      2009 11 "beijing" "firstintcourt" "changpingqu"         .
      2010 11 "beijing" "firstintcourt" "changpingqu"         .
      2011 11 "beijing" "firstintcourt" "changpingqu"         .
      2012 11 "beijing" "firstintcourt" "changpingqu"         .
      2013 11 "beijing" "firstintcourt" "changpingqu"         .
      2014 11 "beijing" "firstintcourt" "changpingqu"         .
      2015 11 "beijing" "firstintcourt" "changpingqu"         .
      2016 11 "beijing" "firstintcourt" "changpingqu"         .
      2017 11 "beijing" "firstintcourt" "changpingqu"         .
      2018 11 "beijing" "firstintcourt" "changpingqu"         .
      2019 11 "beijing" "firstintcourt" "changpingqu"         .
      2020 11 "beijing" "firstintcourt" "changpingqu"         .
      2021 11 "beijing" "firstintcourt" "changpingqu"         .
      2022 11 "beijing" "firstintcourt" "changpingqu"         .
      2007 11 "beijing" "firstintcourt" "firstintcourt"       .
      2008 11 "beijing" "firstintcourt" "firstintcourt"       .
      2009 11 "beijing" "firstintcourt" "firstintcourt"       .
      2010 11 "beijing" "firstintcourt" "firstintcourt"       .
      2011 11 "beijing" "firstintcourt" "firstintcourt"       .
      2012 11 "beijing" "firstintcourt" "firstintcourt"       .
      2013 11 "beijing" "firstintcourt" "firstintcourt"       .
      2014 11 "beijing" "firstintcourt" "firstintcourt"       .
      2015 11 "beijing" "firstintcourt" "firstintcourt"       .
      2016 11 "beijing" "firstintcourt" "firstintcourt"       .
      2017 11 "beijing" "firstintcourt" "firstintcourt"       .
      2018 11 "beijing" "firstintcourt" "firstintcourt"       .
      2019 11 "beijing" "firstintcourt" "firstintcourt"       .
      2020 11 "beijing" "firstintcourt" "firstintcourt"       .
      2021 11 "beijing" "firstintcourt" "firstintcourt"       .
      2022 11 "beijing" "firstintcourt" "firstintcourt"       .
      2007 11 "beijing" "firstintcourt" "haidianqu"           .
      2008 11 "beijing" "firstintcourt" "haidianqu"           .
      2009 11 "beijing" "firstintcourt" "haidianqu"           .
      2010 11 "beijing" "firstintcourt" "haidianqu"           .
      2011 11 "beijing" "firstintcourt" "haidianqu"           .
      2012 11 "beijing" "firstintcourt" "haidianqu"           .
      2013 11 "beijing" "firstintcourt" "haidianqu"           .
      2014 11 "beijing" "firstintcourt" "haidianqu"           .
      2015 11 "beijing" "firstintcourt" "haidianqu"           .
      2016 11 "beijing" "firstintcourt" "haidianqu"           .
      2017 11 "beijing" "firstintcourt" "haidianqu"           .
      2018 11 "beijing" "firstintcourt" "haidianqu"           .
      2019 11 "beijing" "firstintcourt" "haidianqu"           .
      2020 11 "beijing" "firstintcourt" "haidianqu"           .
      2021 11 "beijing" "firstintcourt" "haidianqu"           .
      2022 11 "beijing" "firstintcourt" "haidianqu"           .
      2007 11 "beijing" "firstintcourt" "mentougouqu"         .
      2008 11 "beijing" "firstintcourt" "mentougouqu"         .
      2009 11 "beijing" "firstintcourt" "mentougouqu"         .
      2010 11 "beijing" "firstintcourt" "mentougouqu"         .
      2011 11 "beijing" "firstintcourt" "mentougouqu"         .
      2012 11 "beijing" "firstintcourt" "mentougouqu"         .
      2013 11 "beijing" "firstintcourt" "mentougouqu"         .
      2014 11 "beijing" "firstintcourt" "mentougouqu"         .
      2015 11 "beijing" "firstintcourt" "mentougouqu"         .
      2016 11 "beijing" "firstintcourt" "mentougouqu"         .
      2017 11 "beijing" "firstintcourt" "mentougouqu"         .
      2018 11 "beijing" "firstintcourt" "mentougouqu"         .
      2019 11 "beijing" "firstintcourt" "mentougouqu"         .
      2020 11 "beijing" "firstintcourt" "mentougouqu"         .
      2021 11 "beijing" "firstintcourt" "mentougouqu"         .
      2022 11 "beijing" "firstintcourt" "mentougouqu"         .
      2007 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2008 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2009 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2010 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2011 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2012 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2013 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2014 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2015 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2016 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2017 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2018 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2019 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2020 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2021 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2022 11 "beijing" "firstintcourt" "shijingshanqu"       .
      2007 11 "beijing" "firstintcourt" "yanqingqu"           .
      2008 11 "beijing" "firstintcourt" "yanqingqu"           .
      2009 11 "beijing" "firstintcourt" "yanqingqu"           .
      2010 11 "beijing" "firstintcourt" "yanqingqu"           .
      2011 11 "beijing" "firstintcourt" "yanqingqu"           .
      end



      • #4
        As Jared suggested, having the data example is helpful. One interpretation of what you ask for in #3 is "Create a distinct numerical id for every combination of county, prefecture, and province found in the data set." If that's correct, you might start by using the -group()- function of -egen-. The following is to show some technique that you might try, with alterations as relevant:
        Code:
        egen int prefid = group(provid prefecture county), label lname(prefidLbl)
        label list prefidLbl // just to help you check what's being done here
        prefidLbl:
                   1 11 all all
                   2 11 firstintcourt changpingqu
                   3 11 firstintcourt firstintcourt
                   4 11 firstintcourt haidianqu
                   5 11 firstintcourt mentougouqu
                   6 11 firstintcourt shijingshanqu
                   7 11 firstintcourt yanqingqu
        
        tab prefid  // Does this look OK?
        
               group(provid prefecture |
                               county) |      Freq.     Percent        Cum.
        -------------------------------+-----------------------------------
                            11 all all |         15       15.00       15.00
          11 firstintcourt changpingqu |         16       16.00       31.00
        11 firstintcourt firstintcourt |         16       16.00       47.00
            11 firstintcourt haidianqu |         16       16.00       63.00
          11 firstintcourt mentougouqu |         16       16.00       79.00
        11 firstintcourt shijingshanqu |         16       16.00       95.00
            11 firstintcourt yanqingqu |          5        5.00      100.00
        -------------------------------+-----------------------------------
                                 Total |        100      100.00
        You might need to use a few -replace- commands to account for any differences associated with the year, per above, such as:
        Code:
        replace prefid = 70 if (prefid == 7) & (year <= 2015)  // illustrative values only: 7 is the yanqingqu group above, 70 is an arbitrary new code
        There are other ways to shorten your code shown above, but I suspect that using -group- will be less error prone than those.
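
        As a minimal sketch of one of those other ways, the repeated -replace- pairs in #1 can be driven by two parallel lists. This assumes prefid and countyid already exist as numeric variables (as created by the -gen- lines in #1). The local macro names counties and ids are made up for illustration, and the id values are copied from #1:

        ```stata
        * Parallel lists: the i-th county gets the i-th countyid (values taken from #1)
        local counties changpingqu haidianqu mentougouqu shijingshanqu
        local ids      1 8 9 7

        local n : word count `counties'
        forvalues i = 1/`n' {
            local c  : word `i' of `counties'
            local id : word `i' of `ids'
            * All four counties share provid 11, prefecture firstintcourt, and prefid 1
            replace prefid   = 1    if provid == 11 & prefecture == "firstintcourt" & county == "`c'"
            replace countyid = `id' if provid == 11 & prefecture == "firstintcourt" & county == "`c'"
        }
        ```

        Cases that depend on year (yanqingqu in #1) would still need their own -replace- commands, and the two lists have to be kept the same length by hand.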

        From your code above, your "countyid" appears not to depend on prefecture or provid. While -egen, ... group()- could also be used to create that variable, the simpler command of -encode- might accomplish what you want:
        Code:
        encode county, gen(countyid)
        Again, details involving year might be relevant.


        A final note: I have presumed that the particular numerical values used for prefid and countyid don't matter to you. I can understand that they might, for example if there are standard numerical identifiers for those local areas (as there are in analogous situations in the U.S.). If that's true, then a different kind of solution is possible.

        A side note: Using 01 or 02 for your numerical values is ok, but they're the same as 1 and 2 to Stata. String values like "01" and "02" are, however, different than "1" and "2." While not an issue here, that detail could cause a problem in other contexts.
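
        To see the difference for yourself, here is a short sketch. The -format- line assumes prefid and countyid are numeric variables already in memory:

        ```stata
        display 01 == 1        // displays 1: as numbers, 01 and 1 are the same value
        display "01" == "1"    // displays 0: as strings, "01" and "1" differ

        * If you only want the leading zero shown, keep the variables numeric
        * and change their display format instead of storing strings
        format prefid countyid %02.0f
        ```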



        • #5
          Thank you very much, Mike!

          I should have been clearer that I do need to use the standard numerical identifier.



          • #6
            For the case in which you need to assign particular *pre-defined* numeric values for prefid and countyid:

            In that case, you could use -duplicates drop- to make a file containing just one entry for each combination of the relevant variables, i.e., provid prefecture county. You could then type in the desired numeric values of prefid and countyid for each combination and save that file. Then you could use -merge m:1 provid prefecture county... - to put the prefid and countyid onto your master file. You could handle the issues with "year" with some -replace- commands after doing that. I'm suggesting this approach based on the assumption that the confusing (to me) code you show above represents a relatively small and illustrative selection of a large number of combinations of provid/prefecture/county you need to code numerically.

            Here's a partial illustration, using your -dataex- example above:
            Code:
            keep provid prefecture county year
            duplicates drop provid prefecture county, force
            gen int prefid = .
            gen int countyid = .
            This would give you a data set like the following, into which you would fill in "by hand" your prefid and countyid values:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int year byte provid str16 prefecture str15 county int(prefid countyid)
            2007 11 "all"           "all"           . .
            2007 11 "firstintcourt" "changpingqu"   . .
            2007 11 "firstintcourt" "firstintcourt" . .
            2007 11 "firstintcourt" "haidianqu"     . .
            2007 11 "firstintcourt" "mentougouqu"   . .
            2007 11 "firstintcourt" "shijingshanqu" . .
            2007 11 "firstintcourt" "yanqingqu"     . .
            end
            There might be simpler solutions than this, but without understanding all the relations of prefecture/county/year (which I'm not inclined to try to do), this is my suggestion.
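
            The remaining steps could be sketched as follows. This is only an outline under some assumptions: the master data are in memory and do not yet contain prefid or countyid, the id values are copied from the -replace- commands in #1 rather than from any official list, and the tempfile name lookup is made up for illustration:

            ```stata
            * Build the filled-in lookup table (one row per provid/prefecture/county)
            preserve
            clear
            input byte provid str16 prefecture str15 county int(prefid countyid)
            11 "all"           "all"           0  0
            11 "firstintcourt" "firstintcourt" 1 99
            11 "firstintcourt" "changpingqu"   1  1
            11 "firstintcourt" "haidianqu"     1  8
            11 "firstintcourt" "mentougouqu"   1  9
            11 "firstintcourt" "shijingshanqu" 1  7
            11 "firstintcourt" "yanqingqu"     1 19
            end
            tempfile lookup
            save `lookup'
            restore

            * Attach the ids to the master data; unmatched master rows are kept with missing ids
            merge m:1 provid prefecture county using `lookup', keep(master match) nogenerate

            * Year-specific exception from #1: yanqingqu had different ids through 2015
            replace prefid   = 2  if provid == 11 & prefecture == "firstintcourt" & county == "yanqingqu" & year <= 2015
            replace countyid = 29 if provid == 11 & prefecture == "firstintcourt" & county == "yanqingqu" & year <= 2015
            ```

            After the merge, -count if missing(prefid)- is a quick way to find combinations that are still missing from the lookup table.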



            • #7
              Got it. Thank you very much, Mike!

              The numeric identifiers for some regions changed during the sample period. My dataset currently has many missing values, which is why only two observations in the sample generated by -dataex- have nonmissing values.
