Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Should I rename all the variables with the year immediately following the original variable names when I merge data across years with loop

    I run the following code:

    local i = 1998
    while `i' < 2007{

    local j=`i'+1


    **step 10: match by firm ID**

    *deal with duplicates of IDs (there are a few firms that have same IDs)*
    use m`i'.10.dta, clear
    bysort id`i': keep if _N>1
    save duplicates_ID`i'.dta, replace

    use m`i'.10.dta, clear
    bysort id`i': drop if _N>1
    rename id`i' id
    rename B210 stateown
    rename F301 capital_total
    rename F302 capital_state
    rename F392 capital_collective
    rename F393 capital_legal
    rename F394 capital_individual
    rename F395 capital_HMT
    rename F303 capital_foreign
    rename product1_ product
    keep bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id legal_person`i' name`i' phone`i' product`i' town`i'
    sort id
    save match`i'.1.dta, replace

    use m`j'.10.dta, clear
    bysort id`j': keep if _N>1
    save duplicates_ID`j'.dta, replace

    use m`j'.10.dta, clear
    bysort id`j': drop if _N>1
    rename id`j' id
    keep bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id legal_person`j' name`j' phone`j' product`j' town`j'
    sort id
    save match`j'.1.dta, replace

    use match`i'.1.dta, clear
    merge id using match`j'.1.dta
    keep if _merge==3
    gen id`i'=id
    rename id id`j'
    drop _merge
    gen match_method_`i'_`j'="ID"
    gen match_status_`i'_`j'="3"
    save matched_by_ID`i'_`j'.dta, replace



    **step 20: match by firm names**

    *match those unmatched firms in previous step by firm names*

    use match`i'.1.dta, clear
    merge id using match`j'.1.dta
    keep if _merge==1
    rename id id`i'
    append using duplicates_ID`i'.dta
    bysort name`i': keep if _N>1
    keep bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save duplicates_name`i'.dta, replace

    use match`i'.1.dta, clear
    merge id using match`j'.1.dta
    keep if _merge==1
    rename id id`i'
    append using duplicates_ID`i'.dta
    bysort name`i': drop if _N>1
    rename name`i' name
    sort name
    keep bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save unmatched_by_ID`i'.dta, replace

    use match`i'.1.dta, clear
    merge id using match`j'.1.dta
    keep if _merge==2
    rename id id`j'
    append using duplicates_ID`j'.dta
    bysort name`j': keep if _N>1
    keep bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save duplicates_name`j'.dta, replace

    use match`i'.1.dta, clear
    merge id using match`j'.1.dta
    keep if _merge==2
    rename id id`j'
    append using duplicates_ID`j'.dta
    bysort name`j': drop if _N>1
    rename name`j' name
    sort name
    keep bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save unmatched_by_ID`j'.dta, replace

    use unmatched_by_ID`i'.dta, clear
    merge name using unmatched_by_ID`j'.dta
    keep if _merge==3
    gen name`i'=name
    rename name name`j'
    drop _merge
    gen match_method_`i'_`j'="firm name"
    gen match_status_`i'_`j'="3"
    save matched_by_name`i'_`j'.dta, replace



    **step 30: match by the names of legal person representatives**

    *match those unmatched firms in previous steps by firm legal person representatives*

    use unmatched_by_ID`i'.dta, clear
    merge name using unmatched_by_ID`j'.dta
    keep if _merge==1
    rename name name`i'
    append using duplicates_name`i'.dta
    replace legal_person`i'="." if legal_person`i'==""
    gen code1=legal_person`i'+substr(dq`i',1,4)
    bysort code1: keep if _N>1
    keep bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save duplicates_code1_`i'.dta, replace

    use unmatched_by_ID`i'.dta, clear
    merge name using unmatched_by_ID`j'.dta
    keep if _merge==1
    rename name name`i'
    append using duplicates_name`i'.dta
    replace legal_person`i'="." if legal_person`i'==""
    gen code1=legal_person`i'+substr(dq`i',1,4)
    bysort code1: drop if _N>1
    sort code1
    keep code1 bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save unmatched_by_ID_and_name`i'.dta, replace

    use unmatched_by_ID`i'.dta, clear
    merge name using unmatched_by_ID`j'.dta
    keep if _merge==2
    rename name name`j'
    append using duplicates_name`j'.dta
    gen code1=legal_person`j'+substr(dq`j',1,4)
    bysort code1: keep if _N>1
    keep bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save duplicates_code1_`j'.dta, replace

    use unmatched_by_ID`i'.dta, clear
    merge name using unmatched_by_ID`j'.dta
    keep if _merge==2
    rename name name`j'
    append using duplicates_name`j'.dta
    gen code1=legal_person`j'+substr(dq`j',1,4)
    bysort code1: drop if _N>1
    sort code1
    keep code1 bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save unmatched_by_ID_and_name`j'.dta, replace

    use unmatched_by_ID_and_name`i'.dta, clear
    display _N
    merge code1 using unmatched_by_ID_and_name`j'.dta
    keep if _merge==3
    drop _merge code1
    gen match_method_`i'_`j'="legal person"
    gen match_status_`i'_`j'="3"
    save matched_by_legalperson`i'_`j'.dta, replace



    **step 40: match by phone number + city code**

    *match those unmatched firms in previous steps by phone number + city code*

    use unmatched_by_ID_and_name`i'.dta, clear
    merge code1 using unmatched_by_ID_and_name`j'.dta
    keep if _merge==1
    drop code1
    append using duplicates_code1_`i'
    replace phone`i'="." if phone`i'==""
    gen code2=substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i '
    bysort code2: keep if _N>1
    keep bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save duplicates_code2_`i'.dta, replace

    use unmatched_by_ID_and_name`i'.dta, clear
    merge code1 using unmatched_by_ID_and_name`j'.dta
    keep if _merge==1
    drop code1
    append using duplicates_code1_`i'
    replace phone`i'="." if phone`i'==""
    gen code2=substr(dq`i',1,4)+substr(cic`i',1,3)+phone`i '
    bysort code2: drop if _N>1
    sort code2
    keep code2 bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save unmatched_by_ID_and_name_and_legalperson`i'.dta, replace


    use unmatched_by_ID_and_name`i'.dta, clear
    merge code1 using unmatched_by_ID_and_name`j'.dta
    keep if _merge==2
    drop code1
    append using duplicates_code1_`j'
    gen code2=substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j '
    bysort code2: keep if _N>1
    keep bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save duplicates_code2_`j'.dta, replace

    use unmatched_by_ID_and_name`i'.dta, clear
    merge code1 using unmatched_by_ID_and_name`j'.dta
    keep if _merge==2
    drop code1
    append using duplicates_code1_`j'
    gen code2=substr(dq`j',1,4)+substr(cic`j',1,3)+phone`j '
    bysort code2: drop if _N>1
    sort code2
    keep code2 bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save unmatched_by_ID_and_name_and_legalperson`j'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson`i'.dta,cl ear
    merge code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
    keep if _merge==3
    drop _merge code2
    gen match_method_`i'_`j'="phone number"
    gen match_status_`i'_`j'="3"
    save matched_by_phone`i'_`j'.dta, replace



    **step 50: match by code = founding year + geographic code + industry code + name of town + name of main product**

    *match those unmatched firms in previous steps by founding year + geographic code + industry code + name of town + name of main product*

    use unmatched_by_ID_and_name_and_legalperson`i'.dta,cl ear
    merge code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
    keep if _merge==1
    drop code2
    append using duplicates_code2_`i'.dta
    replace town`i'="." if town`i'==""
    replace product1_`i'="." if product1_`i'==""
    gen code3=bdat`i'+substr(dq`i',1,6)+substr(cic`i',1,4) +town`i'+product1_`i'
    bysort code3: keep if _N>1
    keep bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save duplicates_code3_`i'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson`i'.dta,cl ear
    merge code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
    keep if _merge==1
    drop code2
    append using duplicates_code2_`i'.dta
    replace town`i'="." if town`i'==""
    replace product1_`i'="." if product1_`i'==""
    gen code3=bdat`i'+substr(dq`i',1,6)+substr(cic`i',1,4) +town`i'+product1_`i'
    bysort code3: drop if _N>1
    sort code3
    keep code3 bdat`i' cic`i' dq`i' capital_total`i' capital_state`i' capital_collective`i' capital_legal`i' capital_individual`i' capital_HMT`i' capital_foreign`i' stateown`i' id`i' legal_person`i' name`i' phone`i' product`i' town`i'
    save unmatched_by_ID_and_name_and_legalperson_and_phone `i'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson`i'.dta,cl ear
    merge code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
    keep if _merge==2
    drop code2
    append using duplicates_code2_`j'.dta
    gen code3=bdat`j'+substr(dq`j',1,6)+substr(cic`j',1,4) +town`j'+product1_`j'
    bysort code3: keep if _N>1
    keep bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save duplicates_code3_`j'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson`i'.dta,cl ear
    merge code2 using unmatched_by_ID_and_name_and_legalperson`j'.dta
    keep if _merge==2
    drop code2
    append using duplicates_code2_`j'.dta
    gen code3=bdat`j'+substr(dq`j',1,6)+substr(cic`j',1,4) +town`j'+product1_`j'
    bysort code3: drop if _N>1
    sort code3
    keep code3 bdat`j' cic`j' dq`j' capital_total`j' capital_state`j' capital_collective`j' capital_legal`j' capital_individual`j' capital_HMT`j' capital_foreign`j' stateown`j' id`j' legal_person`j' name`j' phone`j' product`j' town`j'
    save unmatched_by_ID_and_name_and_legalperson_and_phone `j'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson_and_phone `i'.dta,clear
    display _N
    merge code3 using unmatched_by_ID_and_name_and_legalperson_and_phone `j'.dta
    keep if _merge==3
    drop _merge code3
    gen match_method_`i'_`j'="code 3"
    gen match_status_`i'_`j'="3"
    save matched_by_code3_`i'_`j'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson_and_phone `i'.dta,clear
    merge code3 using unmatched_by_ID_and_name_and_legalperson_and_phone `j'.dta
    keep if _merge==1
    drop _merge code3
    append using duplicates_code3_`i'.dta
    gen match_method_`i'_`j'=""
    gen match_status_`i'_`j'="1"
    save unmatched_by_ID_and_name_and_legalperson_and_phone _and_code2_`i'.dta, replace

    use unmatched_by_ID_and_name_and_legalperson_and_phone `i'.dta,clear
    merge code3 using unmatched_by_ID_and_name_and_legalperson_and_phone `j'.dta
    keep if _merge==2
    drop _merge code3
    append using duplicates_code3_`j'.dta
    gen match_method_`i'_`j'=""
    gen match_status_`i'_`j'="2"
    save unmatched_by_ID_and_name_and_legalperson_and_phone _and_code2_`j'.dta, replace



    **step 60: merge the matched and unmatched files to create files of two consecutive years**

    use matched_by_ID`i'_`j'.dta, clear
    append using matched_by_name`i'_`j'.dta
    append using matched_by_legalperson`i'_`j'.dta
    append using matched_by_phone`i'_`j'.dta
    append using matched_by_code3_`i'_`j'.dta
    append using unmatched_by_ID_and_name_and_legalperson_and_phone _and_code2_`i'.dta
    append using unmatched_by_ID_and_name_and_legalperson_and_phone _and_code2_`j'.dta
    save m`i'-m`j'.dta, replace


    local i = `i' + 1
    }

    But I got this error showed after:

    (165,116 observations deleted)
    file duplicates_ID1998.dta saved
    (2 observations deleted)
    variable bdat1998 not found
    r(111);

    How should I solve the problem? I have to rename all the variables with the year immediately following the original variable names even in this loop command? Am I clear with my question?

    Your help will be greatly appreciated!

  • #2
    My experience is that when someone posts as large a piece of code as you have, people are not likely to try to help. You have over 300 lines of code, the most I have ever seen someone post on StataList. Can you imagine looking at this amount of someone else's code and trying to help them? I think you would find it difficult.

    To debug: You should take Stata's error message at face value, i.e, you are apparently at some point in the code working with a data file that does not contain the variable bdat1998, and Stata halts with an error when it reaches that point.

    You need to first discover where this is happening. In your situation, I would put a -display "something"- command and a -describe- command right before every place in your code where the stub "bdat" is reference, and than re-running your code. Other people might do this thing by using -set trace on- and running your code, but this might produce difficult to understand output for a new user. When you find out where the problem is happening, you might have narrowed down the problem to where someone here can help you. Also, at that point, you yourself can look at the data file in question and see if you can figure out why it does not have a variable named bdat1998 (etc.) even though you think it should contain such a variable.

    And, no, I don't happen to find your question clear. I don't know the meaning of the following sentence: "I have to rename all the variables with the year immediately following the original variable names even in this loop command?" If that is important, perhaps you can get a friend to help you rewrite it.

    Comment


    • #3
      Thank you very much! Really appreciate your answer!

      Comment


      • #4
        I agree entirely with the advice fromMike Lacy. Let me add one thing that lept out at me.
        Code:
        local i = 1998
        ...
        rename F301 capital_total
        ...
        keep keep bdat`i' ... capital_total`i' ...
        You apparently
        Code:
        rename F301 capital_total
        but then you
        Code:
        keep capital_total1998
        This will not work. And I'm guessing your dataset contains a variable named bdat, not bdat1998, and your code failed on the keep command.

        But I cannot recommend a way to fix this, because like Mike I cannot understand what you are trying to accomplish, and I could easily give you a correct recommendation for what turns out to be the wrong approach to what you are trying to accomplish, sending you further down the wrong path. I'm reluctant to do that.

        Comment


        • #5
          Really appreciate your answer!

          Comment

          Working...
          X