Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • add different suffixes to 1500+ variable names using foreach

    Dear all,

    I have been trying to add different suffixes (numbers) to a set of variables named v1, v2, v3... but my code seems not working. The suffixes come from the variable id. Namely, I want to rename my variables as follows:

    v1 ==> d_1101506022
    v2 ==> d_1101506032
    v3 ==> d_1101506073
    .
    .
    .

    My code is as follows:

    foreach x in v* {
    local var = id
    local suffix = `var'[`_n']
    rename `x' d_`suffix'
    }

    Will you please tell me what I am doing wrong? Thank you
    Click image for larger version

Name:	sample 2.png
Views:	1
Size:	247.9 KB
ID:	1529790

    Last edited by Secil Danakol; 24 Dec 2019, 04:12.

  • #2
    Please don't use screenshots. Reasons why not and what to do instead are spelled out in https://www.statalist.org/forums/help#stata -- which you are asked to read before posting.

    The form foreach ... in ... won't unpack variable name wildcards. See the help for foreach. Otherwise, you are getting a little confused about what is, and is not, a local macro.

    It is better for your problem to loop over 1/5 (or 1 to whatever it is) as in

    Code:
    * #1 
    forval j = 1/5 { 
          local this = id[`j'] 
          rename v`j' d_`this' 
    }
    which can be in turn rewritten as

    Code:
    * #2 
    forval j = 1/5 { 
         rename v`j' d_`= id[`j']' 
    }
    although my advice is not to write code like #2 before you are comfortable with code like #1.

    A couple of comments on your code segment:

    Code:
    local var = id
    local suffix = `var'[`_n']
    1. That's wrong for what you want.

    Code:
    local var = id
    passes the value of id (always the value in the first observation, as it happens), which is not what you intend. Here that is 1101506022. You intended to pass the name, to wit id. The way to do that is

    Code:
    local var "id"
    where the
    " " are optional.

    2. But you don't need to do that. You intend putting the name in a local macro, only to take it out again next statement. That is not needed. To think yourself out of writing code in this way, consider this simple analogy.

    I have a pen.
    I put my pen in a box.
    I want my pen.
    I take it out of the box.

    Within nothing else said, the putting in and taking out of the box can be cut.

    There is a bigger question. It seems that you have a matrix with a couple of million values. It is unlikely that this data structure is best for matrix manipulation, but to say more needs detail on what you intend.



    Comment


    • #3

      Dear Nick,

      Thank you for your detailed answer. It really helps. I know I need to do more reading on foreach and forvalues.

      As you rightly said, I have a bigger issue and the way I proceed is not the most efficient way, I think. Let me try to explain what I would like to do:

      I have two matrices 1657 by 1657, where id refers to territorial units. The first matrix includes #of foreign firms in all territorial units. For instance, variable v2012_1101506022 is equal to the number of foreign firms in territory 1101506022 in year 2012

      The second matrix is a row normalized inverse distance matrix for the same territorial units. I want to multiply the #of foreign firms in matrix one in a given cell with the weights in distance matrix in the corresponding cell. The aim is to calculate the weighted average of the number of foreign firms in all other territorial units except i

      Any suggestions are highly appreciated!

      Thanks


      FIRST MATRIX:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double id float(v2012_1101506022 v2012_1101506032 v2012_1101506073 v2012_1101506082 v2012_1101506103)
      1101506022 5 2 7 15 90
      1101506032 5 2 7 15 90
      1101506073 5 2 7 15 90
      1101506082 5 2 7 15 90
      1101506103 5 2 7 15 90
      end
      SECOND MATRIX:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double id float(d_1101506022 d_1101506032 d_1101506073 d_1101506082 d_1101506103)
      1101506022          0 .017849104 .005700286  .012204472   .00543583
      1101506032  .01793034          0 .005220845  .007281685  .005889584
      1101506073 .005843815 .005328052          0  .005600589  .002872144
      1101506082 .012190173 .007240202 .005456633           0 .0043995706
      1101506103 .005388981 .005812371 .002777456 .0043667695           0
      end

      Comment


      • #4
        I will start with the following statement. I do not understand why you are formulating these datasets as matrices. That is not a natural form for Stata data, and you will see that my solution to your problem of multiplying the corresponding entries works by reshaping both matrices into linear datasets, matching the observations, and then reshaping back to a matrix result. Were this my problem, I would work with linear datasets as much as possible.

        With that said, though, here is code that accomplishes the multiplication you seek.
        Code:
        cls
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double id float(v2012_1101506022 v2012_1101506032 v2012_1101506073 v2012_1101506082 v2012_1101506103)
        1101506022 5 2 7 15 90
        1101506032 5 2 7 15 90
        1101506073 5 2 7 15 90
        1101506082 5 2 7 15 90
        1101506103 5 2 7 15 90
        end
        reshape long v2012_ , i(id) j(id2)
        save v, replace
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double id float(d_1101506022 d_1101506032 d_1101506073 d_1101506082 d_1101506103)
        1101506022          0 .017849104 .005700286  .012204472   .00543583
        1101506032  .01793034          0 .005220845  .007281685  .005889584
        1101506073 .005843815 .005328052          0  .005600589  .002872144
        1101506082 .012190173 .007240202 .005456633           0 .0043995706
        1101506103 .005388981 .005812371 .002777456 .0043667695           0
        end
        reshape long d_, i(id) j(id2)
        save d, replace
        
        use v, clear
        merge 1:1 id id2 using d, assert(match)
        replace v2012_ = v2012_ * d_
        drop d_ _merge
        reshape wide v2012_, i(id) j(id2)
        // use dataex to list data because variable names are so long
        dataex
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double id float(v2012_1101506022 v2012_1101506032 v2012_1101506073 v2012_1101506082 v2012_1101506103)
        1101506022          0  .03569821    .039902  .1830671  .4892247
        1101506032   .0896517          0 .036545914 .10922527 .53006256
        1101506073 .029219074 .010656104          0 .08400884 .25849298
        1101506082  .06095087 .014480404  .03819643         0  .3959613
        1101506103 .026944906 .011624742 .019442193 .06550154         0
        end

        Comment


        • #5
          Dear William,

          Many thanks for your answer. I understand your point. I should have used Mata in the first instance to deal with this issue. In fact, I have already achieved what I want. Thanks again.

          Comment

          Working...
          X