Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create new variables and add them via loop

    Good morning
    I report my initial situation (I have developed the following code to create and calculate the variables of my interest):
    Code:
    recode v89 v90 v91 v92 v93 v94 v95 v131 v132 v133 v134 v135 v136 v137 v138 v139 v141 v142 v143 v144 v145 v146  v147 v148 v149 v150 v151 v152 v153 v154 v155 v156 v157 v158 v159 v160 v161 v162 v163 v164 v165 v166 v167 v168 v169
    v170 v171 v172 v173 v174 v175 v176 v177 v178 v179 v180 v181 v182 v183 v184 v185 v186 (missing = 0) , prefix(new_)
    
    gen CAPINV2013= new_v89 + new_v131 + new_v138 + new_v145 + new_v152 + new_v159 + new_v166 + new_v173 + new_v180
    gen CAPINV2014= new_v90 + new_v132 + new_v139 + new_v146 + new_v153 + new_v160 + new_v167 + new_v174 + new_v181
    gen CAPINV2015= new_v91 + new_v133 + new_v140 + new_v147 + new_v154 + new_v161 + new_v168 + new_v175 + new_v182
    gen CAPINV2016= new_v92 + new_v134 + new_v141 + new_v148 + new_v155 + new_v162 + new_v169 + new_v176 + new_v183
    gen CAPINV2017= new_v93 + new_v135 + new_v142 + new_v149 + new_v156 + new_v163 + new_v170 + new_v177 + new_v184
    gen CAPINV2018= new_v94 + new_v136 + new_v143 + new_v150 + new_v157 + new_v164 + new_v171 + new_v178 + new_v185
    gen CAPINV2019= new_v95 + new_v137 + new_v144 + new_v151 + new_v158 + new_v165 + new_v172 + new_v179 + new_v189
    
    gen D2013= new_v159 + new_v166 + new_v173 + new_v180
    gen D2014= new_v160 + new_v167 + new_v174 + new_v181
    gen D2015= new_v161 + new_v168 + new_v175 + new_v182
    gen D2016= new_v162 + new_v169 + new_v176 + new_v183
    gen D2017= new_v163 + new_v170 + new_v177 + new_v184
    gen D2018= new_v164 + new_v171 + new_v178 + new_v185
    gen D2019= new_v165 + new_v172 + new_v179 + new_v189
    
    replace CAPINV2013 =. if CAPINV2013 ==0
    replace CAPINV2014 =. if CAPINV2014 ==0
    replace CAPINV2015 =. if CAPINV2015 ==0
    replace CAPINV2016 =. if CAPINV2016 ==0
    replace CAPINV2017 =. if CAPINV2017 ==0
    replace CAPINV2018 =. if CAPINV2018 ==0
    replace CAPINV2019 =. if CAPINV2019 ==0
    
    replace D2013 =. if D2013 ==0
    replace D2014 =. if D2014 ==0
    replace D2015 =. if D2015 ==0
    replace D2016 =. if D2016 ==0
    replace D2017 =. if D2017 ==0
    replace D2018 =. if D2018 ==0
    replace D2019 =. if D2019 ==0
    
    drop new_v89 new_v90 new_v91 new_v92 new_v93 new_v94 new_v95 new_v131 new_v132 new_v133 new_v134 new_v135 new_v136 new_v137 new_v138 new_v139 new_v141 new_v142 new_v143 new_v144 new_v145 new_v146 new_v147 new_v148 new_v149
    new_v150 new_v151 new_v152 new_v153 new_v154 new_v155 new_v156 new_v157 new_v158 new_v159 new_v160 new_v161 new_v162 new_v163 new_v164 new_v165 new_v166 new_v167 new_v168 new_v169 new_v170 new_v171 new_v172 new_v173
    new_v174 new_v175 new_v176 new_v177 new_v178 new_v179 new_v180 new_v181 new_v182 new_v183 new_v184 new_v185 new_v186
    Would I like to create a much more "lean" code through a loop? is it possible in your opinion?
    Thank you
    Last edited by Riccardo Busin; 02 Nov 2021, 09:18.

  • #2
    Your desire to calculate sums treating "." as 0 can be accomplished with the -rowtotal- function of the -egen- command, thus avoiding your initial creation of the various new_* variables. See help -egen-. For example, you can do:
    Code:
    egen CAPINV2013= rowtotal(v89 v131 v138 v145 v152 v159 v166 v173 v180)
    A parallel series of such commands might be done with a loop, but I don't think it will be much shorter or clearer, so I'd just repeat several -egen- commands.

    As for your series of replacements of "0" with "." that can be done with a single -recode-.
    Code:
    recode CAPINV2013 CAPINV2014 CAPINV2015 CAPINV2016 CAPINV2017 CAPINV2018 CAPINV2019 ///
           D2013 D2014 D2015 D2016 D2017 D2018 D2019    (0 = .)
    It appears that your data represents some kind of year based panel, in which case a long rather than wide data structure and a somewhat different naming of your variables might have made what you want much easier, but helping you in that direction would require knowing something of the substance and logic of your dataset. "Long" layout is almost always a better choice in Stata.
    Last edited by Mike Lacy; 02 Nov 2021, 10:07.

    Comment


    • #3
      Yes, it is possible. For example
      Code:
      gen D2013= new_v159 + new_v166 + new_v173 + new_v180
      gen D2014= new_v160 + new_v167 + new_v174 + new_v181
      gen D2015= new_v161 + new_v168 + new_v175 + new_v182
      gen D2016= new_v162 + new_v169 + new_v176 + new_v183
      gen D2017= new_v163 + new_v170 + new_v177 + new_v184
      gen D2018= new_v164 + new_v171 + new_v178 + new_v185
      gen D2019= new_v165 + new_v172 + new_v179 + new_v189
      can be written more compactly as
      Code:
      forvalues i = 1/7 {
          gen D`=2012+`i'' = .
          foreach j of numlist 158(7)179 {
              replace D`2012+`i''  = D`2012+`i'' + new_v`=`j'+`i''
          }
      }
      The code for the CAPINV* variables would be similar, with different numbers bounding the loops, and the "starting point" in the inner loop would not be missing value but the value of new_v`=88+`i''.

      But I would not do this if I were you. The code you have written is very transparent. Even people who have never seen Stata code before can read it and know what it does. And if you have to come back to this after a long absence where you have forgotten the details, you, too, will know at a glance what is going on. By contrast, the compact looping code is very obscure. Only Stata-knowledgable people can understand it at all, and even experienced users will take a long time to figure out what it means. If you had to revisit this code after a long absence, your first reaction to it would likely be bewilderment. And, on top of that, you only shorten the code by 1 line!

      That said, the blocks of code that replace 0 with missing values in the CAPINV* and D* variables you create can each be shortened to one transparent command:

      Code:
      mvencode CAPINV2013-CAPINV2019 D2013-D2019, mv(0)
      In fact, if there are no other variables in the data set that begin with CAPINV or D, you can do even a bit better with:
      Code:
      mvencode CAPINV* D*, mv(0)
      Added: Crossed with #2, where some additional, excellent, suggestions are made.

      Comment


      • #4
        Originally posted by Mike Lacy View Post
        Your desire to calculate sums treating "." as 0 can be accomplished with the -rowtotal- function of the -egen- command, thus avoiding your initial creation of the various new_* variables. See help -egen-. For example, you can do:
        Code:
        egen CAPINV2013= rowtotal(v89 v131 v138 v145 v152 v159 v166 v173 v180)
        A parallel series of such commands might be done with a loop, but I don't think it will be much shorter or clearer, so I'd just repeat several -egen- commands.

        As for your series of replacements of "0" with "." that can be done with a single -recode-.
        Code:
        recode CAPINV2013 CAPINV2014 CAPINV2015 CAPINV2016 CAPINV2017 CAPINV2018 CAPINV2019 ///
        D2013 D2014 D2015 D2016 D2017 D2018 D2019 (0 = .)
        It appears that your data represents some kind of year based panel, in which case a long rather than wide data structure and a somewhat different naming of your variables might have made what you want much easier, but helping you in that direction would require knowing something of the substance and logic of your dataset. "Long" layout is almost always a better choice in Stata.
        Thank you for your help!

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Yes, it is possible. For example
          Code:
          gen D2013= new_v159 + new_v166 + new_v173 + new_v180
          gen D2014= new_v160 + new_v167 + new_v174 + new_v181
          gen D2015= new_v161 + new_v168 + new_v175 + new_v182
          gen D2016= new_v162 + new_v169 + new_v176 + new_v183
          gen D2017= new_v163 + new_v170 + new_v177 + new_v184
          gen D2018= new_v164 + new_v171 + new_v178 + new_v185
          gen D2019= new_v165 + new_v172 + new_v179 + new_v189
          can be written more compactly as
          Code:
          forvalues i = 1/7 {
          gen D`=2012+`i'' = .
          foreach j of numlist 158(7)179 {
          replace D`2012+`i'' = D`2012+`i'' + new_v`=`j'+`i''
          }
          }
          The code for the CAPINV* variables would be similar, with different numbers bounding the loops, and the "starting point" in the inner loop would not be missing value but the value of new_v`=88+`i''.

          But I would not do this if I were you. The code you have written is very transparent. Even people who have never seen Stata code before can read it and know what it does. And if you have to come back to this after a long absence where you have forgotten the details, you, too, will know at a glance what is going on. By contrast, the compact looping code is very obscure. Only Stata-knowledgable people can understand it at all, and even experienced users will take a long time to figure out what it means. If you had to revisit this code after a long absence, your first reaction to it would likely be bewilderment. And, on top of that, you only shorten the code by 1 line!

          That said, the blocks of code that replace 0 with missing values in the CAPINV* and D* variables you create can each be shortened to one transparent command:

          Code:
          mvencode CAPINV2013-CAPINV2019 D2013-D2019, mv(0)
          In fact, if there are no other variables in the data set that begin with CAPINV or D, you can do even a bit better with:
          Code:
          mvencode CAPINV* D*, mv(0)
          Added: Crossed with #2, where some additional, excellent, suggestions are made.
          Thank you for your help!
          I wanted to try the suggested code:
          Code:
           
           forvalues i = 1/7 {     gen D`=2012+`i'' = .     foreach j of numlist 158(7)179 {         replace D`2012+`i''  = D`2012+`i'' + new_v`=`j'+`i''     } }
          But when I try to use it, this message appears:

          (22,520 missing values generated)
          (0 real changes made)
          (0 real changes made)
          (0 real changes made)
          (0 real changes made)
          (22,520 missing values generated)
          D ambiguous abbreviation
          r (111);

          I cannot understand where I am wrong. Thanks again

          Comment


          • #6
            Sorry, my mistake. The first line of the inner loop should be:
            Code:
            replace D`=2012+`i'' = D`=2012+`i'' + new_v`=`j'+`i''
            In the future, providing example data using the -dataex- command will avoid such problems. When sample data is available, I test my code before posting it. As there was none here, I did not do so, so my error went uncaught until you tried to use it.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              Sorry, my mistake. The first line of the inner loop should be:
              Code:
              replace D`=2012+`i'' = D`=2012+`i'' + new_v`=`j'+`i''
              In the future, providing example data using the -dataex- command will avoid such problems. When sample data is available, I test my code before posting it. As there was none here, I did not do so, so my error went uncaught until you tried to use it.
              Thanks for your help!
              The code creates the variables but all the values are missing.
              I am new to Stata, how can I use dataex?
              Thanks again and sorry for the inconvenience

              Comment


              • #8
                If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                That said, I see I made another error, which explains why you are getting all missing values. The first command in the outer loop should be:
                Code:
                gen D`=2012+`i'' = 0

                Comment


                • #9
                  Originally posted by Clyde Schechter View Post
                  If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                  That said, I see I made another error, which explains why you are getting all missing values. The first command in the outer loop should be:
                  Code:
                  gen D`=2012+`i'' = 0
                  Thank you again, now everything works! I will follow your advice with using dataex for my next inquiries.
                  Thanks for your availability and kindness

                  Comment

                  Working...
                  X