Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Foreach and string variables

    I have a panel data set with country, years, and the following three (relevant) variables:

    C (float)
    code (string)
    varcode (string)

    code has string values such as "a", "b", "c", etc.
    varcode has string values such as "i_it_a", "i_it_b", "i_it_c", etc.

    I am trying to create a new variable, i_gfcf_<code>, for each value of code, which sums over the values of C for certain values of varcode. See the loop below:

    local title code
    foreach x of varlist code{
    egen i_gfcf_`x' = sum(C) if (varcode == "i_it_`x'") | (varcode == "i_ct_`x'") | (varcode == "i_soft_`x'") | (varcode == "i_traeq_`x'") | (varcode == "i_omach_`x'") | ///
    (varcode == "i_ocon_`x'") | (varcode == "i_rstruc_`x'") | (varcode == "i_other_`x'")
    }

    PROBLEM: After running this loop, I obtain only one new variable, i_gfcf_code. But code is a variable name, not a value of the variable, i.e. I would expect to have only i_gfcf_a, i_gfcf_b, i_gfcf_c, etc. What's more, i_gfcf_code is incorrectly missing for all observations.

    QUESTION: How can I structure the above loop such that each value of the variable code is used to
    (i) generate a new variable named after that particular value of code, e.g. i_gfcf_a, i_gfcf_b, etc. and
    (ii) adjust the if statement for each value of code, e.g. if (varcode == "i_it_a") | (varcode == "i_ct_a"), etc. or if (varcode == "i_it_b") | (varcode == "i_ct_b"), etc. ?

    Thank you so much.

  • #2
    You gave the foreach loop one variable name and so asked for one new variable

    Code:
    egen i_gfcf_code =
    which you got. You think that the syntax should make look Stata inside the variable varcode and give you several new variables, one for each distinct value. That's not at all what it does. There is a 1:1 substitution of the name you provided as a suffix to the new variable name..

    But why do you want lots and lots of variables here? Consider a single command:

    Code:
    egen wanted = total(C), by(varcode) 
    That should contain the results you want. No loop is needed at all.

    I use total() here rather than sum(), which is undocumented as an egen function from Stata 9. You shouldn't prefer to use undocumented syntax. The two different functions produce the same result, but if someone asks you for documentation, you won't be able to cite it.

    It's not obvious from the syntax why you got all missings. The answer to that lies in your data. You gave no data example (FAQ Advice #12) and I can't say more definitively without seeing one. What you could do is show us the results of

    Code:
    tabulate varcode 
    However, you are testing for equality with these values

    Code:
    varcode == "i_it_code") | (varcode == "i_ct_code") | (varcode == "i_soft_code") | (varcode == "i_traeq_code") | (varcode == "i_omach_code") | ///
    (varcode == "i_ocon_code") | (varcode == "i_rstruc_code") | (varcode == "i_other_code"
    which don't correspond to the example values you give. That's because Stata substitutes the macro name you give as loop index with the single value "code" you ask it to loop over. It gave you what you asked for, unwittingly.

    By the way, there is simpler code for that, which is

    Code:
    inlist(varcode, "i_it_code", "i_ct_code", "i_soft_code", "i_traeq_code", "i_omach_code", "i_ocon_code", "i_rstruc_code", "i_other_code")
    although see

    Code:
    help inlist() 
    for the restriction to no more than 10 distinct string values.

    That all looks like the same radical misunderstanding, that your foreach syntax will make Stata look inside your variable and then loop over its distinct values.

    To summarize, a loop isn't needed here and you're not understanding what the loop does. If you really have a good reason for wanting separate variables, then you need two commands (still no loop)

    Code:
    egen wanted = total(C), by(varcode)
    separate wanted, by(varcode) 
    An ancient tutorial may still be of use


    SJ-2-2 pr0005 . . . . . . Speaking Stata: How to face lists with fortitude
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q2/02 SJ 2(2):202--222 (no commands)
    demonstrates the usefulness of for, foreach, forvalues, and
    local macros for interactive (non programming) tasks

    https://www.stata-journal.com/sjpdf....iclenum=pr0005

    although the section on for is of no use to you and should be skipped.

    Last edited by Nick Cox; 24 Feb 2019, 02:56.

    Comment


    • #3
      Typo in #2

      Code:
       varcode == "i_it_code") | (varcode == "i_ct_code") | (varcode == "i_soft_code") | (varcode == "i_traeq_code") | (varcode == "i_omach_code") | /// (varcode == "i_ocon_code") | (varcode == "i_rstruc_code") | (varcode == "i_other_code"
      should be

      Code:
       (varcode == "i_it_code") | (varcode == "i_ct_code") | (varcode == "i_soft_code") | (varcode == "i_traeq_code") | (varcode == "i_omach_code") | /// (varcode == "i_ocon_code") | (varcode == "i_rstruc_code") | (varcode == "i_other_code")
      Sorry about that.

      Reading through this again, I am still not sure what you want, but the code discussion should get you closer, and you can always ask further questions.
      Last edited by Nick Cox; 24 Feb 2019, 04:01.

      Comment


      • #4
        Nick, thanks a lot for the help. The article was especially appreciated.

        Let me show you what the goal is, using a simplified example of my data. (My real data has over 7000 observations: 20+ countries, 50+ vars, 30+ codes, and 40+ years).

        I want to create the observations in bold.

        Specifically, I want to add the two obs in red to create the bolded ob in red, and add the two obs in blue to create the bolded ob in blue. I want to do this grouped by country. In other words,

        I_AGG_A = I_IT_A + I_CT_A
        I_AGG_B = I_IT_B + I_CT_B
        J_AGG_A = J_IT_A + J_CT_A
        J_AGG_B = J_IT_B + J_CT_B

        grouped by country.

        Do you have any idea how I could do this?

        I understand Stata isn't meant for these operations on observations, but perhaps I could get around the issue by reformatting the data in long form (creating a new variable "years") and creating these new observations as new variables, then returning to wide form. But this could exceed my variable limits for Stata...

        Thank you!!!
        id country var code varcode _2014 _2015 _2016
        1 USA I_IT A I_IT_A 1 1 1
        2 USA I_IT B I_IT_B 2 2 2
        3 USA I_CT A I_CT_A 3 3 3
        4 USA I_CT B I_CT_B 4 4 4
        5 USA I_AGG A I_AGG_A 4 4 4
        6 USA I_AGG B I_AGG_B 6 6 6
        7 USA J_IT A J_IT_A 5 5 5
        8 USA J_IT B J_IT_B 6 6 6
        9 USA J_CT A J_CT_A 7 7 7
        10 USA J_CT B J_CT_B 8 8 8
        11 USA J_AGG A J_AGG_A 12 12 12
        12 USA J_AGG B J_AGG_B 14 14 14
        13 AT I_IT A I_IT_A 1 1 1
        14 AT I_IT B I_IT_B 2 2 2
        15 AT I_CT A I_CT_A 3 3 3
        16 AT I_CT B I_CT_B 4 4 4
        17 AT I_AGG A I_AGG_A 4 4 4
        18 AT I_AGG B I_AGG_B 6 6 6
        19 AT J_IT A J_IT_A 5 5 5
        20 AT J_IT B J_IT_B 6 6 6
        21 AT J_CT A J_CT_A 7 7 7
        22 AT J_CT B J_CT_B 8 8 8
        23 AT J_AGG A J_AGG_A 12 12 12
        24 AT J_AGG B J_AGG_B 14 14 14

        Comment


        • #5
          Please use dataex to give an example (https://www.statalist.org/forums/help#stata)

          Comment


          • #6
            Here is the same example read into Stata...

            (Note: I am using Stata 13.0)

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte id str3 country str5 var str1 code str7 varcode byte(_2014 _2015 _2016)
             1 "USA" "I_IT"  "A" "I_IT_A"   1  1  1
             2 "USA" "I_IT"  "B" "I_IT_B"   2  2  2
             3 "USA" "I_CT"  "A" "I_CT_A"   3  3  3
             4 "USA" "I_CT"  "B" "I_CT_B"   4  4  4
             5 "USA" "I_AGG" "A" "I_AGG_A"  4  4  4
             6 "USA" "I_AGG" "B" "I_AGG_B"  6  6  6
             7 "USA" "J_IT"  "A" "J_IT_A"   5  5  5
             8 "USA" "J_IT"  "B" "J_IT_B"   6  6  6
             9 "USA" "J_CT"  "A" "J_CT_A"   7  7  7
            10 "USA" "J_CT"  "B" "J_CT_B"   8  8  8
            11 "USA" "J_AGG" "A" "J_AGG_A" 12 12 12
            12 "USA" "J_AGG" "B" "J_AGG_B" 14 14 14
            13 "AT"  "I_IT"  "A" "I_IT_A"   1  1  1
            14 "AT"  "I_IT"  "B" "I_IT_B"   2  2  2
            15 "AT"  "I_CT"  "A" "I_CT_A"   3  3  3
            16 "AT"  "I_CT"  "B" "I_CT_B"   4  4  4
            17 "AT"  "I_AGG" "A" "I_AGG_A"  4  4  4
            18 "AT"  "I_AGG" "B" "I_AGG_B"  6  6  6
            19 "AT"  "J_IT"  "A" "J_IT_A"   5  5  5
            20 "AT"  "J_IT"  "B" "J_IT_B"   6  6  6
            21 "AT"  "J_CT"  "A" "J_CT_A"   7  7  7
            22 "AT"  "J_CT"  "B" "J_CT_B"   8  8  8
            23 "AT"  "J_AGG" "A" "J_AGG_A" 12 12 12
            24 "AT"  "J_AGG" "B" "J_AGG_B" 14 14 14
            end

            Comment


            • #7
              Thanks for the data example. It doesn't bite here, but among our other requests is to tell us in every thread you start if you are using an out-of-date version of Stata.

              Sorry; but wanting to add extra observations showing summary values evokes -- no other way of saying it except forcefully -- a No, no, no from Stata programmers of any experience (and "any experience" is defined by their saying this). It just complicates almost everything you do ever after.

              Also, as you say, holding different variables for different years is using a wide layout (format, structure): almost universally a long layout is preferable in Stata. (FWIW, people using other statistical software usually say the same.) I haven't addressed this in code.

              The way to hold summaries is alongside the data. And, sure, you hold them repeatedly, but you can always avoid that by tagging those you want to work with.

              This is my understanding of what you seek starting with the data example in #6.

              Code:
              . * don't want these!
              . drop if strpos(varcode, "AGG")
              (8 observations deleted)
              
              . encode varcode, gen(nvarcode)
              
              . recode nvarcode (1 3 = 1) (2 4 = 2) (5 7 = 3) (6 8 = 4), gen(varcode2)
              (12 differences between nvarcode and varcode2)
              
              . tab varcode varcode2
              
                         |             RECODE of nvarcode
                 varcode |         1          2          3          4 |     Total
              -----------+--------------------------------------------+----------
                  I_CT_A |         2          0          0          0 |         2
                  I_CT_B |         0          2          0          0 |         2
                  I_IT_A |         2          0          0          0 |         2
                  I_IT_B |         0          2          0          0 |         2
                  J_CT_A |         0          0          2          0 |         2
                  J_CT_B |         0          0          0          2 |         2
                  J_IT_A |         0          0          2          0 |         2
                  J_IT_B |         0          0          0          2 |         2
              -----------+--------------------------------------------+----------
                   Total |         4          4          4          4 |        16
              
              
              . egen wanted = total(_2014), by(country varcode2)
               
              . sort country varcode2 varcode
              .
              . list country varcode* _2014 wanted, sepby(country varcode2)
              
                   +-----------------------------------------------+
                   | country   varcode   varcode2   _2014   wanted |
                   |-----------------------------------------------|
                1. |      AT    I_CT_A          1       3        4 |
                2. |      AT    I_IT_A          1       1        4 |
                   |-----------------------------------------------|
                3. |      AT    I_CT_B          2       4        6 |
                4. |      AT    I_IT_B          2       2        6 |
                   |-----------------------------------------------|
                5. |      AT    J_CT_A          3       7       12 |
                6. |      AT    J_IT_A          3       5       12 |
                   |-----------------------------------------------|
                7. |      AT    J_CT_B          4       8       14 |
                8. |      AT    J_IT_B          4       6       14 |
                   |-----------------------------------------------|
                9. |     USA    I_CT_A          1       3        4 |
               10. |     USA    I_IT_A          1       1        4 |
                   |-----------------------------------------------|
               11. |     USA    I_CT_B          2       4        6 |
               12. |     USA    I_IT_B          2       2        6 |
                   |-----------------------------------------------|
               13. |     USA    J_CT_A          3       7       12 |
               14. |     USA    J_IT_A          3       5       12 |
                   |-----------------------------------------------|
               15. |     USA    J_CT_B          4       8       14 |
               16. |     USA    J_IT_B          4       6       14 |
                   +-----------------------------------------------+
              
              . egen tag = tag(country varcode2)
              
              . list country varcode2 wanted if tag, sepby(country varcode2)
              
                   +-----------------------------+
                   | country   varcode2   wanted |
                   |-----------------------------|
                1. |      AT          1        4 |
                   |-----------------------------|
                3. |      AT          2        6 |
                   |-----------------------------|
                5. |      AT          3       12 |
                   |-----------------------------|
                7. |      AT          4       14 |
                   |-----------------------------|
                9. |     USA          1        4 |
                   |-----------------------------|
               11. |     USA          2        6 |
                   |-----------------------------|
               13. |     USA          3       12 |
                   |-----------------------------|
               15. |     USA          4       14 |
                   +-----------------------------+
              Last edited by Nick Cox; 24 Feb 2019, 09:50.

              Comment


              • #8
                Nick, thank you so much! I really appreciate the time spent on this. Will be super helpful to me.

                Comment

                Working...
                X