Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Vertical (within-variable) concatenate/sum of strings

    Hi all,

    I have the following
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 region str31 datatype str30 localbrandname byte(rank2015 rank2024)
    "Western Europe" "Retail Value RSP excl Sales Tax" "Spar"         2 2
    "Western Europe" "Retail Value RSP excl Sales Tax" "Total"        . .
    "Western Europe" "Retail Value RSP excl Sales Tax" "Nah & Frisch" 1 1
    "Western Europe" "Retail Value RSP excl Sales Tax" "Billa Box"    3 .
    end
    I would like to store in a new string variable the rank&name of the first three (e.g. "1.name 2.name 3.name").

    i first tried looping through years something like

    Code:
    forval i = 2015/2024 {
        g cr5_list_`i'= ""
        local list1 = cond(inrange(rank`i', 1, 3), name)
        replace cr5_list_`i'= "`list1'"
    }
    but did not work ( I tried also with
    Code:
    if
    into
    Code:
    local list
    but not capable to make it works.

    Then I did the following:

    Code:
    foreach var of varlist rank* {
        tostring `var', g(`var'_str )
        }
    
    forval i = 2015/2024 {
    g list`i'=rank`i'_str + "."+ name if inrange(rank`i', 1, 3)
    }
    which resulted into

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 region str31 datatype str30 localbrandname byte rank2015 str32 list2015
    "Western Europe" "Retail Value RSP excl Sales Tax" "Spar"         2 "2.Spar"        
    "Western Europe" "Retail Value RSP excl Sales Tax" "Total"        . ""              
    "Western Europe" "Retail Value RSP excl Sales Tax" "Nah & Frisch" 1 "1.Nah & Frisch"
    "Western Europe" "Retail Value RSP excl Sales Tax" "Billa Box"    3 "3.Billa Box"   
    end
    But I do not understand how to 'concatenate' the three strings vertically.

    Many thanks for any advice you may have.

    F


  • #2
    Your description of the problem leaves much to the imagination. Does "name" refer to the variable localbrandname? Does rank refer to variable rank2015, or to variable rank2024 The data organization in the full data set is also not clear from your example. I've interpreted the problem as follows. Within each region#datatype combination, localbrandnames are ranked numerically, and you would like to calculate a concatenation of the localbrandnames ranked 1, 2, and 3 in year 2015 within each region#datatype. I assume that the ranking system has no ties. You may have had something different in mind. Hopefully you can adapt this code to your intention. If not, post back with a fuller explanation.

    Code:
    by region datatype (rank2015), sort: gen wanted = localbrandname[1] if _n == 1
    by region datatype (rank2015): replace wanted = wanted[_n-1] + "; " + localbrandname if inrange(_n, 2, 3)
    by region datatype (rank2015): replace wanted = wanted[_n-1] if _n > 3
    by region datatype (rank2015): replace wanted = wanted[_N]

    Comment


    • #3
      See also

      SJ-20-1 pr0071 . . . . Speaking Stata: Concatenating values over observations
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q1/20 SJ 20(1):236--243 (no commands)
      discusses basic techniques for concatenating values of
      variables over observations, emphasizing simple loops
      that can be tuned to suit variants as desired

      https://journals.sagepub.com/doi/pdf...36867X20909698

      Comment


      • #4
        Clyde, many thanks, apologies for the unclear dataex, below a hopefully clearer set I am using:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long(enc_country enc_subcategory enc_datatype) byte(rank2015 rank2016 rank2017 rank2018 rank2019 rank2020 rank2021 rank2022 rank2023 rank2024) str30 localbrandname
        1 1 1  .  .  .  .  .  .  .  .  .  0 "Total"            
        1 1 1  3  .  .  .  .  .  .  .  .  0 "Billa Box"        
        1 1 1  1  1  1  1  1  1  1  1  1  1 "Nah & Frisch"     
        1 1 1  2  2  2  2  2  2  2  2  2  2 "Spar"             
        1 1 2  .  .  .  .  .  .  .  .  .  0 "Total"            
        1 1 2  3  .  .  .  .  .  .  .  .  0 "Billa Box"        
        1 1 2  1  1  1  1  1  1  1  1  1  1 "Nah & Frisch"     
        1 1 2  2  2  2  2  2  2  2  2  2  2 "Spar"             
        1 2 1  .  .  .  .  .  .  .  .  .  0 "Total"            
        1 2 1  1  1  1  1  1  1  1  .  .  0 "Shell Select"     
        1 2 1  .  .  .  .  .  .  .  .  .  0 "Others"           
        1 2 1  7  7  7  6  6  6  6  8  .  0 "Merkur Inside"    
        1 2 1  2  2  2  2  2  2  2  2  2  1 "Eni shop"         
        1 2 1  5  5  5  4  4  3  3  3  3  3 "Billa Stop & Shop"
        1 2 1  3  3  3  3  3  4  4  4  4  4 "Viva"             
        1 2 1  4  4  4  5  5  5  5  5  5  5 "BP Shop"          
        1 2 1  6  6  6  7  8  8  7  6  6  6 "Spar Express"     
        1 2 1  .  .  .  8  7  7  8  7  7  7 "Billa Unterwegs"  
        1 2 1  .  .  .  .  .  .  . 13  9  8 "Viva Billa"       
        1 2 1  .  .  .  .  .  .  .  9  8  9 "Billa Now"        
        1 2 1 10 10  9 10 10 10  9 10 10 10 "Nah & Frisch"     
        1 2 1  8  8  8  9  9  9 10 11 11 11 "Avia Shop"        
        1 2 1  9  9 10 11 11 11 11 12 12 12 "MPreis"           
        1 2 2  .  .  .  .  .  .  .  .  .  0 "Others"           
        1 2 2  7  7  7  6  6  6  6  8  .  0 "Merkur Inside"    
        1 2 2  1  1  1  1  1  1  1  .  .  0 "Shell Select"     
        1 2 2  .  .  .  .  .  .  .  .  .  0 "Total"            
        1 2 2  2  2  2  2  2  2  2  2  2  1 "Eni shop"         
        1 2 2  5  5  5  4  4  3  3  3  3  3 "Billa Stop & Shop"
        1 2 2  3  3  3  3  3  4  4  4  4  4 "Viva"             
        1 2 2  4  4  4  5  5  5  5  5  5  5 "BP Shop"          
        end
        label values enc_country enc2_country
        label def enc2_country 1 "AT", modify
        label values enc_subcategory enc2_subcategory
        label def enc2_subcategory 1 "convstor", modify
        label def enc2_subcategory 2 "forec", modify
        label values enc_datatype enc2_datatype
        label def enc2_datatype 1 "value", modify
        label def enc2_datatype 2 "value_tax", modify
        The solution provided worked perfectly (already available in Nick's publication as suggested, sorry) (I slightly modified it to accomodate for the real dataset as per above).

        I realized I have missing for rank* so that I would like to avoid including the brandname from the strings cr3_list* I am generating.

        I tried the following:

        Code:
        forval i = 2015/2024 {
        bys enc_country enc_subcategory enc_datatype (rank`i') : gen cr3_list`i' = localbrandname[1] if _n == 1
        bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list`i'= cr3_list`i'[_n-1] + "; " + localbrandname if inrange(_n, 2, 3) & rank`i'!=.
        bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list`i'= cr3_list`i'[_n-1] if _n > 3 & !missing(cr3_list`i'[_n-1])
        bys enc_country enc_subcategory enc_datatype (rank`i'): replace cr3_list`i'= cr3_list`i'[_N]
        }
        but returns an empty variable for when there is a missing string.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long(enc_country enc_subcategory enc_datatype) byte rank2024 str30 localbrandname str64 cr3_list2024
        1 1 1  1 "Nah & Frisch"      ""                                         
        1 1 1  2 "Spar"              ""                                         
        1 1 1  . "Total"             ""                                         
        1 1 1  . "Billa Box"         ""                                         
        1 1 2  1 "Nah & Frisch"      ""                                         
        1 1 2  2 "Spar"              ""                                         
        1 1 2  . "Total"             ""                                         
        1 1 2  . "Billa Box"         ""                                         
        1 2 1  1 "Eni shop"          "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  2 "Shell Select"      "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  3 "Billa Stop & Shop" "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  4 "Viva"              "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  5 "BP Shop"           "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  6 "Spar Express"      "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  7 "Billa Unterwegs"   "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  8 "Viva Billa"        "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  9 "Billa Now"         "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1 10 "Nah & Frisch"      "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1 11 "Avia Shop"         "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1 12 "MPreis"            "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  . "Shell Select"      "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  . "Merkur Inside"     "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  . "Others"            "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 1  . "Total"             "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  1 "Eni shop"          "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  2 "Shell Select"      "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  3 "Billa Stop & Shop" "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  4 "Viva"              "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  5 "BP Shop"           "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  6 "Spar Express"      "Eni shop; Shell Select; Billa Stop & Shop"
        1 2 2  7 "Billa Unterwegs"   "Eni shop; Shell Select; Billa Stop & Shop"
        end
        label values enc_country enc2_country
        label def enc2_country 1 "AT", modify
        label values enc_subcategory enc2_subcategory
        label def enc2_subcategory 1 "convstor", modify
        label def enc2_subcategory 2 "forec", modify
        label values enc_datatype enc2_datatype
        label def enc2_datatype 1 "value", modify
        label def enc2_datatype 2 "value_tax", modify

        I guess the issue is with the 3rd line, probably I should provide an 'else' condition? I tried but returns 'if may not be combined with by':

        Code:
        forval i = 2015/2024 {
        bys enc_country enc_subcategory enc_datatype (rank`i') : gen cr3_list2`i' = localbrandname[1] if _n == 1
        bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list2`i'= cr3_list2`i'[_n-1] + "; " + localbrandname if inrange(_n, 2, 3) & rank`i'!=.
        bys enc_country enc_subcategory enc_datatype  (rank`i'): if _n > 3 & cr3_list2`i'[_n-1]!="" {replace cr3_list2`i'= cr3_list2`i'[_n-1]  
            }
            else if _n > 3 & cr3_list2`i'[_n-1]=="" {replace cr3_list2`i'= cr3_list2`i'[_n-2]  
            }
        bys enc_country enc_subcategory enc_datatype (rank`i'): replace cr3_list2`i'= cr3_list2`i'[_N]
        }
        thanks again

        Comment


        • #5
          Indeed, that is not going to work. It is illegal and it also confuses the quite different roles of the if command and the if qualifier, a point which is often misunderstood.

          https://journals.sagepub.com/doi/pdf...6867X231175349


          Code:
          SJ-23-2 st0721  .  When to use the if qualifier and when to use the if command
                  . . . . . . . . . . . . . . . . . . . .  N. J. Cox and C. B. Schechter
                  Q2/23   SJ 23(2):589--594                                (no commands)
                  discusses generally when you should use either the if
                  qualifier or an if command and specifically flags a
                  common pitfall in using the if command
          The pattern to follow is more likely to be splitting that very complicated construct into two commands:

          by <whatever> : replace <stuff> if <condition 1>

          by <whatever> : replace <stuff> if <condition 2>

          I've not tried testing anything.

          Comment


          • #6
            Yep, it worked thanks for the hint (below the code I applied):

            Code:
            forval i = 2015/2024 {
            bys enc_country enc_subcategory enc_datatype (rank`i') : gen cr3_list`i' = localbrandname[1] if _n == 1
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list`i'= cr3_list`i'[_n-1] + "; " + localbrandname if inrange(_n, 2, 3) & rank`i'!=.
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list`i'= cr3_list`i'[_n-1] if _n > 3 & cr3_list`i'[_n-1]!="" 
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list`i'= cr3_list`i'[_n-2] if _n > 3 & cr3_list`i'[_n-1]=="" 
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr3_list`i'= cr3_list`i'[_n-3] if _n > 3 & cr3_list`i'[_n-1]=="" & cr3_list`i'[_n-2]==""  
            bys enc_country enc_subcategory enc_datatype (rank`i'): replace cr3_list`i'= cr3_list`i'[_N]
            }
            Now, I applied the same logic this time to have the first 5 ranked name:

            Code:
            forval i = 2015/2024 {
            bys enc_country enc_subcategory enc_datatype (rank`i') : gen cr5_list`i' = localbrandname[1] if _n == 1
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list`i'= cr5_list`i'[_n-1] + "; " + localbrandname if inrange(_n, 2, 5) & rank`i'!=.
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list`i'= cr5_list`i'[_n-1] if _n > 5 & cr5_list`i'[_n-1]!="" 
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list`i'= cr5_list`i'[_n-2] if _n > 5 & cr5_list`i'[_n-1]=="" 
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list`i'= cr5_list`i'[_n-3] if _n > 5 & cr5_list`i'[_n-1]=="" & cr5_list`i'[_n-2]=="" 
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list`i'= cr5_list`i'[_n-4] if _n > 5 & cr5_list`i'[_n-1]=="" & cr5_list`i'[_n-2]=="" & cr5_list`i'[_n-3]==""  
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list`i'= cr5_list`i'[_n-5] if _n > 5 & cr5_list`i'[_n-1]=="" & cr5_list`i'[_n-2]=="" & cr5_list`i'[_n-3]=="" & cr5_list`i'[_n-4]==""  
            bys enc_country enc_subcategory enc_datatype (rank`i'): replace cr5_list`i'= cr5_list`i'[_N]
            }
            But then I see that when there are less than 5 ranked observations the variable is empty:

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long(enc_country enc_datatype enc_subcategory) str30 localbrandname byte rank2017 str64 cr3_list2017 str105 cr5_list2017
            1 1 1 "Nah & Frisch"       1 "Nah & Frisch; Spar"           ""                                                        
            1 1 1 "Spar"               2 "Nah & Frisch; Spar"           ""                                                        
            1 1 1 "Billa Box"          . "Nah & Frisch; Spar"           ""                                                        
            1 1 1 "Total"              . "Nah & Frisch; Spar"           ""                                                        
            1 1 2 "Eni shop"           2 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Shell Select"       . "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Billa Stop & Shop"  5 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Viva"               3 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "BP Shop"            4 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Spar Express"       6 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Billa Unterwegs"    . "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Viva Billa"         . "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Billa Now"          . "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Nah & Frisch"       9 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Avia Shop"          8 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "MPreis"            10 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Others"             . "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Shell Select"       1 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Total"              . "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            1 1 2 "Merkur Inside"      7 "Shell Select; Eni shop; Viva" "Shell Select; Eni shop; Viva; BP Shop; Billa Stop & Shop"
            end
            label values enc_country enc2_country
            label def enc2_country 1 "AT", modify
            label values enc_datatype enc2_datatype
            label def enc2_datatype 1 "value", modify
            label values enc_subcategory enc2_subcategory
            label def enc2_subcategory 1 "convstor", modify
            label def enc2_subcategory 2 "forec", modify
            Here I tried with some additional conditions within the foreach routine above (e.g.
            Code:
            bys enc_country enc_subcategory enc_datatype  (rank`i'): replace cr5_list2`i'= cr5_list2`i'[_n-3] if _N < 5 & inrange(_n, 2, 4) & cr5_list2`i'[_n-1]=="" & cr5_list2`i'[_n-2]==""
            but nope.

            Actually I do not understand why with the coding after Nick's last hint does not work properly, since when _N<=5 _n>5 is never true.

            Thanks in advance for any further hint.

            Comment


            • #7
              I have not tried getting into the detail here but a reference to a string for an observation number that is zero or negative will always evaluate as empty string.

              Code:
              . sysuse auto, clear
              (1978 automobile data)
              
              . di make[0]
              
              
              . di make[-1]
              
              
              . di (make[0] == "")
              1
              I think you're on top of that, but just in case you're not.

              Comment


              • #8
                I think this will do the trick. Observations missing rank are skipped, and up to the first 5 are included, but no problem if there are fewer than 5 ranked. The only important constraint is that the ranks begin with 1 and go up consecutively with no ties and no ranks skipped--and the first command verifies that constraint is met, and aborts the process if it isn't.

                Code:
                by enc_country enc_datatype enc_subcategory (rank2017), sort: ///
                    assert rank2017 == _n | missing(rank2017)
                by enc_country enc_datatype enc_subcategory (rank2017): ///
                    gen wanted = localbrandname if _n == 1 & !missing(rank2017)
                by enc_country enc_datatype enc_subcategory (rank2017): replace wanted = wanted[_n-1] ///
                    + cond(!missing(rank2017) & _n <= 5,"; " + localbrandname, "") if _n > 1
                by enc_country enc_datatype enc_subcategory (rank2017): replace wanted = wanted[_N]
                By the way, you might want to familiarize yourself with the -cond()- function used in the third command: it implements your "if-else" idea in this context.

                Comment


                • #9
                  There was I thinking

                  Code:
                  . * this will only work in the future
                  . di fond("cond()", "Clyde")
                  0
                  See also

                  SJ-5-3 pr0016 . . Depending on conditions: a tutorial on the cond() function
                  . . . . . . . . . . . . . . . . . . . . . . . D. Kantor and N. J. Cox
                  Q3/05 SJ 5(3):413--420 (no commands)
                  tutorial on the cond() function
                  Depending on Conditions: A Tutorial on the Cond Function

                  Comment


                  • #10
                    Clyde and Nick,
                    thanks a lot, worked smoothly.

                    I just had to suppress the assert code line as the condition is not fulfilled (there are some cases where the brandname is repeated within the bys, although when duplicated the rank* values are non-missing just for one). I checked and it still works perfectly.

                    Below the code I used, generalizing from Clyde's:

                    Code:
                    forval i = 2015/2024 {
                    *by enc_country enc_datatype enc_subcategory enc_price enc_exchrate (rank`i'), sort: assert rank`i'== _n | missing(rank`i')
                    bys enc_country enc_datatype enc_subcategory enc_price enc_exchrate (rank`i'): gen cr5_list`i' = localbrandname if _n == 1 & !missing(rank`i')
                    bys enc_country enc_datatype enc_subcategory enc_price enc_exchrate (rank`i'): replace cr5_list`i' = cr5_list`i'[_n-1] + cond(!missing(rank`i') & _n <= 5,"; " + localbrandname, "") if _n > 1
                    bys enc_country enc_datatype enc_subcategory enc_price enc_exchrate (rank`i'): replace cr5_list`i' = cr5_list`i'[_N]
                    }
                    Cheers

                    F

                    Comment


                    • #11
                      I just had to suppress the assert code line as the condition is not fulfilled (there are some cases where the brandname is repeated within the bys, although when duplicated the rank* values are non-missing just for one). I checked and it still works perfectly.
                      Well, what about that one case where brandname was repeated and the duplicate rank* values were non-missing? I'm pretty sure you got wrong results there. At a minimum, if that non-missing rank was <= 5, then, at a minimum, the repeated brandname will also be repeated in the result. Also, if, for example, that repeated rank was 3, then instead of getting the brandnames ranked 1 through 5 concatenated in the result, you will get ranks 1 thorugh 4, with the rank 3 brand name appearing twice.

                      Finally, one should always be suspicious of data sets in which duplicated observations appear. Sometimes that is correct, but often it reflects a mistake in data management. And where one mistake is found, others may lurk yet unnoticed. I would scrutinize this situation to see why these duplicates are there, and if it isn't for a good reason, I would review all of the data management leading up to the creation of that data set to fix that, and fix any other errors found in the process.

                      Comment

                      Working...
                      X