Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Looping over another column and returning part of matching string

    Hello everyone,

    I have a following table:
    Material Objekt
    0001
    0002
    0003
    _uht9_0001
    __lpo870003
    and a command:

    gen Material_match = ""

    foreach i in Objekt{
    foreach j in Material {
    replace Material_match = "`j'" if strpos(Objekt, "`j'") > 0
    }
    }
    I trying to match Objekt if it contains Material and display that Material
    Material Objekt Material_match
    0001
    0002
    0003
    _uht9_0001 0001
    __lpo870003 0003
    Jet the command does not produce any results. Could anyone please help me to get on the right track.

    Sincerely,

    Pavlo

  • #2
    With a reasonably sized dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str11(Material Objekt)
    "0001" ""           
    "0002" ""           
    "0003" ""           
    ""     "_uht9_0001" 
    ""     "__lpo870003"
    end
    
    levelsof Material, local(Materials) sep(|) clean
    g wanted= regexs(0) if regexm(Objekt,  "(`Materials')")
    Res.:

    Code:
    . l
    
         +---------------------------------+
         | Material        Objekt   wanted |
         |---------------------------------|
      1. |     0001                        |
      2. |     0002                        |
      3. |     0003                        |
      4. |             _uht9_0001     0001 |
      5. |            __lpo870003     0003 |
         +---------------------------------+

    Comment


    • #3
      Let's see why your loop code was legal but did nothing.

      Code:
      foreach i in Objekt {
           foreach j in Material {
                replace Material_match = "`j'" if strpos(Objekt, "`j'") > 0
           }
      }
      boils down to a single statement:

      Code:
      replace Material_match = "Material" if strpos(Objekt, "Material") > 0
      which raises a minor and a major point. The minor point is that you may have intended one reference to `j' to be a reference to `i'.

      The major point is a misconception about this loop syntax which I have often seen, which may arise from what loops do in some other software, or perhaps just from wishful thinking.

      The expectation here on your part, I think, is that Stata will see a reference to a variable name and spring into action looping over the distinct values of that variable. But that is not at all what foreach ... in ... does here. You can loop over (for example) the distinct levels of a variable, but you need to supply the list of such values yourself. Each foreach loop here in your code sees one item and acts literally on that item. .

      Comment


      • #4
        Thank you for the reply. However as Andrew pointed out his solution applies to smaller datasets. I have over 350000 observations and it won't do. Nicks Code suggests to return a word "Material" instead of Material value itself.

        Comment


        • #5
          It's your code, not mine. I don't suggest it because it is not what you want.

          Comment


          • #6
            Ah, sorry Nick, i misunderstood it.

            Comment


            • #7
              I could explain why your loops didn't do what you wanted, but I am not clear that I understand your problem. It seems to consist of looking for matches in the last 4 characters of Objekt in a list of values of Material. If so, check out

              https://www.stata.com/support/faqs/d...s-for-subsets/

              Comment


              • #8
                Thank you for the suggestion, but i do not see how merge command could solve an issue here. Let me explain the problem once again.

                I have two .dta datasets:
                Material
                0001
                0002
                0003
                and
                Objekt
                _uht9_0001lp
                __lpo870003rm
                __jofd0069tz
                i would wike to merge 1st dataset into second if Objekt contains Material in itself and show that Material so that:
                Objekt merged_material
                _uht9_0001lp 0001
                __lpo870003rm 0003
                __jofd0069tz
                In my very first post i appended two datasets, as i assumed that it would be easier to do the match in such structured way.

                However the dataset is quite large, more than 100000 observations in each dataset.
                Last edited by Paul Shlapa; 21 Jul 2022, 02:10.

                Comment


                • #9
                  You can cross the datasets and then keep matches, but that will give you 100,000 \(\times\) 100,000 = 10 billion observations to start with. Otherwise, a looping solution could be:

                  Code:
                  use Material, clear
                  merge 1:1 _n using Objekt
                  This will give you something similar to:
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input str11(Material Objekt)
                  "0001" "_uht9_0001"            
                  "0002" "__lpo870003"          
                  "0003" "_lpo870003rm"
                  "0004"  "_uht9_0001lp"
                  "0005"    "__jofd0069tz"
                  end
                  
                  gen wanted=""
                  forval i= 1/`=_N'{
                      local material "`=Material[`i']'"
                      qui replace wanted= "`material'" if regexm(Objekt, "`material'") & missing(wanted)
                  }
                  drop Material
                  Res.:

                  Code:
                  . l
                  
                       +----------------------+
                       |      Objekt   wanted |
                       |----------------------|
                    1. |  _uht9_0001     0001 |
                    2. | __lpo870003     0003 |
                    3. | _lpo870003r     0003 |
                    4. | _uht9_0001l     0001 |
                    5. | __jofd0069t          |
                       +----------------------+

                  Comment

                  Working...
                  X