
  • Assert whether observations are similar across 3 variables

    Hello,

    I have a small question regarding assert.

    I want to check how many observations in a dataset have the same value across a set of variables. Data example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 name float(var1 var2 var3)
    "Yes" 888 888 888
    "Yes"   2   1   2
    "Yes"   2   1   2
    "Yes"   4   4   4
    "Yes"   0   1   1
    "Yes"   0   1   1
    "Yes"   1   .   1
    "Yes"   0   1   1
    "Yes"   0   0   0
    "Yes"   0   .   1
    "Yes"   0   0   0
    end
    We can see that there are 4 observations that are the same across var1, var2, and var3. However, when I use the assert command to check, I get the following message:

    Code:
    . assert var1 == var2 == var3
    11 contradictions in 11 observations
    assertion is false
    r(9);
    However, using the following code correctly identifies similar observations:

    Code:
    . assert (var1 == var2) & (var2 == var3)
    7 contradictions in 11 observations
    assertion is false
    r(9);
    Why can't we assert equality of 3 variables directly, and why do we have to use the "&" condition instead?

  • #2
    This is nothing to do with assert as such, as the same puzzlement can arise from Stata's processing of composite logical expressions in other contexts. The problem for your interpretation is that the expression

    Code:
    var1 == var2 == var3
    is evaluated from left to right in steps, and so not simultaneously. Thus it is equivalent to
    Code:
    (var1 == var2) == var3
    The parenthesized comparison evaluates to either 1 (true) or 0 (false), from which it can be seen that there are two possibilities
    Code:
    1 == var3
    and
    Code:
    0 == var3
    and only exceptionally will this give the same results as your other code.

    For example, if var1 var2 var3 are all 1 then the intention behind writing var1 == var2 == var3 will be satisfied by accident. But any other constants won't give the desired result.

    Otherwise put, this is why parentheses are not just redundant or decorative. They can be needed to insist on the interpretation you want.
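
To make the left-to-right evaluation concrete, here is a minimal sketch using literal numbers rather than variables. The particular values are chosen only for illustration; each display prints 1 for true or 0 for false.

```stata
* Each -display- prints 1 (true) or 0 (false)

* Parsed as (2 == 2) == 2, that is 1 == 2, so this prints 0
display 2 == 2 == 2

* Parsed as (1 == 1) == 1, that is 1 == 1, so this prints 1
display 1 == 1 == 1

* Parsed as (2 == 3) == 0, that is 0 == 0, so this prints 1
* even though the three values are not all equal
display 2 == 3 == 0

* Explicit pairwise comparisons joined by & give the intended test; prints 1
display (2 == 2) & (2 == 2)
```

The third line is the reverse kind of accident: a chained comparison can come out true when the values differ.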

    More in the same vein at https://journals.sagepub.com/doi/pdf...6867X231162009

    • #3
      Thank you, Nick Cox! That was super useful.

      • #4
        You've posted some code at https://github.com/fahad-mirza/tag_o...tata/tree/main

        It seems possible to approach the problem more concisely. If all the variables in a list are equal, then they are all equal to the first variable. It's safe to assume equality as our first guess but change our mind if we find an exception.

        In your example you are checking whether var1, var3 and var5 are equal.

        Code:
        * Loading example data
            * Example generated by -dataex-. For more info, type help dataex in Stata
            clear
            input str3 name float(var1 var2 var3 var4 var5)
            "Yes" 888 888 888 888 888
            "Yes"   2   1   2   1   2
            "Yes"   2   2   2   2   2
            "Yes"   4   4   4   4   4
            "Yes"   0   1   1   1   1
            "Yes"   0   1   1   1   1
            "Yes"   1   .   1   .   1
            "Yes"   0   1   1   1   1
            "Yes"   0   0   1   0   0
            "Yes"   0   .   1   .   1
            "Yes"   0   0   0   0   0
            end
            
        gen same = 1
        
        foreach v in var3 var5 {
            replace same = 0 if `v' != var1
        }
        Results
        Code:
        . l var1 var3 var5 if same
        
             +--------------------+
             | var1   var3   var5 |
             |--------------------|
          1. |  888    888    888 |
          2. |    2      2      2 |
          3. |    2      2      2 |
          4. |    4      4      4 |
          7. |    1      1      1 |
             |--------------------|
         11. |    0      0      0 |
             +--------------------+
        
        .
        . l var1 var3 var5 if !same
        
             +--------------------+
             | var1   var3   var5 |
             |--------------------|
          5. |    0      1      1 |
          6. |    0      1      1 |
          8. |    0      1      1 |
          9. |    0      1      0 |
         10. |    0      1      1 |
             +--------------------+
        Last edited by Nick Cox; 16 Jul 2023, 05:58.
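
As a cross-check, the same loop can be applied to the three-variable data from the original question and compared with the explicit pairwise test. This is only a sketch: the variable name pairwise is introduced here for illustration, and the string variable name is dropped from the data because it plays no role.

```stata
clear
input float(var1 var2 var3)
888 888 888
  2   1   2
  2   1   2
  4   4   4
  0   1   1
  0   1   1
  1   .   1
  0   1   1
  0   0   0
  0   .   1
  0   0   0
end

* Assume equality at first, then flag any exception
generate byte same = 1
foreach v in var2 var3 {
    replace same = 0 if `v' != var1
}

* The explicit pairwise comparison from the original question
generate byte pairwise = (var1 == var2) & (var2 == var3)

* The two indicators should agree in every observation
assert same == pairwise

* Number of observations equal on all three variables
count if same
```

The count should be 4, consistent with the 7 contradictions in 11 observations reported for the parenthesized assert.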

        • #5
          Here is a quick egen function, not much tested. If interested, save it as _gsame.ado in the _ directory of what adopath calls PLUS.


          Code:
          *! 1.0.0  NJC 16jul2023
          program _gsame
              version 8
              gettoken type 0 : 0
              gettoken g 0 : 0
              gettoken eqs 0 : 0
              syntax varlist [if] [in]
          
              capture confirm numeric var `varlist'
              if _rc {
                  capture confirm str var `varlist'
                  if _rc {
                      di as err "may not mix numeric and string variables"
                      exit 109
                  }
              }
          
              tempvar touse
              mark `touse' `if' `in'
          
              gettoken first others : varlist
          
              quietly {
                  gen byte `g' = 1  if `touse' /* ignore user-supplied `type' */
                  foreach v of local others {
                      replace `g' = 0 if `v' != `first'  & `touse'
                  }
              }
          
              if length("`varlist'") >= 72 {
                   note `g' : same on `varlist'
                   label var `g' "see notes"
              } 
              else label var `g' "same on `varlist'"
          end
          Last edited by Nick Cox; 16 Jul 2023, 08:35.
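
A possible usage sketch follows. It assumes the program above has been saved as _gsame.ado somewhere on the adopath, so that egen can find it. The new variable names same135 and same_sub are made up for illustration, and the data are those of #4 without the string variable.

```stata
* Assumption: _gsame.ado (the program above) is installed on the adopath
clear
input float(var1 var2 var3 var4 var5)
888 888 888 888 888
  2   1   2   1   2
  2   2   2   2   2
  4   4   4   4   4
  0   1   1   1   1
  0   1   1   1   1
  1   .   1   .   1
  0   1   1   1   1
  0   0   1   0   0
  0   .   1   .   1
  0   0   0   0   0
end

* Indicator: 1 if var1, var3 and var5 are all equal, 0 otherwise
egen same135 = same(var1 var3 var5)
list var1 var3 var5 if same135

* With -if-, the new variable is left missing outside the selected subset
egen same_sub = same(var1 var3 var5) if var1 < 888
count if missing(same_sub)
```

The listing should show the same six observations as in #4. Because the function generates its result only where the if and in conditions hold, excluded observations are missing rather than 0.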

          • #6
            Thank you for showing a more concise way of approaching this problem! I think I was just overcomplicating it in the GitHub code, which takes its inspiration from the graph code that I write. Also, thank you for showing how it is done as a program! Would you consider releasing it as an egen extension?
