
  • Arithmetic functions

    Good afternoon

    I need assistance please,

    I have a dataset in which variables such as height are recorded three times for each participant.
    So for example H1 = 174, H2 = 176, H3 = 175.3.
    While eyeballing the data I also noticed instances where one measurement is way off from the others, for example H1 = 174, H2 = 90, H3 = 175.5.
    So in this case H2 = 90 would need to be queried.

    So I need assistance with listing or browsing heights that are "far off" in the way my example shows.

    I tried br H1 H2 H3 if H1 > H2 > H3

    but this did not help as some results are still reasonable.

    Thanks
    Last edited by Kanyo Ledi; 24 Jul 2023, 04:30.

  • #2
    Tag the observations where the absolute difference is greater than some threshold for any pair of measurements:
    Code:
    gen tag =  abs(H1 - H2) > 10 | abs(H1 - H3) > 10  | abs(H2 - H3) > 10
    list if tag ==1
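
    A hedged variant, in case some heights are missing: Stata treats a missing value as larger than any number, so abs(H1 - H2) > 10 is also true when H1 or H2 is missing, and such rows get tagged. The sketch below tags only complete rows; the name tag2 and the threshold 10 are illustrative choices, not part of the original suggestion.

    ```stata
    * Same pairwise rule, restricted to rows with all three heights present
    gen byte tag2 = (abs(H1 - H2) > 10 | abs(H1 - H3) > 10 | abs(H2 - H3) > 10) & !missing(H1, H2, H3)

    * Complete rows with at least one pair more than 10 apart
    list H1 H2 H3 if tag2 == 1

    * Rows with a missing height get tag2 = 0; list them separately
    list H1 H2 H3 if missing(H1, H2, H3)
    ```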



    • #3
      Maybe compute some estimate of dispersion (egen would be your friend here) and then look graphically for extraordinary values of it, in order to home in (hone in) on observations worthy of further inspection. Perhaps something like the following? (Begin at the "Begin here" comment; the top part just creates an illustrative dataset.)
      Code:
      version 18.0
      
      clear *
      
      // seedem
      set seed 1339419994
      
      quietly set obs 1000
      
      forvalues i = 1/3 {
          generate double var`i' = rnormal(175, cond(runiform() < 0.99, 6, 60))
      }
      
      *
      * Begin here
      *
      egen double vsd = rowsd(var?)
      gsort -vsd
      format v* %3.0f
      generate int obs = _n
      
      // Eyeball this
      set more on
      graph twoway line vsd obs, lcolor(black) scheme(s2color) ylabel( , angle(horizontal) nogrid)
      more
      
      // Then zoom in in order to get a closer estimate of the knee
      graph twoway line vsd obs if vsd > 10, lcolor(black) scheme(s2color) ylabel( , angle(horizontal) nogrid)
      more
      
      // And then inspect
      list in 1/20, noobs separator(0)
      
      // An alternative measure of dispersion
      egen double min = rowmin(var?)
      egen double max = rowmax(var?)
      generate double del = max - min
      gsort -del
      format _all %3.0f
      quietly replace obs = _n
      
      // Work up analogously
      
      exit
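
      Applied to the variables in #1, the same idea might look like the sketch below. It assumes the heights are named H1, H2 and H3 as in the question and that there are at least 20 observations; hsd, hmin, hmax and hrange are illustrative names, and 20 rows is an arbitrary number to inspect.

      ```stata
      * Row-wise standard deviation and range of the three heights
      egen double hsd = rowsd(H1 H2 H3)
      egen double hmin = rowmin(H1 H2 H3)
      egen double hmax = rowmax(H1 H2 H3)
      generate double hrange = hmax - hmin

      * Largest spread first, then inspect the top of the list
      gsort -hsd
      list H1 H2 H3 hsd hrange in 1/20, noobs separator(0)
      ```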



      • #4
        Thank you so much, Scott and Joseph. Your solutions helped a lot.

        I appreciate it.



        • #5
          A neat one-liner for the (observation-wise) median of three variables is the sum of all three MINUS the sum of the minimum and the maximum. This works in the presence of ties. So, if two out of three agree closely and one doesn't agree with either of the other two, the one not agreeing is some distance from the median.

          Code:
          gen median = (H1 + H2 + H3) - min(H1, H2, H3) - max(H1, H2, H3)
          If there are missing values, the result is missing too.
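
          As a quick check of the arithmetic, here is the suspicious triple from #1 worked through with literal numbers rather than variables (a sketch only):

          ```stata
          * sum = 439.5, minimum = 90, maximum = 175.5; what is left is the middle value
          display (174 + 90 + 175.5) - min(174, 90, 175.5) - max(174, 90, 175.5)
          ```

          This displays 174, the median, and H2 = 90 sits 84 away from it.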

          Code:
          foreach v in H1 H2 H3 { 
          gen `v'_off = abs(median - `v') > 50 & `v' < . 
          }
          will give you indicators for each variable being more than 50 from the median. The 50 is arbitrary, so choose your own threshold.
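
          To look at the flagged rows, something like the following sketch should do. One point worth stating: if any height in a row is missing, median is missing, and because Stata treats missing as larger than any number, the nonmissing heights in that row are flagged too. The extra condition on median leaves those rows out so they can be checked separately.

          ```stata
          * Complete rows where at least one height is more than 50 from the median
          list H1 H2 H3 median if (H1_off | H2_off | H3_off) & median < .

          * Rows with incomplete measurements, to be checked separately
          list H1 H2 H3 if missing(H1, H2, H3)
          ```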

          Naturally there are also egen functions, as helpfully flagged by Joseph Coveney, but this one-liner deserves to be better known.

          A gotcha featured in #1 but has escaped comment hitherto.

          Code:
          if H1 > H2 > H3
          almost never does what you want, whether it appears as a command or a qualifier. It's evaluated from left to right as if you typed

          Code:
          if (H1 > H2) > H3
          from which there are two possibilities

          Code:
          if 1 > H3
          and

          Code:
          if 0 > H3 
          and usually neither is what you want from that code. This often bites even fairly experienced users and quite often appears directly -- and even more often indirectly -- on this forum. See also https://journals.sagepub.com/doi/10....6867X231162009
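
          A small sketch of the gotcha and of what was probably intended in #1. The final condition on H1 is an added assumption about how missing values should be treated, not something from the original question.

          ```stata
          * (3 > 2) evaluates to 1, and 1 > 1 is false, so this displays 0
          display 3 > 2 > 1

          * State each comparison separately and join them with &
          * H1 < . excludes missing H1, which would otherwise count as larger than any number
          browse H1 H2 H3 if H1 > H2 & H2 > H3 & H1 < .
          ```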






