  • Compare Observation Values

    Hi all,
    I have three variables V1, V2 and V3.
    Essentially, I want to check if the observations across these variables are the same.

    The code to do this is:
    egen diffcheck = diff(V1 V2 V3)

    The issue is that this function doesn't factor for missing values
    i.e. if the values are
    V1 = 3
    V2 = 3
    V3 = . (missing)

    It will tell me that these values are different. I want to factor for missing values.

    This is a simplified case of the problem. The real table is much larger, and many more than three variables are compared at a time. Once I have the problem solved in the simplest case, I will easily be able to generalize it to the larger problem.

    Please advise.

  • #2
    What does "factor for missing values" mean? Do you mean that 3, 3 and missing are not different because you are happy to ignore the missing? What about 3, missing and missing?


    • #3
      Yup that's correct.

      V1 = 3
      V2 = .
      V3 = .

      This shouldn't tell me there's a difference. I want to ignore missing values entirely.


      • #4
        If the min() and the max() are the same and not missing then that is what you want. min() and max() will ignore missing values to the extent possible.

        Study the results of this:

        Code:
        clear
        input x y z 
        3 3  3
        3 3 . 
        3 . . 
        . . . 
        3 666 3 
        end 
        
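        * Row-wise extremes; max() and min() ignore missing arguments where they can
        gen max = max(x, y, z)
        gen min = min(x, y, z)

        * All non-missing values agree (this includes the all-missing row)
        list if min == max

        * Agreement with at least one non-missing value
        list if min == max & max < .

        * Every value missing
        list if min == max & max == .

        * At least two distinct non-missing values
        list if min != max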
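
The same comparison can be stored as an indicator rather than inspected with list. This is only a sketch on the toy data above; the variable names agree and allmiss are made up for illustration and are not part of the original code.

```stata
clear
input x y z
3 3 3
3 3 .
3 . .
. . .
3 666 3
end

* agree is 1 when all non-missing values in the observation are equal;
* an all-missing observation also gets 1, because . == . is true
gen byte agree = min(x, y, z) == max(x, y, z)

* allmiss (hypothetical name) marks observations with no non-missing value
gen byte allmiss = missing(max(x, y, z))

list
```

If all-missing observations should not count as agreement, combining the two flags (agree & !allmiss) is one way to exclude them.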


        • #5
          So here's some code worked out for a toy data set. You will need to modify it to accommodate the actual names of the variables involved in yours.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(a b c d e f)
           4  4  .  4  4  4
          10 10 10 10  . 10
          17 17 13 17  9 17
           4  4  4  4  4  4
          11 11 11 11 11 11
           4  4  7  4  4  4
          15 11 15 15 15 15
           4 18  4  4  9  4
          10 10 10 10 10  8
          14 14  8  . 14 14
          end
          
          //    RESHAPE DATA TO LONG
          rename a-f v_=
          gen long obs_no = _n
          reshape long v_, i(obs_no) j(vname) string
          
          //    CHECK FOR AGREEMENT AMONG NON-MISSING VALUES IN OBS
          //        FIRST IDENTIFY AGREEMENT OF EACH OBSERVATION WITH SMALLEST VALUE, OR BEING MISSING
          by obs_no (v_), sort: gen byte agree = missing(v_) |(v_ == v_[1])
          //        NOW OBSERVATION WIDE-AGREEMENT IF ALL AGREE
          by obs_no (agree), sort: replace agree = agree[1]
          reshape wide
          rename v_* *

          Crossed with #2, 3, and 4. Nick's approach is far more elegant than mine, though it may be somewhat difficult to implement if the number of variables involved is large, as one must create a comma-separated list of them. My approach is more clunky in execution but is easier to implement if the number of variables is large, as you need only create an appropriate varlist of them to include in the -rename- command where I have "a-f". Presumably if the number of variables is large, some appropriate use of wildcards can accomplish that.
          Last edited by Clyde Schechter; 21 Dec 2016, 13:09.


          • #6
            For many variables, the min() and max() functions become less convenient than their egen relatives rowmin() and rowmax().
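
A minimal sketch of the egen route, assuming the variables are named V1 to V3 as in #1 and sit next to each other in the dataset; rmin, rmax and agree are illustrative names only. Because rowmin() and rowmax() take a varlist, no comma-separated list is needed.

```stata
clear
input V1 V2 V3
3 3 3
3 3 .
3 . .
. . .
3 666 3
end

* rowmin() and rowmax() ignore missing values and return missing
* only when every variable in the varlist is missing
egen rmin = rowmin(V1-V3)
egen rmax = rowmax(V1-V3)

* 1 if all non-missing values agree (all-missing rows also give 1)
gen byte agree = rmin == rmax

list
```

With many variables, the varlist V1-V3 could be replaced by a wildcard such as V* or by a local macro holding the names.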


            • #7
              Hi,
              Clyde's method seems more attractive, as it will be able to handle dates which are included in this process.

              I'm just having problems adapting the first line to my dataset. I want to make a varlist to store the variables and then rename that, but the code isn't working.

              local varlis V1 V2 V3
              rename "`varlis'" v_=
              gen long obs_no = _n
              reshape long v_, i(obs_no) j(vname) string

              What am I doing wrong in the first two lines?


              • #8
                If you want to compare dates, they are numeric too, so absolutely no difference arises between Clyde's code and mine on that score.
                (Conversely, if you are trying to compare string dates, don't do that: convert to numeric.)

                I don't see any scope for using " " in rename.

                Code:
                rename (V*) (v_*)
                is one way to write

                Code:
                rename (V1 V2 V3) (v_1 v_2 v_3)
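
If the names really do need to be held in a local macro, as attempted in #7, one possibility is to loop over it and reference each name without double quotes. This is a sketch only; the macro name vars and the two-row toy data are assumptions for illustration.

```stata
clear
input V1 V2 V3
3 3 .
3 . .
end

* Hold the variable names in a local macro (hypothetical name: vars)
local vars V1 V2 V3

* Prefix each name with v_; note there are no double quotes around `v'
foreach v of local vars {
    rename `v' v_`v'
}

describe
```

This yields v_V1, v_V2 and v_V3, so the stub v_ can then be used in reshape long as in #5.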
