
  • Remove same/identical values in a row and keep non-identical values

    I want to remove repeated values from a set of string variables in the same row, i.e., for the same person (panel data). For example, for person 1:
    Var1      Var2  Var3      Var4
    sunshine  rain  sunshine  rain

    I want Var1 = sunshine and Var2 = rain to be kept in the dataset,
    while the repeated "sunshine" and "rain" in Var3 and Var4 should be removed, leaving those cells empty.

    Please help asap! Thanks in advance!

  • #2
    On the assumption that your data also contains a person ID variable (id) and that it uniquely identifies observations in your data set:

    Code:
    reshape long Var, i(id) j(seq)
    by id Var (seq), sort: keep if _n == 1
    by id (seq), sort: replace seq = _n
    reshape wide
    If your observations are not identified by a single id variable, replace id in the above code by whatever combination of variables does uniquely identify observations. If there are no such variables, then create a unique identifier with:
    Code:
    gen `c(obs_t)' id = _n
    If the real names of your variables are not Var1 through Var4, temporarily -rename- them to Var1 through Var4, and then -rename- them back when you are done.
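    As an illustration, here is a sketch of the same steps run on the example from the question, entered with -input- and given a hypothetical id of 1. It assumes the four variables really are named Var1 through Var4. Both -by- lines put seq in parentheses and use the sort option, so the first occurrence (lowest seq) of each value is the one kept and the renumbering follows the original column order. Because no person in this example has more than two distinct values, only Var1 and Var2 exist after -reshape wide-; no empty Var3 and Var4 are created.

```stata
* Example from the question; id = 1 is a hypothetical person identifier.
clear
input byte id str10 Var1 str10 Var2 str10 Var3 str10 Var4
1 sunshine rain sunshine rain
end

* One observation per id and column.
reshape long Var, i(id) j(seq)

* Keep only the first occurrence (lowest seq) of each value within id.
by id Var (seq), sort: keep if _n == 1

* Renumber the surviving values 1, 2, ... in their original column order.
by id (seq), sort: replace seq = _n

* Back to one observation per id.
reshape wide

list
```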

    In the future, to avoid so many "ifs," and get a clean, clear response, always show example data when asking for help with code. Verbal descriptions of data are seldom adequate for writing code. And to be sure that your example data is usable for this purpose, it should be posted using the -dataex- command. If you are running version 16 or later, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      See also https://www.stata.com/support/faqs/d...tinct-strings/

      Although that FAQ focuses on counting distinct values, its coding principles and examples extend to keeping track of the distinct values themselves.
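      Following that idea, here is a minimal sketch of a row-wise count of distinct nonblank values across Var1 through Var4. It is not copied from the FAQ: the variable names ndistinct and isnew are hypothetical, and the second row of data is made up only to show how blanks are handled. A value counts as new only if it is nonblank and differs from every earlier variable in the same row. Changing the inner -replace- so that it blanks out Var`j' instead would turn the same comparison loop into the removal asked for in the question.

```stata
* Example data: the row from the question plus a made-up second row.
clear
input str10 Var1 str10 Var2 str10 Var3 str10 Var4
"sunshine" "rain" "sunshine" "rain"
"fog"      ""     "snow"     "hail"
end

* Var1 counts as one distinct value whenever it is nonblank.
gen byte ndistinct = (Var1 != "")

forval j = 2/4 {
    * isnew = 1 if Var`j' is nonblank and matches no earlier variable in the row
    gen byte isnew = (Var`j' != "")
    forval i = 1/`=`j' - 1' {
        quietly replace isnew = 0 if Var`j' == Var`i'
    }
    quietly replace ndistinct = ndistinct + isnew
    drop isnew
}

list
```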



      • #4
        This problem interested me, particularly to see if I could solve it in a way that is not "Stata-ish," but which might be "natural" in some other language. (I recall the dictum, not applicable to me, that "A good Fortran programmer can write Fortran code in any language.") So, for amusement, here's what I came up with, which I believe works. Corrections, shortenings, and improvements welcome.

        Code:
        // Make example data.
        clear
        set obs 5
        local nvars = 7
        local possible = "something this that other"
        set seed 5876
        forval i = 1/`nvars' {
           gen str Var`i' = word("`possible'", runiformint(1, wordcount("`possible'")))
        }
        //
        // Actual algorithm.
        //
        list
        local vm1 = `nvars' - 1
        forval i = 1/`vm1' {
           // Make sure current variable i is not blank: fill it from the
           // first nonblank variable that follows it.
           forval j = `=`i' + 1'/`nvars' {
              qui replace Var`i' = Var`j' if Var`i' == ""
           }
           // Blank out any following variable that duplicates what's in the current variable i
           // (this also blanks the variable that was just copied into Var`i').
           forval next = `=`i' + 1'/`nvars' {
              qui replace Var`next' = "" if (Var`i' == Var`next')
           }
        }
        // Drop variables that end up completely blank, which is prettier but perhaps unnecessary.
        forval i = 1/`nvars' {
           qui count if Var`i' == ""
           if (r(N) == _N) drop Var`i'
        }
        list

