Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Removing duplicates across several variables in panel data and keep the dup with non-missing values

    Hi

    As the title suggests, I am trying to remove duplicate values across several variables (c30) in a panel data set (sorted by firm id and year), so for example for Firm 3 there are two 2012 rows and one has observations for the variables and one has missing values.

    I have looked in a couple of other similar posts and in Nick Cox's notes on the topic in general but I have been unable to come up with a solution.

    The code I have tried is:

    Code:
    collapse (firstnm) `vbles', by(BvDIDnumber Year)

    This gave me an error message saying 'varlist required'.

    If I enter the names of all variables instead of 'vbles' the error persists.

    Apologies if I have missed something in the notes I mentioned, which are always extremely useful for various things I've done I'd liek to add.

    Thanks in advance,
    Paul

  • #2
    Where and how are you trying to define local macro

    Code:
    vbles
    ?

    Comment


    • #3
      Apologies - should have aded::

      Code:
      ds BvDIDnumber Year, not
      local vbles `r(varlist)'
      foreach v of varlist `vbles' {
          by BvDIDnumber Year (`v'), sort: assert inlist(`v', `v'[1], .)
      }

      Comment


      • #4
        OK, so you define it somewhere in your code, but it is evidently not visible later. The usual explanation for this is running your code in chunks, so that the definition is not visible to the use.

        The entire code for this shouldn't be very long, so please give it all and tell us more about how you are running the code.

        Comment


        • #5
          Code ran:

          Code:
          ds BvDIDnumber Year, not
          local vbles `r(varlist)'
          foreach v of varlist `vbles' {
              by BvDIDnumber Year (`v'), sort: assert inlist(`v', `v'[1], .)
          }
          
          
          
          collapse (firstnm) `vbles', by(BvDIDnumber Year)


          Taken from the following similar post:

          https://www.statalist.org/forums/for...missing-values

          Comment


          • #6
            I don't think I necessarily made clear but I am looking to keep the duplicate which has more nonmissing values.

            In some cases the duplicate to be dropped has missing values under all variables, in other cases there are some observations but less nonmissing values than the correct entry I would like to keep.

            Comment


            • #7
              I am looking to keep the duplicate which has more nonmissing values
              OK, but then you need code designed to achieve that.

              Code:
              ds BvDIDnumber Year, not
              egen nbad = rowmiss(`r(varlist)')
              bysort BvDIDnumber Year (nbad): keep if _n == 1

              Comment


              • #8
                Thanks so much Nick - the above worked perfectly. Much appreciated.

                Comment

                Working...
                X