Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding a match in one variable/column in a previous period time

    Hi,

    I am trying to generate a variable "Desired Output" that tells me if an observation "identification" on an ID basis (identification that might take value from 1 to n) at year 2010 is also present at 2009. In this case, I want STATA to return me the value "No changes", while if the observation is only present at year 2010 and not at year 2009 (means that has been deleted) I want the value "Added". This comparison has to be done for 2010 vs 2009 and 2011 vs 2010 and so on on a ID basis. Of course for the first year of observations per ID there is no output "na" and the first year might be 2009, 2010 or 2011 depending on the ID.
    ID Year Identification Desired Output
    3242 2009 1 na
    3242 2009 2 na
    3242 2010 1 No Changes
    3242 2010 2 No Changes
    3242 2010 3 Added
    3242 2011 3 No Changes
    3242 2011 4 Added
    5566 2010 2 na
    5566 2011 1 Added
    5566 2011 3 Added
    5566 2012 1 No Changes
    5566 2012 3 No Changes
    5566 2012 4 Added
    Thank you

    Best
    Alberto





  • #2
    I'm not sure I fully understand what you are asking for, but the following works in your example data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id year) byte identification str10 desiredoutput
    3242 2009 1 "na"        
    3242 2009 2 "na"        
    3242 2010 1 "No Changes"
    3242 2010 2 "No Changes"
    3242 2010 3 "Added"     
    3242 2011 3 "No Changes"
    3242 2011 4 "Added"     
    5566 2010 2 "na"        
    5566 2011 1 "Added"     
    5566 2011 3 "Added"     
    5566 2012 1 "No Changes"
    5566 2012 3 "No Changes"
    5566 2012 4 "Added"     
    end
    
    isid id identification year, sort
    by id identification (year): gen added = cond(_n == 1, "Added", "No Changes")
    by id (year), sort: replace added = "na" if year == year[1]
    
    assert added == desiredoutput
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thank you so much Clyde. The code you suggested is working perfectly.
      Next time, I'll use -datex- to show example data.

      Comment


      • #4
        Using the same data example, I have one more question.

        I would like to generate one more variable named "Deleted" that will tell me if the identifications at year 2009 will be deleted at year 2010 or not (this case is the other way round compared to the previous case so 2009vs2010 and not 2010vs2009). I want STATA to return me the value "Deleted", if the observation is only present at year 2009 and not at year 2010 (means that will be deleted) and "No changes" if will be present at year 2010. (E.g. id 3242 at year 2010 we have identifications 1, 2 that will be deleted at year 2011)

        This comparison has to be done for 2009 vs 2010 and 2010 vs 2011 and so on a ID basis. In this case there should be value for the first year of observations and not for the last year of observations.

        Thanks in advance.

        Alberto

        Comment


        • #5
          It's nearly the same code, with 1 replaced by _N:

          Code:
          isid id identification year, sort
          by id identification (year): gen deleted = cond(_n == _N, "Deleted", "No Changes")
          by id (year), sort: replace deleted = "na" if year == year[_N]

          Comment

          Working...
          X