  • Drop/keep observations based on values of different variables

    Hello,

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID double ID_initial float pre_creation_date
    500002 500001 18129
    500011 500013 17615
    500012 500013 17615
    500013 500015 19080
    500015 500013 17615
    500025 500027 18568
    500028 500027 18568
    500031 500033 17974
    500033 500031 18385
    500035 500036 18297
    500036 500035 18228
    500038 500036 18297
    500062 500063 17615
    500071 500074 18352
    500076 500078 18172
    500078 500076 18850
    500079 500078 18172
    500083 500084 17630
    500084 500083 18071
    500086 500087 18939
    500087 500088 17632
    500089 500090 18196
    500091 500090 18196
    500095 500094 18928
    500096 500094 18928
    500098 500099 18053
    500099 500100 17755
    500103 500102 17647
    500112 500114 18161
    500114 500112 18518
    500116 500114 18161
    500117 500118 17668
    500120 500121 18246
    500121 500124 18071
    500128 500124 18071
    500131 507599 18175
    500145 500121 18246
    500171 500172 18246
    500173 500172 18246
    500175 500176 18556
    500203 500205 18646
    500204 500205 18646
    500217 500205 18646
    500225 500227 18277
    500228 500227 18277
    500229 500227 18277
    500234 500235 18109
    500236 500235 18109
    500237 500235 18109
    500581 500578 19023
    500582 500578 19023
    500583 500586 18219
    500586 500587 18219
    500587 500586 18219
    500589 500590 18959
    500593 500594 18108
    500594 500593 17884
    500625 500624 17974
    500628 500674 18032
    500673 500674 18032
    500675 500674 18032
    500681 500578 19023
    500683 500685 19060
    500684 500685 19060
    500685 500687 18870
    500687 500685 19060
    500688 500687 18870
    500689 500687 18870
    500701 500703 18382
    500702 500703 18382
    500704 500703 18382
    500705 500703 18382
    500720 500721 18876
    500722 500721 18876
    500723 500721 18876
    500732 527002 18253
    500733 527002 18253
    500773 500772 18246
    500777 500779 18165
    500778 500779 18165
    500780 500779 18165
    500781 500779 18165
    500809 500811 18927
    500810 500811 18927
    500812 500811 18927
    500814 500815 18704
    500816 500815 18704
    500826 500875 17821
    500856 500858 18234
    500857 500858 18234
    500859 500858 18234
    500860 500861 19033
    500862 500861 19033
    500863 500861 19033
    500870 500872 18382
    500871 500872 18382
    500873 500872 18382
    500876 500875 17821
    500877 500875 17821
    500882 500885 18102
    end
    format %td pre_creation_date
    Here is what I am trying to ask Stata to do:

    For each value of "ID", check whether the same value appears in any observation of "ID_initial". If it does, compare the "pre_creation_date" values of those observations and drop the observation with the higher (later) "pre_creation_date".

    For example:
    ID is "500013" in the fourth observation, and the same value also appears in "ID_initial" in the second, third and fifth observations:

    500002 500001 18129
    500011 500013 17615
    500012 500013 17615
    500013 500015 19080 drop because 19080 > 17615
    500015 500013 17615

    Any advice?



  • #2
    Suppose that, in your example, the value had been 17600 instead of 19080. Then 17600 < 17615. But there would now be three observations with ID_initial = 500013 having a later date than the date for the observation with ID = 500013. Which of those three should be dropped? All of them?

    Also, in your example data, the value of pre_creation_date is always the same for all observations having a given value of ID_initial. This seems to be necessary for what you are trying to do to even make sense. But is it true throughout your data?
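
    One way to check that condition is sketched below on a few rows of the example data. The variable name date_varies is made up for illustration.

    ```stata
    * Sketch: is pre_creation_date constant within each ID_initial?
    clear
    input long ID double ID_initial float pre_creation_date
    500011 500013 17615
    500012 500013 17615
    500013 500015 19080
    500015 500013 17615
    end
    format %td pre_creation_date

    * Sort each ID_initial group by date, then compare the first and last
    * dates; they differ only if the date varies within the group
    bysort ID_initial (pre_creation_date): ///
        gen byte date_varies = pre_creation_date[1] != pre_creation_date[_N]

    * A count of zero means the date is constant within every ID_initial
    count if date_varies
    ```

    On the full data set, any nonzero count would point to ID_initial groups that need a closer look before dropping anything.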



    • #3
      Hi @Clyde,
      In my data, "ID_initial" is the household ID of a household that joined a program, "ID" is the household ID of one of its neighbors, and "pre_creation_date" is the date when "ID_initial" joined the program. I am studying the peer effect. If two neighbors joined the program on different dates (such as 500013 and 500015), I want to drop the observation for the later one (the fourth observation: 500013 500015 19080), because it cannot reflect a peer effect: the neighbor 500013 had already joined the program at an earlier date.

      Please see my response below:

      "If, in your example, the value had been 17600 instead of 19080, then 17600 < 17615. But there would now be three observations with ID_initial = 500013 having a later date than the date for the observation with ID = 500013. Which of those three should be dropped? All of them?"
      No. With your example above, I think my procedure will still do what I want, because I only want to compare observations whose ID and ID_initial are interchanged, such as "500013 500015 19080" and "500015 500013 17615".
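
      The restriction to interchanged pairs could be sketched roughly as follows, on a few rows of the example data. This is only an illustration: it assumes each (ID, ID_initial) combination occurs at most once, and the names swapped, partner_date and later_of_pair are made up here.

      ```stata
      * Sketch: flag the later observation of each interchanged pair
      clear
      input long ID double ID_initial float pre_creation_date
      500011 500013 17615
      500012 500013 17615
      500013 500015 19080
      500015 500013 17615
      end
      format %td pre_creation_date

      * Assumption: (ID, ID_initial) uniquely identifies observations
      isid ID ID_initial

      * Build a copy with ID and ID_initial swapped, keeping the date
      preserve
          rename (ID ID_initial pre_creation_date) (ID_initial ID partner_date)
          tempfile swapped
          save `swapped'
      restore

      * An observation matches only if its mirror image exists in the data
      merge 1:1 ID ID_initial using `swapped', keep(master match) nogenerate

      * Flag the member of the pair with the later date
      gen byte later_of_pair = !missing(partner_date) & pre_creation_date > partner_date
      list, sepby(ID_initial)
      ```

      Observations without a mirror image get a missing partner_date and are never flagged.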


      "Also, in your example data, the value of pre_creation_date is always the same for all observations having a given value of ID_initial. This seems to be necessary for what you are trying to do to even make sense. But is it true throughout your data?"
      Yes, it is true throughout my data.

      Many thanks,
      Kareman



      • #4
        I'm sure there's a better way, but see if this does what you need.

        Code:
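        preserve
            * One row per ID_initial holding that household's join date
            collapse (mean) pre_creation_date , by(ID_initial)
            ren pre_creation_date pcd_check
            ren ID_initial ID
            save pcd_check, replace
        restore
        
        * Attach to each neighbor (ID) the date on which it joined itself, if any
        joinby ID using pcd_check , unmatched(master) _merge(_merge_pcd)
        * Flag rows where the neighbor (ID) joined before its ID_initial household
        g drop = pre_creation_date>pcd_check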
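
        A possible variant of the same idea, shown only as a sketch on a few rows of the example data: it uses a temporary file instead of saving pcd_check.dta to the working directory, uses merge m:1 in place of joinby (the collapsed file has one row per ID, so the result should be the same here), and then actually drops the flagged rows. The names pcd and to_drop are made up for illustration.

        ```stata
        clear
        input long ID double ID_initial float pre_creation_date
        500011 500013 17615
        500012 500013 17615
        500013 500015 19080
        500015 500013 17615
        end
        format %td pre_creation_date

        preserve
            * One row per ID_initial with its join date, keyed as ID
            collapse (mean) pcd_check = pre_creation_date, by(ID_initial)
            rename ID_initial ID
            tempfile pcd
            save `pcd'
        restore

        * Attach each neighbor's own join date where one exists
        merge m:1 ID using `pcd', keep(master match) nogenerate

        * Flag rows whose date is later than the neighbor's own join date;
        * a missing pcd_check compares as larger, so unmatched rows stay
        gen byte to_drop = pre_creation_date > pcd_check & !missing(pre_creation_date)
        drop if to_drop
        ```

        On these four rows only the observation "500013 500015 19080" is dropped.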



        • #5
          Hi George Ford,

          This does exactly what I need.

          Thanks a lot and have a good day!
