  • Dropping Variables with Multiple Conditions.

    Hi, Statalist.
    I'm trying to clean a dataset. Below is a small snippet; I removed some variables to keep this post clear.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int conflict_id float year str10 start_date byte start_prec str10 start_date2 byte(start_prec2 ep_end) str10 ep_end_date byte ep_end_prec
    200 1946 "1946-07-18" 1 "1946-07-21" 2 1 "1946-07-21" .
    200 1947 ""           . ""           . . ""           .
    200 1948 ""           . ""           . . ""           .
    200 1949 ""           . ""           . . ""           .
    200 1950 ""           . ""           . . ""           .
    200 1951 ""           . ""           . . ""           .
    200 1952 "1946-07-18" 1 "1952-04-09" 1 1 "1952-04-12" .
    200 1953 ""           . ""           . . ""           .
    200 1954 ""           . ""           . . ""           .
    200 1955 ""           . ""           . . ""           .
    200 1956 ""           . ""           . . ""           .
    200 1957 ""           . ""           . . ""           .
    200 1958 ""           . ""           . . ""           .
    200 1959 ""           . ""           . . ""           .
    200 1960 ""           . ""           . . ""           .
    200 1961 ""           . ""           . . ""           .
    200 1962 ""           . ""           . . ""           .
    200 1963 ""           . ""           . . ""           .
    200 1964 ""           . ""           . . ""           .
    200 1965 ""           . ""           . . ""           .
    200 1966 ""           . ""           . . ""           .
    200 1967 "1946-07-18" 1 "1967-03-31" 3 1 "1967-10-16" .
    200 1968 ""           . ""           . . ""           .
    200 1969 ""           . ""           . . ""           .
    200 1970 ""           . ""           . . ""           .
    200 1971 ""           . ""           . . ""           .
    200 1972 ""           . ""           . . ""           .
    200 1973 ""           . ""           . . ""           .
    200 1974 ""           . ""           . . ""           .
    200 1975 ""           . ""           . . ""           .
    200 1976 ""           . ""           . . ""           .
    200 1977 ""           . ""           . . ""           .
    200 1978 ""           . ""           . . ""           .
    200 1979 ""           . ""           . . ""           .
    200 1980 ""           . ""           . . ""           .
    200 1981 ""           . ""           . . ""           .
    200 1982 ""           . ""           . . ""           .
    200 1983 ""           . ""           . . ""           .
    200 1984 ""           . ""           . . ""           .
    200 1985 ""           . ""           . . ""           .
    200 1986 ""           . ""           . . ""           .
    200 1987 ""           . ""           . . ""           .
    200 1988 ""           . ""           . . ""           .
    200 1989 ""           . ""           . . ""           .
    200 1990 ""           . ""           . . ""           .
    200 1991 ""           . ""           . . ""           .
    200 1992 ""           . ""           . . ""           .
    200 1993 ""           . ""           . . ""           .
    200 1994 ""           . ""           . . ""           .
    200 1995 ""           . ""           . . ""           .
    200 1996 ""           . ""           . . ""           .
    200 1997 ""           . ""           . . ""           .
    200 1998 ""           . ""           . . ""           .
    200 1999 ""           . ""           . . ""           .
    200 2000 ""           . ""           . . ""           .
    200 2001 ""           . ""           . . ""           .
    200 2002 ""           . ""           . . ""           .
    200 2003 ""           . ""           . . ""           .
    200 2004 ""           . ""           . . ""           .
    200 2005 ""           . ""           . . ""           .
    200 2006 ""           . ""           . . ""           .
    200 2007 ""           . ""           . . ""           .
    200 2008 ""           . ""           . . ""           .
    200 2009 ""           . ""           . . ""           .
    200 2010 ""           . ""           . . ""           .
    200 2011 ""           . ""           . . ""           .
    200 2012 ""           . ""           . . ""           .
    200 2013 ""           . ""           . . ""           .
    200 2014 ""           . ""           . . ""           .
    200 2015 ""           . ""           . . ""           .
    200 2016 ""           . ""           . . ""           .
    200 2017 ""           . ""           . . ""           .
    200 2018 ""           . ""           . . ""           .
    end
    First, I need to clean the dataset so that it only includes conflict_ids with conflict during 2000-2018. In other words, if a conflict_id ONLY has conflict from the year 2000 and before, it should be eliminated. I then have to merge this dataset with another dataset in which the variables of interest begin in 2000. Second, I would like to keep a record of which conflict_ids were deleted.

    Originally, I was just entering the following code for each conflict_id, but it takes too long and is also more prone to human error:
    Code:
    drop if conflict_id==200

    Then, I tried the following code, but it only dropped the individual observations that meet the conditions, not the complete cases, i.e. every observation of each conflict_id that meets the conditions:
    Code:
    bysort conflict_id: drop if year<2000 & ep_end==1
    The variable ep_end designates when the conflict was over. When it begins, the variable is coded as zero. When it ends, it is coded as 1.

    I tried using an egen command to create a new variable and then drop the cases, but what I put together was incorrect and did not give me what I was looking for:
    Code:
    bysort conflict_id (year): egen good = total(year<2000 & ep_end==1)
    The data is a little tricky, because a conflict can start well before 2000 and continue into 2000-2018. Such cases should be kept, not dropped.
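
    As a hedged aside on why the egen attempt above falls short: total() of a true/false expression returns a count per conflict_id, while max() of the same expression returns a 0/1 indicator. A minimal sketch, assuming the snippet above is in memory; the variable names n_pre and any_pre are illustrative and not part of the original dataset:

    ```stata
    * total() counts how many pre-2000 rows have ep_end == 1 within each conflict_id
    * (a missing ep_end makes ep_end == 1 false, so those rows add 0)
    bysort conflict_id: egen n_pre = total(year < 2000 & ep_end == 1)

    * max() of the same expression is 1 if any such row exists, 0 otherwise
    bysort conflict_id: egen byte any_pre = max(year < 2000 & ep_end == 1)

    * both results are constant within conflict_id; inspect them side by side
    tabulate n_pre any_pre
    ```

    On the snippet, conflict_id 200 gets n_pre = 3 (the 1946, 1952 and 1967 rows) and any_pre = 1. Neither variable says anything about 2000-2018, which is why a condition on the earlier years alone cannot decide which conflict_ids to retain.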




  • #2
    So, I think it is easier if you think of it in terms of which ones you want to keep, instead of which ones you want to drop. You want to keep only those that have an ep_end = 1 observation in some year between 2000 and 2018: what happens before 2000 is not relevant. (Your description is actually unclear about whether the criterion for keeping starts with year 2000 or 2001, but I'll go with 2000 to illustrate the approach.)

    Code:
    by conflict_id, sort: egen keeper = max(inrange(year, 2000, 2018) & ep_end == 1)
    
    tab conflict_id if !keeper // REPORT WHICH ONES WILL BE DROPPED
    keep if keeper // NOW RETAIN ONLY THE ONES NEEDED
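
    For the second request in the original post (keeping a record of the deleted conflict_ids), one possible sketch is to write the dropped IDs to their own file. It assumes the keeper variable created above and must run before the keep if keeper line; the filename dropped_conflict_ids is a placeholder, not something from the thread:

    ```stata
    * Save one row per conflict_id that is about to be dropped
    preserve                              // set the full dataset aside
    keep if !keeper                       // only the IDs that will be dropped
    keep conflict_id                      // the ID is all the record needs
    duplicates drop                       // collapse to one row per conflict_id
    save dropped_conflict_ids, replace    // placeholder filename
    restore                               // bring the full dataset back
    ```

    The saved file can later be listed or merged back in to check which conflicts were excluded.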



    • #3
      Originally posted by Clyde Schechter
      So, I think it is easier if you think of it in terms of which ones you want to keep, instead of which ones you want to drop. You want to keep only those that have an ep_end = 1 observation in some year between 2000 and 2018: what happens before 2000 is not relevant. (Your description is actually unclear about whether the criterion for keeping starts with year 2000 or 2001, but I'll go with 2000 to illustrate the approach.)

      Code:
      by conflict_id, sort: egen keeper = max(inrange(year, 2000, 2018) & ep_end == 1)
      
      tab conflict_id if !keeper // REPORT WHICH ONES WILL BE DROPPED
      keep if keeper // NOW RETAIN ONLY THE ONES NEEDED
      Clyde,
      I'm sorry for the delayed reply; I was working on this problem for most of last week whenever I got the chance. Thank you!
      I played around with the code, including changing the condition to ep_end==0, but it still kept cases that should have been dropped. This is mainly due to how the dataset was originally arranged.

      I ended up figuring out another way by using other variables in the dataset. I will say that I found your suggestion to think in terms of what I want to keep extremely helpful. The tab was also useful. I hope you are well during this time.
      Last edited by Eli Aguado; 22 Mar 2020, 12:22. Reason: grammar
