Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need help regarding merging and ranges!

    Hey all, this is my second time using stateliest as a way of solving problems in stata. I always try to solve the problems I encounter in stata by myself, but this one in particular has been bothering me for the last couple of days because I simply can't get around it. So here's the deal: I have two datasets. One of them contains data on the parliamentary elections for many countries:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str24 country_txt int iyear byte(imonth iday) str13 Electiontype double VoterTurnout long Population int fecha
    "Albania"    2005  7  3 "Parliamentary" .4873   3563112 16620
    "Albania"    2009  6 28 "Parliamentary" .5077   3194417 18076
    "Albania"    2013  6 23 "Parliamentary" .5331   3011405 19532
    "Albania"    2017  6 25 "Parliamentary" .4676   3047987 20995
    "Argentina"  2001 10 14 "Parliamentary" .7521  37694169 15262
    "Argentina"  2005 10 23 "Parliamentary" .7094  39537943 16732
    "Argentina"  2007 10 28 "Parliamentary" .7313  40301927 17467
    "Argentina"  2009  6 28 "Parliamentary" .7239  40913584 18076
    "Argentina"  2011 10 23 "Parliamentary" .7939  41769726 18923
    "Argentina"  2013 10 27 "Parliamentary" .7717  42610981 19658
    "Argentina"  2015 10 25 "Parliamentary" .8107  43431886 20386
    "Argentina"  2017 10 22 "Parliamentary" .7674  44293293 21114
    "Argentina"  2019 10 27 "Parliamentary" .8094  45089492 21849
    "Australia"  2001 11 10 "Parliamentary" .9485  19294257 15289
    "Australia"  2004 10  9 "Parliamentary" .9432  19913144 16353
    "Australia"  2007 11 24 "Parliamentary" .9476  20434176 17494
    "Australia"  2010  8 21 "Parliamentary" .9322  21515754 18495
    "Australia"  2013  9  7 "Parliamentary" .9323  22262501 19608
    "Australia"  2016  7  2 "Parliamentary" .9101  22992654 20637
    "Australia"  2019  5 18 "Parliamentary" .9189  23705630 21687
    "Austria"    2002 11 24 "Parliamentary" .8427   8150835 15668
    "Austria"    2006 10  1 "Parliamentary" .7849   8192880 17075
    "Austria"    2008  9 28 "Parliamentary" .7881   8205533 17803
    "Austria"    2013  9 29 "Parliamentary" .7491   8562634 19630
    "Austria"    2017 10 15 "Parliamentary"    .8   8754413 21107
    "Austria"    2019  9 29 "Parliamentary" .7559   8828456 21821
    "Bangladesh" 2001 10  1 "Parliamentary" .7497 134477534 15249
    "Belgium"    2003  5 18 "Parliamentary" .9163  10274595 15843
    "Belgium"    2007  6 10 "Parliamentary" .9108  10392226 17327
    "Belgium"    2010  6 13 "Parliamentary" .8922  10423493 18426
    end
    format %tdnn/dd/CCYY fecha
    label var country_txt "country_txt" 
    label var iyear "iyear" 
    label var imonth "imonth" 
    label var iday "iday" 
    label var Electiontype "Electiontype" 
    label var VoterTurnout "VoterTurnout" 
    label var Population "Population" 
    label var fecha "fecha"
    The other dataset is actually extracted from the Global Terrorism Database (GTD) and has information on terrorist events for many countries as well, such as the date of the event, the number of fatalities and the number of wounded:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double eventid int iyear byte(imonth iday) int(fecha country) str19 country_txt int(nkill nwound)
    200312120002 2003 12 12 16051  5 "Albania"   0 0
    200907160013 2009  7 16 18094  5 "Albania"   0 0
    201312130034 2013 12 13 19705  5 "Albania"   0 0
    201403180067 2014  3 18 19800  5 "Albania"   0 0
    201408280064 2014  8 28 19963  5 "Albania"   1 2
    201502100033 2015  2 10 20129  5 "Albania"   0 0
    201502100034 2015  2 10 20129  5 "Albania"   0 0
    201502100035 2015  2 10 20129  5 "Albania"   0 0
    201502120047 2015  2 12 20131  5 "Albania"   0 0
    201605250048 2016  5 25 20599  5 "Albania"   0 0
    201608170042 2016  8 17 20683  5 "Albania"   0 0
    201710110028 2017 10 11 21103  5 "Albania"   0 0
    201903270026 2019  3 27 21635  5 "Albania"   0 0
    202010300014 2020 10 30 22218  5 "Albania"   0 0
    200304170002 2003  4 17 15812 11 "Argentina" 0 0
    200506060002 2005  6  6 16593 11 "Argentina" 0 0
    200506060003 2005  6  6 16593 11 "Argentina" 0 0
    200506060004 2005  6  6 16593 11 "Argentina" 0 0
    200908040016 2009  8  4 18113 11 "Argentina" 0 0
    201002030020 2010  2  3 18296 11 "Argentina" 0 0
    201005250003 2010  5 25 18407 11 "Argentina" 0 0
    201009160003 2010  9 16 18521 11 "Argentina" 0 0
    201012200006 2010 12 20 18616 11 "Argentina" 0 0
    201012300001 2010 12 30 18626 11 "Argentina" 0 0
    201112210025 2011 12 21 18982 11 "Argentina" 0 0
    201205010005 2012  5  1 19114 11 "Argentina" 0 0
    201205220021 2012  5 22 19135 11 "Argentina" 0 0
    201309060029 2013  9  6 19607 11 "Argentina" 0 0
    201309190036 2013  9 19 19620 11 "Argentina" 0 2
    201410120078 2014 10 12 20008 11 "Argentina" 0 0
    201511230095 2015 11 23 20415 11 "Argentina" 0 0
    201603050051 2016  3  5 20518 11 "Argentina" 0 2
    201608180060 2016  8 18 20684 11 "Argentina" 0 0
    201707050047 2017  7  5 21005 11 "Argentina" 0 0
    201707230050 2017  7 23 21023 11 "Argentina" 0 0
    201708020032 2017  8  2 21033 11 "Argentina" 0 0
    201807130047 2018  7 13 21378 11 "Argentina" 0 3
    201807130048 2018  7 13 21378 11 "Argentina" 0 0
    201808030053 2018  8  2 21398 11 "Argentina" 0 0
    201811140041 2018 11 14 21502 11 "Argentina" 0 1
    201811140042 2018 11 14 21502 11 "Argentina" 0 0
    201905090028 2019  5  9 21678 11 "Argentina" 2 0
    202009060035 2020  9  6 22164 11 "Argentina" 0 0
    202009060036 2020  9  6 22164 11 "Argentina" 0 0
    202009090027 2020  9  9 22167 11 "Argentina" 0 0
    202011060026 2020 11  6 22225 11 "Argentina" 0 1
    202012010034 2020 12  1 22250 11 "Argentina" 0 0
    200109230005 2001  9 23 15241 14 "Australia" 0 0
    200110150003 2001 10 15 15263 14 "Australia" 0 0
    200604080012 2006  4  8 16899 14 "Australia" 0 0
    200608010001 2006  8  1 17014 14 "Australia" 0 .
    200804210001 2008  4 21 17643 14 "Australia" 0 0
    200910080019 2009 10  8 18178 14 "Australia" 0 0
    201002040023 2010  2  4 18297 14 "Australia" 0 0
    201309020036 2013  9  2 19603 14 "Australia" 0 0
    201402240121 2014  2 24 19778 14 "Australia" 0 0
    201405180095 2014  5 18 19861 14 "Australia" 0 0
    201405180096 2014  5 18 19861 14 "Australia" 0 0
    201408120098 2014  8 12 19947 14 "Australia" 0 0
    201409230034 2014  9 23 19989 14 "Australia" 1 2
    end
    format %tdnn/dd/CCYY fecha
    label var eventid "eventid" 
    label var iyear "iyear" 
    label var imonth "imonth" 
    label var iday "iday" 
    label var fecha "fecha" 
    label var country "country" 
    label var country_txt "country_txt" 
    label var nkill "nkill" 
    label var nwound "nwound"
    I want to merge the two datasets, since I'm working on a dissertation about the effects of terrorism in voter turnout in the 21st century. In order to measure this effect, I'm trying to assign terrorist events to certain elections based on their dates. For example, if a terrorist event has occurred in March 11 2004, the closest elections are in March 14 2004, and the last elections were in 4 April 2000, the terrorist event would be assigned to the latest elections (March 14 2004). The elections dataset has many countries and elections for each country. The dataset on terrorist events has many countries and terrorist events for each country as well. Is there any way I can merge them and assign the attacks to specific elections? Any help is welcome, I'm really struggling with this.

    Thank you so much in advance!

  • #2
    Code:
    use `elections', clear
    gen `c(obs_t)' obs_no = _n
    
    rangejoin fecha . -1 using `terrorism', by(country_txt)
    
    by obs_no, sort: gen latency = fecha - fecha_U
    by obs_no (latency), sort: keep if _n == 1
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    This code will associate each election with the preceding terrorist event in the same country that is closest to it in time. If there are multiple terrorist events on that same date in that country, all of those are retained.

    Comment


    • #3
      Thanks for the code! I am not with my computer at the moment so I can’t try it out. However, I see from your answer that the code will only associate each election with its preceding terrorist event in that country that is closest in time. In reality, there can be many terrorist attacks at different times before an election. Will the code associate every attack before the elections or only the closest in time?

      Comment


      • #4
        As written, only the closest in time--I had assumed that is what you want. If you want to associate it with all preceding terrorist attacks, just stop after the -rangejoin- command. Of course, that means that in some cases you could have an election associated to terrorist events that occurred before anyone alive at the election was born. But that's up to you.

        Comment


        • #5
          Sorry, I meant that terrorist events happen between elections. So if I have 5 elections for a specific country like Spain, in the years 2004, 2008, 2015, 2016 and 2019, and multiple terrorist events preceding each election, I’d like to assign every event that happened before 2004 to the elections in 2004, every event that happened between the elections of 2004 and 2008 to the elections in 2008; every event that happened between the elections of 2008 and 2015 to the elections of 2015, and so on

          Comment


          • #6
            Oh, sure! That makes sense!
            Code:
            use `elections', clear
            by country_txt (fecha), sort: gen previous_election_date = fecha[_n-1]
            rangejoin fecha previous_election_date -1 using `terrorism', by(country_txt)
            does it.

            Comment


            • #7
              Hey Clyde! Your code works! is there any way I could do the same thing, but instead of associating every terrorist event preceding an election, only associate those within a year of the elections?

              Comment


              • #8
                been trying to figure out the code by myself but can't seem to get it

                Comment


                • #9
                  The date variable fecha is denominated in days, so the key fact is that 1 year = 365 days.
                  Code:
                  use `elections', clear
                  rangejoin fecha -365 -1 using `terrorism', by(country_txt)

                  Comment


                  • #10
                    since I'm actually interested in an answer that takes account of leap years, I'm extending this to ask how one would use -rangejoin- to account for both "standard" and "leap" years in the same data set

                    Comment


                    • #11
                      You can extend the range to 366 days and the eliminate those that are exactly 366 days long but do not involve a leap year. Although it's conceptually simple, a detailed implication is a little bit complicated, because it could turn out that for some election, the only matched terrorism event occurs 366 days before but the range of dates does not span a 29 February. In that case, you want to eliminate all of the matched information, but you don't want to drop the observation because that would delete the election entirely. So I believe the following does it:

                      Code:
                      use `elections', clear
                      gen `c(obs_t)' election_num = _n
                      rangejoin fecha -366 -1 using `terrorism', by(country_txt)
                      
                      //    IDENTIFY OBSERVATIONS WHERE FECHA & FECHA_U STRADDLE A LEAP YEAR
                      gen byte ly_range = isleapyear(yofd(fecha)) & fecha > mdy(2,28, yofd(fecha)) ///
                          | isleapyear(yofd(fecha_U)) & fecha_U <= mdy(2, 28, yofd(fecha_U))
                          
                      //    REMOVE MATCHED DATA WHEN THERE IS NO LEAP YEAR STRADDLE & INTERVAL == 366 DAYS
                      foreach v of varlist *_U {
                          replace `v' = . if !ly_range & fecha - fecha_U == 366
                      }
                      
                      //    DROP THESE OUT-OF-RANGE MATCHES, UNLESS IT WOULD DELETE THE ENTIRE ELECTION
                      by election_num (fecha_U), sort: egen ok_to_drop = max(!missing(fecha_U))
                      drop if missing(fecha_U) & ok_to_drop

                      Comment


                      • #12
                        Actually, I realized it can be done far more simply:
                        Code:
                        use `elections', clear
                        gen lower_limit = birthday(fecha, yofd(fecha)-1, "feb28")
                        rangejoin fecha lower_limit -1 using `terrorism', by(country_txt)
                        That buries all the complication inside the -birthday()- function which StataCorp was kind enough to give us in version 17.

                        Comment


                        • #13
                          thank you

                          Comment

                          Working...
                          X