  • generating new variable based on changes in panel data

    Hi,

    I've constructed a dataset that records the political party in office in every town from 2002 to 2014. Elections were held in 2002, 2008 and 2014, so there are three election cycles: 2002 to the end of 2007, 2008 to the end of 2013, and 2014 alone (my observation period ends in 2014). I want to create a variable that indicates, for each town, whether the party in office changed between election cycles, and another that counts the total number of changes (0, 1 or 2) over the entire observation period.
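As a minimal sketch, assuming `year` is stored as a plain integer year (as in the dataex output further down), each observation can be assigned to its election cycle with nested `cond()` calls. The variable name `cycle` is illustrative and not part of the original data; the rows are copied from the thread's data example.

```stata
* Rows copied from the dataex example in this thread
clear
input float polparty_4 str5 code_insee int year
3 "01033" 2002
3 "01033" 2007
4 "01033" 2008
1 "01024" 2013
3 "01024" 2014
end

* Election cycle (illustrative name): 1 = 2002-2007, 2 = 2008-2013, 3 = 2014
* Years outside 2002-2014, or missing years, get a missing cycle
generate byte cycle = cond(year < 2008, 1, cond(year < 2014, 2, 3)) if inrange(year, 2002, 2014)
list code_insee year cycle
```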

    The variables I have are:

    polparty_4: categorical variable with the value labels left, center, right, other
    code_insee: unique identifier for each town
    year: 2002 to 2014

    I've created the following variables based on the above:
    Code:
    generate party07 = polparty_4 if year==2007
    generate party08 = polparty_4 if year==2008
    generate party13 = polparty_4 if year==2013
    generate party14 = polparty_4 if year==2014

    My plan was to create a binary variable for each town (commune), coded 0 if party07 equals party08 (and likewise for party13 and party14) and 1 otherwise.

    However, I'm struggling to generate the binary variable grouped by town (code_insee), as well as the count variable for the total number of changes in each town.
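The plan above runs into one snag: `party07` is non-missing only in a town's 2007 row and `party08` only in its 2008 row, so no single observation holds both values to compare. One way to carry out the plan, sketched here on rows taken from the data example, is to copy each boundary-year party to every observation of the same town with `egen, max()` and then compare the copies. This rebuilds `party07` to `party14` in that spread-out form (drop the earlier versions first in the real data); `change1`, `change2` and `nchanges` are hypothetical names.

```stata
* Rows copied from the dataex example in this thread
clear
input float polparty_4 str5 code_insee int year
1 "01024" 2011
1 "01024" 2012
1 "01024" 2013
3 "01024" 2014
3 "01032" 2007
3 "01032" 2008
3 "01032" 2013
4 "01032" 2014
end

* Copy the party in each boundary year to every observation of the same town;
* the result is missing if the town has no observation in that year
foreach y in 07 08 13 14 {
    bysort code_insee: egen party`y' = max(cond(year == 20`y', polparty_4, .))
}

* 1 if the party changed at the election, 0 if not, missing if unobserved
generate byte change1 = (party07 != party08) if !missing(party07, party08)
generate byte change2 = (party13 != party14) if !missing(party13, party14)

* Number of observed changes per town; rowtotal() counts missing as 0
egen nchanges = rowtotal(change1 change2)
list code_insee year party07 party08 party13 party14 nchanges, sepby(code_insee)
```

Note that a town not observed in one of the boundary years gets a missing indicator for that election, so its `nchanges` counts only the changes that can actually be seen.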

    Thanks for your help!

  • #2
    Follow the FAQ advice to present data examples using dataex. For example, you could run the following and copy and paste the output here; that will increase your chances of getting a helpful reply.

    Code:
    ssc install dataex
    sort code_insee year
    dataex polparty_4 code_insee year in 1/30


    • #3
      Thank you for the tip. The output is below:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float polparty_4 str5 code_insee int year
      . ""         .
      1 "01024" 2011
      1 "01024" 2012
      1 "01024" 2013
      3 "01024" 2014
      3 "01032" 2007
      3 "01032" 2008
      3 "01032" 2009
      3 "01032" 2010
      3 "01032" 2011
      3 "01032" 2012
      3 "01032" 2013
      4 "01032" 2014
      3 "01033" 2002
      3 "01033" 2002
      3 "01033" 2003
      3 "01033" 2003
      3 "01033" 2004
      3 "01033" 2004
      3 "01033" 2005
      3 "01033" 2005
      3 "01033" 2006
      3 "01033" 2006
      3 "01033" 2007
      3 "01033" 2007
      4 "01033" 2008
      4 "01033" 2008
      4 "01033" 2011
      4 "01033" 2012
      4 "01033" 2013
      end
      format %ty year
      Not all years are available for all towns because the data are also organized by organization. If an organization existed only between 2011 and 2014 (like town 01024 at the top of the output) and it was the only organization in that town, only the years from the organization's birth to its failure appear.
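Because of these gaps, some towns cannot show two changes at all. A sketch, assuming the cycle boundaries given in the question and using the illustrative names `cycle`, `tcyc`, `ncycles` and `maxchanges`, counts the election cycles in which each town is observed; a town seen in k cycles can show at most k - 1 between-cycle changes.

```stata
* Rows copied from the dataex example in this thread
clear
input float polparty_4 str5 code_insee int year
1 "01024" 2011
1 "01024" 2013
3 "01024" 2014
3 "01033" 2002
3 "01033" 2007
4 "01033" 2008
4 "01033" 2013
end

* Election cycle: 1 = 2002-2007, 2 = 2008-2013, 3 = 2014
generate byte cycle = cond(year < 2008, 1, cond(year < 2014, 2, 3))

* Tag one observation per town-cycle, then count the cycles each town appears in
egen byte tcyc = tag(code_insee cycle)
bysort code_insee: egen ncycles = total(tcyc)

* With k observed cycles, at most k - 1 changes between cycles can be detected
generate maxchanges = ncycles - 1
list code_insee year cycle ncycles maxchanges, sepby(code_insee)
```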


      • #4
        Thanks for the data example. I see that you have some duplicate observations. This does not matter for what you want to do, but if you are going to analyze these data as a panel, you will have to address it.
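A sketch of one way to address the duplicates, assuming the party in office is the same for every observation of a town in a given year. It reports duplicated town-years, checks that assumption with `assert`, and keeps one observation per town-year so that `xtset` accepts the panel. Because `keep` discards the organization-level rows, this is best run on a copy of the data (for example after `preserve`, or saved to a separate file).

```stata
* Rows copied from the dataex example in this thread
clear
input float polparty_4 str5 code_insee int year
3 "01033" 2002
3 "01033" 2002
3 "01033" 2003
3 "01033" 2003
4 "01033" 2008
4 "01033" 2008
1 "01024" 2013
3 "01024" 2014
end

* How many town-years appear more than once?
duplicates report code_insee year

* Check the assumption that the party is constant within a town-year
bysort code_insee year (polparty_4): assert polparty_4[1] == polparty_4[_N]

* Keep one observation per town-year and declare the panel
by code_insee year: keep if _n == 1
encode code_insee, generate(town_id)
xtset town_id year
```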

        Code:
        encode code_insee, gen(town_id)
        * Generate an indicator for a change in party since the previous observation
        bysort town_id (year): gen byte change = _n > 1 & polparty_4 != polparty_4[_n-1]
        * Count the total number of changes in each town
        bysort town_id: egen nchanges = total(change)
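The reply's code flags any change from one observed year to the next; if the party can change only at an election, that matches the between-cycle count asked for. A variant that works at the election-cycle level directly is sketched below, assuming the party is constant within a cycle and using the hypothetical names `cycle`, `first` and `change`. It tags one observation per town-cycle with `egen, tag()` and compares consecutive cycles within each town.

```stata
* Rows copied from the dataex example in this thread
clear
input float polparty_4 str5 code_insee int year
1 "01024" 2011
1 "01024" 2013
3 "01024" 2014
3 "01033" 2002
3 "01033" 2007
4 "01033" 2008
4 "01033" 2013
end

* Election cycle: 1 = 2002-2007, 2 = 2008-2013, 3 = 2014
generate byte cycle = cond(year < 2008, 1, cond(year < 2014, 2, 3))

* Tag one observation per town-cycle (the party is assumed constant within a cycle)
egen byte first = tag(code_insee cycle)

* Among tagged rows, sorted by cycle, flag a party that differs from the previous cycle
bysort code_insee first (cycle): generate byte change = first & _n > 1 & polparty_4 != polparty_4[_n-1]

* Total between-cycle changes per town (0, 1 or 2)
bysort code_insee: egen nchanges = total(change)
list code_insee year cycle first change nchanges, sepby(code_insee)
```

Untagged rows always get `change == 0`, so each change is counted once per town however many years fall in a cycle.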


        • #5
          Thanks so much! This works wonderfully!
          The "duplicates" arise because a town can have multiple organizations, each observed once per year; there is a unique organization-year identifier variable that I didn't include here for simplicity. I appreciate the heads-up, though.
