Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling in panel observations for different countries separately.

    Dear all,

    how can I fill in observations for every combination of time values (in this case the variable “start_date”) and panel values (in this case the variable “party_id”) but only of their respective country?
    Neither the tsfill or fillin commands allow the “by(x)” option and I have yet to find another way. Using those commands fills in every start_date of every country for every party_id and blows up my dataset from previously 4,710 observations to 10,298,712.

    In Stata 16.0 I have panel data of political parties in different countries over different elections. Sometimes there are multiple cabinets formed after a single election – in those cases, there are observations for each start_date for each party_id, but all with the same election_date. Sometimes a party does not gain any seats and thus has no observations for the following election, sometimes it gains again in a later election and has an observation again.

    I want to generate a variable stating if a party was in office at least for one year in a legislative term, what I aim to do by calculating the time from the current start_date until the next start_date and then using the “egen x = total(y)” command to sum the time by election_date. For this I first need to fill in observations for each party_id and start_date. In the end I will collapse the data only containing one observation for each party_id and election_date in that the party gained seats, which I will then “xtset” as my panel and time variable for analysis.

    I previously posted this question in another thread which I found fitting, but I am sorry to only now realise, that it should be in a new thread.
    Here is the link to the old post: https://www.statalist.org/forums/for...nel-data/page2

    Hope someone can help here and thanks in advance!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 country_name int(party_id election_date start_date)
    "Netherlands" 300 -15157 -15089
    "Netherlands" 405 -15157 -15089
    "Netherlands" 300 -13694 -13619
    "Netherlands" 405 -13694 -13619
    "Netherlands" 300 -12602 -12568
    "Netherlands" 405 -12602 -12568
    "Netherlands" 300 -12602 -12352
    "Netherlands" 405 -12602 -12352
    "Netherlands" 300 -11139 -11101
    "Netherlands" 405 -11139 -11101
    "Netherlands" 300  -9746  -9716
    "Netherlands" 405  -9746  -9716
    "Netherlands" 300  -9746  -8920
    "Netherlands" 405  -9746  -8920
    "Netherlands" 300  -8255  -8226
    "Netherlands" 405  -8255  -8226
    "Netherlands" 300  -8255  -7465
    "Netherlands" 405  -8255  -7465
    "Netherlands" 300  -8255  -7449
    "Netherlands" 405  -8255  -7449
    end
    format %tddd/nn/CCYY election_date
    format %tddd/nn/CCYY start_date

  • #2
    I saw this when you originally posted it in the other thread, and I passed it over because I could not understand what you have and what you want. That hasn't changed. In your example data, you have one country and two parties, each of which has exactly the same election dates associated with it. So there are no gaps to fill in. Even if there were, I'm not sure I understand what you want the final result to look like.

    I suggest you post back with different example data that actually illustrates the problem you need to solve, and also work out by hand for the example what the result you are looking for is and show that.

    Comment


    • #3
      Thanks for the fast reply and sorry for the unclear example, I will give better example data here.

      The first six rows are an example of different start_dates for the same election_date in Finland. In the last 8 rows, you can see that party_id “345” does not have an observation for election_date “19248” in the Netherlands. I want to automatically create an observation for party_id “345” with the election_date “19248”, the start_date “19302” and the country_name "Netherlands". In general, a new observation should only be created, if a party is missing an observation with a start_date that other parties in its specific country do have an observation for, not if it is missing an observation for start_dates in other countries.
      If I set to panel data using “xtset party_id start_date” and then use “tsfill, full” or use the “fillin country_name party_id start_date” command, I also get new observations for all parties in Finland for all the start_dates in the Netherlands and the other way around.

      I hope this is more clear.


      Original data:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 country_name int(party_id election_date start_date)
      "Finland"   94 21653 21893
      "Finland"  200 21653 21893
      "Finland"  395 21653 21893
      "Finland"   94 21653 21706
      "Finland"  200 21653 21706
      "Finland"  395 21653 21706
      "Netherlands"  235 20893 21118
      "Netherlands"  345 20893 21118
      "Netherlands"  357 20893 21118
      "Netherlands"  235 19248 19302
      "Netherlands"  357 19248 19302
      "Netherlands"  235 18422 18549
      "Netherlands"  345 18422 18549
      "Netherlands"  357 18422 18549
      end
      format %tddd/nn/CCYY election_date
      format %tddd/nn/CCYY start_date


      This is how the data should look in the end:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 country_name int(party_id election_date start_date)
      "Finland"   94 21653 21893
      "Finland"  200 21653 21893
      "Finland"  395 21653 21893
      "Finland"   94 21653 21706
      "Finland"  200 21653 21706
      "Finland"  395 21653 21706
      "Netherlands"  235 20893 21118
      "Netherlands"  345 20893 21118
      "Netherlands"  357 20893 21118
      "Netherlands"  235 19248 19302
      "Netherlands"  345 19248 19302
      "Netherlands"  357 19248 19302
      "Netherlands"  235 18422 18549
      "Netherlands"  345 18422 18549
      "Netherlands"  357 18422 18549
      end
      format %tddd/nn/CCYY election_date
      format %tddd/nn/CCYY start_date
      Last edited by Anton Lang; 22 Jun 2023, 05:39.

      Comment


      • #4
        Much clearer, thanks.

        Code:
        capture program drop one_country
        program define one_country
            egen dates = concat(election_date start_date)
            fillin party_id dates
            capture assert _fillin == 0
            if c(rc) {
                sort country_name
                replace country_name = country_name[_N] if _fillin
                by dates (election_date), sort: ///
                    replace election_date = election_date[1] if _fillin
                by dates (start_date), sort: ///
                    replace start_date = start_date[1] if _fillin
            }
            drop dates _fillin
            exit
        end
        
        runby one_country, by(country_name)
        -runby- is written by Robert Picard and me. It is available from SSC.

        Comment


        • #5
          Thanks a lot! This does exactly what I wanted to achieve.

          Comment

          Working...
          X