  • Problem with weighted average and collapsing

    Hello,
    I am using the Manifesto Project data to get indicators for countries' environmental interests (per501), as well as the Right-Left Index (RILE). I have dropped observations before my period of interest, and for most countries I am left with two elections. There is one observation for every party in every country. The data specify how many seats each party has and what percentage of the votes (pervote) it received. I want to collapse the data into RILE and environmental interest by country, but also by these two elections. Furthermore, I want the resulting mean to be weighted by the percentage of votes each party received.

    I don't really understand the Stata guide I get from /help, but I ran "collapse per501 rile [w=pervote], by(country date)".

    To check whether the weighted mean was correct, I calculated the first "date" of Sweden by hand for environment (per501). Stata gives me the value 7.09, whereas my calculation came to 5.7. I assume this discrepancy could be due to rounding, as I rounded most values up, but it still seems like a bit too big of a difference.
    Am I doing it correctly, or should I be doing something else?

    Any help or tips would be greatly appreciated!

  • #2
    Assume we don't come from your area. I have no idea what Manifesto Project data are. You'll increase your chances of a useful answer by following the FAQ on asking questions: provide Stata (not stata) code in code delimiters, readable Stata output, and sample data using dataex. Try to avoid irrelevant issues so that we can focus on your problem.

    With sample data, and maybe your calculation (done to 4 or whatever decimals - this is not much work), we might be of assistance. Since you want means, you might try checking your analysis with su with the weight for that country-date.
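
    As a rough sketch of that check, assuming the variable names from the first post (per501, rile, pervote, country, date) and that analytic weights are intended (as far as I know, collapse assumes aweights when only [w=...] is given), the weighted means can be displayed per country-date before collapsing and then compared with the collapsed values:

    ```stata
    * Weighted means for each country-election, shown before collapsing
    bysort country date : summarize per501 rile [aweight = pervote]

    * The same quantities as a collapsed dataset;
    * preserve/restore keeps the party-level data in memory afterwards
    preserve
    collapse (mean) per501 rile [aweight = pervote], by(country date)
    list country date per501 rile, sepby(country)
    restore
    ```

    Parties with a missing pervote drop out of both calculations, which is one possible source of a gap between Stata's result and a hand calculation.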





    • #3
      Thank you for your suggestion. I hope this is better. Since I asked this I have come a little further, but I still haven't solved the problem.


      I am using the ParlGov dataset (example below). I am trying to get the weighted mean using the _gwtmean package, and have used it like this:
      Code:
      _gwtmean wtleri= left_right, by(seats) weight(seats_total)
      I want to know the left-right position of parties, weighted by their seats. The command does that, but I need it done per country and per election. As it stands, it gives me the weighted mean for what I think are whichever countries share the same "seats_total".
      I tried adding "by(country)" etc., but that only gives me an error message:
      invalid 'by'
      r(198);
      .

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 country_name str10 election_date int(seats seats_total) float left_right
      "Austria"        "2014-05-25" 0  18      .
      "Cyprus"         "2006-05-21" 0  56      .
      "Czech Republic" "2014-05-24" 0  21      .
      "Romania"        "2009-06-07" 0  33 5.5341
      "Lithuania"      "2004-10-24" 0 141 6.4579
      "Portugal"       "2009-06-07" 0  22    1.2
      "Poland"         "2004-06-13" 0  54    6.2
      "Denmark"        "2004-06-13" 0  14 5.6982
      "Netherlands"    "2014-05-25" 0  26      6
      "Lithuania"      "2012-10-14" 0 141 6.4579
      "Bulgaria"       "2009-06-07" 0  17      6
      "Greece"         "2004-03-07" 0 300    2.3
      "Slovenia"       "2011-12-04" 0  90    3.3
      "Lithuania"      "2014-05-25" 0  11 7.7562
      "Finland"        "2014-05-25" 0  13 7.1884
      "Luxembourg"     "2009-06-07" 0  60    1.2
      "Italy"          "2013-02-25" 0 617    1.2
      "Latvia"         "2004-06-13" 0   9 6.3377
      "Czech Republic" "2010-05-29" 0 200    8.7
      "Slovakia"       "2012-03-10" 0 150      6
      "Cyprus"         "2014-05-25" 0   6    8.7
      "Estonia"        "2014-05-25" 0   6      .
      "Greece"         "2012-06-17" 0 300    7.4
      "Slovakia"       "2012-03-10" 0 150 7.0198
      "Poland"         "2005-09-25" 0 460 9.7368
      "Croatia"        "2015-11-08" 0 151    2.5
      "Luxembourg"     "2014-05-25" 0   6    1.2
      "United Kingdom" "2004-06-10" 0  78    8.7
      "Croatia"        "2013-04-14" 0  12 8.7719
      "Spain"          "2011-11-20" 0 350      .
      "Luxembourg"     "2004-06-13" 0  60    1.2
      "Slovakia"       "2012-03-10" 0 150 6.5005
      "Bulgaria"       "2017-03-26" 0 240    7.4
      "Netherlands"    "2012-09-12" 0 150      .
      "Spain"          "2014-05-25" 0  54      .
      "Latvia"         "2004-06-13" 0   9      .
      "Cyprus"         "2014-05-25" 0   6      .
      "Latvia"         "2011-09-17" 0 100    7.4
      "Denmark"        "2005-02-08" 0 179 5.5609
      "Malta"          "2013-03-09" 0  69 3.7895
      "Croatia"        "2014-05-25" 0  11    8.7
      "Slovenia"       "2004-06-13" 0   7 4.7941
      "Greece"         "2007-09-16" 0 300    2.5
      "Luxembourg"     "2014-05-25" 0   6      .
      "Slovakia"       "2014-05-25" 0  13  .5275
      "Estonia"        "2004-06-13" 0   6 4.5812
      "France"         "2009-06-07" 0  72  .0714
      "Poland"         "2009-06-07" 0  50    7.4
      "Lithuania"      "2014-05-25" 0  11    8.7
      "Poland"         "2004-06-13" 0  54    7.4
      "Greece"         "2004-06-13" 0  24      .
      "Poland"         "2015-10-25" 0 460 2.8299
      "Sweden"         "2009-06-07" 0  18      .
      "Germany"        "2013-09-22" 0 631    8.7
      "Estonia"        "2014-05-25" 0   6    8.7
      "Estonia"        "2014-05-25" 0   6      .
      "Belgium"        "2014-05-25" 0  21    7.4
      "Greece"         "2012-05-06" 0 300    7.4
      "United Kingdom" "2009-06-04" 0  72    6.2
      "Finland"        "2004-06-13" 0  14 7.1884
      "Italy"          "2009-06-07" 0  72    1.2
      "Luxembourg"     "2004-06-13" 0   6 8.8158
      "Lithuania"      "2008-10-12" 0 141    9.8
      "Croatia"        "2013-04-14" 0  12 6.2281
      "Ireland"        "2011-02-25" 0 165  2.435
      "Poland"         "2014-05-25" 0  51    8.7
      "Cyprus"         "2009-06-06" 0   6 5.7895
      "Bulgaria"       "2014-05-25" 0  17    3.3
      "Italy"          "2013-02-25" 0 617    7.4
      "France"         "2004-06-13" 0  78    7.4
      "Latvia"         "2004-06-13" 0   9   1.25
      "Sweden"         "2010-09-19" 0 349      .
      "Estonia"        "2011-03-06" 0 101    2.5
      "United Kingdom" "2010-05-06" 0 650    8.7
      "Czech Republic" "2009-06-07" 0  22    7.4
      "Portugal"       "2011-06-05" 0 230    2.5
      "Slovenia"       "2004-06-13" 0   7 4.2263
      "Luxembourg"     "2014-05-25" 0   6 8.8158
      "France"         "2012-06-17" 0 577      .
      "Germany"        "2014-05-25" 0  96 9.2752
      "Italy"          "2008-04-13" 0 630 9.5762
      "Latvia"         "2006-10-07" 0 100    3.3
      "France"         "2014-05-24" 0  74    2.5
      "France"         "2014-05-24" 0  74      0
      "Poland"         "2004-06-13" 0  54 9.7368
      "Slovakia"       "2012-03-10" 0 150    1.2
      "Romania"        "2008-11-30" 0 334    6.2
      "Lithuania"      "2012-10-14" 0 141      6
      "Cyprus"         "2011-05-22" 0  56    8.7
      "Netherlands"    "2004-06-13" 0  27 8.5609
      "Slovakia"       "2004-06-13" 0  14   5.46
      "Luxembourg"     "2013-10-20" 0  60      .
      "Hungary"        "2004-06-13" 0  24 9.6065
      "Bulgaria"       "2005-06-25" 0 240    2.5
      "Germany"        "2013-09-22" 0 631 9.8246
      "Lithuania"      "2008-10-12" 0 141    1.2
      "Slovenia"       "2009-06-07" 0   7 4.2263
      "Slovakia"       "2014-05-25" 0  13    1.2
      "Portugal"       "2014-05-25" 0  21    2.5
      "Spain"          "2004-03-14" 0 350   4.85
      end

      • #4
        Weighted means aren't difficult. You can use a user-written package if you wish, but note that you can just work from first principles. A weighted mean within groups of observations is a three-step calculation.

        Code:
        bysort a b c : egen denom = total(weight) 
        by a b c : egen numer = total(weight * x) 
        gen wtmean = numer/denom 
        But here I am puzzled by your syntax because _gwtmean (SSC) defines an egen function, and I see no egen call.

        Nevertheless something like

        Code:
         
        egen wtleri = wtmean(left_right), by(country election) weight(seats_total)
        should be legal. Nothing in the syntax restricts the argument of by() to a single variable. You can't however specify by() twice if that is what you did. (It is especially important to give exact code for those statements that produce errors!)
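
        Applied to the example data in #3, the three-step recipe might look like the sketch below. It assumes the variable names from the dataex listing (country_name, election_date, seats, left_right) and that seats is the intended weight; denom, numer and wtleri are just illustrative names. Because left_right has missing values, the denominator here counts only the seats of parties whose left_right is known.

        ```stata
        * Denominator: seats of parties with a nonmissing left_right only
        bysort country_name election_date : egen double denom = total(seats * !missing(left_right))

        * Numerator: egen's total() treats a missing product as zero
        by country_name election_date : egen double numer = total(seats * left_right)

        * Weighted mean; missing when no party in the group has usable data
        gen double wtleri = numer / denom
        ```

        Every party in a country-election group then carries the same value of wtleri.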

        • #5
          Thank you. I don't know why I was making it so complicated in my head. I did:
          Code:
          bysort country_name_short election_date : egen numer = total(left_right*seats)
          gen wtmleri = numer/seats_total
          It seems correct to me.
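
          Dividing by seats_total agrees with the three-step recipe in #4 only if the seats of the parties with a nonmissing left_right add up to seats_total within each election. A minimal check, assuming the country identifier is country_name_short (seats_used is a hypothetical helper variable):

          ```stata
          * Seats actually entering the numerator in each country-election
          bysort country_name_short election_date : egen seats_used = total(seats * !missing(left_right))

          * Number of observations where the two denominators differ
          count if seats_used != seats_total
          ```

          A nonzero count would suggest using seats_used as the denominator instead.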
