  • Calculating the mean and percentiles of a variable that is repeated several times

    Hi,

    I have a variable called VoterTurnout2013 that does not change over time within IDs in my panel dataset. I want to calculate the mean and percentiles of this variable for the whole dataset. However, the ordinary calculations cause a problem: my panel is unbalanced and the number of observations differs across IDs, so IDs with more observations get more weight (and IDs with fewer get less), which distorts the mean and percentiles.

    How can I calculate the mean and percentiles in this situation? I would also like to ask: how does Stata treat N/A (missing) cells when calculating means? Are they dropped or treated as 0?
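A minimal sketch of how Stata handles missing values, assuming the N/A cells are stored as Stata's numeric missing value `.`; the variable name `turnout` and its three values are hypothetical toy data. `summarize` skips missing values rather than counting them as 0. If the column was imported as text (for example because of "N/A" or "%" entries), it would first need converting to numeric, e.g. with `destring`.

```stata
* hypothetical toy data: the second value is missing
clear
input turnout
88
.
72
end

* summarize uses only the nonmissing observations
summarize turnout
display r(N)       // 2: the missing value is excluded
display r(mean)    // 80 = (88 + 72) / 2, not (88 + 0 + 72) / 3
```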

    Thanks for your help. An example of my dataset is shown below.
    City (ID) Time VoterTurnout2013
    1 2013 88%
    1 2014 88%
    1 2015 88%
    2 2013 72%
    2 2014 72%
    3 2013 79%
    3 2014 79%
    4 2013 91%
    Last edited by James Sonela; 11 Oct 2023, 10:22.
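The distortion described above can be seen on the example data. This sketch assumes the turnout figures are stored as numbers (88 rather than "88%") and calls the ID variable `cityid`, as in the replies below. It compares the plain mean over all city-years with the mean over one tagged observation per city, using `egen`'s `tag()` function.

```stata
clear
input cityid year VoterTurnout2013
1 2013 88
1 2014 88
1 2015 88
2 2013 72
2 2014 72
3 2013 79
3 2014 79
4 2013 91
end

* plain mean: city 1 enters three times, city 4 only once
summarize VoterTurnout2013
display r(mean)    // 82.125 = 657 / 8

* tag one observation per city and restrict the mean to those
egen byte first = tag(cityid)
summarize VoterTurnout2013 if first
display r(mean)    // 82.5 = (88 + 72 + 79 + 91) / 4
```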

  • #2
    Could someone please help me?



    • #3
      Since it doesn't change over time, you could collapse the data to get one value per city id and then compute the mean.

      collapse (first) VoterTurnout2013 , by(cityid)

      Alternatively, you could mark the first observation for each city id and restrict the mean computation to those observations.

      See the Stata FAQ on first and last occurrences:
      https://www.stata.com/support/faqs/data-management/first-and-last-occurrences/
      Last edited by George Ford; 13 Oct 2023, 07:56.
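A sketch of the collapse approach on the example data from the first post, assuming numeric turnout values and the ID variable `cityid`. Because `collapse` replaces the data in memory, it is wrapped in `preserve`/`restore`, and the results are kept as scalars (the scalar names `vt_mean`, `vt_p50` and `vt_p75` are illustrative). `summarize, detail` returns the percentiles in `r(p50)` and `r(p75)`.

```stata
clear
input cityid year VoterTurnout2013
1 2013 88
1 2014 88
1 2015 88
2 2013 72
2 2014 72
3 2013 79
3 2014 79
4 2013 91
end

preserve
* one row per city, holding its (constant) turnout value
collapse (first) VoterTurnout2013, by(cityid)
summarize VoterTurnout2013, detail
scalar vt_mean = r(mean)    // 82.5
scalar vt_p50  = r(p50)     // 83.5 = (79 + 88) / 2
scalar vt_p75  = r(p75)     // 89.5 = (88 + 91) / 2
restore

* the original panel is back in memory
display _N                  // 8
```

With only four cities, a percentile that falls exactly between two ordered values is reported by `summarize, detail` as the average of those two values.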



      • #4
        Thank you very much. I would like to add these newly calculated statistics to my dataset as new variables. I am thinking of saving the collapsed dataset as a separate file and merging it back on the id variable (I have nine variables, and for each of them I will calculate the mean and the 50th and 75th percentiles, which makes 27 columns to add). If you have a quicker solution, I would appreciate it.



        • #5
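          Code:
          * tag the first observation of each city
          bys cityid: g first = _n==1
          * x1-x4 are placeholders for your own variable names
          foreach var in x1 x2 x3 x4 {
               * the detail option is needed for r(p50) and r(p75)
               qui summ `var' if first, d
               g double `var'_mean = r(mean)
               g double `var'_50 = r(p50)
               g double `var'_75 = r(p75)
          }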
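The loop can be tried on the example data from the first post. This sketch assumes the ID variable is `cityid` and that `VoterTurnout2013` is numeric; the other eight variables would simply be added to the `varlist`. Each observation receives the same sample-wide statistic, so no merge is needed.

```stata
clear
input cityid year VoterTurnout2013
1 2013 88
1 2014 88
1 2015 88
2 2013 72
2 2014 72
3 2013 79
3 2014 79
4 2013 91
end

* mark the first observation of each city
bysort cityid: generate byte first = _n == 1

* add the other variables to this list
foreach var of varlist VoterTurnout2013 {
    quietly summarize `var' if first, detail
    generate double `var'_mean = r(mean)
    generate double `var'_50   = r(p50)
    generate double `var'_75   = r(p75)
}
```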



          • #6
            Dear George, many thanks!

