  • Dropping duplicated results with panel data

    Hello Stata people,

    I have a panel dataset where the cross-sectional variable is wage (for different individuals) and the longitudinal variable is time.
    I only want a time-series of total(wage) (sum of all employees' wages in a given year). I'm not interested in the wages of individuals.

    I have used this command: bysort year: egen tot_wage=total(total_income)

    Now I have a time series of the tot_wage variable. Unfortunately, because there were i observations in each year, the tot_wage series is replicated i times in each year. These values appear sequentially e.g.

    ...
    2012 | 5000
    2012 | 5000
    2012 | 5000
    2013 | 5500
    2013 | 5500
    ...

    Not ideal.

    The individuals participating in the wage survey are not the same from year to year so I can't drop all but one person's observations. How might I obtain the simple time series I am after?

    Best,
    Pascal




  • #2
    I have not completely grasped your question; however, if you want to keep one observation for each person in each year, the following code should do the job. Note that I am assuming that there is a variable individual that tracks the identity of each cross-sectional unit.
    Code:
    bysort individual year: keep if _n == _N
    And if you want to keep one observation per year, then
    Code:
    bysort year: keep if _n == _N
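As a rough sketch of how this fits the original question, the yearly total can be computed first and the data then reduced to one row per year. The toy values below are made up for illustration, and the flag variable first is a hypothetical name; year, total_income and tot_wage follow the first post. Here egen's tag() function marks one row per year, which is an alternative to keep if _n == _N.

```stata
* Toy data (hypothetical values): different individuals in each year
clear
input year total_income
2012 2000
2012 1800
2012 1200
2013 3000
2013 2500
end

* Yearly total, repeated on every row of that year
bysort year: egen tot_wage = total(total_income)

* Flag exactly one row per year, then keep only the flagged rows
egen byte first = tag(year)
keep if first
keep year tot_wage

list, noobs
```

Because tot_wage is constant within a year, it does not matter which row of each year is kept.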
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



    • #3
      Pascal:
      welcome to this forum.
      You may be interested in something along the following lines:
      Code:
      . set obs 5
      number of observations (_N) was 0, now 5
      
      . g id=1
      
      . g year=2000+_n
      
      . g wage=10000*runiform()
      
      . collapse (sum) wage, by(id)
      
      . list
      
           +---------------+
           | id       wage |
           |---------------|
        1. |  1   16498.94 |
           +---------------+
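The example above sums wages within id. For the yearly series asked about in the first post, a minimal sketch would group by year instead. The data below are invented for illustration, and the name tot_wage for the collapsed variable is taken from the first post rather than from this example.

```stata
* Toy data (hypothetical values): five individuals observed over two years
clear
input id year wage
1 2012 2000
2 2012 3000
3 2013 2500
4 2013 1500
5 2013 1500
end

* Replace the data in memory with one row per year holding the sum of wage
collapse (sum) tot_wage = wage, by(year)

list, noobs
```

Note that collapse replaces the dataset in memory, so the individual-level rows are gone afterwards unless the data were saved beforehand.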
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Dear All,

        I found that the 'collapse' command gave the desired result.
        Thank you both for your responses, you've saved me a lot of bother!
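For anyone who needs the individual-level data again after collapsing, one possible pattern (a sketch only, with made-up values and a hypothetical temporary file name yearly) is to wrap collapse in preserve and restore:

```stata
* Toy data (hypothetical values)
clear
input year total_income
2012 2000
2012 3000
2013 2500
2013 3000
end

* Take a snapshot, build the yearly series, store it, then bring the panel back
preserve
collapse (sum) tot_wage = total_income, by(year)
tempfile yearly
save `yearly'
restore

* The original individual-level rows are in memory again
describe, short
```

The yearly series can then be loaded with use `yearly' whenever it is needed within the same session or do-file.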

        All the best,
        Pascal

