
  • Creating lagged variables with pooled cross-sectional data

    I am working with a dataset that contains student grade observations in a variety of subjects, in different schools, over ten years. This is not a time-series dataset, although the observations are not random either: students take different exams each year, and some student/year/exam combinations have significantly fewer observations than others.
    I want to create a lagged variable that contains the average score for each exam, in each school, in year t-1. For example, for an observation holding a student's grade in exam no. 831 at school no. 5, the variable t1_lag_grade_avg would receive the average of students' grades in exam no. 831 at school no. 5 in year t-1. I want to create two lagged variables: t1_lag_grade_avg, and t2_lag_grade_avg, which is lagged to year t-2.
    Because this is not panel/time-series data, I can't find a way to add the lagged variable to a given group of grades without collapsing the group and losing observations. This is my current code; note that it works only for the first observation of each group. I am also having trouble creating the t-2 variable with this method. I would appreciate any help with this issue.

    Code:
    sort school exam year
    egen totalgrade= total(grade), by(school exam year) 
    bysort school exam year: gen meangrade = totalgrade/_N 
    bysort school exam (year): gen t1_lag_grade_avg = meangrade[_n-1]
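    In the code above, meangrade[_n-1] refers to the previous observation, which for most rows is another student in the same year, so only the first row of each year picks up the earlier year's mean. One way to avoid collapsing, sketched here on a small made-up dataset (the values and the helper variables cell, lag1 and lag2 are illustrative assumptions, not from the thread): compute each cell's mean with egen mean(), flag one row per school/exam/year with egen tag(), take lags among the flagged rows only, then copy the result back to every row in the cell. The year checks keep a lag missing when that exact earlier year has no data.

    Code:
    ```stata
    * Hypothetical toy data for illustration; exam 900 has no 2011 data
    clear
    input school exam year grade
    5 831 2010 70
    5 831 2010 80
    5 831 2011 60
    5 831 2011 90
    5 831 2011 84
    5 831 2012 85
    5 900 2010 50
    5 900 2012 70
    end

    * Mean grade in each school/exam/year cell (egen mean() skips missing grades)
    bysort school exam year: egen meangrade = mean(grade)

    * Hypothetical helper: flag exactly one row per school/exam/year cell
    egen cell = tag(school exam year)

    * Among flagged rows there is one row per year, so earlier rows are earlier years.
    * t-1 can only be one row back; keep it only if that row is exactly year - 1.
    bysort cell school exam (year): generate lag1 = meangrade[_n-1] ///
        if cell & year[_n-1] == year - 1
    * t-2 is one row back when t-1 is absent, otherwise two rows back.
    bysort cell school exam (year): generate lag2 = cond(year[_n-1] == year - 2, ///
        meangrade[_n-1], cond(year[_n-2] == year - 2, meangrade[_n-2], .)) if cell

    * Copy each cell's lags to all of its rows (egen max() ignores missing values)
    bysort school exam year: egen t1_lag_grade_avg = max(lag1)
    bysort school exam year: egen t2_lag_grade_avg = max(lag2)
    drop cell lag1 lag2
    ```

    No observations are dropped, so the lagged averages sit directly on every student row.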

  • #2
    "Because this is not panel/time series data, I can't seem to find a way to add the lagged variable to a given group of grades without collapsing the group and losing observations."
    ... and having collapsed your data and calculated the t-1 and t-2 lagged values you want, you then merge this collapsed data with your original dataset by school exam year to add the lagged values into the dataset.

    The following obviously untested code may start you in a useful direction:
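    Code:
    * Build one row per school/exam/year holding the mean grade
    use maindata
    collapse (mean) avg_grade0=grade, by(school exam year)
    * Mean from the previous and second-previous recorded year within each school/exam
    bysort school exam (year): generate avg_grade1 = avg_grade0[_n-1]
    bysort school exam (year): generate avg_grade2 = avg_grade0[_n-2]
    save gradedata, replace
    * Attach the cell-level averages to every student observation
    use maindata, clear
    merge m:1 school exam year using gradedata
    save maindata_with_avg_grade, replace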
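    Subscripts such as avg_grade0[_n-1] pick the previous recorded year, so if a school/exam pair has no data for some year, the "lag" silently reaches further back than t-1. A minimal gap-aware variant, sketched on made-up data and using Stata's time-series lag operators in place of explicit subscripts: declare each school/exam pair as a panel with tsset, so that L.avg_grade0 and L2.avg_grade0 return the means for exactly year-1 and year-2 (missing when that year is absent). The toy values, the tempfiles standing in for maindata/gradedata, and the helper variable panel are illustrative assumptions.

    Code:
    ```stata
    * Hypothetical toy data for illustration; exam 900 has no 2011 data
    clear
    input school exam year grade
    5 831 2010 70
    5 831 2010 80
    5 831 2011 60
    5 831 2011 90
    5 831 2011 84
    5 831 2012 85
    5 900 2010 50
    5 900 2012 70
    end
    * Temporary files stand in for maindata and gradedata
    tempfile maindata gradedata
    save `maindata'

    * One row per school/exam/year with the mean grade
    collapse (mean) avg_grade0=grade, by(school exam year)

    * Hypothetical helper: one panel id per school/exam pair, indexed by year
    egen panel = group(school exam)
    tsset panel year

    * L. and L2. look up year-1 and year-2 exactly; missing if that year is absent
    generate avg_grade1 = L.avg_grade0
    generate avg_grade2 = L2.avg_grade0
    drop panel
    save `gradedata'

    * Attach the cell-level lags to every student observation
    use `maindata', clear
    merge m:1 school exam year using `gradedata', nogenerate
    ```

    For exam 900 in 2012 this leaves avg_grade1 missing (no 2011 data) and sets avg_grade2 to the 2010 mean, whereas the [_n-1] version would report the 2010 mean as the t-1 value.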



    • #3
      This works, thanks!
