
  • Build mean variable based on variables with missing data (ignoring missing values)

    Hi,


    I have a dataset with an income variable reported at 5 different time points (5 waves of data in the format shown below – each respondent corresponds to a row, and each column reflects income for a specific year).

    These variables all have missing values, and I want to build a mean of them (5-year mean income) that draws only on the years for which the respondent has non-missing data. In the example below, respondent 1 would have a mean based on 3 years of data, respondent 2 a mean based on 2 years of data, and respondent 3 a mean based on 4 years of data.

    My current problem is that a single missing value makes the generated mean variable missing for that respondent.
    I do not want to treat the missing values as zero (as I don’t know what their income was in that year), but rather calculate the mean income from the non-missing values only.

    I would appreciate advice on this, thank you so much in advance!
    ID    Income year 1   Income year 2   Income year 3   Income year 4   Income year 5
    ID1   40,000          missing         42,000          40,000          missing
    ID2   missing         missing         32,000          missing         35,000
    ID3   50,000          52,000          missing         54,000          55,000
    Last edited by Daniela Kaiser; 16 Aug 2021, 19:13.

  • #2
    See -help egen- and look at the -rowmean()- function. If your data actually looks like the tableau you show, you will also have to convert the variables to numeric variables with Stata system missing or extended missing values: you cannot perform calculations with string variables. In that case, look at the -destring- command.

    Which brings up another point. What you show is clearly not from a Stata data set. It cannot be, because Stata variable names cannot include blank spaces. In the future, when asking for help with code, show real Stata data examples, and do that by using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
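
    A minimal sketch of the -egen, rowmean()- approach, using the numbers from the example in #1. The variable names id, inc1-inc5, mean_inc and n_years are assumptions made for illustration; they do not come from the original poster's dataset.

```stata
* Hypothetical example data; id and inc1-inc5 are assumed names.
clear
input id inc1 inc2 inc3 inc4 inc5
1 40000 .     42000 40000 .
2 .     .     32000 .     35000
3 50000 52000 .     54000 55000
end

* Row-wise mean over whichever of inc1-inc5 are non-missing.
* The result is missing only if all five values are missing.
egen mean_inc = rowmean(inc1-inc5)

* Number of non-missing years behind each mean.
egen n_years = rownonmiss(inc1-inc5)

list id mean_inc n_years
```

    With these data the means should be based on 3, 2 and 4 years respectively (roughly 40,667, 33,500 and 52,750). If the income variables were instead stored as strings such as "40,000" and "missing", something like -destring inc1-inc5, replace ignore(",") force- would probably be needed first; -force- converts non-numeric entries to system missing, so check the result before relying on it.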



    • #3
      Thank you so much Clyde, that command worked! And sorry about the format, I wrote up an example to illustrate what my data looks like instead of presenting my actual data - next time I'll make sure I follow the steps you recommended.
      Thanks for your help,
