  • How to do Missing Values Imputation on a Specific Variable in a Longitudinal Dataset with Stata Code?

    I have a longitudinal dataset, shown below, on grade retention in the US K-12 education system. The variables are id (student id), grade (the student's grade), and state (whether the student was retained in grade).

    What I want to do with Stata code is as follows:
    1. Fill the missing values on the variable "grade" with integers. For example, for the student with id==2, the values filled should be 3, 5, and 9.
    As another example, for the student with id==5, the values filled should be 3, 4, 5, 6, 7, 8, 9, 10, and 11.
    2. After imputing the grade variable, create a categorical variable called "type" that describes the kind of gap that was just filled within each id:
    for the student with id==2 or id==4, type==1 (normal gap: once the missing values on the grade variable are imputed, there is no sign that the student was retained in grade);
    for the student with id==3 or id==5 or id==7, type==2 (abnormal gap: once the variable "grade" is imputed, the student turns out to have been demoted or retained in grade).



    input str10 id byte (grade state)
    2 1 0
    2 2 0
    2 . .
    2 4 0
    2 . .
    2 6 0
    2 7 0
    2 8 0
    2 . .
    2 10 0
    2 11 0
    2 12 0
    3 1 0
    3 2 0
    3 3 0
    3 4 0
    3 5 0
    3 6 0
    3 7 0
    3 8 0
    3 9 0
    3 10 0
    3 . .
    3 11 0
    4 1 0
    4 . .
    4 3 0
    4 4 0
    4 5 0
    4 6 0
    4 . .
    4 . .
    4 . .
    4 . .
    4 . .
    4 12 0
    5 1 0
    5 2 0
    5 . .
    5 . .
    5 . .
    5 . .
    5 . .
    5 . .
    5 . .
    5 . .
    5 . .
    5 9 0
    7 1 0
    7 . .
    7 . .
    7 . .
    7 . .
    7 4 0
    7 5 0
    7 6 0
    7 7 0
    7 7 1
    end

    Can anybody help me with Stata code?
    Thank you!
    Last edited by smith Jason; 18 Oct 2022, 16:12.

  • #2
    can anybody help?



    • #3
      I believe this will do it:
      Code:
      sort id, stable
      by id: replace grade = grade[_n-1]+1 if missing(grade)
      by id: egen gap_type = max(grade[_n-1] >= grade & _n > 1)
      replace gap_type = gap_type + 1
      Added: Re #2. It is inappropriate to bump when your post has not even been up for an hour and a half. This is not a help line. It's an all-volunteer community who answer questions when they are available, able, and interested. You should never bump in less than, say, 6 hours. Even that, frankly, is pushing it. Generally waiting 24 hours is more appropriate. And then, before just bumping, think about why you're not getting an answer. Usually bumping won't help, because most commonly questions are left unanswered because they are unclear, presented with insufficient information to provide an answer, overly long and complicated, or so specialized that nobody here knows the answer. So better than bumping, it's usually best to try to rewrite your question to make it more attractive to responders. The Forum FAQ has lots of excellent tips on how to do that.
      Last edited by Clyde Schechter; 18 Oct 2022, 18:26.
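A minimal sketch of how the fill step in #3 cascades, using one invented id (the values below are for illustration only and are not from the thread). It splits the retention test into its own variable, no_increase, so each step can be inspected; the variable names no_increase and any_no_increase are assumptions.

```stata
* toy data: one id with two consecutive missing grades, then a repeated grade
clear
input str10 id byte grade
"a" 1
"a" .
"a" .
"a" 4
"a" 4
end

sort id, stable
* replace works through the observations in order, so each filled value is
* already available as grade[_n-1] when the next missing one is processed
by id: replace grade = grade[_n-1] + 1 if missing(grade)

* flag any observation whose grade fails to increase over the previous one
by id: gen byte no_increase = _n > 1 & grade[_n-1] >= grade
by id: egen byte any_no_increase = max(no_increase)

list, sepby(id) noobs
```

With these values the two gaps are filled with 2 and 3, and only the last observation (the second 4) is flagged.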



      • #4
        Originally posted by Clyde Schechter View Post
        I believe this will do it:
        Code:
        sort id, stable
        by id: replace grade = grade[_n-1]+1 if missing(grade)
        by id: egen gap_type = max(grade[_n-1] >= grade & _n > 1)
        replace gap_type = gap_type + 1
        Dear Professor,
        Thank you very much for your code.
        My full dataset also includes the students with id==1 and id==6. With these two records included, the gap variable should have three types:
        for the student with id==2 or id==4, type==1 (normal gap: once the missing values on the grade variable are imputed, there is no sign that the student was retained in grade);
        for the student with id==3 or id==5 or id==7, type==2 (abnormal gap: once the variable "grade" is imputed, the student turns out to have been demoted or retained in grade);
        for the student with id==1 or id==6, type==3 (nothing needs to be done: whether or not the student was ever retained in grade, the record has no missing values on the state variable).
        I will keep in mind what you said about bumping. Sorry about that.
        Could you please help me handle these two additional students?

        Thank you!


        clear
        input str10 id byte (grade state)
        1 1 0
        1 2 0
        1 3 0
        1 4 0
        1 5 0
        1 6 0
        1 7 0
        1 8 0
        1 9 0
        1 10 0
        1 11 0
        1 12 0
        2 1 0
        2 2 0
        2 . .
        2 4 0
        2 . .
        2 6 0
        2 7 0
        2 8 0
        2 . .
        2 10 0
        2 11 0
        2 12 0
        3 1 0
        3 2 0
        3 3 0
        3 4 0
        3 5 0
        3 6 0
        3 7 0
        3 8 0
        3 9 0
        3 10 0
        3 . .
        3 11 0
        4 1 0
        4 . .
        4 3 0
        4 4 0
        4 5 0
        4 6 0
        4 . .
        4 . .
        4 . .
        4 . .
        4 . .
        4 12 0
        5 1 0
        5 2 0
        5 . .
        5 . .
        5 . .
        5 . .
        5 . .
        5 . .
        5 . .
        5 . .
        5 . .
        5 9 0
        6 1 0
        6 1 1
        7 1 0
        7 . .
        7 . .
        7 . .
        7 . .
        7 4 0
        7 5 0
        7 6 0
        7 7 0
        7 7 1
        end



        • #5
          I'm sorry, but I don't understand type = 3: "either the student has never been retained in grades or the students has been retained in grade, their data record on the state variable is no missing value." The problem is that the first of these, "never been retained in grades" is the criterion for type = 1. Please clarify.



          • #6
            Originally posted by Clyde Schechter View Post
            I'm sorry, but I don't understand type = 3: "either the student has never been retained in grades or the students has been retained in grade, their data record on the state variable is no missing value." The problem is that the first of these, "never been retained in grades" is the criterion for type = 1. Please clarify.
            Hello, Professor,
            Thanks for your response. Records like id==1 and id==6 belong to type 3. The student with id==1 has no missing values at all and progressed normally through the grades, and the student with id==6 was retained in grade only once. In other words, these two records have no missing values on the variable "state", so we can call this type "nothing needs to be done on the state variable".

            As for type 1 (id==2 or id==4), although these students look like they progressed normally after the imputation, they do have missing values on the variable "state". That is why we call this a normal gap.
            I hope my explanation helps.

            Thank you!



            • #7
              Originally posted by Clyde Schechter View Post
              I believe this will do it:
              Code:
              sort id, stable
              by id: replace grade = grade[_n-1]+1 if missing(grade)
              by id: egen gap_type = max(grade[_n-1] >= grade & _n > 1)
              replace gap_type = gap_type + 1
              Hi, Professor. I don't understand this part of your Stata code:
              max(grade[_n-1] >= grade & _n > 1). Could you please explain what it does? Thank you!



              • #8
                Code:
                sort id, stable
                by id: replace grade = grade[_n-1]+1 if missing(grade)
                by id: egen gap_type = max(grade[_n-1] >= grade & _n > 1)
                replace gap_type = gap_type + 1
                by id: egen state_missing = max(missing(state))
                replace gap_type = 3 if gap_type == 1 & state_missing
                will do what you want.

                I don't understand this part of your Stata code:
                max(grade[_n-1] >= grade & _n > 1). Could you please explain what it does?
                If an id has been retained in grade, there will be some point(s) in that id's data where the value of grade does not increase from one observation to the next. In other words, grade[_n-1], which should normally be less than the current value of grade, will actually be the same as grade, or even larger. This code tests for that. The part about _n > 1 is needed, because in the first observation of an id, _n-1 will be 0, and as there is no 0'th observation in Stata, the value of grade[_n-1] will be a missing value. But in Stata, missing values are always larger than any real number. So everyone would appear to be retained in grade on their first observation. The -& _n > 1- circumvents that problem.
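The role of _n > 1 can be seen in a small sketch with invented values (not the thread's data). The variable names prev, naive, and guarded are assumptions chosen for illustration.

```stata
* toy data: grade repeats in the third observation
clear
input byte grade
1
2
2
end

* prev is missing in observation 1, because there is no observation 0
gen prev = grade[_n-1]
* without the guard, observation 1 is flagged: missing >= 1 evaluates to true
gen byte naive = grade[_n-1] >= grade
* with the guard, observation 1 is never flagged
gen byte guarded = grade[_n-1] >= grade & _n > 1

list, noobs
* system missing compares as larger than any nonmissing number
display (. > 1e300)
```

Here naive is 1 in the first and third observations, while guarded is 1 only in the third, where the grade really does repeat.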



                • #9
                  Originally posted by Clyde Schechter View Post
                  Code:
                  sort id, stable
                  by id: replace grade = grade[_n-1]+1 if missing(grade)
                  by id: egen gap_type = max(grade[_n-1] >= grade & _n > 1)
                  replace gap_type = gap_type + 1
                  by id: egen state_missing = max(missing(state))
                  replace gap_type = 3 if gap_type == 1 & state_missing
                  will do what you want.


                  Professor, Thank you.
                  After running your code, for the record of id==6, the type is still 2.



                  • #10
                    After running your code, for the record of id==6, the type is still 2.
                    Yes, because id 6 was retained in grade 1. So, what am I missing?



                    • #11
                      Originally posted by Clyde Schechter View Post
                      Yes, because id 6 was retained in grade 1. So, what am I missing?
                      Professor, id==6 should belong to type 3, because this student's record is complete: nothing needs to be done, since the variable "state" has no missing values, even though the student was retained.
                      Thank you!



                      • #12
                        So, if you mean that an id is type 3 whenever all of its observations are non-missing on the variable state, regardless of whether the student has been retained in grade, it would be:
                        Code:
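                        * keep the original order of observations within each id
                        sort id, stable
                        * fill each missing grade with the previous grade plus one
                        by id: replace grade = grade[_n-1]+1 if missing(grade)
                        * 1 if grade ever fails to increase within the id, else 0
                        by id: egen gap_type = max(grade[_n-1] >= grade & _n > 1)
                        replace gap_type = gap_type + 1
                        * 1 if the id has any missing value on state
                        by id: egen state_missing = max(missing(state))
                        * ids with complete state data are type 3
                        replace gap_type = 3 if !state_missing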
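As a cross-check, here is a sketch of the same three-way classification on three invented ids, one per type (the data, the ids "n", "r", "c", and the variable names obs, no_increase, retained, and first are all assumptions for illustration). It builds the retention indicator with generate before calling egen, since, as far as I recall, the egen help advises against explicit subscripts such as [_n-1] inside egen expressions. It also uses an explicit order variable in place of sort id, stable, which should be equivalent here.

```stata
clear
input str10 id byte (grade state)
"n" 1 0
"n" . .
"n" 3 0
"r" 1 0
"r" . .
"r" 2 0
"c" 1 0
"c" 1 1
end

* remember the original order so it is preserved within each id
gen long obs = _n
sort id obs
by id: replace grade = grade[_n-1] + 1 if missing(grade)

* retention test as its own variable, then the per-id maximum
by id: gen byte no_increase = _n > 1 & grade[_n-1] >= grade
by id: egen byte retained = max(no_increase)
by id: egen byte state_missing = max(missing(state))

* complete state data -> 3; otherwise 1 (normal gap) or 2 (abnormal gap)
gen byte gap_type = cond(!state_missing, 3, retained + 1)

* one row per id to check the classification
by id: gen byte first = _n == 1
list id gap_type if first, noobs
```

On this toy data, "n" comes out as type 1, "r" as type 2, and "c" (retained, but with no missing state values) as type 3.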



                        • #13
                          Thank you!
