  • drop observations in long panel data set if variable changes over time

    I am working with panel data in which variables that should be time-invariant, e.g. sex and birth year, change over time for some ids. How do I delete the observations for the ids where this happens?

    The data is a long panel data set: xtset id year


    [Screenshot of the data: Schermafbeelding 2020-06-22 om 21.08.01.png]



    Here, observations with id==102 should be removed because sex changes over time, and observations with id==103 because ybirth changes over time.

    Just typing "drop if inlist(id,102,103)" is not practical, because my data set is extremely large and many ids would have to be listed.

    Thank you in advance

  • #2
    Something like

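    foreach V of varlist sex ybirth {
        * within each id, sort by the variable and compare the last value with the first
        bys id (`V') : gen err = `V'[_N] - `V'[1]
        * a nonzero difference means the variable is not constant within this id
        drop if err != 0
        drop err
    }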


    should do it.

    hth,
    Jeph



    • #3
      Jeph Herrin, that won't work because the variable is generated inside the loop; just -gen- the "err" variable as either 0 or missing before the -foreach- loop, then change the -gen- to a -replace-.

      This is still untested, as you did not provide data in a usable form; please read the FAQ on how to use -dataex- and CODE blocks.



      • #4
        Rich Goldstein

        I'm sorry, I'm still struggling with Stata. I don't know what to do with -dataex-, but does this work? I think I know how to solve my problem now, but I would like to know in case I have a question in the future, so that people can help me.

        Code:
        input float(id year sex ybirth)
        101 2010 1 1960
        101 2011 1 1960
        101 2012 1 1960
        102 2010 1 1970
        102 2011 2 1970
        102 2012 2 1970
        103 2010 1 1972
        103 2011 1 1972
        103 2012 1 1973
        end
        Thank you both for the help!
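For later readers wondering the same thing about -dataex-: a minimal sketch of how a block like the one above is usually produced. It assumes -dataex- is available, which is the case in recent, updated Stata releases; on older versions it can be installed from SSC. The data are simply the nine rows posted above.

```stata
* Recreate the example data posted above
clear
input float(id year sex ybirth)
101 2010 1 1960
101 2011 1 1960
101 2012 1 1960
102 2010 1 1970
102 2011 2 1970
102 2012 2 1970
103 2010 1 1972
103 2011 1 1972
103 2012 1 1973
end

* If the command is not found, install it first with: ssc install dataex
* Print the listed variables as an -input- block that can be pasted between CODE delimiters
dataex id year sex ybirth
```

The output printed in the Results window is what gets pasted into the forum post, so that others can recreate the data by running it.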



        • #5
          Rich Goldstein

          Not sure you are correct. Since we want to drop an -id- if that subject has variation in *any* variable, not all of them, we can generate and drop -err- inside the loop. I just tested my code on the sample data and it worked fine.




          • #6
            Code:
            foreach V of varlist sex ybirth {
                bys id (`V') : drop if `V'[_N] == `V'[1]
            }



            • #7
              Jeph Herrin, whoops, I missed the -drop- part. Sorry about that.



              • #8
                Originally posted by Nick Cox View Post
                Code:
                foreach V of varlist sex ybirth {
                    bys id (`V') : drop if `V'[_N] == `V'[1]
                }
                I think !=



                • #9
                  Yes, OP wants changes, not constancy.
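For reference, a minimal sketch of the loop from #6 with the comparison flipped to != as suggested in #8, run on the example data from #4. It assumes the variables have no missing values; a missing value sorts last within each id and would count as a change here.

```stata
* Example data from #4
clear
input float(id year sex ybirth)
101 2010 1 1960
101 2011 1 1960
101 2012 1 1960
102 2010 1 1970
102 2011 2 1970
102 2012 2 1970
103 2010 1 1972
103 2011 1 1972
103 2012 1 1973
end

foreach V of varlist sex ybirth {
    * after sorting within id, the first and last values differ only if the variable varies
    bysort id (`V') : drop if `V'[1] != `V'[_N]
}

* bysort reordered the rows within each id, so restore the panel order
sort id year
list, sepby(id)
```

On these data only the three observations with id==101 remain.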



                  • #10
                    Originally posted by Nick Cox View Post
                    Yes, OP wants changes, not constancy.
                    Hi Nick, could you help me with a problem I recently ran into? After reading your posts, I strongly believe you can help me.
                    Specifically, the problem is about the "insufficient observation" message in gllamm. You can see it in my new post. I really need your help; thank you in advance!



                    • #11
                      Cassie Liu, sorry, no. I would have answered your question if I had anything I wanted to say.



                      • #12
                        Originally posted by Nick Cox View Post
                        Cassie Liu, sorry, no. I would have answered your question if I had anything I wanted to say.
                        Thank you, I understand. Luckily, I have already solved the problem by dealing with the variables.



                        • #13
                          Hey guys, there is one twist to the problem: one of the variables now has missing values. I still want to drop the observations where these variables change over time, but a missing value shouldn't count as a change.

                          Code:
                          clear
                          input float(id year sex ybirth ethnicity)
                          101 2010 1 1960 .
                          101 2011 1 1960 1
                          101 2012 1 1960 1
                          102 2010 1 1970 .
                          102 2011 2 1970 1
                          102 2012 2 1970 2
                          103 2010 1 1972 1
                          103 2011 1 1972 1
                          103 2012 1 1973 2
                          end
                          Here, observations with id==102 and id==103 should be removed. Observations with id==101 should NOT be dropped because ethnicity stays the same over time, despite the missing value.
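The thread does not show how this was solved in the end. One possible sketch, not necessarily what the poster used: -egen-'s min() and max() functions ignore missing values, so comparing the two within each id flags a change only among the non-missing values. The names lo and hi are made up for illustration, and the approach assumes numeric variables.

```stata
* Example data from this post
clear
input float(id year sex ybirth ethnicity)
101 2010 1 1960 .
101 2011 1 1960 1
101 2012 1 1960 1
102 2010 1 1970 .
102 2011 2 1970 1
102 2012 2 1970 2
103 2010 1 1972 1
103 2011 1 1972 1
103 2012 1 1973 2
end

foreach V of varlist sex ybirth ethnicity {
    * smallest and largest non-missing value within each id (lo, hi: illustrative names)
    bysort id : egen double lo = min(`V')
    bysort id : egen double hi = max(`V')
    * the two differ only if the non-missing values are not all the same
    drop if lo != hi
    drop lo hi
}

sort id year
list, sepby(id)
```

An id whose values are all missing for some variable gets missing for both lo and hi, so it is kept. On these data only id==101 survives.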



                          • #14
                            I solved everything, big thanks for all the help!
