  • Generating dummy observations to balance a panel

    I hope this request makes sense, as it is just to aid in my estimation. Below is the dataex of a dummy dataset resembling my original, and below that I will describe my problem.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 ID float(phase HasMembership)
    "A" 1 0
    "A" 3 1
    "B" 1 1
    "B" 2 0
    "B" 3 1
    "C" 1 0
    "C" 2 1
    "C" 3 1
    "D" 2 1
    "D" 3 0
    "E" 1 1
    "E" 3 0
    "F" 1 1
    "F" 3 0
    end
    In my previous post, I had requested a way to track an individual's membership changes between phases. The advice given in that post was very good. I was able to generate variables which described whether an individual gained, lost, retained, or retained lack of a membership between any two consecutive phases.

    The problem with my actual, full-fledged dataset is that some individuals do not have observations in consecutive phases. For example, in the given dataex, individual A has observations only in phase 1 and phase 3; we don't know anything about him in phase 2. Therefore, with the solution code given in my previous post, the generated variables could not capture anything for individual A. It was my mistake that the dummy representative dataset I provided was balanced instead of unbalanced.

    To counter this problem, is there any code or solution in Stata by which I can generate dummy observations for individuals who are not observed in every phase? Of course, the value of HasMembership for those dummy observations would be missing. This is only to get around the fact that the solutions don't work for non-consecutive phases. Since individual A has no observation in phase 2, his p2_p3 variable is missing, but I still want to capture the change: at some time between phase 1 and phase 3, he did gain membership.

    Otherwise if there is any other viable solution, I would be grateful to know.

    EDIT: Thanks to Mr. Schechter for pointing out the mistake in the dataex, I have updated it
    Last edited by Sohini Mazumder; 28 Jun 2022, 13:50.

  • #2
    -help tsfill- will show the way.

    BUT, you have a problem with the data. ID "B" has two observations for phase 1, and, worse, they are contradictory. You won't be able to -xtset- your data until you reduce it to a single observation per ID per phase. The fact that you have "surplus" observations like that is disturbing enough; that they contradict each other on another key variable is even worse. Before you move forward, go back and review the data management that created this data set. It seems to be significantly flawed and may contain other errors as well. Get the data right before you move on to analysis.
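
    One possible way to look for such surplus observations before trying -xtset- is sketched below. It assumes only the variable names from the dataex example (ID, phase, HasMembership); the variable name dup is made up for illustration.

    Code:
    * Report how many ID-phase combinations occur more than once
    duplicates report ID phase

    * Tag the surplus observations (dup is a hypothetical variable name)
    duplicates tag ID phase, gen(dup)

    * Inspect them, grouped by individual
    list ID phase HasMembership if dup > 0, sepby(ID)

    * Once the data are cleaned, this runs silently if ID and phase
    * jointly identify the observations, and exits with an error otherwise
    isid ID phase

    If -isid- completes without complaint, -xtset- or -tsset- should no longer object to repeated time values within panel.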


    • #3
      Thank you, Mr. Schechter. Actually, the fault is mine: in editing a dummy dataset to provide as a dataex example, I made a hurried mistake. The original dataset does not have these errors and has only a single observation per ID per phase.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str1 ID float(phase HasMembership)
      "A" 1 0
      "A" 3 1
      "B" 1 1
      "B" 2 0
      "B" 3 1
      "C" 1 0
      "C" 2 1
      "C" 3 1
      "D" 2 1
      "D" 3 0
      "E" 1 1
      "E" 3 0
      "F" 1 1
      "F" 3 0
      end
      I have posted a corrected dataex with one observation per ID per phase.


      • #4
        Thanks for the corrected example data.

        Code:
        encode ID, gen(n_ID)
        tsset n_ID phase
        tsfill
        list, noobs clean
        Added: It dawns on me that it is unclear how you want to handle IDs like "D" where the "gap" is at the beginning. The code shown above deals with skips within the sequence, but does not deal with situations where the first or final wave is not instantiated. If you want to generate extra observations for those as well, then add the -full- option to the -tsfill- command and it will do that.
        Last edited by Clyde Schechter; 28 Jun 2022, 13:57.
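
        A minimal sketch of how the -full- option might be combined with the original goal of capturing a change across a gap (for example, individual A between phase 1 and phase 3). It starts from the dataex example as posted, before any of the code above has been run. The variable names last_seen and change are hypothetical, and comparing each phase with the most recently observed one is an assumption about what is wanted, not something settled in this thread.

        Code:
        encode ID, gen(n_ID)
        tsset n_ID phase

        * Create a row for every ID in every phase, including leading/trailing gaps
        tsfill, full

        * The string ID is empty in the new rows; copy it from an observed row
        * (empty strings sort first, so the last value within n_ID is non-empty)
        bysort n_ID (ID): replace ID = ID[_N]

        * Membership status in the immediately preceding phase
        bysort n_ID (phase): gen last_seen = HasMembership[_n-1] if _n > 1

        * If that phase was a filled-in row, fall back on the most recent observed status
        by n_ID: replace last_seen = last_seen[_n-1] if missing(last_seen) & _n > 1

        * 1 = gained, -1 = lost, 0 = no change, missing = nothing to compare
        gen change = HasMembership - last_seen

        list ID phase HasMembership last_seen change, noobs sepby(ID)

        With the example data, individual A's phase 3 row gets change = 1, since membership went from 0 in phase 1 to 1 in phase 3. Individual D's phase 2 row stays missing because nothing was observed before it.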


        • #5
          Thanks a lot for the help and the suggestion.
