  • Panel Data Duplicates => Create Distinguishing Variable?

    Hi everybody,

    I'm currently estimating a model on panel data in which the same event can happen at three time points: T1, T2, T3.

    At each time point the observations/persons can go from 0 => 1 (latent change model).

    So we have a pattern like:

    T1 T2 T3
    1 0 0
    0 1 0
    0 0 1
    1 1 0
    0 1 1
    0 0 0

    The cases that interest me, and which I need for the rest of my analyses, are the ones where the observed persons go from 0 => 1.

    I also created a variable that records whether the person has had a "1" in a previous year, so that the 0 => 1 change is unique (if there was a 1 before, the case is set to missing, i.e. ".").

    What I did was lay the time frame between T2 and T3 over the time frame between T1 and T2. The two time frames cover subsequent years and basically nothing changes between them (the same events can happen; only the political climate might be a control factor, which is irrelevant for this analysis). The whole purpose of this analysis is that the data for T1 + T2 were insufficient a few years back, and we now want to check whether adding the new phase yields more stable results.

    So I appended the data, and I now have only two time points but with data from both phases (T1->T2 & T2->T3).
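
    A minimal sketch of this append step, assuming the two phases are stored in separate files. The file names phase1.dta and phase2.dta are hypothetical; the flag name newv is the one used in post #2 below.

    ```stata
    * Hypothetical file names: phase 1 holds T1->T2, phase 2 holds T2->T3
    use phase1.dta, clear

    * generate(newv) marks the source of each row:
    * 0 = already in memory (phase 1), 1 = appended file (phase 2)
    append using phase2.dta, generate(newv)

    * Check how many rows came from each phase
    tabulate newv
    ```

    With the source flag created during the append itself, no separate step is needed to remember which rows belong to the 2nd phase.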

    The only problem I have now is duplicates. Maybe it's a logical problem on my side, but the person numbers from the panel data are now doubled, because the two time frames contain the same persons (the 2nd phase can only add observations/cases). Since only the 0 => 1 case is interesting to me (I want to observe change) and it can happen only once per person, I thought this wouldn't be a problem. However, I need to identify the unique new cases where 0 => 1 happened in the 2nd time frame and must definitely not drop them. At the same time I need a way (or a variable) that tells me which cases are duplicates, so that I can drop the cases with doubled person numbers, where I would basically be comparing the same persons from two different phases against each other while pretending they're in the same phase.

    I thought about adding a variable that captures the years. Maybe there is then an easy way to drop the cases that are doubled and harmful to the analysis, and to keep the ones that occurred uniquely in T1+T2 and in T2+T3, so that I can compare them cleanly...

    If anybody knows how to deal with this and has a solution to the problem (if it really is a problem), I'd be very glad.


    Kind regards
    Kornelius


  • #2
    Hi again,

    I did some more thinking on how to handle this problem and the data and made a few screenshots to illustrate:

    In the first screenshot you can see my code for identifying and deleting duplicates. First I appended the data and generated "newv" to mark which observations came from the 2nd phase (T2-T3). I then tried to delete duplicates, but as you can see in screenshots 2 and 3 it did not work out properly. Perhaps I didn't specify enough for "duplicates drop" to do its job?

    Anyway, I used the simple "drop" command with "if" conditions to try to get the same result. As mentioned in the post above, the change from 0 => 1 is what matters for my analysis, and it is captured in the variable "f_all". So I definitely wanted to keep the cases where f_all == 1. I told drop to delete the observations that came from the 2nd phase (newv==1) and are NOT f_all==1.

    The problem is that id (persnr in my analysis) is often duplicated, as you can see in screenshot 3, but the 2nd phase also contains unique cases whose id/persnr is not doubled after appending, and those get deleted as well when I use "drop if newv==1". So what I THINK I need is an additional variable that captures the cases where id (persnr) is duplicated and gives the copy from phase 1 one value and the copy from phase 2 another value. Then I could always drop one of the two duplicate id (persnr) cases and still keep all the unique cases that have "newv==1".

    If anybody has a solution or an idea for how to create such a variable, that would be very helpful and much appreciated! It should take its information from id/persnr and, where persnr is duplicated, split the cases into two groups with the values 0 and 1, which would let me use an expression like "drop if distinguishID==1 & f_all==0".
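
    One possible way to build such a variable is sketched below. It is not a tested solution for this dataset: it assumes the variables persnr, newv and f_all exist as described above, and that each persnr appears at most once per phase. The helper name dup is hypothetical.

    ```stata
    * dup = number of OTHER rows sharing the same persnr
    * (0 = persnr is unique, 1 = persnr appears twice, ...)
    duplicates tag persnr, generate(dup)

    * distinguishID = 1 for the phase-2 copy of a duplicated persnr,
    * 0 for phase-1 rows and for persons who appear only once
    generate byte distinguishID = (dup > 0 & newv == 1)

    * Drop duplicated phase-2 rows that show no 0 => 1 change.
    * Note: f_all != 1 also drops rows where f_all is missing (.);
    * use f_all == 0 instead if missing rows should be kept.
    drop if distinguishID == 1 & f_all != 1

    * Inspect what is still duplicated (e.g. 0 in phase 1, 0 => 1 in phase 2)
    duplicates report persnr
    ```

    Because distinguishID is 0 for every persnr that occurs only once, unique phase-2 cases with newv==1 survive the drop. Persons whose phase-2 row has f_all == 1 still appear twice afterwards, so the final report shows whether a further rule is needed for those.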

    Kind regards
    Ralf
    Last edited by Kornelius Sarzahn; 30 May 2015, 08:57.
