  • Adding rows with elements 0 conditional on if statements

    I have a question regarding inserting observations in my dataset. My dataset currently looks like:



    Year  X  Y  Z

    2013  1  2  z(1,2)13
    2014  1  2  z(1,2)14
    2013  1  3  z(1,3)13
    2014  1  3  z(1,3)14
    2014  1  4  z(1,4)14
    2013  1  5  z(1,5)13
    2014  1  5  z(1,5)14



    Here, think of X and Y as different individuals. Individuals X and Y are linked through a value Z(x,y) that they share in year 13 or 14, so Z(1,2)13 is the value of variable Z for individuals 1 and 2 in 2013. I want to create a new variable Z' that is the first difference of Z within each pair of individuals across the 2 years. My problem is that, as the example dataset shows, the pair (1,4) has an observation for only one time period.

    Ideally, I want to create a row of zeros whenever this happens.
    A complication is that when I take the difference, I want it to be (value - 0) if the missing year is 2013 but (0 - value) if the missing year is 2014. I do not know how to implement this in the dataset. I have tried numerous things in vain.

    I guess if I were able to declare my data as panel, first difference operations may be easier to recognize. However, given that my data is not really longitudinal in the conventional sense, Stata responds with an error message:

    repeated time values within panel
    r(451);


    Any help is greatly appreciated.
    Thanks!

  • #2
    First, let's deal with the -xtset- issue. Your panel variable has to be the pair X, Y, not X or Y alone. Of course, -xtset- requires a single panel variable, so you have to create one:

    Code:
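    * Build a single identifier for each (X, Y) pair
    egen xy_pair = group(X Y)
    * Declare the panel (names are case-sensitive: use Year if that is how it is stored)
    xtset xy_pair year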
    If you still get that error message, then it means you have multiple observations for the same pair in the same year. That would be an error in your data. You can identify those observations (if any) with -duplicates list xy_pair year-, and then you can go about resolving that problem.
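
    As a rough sketch of that check, the toy data below contain one deliberate duplicate. The values and the variable name dup are made up for illustration; only -duplicates- and -isid- are standard Stata commands.

    ```stata
    * Toy data (hypothetical values): pair (1,2) appears twice in 2013
    clear
    input year X Y Z
    2013 1 2 10
    2013 1 2 11
    2014 1 2 12
    2014 1 3 4
    end

    egen xy_pair = group(X Y)

    * dup counts the surplus copies of each pair-year (0 means unique)
    duplicates tag xy_pair year, generate(dup)
    list if dup > 0

    * isid exits with an error unless xy_pair and year identify each row
    capture noisily isid xy_pair year
    ```

    Once the listed observations are resolved, -isid- runs silently and -xtset- should no longer complain.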

    Once your data is -xtset-,

    Code:
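    * First difference of Z within each pair (missing when the prior year is absent)
    gen z_prime = D1.Z
    * Set the undefined differences to 0
    replace z_prime = 0 if missing(z_prime)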
    As for
    "complications is that when I take the difference, I want it to take the (value -0) if the missing year is 2013 but (0-value) if the missing year is 2014."
    I don't understand what you mean. Please show a hand-worked example of some starting data and the desired result that includes these cases.


    • #3
      Mr. Schechter: thanks a lot for taking the time to respond to my query. I was not clear enough in the second part of my question; however, your answer made the second question redundant.

      What you have suggested works impeccably. Thanks a lot!


      • #4
        Thanks again for your response. Sorry for the bother, but the complication that I alluded to has made a reappearance. I had stated that "complications is that when I take the difference, I want it to take the (value -0) if the missing year is 2013 but (0-value) if the missing year is 2014.".

        What I mean is that there are pairings of individuals (i,j) for which the datapoint is visible in only one of the years.
        For instance, say that for (i,j) the datapoint is available only for the year 2013.
        The difference should then be (0 - Z(i,j)13), because in my dataset a missing value means that the actual value is 0. On the other hand, if the value for (i,j) is available only for 2014, the difference should be (Z(i,j)14 - 0). In the first case the individuals went from having some value to 0, whereas in the second case they went from having 0 to some value. The difference should reflect this. Your current suggestion imputes a differenced value of 0 in both cases, as evidenced by the code:

        replace z_prime = 0 if missing(z_prime)

        Please let me know if you have any suggestions.
        Thank you so much.
        Last edited by Chinmay Sharma; 24 Nov 2015, 07:42.


        • #5
          You can easily add a condition if year == 2013 or if year == 2014 to your code as needed.


          • #6
            Thanks for the response. It is not obvious to me how to implement the "if" condition. The only way I can envision using it is if I can indicate that the differenced variable should take the value (0 - Z(i,j)13) when only the year 2013 is available for the individual pairing. I do not know how to make such an indication.
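
            One possible way to express that condition is sketched below on made-up data. The values and the helper variable n_years are illustrative and not from the thread. The idea is to count the years observed for each pair, then overwrite the difference only for pairs seen once.

            ```stata
            * Toy data (hypothetical values): (1,4) has only 2014, (1,6) only 2013
            clear
            input year X Y Z
            2013 1 2 10
            2014 1 2 12
            2014 1 4 7
            2013 1 6 5
            end

            egen xy_pair = group(X Y)
            xtset xy_pair year

            * Number of years observed for each pair; leaves data sorted by pair and year
            bysort xy_pair (year): gen n_years = _N

            * Ordinary first difference where both years exist
            gen z_prime = D.Z

            * Pairs seen in one year only: treat the absent year as 0
            replace z_prime =  Z if n_years == 1 & year == 2014
            replace z_prime = -Z if n_years == 1 & year == 2013

            list, sepby(xy_pair)
            ```

            For pairs observed in both years, the 2013 row keeps a missing difference because there is no 2012 value to subtract.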


            • #7
              Actually, another way of implementing this would be to create a balanced panel out of my unbalanced panel. In other words, for those individual pairings that do not have datapoints for both years, is there a command that could insert a row of 0's for every such missing year in order to create an artificially balanced panel?
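
              A minimal sketch of this balanced-panel idea, assuming the data have been -xtset- on a pair identifier as in #2: -tsfill, full- adds an observation for every pair-year that is absent, and the missing Z in those new rows can then be set to 0. The data values are made up for illustration.

              ```stata
              * Toy data (hypothetical values): (1,4) has only 2014, (1,6) only 2013
              clear
              input year X Y Z
              2013 1 2 10
              2014 1 2 12
              2014 1 4 7
              2013 1 6 5
              end

              egen xy_pair = group(X Y)
              xtset xy_pair year

              * Add a row for every pair-year combination that is absent
              tsfill, full

              * New rows have missing X, Y and Z: copy the identifiers, set Z to 0
              bysort xy_pair (X): replace X = X[1]
              bysort xy_pair (Y): replace Y = Y[1]
              replace Z = 0 if missing(Z)

              * Restore panel order, then difference within pair
              sort xy_pair year
              gen z_prime = D.Z

              list, sepby(xy_pair)
              ```

              With the zeros in place, the 2014 row of each pair carries (Z14 - Z13), so a pair seen only in 2013 gets a negative difference and a pair seen only in 2014 gets a positive one.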


              • #8
                If only one year is available your problem simply does not arise. Clyde asked for a worked example of your data in #2, but I don't see one here.


                • #9
                  I think I should be able to do this now. I use the following code:


                  egen balancecount=count(1), by(I J)
                  gen balance=1 if panel==2

                  This identifies which observations do not have datapoints for both periods. I can proceed with if statements as suggested. Thanks!


                  • #10
                    I can provide a worked example. The code suggested above inserts a value of '0' for values of the differenced variable for:
                    a) the year 2013 when the difference variable does not exist (as it takes the difference of the variable for year 2014 and 2013)
                    b) For those individual pairings that do not have either 2013 or 2014 listed.

                    Consider the following dataset where X and Y denote individuals and z their joint value.

                    Year  X  Y  z
                    2013  i  j  z(i,j)13
                    2013  i  k  z(i,k)13
                    2014  i  k  z(i,k)14

                    The egen z' =D.1z command simply does not work for the first datapoint because this pairing of individuals does not have a value in 2014. The following command:
                    replace z_prime = 0 if missing(z_prime) simply imputes a value of 0 for this datapoint, which is not what I want.
                    I hope this example makes sense.


                    • #11
                      Reply to #9

                      Pleased you solved your problem, but your code as posted makes little sense because you don't use the variable you created. I imagine that you intend something like

                      Code:
                      egen balancecount = count(1), by(I J)
                      gen balance = 1 if balancecount == 2
                      Note that there are ways of getting there more directly, such as

                      Code:
                      bysort I J : gen balance = _N == 2
                      That gives a variable that is 1 or 0 rather than 1 or missing, but it still can be used.

                      Reply to #10

                      Your egen statement is quite illegal on at least three separate grounds. That's why it doesn't work. Or (a little more positively) mixing code and pseudocode as you may be doing makes it somewhere between difficult and impossible to work out exactly what you did.

                      Your worked example doesn't qualify as data we can copy and paste. It's just schematic. For future threads please do read and follow the FAQ Advice #12 on this point.

                      Beginners on this list are prone to regard the FAQ Advice as over-fussy and as something to be skimmed, not followed, but every piece of advice is there for a good reason.

                      Last edited by Nick Cox; 24 Nov 2015, 08:20.


                      • #12
                        Sorry, I typed incorrectly. The first code you posted is what I had run. Thanks a lot for the more efficient code.
