
  • Again...egen

    Hi statlist,

    As Prof suggested, I am reading the egen entry in the PDF manual, but I am still quite confused. I have the following dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double idfirm float Year double idproduct
    1 2008  3051
    1 2008 12012
    1 2008  8687
    1 2008  4088
    1 2009  8687
    1 2009  4088
    1 2009 12012
    1 2009  3051
    1 2010  8687
    1 2010 16087
    1 2010  3051
    1 2010  4088
    1 2010 12012
    1 2011 18347
    1 2011  4088
    1 2011 12012
    1 2011  3051
    1 2011  8687
    1 2011 10080
    1 2011 16087
    1 2012  8687
    1 2012  4088
    1 2012  7657
    1 2012  3051
    1 2012 16087
    1 2012  6051
    1 2012 10080
    1 2012 12012
    1 2012  7193
    1 2013 10080
    1 2013  6051
    1 2013  3051
    1 2013 12012
    1 2013 11192
    1 2013  7193
    1 2013 11757
    1 2013  7657
    1 2013  4088
    1 2013 16087
    1 2013   287
    1 2013 21124
    1 2013  8687
    1 2013  1474
    1 2014 12012
    1 2014  8087
    1 2014 11757
    1 2014  3026
    1 2014  8687
    1 2014   287
    1 2014 21124
    1 2014  7193
    1 2014 19783
    1 2014 10237
    1 2014 16087
    1 2014 10080
    1 2014  3051
    1 2014  1474
    1 2014  4088
    1 2014  7657
    1 2014  6051
    1 2014 21428
    1 2014 11192
    1 2015  3026
    1 2015 12012
    1 2015 10237
    1 2015  4088
    1 2015   287
    1 2015 21428
    1 2015  8087
    1 2015 11192
    1 2015 11757
    1 2015 19783
    1 2015  1474
    1 2015  7193
    1 2015  3051
    1 2015 21124
    1 2015 16087
    1 2015  6051
    1 2015 10080
    1 2015  7657
    2 2004  2669
    2 2004  4632
    2 2004 19459
    2 2004 19458
    2 2004  1690
    2 2004  1691
    2 2005  1691
    2 2005  4632
    2 2005  1690
    2 2005 19459
    2 2005 19458
    2 2005  2669
    2 2006  1690
    2 2006  4632
    2 2006 19458
    2 2006 19459
    2 2006  2669
    2 2006  1691
    2 2007  4632
    2 2007 19459
    end
    What I would like to do is build a variable "counter" that takes the value 1 if a product is lost in a year, that is, if it is observed in one year and disappears the next. The problem is that when a product disappears in the following year, there is no observation in that year to flag with a 1. I can only flag it in the year before it disappears. For example, product 18347 is observed in 2011 and disappears in 2012. I would actually like to flag product 18347 in 2012, because at a later stage I want to run collapse (sum) counter, by(idfirm Year) to obtain the number of products lost in 2012, but I cannot, since the product is no longer there. How can I do it? My idea is to invert the flag, that is, to flag with 1 the products that do not disappear from one year to the next, and then take the difference between the total number of products in a year and the number of flagged products... but I am not sure.

    Thank you very much,

    Federico
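One way to get the counts described above is to flag each product in the last year it is observed before it goes missing, and then move the collapsed counts forward one year, so that a product seen in 2011 but not in 2012 is counted in 2012. A minimal sketch, using a few rows of firm 1 from the example above (2012 plays the role of the last sample year in this toy subset; with the full data it would be 2015). The names lost_next and n_lost are illustrative only:

```stata
* Toy subset of firm 1 from the dataex example above
clear
input double idfirm float Year double idproduct
1 2010  3051
1 2011  3051
1 2011 18347
1 2012  3051
end

* In year t, flag each product that is not observed for the same firm in t+1;
* leave the last sample year (2012 here) missing, since t+1 is not observed
sort idfirm idproduct Year
by idfirm idproduct: gen byte lost_next = (Year[_n+1] != Year + 1) if Year < 2012

* Drop the censored year: collapse (sum) would otherwise report its
* all-missing flags as 0
drop if missing(lost_next)

* Number of products, per firm and year, that will be missing next year
collapse (sum) n_lost = lost_next, by(idfirm Year)

* Shift each count forward so it is attached to the year of the loss
replace Year = Year + 1
list, noobs
```

After the final replace, firm 1 shows one lost product (18347) in 2012 and none in 2011. Comparing Year[_n+1] with Year + 1, rather than only checking whether the next observation is a different product, also catches a product that skips a year and later returns.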




  • #2
    Check whether the following does what you want:

    Code:
    . sort idfirm idproduct Year
    
    . by idfirm: gen disappeared = idproduct != idproduct[_n+1]



    • #3
      Hi, and thank you for the reply. It does not quite do the job, because it also puts a 1 in the last year of the sample, where of course every product "disappears" since I have no years after 2015. Maybe
      Code:
       
       by idfirm idprod Year: gen disappeared = idproduct != idproduct[_n+1]
      ?



      • #4
        If the following does not work, then I think you need to create a variable by hand in the example data that shows what you want the new variable to look like:

        Code:
        bysort idfirm idproduct (Year): egen lastyear=max(Year)
        gen flag=0
        replace flag= 1 if  lastyear== Year & Year!=2015

        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1



        • #5
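          Try your code and you will see that it generates 1 for every observation: each idfirm idproduct Year group contains a single observation, so within the group idproduct[_n+1] is always missing and therefore never equal to idproduct.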

          Note that a command produces its result with total disregard for what is obvious to you.

          I am aware that what I proposed flags observations at the end of the sample with 1, which is neither right nor wrong. At the end of your sample we simply do not know whether the product will disappear next year, because you have no data on the next year.

          So the question is: what do you want to do with such observations? How do you want to code the observations for which you cannot tell whether they disappear, because the next year is missing?



          • #6
            Hi Carole, and thank you for the reply. I will try it and let you know as soon as possible.
            Joro, I see your point. I would say that a missing value is appropriate in the last year of the sample.
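Since the plan in #1 is to run collapse (sum) on the flag afterwards, note that collapse (sum) treats missing values as 0, so a year in which the flag is missing for every product would come out as 0 lost products rather than missing. A minimal sketch, using a few rows of firm 1 from #1 and the flag from #2 restricted to Year < 2015 so that the last sample year is missing; n_lost and n_known are illustrative names:

```stata
* Toy subset of firm 1 from #1, with 2015 as the last sample year
clear
input double idfirm float Year double idproduct
1 2014  3051
1 2014  8687
1 2015  3051
end

* Flag from #2, left missing in the last sample year
sort idfirm idproduct Year
by idfirm: gen disappeared = idproduct != idproduct[_n+1] if Year < 2015

* (sum) alone turns an all-missing year into 0, so also count the
* non-missing flags and reset the sum to missing where there are none
collapse (sum) n_lost = disappeared (count) n_known = disappeared, by(idfirm Year)
replace n_lost = . if n_known == 0
list, noobs
```

Here 2014 gets n_lost = 1 (product 8687 is not observed in 2015), while 2015 stays missing instead of showing a misleading 0.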



            • #7
              Then

              Code:
              by idfirm: gen disappeared = idproduct != idproduct[_n+1] if Year<2015
              should do what you want.

              (What Carole proposed also does the job, except that she codes the observations at the end of the sample as 0 rather than missing.)



              • #8
                The sort order is the same as before, so the whole thing is:

                Code:
                .  sort idfirm idproduct Year
                
                . by idfirm: gen disappeared = idproduct != idproduct[_n+1] if Year<2015
                (18 missing values generated)
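The code above hard-codes 2015 as the end of the sample, and comparing idproduct with idproduct[_n+1] flags only the last year in which a product appears. If the panel can end in different years for different firms (in the excerpt in #1, firm 2's last year is 2007), and a product may skip a year and come back, one variant is to leave each firm's own last year missing and compare Year[_n+1] with Year + 1. This is only a sketch: whether a firm's last year should be treated as unknown depends on whether the firm leaves the sample or actually stops producing, which the thread does not settle. It uses a few rows from #1; last_year_firm is an illustrative name:

```stata
* Toy subset from #1: firm 1 observed to 2015, firm 2 only to 2007
clear
input double idfirm float Year double idproduct
1 2014  3051
1 2015  3051
2 2006  1690
2 2006  4632
2 2007  4632
end

* Last year in which each firm is observed
bysort idfirm: egen last_year_firm = max(Year)

* 1 if the product is not observed for the same firm in the next year;
* missing in the firm's own last observed year, where we cannot tell
sort idfirm idproduct Year
by idfirm idproduct: gen byte disappeared = (Year[_n+1] != Year + 1) if Year < last_year_firm
list, noobs sepby(idfirm)
```

Product 1690 is flagged in 2006, product 4632 is 0 in 2006, and every observation in a firm's last year is left missing.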



                • #9
                  Both solutions (Joro's and Carole's) seem to work fine! Thank you very much!
                  May I ask a related question? I think I am doing it the right way, but since it does not seem to work, I would like to share the issue with you:

                  The dataset is the one I showed in #1. I would like to generate a dummy that takes the value 1 if a product is new for a firm, that is, if it appears in one year but not in the previous one. Since I already had a product-age variable (agepr), I did the following:

                  Code:
                  * new products (nuovi_prodotti):
                  gen counter_new_prod = 0
                  * flag products whose age (agepr) is 0, i.e. products in their first year
                  replace counter_new_prod = 1 if agepr == 0
                  Since this does not seem to work in further analyses, I was thinking of redefining the new-product counter so that it takes the value 1 whenever a product appears in one year but did not appear in the previous year(s) for that firm. For instance, product 18347 would be a new product for firm 1 in 2011, because it did not appear for firm 1 in 2010, 2009 or 2008. The counter would therefore be 1 for it in 2011 and, of course, 0 if it appears in later years, because it is no longer new.

                  Thanks again!



                  • #10
                    I cannot tell whether your sample begins in the same year for each firm. For the two firms you show, the earliest year is 2008 for firm 1 but 2004 for firm 2. For the following code I'll assume that the sample dates vary by firm.
                    Code:
                    bysort idfirm: egen first_year_sample=min(Year)  //first year for each firm
                    bysort idfirm idproduct : egen first_year_prod=min(Year)  //first year for each product within firm
                    gen new_prod=0
                    *it is a new product if the first year of the product is not the first year of the firm sample
                    replace new_prod=1 if first_year_prod==Year & first_year_prod!= first_year_sample
                    If you have specific dates for the whole sample (say all firms begin in 2004), then:
                    Code:
                    bysort idfirm idproduct : egen first_year_prod=min(Year)  //first year for each product within firm
                    gen new_prod=0
                    *it is a new product if the first year of the product is not the first year of the whole sample (2004)
                    replace new_prod=1 if first_year_prod==Year & first_year_prod!= 2004



                    • #11
                      Thank you for the reply, Carole.
                      Yes, the initial year differs across firms: one firm can enter the sample in one year and another firm in a different year.
                      To sum up: a product is new if the firm has it in a given year but did not have it the year before. For instance, if a firm appears for the first time in 2008 with products A and B, and in 2009 it has A, B and C, the firm has one new product, C, which is flagged 1. If the same firm has B, C and D in 2010, D is a new product and is flagged 1, and so on. The code above does this, right?

                      Thank you!

                      Federico



                      • #12
                        Yes, I believe the first block of code will give you what you want.
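The first block in #10 flags the first year in which a product appears for a firm. That coincides with the definition in #11 (the firm has the product this year but did not have it the year before) unless a product drops out and later returns, and the excerpt in #1 has no such gaps. If reappearing products should also count as new, a minimal sketch compares each year with the product's previous observation for the same firm. It uses a few rows of firm 1 from #1 and, following #10, codes the firm's first sample year as 0:

```stata
* Toy subset of firm 1 from #1
clear
input double idfirm float Year double idproduct
1 2010 12012
1 2011 12012
1 2011 18347
1 2011 10080
1 2012 10080
end

* First year each firm is observed: nothing can be classified as new there
bysort idfirm: egen first_year_sample = min(Year)

* New = observed in t but not observed for the same firm in t-1
* (Year[_n-1] is missing for a product's first observation)
sort idfirm idproduct Year
gen byte new_prod = 0
by idfirm idproduct: replace new_prod = 1 if Year[_n-1] != Year - 1 & Year > first_year_sample
list, noobs
```

Products 18347 and 10080 are flagged as new in 2011, while 12012 (already present in the firm's first year) and 10080 in 2012 stay 0. Running collapse (sum) new_prod, by(idfirm Year) afterwards would give the number of new products per firm and year.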



                        • #13
                          Indeed it does!

                          Thank you very much!
