Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting unique values in subsets with conditionals

    Hi all,
    I have a large dataset of patents that were filed in the period of 1980 till 2009. This dataset is characterised by the patnumbers, permno (company id), fileyear, filedate and class.
    I am having problems generating a command to count how many times a specific "class" occurs in a year compared to the years before by permno (company id). I would like to create a new variable UNIQCLASS that counts how many new patent classes a company files in a year. I first tried to generate a command that would compare the class of the patent and look whether this specific class was already filed by the company in the previous dates. If it was not filed in the previous dates to identify it as unique. Then to generate a new variable that counts these unique patents per year. However, my attempts did not workout, since i do not know how to incorporate the conditional to compare it with the previous dates. Is there any way that i can do this? If someone has any idea, I would really appreciate any suggestions.

    patnum filedate idate permno class subclass ncites fyear
    ​​​​​​3071251 "1/4/1960" "1/1/1963" "" 10006 "210" "349000O" 7 1960
    3086757 "1/5/1960" "4/23/1963" "" 10006 "261" "034200O" 8 1960
    3082875 "1/11/1960" "3/26/1963" "" 10006 "210" "416400O" 6 1960
    3135857 "1/11/1960" "6/2/1964" "" 10006 "219" "124340O" 17 1960

    Thank you in advance!!

    Kind Regards,

    Leidy

  • #2
    Leidy:
    too much surgery is needed to tam your code.
    Please repost the excerpt/example of your data using -dataex- (type search -dataex- from within Stata to install it). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      To further Carlo's advice, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post a small hand-made example, perhaps with just a few variables and observations, showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

      With that said, perhaps the following untested code will do what you want. Or perhaps not, since it is untested.
      Code:
      // generate first=1 for the earliest occurrence of each patent class
      by permno class (year), sort: generate first = (_n==1)
      // total the number of firsts by year
      by permno year, sort: egen newclasses = total(first)

      Comment


      • #4
        Often asked here: e.g. see hints in this thread from yesterday https://www.statalist.org/forums/for...ring-variables

        Comment


        • #5
          Thank you all for your help! William Lisowski your code worked out! I only did a small modification on the first part of the code, instead of using year i used the exact date. Then in the second part i used year. It worked perfectly. Thank you so much, you are a lifesaver!

          Comment


          • #6
            Leidy Bedoya Thank you for taking the time to understand how the code I wrote works and to improve on it to meet your needs!

            Comment


            • #7
              Dear all ,
              I have another question regarding the same dataset, thats why i am posting it here and did not create another post. Maybe one of you can help me out again. This time I am trying to append only the column of the new generated variable "newclasses" to another dataset. However it is not working out since, the patents dataset contains more than one observation for one company per year. I was thinking to maybe create a new variable for the observations that have a 1 on firsts, where i would merge the permno and the year so that it became 1001782000 and do the same for the other database, then try to match the two databases based on this new variable and append the newclasses column into it. But then if a company has 0 firsts in a respective year I would also like it to give me a 0. After having a lot of complicated ideas I don't know how i can do this. Do you have any suggestions on how i can do this?
              Thank you in advance

              Dataset Patents:
              permno patnum class fdate fdateyear firsts newclasses
              100178 5638505 202 01/02/00 2000 1 2
              100178 5152506 202 01/03/00 2000 0 2
              100178 5346378 203 02/05/00 2000 1 2
              100178 5346668 405 03/06/01 2001 1 1
              100178 5346669 405 04/06/01 2001 0 1
              Dataset Exploratory Inno:
              Dataset2
              permno fdateyear
              100178 2000
              100178 2001
              How i would like it to look
              after appending
              permno fdateyear ExploratoryInno
              100178 2000 2
              100178 2001 1

              Comment


              • #8
                merge or append is not what you are looking for, If you just want to get a dataset with one observation per id-year. All you need to do is to keep the first observation by id-year in your Patents data.
                Code:
                bys permno fdateyear: keep if _n==1
                Last edited by Oded Mcdossi; 26 Jun 2017, 01:52.

                Comment


                • #9
                  Oded Mcdossi I think I didnt explain it correctly in my previous message. The dataset of patents has many other patents that are not in the dataset 2. Dataset 2 also has other variables, i just included only permno and year in the description since these are the two identifiers that would match in the two databases. thank you for your code! I used it to only keep the first observation of the year. And then i managed to merge the two datasets. However, now I would like to keep those observations for which: _merge==2 _merge==3 and only the ones of _merge==1for the three years before that there was a match. I will give an example of what i mean:

                  lets say that when i merged the databases there was a match for:

                  permno 100177 fdateyear 2000, 2001, 2002. I would also like to keep the observations for 1999, 1998, 1997 for this permno and drop the rest (for the years 1996, 1995 etc)
                  permno 100189 fdateyear 2004, 2005, 2006. I would also like to keep the observations for 2003, 2002, 2001 for this permno and drop the rest
                  ............etc

                  These extra years are now part of the _merge==1. It is also important to note that the _merge==1 part consists of these three prior years that i want to keep, other prior and later observations for the same permno and other permno's that were not in dataset 2 . The reason for doing this is that I may need the information of the previous 3 years later and i don't want to drop all of the observations that were not merged. Is there a way to do this?

                  Comment


                  • #10
                    It's almost impossible to help without a specific example. you have to remember that this forum is multidisciplinary and most of the members do not know the shape of your data even if it is very known in your field. Please follow the FAQ and use dataex (type -ssc install dataex-) to sample from your data and exact details on what you would like get. Having said that, I think the following code does what you want but it is based on assumptions about your data.
                    Code:
                    bys permno (fdateyear): egen max_year=max(fdateyear)
                    bys permno (fdateyear): gen three_years=max_year-fdateyear
                    drop if three_years>6 & _merge==1
                    Last edited by Oded Mcdossi; 26 Jun 2017, 06:15.

                    Comment


                    • #11
                      Dear Oded Mcdossi , here is a sample of my data to make it more comprehensible.

                      For the permno (company id) 10104 the first match was for the year 2005. So I would like to keep those observations for which: _merge==2 _merge==3 and only the ones of _merge==1for the three years before that there was a match (2004,2003,2002). I tried the formula that you gave me but it would not workout (see max_year and three_years). Thank you very much for your help!

                      permno class statafdate yearfdate first newclasses1 ID AcquirorName MAyear _merge max_year three_years
                      10104 718 13522 1997 0 6 master only (1) 2009 12
                      10104 370 13888 1998 1 2 master only (1) 2009 11
                      10104 710 14259 1999 0 1 master only (1) 2009 10
                      10104 714 14612 2000 0 1 master only (1) 2009 9
                      10104 1 14985 2001 0 4 master only (1) 2009 8
                      10104 1 15343 2002 0 2 master only (1) 2009 7
                      10104 1 15712 2003 0 1 master only (1) 2009 6
                      10104 717 16072 2004 0 3 master only (1) 2009 5
                      10104 345 16441 2005 0 0 101042004 Oracle Corp 38359 matched (3) 2009 4
                      10104 714 16804 2006 0 0 101042004 Oracle Corp 38359 matched (3) 2009 3
                      10104 714 16804 2006 0 0 101042005 Oracle Corp 38748 matched (3) 2009 3
                      10104 707 17169 2007 0 1 101042005 Oracle Corp 38748 matched (3) 2009 2
                      10104 707 17169 2007 0 1 101042004 Oracle Corp 38359 matched (3) 2009 2
                      10104 707 17169 2007 0 1 101042006 Oracle Corp 38901 matched (3) 2009 2
                      10104 715 17568 2008 0 0 101042005 Oracle Corp 38748 matched (3) 2009 1
                      10104 715 17568 2008 0 0 101042006 Oracle Corp 38901 matched (3) 2009 1
                      10104 2009 101042006 Oracle Corp 38901 using only (2) 2009 0
                      10107 725 13520 1997 0 2 master only (1) 2009 12
                      10107 704 13884 1998 0 5 master only (1) 2009 11
                      10107 1 14248 1999 0 3 master only (1) 2009 10
                      10107 345 14977 2001 0 1 101072000 Microsoft Corp 36986 matched (3) 2009 8
                      10107 709 15343 2002 0 5 101072000 Microsoft Corp 36986 matched (3) 2009 7
                      10107 715 15707 2003 0 4 101072002 Microsoft Corp 37603 matched (3) 2009 6
                      10107 715 15707 2003 0 4 101072000 Microsoft Corp 36986 matched (3) 2009 6
                      10107 455 16075 2004 0 5 101072002 Microsoft Corp 37603 matched (3) 2009 5
                      10107 717 16439 2005 0 11 101072002 Microsoft Corp 37603 matched (3) 2009 4
                      10107 1 16804 2006 0 1 master only (1) 2009 3
                      10107 709 17169 2007 0 2 master only (1) 2009 2
                      10107 707 17533 2008 0 1 master only (1) 2009 1
                      10107 711 17905 2009 0 0 master only (1) 2009 0

                      Comment


                      • #12
                        Is this what you are looking for?
                        Code:
                        bys permno _merge (yearfdate): egen max_year=max(yearfdate)
                        gen three_years=max_year-yearfdate
                        drop if three_years>6 & _merge==1

                        Comment


                        • #13
                          HTML translation got in the way there. I think Oded's last line should be

                          Code:
                           drop if three_years > 6 & _merge == 1

                          Comment

                          Working...
                          X