
  • NBER Patent Data Project in Stata

    Hello,

    I am trying to match Compustat data with patent data by following this guideline: http://users.nber.org/~jbessen/matchdoc.pdf
    Has anyone done this before?

    Currently I have a file with financial data sorted by gvkey and year.
    Furthermore, I have downloaded the pdpcohdr and dynass files from the site: https://sites.google.com/site/patent...Home/downloads
    And I downloaded the patent data file from http://elsa.berkeley.edu/pub/users/bhhall/NBER06.html.

    The matchdoc.pdf file contains Stata code to merge the two data sets.
    Two questions:

    1) Looking at the patent data file above, I have multiple pdpasses and IPCs for each patent. For example:
    year patent icl pdpass
    1974 3930732 G01B 1500 10030734
    1974 3930732 G01B 900 10030734

    I want to create the variable npat, which contains the number of patents for each pdpass-year. In the example, the two rows should count as 1 patent for pdpass-year 10030734-1974. How do I create this in Stata?

    2) The Stata example code says:

    * now find the appropriate gvkey to assign the patents
    gen gvkey=.
    forvalue i=1/5 {
    replace gvkey = gvkey`i' if gvkey`i'~=. & year>=begyr`i' &
    year<=endyr`i'
    }

    When I run this code I get the error
    invalid syntax
    r(198);

    I have the variables gvkey1 to gvkey5, begyr1 to begyr5, endyr1 to endyr5 (all from dynass file) and renamed appyear to year.
    What could be wrong here?

    Thanks in advance!

    Best,
    MCG

  • #2
    Cross-posted at http://www.talkstats.com/showthread....oject-in-Stata

    See the Advice in the FAQ about our policy on cross-posting.


    • #3
      Thank you, Nick.

      Let me explain my first question more generally.
      So let's say I have the following data (+ some more variables not relevant here):

      Year Patent Pdpass
      1977 4166686 6361394
      1977 4171891 6361394
      1977 4171891 6361394
      1977 4166678 6361394
      1977 4175847 6361394
      1977 4166678 6361394
      1977 4166682 6361394
      1977 4146320 6361394

      Here year = application year, patent = patent number and pdpass = a unique company identifier that can later be linked to the appropriate gvkey to match it with the Compustat data.
      As you can see, there are some duplicates, since a patent can have several international classification numbers. I want to count each such patent only once.
      What I would like to have now is a variable called npat (number of patents) for each pdpass-year, such that I get:

      Pdpass Year npat
      6361394 1977 6
      1234567 1977 4 (for example)
      ....
      ....

      How could I create this?
      Thanks!

      Best,
      Benno


      • #4
        help contract


        • #5
          But if I use the contract option, it deletes every other variable and gives the frequency of each unique patent number (instead of the number of patents per pdpass-year).
          What can I do to keep the variable 'appyear' while getting one row per pdpass-year with the number of patents?


          • #6
            contract is a command, not an option. The very first example in the help (did you try it?) shows how to count cross-combinations of two variables.

            You don't give your code, but you presumably have in mind

            contract pdpass

            What you need is evidently

            contract pdpass appyear


            • #7
              MCGNL,

              (I just answered this in the cross-posted thread that Nick Cox linked.)

              I believe you want to count distinct values, so you can drop the duplicates using duplicates drop before contracting.
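
              A minimal sketch of that sequence, using the example rows from #3. It assumes the variables are named appyear, patent and pdpass as in this thread, and that npat is the name you want for the count; adjust to your own data.

              ```stata
              * Sketch only: toy data copied from post #3
              clear
              input int appyear long patent long pdpass
              1977 4166686 6361394
              1977 4171891 6361394
              1977 4171891 6361394
              1977 4166678 6361394
              1977 4175847 6361394
              1977 4166678 6361394
              1977 4166682 6361394
              1977 4146320 6361394
              end

              * keep one row per patent within each pdpass-year
              * (force is required when a varlist is given)
              duplicates drop pdpass appyear patent, force

              * one row per pdpass-year; freq() names the count variable
              contract pdpass appyear, freq(npat)
              list
              ```

              With these eight rows the result should be a single observation with npat equal to 6. Note that contract replaces the data in memory, so save first. If the other variables need to stay, one alternative is egen byte first = tag(pdpass appyear patent) followed by egen npat = total(first), by(pdpass appyear).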

              More in http://www.stata.com/support/faqs/da...-observations/
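
              On the second question in #1, one likely cause (an assumption, since I cannot see your do-file) is the line break inside the replace command. Stata treats the end of a line as the end of a command, so the replace stops at the trailing & and fails. In a do-file, /// at the end of a line continues the command on the next line. A sketch with made-up gvkey values and only two of the five gvkey/begyr/endyr sets:

              ```stata
              * Sketch only: hypothetical toy values; variable names follow the matchdoc code
              clear
              input long pdpass int year long gvkey1 int begyr1 int endyr1 long gvkey2 int begyr2 int endyr2
              6361394 1977 1001 1970 1980 . . .
              6361394 1985 1001 1970 1980 2002 1981 1990
              end

              gen long gvkey = .
              forvalues i = 1/2 {
                  * three slashes at the end of the line continue the command
                  replace gvkey = gvkey`i' if gvkey`i' != . & year >= begyr`i' & ///
                      year <= endyr`i'
              }
              list pdpass year gvkey
              ```

              Here the 1977 row should pick up gvkey1 and the 1985 row gvkey2. Putting the whole replace on one line works as well. With your data the loop would run over 1/5.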

              Please honor the petition in the FAQ to use your real full name.
              You should:

              1. Read the FAQ carefully.

              2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

              3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

              4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
