Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variables that counts number of patents per firm per year

    Hi all,

    I am currently working on a dataset with different firms (name_std), different years (years_id), their granted patents (patent_id), and with a dummy variable (green_d) that equals 1 when the patent is a green patent.
    What I now want is to generate a new variable that counts how many patents that a firm is granted in a given year is a green patent ( green_d = 1). Hence, it needs to count the number of green patents per year per firm.

    Anyone that can help me with this, it would be much appreciated!

  • #2
    Could be something like

    Code:
    egen n_green = total(green_d), by(name years)

    Comment


    • #3
      What if you would like something like the table below. What would it be the command please?

      Number of Patents Number of Green Patents Number of NoGreen Patents
      Years Total Total Total
      1926
      1927
      1928
      1929

      Comment


      • #4
        Giulia:
        see the -table- suite worked out examples in related entry of Stata .pdf manual (release =>17).
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Thanks Carlo. Unfortunately, I have not found an answer to my question.

          I have created this table that shows the Number of green and non-green patents per year
          egen not_green = total(scheme_code == 0), by (filing_year)
          egen green = total(inrange(scheme_code,1,9)),by (filing_year)
          tabdisp filing_year, c(green not_green)

          Now, I need to add an extra column that shows how many companies filed these green patents and non-green patents.
          To do that I have written:

          egen n_firm_green_p = total(green), by(permno filing_year)
          tabdisp filing_year, c(green not_green n_firm_green_p)

          But it does not give me the correct outcome.

          Where am I mistaking?

          Can anyone help me, please?

          Thanks!

          Comment


          • #6
            Your code is double counting. For example, green is the total over each year of observations ofscheme that arenot 0. It is thus constant across perm_no. Pushing it through total() again won't subdivide; it just multiplies up by the number of observations with the same total in each group.

            I guess you want something quite different, more like

            Code:
            gen is_green = inrange(scheme, 1, 9) if !missing(scheme) 
            egen tag = tag(perm_no filing_year is_green)
            egen wanted = total(tag), by(is_green filing_year)


            Here's a toy dataset and results to discuss.

            Code:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float (filing_year perm_no scheme)
            1 1 0
            1 2 1
            1 3 2
            1 3 3
            1 4 4
            2 1 0
            2 1 1
            2 1 2
            2 1 3
            2 2 4
            end
            
            
            gen is_green = inrange(scheme, 1, 9) if !missing(scheme) 
            egen tag = tag(perm_no filing_year is_green)
            egen wanted = total(tag), by(is_green filing_year)
            
            tabdisp filing_year is_green, c(wanted)
            
            ----------------------
            filing_ye |  is_green 
            ar        |    0     1
            ----------+-----------
                    1 |    1     3
                    2 |    1     2
            ----------------------

            Comment


            • #7
              Thanks a lot Nick!

              Now I have to determine the number of patents number of firms, citations , avrage Xi_nominal,average Xi_real per year (total at the end)

              To compute how many patents per company I have per year I just add together by filing_year the total amount under 0 and 1 for is_green (see your example above.

              bys permno filing_year: egen n_citations = sum(cites) // I computed how many citation per company I have per year
              bys permno filing_year: egen avg_xi_nominal = mean(xi_nominal) // I have computed the avergae per company per year of xi_nominal
              bys permno filing_year: egen avg_xi_real = mean(xi_real) // I have computed the avergae per company per year of xi_nominal
              sort permno filing_year
              order patent_num permno filing_year n_patents cites n_citations xi_nominal avg_xi_nominal xi_real avg_xi_real cpc scheme_code cites filing_date issue_date issue_year
              tabdisp filing_year, c( n_citations avg_xi_nominal avg_xi_real)

              Is this correct?

              Thanks

              Comment


              • #8
                I suggest the method used in #6. Work with a tiny subset of your dataset for which you can verify independently that the results are correct.

                Comment

                Working...
                X