Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to identify the number of observations and unique firms for each country

    Dear members

    I have a panel dataset for firm-year observations in 58 countries spanning from 2005-2020. I would like to have summary statistics showing the number of observations and unique firms for each country in the sample.

    I have created a country_id and firm_id, and the number of observations per country. Here is an example of my data: Thanks for your kind help.
    input str44 Country int Year float(firm_id country_id country_observation)
    "United Arab Emirates" 2017 1 55 20
    "United Arab Emirates" 2018 1 55 20
    "United Arab Emirates" 2019 1 55 20
    "United Arab Emirates" 2018 2 55 20
    "United Arab Emirates" 2014 2 55 20
    "United Arab Emirates" 2015 2 55 20
    "United Arab Emirates" 2016 2 55 20
    "United Arab Emirates" 2019 2 55 20
    "United Arab Emirates" 2017 2 55 20
    "United Arab Emirates" 2013 2 55 20
    "United States" 2015 3 57 5181
    "United States" 2010 3 57 5181
    "United States" 2016 3 57 5181
    "United States" 2018 3 57 5181
    "United States" 2013 3 57 5181
    "United States" 2012 3 57 5181
    "United States" 2005 3 57 5181
    "United States" 2019 3 57 5181
    "United States" 2011 3 57 5181
    "United States" 2009 3 57 5181
    "United States" 2017 3 57 5181
    "United States" 2008 3 57 5181
    "United States" 2014 3 57 5181
    "United States" 2007 3 57 5181
    "United States" 2006 3 57 5181
    "Argentina" 2019 4 1 30
    "Argentina" 2018 4 1 30
    "Argentina" 2016 4 1 30
    "Argentina" 2015 4 1 30
    "Argentina" 2017 4 1 30
    "Argentina" 2014 4 1 30
    "Argentina" 2020 5 1 30
    "Argentina" 2015 5 1 30
    "Argentina" 2019 5 1 30
    "Argentina" 2016 5 1 30
    "Argentina" 2017 5 1 30
    "Argentina" 2018 5 1 30
    "Argentina" 2018 6 1 30
    "Argentina" 2016 6 1 30
    "Argentina" 2019 6 1 30
    "Argentina" 2017 6 1 30
    "Argentina" 2015 6 1 30
    "Argentina" 2016 7 1 30
    "Argentina" 2015 7 1 30
    "Argentina" 2010 7 1 30
    "Argentina" 2014 7 1 30
    "Argentina" 2013 7 1 30
    "Argentina" 2019 7 1 30
    "Argentina" 2017 7 1 30
    "Argentina" 2018 7 1 30
    "Argentina" 2018 8 1 30
    "Argentina" 2019 8 1 30
    "Austria" 2018 9 3 171
    "Austria" 2017 9 3 171
    "Austria" 2019 9 3 171
    "Austria" 2012 9 3 171
    "Austria" 2014 9 3 171
    "Austria" 2011 9 3 171
    "Austria" 2016 9 3 171
    "Austria" 2013 9 3 171

  • #2
    See https://www.stata-journal.com/articl...article=dm0042 for a discussion and especially

    Section 2 for an argument why distinct is a much better term than unique

    p.563 for the method used here.

    Code:
    clear 
    input str44 Country int Year float(firm_id country_id country_observation)
    "United Arab Emirates" 2017 1 55 20
    "United Arab Emirates" 2018 1 55 20
    "United Arab Emirates" 2019 1 55 20
    "United Arab Emirates" 2018 2 55 20
    "United Arab Emirates" 2014 2 55 20
    "United Arab Emirates" 2015 2 55 20
    "United Arab Emirates" 2016 2 55 20
    "United Arab Emirates" 2019 2 55 20
    "United Arab Emirates" 2017 2 55 20
    "United Arab Emirates" 2013 2 55 20
    "United States" 2015 3 57 5181
    "United States" 2010 3 57 5181
    "United States" 2016 3 57 5181
    "United States" 2018 3 57 5181
    "United States" 2013 3 57 5181
    "United States" 2012 3 57 5181
    "United States" 2005 3 57 5181
    "United States" 2019 3 57 5181
    "United States" 2011 3 57 5181
    "United States" 2009 3 57 5181
    "United States" 2017 3 57 5181
    "United States" 2008 3 57 5181
    "United States" 2014 3 57 5181
    "United States" 2007 3 57 5181
    "United States" 2006 3 57 5181
    "Argentina" 2019 4 1 30
    "Argentina" 2018 4 1 30
    "Argentina" 2016 4 1 30
    "Argentina" 2015 4 1 30
    "Argentina" 2017 4 1 30
    "Argentina" 2014 4 1 30
    "Argentina" 2020 5 1 30
    "Argentina" 2015 5 1 30
    "Argentina" 2019 5 1 30
    "Argentina" 2016 5 1 30
    "Argentina" 2017 5 1 30
    "Argentina" 2018 5 1 30
    "Argentina" 2018 6 1 30
    "Argentina" 2016 6 1 30
    "Argentina" 2019 6 1 30
    "Argentina" 2017 6 1 30
    "Argentina" 2015 6 1 30
    "Argentina" 2016 7 1 30
    "Argentina" 2015 7 1 30
    "Argentina" 2010 7 1 30
    "Argentina" 2014 7 1 30
    "Argentina" 2013 7 1 30
    "Argentina" 2019 7 1 30
    "Argentina" 2017 7 1 30
    "Argentina" 2018 7 1 30
    "Argentina" 2018 8 1 30
    "Argentina" 2019 8 1 30
    "Austria" 2018 9 3 171
    "Austria" 2017 9 3 171
    "Austria" 2019 9 3 171
    "Austria" 2012 9 3 171
    "Austria" 2014 9 3 171
    "Austria" 2011 9 3 171
    "Austria" 2016 9 3 171
    "Austria" 2013 9 3 171
    end 
    
    bysort Country : gen nobs = _N 
    egen tag = tag(Country firm_id)
    by Country : egen nfirms = total(tag)
    
    tabdisp Country, c(nobs nfirms)
    
    ---------------------------------------------
                 Country |       nobs      nfirms
    ---------------------+-----------------------
               Argentina |         27           5
                 Austria |          8           1
    United Arab Emirates |         10           2
           United States |         15           1
    ---------------------------------------------

    Comment


    • #3
      Dear Nick

      As always, thank you so much for your helpful response and suggestion.

      Comment


      • #4
        Some people would want to flag that you can do without egen:

        Code:
        . bysort Country (firm_id): gen nobs = _N 
        
        . by Country : gen nfirms = sum(firm_id != firm_id[_n-1])
        
        . by Country : replace nfirms = nfirms[_N]
        (28 real changes made)
        
        . 
        . tabdisp Country, c(nobs nfirms)
        
        ---------------------------------------------
                     Country |       nobs      nfirms
        ---------------------+-----------------------
                   Argentina |         27           5
                     Austria |          8           1
        United Arab Emirates |         10           2
               United States |         15           1
        ---------------------------------------------

        Comment


        • #5
          Thanks for your much-appreciated help. I now need to know the distinct number of firms that receive the treatment. I have a treated dummy =1 if the firm receives the treatment and zero otherwise. Here is an example of my data:
          input int Year float(treatedfirm firm_id)
          2016 0 4
          2013 0 7
          2015 0 6
          2015 0 7
          2017 0 4
          2017 0 5
          2019 0 8
          2019 0 5
          2014 0 7
          2018 0 4
          2014 0 4
          2010 0 7
          2020 0 5
          2017 0 7
          2016 0 5
          2018 0 5
          2018 0 6
          2019 0 6
          2018 0 8
          2015 0 4
          2015 0 5
          2016 0 7
          2016 1 2853
          2019 0 7
          2015 1 2853
          2018 0 7
          2016 0 6
          2019 0 4
          2014 1 2853
          2017 0 6
          2013 1 81
          2016 1 81
          2007 1 3091
          2019 1 81
          2010 1 79
          2014 1 154
          2019 1 99
          2019 1 110
          2016 1 48
          2019 1 154
          2019 1 1591
          2014 1 153
          2018 1 42

          Comment


          • #6
            Code:
            egen tag_treated = tag(Country firm_id) if treatment == 1
            
            bysort Country : egen nfirms_treated = total(tag_treated)
            Code:
            
            

            Comment


            • #7
              Thanks, Nick for your response, but the above code did not provide me the needed result. Here is my data with all the needed variables :
              input str44 Country int Year float(treatedfirm firm_id)
              "Argentina" 2015 1 2853
              "Argentina" 2016 0 5
              "Argentina" 2018 0 7
              "Argentina" 2017 0 6
              "Argentina" 2015 0 5
              "Argentina" 2016 1 2853
              "Argentina" 2016 0 7
              "Argentina" 2014 0 4
              "Argentina" 2015 0 4
              "Argentina" 2019 0 5
              "Argentina" 2014 0 7
              "Argentina" 2020 0 5
              "Argentina" 2019 0 7
              "Argentina" 2019 0 8
              "Argentina" 2019 0 6
              "Argentina" 2015 0 7
              "Argentina" 2017 0 5
              "Argentina" 2016 0 4
              "Argentina" 2013 0 7
              "Argentina" 2017 0 4
              "Argentina" 2019 0 4
              "Argentina" 2018 0 8
              "Argentina" 2017 0 7
              "Argentina" 2010 0 7
              "Argentina" 2016 0 6
              "Argentina" 2014 1 2853
              "Argentina" 2018 0 4
              "Argentina" 2015 0 6
              "Argentina" 2018 0 5
              "Argentina" 2018 0 6
              "Australia" 2005 1 137
              "Australia" 2007 1 151
              "Australia" 2013 1 162
              "Australia" 2020 1 76
              "Australia" 2007 1 130
              "Australia" 2011 1 51
              "Australia" 2011 1 48
              "Australia" 2015 1 41
              "Australia" 2020 1 92
              "Australia" 2014 1 100
              "Australia" 2015 1 39
              "Australia" 2010 1 67
              "Australia" 2010 1 53
              "Australia" 2018 1 56
              "Australia" 2007 1 48
              "Australia" 2020 1 134
              "Australia" 2017 1 285
              "Australia" 2017 1 51
              "Australia" 2013 1 1301
              "Australia" 2014 1 81
              "Australia" 2019 1 87
              "Australia" 2015 1 84

              Comment


              • #8
                So, what does the code do and what do you think it should do differently?

                The data example is a little untidy and the variable Year is irrelevant to the problem stated ("!or each country"), but those are trivia. This is what I get.

                Code:
                clear 
                input str44 Country int Year float(treatedfirm firm_id)
                "Argentina" 2015 1 2853
                "Argentina" 2016 0 5
                "Argentina" 2018 0 7
                "Argentina" 2017 0 6
                "Argentina" 2015 0 5
                "Argentina" 2016 1 2853
                "Argentina" 2016 0 7
                "Argentina" 2014 0 4
                "Argentina" 2015 0 4
                "Argentina" 2019 0 5
                "Argentina" 2014 0 7
                "Argentina" 2020 0 5
                "Argentina" 2019 0 7
                "Argentina" 2019 0 8
                "Argentina" 2019 0 6
                "Argentina" 2015 0 7
                "Argentina" 2017 0 5
                "Argentina" 2016 0 4
                "Argentina" 2013 0 7
                "Argentina" 2017 0 4
                "Argentina" 2019 0 4
                "Argentina" 2018 0 8
                "Argentina" 2017 0 7
                "Argentina" 2010 0 7
                "Argentina" 2016 0 6
                "Argentina" 2014 1 2853
                "Argentina" 2018 0 4
                "Argentina" 2015 0 6
                "Argentina" 2018 0 5
                "Argentina" 2018 0 6
                "Australia" 2005 1 137
                "Australia" 2007 1 151
                "Australia" 2013 1 162
                "Australia" 2020 1 76
                "Australia" 2007 1 130
                "Australia" 2011 1 51
                "Australia" 2011 1 48
                "Australia" 2015 1 41
                "Australia" 2020 1 92
                "Australia" 2014 1 100
                "Australia" 2015 1 39
                "Australia" 2010 1 67
                "Australia" 2010 1 53
                "Australia" 2018 1 56
                "Australia" 2007 1 48
                "Australia" 2020 1 134
                "Australia" 2017 1 285
                "Australia" 2017 1 51
                "Australia" 2013 1 1301
                "Australia" 2014 1 81
                "Australia" 2019 1 87
                "Australia" 2015 1 84
                end 
                
                drop Year 
                sort Country treated firm_id  
                duplicates drop 
                egen tag = tag(Country firm_id) if treated == 1 
                by Country: egen wanted = total(tag) 
                
                list, sepby(Country treated) 
                
                     +-----------------------------------------------+
                     |   Country   treate~m   firm_id   tag   wanted |
                     |-----------------------------------------------|
                  1. | Argentina          0         4     0        1 |
                  2. | Argentina          0         5     0        1 |
                  3. | Argentina          0         6     0        1 |
                  4. | Argentina          0         7     0        1 |
                  5. | Argentina          0         8     0        1 |
                     |-----------------------------------------------|
                  6. | Argentina          1      2853     1        1 |
                     |-----------------------------------------------|
                  7. | Australia          1        39     1       20 |
                  8. | Australia          1        41     1       20 |
                  9. | Australia          1        48     1       20 |
                 10. | Australia          1        51     1       20 |
                 11. | Australia          1        53     1       20 |
                 12. | Australia          1        56     1       20 |
                 13. | Australia          1        67     1       20 |
                 14. | Australia          1        76     1       20 |
                 15. | Australia          1        81     1       20 |
                 16. | Australia          1        84     1       20 |
                 17. | Australia          1        87     1       20 |
                 18. | Australia          1        92     1       20 |
                 19. | Australia          1       100     1       20 |
                 20. | Australia          1       130     1       20 |
                 21. | Australia          1       134     1       20 |
                 22. | Australia          1       137     1       20 |
                 23. | Australia          1       151     1       20 |
                 24. | Australia          1       162     1       20 |
                 25. | Australia          1       285     1       20 |
                 26. | Australia          1      1301     1       20 |
                     +-----------------------------------------------+
                The results are what I expect. What do you expect?

                (Naturally, if the problem is really for each country and year, the calculation needs to be different.)

                Comment


                • #9
                  You didn't reply to this but started a related thread at https://www.statalist.org/forums/for...n-did-analysis

                  Comment

                  Working...
                  X