Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a panel data identifier

    I have pooled data from a survey that is done twice.

    Some of the observations come from the same firm, but the firm was given different ids over the two years. However, I have an additional variable that signalizes a unique firm. I would use the group command, but I have a lot of missing variables and some numbers can reoccur in the second variable. The latter can be separated by using observations of country and year.
    Obs Firm Panel Country Year
    1 258 2 USA 1
    2 658 3 USA 1
    3 599 3 USA 2
    4 245 . USA 1
    5 689 3 Canada 1
    6 789 3 Canada 2
    7 359 58 Canada 1
    8 385 . Canada 2
    As you can see, by having the same number in the “panel” variable, observation 2 and 3 (+5 and 6) represent the same firm.

    Any suggestions as to how I can convert this into (unbalanced) panel data?
    Last edited by Ellinor Hjelvik; 08 Dec 2015, 09:14.

  • #2
    Are you saying that observations 2, 3, 5, and 6 are all one firm (which operates in both the USA and Canada), or are you saying that observations 2 & 3 are one firm, and 5& 6 are another, both being designated by panel == 3, but distinguished by the country variable?

    Comment


    • #3
      The latter! Observations 2 and 3 are one firm, and 5 and 6 are another. The country variable distinguishes them.

      Comment


      • #4
        OK. I've made some assumptions: Country is never missing, and if there were more than one observation showing the same value of Firm, Country and Year, they would all show the same value of Panel. (If that isn't true, then it doesn't seem like Panel is helpful in identifying firms.) I also assume that if Panel is missing, then we have to presume that Firm & Country combined distinctly identify each firm, as there doesn't seem to be anything else to go on. So try this:

        Code:
        clear*
        input Obs    int Firm    int Panel    str6 Country    int Year
        1    258    2    USA    1
        2    658    3    USA    1
        3    599    3    USA    2
        4    245    .    USA    1
        5    689    3    Canada    1
        6    789    3    Canada    2
        7    359    58    Canada    1
        8    385    .    Canada    2
        end
        
        list, noobs clean
        
        //    VERIFY COUNTRY IS NEVER MISSING
        assert !missing(Country)
        
        //    VERIFY Panel IS CONSISTENT WITHIN FIRM COUNTRY YEAR
        //    (IF MORE THAN ONE SUCH OBSERVATION--THIS DATA DOESN'T HAVE ANY SUCH)
        by Firm Country Year (Panel), sort: assert Panel[1] == Panel[_N]
        
        //    GENERATE AN ID BASED ON PANEL, COUNTRY
        egen id = group(Panel Country)
        //    ELIMINATE ID BASED ON MISSING VALUE OF PANEL
        replace id = . if missing(Panel)
        
        //    FOR REMAINING FIRMS, GENERATE NEW ID BASED ONLY ON FIRM
        summ id, meanonly // DESPITE NAME, ALSO CALCULATES MIN & MAX
        local id_start = r(max)
        
        by id Firm Country, sort: gen id2 = (_n == 1) if missing(id)
        replace id2 = sum(id2)
        replace id = id2 + `id_start' if missing(id)
        
        list, noobs clean

        Comment


        • #5
          Thank you Sir! It worked ... and I got the logic! I was just missing (a lot of) commands.

          None of the observations show the same value of Firm, Country and Year so that would not have been a problem.

          Thank you again for your help, and have a nice day!

          Comment

          Working...
          X