Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling up and adding observations - repeated time values within panel

    Dear Statalist Members,

    I am having the following problem and would be happy if someone could help.

    There is a panel database on the company_ID year level, and there are more establishments for each company_ID - year.

    Like this:

    comp_ID year est_ID
    1 2005 A
    1 2005 B
    1 2006 A
    1 2006 B
    1 2007 .
    1 2008 A
    1 2008 B

    However, there are some missing years for establishments, as you can see it here for year 2007.
    Establishment A and B are likely to exist in 2007 as well, but now we only have 1 missing observation instead of 2, plus we have repeated time values on the company - year level.

    How can I make each missing observation to 2 observations and fill them up with the establishment IDs?

    Thank you very much in advance.

    Best regards,
    Zsolt

  • #2
    There might be several ways to do this, but I found at least one way to do it fairly easy. The trick is to use -tsfill-, -encode- and -decode- exactly like it's done in the following code:

    Code:
    encode est_ID, gen(temp_ID)
    tsset temp_ID year
    tsfill
    
    drop if est_ID == "."
    drop est_ID
    decode temp_ID, gen(est_ID)
    This will use your est_ID variable as cross-sectional indicator and the year variable as the time indicator in the time series configuration. It then fills the gaps and drops the one excessive observation. Finally, we use -decode- to convert the temp_ID back to string type, and you now have two observations for 2007, A and B. After this you can drop the temporary variable.

    Comment


    • #3
      Zsolt,
      First make sure each time the value is missing corresponds to an omission before filling up the empty values. I work myself with firm level data, and I know it could be tricky.
      (For example the establishment A could have closed in 2007, then the "A" is available as establishement ID, and will be attributed to the following establishement, openned in 2008.) So make sure that the "A" establishment shares some common properties (such as the address if you have it).

      Then, if you know what you're doing, the following code will do the trick :
      Code:
      *generate your example (extended to a second company)
      clear
      set obs 10
       generate str comp_ID="1" in 1
      . replace comp_ID = "1" in 2
      . replace comp_ID= "1" in 3
      . replace comp_ID = "1" in 4
      . replace comp_ID = "1" in 5
      . replace comp_ID = "1" in 6
      . replace comp_ID = "1" in 7
      . replace comp_ID = "2" in 8
      . replace comp_ID = "2" in 9
      . replace comp_ID = "2" in 10
      
      . generate year = 2005 in 1
      . replace year = 2005 in 2
      . replace year = 2006 in 3
      . replace year = 2006 in 4
      . replace year = 2007 in 5
      . replace year = 2008 in 6
      . replace year = 2008 in 7
      . replace year = 2005 in 8
      . replace year = 2006 in 9
      . replace year = 2007 in 10
      
      
      . generate str est_ID = "A" in 1
      . replace est_ID = "B" in 2
      . replace est_ID = "A" in 3
      . replace est_ID = "B" in 4
      . replace est_ID = "." in 5
      . replace est_ID = "A" in 6
      . replace est_ID = "B" in 7
      . replace est_ID = "A" in 8
      . replace est_ID = "." in 9
      . replace est_ID = "A" in 10
      
      *The code you want:
      egen ID=group(comp est)
      
      xtset ID year
      drop if est=="."
      tsfill
      sort ID est
      by ID   : replace est=est[_N]
      by ID  : replace comp=comp[_N]
      sort comp year est
      browse
      Some explanations :
      I first generate an ID for each pair of company - establishment (so for each establishment, even if two companies share the same establishment name).
      I then define a panel structure across this ID and the year variable, and drop missing values of estID.
      The tsfill command will reconfigure the panel structure for the gaps (here the year 2007, drop the line below) : it will give you the two obs per missing year.

      Then by establishment ID, I recover missing values from existing values.

      Hope this helps,
      Charlie

      Comment


      • #4
        Mathias was quicker than I was, and the core of the code is the same : -tsfill-.

        However, my solution is a littler more complex but (only) useful if two firms share the same establishment ID.

        Best,
        Charlie

        Comment


        • #5
          Originally posted by Charlie Joyez View Post
          I first generate an ID for each pair of company - establishment (so for each establishment, even if two companies share the same establishment name).
          I agree with Charlie, in the case of more than one company with the same establishment ID's, you will have to generate your initial group variable using both the comp_ID and and est_ID, like suggested in his post.

          Comment

          Working...
          X