Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to impute quarterly panel data based on semiannual panel data by copying previous obs.?

    Hello there, I have a panel with variables fundno, fdate, stock_id, stock_price and holdings. Many of fundno - stock_price pairs are observed twice a year( second and fourth quarter) before 2004 and I want to impute first and third quarterly obs. before 2004 by carrying forward the previous fdate's holding for each fundno - stock_id pair. I tried to use "fillin fundno stock_id fdate" to get a balanced panel and then delete consecutive missing obs. (>=2) for each fundno - stock_id pair and then use carryforward, but the balanced data got so huge that my laptop couldn't handle. Plus I couldn't find a good way to delete consecutive missing obs.(tried tsspell but it didn't work) Is there any better way to this?

    Thanks,
    Hank

  • #2
    Hi Hank, and welcome to Statalist!

    1) So, you may not need to fill in the missing data--you're not really adding any "new" information to the dataset, and Stata handles unbalanced panels just fine. (Search Statalist for "unbalanced")

    2) If you do decide to fill in, you could do something like:

    Code:
    bysort fundno stock_id (fdate): replace stock_price = stock_price[_n-1] if stock_price[_n-1]==.
    
    * There are probably other ways to find two consecutive missing
    gen consec_missing = 0
    bysort fundno stock_id (fdate): replace consec_missing =1 if stock_price==. & stock_price[_n-1]==.
    Also, it would be super helpful if you could share a sample of your data using Stata's dataex command. if it helps, I created a Youtube tutorial here. (I made it too long--feel free to watch at 2x speed, and you may only need the first 6 minutes)

    Note that my code above assumes you have been able to add the missing quarters between the observed quarters. If not, then you will want to use the fillin command to fill in for missing quarters. If that is where you are having problems, again, the best way to get help is to (a) post a sample of your data, (b) show the command that you ran, and then (c) list the output or error that Stata returned. Also, I would tell us what time period this covers. (So we know how many missing quarters it is adding)

    (For example, if your data is by mutual fund, quarter, and then the list of actual stocks held by the mutual fund, and if a mutual fund holds 100-200 stocks on average, then using fillin will dramatically expand the size of your dataset). Also, note that "fillin fundno stock_id fdate" will add a placeholder for every stock_id that ever shows up in your data to each mutual_fund fdate combination (even if that mutual fund never holds that particular stock).
    Last edited by David Benson; 16 Feb 2019, 15:43.

    Comment


    • #3
      Originally posted by David Benson View Post
      Hi Hank, and welcome to Statalist!

      1) So, you may not need to fill in the missing data--you're not really adding any "new" information to the dataset, and Stata handles unbalanced panels just fine. (Search Statalist for "unbalanced")

      2) If you do decide to fill in, you could do something like:

      Code:
      bysort fundno stock_id (fdate): replace stock_price = stock_price[_n-1] if stock_price[_n-1]==.
      
      * There are probably other ways to find two consecutive missing
      gen consec_missing = 0
      bysort fundno stock_id (fdate): replace consec_missing =1 if stock_price==. & stock_price[_n-1]==.
      Also, it would be super helpful if you could share a sample of your data using Stata's dataex command. if it helps, I created a Youtube tutorial here. (I made it too long--feel free to watch at 2x speed, and you may only need the first 6 minutes)

      Note that my code above assumes you have been able to add the missing quarters between the observed quarters. If not, then you will want to use the fillin command to fill in for missing quarters. If that is where you are having problems, again, the best way to get help is to (a) post a sample of your data, (b) show the command that you ran, and then (c) list the output or error that Stata returned. Also, I would tell us what time period this covers. (So we know how many missing quarters it is adding)

      (For example, if your data is by mutual fund, quarter, and then the list of actual stocks held by the mutual fund, and if a mutual fund holds 100-200 stocks on average, then using fillin will dramatically expand the size of your dataset). Also, note that "fillin fundno stock_id fdate" will add a placeholder for every stock_id that ever shows up in your data to each mutual_fund fdate combination (even if that mutual fund never holds that particular stock).
      Hi David,

      Thanks for such a timely detailed response! Your code worked well! I'm replicating a paper in corporate finance, which adds new quarters between the observed quarters and imputes data in the way I mentioned. By creating a balanced panel with the command "fillin" and deleting consecutive missing data, I achieved the goal. Because I'm a new user, I'm not familiar with code posting here, but I'll try to do so with dataex next time.

      Best,
      Hank

      Comment

      Working...
      X