No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unbalanced panel data when applying business calendar

    I have a problem with unbalanced panel data in Stata 14. I have data for a 30 day period, and I want my function to omit non-trading days, using the business calendar function. I use the the business calendar function as I, as an example, want Stata to treat the lagged variable for monday as friday (or previous trading day). However, when I try to set my data to panel data using xtset, they are not balanced.

    My full process:

    I Loaded Excel data in the following format (cmpny collumn contains 9 companies).
    Click image for larger version

Name:	Stata.JPG
Views:	1
Size:	39.5 KB
ID:	1458432

    And ran the commandos in the following order:

    egen cmpnynum = group(cmpny) gen date2 = date(date,"MDY") bcal load BCAF gen date3 = bofd("BCAF", date2) xtset cmpnynum date3 xtset volume return
    And my business calendar looks like this:

    1. purpose "remove non trading days from sample" 2. dateformat mdy 3. range may172018 june152018 4. centerdate may172018 5. omit dayofweek (sa su) 6. omit date may282018
    I ran the commandos because I wanted to group companies using an ID variable (cmpnynum). I wanted to create a non-string variable for the dates, and I wanted to load the business calendar called BCAF, and lastly, set the data to panel. When I ran the last command, I got the error message that panels are unbalanced.

    I have looked over my data, as it less than 200 rows. There are no missing return or volume values. Here are summary statistics after the commandos are applied:

    summarize Variable | Obs Mean Std. Dev. Min > Max -------------+----------------------------------------------- > ---------- cmpny | 0 date | 0 return | 189 .0026109 .0121749 -.0363594 > .0267003 volume | 189 1667010 1807973 9684 > 8029422 sentiment | 270 .0227281 .0822689 -.2629529 > .2539589 -------------+----------------------------------------------- > ---------- cmmpnynum | 270 5 2.586784 1 > 9 date2 | 270 21335.5 8.671515 21321 > 21350 date3 | 189 10 6.071384 0 > 20

    Does anyone know how to get the data to be balanced? Does it have anything to do with how I apply the business calendar? I hope the question is not too trivial. I looked at Youtube, forums posts and Stata's own manuals. Maybe I overlooked something. Any help is really appreciated!
    Attached Files
    Last edited by Hjalmar Jernstrom; 16 Aug 2018, 06:18.

  • #2
    You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output [what you posted is extremely hard to read], and sample data using dataex. Do include returns (i.e., put commands on separate lines) when you include the code - a long single line of code is hard to read. Many of us won't open files, and pictures are not usable. You're asking us to help us - it behooves you to make it easy for us to do so.

    One way if you have no observations on the weekends is to create an arbitrary date number that will be sequential. If the data include weekends but are missing, you could drop the weekend days and then generate the date number. Something like:

    bysort cmpny: g datenum=_n

    There is probably an egen command that does this also. [I didn't check that this works since you didn't provide usable data.] The date numbers won't be consistent across firms if all the firms don't cover the same set of dates.

    Another alternative would be to do it manually - if you only have 30 dates, you can explicitly assign sequential numbers to each business day. You could do it with a bunch of replace statements. This would take a little code but it would be repetitive so not hard to do. A long encode or recode statement could also do this. While the better programmers usually can offer more elegant ways to do things, sometimes it is faster for a beginner to just do a bunch of replaces or whatever.


    • #3
      Phil Bromiley is right in that dropping observations for non-business days will help make your dataset balanced. Please note that your business calendar date (date3) will automatically be sequential with no gaps (except for missing values for non-business days). So, if you simply drop non-business days with

      drop if missing(date3)
      you will get strongly balanced data with

      xtset cmpnynum date3
      I assume here you will have the same business days for other companies as well. I tried creating your dataset and your business calendar file the best I could, and this code hopefully does what you are looking for:

      egen cmpnynum = group(cmpny)
      gen date2 = date(date,"MDY")
      format %td date2
      bcal load BCAF
      gen date3 = bofd("BCAF", date2)
      drop if missing(date3)
      xtset cmpnynum date3
      -- Kreshna


      • #4
        Thanks alot Kreshna Gopal and Phil Bromiley!