  • AIC & BIC to decide between different aggregation levels of data?

    I am running an OLS estimation (-areg-) on a panel data set.
    I would like to choose between different aggregation levels of the data using AIC and BIC.

    Aggregation level 1: panel id: countries (e.g., Germany), time: weeks
    Aggregation level 2: panel id: regions (e.g., Europe), time: weeks

    The number of observations is substantially smaller at level 2. I run the same estimation command (-areg-) at both aggregation levels; the only difference is the panel id that is absorbed.

    I get the following AIC & BIC results (-estat ic-):
    Level 1:
    Code:
    . estat ic

    Akaike's information criterion and Bayesian information criterion

    -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
               . |  1,130,616  80749.49   203988.7      13   -407951.4  -407796.2
    -----------------------------------------------------------------------------
    Note: N=Obs used in calculating BIC; see [R] BIC note.
    Level 2:
    Code:
    . estat ic

    Akaike's information criterion and Bayesian information criterion

    -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
               . |    167,349  35490.21   53725.38      13   -107424.8  -107294.4
    -----------------------------------------------------------------------------
    Note: N=Obs used in calculating BIC; see [R] BIC note.
    Taken at face value, AIC and BIC favor aggregation level 1, since both values are lower there.
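
As a quick arithmetic check, the reported values follow from the standard definitions AIC = -2*ll(model) + 2*df and BIC = -2*ll(model) + df*ln(N). A minimal sketch that plugs in the numbers from the two tables; the scalar names are made up for illustration:

```stata
* Level 1: ll(model) = 203988.7, df = 13, N = 1,130,616
scalar aic1 = -2*203988.7 + 2*13
scalar bic1 = -2*203988.7 + 13*ln(1130616)

* Level 2: ll(model) = 53725.38, df = 13, N = 167,349
scalar aic2 = -2*53725.38 + 2*13
scalar bic2 = -2*53725.38 + 13*ln(167349)

* Show AIC and BIC for each level with one decimal place
display %12.1f aic1 "  " %12.1f bic1
display %12.1f aic2 "  " %12.1f bic2
```

Both pairs should agree with the -estat ic- output up to rounding of the displayed log likelihoods.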

    Questions:
    1) Are AIC & BIC appropriate to decide between different aggregation levels of the data?
    2) I am also thinking about running a separate regression for each panel, which would leave me with time series data. Can AIC & BIC also be used to decide between a time series and a panel data structure?

    Thanks for any comments in advance.

    Christian

  • #2
    The level of the panel is normally seen more as an issue of theory than of fit. That is, what level of stable effects do you need to control for? I don't remember anyone using fit per se as a reason for choosing a specific level. With a sample this large, I strongly suspect there is an even more micro level at which your sample is really being constructed. That is, you must have individuals or companies or something similar, with multiple years per individual or company; you can't get 1 million observations from countries over time. Normally, you would xtset the data at the lower level (individual or company or whatever) so that there is only one observation per time period in each panel. This lets you use lag operators without worrying about lagging across panels. Then xtreg with the fe option would put in the fixed effects at that level. This would control for stable differences at the individual or company level.
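
A minimal sketch of that setup, using simulated stand-in data because the actual variables are not shown in the thread. Every name here (unit, week, x, y) is hypothetical, and the data-generating lines only exist to make the example runnable:

```stata
* Simulated stand-in data: 6 units observed for 10 weeks each
clear
set seed 12345
set obs 60
generate unit = ceil(_n/10)
bysort unit: generate week = _n
generate x = rnormal()

* Declare the panel at the lowest level so L. never reaches across units
xtset unit week
generate y = 0.5*L.x + unit + rnormal()

* Fixed effects at the unit level; the first week of each unit is lost to the lag
xtreg y L.x, fe
display e(N) "  " e(N_g)
```

With 6 units and one week per unit lost to the lag, the fit uses 54 observations in 6 groups.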

    There is another problem in your results. You have radically different samples in the two estimates. You cannot compare an AIC or BIC from a sample with 1.1 million observations to one from a sample with 170,000 observations. The log likelihood is a sum over observations, so sample size alone drives much of the difference, and you don't know whether you'd get different results if you used the same sample for both estimates. I would start by understanding why the sample sizes differ so much: why does level 2 keep only about 15% of the level 1 observations?
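
One way to put both fits on the same sample, sketched with the auto data that ships with Stata as a stand-in. This assumes both models are fit on the same underlying rows and differ only in the absorbed variable; rep78 and foreign merely play the roles of the finer and coarser grouping, and the names common, fine, and coarse are made up for illustration:

```stata
* Stand-in data shipped with Stata; rep78 is missing for a few cars
sysuse auto, clear

* Flag observations that are complete for every variable used in either model
generate byte common = !missing(price, mpg, weight, rep78, foreign)

* Same regressors, same sample, different absorbed grouping variable
areg price mpg weight if common, absorb(rep78)
estimates store fine
areg price mpg weight if common, absorb(foreign)
estimates store coarse

* Side-by-side AIC and BIC computed on the identical sample
estimates stats fine coarse
```

With real data, swap in the actual variables and keep the `if common` restriction on both fits, so that the two rows of the -estimates stats- table refer to the same observations.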


    • #3
      Thank you for your helpful comments!!!
      I understand that econometric tests will not answer my questions.
