  • Panel data analysis FE, issue with observation

    Hi,

    Currently I am running a small panel data analysis on data based upon oil consumption of countries. The plan was to look at about 50 countries and their consumption.

    While following guides from my tutors, I ran into the following issue: my FE analysis reports fewer observations than my actual data contain, and I cannot figure out why. I am new to Stata, so I welcome all feedback and will try to elaborate as much as possible.

    Here is my current output, in the order the commands were run; the FE analysis is at the bottom.

    As you can tell, the FE observation count is lower than the others. As I am still pretty green with Stata, my problem-solving skills are not refined at all.

    Hopefully someone will be able to take the time to spot a simple solution. If it requires a more elaborate solution, I am all ears.

    Thank you for reading through this and for your time; hopefully someone can assist me with this.
    Code:
    . describe id t Y2020 Y2022
    
                  storage   display    value
    variable name   type    format     label      variable label
    ------------------------------------------------------------------------------------------
    id              byte    %10.0g                id
    t               byte    %10.0g                t
    Y2020           double  %10.0g                Y2020
    Y2022           double  %10.0g                Y2022
    
    . summarize id t Y2020 Y2022
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              id |        102          26    14.79229          1         51
               t |        102         1.5    .5024692          1          2
           Y2020 |         52    1573.711    3054.181          0   17183.32
           Y2022 |         52    1715.656    3255.625          0   19140.24
    
    . sort id t
    
    . xtset id t
    panel variable: id (strongly balanced)
    time variable: t, 1 to 2
    delta: 1 unit
    
    . xtdescribe
    
          id:  1, 2, ..., 51                                     n =         51
           t:  1, 2, ..., 2                                      T =          2
               Delta(t) = 1 unit
               Span(t)  = 2 periods
               (id*t uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             2       2       2         2         2       2       2
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+---------
           51    100.00  100.00 |  11
     ---------------------------+---------
           51    100.00         |  XX
    
    . xtsum id t Y2020 Y2022
    
    Variable         |      Mean  Std. Dev.        Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    id       overall |        26   14.79229          1         51 |     N =     102
             between |             14.86607          1         51 |     n =      51
             within  |                    0         26         26 |     T =       2
                     |                                            |
    t        overall |       1.5   .5024692          1          2 |     N =     102
             between |                    0        1.5        1.5 |     n =      51
             within  |             .5024692          1          2 |     T =       2
                     |                                            |
    Y2020    overall |  1573.711   3054.181          0   17183.32 |     N =      52
             between |             3065.201   58.95946   17183.32 |     n =      51
             within  |             341.1028  -148.7734   3296.196 | T-bar = 1.01961
                     |                                            |
    Y2022    overall |  1715.656   3255.625          0   19140.24 |     N =      52
             between |             3264.946   69.38343   19140.24 |     n =      51
             within  |             383.7324   -222.098   3653.409 | T-bar = 1.01961
    
    . reg Y2020 Y2022
    
          Source |       SS           df       MS      Number of obs   =         2
    -------------+----------------------------------   F(1, 0)         =         .
           Model |   5933905.8         1   5933905.8   Prob > F        =         .
        Residual |           0         0           .   R-squared       =    1.0000
    -------------+----------------------------------   Adj R-squared   =         .
           Total |   5933905.8         1   5933905.8   Root MSE        =         0
    
    ------------------------------------------------------------------------------
           Y2020 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           Y2022 |  -.8889079          .        .       .            .           .
           _cons |   3444.969          .        .       .            .           .
    ------------------------------------------------------------------------------
    
    
    . xtreg Y2020 Y2022, fe
    
    Fixed-effects (within) regression               Number of obs     =          2
    Group variable: id                              Number of groups  =          1
    
    R-sq:                                           Obs per group:
         within  = 1.0000                                         min =          2
         between =      .                                         avg =        2.0
         overall = 1.0000                                         max =          2
    
                                                    F(1,0)            =          .
    corr(u_i, Xb)  =      .                         Prob > F          =          .
    
    ------------------------------------------------------------------------------
           Y2020 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           Y2022 |  -.8889079          .        .       .            .           .
           _cons |   3444.969          .        .       .            .           .
    -------------+----------------------------------------------------------------
         sigma_u |          .
         sigma_e |          .
             rho |          .   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(0, 0) = .                           Prob > F =      .
    Last edited by Hybrid Dylan; 20 Sep 2023, 10:01.

  • #2
    Dylan (I suppose):
    welcome to this forum.
    I am not clear on how you -xtset- your data.
    In addition, it seems that the -e(sample)- of your regression consists of a single panel with two observations (the real issue is to understand how things turned out this way: missing values? constant values wiped out by the -fe- estimator?), and the output table is sadly consistent with that.
    You may want to share an excerpt/example of your data via -dataex- (please see the FAQ on how to do it. Thanks).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      It seems that your dataset is mangled. You have 102 observations overall: 51 countries crossed with 2 time periods. But Y2020 and Y2022 each have only 52 non-missing observations, implying that each is missing in nearly half of the data. I'm going to guess that, in general, Y2020 is non-missing only when t = 1, and Y2022 is non-missing only when t = 2. Of course, that would account for only 51 observations of each variable, not 52. Looking at the regression results, your sample size for -reg- is N = 2. So, alongside the observations I have accounted for, there must be two observations somewhere in the data set that each contain non-missing values for both variables. That would give you N = 2 for -reg-. And if both of those observations belong to the same country, -xtreg- would report a single group (Number of groups = 1), which is what you have.

      This data organization is not usable for -reg- or -xtreg-. Those commands will only deal with observations that contain non-missing values for both Y2020 and Y2022, and you must have enough observations like that to support the intended analysis.
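
      A quick way to check this guess is to count the observations in which both variables are non-missing and to see which time periods carry each variable. The sketch below is only an illustration: the six-row dataset and its values are hypothetical, built to mimic the layout guessed at above, and are not the original poster's data.

      ```stata
      * Hypothetical toy data (3 countries, 2 periods; values are made up).
      * Y2020 is filled only when t == 1 and Y2022 only when t == 2,
      * except for country 3, whose two rows carry both variables.
      clear
      input byte id byte t double Y2020 double Y2022
      1 1 100   .
      1 2   . 120
      2 1 200   .
      2 2   . 210
      3 1 300 330
      3 2 305 335
      end

      * Observations usable by -regress- or -xtreg-: both variables non-missing
      count if !missing(Y2020, Y2022)

      * Which time periods carry each variable
      tabulate t if !missing(Y2020)
      tabulate t if !missing(Y2022)

      * Summary of the missing-value patterns across the two variables
      misstable patterns Y2020 Y2022

      * The estimation sample is limited to the complete observations
      regress Y2020 Y2022
      display "observations used by -regress-: " e(N)
      ```

      In this toy layout only the two rows of country 3 enter the regression, which reproduces the kind of two-observation, one-group sample seen in the output above.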

      You need to revisit how this data set got created and see if there is a way to fix it into a suitable dataset. If your purpose is to model the relationship between Y2020 and Y2022, that would involve reducing the data set to 51 observations (one for each country), with a non-missing value for Y2020 and Y2022 in each observation. But this would not be a panel data set and -xtreg- would not be applicable.
      Code:
      collapse (firstnm) Y2020 Y2022, by(id)
      regress Y2020 Y2022 // PLEASE SEE ADDENDUM AT END OF THIS POST
      Creating a true panel data set would be different, but easy from there:
      Code:
      reshape long Y, i(id) j(year)
      gen t = cond(year == 2020, 1, 2)
      xtset id year
      This will enable you to perform general panel data analyses, but relating Y2020 to Y2022 would be done differently:
      Code:
      xtreg Y i.t, fe
      The coefficient of 2.t would then give you an estimate of the within-country difference in Y between t = 1 and t = 2. It would, in fact, be equivalent to a paired t-test of Y2020 = Y2022.
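
      The equivalence with the paired t-test can be checked on a small example. This is a sketch under stated assumptions: the four-country dataset and its values are hypothetical, and the scalar names (diff_paired, t_paired, diff_fe, t_fe) are introduced here only for illustration.

      ```stata
      * Hypothetical wide data: one row per country (values are made up)
      clear
      input byte id double Y2020 double Y2022
      1 100 120
      2 200 210
      3 300 330
      4 150 145
      end

      * Mean within-country difference and paired t statistic on the wide layout
      gen double diff = Y2022 - Y2020
      quietly summarize diff
      scalar diff_paired = r(mean)
      ttest Y2022 == Y2020
      scalar t_paired = r(t)
      drop diff

      * Same comparison as a fixed-effects regression on the long layout
      reshape long Y, i(id) j(year)
      gen byte t = cond(year == 2020, 1, 2)
      xtset id t
      xtreg Y i.t, fe
      scalar diff_fe = _b[2.t]
      scalar t_fe = _b[2.t] / _se[2.t]

      * The two approaches should agree up to rounding
      display "mean difference: paired = " scalar(diff_paired) ", xtreg = " scalar(diff_fe)
      display "t statistic:     paired = " scalar(t_paired) ", xtreg = " scalar(t_fe)
      ```

      With only two periods per country, the fixed-effects estimator works on the within-country change, so the coefficient on 2.t and its t statistic should match the paired comparison.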

      In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted time, and you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      When asking for help with code, always show example data. When showing example data, always use -dataex-.
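
      As a minimal sketch of what a -dataex- call looks like, the example below uses Stata's built-in auto dataset purely so that it runs on its own; the choice of variables and the 1/10 range are arbitrary assumptions for illustration.

      ```stata
      * Built-in example dataset, used here only so the sketch is self-contained
      sysuse auto, clear

      * Print a copy-and-paste-ready listing of the first 10 observations
      * for the variables relevant to the question
      dataex make price mpg in 1/10
      ```

      For the data in this thread, the analogous call would presumably list id, t, Y2020 and Y2022; the output, including its enclosing code delimiters, is then pasted into the forum post as is.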

      ADDENDUM: Although this is a perfectly legal analysis from a perspective of Stata syntax, and is even legitimate statistically, it is counterintuitive to do this in the real world. I'm assuming here that 2020 and 2022 are references to the recent calendar years. Because causality always moves forward in time, it is odd to regress the earlier value on the later one. Usually it is done the other way so that there is at least some potential to provide a causal interpretation to the results. I can imagine circumstances where you want to do it the way you have, where causality is not relevant to the goals of the analysis, but these are all relatively uncommon situations. So just think about this issue and proceed, or change course, accordingly.

      Also Added: Crossed with #2.


      Last edited by Clyde Schechter; 20 Sep 2023, 10:13.
