  • Analyzing panel-data with multiple observations per time period (or two-way fixed effect)

    Hi all,

    I use Stata 15.1.
    I have daily data (T = 90) on Facebook link clicks (the DV) in three different cities (FE). However, for each day, each city has 5 observations (one for each of five age categories).
    I'm not sure how to run a model as a panel data which accounts for potential time series issues. Below is a small sample covering two days of the Facebook campaign.
    Thanks!
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 reportingstarts byte(campaign_day city_code day_of_week) float(weekend age_cat gender_i) byte linkclicks1 int(reach impressions)
    "2017-01-25" 1 2 4 0 2 2 1 1375 1384
    "2017-01-25" 1 2 4 0 2 1 1 1209 1219
    "2017-01-25" 1 2 4 0 4 2 0   20   21
    "2017-01-25" 1 2 4 0 3 1 0  464  465
    "2017-01-25" 1 2 4 0 4 1 0   31   31
    "2017-01-25" 1 2 4 0 3 2 0  456  459
    "2017-01-25" 1 3 4 0 2 1 1  399  400
    "2017-01-25" 1 3 4 0 4 1 0    7    7
    "2017-01-25" 1 3 4 0 3 2 0  396  401
    "2017-01-25" 1 3 4 0 3 1 0  161  162
    "2017-01-25" 1 3 4 0 2 2 1 1032 1036
    "2017-01-25" 1 3 4 0 4 2 0   19   19
    "2017-01-25" 1 4 4 0 4 2 0   12   13
    "2017-01-25" 1 4 4 0 2 1 0  234  235
    "2017-01-25" 1 4 4 0 3 1 1  106  106
    "2017-01-25" 1 4 4 0 3 2 0  383  384
    "2017-01-25" 1 4 4 0 4 1 0    8    8
    "2017-01-25" 1 4 4 0 2 2 1  848  852
    "2017-01-26" 2 2 5 1 2 1 1  956  962
    "2017-01-26" 2 2 5 1 4 1 0   17   18
    "2017-01-26" 2 2 5 1 3 2 0  377  380
    "2017-01-26" 2 2 5 1 2 2 0 1063 1068
    "2017-01-26" 2 2 5 1 3 1 0  389  389
    "2017-01-26" 2 2 5 1 4 2 0   15   15
    "2017-01-26" 2 3 5 1 4 1 0    9    9
    "2017-01-26" 2 3 5 1 2 2 2  871  873
    "2017-01-26" 2 3 5 1 2 1 1  345  345
    "2017-01-26" 2 3 5 1 4 2 0   20   20
    "2017-01-26" 2 3 5 1 3 1 1  136  137
    "2017-01-26" 2 3 5 1 3 2 1  324  326
    "2017-01-26" 2 4 5 1 4 2 0   13   14
    "2017-01-26" 2 4 5 1 3 2 0  342  343
    "2017-01-26" 2 4 5 1 2 2 1  799  801
    "2017-01-26" 2 4 5 1 2 1 0  252  253
    "2017-01-26" 2 4 5 1 4 1 0    3    3
    "2017-01-26" 2 4 5 1 3 1 0   93   93
    end
    label values city_code city
    label def city 2 "Ramet_Gan", modify
    label def city 3 "Rishon_lezion", modify
    label def city 4 "Tel-Aviv", modify
    label values day_of_week week_day
    label def week_day 4 "Wednesday", modify
    label def week_day 5 "Thursday", modify
    label values age_cat age_cat
    label def age_cat 2 "Age_25-34", modify
    label def age_cat 3 "Age_35-44", modify
    label def age_cat 4 "Age_45-54", modify
    label values gender_i gender_1
    label def gender_1 1 "Male", modify
    label def gender_1 2 "Female", modify

  • #2
    It seems to me that something went wrong with the -dataex- command (although I followed the routine as required).



    • #3
      I'm not sure how to run a model as a panel data which accounts for potential time series issues.
      Well, you can't define lags and leads if there is more than one observation per time period for a given panel. So you can't use the city as the panel. It also appears that, in your data, the duplication within cities extends beyond age categories to gender as well.
      Code:
      egen panel = group(city_code age_cat gender_i)
      xtset panel campaign_day
      This will allow you to use time-series operators and autoregressive structures in your analyses. The question, however, is whether this is, from other perspectives, a meaningful definition of a panel.

      The other question is what exactly you mean by "time series issues." If you are not going to be using lags/leads/seasonal diff/diff or autoregressive correlation structures in your analysis, then all of Stata's -xt- commands will work just fine without a time variable specified in your -xtset- command. So if you think that the meaningful definition of panel is just the city, and you don't need those features, just do -xtset city_code- and proceed.
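
      As a minimal sketch of how one might check which variables identify an observation before choosing a panel definition, assuming the -dataex- example from #1 is in memory (variable names are taken from that example):

      ```stata
      * Count how often each city-day pair occurs
      * (6 times per pair in the two-day sample from #1)
      duplicates report city_code campaign_day

      * Using city alone as the panel together with a time variable fails
      * with "repeated time values within panel", r(451)
      capture noisily xtset city_code campaign_day
      display _rc

      * Exits with an error unless these four variables jointly identify each row
      isid city_code age_cat gender_i campaign_day
      ```

      If -isid- runs silently, the group() identifier built from city_code, age_cat and gender_i has at most one observation per campaign_day, which is what -xtset- with a time variable requires.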



      • #4
        Thank you, Clyde. In fact, I want to test for autoregressive correlation (whether the number of clicks on day t is correlated with that of the previous days) and whether the day of the week matters (seasonality). So it seems that I'll have to specify a time variable. If no autocorrelation or seasonality is found, I'll feel more comfortable moving to Stata's -xt- commands.



        • #5
          Anat:
          as an aside to Clyde's helpful advice, please note that if you have a T>N panel dataset, Stata includes -xt- commands devised to deal with long panels and allow you to model autocorrelation in a richer way (see -xtregar- and -xtgls-).
          Kind regards,
          Carlo
          (Stata 19.0)
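
          A rough sketch of the two commands mentioned above, assuming the composite panel identifier from #3 (the variable panel) and the full 90-day data are in memory. The regressors are illustrative assumptions, and both commands fit linear models, so they treat the click count as if it were continuous:

          ```stata
          * Panel = city x age x gender, as defined in #3
          xtset panel campaign_day

          * Fixed-effects regression with AR(1) disturbances
          xtregar linkclicks1 impressions i.day_of_week, fe

          * Joint test of the day-of-week indicators (seasonality)
          testparm i.day_of_week

          * Feasible GLS: common AR(1) coefficient, panel-specific error variances
          xtgls linkclicks1 impressions i.day_of_week, panels(heteroskedastic) corr(ar1)
          ```

          The estimated AR(1) coefficient reported by either command gives a first impression of how strongly the errors are serially correlated.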



          • #6
            Thank you very much, Carlo. Another feature of my data is that the DV (a count variable, i.e. the number of clicks) occasionally takes the value zero.
            I would ideally want to apply a model like the ZIP (zero-inflated Poisson) model, but I can't ignore the fact that I have a time dimension.
            The question is: if I can confirm that no serial autocorrelation exists, could I use the ZIP while controlling for seasonality (day of the week) and trend (campaign_day)?



            • #7
              Anat:
              The Poisson distribution allows zero counts.
              ZIP is a reasonable option when the number of zeros in your data exceeds the number expected under the Poisson distribution that fits your data, given its lambda parameter (the mean and, in theory, also the variance).
              Kind regards,
              Carlo
              (Stata 19.0)
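
              One way to look at this, sketched under the assumption that the data from #1 are in memory, is to compare the observed share of zeros with the share that a fitted Poisson model predicts. The covariates and the inflate() variable below are placeholders, not a recommended specification:

              ```stata
              * Observed share of zero counts
              generate byte is_zero = (linkclicks1 == 0)
              summarize is_zero

              * Share of zeros predicted by a Poisson model with the same covariates
              poisson linkclicks1 i.day_of_week campaign_day
              predict double p0, pr(0)
              summarize p0

              * Zero-inflated Poisson with a hypothetical inflation equation
              zip linkclicks1 i.day_of_week campaign_day, inflate(i.age_cat)
              ```

              If the mean of is_zero is clearly larger than the mean of p0, the data contain more zeros than the Poisson model expects, which is the situation ZIP is meant for.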



              • #8
                Hello Carlo, I have a problem when I try to perform an autocorrelation test. Stata says "sample may not include multiple panels". Could you explain what this means, and which command I should use to test for autocorrelation in Stata? I would appreciate your feedback.



                • #9
                  Gabriela:
                  welcome to this forum.
                  If you have repeated time values within panel in your dataset, you can -xtset- your data with the -panelid- only. Otherwise, Stata will return an error message, as you can see from the following toy-example:
                  Code:
                  . set obs 2
                  number of observations (_N) was 0, now 2
                  
                  . g id=1
                  
                  . g year=2018
                  
                  . xtset id
                         panel variable:  id (balanced)
                  
                  . xtset id year
                  repeated time values within panel
                  r(451);
                  As the community-contributed command -xtserial- (as per the FAQ, you should have mentioned that it is community-contributed) needs -timevar- to work, in your case it obviously cannot do its job.
                  I'm not aware of other tests for detecting autocorrelation that allow repeated time values within panel (but this may well be my fault).
                  Although you do not report anything about your panel, let's assume that it is N>T and so you go with -xtreg-. If you detect heteroskedasticity in the distribution of the idiosyncratic error, you should invoke cluster-robust standard errors to deal with it. Under -xtreg-, cluster-robust standard errors accommodate autocorrelation, too (without the need to test for it).
                  Kind regards,
                  Carlo
                  (Stata 19.0)
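
                  In the spirit of the toy example above, a minimal simulated sketch of -xtreg- with cluster-robust standard errors when the data are -xtset- with the panel identifier only. All variable names and numbers are made up for illustration:

                  ```stata
                  clear
                  set seed 12345
                  set obs 200
                  generate id = ceil(_n/4)            // 50 panels, 4 observations each
                  generate year = 2018                // repeated time values within panel
                  generate x = rnormal()
                  generate y = 1 + 0.5*x + rnormal()

                  * No time variable, so the repeated years are not a problem
                  xtset id

                  * Fixed effects with standard errors clustered on the panel identifier
                  xtreg y x, fe vce(cluster id)
                  ```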



                  • #10
                    Hi all,
                    My data set is as follows.
                    Year  Acquirer                A_CUSIP   A_ISIN        A_Ticker
                    1999  3COM CORPORATION        88553510  US8855351040  COMS
                    1999  3COM CORPORATION        88553510  US8855351040  COMS
                    2000  3COM CORPORATION        88553510  US8855351040  COMS
                    2000  3COM CORPORATION        88553510  US8855351040  COMS
                    2000  3COM CORPORATION        88553510  US8855351040  COMS
                    2000  3COM CORPORATION        88553510  US8855351040  COMS
                    2009  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2009  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2009  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2010  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2010  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2010  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2010  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2010  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2011  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2011  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2011  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2011  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2011  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    2011  3D SYSTEMS CORPORATION  88554D20  US88554D2053  DDD
                    I have more than one observation per year, so I assume I cannot use fixed effects, because the panels have more than one observation per period (the same company has two or more entries per year). I have to use Poisson and logit models in the analysis. Can I use a fixed-effects model, or how else can I account for time and firm effects in the regression?
                    Last edited by Inoka Kumari; 24 Mar 2021, 19:59.



                    • #11
                      In the example data you show, the multiple observations in a given year are, in fact, all identical. If this represents your full data, then I would have to ask you what the point is of having all these repetitions of identical observations. It suggests something went wrong in the data management that created the data set, and you should review that.
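
                      A short sketch of how such repetitions could be inspected, assuming the full data set is in memory and uses the variable names shown in #10 (n_extra is a made-up name):

                      ```stata
                      * Observations that are identical on every variable in the data set
                      duplicates report

                      * Firm-years that appear more than once
                      duplicates report A_CUSIP Year

                      * n_extra = number of additional copies of each firm-year
                      duplicates tag A_CUSIP Year, generate(n_extra)
                      ```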

                      That said, the example data also shows nothing on which you would perform a Poisson or logistic regression, so presumably there are other variables in the data set, and perhaps the observations do show differences in those variables. If this is the case, you can still use the -xt- suite of commands for fixed-effects models. Just do not specify a time variable when you -xtset- the data: use -xtset firm_id-, where firm_id is some numeric variable that identifies the firms uniquely. You will still be able to use -xtlogit- or -xtpoisson-. And you will be able to specify a time fixed effect in your model by including i.year among your predictor variables. The only things you won't be able to do are use lag or forward operators, or estimate autoregressive structure.
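
                      A minimal simulated sketch of this set-up. The outcomes n_deals and any_deal and the predictor x are hypothetical stand-ins, since the thread does not show the actual variables; with the real data, a numeric identifier could be built with something like -egen firm_id = group(A_CUSIP)-:

                      ```stata
                      clear
                      set seed 2021
                      set obs 300
                      generate firm_id = ceil(_n/6)                  // 50 firms, 6 rows each
                      generate Year = 1999 + mod(_n - 1, 3)          // each firm-year appears twice
                      generate x = rnormal()
                      generate n_deals = rpoisson(exp(0.2 + 0.3*x))  // hypothetical count outcome
                      generate byte any_deal = (n_deals > 0)         // hypothetical binary outcome

                      * Panel identifier only: repeated years within a firm are allowed
                      xtset firm_id

                      * Firm fixed effects plus year indicators as time effects
                      xtpoisson n_deals x i.Year, fe
                      xtlogit any_deal x i.Year, fe
                      ```

                      Note that the fixed-effects estimators drop firms whose outcome never varies (all zeros for -xtpoisson-, all zeros or all ones for -xtlogit-), so the estimation sample can be smaller than the full data.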



                      • #12
                        Hello Clyde,
                        Thank you very much for your prompt reply. Apart from the above variables, I have different observations.
