  • Panel data with irregularly spaced observations


    I am using survey data in which 500,000 firms are surveyed once every three years, in either the 2nd or the 4th quarter. For instance, for the 2005 wave a firm could have been surveyed once during any of the periods 2002Q4 to 2005Q2; for the 2008 wave, once during any of the periods 2005Q4 to 2008Q2; and so on. The spacing between any two observations therefore varies from firm to firm, and I have between 2 and 5 observations per firm. My DVs are firm employment and employment growth. My independent variables include firm age and several initial conditions measured when the firm first appears in the dataset, such as initial employment and initial productivity. Controls include covariates varying by location-time, plus location, year, and industry fixed effects, and I want to cluster standard errors by firm. How do I account for the irregular spacing over time, for both the employment and the employment growth variables? Thanks, Dana

  • #2
    Dana:
    you may want to take a look at -mi- or -ipolate- for imputing missing observations (provided these options match your needs and are theoretically consistent with your data).
    Otherwise, you may want to let Stata deal with your unbalanced panel dataset.
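For reference, -ipolate- fills a missing value of one variable by linear interpolation on another (here, a time index), using the nearest non-missing points on either side; without the -epolate- option, gaps before the first or after the last observation stay missing. The Python sketch below mimics that idea for a single firm. It is an illustration only, not Stata's implementation; the function name, the quarterly index, and the use of None for missing values are assumptions of the sketch.

```python
def interpolate_linear(times, values):
    """Linearly interpolate missing values (None) in `values` over `times`.

    Hypothetical helper mimicking the idea behind Stata's -ipolate- without
    -epolate-: points before the first or after the last observed value
    stay missing.
    """
    # Observed (time, value) pairs, sorted by time.
    known = sorted((t, v) for t, v in zip(times, values) if v is not None)
    result = []
    for t, v in zip(times, values):
        if v is not None:
            result.append(float(v))
            continue
        # Nearest observed points strictly before and after t.
        left = [(tk, vk) for tk, vk in known if tk < t]
        right = [(tk, vk) for tk, vk in known if tk > t]
        if not left or not right:
            result.append(None)  # no extrapolation
            continue
        (t0, v0), (t1, v1) = left[-1], right[0]
        result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


# Hypothetical firm observed in quarters 0 and 12, missing in between.
print(interpolate_linear([0, 3, 6, 9, 12], [10, None, None, None, 22]))
```

Note that interpolated values are model-free guesses that treat the path between two surveys as a straight line, which is exactly the kind of assumption one should check against the data before relying on it.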
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you, Carlo. I want to stay away from imputation for now and use all the data I have. But using -xtset- and treating this as an unbalanced panel does not seem right to me, since it is not a case of missing observations; rather, the observations are at irregularly spaced intervals. (I am not even sure what the correct time variable to use with -xtset- would be.)



      • #4
        I don't know what sort of modeling you have in mind for these data. But if firm employment means the number of people employed by the firm at a given time, and you want to model how it evolves over time, I would first define a reference starting date (possibly separately for each firm, perhaps the first time it appears in the data), then calculate a new variable, elapsed time = current_date - reference_date, and then fit an -xtpoisson- model with the elapsed time variable specified in the -exposure()- option.
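As a sketch of the elapsed-time step, here is a Python version that converts each survey's year and quarter into a quarter count (with 1960q1 as 0, the origin Stata uses for %tq dates), takes each firm's first appearance as the reference date, and expresses elapsed time in years. The row layout and field names are hypothetical. Because -exposure()- enters the model as the log of the variable with its coefficient constrained to 1, elapsed time must be strictly positive; rows at the reference date itself have zero elapsed time, and since ln(0) is undefined the sketch leaves their log as None rather than guessing a value.

```python
import math


def quarter_index(year: int, quarter: int) -> int:
    """Quarters since 1960q1, the origin Stata uses for %tq dates."""
    return (year - 1960) * 4 + (quarter - 1)


def add_elapsed_time(rows):
    """Add elapsed time (in years) since each firm's first survey, plus its log.

    rows: list of dicts with keys 'firm', 'year', 'quarter' (hypothetical layout).
    The log is None where elapsed time is 0, because ln(0) is undefined.
    """
    # Reference date: earliest survey quarter observed for each firm.
    first = {}
    for r in rows:
        q = quarter_index(r["year"], r["quarter"])
        first[r["firm"]] = min(q, first.get(r["firm"], q))
    for r in rows:
        q = quarter_index(r["year"], r["quarter"])
        elapsed = (q - first[r["firm"]]) / 4.0  # quarters -> years
        r["elapsed_years"] = elapsed
        r["log_exposure"] = math.log(elapsed) if elapsed > 0 else None
    return rows


# Hypothetical firm surveyed in 2002q4, 2005q4 and 2008q2.
example = [
    {"firm": "A", "year": 2002, "quarter": 4},
    {"firm": "A", "year": 2005, "quarter": 4},
    {"firm": "A", "year": 2008, "quarter": 2},
]
for r in add_elapsed_time(example):
    print(r["year"], r["quarter"], r["elapsed_years"], r["log_exposure"])
```

Measuring elapsed time in years rather than quarters only rescales the exposure, which shifts the constant in a log-link model and leaves the other coefficients unchanged.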

        You can still -xtset firm-. I would probably avoid specifying a time variable in the -xtset- for this kind of data and situation; you won't need it anyway as it's really only critical for things like lags and leads or autoregressive correlation structures. But in irregularly spaced data you can't really use those anyway.



        • #5
          I hadn't thought of treating employment as a count variable, so yes, I could use -xtpoisson-. Thank you. But my main DVs are actually continuous variables, namely firm performance measured as growth or productivity, so -xtpoisson- won't help here. I don't use immediate lags or leads, but my main independent variables include the starting position of the firm: the total number of employees or the productivity the first time the firm appears in the data, which is typically when it is newly born, since this is a database of new firms.
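The initial-condition regressors described here (employment or productivity at the firm's first appearance) can be built by ordering each firm's observations in time and copying the first value to every row of that firm. In Stata this is commonly written along the lines of -bysort firm (qdate): generate init_emp = emp[1]-, where firm, qdate, and emp are placeholder names. A Python sketch of the same logic, assuming a hypothetical list-of-dicts layout and a hypothetical helper name:

```python
def add_initial_conditions(rows, time_key, variables):
    """Attach each firm's first-observed value of every variable in `variables`.

    rows: list of dicts with a 'firm' key (hypothetical layout).
    time_key: field that orders observations within a firm.
    New fields are named 'init_<var>'.
    """
    # Find the earliest row for each firm.
    first_row = {}
    for r in rows:
        f = r["firm"]
        if f not in first_row or r[time_key] < first_row[f][time_key]:
            first_row[f] = r
    # Copy the first-observed values onto every row of the same firm.
    for r in rows:
        for v in variables:
            r["init_" + v] = first_row[r["firm"]][v]
    return rows


# Hypothetical panel: rows need not be sorted by time.
panel = [
    {"firm": 1, "qdate": 183, "emp": 12, "prod": 1.5},
    {"firm": 1, "qdate": 171, "emp": 5, "prod": 1.0},
    {"firm": 2, "qdate": 177, "emp": 3, "prod": 0.8},
]
for r in add_initial_conditions(panel, "qdate", ["emp", "prod"]):
    print(r["firm"], r["qdate"], r["init_emp"], r["init_prod"])
```

Because these variables are constant within a firm, they would be absorbed by firm fixed effects, which is worth keeping in mind when choosing between fixed- and random-effects specifications.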



          • #6
            Although Poisson regression is most commonly used for count variables, it can also be used with non-negative continuous variables. Stata will give you a warning that you are responsible for interpreting non-count dependent variables, but it will run, and the results can be perfectly sensible. It is particularly useful with skewed outcomes, or outcomes representing relative growth or shrinkage, where one might otherwise be tempted to log-transform; see http://blog.stata.com/2011/08/22/use...tell-a-friend/.
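To make the point concrete, here is a minimal numpy sketch of Poisson pseudo-maximum likelihood: it fits E[y|X] = exp(Xb) by Newton-Raphson on the Poisson log-likelihood, and nothing in the computation requires y to be an integer, only non-negative. The estimates remain consistent as long as the conditional mean is correctly specified. This is not Stata's implementation; the function name and the requirement that the first column of X be a constant are assumptions of the sketch. Unlike OLS on log(y), which models the mean of log(y), this models the log of the mean of y and keeps observations where y = 0.

```python
import numpy as np


def poisson_qmle(X, y, max_iter=100, tol=1e-10):
    """Fit E[y|X] = exp(X @ beta) by Newton-Raphson on the Poisson log-likelihood.

    y must be non-negative but need not be integer. X must have a constant
    as its first column (an assumption of this sketch, used for the start value).
    Returns point estimates only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y < 0):
        raise ValueError("y must be non-negative")
    if not np.any(y > 0):
        raise ValueError("y must have at least one positive value")
    if not np.allclose(X[:, 0], 1.0):
        raise ValueError("first column of X must be a constant")
    # Start from the intercept-only fit: log of the sample mean.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)            # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])    # minus the Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Continuous, non-integer outcome generated exactly from the log-linear mean.
x = np.linspace(0.0, 2.0, 21)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(0.5 + 0.8 * x)
print(poisson_qmle(X, y))  # close to [0.5, 0.8]
```

Because the Poisson variance assumption will generally not hold for such outcomes, inference should rely on robust or clustered standard errors (in Stata, for example, the vce(cluster firm) option); the sketch returns point estimates only.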

