  • Using expand in Stata - Implications

    I want to use expand to replicate the observations in my dataset and so increase the sample size. However, I can't find much in the literature about the implications of doing that. What are the pros and cons of replicating observations?

  • #2
    The cons are larger dataset size and possibly redundancy depending on what you want to do. For example, expanding on integer frequencies is rarely needed and could bloat a dataset mightily. You wouldn't (shouldn't) expand 50 states of the USA to a dataset with hundreds of millions of observations.
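
    One reason expanding on integer frequencies is rarely needed is that many estimation commands accept frequency weights directly. The following is a minimal sketch on made-up data (the variables y, x and freq are hypothetical, not from the original question), comparing a frequency-weighted regression with the same regression after expand:

```stata
* Hypothetical toy data: each record stands for freq identical cases
clear
input y x freq
1 1 2
2 2 3
4 3 5
end

* Option 1: keep the dataset compact and pass the counts as frequency weights
regress y x [fweight = freq]
scalar b_fw  = _b[x]
scalar se_fw = _se[x]

* Option 2: physically replicate each record freq times, then fit unweighted
expand freq
regress y x

* Compare slope and standard error from the two approaches
display _b[x] - b_fw
display _se[x] - se_fw
```

    Both routes should give the same fit here; the weighted version just never creates the extra rows.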

    The pros are that you make explicit whatever needs to be explicit. That can be useful with panel data and some other data with time information. For example, people often come with entry and exit dates (employment or hospitalization spells, or whatever) and keeping track of durations or the overall state of the system can be awkward to impossible without an expansion of the data.
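
    As an illustration of that kind of expansion, here is a small sketch that turns one record per spell into one record per person-day. The variable names (id, entry, exit, day) and the integer day numbers are assumptions for the example only; real dates would typically be Stata daily dates handled the same way.

```stata
* Hypothetical spell data: one record per person, entry and exit as day numbers
clear
input id entry exit
1 1 3
2 2 2
3 4 7
end

* Number of days covered by each spell, counting both end points
gen long duration = exit - entry + 1

* One copy of the record for every day in the spell
expand duration

* Label each copy with the day it represents
bysort id: gen long day = entry + _n - 1

* State of the system: how many people are present on each day
bysort day: gen long occupancy = _N

sort id day
list id entry exit day occupancy, sepby(id)
```

    After the expansion, questions such as "how many people were in the system on day 2?" become simple counts over the day variable.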

    That said, I did encounter in a different forum the idea that just using several copies of each observation was a way to counteract small sample size. I had to point out that if this were a valid method it would be mentioned in every introductory text, and indeed that much of what is in introductory texts could be omitted as unnecessary. Copies carry no new information; they only make standard errors look smaller than they should.
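
    To see what duplicating observations does in practice, here is a rough sketch using the auto dataset shipped with Stata. The choice of regression and the factor of 4 are arbitrary assumptions for illustration.

```stata
* Fit a simple regression on the original 74 observations
sysuse auto, clear
regress price weight
scalar b1  = _b[weight]
scalar se1 = _se[weight]

* Make 4 copies of every observation and fit the same model again
expand 4
regress price weight

* The slope is unchanged, but the reported standard error shrinks
display "slope before: " b1 "   slope after: " _b[weight]
display "SE ratio (after / before): " _se[weight] / se1
```

    The coefficient stays the same while the reported standard error drops to roughly half, purely because the software is told there are four times as many independent observations as really exist.
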
    Last edited by Nick Cox; 18 May 2023, 11:58.
