  • How do I aggregate data at regional level?

    Hello, sorry for the basic question (I'm new to econometric analysis and still quite unsure of myself): I have repeated cross-sectional data and want to compare the impact of a treatment variable in year 1 on a certain outcome variable in year 2 (relative to the baseline of this same outcome in year 1). Since it's not panel data, I was thinking of aggregating the data at a level above the individual (the region), which is common to both datasets, thereby "constructing" panel data. How does the aggregation work (which commands)? Your help would be highly appreciated! Thanks

  • #2
    Consider using collapse
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
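
    A minimal sketch of how collapse could be used for the aggregation asked about in #1. The file names (individuals_year1, individuals_year2) and variable names (region, year, employed, search) are hypothetical placeholders, not taken from the thread, and the sketch assumes both files contain a numeric region identifier and a year variable:

    ```stata
    * Stack the two cross-sections (hypothetical file names)
    use individuals_year1, clear
    append using individuals_year2

    * One row per region and year: regional means of the 0/1 variables,
    * plus the number of individuals behind each employment mean
    collapse (mean) emp_rate=employed search_share=search ///
             (count) n_obs=employed, by(region year)

    * Declare the result as a region-by-year panel (region must be numeric)
    xtset region year
    ```

    The mean of a 0/1 variable is the share of ones, so emp_rate and search_share are regional proportions between 0 and 1.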


    • #3
      Annabell,
      welcome to the list (as per FAQ, please note the preference for full real names on this list. Thanks).
      Things would be easier if you told us what you mean by "comparing".
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Hi and many thanks. I did collapse the data (which was definitely the right thing to do), and now I'm facing another challenge and would highly appreciate your help.

        My situation is as follows: I want to conduct an impact evaluation of job search behaviour on employment probability with the repeated cross-sectional data mentioned above, but I only have information on individual treatment status in the baseline data. In other words: does searching (yes or no) in t0 lead to a higher probability of being employed in t1? Both variables were dummies in the original datasets. I aggregated the data at the regional level by collapsing, and now I have n = 23 (regions) and means per region for both variables.

        I tried to conduct a t-test on the difference in employment in t1 compared to t0 by the search variable, but Stata tells me that this is not possible because the search variable has more than two groups (it now has 23 groups, with 23 mean values all between 0 and 1).

        I'm sorry for this basic question, but I'm really in the dark right now. Can I use a difference-in-differences approach in this situation? Is the t-test the right tool to start with? And what would be the next step? Normally I would have applied a probit regression (because I want to say something about the probability of employment), including further variables, but now my outcome is no longer a dummy.

        Many many thanks for your advice.

        Best,
        Anna
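
        The t-test error described above arises because ttest with by() requires a grouping variable that takes exactly two values, while the collapsed search variable is a continuous share. One possible starting point, offered only as a sketch and not as advice given in this thread, is to regress the regional change in the employment rate on the baseline search share. The names emp_rate, search_share, n_obs, region and year are hypothetical, and year is assumed to be coded 1 (t0) and 2 (t1):

        ```stata
        * Start from collapsed data: one row per region and year
        * (hypothetical variables: region, year, emp_rate, search_share, n_obs)
        reshape wide emp_rate search_share n_obs, i(region) j(year)

        * Change in the regional employment rate between t0 and t1
        generate d_emp = emp_rate2 - emp_rate1

        * Regional-level regression on the baseline (t0) search share
        regress d_emp search_share1, vce(robust)
        ```

        With only 23 regions the estimates will be imprecise, and the coefficient describes an association between regional shares, not an individual-level probability.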
