How do I aggregate data at regional level?

Annabell B

Join Date: Jul 2015

Posts: 5
#1


27 Jul 2015, 06:32

Hello, sorry for the basic question (I'm new to econometric analysis and still very unsure of myself): I have repeated cross-sectional data and want to estimate the impact of a treatment variable in year 1 on a certain outcome variable in year 2 (relative to the baseline of this same outcome in year 1). Since it's not panel data, I was thinking of aggregating the data to a level above the individual (the region), which appears in both datasets, thereby "constructing" panel data. How does the aggregation work (which commands)? Your help would be highly appreciated! Thanks
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#2

27 Jul 2015, 06:36

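Consider using the collapse command, which replaces the data in memory with summary statistics (for example, means) computed for each group.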
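
A minimal sketch of how that might look for two cross-sections, assuming hypothetical file names (wave1.dta, wave2.dta) and hypothetical variable names (region, search, employed); adjust these to your own data:

```stata
* Assumed (hypothetical) names: wave1.dta, wave2.dta, region, search, employed
use wave1.dta, clear
generate byte year = 1

* Stack the second cross-section; its rows have year missing until filled in
append using wave2.dta
replace year = 2 if missing(year)

* One row per region-year: regional means plus the number of
* nonmissing observations of employed behind each mean
collapse (mean) search employed (count) n = employed, by(region year)

* Declare the region-year panel (region must be numeric; encode it if it is a string)
xtset region year
```

If a variable is recorded in only one wave, its regional mean will simply be missing for the other wave. Keeping the cell size n lets you check how many individuals each regional mean is based on.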

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#3

27 Jul 2015, 06:48

Annabell,
welcome to the list (as per FAQ, please note the preference for full real names on this list. Thanks).
Things would be easier if you told us what you mean by "comparing".

Kind regards,
Carlo
(Stata 19.0)
Annabell B

Join Date: Jul 2015

Posts: 5
#4

29 Jul 2015, 10:24

Hi and many thanks. I did collapse the data (which was definitely the right thing to do), and now I'm facing another challenge and would highly appreciate your help.

My situation is as follows: I want to conduct an impact evaluation of job search behaviour on employment probability with the repeated cross-sectional data mentioned above, but I only have information on individual treatment status in the baseline data. In other words: does searching (yes or no) in t0 lead to a higher probability of being employed in t1? Both variables were dummies in the original datasets. I aggregated the data at regional level by collapsing, and now I have n = 23 (regions) and means per region for both variables.

I was trying to conduct a t-test on the difference in being employed in t1 compared to t0 by the search variable, but Stata tells me that this is not possible because the search variable has more than two groups (it now has 23 groups, with 23 mean values all between 0 and 1).

I'm sorry for this basic question, but I'm really in the dark right now. Can I use a difference-in-differences approach in this situation? Is the t-test the right tool to start with? And what would be the next step? Normally I would have applied probit regression (because I want to say something about the probability of employment), including further variables, but now my outcome is not a dummy anymore.

Many many thanks for your advice.

Best,
Anna