Dear Statalisters,
I'm currently working on my dissertation on the impact of immigration on crime in England, Wales and Scotland. For that I've used the Annual Population Survey and the ONS crime statistics. I'm relatively new to Stata and not very good with it yet. My data span 2007 to 2014.
I needed to aggregate my data by region (12 regions) and year, because I was getting the "repeated time values within panel" error when I tried to declare my data as panel data with xtset. So I used the egen command and aggregated all my variables by region and year.
However, when I did that, I realized that taking the mean of a categorical variable makes no sense. For example, religion is coded 1 = Christian, 2 = Buddhist, 3 = Hindu, 4 = Jewish, 5 = Muslim, 6 = Sikh, 7 = Other and 8 = No religion, so its mean has no interpretation.
To avoid that, I created dummy variables for all my categorical variables. So, for instance, I created a dummy for Christians, where it would be 0 if the person isn't a Christian and 1 if they were. After this, I took means for all variables by region and year using the egen command.
For example: bysort region year: egen christi = mean(christian)
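A minimal sketch of this step, assuming the variables are called religion, region and year (the real names in the data may differ). Note that egen attaches the region-year mean to every individual row and keeps all rows, whereas collapse reduces the data to one row per region and year, which is the shape xtset expects.

```stata
* Sketch only: religion, region and year are assumed variable names.
* Create one 0/1 indicator per religion category (rel_1, ..., rel_8).
tabulate religion, generate(rel_)

* The region-year mean of a 0/1 indicator is the share of respondents
* in that category. collapse replaces the data in memory, so wrap it
* in preserve/restore if the individual-level data are still needed.
collapse (mean) rel_*, by(region year)
```

After collapse, each rel_ variable holds a proportion between 0 and 1 for that region and year.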
I have attached a screenshot of what my data looks like in Excel:

I just wanted to know if I was doing anything wrong? Am I on the right path?
I ask because I was getting a lot of negative coefficients and insignificant results when using the xtreg command.
My dependent variable is total crime/total population (the data is separate for every region and every year).
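A rough sketch of the setup described above, assuming one row per region and year. The names total_crime, total_pop and immig_share are placeholders, and the fixed-effects specification with year dummies is only one possible choice, not necessarily the model actually run.

```stata
* Placeholder variable names: total_crime, total_pop, immig_share.
generate crime_rate = total_crime / total_pop

* xtset needs a numeric panel identifier; if region is a string,
* encode it first, e.g.: encode region, generate(region_id)
xtset region year

* Region fixed effects plus year dummies (one possible specification).
xtreg crime_rate immig_share i.year, fe
```

If xtset still reports repeated time values, running duplicates report region year shows whether some region-year pairs appear more than once.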
Any help or guidance or even assurance as soon as possible would be greatly appreciated!
Thank you so much for your time.
Kind regards,
Anuvi Godha