  • Aggregating categorical variables

    Dear Statalisters,

    I'm currently working on my dissertation on the Impact of Immigration on Crime in England, Wales and Scotland. For that I've used the Annual Population Survey and the ONS crime statistics. I'm relatively new to Stata and not very good with it. My data span 2007 to 2014.
    I needed to aggregate my data by region (12 regions) and year because I was getting the "repeated time values" error when I tried to declare my data as panel data. So I used the egen command and aggregated all my variables by region and year.
    However, when I did that, I realized that taking the mean of a categorical variable makes no sense. Religion, for example, is coded 1 = Christian, 2 = Buddhist, 3 = Hindu, 4 = Jewish, 5 = Muslim, 6 = Sikh, 7 = Other and 8 = No religion, so its mean has no interpretation.
    To avoid that, I created dummy variables for all my categorical variables. For instance, I created a dummy for Christians that equals 1 if the person is Christian and 0 otherwise. After this, I took the mean of every variable by region and year using the egen command, so each dummy's mean is the share of respondents in that category.
    For example, by region year: egen christi = mean(christian)
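
    The dummy-and-mean step described above might be sketched as follows. This is only an illustration: religion, region and year are assumed variable names, and relig_ is a hypothetical stub chosen for the example.

    ```stata
    * Sketch only: religion, region and year are assumed variable names
    * Create one 0/1 indicator per religion category: relig_1, relig_2, ...
    tabulate religion, generate(relig_)

    * The mean of a 0/1 indicator within a region-year cell is the share
    * of respondents in that category (here category 1, Christian)
    * bysort sorts the data first, which the by prefix requires
    bysort region year: egen christi = mean(relig_1)
    ```

    Using tabulate with generate() avoids writing each dummy by hand, and observations with missing religion get missing indicators rather than zeros.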
    I have attached a screenshot of what my data looks like in Excel:

    [Attachment: Screenshot (118).png]


    I just wanted to know whether I am doing anything wrong. Am I on the right path?
    I ask because I am getting a lot of negative coefficients and insignificant results when using the xtreg command.
    My dependent variable is total crime/total population (measured separately for every region and every year).
    Any help or guidance or even assurance as soon as possible would be greatly appreciated!

    Thank you so much for your time.
    Kind regards,
    Anuvi Godha
    Last edited by Anuvi Godha; 25 Mar 2016, 16:52. Reason: Edited because data got distorted after posting!

  • #2
    I would think you might want to approach modeling the data from a mixed-effects perspective, where you would include a time fixed effect, rather than treating it as a panel data problem. This could also give you a bit more flexibility if you have observations that are cross-classified. However, it isn't clear from your description how your dependent variable is measured or why the data need to be collapsed.



    • #3
      My dependent variable is total crime incidents/population (for every region). I needed to collapse the data because I had individual-level data from the Annual Population Survey, with multiple observations in the same year and region. This was causing the "repeated time values" error in Stata when I tried to declare my data as panel data. So I used the egen command to replace each variable with its region-year mean. After aggregating, I deleted the duplicate observations so that one observation per region and year remained.
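
      The same aggregation could also be done in one step with collapse instead of egen followed by dropping duplicates. This is a sketch of that alternative, not the commands actually run above; christian, imm and emp stand in for whichever individual-level variables are needed, and region is assumed to be numeric.

      ```stata
      * Sketch only: the variable list is illustrative
      * region must be numeric for xtset (encode it first if it is a string)
      * Keep one row per region-year holding the mean of each listed variable
      collapse (mean) totcrimepop christian imm emp, by(region year)

      * Confirm that region and year now uniquely identify observations
      isid region year

      * With unique region-year rows, the panel declaration no longer
      * reports repeated time values
      xtset region year
      ```

      Note that collapse keeps only the variables named in the command, so every regressor needed later has to appear in the list.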
      Alternatively, I tried using the xtabond command (xtabond totcrimepop agef durun sex_1 emp unemp inact sing divorsep tgpaywk imm, lags(1) twostep artests(2)).
      I have attached the results. I'm concerned about the autocorrelation test and Sargan test statistics. Have I done something wrong? In case you need any further information, please do let me know!
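
      For reference, the two tests mentioned above can be requested after estimation roughly as follows. This sketch assumes the data have already been declared with xtset region year; the model line repeats the command quoted above.

      ```stata
      * Sketch only: assumes xtset region year has already been run
      xtabond totcrimepop agef durun sex_1 emp unemp inact sing divorsep tgpaywk imm, lags(1) twostep artests(2)

      * Arellano-Bond test for autocorrelation in the first-differenced errors
      * H0: no autocorrelation at the given order
      * Order 1 is expected to reject; order 2 should not if the moment conditions hold
      estat abond

      * Sargan test of the overidentifying restrictions
      * H0: the overidentifying restrictions are valid
      estat sargan
      ```

      The estimator is designed for many panels and few time periods, so with only 12 regions these statistics should probably be read with caution.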

      Thank you so much for your reply!
