Help with database

Ale Spec

Join Date: Aug 2018

Posts: 4
#1

Help with database

20 Aug 2018, 22:19

Hi everyone

I'm working with the attached database: https://drive.google.com/open?id=1j0...6beozV2ShTmAvB
I'm analysing the factors that affects an vehicle accident in different regions.

The first 52 variables are regions (or commune in this county): =1 if the accident occurred in this region (and of course the others 51 are 0)

then the next 11 are the factors of the accident:

alcoh =1 if accident is caused by drunk driver
impr =1 if accident is caused by imprudent driver
rdef =1 if accident is caused by road deficiencies
drugs =1 if accident is caused by drugged driver

The 4 factors of above are the main factors where: if the accident was caused by alcoh (1), then impr, rdef and drugs are 0

The next ones are independents between each others

uzone =1 if accident occurred in urban zone
sexm =1 if male
age age in driver
popul population per commune
numlic number of driver licences per commune
invexpn investment expenditures per commune
dlight =1 if accident occurred in daylight

My intentions are:

a) Create a table in STATA where I can analyse or show the 52 regions with the 11 factors. How can I do this?

I tried with one region (santiago) but it says this:

reg santiago alcoh impr rdef drugs uzone sexm age popul numlic invexpn dlight
note: rdef omitted because of collinearity

b) should I delete the variable rdef and the observations that have 1 on rdef?

thanks
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

21 Aug 2018, 02:31

Ale:
welcome to this forum.
a) first, you should create a .dta file by importing your dataset from its original format (see -help .dta-) Please note that listers are reluctant to download files from unknown sources due to risk of active contents; see the FAQ on how to share an excerpt/example of your dataset via -dataex-. Thanks.
If your dependent variable is the number of accidents in different region, -region. should be an independent variable, not the regressand. Moreover, being the number of accidents a count variable, you should consider -poisson- instead of -regress- (Please be aware of the theoretical differences concerning different regression models to avoid your analysis being software-led instead of researcher-driven).
b) assuming that your model is correctly specified (and as per your description I doubt that it is) no, you should not. Correlation relates to your model specification, which, in turn, should give a fair and true view of the data generating process.

Kind regards,
Carlo
(Stata 19.0)
Comment

Ale Spec

Join Date: Aug 2018
Posts: 4

21 Aug 2018, 22:49

Originally posted by Carlo Lazzaro View Post

Ale:
welcome to this forum.
a) first, you should create a .dta file by importing your dataset from its original format (see -help .dta-) Please note that listers are reluctant to download files from unknown sources due to risk of active contents; see the FAQ on how to share an excerpt/example of your dataset via -dataex-. Thanks.
If your dependent variable is the number of accidents in different region, -region. should be an independent variable, not the regressand. Moreover, being the number of accidents a count variable, you should consider -poisson- instead of -regress- (Please be aware of the theoretical differences concerning different regression models to avoid your analysis being software-led instead of researcher-driven).
b) assuming that your model is correctly specified (and as per your description I doubt that it is) no, you should not. Correlation relates to your model specification, which, in turn, should give a fair and true view of the data generating process.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(PROVIDENCIA RENCA RECOLETA SANJOAQUIN SANRAMON VITACURA ELMONTE ISLADEMAIPO PADREHURTADO SANPEDRO alcoh impr rdef drugs uzone sexm edad) long pobxcom int(numlic invexpn) byte dlight
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 18  35923  5505  433 1
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 50  35923  5505  433 1
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 31  63250  5012 1180 0
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 48  63250  5012 1180 0
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 48  63250  5012 1180 0
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 35  35923  5505  433 0
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 56  35923  5505  433 0
0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 34  35923  5505  433 0
0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 52  35923  5505  433 0
0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 26   9726  7989  667 1
0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 50   9726  7989  667 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 26 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 59 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 26 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 31 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 34 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 53 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 56 142079 18770 5260 1
0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 50  36219  3250  140 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 19 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 53 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 60 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 42 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 28 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 39 142079 18770 5260 0
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 28  63250  5012 1180 0
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 24  63250  5012 1180 1
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 34  63250  5012 1180 1
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 37  63250  5012 1180 1
0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 52  63250  5012 1180 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 30 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 74 142079 18770 5260 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 24 147151  7172 1096 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 63 147151  7172 1096 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 30  85384 14403 6039 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 28  85384 14403 6039 0
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 45 157851 10117 1401 1
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 23 157851 10117 1401 1
0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 37  36219  3250  140 1
0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 49  36219  3250  140 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 69 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 44 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 56 142079 18770 5260 1
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 55  85384 14403 6039 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 57  85384 14403 6039 0
0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 46  36219  3250  140 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 25 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 49 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 35 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 27 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 35 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 27 142079 18770 5260 1
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 29 157851 10117 1401 1
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 21 157851 10117 1401 1
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 22  85384 14403 6039 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 62  85384 14403 6039 0
0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 30  35923  5505  433 1
0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 48  35923  5505  433 1
0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 36  36219  3250  140 1
0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 42  36219  3250  140 1
0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 43 147151  7172 1096 1
0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 20 147151  7172 1096 1
0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 48  82900  5893 3533 0
0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 25  63250  5012 1180 1
0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 25  82900  5893 3533 1
0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 47  82900  5893 3533 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 38 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 55 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 48 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 36 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 49 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 32 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 34 142079 18770 5260 0
0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 41  35923  5505  433 0
0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 41  35923  5505  433 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 53 142079 18770 5260 1
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 34 157851 10117 1401 1
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 38 157851 10117 1401 1
0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 54 157851 10117 1401 1
0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 48  94492 12268 1095 1
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 37  35923  5505  433 1
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 39  35923  5505  433 1
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 67  85384 14403 6039 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 20 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 32 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 18 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 29 142079 18770 5260 1
0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 27 157851 10117 1401 1
0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 55 157851 10117 1401 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 27 142079 18770 5260 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 44 142079 18770 5260 0
0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 20  36219  3250  140 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 32  85384 14403 6039 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 40  85384 14403 6039 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 61  85384 14403 6039 0
0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 80  85384 14403 6039 0
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 34 142079 18770 5260 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 46 142079 18770 5260 1
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 25  35923  5505  433 1
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 47  35923  5505  433 1
end

Thanks for your answer,

I get the idea that region (number of accidents per regions) is an independent variable, that is my intention.
I heard from a teacher that I may need to work with poisson, but I don´t know how to do that if I have to use stata, I mean, Can Stata work with poisson?

I had to delete some regions, I only left 10 of them (and of course I deleted the observations of the another regions), so I could use dataex.

So regarding my question Can i create a table where I can analyse or show the 10 regions with the 11 factors. How can I do this?

Thanks

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

22 Aug 2018, 02:25

Alex:
thanks for providing a -dataex- example; as a by-product, it creates a .dta file which is useful to perform the analysis.
However, I cannot find your dependent variable (ie, number of accidents).
Stata can perform -poisson- regression.
As far as table creation is concerned, you may want to take a look at the user-written programme -outreg- (type -search outreg- from within Stata to spot and install it).

Kind regards,
Carlo
(Stata 19.0)
Comment
Ale Spec

Join Date: Aug 2018

Posts: 4
#5

22 Aug 2018, 08:17

Originally posted by Carlo Lazzaro View Post

Alex:
thanks for providing a -dataex- example; as a by-product, it creates a .dta file which is useful to perform the analysis.
However, I cannot find your dependent variable (ie, number of accidents).
Stata can perform -poisson- regression.
As far as table creation is concerned, you may want to take a look at the user-written programme -outreg- (type -search outreg- from within Stata to spot and install it).

Can it be that the number of rows are the dependent variables because they are the number of accidents?
Let's say that the number of accidents in PROVIDENCIA

for example, lets say if the coef. of alcoh is positive, and if there is a drunk driver in PROVIDENCIA can increase the number of accidents or if age is positive, a person with 50 years old can increase the number of accidents.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

22 Aug 2018, 08:58

Ale:
you should have a dependent variable to run a regression (in your case the number of accidents).
Alcohol intake is a reasonable predictor for the number of accidents when adjusted for the remaining independent variable.

Kind regards,
Carlo
(Stata 19.0)
Comment
Ale Spec

Join Date: Aug 2018

Posts: 4
#7

23 Aug 2018, 22:51

Originally posted by Carlo Lazzaro View Post

Ale:
you should have a dependent variable to run a regression (in your case the number of accidents).
Alcohol intake is a reasonable predictor for the number of accidents when adjusted for the remaining independent variable.

Thanks I've been working with my table, so I've come to this:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte idcom int nofac long popxc int(numxc invexc) float(agem alcom imprm rdefm drugsm uzonem sexmm dlightm) 1 3066 142079 18770 5260 41.42727 .0084801 .9849967 .0022831 .0042401 .9990215 .7462492 .6868885 2 469 147151 7172 1096 39.44563 .1023454 .8678038 .010661 .0191898 .9872068 .8251599 .6865672 3 923 157851 10117 1401 40.52113 .0834236 .9100758 .0054171 .0010834 .9989166 .8353196 .5904658 4 159 94492 12268 1095 40.18868 .0566038 .9119497 0 .0314465 1 .8427673 .6603774 5 418 82900 5893 3533 39.99282 .1674641 .8253589 .007177 0 .9976076 .8708134 .5645933 6 1053 85384 14403 6039 41.37702 .0978158 .8964862 .0047483 .0009497 .985755 .6970561 .6087369 7 294 35923 5505 433 40.71088 .1564626 .8197279 0 .0238095 .707483 .8333333 .6122449 8 132 36219 3250 140 40.5303 .2727273 .6893939 .0227273 .0151515 .9469697 .8409091 .5454546 9 488 63250 5012 1180 40.47131 .0553279 .9282787 .0020492 .0143443 .5860656 .8258197 .6393443 10 135 9726 7989 667 39.93333 .1185185 .7555556 .0592593 .0666667 .0518519 .8296296 .637037 end

I started by generating an id per commune

gen idcom=1 if providencia==1
replace idcom=2 if renca==1........ I did this until idcom=10 which is sanpedro

So now I have 10 observations (10 regions)

then I created a column with the number of accidents per commune nofac (This is now my Dependant Variable)

the variables population, investment expenditures, number of licences per commune I already had them

for the age I worked with the mean

bys idcom: egen agem=mean(age)

But I don't know how to generate a value per commune for the another dummy variables (alcoh, rdef, drugs,impr, sexm,dlight,uzone)

I was thinking of generating the values by calculating the percentage of accidents caused by alcoh, rdef, drugs,impr, sexm,dlight,uzone.
So i calculated the mean of them.

But this doesn't seem to work because the bigger model (the one with 52 regions) I got p-values extremely big.

There is a correct way to calculate the values of the dummy variables per commune?

Thanks
Comment

Announcement