Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with database

    Hi everyone

    I'm working with the attached database: https://drive.google.com/open?id=1j0...6beozV2ShTmAvB
    I'm analysing the factors that affects an vehicle accident in different regions.

    The first 52 variables are regions (or commune in this county): =1 if the accident occurred in this region (and of course the others 51 are 0)

    then the next 11 are the factors of the accident:

    alcoh =1 if accident is caused by drunk driver
    impr =1 if accident is caused by imprudent driver
    rdef =1 if accident is caused by road deficiencies
    drugs =1 if accident is caused by drugged driver

    The 4 factors of above are the main factors where: if the accident was caused by alcoh (1), then impr, rdef and drugs are 0

    The next ones are independents between each others

    uzone =1 if accident occurred in urban zone
    sexm =1 if male
    age age in driver
    popul population per commune
    numlic number of driver licences per commune
    invexpn investment expenditures per commune
    dlight =1 if accident occurred in daylight

    My intentions are:

    a) Create a table in STATA where I can analyse or show the 52 regions with the 11 factors. How can I do this?

    I tried with one region (santiago) but it says this:

    reg santiago alcoh impr rdef drugs uzone sexm age popul numlic invexpn dlight
    note: rdef omitted because of collinearity

    b) should I delete the variable rdef and the observations that have 1 on rdef?

    thanks

  • #2
    Ale:
    welcome to this forum.
    a) first, you should create a .dta file by importing your dataset from its original format (see -help .dta-) Please note that listers are reluctant to download files from unknown sources due to risk of active contents; see the FAQ on how to share an excerpt/example of your dataset via -dataex-. Thanks.
    If your dependent variable is the number of accidents in different region, -region. should be an independent variable, not the regressand. Moreover, being the number of accidents a count variable, you should consider -poisson- instead of -regress- (Please be aware of the theoretical differences concerning different regression models to avoid your analysis being software-led instead of researcher-driven).
    b) assuming that your model is correctly specified (and as per your description I doubt that it is) no, you should not. Correlation relates to your model specification, which, in turn, should give a fair and true view of the data generating process.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Originally posted by Carlo Lazzaro View Post
      Ale:
      welcome to this forum.
      a) first, you should create a .dta file by importing your dataset from its original format (see -help .dta-) Please note that listers are reluctant to download files from unknown sources due to risk of active contents; see the FAQ on how to share an excerpt/example of your dataset via -dataex-. Thanks.
      If your dependent variable is the number of accidents in different region, -region. should be an independent variable, not the regressand. Moreover, being the number of accidents a count variable, you should consider -poisson- instead of -regress- (Please be aware of the theoretical differences concerning different regression models to avoid your analysis being software-led instead of researcher-driven).
      b) assuming that your model is correctly specified (and as per your description I doubt that it is) no, you should not. Correlation relates to your model specification, which, in turn, should give a fair and true view of the data generating process.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(PROVIDENCIA RENCA RECOLETA SANJOAQUIN SANRAMON VITACURA ELMONTE ISLADEMAIPO PADREHURTADO SANPEDRO alcoh impr rdef drugs uzone sexm edad) long pobxcom int(numlic invexpn) byte dlight
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 18  35923  5505  433 1
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 50  35923  5505  433 1
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 31  63250  5012 1180 0
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 48  63250  5012 1180 0
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 48  63250  5012 1180 0
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 35  35923  5505  433 0
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 56  35923  5505  433 0
      0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 34  35923  5505  433 0
      0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 52  35923  5505  433 0
      0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 26   9726  7989  667 1
      0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 50   9726  7989  667 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 26 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 59 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 26 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 31 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 34 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 53 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 56 142079 18770 5260 1
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 50  36219  3250  140 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 19 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 53 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 60 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 42 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 28 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 39 142079 18770 5260 0
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 28  63250  5012 1180 0
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 24  63250  5012 1180 1
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 34  63250  5012 1180 1
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 37  63250  5012 1180 1
      0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 52  63250  5012 1180 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 30 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 74 142079 18770 5260 0
      0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 24 147151  7172 1096 0
      0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 63 147151  7172 1096 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 30  85384 14403 6039 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 28  85384 14403 6039 0
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 45 157851 10117 1401 1
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 23 157851 10117 1401 1
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 37  36219  3250  140 1
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 49  36219  3250  140 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 69 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 44 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 56 142079 18770 5260 1
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 55  85384 14403 6039 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 57  85384 14403 6039 0
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 46  36219  3250  140 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 25 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 49 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 35 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 27 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 35 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 27 142079 18770 5260 1
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 29 157851 10117 1401 1
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 21 157851 10117 1401 1
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 22  85384 14403 6039 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 62  85384 14403 6039 0
      0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 30  35923  5505  433 1
      0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 48  35923  5505  433 1
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 36  36219  3250  140 1
      0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 42  36219  3250  140 1
      0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 43 147151  7172 1096 1
      0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 20 147151  7172 1096 1
      0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 48  82900  5893 3533 0
      0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 25  63250  5012 1180 1
      0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 25  82900  5893 3533 1
      0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 47  82900  5893 3533 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 38 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 55 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 48 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 36 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 49 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 32 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 34 142079 18770 5260 0
      0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 41  35923  5505  433 0
      0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 41  35923  5505  433 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 53 142079 18770 5260 1
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 34 157851 10117 1401 1
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 38 157851 10117 1401 1
      0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 54 157851 10117 1401 1
      0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 48  94492 12268 1095 1
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 37  35923  5505  433 1
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 39  35923  5505  433 1
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 67  85384 14403 6039 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 20 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 32 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 18 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 29 142079 18770 5260 1
      0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 27 157851 10117 1401 1
      0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 55 157851 10117 1401 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 27 142079 18770 5260 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 44 142079 18770 5260 0
      0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 20  36219  3250  140 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 32  85384 14403 6039 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 40  85384 14403 6039 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 61  85384 14403 6039 0
      0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 80  85384 14403 6039 0
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 34 142079 18770 5260 1
      1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 46 142079 18770 5260 1
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 25  35923  5505  433 1
      0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 47  35923  5505  433 1
      end
      Thanks for your answer,

      I get the idea that region (number of accidents per regions) is an independent variable, that is my intention.
      I heard from a teacher that I may need to work with poisson, but I don“t know how to do that if I have to use stata, I mean, Can Stata work with poisson?

      I had to delete some regions, I only left 10 of them (and of course I deleted the observations of the another regions), so I could use dataex.

      So regarding my question Can i create a table where I can analyse or show the 10 regions with the 11 factors. How can I do this?

      Thanks

      Comment


      • #4
        Alex:
        thanks for providing a -dataex- example; as a by-product, it creates a .dta file which is useful to perform the analysis.
        However, I cannot find your dependent variable (ie, number of accidents).
        Stata can perform -poisson- regression.
        As far as table creation is concerned, you may want to take a look at the user-written programme -outreg- (type -search outreg- from within Stata to spot and install it).
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          Alex:
          thanks for providing a -dataex- example; as a by-product, it creates a .dta file which is useful to perform the analysis.
          However, I cannot find your dependent variable (ie, number of accidents).
          Stata can perform -poisson- regression.
          As far as table creation is concerned, you may want to take a look at the user-written programme -outreg- (type -search outreg- from within Stata to spot and install it).
          Can it be that the number of rows are the dependent variables because they are the number of accidents?
          Let's say that the number of accidents in PROVIDENCIA

          for example, lets say if the coef. of alcoh is positive, and if there is a drunk driver in PROVIDENCIA can increase the number of accidents or if age is positive, a person with 50 years old can increase the number of accidents.

          Comment


          • #6
            Ale:
            you should have a dependent variable to run a regression (in your case the number of accidents).
            Alcohol intake is a reasonable predictor for the number of accidents when adjusted for the remaining independent variable.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Originally posted by Carlo Lazzaro View Post
              Ale:
              you should have a dependent variable to run a regression (in your case the number of accidents).
              Alcohol intake is a reasonable predictor for the number of accidents when adjusted for the remaining independent variable.
              Thanks I've been working with my table, so I've come to this:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte idcom int nofac long popxc int(numxc invexc) float(agem alcom imprm rdefm drugsm uzonem sexmm dlightm)
               1 3066 142079 18770 5260 41.42727 .0084801 .9849967 .0022831 .0042401 .9990215 .7462492 .6868885
               2  469 147151  7172 1096 39.44563 .1023454 .8678038  .010661 .0191898 .9872068 .8251599 .6865672
               3  923 157851 10117 1401 40.52113 .0834236 .9100758 .0054171 .0010834 .9989166 .8353196 .5904658
               4  159  94492 12268 1095 40.18868 .0566038 .9119497        0 .0314465        1 .8427673 .6603774
               5  418  82900  5893 3533 39.99282 .1674641 .8253589  .007177        0 .9976076 .8708134 .5645933
               6 1053  85384 14403 6039 41.37702 .0978158 .8964862 .0047483 .0009497  .985755 .6970561 .6087369
               7  294  35923  5505  433 40.71088 .1564626 .8197279        0 .0238095  .707483 .8333333 .6122449
               8  132  36219  3250  140  40.5303 .2727273 .6893939 .0227273 .0151515 .9469697 .8409091 .5454546
               9  488  63250  5012 1180 40.47131 .0553279 .9282787 .0020492 .0143443 .5860656 .8258197 .6393443
              10  135   9726  7989  667 39.93333 .1185185 .7555556 .0592593 .0666667 .0518519 .8296296  .637037
              end

              I started by generating an id per commune

              gen idcom=1 if providencia==1
              replace idcom=2 if renca==1........ I did this until idcom=10 which is sanpedro


              So now I have 10 observations (10 regions)

              then I created a column with the number of accidents per commune nofac (This is now my Dependant Variable)

              the variables population, investment expenditures, number of licences per commune I already had them

              for the age I worked with the mean

              bys idcom: egen agem=mean(age)

              But I don't know how to generate a value per commune for the another dummy variables (alcoh, rdef, drugs,impr, sexm,dlight,uzone)

              I was thinking of generating the values by calculating the percentage of accidents caused by alcoh, rdef, drugs,impr, sexm,dlight,uzone.
              So i calculated the mean of them.

              But this doesn't seem to work because the bigger model (the one with 52 regions) I got p-values extremely big.

              There is a correct way to calculate the values of the dummy variables per commune?

              Thanks

              Comment

              Working...
              X