
  • Finding Slope of Linear Regression: Battle-Related Deaths & Year

    Hello, I’m trying to find the slope of the regression line of the three graphs attached to this post.
    For each graph, I want to make a statement similar to: "From 2010-2019, the average number of battle deaths in the region increased by 25 deaths."

    Dependent Variable: Number of Battle-Related Deaths
    Time Span: 2010-2019
    Countries: 10

    Countries in graph labeled: "Figure #: Battle-Related Deaths in the Sahel": All 10 Countries
    Countries in graph labeled "Figure 2: Battle-Related Deaths in the G5 Sahel Region": Chad, Mali, Niger, Burkina Faso, Mauritania
    Countries in graph labeled "Figure #: Battle-Related Deaths in Non-G5S": Algeria, Senegal, Chad, Nigeria, Eritrea


    Panel DataSet
    country year battledeaths
    Mali 2010 28
    Mali 2011 18
    Mali 2012 218
    Mali 2013 805
    Mali 2014 189
    Mali 2015 172
    Mali 2016 96
    Mali 2017 359
    Mali 2018 522
    Mali 2019 603
    Senegal 2010
    Senegal 2011 25
    Senegal 2012
    Senegal 2013
    Senegal 2014
    Senegal 2015
    Senegal 2016
    Senegal 2017
    Senegal 2018
    Senegal 2019
    Sudan 2010 1054
    Sudan 2011 1404
    Sudan 2012 1411
    Sudan 2013 593
    Sudan 2014 849
    Sudan 2015 1264
    Sudan 2016 1309
    Sudan 2017 160
    Sudan 2018 243
    Sudan 2019
    Nigeria 2010
    Nigeria 2011 324
    Nigeria 2012 811
    Nigeria 2013 1629
    Nigeria 2014 3811
    Nigeria 2015 4493
    Nigeria 2016 2488
    Nigeria 2017 1879
    Nigeria 2018 1173
    Nigeria 2019 1327
    Burkina Faso 2010
    Burkina Faso 2011
    Burkina Faso 2012
    Burkina Faso 2013
    Burkina Faso 2014
    Burkina Faso 2015
    Burkina Faso 2016
    Burkina Faso 2017
    Burkina Faso 2018 79
    Burkina Faso 2019 344
    Niger 2010
    Niger 2011
    Niger 2012
    Niger 2013 31
    Niger 2014
    Niger 2015 368
    Niger 2016 274
    Niger 2017 229
    Niger 2018 74
    Niger 2019 298
    Mauritania 2010 1
    Mauritania 2011 8
    Mauritania 2012
    Mauritania 2013
    Mauritania 2014
    Mauritania 2015
    Mauritania 2016
    Mauritania 2017
    Mauritania 2018
    Mauritania 2019
    Algeria 2010 236
    Algeria 2011 267
    Algeria 2012 256
    Algeria 2013 147
    Algeria 2014 107
    Algeria 2015 110
    Algeria 2016 86
    Algeria 2017 60
    Algeria 2018 33
    Algeria 2019
    Chad 2010 4
    Chad 2011
    Chad 2012
    Chad 2013
    Chad 2014
    Chad 2015 275
    Chad 2016
    Chad 2017 57
    Chad 2018 150
    Chad 2019 184
    Eritrea 2010
    Eritrea 2011
    Eritrea 2012
    Eritrea 2013
    Eritrea 2014
    Eritrea 2015
    Eritrea 2016 25
    Eritrea 2017
    Eritrea 2018
    Eritrea 2019
    For the first graph, can this be accomplished by running a simple "regress battledeaths year"? I'm assuming my "year" variable is a categorical variable, so I'm not sure whether the coefficient that Stata provides is biased.
    Last edited by Doug Kalagian; 03 Feb 2021, 10:42.

  • #2
    First let's be clear what you actually want.

    I’m trying to find the slope of the regression line of the three graphs attached to this post.
    For each graph, I want to make a statement similar to: "From 2010-2019, the average number of battle deaths in the region increased by 25 deaths."
    Those two sentences contradict each other. Which is it?

    The first requires fitting the regressions and finding the slopes of the regression lines. It would also require reporting the result as number of deaths per year, not as a number of deaths.

    The second requires ascertaining the number of battle deaths in the region in 2010, and again in 2019 and subtracting. That's different and requires no regression at all.

    On the assumption that you actually want the regression slopes, what you need to do here is to treat year as a continuous variable, not a discrete one. So -regress battledeaths year- will be fine, and the coefficient in the year row of the regression table Stata creates will be what you are looking for.
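
    To make the two readings concrete, here is a minimal sketch. It assumes the panel from post #1 is in memory with numeric variables year and battledeaths; the scalar names m2010 and m2019 are made up for illustration.

    ```stata
    * Reading 1: slope of the fitted line, in deaths per year.
    * year enters as a continuous regressor.
    regress battledeaths year
    display "Slope (deaths per year): " _b[year]

    * For contrast only: year as a categorical variable yields one
    * coefficient per year (relative to the base year), not a single slope.
    regress battledeaths i.year

    * Reading 2: difference between the 2019 and 2010 means, no regression.
    summarize battledeaths if year == 2010, meanonly
    scalar m2010 = r(mean)
    summarize battledeaths if year == 2019, meanonly
    scalar m2019 = r(mean)
    display "Change in mean, 2010 to 2019: " m2019 - m2010
    ```

    Both calculations silently drop country-years where battledeaths is missing, so they describe only the reported observations.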



    • #3
      You are right! My mistake. I meant to say: For each graph, I want to make a statement similar to: "From 2010-2019, the average number of battle deaths in the region increased by 25 deaths per year."

      In the case of treating 'year' as a continuous variable, this was my initial intuition, but here is where my confusion arose.

      For Graph 2 (Chad, Mali, Niger, Burkina Faso), my coefficient is 25.16
      For Graph 3 (Algeria, Senegal, Chad, Nigeria, Eritrea), my coefficient is 31.32
      For Graph 1, when I combine all 10 countries, my coefficient is 6.95

      I would expect that by combining all 10 countries, the coefficient would be somewhere between 25-31 battle-related deaths per year. Is my thinking here off?


      • #4
        I would expect that by combining all 10 countries, the coefficient would be somewhere between 25-31 battle-related deaths per year. Is my thinking here off?
        Yes, your thinking is off here. This is a common fallacy, and one that statistics courses generally do not cover.

        Notice that the vertical axes in figures 2 and 3 are on very different scales.

        Try re-doing figure 1, where everything is on the same scale, using a different symbol or color for the two different groups of countries and you will see with your eyes what is going on here.
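
        One way to draw that combined figure is sketched below. It assumes country is a string variable and uses the G5 membership listed in post #1; the indicator name g5 is hypothetical. If country is a labeled numeric variable, the inlist() call would need to be adapted.

        ```stata
        * Hypothetical indicator: 1 for the G5 Sahel countries, 0 otherwise.
        generate byte g5 = inlist(country, "Chad", "Mali", "Niger", ///
            "Burkina Faso", "Mauritania")

        * Both groups on one vertical scale, with separate and pooled fits.
        twoway (scatter battledeaths year if g5 == 1, mcolor(blue) msymbol(circle))   ///
               (scatter battledeaths year if g5 == 0, mcolor(red) msymbol(triangle))  ///
               (lfit battledeaths year if g5 == 1, lcolor(blue))                      ///
               (lfit battledeaths year if g5 == 0, lcolor(red))                       ///
               (lfit battledeaths year, lcolor(black) lpattern(dash)),                ///
               legend(order(1 "G5 Sahel" 2 "Non-G5" 5 "All countries (pooled fit)")) ///
               ytitle("Battle-related deaths") xtitle("Year")
        ```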

        In any case, you should discard the intuition that if you put two subsets together for a regression that the combined regression slope will be some kind of (possibly weighted) average of the slopes in the separate subsets. It's only true in special circumstances.
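
        The algebra behind this can be written out. As a sketch, let group g have n_g observations, means \bar{x}_g and \bar{y}_g, within-group sum of squares S_{xx}^{(g)} = \sum_{i \in g} (x_i - \bar{x}_g)^2, and within-group slope b_g, and let \bar{x}, \bar{y} be the pooled means. These symbols are introduced here for illustration and do not appear elsewhere in the thread.

        ```latex
        b_{\text{pooled}}
          = \frac{\sum_g S_{xx}^{(g)}\, b_g
                  \;+\; \sum_g n_g (\bar{x}_g - \bar{x})(\bar{y}_g - \bar{y})}
                 {\sum_g S_{xx}^{(g)}
                  \;+\; \sum_g n_g (\bar{x}_g - \bar{x})^2}
        ```

        The pooled slope is a weighted average of the b_g only when the between-group terms vanish, that is, when every group has the same mean of x. Here x is year, and because missing country-years are dropped, the mean observed year can differ between the groups. If the group with the later mean year also has the lower mean death count, the between-group term is negative and can pull the pooled slope below both subgroup slopes. That is a plausible reading of the 6.95, not something computed from the data here.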



        • #5
          Ahh!!! Thank you!! It was extremely difficult for me to find the answer to this question. Sending some good Karma your way! ⭐⭐⭐⭐⭐



          • #6
            Note that linear regression may be what is needed (instructed???) here, but the application cries out for Poisson regression if a simple trend makes any sense at all. Most crucially, a rising straight line is qualitatively wrong: traced backwards, it implies zero deaths at some recent date and resurrection at any point earlier.
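
            A minimal sketch of that alternative, assuming the same variables as in post #1. The robust standard errors are an added choice here, not something the thread specifies.

            ```stata
            * Poisson regression of the death count on a linear time trend.
            * vce(robust) guards against overdispersion in the standard errors;
            * irr reports exp(b), the multiplicative change in expected deaths per year.
            poisson battledeaths year, vce(robust) irr
            ```

            Under this model the trend is read as a percentage change per year rather than a fixed number of deaths per year, and the fitted values can never fall below zero.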



            • #7
              Another issue that you might want to address is that you do not seem to be allowing any country to have zero -battledeaths- in a year. You are treating the absence of reported battle deaths as missing data when in fact, for many of these country-years, the absence of data may indicate an absence of battle deaths. Should you convert all the missing values to zero before running your regressions? Or do you have a way of distinguishing which country-years are likely to be zero in reality and which are just unknown (and thus should rightfully be coded as missing)?

              If in fact many years are truly zero, then you might want to separately model whether a country has a non-zero number of deaths (using a simple regression or logit or probit) and the number of deaths, given that it has some non-zero number of deaths. The first of these two suggested regressions might turn out to be most useful, since your slopes could then be interpreted as the increase in the probability of battle deaths per year. You could then distinguish the change in the probability of battle deaths per year between Sahel and non-Sahel countries. More sophisticated "hurdle" models are also possible, but might not add much.
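
              A rough sketch of the two-part approach follows. It recodes every missing value as zero, which is exactly the assumption in question, so treat it as illustrative only. The variable names deaths0 and anydeaths are hypothetical.

              ```stata
              * ASSUMPTION: a missing value means zero deaths in that country-year.
              generate deaths0 = cond(missing(battledeaths), 0, battledeaths)
              generate byte anydeaths = deaths0 > 0

              * Part 1: does the probability of any battle deaths change over time?
              logit anydeaths year
              margins, dydx(year)    // average change in probability per year

              * Part 2: trend in the number of deaths, given that some occurred.
              regress deaths0 year if anydeaths == 1
              ```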

              As a former Peace Corps Volunteer to Burkina Faso, the subject of your research is of particular interest to me. Best of luck with your analysis.



              • #8
                Hi Mead, I am a former Peace Corps Volunteer as well! I was placed in China back in 2013. I'm sure you have countless interesting stories from this time of your life.

                Thank you for your feedback. This particular dependent variable is the best estimate of the number of battle deaths in a given year (as explained in the UCDP codebook). While it may be true that there were no battle deaths in a given year, the data is provided as missing, without a 0, because it would be inaccurate to say there are 0 deaths when more deaths may have occurred and were not reported. This, at least, is my understanding of the situation.
