
  • Generating mean of a variable (e.g. population density) for each observation, when each observation spans multiple states/provinces.

    Hi all,

    I am trying to create a mean population density variable for every observation in my dataset. Each observation has this basic structure:
    Observation No.   Province/State       Return on Investment   mean population density   x3 (some other variable)
    1                 Delhi, Bengal        5.5                    ?
    2                 Tamil Nadu, Kerala   7.9                    ?
    3                 Kerala               6.8                    ?
    Therefore, any particular observation can span multiple provinces/states. I have population density data by state. I want to generate a "mean population density" variable for each observation. For example, the mean population density for Observation 1 will be the mean of the population densities in Delhi and Bengal.

    How do I get Stata to go through every observation and compute this mean? It seems like I may have to use a loop of some kind, though I am not sure what the easiest way is.

    Thank you!


  • #2
    I suggest calculating total population / total area, if only because mean population density is all too likely to be inflated by outliers.
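
    As a rough sketch of that calculation, assuming the data have already been arranged with one row per observation-state pair and that each state's population and area are available. The variable names (obs_no, state, population, area) and all the numbers below are made up for illustration.

    ```stata
    * Toy long-format data: one row per observation-state pair.
    * Names and figures are hypothetical placeholders, not real statistics.
    clear
    input byte obs_no str10 state double population double area
    1 "A" 20 10
    1 "B" 5  1.25
    2 "B" 5  1.25
    end

    * Pool population and area within each observation, then divide.
    bysort obs_no: egen double total_pop  = total(population)
    bysort obs_no: egen double total_area = total(area)
    generate double pooled_density = total_pop / total_area

    list, sepby(obs_no)
    ```

    No loop is involved: the -by- prefix repeats the -egen- calculation within each observation.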



    • #3
      "For example, the mean population density for Observation 1 will be the mean of the population densities in Delhi and Bengal."
      Well, you need to have some other data that gives you the population density in each state. Probably that will be in a different data set. But without that, you can't proceed.

      "It seems like I may have to use a loop of some kind,"
      Actually, probably not. Unlike in many other systems for data management, loops are of limited use in Stata. The -by- prefix, along with the -gen- and -egen- commands and their associated functions, does most of this kind of thing without loops.

      For specific help you need to, first, import your data into Stata. Then use the -dataex- command to produce an example of your data set that you can post here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      You also need to find a data source for the population densities of the individual provinces and states, import that into Stata, and again use -dataex- to post an example of that here. Then it may be possible for somebody to offer you a solution that will actually work for you.
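
      Pending real example data, here is only a rough sketch of the general approach, under loudly stated assumptions: the state names sit in a single comma-separated string variable (called location here), and a separate lookup file holds one density per state. Every variable name and every density figure below is a hypothetical placeholder, so the code will need adapting once the actual data are posted.

      ```stata
      * Hypothetical lookup file: one row per state.
      * The densities are placeholders, NOT real figures.
      clear
      input str20 state double density
      "Delhi"      100
      "Bengal"     200
      "Tamil Nadu" 300
      "Kerala"     400
      end
      tempfile densities
      save `densities'

      * Toy version of the main data, mirroring the table in #1.
      clear
      input byte obs_no str40 location double roi
      1 "Delhi, Bengal"      5.5
      2 "Tamil Nadu, Kerala" 7.9
      3 "Kerala"             6.8
      end

      * One variable per state name, then one row per observation-state pair.
      split location, parse(",") generate(state)
      reshape long state, i(obs_no) j(seq)
      replace state = strtrim(state)
      drop if missing(state)

      * Attach each state's density, then average within each original observation.
      * Names with no match in the lookup file get a missing density, which
      * egen's mean() ignores, so check the spelling of state names first.
      merge m:1 state using `densities', keep(master match) nogenerate
      bysort obs_no: egen double mean_density = mean(density)

      * Return to one row per original observation.
      bysort obs_no (seq): keep if _n == 1
      drop seq state density
      list obs_no location roi mean_density
      ```

      Because the lookup file is referred to through a temporary file name, this should be run as a whole from a do-file rather than line by line.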

      Let me add one comment: are you sure you want to calculate the mean population density by taking a simple average of the densities in each state? It's possible that for your purposes, whatever they may be, this really makes sense. But in most contexts, if you had an observation with two states, with densities 2 and 4, where the former has a population of 20M and the latter a population of 5M, you would want to take a population-weighted average, not a simple average, since the combined density for the area would be dominated by the density of the larger state. Think it over.
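
      To make the numbers in that example concrete, here is a small arithmetic sketch. It assumes the densities are people per unit area (the unit is not specified above), so each state's implied area is its population divided by its density.

      ```stata
      * Densities 2 and 4, populations 20 (million) and 5 (million).
      * Implied areas: 20/2 = 10 and 5/4 = 1.25, in matching units.
      display "simple mean:              " (2 + 4) / 2
      display "population-weighted mean: " (2*20 + 4*5) / (20 + 5)
      display "total pop / total area:   " (20 + 5) / (20/2 + 5/4)
      ```

      These come to 3, 2.4 and about 2.22 respectively. Note that the population-weighted mean and total population / total area are not the same quantity (the latter is an area-weighted mean of the densities), so it is worth deciding which one your analysis actually calls for.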

      Added: Crossed with #2, which makes the same point I made in my last paragraph.



      • #4
        See also https://www.journals.uchicago.edu/do...10.1086/284994 on population density.
