  • Standard deviation of sub-samples as dependent variable in linear regression

    I intend to run a linear regression to measure the possible influence of the Internet on mass political polarization. The explanatory variable, Internet, is measured as the Internet or broadband penetration rate, i.e., one value per year and country (data obtained from the World Bank). I want to measure the dependent variable, political polarization, as the standard deviation of respondents' left-right political self-placement on a scale from 1 to 10 (data obtained from the yearly Eurobarometer survey).

    I would like to do this not only for the full sample but also for specific subsamples (e.g., people under 30, people born after 1980, the unemployed, the least educated). I wonder if it's possible to run these regressions automatically, without first having to calculate the standard deviation of each country-year separately for each desired subsample. Because I use 12 to 15 countries and 17 to 22 years, doing this by hand would take hours. Many thanks in advance.
    Last edited by Stefan Edel; 22 Oct 2017, 23:51.

  • #2
    Why would it take hours? Here's some code that generates a toy data set for 15 countries and 22 years, with 10,000 observations for each country-year and 10 population subgroups. Calculating the 10 subgroup-specific standard deviations within each country-year combination took 47.32 seconds on my machine, which is not especially fast: it's a mid-level Dell that's about 5 years old now.

    Code:
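    clear
    // toy data: 15 countries x 22 years = 330 country-years
    set obs `=15*22'
    gen country_year = _n
    // 10,000 observations per country-year, split into 10 subgroups (0-9)
    expand 10000
    by country_year, sort: gen subgroup = mod(_n, 10)
    
    // random integer self-placement on a 1-10 scale
    set seed 1234
    gen x = runiformint(1, 10)
    
    // time the creation of one SD variable per subgroup (sd0 ... sd9)
    timer clear
    timer on 1
    forvalues i = 0/9 {
        by country_year, sort: egen sd`i' = sd(x) if subgroup == `i'
    }
    timer off 1
    
    timer list 1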
    And, in fact, I can think of a couple of ways to do this even faster, but the time spent programming them would greatly outweigh the savings in execution time. So just go ahead and do it. You'll be done in a matter of minutes.
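A variant of the toy example above, sketched as an alternative data layout rather than as one of the faster methods mentioned: `collapse` with `by(country_year subgroup)` replaces the observation-level data with one row per country-year and subgroup, holding the standard deviation in a new variable `sd_x` (a name chosen here for illustration). That long shape is what a country-year-level regression works on, and a subgroup-specific regression can then be restricted with `if subgroup == ...`.

```stata
* same toy data as above: 330 country-years x 10,000 observations
clear
set obs `=15*22'
gen country_year = _n
expand 10000
by country_year, sort: gen subgroup = mod(_n, 10)

set seed 1234
gen x = runiformint(1, 10)

* replace the data in memory with one row per country_year/subgroup,
* holding the standard deviation of x within that cell
collapse (sd) sd_x = x, by(country_year subgroup)

describe
```

After the `collapse`, the data set has 330 x 10 = 3,300 rows, one per cell.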



    • #3
      Thank you for your answer. I just haven't yet figured out how to adapt the code to my own dataset.
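A minimal sketch of how the same idea might be adapted to survey micro data, assuming one row per respondent with hypothetical variables `country`, `year`, `lrscale` (the 1-10 self-placement) and `age`, plus a separate country-year file holding a hypothetical `internet` penetration rate. Both data sets are simulated here only so the block runs on its own; in practice you would `use` your Eurobarometer and World Bank files instead. For each 0/1 subsample indicator, `collapse (sd)` computes one standard deviation per country-year, the result is merged onto the country-year panel, and one regression is run per subsample.

```stata
* All variable names (country, year, lrscale, age, internet, s_*, polar_*)
* are hypothetical placeholders; rename them to match your own files.

* --- simulated stand-in for the World Bank country-year data ---
clear
set seed 2017
set obs 12
gen country = _n
expand 20
bysort country: gen year = 1995 + _n
gen internet = runiform(0, 100)
tempfile wb
save `wb'

* --- simulated stand-in for the Eurobarometer respondent data ---
clear
set obs 12
gen country = _n
expand 20
bysort country: gen year = 1995 + _n
expand 1000
gen lrscale = runiformint(1, 10)
gen age = runiformint(15, 90)

* one 0/1 indicator per subsample of interest
gen byte s_all = 1
gen byte s_u30 = age < 30
tempfile micro
save `micro'

* start from the country-year panel, then add one SD variable per subsample
use `wb', clear
tempfile results
save `results'
foreach g in all u30 {
    use `micro', clear
    collapse (sd) polar_`g' = lrscale if s_`g', by(country year)
    merge 1:1 country year using `results', nogenerate
    save `results', replace
}

* one regression per subsample
regress polar_all internet
regress polar_u30 internet
```

Further subsamples (born after 1980, unemployed, least educated) would each need one more indicator, such as a hypothetical `s_b1980`, and its suffix added to the `foreach` list. A country-year in which a subsample has no respondents gets no row from `collapse` and therefore a missing value after the merge; one with a single respondent gets a missing standard deviation.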
