Standard deviation of sub-samples as dependent variable in linear regression

Stefan Edel

Join Date: Oct 2017

Posts: 4
#1

22 Oct 2017, 23:49

I intend to run a linear regression to measure the possible influence of the Internet on mass political polarization. The explanatory variable, Internet, is measured as the Internet (or broadband) penetration rate, i.e., one value per year and country (data obtained from the World Bank). I want to measure the dependent variable, political polarization, as the standard deviation of respondents' left-right political self-placement on a scale from 1 to 10 (data obtained from the yearly Eurobarometer survey).

I would like to do this not only for the full sample, but also for specific subsamples (e.g., people under 30, people born after 1980, the unemployed, the least educated). I wonder if it's possible to run these regressions automatically, without first having to calculate the standard deviation for each country-year separately for each desired subsample. Because I use 12 to 15 countries and 17 to 22 years, doing this by hand would take hours. Many thanks in advance.

Last edited by Stefan Edel; 22 Oct 2017, 23:51.
Tags: linear regression, regression, standard deviation, sub-sample, subsample
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

23 Oct 2017, 00:09

Why would it take many hours? Here's some code that generates a toy data set for 15 countries and 22 years, with 10,000 observations for each country/year, with 10 population subgroups. To calculate the 10 population subgroup specific standard deviations within each country-year combination took 47.32 seconds on my machine. My machine is not especially fast; it's a mid-level Dell that's about 5 years old now.

Code:

clear
set obs `=15*22'
gen country_year = _n
expand 10000
by country_year, sort: gen subgroup = mod(_n, 10)
set seed 1234
gen x = runiformint(1, 10)

timer clear
timer on 1
forvalues i = 0/9 {
    by country_year, sort: egen sd`i' = sd(x) if subgroup == `i'
}
timer off 1
timer list 1

And, in fact, I can think of a couple of ways to do this even faster, but the time programming it would greatly outweigh the execution time savings. So just go and do it. You'll be done in a matter of minutes.
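
For a real dataset, one possible adaptation is sketched below. It is only an illustration under stated assumptions: the variable names (country, year, lr, internet, under30) are hypothetical, the data are simulated stand-ins for the Eurobarometer and World Bank data, and the bare regression of the standard deviation on the penetration rate is a placeholder for whatever model is actually intended. It also uses collapse rather than the egen approach shown above, looping over subsample conditions so that each country-year standard deviation is computed and regressed in one pass.

```stata
* Simulated stand-in data; every variable name here is hypothetical
clear
set seed 1234
set obs `=15*22'
gen country = ceil(_n/22)              // 15 countries
bysort country: gen year = 1994 + _n   // 22 years per country
gen internet = runiform(0, 100)        // one penetration rate per country-year
expand 1000                            // 1,000 respondents per country-year
gen lr = runiformint(1, 10)            // left-right self-placement, 1 to 10
gen byte under30 = runiformint(15, 90) < 30

* One regression per subsample; the condition "1" keeps the full sample
foreach cond in "1" "under30 == 1" {
    preserve
    keep if `cond'
    * Reduce to one row per country-year: SD of lr plus the (constant) rate
    collapse (sd) sd_lr = lr (first) internet, by(country year)
    display as text "Subsample: `cond'"
    regress sd_lr internet
    restore
}
```

Further subsamples can be added as extra quoted conditions in the foreach list, for example "birthyear > 1980", provided the corresponding variable exists in the data.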
Stefan Edel

Join Date: Oct 2017

Posts: 4
#3

24 Oct 2017, 15:30

Thank you for your answer. I have not yet worked out how to adapt the code so that it applies to my dataset.