Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Store Regression Coefficients by State

    Hello,

    I am working with population data by state and year. I've labeled the population variables blacktotalpop, and whitetotalpop. The variable blacktotalpop has some missing values in the states with smaller populations, so I want to estimate these missing values. To do this, I regress blacktotalpop on whitetotalpop by state, then I want to store these results by state, then plug in the corresponding values of whitetotalpop into the regression equation to estimate values of blacktotalpop. So, I want to store the slope coefficient and intercept of these regressions into new variables, then use these new variables to make my estimation. To do this, I've written this code:

    Code:
    sort id2 year
    
    by id2: reg blacktotalpop whitetotalpop
    
    by id2: gen coeff = _b[whitetotalpop] 
    by id2: gen intercept = _b[_cons]
    
    replace blacktotalpop = coeff*whitetotalpop - intercept if missing(blacktotalpop)
    This code, rather than generating different a different coeff and intercept for each state, generates the same value for every state. How do I generate a different value for these variables by state?

    Thank you in advance for any help!

    Here is an example of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte id2 str20 geography double blacktotalpop long whitetotalpop float(coeff intercept)
    2006  1 "Alabama"      710635  2195747 .017742475 -2582.177
    2007  1 "Alabama"      720302  2202402 .017742475 -2582.177
    2008  1 "Alabama"      726764  2218900 .017742475 -2582.177
    2009  1 "Alabama"      740939  2227745 .017742475 -2582.177
    2010  1 "Alabama"      759609  2242013 .017742475 -2582.177
    2011  1 "Alabama"      775682  2244173 .017742475 -2582.177
    2012  1 "Alabama"      780045  2256486 .017742475 -2582.177
    2013  1 "Alabama"      788353  2261274 .017742475 -2582.177
    2014  1 "Alabama"      803651  2269895 .017742475 -2582.177
    2015  1 "Alabama"      816758  2279702 .017742475 -2582.177
    2016  1 "Alabama"      825982  2285863 .017742475 -2582.177
    2017  1 "Alabama"      833006  2283461 .017742475 -2582.177
    2006  2 "Alaska"        12652   297450 .017742475 -2582.177
    2007  2 "Alaska"        15133   298221 .017742475 -2582.177
    2008  2 "Alaska"        13576   302892 .017742475 -2582.177
    2009  2 "Alaska"        13838   303541 .017742475 -2582.177
    2010  2 "Alaska"        14023   316631 .017742475 -2582.177
    2011  2 "Alaska"        14143   318668 .017742475 -2582.177
    2012  2 "Alaska"        15659   319793 .017742475 -2582.177
    2013  2 "Alaska"        14008   317755 .017742475 -2582.177
    2014  2 "Alaska"        15075   317114 .017742475 -2582.177
    2015  2 "Alaska"        15302   316043 .017742475 -2582.177
    2016  2 "Alaska"        14926   320221 .017742475 -2582.177
    2017  2 "Alaska"        13931   320775 .017742475 -2582.177
    2006  4 "Arizona"      123037  2636542 .017742475 -2582.177
    2007  4 "Arizona"      127450  2694268 .017742475 -2582.177
    2008  4 "Arizona"      137598  2745639 .017742475 -2582.177
    2009  4 "Arizona"      144222  2760906 .017742475 -2582.177
    2010  4 "Arizona"      152212  2722084 .017742475 -2582.177
    2011  4 "Arizona"      152992  2743025 .017742475 -2582.177
    2012  4 "Arizona"      166884  2768669 .017742475 -2582.177
    2013  4 "Arizona"      166546  2790891 .017742475 -2582.177
    2014  4 "Arizona"      172012  2825371 .017742475 -2582.177
    2015  4 "Arizona"      182629  2862388 .017742475 -2582.177
    2016  4 "Arizona"      182951  2904722 .017742475 -2582.177
    2017  4 "Arizona"      192907  2918549 .017742475 -2582.177
    2006  5 "Arkansas"     248308  1479962 .017742475 -2582.177
    2007  5 "Arkansas"     252257  1487709 .017742475 -2582.177
    2008  5 "Arkansas"     251829  1498851 .017742475 -2582.177
    2009  5 "Arkansas"     259727  1509150 .017742475 -2582.177
    2010  5 "Arkansas"     267714  1513457 .017742475 -2582.177
    2011  5 "Arkansas"     271892  1524464 .017742475 -2582.177
    2012  5 "Arkansas"     271996  1524820 .017742475 -2582.177
    2013  5 "Arkansas"     276858  1526124 .017742475 -2582.177
    2014  5 "Arkansas"     276933  1529290 .017742475 -2582.177
    2015  5 "Arkansas"     282880  1538245 .017742475 -2582.177
    2016  5 "Arkansas"     281785  1537148 .017742475 -2582.177
    2017  5 "Arkansas"     278042  1540589 .017742475 -2582.177
    2006  6 "California"  1365596 11252889 .017742475 -2582.177
    2007  6 "California"  1381274 11232296 .017742475 -2582.177
    2008  6 "California"  1394700 11218791 .017742475 -2582.177
    2009  6 "California"  1406765 11250383 .017742475 -2582.177
    2010  6 "California"  1428183 11130961 .017742475 -2582.177
    2011  6 "California"  1439189 11151511 .017742475 -2582.177
    2012  6 "California"  1458497 11183681 .017742475 -2582.177
    2013  6 "California"  1472178 11221820 .017742475 -2582.177
    2014  6 "California"  1484683 11250601 .017742475 -2582.177
    2015  6 "California"  1520866 11281942 .017742475 -2582.177
    2016  6 "California"  1523493 11269651 .017742475 -2582.177
    2017  6 "California"  1553323 11228631 .017742475 -2582.177
    2006  8 "Colorado"     108083  2366177 .017742475 -2582.177
    2007  8 "Colorado"     110070  2423890 .017742475 -2582.177
    2008  8 "Colorado"     113953  2460396 .017742475 -2582.177
    2009  8 "Colorado"     117813  2487371 .017742475 -2582.177
    2010  8 "Colorado"     122003  2500740 .017742475 -2582.177
    2011  8 "Colorado"     129110  2536575 .017742475 -2582.177
    2012  8 "Colorado"     130236  2572176 .017742475 -2582.177
    2013  8 "Colorado"     132359  2614677 .017742475 -2582.177
    2014  8 "Colorado"     133513  2653095 .017742475 -2582.177
    2015  8 "Colorado"     141755  2704318 .017742475 -2582.177
    2016  8 "Colorado"     145505  2755399 .017742475 -2582.177
    2017  8 "Colorado"     146066  2796851 .017742475 -2582.177
    2006  9 "Connecticut"  196621  1848126 .017742475 -2582.177
    2007  9 "Connecticut"  195071  1832304 .017742475 -2582.177
    2008  9 "Connecticut"  200970  1829460 .017742475 -2582.177
    2009  9 "Connecticut"  207780  1821786 .017742475 -2582.177
    2010  9 "Connecticut"  216746  1841062 .017742475 -2582.177
    2011  9 "Connecticut"  220634  1839505 .017742475 -2582.177
    2012  9 "Connecticut"  222724  1834631 .017742475 -2582.177
    2013  9 "Connecticut"  229629  1825161 .017742475 -2582.177
    2014  9 "Connecticut"  231444  1807653 .017742475 -2582.177
    2015  9 "Connecticut"  240257  1793942 .017742475 -2582.177
    2016  9 "Connecticut"  241134  1778311 .017742475 -2582.177
    2017  9 "Connecticut"  245862  1772594 .017742475 -2582.177
    2006 10 "Delaware"     104931   412821 .017742475 -2582.177
    2007 10 "Delaware"     105289   416409 .017742475 -2582.177
    2008 10 "Delaware"     108022   420271 .017742475 -2582.177
    2009 10 "Delaware"     112743   426060 .017742475 -2582.177
    2010 10 "Delaware"     112889   424379 .017742475 -2582.177
    2011 10 "Delaware"     118294   427326 .017742475 -2582.177
    2012 10 "Delaware"     121408   428493 .017742475 -2582.177
    2013 10 "Delaware"     123883   434756 .017742475 -2582.177
    2014 10 "Delaware"     128116   440632 .017742475 -2582.177
    2015 10 "Delaware"     128898   445757 .017742475 -2582.177
    2016 10 "Delaware"     132020   449466 .017742475 -2582.177
    2017 10 "Delaware"     137339   452331 .017742475 -2582.177
    2006 12 "Florida"     1590956  8183958 .017742475 -2582.177
    2007 12 "Florida"     1623115  8196656 .017742475 -2582.177
    2008 12 "Florida"     1665996  8204879 .017742475 -2582.177
    2009 12 "Florida"     1704707  8189165 .017742475 -2582.177
    end



  • #2
    Edit: In the original post I posted the formula as
    Code:
     blacktotalpop = coeff*whitetotalpop - intercept
    but it should be
    Code:
    blacktotalpop = coeff*whitetotalpop + intercept
    .

    Comment


    • #3
      Jake:
      set aside the regression issue, I do niot think that the way you're planning to replace missing values is correct (see https://uk.sagepub.com/en-gb/eur/missing-data/book9419 pages 9-12).
      Are you sure that you cannot replace the missing values for population via official sources?
      As an aside, take a look at -fvvarlist- for creating categorical variables and interactions.
      Kind regards,
      Carlo
      (Stata 18.0 SE)

      Comment


      • #4
        #1 is unfortunately wishful thinking. The code can be simplified to

        Code:
        bysort id2: reg blacktotalpop whitetotalpop
        by id2: gen coeff = _b[whitetotalpop]  
        by id2: gen intercept = _b[_cons]
        replace blacktotalpop = coeff*whitetotalpop + intercept if missing(blacktotalpop)
        as the order within id2 is immaterial to the regression, but that is not the issue.

        The first command will run separate regressions, but afterwards only one set of coefficients is in memory, from the last of those regressions. There is no sense in which
        regress results include several intercepts (for example) within
        Code:
        _b[_cons]
        .

        Check again: you should see that your coefficients match those from the last regression done.


        rangestat (SSC) is one way to get what you want, although I am sceptical without knowing your data on whether your regression is a good solution.

        I would be more willing to interpolate missing values.

        Code:
        rangestat (reg) blacktotalpop whitetotalpop, int(id2 0 0)  
        gen predicted = b_cons + b_whitetotalpop * whitetotalpop
        Last edited by Nick Cox; 03 Jul 2019, 02:23.

        Comment


        • #5
          Here is one more solution using asreg (from SSC)
          Code:
          ssc install asreg
          
          . bys id2: asreg blacktotalpop whitetotalpop, fit
          
          
          . list _* in 1/10
          
               +--------------------------------------------------------------------------------+
               | _Nobs         _R2     _adjR2   _b_whit~p      _b_cons     _fitted   _residuals |
               |--------------------------------------------------------------------------------|
            1. |    12   .97646618   .9741128   1.3465798   -2252698.4   704050.09    6584.9064 |
            2. |    12   .97646618   .9741128   1.3465798   -2252698.4   713011.58    7290.4179 |
            3. |    12   .97646618   .9741128   1.3465798   -2252698.4   735227.46   -8463.4555 |
            4. |    12   .97646618   .9741128   1.3465798   -2252698.4   747137.95   -6198.9537 |
            5. |    12   .97646618   .9741128   1.3465798   -2252698.4   766350.95   -6741.9542 |
               |--------------------------------------------------------------------------------|
            6. |    12   .97646618   .9741128   1.3465798   -2252698.4   769259.57    6422.4335 |
            7. |    12   .97646618   .9741128   1.3465798   -2252698.4      785840   -5795.0035 |
            8. |    12   .97646618   .9741128   1.3465798   -2252698.4   792287.43   -3934.4276 |
            9. |    12   .97646618   .9741128   1.3465798   -2252698.4   803896.29   -245.29194 |
           10. |    12   .97646618   .9741128   1.3465798   -2252698.4    817102.2   -344.19995 |
               +--------------------------------------------------------------------------------+
          Regards
          --------------------------------------------------
          Attaullah Shah, PhD.
          Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
          FinTechProfessor.com
          https://asdocx.com
          Check out my asdoc program, which sends outputs to MS Word.
          For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

          Comment


          • #6
            Thank you all very much for the helpful responses. After reading your comments, I've changed the method to deal with these missing values. Unfortunately, I can't find the data from official sources, because I'm looking for college and non-college educated population totals. This data is available from the ACS in the US Census, but it's missing a couple of the totals for the small states, because the census can't provide sound estimations of such a small population.

            The final variable I am trying to produce is a ratio of non-college educated Black population to the non-college educated White population, called the Education Index. So, rather than estimate the blacktotalpop, I've decided to interpolate the missing values of the Education Index for each state, as you suggested. So, I'm regressing the Education Index on year, and still storing regression results by state, but this time using a regression with hopefully more sound results. Please let me know if you see any potential problems with these new estimations, and thank you again for all of your help!

            Comment

            Working...
            X