
  • svy: reg get standardized coefficients for both continuous and categorical predictors

    Hi Statalist,

    I came across an issue when getting standardized coefficients from svy: reg when there are both continuous and categorical predictors.

    For continuous predictors, I used to just standardize both y and x before running regress, which works. Alternatively, there is a post here showing how to do that when all predictors are continuous: https://www.statalist.org/forums/for...-weighted-data
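    For what it's worth, a minimal sketch of that all-continuous approach (not from either linked post; it uses summarize with aweights to get approximate weighted means and SDs, and the auto variables purely as an illustration):

    Code:
    sysuse auto, clear
    svyset turn [pw = price]

    * standardize the outcome and a continuous predictor with the weighted mean/SD
    foreach v of varlist mpg turn {
        quietly summarize `v' [aweight = price]
        generate z_`v' = (`v' - r(mean)) / r(sd)
    }
    svy: regress z_mpg z_turn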

    However, it becomes tricky when I have categorical variables. I am not sure how to standardize a categorical/binary variable (I tried, but the svy: reg failed).

    There is also a post showing how to get the standardized coefficient when there is *one* 0/1 binary predictor (for example, the foreign variable in the "sysuse auto" dataset): https://www.statalist.org/forums/for...-weighted-data. I also wonder whether that strategy would work when the predictor is, say, i.rep78 (a categorical variable).

    I wonder how to combine the solutions from both posts. The example regression model I am interested in is:

    Code:
    sysuse auto, clear
    svyset turn [pw = price]
    svy: reg mpg turn length weight i.foreign i.rep78
    Thanks in advance for any thoughts or suggestions!

    Yingyi


  • #2
    Hi Statalist,

    I tried to figure this out but still could not find a solution. I tried to modify the scripts from this post (from Steve Samuels: https://www.statalist.org/forums/for...eighted-data):

    Code:
     
     sysuse auto, clear
     local y mpg                      /* outcome */
     local xvars turn length weight   /* predictors */

     svyset turn [pw = price]

     /* Get coefficients */
     svy: regress `y' `xvars'
     matrix b = e(b)

     /* Get SDs of y and predictors */
     svy: mean `y'
     estat sd
     matrix sy = r(sd)

     svy: mean `xvars'
     estat sd
     matrix sx = r(sd)

     /* Compute standardized coefficients */
     mata:
         sy = st_matrix("sy")
         sx = st_matrix("sx")'
         b = st_matrix("b")
         bx = b[1, 1..(cols(b)-1)]'
         st_matrix("betas", (sx:/sy):*bx)
     end

     matrix rownames betas = `xvars'
     matrix list betas
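    (For reference, the Mata step above just applies the usual conversion for standardized coefficients, beta_j = b_j * SD(x_j) / SD(y), using the design-based SDs returned by estat sd.)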

    I added another local macro for the categorical predictors, but I started to get an error message when calculating the SDs:
    Code:
    sysuse auto, clear
    local y mpg   /* outcome */
    local xvars turn length weight /* predictors */
    local xvarscat i.foreign i.rep78 /*categorical predictors*/
    svyset turn [pw = price]
    /* Get coefficients */
     svy: regress `y' `xvars' `xvarscat'
        matrix b = e(b)
        matrix list b
        
         /* Get SDs of y and predictors */
        svy: mean `y'
        estat sd
        matrix sy = r(sd)
    
        svy: mean `xvars'
        estat sd
        matrix sx = r(sd)
        
        svy: mean `xvarscat'
        estat sd
        matrix sxcat = r(sd)
    Here is the error message:
    . svy: mean `xvarscat'
    (running mean on estimation sample)
    factor-variable and time-series operators not allowed
    r(101);
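    If svy: mean rejects the factor-variable notation like this, one possible workaround (a sketch, not from the linked posts) is to expand the categorical predictors into explicit 0/1 dummies first, so that svy: mean and estat sd will accept them:

    Code:
    sysuse auto, clear
    svyset turn [pw = price]
    * tabulate creates the 0/1 indicators rep78_1 ... rep78_5; foreign is already 0/1
    tabulate rep78, generate(rep78_)
    svy: mean foreign rep78_1-rep78_5
    estat sd
    matrix sxcat = r(sd)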



    I wonder if anyone has any insights into this question? I would very much appreciate any pointers. Thanks again!

    Yingyi

    Comment


    • #3
      I'm not sure why this example isn't working for you, but it works for me:
      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . local xvarscat i.foreign i.rep78 /*categorical predictors*/
      
      . svyset turn [pw = price]
      
      Sampling weights: price
                   VCE: linearized
           Single unit: missing
              Strata 1: <one>
       Sampling unit 1: turn
                 FPC 1: <zero>
      
      . svy: mean `xvarscat'
      (running mean on estimation sample)
      
      Survey: Mean estimation
      
      Number of strata =  1                Number of obs   =      69
      Number of PSUs   = 18                Population size = 424,077
                                           Design df       =      17
      
      --------------------------------------------------------------
                   |             Linearized
                   |       Mean   std. err.     [95% conf. interval]
      -------------+------------------------------------------------
           foreign |
         Domestic  |   .6994107    .121862      .4423043    .9565171
          Foreign  |   .3005893    .121862      .0434829    .5576957
                   |
             rep78 |
                1  |   .0215268    .014786      -.009669    .0527225
                2  |   .1125763   .0469625      .0134941    .2116584
                3  |    .454816   .0645902      .3185425    .5910895
                4  |   .2577056   .0536099      .1445986    .3708125
                5  |   .1533754   .0775002     -.0101357    .3168866
      --------------------------------------------------------------
      David Radwin
      Senior Researcher, California Competes
      californiacompetes.org
      Pronouns: He/Him

      Comment


      • #4
        Hi David, thank you very much for your input here. I really appreciate it!

        I was looking for a way to get standardized coefficients and SEs for each predictor (both continuous and categorical) using svy: regress.

        I could not find a way to automate what I want, so I ended up standardizing every individual predictor manually. For the categorical variables, I created dummies for each level of the variable. (That said, the results are not perfect because I did not know how to standardize the intercept...)

        I really wish that svy: regress supported the ", beta" option like the regular regress command does.
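        One reading of that manual approach might look like this (a sketch; the z_ names and the choice to leave the 0/1 dummies unstandardized are assumptions, not taken from the post):

        Code:
        sysuse auto, clear
        svyset turn [pw = price]
        * standardize the outcome and continuous predictors with the weighted mean/SD
        foreach v of varlist mpg turn length weight {
            quietly summarize `v' [aweight = price]
            generate z_`v' = (`v' - r(mean)) / r(sd)
        }
        * hand-made dummies for each level of rep78 (rep78_1 serves as the base)
        tabulate rep78, generate(rep78_)
        svy: regress z_mpg z_turn z_length z_weight foreign rep78_2-rep78_5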





        Comment


        • #5
          I think I understand. If you just want the standardized coefficients and don't care about standard errors, or if you want to check your manual calculations, you could use aweights like
          Code:
          regress `y' `xvars' `xvarscat' [aweight=price] , beta
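          In context, with the locals as defined in the earlier posts, that line would run as, for example:
          Code:
          sysuse auto, clear
          local y mpg
          local xvars turn length weight
          local xvarscat i.foreign i.rep78
          regress `y' `xvars' `xvarscat' [aweight=price], beta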
          David Radwin
          Senior Researcher, California Competes
          californiacompetes.org
          Pronouns: He/Him

          Comment


          • #6
            Originally posted by David Radwin View Post
            I think I understand. If you just want the standardized coefficients and don't care about standard errors, or if you want to check your manual calculations, you could use aweights like
            Code:
            regress `y' `xvars' `xvarscat' [aweight=price] , beta
            Hi David, thank you so much for the follow-up! The [aweight = ] option seems to be the way to go to solve my puzzle. I wonder if it could also incorporate sampling strata? For instance, this is how I set up the sampling weights:
            svyset turn [pw = price], strata(rep78)

            Code:
            sysuse auto, clear
            local y mpg   /* outcome */
            local xvars  length weight /* predictors */
            local xvarscat i.foreign i.rep78 /*categorical predictors*/
            svyset turn [pw = price], strata(rep78) 
            svy: mean `xvarscat'
            regress `y' `xvars' `xvarscat' [aweight=price] , beta
            For the last line of the script, how can I incorporate the stratum variable?

            Thanks again!


            Comment


            • #7
              How to properly incorporate sampling strata is a question that is probably best answered by the creator or distributor of the dataset. I'm sorry I can't help with that topic.
              David Radwin
              Senior Researcher, California Competes
              californiacompetes.org
              Pronouns: He/Him

              Comment


              • #8
                David says his solution works so long as you don't care about the standard errors, i.e., you only want the point estimates. If you only want the point estimates, things like sampling strata do not matter (they are just used to adjust the standard errors, not the point estimates).

                Also, somebody can correct me if I'm wrong, but wouldn't the t-values for the standardized coefficients be the same as the t-values you got by using svy: reg? That is, if you add the command to your do-file

                svy: regress `y' `xvars' `xvarscat'

                Aren't the t-values it reports for each variable also the correct t-values for the model where you instead use aweights? (I don't know that for a fact, but it seems logical to me. At least it seems logical if David's statement about the use of aweights is correct -- off the top of my head, I don't know if it is or isn't.)
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                Comment


                • #9
                  If I understand your (Richard Williams') question correctly, I am afraid not. Moreover, the difference in t-values between the two methods is not trivial.

                  To illustrate, in the following example based on the original post, using svy: reg the t-value for length is -1.69, whereas in the same regression model using reg with aweights the t-value for length is -2.52. Setting aside for a moment the important debate over the use of null hypothesis statistical tests, under the conventional p < .05 standard and a 2-tailed test, the second result is statistically significant and the first is not.

                  Code:
                  . sysuse auto, clear
                  (1978 automobile data)
                  
                  . svyset turn [pw = price]
                  
                  Sampling weights: price
                               VCE: linearized
                       Single unit: missing
                          Strata 1: <one>
                   Sampling unit 1: turn
                             FPC 1: <zero>
                  
                  . svy: reg mpg turn length weight i.foreign i.rep78
                  (running regress on estimation sample)
                  
                  Survey: Linear regression
                  
                  Number of strata =  1                                Number of obs   =      69
                  Number of PSUs   = 18                                Population size = 424,077
                                                                       Design df       =      17
                                                                       F(8, 10)        =   24.58
                                                                       Prob > F        =  0.0000
                                                                       R-squared       =  0.7003
                  
                  ------------------------------------------------------------------------------
                               |             Linearized
                           mpg | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          turn |  -.0021137   .1683286    -0.01   0.990    -.3572561    .3530286
                        length |  -.1510464   .0894002    -1.69   0.109    -.3396644    .0375716
                        weight |  -.0021393   .0019522    -1.10   0.288    -.0062581    .0019795
                               |
                       foreign |
                      Foreign  |  -3.501613   1.348396    -2.60   0.019    -6.346481   -.6567456
                               |
                         rep78 |
                            2  |  -.3229413   1.215734    -0.27   0.794    -2.887916    2.242034
                            3  |   .2452496   .9106538     0.27   0.791    -1.676062    2.166561
                            4  |   1.687671   1.158219     1.46   0.163    -.7559587      4.1313
                            5  |   3.532869   1.864861     1.89   0.075    -.4016428    7.467381
                               |
                         _cons |   56.19769   8.184245     6.87   0.000     38.93044    73.46494
                  ------------------------------------------------------------------------------
                  
                  . reg mpg turn length weight i.foreign i.rep78 [aweight=price]
                  (sum of wgt is 424,077)
                  
                        Source |       SS           df       MS      Number of obs   =        69
                  -------------+----------------------------------   F(8, 60)        =     17.53
                         Model |  1575.88882         8  196.986103   Prob > F        =    0.0000
                      Residual |   674.31245        60  11.2385408   R-squared       =    0.7003
                  -------------+----------------------------------   Adj R-squared   =    0.6604
                         Total |  2250.20127        68  33.0911952   Root MSE        =    3.3524
                  
                  ------------------------------------------------------------------------------
                           mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          turn |  -.0021137    .218006    -0.01   0.992    -.4381906    .4339632
                        length |  -.1510464   .0600039    -2.52   0.015     -.271072   -.0310208
                        weight |  -.0021393   .0016669    -1.28   0.204    -.0054736     .001195
                               |
                       foreign |
                      Foreign  |  -3.501613   1.460497    -2.40   0.020    -6.423042   -.5801845
                               |
                         rep78 |
                            2  |  -.3229413   3.019317    -0.11   0.915    -6.362475    5.716592
                            3  |   .2452496   2.842812     0.09   0.932    -5.441222    5.931721
                            4  |   1.687671   2.961134     0.57   0.571     -4.23548    7.610821
                            5  |   3.532869   3.151439     1.12   0.267    -2.770947    9.836686
                               |
                         _cons |   56.19769   8.248686     6.81   0.000     39.69786    72.69752
                  ------------------------------------------------------------------------------
                  David Radwin
                  Senior Researcher, California Competes
                  californiacompetes.org
                  Pronouns: He/Him

                  Comment


                  • #10
                    Well, now my intuition is getting confused. ;-) But remember, svy uses pweights, and David's regression example was using aweights. pweights are basically aweights plus vce(robust). Further, the beta option DOES work with pweights. So,

                    Code:
                    . svy, vce(robust): reg mpg turn length weight i.foreign i.rep78
                    (running regress on estimation sample)
                    
                    Survey: Linear regression
                    
                    Number of strata =  1                                Number of obs   =      69
                    Number of PSUs   = 18                                Population size = 424,077
                                                                         Design df       =      17
                                                                         F(8, 10)        =   24.58
                                                                         Prob > F        =  0.0000
                                                                         R-squared       =  0.7003
                    
                    ------------------------------------------------------------------------------
                                 |             Linearized
                             mpg | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                            turn |  -.0021137   .1683286    -0.01   0.990    -.3572561    .3530286
                          length |  -.1510464   .0894002    -1.69   0.109    -.3396644    .0375716
                          weight |  -.0021393   .0019522    -1.10   0.288    -.0062581    .0019795
                                 |
                         foreign |
                        Foreign  |  -3.501613   1.348396    -2.60   0.019    -6.346481   -.6567456
                                 |
                           rep78 |
                              2  |  -.3229413   1.215734    -0.27   0.794    -2.887916    2.242034
                              3  |   .2452496   .9106538     0.27   0.791    -1.676062    2.166561
                              4  |   1.687671   1.158219     1.46   0.163    -.7559587      4.1313
                              5  |   3.532869   1.864861     1.89   0.075    -.4016428    7.467381
                                 |
                           _cons |   56.19769   8.184245     6.87   0.000     38.93044    73.46494
                    ------------------------------------------------------------------------------
                    
                    . reg mpg turn length weight i.foreign i.rep78 [pweight=price], beta
                    (sum of wgt is 424,077)
                    
                    Linear regression                               Number of obs     =         69
                                                                    F(8, 60)          =      20.51
                                                                    Prob > F          =     0.0000
                                                                    R-squared         =     0.7003
                                                                    Root MSE          =     3.3524
                    
                    ------------------------------------------------------------------------------
                                 |               Robust
                             mpg | Coefficient  std. err.      t    P>|t|                     Beta
                    -------------+----------------------------------------------------------------
                            turn |  -.0021137   .1988278    -0.01   0.992                -.0016806
                          length |  -.1510464   .0862039    -1.75   0.085                -.5974503
                          weight |  -.0021393   .0021876    -0.98   0.332                -.3142225
                                 |
                         foreign |
                        Foreign  |  -3.501613   1.105775    -3.17   0.002                 -.281148
                                 |
                           rep78 |
                              2  |  -.3229413   1.245539    -0.26   0.796                -.0178742
                              3  |   .2452496   .8204691     0.30   0.766                 .0213851
                              4  |   1.687671   1.190294     1.42   0.161                 .1292563
                              5  |   3.532869   2.036567     1.73   0.088                 .2229281
                                 |
                           _cons |   56.19769   8.345655     6.73   0.000                        .
                    ------------------------------------------------------------------------------
                    So, the t-values are not identical across approaches, contrary to what I thought, but they also aren't as far apart as David's example indicated.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://www3.nd.edu/~rwilliam

                    Comment


                    • #11
                      the t-values are not identical across approaches
                      My guess is that with svy, not only are the weights used, but the standard errors are also calculated taking into account clustering on the declared variable. In this case, we are comparing two regressions that differ because in the first there is clustering with respect to the variable "turn", while in the second there is none. Is my conjecture correct? If so, are there solutions to this issue? I see that clustering of errors may not be combined with the "beta" option. If I understand correctly, this is because looking for beta coefficients when the observations are not i.i.d. is questionable: https://stackoverflow.com/questions/...d-coefficients
                      If, however, my coefficient has a clear interpretation (a difference between the means of two groups), I find it quite natural to want a standardized mean difference/effect size.
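                      To check that conjecture, a possible sketch (not from the thread): with this svyset, the svy run should correspond to a pweighted regression that clusters on the PSU variable turn, so the following should give similar standard errors (they differ by small-sample adjustments, and the p-values also differ because regress uses N - k residual degrees of freedom while svy uses the design degrees of freedom):

                      Code:
                      sysuse auto, clear
                      svyset turn [pw = price]
                      svy: regress mpg turn length weight i.foreign i.rep78
                      * same model with pweights and explicit clustering on the PSU variable
                      regress mpg turn length weight i.foreign i.rep78 [pweight = price], vce(cluster turn)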

                      Comment


                      • #12
                        Originally posted by Federico Tedeschi View Post
                        If however my coefficient has a clear interpretation (a difference between the means of two groups), I find it quite natural to be willing to find a standardized mean difference/effect size.
                        On second thought, the solution in this case is easier, since it is enough to properly standardize the outcome in the sample used for the regression. In this case, for example, if we were interested in the difference between foreign and domestic cars:
                        Code:
                        center mpg [pw=price] if foreign!=., standardize
                        svy: reg c_mpg i.foreign
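                        (center is presumably the user-written center module by Ben Jann, available from SSC via ssc install center; its standardize option rescales the variable to mean 0 and SD 1.)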

                        Comment
