  • Compare levelsof to original var in if condition

    I am looping over a variable, running a regression, and extracting the coefficient for each level (to plot on a twoway graph). I created a new variable in which to store the values and collapse later (maybe not the most efficient approach, but easy). But I am struggling with the fact that the value in the local from levelsof is different from the level of the variable.

    This MWE does not work but should illustrate what I am attempting to do:

    Code:
    sysuse auto.dta, replace
    
    gen corr = .
    
    levelsof foreign, local(foreign)
    foreach f in `foreign' {
        reg price weight
        mat result = e(b)
        replace corr = result[1,1] if foreign == `f'
    }

  • #2
    Your regression is for all available data. Being inside a loop doesn't affect that. I don't know if that answers your question, which I find hard to follow.

    I first wrote levelsof against some scepticism that it was needed or useful. Now the joke's on me as I see many examples in which I wouldn't use it. For your token example I would do this:

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . rangestat (reg) price weight, int(foreign 0 0)
    
    . tabdisp foreign, c(b_weight)
    
    --------------------------------
    Car       |
    origin    |             b_weight
    ----------+---------------------
     Domestic |            2.9948135
      Foreign |            5.3620402
    --------------------------------

    I note that

    1. To be fair, rangestat from SSC wasn't written until after levelsof.

    2. What you are picking up after the regression is not the correlation, as your variable name might be thought to imply, but the coefficient of weight.
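
    For comparison, here is a minimal sketch of the loop from #1 with the regression restricted to each level of foreign, which is the step the original loop leaves out. The variable name b_weight and the local name levels are illustrative choices, not something the original post fixes. The results should agree with the rangestat table above, up to display format.

    Code:
    sysuse auto, clear

    * one regression per level of foreign, coefficient stored in a new variable
    gen double b_weight = .

    levelsof foreign, local(levels)
    foreach f of local levels {
        * restrict the regression to the current level
        quietly regress price weight if foreign == `f'
        * _b[weight] is the slope on weight from the regression just run
        replace b_weight = _b[weight] if foreign == `f'
    }

    tabdisp foreign, c(b_weight)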



    • #3
      Thank you for the reply. In the real example, the variable over which I want to loop is year, and there are many years. A table might be okay for that, but I think a twoway graph with the coefficient on the y axis and the year on the x axis will be much better. But to graph this I need to store the coefficients somewhere. That's why I wanted to create this variable, call it b_weight, and store the regression coefficient for each year in it. Then I will collapse and plot:

      Code:
      collapse (mean) b_weight, by(year)
      twoway connected b_weight year
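
      An alternative that may be worth a look is the official statsby command, which runs the regression separately by year and leaves one observation per year holding the coefficients, so no separate collapse is needed. The sketch below is only an illustration under assumptions: the real data are not shown in this thread, so it uses the grunfeld example dataset (fetched with webuse, which needs internet access) and its variables invest and mvalue as stand-ins.

      Code:
      * stand-in data: grunfeld has a year variable (assumption, not the real data)
      webuse grunfeld, clear

      * one regression per year; the dataset in memory is replaced by the results
      statsby _b, by(year) clear: regress invest mvalue

      * statsby names the slope _b_mvalue and the intercept _b_cons
      twoway connected _b_mvalue year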

      On the correlation: I thought this was just a scaling issue, but indeed even for rescaled variables the two are different. Interesting.

      Code:
      . sysuse auto.dta, replace
      (1978 automobile data)
      
      . 
      . foreach var of varlist price mpg {
        2.     qui sum `var'
        3.         replace `var' = (`var' - `r(min)') / (`r(max)'-`r(min)')
        4. }
      variable price was int now float
      (74 real changes made)
      variable mpg was int now float
      (74 real changes made)
      
      . 
      . corr price mpg
      (obs=74)
      
                   |    price      mpg
      -------------+------------------
             price |   1.0000
               mpg |  -0.4686   1.0000
      
      
      . reg price mpg
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(1, 72)        =     20.26
             Model |  .876278986         1  .876278986   Prob > F        =    0.0000
          Residual |  3.11437388        72  .043255193   R-squared       =    0.2196
      -------------+----------------------------------   Adj R-squared   =    0.2087
             Total |  3.99065287        73  .054666478   Root MSE        =    .20798
      
      ------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               mpg |  -.5491824   .1220154    -4.50   0.000    -.7924156   -.3059492
             _cons |   .4039103   .0459861     8.78   0.000     .3122386     .495582
      ------------------------------------------------------------------------------



      • #4
        This appears to miss a main point of #2, which is that the code I give also puts the regression results into new variables directly, so that you could graph them straight away. I couldn't have used tabdisp otherwise.

        You don't need to collapse, as something akin to

        Code:
        egen tag = tag(foreign)
        scatter b_weight foreign if tag
        will suffice.


        Your procedure implicitly assumes that range is proportional to SD, which is likely to be roughly correct, but highly unlikely to be exactly correct. If you standardize each variable to (value MINUS mean) / SD you will then find that the correlation and regression coefficients are identical.
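
        The reason is that in a regression of y on x the slope equals the correlation times SD(y) / SD(x), so the two coincide only when both variables have the same SD. A minimal sketch of the standardized version, assuming the auto data and the same two variables as in #3; the z_ prefix for the new variables is an illustrative choice.

        Code:
        sysuse auto, clear

        * standardize each variable to (value - mean) / SD
        foreach var of varlist price mpg {
            quietly summarize `var'
            generate double z_`var' = (`var' - r(mean)) / r(sd)
        }

        * correlation is unchanged by standardizing
        correlate z_price z_mpg

        * both standardized variables have SD 1, so the slope equals the correlation
        regress z_price z_mpg
        display _b[z_mpg]

        Both commands should report the correlation that corr price mpg shows in #3, namely -0.4686.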
        Last edited by Nick Cox; 02 Dec 2021, 07:28.
