Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting dummy regression coefficients using statsby

    I have the dataset shared further below. I need to: a) run several regressions whereby a continuous variable is projected on one or two sets of dummy variables and nothing else; b) extract the coefficients on the dummy variables from each of these regressions and; c) merge the extracted coefficients from each regression and add them as a variable to my original datase (i.e. the one on which I ran the regressions).

    I thought I could do this by running statsby and then extracting and merging in the coefficients.

    However, I am facing issues. So, if I run - reg curr ib2.geo_code, baselevels - I get the below results, which contains the dummy coefficients I am after for this model.
    Click image for larger version

Name:	regde.JPG
Views:	1
Size:	39.7 KB
ID:	1519471



    However, if I run statsby, the coefficients on each 'geo_code' variable come out as zero.
    Click image for larger version

Name:	regde2.JPG
Views:	1
Size:	28.8 KB
ID:	1519472




    Perhaps I am missing something, likely related to the 'by(group)' option in statsby that is causing this.

    Also, I am able to get what I want using the following code, with the caveat that the coefficient on the base group (i.e. PAC) should be zero and in this case is not (it's close, but not exactly zero). I feel there is a better and more accurate way to do what I am after.

    Code:
    qui reg curr ib2.geo_code, baselevels
    local intercept=_b[_cons]
    
    statsby _b[_cons], saving(tempreg.dta, replace) by(geo_code): reg curr ib2.geo_code, baselevels
    
    merge m:1 geo_code using tempreg
    drop _merge
    
    gen coef=_stat_1-`intercept'
    drop _stat_1
    
    list
    Any tips would be appreciated. Here is my dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3(geo iso) float urb_cl double urb float curr long geo_code
    "FER" "CMR" .             47.836         . 1
    "FER" "DEQ" 0 31.763999999999996 1401.7407 1
    "FER" "LBR" .             51.807         . 1
    "FER" "LBR" .             51.807         . 1
    "FER" "SYC" .              46.28         . 1
    "FER" "TGO" .             61.968         . 1
    "FER" "TYE" 2             66.793  6300.079 1
    "FER" "YTE" 2             39.472 4489.9014 1
    "FER" "YTE" 2             39.472  22500.28 1
    "FER" "YTE" .             39.472         . 1
    "FER" "YTE" 2             39.472  3346.619 1
    "PAC" "ASA" 2             82.892 1873.3757 2
    "PAC" "ASA" 2             82.892 1252.2163 2
    "PAC" "BGD" 2             68.775  3583.354 2
    "PAC" "FJI" .             47.317         . 2
    "PAC" "KHM" 2             79.337  7045.824 2
    "PAC" "VNM" 2              68.92  1321.158 2
    "PAC" "VNM" 2              68.92  4199.222 2
    "YRP" "ASA" 0             11.305 4502.1187 3
    "YRP" "ASA" .             11.305         . 3
    "YRP" "ASA" 0             11.305  7127.425 3
    "YRP" "KWW" .                  0         . 3
    "YRP" "REQ" 0              1.391  8219.178 3
    "YRP" "TRS" 2             66.793 2925.0366 3
    end
    label values urb_cl label_n
    label def label_n 0 "low", modify
    label def label_n 2 "high", modify
    label values geo_code geo_code
    label def geo_code 1 "FER", modify
    label def geo_code 2 "PAC", modify
    label def geo_code 3 "YRP", modify
    Last edited by Ernestina delPiero; 07 Oct 2019, 08:41.

  • #2
    What you're trying to do here doesn't make any sense to me. Your original regression, -regress curr i.geo_code- gives you coefficients for each value of geo_code. Fine.

    But what you're trying to do with -statsby- is meaningless. When you restrict the data to a single geo_code (which is what -statsby- does, one geo_code at a time) and try to -regress curr i.geo_code-, geo_code is now a constant, so, in fact, there is no regression: the geo_code variable is actually omitted from the equation, which is reported in your results as a coefficient of 0. You can only use geo_code as a regressor in a data set where it varies. -statsby- obliterates that variation.

    If what you want is to get the coefficients from the original regression into the data set in the observations with the corresponding values of geo_code, you can do that as follows:
    Code:
    regress curr ib2.geo_code, baselevels
    levelsof geo_code if e(sample), local(gcs)
    gen coeff = .
    foreach gc of local gcs {
        replace coeff = _b[`gc'.geo_code] if geo_code == `gc'
    }

    Comment


    • #3
      Thank you, Clyde. That's exactly what I am after. I definitely misunderstood what 'statsby' does.

      A follow-up question. Say I want to get the coefficients a set of interaction variables in my regression. How can I generalize or modify the code you shared so that 'levelsof' also capture the levels of not 'geo_code' but of, for example, 'geo_code#tier'.

      Comment


      • #4
        Correction to the above. What I want to do is extract the coefficients from a set of interaction terms in my regression, using the same or similar method to what Clyde proposed above.

        Comment


        • #5
          Code:
          regress curr ib2.geo_code#i.tier, baselevels
          levelsof geo_code if e(sample), local(gcs)
          gen coeff = .
          foreach gc of local gcs {
              levelsof tier if geo_code == `gc' & e(sample), local(tiers)
              foreach t of local tiers {
                  replace coeff = _b[`gc'.geo_code#`t'.tier] if geo_code == `gc' & tier == `t'
              }
          }

          Comment


          • #6
            Thanks again, Clyde. The code works great!

            Comment

            Working...
            X