  • Creating a Matrix from Value Labels and an e(class) Matrix

    Hey everyone. More matrix manipulation today.

    After much work, I've finally managed to extract the betas returned by cvlasso, which of course you'll need to follow my code below. The significance of this is as follows: the below code constructs a synthetic Basque Country that did not experience a wave of terrorism post-1975. The LASSO selects the coefficients (in this case, the donor units) and stores them into a matrix. Luckily though, all this is sort of background knowledge and not really necessary. Here's precisely what I want: For myself, and for the users of my command, I want them to be able to see the exact units LASSO uses to construct the synthetic unit, preferably in matrix form. In this case, the units selected are Cataluña, Madrid, Principado De Asturias, and La Rioja, where my command keys on their unique ID as we've denoted them in our dataset.

    Here's the code that gets us partway there (note that I use greshape from SSC, but it isn't required; the regular reshape can be used instead):
    Code:
    clear *
    qui {
    *import delim "https://raw.githubusercontent.com/SucreRouge/synth_control/master/basque.csv", clear
    u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
    
    replace regionname = "Asturias" if regionname=="Principado De Asturias" 
    
    loc int_time = 1975
    
    loc lambda lopt
    
    //sysuse basque, clear
    
    g treated = cond(regionno==17 & year >= `int_time',1,0)
    
    labvars year gdpcap "Year" "ln(GDP per 100,000)"
    
    replace regionname = trim(regexr(regionname,"\(.+\) *",""))
    
    egen id = group(regionname), label(regionname) // makes a unique ID
    
    order id, b(year)
    
    *keep if year >= 1960
    drop if inlist(id,18) //12
    
    keep gdp id year
    xtset id year, y
    
    cls
    preserve
    greshape wide gdp, j(id) i( year)
    
    tsset year, y
    
    order gdpcap5, a(year)
    
    qui cvlasso gdpcap5 gdpcap1-gdpcap17 if year < `int_time', h(1) roll postres
    
    qui cvlasso, `lambda' postres // get weights
    
    qui predict cf, xb
    
    keep year gdpcap5 cf
    
    greshape long gdpcap, i(year) j(id)
    
    sa Basque, replace
    restore
    }
    mat l e(beta)
    My desired result is something like this
    Code:
    ----------------------------------
                   Co_No | Unit_Weight
    ---------------------+------------
                Asturias |   .10749576
                Cataluna |   .58400395
                  Madrid |     .030129
                   Rioja |   .24480456
    ----------------------------------
    We already have the beta matrix, which returns
    Code:
    mat l e(beta)
    
    e(beta)[1,5]
               gdpcap3   gdpcap10   gdpcap14   gdpcap17      _cons
    gdpcap5  .10749576  .58400395    .030129  .24480456  .74178677
    My first instinct told me to create and reshape a temporary dataset via svmat, but this won't work because of the _cons column. Furthermore, Nick Cox once (rather humorously) said that working with svmat would "be like adding an engine to a donkey" and that it generally shouldn't be used.

    So to summarize, my question is this: I'd like to create a matrix containing only the selected units from the e(beta) matrix, where each row name is the selected unit/covariate (e.g., GDP: Asturias) and the column holds the LASSO coefficients/weights. I don't want users to simply see gdpcap3, gdpcap10, and so on; I want them to see the precise units contributing to the synthetic control, and a matrix like the one above would be a good way of doing that. How might I begin?

  • #2
    The question appears pretty broad. I appreciate the full example, but I found it a bit too complex, so I built my own. Perhaps this helps:

    Code:
    sysuse auto
    regress price mpg weight
    
    matlist e(b)
    
    // step 1: pick the relevant columns and transpose
    matrix my_b = e(b)[1, 1..2]'
    
    matlist my_b
    
    // step 2: get the variable names
    local rownames : rownames my_b
    
    // step 3: build new row names from variable labels
    foreach name of local rownames {
        local label : variable label `name'
        if (`"`label'"' == "") local label `name'
        local new_rownames `"`new_rownames' `"`label'"'"'
    }
    
    // step 4: apply new row names
    matrix rownames my_b = `new_rownames'
    
    // step 5: add a column name (the weights column in the desired table)
    matrix colnames my_b = "Unit_Weight"
    
    // step 6: add a row title (the header above the unit names)
    matlist my_b , rowtitle("Co_No")
    I have not taken the time to dive into the new table/collect stuff; it might be useful here.



    • #3
      playing with -collect-
      Code:
      collect clear
      collect get _r_b
      
      foreach el in `e(selected)' {
          
          local match = ustrregexm("`el'","^(.*)(\d{1,2})$")
          local vname  = ustrregexs(1)
          local level = ustrregexs(2)
          local name : label regionname `level'
          collect label levels colname `vname'`level' "`name'" , modify
      }
      
      collect style header colname, title(label)
      collect label dim colname "Co_No", modify
      collect label levels result _r_b "Unit_Weight", modify
      
      collect layout (colname)(result)
      Code:
      -------------------------------
                        | Unit_Weight
      ------------------+------------
      Co_No             |            
        Asturias        |     .374461
        Canarias        |   -.2262664
        Castilla Y Leon |    .7381042
        Andalucia       |   -.9016507
        Aragon          |   -.5633621
        Baleares        |    .3363918
        Basque Country  |     .093009
        Canarias        |    .5431172
        Cantabria       |    .4310963
        Intercept       |    .9627634
      -------------------------------



      • #4
        This is the full code, worked out along the lines you and I had in mind. It does precisely what I want, with one exception. EDIT: Never mind, the solution would be `"`controls'"'
        Code:
        clear *
        qui {
        *import delim "https://raw.githubusercontent.com/SucreRouge/synth_control/master/basque.csv", clear
        u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
        
        replace regionname = "Asturias" if regionname=="Principado De Asturias"
        
        loc int_time = 1975
        
        loc lambda lopt
        
        //sysuse basque, clear
        
        g treated = cond(regionno==17 & year >= `int_time',1,0)
        
        labvars year gdpcap "Year" "ln(GDP per 100,000)"
        
        replace regionname = trim(regexr(regionname,"\(.+\) *",""))
        
        egen id = group(regionname), label(regionname) // makes a unique ID
        
        order id, b(year)
        
        *keep if year >= 1960
        drop if inlist(id,18) //12
        
        keep gdp id year
        xtset id year, y
        
        cls
        preserve
        greshape wide gdp, j(id) i( year)
        
        tsset year, y
        
        order gdpcap5, a(year)
        
        qui cvlasso gdpcap5 gdpcap1-gdpcap17 if year < `int_time', ///
        h(1) roll postres
        
        qui cvlasso, `lambda' postres // get weights
        
        qui predict cf, lopt
        
        keep year gdpcap5 cf
        
        greshape long gdpcap, i(year) j(id)
        
        restore
        }
        
        loc weight_cols: colsof e(beta)
        
        // Synthetic Weights
        
        mat W = e(beta)[1, 1..`weight_cols'-1]'
        
        // everything but the constant
        
        loc q: rowfullnames W
        // getting the rownames
        
        local newrow : subinstr loc q " " ",", all
        // put commas between these elements
        cls
        
        local newrow : subinstr local newrow "gdpcap" "", all
        
                levelsof id if inlist(id,`newrow'), l(labs)
        
                local lab : value label id
        
                foreach l of local labs {
                    local controls `controls' `: label `lab' `l''
                }
        
        // step 4: apply new row names
        matrix rownames W = `controls'
        
        mat l W
        Presumably there's a more elegant Mata solution too, daniel klein. Either way, consider
        Code:
        clear *
        qui {
        *import delim "https://raw.githubusercontent.com/SucreRouge/synth_control/master/basque.csv", clear
        u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
        
        replace regionname = "Asturias" if regionname=="Principado De Asturias"
        
        loc int_time = 1975
        
        loc lambda lopt
        
        //sysuse basque, clear
        
        g treated = cond(regionno==17 & year >= `int_time',1,0)
        
        labvars year gdpcap "Year" "ln(GDP per 100,000)"
        
        replace regionname = trim(regexr(regionname,"\(.+\) *",""))
        
        egen id = group(regionname), label(regionname) // makes a unique ID
        
        order id, b(year)
        
        *keep if year >= 1960
        drop if inlist(id,18) //12
        
        keep gdp id year
        xtset id year, y
        
        cls
        preserve
        greshape wide gdp, j(id) i( year)
        
        tsset year, y
        
        order gdpcap5, a(year)
        
        qui cvlasso gdpcap5 gdpcap1-gdpcap17 if year < `int_time', ///
        h(1) roll postres adaptive
        
        qui cvlasso, `lambda' postres // get weights
        
        qui predict cf, lopt
        
        keep year gdpcap5 cf
        
        greshape long gdpcap, i(year) j(id)
        
        restore
        }
        
        loc weight_cols: colsof e(beta)
        
        // Synthetic Weights
        
        mat W = e(beta)[1, 1..`weight_cols'-1]'
        
        // everything but the constant
        
        loc q: rowfullnames W
        // getting the rownames
        
        local newrow : subinstr loc q " " ",", all
        // put commas between these elements
        cls
        
        local newrow : subinstr local newrow "gdpcap" "", all
        
                levelsof id if inlist(id,`newrow'), l(labs)
        
                local lab : value label id
        
                foreach l of local labs {
                    local controls `controls' `: label `lab' `l''
                }
        
        // step 4: apply new row names
        matrix rownames W = `controls'
        
        mat l W
        I've now used the adaptive LASSO, which, interestingly, selects only one donor unit (and runs faster!). The issue now is that the donor's name, "Castilla y Leon", contains spaces, which triggers a conformability error when we reach step 4:
        Code:
        // step 4: apply new row names
        matrix rownames W = `controls'
        conformability error
        That's because the line now reads as matrix rownames W = Castilla y Leon, i.e., three names for a one-row matrix. So the task is to wrap each donor's label in quotation marks, so that Stata treats each value label as one name rather than, in this case, three. How might I do this?
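
        A minimal sketch of one way to do this, borrowing the compound-quote idiom from step 3 of post #2. The matrix values and the two-entry value label below are toy stand-ins rather than results from the Basque data, and the sketch assumes a Stata version that accepts quoted row names containing spaces. Looping over the row names of W, instead of over levelsof output, also keeps each label tied to its own row.

        ```stata
        clear
        // Toy stand-ins (hypothetical values, not the thread's results)
        label define regionname 3 "Asturias" 8 "Castilla Y Leon"
        matrix W = (0.25 \ 0.75)
        matrix rownames W = gdpcap3 gdpcap8

        // Walk the current row names in matrix order
        local rn : rownames W
        local controls
        foreach r of local rn {
            // strip the stub to recover the numeric id
            local lev : subinstr local r "gdpcap" ""
            // look up the value label text for that id
            local lbl : label regionname `lev'
            // append the label wrapped in compound double quotes,
            // so a multi-word label stays a single name
            local controls `"`controls' `"`lbl'"'"'
        }

        matrix rownames W = `controls'
        matlist W
        ```

        In the real code, regionname would be replaced by whatever `: value label id' returns, and W by the transposed e(beta) without the constant.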

        By the way, I really appreciate you rising to the proverbial challenge on this. All other synthetic control commands show us the specific donors they pick, and this particular element is the one I was missing, so once I integrate this, my paper/command will be ready for Stata Journal/Journal of Statistical Software.
        Last edited by Jared Greathouse; 18 Jun 2022, 07:55.



        • #5
          Bjarte Aagnes, as usual, you make the complicated super simple. Thanks so much!

          EDIT: Something strange happens, now.
          Code:
          clear *
          qui {
          *import delim "https://raw.githubusercontent.com/SucreRouge/synth_control/master/basque.csv", clear
          u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
          
          replace regionname = "Asturias" if regionname=="Principado De Asturias" 
          
          loc int_time = 1975
          
          loc lambda lopt
          
          //sysuse basque, clear
          
          g treated = cond(regionno==17 & year >= `int_time',1,0)
          
          labvars year gdpcap "Year" "ln(GDP per 100,000)"
          
          replace regionname = trim(regexr(regionname,"\(.+\) *",""))
          
          egen id = group(regionname), label(regionname) // makes a unique ID
          
          order id, b(year)
          
          *keep if year >= 1960
          drop if inlist(id,18) //12
          
          keep gdp id year
          xtset id year, y
          
          cls
          preserve
          greshape wide gdp, j(id) i( year)
          
          tsset year, y
          
          order gdpcap5, a(year)
          
          qui cvlasso gdpcap5 gdpcap1-gdpcap17 if year < `int_time', ///
          h(1) roll postres
          
          qui cvlasso, `lambda' postres // get weights
          
          qui predict cf, lopt
          
          keep year gdpcap5 cf
          
          greshape long gdpcap, i(year) j(id)
          
          restore
          }
          collect clear
          collect get _r_b
          
          foreach el in `e(selected)' {
              
              local match = ustrregexm("`el'","^(.*)(\d{1,2})$")
              local vname  = ustrregexs(1)
              local level = ustrregexs(2)
              local name : label regionname `level'
              collect label levels colname `vname'`level' "`name'" , modify
          }
          
          collect style header colname, title(label)
          collect label dim colname "Co_No", modify
          collect label levels result _r_b "Unit_Weight", modify
          
          collect layout (colname)(result)
          For me, this returns the following:
          Code:
          Collection: default
                Rows: colname
             Columns: result
             Table 1: 6 x 1
          
          -------------------------
                      | Unit_Weight
          ------------+------------
          Co_No       |            
            Asturias  |    .1074958
            0         |    .5840039
            Baleares  |     .030129
            Cantabria |    .2448046
            Intercept |    .7417868
          -------------------------
          This is strange, because Baleares and Cantabria weren't selected, while Asturias, Cataluña, Madrid, and Rioja were.

          Also, the original matrix doesn't seem to make sense, since it includes the Basque Country, which is the treated unit and therefore can't have weight assigned to it.

          The unit weights appear to be correct; the issue is just the column names.
          Last edited by Jared Greathouse; 18 Jun 2022, 08:19.



          • #6
            I made an error in the regex, which is corrected below. I also added a definition of the columns to keep, which avoids presenting the constant. Also, be careful about adding commands between your estimation command and collect, since they could change the contents of e().
            Code:
            collect clear
            collect get _r_b
            
            foreach el in `e(selected)' {
                
                local match = ustrregexm("`el'","^([\D]+)(\d{1,2})$")
                local vname  = ustrregexs(1)
                local level = ustrregexs(2)
                local name : label regionname `level'
                local vnames = "`vnames'" + char(32) + "`vname'`level'"
                collect label levels colname `vname'`level' "`name'" , modify
            }
            
            collect style header colname, title(label)
            collect label dim colname "Co_No", modify
            collect label levels result _r_b "Unit_Weight", modify
            
            collect layout (colname[`vnames'])(result)
            Code:
            . collect layout (colname[`vnames'])(result)
            
            Collection: default
                  Rows: colname[ gdpcap3 gdpcap6 gdpcap8 gdpcap11 gdpcap12 gdpcap14 gdpcap15 gdpcap16 gdpcap17]
               Columns: result
               Table 1: 10 x 1
            
            ------------------------------------
                                   | Unit_Weight
            -----------------------+------------
            Co_No                  |            
              Asturias             |     .374461
              Canarias             |   -.2262664
              Castilla Y Leon      |    .7381042
              Comunidad Valenciana |   -.9016507
              Extremadura          |   -.5633621
              Madrid               |    .3363918
              Murcia               |     .093009
              Navarra              |    .5431172
              Rioja                |    .4310963
            ------------------------------------
            
            . di "`e(selected)'"
            gdpcap3 gdpcap6 gdpcap8 gdpcap11 gdpcap12 gdpcap14 gdpcap15 gdpcap16 gdpcap17
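
            As a minimal sketch of why the first pattern mislabelled the rows, here are both patterns applied to a single name. The name gdpcap14 is just an example, and the sketch assumes Stata's Unicode regex functions (Stata 14 or later). The greedy (.*) takes as much as it can and leaves only the last digit for the second group, whereas ([\D]+) stops at the first digit.

            ```stata
            local el gdpcap14

            // Original pattern: greedy (.*) swallows the leading "1"
            local match = ustrregexm("`el'", "^(.*)(\d{1,2})$")
            display ustrregexs(1) " | " ustrregexs(2)    // gdpcap1 | 4

            // Corrected pattern: non-digits only, then the full number
            local match = ustrregexm("`el'", "^([\D]+)(\d{1,2})$")
            display ustrregexs(1) " | " ustrregexs(2)    // gdpcap | 14
            ```

            With the first pattern, gdpcap14 is looked up as level 4 rather than 14, which is consistent with Madrid showing up as Baleares in the earlier table.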
            Last edited by Bjarte Aagnes; 19 Jun 2022, 11:36.

