  • Collect with Oaxaca command

    Hello,
    I am working on a paper that examines marital quality (marital happiness, divorce proneness, etcetera) across three US marriage cohorts, 1980, 2000, and 2022, extending this paper using new data from 2022. I am decomposing the change in marital happiness due to various factors using the oaxaca command. I am using Stata 19.

    Here is my command, comparing 1980 and 2022 where “marhap” is a continuous measure of marital happiness, “agecurrmarr” through “relig” are the predictors, “cohort19802022” distinguishes the two cohorts (it is 0/1), and it is svy weighted and conducted for the subpop of wives.

    Code:
    oaxaca marhap agecurrmarr yrsmar remarried_couple premarcohab blackresp latinoresp othernwresp lesshs_w hs_w somecoll_w lesshs_h hs_h somecoll_h wifepart wifefull husbpart husbfull wifeextend husbextend anypschool anyschool hhwork hwsat wifefair husbfair edecide genderatt lifelongmarr relig, by(cohort19802022) noisily weight(1) svy(,subpop(if wiferesp == 1))
    It works beautifully.

    [Image: results for statalist.png]


    Now I want to feed the results into a table using the "collect" suite of commands. But I cannot figure out how to refer to the specific parts of the matrices I need. For example, how do I call the e(b) for the first group (i.e. the 1980 cohort), and how do I call the "explained" and "unexplained" components? I am *very* new to the collect command, so I apologize in advance if this is obvious. Most examples I have seen use simple regression commands, as opposed to a command like oaxaca that produces several matrices.
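
    For reference, a minimal sketch of how cells in a matrix like e(b) can be addressed, using a toy matrix rather than real oaxaca output (and not the collect suite). Each column carries an equation name and a column name, so a cell can be looked up by its "equation:name" label with colnumb(). The equation names overall, explained, and unexplained are assumed to mirror what oaxaca stores for a twofold decomposition; the variable names x1 and x2 and all numbers are made up. After the real command, matrix list e(b) shows the actual labels.

    ```stata
    * Toy stand-in for e(b) after a twofold decomposition (numbers are made up)
    matrix b = (0.30, 0.10, 0.20, 0.04, 0.06, 0.15, 0.05)
    matrix colnames b = difference explained unexplained x1 x2 x1 x2
    matrix coleq    b = overall overall overall explained explained unexplained unexplained

    * Show the equation:name label attached to each column
    matrix list b

    * Look up cells by their full "equation:name" label
    local c_ov = colnumb(b, "overall:explained")
    local c_x1 = colnumb(b, "explained:x1")
    display "overall explained = " b[1, `c_ov']
    display "explained, x1     = " b[1, `c_x1']
    ```

    The same lookup applied to e(V) gives the variance on the diagonal, so a standard error is the square root of V[c, c] for the same column number c.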

    Note that this table is from an earlier version; I have since dropped a few variables and I am doing it separately for men and women, but I think what I am eventually going for comes across.

    [Image: results for statalist 2.png]

    Any help would be greatly appreciated!!!!

    Claire Kamp Dush
    Professor, Sociology and Minnesota Population Center
    University of Minnesota, Twin Cities
    Minneapolis, MN US

  • #2
    OK, I am replying to myself. I figured I could run the regressions separately and then back my way into the table. I was working on that, then I read this thread and decided to try Claude. I spent $20 getting the fancier Claude; I am not sure if that matters. I told it my problem, and after a bit of back and forth, it produced exactly the table I showed above. The code is below. So, I guess I don't need help anymore. I checked the code and it is right.

    Code:
    *===============================================================
    * SETUP
    *===============================================================
    
    local wives_vars "agecurrmarr yrsmar remarried_couple premarcohab blackresp latinoresp othernwresp lesshs_w hs_w somecoll_w lesshs_h hs_h somecoll_h wifepart wifefull husbpart husbfull wifeextend husbextend anypschool anyschool hhwork hwsat wifefair husbfair edecide genderatt lifelongmarr relig"
    
    *===============================================================
    * HELPER PROGRAM: STARS
    *===============================================================
    
    capture program drop getstars
    program define getstars, rclass
        args coef se
        local tstat = abs(`coef' / `se')
        local pval = 2 * (1 - normal(`tstat'))
        if `pval' < 0.001 {
            return local stars "***"
        }
        else if `pval' < 0.01 {
            return local stars "**"
        }
        else if `pval' < 0.05 {
            return local stars "*"
        }
        else {
            return local stars ""
        }
    end
    
    *===============================================================
    * RUN REGRESSIONS AND STORE RESULTS
    *===============================================================
    
    svy, subpop(if wiferesp == 1): reg z_marhap `wives_vars' if year == 1980 & wiferesp == 1
    matrix b1980 = e(b)
    matrix V1980 = e(V)
    local n1980  = e(N_sub)
    local r21980 = e(r2)
    
    svy, subpop(if wiferesp == 1): reg z_marhap `wives_vars' if year == 2000 & wiferesp == 1
    matrix b2000 = e(b)
    matrix V2000 = e(V)
    local n2000  = e(N_sub)
    local r22000 = e(r2)
    
    svy, subpop(if wiferesp == 1): reg z_marhap `wives_vars' if year == 2022 & wiferesp == 1
    matrix b2022 = e(b)
    matrix V2022 = e(V)
    local n2022  = e(N_sub)
    local r22022 = e(r2)
    
    *===============================================================
    * RUN OAXACA DECOMPOSITIONS
    *===============================================================
    
    oaxaca z_marhap `wives_vars', by(cohort19802022) noisily weight(1) svy(, subpop(if wiferesp == 1))
    matrix box1980 = e(b)
    matrix Vox1980 = e(V)
    
    oaxaca z_marhap `wives_vars', by(cohort20002022) noisily weight(1) svy(, subpop(if wiferesp == 1))
    matrix box2000 = e(b)
    matrix Vox2000 = e(V)
    
    *===============================================================
    * EXPORT TO EXCEL
    *===============================================================
    
    putexcel set "wives_marhap.xlsx", replace sheet("z_marhap")
    
    * Headers
    putexcel A1 = "Variable"
    putexcel B1 = "Coef 1980"
    putexcel C1 = "Coef 2000"
    putexcel D1 = "Coef 2022"
    putexcel E1 = "Explained 1980v2022"
    putexcel F1 = "Explained 2000v2022"
    putexcel G1 = "Unexplained 1980v2022"
    putexcel H1 = "Unexplained 2000v2022"
    
    * Variable rows
    local row = 2
    foreach v of local wives_vars {
    
        local vlab : variable label `v'
        if "`vlab'" == "" local vlab "`v'"
    
        local c1980 = colnumb(b1980, "`v'")
        local c2000 = colnumb(b2000, "`v'")
        local c2022 = colnumb(b2022, "`v'")
        local ce80  = colnumb(box1980, "explained:`v'")
        local ce00  = colnumb(box2000, "explained:`v'")
        local cu80  = colnumb(box1980, "unexplained:`v'")
        local cu00  = colnumb(box2000, "unexplained:`v'")
    
        local coef1980  = b1980[1, `c1980']
        local coef2000  = b2000[1, `c2000']
        local coef2022  = b2022[1, `c2022']
        local exp1980   = box1980[1, `ce80']
        local exp2000   = box2000[1, `ce00']
        local unexp1980 = box1980[1, `cu80']
        local unexp2000 = box2000[1, `cu00']
    
        local se1980      = sqrt(V1980[`c1980', `c1980'])
        local se2000      = sqrt(V2000[`c2000', `c2000'])
        local se2022      = sqrt(V2022[`c2022', `c2022'])
        local seexp1980   = sqrt(Vox1980[`ce80', `ce80'])
        local seexp2000   = sqrt(Vox2000[`ce00', `ce00'])
        local seunexp1980 = sqrt(Vox1980[`cu80', `cu80'])
        local seunexp2000 = sqrt(Vox2000[`cu00', `cu00'])
    
        getstars `coef1980' `se1980'
        local s1980 = r(stars)
        if "`s1980'" == "." local s1980 ""
        getstars `coef2000' `se2000'
        local s2000 = r(stars)
        if "`s2000'" == "." local s2000 ""
        getstars `coef2022' `se2022'
        local s2022 = r(stars)
        if "`s2022'" == "." local s2022 ""
        getstars `exp1980' `seexp1980'
        local se80 = r(stars)
        if "`se80'" == "." local se80 ""
        getstars `exp2000' `seexp2000'
        local se00 = r(stars)
        if "`se00'" == "." local se00 ""
        getstars `unexp1980' `seunexp1980'
        local su80 = r(stars)
        if "`su80'" == "." local su80 ""
        getstars `unexp2000' `seunexp2000'
        local su00 = r(stars)
        if "`su00'" == "." local su00 ""
    
        putexcel A`row' = "`vlab'"
        putexcel B`row' = "`: display %6.2f `coef1980''`s1980'"
        putexcel C`row' = "`: display %6.2f `coef2000''`s2000'"
        putexcel D`row' = "`: display %6.2f `coef2022''`s2022'"
        putexcel E`row' = "`: display %6.2f `exp1980''`se80'"
        putexcel F`row' = "`: display %6.2f `exp2000''`se00'"
        putexcel G`row' = "`: display %6.2f `unexp1980''`su80'"
        putexcel H`row' = "`: display %6.2f `unexp2000''`su00'"
    
        local row = `row' + 1
    }
    
    *--- Constant row ---*
    local cc1980 = colnumb(b1980, "_cons")
    local cc2000 = colnumb(b2000, "_cons")
    local cc2022 = colnumb(b2022, "_cons")
    
    local cons1980 = b1980[1, `cc1980']
    local cons2000 = b2000[1, `cc2000']
    local cons2022 = b2022[1, `cc2022']
    
    local scons1980 = sqrt(V1980[`cc1980', `cc1980'])
    local scons2000 = sqrt(V2000[`cc2000', `cc2000'])
    local scons2022 = sqrt(V2022[`cc2022', `cc2022'])
    
    getstars `cons1980' `scons1980'
    local sc1980 = r(stars)
    if "`sc1980'" == "." local sc1980 ""
    getstars `cons2000' `scons2000'
    local sc2000 = r(stars)
    if "`sc2000'" == "." local sc2000 ""
    getstars `cons2022' `scons2022'
    local sc2022 = r(stars)
    if "`sc2022'" == "." local sc2022 ""
    
    putexcel A`row' = "Constant"
    putexcel B`row' = "`: display %6.2f `cons1980''`sc1980'"
    putexcel C`row' = "`: display %6.2f `cons2000''`sc2000'"
    putexcel D`row' = "`: display %6.2f `cons2022''`sc2022'"
    
    *--- Overall oaxaca row ---*
    local row = `row' + 1
    local ceov80 = colnumb(box1980, "overall:explained")
    local ceov00 = colnumb(box2000, "overall:explained")
    local cuov80 = colnumb(box1980, "overall:unexplained")
    local cuov00 = colnumb(box2000, "overall:unexplained")
    
    local exp_ov80   = box1980[1, `ceov80']
    local exp_ov00   = box2000[1, `ceov00']
    local unexp_ov80 = box1980[1, `cuov80']
    local unexp_ov00 = box2000[1, `cuov00']
    
    local seov80 = sqrt(Vox1980[`ceov80', `ceov80'])
    local seov00 = sqrt(Vox2000[`ceov00', `ceov00'])
    local suov80 = sqrt(Vox1980[`cuov80', `cuov80'])
    local suov00 = sqrt(Vox2000[`cuov00', `cuov00'])
    
    getstars `exp_ov80' `seov80'
    local seo80 = r(stars)
    if "`seo80'" == "." local seo80 ""
    getstars `exp_ov00' `seov00'
    local seo00 = r(stars)
    if "`seo00'" == "." local seo00 ""
    getstars `unexp_ov80' `suov80'
    local suo80 = r(stars)
    if "`suo80'" == "." local suo80 ""
    getstars `unexp_ov00' `suov00'
    local suo00 = r(stars)
    if "`suo00'" == "." local suo00 ""
    
    putexcel A`row' = "OVERALL"
    putexcel E`row' = "`: display %6.2f `exp_ov80''`seo80'"
    putexcel F`row' = "`: display %6.2f `exp_ov00''`seo00'"
    putexcel G`row' = "`: display %6.2f `unexp_ov80''`suo80'"
    putexcel H`row' = "`: display %6.2f `unexp_ov00''`suo00'"
    
    *--- N row ---*
    local row = `row' + 1
    putexcel A`row' = "N"
    putexcel B`row' = `n1980'
    putexcel C`row' = `n2000'
    putexcel D`row' = `n2022'
    
    *--- R-squared row ---*
    local row = `row' + 1
    putexcel A`row' = "R-squared"
    putexcel B`row' = "`: display %6.3f `r21980''"
    putexcel C`row' = "`: display %6.3f `r22000''"
    putexcel D`row' = "`: display %6.3f `r22022''"
    
    putexcel save
    Here is a screenshot of what it produced.

    [Image: results for statalist 3.png]


    Welp, glad I read the Statalist. Given that I have 12 tables overall, this saved me an insane amount of time.
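
    A side note on the repeated checks in the code above that reset a stars local equal to "." back to empty. As far as I can tell, they are needed because a line like local s = r(stars) evaluates r(stars) as an expression, and getstars leaves r(stars) unset when p >= 0.05, so the expression evaluates to missing. A minimal sketch of an alternative, using a hypothetical variant named getstars2 and made-up inputs, copies the result by macro expansion instead, so an empty result stays empty:

    ```stata
    capture program drop getstars2
    program define getstars2, rclass
        args coef se
        * Two-sided p-value from a normal approximation, as in getstars above
        local pval = 2 * (1 - normal(abs(`coef' / `se')))
        local stars ""
        if `pval' < 0.05  local stars "*"
        if `pval' < 0.01  local stars "**"
        if `pval' < 0.001 local stars "***"
        return local stars "`stars'"
    end

    * Made-up coefficient and standard error pairs
    getstars2 0.50 0.10
    local s_big "`r(stars)'"
    getstars2 0.05 0.10
    local s_small "`r(stars)'"

    display "large z: [`s_big']   small z: [`s_small']"
    ```

    With this pattern each call needs two lines instead of three, which adds up across 12 tables.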



    • #3
      Thanks a lot for sharing your experience with Claude and the code. This seems like a valid application for LLMs. Theoretically, what you wanted to do should work in Stata, but as we see, it requires some effort to understand how the data are stored to recombine them into the desired format. Even if one does not easily understand the code, one can quickly verify that the output is identical to the raw Stata output and check that the LLM did not mess up. Out of curiosity: did you try other LLMs or the free version of Claude before?
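
      A minimal sketch of that kind of check, using the auto dataset shipped with Stata rather than the survey data from this thread: pull each coefficient and standard error out of copies of e(b) and e(V) by name, the way the code in #2 does, and assert that they match what Stata itself reports through _b[] and _se[]. The model is arbitrary and only serves as an illustration.

      ```stata
      sysuse auto, clear
      regress price weight mpg

      matrix b = e(b)
      matrix V = e(V)

      foreach v in weight mpg _cons {
          local c = colnumb(b, "`v'")
          * Extracted values must agree with Stata's own stored results
          assert reldif(b[1, `c'], _b[`v']) < 1e-12
          assert reldif(sqrt(V[`c', `c']), _se[`v']) < 1e-12
      }
      display "extracted coefficients and standard errors match"
      ```

      If any extracted value disagreed, assert would stop the do-file at that line, which makes a silent mismatch hard to miss.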
      Best wishes

      Stata 18.0 MP | ORCID | Google Scholar



      • #4
        I have tried asking Gemini for help with Stata code, but it messes it up a lot. I have basically given up on it. After I did this, I tried a code review: I gave Claude the codebook and the code I had written and asked it to create a code review. I ran it and gave Claude the output, and it did find an error in how my code accounted for a skip pattern. Fixing that allowed me to add back over 60 women to my sample. So, that was awesome. I have not tried the free Claude. I think I could have eventually figured this out on my own, but it would certainly have taken me a lot more time. Claude did make some mistakes, and getting the code right definitely took some back and forth. But I am pleased with the outcome, and I definitely think I would use it again for help with complicated coding, but ONLY after checking everything myself.
