Storing and retreiving beta coefficients and p values from a loop

SL Navarro

Join Date: Jun 2016

Posts: 23
#1

Storing and retreiving beta coefficients and p values from a loop

21 Jun 2016, 12:49

I have the following loop and am trying to export the beta coefficients and p values for each variable into a csv file. I have tried searching various forums and cannot get anything that works. I realize I need to store the information in a matrix and then export and am not sure where my problem is.

My code (simplified) is as follows:

foreach var of varlist metabolite1 metabolite2 metabolite3... {
mixed `var' diet sex age baseline || id:
}

I would like 3 columns in a csv file that include: metabolites, beta coefficients and p values

What do I need to add to my basic code to achieve that?

Thanks in advance.
Sandi
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 29969

21 Jun 2016, 13:32

Not tested, by try this:

Code:

gen long obs_no = _n
rename metabolite* outcome*
reshape long outcome, i(obs_no) j(metabolite)
statsby, clear by(metabolite): mixed outcome diet sex age baseline || id:
foreach x in diet sex age baseline {
    gen t_`x' = _b_`x'/_se_`x'
    gen p_`x' = 2*normal(-abs(t_`x'))
}
export delimited using my_results, replace

I think it will give you what you want.

Last edited by Clyde Schechter; 21 Jun 2016, 13:37.

Comment

SL Navarro

Join Date: Jun 2016

Posts: 23
#3

22 Jun 2016, 12:49

Thank you for your time and the suggestion. I've tried to adapt your code to mine but keep getting errors. Perhaps I've simplified my code too much. Here is my actual code (and I'm sure there is a better way to build this loop, I just haven't worked with loops much). One of the issues I have is that I need to adjust for baseline values of each metabolite within each separate regression and the code I have now works for that purpose (using the ln`var' and lnbase`var' grabs, for example, both cytidine and basecytidine the way I have it set up, and likewise with the rest of the metabolites). Renaming leads to too many characters for the baseline (I think) and if I try to shorten it, I get an error that there are no xij variables found.

foreach var of varlist cytidine deoxycarnitine spermidine propionate gdp n2n2dimethylguanosine ///
reducedglutathione aminoisobutyrate melatonin nacetylneuraminate homoserineds xanthosine ///
glycerol3p ohphenylpyruvate uracil adenylosuccinate methylhistamine adenine cadaverine ///
oxidizedglutathione linoleicacid epinephrinenorm margaricacid lactose hode sarcosine sorbitol ///
uridine adipicacid glyceraldehyde oxalicacid lkynurenine myristicacid glutaricacid oxypurinol ///
niacinamide kynurenate asparticacid sucrose ascorbicacid amp erythrose fumaricacid guanidinoacetate ///
cmp pentothenate trimethylamine hydroxybutyrate glycochenodeoxycholate glycerate cystamine ///
anthranilate xanthurenate methyladenosine phosphoglycericacid dsuccinate glycine allantoin ///
pyroglutamicacid biotin adp citraconicacid xanthine benzoicacid dimethylglycine acetylcholine ///
azelaicacid dleucicacid malonicacid isovalericacid aconitate linolenicacid hydroxyproline nacetylglycine ///
pyruvate hydroxyglutarate asparagine inositol hypoxanthine methylmalonate methionine succinate methylhistidine ///
cystine ornithine tyrosine ketoglutaricacid hyppuricacid serine valine pyridoxicacid taurine pipecolate ///
hydroxyisovalericacid creatine glutamicacid oxalacetate creatinine acetylcarnitine threonine carnitine ///
arginine trimethylaminenoxide tryptophan alanine isoleucine choline aminovalericacid citrulline betaine ///
shikimicacid lysine hba histidine leucine proline lactate phenylalanine urate glutamine glucose {
gen ln`var'=ln(`var')
gen lnbase`var'=ln(base`var')
mixed ln`var' diet batch dietseq period sex age bodyfat lnbase`var' || id:
}
Comment
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

22 Jun 2016, 13:04

I assume

mixed lncytidine diet batch dietseq period sex age bodyfat lnbasecytidine || id:

runs. If not, fix it first and that will tell you what is wrong in your loop.

Your example is needlessly long. It makes no difference whether the varlist is 2 variables or 50 since you're using them one at a time.
To debug this kind of thing, start with one or two variables in your varlist. Add a summary statement after your two generate statements to make sure you're getting non-zero non-missing values. If that doesn't show you the problem, you can add set trace on to see how Stata is interpreting your variables.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29969
#5

22 Jun 2016, 13:33

So, your actual dependent variables do not have systematic names, and in attempting to make them systematic but still retain mnemonic value you are hitting problems. OK. So let's use a different approach:

Code:

capture postutil clear tempfile results postfile handle str32 metabolite float batch se_batch diet se_diet dietsq se_dietsq period se_period /// sex se_sex age se_age bodyfat se_bodyfat baseline se_baseline using `results' foreach var of varlist cytidine deoxycarnitine spermidine propionate gdp n2n2dimethylguanosine /// reducedglutathione aminoisobutyrate melatonin nacetylneuraminate homoserineds xanthosine /// glycerol3p ohphenylpyruvate uracil adenylosuccinate methylhistamine adenine cadaverine /// oxidizedglutathione linoleicacid epinephrinenorm margaricacid lactose hode sarcosine sorbitol /// uridine adipicacid glyceraldehyde oxalicacid lkynurenine myristicacid glutaricacid oxypurinol /// niacinamide kynurenate asparticacid sucrose ascorbicacid amp erythrose fumaricacid guanidinoacetate /// cmp pentothenate trimethylamine hydroxybutyrate glycochenodeoxycholate glycerate cystamine /// anthranilate xanthurenate methyladenosine phosphoglycericacid dsuccinate glycine allantoin /// pyroglutamicacid biotin adp citraconicacid xanthine benzoicacid dimethylglycine acetylcholine /// azelaicacid dleucicacid malonicacid isovalericacid aconitate linolenicacid hydroxyproline nacetylglycine /// pyruvate hydroxyglutarate asparagine inositol hypoxanthine methylmalonate methionine succinate methylhistidine /// cystine ornithine tyrosine ketoglutaricacid hyppuricacid serine valine pyridoxicacid taurine pipecolate /// hydroxyisovalericacid creatine glutamicacid oxalacetate creatinine acetylcarnitine threonine carnitine /// arginine trimethylaminenoxide tryptophan alanine isoleucine choline aminovalericacid citrulline betaine /// shikimicacid lysine hba histidine leucine proline lactate phenylalanine urate glutamine glucose { // CREATE LOG TRANSFORMED VARIABLES gen ln`var'=ln(`var') gen lnbase`var'=ln(base`var') // RUN MIXED EFFECTS REGRESSION mixed ln`var' c.diet##c.diet batch period sex age bodyfat lnbase`var' || id: // LOOP OVER REGRESSORS TO CREATE THE MATERIAL TO POST local topost ("`var'") foreach x in diet c.diet#c.diet batch period sex age bodyfat lnbase`var' { local topost `topost' (_b[ln`var':`x']) (_se[ln`var':`x']) } // POST IT post handle `topost' } postclose handle use `results', clear foreach v of varlist batch diet dietsq period sex age bodyfat baseline{ gen t_`v' = `v'/se_`v' gen p_`v' = 2*normal(-abs(t_`v')) order t_`v' p_`v', after(se_`v') }

should do the trick. Again, not tested, and there may be some typos or other errors, but this is the gist of it.

Now, something has caught my eye here about your modeling. It appears that you have repeated observations on a series of people (or animals?), and you also have baseline data on your outcome variables. You are using a mixed-effects regression, and you are including the baseline value (log transformed) as one of the regressors. If your data set includes a separate baseline observation, then you are double-adjusting for the baseline value by both including it as a regressor and also including the baseline observation in the estimation sample. In fact, if you think about what the baseline observation looks like: the outcome variable is identical to the final regressor, you will see that this cannot end well. So, if this is your situation, you either have to drop the baseline regressor, or you have to exclude the baseline observations from the estimation sample. Either way is largely acceptable: inclusion of the baseline observation does in fact cause the analysis to be adjusted for the baseline value.

The two approaches are algebraically equivalent, although they require different interpretations of their results. In the analysis where you omit the baseline value regressor and include all observations, you get the unvarnished effects of batch, diet, sex, etc. and you can separately calculate the intraclass correlation of your outcome variable from the variance component estimates at the end of the -mixed- output if you want it. If you omit the baseline observation and include the baseline value regressor, the coefficients of your predictor variables are the effects attenuated by the intraclass correlation coefficient. And the latter is not correctly estimated anywhere in the model, so you cannot readily disentangle the effects from the measurement reliability. In general, I prefer the approach that separates the effects from the intraclass correlation (i.e., use the baseline observation but not the baseline regressor), but that is largely a matter of preference; some people feel that the attenuated estimates are better because they account for measurement unreliability.

If you decide to remove the regressor and include the baseline observation in the estimation sample, consider adjusting for heteroscedasticity using the -residuals()- option, as the baseline observation is sometimes results of pre-study conditions that are more variable than the post-study conditions.

Well, I've gone on at length here about a problem you may not even have. But do check that you are not double-adjusting for baseline.
Comment
SL Navarro

Join Date: Jun 2016

Posts: 23
#6

22 Jun 2016, 14:18

Thank you, Clyde, that worked, it ran from start to finish without error. Still one small issue outstanding. Getting my results into a csv file.

1.) This worked, but I now need to export the values. I used your code from the above post "export delimited using my_results, replace" but get an error message that the file is not found. Do I need to do something else to get this information into a csv file?

2.) To answer your first question above, my entire code ran without error as well, I just could not figure out how to get the beta coefficients and p-values into a document so that I wouldn't have to copy/paste them (too error prone and time-consuming).

3.) Regarding the repeated measures, yes, there are 4 measures per person. One at baseline of each diet and one at the end of each diet. I have my data set up as two lines per person such that the metabolites at day 0 and end of intervention are in the same row and contain different names, e.g., "cytidine" and "basecytidine" and different values. I'm not sure what you mean by double-adjusting. I think what you mean is that I have "cytidine" in the model and then "basecytidine" again as a regressor. The way that I have set up my model, I know it is grabbing these two separate values cytidine and basecytidine because I tested it in single models. I have worked with a statistician on this and followed his code from SAS. I just prefer to work in Stata.

4.) You are an extremely patient and helpful person and I am grateful for your time and expertise.

-Sandi
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29969
#7

22 Jun 2016, 14:48

1. Sometimes Stata's error messages are a bit off point. There is no need for -export delimited- to "find" a file: it has to create one. Perhaps what it was really trying to tell you that it cannot open a file in the current directory. Perhaps you don't have write privileges for that one? Perhaps your working directory is not the one you are supposed to save your files in? Other than that, I can't think of any reason why that command should not work properly.

3. Ah, so your data are different from what I thought, and what you are doing does not involve any double-adjustment. I gave up using SAS almost as soon as I became acquainted with Stata in 1994, and have never looked back. It would be a vast understatement to say that I, too, prefer working in Stata.
Comment
SL Navarro

Join Date: Jun 2016

Posts: 23
#8

22 Jun 2016, 15:17

I found the file - user error, no directory issues. I have all I need in a tidy csv file. Many thanks Clyde!!
Comment
SL Navarro

Join Date: Jun 2016

Posts: 23
#9

01 Oct 2018, 16:52

I have been using variations of the code using the "postfile" command provided above for some time without issue. Now I have a need to swap the outcome and independent variables and cannot figure out how to alter the code to get the output exported (I have no issues with the model itself, just can't get it to export). Note that both the outcome and independent variables are continuous in this instance, and the independent variables (n=~120) do not have a systematic naming scheme. I am using Stata v 15.1 for windows. My old code is below, and the new model without all of the code pertaining to the postfile command is beneath. All I really need are the coefficients and p values for the association between homa and ln`var'. Any input would be much appreciated.

Code:

capture postutil clear
tempfile bcresults
postfile handle str32 metabolite float diet se_diet batch se_batch diet_seq se_diet_seq period se_period ///
female se_female age se_age cl_dxa se_cl_dxa using `bcresults'

foreach var of varlist cytidine-glucose {

gen ln`var'=ln(`var')

mixed ln`var' diet batch diet_seq period female age cl_dxa_fat || id:

local topost ("`var'")
foreach x in diet batch diet_seq period female age cl_dxa {
local topost `topost' (_b[ln`var':`x']) (_se[ln`var':`x'])
}

post handle `topost'
}

postclose handle

use `bcresults', clear

foreach v of varlist diet_0 diet_seq batch period female age cl_dxa {
gen t_`v' = `v'/se_`v'
gen p_`v' = 2*normal(-abs(t_`v'))
order t_`v' p_`v', after(se_`v')
}
export delimited using my_bcresults, replace

The NEW mixed model I need to run is:

Code:
mixed homa ln`var' diet batch diet_seq period female age cl_dxa_fat || id:
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29969

#10

01 Oct 2018, 17:35

Try this:

Code:

capture postutil clear
tempfile results
postfile handle str32 metabolite float b_ln_metabolite se_ln_metabolite using `results'

foreach var of varlist cytidine deoxycarnitine spermidine propionate gdp n2n2dimethylguanosine ///
    reducedglutathione aminoisobutyrate melatonin nacetylneuraminate homoserineds xanthosine ///
    glycerol3p ohphenylpyruvate uracil adenylosuccinate methylhistamine adenine cadaverine ///
    oxidizedglutathione linoleicacid epinephrinenorm margaricacid lactose hode sarcosine sorbitol ///
    uridine adipicacid glyceraldehyde oxalicacid lkynurenine myristicacid glutaricacid oxypurinol ///
    niacinamide kynurenate asparticacid sucrose ascorbicacid amp erythrose fumaricacid guanidinoacetate ///
    cmp pentothenate trimethylamine hydroxybutyrate glycochenodeoxycholate glycerate cystamine ///
    anthranilate xanthurenate methyladenosine phosphoglycericacid dsuccinate glycine allantoin ///
    pyroglutamicacid biotin adp citraconicacid xanthine benzoicacid dimethylglycine acetylcholine ///
    azelaicacid dleucicacid malonicacid isovalericacid aconitate linolenicacid hydroxyproline nacetylglycine ///
    pyruvate hydroxyglutarate asparagine inositol hypoxanthine methylmalonate methionine succinate methylhistidine ///
    cystine ornithine tyrosine ketoglutaricacid hyppuricacid serine valine pyridoxicacid taurine pipecolate ///
    hydroxyisovalericacid creatine glutamicacid oxalacetate creatinine acetylcarnitine threonine carnitine ///
    arginine trimethylaminenoxide tryptophan alanine isoleucine choline aminovalericacid citrulline betaine ///
    shikimicacid lysine hba histidine leucine proline lactate phenylalanine urate glutamine glucose {

    // CREATE LOG TRANSFORMED VARIABLES
    gen ln`var'=ln(`var')
    gen lnbase`var'=ln(base`var')
    
    // RUN MIXED EFFECTS REGRESSION
    mixed homa ln`var' diet batch diet_seq period female age cl_dxa_fat || id:
    
    // POST COEFFICIENT OF ln_metabolite
    post ("`var'") (_b[homa:ln_metabolite]) (_se[homa:ln_metabolite])
}

postclose handle

use `results', clear
gen t = b_ln_metabolite/se_ln_metabolite
gen p = 2*normal(-abs(t))

Note: not tested, as no sample data was provided.

In the future, please show your code in between code delimiters, as I have done here and earlier in this thread. It makes it much easier to read. See Forum FAQ#12 for instructions on using code delimiters if you don't know how.

Comment

SL Navarro

Join Date: Jun 2016
Posts: 23

#11

02 Oct 2018, 10:02

Thank you, Clyde. Here is a sample of what my data look like (abbreviating the number of metabolites and samples):

id	batch	period	diet_seq	diet_0A_1B	female	age	cl_dxa_fat	cytidine	deoxycarnitine	glucose	homa
620001	1	0	0	0	0	25.72	0	11263	20168	26866	0.666667
620001	1	1	0	1	0	25.72	0	14233	19569	19915	0.974815
620002	2	1	1	0	1	36.54	0	43111	19375	38339	1.7
620002	2	0	1	1	1	36.54	0	29291	26338	13160	1.493333
620003	3	1	1	0	1	31.9	1	16086	12830	49021	1.167901
620003	3	0	1	1	1	31.9	1	19183	14365	13641	1.499259
620004	4	0	0	0	0	24.11	0	11345	15215	13653	3.063704
620004	4	1	0	1	0	24.11	0	11758	16005	14968	1.258272
620005	5	0	0	0	1	35.37	1	31722	13299	24488	1.415556
620005	5	1	0	1	1	35.37	1	17520	14677	11802	1.768148

I ran the code with a few alterations (i.e., no longer need to adjust for baseline, etc.)

Code:

capture postutil clear
tempfile results
gen lnhoma=ln(homa)
postfile handle str32 metabolite float b_ln_metabolite se_ln_metabolite using `results'

foreach var of varlist cytidine-glucose {

    // CREATE LOG TRANSFORMED VARIABLES
    gen ln`var'=ln(`var')
        
    // RUN MIXED EFFECTS REGRESSION
    mixed lnhoma ln`var' diet_0 i.batch female age cl_dxa_fat || id:
    
    // POST COEFFICIENT OF ln_metabolite
    post handle ("`var'") (_b[lnhoma:ln_metabolite]) (_se[lnhoma:ln_metabolite])
}

postclose handle

use `results', clear
gen t = b_ln_metabolite/se_ln_metabolite
gen p = 2*normal(-abs(t))

Stata returned the following error:

[lnhoma:ln_metabolite] not found
post: above message corresponds to expression 2, variable b_ln_metabolite

Clearly the issue is in the last line of the loop, but I am unable to correct it.

Last edited by SL Navarro; 02 Oct 2018, 10:35.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29969
#12

02 Oct 2018, 11:46

Sorry. Where it says ln_metabolite in that -post handle- command, it should be ln`var' (for both the _b[] and _se[]).
Comment
SL Navarro

Join Date: Jun 2016

Posts: 23
#13

02 Oct 2018, 14:11

I tried changing that before responding, but Stata is still returning the same error:

[lnhoma:ln_cytidine] not found
post: above message corresponds to expression 2, variable b_ln_metabolite
r(111);

I don't think I need to make any changes in the -postfile- line as that just indicates the variables as I want them labeled on my spreadsheet (or maybe I'm wrong, but I tried that also) and if I try to run the code in chunks, the error occurs before running the command -postclose handle-

Any other thoughts on what I'm missing?
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29969

#14

02 Oct 2018, 15:31

Well, you must have changed something because

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id byte(batch period diet_seq diet_0a_1b female) float age byte cl_dxa_fat long cytidine int deoxycarnitine long glucose float homa
620001 1 0 0 0 0 25.72 0 11263 20168 26866  .666667
620001 1 1 0 1 0 25.72 0 14233 19569 19915  .974815
620002 2 1 1 0 1 36.54 0 43111 19375 38339      1.7
620002 2 0 1 1 1 36.54 0 29291 26338 13160 1.493333
620003 3 1 1 0 1  31.9 1 16086 12830 49021 1.167901
620003 3 0 1 1 1  31.9 1 19183 14365 13641 1.499259
620004 4 0 0 0 0 24.11 0 11345 15215 13653 3.063704
620004 4 1 0 1 0 24.11 0 11758 16005 14968 1.258272
620005 5 0 0 0 1 35.37 1 31722 13299 24488 1.415556
620005 5 1 0 1 1 35.37 1 17520 14677 11802 1.768148
end

capture postutil clear
tempfile results
gen lnhoma=ln(homa)
postfile handle str32 metabolite float b_ln_metabolite se_ln_metabolite using `results'

foreach var of varlist cytidine-glucose {

    // CREATE LOG TRANSFORMED VARIABLES
    gen ln`var'=ln(`var')
        
    // RUN MIXED EFFECTS REGRESSION
    mixed lnhoma ln`var' diet_0 i.batch female age cl_dxa_fat || id:
    
    // POST COEFFICIENT OF ln_metabolite
    post handle ("`var'") (_b[lnhoma:ln`var']) (_se[lnhoma:ln`var'])
}

postclose handle

use `results', clear
gen t = b_ln_metabolite/se_ln_metabolite
gen p = 2*normal(-abs(t))

run without errors on my setup.

I think you may have used ln_`var' where I suggested you use ln`var'. The code does not create a variable ln_cytidine, but it does create lncytidine.

Comment

SL Navarro

Join Date: Jun 2016

Posts: 23
#15

02 Oct 2018, 16:31

Yup. You were correct, I left in the underscore. If I had a dime for every minute I spent on minor syntax errors... Many thanks, Clyde.
Comment

Announcement