Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Storing and retreiving beta coefficients and p values from a loop

    I have the following loop and am trying to export the beta coefficients and p values for each variable into a csv file. I have tried searching various forums and cannot get anything that works. I realize I need to store the information in a matrix and then export and am not sure where my problem is.

    My code (simplified) is as follows:

    foreach var of varlist metabolite1 metabolite2 metabolite3... {
    mixed `var' diet sex age baseline || id:
    }

    I would like 3 columns in a csv file that include: metabolites, beta coefficients and p values

    What do I need to add to my basic code to achieve that?

    Thanks in advance.
    Sandi

  • #2
    Not tested, by try this:

    Code:
    gen long obs_no = _n
    rename metabolite* outcome*
    reshape long outcome, i(obs_no) j(metabolite)
    statsby, clear by(metabolite): mixed outcome diet sex age baseline || id:
    foreach x in diet sex age baseline {
        gen t_`x' = _b_`x'/_se_`x'
        gen p_`x' = 2*normal(-abs(t_`x'))
    }
    export delimited using my_results, replace
    I think it will give you what you want.
    Last edited by Clyde Schechter; 21 Jun 2016, 13:37.

    Comment


    • #3
      Thank you for your time and the suggestion. I've tried to adapt your code to mine but keep getting errors. Perhaps I've simplified my code too much. Here is my actual code (and I'm sure there is a better way to build this loop, I just haven't worked with loops much). One of the issues I have is that I need to adjust for baseline values of each metabolite within each separate regression and the code I have now works for that purpose (using the ln`var' and lnbase`var' grabs, for example, both cytidine and basecytidine the way I have it set up, and likewise with the rest of the metabolites). Renaming leads to too many characters for the baseline (I think) and if I try to shorten it, I get an error that there are no xij variables found.

      foreach var of varlist cytidine deoxycarnitine spermidine propionate gdp n2n2dimethylguanosine ///
      reducedglutathione aminoisobutyrate melatonin nacetylneuraminate homoserineds xanthosine ///
      glycerol3p ohphenylpyruvate uracil adenylosuccinate methylhistamine adenine cadaverine ///
      oxidizedglutathione linoleicacid epinephrinenorm margaricacid lactose hode sarcosine sorbitol ///
      uridine adipicacid glyceraldehyde oxalicacid lkynurenine myristicacid glutaricacid oxypurinol ///
      niacinamide kynurenate asparticacid sucrose ascorbicacid amp erythrose fumaricacid guanidinoacetate ///
      cmp pentothenate trimethylamine hydroxybutyrate glycochenodeoxycholate glycerate cystamine ///
      anthranilate xanthurenate methyladenosine phosphoglycericacid dsuccinate glycine allantoin ///
      pyroglutamicacid biotin adp citraconicacid xanthine benzoicacid dimethylglycine acetylcholine ///
      azelaicacid dleucicacid malonicacid isovalericacid aconitate linolenicacid hydroxyproline nacetylglycine ///
      pyruvate hydroxyglutarate asparagine inositol hypoxanthine methylmalonate methionine succinate methylhistidine ///
      cystine ornithine tyrosine ketoglutaricacid hyppuricacid serine valine pyridoxicacid taurine pipecolate ///
      hydroxyisovalericacid creatine glutamicacid oxalacetate creatinine acetylcarnitine threonine carnitine ///
      arginine trimethylaminenoxide tryptophan alanine isoleucine choline aminovalericacid citrulline betaine ///
      shikimicacid lysine hba histidine leucine proline lactate phenylalanine urate glutamine glucose {
      gen ln`var'=ln(`var')
      gen lnbase`var'=ln(base`var')
      mixed ln`var' diet batch dietseq period sex age bodyfat lnbase`var' || id:
      }

      Comment


      • #4
        I assume

        mixed lncytidine diet batch dietseq period sex age bodyfat lnbasecytidine || id:

        runs. If not, fix it first and that will tell you what is wrong in your loop.

        Your example is needlessly long. It makes no difference whether the varlist is 2 variables or 50 since you're using them one at a time.
        To debug this kind of thing, start with one or two variables in your varlist. Add a summary statement after your two generate statements to make sure you're getting non-zero non-missing values. If that doesn't show you the problem, you can add set trace on to see how Stata is interpreting your variables.

        Comment


        • #5
          So, your actual dependent variables do not have systematic names, and in attempting to make them systematic but still retain mnemonic value you are hitting problems. OK. So let's use a different approach:

          Code:
          capture postutil clear
          tempfile results
          postfile handle str32 metabolite float batch se_batch diet se_diet dietsq se_dietsq period se_period ///
              sex se_sex age se_age bodyfat se_bodyfat baseline se_baseline using `results'
          
          foreach var of varlist cytidine deoxycarnitine spermidine propionate gdp n2n2dimethylguanosine ///
              reducedglutathione aminoisobutyrate melatonin nacetylneuraminate homoserineds xanthosine ///
              glycerol3p ohphenylpyruvate uracil adenylosuccinate methylhistamine adenine cadaverine ///
              oxidizedglutathione linoleicacid epinephrinenorm margaricacid lactose hode sarcosine sorbitol ///
              uridine adipicacid glyceraldehyde oxalicacid lkynurenine myristicacid glutaricacid oxypurinol ///
              niacinamide kynurenate asparticacid sucrose ascorbicacid amp erythrose fumaricacid guanidinoacetate ///
              cmp pentothenate trimethylamine hydroxybutyrate glycochenodeoxycholate glycerate cystamine ///
              anthranilate xanthurenate methyladenosine phosphoglycericacid dsuccinate glycine allantoin ///
              pyroglutamicacid biotin adp citraconicacid xanthine benzoicacid dimethylglycine acetylcholine ///
              azelaicacid dleucicacid malonicacid isovalericacid aconitate linolenicacid hydroxyproline nacetylglycine ///
              pyruvate hydroxyglutarate asparagine inositol hypoxanthine methylmalonate methionine succinate methylhistidine ///
              cystine ornithine tyrosine ketoglutaricacid hyppuricacid serine valine pyridoxicacid taurine pipecolate ///
              hydroxyisovalericacid creatine glutamicacid oxalacetate creatinine acetylcarnitine threonine carnitine ///
              arginine trimethylaminenoxide tryptophan alanine isoleucine choline aminovalericacid citrulline betaine ///
              shikimicacid lysine hba histidine leucine proline lactate phenylalanine urate glutamine glucose {
          
              // CREATE LOG TRANSFORMED VARIABLES
              gen ln`var'=ln(`var')
              gen lnbase`var'=ln(base`var')
              
              // RUN MIXED EFFECTS REGRESSION
              mixed ln`var' c.diet##c.diet batch  period sex age bodyfat lnbase`var' || id:
              
              // LOOP OVER REGRESSORS TO CREATE THE MATERIAL TO POST
              local topost ("`var'")
              foreach x in diet c.diet#c.diet batch period sex age bodyfat lnbase`var' {
                  local topost `topost' (_b[ln`var':`x']) (_se[ln`var':`x'])
              }
              
              // POST IT
              post handle `topost'
          }
          
          postclose handle
          
          use `results', clear
          
          foreach v of varlist batch diet dietsq period sex age bodyfat baseline{
             gen t_`v' = `v'/se_`v'
             gen p_`v' = 2*normal(-abs(t_`v'))
             order t_`v' p_`v', after(se_`v')
          }
          should do the trick. Again, not tested, and there may be some typos or other errors, but this is the gist of it.

          Now, something has caught my eye here about your modeling. It appears that you have repeated observations on a series of people (or animals?), and you also have baseline data on your outcome variables. You are using a mixed-effects regression, and you are including the baseline value (log transformed) as one of the regressors. If your data set includes a separate baseline observation, then you are double-adjusting for the baseline value by both including it as a regressor and also including the baseline observation in the estimation sample. In fact, if you think about what the baseline observation looks like: the outcome variable is identical to the final regressor, you will see that this cannot end well. So, if this is your situation, you either have to drop the baseline regressor, or you have to exclude the baseline observations from the estimation sample. Either way is largely acceptable: inclusion of the baseline observation does in fact cause the analysis to be adjusted for the baseline value.

          The two approaches are algebraically equivalent, although they require different interpretations of their results. In the analysis where you omit the baseline value regressor and include all observations, you get the unvarnished effects of batch, diet, sex, etc. and you can separately calculate the intraclass correlation of your outcome variable from the variance component estimates at the end of the -mixed- output if you want it. If you omit the baseline observation and include the baseline value regressor, the coefficients of your predictor variables are the effects attenuated by the intraclass correlation coefficient. And the latter is not correctly estimated anywhere in the model, so you cannot readily disentangle the effects from the measurement reliability. In general, I prefer the approach that separates the effects from the intraclass correlation (i.e., use the baseline observation but not the baseline regressor), but that is largely a matter of preference; some people feel that the attenuated estimates are better because they account for measurement unreliability.

          If you decide to remove the regressor and include the baseline observation in the estimation sample, consider adjusting for heteroscedasticity using the -residuals()- option, as the baseline observation is sometimes results of pre-study conditions that are more variable than the post-study conditions.

          Well, I've gone on at length here about a problem you may not even have. But do check that you are not double-adjusting for baseline.

          Comment


          • #6
            Thank you, Clyde, that worked, it ran from start to finish without error. Still one small issue outstanding. Getting my results into a csv file.

            1.) This worked, but I now need to export the values. I used your code from the above post "export delimited using my_results, replace" but get an error message that the file is not found. Do I need to do something else to get this information into a csv file?

            2.) To answer your first question above, my entire code ran without error as well, I just could not figure out how to get the beta coefficients and p-values into a document so that I wouldn't have to copy/paste them (too error prone and time-consuming).

            3.) Regarding the repeated measures, yes, there are 4 measures per person. One at baseline of each diet and one at the end of each diet. I have my data set up as two lines per person such that the metabolites at day 0 and end of intervention are in the same row and contain different names, e.g., "cytidine" and "basecytidine" and different values. I'm not sure what you mean by double-adjusting. I think what you mean is that I have "cytidine" in the model and then "basecytidine" again as a regressor. The way that I have set up my model, I know it is grabbing these two separate values cytidine and basecytidine because I tested it in single models. I have worked with a statistician on this and followed his code from SAS. I just prefer to work in Stata.

            4.) You are an extremely patient and helpful person and I am grateful for your time and expertise.

            -Sandi

            Comment


            • #7
              1. Sometimes Stata's error messages are a bit off point. There is no need for -export delimited- to "find" a file: it has to create one. Perhaps what it was really trying to tell you that it cannot open a file in the current directory. Perhaps you don't have write privileges for that one? Perhaps your working directory is not the one you are supposed to save your files in? Other than that, I can't think of any reason why that command should not work properly.

              3. Ah, so your data are different from what I thought, and what you are doing does not involve any double-adjustment. I gave up using SAS almost as soon as I became acquainted with Stata in 1994, and have never looked back. It would be a vast understatement to say that I, too, prefer working in Stata.

              Comment


              • #8
                I found the file - user error, no directory issues. I have all I need in a tidy csv file. Many thanks Clyde!!

                Comment


                • #9
                  I have been using variations of the code using the "postfile" command provided above for some time without issue. Now I have a need to swap the outcome and independent variables and cannot figure out how to alter the code to get the output exported (I have no issues with the model itself, just can't get it to export). Note that both the outcome and independent variables are continuous in this instance, and the independent variables (n=~120) do not have a systematic naming scheme. I am using Stata v 15.1 for windows. My old code is below, and the new model without all of the code pertaining to the postfile command is beneath. All I really need are the coefficients and p values for the association between homa and ln`var'. Any input would be much appreciated.

                  Code:

                  capture postutil clear
                  tempfile bcresults
                  postfile handle str32 metabolite float diet se_diet batch se_batch diet_seq se_diet_seq period se_period ///
                  female se_female age se_age cl_dxa se_cl_dxa using `bcresults'

                  foreach var of varlist cytidine-glucose {

                  gen ln`var'=ln(`var')

                  mixed ln`var' diet batch diet_seq period female age cl_dxa_fat || id:

                  local topost ("`var'")
                  foreach x in diet batch diet_seq period female age cl_dxa {
                  local topost `topost' (_b[ln`var':`x']) (_se[ln`var':`x'])
                  }

                  post handle `topost'
                  }

                  postclose handle

                  use `bcresults', clear

                  foreach v of varlist diet_0 diet_seq batch period female age cl_dxa {
                  gen t_`v' = `v'/se_`v'
                  gen p_`v' = 2*normal(-abs(t_`v'))
                  order t_`v' p_`v', after(se_`v')
                  }
                  export delimited using my_bcresults, replace


                  The NEW mixed model I need to run is:

                  Code:
                  mixed homa ln`var' diet batch diet_seq period female age cl_dxa_fat || id:

                  Comment


                  • #10
                    Try this:

                    Code:
                    capture postutil clear
                    tempfile results
                    postfile handle str32 metabolite float b_ln_metabolite se_ln_metabolite using `results'
                    
                    foreach var of varlist cytidine deoxycarnitine spermidine propionate gdp n2n2dimethylguanosine ///
                        reducedglutathione aminoisobutyrate melatonin nacetylneuraminate homoserineds xanthosine ///
                        glycerol3p ohphenylpyruvate uracil adenylosuccinate methylhistamine adenine cadaverine ///
                        oxidizedglutathione linoleicacid epinephrinenorm margaricacid lactose hode sarcosine sorbitol ///
                        uridine adipicacid glyceraldehyde oxalicacid lkynurenine myristicacid glutaricacid oxypurinol ///
                        niacinamide kynurenate asparticacid sucrose ascorbicacid amp erythrose fumaricacid guanidinoacetate ///
                        cmp pentothenate trimethylamine hydroxybutyrate glycochenodeoxycholate glycerate cystamine ///
                        anthranilate xanthurenate methyladenosine phosphoglycericacid dsuccinate glycine allantoin ///
                        pyroglutamicacid biotin adp citraconicacid xanthine benzoicacid dimethylglycine acetylcholine ///
                        azelaicacid dleucicacid malonicacid isovalericacid aconitate linolenicacid hydroxyproline nacetylglycine ///
                        pyruvate hydroxyglutarate asparagine inositol hypoxanthine methylmalonate methionine succinate methylhistidine ///
                        cystine ornithine tyrosine ketoglutaricacid hyppuricacid serine valine pyridoxicacid taurine pipecolate ///
                        hydroxyisovalericacid creatine glutamicacid oxalacetate creatinine acetylcarnitine threonine carnitine ///
                        arginine trimethylaminenoxide tryptophan alanine isoleucine choline aminovalericacid citrulline betaine ///
                        shikimicacid lysine hba histidine leucine proline lactate phenylalanine urate glutamine glucose {
                    
                        // CREATE LOG TRANSFORMED VARIABLES
                        gen ln`var'=ln(`var')
                        gen lnbase`var'=ln(base`var')
                        
                        // RUN MIXED EFFECTS REGRESSION
                        mixed homa ln`var' diet batch diet_seq period female age cl_dxa_fat || id:
                        
                        // POST COEFFICIENT OF ln_metabolite
                        post ("`var'") (_b[homa:ln_metabolite]) (_se[homa:ln_metabolite])
                    }
                    
                    postclose handle
                    
                    use `results', clear
                    gen t = b_ln_metabolite/se_ln_metabolite
                    gen p = 2*normal(-abs(t))
                    Note: not tested, as no sample data was provided.

                    In the future, please show your code in between code delimiters, as I have done here and earlier in this thread. It makes it much easier to read. See Forum FAQ#12 for instructions on using code delimiters if you don't know how.

                    Comment


                    • #11
                      Thank you, Clyde. Here is a sample of what my data look like (abbreviating the number of metabolites and samples):
                      id batch period diet_seq diet_0A_1B female age cl_dxa_fat cytidine deoxycarnitine glucose homa
                      620001 1 0 0 0 0 25.72 0 11263 20168 26866 0.666667
                      620001 1 1 0 1 0 25.72 0 14233 19569 19915 0.974815
                      620002 2 1 1 0 1 36.54 0 43111 19375 38339 1.7
                      620002 2 0 1 1 1 36.54 0 29291 26338 13160 1.493333
                      620003 3 1 1 0 1 31.9 1 16086 12830 49021 1.167901
                      620003 3 0 1 1 1 31.9 1 19183 14365 13641 1.499259
                      620004 4 0 0 0 0 24.11 0 11345 15215 13653 3.063704
                      620004 4 1 0 1 0 24.11 0 11758 16005 14968 1.258272
                      620005 5 0 0 0 1 35.37 1 31722 13299 24488 1.415556
                      620005 5 1 0 1 1 35.37 1 17520 14677 11802 1.768148
                      I ran the code with a few alterations (i.e., no longer need to adjust for baseline, etc.)

                      Code:
                      capture postutil clear
                      tempfile results
                      gen lnhoma=ln(homa)
                      postfile handle str32 metabolite float b_ln_metabolite se_ln_metabolite using `results'
                      
                      foreach var of varlist cytidine-glucose {
                      
                          // CREATE LOG TRANSFORMED VARIABLES
                          gen ln`var'=ln(`var')
                              
                          // RUN MIXED EFFECTS REGRESSION
                          mixed lnhoma ln`var' diet_0 i.batch female age cl_dxa_fat || id:
                          
                          // POST COEFFICIENT OF ln_metabolite
                          post handle ("`var'") (_b[lnhoma:ln_metabolite]) (_se[lnhoma:ln_metabolite])
                      }
                      
                      postclose handle
                      
                      use `results', clear
                      gen t = b_ln_metabolite/se_ln_metabolite
                      gen p = 2*normal(-abs(t))
                      Stata returned the following error:

                      [lnhoma:ln_metabolite] not found
                      post: above message corresponds to expression 2, variable b_ln_metabolite

                      Clearly the issue is in the last line of the loop, but I am unable to correct it.
                      Last edited by SL Navarro; 02 Oct 2018, 10:35.

                      Comment


                      • #12
                        Sorry. Where it says ln_metabolite in that -post handle- command, it should be ln`var' (for both the _b[] and _se[]).

                        Comment


                        • #13
                          I tried changing that before responding, but Stata is still returning the same error:

                          [lnhoma:ln_cytidine] not found
                          post: above message corresponds to expression 2, variable b_ln_metabolite
                          r(111);

                          I don't think I need to make any changes in the -postfile- line as that just indicates the variables as I want them labeled on my spreadsheet (or maybe I'm wrong, but I tried that also) and if I try to run the code in chunks, the error occurs before running the command -postclose handle-

                          Any other thoughts on what I'm missing?

                          Comment


                          • #14
                            Well, you must have changed something because
                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input long id byte(batch period diet_seq diet_0a_1b female) float age byte cl_dxa_fat long cytidine int deoxycarnitine long glucose float homa
                            620001 1 0 0 0 0 25.72 0 11263 20168 26866  .666667
                            620001 1 1 0 1 0 25.72 0 14233 19569 19915  .974815
                            620002 2 1 1 0 1 36.54 0 43111 19375 38339      1.7
                            620002 2 0 1 1 1 36.54 0 29291 26338 13160 1.493333
                            620003 3 1 1 0 1  31.9 1 16086 12830 49021 1.167901
                            620003 3 0 1 1 1  31.9 1 19183 14365 13641 1.499259
                            620004 4 0 0 0 0 24.11 0 11345 15215 13653 3.063704
                            620004 4 1 0 1 0 24.11 0 11758 16005 14968 1.258272
                            620005 5 0 0 0 1 35.37 1 31722 13299 24488 1.415556
                            620005 5 1 0 1 1 35.37 1 17520 14677 11802 1.768148
                            end
                            
                            capture postutil clear
                            tempfile results
                            gen lnhoma=ln(homa)
                            postfile handle str32 metabolite float b_ln_metabolite se_ln_metabolite using `results'
                            
                            foreach var of varlist cytidine-glucose {
                            
                                // CREATE LOG TRANSFORMED VARIABLES
                                gen ln`var'=ln(`var')
                                    
                                // RUN MIXED EFFECTS REGRESSION
                                mixed lnhoma ln`var' diet_0 i.batch female age cl_dxa_fat || id:
                                
                                // POST COEFFICIENT OF ln_metabolite
                                post handle ("`var'") (_b[lnhoma:ln`var']) (_se[lnhoma:ln`var'])
                            }
                            
                            postclose handle
                            
                            use `results', clear
                            gen t = b_ln_metabolite/se_ln_metabolite
                            gen p = 2*normal(-abs(t))
                            run without errors on my setup.

                            I think you may have used ln_`var' where I suggested you use ln`var'. The code does not create a variable ln_cytidine, but it does create lncytidine.

                            Comment


                            • #15
                              Yup. You were correct, I left in the underscore. If I had a dime for every minute I spent on minor syntax errors... Many thanks, Clyde.

                              Comment

                              Working...
                              X