Create multiple new variables using postfile

David Wang

Join Date: Jul 2018

Posts: 30
#1

Create multiple new variables using postfile

25 Jul 2018, 15:53

Hello,

I am trying to create a new dataset from an output using postfile command. Specifically, I want to create variables cat1_ave as the mean of cat1, cat2_ave as the mean of cat2, until cat10_ave. I can successfully create the first two columns (i.e., year and cat1_ave). However, the remaining columns do not work. They are still the average of the cat1. I wondered how to solve this problem? I cannot figure it out. Below is my syntax.

capture postutil clear
postfile ms_year1 year cat1_ave cat2_ave cat3_ave ///
cat4_ave cat5_ave cat6_ave ///
cat7_ave cat8_ave cat9_ave cat10_ave ///
using ms_year1.dta, replace

forvalues yr = 2010/2017 {
sum cat1-cat10 if (year==`yr' & it=="ms")
local var "`k'"
post ms_year1 (`yr') (`r(mean)') (`r(mean)') (`r(mean)') ///
(`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)')
}
postclose ms_year1

Many thanks,
David
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

25 Jul 2018, 18:50

When you -summarize- a single variable, r(mean) contains the mean of that variable. When you -summarize- a whole list of variables, r(mean) contains the mean of only the last variable in the list. You have, in your code, set all of cat1_ave through cat10_ave to the mean value of cat1.

So what you need to do is add one mean at a time in a loop, and then post the whole thing. Something like this:

Code:

forvalues yr = 2010/2017 { local topost (`yr') forvalues i = 1/10 { sum cat1`i' if (year==`yr' & it=="ms") // local var "`k'" // YOU NEVER USE THIS AND IT DOESN"T APPEAR TO MAKE SENSE // AS LOCAL MACRO k IS NEVER DEFINED IN THIS CODE local topost `topost' (`r(mean)') } post ms_year1 `topost' }

As no sample data was provided, this code is not tested and may contain typos or other errors, but the gist of it is how you approach this kind of thing in general.

That said, this seems an unnecessarily complicated approach. Why not just do this?

Code:

preserve collapse (mean) cat1-cat10, by(yr) rename cat* cat*_ave save ms_year1, replace restore

In the future, when showing code, please wrap it inside code delimiters so that indentation and other formatting will be preserved. The resulting code will be much easier for people who want to help you to read. If you are not familiar with code delimiters, please read FAQ #12 for information about them.
1 like
Comment

David Wang

Join Date: Jul 2018
Posts: 30

26 Jul 2018, 10:50

Thanks much, Clyde!

The collapse approach is promising. I have two follow-up questions about this approach. (1) I also want to calculate the mean within each year by the variable "it", and I tried to add the second by(it), but it does not work. (2) some years have missing values. When using the collapse approach, a couple of years with missing values do not show up in the generated "ms_year1" dataset. I was wondering how to display every year even though there are missing values in that year?

Regarding the forvalues and post approach, the codes look correct. However, the output says "invalid syntax post: above message corresponds to expression 2, variable cat1_ave". and no means were generated. I am not sure why it is now working?

This is my first post at statalist.org. I will wrap it inside the code delimiters.

sample data:
code:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 it year cat1-cat10
ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
end

Thanks,
David

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

26 Jul 2018, 13:02

David, thanks for the -dataex-. But you still need to ask your question more clearly. You don't show any code, so I have no way to know what you did. You don't explain what you mean by "does not work." There are thousands of ways code can "not work." You have to show what Stata actually gave you as output (including error messages). And, if it isn't obvious, you need to then explain how it's not what you were expecting.

That said, here is some code using postfile that works with your most recent example data.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 it year cat1-cat10
ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
end
tempfile results
capture postutil clear
postfile ms_year1 str3 it yr cat1_ave cat2_ave cat3_ave cat4_ave cat5_ave cat6_ave ///
    cat7_ave cat8_ave cat9_ave cat10_ave using `results'
levelsof it, local(its)
foreach it of local its {
    forvalues yr = 2010/2017 {
        local topost ("`it'") (`yr')
        forvalues i = 1/10 {
            sum cat`i' if (year==`yr' & it=="ms")
    //        local var "`k'" // YOU NEVER USE THIS AND IT DOESN"T APPEAR TO MAKE SENSE
                            // AS LOCAL MACRO k IS NEVER DEFINED IN THIS CODE
            local topost `topost' (`r(mean)')
        }
        post ms_year1 `topost'
    }
}
postclose ms_year1
use `results', clear

If you want me to troubleshoot the -collapse- version, post back with the actual code you tried and show the actual results you got, and I'll give it a try.

Comment

David Wang

Join Date: Jul 2018
Posts: 30

26 Jul 2018, 17:47

Thanks a lot, Clyde. I think I did not explain my question clearly. What I want to do is to list the average of cat1-cat10 by year and by it. Because my original codes writes "it=="ms", the generated outputs list only "ms". In other words, even though in the "it" and "yr" columns display "mc", "mp" and each year nicely, the observations from cat1 to cat10 are "ms"'s corresponding means. Thus, I tweaked that part a little bit. But it is not working. The error message says: "invalid syntax post: above message corresponds to expression 3, variable cat1_ave".

Below are codes that are not working.

Code:

tempfile results
capture postutil clear
postfile results str3 it yr cat1_ave cat2_ave cat3_ave cat4_ave cat5_ave cat6_ave ///
    cat7_ave cat8_ave cat9_ave cat10_ave using "C:\Users\Desktop\results.dta"

levelsof it, local(its)
foreach it of local its {
    forvalues yr = 2010/2017 {
        local topost ("`it'") (`yr')
        forvalues i = 1/10 {
            sum cat`i' if (year==`yr' & it=="`its'")
            local topost `topost' (`r(mean)')
        }
        post results `topost'
    }
}
postclose results

Here is the sample data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 it year cat1-cat10
ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
ms    2010    0.209    0.891    0.467    0.882    0.209    0.651    0.227    0.596    0.395    0.597
ms    2010    0.187    0.761    0.739    0.748    0.561    0.275    0.244    0.245    0.937    0.215
ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
ms    2015    0.434    0.563    0.203    0.591    0.398    0.547    0.058    0.607    0.191    0.212
ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
ms    2017    0.717    0.026    0.008    0.534    0.246    0.896    0.915    0.827    0.326    0.591
mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
mc    2012    0.427    0.754    0.667    0.005    0.510    0.851    0.218    0.131    0.190    0.971
mc    2012    0.828    0.208    0.032    0.655    .    0.628    0.536    0.056    0.702    0.045
mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
end

Regarding the -collapse- version, I really think it is a fantastic approach. Below is the code that I tried. Not sure how to break it down by "it" within each year.

Code:

preserve
collapse (mean) cat1-cat10, by(yr) by(it)
rename cat* cat*_ave
save ms_collapse, replace
restore

Many thanks,
David

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

26 Jul 2018, 18:46

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 it year cat1-cat10
ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
ms    2010    0.209    0.891    0.467    0.882    0.209    0.651    0.227    0.596    0.395    0.597
ms    2010    0.187    0.761    0.739    0.748    0.561    0.275    0.244    0.245    0.937    0.215
ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
ms    2015    0.434    0.563    0.203    0.591    0.398    0.547    0.058    0.607    0.191    0.212
ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
ms    2017    0.717    0.026    0.008    0.534    0.246    0.896    0.915    0.827    0.326    0.591
mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
mc    2012    0.427    0.754    0.667    0.005    0.510    0.851    0.218    0.131    0.190    0.971
mc    2012    0.828    0.208    0.032    0.655    .    0.628    0.536    0.056    0.702    0.045
mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
end

//    HERE'S HOW TO DO IT WITH A TEMPFILE
tempfile results
capture postutil clear
postfile handle str3 it yr cat1_ave cat2_ave cat3_ave cat4_ave cat5_ave cat6_ave ///
    cat7_ave cat8_ave cat9_ave cat10_ave using `results'

levelsof it, local(its)
foreach it of local its {
    forvalues yr = 2010/2017 {
        local topost ("`it'") (`yr')
        forvalues i = 1/10 {
            sum cat`i' if (year==`yr' & it=="`it'")
            if `r(N)' {
                local topost `topost' (`r(mean)')
            }
            else {
                local topost `topost' (.)
            }
        }
        post handle `topost'
    }
}
postclose handle

//    HERE'S HOW TO DO IT WITH COLLAPSE
preserve
tempfile results
collapse (mean) cat1-cat10, by(year it)
rename cat* cat*_ave
fillin it year
save `results', replace
restore

use `results', clear
isid it yr, sort

So the above is how to do it both ways. (Do only one or the other, of course.) Changes to your code are in bold italics. I gather that you want a row of missing cat*_ave values for combinations of it and year that do not exist in your data. That's what that -fillin- command does.

Comment

David Wang

Join Date: Jul 2018

Posts: 30
#7

27 Jul 2018, 18:45

Thanks much, Clyde! Both -foreach- and -collapse- approach really works great! You have magic! I think the loop over is too complicated for me. I will stick to the -collapse- approach.

Again, thank you, Clyde!

Best,
David
Comment

Announcement