Changing data format after statsby

Erdem Yilmaz

Join Date: Dec 2017
Posts: 62


22 Jun 2018, 07:32

I save my regression coefficients and standard errors using:

Code:

xtset product_code
statsby _b _se, saving(C:\temp\temp_2.dta, replace) : xtreg lnq lnp i.product_code#c.lnp, fe vce(cluster product_code)

Then I get the standard statsby output:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(_stat_2 _stat_3 _stat_4 _stat_597 _stat_598 _stat_599)
0 0 -.3286103 0 0 1.3530442e-07
end

I first change variable names to their labels by:

Code:

foreach var of varlist * {
    local lab `: var label `var''
    local lab `: di subinstr("`lab'", "[", "_", .)'
    local lab `: di subinstr("`lab'", "]", "_", .)'
    local lab `: di subinstr("`lab'", "#", "_", .)'
    local lab `: di subinstr("`lab'", ".", "_", .)'
    label var `var' "`lab'"
}

foreach v of varlist _all {
   local x : variable label `v'
   rename `v' `x'
}

Then I get the following output:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(_b_904b_product_code_co_lnp_ _b_905o_product_code_co_lnp_ _b_906_product_code_c_lnp_ _se_904b_product_code_co_lnp_ _se_905o_product_code_co_lnp_ _se_906_product_code_c_lnp_)
0 0 -.3286103 0 0 1.3530442e-07
end

As a final step, I want to create a categorical variable identifying each estimate, with its coefficient and standard error beside it. So each observation should hold the name of the coefficient (e.g. _b_904b_product_code_co_lnp_), the coefficient itself (e.g. 0), and the standard error (e.g. 0).
How can I do this?


Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#2

23 Jun 2018, 12:08

Code:

gen long obs_no = _n
reshape long _b_ _se_, i(obs_no) j(coefficient_name) string

Do learn the -reshape- command by reading the corresponding chapter in the PDF documentation. It takes some getting used to, but once mastered, it is among Stata's most useful commands.
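For readers following along, here is a minimal sketch of what that -reshape- does, run on the renamed example data from #1. The final -rename- to b and se is an illustrative choice and not part of the answer above.

```stata
* Example data from #1, after the variables were renamed to their labels
clear
input float(_b_904b_product_code_co_lnp_ _b_905o_product_code_co_lnp_ _b_906_product_code_c_lnp_ _se_904b_product_code_co_lnp_ _se_905o_product_code_co_lnp_ _se_906_product_code_c_lnp_)
0 0 -.3286103 0 0 1.3530442e-07
end

* One wide row becomes one row per coefficient; the part of each
* variable name after the _b_ / _se_ stub is stored in coefficient_name
gen long obs_no = _n
reshape long _b_ _se_, i(obs_no) j(coefficient_name) string

* Illustrative tidy-up: shorter names for the two value columns
rename (_b_ _se_) (b se)
list coefficient_name b se, noobs
```

The result has three observations, one per interaction term, each holding the coefficient name, the coefficient, and its standard error.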

I don't quite grasp what you are trying to do here. Why are you using -statsby- when you only run a single regression on the whole data set? Am I missing something?
Erdem Yilmaz

Join Date: Dec 2017

Posts: 62
#3

25 Jun 2018, 04:30

Hello Clyde
Thank you very much for your answer. I am trying to draw sub-samples from a dataset with around 40 million observations to estimate the elasticity, and later the cross-elasticity, of demand for each product. There are around 30,000 products, and my main variables of interest are the coefficients on the interaction terms between product code and the logarithm of price. Since I cannot create more than 11,000 variables in a single regression, I want to draw sub-samples and store the coefficients for each and every product. Maybe there are more elegant ways of doing that.
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#4

25 Jun 2018, 08:28

So if I understand your goal correctly, this can be done much faster and more simply with:

Code:

capture program drop one_product
program define one_product
    regress lnq lnp
    gen b_lnp = _b[lnp]
    gen se_lnp = _se[lnp]
    exit
end

runby one_product, by(product_code) status

I assume that your intent is to get separate regression coefficients for lnq on lnp for each product_code. That seems consistent with what you describe in #3 and is more or less consistent with the code in #1.

To use this, you will need to install -runby-, written by Robert Picard and me, available from SSC.

Note: The above code is not tested and may contain errors, though I believe it is correct. Note also that you may encounter product codes for which there is only one observation, or where the value of lnq or of lnp is constant across all observations. In the first case the regression cannot be carried out; in the second case the variable lnp will be dropped from the model due to collinearity, and _b[lnp] is not defined. In either case, the above code will generate no output for that product code, and the final report from -runby- will show these by-groups as having produced errors.
Erdem Yilmaz

Join Date: Dec 2017

Posts: 62
#5

26 Jun 2018, 06:20

Dear Clyde,
Thank you very much for your answer. I have actually revised my regression to:

Code:

xtset product_code
xtreg lnq i.product_code#c.lnp, fe vce(cluster product_code)

The idea is to save the variance-covariance matrix for different products and see whether there are correlations. But in any case, with runby it will be much easier.
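A minimal sketch of pulling the variance-covariance matrix out after estimation and converting it to correlations. The auto dataset shipped with Stata and a plain -regress- are stand-ins assumed here for illustration only; after the -xtreg- above the same e(V) and corr() calls should apply, although omitted terms with zero variance may need to be removed first.

```stata
* Stand-in data and model (assumption: not the product data from this thread)
sysuse auto, clear
regress price mpg weight

* e(V) holds the variance-covariance matrix of the estimated coefficients
matrix V = e(V)

* corr() converts a variance matrix into the matching correlation matrix
matrix C = corr(V)
matrix list C
```

The off-diagonal entries of C are the correlations between coefficient estimates, which is the quantity of interest when comparing products.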