Multiple regressions in a loop and storing regression coefficients

Hein Willems

Join Date: Mar 2023

Posts: 27
#1

08 Mar 2023, 07:36

Hi all,

For my master's thesis I am working with dynamic panel data with small T (13) and large N (20,000). I am estimating the long-term price elasticity of energy consumption using the Arellano-Bond estimator in xtabond2.
What I want to research is whether there is heteroscedasticity with respect to these elasticities. To do this, I want to randomly take a sample from my dataset, run the regression, calculate the long-term elasticity, store it somewhere(?), and repeat this several (100+) times.
I then want to further inspect the datasets that resulted in the maximum and minimum elasticity. I tried the following code:

Code:

forval i = 1/3 {
    set seed `i'
    tempfile holding
    save `holding'
    keep id
    duplicates drop
    sample 200, count
    merge 1:m id using `holding', assert (match using) keep(match) nogenerate
    sort id year
    xtabond2 consumption consumptionL1 consumptionL2 gas gasL1 gasL2 gdp gdpL1 gdpL2 heatdays, gmm(consumptionL1) iv(gas gasL1 gasL2 gdp gdpL1 gdpL2 heatdays) nolevel robust small
    matrix elast = (_b[gas] + _b[gasL1] + _b[gasL2])/(1-_b[consumptionL1] - _b[consumptionL2])
}

Indeed, three regressions are run, but they seem to be identical, so I guess something must be wrong with how I set the seed?
Also, I cannot find how to inspect the elasticities that I stored as a matrix. Should I store the results differently?

Any help is welcome!

Kind regards,

Hein Willems

Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

08 Mar 2023, 10:08

Nothing is wrong with the way you set the seed, but after you run the first regression, you don't bring the original data back in. Instead you "resample" the first sample: the second iteration saves the 200-id sample still in memory as the new `holding' file, and drawing 200 ids from those same 200 ids returns all of them, so every regression is run on identical data. What you need to do is revise the management of the `holding' file. Moreover, there is no need to reset the random number seed on each iteration of the loop. You will do just as well setting it once at the top of the code. The random number generator will just keep going through its sequence on subsequent iterations: it will not reset itself and repeat what it did the first time.

Code:

tempfile holding
save `holding'
set seed 1234

forval i = 1/3 {
    use `holding', clear
    keep id
    duplicates drop
    sample 200, count
    merge 1:m id using `holding', assert (match using) keep(match) nogenerate
    sort id
    xtabond2 consumption consumptionL1 consumptionL2 gas gasL1 gasL2 gdp gdpL1 gdpL2 heatdays, gmm(consumptionL1) iv(gas gasL1 gasL2 gdp gdpL1 gdpL2 heatdays) nolevel robust small
    matrix elast = (_b[gas] + _b[gasL1] + _b[gasL2])/(1-_b[consumptionL1] - _b[consumptionL2])
}
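
On the second question: the loop above overwrites elast on every pass, so only the last value remains in memory. What follows is a minimal sketch of one way to keep every value and look up the extremes. It is not the model from this thread: the -grunfeld- data, -xtreg-, and the ratio computed from the coefficients are stand-ins chosen only so the code runs without -xtabond2- or the original data, and the names elast1 and iteration are purely illustrative.

```stata
* Sketch only: Grunfeld data and -xtreg- stand in for the thread's data and -xtabond2-.
clear all
webuse grunfeld
tempfile holding
save `holding'
set seed 1234

forval i = 1/10 {
    use `holding', clear
    keep company
    duplicates drop
    sample 5, count
    merge 1:m company using `holding', assert(match using) keep(match) nogenerate
    xtset company year
    quietly xtreg mvalue kstock L1.invest, fe
    * Append this iteration's statistic as a new row instead of overwriting it
    * (the ratio below is an arbitrary placeholder, not a real elasticity)
    matrix elast = nullmat(elast) \ (_b[kstock] / (1 - _b[L1.invest]))
}

* Inspect the stored values directly: one row per iteration
matrix list elast

* Or convert them to a variable and look up the largest and smallest values
clear
svmat double elast, names(elast)
generate iteration = _n
summarize elast1
list iteration elast1 if elast1 == r(max) | elast1 == r(min)
```

The row number of the matrix is the iteration number, so the final -list- shows which passes through the loop produced the extreme values.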
Hein Willems

Join Date: Mar 2023

Posts: 27
#3

09 Mar 2023, 06:50

Clyde Schechter Thank you so much, it works perfectly.
Is there also a way to see which sample is used in a given iteration? For example, if the results show that the 5th iteration gave the largest elasticity, I would like to know which IDs belong to that sample. What would be a way to do this?

Thanks in advance!

Hein Willems

Clyde Schechter

Join Date: Apr 2014
Posts: 30065
#4

09 Mar 2023, 11:11

You can build up a matrix containing the ids from each sample as you iterate through the loop. The code below ends up by listing that matrix, but you can also use the -svmat- command to turn it into a data set. See -help svmat- if you want to do that.

Code:

tempfile holding
save `holding'
set seed 1234

forval i = 1/3 {
    use `holding', clear
    keep id
    duplicates drop
    sample 200, count
    sort id
    mkmat id, matrix(current_sample)
    matname current_sample sample`i', explicit columns(1)
    matrix all_samples = nullmat(all_samples), current_sample
    merge 1:m id using `holding', assert (match using) keep(match) nogenerate
    sort id
    xtabond2 consumption consumptionL1 consumptionL2 gas gasL1 gasL2 gdp gdpL1 gdpL2 heatdays, gmm(consumptionL1) iv(gas gasL1 gasL2 gdp gdpL1 gdpL2 heatdays) nolevel robust small
    matrix elast = (_b[gas] + _b[gasL1] + _b[gasL2])/(1-_b[consumptionL1] - _b[consumptionL2])
}

matrix list all_samples
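
To follow up on the -svmat- remark, here is a rough sketch of turning the matrix of sampled ids into a data set and pulling back the full panels for one iteration. It assumes the -grunfeld- data, with company playing the role of id, and it uses -matrix colnames- rather than -matname- to label the columns. Iteration 2 is picked arbitrarily.

```stata
* Sketch only: Grunfeld companies stand in for the ids in this thread.
clear all
webuse grunfeld
tempfile holding
save `holding'
set seed 1234

forval i = 1/3 {
    use `holding', clear
    keep company
    duplicates drop
    sample 5, count
    sort company
    mkmat company, matrix(current_sample)
    * Label the column so -svmat- can reuse the label as a variable name
    matrix colnames current_sample = sample`i'
    matrix all_samples = nullmat(all_samples), current_sample
}

* Turn the matrix into a data set: one variable per iteration
clear
svmat all_samples, names(col)
list

* Recover the full panels that were used in iteration 2
rename sample2 company
keep company
merge 1:m company using `holding', keep(match) nogenerate
sort company year
```

Because the ids are sampled without replacement, each id appears once per column, which is what makes the final 1:m merge valid.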


Hein Willems

Join Date: Mar 2023

Posts: 27
#5

25 Apr 2023, 06:23

Clyde Schechter Thanks for your help some time ago.

I am now using the above code again, but for bootstrapping standard errors I now want to sample with replacement.
I thought this would be an easy adaptation, but I don't seem to be able to figure it out. Do you know how to resolve this?

Thanks in advance,

Hein Willems

Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#6

25 Apr 2023, 13:32

Please show the exact code you are trying and the exact Results that Stata is giving you. Also, please show example data, using the -dataex- command.

Hein Willems

Join Date: Mar 2023
Posts: 27
#7

26 Apr 2023, 00:55

Dear Clyde Schechter ,

I was trying to adapt the code above by replacing sample with bsample (as this is said to sample with replacement). However, when I run it I get the error that the option count is not possible. Removing count then gives a new error.
This is my code:

Code:

clear all


cd "C:\Users\wille\OneDrive\Afstuderen\Stata\data_enexis_1500"
insheet using "quantile14.csv", comma clear

format year %ty
encode postcode, generate(id)
egen newid = group(id)
global id id
global year year
sort $id $year
xtset $id $year

matrix elast = (.)

tempfile holding
save `holding'
set seed 1234

forval i = 1/3 {
    use `holding', clear
    keep id
    duplicates drop
    bsample 200, count
    merge 1:m id using `holding', assert (match using) keep(match) nogenerate
    sort id
    xtdpdgmm L(0/2).consumption L(0/2).gas L(0/2).gdp heatdays, model(diff) gmm(    consumption, lag(1 .)) gmm(gas, lag(1 .)) gmm(gdp, lag(1 .)) gmm(heatdays, lag(1 .)) two vce(r) overid
    matrix elast = (elast \ (_b[gas] + _b[L1.gas] + _b[L2.gas])/(1-_b[L1.consumption] - _b[L2.consumption]))
}

I also tried to show my data using dataex, but the input statement exceeds the linesize limit.

Thanks in advance,

Hein Willems


Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#8

26 Apr 2023, 09:27

The problem is that you are not using -bsample- correctly. You need to read the help file and manual section on -bsample- before proceeding. It is a more complicated command than -sample-, and its syntax and semantics are different. I don't know enough about your project, and know nothing at all about -xtdpdgmm- to fully correct your code. But it is a fair bet that you are dealing here with panel data and that you will want to sample whole panels, not randomly selected observations in panels, especially since randomly selected observations in panels will result in the lagged values you mention being mostly missing!

Here is an example, using the online -grunfeld- data set, of bootstrapping a fixed-effects regression by sampling 5 panels (companies) with replacement and creating a matrix containing an expression calculated from the regression coefficients:

Code:

clear*
webuse grunfeld
tempfile holding
save `holding'
set seed 1234

forval i = 1/3 {
    use `holding', clear
    bsample 5, cluster(company) idcluster(new_id)
    xtset new_id year
    xtreg mvalue kstock L1.invest, fe
    matrix elast = nullmat(elast) \ (_b[kstock] * _b[L1.invest])
}

matrix list elast

Let me re-emphasize: the example I have shown here may not be appropriate for what you are trying to do. It is shown just to illustrate some of the features of -bsample- that you probably need to use. You need to read the support materials on -bsample- to use it properly. And what constitutes proper use will depend on details of your problem that have not been discussed in this thread.
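
As an illustration of where such a loop can lead, the sketch below turns the stored replicates into a bootstrap standard error, which is simply the standard deviation of the statistic across replications. The assumptions are the same as in the example above (the -grunfeld- data, -xtreg-, an arbitrary function of the coefficients), plus 50 replications and resampling as many panels as the data contain. None of this is tuned to the -xtdpdgmm- model discussed in this thread.

```stata
* Sketch only: same stand-in data and model as the example above.
clear all
webuse grunfeld
tempfile holding
save `holding'
set seed 1234

forval i = 1/50 {
    use `holding', clear
    * Resample whole panels with replacement; with no size given,
    * -bsample- draws as many clusters as the data contain
    bsample, cluster(company) idcluster(new_id)
    xtset new_id year
    quietly xtreg mvalue kstock L1.invest, fe
    matrix stat = nullmat(stat) \ (_b[kstock] * _b[L1.invest])
}

* One observation per replication; the SD across replications is the bootstrap SE
clear
svmat double stat, names(rep)
summarize rep1
display "bootstrap SE: " r(sd)
```

The idcluster() option matters here: a panel drawn twice gets two different values of new_id, so -xtset- still sees unique panel-year pairs.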