How to store p-value and CI in a new variable in prais-winsten regression?

Andressa Freire

Join Date: Aug 2022

Posts: 33
#1

How to store p-value and CI in a new variable in prais-winsten regression?

24 Aug 2022, 19:20

Hi!

I'm new to Stata! I have a database model as shown below (hypothetical data). When running Prais-winsten regression, I would like to store the p-values and confidence intervals in a new variable.

The command “ statsby, by(id): prais log10_prevame year ”, stores only the beta values in a new variable. How do I store the p-value and CI as well?

I also tested the regsave command (below) but it only stores the values of the last prais-wisten regression and not the set of regressions by id (my database is approximately 2000 id)

by(id): prais log10_prevame year
regsave, tstat pval ci

I thank the help of all you.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

24 Aug 2022, 19:41

Something like this:

Code:

levelsof id, local(ids) gen b = . gen ll = . gen ul = . gen pvalue = . foreach i of local ids { capture prais log10_prevname year if id == `i' if c(rc) == 0 { matrix M = r(table) foreach x in b ll ul pvalue { quietly replace `x' = M["`x'", "year"] if id == `i' } } else if inlist(c(rc), 2000, 2001) { continue } else { display as error "Unexpected error: id == `i'" exit(c(rc)) } }

Warning: Because it is not possible to import data from a screenshot into Stata, this code is untested and may contain typos or other errors. In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Added: The -capture- and -if ... else if ... else- structure are there in anticipation of the possibility that for some id's there will be insufficiently many observations with non-missing values of log10_prevname and year to carry out the -prais- command. Because large data sets often have gaps of this nature, such errors are anticipated and to avoid Stata's default response of terminating execution, this code allows the completion of the task for all id's having sufficient associated data. If, however, -prais- encounters other problems and cannot successfully execute, this is an unanticipated and abnormal condition, and the code will terminate execution and display a corresponding error message.

Last edited by Clyde Schechter; 24 Aug 2022, 19:46.
1 like
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#3

24 Aug 2022, 23:32

Another option to what Clyde proposes is to first retrieve the betas and standard errors with -statsby-, e.g.,

Code:

statsby _b _se, by(rep): reg price mpg headroom

and then to construct manually the confidence intervals from the information. Confidence interval for the coefficient on mpg is just [ _b[mpg] - 1.96*_se[mpg] , _b[mpg] + 1.96*_se[mpg]].

The pvalue for the coefficient on mpg is 2*normal(_b[mpg]/_se[mpg])
1 like
Comment
Andressa Freire

Join Date: Aug 2022

Posts: 33
#4

25 Aug 2022, 08:36

Hi, thank you very much Clyde e Joro!

Joro, unfortunately the values don't match the regression output.

Clyde, here's an example using the dataex command. I tried the code you provided and at the end I get the message "matrix operators that return matrices not allowed in this context" I think I'm missing some simple detail (Stata is new to me, but I'm increasingly surprised by the possibilities)

I thank the help of all you.

copy starting from the next line ------ ----------------

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long codigo_ibge int ano float logprev_ame_ 110002 2015 1.69897 110002 2016 2 110002 2017 1.7781513 110002 2018 1.8037053 110002 2019 2 110012 2015 2 110012 2016 1.7781513 110012 2017 2 110012 2018 2 110012 2019 1.30103 110029 2015 1.5126595 110029 2016 1.6651118 110029 2017 1.5948814 110029 2018 1.686381 110029 2019 1.533344 110032 2015 1.363178 110032 2016 .7695511 110032 2017 1.4259688 110032 2018 1.498405 110032 2019 1.39794 end

copy up to and including the previous line - ----------------
Comment

Joro Kolev

Join Date: Aug 2018
Posts: 3050

25 Aug 2022, 08:58

The values do not exactly match the regression output, because whoever programmed -prais- used finite sample t-statistics, which are not justified in this context because -prais- is a two step GLS, and therefore the procedure has only asymptotic justification.

Here is how the values can be made to match the regression output:

Code:

. webuse idle

. tsset t

Time variable: t, 1 to 30
        Delta: 1 unit

. prais usr idle, nolog


Prais–Winsten AR(1) regression with iterated estimates

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(1, 28)        =      7.12
       Model |  43.0076941         1  43.0076941   Prob > F        =    0.0125
    Residual |  169.165739        28  6.04163354   R-squared       =    0.2027
-------------+----------------------------------   Adj R-squared   =    0.1742
       Total |  212.173433        29  7.31632528   Root MSE        =     2.458

------------------------------------------------------------------------------
         usr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        idle |  -.1356522   .0472195    -2.87   0.008    -.2323769   -.0389275
       _cons |   15.20415   4.160391     3.65   0.001     6.681978    23.72633
-------------+----------------------------------------------------------------
         rho |   .5535476
------------------------------------------------------------------------------
Durbin–Watson statistic (original)    = 1.295766
Durbin–Watson statistic (transformed) = 1.476004

. dis "lower bound CI = " _b[idle] - invttail(28, .025)*_se[idle]
lower bound CI = -.23237687

. dis "higher bound CI = " _b[idle] + invttail(28, .025)*_se[idle]
higher bound CI = -.03892751

. dis "p-value = " 2*ttail(28, abs(_b[idle]/_se[idle]))
p-value = .00767544

Comment

Andressa Freire

Join Date: Aug 2022

Posts: 33
#6

25 Aug 2022, 11:42

Thanks for the explanations, Joro! The values match

Now, in a large database like the one I have, how can I generate the p-value variable and CI for a large number of regressions?

With the command "statsby _b _se, by( codigo_ibge ): prais logprev_ame_ ano", I can calculate p-value and CI for all "codigo_ibge", but not correctly because I don't have "df".

Last edited by Andressa Freire; 25 Aug 2022, 11:54.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30103

25 Aug 2022, 11:57

Re #4: I cannot reproduce your problem with your example data. The code, edited to use the actual variable names in your data set, runs with no error messages and produces plausible looking output:

Code:

. * Example generated by -dataex-. To install: ssc install    dataex
. clear

. input long codigo_ibge int ano float logprev_ame_

      codigo_ibge       ano  logprev~_
  1. 110002 2015   1.69897
  2. 110002 2016         2
  3. 110002 2017 1.7781513
  4. 110002 2018 1.8037053
  5. 110002 2019         2
  6. 110012 2015         2
  7. 110012 2016 1.7781513
  8. 110012 2017         2
  9. 110012 2018         2
 10. 110012 2019   1.30103
 11. 110029 2015 1.5126595
 12. 110029 2016 1.6651118
 13. 110029 2017 1.5948814
 14. 110029 2018  1.686381
 15. 110029 2019  1.533344
 16. 110032 2015  1.363178
 17. 110032 2016  .7695511
 18. 110032 2017 1.4259688
 19. 110032 2018  1.498405
 20. 110032 2019   1.39794
 21. end

.
. xtset codigo_ibge ano

Panel variable: codigo_ibge (strongly balanced)
 Time variable: ano, 2015 to 2019
         Delta: 1 unit

.
. levelsof codigo_ibge, local(ids)
110002 110012 110029 110032

. gen b = .
(20 missing values generated)

. gen ll = .
(20 missing values generated)

. gen ul = .
(20 missing values generated)

. gen pvalue = .
(20 missing values generated)

.
. foreach i of local ids {
  2.     capture prais logprev_ame_ ano if codigo_ibge == `i'
  3.     if c(rc) == 0 {
  4.         matrix M = r(table)
  5.         foreach x in b ll ul pvalue {
  6.             quietly replace `x' = M["`x'", "ano"] if codigo_ibge == `i'
  7.         }
  8.     }
  9.     else if inlist(c(rc), 2000, 2001) {
 10.         continue
 11.     }
 12.     else {
 13.         display as error "Unexpected error: codigo_ibge == `i'"
 14.         exit(c(rc))
 15.     }
 16. }

.
. list, noobs clean

    codigo~e    ano   logpre~_           b          ll         ul     pvalue  
      110002   2015    1.69897    .0158911   -.0685422   .1003244   .5914078  
      110002   2016          2    .0158911   -.0685422   .1003244   .5914078  
      110002   2017   1.778151    .0158911   -.0685422   .1003244   .5914078  
      110002   2018   1.803705    .0158911   -.0685422   .1003244   .5914078  
      110002   2019          2    .0158911   -.0685422   .1003244   .5914078  
      110012   2015          2   -.0595957   -.3509707   .2317792   .4716642  
      110012   2016   1.778151   -.0595957   -.3509707   .2317792   .4716642  
      110012   2017          2   -.0595957   -.3509707   .2317792   .4716642  
      110012   2018          2   -.0595957   -.3509707   .2317792   .4716642  
      110012   2019    1.30103   -.0595957   -.3509707   .2317792   .4716642  
      110029   2015    1.51266    .0072499   -.0277862    .042286   .5572233  
      110029   2016   1.665112    .0072499   -.0277862    .042286   .5572233  
      110029   2017   1.594881    .0072499   -.0277862    .042286   .5572233  
      110029   2018   1.686381    .0072499   -.0277862    .042286   .5572233  
      110029   2019   1.533344    .0072499   -.0277862    .042286   .5572233  
      110032   2015   1.363178    .1282266   -.0481095   .3045626   .1036335  
      110032   2016   .7695511    .1282266   -.0481095   .3045626   .1036335  
      110032   2017   1.425969    .1282266   -.0481095   .3045626   .1036335  
      110032   2018   1.498405    .1282266   -.0481095   .3045626   .1036335  
      110032   2019    1.39794    .1282266   -.0481095   .3045626   .1036335

It is easy for me to imagine that you neglected to change some mention of the variable names shown in the original code, but that should not produce the error message you got. I really don't know what to say. Can you post back with example data that actually produces the error you are encountering?

Comment

Andressa Freire

Join Date: Aug 2022

Posts: 33
#8

25 Aug 2022, 14:09

Thank you so much, Clyde!

I applied the code exactly and at the end this message appears in red, so that the generated variables (b, ll, ul and pvalue) are still missing. The only difference is that I'm doing it in the full database.
My version of stata is 13.1. I don't know if it could be related. (p.s: sorry, I couldn't send using dataex). I send the print below
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#9

25 Aug 2022, 15:12

Version 13.1 is pretty old by now. Much has changed since then, and it is possible that some of the syntax I am using here was introduced after that. I no longer have version 13.1 on any of my computers, so, not knowing what has changed since then, I can't really fix the code to run on the older version.

Here's what I suggest you do. Immediately after the -capture prais...- command, put in two new lines of code:

Code:

set tracedepth 1 set trace on

That way, the commands in the loop will themselves print out and we can see which of them is triggering that error message. Seeing that, I'll at least know which command is the source of the problem, and I'll see if I can dredge up from memory what the syntax for that used to be.

By the way, the Forum FAQ does specifically say that if you are not using the current version of Stata when you post, you should state what version you are using. In truth, I don't think that would have made a difference here: I probably would not have realized that I was using syntax beyond what was available in version 13. But still.
1 like
Comment
Andressa Freire

Join Date: Aug 2022

Posts: 33
#10

25 Aug 2022, 16:33

Thanks for the explanations, Clyde!
I followed your lead and the error appears in the matrix. As soon as possible I will update Stata, but I appreciate the help to solve my problem at this time.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30103

#11

25 Aug 2022, 17:48

Try this (changes from earlier code in bold face):

Code:

xtset codigo_ibge ano

levelsof codigo_ibge, local(ids)

gen b = .
gen ll = .
gen ul = .
gen pvalue = .
foreach i of local ids {
    capture prais logprev_ame_ ano if codigo_ibge == `i'
    if c(rc) == 0 {
        matrix M = r(table)
        local c: colnumb M "ano"
        foreach x in b ll ul pvalue {
            local r: rownumb M "`x'"
            quietly replace `x' = M[`r', `c'] if codigo_ibge == `i'
        }
    }
    else if inlist(c(rc), 2000, 2001) {
        continue
    }
    else {
        display as error "Unexpected error: codigo_ibge == `i'"
        exit(c(rc))
    }
}

If that doesn't eliminate that error message, I'm kind of stuck and hope somebody else who still uses 13.1 or remembers it better than I do will jump in. If it does eliminate that message but leads to a different one, do the -set tracedepth 1- -set trace on- thing again and post back with what happens.

Comment

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#12

25 Aug 2022, 22:31

The first thing that OP should check is whether -r(table)- exists at all in Stata 13. The object -r(table)- was relatively recently and quietly introduced, and Stata Corp documented it in Stata 17 manual only after I and William Lisowski asked for it to be documented.
1 like
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#13

25 Aug 2022, 22:39

This is the thread where I am raging about the quiet introduction of -r(table)-. https://www.statalist.org/forums/for...-it-documented

Somebody there says that -r(table)- was introduced in Stata 12. So then this should not be the problem.
1 like
Comment
Andressa Freire

Join Date: Aug 2022

Posts: 33
#14

26 Aug 2022, 06:33

Thank you so much, Clyde and Joro!

Clyde, I was able to run my analyzes on Stata 17 successfully. The code provided last still didn't run on Stata 13, but I appreciate you helping me come up with a solution to follow up on my analysis.

Joro, I'm going to read about r(table) in Stata 13 through the post you sent. Thank you for presenting more possibilities to do the analyses.
Comment

Announcement