how to calculate pseudo-R2 using imputed data in a multinomial logistic regression

Anna Novaresio

Join Date: Jan 2021

Posts: 17
#1

30 Apr 2021, 03:28

Dear all,

I would like to calculate the pseudo-R2 for my multinomial logistic regression on multiply imputed data.
This is my code for imputing data and running the regression(s):
mi set mlong
mi register regular BROWNISH_MULTI GREENISH_MULTI GREEN_MULTI
mi register imputed BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC
mi impute mvn BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC = BROWNISH_MULTI, add(20)

mi estimate: mlogit BROWNISH_MULTI l(1).(BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC), base(0)

I have read in a Statalist post that a possible solution to get the pseudo R2 after mi estimate is the following:
local rhs "armg2 armg3 tbsaburn20 tbsaburn21"
noi mi estimate, or saving(miest, replace): logistic hodc `rhs', vce(cluster site)
qui mi query
local M=r(M)
scalar r2=0
scalar cstat=0
qui mi xeq 1/`M': logistic hodc `rhs'; scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)
scalar r2=r2/`M'
scalar cstat=cstat/`M'
noi di "Pseudo R=squared over imputed data = " r2
noi di "C statistic over imputed data = " cstat

I don't understand exactly how to adapt this code to my needs (e.g. where to plug my variables).
Could you provide me any help, please? Many thanks in advance
Anna
Tags: impute, mlogit, pseudoR2
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#2

30 Apr 2021, 15:05

Anna Novaresio, if you can provide a reproducible example (either using your dataset or one of Stata's datasets), it should be possible to provide code.
Anna Novaresio

Join Date: Jan 2021

Posts: 17
#3

01 May 2021, 10:19

Andrew Musau, thanks for your reply.
I can provide you more info about my dataset, specifying that I am handling longitudinal data and that my variables are the following:
- DEP VARS: BROWNISH_MULTI GREENISH_MULTI GREEN_MULTI, which are categorical and without missing values
- EXPLANATORY VARS: BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC, which are lagged by 1 year and contain missing values that I try to fill in with the mvn imputation procedure.
My regression is a multinomial logistic based on imputed data, from which I would like to derive the pseudo-r2.
Could you give me any advice on how to calculate it in a smooth way? Many thanks
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#4

01 May 2021, 11:07

What I mean in #2 is that you should present a data example using dataex so that one can run your code below:

Code:

mi set mlong
mi register regular BROWNISH_MULTI GREENISH_MULTI GREEN_MULTI
mi register imputed BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC
mi impute mvn BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC = BROWNISH_MULTI, add(20)
mi estimate: mlogit BROWNISH_MULTI l(1).(BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC), base(0)

If for some reason you cannot post your data, then recreate an example where you impute missing values and then run a multinomial logit model. You could do this, e.g., by replacing some existing values with missing in the dataset below, imputing them and then running the estimation using the datasets with imputed values.

Code:

webuse sysdsn1, clear
mlogit insure age male nonwhite i.site

In this way, the focus is on answering your specific question instead of trying to replicate the imputation procedure.
Anna Novaresio

Join Date: Jan 2021

Posts: 17
#5

26 Nov 2021, 14:00

Hello, I have recently worked on this piece of code again, and I have figured out how to adapt the proposed code to calculate the pseudo-R^2 over imputed data with a multinomial logit using my data:

Code:

mi set along
mi register regular A B C
mi register imputed X Y Z
mi impute mvn X Y Z = A, add(20) force
local mvar l(1).(X Y Z)
noi mi est: mlogit A $myvar, base(1)
ereturn list
qui mi query
local M=r(M)
scalar r2=0
display "`M'"
qui mi xeq 1/`M': mlogit A `myvar'; scalar r2=r2+e(r2_p)
scalar r2=r2+e(r2_p)
scalar r2=r2/`M'
noi di "Pseudo R=squared over imputed data = " r2

However, like other Stata users, I have run into the "invalid numlist" error, which I could not fix by following any of the suggestions given by experienced users on this forum (e.g., set trace on, mi set wide, etc.).
I kindly ask whether any of you could provide any further advice to get the value of the pseudo-R^2 when managing a multinomial logit model over imputed data. Many thanks in advance!
Anna
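
A note on the code in this post, offered as a likely explanation rather than a confirmed diagnosis: the local macro is defined as mvar but later referenced as $myvar (a global) and `myvar' (a different local), so both references expand to nothing. In addition, if the lines are executed one at a time from the do-file editor, `M' is empty by the time mi xeq runs, so Stata sees "mi xeq 1/:", which is not a valid numlist. The sketch below uses one macro name throughout. It assumes A, X, Y and Z are the placeholder variables from the post, that the data are already mi set, imputed and declared as panel data so that the lag operator is available, and that the whole block is run in one go.

```stata
* Sketch only: A, X, Y, Z are the placeholder names used in the post above
local myvar l(1).(X Y Z)

* Pooled estimates (Rubin's rules)
mi estimate: mlogit A `myvar', baseoutcome(1)

* Number of imputations, left behind in r(M) by mi query
quietly mi query
local M = r(M)

* Accumulate the pseudo-R-squared from each imputed dataset, then average
scalar r2 = 0
quietly mi xeq 1/`M': mlogit A `myvar', baseoutcome(1); scalar r2 = r2 + e(r2_p)
scalar r2 = r2 / `M'
display "Pseudo R-squared over imputed data = " r2
```

The second, stand-alone "scalar r2=r2+e(r2_p)" line from the post is left out here because it would add the last model's value a second time before dividing by M.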

Felix Bittmann

Join Date: Aug 2018
Posts: 687

26 Nov 2021, 15:36

My suggestion:

Code:

clear all
webuse sysdsn1

*Generate fake missing data*
replace insure = . in 1/20
replace male = . in 50/100
replace site = . in 100/150

*Impute dataset*
mi set flong
mi register imputed insure age male nonwhite site
mi impute chained (pmm, knn(5)) age male nonwhite site, ///
    add(5) rseed(123)
    

*Average statistic of interest*
mi describe
local mtotal = r(M)
local r2 = 0
forvalues i = 1 / `mtotal' {
    mlogit insure age male nonwhite i.site if _mi_m == `i'
    local r2 = `r2' + e(r2_p)
}
di `r2' / `mtotal'

Best wishes



daniel klein

Join Date: Mar 2014

Posts: 3845
#7

26 Nov 2021, 16:00

Originally posted by Anna Novaresio

[...] I am handling longitudinal data

Then this should probably be reflected in your imputation model and in the analysis model as well. It is not clear from what you have shown that this is so.

Felix's suggested code crucially depends on the data being mi set flong (not mlong!). Some people suggest combining transformed individual pseudo-R-squared values. The basic approach is detailed here.
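
To make the idea of combining transformed values concrete, here is a minimal sketch. The choice of transformation is an assumption on my part and may differ from what the linked post describes: it applies Fisher's z to the square root of each pseudo-R-squared, averages on that scale, and transforms back, as has been proposed for R-squared in linear regression. It also assumes the flong data from the webuse sysdsn1 example earlier in this thread are in memory.

```stata
* Assumes flong mi data with _mi_m, as created in the sysdsn1 example above
mi describe
local mtotal = r(M)
local zsum = 0
forvalues i = 1 / `mtotal' {
    quietly mlogit insure age male nonwhite i.site if _mi_m == `i'
    * Fisher's z of the square root of the pseudo-R-squared
    local zsum = `zsum' + atanh(sqrt(e(r2_p)))
}
* Average on the z scale, then back-transform to the R-squared scale
display tanh(`zsum' / `mtotal')^2
```

When the individual values are close to each other, this gives nearly the same number as the simple average.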

Anna Novaresio

Join Date: Jan 2021
Posts: 17

27 Nov 2021, 02:39

Felix Bittmann many thanks for sharing your very useful code, which works fine for my purpose and with my dataset, at least up to the final step:

Code:

di `r2' / `mtotal'

where Stata displays an "invalid syntax" message instead of the final value of the pseudo-R^2.
I could get a value of the pseudo-R^2 by looking at the results of the regressions produced within the loop, but I am not sure that is the correct result.

This is my code:

Code:

use "autogreen_special_NOV21"

*Impute dataset*
mi set flong
mi register regular BROWNISH_MNL GREENISH_MNL GREEN_MNL
mi register imputed BERD_AUTO CAR_SALES  GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC
mi impute chained (pmm, knn(5)) BERD_AUTO CAR_SALES  GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC, add(5) rseed(123)
    

*Average statistic of interest*
mi describe
local mtotal = r(M)
local r2 = 0
forvalues i = 1 / `mtotal' {
    mlogit BROWNISH_MNL BERD_AUTO CAR_SALES  GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC if _mi_m == `i'
    local r2 = `r2' + e(r2_p)
}
di `r2' / `mtotal'

I also wonder whether I could specify the base category inside the loop without breaking it. Many thanks in advance!
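
On the base category: mlogit accepts baseoutcome() (which base() abbreviates) after the comma, and the if qualifier goes before the comma, so the option can be added inside the loop. A minimal sketch, assuming the flong data from the webuse sysdsn1 example earlier in this thread are in memory; the value 2 is just an illustrative category of insure:

```stata
* Assumes flong mi data with _mi_m, as created in the sysdsn1 example
mi describe
local mtotal = r(M)
local r2 = 0
forvalues i = 1 / `mtotal' {
    * The if qualifier comes before the comma, options after it
    quietly mlogit insure age male nonwhite i.site if _mi_m == `i', baseoutcome(2)
    local r2 = `r2' + e(r2_p)
}
display `r2' / `mtotal'
```

Changing the base outcome only reparameterizes the model. The log likelihood, and therefore e(r2_p), should be the same whichever category is chosen, so the averaged pseudo-R-squared does not depend on it.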


Anna Novaresio

Join Date: Jan 2021

Posts: 17
#9

27 Nov 2021, 02:48

daniel klein many thanks for your clarifications

crucially depends on the data being mi set flong (not mlong!).

and the very useful link shared!
Anna Novaresio

Join Date: Jan 2021

Posts: 17
#10

27 Nov 2021, 03:06

Felix Bittmann I am approximating the average pseudo-R² using the values from the individual regressions in the loop, but I am still interested in understanding why I do not get the proper final result displayed using di

Code:

di `r2' / `mtotal'
daniel klein

Join Date: Mar 2014

Posts: 3845
#11

27 Nov 2021, 03:39

Originally posted by Anna Novaresio

still interested in understanding why I do not get the proper final result displayed using di

Probably because you are not running the entire code at once. When you execute the one line

Code:

di `r2' / `mtotal'

in isolation, neither local macro is defined, and Stata sees

Code:

di /

which is invalid syntax.
Anna Novaresio

Join Date: Jan 2021

Posts: 17
#12

27 Nov 2021, 03:52

daniel klein got it, when I do not run that line

in isolation

I get the result. I am used to running the code line by line or piece by piece, but I will keep your advice in mind for similar situations in the future. Thanks a lot! PS my average approximation is in line with the final proper results ^.^
daniel klein

Join Date: Mar 2014

Posts: 3845
#13

27 Nov 2021, 04:06

Originally posted by Anna Novaresio

I am used to running the code line by line or piece by piece,

To elaborate a bit: to Stata, each of those pieces/runs has its own namespace, and local macros, as their name suggests, are only visible within the namespace in which they are defined.
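
A small illustration of that point, with made-up values that are not taken from any model in this thread:

```stata
* Illustrative values only
local r2 = 0.25
local mtotal = 5
display `r2' / `mtotal'
```

Selecting all three lines in the do-file editor and executing them together displays .05. Executing only the last line in a separate run leaves both macros undefined, so the line that Stata actually evaluates is "display /".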

Originally posted by Anna Novaresio

PS my average approximation is in line with the final proper results ^.^

What do you mean by average approximation? The code that Felix suggested gives you the average pseudo-R-squared over all imputed datasets (i.e., it averages the pseudo-R-squared values from the 5 regressions), which is the same principle by which you obtain any point estimate from multiply imputed datasets.