  • how to calculate pseudo-R2 using imputed data in a multinomial logistic regression

    Dear all,

    I would like to calculate the pseudo-R2 for my multinomial logistic regression on multiply imputed data.
    This is my code for imputing data and running the regression(s):
    mi set mlong
    mi register regular BROWNISH_MULTI GREENISH_MULTI GREEN_MULTI
    mi register imputed BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC
    mi impute mvn BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC = BROWNISH_MULTI, add(20)

    mi estimate: mlogit BROWNISH_MULTI l(1).(BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC), base(0)

    I have read in a Statalist post that a possible solution to get the pseudo R2 after mi estimate is the following:
    local rhs "armg2 armg3 tbsaburn20 tbsaburn21"
    noi mi estimate, or saving(miest, replace): logistic hodc `rhs', vce(cluster site)
    qui mi query
    local M=r(M)
    scalar r2=0
    scalar cstat=0
    qui mi xeq 1/`M': logistic hodc `rhs'; scalar r2=r2+e(r2_p); lroc, nog; scalar cstat=cstat+r(area)
    scalar r2=r2/`M'
    scalar cstat=cstat/`M'
    noi di "Pseudo R=squared over imputed data = " r2
    noi di "C statistic over imputed data = " cstat

    I don't understand exactly how to adapt this code to my needs (e.g. where to plug my variables).
    Could you provide me any help, please? Many thanks in advance
    Anna

  • #2
    Anna Novaresio, if you can provide a reproducible example (either using your dataset or one of Stata's datasets), it should be possible to provide code.



    • #3
      Andrew Musau, thanks for your reply.
      I can provide you more info about my dataset, specifying that I am handling longitudinal data and that my variables are the following:
      - DEP VARS: BROWNISH_MULTI GREENISH_MULTI GREEN_MULTI, which are categorical and without missing values
      - EXPLANATORY VARS: BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC, which are lagged by 1 year and contain missing values that I try to fill in with the mvn imputation procedure.
      My regression is a multinomial logistic based on imputed data, from which I would like to derive the pseudo-r2.
      Could you give me any advice on how to calculate it in a smooth way? Many thanks



      • #4
        What I mean in #2 is that you should present a data example using dataex so that one can run your code below:

        Code:
        mi set mlong
        mi register regular BROWNISH_MULTI GREENISH_MULTI GREEN_MULTI
        mi register imputed BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC
        mi impute mvn BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC = BROWNISH_MULTI, add(20)
        
        mi estimate: mlogit BROWNISH_MULTI l(1).(BERD_AUTO CAR_SALES Log_PATENT_STOCK GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS GDP_PC), base(0)
        If for some reason you cannot post your data, then recreate an example where you impute missing values and then run a multinomial logit model. You could do this, e.g., by replacing some existing values with missing in the dataset below, imputing them and then running the estimation using the datasets with imputed values.

        Code:
        webuse sysdsn1, clear
        mlogit insure age male nonwhite i.site
        In this way, the focus is on answering your specific question instead of trying to replicate the imputation procedure.



        • #5
          hello, I have recently worked on this piece of code again, and I have figured out how to run the proposed code to calculate the pseudo-R^2 over imputed data with a multinomial logit using my data:

          Code:
          mi set along
          mi register regular A B C
          mi register imputed X Y Z
          mi impute mvn X Y Z = A, add(20) force
          
          local mvar l(1).(X Y Z)
          noi mi est: mlogit A $myvar, base(1)
          ereturn list
          qui mi query
          local M=r(M)
          scalar r2=0
          display "`M'"
          qui mi xeq 1/`M': mlogit A `myvar'; scalar r2=r2+e(r2_p)
          scalar r2=r2+e(r2_p)
          scalar r2=r2/`M'
          noi di "Pseudo R=squared over imputed data = " r2
          However, like other Stata users, I have run into the "invalid numlist" error, which I could not fix by following any of the suggestions given by various expert users on this forum (e.g., set trace on, mi set wide, etc.).
          I kindly ask whether any of you could provide any further advice to get the value of the pseudo-R^2 when managing a multinomial logit model over imputed data. Many thanks in advance!
          Anna



          • #6
            My suggestion:

            Code:
            clear all
            webuse sysdsn1
            
            *Generate fake missing data*
            replace insure = . in 1/20
            replace male = . in 50/100
            replace site = . in 100/150
            
            *Impute dataset*
            mi set flong
            mi register imputed insure age male nonwhite site
            mi impute chained (pmm, knn(5)) age male nonwhite site, ///
                add(5) rseed(123)
                
            
            *Average statistic of interest*
            mi describe
            local mtotal = r(M)
            local r2 = 0
            forvalues i = 1 / `mtotal' {
                mlogit insure age male nonwhite i.site if _mi_m == `i'
                local r2 = `r2' + e(r2_p)
            }
            di `r2' / `mtotal'
            Best wishes

            Stata 18.0 MP | ORCID | Google Scholar



            • #7
              Originally posted by Anna Novaresio
              [...] I am handling longitudinal data
              Then this should probably be reflected in your imputation model and in the analysis model as well. It is not clear from what you have shown that this is so.

              Felix's suggested code crucially depends on the data being mi set flong (not mlong!). Some people suggest combining transformed individual pseudo-R-squared values. The basic approach is detailed here.
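
              A rough sketch of both points, not the method from the linked page. In flong style every imputation holds all observations, so the condition if _mi_m == `i' selects a complete dataset; in mlong style the imputations m > 0 store only the incomplete observations. The transformation below (Fisher's z applied to the square root of each pseudo-R-squared, averaged, then back-transformed) is an assumption about what "transformed" means here. The model and variables are borrowed from the sysdsn1 example in #6, and the data are assumed to be mi set and imputed already.

              Code:
              * Assumption: data are already mi set and imputed, as in #6
              quietly mi query
              local M = r(M)
              local style "`r(style)'"

              * Convert to flong if needed so that each _mi_m holds all observations
              if "`style'" != "flong" {
                  mi convert flong, clear
              }

              local zsum = 0
              forvalues i = 1/`M' {
                  quietly mlogit insure age male nonwhite i.site if _mi_m == `i'
                  * Fisher's z of sqrt(pseudo-R2): an assumed choice of transformation
                  local zsum = `zsum' + atanh(sqrt(e(r2_p)))
              }
              display "Back-transformed pseudo-R2 = " tanh(`zsum'/`M')^2
              As with the loop in #6, the whole block has to be run in one go so that the local macros are defined when they are used.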



              • #8
                Felix Bittmann, many thanks for sharing your very useful code, which works fine for my purpose and with my dataset, at least up to the final step:
                Code:
                di `r2' / `mtotal'
                when the software displays an "invalid syntax" message, instead of the final value of the pseudo R^2.
                I could get a value of the p-R^2 by looking at the results of the regression produced with the loop, but I am not sure that is the correct result.

                This is my code:
                Code:
                use "autogreen_special_NOV21"
                
                *Impute dataset*
                mi set flong
                mi register regular BROWNISH_MNL GREENISH_MNL GREEN_MNL
                mi register imputed BERD_AUTO CAR_SALES  GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC
                mi impute chained (pmm, knn(5)) BERD_AUTO CAR_SALES  GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC, add(5) rseed(123)
                    
                
                *Average statistic of interest*
                mi describe
                local mtotal = r(M)
                local r2 = 0
                forvalues i = 1 / `mtotal' {
                    mlogit BROWNISH_MNL BERD_AUTO CAR_SALES  GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC if _mi_m == `i'
                    local r2 = `r2' + e(r2_p)
                }
                di `r2' / `mtotal'

                I wonder whether I could also specify the base category in the loop without breaking it. Many thanks in advance!
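
                A possible way to do this, sketched under assumptions: mlogit accepts a baseoutcome() option (the base() used in post #1 is its abbreviation), and it can be added inside the loop. The value 0 is taken from post #1 and is assumed to be a valid category of BROWNISH_MNL; the local macro xvars is introduced here only to shorten the command. Changing the base outcome only reparameterizes the model, so the log likelihood, and with it e(r2_p), should not depend on this choice.

                Code:
                * xvars is a hypothetical helper macro holding the predictors from the code above
                local xvars BERD_AUTO CAR_SALES GREEN_GOV_RD REN_EN_PUB_RD FOSS_FUEL_PUB_RD BEV_SALES EPS Log_PATENT_STOCK GDP_PC

                quietly mi query
                local mtotal = r(M)
                local r2 = 0
                forvalues i = 1/`mtotal' {
                    * baseoutcome(0) is an assumed choice of reference category
                    mlogit BROWNISH_MNL `xvars' if _mi_m == `i', baseoutcome(0)
                    local r2 = `r2' + e(r2_p)
                }
                di `r2' / `mtotal'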



                • #9
                  daniel klein many thanks for your clarifications
                  crucially depends on the data being mi set flong (not mlong!).
                  and the very useful link shared!



                  • #10
                    Felix Bittmann, I am approximating the average pseudo-R2 using the values of the iterated regressions, but I am still interested in understanding why I do not get the proper final result displayed using di
                    Code:
                    di `r2' / `mtotal'



                    • #11
                      Originally posted by Anna Novaresio
                      still interested in understanding why I do not get the proper final result displayed using di
                      Probably because you are not running the entire code at once. When you execute the one line

                      Code:
                      di `r2' / `mtotal'
                      in isolation, neither local macro is defined, and Stata sees

                      Code:
                      di /
                      which is invalid syntax.
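
                      A small illustration with made-up numbers (0.84 and 5 are placeholders, not results from this thread): the first three lines work when selected and run together, while running only the di line on its own reproduces the error. Scalars, unlike local macros, stay in memory for the rest of the session, so they are one option for line-by-line work. The names r2sum and mtot are hypothetical.

                      Code:
                      * Run these three lines together; the locals vanish once the run ends
                      local r2 = 0.84
                      local mtotal = 5
                      di `r2' / `mtotal'

                      * Scalars persist across separate runs within the same session
                      scalar r2sum = 0.84
                      scalar mtot = 5
                      di r2sum / mtot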



                      • #12
                        daniel klein got it, when I do not run that line
                        in isolation
                        I got the result. I am used to running the code line by line or piece by piece, but I will keep your advice in mind for similar situations in the future. Thanks a lot! PS my average approximation is in line with the final proper results ^.^



                        • #13
                          Originally posted by Anna Novaresio
                          I am used to running the code line by line or piece by piece,
                          To elaborate a bit: to Stata, each of those pieces/runs has its own namespace, and local macros, as their name suggests, are only visible within the namespace in which they are defined.


                          Originally posted by Anna Novaresio
                          PS my average approximation is in line with the final proper results ^.^
                          What do you mean by average approximation? The code that Felix suggested gives you the average pseudo-R-squared over all imputed datasets (i.e., it averages the pseudo-R-squared values from the 5 regressions), which is the same principle by which you obtain any point estimate from multiply imputed datasets.

