
  • Applying an arcsinh transform with an mlogit model


    Hello,

    I was wondering if anyone might be willing to provide some guidance on a question I have regarding the application of the arcsinh transformation.

    Specifically, I am using a multinomial logistic regression model (mlogit), as my outcome variable is categorical. One of my independent variables, total household wealth, has a very high variance and includes negative values. I understand that the arcsinh transformation is often recommended as an alternative to log-transforming such variables, particularly when zero or negative values are present. However, I am unsure whether it is appropriate to use the arcsinh transformation in the context of a nonlinear model like mlogit.

    Below, I’ve included the -dataex- output for the variables I'm using, including totalwealth_wave1 and the outcome variable _traj_Group.

    To provide some additional context, I am using data from the Health and Retirement Study (HRS). I first estimated a group-based trajectory model using six waves of data (2012–2022) on the variables food_insecurity_wave1 through food_insecurity_wave6. Participants were assigned to one of three trajectory groups. I then used a multinomial logistic regression model to examine associations between trajectory group membership and several control variables: sex, education, veteran status, mother’s education, marital status, chronic disease, mental health, household income, and household wealth.

    As noted, the issue is that the household wealth variable (totalwealth_wave1) has a very large standard deviation (shown below). I am considering applying the arcsinh transformation but wanted to confirm whether this approach is appropriate for use with the mlogit command.
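
    For reference, the command I have in mind is along these lines. This is only a sketch of what I am considering (asinh_wealth is just a placeholder name; the covariate list matches the model described above):

    Code:
    * sketch only: asinh-transformed wealth variable
    gen double asinh_wealth = asinh(totalwealth_wave1)

    * then use the transformed variable in place of the raw one
    svy: mlogit _traj_Group inc_d i.race i.male i.education i.veteran i.mothered i.marital_status_wave1 c.chronicdisease_wave1 i.MentalHealth_wave1 c.asinh_wealth, rrr baseoutcome(1)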

    Any advice or insights you could offer would be greatly appreciated.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(_traj_Group inc_d race) byte(male education veteran) float(mothered marital_status_wave1 chronicdisease_wave1 MentalHealth_wave1 totalwealth_wave1) double HouseholdIncome_wave1
    1 0 1 0 5 0 1 4 2 1    350000              11808
    1 0 1 0 3 0 0 4 0 0   1646599  354830.1554667724
    1 0 1 0 4 0 1 2 0 0    850000              18724
    1 0 1 1 5 1 1 1 2 0   4440000             129660
    1 0 1 0 5 0 0 1 0 0   4440000             129660
    1 0 4 0 5 0 1 4 2 0 1029774.8             145289
    1 0 3 0 1 0 0 4 2 0         0               4164
    1 0 1 1 5 1 0 4 1 0   2460000              54000
    1 0 3 0 1 0 0 2 2 0         0              11376
    3 0 2 0 3 0 0 1 1 0         0   18580.1933652527
    1 0 1 0 4 0 1 4 1 0     15000  98462.84667842393
    1 0 1 0 4 0 1 4 0 0   3896000 46526.875164932644
    1 0 1 1 5 1 1 1 0 0   1390000              88200
    1 0 1 0 1 0 1 4 1 0    690000              21588
    1 0 4 0 5 0 0 1 2 0    194000              80400
    1 0 4 1 5 1 0 1 2 1    194000              80400
    1 0 1 0 3 0 0 1 0 0    544000              65780
    1 0 1 0 5 0 1 1 0 0    245000              69720
    1 0 1 1 5 1 1 1 1 0    686030              18190
    1 0 1 0 3 0 1 1 0 0    686030              18190
    1 0 1 1 5 0 0 1 3 0    411500              92556
    1 0 1 0 5 0 1 1 0 1    411500              92556
    1 0 2 0 1 0 0 4 2 0    190000  31443.22356979586
    2 0 2 0 5 0 1 4 2 0     -2955              25506
    1 0 2 0 3 0 0 4 0 0      4400              66770
    1 0 2 1 3 1 1 1 2 1    320000              28960
    1 0 2 0 3 0 1 1 0 0    320000              28960
    1 0 3 0 3 0 1 1 0 0   1252000              39265
    1 0 4 0 1 0 0 1 0 0    304000              62724
    1 0 4 1 5 0 0 1 0 0    304000              62724
    1 0 3 1 4 1 1 1 0 1    407000              19140
    1 0 1 0 4 0 1 1 0 0    407000              19140
    3 0 2 0 5 0 0 4 3 0     -5950              10140
    3 0 4 0 4 0 0 4 0 0    -80500              35600
    1 1 1 1 1 0 1 1 1 0    557000              28784
    1 0 1 0 5 0 1 1 1 0    557000              28784
    1 0 1 1 3 1 0 1 2 0   1052000              47926
    1 0 2 0 4 0 0 1 2 0    194000              30000
    1 0 1 1 5 1 1 1 2 0   1976291 179523.44600249556
    1 0 3 0 3 0 0 1 1 0    103800   27375.3422452243
    1 0 3 1 2 1 0 1 1 0    103800   27375.3422452243
    1 0 2 0 1 0 0 4 2 0         0              10008
    1 0 2 0 3 0 0 5 1 0      8000              11760
    1 0 1 1 4 1 0 1 3 0     22050              51824
    1 0 1 1 3 1 1 3 3 1   1080350              77840
    1 0 1 0 1 0 1 3 1 0   1080350              77840
    1 0 1 0 5 0 1 1 1 0    491000              60400
    1 0 1 0 3 0 1 4 2 0    137740              35033
    1 0 1 1 4 0 0 1 1 0    103500              27780
    1 0 1 0 3 0 1 1 1 0    103500              27780
    1 0 1 0 2 0 0 2 0 0    852000              14076
    1 1 2 0 1 0 0 5 1 0         0              22460
    1 0 2 0 4 0 1 4 2 0     50000                  0
    1 0 1 0 1 0 0 2 2 0      6000              17353
    1 0 1 0 3 0 1 4 1 0  811281.9              15240
    1 0 1 1 5 1 1 3 2 0   1748163              85054
    1 0 1 1 5 0 0 1 1 1    762500              27200
    1 0 3 0 1 0 0 1 2 0    153700              30580
    1 0 1 0 4 0 0 2 0 0    163000              18264
    1 0 2 0 3 0 0 4 3 0       800              15492
    1 1 2 1 3 0 1 2 0 0    103135              16048
    1 0 2 0 4 0 0 4 2 0      3150              17724
    1 0 2 1 4 1 0 1 2 0    106000             109240
    1 1 2 0 4 0 0 1 2 0    106000             109240
    1 0 1 1 1 1 0 1 3 0    970000              73264
    1 1 1 1 4 0 1 1 3 0    353000              76125
    1 0 1 0 3 0 1 1 0 0    353000              76125
    1 0 1 0 4 0 1 4 2 1      3493              14460
    1 0 1 0 3 0 0 4 4 0    245000              40800
    1 0 1 0 1 0 1 4 1 0         0              15912
    1 0 1 1 5 0 1 1 1 0    771500              58564
    1 0 1 0 1 0 0 1 3 0    151500              89200
    1 0 1 1 3 0 1 1 0 0    100000              44452
    1 0 1 0 3 0 0 1 2 0    100000              44452
    1 0 1 0 5 0 1 4 1 0     70000              34600
    1 0 1 0 5 0 1 1 1 0   3055575              98196
    1 0 1 0 3 0 0 1 1 0     17500              23776
    1 0 2 0 3 0 1 4 1 0     54000              42615
    1 0 2 0 1 0 0 3 3 0    -16000              21240
    1 0 1 0 5 0 0 1 1 0    812000              90148
    2 0 2 0 4 0 1 4 2 0    -16000  29436.09564515772
    1 0 2 0 4 0 0 2 1 0    113500              41258
    1 0 2 0 2 0 0 4 1 0     80000              18140
    1 0 2 0 1 0 0 5 3 0         0              10416
    1 0 2 0 1 0 0 1 1 0     69000              31172
    1 1 4 1 2 0 1 1 0 0     69000              31172
    1 0 1 1 2 1 0 1 3 0     87000              41846
    1 0 1 0 3 0 1 1 0 0     87000              41846
    1 1 1 1 1 0 0 1 3 0     47800              24168
    1 0 1 0 3 0 0 1 1 0     47800              24168
    1 0 1 1 3 0 0 1 2 0    162000              75404
    1 0 1 0 3 0 0 1 0 0    162000              75404
    1 0 1 1 4 1 1 1 2 0    324000              33660
    1 0 1 0 3 0 0 1 2 0    324000              33660
    1 0 1 1 3 1 1 1 2 0    247200              43860
    1 0 1 0 3 0 1 1 2 0    247200              43860
    1 0 1 0 3 0 1 1 2 0   1851000              55524
    1 0 1 1 4 1 1 1 3 0   1851000              55524
    1 1 1 1 5 0 1 1 3 0    381000              86800
    1 0 1 0 5 0 1 1 1 1    381000              86800
    end
    label values inc_d inc_d
    label def inc_d 0 "No", modify
    label def inc_d 1 "Yes", modify
    label values race race
    label def race 1 "White", modify
    label def race 2 "Black", modify
    label def race 3 "Hispanic", modify
    label def race 4 "Other", modify
    label values male male
    label def male 0 "Female", modify
    label def male 1 "Male", modify
    label values education EDUC
    label def EDUC 1 "1.lt high-school", modify
    label def EDUC 2 "2.ged", modify
    label def EDUC 3 "3.high-school graduate", modify
    label def EDUC 4 "4.some college", modify
    label def EDUC 5 "5.college and above", modify
    label values veteran veteran
    label def veteran 0 "No", modify
    label def veteran 1 "Yes", modify
    label values mothered mothered
    label def mothered 0 "Less than High School", modify
    label def mothered 1 "High School or Higher", modify
    label values marital_status_wave1 married
    label def married 1 "Married", modify
    label def married 2 "Absent sopuse/Seperated/Divorced", modify
    label def married 3 "Partnered", modify
    label def married 4 "Widowed", modify
    label def married 5 "Never Married", modify
    label values MentalHealth_wave1 MH
    label def MH 0 "No", modify
    label def MH 1 "Had Mental Health Problems", modify

    Code:
    
     tabstat totalwealth_wave1, statistics(count mean min max p50 sd ) long 
    
        Variable |         N      Mean       Min       Max       p50        SD
    -------------+------------------------------------------------------------
    totalwealt~1 |      8299  479734.3  -1495000  2.31e+07    197000  911117.1
    --------------------------------------------------------------------------


  • #2
    I conjecture that the issue of model specification (-mlogit- or not) is somewhat of a "red herring". There are other ways of summarising differences in wealth than using a continuous variable or some transformation of it (e.g., arcsinh): for example, you might convert the wealth variable into categories and use those. Moreover, even if you used the arcsinh transformation, how would you interpret your estimation results? How would you tell a reader how differences in wealth are associated with differences in your outcome(s)? (For a salutary reminder in the arcsinh context, see e.g. https://onlinelibrary.wiley.com/doi/...111/obes.12325.) In contrast, it would be relatively straightforward to do this sort of exercise with a categorical wealth variable via -margins- or related post-estimation commands.
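
    To make the categorical suggestion concrete, here is an illustrative sketch; the number of groups, the cut method, and the name wealth_q are arbitrary placeholders, not recommendations:

    Code:
    * illustrative sketch only: wealth quintiles as a categorical predictor
    xtile wealth_q = totalwealth_wave1, nq(5)

    * refit the model with the categorical wealth variable (add your other covariates)
    mlogit _traj_Group i.wealth_q, rrr baseoutcome(1)

    * average predicted probabilities of trajectory groups 2 and 3, by wealth quintile
    margins wealth_q, predict(outcome(2))
    margins wealth_q, predict(outcome(3))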

    Comment


    • #3
      I am usually more sympathetic to transformation than some people active here.

      In this context the use of mlogit and its nonlinearity seems neither here nor there. The variable of concern is a predictor, and whether to use it or a transformed version is more a matter of practicality than deep principle.

      Nor does the fact that asinh (which for some bizarre reason many economists prefer to call IHS, despite longstanding notation) does not mesh easily with the idea of elasticity seem compelling. If you're predominantly focused on elasticity, fine, but perhaps you should stick to positive variables.

      The large variance (SD) of any variable is a matter of units. Divide by a thousand or a million if it is troubling!

      I think the practical question is what works best as a predictor. We've not seen evidence that the raw variable works really badly, although it is right to worry about outliers.

      There is no such thing as the asinh transformation, as there is scope for asinh(kx) for any scale factor k too.
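
      A sketch of both points, purely for illustration (the scale factors are arbitrary; recall that asinh(x) = ln(x + sqrt(x^2 + 1)), so the units of x matter):

      Code:
      * rescaling is just a change of units (factors chosen only for illustration)
      gen double wealth_m = totalwealth_wave1 / 1e6

      * asinh depends on units too: asinh(x) and asinh(x/1000) are different transformations
      gen double asinh_w  = asinh(totalwealth_wave1)
      gen double asinh_wk = asinh(totalwealth_wave1 / 1000)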

      Detail: Very pedantically, terms like arcsinh, although common, are a misrepresentation of the underlying mathematics. People writing that, when they aren't just copying, are thinking of arcsine and its siblings, where the inverse of a trigonometric function has the interpretation of yielding an angle or arc. Hyperbolic functions are only loosely analogous; their inverses don't yield angles, but rather areas. In a Stata context, see Section 3.3 of https://journals.sagepub.com/doi/pdf...867X0800800307. Better is to go directly to Saler, B. M. 1985. Inverse hyperbolic functions as areas. College Mathematics Journal 16: 129–131. You may have direct access via https://www.jstor.org/stable/2686214

      Some famous economists don't seem to know this!

      Comment


      • #4
        Nick Cox Here is the output from an mlogit without any transformation applied to the totalwealth_wave1 variable; the SE is insane. Would you recommend a cubic transformation, or multiplying the variable by a constant like 0.0001, instead of the IHS transformation?



        Code:
        
        svy: mlogit _traj_Group inc_d i.race i.male i.education i.veteran i.mothered i.marital_status_wave1 c.chronicdisease_wave1 i.MentalHealth_wave1 totalwealth_wave1, rrr baseoutcome(1)
        (running mlogit on estimation sample)
        
        Survey: Multinomial logistic regression
        
        Number of strata =  56                            Number of obs   =      8,299
        Number of PSUs   = 112                            Population size = 52,052,925
                                                          Design df       =         56
                                                          F(36, 21)       =      21.24
                                                          Prob > F        =     0.0000
        
        ---------------------------------------------------------------------------------------------------
                                          |             Linearized
                              _traj_Group |        RRR   std. err.      t    P>|t|     [95% conf. interval]
        ----------------------------------+----------------------------------------------------------------
        1                                 |  (base outcome)
        ----------------------------------+----------------------------------------------------------------
        2                                 |
                                    inc_d |   1.583272   .2979819     2.44   0.018     1.085967    2.308311
                                          |
                                     race |
                                   Black  |   3.228209   .6929344     5.46   0.000      2.09999    4.962565
                                Hispanic  |   3.354377   .6586644     6.16   0.000     2.263496    4.971004
                                   Other  |   2.899665   .9845573     3.14   0.003      1.46875    5.724636
                                          |
                                     male |
                                    Male  |   .7862911   .1111773    -1.70   0.095     .5923394    1.043749
                                          |
                                education |
                                   2.ged  |   .5507071   .1608566    -2.04   0.046     .3067627    .9886413
                  3.high-school graduate  |   .5167927   .0907585    -3.76   0.000     .3635203    .7346899
                          4.some college  |   .5980478   .1007785    -3.05   0.003     .4267079    .8381874
                     5.college and above  |    .425691   .0964581    -3.77   0.000     .2703712    .6702372
                                          |
                                  veteran |
                                     Yes  |   .8883438   .1782134    -0.59   0.557     .5943583    1.327742
                                          |
                                 mothered |
                   High School or Higher  |   1.088135   .1630643     0.56   0.575     .8059497    1.469121
                                          |
                     marital_status_wave1 |
        Absent sopuse/Seperated/Divorced  |    2.31905   .4610573     4.23   0.000     1.557197    3.453638
                               Partnered  |    1.88913   .5687121     2.11   0.039     1.033604    3.452784
                                 Widowed  |   .9501908   .1945629    -0.25   0.804     .6304762    1.432033
                           Never Married  |   2.084201   .6448618     2.37   0.021     1.121395    3.873652
                                          |
                     chronicdisease_wave1 |   1.261931   .0701802     4.18   0.000     1.128892    1.410649
                                          |
                       MentalHealth_wave1 |
              Had Mental Health Problems  |   2.186456   .3389075     5.05   0.000     1.602835    2.982586
                        totalwealth_wave1 |   .9999997   7.01e-07    -0.46   0.650     .9999983    1.000001
                                    _cons |   .0474638   .0157231    -9.20   0.000     .0244435    .0921642
        ----------------------------------+----------------------------------------------------------------
        3                                 |
                                    inc_d |    1.54605   .3468298     1.94   0.057     .9864039    2.423217
                                          |
                                     race |
                                   Black  |   2.599658   .5812332     4.27   0.000     1.661124    4.068465
                                Hispanic  |   1.854912   .4805592     2.38   0.020     1.103901    3.116855
                                   Other  |   2.713307   1.192521     2.27   0.027     1.124937    6.544398
                                          |
                                     male |
                                    Male  |   .3971497   .0849576    -4.32   0.000     .2587296    .6096244
                                          |
                                education |
                                   2.ged  |   .7415699   .2438701    -0.91   0.367     .3837498    1.433032
                  3.high-school graduate  |   .7935076   .1710169    -1.07   0.288     .5152871    1.221948
                          4.some college  |   .6472407   .1407022    -2.00   0.050      .418735    1.000443
                     5.college and above  |   .3994608     .13102    -2.80   0.007     .2070724    .7705949
                                          |
                                  veteran |
                                     Yes  |   1.198809     .43805     0.50   0.622     .5765664    2.492588
                                          |
                                 mothered |
                   High School or Higher  |   1.056353   .1937699     0.30   0.766     .7315143    1.525439
                                          |
                     marital_status_wave1 |
        Absent sopuse/Seperated/Divorced  |   2.477222   .6157957     3.65   0.001     1.505561    4.075974
                               Partnered  |   1.588552    .692876     1.06   0.293     .6630353    3.805978
                                 Widowed  |   1.964464   .4913103     2.70   0.009     1.190307    3.242121
                           Never Married  |   2.570662   .6861016     3.54   0.001     1.506071    4.387774
                                          |
                     chronicdisease_wave1 |   1.440641   .1195711     4.40   0.000     1.219965    1.701235
                                          |
                       MentalHealth_wave1 |
              Had Mental Health Problems  |   1.895926   .3850766     3.15   0.003     1.262172    2.847895
                        totalwealth_wave1 |    .999994   1.14e-06    -5.28   0.000     .9999917    .9999963
                                    _cons |   .0310002    .009915   -10.86   0.000     .0163346    .0588329
        ---------------------------------------------------------------------------------------------------
        Note: _cons estimates baseline relative risk for each outcome.

        Comment


        • #5
          Thanks for showing some results, but I cannot tell from them whether you would be better off with a transform or something else. You need to try it and see.

          The SE is not insane in your results: it is tied up with the units or measurement scale you are using. Here is a silly illustration of the main point. In Stata's auto data, prices are in USD. (The prices are 1970s prices.) If it suited some purpose, we could change to thousands of dollars or to cents or anything else. The coefficients and SEs change accordingly. The t statistic is dimensionless and as such an indicator of relative importance (with all sorts of reservations about wording for separate reasons).

          In your context, and most others, coefficients and SEs also depend on what else is in the model.

          Code:
          . sysuse auto, clear
          (1978 automobile data)
          
          . regress mpg price
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(1, 72)        =     20.26
                 Model |  536.541807         1  536.541807   Prob > F        =    0.0000
              Residual |  1906.91765        72  26.4849674   R-squared       =    0.2196
          -------------+----------------------------------   Adj R-squared   =    0.2087
                 Total |  2443.45946        73  33.4720474   Root MSE        =    5.1464
          
          ------------------------------------------------------------------------------
                   mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                 price |  -.0009192   .0002042    -4.50   0.000    -.0013263   -.0005121
                 _cons |   26.96417   1.393952    19.34   0.000     24.18538    29.74297
          ------------------------------------------------------------------------------
          
          . gen priceT = price / 1000
          
          . regress mpg priceT
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(1, 72)        =     20.26
                 Model |  536.541807         1  536.541807   Prob > F        =    0.0000
              Residual |  1906.91765        72  26.4849674   R-squared       =    0.2196
          -------------+----------------------------------   Adj R-squared   =    0.2087
                 Total |  2443.45946        73  33.4720474   Root MSE        =    5.1464
          
          ------------------------------------------------------------------------------
                   mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                priceT |  -.9191631   .2042163    -4.50   0.000    -1.326261   -.5120652
                 _cons |   26.96417   1.393952    19.34   0.000     24.18538    29.74297
          ------------------------------------------------------------------------------
          
          . gen priceC = price * 100
          
          . regress mpg priceC
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(1, 72)        =     20.26
                 Model |  536.541807         1  536.541807   Prob > F        =    0.0000
              Residual |  1906.91765        72  26.4849674   R-squared       =    0.2196
          -------------+----------------------------------   Adj R-squared   =    0.2087
                 Total |  2443.45946        73  33.4720474   Root MSE        =    5.1464
          
          ------------------------------------------------------------------------------
                   mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                priceC |  -9.19e-06   2.04e-06    -4.50   0.000    -.0000133   -5.12e-06
                 _cons |   26.96417   1.393952    19.34   0.000     24.18538    29.74297
          ------------------------------------------------------------------------------
          I assume you're alluding to a cube root transform (not cubic). Cubing a variable that is variously negative, zero or positive is legal but would cause your outliers to explode outwards, relatively as well as absolutely.
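
          If a cube root is what you have in mind, it can be applied to negative, zero and positive values alike; a minimal sketch (the variable name is just a placeholder):

          Code:
          * signed cube root: defined for negative, zero and positive values, and preserves sign
          gen double wealth_cbrt = sign(totalwealth_wave1) * abs(totalwealth_wave1)^(1/3)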

          Comment


          • #6
            Nick Cox Thank you for the feedback.

            Comment
