
  • Non-linear regression in Stata. Error: Starting values invalid or some RHS variables have missing values

    Dear all,

    I am writing in the hope that someone can help me with an obstacle I am facing in Stata. I am trying to run the following non-linear regression:

    nl (price = log(1-exp(-1*{a1}*Tau))+{a2})

    where price is the dependent variable and Tau is the regressor.

    When I run this regression, I keep getting this error: "starting values invalid or some RHS variables have missing values".

    I am completely stuck at this point and don't really know what to do. My RHS variables do not have any missing values, and I have tried many different initial values for a1 (the coefficient on Tau), but I always get the same error.

    Please let me know if anyone has any suggestions on what I should do.

    Thanks
    Best,
    Prateek


  • #2
    Have you tried approaching it stepwise, that is, using partial linearization to get halfway-decent starting values to feed to the ultimate regression equation?

    .
    . version 15.1

    .
    . clear *

    .
    . set seed `=strreverse("1305918")'

    . quietly set obs 25

    .
    . generate double tau = runiform()

    .
    . generate double price = log(1 - exp(-1 * 2 * tau)) + 3 + rnormal(0, 0.25)

    .
    . *
    . * Begin here
    . *
    . generate double eprice = exp(price)

    . nl (eprice = (1 - exp(-1 * {a1} * tau)) * exp({a2})), nolog
    (obs = 25)


          Source |      SS            df       MS
    -------------+----------------------------------    Number of obs =         25
           Model |  4888.5715          2  2444.28574    R-squared     =     0.9459
        Residual |  279.48232         23  12.1514053    Adj R-squared =     0.9412
    -------------+----------------------------------    Root MSE      =   3.485887
           Total |  5168.0538         25  206.722153    Res. dev.     =   131.2985

    ------------------------------------------------------------------------------
          eprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             /a1 |   .0926413   .5279595     0.18   0.862    -.9995261    1.184809
             /a2 |   5.594525    5.50369     1.02   0.320    -5.790725    16.97977
    ------------------------------------------------------------------------------

    .
    . nl (price = log( 1 - exp(-1 * {a1} *tau) ) + {a2}), ///
    >         initial(a1 `=_b[/a1]' a2 `=_b[/a2]') nolog
    (obs = 25)


          Source |      SS            df       MS
    -------------+----------------------------------    Number of obs =         25
           Model |  44.124468          1   44.124468    R-squared     =     0.9676
        Residual |  1.4769624         23  .064215758    Adj R-squared =     0.9662
    -------------+----------------------------------    Root MSE      =   .2534083
           Total |   45.60143         24   1.9000596    Res. dev.     =   .2247201

    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             /a1 |   1.096691    .402636     2.72   0.012      .263775    1.929607
             /a2 |   3.419029   .2841542    12.03   0.000     2.831212    4.006847
    ------------------------------------------------------------------------------
      Parameter a2 taken as constant term in model & ANOVA table

    .
    . exit

    end of do-file


    .
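The first -nl- call in the log above fits the model after exponentiating both sides, which removes the outer logarithm:

```latex
\text{price} = \ln\left(1 - e^{-a_1 \tau}\right) + a_2
\quad\Longrightarrow\quad
e^{\text{price}} = \left(1 - e^{-a_1 \tau}\right) e^{a_2}
```

The right-hand side of the exponentiated form is defined for any real a1, a2 and tau, so -nl- can evaluate it wherever it starts. Its errors are on a different scale from those of the original model, so the two fits give different estimates (compare a1: .0926 in the first table versus 1.097 in the second); the first fit only supplies starting values for the second.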



    • #3
      Dear Joseph,

      Thank you so much for replying to my message. I ran exactly the same code that you did, and after the partial linearization, when I try to feed the starting values into the ultimate regression, I get the following error: matrix a11.471632094572502a23.188964046151424 not found

      Full Code:

      . set seed `=strreverse("1305918")'

      . quietly set obs 25

      . generate double tau = runiform()

      . generate double price = log(1-exp(-1*2*tau)) + 3 + rnormal(0, 0.25)

      . generate double eprice = exp(price)

      . nl (eprice = (1-exp(-1*{a1}*tau))*exp({a2})), nolog
      (obs = 25)

            Source |       SS       df       MS
      -------------+------------------------------    Number of obs =        25
             Model |  3761.78431     2  1880.89216    R-squared     =    0.9694
          Residual |  118.892216    23   5.1692268    Adj R-squared =    0.9667
      -------------+------------------------------    Root MSE      =  2.273593
             Total |  3880.67653    25  155.227061    Res. dev.     =  109.9305

      -------------------------------------------------------------------------------
        eprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
           /a1 |   1.471632   .4746885     3.10   0.005      .489664      2.4536
           /a2 |   3.188964    .201933    15.79   0.000     2.771234    3.606694
      -------------------------------------------------------------------------------

      . nl (price=log(1-exp(-1*{a1}*tau))+{a2}), initial(a1`=_b[/a1]'a2`=_b[/a2]') nolog

      matrix a11.471632094572502a23.188964046151424 not found
      r(480);



      Basically, Stata does not seem able to use the values of a1 and a2 estimated in the previous step. Any idea why this is happening? I get exactly the same error when I run this code on my original data: it says the matrix is not found. I am using Stata 13; could that be the reason?

      Any help would be greatly appreciated. Thanks!


      Best,
      Prateek
      Last edited by Prateek Pillai; 19 Jun 2019, 14:31.



      • #4
        Originally posted by Prateek Pillai
        Any idea why this is happening?
        Try putting spaces between the parameter names and the macro-expansions of the starting values.
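Why the spaces matter: macro expressions such as `=_b[/a1]' are expanded before -nl- parses its options, so a1`=_b[/a1]' becomes a1 immediately followed by 1.471632094572502, and the whole option collapses into the single token a11.471632094572502a23.188964046151424. Given a single token, initial() looks for a matrix of that name, hence r(480). The sketch below reruns the simulation from post #2 and copies the step-1 estimates into local macros first; the names a1_start and a2_start are illustrative, not required by -nl-.

```stata
* Sketch: simulated data as in post #2 (run as a do-file so /// works)
clear
set seed 8195031
set obs 25
generate double tau   = runiform()
generate double price = log(1 - exp(-1 * 2 * tau)) + 3 + rnormal(0, 0.25)

* Step 1: partially linearized model, fitted to exp(price)
generate double eprice = exp(price)
nl (eprice = (1 - exp(-1 * {a1} * tau)) * exp({a2})), nolog

* Copy the estimates into locals (illustrative names)
local a1_start = _b[/a1]
local a2_start = _b[/a2]

* Step 2: the original model; each parameter name is followed by a
* space and then its value, so initial() sees name-value pairs
nl (price = log(1 - exp(-1 * {a1} * tau)) + {a2}), ///
    initial(a1 `a1_start' a2 `a2_start') nolog
```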

        I am using Stata 13 version of the software, could that be the reason?
        I doubt it.



        • #5
          Dear Joseph,

          I did what you suggested and the regression runs perfectly, but when I do the same on my original data I still get the error: "starting values invalid or some RHS variables have missing values". Can you suggest another way to arrive at appropriate starting values? Alternatively, is it possible to estimate this model with other optimization methods, such as a direct search method (e.g., golden-section search), and if so, what code would run that in Stata?

          Thanks again!
          Best,
          Prateek



          • #6
            Are you getting that message even for the partially linearized version, or only for the final (ultimate) model using the initial values?

            Step 1: Plot the data. Read a2 off the graph: it is the level that price approaches as Tau moves away from zero (you might have to extrapolate to it by eyeballing the curve).

            Step 2: Choose a middling value of Tau along the x axis and, for that abscissa, read the ordinate off the graph, that is, the corresponding value of price (approximately).

            Step 3: Plug these values into the formula and solve for a1.

            Step 4: Use those values as starting values in the -nl- command.
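Step 3 can be done in closed form. Writing τ* and p* for the Tau and price values read off the graph in Step 2, and a2 for the value from Step 1, and assuming p* < a2 and τ* ≠ 0:

```latex
p^{*} = \ln\left(1 - e^{-a_1 \tau^{*}}\right) + a_2
\;\Longrightarrow\;
e^{-a_1 \tau^{*}} = 1 - e^{p^{*} - a_2}
\;\Longrightarrow\;
a_1 = -\frac{\ln\left(1 - e^{p^{*} - a_2}\right)}{\tau^{*}}
```

Since 0 < 1 - exp(p* - a2) < 1 when p* < a2, the numerator is positive and a1 takes the sign of τ*. As a purely illustrative example, a2 = 3 with price ≈ 2.5 at Tau = 0.5 gives a1 = -ln(1 - exp(-0.5))/0.5 ≈ 1.87.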

            If you want to use other minimization/maximization algorithms, you can use the maximum-likelihood analogue of -nl-:
            Code:
            help mlexp
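A minimal -mlexp- sketch of the same model follows. It assumes normally distributed errors, which is an added assumption rather than something stated in the thread, and it reuses the simulation from post #2 with more observations. Starting values are given inline with the {name=#} form, as in the -gmm- command later in this thread, and technique() switches among Stata's gradient-based algorithms (nr, bhhh, dfp, bfgs); treat the specific choice of bfgs here as illustrative.

```stata
* Sketch: ML analogue of the -nl- model, ASSUMING normal errors
* (simulated data; run as a do-file so /// works)
clear
set seed 8195031
set obs 500
generate double tau   = runiform()
generate double price = log(1 - exp(-1 * 2 * tau)) + 3 + rnormal(0, 0.25)

* Observation-level log likelihood; sigma is estimated on the log
* scale (lnsigma) so the standard deviation stays positive
mlexp (ln(normalden(price, log(1 - exp(-1 * {a1=1} * tau)) + {a2=3}, ///
    exp({lnsigma=0})))), technique(bfgs)
```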



            • #7
              Dear Joseph,

              I am getting the error message only for the final (ultimate) model using the initial values. I tried to do what you suggested and plot the data, but the graph is extremely messy and not very helpful for drawing any inferences (I have attached pictures of the graphs so you can take a look). I think this is because my data set is huge (over 500,000 rows) and, as you can see from the scatter plot, the data are distributed extremely unevenly, with all the observations clustering at extreme points.

              [Attached image: Line.png]

              [Attached image: Scatter.png]

              Additionally, I tried running the same regression with the -gmm- command, and it works. However, when I add the other regressors to my equation (more than 100 regressors overall, including more than 90 dummy variables), Stata runs for an extremely long time without finishing. On my last attempt, Stata ran for more than 24 hours before I finally terminated it!

              Here is the code I ran:

              gmm (price - ln(1-exp(-1*{a1=0.02}*Tau-1*{a2}*(Tau*HighRise)))-NewSale*{b1=0.1858334}-ln( Areasqm)*{b2=-0.0006558}- FLR4_n*{b3=0.169311}- FLR10_n*{b4=0.563479}-m1*{b5}-m2*{b6}-m3*{b7}-m4*{b8}-m5*{b9}-m6*{b10}-m7*{b11}-m8*{b12}-m9*{b13}-m10*{b14}-m11*{b15}-m12*{b16}-m13*{b17}-m14*{b18}-m15*{b19}-m16*{b20}-m17*{b21}-m18*{b22}-m19*{b23}-m20*{b24}-m21*{b25}-m22*{b26}-m23*{b27}-m24*{b28}-m25*{b29}-m26*{b30}-m27*{b31}-m28*{b32}-m29*{b33}-m30*{b34}-m31*{b35}-m32*{b36}-m33*{b37}-m34*{b38}-m35*{b39}-m36*{b40}- m37*{b41}-m38*{b42}-m39*{b43}-m40*{b44}-m41*{b45}-m42*{b46}-m43*{b47}-m44*{b48}-m45*{b49}-m46*{b50}-m47*{b51}-m48*{b52}-m49*{b53}-m50*{b54}-m51*{b55}-m52*{b56}-m53*{b57}-m54*{b58}-m55*{b59}-m56*{b60}-m57*{b61}-m58*{b62}-m59*{b63}-m60*{b64}-m61*{b65}-m62*{b66}-m63*{b67}-m64*{b68}-m65*{b69}-m66*{b70}-m67*{b71}-m68*{b72}-m69*{b73}-m70*{b74}-m71*{b75}-m72*{b76}-m73*{b77}-m74*{b78}-m75*{b79}-m76*{b80}-m77*{b81}-m78*{b82}-m79*{b83}-m80*{b84}-m81*{b85}-m82*{b86}-m83*{b87}-m84*{b88}-m85*{b89}-m86*{b90}-m87*{b91}-m88*{b92}-m89*{b93}-m90*{b94}-m91*{b95}-m92*{b96}-m93*{b97}-m94*{b98}-m95*{b99}-m96*{b100}-m97*{b101}-{b102}), instruments(Tau HighRise NewSale Areasqm FLR4_n FLR10_n m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12 m13 m14 m15 m16 m17 m18 m19 m20 m21 m22 m23 m24 m25 m26 m27 m28 m29 m30 m31 m32 m33 m34 m35 m36 m37 m38 m39 m40 m41 m42 m43 m44 m45 m46 m47 m48 m49 m50 m51 m52 m53 m54 m55 m56 m57 m58 m59 m60 m61 m62 m63 m64 m65 m66 m67 m68 m69 m70 m71 m72 m73 m74 m75 m76 m77 m78 m79 m80 m81 m82 m83 m84 m85 m86 m87 m88 m89 m90 m91 m92 m93 m94 m95 m96 m97)

              Here the regressors m1-m97 are dummy variables. Interestingly, the initial value of 0.02 that I assigned to a1 (the coefficient on Tau) works in this -gmm- equation but not when I try to use it with -nl-.

              Any idea why the gmm estimation is taking such a long time? Alternatively, could you suggest some other way to find the initial values for nl?

              Once again, many thanks! I really appreciate you helping me out.
              Best,
              Prateek


              • #8
                Originally posted by Prateek Pillai
                Alternatively, could you suggest some other way to find the initial values for nl?
                I don't think that there is one. You have essentially four values of Tau: one of them about -1000, a second about zero, and the fourth about +100 000. Forget its magnitude: what sign of the parameter a1 do you think would work to cover this domain?

                . foreach Tau in -1000 0 100000 {
                2. foreach a1 in -1 0 1 {
                3. display in smcl as text "log(1-exp(-1*{a1}*Tau)); Tau = `Tau'; a1 = `a1' = " log(1-exp(-1 * `a1' * `Tau'))
                4. }
                5. }
                log(1-exp(-1*{a1}*Tau)); Tau = -1000; a1 = -1 = 0
                log(1-exp(-1*{a1}*Tau)); Tau = -1000; a1 = 0 = .
                log(1-exp(-1*{a1}*Tau)); Tau = -1000; a1 = 1 = .
                log(1-exp(-1*{a1}*Tau)); Tau = 0; a1 = -1 = .
                log(1-exp(-1*{a1}*Tau)); Tau = 0; a1 = 0 = .
                log(1-exp(-1*{a1}*Tau)); Tau = 0; a1 = 1 = .
                log(1-exp(-1*{a1}*Tau)); Tau = 100000; a1 = -1 = .
                log(1-exp(-1*{a1}*Tau)); Tau = 100000; a1 = 0 = .
                log(1-exp(-1*{a1}*Tau)); Tau = 100000; a1 = 1 = 0
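The pattern in that output follows from where the logarithm is defined. The term ln(1 - exp(-a1·Tau)) exists only when its argument is positive:

```latex
1 - e^{-a_1 \tau} > 0
\iff e^{-a_1 \tau} < 1
\iff a_1 \tau > 0
```

So a positive a1 works only when every Tau is positive, a negative a1 only when every Tau is negative, and Tau = 0 gives ln(0), which Stata returns as missing, whatever the value of a1. No single a1 covers Tau values of both signs, and a1 = 0 (the value -nl- uses for a parameter given no starting value) fails for every observation.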


                Why are the dots connected in the first graph? I don't know anything about -gmm-, but it doesn't look any more appropriate than -nl- to model repeated measurements of price on an entity.



                • #9
                  Dear Joseph,

                  I dropped the observations for which Tau is negative and the -nl- equation is now working perfectly.
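A sketch of the check that locates the problem, using simulated data in which some Tau values are non-positive (all numbers are illustrative, and a1_start and a2_start are hypothetical names). It evaluates the right-hand side at candidate starting values and counts missing results, which is the condition the error message describes. It also restricts the sample with an if qualifier instead of dropping observations; the condition is Tau > 0 rather than "Tau not negative", because Tau = 0 also yields ln(0).

```stata
* Sketch: simulated data with some non-positive Tau (illustrative)
clear
set seed 8195031
set obs 200
generate double Tau   = -0.5 + 1.5 * runiform()
generate double price = 3 + rnormal(0, 0.25)
replace price = price + log(1 - exp(-1 * 2 * Tau)) if Tau > 0

* Candidate starting values (hypothetical; substitute your own)
local a1_start = 1
local a2_start = 3

* Evaluate the RHS at the starting values; missing results in the
* estimation sample are what make -nl- reject the starting values
generate double rhs_check = log(1 - exp(-1 * `a1_start' * Tau)) + `a2_start'
count if missing(rhs_check)
count if Tau <= 0

* Restrict the estimation sample instead of dropping data
nl (price = log(1 - exp(-1 * {a1} * Tau)) + {a2}) if Tau > 0, ///
    initial(a1 `a1_start' a2 `a2_start') nolog
```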

                  Thank you so much!
