
  • Estimating a non-linear system of equations

    I want to estimate the following system of equations where the dependent variable is a function of (logged) experience.

    Y(i,t) = a*X(i,t) + b*log(experience(i,t)) + error(i,t)
    experience(i,t) = c*experience(i,t-1) + dummy(i,t-1)
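    Writing the two equations with subscripts and unrolling the recursion shows how c enters: experience today is a geometrically discounted sum of past dummies plus whatever is left of the initial stock. Two pieces are assumptions not stated in the post: that experience starts from some initial stock experience_{i,0}, and that "Poisson for the first equation" means the usual log link, so the first equation describes a conditional mean.

```latex
\begin{aligned}
\text{experience}_{i,t} &= c\,\text{experience}_{i,t-1} + \text{dummy}_{i,t-1} \\
  &= c^{t}\,\text{experience}_{i,0} + \sum_{s=0}^{t-1} c^{\,t-1-s}\,\text{dummy}_{i,s} \\[4pt]
E\left[Y_{it} \mid X_{it}\right] &= \exp\!\big(a X_{it} + b \log \text{experience}_{i,t}\big)
  = \text{experience}_{i,t}^{\,b}\,\exp(a X_{it})
\end{aligned}
```

    Under the log link, b is the elasticity of the expected count with respect to experience, and log(experience) requires experience to stay strictly positive.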

    I think of experience as a stock that accumulates over time, but only a fraction c < 1 of it is retained from one period to the next; the remaining share (1 - c) depreciates. Experience increases if the dummy variable equals 1 in the previous period (e.g., i indexes a firm, and if the firm exported in the previous period, i.e. export dummy = 1, this adds to its experience today). Note that there is no error term in the second equation. I want to estimate both b (the importance of experience) and c (the retention parameter). Y is a count variable, so I want to use Poisson for the first equation.
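    A minimal sketch of the experience recursion for a single firm, assuming the firm's periods are sorted with no gaps. The starting stock `initial` is a hypothetical parameter (the post does not say how experience is initialised) and must be positive for log(experience) to exist; `experience_path` is likewise a hypothetical helper name.

```python
def experience_path(dummies, c, initial=1.0):
    """Build experience(t) for one firm from its dummy history.

    dummies[t] is dummy(i, t); experience(0) = initial (an assumed starting
    stock) and experience(t) = c * experience(t-1) + dummy(t-1) for t >= 1.
    """
    path = [initial]
    for t in range(1, len(dummies)):
        # retain a fraction c of last period's stock, add last period's dummy
        path.append(c * path[t - 1] + dummies[t - 1])
    return path

# A firm observed for 5 periods that exported in periods 0 and 2, with c = 0.5.
print(experience_path([1, 0, 1, 0, 0], c=0.5))  # [1.0, 1.5, 0.75, 1.375, 0.6875]
```

    For a given c, the log of this series is just another regressor, which is why fixing c makes the Y equation easy to estimate.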

    I can estimate this by non-linear least squares (NLS), but the dataset has 1 million observations and 20,000 dummies in X, and NLS simply cannot cope: my machine just hangs. Can this be done in a more computationally efficient way, for example by maximum likelihood? Any suggestions are welcome.

    Thanks in advance, Pushan

  • #2
    We normally use our real names on this listserv. Please read the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. For this kind of problem, your version of Stata may also matter.

    I don't see that ML would be much faster than NLS unless you can give it analytical derivatives: both are iterative methods that repeatedly adjust the parameters to optimize an objective function (minimizing the sum of squared errors for NLS, maximizing the log likelihood for ML).
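    To illustrate what analytical derivatives look like here, a sketch of the Poisson log likelihood and its gradient with respect to (a, b, c), assuming a log link, log mu = a*X + b*log(experience), and an initial experience stock that does not depend on c. The derivative of experience with respect to c obeys its own recursion, d e(t)/dc = e(t-1) + c * d e(t-1)/dc, so it can be carried along in the same pass. The function name and toy data are hypothetical; the gradient is checked against central finite differences.

```python
import math

def loglik_and_grad(theta, panels):
    """Poisson log likelihood (without the log(y!) constant) and its analytic gradient.

    theta = (a, b, c); panels is a list of (y, x, dummies, initial) tuples, one per
    firm, aligned by period. Assumes log mu = a*x + b*log(experience),
    experience(0) = initial (not a function of c), and
    experience(t) = c*experience(t-1) + dummy(t-1).
    """
    a, b, c = theta
    ll, ga, gb, gc = 0.0, 0.0, 0.0, 0.0
    for y, x, dummies, initial in panels:
        e, de = initial, 0.0  # experience and d(experience)/dc at t = 0
        for t in range(len(y)):
            if t > 0:
                # d/dc [c*e(t-1) + dummy(t-1)] = e(t-1) + c*de(t-1);
                # update de first because it needs the previous e
                de = e + c * de
                e = c * e + dummies[t - 1]
            log_e = math.log(e)
            log_mu = a * x[t] + b * log_e
            mu = math.exp(log_mu)
            resid = y[t] - mu  # Poisson score: (y - mu) * d(log mu)/d(theta)
            ll += y[t] * log_mu - mu
            ga += resid * x[t]
            gb += resid * log_e
            gc += resid * b * de / e
    return ll, (ga, gb, gc)

# Compare the analytic gradient with central finite differences on toy data.
panels = [
    ([0, 2, 1, 3], [0.1, -0.3, 0.5, 0.2], [1, 0, 1, 1], 1.0),
    ([1, 0, 0, 4], [0.4, 0.0, -0.2, 0.7], [0, 1, 1, 0], 0.5),
]
theta = (0.3, 0.8, 0.6)
ll, grad = loglik_and_grad(theta, panels)
h = 1e-6
for k in range(3):
    up, dn = list(theta), list(theta)
    up[k] += h
    dn[k] -= h
    numeric = (loglik_and_grad(up, panels)[0] - loglik_and_grad(dn, panels)[0]) / (2 * h)
    print(k, grad[k], numeric)
```

    Each gradient evaluation costs one pass over the data, whereas numerical derivatives need extra likelihood evaluations per parameter, which is where the speed-up from analytical derivatives comes from.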

    When you say your computer hangs, it is not clear what this means. Have you tried letting it run over a weekend or some such?

    If you fix c, you can calculate the log(experience) variable and estimate the Y equation by OLS. So by looping over values of c, re-estimating the OLS regression each time, and picking the value with the highest R-squared, you might get good starting values, or even point estimates, though without the proper statistical properties (the standard errors would not reflect that c was chosen from the data).
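    A sketch of that grid search. For brevity it regresses y on log(experience) alone; in the real problem X and its dummies would also be in the regression. The helper names, the grid, and the noiseless toy data (generated with c = 0.7) are all hypothetical.

```python
import math

def experience_path(dummies, c, initial=1.0):
    """experience(0) = initial; experience(t) = c*experience(t-1) + dummy(t-1)."""
    path = [initial]
    for t in range(1, len(dummies)):
        path.append(c * path[-1] + dummies[t - 1])
    return path

def r_squared(y, z):
    """R-squared of an OLS regression of y on a constant and one regressor z."""
    n = len(y)
    ybar, zbar = sum(y) / n, sum(z) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szz = sum((zi - zbar) ** 2 for zi in z)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return szy ** 2 / (szz * syy)

def grid_search_c(panels, grid):
    """Return (c, R-squared) for the grid value of c with the best OLS fit."""
    best = None
    for c in grid:
        ys, zs = [], []
        for y, dummies, initial in panels:
            ys.extend(y)
            zs.extend(math.log(e) for e in experience_path(dummies, c, initial))
        r2 = r_squared(ys, zs)
        if best is None or r2 > best[1]:
            best = (c, r2)
    return best

# Toy data: c = 0.7 and y = 1 + 2*log(experience), no noise.
firms = [([1, 0, 0, 0, 0, 0], 1.0), ([1, 1, 1, 1, 1, 1], 0.5), ([0, 1, 0, 1, 0, 1], 2.0)]
panels = [([1 + 2 * math.log(e) for e in experience_path(d, 0.7, e0)], d, e0)
          for d, e0 in firms]
grid = [round(0.1 * k, 1) for k in range(1, 10)]
print(grid_search_c(panels, grid))
```

    With a Poisson first equation, the same loop could compare maximized log likelihoods instead of R-squared values; either way the chosen c is best used as a starting value for a joint estimation.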



    • #3
      Dear Phil,

      Thanks for the response. I had to write to Stata to get my real name displayed, which was odd: there is no option for it under the profile or user settings, so they changed my displayed name manually yesterday. I am running Stata/MP 14.2 with 4 cores and have allocated 20 GB of memory to Stata.

      By hanging I mean that each NLS iteration takes about two days. Sometimes the machine simply crashes. I have let it run for a week or so, to no avail. I did try iterating over c and picking the value with the highest R-squared, but referees were unhappy with that approach.

      I am not sure what the binding constraint is: the power of my machine (24 GB RAM, 4 cores at 2.66 GHz), the number of variables (20,000), the number of observations (1 million), or some combination of these.
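      A back-of-envelope sketch of why the combination matters, assuming that at some point the estimator holds a dense observations-by-parameters matrix of 8-byte doubles (for example, the derivatives of the fitted values with respect to every parameter). How Stata's nl actually stores intermediate results is not checked here; only the order of magnitude is the point.

```python
n_obs = 1_000_000
n_params = 20_000       # roughly one column per dummy in X
bytes_per_double = 8

# dense n_obs x n_params matrix, e.g. a full matrix of derivatives
dense_bytes = n_obs * n_params * bytes_per_double
print(f"{dense_bytes / 1e9:.1f} GB")    # 160.0 GB

# dense n_params x n_params matrix, e.g. a Hessian or cross-product matrix
hessian_bytes = n_params ** 2 * bytes_per_double
print(f"{hessian_bytes / 1e9:.1f} GB")  # 3.2 GB
```

    Under that assumption, the observations-by-parameters matrix would be several times the 24 GB of RAM, while the parameters-by-parameters matrix alone would fit.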

      In an earlier thread, Nick Cox listed scenarios in which convergence is usually not an issue:
      > 1. the model is actually right for the data in a qualitative sense (easy to say, hard to define, obvious when it fits well)
      > 2. you supply good initial guesses for the parameters (this is perhaps the easiest one to tweak)
      > 3. you are estimating a small number of parameters
      > 4. you have a good ratio of data points to parameters
      > 5. the data are not grotesquely behaved (e.g. outliers and high skewness can be just as problematic as with linear models)
      > 6. the model is not highly nonlinear (the textbooks are full of this)

      In my case, I have both a large number of parameters and highly skewed data (many zeros).
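      If the 20,000 dummies in X are mutually exclusive group indicators (for example firm fixed effects; the post does not say so), a Poisson likelihood allows each group's intercept to be solved in closed form given a, b and c, so a numerical optimizer would only need to search over the few remaining parameters. This concentrating-out step is a swapped-in technique, not something proposed in the thread, and the function names and toy data below are hypothetical. It also shows one effect of many zeros: groups whose counts are all zero carry no information about a, b or c and drop out.

```python
import math
from collections import defaultdict

def concentrated_alphas(y, xb, group):
    """Group intercepts that maximize a Poisson likelihood given the other coefficients.

    Assumed model: log mu_i = alpha[group_i] + xb_i, where xb_i is
    a*X + b*log(experience) at the current a, b, c and the dummies are
    mutually exclusive group indicators. The score for alpha[g] is
    sum_{i in g} (y_i - exp(alpha[g] + xb_i)) = 0, so
    alpha[g] = log(sum y / sum exp(xb)) within g. All-zero groups have no
    finite maximizer (alpha -> -inf) and are returned as None.
    """
    sum_y = defaultdict(float)
    sum_exb = defaultdict(float)
    for yi, xbi, g in zip(y, xb, group):
        sum_y[g] += yi
        sum_exb[g] += math.exp(xbi)
    return {g: math.log(sum_y[g] / sum_exb[g]) if sum_y[g] > 0 else None
            for g in sum_y}

def profile_loglik(y, xb, group):
    """Poisson log likelihood (without log y!) with the group intercepts concentrated out.

    Observations in all-zero groups are skipped: their contribution tends
    to 0 as alpha -> -inf.
    """
    alphas = concentrated_alphas(y, xb, group)
    ll = 0.0
    for yi, xbi, g in zip(y, xb, group):
        if alphas[g] is None:
            continue
        log_mu = alphas[g] + xbi
        ll += yi * log_mu - math.exp(log_mu)
    return ll

y = [0, 2, 1, 0, 0, 0, 3, 1]
xb = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5, 0.2, -0.1]
group = ["A", "A", "A", "B", "B", "C", "C", "C"]
alphas = concentrated_alphas(y, xb, group)
print(alphas["B"])  # None: group B never has a positive count
print(profile_loglik(y, xb, group))
```

    An optimizer would then maximize profile_loglik over (a, b, c) only, recomputing xb from the experience series at each trial value.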

      Best, Pushan
