  • maximum likelihood OLS regression in Statsby

    Hi Statalisters,

    I am trying to estimate the intercept and slope for each individual using the -statsby- command with a linear regression fitted by maximum likelihood, but I have hit a wall. The estimation runs, but the estimates are not posted in the new dataset that -statsby- creates. Can someone suggest how to get around this problem? An example is given below:

    Code:
    clear
    input float(id year weight treat)
     1 0    89.72 1
     1 1   90.177 1
     1 2 93.59666 1
     2 0   84.507 1
     2 1   88.757 1
     2 2   50.763 1
     3 0   73.043 1
     3 1   74.603 1
     3 2   62.313 1
     4 0   82.553 1
     4 1   90.303 1
     4 2   82.823 1
     5 0    96.54 1
     5 1   89.697 1
     5 2   87.447 1
     6 0   29.727 0
     6 1    31.58 0
     6 2   24.757 0
    10 0   75.783 1
    10 1    78.18 1
    10 2   76.493 1
    end

    Code:
    capture program drop lfols
    program lfols
      version 14.1
      args lnf xb lnsigma
      local y "$ML_y1"
      quietly replace `lnf' = ln(normalden(`y', `xb',exp(`lnsigma')))
    end
    
    ml model lf lfols (xb: weight = year) (lnsigma:)
    statsby cons=_b[xb:_cons] slope= _b[xb:year], by(treat id) clear:  ml maximize
    
    list

    HTML Code:
           treat   id   cons   slope

      1.       0    6      .       .
      2.       1    1      .       .
      3.       1    2      .       .
      4.       1    3      .       .
      5.       1    4      .       .

      6.       1    5      .       .
      7.       1   10      .       .

    Any help will be appreciated.

    Regards,

    Madu
    Last edited by Madu Abuchi; 19 Apr 2019, 20:03. Reason: Reduced length of example data

  • #2
    You can discover the problem for yourself by adding the -noisily- option to your -statsby- command. Stata will then tell you that you have to run an -ml model- command each time; you cannot just run -ml maximize- by itself. But, of course, -statsby- will only take one command. You can get around that by wrapping -ml model- and -ml maximize- in a program and then having -statsby- iterate that. But if you're going to that much trouble, you may as well do it the easy and quick way with -runby- instead.
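
    For reference, the wrapper route could look roughly like the sketch below. It is untested and rests on two assumptions: that -statsby- hands each group to the command through an -if- qualifier, so the wrapper has to accept one, and that the -lfols- evaluator from #1 is already defined. The name ml_by_group is made up for illustration.

    Code:
    capture program drop ml_by_group
    program define ml_by_group
        version 14.1
        syntax [if]
        marksample touse
        * set up and fit the model in one step, on the marked sample only
        ml model lf lfols (xb: weight = year) (lnsigma:) if `touse', maximize
    end

    statsby cons=_b[xb:_cons] slope=_b[xb:year], by(treat id) clear: ml_by_group

    The -runby- version is as follows.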

    Code:
    capture program drop lfols
    program lfols
      version 14.1
      args lnf xb lnsigma
      local y "$ML_y1"
      quietly replace `lnf' = ln(normalden(`y', `xb',exp(`lnsigma')))
    end
    
    capture program drop one_group
    program define one_group
        ml model lf lfols (xb: weight = year) (lnsigma:)
        ml maximize
        gen cons = _b[xb:_cons]
        gen slope = _b[xb:year]
        exit
    end
    
    runby one_group, by(treat id)

    Notes:

    1. Instead of creating a separate data file, the above puts the values of cons and slope into the original data set in variables with those same names, in the observations with the corresponding values of treat and id. If you prefer to have a separate data set such as the one -statsby- would have given you, you can just -keep treat id cons slope- and then -duplicates drop-.

    2. If your data set is large, add the -status- option to the -runby- command so you will get a progress report as the calculations proceed.

    3. -runby- is written by Robert Picard and me, and is available from SSC. If your data set is at all sizeable, this approach will be much faster than -statsby-.
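
    Notes 1 and 2 might be combined as in the sketch below, which assumes -runby- is installed from SSC and that -lfols- and -one_group- are defined as above.

    Code:
    * run the per-group estimation with a progress report
    runby one_group, by(treat id) status
    * reduce to one row per group, like the data set -statsby- would have produced
    keep treat id cons slope
    duplicates drop
    list, sepby(treat)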

    Added: I assume you are doing this as a learning experience, or that this isn't your real problem, just a simplified version of it. There is no reason to use maximum likelihood estimation to do linear regression: the OLS estimator produces the same coefficient estimates as ML and is much quicker. In fact, if this is your real problem and you just need group-specific regression coefficients, the whole thing can be reduced to a single line of code:

    Code:
    rangestat (reg) weight year, by(id treat) interval(year . .)

    Note: -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer, and is also available from SSC.
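
    A rough way to check the one-line approach is to compare one group against -regress-. The sketch assumes -rangestat- is installed from SSC and that it stores the coefficients in variables named b_cons and b_year (an assumption to confirm in its help file); id 1 is an arbitrary choice.

    Code:
    rangestat (reg) weight year, by(id treat) interval(year . .)
    * one row per group is enough, since the results are constant within group
    list id treat b_cons b_year if year == 0, noobs
    * the same regression for a single group, for comparison
    regress weight year if id == 1

    The b_cons and b_year values listed for id 1 should match the -regress- coefficients.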
    Last edited by Clyde Schechter; 19 Apr 2019, 20:23.



    • #3
      Thank you, Clyde. I much appreciate your help and the -runby- program. So amazing!

      Added: You are right. I have done this using -regress- and -statsby- before, and the point estimates look the same. My confusion arose when I was asked to estimate the slopes by maximum likelihood, which is why I attempted to write an ML estimation approach.

      Another thing I noticed is that -regress- reports |t| while ML reports |z|, with slight differences in their p-values. I am not sure whether these differences matter much in any real-world sense.
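
      A quick way to see both points side by side is to fit the same group with each estimator. This is only a sketch: it assumes the example data and the -lfols- evaluator from #1 are in memory, and id 1 is an arbitrary choice.

      Code:
      * OLS: t statistics, residual variance RSS/(n - k)
      regress weight year if id == 1
      * ML: z statistics, residual variance RSS/n
      ml model lf lfols (xb: weight = year) (lnsigma:) if id == 1, maximize
      ml display

      The coefficients should agree, while the ML standard errors should be smaller by a factor of about sqrt((n - k)/n), where n is the number of observations and k the number of regression coefficients.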


      Last edited by Madu Abuchi; 19 Apr 2019, 20:40.



      • #4
        Maximum likelihood estimation is asymptotically correct: it is a large-sample procedure. OLS can be used appropriately with small or large samples. As the sample size goes to infinity, the t distribution converges to the standard normal, so the t-based and z-based p-values converge as well; in fact, the two are, for practical purposes, already equal at a sample size of about 60.
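
        The convergence can be seen numerically with Stata's -ttail()- and -normal()- functions. The sketch below fixes the test statistic at 2 (an arbitrary value chosen for illustration) and prints the two-sided p-value under the t distribution for increasing degrees of freedom, next to the normal p-value.

        Code:
        local stat = 2
        foreach df of numlist 1 5 10 30 60 120 1000 {
            * two-sided p-values: t with `df' degrees of freedom versus standard normal
            display "df = " %4.0f `df' "   p(t) = " %6.4f 2*ttail(`df', `stat') "   p(z) = " %6.4f 2*normal(-`stat')
        }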

