maximum likelihood OLS regression in Statsby

Madu Abuchi

Join Date: Sep 2017
Posts: 143

19 Apr 2019, 19:58

Hi Statalisters,

I am trying to estimate the intercept and slope for each individual using the -statsby- command and a maximum likelihood version of OLS regression, but I have hit a wall. The estimation runs, but the estimates are not posted in the new dataset created by -statsby-. Can someone suggest how to get around this problem? An example is given below:

Code:

clear
input float(id year weight treat)
 1 0    89.72 1
 1 1   90.177 1
 1 2 93.59666 1
 2 0   84.507 1
 2 1   88.757 1
 2 2   50.763 1
 3 0   73.043 1
 3 1   74.603 1
 3 2   62.313 1
 4 0   82.553 1
 4 1   90.303 1
 4 2   82.823 1
 5 0    96.54 1
 5 1   89.697 1
 5 2   87.447 1
 6 0   29.727 0
 6 1    31.58 0
 6 2   24.757 0
10 0   75.783 1
10 1    78.18 1
10 2   76.493 1
end

Code:

capture program drop lfols
program lfols
  version 14.1
  args lnf xb lnsigma
  local y "$ML_y1"
  quietly replace `lnf' = ln(normalden(`y', `xb',exp(`lnsigma')))
end

ml model lf lfols (xb: weight = year) (lnsigma:)
statsby cons=_b[xb:_cons] slope= _b[xb:year], by(treat id) clear:  ml maximize

list

HTML Code:

     treat   id   cons   slope
  1.     0    6      .       .
  2.     1    1      .       .
  3.     1    2      .       .
  4.     1    3      .       .
  5.     1    4      .       .
  6.     1    5      .       .
  7.     1   10      .       .

Any help will be appreciated.

Regards,

Madu

Last edited by Madu Abuchi; 19 Apr 2019, 20:03. Reason: Reduced length of example data

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

19 Apr 2019, 20:11

You can discover the problem for yourself by adding a -noisily- option to your -statsby- command. Then Stata will tell you that you have to run an -ml model- command each time; you can't just run -ml maximize- by itself. But, of course, -statsby- will only take one command. You can get around that by wrapping -ml model- and -ml maximize- into a program and then having -statsby- iterate that. But if you're going to that much trouble, you may as well do it the easy and quick way with -runby- instead.

Code:

capture program drop lfols
program lfols
    version 14.1
    args lnf xb lnsigma
    local y "$ML_y1"
    quietly replace `lnf' = ln(normalden(`y', `xb',exp(`lnsigma')))
end

capture program drop one_group
program define one_group
    ml model lf lfols (xb: weight = year) (lnsigma:)
    ml maximize
    gen cons = _b[xb:_cons]
    gen slope = _b[xb:year]
    exit
end

runby one_group, by(treat id)

Notes:

1. Instead of creating a separate data file, the above puts the values of cons and slope into the original data set in variables with those same names, in the observations with the corresponding values of treat and id. If you prefer to have a separate data set such as the one -statsby- would have given you, you can just -keep treat id cons slope- and then -duplicates drop-.

2. If your data set is large, add the -status- option to the -runby- command so you will get a progress report as the calculations proceed.

3. -runby- is written by Robert Picard and me, and is available from SSC. If your data set is at all sizeable, this approach will be much faster than -statsby-.
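
Notes 1 and 2 can be combined in a few lines. This is only a sketch: it assumes the example data from post #1 are freshly loaded (so that cons and slope do not already exist), that lfols and one_group are defined as in the code above, and that -runby- has been installed from SSC.

```stata
* Assumes: example data in memory, lfols and one_group already defined,
* and -runby- installed (ssc install runby).
runby one_group, by(treat id) status   // -status- prints a progress report

* Reduce to one observation per group, similar to what -statsby- would save
keep treat id cons slope
duplicates drop
list, sepby(treat)
```

Because cons and slope are constant within each treat-id group, -duplicates drop- should leave one observation per group.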

Added: I assume you are doing this as a learning experience, or that this isn't your real problem, just a simplified version of it. There is no reason to use maximum likelihood estimation to do linear regression: the OLS estimator produces the same coefficient estimates as ML and is much quicker. In fact, if this is your real problem and you just need group-specific regression coefficients, the whole thing can be reduced to a single line of code:

Code:

rangestat (reg) weight year, by(id treat) interval(year . .)

Note: -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer, and is also available from SSC.
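
For completeness, the -statsby- route described at the top of this reply (wrapping -ml model- and -ml maximize- in one program) might look roughly like the sketch below. The program name ml_one_group is hypothetical and not from the thread. The sketch assumes the example data from post #1 are in memory. It also assumes, per my reading of the -statsby- documentation, that -statsby- passes an if qualifier for each by-group, which is why the wrapper accepts [if] [in] and hands the sample marker on to -ml model-.

```stata
* Sketch only: a wrapper so that -statsby- reruns -ml model- in every group.
* Assumes the example data from post #1 are in memory.
* ml_one_group is a hypothetical name chosen for illustration.

capture program drop lfols
program lfols
    version 14.1
    args lnf xb lnsigma
    local y "$ML_y1"
    quietly replace `lnf' = ln(normalden(`y', `xb', exp(`lnsigma')))
end

capture program drop ml_one_group
program define ml_one_group
    version 14.1
    syntax [if] [in]
    marksample touse            // 1 for observations in the current by-group
    ml model lf lfols (xb: weight = year) (lnsigma:) if `touse'
    ml maximize                 // leaves its results in e(), so _b[] is available
end

statsby cons=_b[xb:_cons] slope=_b[xb:year], by(treat id) clear: ml_one_group
list
```

If the wrapper did not accept an if qualifier, the -forcedrop- option of -statsby- would be the alternative to look into.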

Last edited by Clyde Schechter; 19 Apr 2019, 20:23.
Madu Abuchi

Join Date: Sep 2017

Posts: 143
#3

19 Apr 2019, 20:25

Thank you Clyde. Much appreciate your help and the -runby- program....so amazing!

Added: You are right. I have done this using -regress- and -statsby- before, and the point estimates look the same. My confusion arose when I was asked to estimate the slopes by maximum likelihood, which is why I attempted to write an ML estimator.

Another thing I noticed is that -regress- reports |t| while ML reports |z|, with slight differences in their p-values. I am not sure whether these differences matter in any real-world sense.

Last edited by Madu Abuchi; 19 Apr 2019, 20:40.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

19 Apr 2019, 21:08

Maximum likelihood estimation is asymptotically correct: it is a large sample procedure. OLS can be used appropriately with small or large samples. As the sample size goes to infinity, the t-statistic approaches the z-statistic and, in fact, the two are, for practical purposes, equal already at a sample size of about 60.
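
The convergence of t to z can be checked directly with Stata's built-in functions. The lines below are a small illustration, not part of the original reply. They print the two-sided 5% critical values, which are about 1.96 for the normal, about 2.00 for t with 60 degrees of freedom, and about 12.71 for t with 1 degree of freedom (the case of 3 observations and 2 estimated coefficients, as in each group of the example data).

```stata
* Two-sided 5% critical values: standard normal versus Student's t
display invnormal(0.975)       // z critical value
display invttail(60, 0.025)    // t critical value, 60 degrees of freedom
display invttail(1, 0.025)     // t critical value, 1 degree of freedom
```

With very few observations per group, the t-based and z-based p-values can therefore differ substantially even when the coefficient estimates agree.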