  • Appropriate regression model

    My dataset records the amount of lubricant added to a process at various times. The lubricant is added in predetermined volumes:
    Code:
    . tab lub
    
            lub |      Freq.     Percent        Cum.
    ------------+-----------------------------------
         100.00 |          1        0.65        0.65
         200.00 |          3        1.95        2.60
         300.00 |          5        3.25        5.84
         400.00 |         26       16.88       22.73
         450.00 |          1        0.65       23.38
         500.00 |          1        0.65       24.03
         600.00 |         57       37.01       61.04
         800.00 |         16       10.39       71.43
         900.00 |         11        7.14       78.57
        1000.00 |          7        4.55       83.12
        1050.00 |          1        0.65       83.77
        1200.00 |         21       13.64       97.40
        1400.00 |          1        0.65       98.05
        1500.00 |          2        1.30       99.35
        2000.00 |          1        0.65      100.00
    ------------+-----------------------------------
          Total |        154      100.00
    
    . reg lub t
    
          Source |       SS           df       MS      Number of obs   =       151
    -------------+----------------------------------   F(1, 149)       =     10.84
           Model |  996788.379         1  996788.379   Prob > F        =    0.0012
        Residual |  13696754.7       149   91924.528   R-squared       =    0.0678
    -------------+----------------------------------   Adj R-squared   =    0.0616
           Total |    14693543       150  97956.9536   Root MSE        =    303.19
    
    ------------------------------------------------------------------------------
             lub |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               t |   4.627758   1.405351     3.29   0.001     1.850765     7.40475
           _cons |   513.9383   64.98879     7.91   0.000     385.5196     642.357
    ------------------------------------------------------------------------------
    When I investigate the residuals using dpplot and qenvnormal from SSC, I have no worries about normality.

    My concern is that lub, although a continuous variable, can take only specific values. Is it legitimate to use OLS in this situation, or should I consider other techniques such as npregress?

    I would be grateful for any advice.

    This is a data subset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(lub t)
     400  54
     600  27
    1000  42
     800  29
     200  30
     200  29
     400  34
     400  48
     400  39
     400  36
     300  43
     800  12
     100  48
    2000  51
    1000  46
     400  29
     600  30
     400  15
     400  37
     400  31
     600  19
     800  66
     800  21
     400  29
     800  29
     400  38
     400  27
     400  61
     400  59
     400  46
    1400  54
     400  29
     800  43
     800  22
     600  29
     800  67
     600 100
     400  66
    1200  67
     800   .
     400  31
     800  29
     800  32
     600  67
     800  55
     600  59
     400  61
    1200  60
    1000  51
     400  18
    end
    Thank you,

    Janet

    Stata IC 16.0

  • #2
    In principle, I see no great problem with using OLS here. Given a non-negative response and the discussion at

    https://blog.stata.com/2011/08/22/us...tell-a-friend/

    I would, however, be more tempted to use

    Code:
    poisson lub t , vce(robust)
    even though for your example data the answer is very similar.
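
    A minimal sketch of that comparison, assuming only the first 15 rows of the data example in #1 (so the estimates will not match the full-data output shown there). The variable names yhat_ols and yhat_pois are made up for illustration:

    ```stata
    * Sketch only: first 15 rows of the -dataex- subset posted in #1
    clear
    input int(lub t)
     400  54
     600  27
    1000  42
     800  29
     200  30
     200  29
     400  34
     400  48
     400  39
     400  36
     300  43
     800  12
     100  48
    2000  51
    1000  46
    end

    * Linear model for the conditional mean, as in #1
    regress lub t
    predict double yhat_ols, xb

    * Exponential-mean model fitted by Poisson quasi-likelihood;
    * vce(robust) keeps the standard errors usable when lub is not a count
    poisson lub t, vce(robust)
    predict double yhat_pois, n

    * Put the two sets of fitted values side by side
    summarize yhat_ols yhat_pois
    correlate yhat_ols yhat_pois
    ```

    With vce(robust), the Poisson fit only needs the conditional mean to be correctly specified as exp(xb); lub does not have to be a count, and its taking only certain values is not a problem in itself.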

  • #3
    Thank you for both the advice and reference.
    Janet
