
  • offset variable in panel data GEE

    Hi, Stata users,

    I would like some explanation of the "offset variable".
    My dataset, shown below, is an unbalanced panel.
    I want to find out whether the level of noise exposure affects the occurrence of disease.

    I heard that an offset variable can be used to take exposure time into account in the analysis. So,
    when I run xtgee with "year" as the offset variable (year is the calendar year in which the subject
    participated in the survey), the result is significant (higher exposure raises the probability of the outcome).
    But when I do not use year as an offset variable, the result is not significant.

    Am I wrong to enter "year" as the offset variable?

    Thank you in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(P_no R_time atopyDx1y year age sex) byte noise_q double Noisemodel
    1 1 1 2013  9 2 1  42.81393064795522
    1 2 1 2015 11 2 1  42.20467563905939
    2 1 1 2013  9 2 1 43.211582234526695
    2 2 1 2015 11 2 1  42.78365005924527
    3 1 0 2013  7 2 3 50.711780353822846
    end
    -----------------


    xtset P_no year
    panel variable: P_no (unbalanced)
    time variable: year, 2009 to 2018, but with gaps
    delta: 1 unit

    . xtgee atopyDx1y Noisemodel, family(binomial 1) link(logit) offset(year) corr(ex
    > changeable) vce(robust)

    Iteration 1: tolerance = .00693255
    Iteration 2: tolerance = .00191436
    Iteration 3: tolerance = .00073082
    Iteration 4: tolerance = .00029676
    Iteration 5: tolerance = .00012216
    Iteration 6: tolerance = .00005058
    Iteration 7: tolerance = .000021
    Iteration 8: tolerance = 8.726e-06
    Iteration 9: tolerance = 3.629e-06
    Iteration 10: tolerance = 1.509e-06
    Iteration 11: tolerance = 6.278e-07

    GEE population-averaged model                   Number of obs      =     10976
    Group variable:                       P_no      Number of groups   =      4731
    Link:                                logit      Obs per group: min =         1
    Family:                           binomial                     avg =       2.3
    Correlation:                  exchangeable                     max =         4
                                                    Wald chi2(1)       =    631.30
    Scale parameter:                         1      Prob > chi2        =    0.0000

                                      (Std. Err. adjusted for clustering on P_no)
    ------------------------------------------------------------------------------
                 |             Semirobust
       atopyDx1y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      Noisemodel |    .189584   .0075454    25.13   0.000     .1747953    .2043728
           _cons |  -2024.045   .3792067 -5337.58   0.000    -2024.788   -2023.302
            year |          1   (offset)
    ------------------------------------------------------------------------------

    .

  • #2
    The year variable, as you describe it, is probably not appropriate as an -offset()-. What would be appropriate is if you have a time variable that represented how long the person had been exposed to noise. And even then, it would be better used in the -exposure()- option rather than the -offset()- option.
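
    As a rough sketch of the -exposure()- suggestion: the code below assumes a hypothetical variable yrs_exposed (how many years each subject has been exposed to noise), which is not in the posted data, and it keeps the family, link, and correlation structure from the original command unchanged. Since -exposure(varname)- enters ln(varname) with its coefficient constrained to 1, the two commands should fit the same model.

    ```stata
    * ASSUMPTION: yrs_exposed is a hypothetical variable (years of noise exposure);
    * it does not exist in the posted data and must be strictly positive.
    xtset P_no year

    * Version 1: build the log of exposure time by hand and pass it to offset()
    generate double ln_yrs = ln(yrs_exposed)
    xtgee atopyDx1y Noisemodel, family(binomial 1) link(logit) ///
        offset(ln_yrs) corr(exchangeable) vce(robust)

    * Version 2: let exposure() take the log internally
    xtgee atopyDx1y Noisemodel, family(binomial 1) link(logit) ///
        exposure(yrs_exposed) corr(exchangeable) vce(robust)
    ```

    The second form is less error-prone only because it avoids forgetting the log transformation; whether an exposure-time term belongs in the model at all still depends on the study design.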

    When you use year in the -offset()- option you are fitting a model in which the coefficient of year is constrained to be exactly 1, so that the log odds of disease increase by exactly 1 with every calendar year. That is a very odd model and I doubt that any natural process works like that.
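
    To see the constraint in numbers, the coefficients reported in the posted output can be plugged in by hand. This is only an illustrative sketch: the noise level of 43 is an arbitrary value close to those in the -dataex- example, and the scalar names are made up for the illustration.

    ```stata
    * Fitted model from the posted output:
    *   logit(p) = _cons + b*Noisemodel + 1*year     (year is the offset)
    scalar b_noise = .189584
    scalar b_cons  = -2024.045

    * Same noise level (43, chosen for illustration), two consecutive calendar years
    scalar xb2013 = b_cons + b_noise*43 + 2013
    scalar xb2014 = b_cons + b_noise*43 + 2014

    * The log odds differ by the offset coefficient, which is fixed at 1
    display "difference in log odds:               " xb2014 - xb2013
    display "implied odds ratio per calendar year: " exp(xb2014 - xb2013)

    * Corresponding predicted probabilities
    display "predicted probability, 2013: " invlogit(xb2013)
    display "predicted probability, 2014: " invlogit(xb2014)
    ```

    The implied odds ratio per calendar year is exp(1), about 2.72, whatever the data say. It is presumably also why _cons is close to -2024: with year entering at a coefficient of 1, the intercept has to absorb values of about 2009 to 2018.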

    By the way, if you want to do honest science, you must never decide what model to use based on whether it produces "significant" results or not. You must first pick an appropriate model that is a suitable match to the design of your study and the real-world data generating process, and then go with the results it provides. If you want to do additional exploratory analyses after that, that's fine, but their conclusions should always be regarded as hypothesis generating, not as resolving any research question.


    • #3
      Thank you for your answer! I will also keep in mind what you said about honest science.
      This time I did have the idea that I should take into account how long the subjects had been exposed,
      but I wasn't sure I was doing it the right way.
      I'll try a variable for how long each participant has been exposed, as you recommended.

      Thank you again!


      • #4
        When logit is used to estimate an odds ratio of disease among observations exposed to a risk factor for different lengths of time, only the offset option is available, not exposure (as there is in poisson). Can offset(ln_time) be used in this case to assess the odds ratio, and what are the units of the result? (I'm not sure whether interpreting it as an odds ratio per 1 year is correct.)
