  • Logistic regression with probabilities instead of binary values

    Hi everyone,

    I would like to run a logistic regression on a variable, distance, to estimate the probability of an outcome, valid. However, instead of rows in the format
    distance valid
    500 1
    500 1
    500 0
    1000 1
    1000 0
    I have counts, for example:
    distance valid_count total_count
    500 2 3
    1000 1 2
    I reasoned it should be possible to run the logistic regression with the data as it is, using frequency weights. I generated a new variable valid_ratio with
    Code:
    . gen valid_ratio = valid_count/total_count
    but when running the regression I get the following error:

    Code:
    . logistic valid_ratio distance [fweight = total_count]
    outcome does not vary; remember:
                                      0 = negative outcome,
            all other nonmissing values = positive outcome
    Which I assume means Stata is interpreting all my ratios as positive outcomes.

    My first question is: is it possible to run this logistic regression with the data as it is? If so, how?

    If not, what is the best way to process the data to make this regression possible? The two options that come to my mind are:
    1. generate, for every currently existing observation, two rows: one with valid=1 and count=valid_count, and the other with valid=0 and count=total_count-valid_count, and then run the regression with
      Code:
      . logistic valid distance [fweight = count]
      In this case, the variables would look like
      distance valid count
      500 1 2
      500 0 1
      1000 1 1
      1000 0 1
    2. generate, for each currently existing observation, total_count rows, with valid_count of them having valid=1 and total_count-valid_count of them having valid=0. In this case, the variables would be exactly in the format of the first table, and I would run the regression without any weights:
      Code:
      . logistic valid distance
    I prefer option 1 because it looks cleaner, but maybe that won't work with the frequency weights?
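
For readers who want to try the two options above, here is a minimal sketch. The first two data rows come from the counts table in the post; the other two rows and the helper variable id are hypothetical, added only so the toy model is not saturated. Frequency weights tell Stata that each row stands for that many identical observations, so both fits should report the same estimates.

```stata
* Toy data: rows 1-2 are from the post, rows 3-4 are hypothetical
clear
input distance valid_count total_count
 500 2 3
1000 1 2
1500 1 4
2000 0 3
end

* Option 1: two rows per original observation plus a frequency weight
gen long id = _n                      // hypothetical row identifier
expand 2                              // duplicate every row once
bysort id: gen byte valid = (_n == 1) // first copy = success, second = failure
gen long count = cond(valid, valid_count, total_count - valid_count)
drop if count == 0                    // zero-count rows carry no information
logistic valid distance [fweight = count]

* Option 2: one row per individual trial, no weights
* (expand treats values below 1 as 1, hence the drop above)
expand count
logistic valid distance
```

Option 1 keeps the dataset small, which matters when total_count is large; option 2 is only needed for commands that do not accept frequency weights.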

    Thanks in advance.
    Last edited by Clara Daru; 05 Nov 2020, 12:05.

  • #2
    Clara: You want to use fracreg logit. It is intended precisely for fractional responses. Be sure to use the margins command.

    Alternatively, you could use binomial regression and implement it using the glm command.
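
A minimal sketch of both suggestions, assuming the variable names from the original post (distance, valid_count, total_count, valid_ratio). The distance values passed to margins are only illustrative.

```stata
* Fractional logit on the proportion, then predicted proportions
* at chosen distances (values here are illustrative)
gen double valid_ratio = valid_count / total_count
fracreg logit valid_ratio distance
margins, at(distance = (500 1000))

* Binomial regression on the grouped counts: valid_count successes
* out of total_count trials, with a logit link
glm valid_count distance, family(binomial total_count) link(logit)
```

One difference worth keeping in mind: as written, the fracreg call treats every row equally, whereas the binomial glm uses total_count as the number of trials behind each row.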

    JW



    • #3
      Hello Clara Daru. I wonder if -fracreg logit- is what you are looking for. E.g.,

      Code:
      fracreg logit valid_ratio distance
      HTH
      Last edited by Bruce Weaver; 05 Nov 2020, 12:10. Reason: Crossed with Jeff's post (#2).
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)



      • #4
        Thanks Jeff Wooldridge and Bruce Weaver , fracreg was exactly what I needed!



        • #5
          No worries, Clara, and welcome to Statalist. (I failed to notice last time that it was your first post.)
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)
