Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generalized beta distribution in probabilistic question

    Hi everyone,

    I'm currently trying to fit a generalized beta distribution for every individuals that I have in a survey in order to take interquartile range of the responses as proxy for uncertainty levels. The formulation derives from a famous paper of Engelberg and Manski (2009) that is called "Comparing the Point Predictions and Subjective Probability Distributions of Professional Forecasters", widely used in the surveys of the fed and the ecb.
    I was wondering for a way to derive the distribution's percentiles directly from STATA, that is not something that I found up to now (maybe I'm wrong).

    In particular, the distribution is fitted on the outcomes that individuals give to a probabilistic question where they can attach different probability (summed up to 100) from 0 to 100 to different bins of inflation expectations, in my case I have:
    • Increase by 8% or more
    • Increase by 4% to less than 8%
    • Increase by 2% to less than 4%
    • Increase by 0% to less than 2%
    • Decrease by more than 0% to less than 2%
    • Decrease by 2% to less than 4%
    • Decrease by 4% to less than 8%
    • Decrease by 8% or more
    the estimate of the distribution is the following

    \[
    \min_{\substack{a > 1,\, b > 1,\\ l > L,\, r > r}} \sum_{i=1}^{8} \left[ B(x_i; a, b, l, r) - F(x_i) \right]^2
    \]


    The distribution parameters (a,b,l,r)(a, b, l, r)(a,b,l,r) are estimated by minimizing the squared distance between the cumulative beta distribution and the respondent’s empirical distribution over the 8 bins.
    • a,b>1 are shape parameters.
    • l,r are location parameters, constrained by lower (L) and upper (R) bounds if the outer bins are selected (i.e., bins 1 or 8).
    • F(x_i) represents the cumulative empirical distribution based on the respondent’s probability allocations.
    Depending on the bins selected:
    • If neither bin 1 nor bin 8 is used → only a and b are estimated; l and r are fixed.
    • If one of bin 1 or 8 is used → estimate a,b and the corresponding location parameter.
    • If both outer bins are used → all four parameters (a,b,l,r) are estimated.
    Initialization and bounds:
    • Initial parameter guesses: a=b=2, l=−12, r=12
    • Constraints: L=−38, R=+38
    Sorry for the length of the post,
    if you have any idea on how to proceed, or if you know whether someone already implemented this methodology on STATA, I would be very glad.



    Thanks,
    best regards,

    Riccardo
    Last edited by Riccardo Rasoni; 13 May 2025, 10:07. Reason: beta distribution

  • #2
    Dear Riccardo,

    Stata does not natively support estimation of the full four-parameter generalized beta distribution, but it is possible to implement the procedure manually. The key steps involve (1) constructing the empirical cumulative distribution for each respondent based on their assigned probabilities over the eight bins; (2) defining the generalized beta cumulative distribution function using Stata’s ibeta() function, which computes the regularized incomplete beta function; and (3) estimating the parameters (a, b, l, r) by minimizing the squared error between the empirical CDF and the generalized beta CDF. This can be done using Stata’s ml command or Mata's optimization routines. You can fix or estimate l and r depending on whether the outer bins are used, as described in your message.

    Here is a basic example of how you can define the generalized beta CDF as a user-defined program in Stata:

    Code:
    program define gbeta_cdf, rclass
        // Arguments: x = evaluation point, a = shape1, b = shape2, l = lower bound, r = upper bound
        args x a b l r
        tempvar z cdf
        gen double `z' = (`x' - `l') / (`r' - `l')
        gen double `cdf' = ibeta(`a', `b', `z')
        return scalar cdf = `cdf'
    end

    Once parameters are estimated, extracting percentiles such as the interquartile range is straightforward using the inverse CDF of the generalized beta. In Stata, this is available through the invibeta() function. For example, the 25th percentile can be computed as l + (r − l) * invibeta(a, b, 0.25), and similarly for the 75th percentile. The difference between these gives you the interquartile range, which serves as your uncertainty proxy.

    This approach will require looping over individuals or using statsby to apply the estimation per respondent. If you are interested, I can provide a template .do or .ml file that sets up the estimation for one respondent, which you could then scale to your full sample.

    Best regards,
    Josh from Estima

    Comment


    • #3
      Originally posted by Josh Zweig View Post
      Dear Riccardo,

      Stata does not natively support estimation of the full four-parameter generalized beta distribution, but it is possible to implement the procedure manually. The key steps involve (1) constructing the empirical cumulative distribution for each respondent based on their assigned probabilities over the eight bins; (2) defining the generalized beta cumulative distribution function using Stata’s ibeta() function, which computes the regularized incomplete beta function; and (3) estimating the parameters (a, b, l, r) by minimizing the squared error between the empirical CDF and the generalized beta CDF. This can be done using Stata’s ml command or Mata's optimization routines. You can fix or estimate l and r depending on whether the outer bins are used, as described in your message.

      Here is a basic example of how you can define the generalized beta CDF as a user-defined program in Stata:

      Code:
      program define gbeta_cdf, rclass
      // Arguments: x = evaluation point, a = shape1, b = shape2, l = lower bound, r = upper bound
      args x a b l r
      tempvar z cdf
      gen double `z' = (`x' - `l') / (`r' - `l')
      gen double `cdf' = ibeta(`a', `b', `z')
      return scalar cdf = `cdf'
      end

      Once parameters are estimated, extracting percentiles such as the interquartile range is straightforward using the inverse CDF of the generalized beta. In Stata, this is available through the invibeta() function. For example, the 25th percentile can be computed as l + (r − l) * invibeta(a, b, 0.25), and similarly for the 75th percentile. The difference between these gives you the interquartile range, which serves as your uncertainty proxy.

      This approach will require looping over individuals or using statsby to apply the estimation per respondent. If you are interested, I can provide a template .do or .ml file that sets up the estimation for one respondent, which you could then scale to your full sample.

      Best regards,
      Josh from Estima
      Dear Josh,

      thank you very much for your answer, it was very helpful. Of course if you can provide the template that you mentioned I would be very glad
      thanks

      Riccardo

      Comment


      • #4
        You're very welcome, Riccardo. I'm glad it was helpful. Below is a template using Stata's ml command to estimate the generalized beta distribution for a single individual’s forecast. This assumes you've already processed the data into cumulative probabilities F_i and bin midpoints x_i.

        Step 1: Input Data for One Respondent

        Assume you have 8 observations for one individual, with variables:
        • x = bin midpoints
        • F = cumulative empirical distribution (based on assigned probabilities)
        Code:
        * Example data structure (1 respondent)
        clear
        input byte bin float(x F)
        1 -10 0.10
        2  -6 0.20
        3  -2 0.35
        4   1 0.55
        5   3 0.75
        6   5 0.90
        7   7 0.95
        8  10 1.00
        end
        
        *Step 2: Define the Generalized Beta CDF Evaluation
        program define beta_obj
            version 19.0
            args lnf a b l r
        
            tempvar z model_F
            gen double `z' = (x - `l') / (`r' - `l')
            gen double `model_F' = ibeta(`a', `b', `z')
        
            quietly replace `lnf' = -((F - `model_F')^2)
        end
        
        *Step 3: Estimate Parameters Using ml
        
        ml model lf beta_obj (a b l r)
        ml init a 2 b 2 l -12 r 12
        ml constrain 1 a > 1
        ml constrain 2 b > 1
        ml maximize
        
        *Step 4: Calculate Percentiles After Estimation
        ml model lf beta_obj (a b l r)
        ml init a 2 b 2 l -12 r 12
        ml constrain 1 a > 1
        ml constrain 2 b > 1
        ml maximize
        Best regards,
        Josh Zweig - Estima

        Comment


        • #5
          Originally posted by Josh Zweig View Post
          You're very welcome, Riccardo. I'm glad it was helpful. Below is a template using Stata's ml command to estimate the generalized beta distribution for a single individual’s forecast. This assumes you've already processed the data into cumulative probabilities F_i and bin midpoints x_i.

          Step 1: Input Data for One Respondent

          Assume you have 8 observations for one individual, with variables:
          • x = bin midpoints
          • F = cumulative empirical distribution (based on assigned probabilities)
          Code:
          * Example data structure (1 respondent)
          clear
          input byte bin float(x F)
          1 -10 0.10
          2 -6 0.20
          3 -2 0.35
          4 1 0.55
          5 3 0.75
          6 5 0.90
          7 7 0.95
          8 10 1.00
          end
          
          *Step 2: Define the Generalized Beta CDF Evaluation
          program define beta_obj
          version 19.0
          args lnf a b l r
          
          tempvar z model_F
          gen double `z' = (x - `l') / (`r' - `l')
          gen double `model_F' = ibeta(`a', `b', `z')
          
          quietly replace `lnf' = -((F - `model_F')^2)
          end
          
          *Step 3: Estimate Parameters Using ml
          
          ml model lf beta_obj (a b l r)
          ml init a 2 b 2 l -12 r 12
          ml constrain 1 a > 1
          ml constrain 2 b > 1
          ml maximize
          
          *Step 4: Calculate Percentiles After Estimation
          ml model lf beta_obj (a b l r)
          ml init a 2 b 2 l -12 r 12
          ml constrain 1 a > 1
          ml constrain 2 b > 1
          ml maximize
          Best regards,
          Josh Zweig - Estima
          Wow, impressive work!
          Thank you very much Josh, I really appreciate what you did.
          I immediately go to test it, thanks!
          Best regards,

          Riccardo

          Comment

          Working...
          X