
  • Predict values after "skewnreg" command

    Dear Statalisters,

    I am struggling to solve an arguably simple problem:
    • I have a variable (min:0, max: 100) that is drawn from a skew-normal distribution.
    • I want to fit a skew-normal distribution model and compute the fitted values for the integers from 0 to 100.
    There is an excellent user-written set of commands for skew-normal distributions (ssc install st0207), in particular the skewnreg and skewrplot commands.

    However, I can't figure out how to get this list of hypothetical values between 0 and 100.

    Perhaps I am using the predict command incorrectly?

    Here is a simple example of my attempts with the auto dataset:

    Code:
    sysuse auto, clear
    
    // Run skew-normal-regression
    ** ssc install st0207
    skewnreg mpg
    
    // Show fitted values vs. histogram
    skewrplot, fitted
    
    // Replace variable values with hypothetical range
    set obs 101
    egen range = seq(), f(0) t(100)
    replace mpg = range
    
    // Predict values from fitted model?
    predict mpg_fitted
    Any hint is highly appreciated!

    Best regards,
    John
    Last edited by John Hanser; 01 Dec 2022, 05:56.

  • #2
    Some confusion here. skewnreg is from the Stata Journal and can't be found on SSC at all, so the commented out ssc command won't work.

    From what else I understand I get the impression that you want to stretch predicted values to cover the range from 0 to 100 and what's more for those to be integers too.

    An immediate difficulty is that your example just puts a constant into the predicted variable, as there are no predictors.

    Perhaps your real problem has predictors, and you want to scale to [0, 100] which could be


    Code:
    predict predicted
    
    su predicted
    
    gen wanted = 100 * (predicted - r(min)) / (r(max) - r(min))
    and then an application of round() produces integers.
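
    As a minimal sketch of that rescaling, assuming a model with at least one predictor: regress on weight is used here purely as a stand-in so the example runs on the auto data without any user-written command. The minimum is subtracted in the numerator so that the smallest prediction maps to 0 and the largest to 100.

    ```stata
    sysuse auto, clear

    // regress is only a stand-in (an assumption) for a model with predictors
    regress mpg weight
    predict double predicted

    // meanonly still leaves r(min) and r(max) behind
    summarize predicted, meanonly

    // smallest prediction maps to 0, largest to 100
    generate double wanted = 100 * (predicted - r(min)) / (r(max) - r(min))

    // round() with one argument rounds to the nearest integer
    generate wanted_int = round(wanted)

    summarize wanted wanted_int
    ```

    With no predictors, every prediction is identical, r(max) - r(min) is zero, and wanted would be missing throughout.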

    Or what you want are percentiles of the predicted response.

    The more I think about it, the less I understand what you are doing and trying to do. Sorry this won't help much, and someone else may be able to help more.
    Last edited by Nick Cox; 01 Dec 2022, 07:07.



    • #3
      Thank you for the quick reply, Nick.

      Sorry that I didn't make myself clear. Please ignore the integer part of the question.

      Let me write a few lines about my motivation; maybe that clarifies things:

      I want to be able to answer the question: "How likely is it to draw a number between X and X+1 from the process that generated my observations of variable mpg?".
      Since I don't know the exact number-generating process, I looked at the histogram, and a skew-normal distribution looked suitable.
      Then, I try to fit a skew-normal model with "skewnreg". The resulting "skewrplot" suggests a good fit.


      [Image: Graph.jpg]



      I basically want to access the values of the blue line. Additionally, I want to obtain the estimated "probability" values for non-plotted values up to 100.
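
      One possible way to get such values without predict is to evaluate the skew-normal density and distribution function directly on a grid from 0 to 100. The sketch below assumes the direct parameterization (location, scale, shape); the three parameter values are hypothetical placeholders, not estimates from any model, and how to pull the corresponding estimates out of skewnreg's stored results is not shown. It uses only built-in functions: normalden() and normal() for the density, and binormal() for the distribution function, via the identity F(x) = 2 * binormal(z, 0, -delta) with delta = alpha / sqrt(1 + alpha^2).

      ```stata
      clear
      set obs 101
      generate x = _n - 1                     // grid 0, 1, ..., 100

      // HYPOTHETICAL direct parameters (assumptions, replace with your own)
      scalar sn_xi    = 15                    // location
      scalar sn_omega = 8                     // scale
      scalar sn_alpha = 3                     // shape
      scalar sn_delta = sn_alpha / sqrt(1 + sn_alpha^2)

      // density: f(x) = (2/omega) * phi(z) * Phi(alpha*z), z = (x - xi)/omega
      generate double z = (x - sn_xi) / sn_omega
      generate double density = (2 / sn_omega) * normalden(z) * normal(sn_alpha * z)

      // distribution function at x and at x + 1
      generate double cdf_lo = 2 * binormal(z, 0, -sn_delta)
      generate double cdf_hi = 2 * binormal((x + 1 - sn_xi) / sn_omega, 0, -sn_delta)

      // probability of a draw between x and x + 1
      generate double p_interval = cdf_hi - cdf_lo

      list x density p_interval in 1/10
      ```

      Here density corresponds to the height of a fitted curve at each grid point, while p_interval is the probability mass between X and X+1. A skew-normal distribution has support on the whole real line, so these interval probabilities over 0 to 100 need not sum exactly to 1.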

      Using "predict" might be a totally wrong approach.

      Hope that clarifies my question.

      Best,
      John
      Last edited by John Hanser; 01 Dec 2022, 07:30.



      • #4
        That makes your problem clearer. Although it's presumably not your real problem, the small print from skewnreg mpg indicates that the fit is suspect.

        What you're trying to do strikes me as very tricky statistically. Although trying to read off the cumulative distribution function from the data is tricky -- real data comes with lumps and gaps that don't mean much usually -- you could fit any number of loosely plausible skewed distributions to data like mpg and get different answers.

        But I've never wanted any version of your problem statistically, so I won't try to lay down precepts on what will work best.



        • #5
          Thanks, Nick! Note that the auto data was just an example. My actual data fits substantially better.

          Hope that someone might have a suggestion...
