
  • #16
    Nick is right about multivariate outliers. Even so, I have seen many papers in finance that winsorize or drop values lying more than 3 SD from the mean. For that case, we can adopt the following code:
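    sysuse auto, clear
    foreach x of varlist price mpg {
        quietly summarize `x'
        * abs() catches both tails; !missing() stops Stata treating . as a huge value
        * each variable's mean and SD are computed after the drops for earlier variables
        drop if abs(`x' - r(mean)) > 3*r(sd) & !missing(`x')
    }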
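    For the winsorizing alternative mentioned above, a minimal sketch, assuming "winsorize" here means pulling values beyond mean ± 3 SD back to those bounds instead of dropping them. The new variable names price_w and mpg_w are illustrative, and unlike the drop loop, each variable's bounds come from the full sample because nothing is dropped.

    ```stata
    sysuse auto, clear
    foreach x of varlist price mpg {
        quietly summarize `x'
        * bounds at mean +/- 3 SD, computed on the full (unwinsorized) sample
        local lo = r(mean) - 3*r(sd)
        local hi = r(mean) + 3*r(sd)
        generate double `x'_w = `x'
        * pull values beyond the bounds back to the bounds; missing stays missing
        replace `x'_w = `lo' if `x' < `lo'
        replace `x'_w = `hi' if `x' > `hi' & !missing(`x')
    }
    summarize price price_w mpg mpg_w
    ```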
    Regards
    Attaullah Shah
    Last edited by Attaullah Shah; 24 Sep 2014, 04:10.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



    • #17
      Nick Cox Dear Nick,

      I installed the "extremes" code written by you. I would like to use it to remove extreme values in my sample. However, I do not know how to actually remove those extreme values instead of just listing them. Is there any way to do this?

      Thanks in advance!

      Kind regards,

      Wesley
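      One way to drop, rather than list, the most extreme values is sketched below without relying on extremes itself (no particular extremes option is assumed): rank the variable from each end and drop the k lowest and k highest observations. The choice k = 5 and the use of price from the auto data are hypothetical, and whether dropping such values is wise at all is a separate question.

      ```stata
      sysuse auto, clear
      * k is a hypothetical choice: how many values to drop from each tail
      local k = 5
      * unique ranks: 1 = smallest (rank_lo) or largest (rank_hi); ties broken arbitrarily
      egen rank_lo = rank(price), unique
      egen rank_hi = rank(-price), unique
      * look at what is about to go before dropping it
      list make price if rank_lo <= `k' | rank_hi <= `k'
      drop if rank_lo <= `k' | rank_hi <= `k'
      drop rank_lo rank_hi
      ```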



      • #18
        You have now tacked a question on to a thread that was closed over a year ago. Start a new thread.
        Steve Samuels
        Statistical Consulting

        Stata 14.2



        • #19
          Steve Samuels I started a new thread.
          http://www.statalist.org/forums/forum/general-stata-discussion/general/1336660-remove-outliers-on-stata



          • #20
            Originally posted by Nick Cox
            "Think on a logarithmic scale" solves many more problems than eliminating outliers.
            Sir,

            What should I do if the logged values are also not normal?

            I have a dataset with 9000 observations. Can I assume normality just because the sample is large?

            Denila.



            • #21
              Denila Jinny Not at all. A large sample can be highly non-normal too. To give a better answer, we need to know more about your data and your goals, especially on whether or why you think your data "should be" normal.
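              To make the point concrete, a small simulation, assuming exponential data are a fair stand-in for a skewed variable and Stata 14 or later (for rexponential()). The seed and the variable names y and ln_y are illustrative. Even with 9000 observations the skewness of y stays far from 0, and the log of an exponential variable is skewed the other way, so logging alone is not guaranteed to produce normality.

              ```stata
              clear
              * hypothetical seed and distribution, chosen only for illustration
              set seed 12345
              set obs 9000
              * exponential(1) draws: right-skewed however large the sample
              generate y = rexponential(1)
              summarize y, detail
              * the log of an exponential variable is left-skewed
              generate ln_y = ln(y)
              summarize ln_y, detail
              ```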



              • #22
                Thank you very much, sir, for your quick reply.
                I am working with cross-sectional data. My objective is to study the causal relationships between funding, profitability and productivity. The literature suggests bidirectional relationships among these variables, so I intend to fit a non-recursive SEM, one of whose assumptions is normality. I have 4 continuous variables, 1 interaction term between 2 continuous variables, 2 interaction terms that each interact a continuous variable with a dichotomous variable, and a few other categorical variables. Is this enough information for you to help me with this issue?



                • #23
                  I don't know anything you don't about SEM. My advice is to start a new thread with a title like "Non-normality and structural equation models" so that people who know about SEM can see that. Also, I would show some graphs of the distributions of your continuous variables to give us some flavour.
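                  A minimal sketch of the kind of graphs suggested here, using the auto data as a hypothetical stand-in for the actual funding, profitability and productivity variables: for each continuous variable, a histogram with a normal density overlaid and a normal quantile plot. The graph names are illustrative.

                  ```stata
                  sysuse auto, clear
                  * price, mpg and weight are hypothetical stand-ins for your own continuous variables
                  foreach v of varlist price mpg weight {
                      * histogram with a fitted normal density overlaid
                      histogram `v', normal name(hist_`v', replace)
                      * normal quantile plot: points near the line suggest approximate normality
                      qnorm `v', name(qn_`v', replace)
                  }
                  ```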



                  • #24
                    OK. Thank you very much.



                    • #25
                      * This example shows how to highlight outliers using percentiles
                      clear
                      input x
                      1
                      2
                      12
                      14
                      15
                      14
                      16
                      15
                      14
                      98
                      76
                      end
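                      * Let us look at the outliers in a box plot
                      graph box x
                      * We can then summarize with details; r(p25) and r(p75) hold the quartiles
                      summarize x, detail
                      return list
                      * Flag points below p25 - 1.5 IQR or above p75 + 1.5 IQR (missing x is left missing)
                      generate byte x_outlier = (x < r(p25) - 1.5*(r(p75) - r(p25)) | x > r(p75) + 1.5*(r(p75) - r(p25))) if !missing(x)
                      * List the flagged points rather than keeping only them, so the other data are not lost
                      list x if x_outlier == 1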



                      • #26
                        #25 John W. Tukey proposed a rule of thumb to plot points separately on a box plot if they are greater than p75 + 1.5 IQR or less than p25 - 1.5 IQR.

                        So far, so good. This wasn't a recipe for identifying points to drop. In most cases the occurrence of outliers was, at least for Tukey, a signal to think about a transformation.
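                        A sketch of using the rule as a prompt rather than a deletion recipe, reusing the data from #25: flag points outside Tukey's fences on the raw scale and on a log scale, then compare. The log is only one candidate transformation and needs positive values; the variable names ln_x, out_x and out_ln_x are illustrative. With these eleven values the same four points (1, 2, 76 and 98) are flagged on both scales.

                        ```stata
                        clear
                        input x
                        1
                        2
                        12
                        14
                        15
                        14
                        16
                        15
                        14
                        98
                        76
                        end
                        * log scale as one candidate transformation (requires x > 0)
                        generate ln_x = ln(x)
                        foreach v of varlist x ln_x {
                            quietly summarize `v', detail
                            local iqr = r(p75) - r(p25)
                            * Tukey's fences: flag (do not drop) points strictly outside them
                            generate byte out_`v' = (`v' < r(p25) - 1.5*`iqr' | `v' > r(p75) + 1.5*`iqr') if !missing(`v')
                        }
                        * compare which points are flagged on each scale
                        list x ln_x out_x out_ln_x
                        ```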

