  • Creating dummy variable equal to one if current mean is one standard deviation above/below the historical mean

    Hello everybody,

    I am trying to create a dummy variable that is equal to one if current rainfall in a given district (avgrf) is more than one standard deviation below or above the historical mean of rainfall in that district (lravgrf). I also want to create a dummy equal to one if the current rainfall falls within one standard deviation of that mean.

    Does anyone have any suggestions on how to do that?

    Best, Jérôme

  • #2
    Please follow the FAQ (http://www.statalist.org/forums/help) and use -dataex- to post some example data. Given there is no data example, I'd probably start with -egen std()-, with a -by- option for each district.

    Also, it's not clear what the historical rainfall is. Is it the mean of all the data, or the mean of all the data prior to time t, updated whenever it moves to the next time point, t+1?
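
    A minimal sketch of the by-district idea, under stated assumptions: it uses the nlswork example data from Stata Press as a toy panel, with idcode standing in for district and ln_wage for rainfall. The names grp_mean, grp_sd and z are hypothetical. It relies on -egen mean()- and -egen sd()- under -by- rather than on -std()-, since whether -std()- can be combined with -by- may depend on the Stata version.

    ```stata
    * Toy data: idcode stands in for district, ln_wage for rainfall (assumptions)
    use "https://www.stata-press.com/data/r16/nlswork.dta", clear

    * Group-level mean and SD over all available periods
    bysort idcode: egen double grp_mean = mean(ln_wage)
    by idcode: egen double grp_sd = sd(ln_wage)

    * Standardized deviation from the group mean; left missing when the SD is
    * missing or zero (for example, groups with a single observation)
    generate double z = (ln_wage - grp_mean) / grp_sd if grp_sd > 0 & !missing(grp_sd)
    ```

    This uses the full-sample mean for each group. If the historical mean should only use data prior to t, the mean and SD would have to be built as running quantities instead.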


    • #3
      Jerome:
      I do share Ken's concern (and related recommendations) about your research goal.
      That said, I've tried to sketch a possible toy-example as far as your first question is concerned:
      Code:
      . use "https://www.stata-press.com/data/r16/nlswork.dta"
      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
      
      . quietly sum ln_wage
      
      . scalar hist_mean=r(mean)
      
      . scalar hist_sd=r(sd)
      
      . gen byte wanted_1 = (ln_wage<(hist_mean-hist_sd) | ln_wage>(hist_mean+hist_sd)) if !missing(ln_wage)
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        The definitions here are less than clear. My guess is that you would be better off with a variable that is

        1 if rainfall > mean + SD

        0 if mean - SD < rainfall < mean + SD

        -1 if rainfall < mean - SD

        missing if rainfall missing.

        The wish to degrade to indicator variables (you say "dummy") is not obviously an improvement over using (rainfall - mean) / SD as a variable.
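
        A sketch of that coding, assuming the nlswork example data from Stata Press as a toy dataset, with ln_wage standing in for rainfall and an overall (not district-specific) mean and SD. The names rf_mean, rf_sd, z and cat3 are hypothetical.

        ```stata
        * Toy data: ln_wage stands in for rainfall (assumption)
        use "https://www.stata-press.com/data/r16/nlswork.dta", clear

        * Overall mean and SD, saved as scalars before r() is overwritten
        quietly summarize ln_wage
        scalar rf_mean = r(mean)
        scalar rf_sd = r(sd)

        * Standardized value; missing whenever ln_wage is missing
        generate double z = (ln_wage - scalar(rf_mean)) / scalar(rf_sd)

        * -1 / 0 / 1 coding; the -if- qualifier keeps missing values missing
        generate byte cat3 = cond(z > 1, 1, cond(z < -1, -1, 0)) if !missing(z)
        label define cat3 -1 "Below mean - SD" 0 "Within one SD" 1 "Above mean + SD"
        label values cat3 cat3
        ```

        The variable z is the (rainfall - mean) / SD version, and cat3 is the three-level version derived from it, so both can be kept side by side.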


        • #5
          Hey Ken, Thank you. I included the example data in here. The historical mean contains all data prior to t and is updated when moving to t+1.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(avgrf lravgrf)
          234.41667 226.42545
                164  141.4649
                  .         .
          204.41667 203.14473
            146.625  123.5965
           365.8333 256.55264
                164  141.4649
           365.8333 256.55264
          276.16666 237.66338
          224.16667 221.04825
              227.5    207.25
                  .         .
                  .         .
          227.58333 174.60965
                  .         .
          227.58333 174.60965
          227.58333 174.60965
           362.9167  334.6009
                  .         .
          227.58333 174.60965
           394.8333 280.26315
          227.58333 174.60965
          162.83333 131.41228
          227.58333 174.60965
          280.04166 246.01315
                  .         .
                  .         .
                  .         .
                  .         .
                187 170.17105
             267.25 213.02193
             214.75 184.71053
          169.97917 143.19408
          214.16667  188.4342
                  .         .
                  .         .
                  .         .
                  .         .
             214.75 184.71053
                  .         .
          276.27777 260.14328
             259.75 225.93333
           281.8611  231.7646
                  .         .
             150.75 121.98026
          305.08334 277.07455
            243.125  213.8304
          174.91667 157.10088
                  .         .
                  .         .
          152.08333 149.89035
          200.41667 166.20615
                  .         .
          226.20833 186.86403
                  .         .
            212.125  181.0022
                  .         .
                  .         .
              217.5 200.97588
                  .         .
                  .         .
          276.16666 237.66338
                  .         .
                  .         .
             148.25 146.76097
          160.41667 153.47807
          224.16667 221.04825
          281.66666 239.36403
          279.91666 247.24342
          172.66667 133.78947
                  .         .
            179.875  165.4342
          201.66667 177.73245
           215.6111 173.46783
             221.75 195.85307
            146.625  123.5965
                  .         .
          290.34314  247.5467
          281.66666 239.36403
                  .         .
                  .         .
          120.77778  97.53655
                296  267.8991
          151.08333  149.4035
          175.27777 161.20615
          227.58333 174.60965
          275.16666 243.11075
                  .         .
                  .         .
          227.58333 174.60965
                  .         .
          201.66667 177.73245
                  .         .
                  .         .
                  .         .
                  .         .
          189.16667 170.82455
             150.75 121.98026
          169.97917 143.19408
          227.58333 174.60965
          end
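
          Since the historical mean uses all data prior to t, a matching historical SD could be sketched with running sums. This is only an illustration under assumptions: the example above has no district or time identifier, so the sketch uses the nlswork example data from Stata Press, with idcode standing in for district, year for time, and ln_wage for rainfall. All new variable names are hypothetical. With the real data, lravgrf already holds the prior mean, so only the SD step would be needed.

          ```stata
          * Toy panel: idcode ~ district, year ~ time, ln_wage ~ rainfall (assumptions)
          use "https://www.stata-press.com/data/r16/nlswork.dta", clear

          * Running count, sum, and sum of squares, excluding the current period.
          * sum() treats missing values as zero, so gaps do not break the totals.
          bysort idcode (year): generate long n_prior = sum(!missing(ln_wage)) - !missing(ln_wage)
          by idcode: generate double s_prior = sum(ln_wage) - cond(missing(ln_wage), 0, ln_wage)
          by idcode: generate double ss_prior = sum(ln_wage^2) - cond(missing(ln_wage), 0, ln_wage^2)

          * The mean needs at least one prior value; the SD needs at least two
          generate double mean_prior = s_prior / n_prior if n_prior > 0
          generate double sd_prior = sqrt(max(0, ss_prior - n_prior * mean_prior^2) / (n_prior - 1)) if n_prior > 1

          * 1 if the current value is more than one prior SD away from the prior mean
          generate byte outside = abs(ln_wage - mean_prior) > sd_prior if !missing(ln_wage, sd_prior)
          ```

          The first one or two periods in each group have no usable prior SD, so outside is missing there rather than 0.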


          • #6
            Code:
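            * SD of the historical means across all observations (not a per-district SD)
            egen sd_lravgrf=sd(lravgrf)
            * 1 if current rainfall is more than one SD above or below the historical mean;
            * the -if- qualifier keeps observations with missing rainfall from being coded 1
            generate indicator=(avgrf>lravgrf+sd_lravgrf) | (avgrf<lravgrf-sd_lravgrf) if !missing(avgrf, lravgrf)
            label define indicator 1 "One SD Above/Below" 0 "Within SD range"
            label values indicator indicator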
            Last edited by Chen Samulsion; 15 Nov 2021, 08:43.


            • #7
              Thank you! This is really helpful!
