
  • Group big data list

    Hey there,

    I have a big dataset of flow measurements of water through a pipe. At several points someone changed the feed manually at the valve, so at those points there is a jump in the flow values in the data. Now to my question:

    How can I group the parts of the data that lie between manual changes of the feed?

    Is there a way to have Stata automatically create a new variable and increase it whenever there is a big jump in the data values, the deviation, or the mean?

    I hope that was explained well; otherwise let me know and I will give it another try.

    Best regards
    Cere BrosuS

  • #2
    If you post some example data I might be able to help. First install -dataex- by
    Code:
    ssc install dataex
    then for an example of the data
    Code:
    dataex
    .



    • #3
      Hm,

      that is not possible because I am not allowed to publish the data. But I will try to create an example that fits what I mean.
      Nr   Flow of water [kg/h]
      1    1001
      2    1020
      3    1012
      4     997
      5    1001
      6     304
      7     298
      8     305
      9     310
      Now there should be an additional column with just a simple group number. Here this can be done by hand, but my local dataset has ~10000 rows and maybe up to 50 changes in the value, so doing it by hand would be too time-consuming. As a result, the dataset should look like this:
      Nr   Category   Flow of water [kg/h]
      1    1          1001
      2    1          1020
      3    1          1012
      4    1           997
      5    1          1001
      6    2           304
      7    2           298
      8    2           305
      9    2           310
      I hope this makes it clearer.

      Best regards



      • #4
        I am not sure I completely follow you, but if you wanted to automatically generate what you have in your example:
        Code:
        gen newvar = 1 if flowofwater >599 & flowofwater<.
        replace newvar = 2 if flowofwater <600
        So then feed on is where newvar==1 and feed off is where newvar==2?

        You would need to know what the cutoff value for feed on/off is, though. Maybe you could determine that by creating a histogram of your data:
        Code:
        hist flowofwater



        • #5
          You mostly got what I want, except that I don't want to do this separation manually by inspecting the data. I want an automatic way of detecting that there is a cutoff and where it is, and then grouping the different sections automatically. The data list is too long to look at a histogram and search for the points manually.

          So a function (maybe already predefined in Stata) should go through the data list and recognize something like: oh, now there is a big change in the data, let's begin a new group here.

          By the way, there is no fixed cutoff, just a larger decrease or increase in the flow of water.

          Best regards



          • #6
            I wonder whether time-series operators (such as lags or differences) would be helpful to you.
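As a rough sketch of what that could look like, using the example data from #3: the variable names `nr`, `flowofwater`, `diff` and `absdiff` are assumptions chosen for illustration, and the data are typed in with -input- only so the snippet runs on its own.

```stata
* Sketch only: recreate the example data from #3 (variable names are assumed)
clear
input nr flowofwater
1 1001
2 1020
3 1012
4 997
5 1001
6 304
7 298
8 305
9 310
end

* Declare the row counter as the time variable so time-series operators work
tsset nr

* D. is the first difference: flowofwater minus its previous value
* (missing in the first row, which has no predecessor)
gen diff = D.flowofwater
gen absdiff = abs(diff)

list nr flowofwater diff, sep(0)

* The distribution of absolute differences can help in choosing a threshold
summarize absdiff, detail
```

In this example the ordinary row-to-row differences stay below 20 in absolute value, while the change from row 5 to row 6 is -697, so any threshold between those two magnitudes would separate them.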

            Best,

            Marcos



            • #7
              I think you would still have to specify what that "big change" is, though: either a percentage of the previous value or an absolute number (say, over 200?). Maybe something like:
              Code:
              * _n > 1 skips the first row, where flowofwater[_n-1] is missing and would otherwise compare as larger than any number
              gen newvar = 2 if flowofwater <= flowofwater[_n-1] - 200 & _n > 1
              Though that would just detect the change?
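To go from detecting a change to numbering the groups, one option is a running sum over a jump indicator. The following is a minimal sketch, not a definitive solution: the threshold of 200, the variable names `nr`, `jump` and `category`, and the use of the absolute difference (so that increases count as well as decreases) are all assumptions, and it presumes the data are sorted in measurement order with no missing flow values.

```stata
* Sketch only: recreate the example data from #3 (variable names are assumed)
clear
input nr flowofwater
1 1001
2 1020
3 1012
4 997
5 1001
6 304
7 298
8 305
9 310
end

sort nr

* Assumed size of a "big change"; adjust to your own estimate
local threshold = 200

* jump is 1 where the flow differs from the previous row by more than the
* threshold, in either direction; _n > 1 excludes the first row, where the
* previous value is missing and the comparison would otherwise be true
gen byte jump = abs(flowofwater - flowofwater[_n-1]) > `threshold' & _n > 1

* sum() is a running sum, so category increases by 1 at every jump
gen category = 1 + sum(jump)

list nr category flowofwater, sep(0)
```

On the example data this assigns category 1 to rows 1 to 5 and category 2 to rows 6 to 9, and the same two -gen- lines would carry over to a longer dataset with more jumps.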

              I suggested the histogram in case it was quite clear when the feed was off and on; you would only need to look at that histogram once to determine where that value might be. I don't think that is what you want now anyhow, though.



              • #8
                Using a difference value to define a 'big change' is OK for me. I can simply estimate that value. Beyond that, I still need a way to automatically split the whole data list, with its many values, into groups by incrementing a category variable based on my estimated 'big change' value.
