Group big data list

Cere BrosuS

Join Date: Sep 2016

Posts: 7
#1

08 Sep 2016, 04:35

Hey there,

I have a large dataset of water-flow measurements through a pipe. At several points, someone changed the feed manually at the valve, so at those points there is a jump in the feed values in the data. Now to my question:

How can I group the parts of the data that lie between the manual changes of the feed?

Is there a way to have Stata automatically create a new variable and increase it by one whenever there is a big jump in the data values (or in their deviation or mean)?

I hope that was well explained; otherwise, let me know and I will give it another try.

Best regards
Cere BrosuS
Peta Hitchens

Join Date: Sep 2016

Posts: 13
#2

08 Sep 2016, 05:54

If you post some example data, I might be able to help. First install -dataex- with

Code:

ssc install dataex

then, to produce an example of your data, run

Code:

dataex

Cere BrosuS

Join Date: Sep 2016

Posts: 7
#3

08 Sep 2016, 06:21

Hm,

that is not possible because I am not allowed to publish the data. But I will try to create an example that fits what I mean.

Nr	Flow of water [kg/h]
1	1001
2	1020
3	1012
4	997
5	1001
6	304
7	298
8	305
9	310
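Because the real data cannot be posted, invented values like these can still be shared as runnable code. A minimal sketch that types the example into Stata with `input` (the same form of code that -dataex- produces); the variable names `nr` and `flowofwater` are assumptions chosen to match the table:

```stata
* Hypothetical example data: the invented values from the table above
clear
input nr flowofwater
1 1001
2 1020
3 1012
4 997
5 1001
6 304
7 298
8 305
9 310
end
list
```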

Now there should be an additional column with just a simple group number. Here this could be done by hand, but my local dataset has about 10,000 rows and maybe up to 50 changes in the value, so doing it by hand would be too time-consuming. The resulting dataset should look like this:

Nr	Category	Flow of water [kg/h]
1	1	1001
2	1	1020
3	1	1012
4	1	997
5	1	1001
6	2	304
7	2	298
8	2	305
9	2	310

I hope this makes it clearer.

Best regards
Peta Hitchens

Join Date: Sep 2016

Posts: 13
#4

08 Sep 2016, 06:45

I am not sure I completely follow you, but if you wanted to automatically generate what you have in your example:

Code:

gen newvar = 1 if flowofwater >= 600 & flowofwater < .   // feed on: 600 kg/h or more, non-missing
replace newvar = 2 if flowofwater < 600                  // feed off: below 600 kg/h

So then feed on is where newvar==1 and feed off is where newvar==2?

You would need to know the cut-off value between feed on and feed off, though. Maybe you could determine it by creating a histogram of your data:

Code:

hist flowofwater
Cere BrosuS

Join Date: Sep 2016

Posts: 7
#5

08 Sep 2016, 06:53

You mostly got what I want, except that I don't want to make this separation by eye and by hand. I want an automatic way of detecting that there is a cut-off and where it is, and then automatically grouping the different sections. The data list is too long to look at a histogram and search for the points manually.

So a function (maybe one already built into Stata) should go through the data and recognize something like: "Oh, now there is a big change in the data; let's begin a new group here."

By the way, there is no fixed cut-off, just larger decreases or increases in the flow of water.

Best regards
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

08 Sep 2016, 07:01

I wonder whether time-series operators (such as lags or differences) would be helpful to you.
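For illustration, a sketch of the difference operator on the example data, assuming the rows are ordered by `nr` and using the hypothetical names `flowofwater` and `dflow`. After `tsset`, `D.flowofwater` is the current value minus the previous one (`L.flowofwater`), so a valve change shows up as one large difference:

```stata
* Example data (invented values from the thread)
clear
input nr flowofwater
1 1001
2 1020
3 1012
4 997
5 1001
6 304
7 298
8 305
9 310
end
* declare nr as the time variable so that L. and D. can be used
tsset nr
* D.flowofwater = flowofwater - L.flowofwater; missing for the first observation
generate dflow = D.flowofwater
list nr flowofwater dflow
```

In this example the difference at `nr` 6 is -697, while all other differences are below 20 in absolute value.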

Best,

Marcos

Peta Hitchens

Join Date: Sep 2016

Posts: 13
#7

08 Sep 2016, 07:05

I think you would still have to specify what that "big change" is, though: whether it is a percentage of the previous value or an absolute number (say, over 200). Maybe something like:

Code:

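* flag drops of 200 kg/h or more; _n > 1 skips obs 1, whose lag flowofwater[0] is missing
gen newvar = 2 if _n > 1 & flowofwater <= flowofwater[_n-1] - 200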

Though that would just detect the change?

I suggested the histogram because, if it is quite clear when the feed is off and when it is on, you would only need to look at the histogram once to determine where that value lies. I don't think that is what you want now anyway.
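To catch increases as well as decreases, one option is to compare the absolute change with the threshold. A sketch under the assumptions that the data are ordered by `nr`, that 200 kg/h is the "big change" (a hypothetical value), and, for the relative version, that a change of more than 20% of the previous value counts as big (also hypothetical); `jump` and `jump_pct` are illustrative names:

```stata
* Example data (invented values from the thread)
clear
input nr flowofwater
1 1001
2 1020
3 1012
4 997
5 1001
6 304
7 298
8 305
9 310
end
tsset nr
* absolute rule: |change| greater than 200 kg/h (hypothetical threshold)
* !missing() is needed because D.flowofwater is missing for obs 1, and missing > 200 is true in Stata
generate byte jump = !missing(D.flowofwater) & abs(D.flowofwater) > 200
* relative rule: |change| greater than 20% of the previous value (hypothetical threshold)
generate byte jump_pct = !missing(D.flowofwater) & abs(D.flowofwater) / L.flowofwater > 0.2
list nr flowofwater jump jump_pct
```

This only flags the rows where a jump happens; it does not yet number the groups.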
Cere BrosuS

Join Date: Sep 2016

Posts: 7
#8

08 Sep 2016, 07:12

Using a difference value to define a "big change" is OK for me; I can estimate that value. But I still need a way to automatically split the whole data list, with its many values, into groups by incrementing a category variable based on my estimated "big change" value.
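One way to turn such a "big change" value into a group number is a running sum of the jump flags: every flagged row starts a new segment, and the cumulative count of flags gives the category. A minimal sketch, assuming the rows are in measurement order by `nr`, a hypothetical estimated value of 200 kg/h, and the illustrative names `bigchange`, `newseg` and `category`:

```stata
* Example data (invented values from the thread)
clear
input nr flowofwater
1 1001
2 1020
3 1012
4 997
5 1001
6 304
7 298
8 305
9 310
end
* make sure rows are in measurement order before comparing with the previous row
sort nr
* hypothetical estimated "big change" value in kg/h
local bigchange 200
* 1 where the absolute change from the previous row exceeds the threshold;
* _n > 1 keeps the first row from being flagged (its previous value is missing)
generate byte newseg = _n > 1 & abs(flowofwater - flowofwater[_n-1]) > `bigchange'
* sum() is a running sum: category starts at 1 and rises by 1 at each flagged row
generate category = 1 + sum(newseg)
list nr category flowofwater
```

On the example this reproduces the Category column (1 for rows 1 to 5, 2 for rows 6 to 9). Note that a missing flow value makes the difference missing, which Stata treats as larger than any number, so it would also start new groups; handle missing values first if that is not wanted.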