Creation of a new variable with multiple conditions

Michael Duarte Goncalves

Join Date: Oct 2022
Posts: 500

20 Oct 2023, 10:25

Hi everyone,

I would like to compute a variable based on several conditions, but I don't know how to proceed.

Basically, I have to compute a new variable that depends on whether a zone is an "s.e.r zone" or not. It also depends on time (the -interlude- variable).

I want to compute the new variable only if interlude is not missing and the zone is an s.e.r zone.
Here is a dataex example:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(parking_slots blue_slots) byte dest_zona_ser float interlude
   .    . 0   .
   .    . 0 315
 357    . 1   .
   .    . 0 210
 357    . 1   .
   .    . 0  90
   .    . 0   .
   .    . 0 270
   .    . 0 120
   .    . 0 220
   .    . 0   .
   .    . 0 120
   .    . 0 120
   .    . 0 150
   .    . 0   .
   .    . 0 571
4553  806 1   .
   .    . 0  60
   .    . 0   .
   .    . 0 180
   .    . 0   .
   .    . 0 360
   .    . 0   .
   .    . 0 510
 357    . 1   .
   .    . 0 220
   .    . 0   .
   .    . 0 105
   .    . 0 517
   .    . 0  47
2530  353 1   .
2530  353 1  40
2530  353 1   .
2530  353 1  40
3111  441 1   .
2348  204 1 118
2348  204 1   .
2348  204 1  30
2934  414 1   .
2348  204 1 600
   .    . 0   .
2348  204 1 480
5887 1452 1   .
2438  568 1  90
2035  370 1   .
1925  416 1  78
1925  416 1 325
1925  416 1  85
1925  416 1   .
1925  416 1   0
1925  416 1   .
1925  416 1 110
1925  416 1   .
1925  416 1   0
1925  416 1 320
1925  416 1   0
1925  416 1 440
1925  416 1   0
2438  568 1   .
1925  416 1 419
2438  568 1  89
1925  416 1 329
2982  676 1   .
1925  416 1 310
3238  531 1   .
2934  414 1   .
2934  414 1   .
2934  414 1 510
4361  871 1   .
2934  414 1 330
5001  872 1   .
2934  414 1 360
   .    . 0   .
   .    . 0   .
   .    . 0   0
2934  414 1 658
   .    . 0   .
   .    . 0 480
2934  414 1   0
5887 1452 1   .
2934  414 1 400
 646    . 1   .
   .    . 0 120
   .    . 0   .
   .    . 0  90
3310  817 1   .
   .    . 0 555
3310  817 1   .
   .    . 0   0
   .    . 0   0
   .    . 0 280
3310  817 1 200
   .    . 0   0
   .    . 0   0
   .    . 0 270
2476  372 1   .
   .    . 0 600
   .    . 0   .
   .    . 0 324
   .    . 0 105
end
label values dest_zona_ser dest_zonaser
label def dest_zonaser 0 "outside s.e.r zone (destin.)", modify
label def dest_zonaser 1 "s.e.r zone (destin.)", modify

The code used to compute -interlude- was:

Code:

by individ_ID (start_time), sort: gen interlude = ///
    clockdiff(end_time[_n-1], start_time, "m") if _n > 1
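
For readers who want to see what this command produces, here is a minimal sketch on made-up data. The three observations and the string variables start_str and end_str are hypothetical, and it assumes start_time and end_time are stored as %tc datetimes (the post does not show them) and that the installed Stata provides clockdiff().

```stata
* Hypothetical toy data: two trips for person 1, one trip for person 2
clear
input byte individ_ID str20(start_str end_str)
1 "20oct2023 08:00:00" "20oct2023 08:30:00"
1 "20oct2023 09:15:00" "20oct2023 10:00:00"
2 "20oct2023 07:00:00" "20oct2023 07:45:00"
end

* Convert the strings to datetime/c values (assumed storage format)
gen double start_time = clock(start_str, "DMYhms")
gen double end_time   = clock(end_str, "DMYhms")
format start_time end_time %tc

* Minutes between the end of the previous trip and the start of this one
by individ_ID (start_time), sort: gen interlude = ///
    clockdiff(end_time[_n-1], start_time, "m") if _n > 1

list individ_ID start_time end_time interlude, sepby(individ_ID)
```

The first trip of each person has no previous trip, so interlude is missing there. That is why the example data contain so many missing values.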

The multiple conditions come from the following price table for blue_slots:

Minutes	€
5	0.05
9 < x < 13	0.10
13 < x < 16, and so on...	0.15
16	0.20
20	0.25
23	0.30
27	0.35
30	0.40
32	0.45
34	0.50
36	0.55
39	0.60
41	0.65
43	0.70
45	0.75
47	0.80
49	0.85
51	0.90
54	0.95
56	1.00
58	1.05
60	1.10
63	1.15
65	1.20
68	1.25
70	1.30
73	1.35
75	1.40
...	...

The new variable name should be something like "blue_parking_prices".

Could anyone give me a solution please? I have been stuck for a while now.
Thank you very much for your help.

Best,

Michael

Last edited by Michael Duarte Goncalves; 20 Oct 2023, 10:29.

Clyde Schechter

Join Date: Apr 2014

Posts: 30161
#2

20 Oct 2023, 11:02

So, the first thing you need to do is create a Stata data set from the tableau of parking price data you posted. You need to fix it up a bit first. Entries like 9 < x < 13 will not be useful. I think the data set you need will look like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte minutes double blue_parking_prices
 5  .05
 9   .1
13  .15
16   .2
20  .25
23   .3
27  .35
30   .4
32  .45
34   .5
36  .55
39   .6
41  .65
43   .7
45  .75
47   .8
49  .85
51   .9
54  .95
56    1
58 1.05
60  1.1
63 1.15
65  1.2
68 1.25
70  1.3
73 1.35
75  1.4
end

Moreover, since interlude has values considerably larger than 75, you will have to include more observations in this data set to cover those longer time intervals as well. Let's call this Stata data set parking_prices.dta. And let's call the first data set you showed original_data.dta.
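
One way to check whether the price table reaches far enough is to compare its last threshold with the longest interlude observed in s.e.r zones. This is only a sketch: it assumes both data sets have already been saved under the file names used in this post, and the local macro max_interlude is a name chosen here for illustration.

```stata
* Longest interlude among destinations inside an s.e.r zone
use original_data, clear
summarize interlude if dest_zona_ser, meanonly
local max_interlude = r(max)

* Last threshold currently present in the price table
use parking_prices, clear
summarize minutes, meanonly
display "longest interlude: `max_interlude' minutes; last table threshold: " r(max) " minutes"
```

If the first number is larger than the second, the longer interludes will not find a matching price row until the table is extended.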

Then you can get the kind of joining you want as follows:

Code:

use original_data, clear
gen `c(obs_t)' obs_no = _n
tempfile holding
save `holding'

use parking_prices, clear
rename minutes minutes2
gen minutes1 = 1 in 1, before(minutes2)
replace minutes1 = minutes2[_n-1]+1 in 2/L
rangejoin interlude minutes1 minutes2 using `holding'
keep if !missing(obs_no)
merge 1:1 obs_no using `holding', assert(match using) nogenerate
replace blue_parking_prices = . if !dest_zona_ser | missing(interlude)
sort obs_no
drop obs_no minutes*
order blue_parking_prices, after(interlude)
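
After running the code above, a few quick checks can confirm that the prices landed where intended. This is a hedged sketch rather than part of the original solution; it uses only the variable names from this thread and should be run while the joined data are still in memory.

```stata
* Prices must be missing outside s.e.r zones and wherever interlude is missing
assert missing(blue_parking_prices) if !dest_zona_ser | missing(interlude)

* Eligible observations that still have no price, for example an interlude
* of 0 (below minutes1 = 1) or one longer than the last row of the table
count if dest_zona_ser & !missing(interlude) & missing(blue_parking_prices)

* Inspect the assigned prices
list dest_zona_ser interlude blue_parking_prices ///
    if dest_zona_ser & !missing(interlude), noobs
```

A nonzero count is not necessarily an error. It shows how many observations fall outside the intervals the price table currently covers.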

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
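
A minimal sketch of the one-time installation, assuming an internet connection and write access to the ado directory:

```stata
* Install the community-contributed commands from SSC
ssc install rangestat
ssc install rangejoin

* Confirm that Stata can find both commands on the ado-path
which rangestat
which rangejoin
```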
George Ford

Join Date: Aug 2014

Posts: 3182
#3

20 Oct 2023, 11:42

Clyde,

What does the `c(obs_t)' part do in
gen `c(obs_t)' obs_no = _n? Does it force the same format?
Clyde Schechter

Join Date: Apr 2014

Posts: 30161
#4

20 Oct 2023, 12:46

It sets the storage type to the smallest one that is large enough to represent the numbers from 1 through _N without loss of precision. If you have a very large data set, you might need a double; with a small enough one, an int or even a byte might suffice. The use of `c(obs_t)' ensures that you will get something large enough without wasting memory. And the nice thing about it is that you don't have to know when writing the code what the size of the data set will be, because `c(obs_t)' is evaluated at run time.
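
To see this in action, here is a small sketch on an empty data set whose size (100 observations) is an arbitrary choice for illustration:

```stata
clear
set obs 100

* Show the storage type Stata considers sufficient for observation numbers
display "`c(obs_t)'"

* Use it to create an observation counter, then inspect the resulting type
gen `c(obs_t)' obs_no = _n
describe obs_no
```

Changing the number in -set obs- and rerunning shows how the chosen type grows with the data set.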
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#5

23 Oct 2023, 00:27

Hi Clyde Schechter,

Thank you so much for your help and your wonderful explanations.
I will try what you suggested in #2 and see what happens.

But I have another question:

The table presented above comes from the web. Is it possible to do some kind of web scraping with Stata?

Again, thank you so much.
Lovely day.

Michael
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#6

23 Oct 2023, 01:37

Hi Clyde Schechter:

I tried what you suggested in #2. It works perfectly well. It is exactly what I wanted.
Thank you!

Michael
Clyde Schechter

Join Date: Apr 2014

Posts: 30161
#7

23 Oct 2023, 10:45

Is it possible to do some kind of web scraping with Stata?

I don't know. That's something I have never tried and know nothing about. Just not part of my interests and activities.

Because this question is off the topic promised by the thread title, and because it has been addressed to me, you are unlikely to get an answer to it here. There are people on the Forum who are knowledgeable in this area, but they probably will not see it here. I suggest you repost this as a new thread, and don't address it to anybody in particular. You have a better chance of getting a timely and helpful response if you do that.
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#8

24 Oct 2023, 00:22

Hi Clyde Schechter:

I apologize for #5. Thank you for your suggestions.
Again, sorry.

Have a lovely day/evening/night.

Best wishes,

Michael