  • Creation of a new variable with multiple conditions

    Hi everyone,

    I would like to compute a variable based on several conditions, and I don't know how to proceed.

    Basically, the new variable depends on whether a zone is an "s.e.r zone" or not. It also depends on time (the -interlude- variable).

    I want to compute the new variable only if interlude is not missing and the zone is s.e.r.
    Here is a dataex example:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(parking_slots blue_slots) byte dest_zona_ser float interlude
       .    . 0   .
       .    . 0 315
     357    . 1   .
       .    . 0 210
     357    . 1   .
       .    . 0  90
       .    . 0   .
       .    . 0 270
       .    . 0 120
       .    . 0 220
       .    . 0   .
       .    . 0 120
       .    . 0 120
       .    . 0 150
       .    . 0   .
       .    . 0 571
    4553  806 1   .
       .    . 0  60
       .    . 0   .
       .    . 0 180
       .    . 0   .
       .    . 0 360
       .    . 0   .
       .    . 0 510
     357    . 1   .
       .    . 0 220
       .    . 0   .
       .    . 0 105
       .    . 0 517
       .    . 0  47
    2530  353 1   .
    2530  353 1  40
    2530  353 1   .
    2530  353 1  40
    3111  441 1   .
    2348  204 1 118
    2348  204 1   .
    2348  204 1  30
    2934  414 1   .
    2348  204 1 600
       .    . 0   .
    2348  204 1 480
    5887 1452 1   .
    2438  568 1  90
    2035  370 1   .
    1925  416 1  78
    1925  416 1 325
    1925  416 1  85
    1925  416 1   .
    1925  416 1   0
    1925  416 1   .
    1925  416 1 110
    1925  416 1   .
    1925  416 1   0
    1925  416 1 320
    1925  416 1   0
    1925  416 1 440
    1925  416 1   0
    2438  568 1   .
    1925  416 1 419
    2438  568 1  89
    1925  416 1 329
    2982  676 1   .
    1925  416 1 310
    3238  531 1   .
    2934  414 1   .
    2934  414 1   .
    2934  414 1 510
    4361  871 1   .
    2934  414 1 330
    5001  872 1   .
    2934  414 1 360
       .    . 0   .
       .    . 0   .
       .    . 0   0
    2934  414 1 658
       .    . 0   .
       .    . 0 480
    2934  414 1   0
    5887 1452 1   .
    2934  414 1 400
     646    . 1   .
       .    . 0 120
       .    . 0   .
       .    . 0  90
    3310  817 1   .
       .    . 0 555
    3310  817 1   .
       .    . 0   0
       .    . 0   0
       .    . 0 280
    3310  817 1 200
       .    . 0   0
       .    . 0   0
       .    . 0 270
    2476  372 1   .
       .    . 0 600
       .    . 0   .
       .    . 0 324
       .    . 0 105
    end
    label values dest_zona_ser dest_zonaser
    label def dest_zonaser 0 "outside s.e.r zone (destin.)", modify
    label def dest_zonaser 1 "s.e.r zone (destin.)", modify
    The code used to compute -interlude- was:

    Code:
    by individ_ID (start_time), sort: gen interlude = ///
        clockdiff(end_time[_n-1], start_time, "m") if _n > 1
    The conditions come from the following price table for blue_slots:



    Minutes                      Price
    5                            0.05
    9 < x < 13                   0.10
    13 < x < 16, and so on...    0.15
    16                           0.20
    20                           0.25
    23                           0.30
    27                           0.35
    30                           0.40
    32                           0.45
    34                           0.50
    36                           0.55
    39                           0.60
    41                           0.65
    43                           0.70
    45                           0.75
    47                           0.80
    49                           0.85
    51                           0.90
    54                           0.95
    56                           1.00
    58                           1.05
    60                           1.10
    63                           1.15
    65                           1.20
    68                           1.25
    70                           1.30
    73                           1.35
    75                           1.40
    ...                          ...
    The new variable name should be something like "blue_parking_prices".


    Could anyone give me a solution please? I have been stuck for a while now.
    Thank you very much for your help.



    Best,

    Michael
    Last edited by Michael Duarte Goncalves; 20 Oct 2023, 10:29.

  • #2
    So, the first thing you need to do is create a Stata data set from the tableau of parking price data you posted. You need to fix it up a bit first. Entries like 9 < x < 13 will not be useful. I think the data set you need will look like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte minutes double blue_parking_prices
     5  .05
     9   .1
    13  .15
    16   .2
    20  .25
    23   .3
    27  .35
    30   .4
    32  .45
    34   .5
    36  .55
    39   .6
    41  .65
    43   .7
    45  .75
    47   .8
    49  .85
    51   .9
    54  .95
    56    1
    58 1.05
    60  1.1
    63 1.15
    65  1.2
    68 1.25
    70  1.3
    73 1.35
    75  1.4
    end
    Moreover, since interlude has values considerably larger than 75, you will have to include more observations in this data set to cover those longer time intervals as well. Let's call this Stata data set parking_prices.dta. And let's call the first data set you showed original_data.dta.

    Then you can get the kind of joining you want as follows:
    Code:
    use original_data, clear
    gen `c(obs_t)' obs_no = _n
    tempfile holding
    save `holding'
    
    use parking_prices, clear
    rename minutes minutes2
    gen minutes1 = 1 in 1, before(minutes2)
    replace minutes1 = minutes2[_n-1]+1 in 2/L
    rangejoin interlude minutes1 minutes2 using `holding'
    keep if !missing(obs_no)
    merge 1:1 obs_no using `holding', assert(match using) nogenerate
    replace blue_parking_prices = . if !dest_zona_ser | missing(interlude)
    sort obs_no
    drop obs_no minutes*
    order blue_parking_prices, after(interlude)
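
    To make the interval step easier to follow, here is a minimal sketch of what the minutes1/minutes2 construction above produces. It uses only the first three rows of the price table and assumes, as the code above does, that each listed minute value is the upper end of its bracket:

    ```stata
    * Toy version of the price table: first three rows only
    clear
    input byte minutes2 double blue_parking_prices
     5 .05
     9 .1
    13 .15
    end

    * Lower bound of each bracket: 1 for the first row,
    * otherwise one minute past the previous upper bound
    gen minutes1 = 1 in 1, before(minutes2)
    replace minutes1 = minutes2[_n-1] + 1 in 2/L

    * Brackets are now 1-5, 6-9 and 10-13 minutes
    list, noobs clean
    ```

    Note that under this construction an interlude of 0 falls below the first bracket, so it is left without a price.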
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
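
    If these commands are not yet installed, something like the following should fetch both from SSC (a one-time step that needs internet access):

    ```stata
    * Install the community-contributed commands from SSC
    ssc install rangestat
    ssc install rangejoin
    ```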


    • #3
      Clyde,

      What does the `c(obs_t)' part of gen `c(obs_t)' obs_no = _n do? Does it force the same format?


      • #4
        It sets the storage type to the smallest one that is large enough to represent the numbers from 1 through _N without loss of precision. If you have a large data set, you might need a double; with a small enough one, an int or even a byte might suffice. Using `c(obs_t)' ensures that you get something large enough without wasting memory. And the nice thing about it is that you don't have to know in advance what the size of the data set will be, because `c(obs_t)' is evaluated at run time.
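
        As a small illustration, the sketch below displays `c(obs_t)' for two made-up data set sizes (100 and 40,000 observations are arbitrary choices for the demonstration) and then uses it to create an observation counter:

        ```stata
        * Start from an empty data set with 100 observations
        clear
        set obs 100
        display "storage type chosen for 100 observations: `c(obs_t)'"

        * Grow the data set and look again
        set obs 40000
        display "storage type chosen for 40,000 observations: `c(obs_t)'"

        * The counter is created with whatever type fits the current _N
        gen `c(obs_t)' obs_no = _n
        describe obs_no
        ```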


        • #5
          Hi Clyde Schechter,

          Thank you so much for your help and your wonderful explanations.
          I will try what you suggested in #2 and see what happens.

          But I have another question please:

          The table presented above comes from the web. Is it possible to do some kind of web scraping with Stata?



          Again, thank you so much.
          Lovely day.

          Michael


          • #6
            Hi Clyde Schechter:

            I tried what you suggested in #2. It works perfectly well. It is exactly what I wanted.
            Thank you!

            Michael


            • #7
              Is it possible to do some kind of web scraping with Stata?
              I don't know. That's something I have never tried and know nothing about. Just not part of my interests and activities.

              Because this question is off the topic promised by the thread title, and because it has been addressed to me, you are unlikely to get an answer to it here. There are people on the Forum who are knowledgeable in this area, but they will probably not see it here. I suggest you repost this as a new thread, and don't address it to anybody in particular. You have a better chance of getting a timely and helpful response if you do that.


              • #8
                Hi Clyde Schechter:

                I apologize for #5. Thank you for your suggestions.
                Again, sorry.

                Have a lovely day/evening/night.

                Best wishes,

                Michael
