Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to find the value of a variable using condition on another variable?

    Hi Statalist users,

    I have a dataset of which I am posting a snippet with relevant variables (age, quarterly date, trend, unemployment) below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long age float(dt_survey_q trend unrate)
    18 72 125 6.333333
    19 72 125 6.333333
    20 72 125 6.333333
    21 72 125 6.333333
    22 72 125 6.333333
    23 72 125 6.333333
    24 72 125 6.333333
    25 72 125 6.333333
    26 72 125 6.333333
    27 72 125 6.333333
    28 72 125 6.333333
    29 72 125 6.333333
    30 72 125 6.333333
    31 72 125 6.333333
    32 72 125 6.333333
    33 72 125 6.333333
    34 72 125 6.333333
    35 72 125 6.333333
    36 72 125 6.333333
    37 72 125 6.333333
    38 72 125 6.333333
    39 72 125 6.333333
    40 72 125 6.333333
    41 72 125 6.333333
    42 72 125 6.333333
    43 72 125 6.333333
    44 72 125 6.333333
    45 72 125 6.333333
    46 72 125 6.333333
    47 72 125 6.333333
    48 72 125 6.333333
    49 72 125 6.333333
    50 72 125 6.333333
    51 72 125 6.333333
    52 72 125 6.333333
    53 72 125 6.333333
    54 72 125 6.333333
    55 72 125 6.333333
    56 72 125 6.333333
    57 72 125 6.333333
    58 72 125 6.333333
    59 72 125 6.333333
    60 72 125 6.333333
    61 72 125 6.333333
    62 72 125 6.333333
    63 72 125 6.333333
    64 72 125 6.333333
    65 72 125 6.333333
    66 72 125 6.333333
    67 72 125 6.333333
    68 72 125 6.333333
    69 72 125 6.333333
    70 72 125 6.333333
    71 72 125 6.333333
    72 72 125 6.333333
    73 72 125 6.333333
    74 72 125 6.333333
    75 72 125 6.333333
    76 72 125 6.333333
    77 72 125 6.333333
    78 72 125 6.333333
    79 72 125 6.333333
    80 72 125 6.333333
    81 72 125 6.333333
    82 72 125 6.333333
    83 72 125 6.333333
    84 72 125 6.333333
    85 72 125 6.333333
    86 72 125 6.333333
    87 72 125 6.333333
    88 72 125 6.333333
    90 72 125 6.333333
    91 72 125 6.333333
    95 72 125 6.333333
     . 72 125 6.333333
    18 73 126        6
    19 73 126        6
    20 73 126        6
    21 73 126        6
    22 73 126        6
    23 73 126        6
    24 73 126        6
    25 73 126        6
    26 73 126        6
    27 73 126        6
    28 73 126        6
    29 73 126        6
    30 73 126        6
    31 73 126        6
    32 73 126        6
    33 73 126        6
    34 73 126        6
    35 73 126        6
    36 73 126        6
    37 73 126        6
    38 73 126        6
    39 73 126        6
    40 73 126        6
    41 73 126        6
    42 73 126        6
    end
    format %tq dt_survey_q
    I am generating a new variable according to a complicated formula part of which needs me to to be able to do the below:
    i. Find the value of unemployment (unrate) corresponding to a particular value of trend
    ii. Find the value of dt_survey_q according to a particular value of trend
    Essentially, I want to be able to find the index corresponding to a particular value of trend and then find the value of unemployment, dt_sruvey_q according to that index. On python, I would do something as below. I don't know how to do this on Stata. If not Stata, can I use Mata? If so, how would I do that?

    Python code:

    Code:
    index = data[data['trend']== x].index.tolist()[0]
                y = data['unrate'][index]

  • #2
    Checkout the user written -vlookup-, from within Stata type
    Code:
    findit vlookup
    and then follow the instructions.

    If this does not do what you want, be more particular about what you want to achieve.

    Comment


    • #3
      Shreya:
      do you mean something along the following lines?
      Code:
      . bysort trend: list unrate dt_survey_q trend if _n==1
      
      ---------------------------------------------------------------------------------------------------------------------------------------
      -> trend = 125
      
           +-----------------------------+
           |   unrate   dt_sur~q   trend |
           |-----------------------------|
        1. | 6.333333     1978q1     125 |
           +-----------------------------+
      
      ---------------------------------------------------------------------------------------------------------------------------------------
      -> trend = 126
      
           +---------------------------+
           | unrate   dt_sur~q   trend |
           |---------------------------|
        1. |      6     1978q2     126 |
           +---------------------------+
      
      . bysort trend: gen counter=1 if _n==1
      
      
      . list if counter==1
      
           +---------------------------------------------+
           | age   dt_sur~q   trend     unrate   counter |
           |---------------------------------------------|
        1. |  18     1978q1     125   6.333333         1 |
       76. |  18     1978q2     126          6         1 |
           +---------------------------------------------+
      
      .
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        Dear Joro and Carlo,

        I tried to use vlookup as follows:

        Code:
        local x unrate
        gen `x'key = 125
        vlookup `x'key, gen(l1x)  key(trend) value(`x')
        I was expecting to have l1x = 6.3333 populated but instead I got back the following error:
        unrate is unique within trend;
        there are multiple observations with different unrate within trend.

        I am not sure I understand the error: is there an error because there are duplicate entries for unrate? How do I solve this issue?
        Last edited by Shreya Dutt; 28 Jul 2020, 14:22.

        Comment


        • #5
          William Lisowski provided a solution to this issue in: https://www.statalist.org/forums/for...ror-in-vlookup
          I used that to get around my problem.
          Thank you all!

          Comment


          • #6
            I have no idea what you re trying to do, so I am just firing shots in the dark here, but the following
            Code:
            . gen unratekey = 125
            
            . vlookup unratekey, gen(l1x)  key(trend) value( unrate )
            goes through the values of variable unratekey, which in this case are constant at 125, then, for this values of 125 creates a new variable having the unrate corresponding to the value of trend equal to unratekey, which in this case is 125.

            Comment


            • #7
              I'm also uncertain what you are trying to do, but maybe the following does it:

              Code:
              rangestat (mean) unrate dt_survey_q, interval(trend `x'key `x'key)
              Note: -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.

              That said, the problem as stated is ill-posed. You refer to "the" value of unrate or dt_survey_q for a given value of trend. But there are many observations for every value of trend in your data. And while it is true that in the example data you show, for any value of trend, all the corresponding observations of unrate or dt_survey_q are the same. But if that doesn't hold in all of your data, then there is no solution to your problem as any of the various values of unrate or dt_survey_q that co-appear with the keyed value of trend would be a candidate and you have given no rule for selecting among them. (The code I've shown here will give you the mean of all such values.) If you are confident that unrate and dt_survey_q are always the same for all observations with a given value of trend, then I suggest you have the code verify that beforehand:

              Code:
              by trend (unrate), sort: assert unrate[1] == unrate[_N]
              by trend (dt_survey_q), sort: assert dt_survey_q[1] = dt_survey_q[_N]
              There is an additional problem you may stumble over. Your variable trend is a float. Now, in the example data, the values of trend are all integers. But if that is not true throughout the data, some of your lookups may fail because exact matching of floating-point numbers cannot be relied on.
              Last edited by Clyde Schechter; 28 Jul 2020, 20:05.

              Comment

              Working...
              X