  • Generating Probabilities from Range

    Hi everyone,
    I have a question about how to calculate probabilities from data.

    My idea is to create a function that transforms a range of data into what is essentially a CDF and then assigns probabilities to values in that range.
    For example, let's say that someone is given a target of attaining $350 on a specific metric. We know the historical values of this metric and fit a specified confidence interval around historical performance, which could be $200-$500. I am interested in where $350 would fall on the CDF given this range of values. As the target approaches $500, I assign a value of 1; as it approaches $200, I assign a value of 0. The issue is that I have many ranges and many targets (for example, the range could be $100-$1000 with a target of $300 or $800).

    A built-in dataset that emulates this is the S&P 500 data (sysuse sp500.dta).

    This dataset has values for the high, low and closing prices. I would use high and low for the range, and I'm interested in the probability of the closing price given the CDF defined by the high and low prices.
    Thanks for any assistance!

    Code:
    sysuse sp500.dta
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int date float(open high low close) double volume float change
    14977 1320.28 1320.28 1276.05 1283.27   11294          .
    14978 1283.27 1347.76 1274.62 1347.56   18807   64.29004
    14979 1347.56 1350.24 1329.14 1333.34   21310 -14.220093
    14980 1333.34 1334.77 1294.95 1298.35   14308  -34.98999
    14983 1298.35 1298.35 1276.29 1295.86   11155   -2.48999
    14984 1295.86 1311.72 1295.14  1300.8   11913   4.940063
    14985  1300.8 1313.76 1287.28 1313.27   12965   12.46997
    14986 1313.27 1332.19 1309.72 1326.82   14112  13.549927
    14987 1326.82 1333.21 1311.59 1318.55   12760  -8.269897
    14991 1318.32 1327.81 1313.33 1326.65   12057   8.099976
    14992 1326.65 1346.92 1325.41 1329.47   13491   2.819946
    14993 1329.89 1352.71 1327.41 1347.97   14450       18.5
    14994 1347.97 1354.55 1336.74 1342.54   14078  -5.429932
    14997 1342.54 1353.62 1333.84  1342.9   11640  .35998535
    14998  1342.9  1362.9 1339.63  1360.4   12326       17.5
    14999  1360.4 1369.75 1357.28  1364.3   13090  3.9000244
    15000  1364.3 1367.35 1354.63 1357.51   12580  -6.790039
    15001 1357.51 1357.51 1342.75 1354.95   10980 -2.5600586
    15004 1354.92 1365.54 1350.36 1364.17   10531   9.220093
    15005 1364.17 1375.68  1356.2 1373.73   11498   9.559937
    15006 1373.73 1383.37 1364.66 1366.01   12953  -7.719971
    15007 1366.01  1373.5 1359.34 1373.47   11188   7.459961
    15008 1373.47 1376.38 1348.72 1349.47   10484        -24
    15011 1349.47 1354.56 1344.48 1354.31   10130   4.840088
    15012 1354.31 1363.55 1350.04 1352.26   10596 -2.0500488
    15013 1352.26 1352.26 1334.26 1340.89   11583 -11.369995
    15014  1341.1 1350.32 1332.42 1332.53   11072  -8.359985
    15015 1332.53 1332.53 1309.98 1314.76   10755  -17.77002
    15018 1314.76 1330.96 1313.64 1330.31   10391   15.55005
    15019 1330.31 1336.62 1317.51  1318.8   10752  -11.51001
    15020  1318.8 1320.73 1304.72 1315.92   11503  -2.880005
    15021 1315.92 1331.29 1315.92 1326.61   11537   10.68994
    15022 1326.61 1326.61 1293.18 1301.53   12572 -25.079956
    15026 1301.53 1307.16 1278.44 1278.94   11122  -22.59009
    15027 1278.94 1282.97 1253.16 1255.27   12085  -23.66992
    15028 1255.27 1259.94 1228.33 1252.82   13659  -2.450073
    15029 1252.82 1252.82 1215.44 1245.86   12313  -6.959961
    15032 1245.86 1267.69 1241.71 1267.65   11308   21.79004
    15033 1267.65 1272.76 1252.26 1257.94   11141  -9.710083
    15034 1257.94 1263.47 1229.65 1239.94   12253        -18
    15035 1239.94 1241.36  1214.5 1241.23   12949   1.290039
    15036 1241.23 1251.01 1219.74 1234.18   12940  -7.049927
    15039 1234.18 1242.55 1234.04 1241.41    9292    7.22998
    15040 1241.41 1267.42 1241.41  1253.8   10918  12.390015
    15041  1253.8 1263.86  1253.8 1261.89   11322   8.089966
    15042 1261.89  1266.5  1257.6 1264.74   11141  2.8499756
    15043 1264.74 1264.74 1228.42 1233.42   10859 -31.319946
    15046 1233.42 1233.42 1176.78 1180.16   12290  -53.26001
    15047 1180.16 1197.83  1171.5 1197.66   13609       17.5
    15048 1197.66 1197.66 1155.35 1166.71   13974 -30.950073
    15049 1166.71 1182.04 1166.71 1173.56   12595   6.850098
    15050 1173.56 1173.56 1148.64 1150.53 15435.6  -23.03003
    15053 1150.53  1173.5 1147.18 1170.81   11262   20.28003
    15054 1170.81 1180.56 1142.19 1142.62   12359 -28.190063
    15055 1142.62 1149.39 1118.74 1122.14   13463  -20.47998
    15056 1122.14 1124.27 1081.19 1117.58 17239.5 -4.5600586
    15057 1117.58 1141.83 1117.58 1139.83   13649      22.25
    15060 1139.83 1160.02 1139.83 1152.69   11140  12.859985
    15061 1152.69 1183.35 1150.96 1182.17   13142    29.4801
    15062 1182.17 1182.17 1147.83 1153.29   13334 -28.880005
    15063 1153.29 1161.69 1136.26 1147.95   12345  -5.340088
    15064 1147.95  1162.8 1143.83 1160.33   12808  12.380005
    15067 1160.33 1169.51 1137.51 1145.87   12549  -14.45996
    15068 1145.87 1145.87 1100.19 1106.46   13861  -39.41003
    15069 1106.46  1117.5 1091.99 1103.25 14255.9  -3.209961
    15070 1103.25 1151.47 1103.25 1151.44   13680   48.18994
    15071 1151.44 1151.44 1119.29 1128.43   12668  -23.00989
    15074 1128.43 1146.13 1126.38 1137.59   10628   9.159912
    15075 1137.59 1173.92 1137.59 1168.38   13496   30.79004
    15076 1168.38 1182.24 1160.26 1165.89   12903   -2.48999
    15077 1165.89 1183.51 1157.73  1183.5   11020  17.609985
    15081  1183.5 1184.64 1167.38 1179.68    9139  -3.819946
    15082 1179.68 1192.25  1168.9 1191.81   11096  12.130005
    15083 1191.81 1248.42 1191.81 1238.16   19189   46.34998
    15084 1238.16 1253.71 1233.39 1253.69   14868  15.529907
    15085  1253.7  1253.7 1234.41 1242.98   13387  -10.70996
    15088 1242.98 1242.98 1217.47 1224.36   10126 -18.619995
    15089 1224.36 1233.54 1208.89 1209.47   12165 -14.890015
    15090 1209.47 1232.36 1207.38 1228.75   12036   19.28003
    15091 1228.75  1248.3 1228.75 1234.52   13452    5.77002
    15092 1234.52 1253.07 1234.52 1253.05   10913   18.53003
    15095 1253.05  1269.3 1243.99 1249.46   12668  -3.590088
    15096 1249.46 1266.47 1243.55 1266.44   11813   16.97998
    15097 1266.44 1272.93  1257.7 1267.43   13422   .9901123
    15098 1267.43 1267.43 1239.88 1248.58   11379 -18.850098
    15099 1248.58 1267.51    1232 1266.61   10821   18.03003
    15102 1266.61    1270 1259.19 1263.51    9490 -3.0999756
    15103 1266.71 1267.01    1253  1261.2   10063 -2.3100586
    15104  1261.2 1261.65 1247.83 1255.54   11324  -5.659912
    15105 1255.54 1268.14 1254.56 1255.18   10567 -.35998535
    15106 1255.18 1259.84 1240.79 1245.67    9062   -9.51001
    15109 1245.67 1249.68 1241.02 1248.92    8582       3.25
    15110 1248.92 1257.45 1245.36 1249.44   10718  .51989746
    15111 1249.44 1286.39 1243.02 1284.99   14053   35.55005
    15112 1284.99 1296.48 1282.65 1288.49   13556        3.5
    15113 1288.49 1292.06 1281.15 1291.96   11308   3.469971
    15116 1291.96 1312.95 1287.87 1312.83   11749  20.869995
    15117 1312.83 1315.93 1306.89 1309.38   12604  -3.449951
    15118 1309.38 1309.38  1288.7 1289.05   11348 -20.329956
    15119 1289.05 1295.04 1281.22 1293.17   11007   4.119995
    end
    format %td date

  • #2
    I think the "CDF" is not well defined. For example, in the first entry, the range is [1276, 1320], but what's the distribution on that range? A uniform distribution or a specific non-uniform distribution?

    • #3
      Let's assume I would do a uniform distribution across all ranges.

      • #4
        Then the code would be as simple as below.

        Code:
        gen wanted = (close-low)/(high-low)

        • #5
          In the perhaps unlikely case that high and low are identical, you might want to use a convention such as returning 0.5.
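
          A minimal sketch of that convention, assuming the variables are named high, low and close as in the thread. The first three rows reuse the dollar figures from the first post; the fourth row, with identical high and low, is a hypothetical case added only to exercise the fallback.

          ```stata
          clear
          input float(high low close)
           500 200 350
          1000 100 300
          1000 100 800
           400 400 400
          end
          * uniform CDF on [low, high]; return 0.5 when the range has zero width
          gen wanted = cond(high == low, 0.5, (close - low)/(high - low))
          list
          ```

          Without cond(), the zero-width row would produce a missing value, because Stata returns missing for division by zero.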

          • #6
            Thank you so much both, this is exactly what I need!

            Do you know how I could approach this if I assumed that the distribution was denser around the mean and dropped off near the boundaries of the range? Using any distribution that resembles this is OK (normal or otherwise).

            • #7
              On the contrary, only distributions with finite support are consistent with your goal. Various beta distributions are among the candidates.
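
              The point about finite support can be illustrated with a small sketch. The $200-$500 range comes from the first post; centring a normal on the midpoint with a standard deviation of a quarter of the range is an arbitrary assumption made only for this comparison, and the scalar names are hypothetical.

              ```stata
              * range from the first post (scalar names are illustrative)
              scalar rlow  = 200
              scalar rhigh = 500
              * assumed normal: centred on the midpoint, sd = a quarter of the range
              scalar mu = (rlow + rhigh)/2
              scalar sd = (rhigh - rlow)/4
              * the normal CDF is strictly between 0 and 1 at the range limits
              display "normal CDF at low:  " normal((rlow - mu)/sd)
              display "normal CDF at high: " normal((rhigh - mu)/sd)
              * a beta CDF on the rescaled range is exactly 0 and 1 at the limits
              display "beta CDF at low:    " ibeta(2, 2, 0)
              display "beta CDF at high:   " ibeta(2, 2, 1)
              ```

              Whatever standard deviation is chosen, the normal leaves some probability outside the range, which is why it cannot give 0 at the lower limit and 1 at the upper limit.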

              • #8
                Hi Nick,
                I see your point. Do you have an idea how to implement this logic in this case?

                • #9
                  See the help on statistical functions.

                  • #10
                    Thank you Nick, again!

                    • #11
                      Hi again,
                      I took a look at the functions and this is what I came up with.
                      1) Rescale the range so it fits the 0-1 range using Fei Wang's code.
                      2) Use the ibeta() function to evaluate the cumulative beta distribution.

                      Code:
                      sysuse sp500.dta, clear
                      * rescale close to [0, 1] within each day's low-high range
                      gen uniform = (close-low)/(high-low)
                      * cumulative Beta(2,2) probability at the rescaled value
                      gen beta = ibeta(2,2,uniform)
                      su beta, d

                      I've used 2 and 2 for the alpha and beta parameters because a Beta(2,2) density peaks at the midpoint of the range and falls to zero at both ends, which roughly matches the logic that the probability should level off toward 0 or 1 as the target approaches the lower or upper end of the range.

                      Would this code work for my problem?
                      Thanks!

                      • #12
                        It seems reasonable as long as the assumption of a beta distribution is acceptable. The validity of distributional assumptions goes beyond the scope of Stata and should be justified by theory in your field.
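
                        As a rough check on what the Beta(2,2) choice implies, the sketch below compares ibeta(2,2,x) with its closed form, 3x^2 - 2x^3, on the dollar examples from the first post. The Beta(5,5) column is a hypothetical alternative, included only to show how larger shape parameters concentrate more probability near the midpoint; it is not something proposed in the thread.

                        ```stata
                        clear
                        input float(high low close)
                         500 200 350
                        1000 100 300
                        1000 100 800
                        end
                        * position of the target within its range, on [0, 1]
                        gen x = (close - low)/(high - low)
                        * cumulative Beta(2,2) probability and its closed form
                        gen p_beta22 = ibeta(2, 2, x)
                        gen p_check  = 3*x^2 - 2*x^3
                        * a more peaked, still symmetric alternative (illustrative only)
                        gen p_beta55 = ibeta(5, 5, x)
                        list
                        ```

                        Both symmetric choices return 0.5 at the midpoint of the range; they differ only in how quickly the probability moves toward 0 or 1 away from it.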

                        • #13
                          Of course, agreed. Thank you for the assistance!
