Hi Statalist, I'm new to using ARIMA modeling and although I've read the ARIMA postestimation documentation, I'm a bit stumped on this one. Hoping folks can help me out--prior threads on this topic didn't have replies from OPs explaining if they had figured it out.
I'm modeling a wastewater signal of SARS-CoV-2 against daily new cases of COVID identified in the community. I have variables describing natural-log-transformed new cases, a wastewater classification signal (increasing, decreasing, or plateau), wastewater flow rate, and PMMOV (a proxy measure for the amount of feces in wastewater). Please forgive the large number of values in the -dataex- step; it's meant to assist with running the models and generating predictions.
I then define the data as time-series on my sample collection date variable:
I then run an ARIMA model with these values. I chose the AR and MA terms using the full dataset, which I don't present here.
However, when I graph the differenced, and non-differenced, predicted values, I see a difference in how far out the model predicts. The non-differenced values are only provided to the end of the actual data I have for new cases, whereas I wanted to have them moving forwards for the entire length of time that I have wastewater data for.
Code:
clear
input double newcases_mm float ln_newcases_mm double(sample_collect_date flow_rate_mm) byte classification_mm double ln_pmmov_conc_mm float avg_50day_log_mm
53 3.970292 23012 36.82 0 19.729013442993164 5.589388
74 4.304065 23013 40.77 0 19.377422332763672 5.586774
98 4.5849676 23014 39.41 0 19.590559005737305 5.575511
86 4.454347 23015 38.11 0 19.57074546813965 5.575034
70 4.248495 23016 . . . 5.570022
48 3.871201 23017 36.38 0 19.829322814941406 5.572013
42 3.73767 23018 36.56 0 19.92564582824707 5.567854
78 4.356709 23019 37.2 0 20.021602630615234 5.567013
86 4.454347 23020 36.36 0 19.782886505126953 5.562326
53 3.970292 23021 36.37 1 19.91169548034668 5.554551
48 3.871201 23022 35.59 1 19.538572311401367 5.550797
43 3.7612 23023 . . . 5.551994
33 3.496508 23024 35.29 1 19.350637435913086 5.564191
33 3.496508 23025 34.73 2 19.556203842163086 5.565587
46 3.8286414 23026 37.28 2 19.82225799560547 5.567031
65 4.1743875 23027 36.7 1 19.574539184570313 5.56625
49 3.89182 23028 36.53 1 19.75336265563965 5.561026
46 3.8286414 23029 36.9 1 19.682092666625977 5.55269
47 3.8501475 23030 . . . 5.538929
52 3.9512436 23031 35.94 0 19.754417419433594 5.544563
28 3.3322046 23032 36.6 0 19.604124069213867 5.532524
55 4.0073333 23033 38.14 0 19.855194091796875 5.523293
77 4.3438053 23034 37.44 0 19.717487335205078 5.526967
56 4.0253515 23035 36.96 0 19.794902801513672 5.530503
40 3.6888795 23036 36.58 0 19.616056442260742 5.515887
47 3.8501475 23037 . . . 5.513857
34 3.5263605 23038 35.39 0 19.668861389160156 5.515269
33 3.496508 23039 36.55 0 20.01917839050293 5.517579
45 3.8066626 23040 36.8 1 20.099346160888672 5.508532
52 3.9512436 23041 36.37 1 19.729877471923828 5.500669
57 4.0430512 23042 37.3 2 19.718799591064453 5.499311
50 3.912023 23043 36.87 2 20.093955993652344 5.491871
60 4.0943446 23044 . . . 5.489391
30 3.4011974 23045 35.87 2 19.843114852905273 5.495866
26 3.2580965 23046 36.92 2 19.161474227905273 5.4988
65 4.1743875 23047 36.56 2 19.96534538269043 5.492465
62 4.1271343 23048 37.61 1 19.562353134155273 5.497091
71 4.26268 23049 35.15 1 19.84638786315918 5.499902
47 3.8501475 23050 36.96 1 19.601425170898438 5.500391
50 3.912023 23051 . . . 5.4966
40 3.6888795 23052 36.93 1 19.761768341064453 5.48804
21 3.0445225 23053 37.63 1 19.732898712158203 5.488622
51 3.9318256 23054 38.72 1 19.75694465637207 5.494551
81 4.394449 23055 39.03 1 19.529640197753906 5.493357
66 4.189655 23056 40.93 1 19.523550033569336 5.494569
40 3.6888795 23057 39.3 1 19.716611862182617 5.486994
37 3.610918 23058 . . . 5.495099
37 3.610918 23059 37.98 1 19.674585342407227 5.488668
36 3.583519 23060 39.13 1 19.573528289794922 5.490763
64 4.158883 23061 39.06 1 19.695148468017578 5.494078
58 4.060443 23062 38.61 0 19.73009490966797 5.48185
47 3.8501475 23063 37.15 0 19.712005615234375 5.475732
47 3.8501475 23064 37.56 0 19.589067459106445 5.472705
61 4.1108737 23065 . . . 5.455986
42 3.73767 23066 37.53 0 19.763023376464844 5.458945
28 3.3322046 23067 37.87 0 19.266376495361328 5.453701
60 4.0943446 23068 55.33 0 18.668790817260742 5.45036
63 4.1431346 23069 48.31 0 19.124570846557617 5.448144
66 4.189655 23070 45.74 0 19.29648208618164 5.448114
56 4.0253515 23071 43.77 0 19.759674072265625 5.459624
51 3.9318256 23072 . . . 5.459912
38 3.637586 23073 40.81 0 18.988718032836914 5.460083
29 3.367296 23074 40.71 0 19.842342376708984 5.447379
57 4.0430512 23075 . . . 5.447135
59 4.0775375 23076 41.88 0 18.965791702270508 5.441201
52 3.9512436 23077 41.48 0 19.20622444152832 5.435276
50 3.912023 23078 40.85 0 19.745311737060547 5.434058
41 3.713572 23079 . . . 5.434228
30 3.4011974 23080 39.33 0 19.38507652282715 5.4336
39 3.6635616 23081 40.51 0 19.143775939941406 5.423797
49 3.89182 23082 40.07 0 19.348424911499023 5.423503
60 4.0943446 23083 39.89 0 19.626890182495117 5.424713
54 3.988984 23084 40.02 1 19.753995895385742 5.414766
58 4.060443 23085 40.99 1 19.58233070373535 5.412777
56 4.0253515 23086 . . . 5.411372
40 3.6888795 23087 40.15 1 19.679595947265625 5.412155
38 3.637586 23088 40.57 1 19.852903366088867 5.407754
47 3.8501475 23089 40.58 2 19.70294761657715 5.404565
70 4.248495 23090 40.13 2 19.12930679321289 5.405726
56 4.0253515 23091 40.13 2 19.576051712036133 5.409409
47 3.8501475 23092 40.21 2 19.753995895385742 5.407861
44 3.78419 23093 . . . 5.412015
37 3.610918 23094 39.45 2 19.705385208129883 5.408628
28 3.3322046 23095 39.56 2 19.036083221435547 5.401965
47 3.8501475 23096 40.62 2 19.68729019165039 5.396789
67 4.204693 23097 41.32 2 20.298250198364258 5.399424
65 4.1743875 23098 40.81 1 19.508838653564453 5.384326
48 3.871201 23099 40.62 1 19.558513641357422 5.380361
54 3.988984 23100 . . . 5.379776
33 3.496508 23101 47.54 0 19.53042984008789 5.370909
31 3.433987 23102 44.31 0 19.54975700378418 5.365918
53 3.970292 23103 44.02 0 19.54014015197754 5.366637
60 4.0943446 23104 43.37 0 19.22393798828125 5.354074
55 4.0073333 23105 45.17 0 19.46335792541504 5.350321
35 3.555348 23106 43.82 0 19.712663650512695 5.345755
28 3.3322046 23107 . . . 5.344503
33 3.496508 23108 40.52 0 19.436201095581055 5.331028
25 3.218876 23109 39.95 0 19.593534469604492 5.329842
45 3.8066626 23110 42.01 0 19.673900604248047 5.331709
51 3.9318256 23111 41.36 0 19.984619140625 5.326061
45 3.8066626 23112 41.74 0 19.6107177734375 5.327641
23 3.135494 23113 41.3 0 19.706270217895508 5.321802
29 3.367296 23114 . . . 5.319568
31 3.433987 23115 39.1 0 19.781042098999023 5.320871
26 3.2580965 23116 43.06 0 19.55876922607422 5.313355
30 3.4011974 23117 43.02 0 19.697607040405273 5.306843
44 3.78419 23118 41.75 0 19.534378051757813 5.298716
19 2.944439 23119 41.34 0 19.718143463134766 5.297304
. . 23120 40.61 0 19.673900604248047 5.294681
. . 23121 . . . 5.290625
. . 23122 39.07 0 19.623292922973633 5.275259
. . 23123 39.42 0 20.168975830078125 5.27065
. . 23124 39.89 0 19.83283805847168 5.260292
. . 23125 39.7 0 19.724897384643555 5.24592
. . 23126 39.21 0 19.77507209777832 5.242687
. . 23127 39.67 . 19.695819854736328 5.239677
. . 23129 38.32 0 19.797531127929688 5.233493
. . 23130 39.91 0 19.800756454467773 5.232412
. . 23131 39.5 0 20.20104217529297 5.228647
. . 23132 39.67 0 19.692909240722656 5.224076
. . 23133 38.88 1 19.416004180908203 5.216482
. . 23134 38.84 1 19.470102310180664 5.214991
. . 23136 36.46 1 19.818706512451172 5.196811
. . 23137 38.33 1 19.70737648010254 5.196811
. . 23138 40.12 1 19.633573532104492 5.196811
. . 23139 39.49 1 19.726415634155273 5.194838
. . 23140 39.57 1 19.899555206298828 5.185842
. . 23141 39.27 1 19.757858276367188 5.1806
. . 23142 . . . 5.172689
. . 23143 36.91 1 19.942378997802734 5.160373
. . 23144 36.6 1 19.805936813354492 5.155474
. . 23145 38.43 1 19.832406997680664 5.144033
. . 23146 38.24 0 20.069339752197266 5.142966
. . 23147 37.15 0 19.92653465270996 5.130842
. . 23148 37.66 0 19.80196189880371 5.131242
. . 23150 35.47 0 19.778575897216797 5.098516
. . 23151 36.09 0 19.77507209777832 5.095707
. . 23152 37.39 0 20.032516479492188 5.096595
. . 23153 36.43 0 19.95470428466797 5.081789
. . 23154 36.91 0 19.929723739624023 5.068685
. . 23155 36.58 0 19.649620056152344 5.05501
. . 23157 34.13 0 19.59551239013672 5.032358
. . 23158 33.23 0 19.86675262451172 5.03197
. . 23159 35.03 0 19.80817413330078 5.031382
. . 23160 37.37 0 19.947607040405273 5.019902
. . 23161 38.32 0 19.996273040771484 5.007681
end
format %td sample_collect_date
label values classification_mm class
label def class 0 "Decreasing", modify
label def class 1 "Plateau", modify
label def class 2 "Increasing", modify
Code:
tsset sample_collect_date
Code:
arima D.ln_newcases_mm classification_mm flow_rate_mm ln_pmmov_conc_mm, ar(3) ma(1 7)
//now predict the remaining cases
predict d_newcase_est
arima D.ln_newcases_mm classification_mm flow_rate_mm ln_pmmov_conc_mm, ar(3) ma(1 7)
predict newcase_est_2, y
Here's the -tsline- graph:
I'm having a really hard time understanding why the non-differenced model predicted values (produced by predict newcase_est_2, y and shown in goldenrod) are not going past approximately mid-April 2023, but the differenced model predicted values (produced by predict d_newcase_est and shown in seafoam green) are. This is likely some embarrassingly simple mathematical thing, but I would be so grateful if someone could explain. I've also tried using the dynamic() option of predict, to no avail.
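In case it helps anyone reproduce the gap, here is a minimal sketch for pinning down where each predicted series stops. It assumes d_newcase_est and newcase_est_2 already exist from the predict calls above; the 15apr2023 cutoff is an arbitrary date picked for illustration, not something from my original do-file.

```stata
* Assumes the two arima/predict steps above have already been run.
* Range of dates on which each prediction is non-missing
* (the format option displays the dates using the variable's %td format)
summarize sample_collect_date if !missing(d_newcase_est), format
summarize sample_collect_date if !missing(newcase_est_2), format

* Side-by-side listing around the point where the observed case data end
* (15apr2023 is an arbitrary illustrative cutoff)
list sample_collect_date ln_newcases_mm d_newcase_est newcase_est_2 ///
    if sample_collect_date >= td(15apr2023), noobs
```

The listing should make it easy to see on which dates newcase_est_2 turns missing relative to ln_newcases_mm and d_newcase_est.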
