  • Regression results far lower than expected - right approach?

    Dear Stata list community,

    The situation is as follows: I am writing my master's thesis, which is a valuation of the Bumble IPO (Bumble is an online dating website).

    As a smaller part, I thought it would make a difference in grading if I added empirical research on the impact of Covid measures on the online dating market in the US.
    (I was deciding between looking at Bumble downloads / revenues or US market downloads / revenues.) Both were retrieved from a data analytics tool;
    the Covid information I retrieved from the WHO.

    I deleted the data prior to 31/01/2020 because this is when the first cases started. The first vaccinations started in 2021, so they are not shown in the example below.

    I created the ln of downloads to avoid skewness. First I tried the regression with xtreg but only got omitted values, which is why I used reg.
    However, the results are FAR lower than expected (I am saying this because brokers always mention in their reports that an increase in vaccinations / an increase in mobility will increase downloads of online dating apps).

    Of course, it could just be the case that there is only a very very small impact, but I am questioning rather the approach I used.
    Do you have tips or suggestions on what to do differently?

    E.g., should I also use the log of the independent variables, as they are also quite big and fluctuating?
    Should I use another approach than classic reg?
    Unfortunately there is not much research on the impact of mobility data / vaccination data on online dating, and my professor is a corporate finance professor, so not used to Stata at all.

    Thank you very much for your help - much appreciated!

    Best,
    Pauline

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long downloads_market float lndownloads_market long(total_cases total_vaccinations)
    349322  12.76375       8 .
    419778  12.94748       8 .
    410355 12.924778       9 .
    323442 12.686775      10 .
    304240 12.625572      13 .
    312402 12.652046      17 .
    332996 12.715886      19 .
    344259  12.74915      19 .
    412194  12.92925      19 .
    420423 12.949017      20 .
    342605 12.744333      20 .
    315655 12.662405      20 .
    302375 12.619423      20 .
    297856 12.604365      20 .
    309883  12.64395      22 .
    387060 12.866335      23 .
    409318 12.922248      24 .
    335484  12.72333      24 .
    314410 12.658453      24 .
    317562 12.668428      26 .
    300150 12.612038      31 .
    311941  12.65057      34 .
    398717 12.896008      35 .
    417589 12.942253      40 .
    327214  12.69837      48 .
    306061  12.63154      48 .
    293169 12.588505      52 .
    284897 12.559883      56 .
    303564 12.623347      64 .
    369591 12.820152      69 .
    380383 12.848934      73 .
    309360  12.64226      82 .
    305225 12.628804     100 .
    304429 12.626193     135 .
    292680 12.586835     186 .
    300948 12.614693     256 .
    373907 12.831762     334 .
    384436 12.859532     464 .
    308904 12.640786     610 .
    291723  12.58356     822 .
    283307 12.554286    1212 .
    268841 12.501876    1709 .
    283359  12.55447    2234 .
    360911 12.796387    2961 .
    351178  12.76905    3929 .
    267055  12.49521    5148 .
    251286 12.434347    7283 .
    259075 12.464873    9652 .
    242240 12.397684   12881 .
    266414 12.492806   17743 .
    331464 12.711274   23856 .
    336786 12.727203   31415 .
    266837 12.494393   40525 .
    250916 12.432874   51290 .
    248241 12.422155   62044 .
    249382 12.426742   73376 .
    274532 12.522823   87491 .
    347143 12.757492  105965 .
    364568 12.806468  126309 .
    267250  12.49594  146982 .
    250079 12.429532  173143 .
    243658  12.40352  188679 .
    252228 12.438088  211939 .
    277761 12.534516  240613 .
    364321  12.80579  270403 .
    361539 12.798125  302460 .
    289566 12.576138  334594 .
    271952  12.51338  361578 .
    272584 12.515702  390207 .
    274151 12.521435  419562 .
    294180 12.591948  453710 .
    376700 12.839205  488453 .
    375375  12.83568  521632 .
    278342 12.536606  553167 .
    278400 12.536814  580261 .
    250954 12.433025  605505 .
    259665 12.467148  630764 .
    281926   12.5494  655972 .
    371005  12.82397  687466 .
    367160 12.813553  717782 .
    288604  12.57281  746063 .
    259612 12.466944  772133 .
    263134  12.48042  798561 .
    264060 12.483932  824488 .
    285927 12.563492  854958 .
    376521  12.83873  888169 .
    368201 12.816384  922947 .
    280147  12.54307  956550 .
    288791  12.57346  982613 .
    277703 12.534307 1006371 .
    281986 12.549613 1030306 .
    283461  12.55483 1057080 .
    363533 12.803625 1088301 .
    365892 12.810094 1121383 .
    287396 12.568616 1149675 .
    264130 12.484197 1175602 .
    269427 12.504053 1197926 .
    258022   12.4608 1220804 .
    283223  12.55399 1246244 .
    349013 12.762864 1274796 .
    end



  • #2
    Pauline:
    as per FAQ, please report what you typed and what Stata gave you back. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      [Attached image: Screenshot 2023-11-05 at 15.23.04.png]
      Code:
      //Data cleaning (ereplace is a community-contributed command; install it with: ssc install ereplace)
      sort date
      
      by date: ereplace retail_and_recreation = max(retail_and_recreation)
      by date: ereplace grocery_and_pharmacy = max(grocery_and_pharmacy)
      by date: ereplace parks = max(parks)
      by date: ereplace transit_stations = max(transit_stations)
      by date: ereplace workplaces = max(workplaces)
      by date: ereplace residential = max(residential)
      by date: ereplace daily_vaccinations = max(daily_vaccinations)
      by date: ereplace people_fully_vaccinated = max(people_fully_vaccinated)
      by date: ereplace total_vaccinations = max(total_vaccinations)
      by date: ereplace total_cases = max(total_cases)
      by date: ereplace new_cases = max(new_cases)
      by date: ereplace total_tests = max(total_tests)
      by date: ereplace new_tests = max(new_tests)
      by date: ereplace facial_coverings = max(facial_coverings)
      by date: ereplace stay_at_home_policy = max(stay_at_home_policy)
      by date: ereplace revenue_bumble = max(revenue_bumble)
      by date: ereplace downloads_bumble = max(downloads_bumble)
      by date: ereplace cumulative_rpd_bumble = max(cumulative_rpd_bumble)
      by date: ereplace revenue_tinder = max(revenue_tinder)
      by date: ereplace downloads_tinder = max(downloads_tinder)
      by date: ereplace cumulative_rpd_tinder = max(cumulative_rpd_tinder)
      by date: ereplace revenue_hinge = max(revenue_hinge)
      by date: ereplace downloads_hinge = max(downloads_hinge)
      by date: ereplace cumulative_rpd_hinge = max(cumulative_rpd_hinge)
      by date: ereplace revenue_match = max(revenue_match)
      by date: ereplace downloads_match = max(downloads_match)
      by date: ereplace cumulative_rpd_match = max(cumulative_rpd_match)
      by date: ereplace revenue_market = max(revenue_market)
      by date: ereplace downloads_market = max(downloads_market)
      by date: ereplace cumulative_rpd_market = max(cumulative_rpd_market)
      
      replace retail_and_recreation = 0 if retail_and_recreation ==.
      replace grocery_and_pharmacy = 0 if grocery_and_pharmacy ==.
      replace parks = 0 if parks ==.
      replace transit_stations = 0 if transit_stations ==.
      replace workplaces = 0 if workplaces ==.
      replace residential = 0 if residential ==.
      
      //generate non-essential places during covid   (transit_stations, workplaces, retail and recreation)
      egen non_essential = rowtotal(transit_stations workplaces retail_and_recreation)
      
      //Keep only one line per date
      sort date, stable
      by date: keep if _n == 1
      
      
      //Determination of the investigation period (Trump declared the U.S. outbreak a public health emergency on January 31 2020)
      
      gen cutoff_date = date("31/01/2020", "DMY")
      drop if date < cutoff_date
      drop cutoff_date
      
      
      
      
      //gen ln of revenue and downloads because of skewness
      
      gen lnrevenue_bumble = ln(revenue_bumble)
      gen lnrevenue_market = ln(revenue_market)
      gen lndownloads_bumble = ln(downloads_bumble)
      gen lndownloads_market = ln(downloads_market)
      
      
      **REGRESSION
      //Regression
      //H1
      reg lndownloads_market total_vaccinations, vce(r)
      reg lndownloads_bumble total_vaccinations, vce(r)

      STATA REG:
      Last edited by Pauline Mueller; 05 Nov 2023, 07:23.



      • #4
        Looking graphically at your data, it seems that there is only a weak relationship between downloads and total cases, regardless of whether you log transform the downloads variable. The relationship looks stronger, and closer to linear if you log-transform total_cases, so that might be a better bet. Of course, you cannot take the log of zero, so this excludes the data from before the pandemic erupted. Actually, it probably makes sense to consider the pre-pandemic and intra-pandemic processes as being different and modeling them separately. I think you will have difficulty finding a simple parametric model that fits both of these eras well.
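        The log transform of total_cases suggested here can be tried with a minimal sketch like the one below. It uses ten rows picked from the -dataex- listing in #1 (a deliberately tiny subset, so the numbers it prints say nothing about the full data), and the variable name lntotal_cases is an assumption introduced for illustration.

```stata
* Ten rows picked from the -dataex- listing in #1 (illustrative subset only)
clear
input long downloads_market long total_cases
349322 8
309883 22
305225 100
283307 1212
242240 12881
347143 105965
271952 361578
281926 655972
280147 956550
349013 1274796
end

* ln() of zero is missing in Stata, so the condition documents that
* days with no cases drop out of any model using the logged variable
generate lntotal_cases = ln(total_cases) if total_cases > 0

* Compare the linear association before and after the transform
correlate downloads_market total_cases
correlate downloads_market lntotal_cases

* Visual check (left commented out so the sketch runs in batch mode):
* twoway scatter downloads_market lntotal_cases
```

        On the full dataset the same two -correlate- calls, or the scatter plot, would show whether the logged version is in fact closer to linear.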

        As for vaccinations, you are making a substantial error by coding this variable as a missing value in the pre-vaccination era. The number of vaccinations in that era is not unknown. It is zero, and you should code it as 0 in the observations for that era.
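        A minimal sketch of that recoding, using three rows from the -dataex- listing in #1. It assumes every row shown predates the vaccine rollout; in the full data the replacement would need a date condition so that genuine gaps after the rollout stay missing. The name rollout_date in the comment is hypothetical.

```stata
* Three rows copied from the -dataex- listing in #1
clear
input long downloads_market long(total_cases total_vaccinations)
349322 8 .
419778 8 .
410355 9 .
end

* Assumption: all rows here fall before the first vaccinations,
* so missing means "none yet" rather than "unknown".
* In the full data, add a date restriction, for example
*   ... if missing(total_vaccinations) & date < rollout_date
* where rollout_date is a hypothetical variable holding the rollout date.
count if missing(total_vaccinations)
replace total_vaccinations = 0 if missing(total_vaccinations)
count if missing(total_vaccinations)
```

        Note that a later ln(total_vaccinations) would turn these zeros back into missing values, so a logged vaccination variable still excludes the pre-vaccination era.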

        All of that said, I know nothing about evaluating IPOs, so I can't say how this all fits into your overall goal or how you would make use of the proposed analyses.



        • #5
          Pauline:
          as an aside to Clyde's excellent advice, I'm afraid your regression model is too poor to be informative (as you can see, despite the remarkable sample size, the R-squared is really low).
          You show us a simple linear regression (that is, an OLS with one predictor only).
          I would recommend that you discuss the model specification with your supervisor, just to avoid nuisances when the runway is in sight!
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Dear Clyde, thank you so much for your advice, I really appreciate your time!

            I was also thinking about separating the pre-pandemic, intra-pandemic, and post-pandemic periods.

            But since there is no data pre / post on vaccination it might be difficult to state that vaccination had an influencing factor - it would be rather the statement "Covid in general had impact on online dating" and for this I would just compare the different periods, but I would not need a regression. What do you think?

            Code:
            corr downloads_market facial_coverings stay_at_home_policy non_essential lntotal_vaccinations
            (obs=671)
            
                         | downlo~t facial~s stay_a~y non_es~l lntot~ns
            -------------+---------------------------------------------
            downloads_~t |   1.0000
            facial_cov~s |  -0.1336   1.0000
            stay_at_ho~y |  -0.0781   0.6006   1.0000
            non_essent~l |   0.4210  -0.4115  -0.4553   1.0000
            lntotal_va~s |   0.0369  -0.7329  -0.6875   0.5124   1.0000
            
            . reg downloads_market lntotal_vaccinations non_essential stay_at_home_policy facial_coverings, v
            > ce(r)
            
            Linear regression                               Number of obs     =        671
                                                            F(4, 666)         =      23.67
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.2379
                                                            Root MSE          =      33098
            
            --------------------------------------------------------------------------------------
                                 |               Robust
                downloads_market | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            ---------------------+----------------------------------------------------------------
            lntotal_vaccinations |  -10667.25    1928.15    -5.53   0.000    -14453.24   -6881.269
                   non_essential |   861.1042   110.3132     7.81   0.000     644.5007    1077.708
             stay_at_home_policy |   2888.208   3400.859     0.85   0.396    -3789.488    9565.904
                facial_coverings |  -8879.383   2148.556    -4.13   0.000    -13098.14   -4660.623
                           _cons |   560542.3   43807.13    12.80   0.000     474525.6    646559.1
            --------------------------------------------------------------------------------------
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long downloads_market float(lntotal_vaccinations non_essential) byte(stay_at_home_policy facial_coverings)
            341189  17.45484  -84 2 4
            276702 17.491432 -115 2 4
            273405 17.533895 -107 2 4
            261750 17.584856  -91 2 4
            258194 17.637798  -95 2 4
            271123 17.688305  -93 2 4
            335024 17.715134  -66 2 4
            331554 17.724073  -90 2 4
            268429 17.755428 -100 2 4
            258350 17.793253  -99 2 4
            258774 17.835752  -94 2 4
            258242 17.878296 -101 2 4
            272266 17.922132  -96 2 4
            337155 17.946182  -74 2 4
            343975 17.955643  -83 2 4
            278952  17.97455 -135 2 4
            261930 17.999641 -123 2 4
            253641  18.02928 -108 2 4
            252157 18.056467 -121 2 4
            265697  18.08445 -103 2 4
            334151 18.102638  -60 2 4
            339249 18.110874  -64 2 4
            264264 18.130716  -92 2 4
            249671 18.156395  -85 2 4
            243892 18.188892  -80 2 4
            242057 18.225368  -82 2 4
            251278  18.26262  -81 2 4
            311802 18.285732  -53 2 4
            308777 18.295488  -58 2 4
            239650 18.318987  -83 2 4
            224399 18.346012  -82 2 4
            220762 18.376955  -76 2 4
            223473  18.40994  -79 1 4
            239508 18.442186  -78 1 4
            293563  18.46223  -50 1 4
            293945 18.471146  -51 1 4
            231257 18.491608  -82 1 4
            219047 18.516666  -79 1 4
            216059 18.544844  -77 1 4
            223494 18.575022  -76 1 4
            240834 18.604183  -74 1 4
            300223 18.622347  -42 1 4
            306224 18.630373  -50 1 4
            247223  18.64871  -81 1 4
            229951 18.671486  -75 1 4
            219173 18.695906  -65 1 4
            219060 18.721415  -73 1 4
            237601 18.745697  -69 1 4
            297082 18.761223  -32 1 4
            303143 18.768507  -32 1 4
            248635 18.785015  -69 1 4
            258886 18.806047  -70 1 4
            251465 18.829933  -70 1 4
            244261 18.855017  -70 1 4
            252701 18.879414  -67 1 4
            315511 18.894943  -34 1 4
            318479 18.902458  -43 1 4
            255835 18.919811  -69 1 4
            241676  18.94139  -67 1 4
            235200 18.965878  -66 1 4
            234764 18.992092  -62 1 4
            279313 19.012476  -73 1 4
            341929 19.025166  -38 1 4
            347939 19.029337  -72 1 4
            263302 19.046082  -67 1 4
            247202 19.067127  -64 1 4
            239936 19.089714  -62 1 4
            240578  19.11252  -62 1 4
            261429   19.1339  -66 1 4
            326151  19.14799  -36 1 4
            320046 19.154745  -39 1 4
            258481 19.169167  -67 1 4
            244370  19.18552  -66 1 4
            243909 19.202637  -66 1 4
            244612 19.219696  -68 1 4
            259203 19.235826  -67 1 4
            326913 19.246264  -37 1 4
            331781  19.25123  -34 1 4
            254606  19.26246  -67 1 4
            239776 19.276327  -65 1 4
            236824  19.29106  -67 1 4
            237560 19.305655  -65 1 4
            256040 19.319233  -64 1 4
            323971 19.328215  -39 1 4
            322041  19.33223  -37 1 4
            265313  19.34183  -66 1 4
            253175 19.353327  -64 1 4
            249020 19.365446  -63 1 4
            246365 19.377283  -64 1 4
            260354  19.38846  -59 1 4
            330265  19.39555  -31 1 4
            332922 19.398876  -30 1 4
            251693  19.40627  -64 1 4
            239938 19.415596  -63 1 4
            234723 19.425034  -60 1 4
            231805  19.43428  -58 1 4
            250938  19.44339  -57 1 4
            304681 19.448927  -29 1 4
            309428 19.451134  -31 1 4
            258221  19.45687  -65 1 4
            end



            • #7
              Dear Carlo,

              Thank you very much for your comment. You are absolutely right that the model with only one predictor is not very informative. But I had to get the data myself from publicly available sources. I would have liked to have more information on daily marriages/divorces, daily broadband data, smartphone usage etc. as controls, but I was mostly able to find data for the Covid period only.

              After including other influencing factors, the number of observations is still small, but I was able to increase the Rsquared. However, this is not the main part of the master thesis, this empirical research should only indirectly influence the share price of Bumble and serves as a minor part of the overall analysis.

              Still, I love empirical analysis and love to widen my knowledge of Stata.

              Thank you for your advice, much appreciated!

              Best, Pauline



              • #8
                But since there is no data pre / post on vaccination it might be difficult to state that vaccination had an influencing factor - it would be rather the statement "Covid in general had impact on online dating" and for this I would just compare the different periods, but I would not need a regression.
                I'm not sure what you mean when you say there is no data pre/post on vaccination. There certainly is data available about the uptake of vaccination over time, and it is fairly fine-grained both geographically and demographically. So you should be able to find information about the rates of vaccination in the target demographic for Bumble and how they vary over time, and see the extent to which that covaries with Bumble downloads.

                It is true that a simple pre-post comparison could support a conclusion that "Covid in general had impact on online dating." But it would be very weak support. While intuition says that Covid was the most important determinant of all manner of human behavior during that era, from a logical point of view this argument is subject to the criticism that there were, in fact, other things happening in the world at the same time, and how do we know that they were not the driving influences? (Agreed that intuitively this seems far-fetched, but we are not talking about intuitions here, we are talking about trying to evaluate something scientifically.) In fact, it is almost never possible to completely exclude competing explanatory events. But, for example, showing that during the epidemic there was a relationship between, say, incidence rates, mortality rates, or hospitalization rates from Covid over time and Bumble downloads would be more compelling--and you already have some evidence of that with regard to incidence rates in what you've shared here. So I think there are other things that you can pursue that would strengthen the argument.
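                One way to look at incidence rather than cumulative cases is sketched below: daily new cases are derived as the first difference of total_cases and then related to downloads. The ten rows are copied from the -dataex- listing in #1; the sketch assumes they are consecutive days in listing order, and the index t and the variable new_cases are names introduced here for illustration. With so few rows the output only demonstrates the syntax.

```stata
* Ten consecutive rows copied from the -dataex- listing in #1
clear
input long downloads_market long total_cases
371005 687466
367160 717782
288604 746063
259612 772133
263134 798561
264060 824488
285927 854958
376521 888169
368201 922947
280147 956550
end

* Assumption: rows are consecutive days, so a running index can
* stand in for the date variable that the listing does not include
generate t = _n
tsset t

* Daily incidence = change in the cumulative count (first row is missing)
generate new_cases = D.total_cases

* Relationship between incidence and downloads, robust standard errors
regress downloads_market new_cases, vce(robust)
```

                In the real data, -tsset date- on the actual date variable would replace the running index.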



                • #9
                  I value your opinion a lot and it helps me to iterate and research more and more! Thank you!

                  I am investigating the US market only, as this is the most developed country for the online dating market. Here I can only find vaccination data for the US in general, even though it would be very helpful to have it more granular across different age clusters.

                  In fact, it is almost never possible to completely exclude competing explanatory events. But, for example, showing that during the epidemic there was a relationship between, say, incidence rates, mortality rates, or hospitalization rates from Covid over time and Bumble downloads would be more compelling--and you already have some evidence of that with regard to incidence rates in what you've shared here.
                  I don't know if showing a correlation between these variables is useful for investigating the impact on the online dating market. It would indeed be helpful to have marriage / divorce / broadband usage data for this time, but I don't have this information.

                  However, I included the mobility data during that time from Google and Covid regulatory policies as control variables.

                  My hypotheses:

                  H1: All else equal, the average US dating apps downloads will decrease with rising vaccination rates
                  - rationale: e.g., people seek to meet people in real time again, no need to swipe online

                  H2: The negative effect of the growing number of vaccinations on the average US downloads of dating apps weakens with increasing covid cases
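
                  H2 is a statement about an interaction: the vaccination effect is expected to depend on the level of cases. A minimal sketch of how such a hypothesis could be specified with factor-variable notation follows. It uses the first twelve rows of the -dataex- listing further down in this post and leaves out the control variables, so its estimates are illustrative only; the at() values are arbitrary points inside the range of these twelve rows.

```stata
* First twelve rows of the -dataex- listing in this post, controls omitted
clear
input long downloads_market float(lndaily_vaccinations lnnew_cases)
341189 14.209035 12.009072
276702 14.212294 11.840955
273405 14.216315 11.6257
261750 14.23675 11.7679
258194 14.27234 11.70627
271123 14.311616 11.596062
335024 14.336114 11.716846
331554 14.33439 11.734684
268429 14.348606 11.60987
258350 14.370917 11.439612
258774 14.384303 11.298816
258242 14.389502 11.423186
end

* c. marks a variable as continuous; ## adds both main effects and their product
regress downloads_market c.lndaily_vaccinations##c.lnnew_cases, vce(robust)

* Marginal effect of vaccinations evaluated at three case levels
margins, dydx(lndaily_vaccinations) at(lnnew_cases = (11.3 11.6 12))
```

                  Under H2 one would look for a negative coefficient on lndaily_vaccinations together with a positive coefficient on the interaction term, which -margins- translates into a vaccination effect that moves toward zero as lnnew_cases rises.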

                  Code:
                  . reg downloads_market lndaily_vaccinations lnnew_cases stay_at_home_policy facial_coverings residential wor
                  > kplaces grocery_and_pharmacy retail_and_recreation, vce(robust)
                  
                  Linear regression                               Number of obs     =        671
                                                                  F(8, 662)         =      80.11
                                                                  Prob > F          =     0.0000
                                                                  R-squared         =     0.6325
                                                                  Root MSE          =      23054
                  
                  ---------------------------------------------------------------------------------------
                                        |               Robust
                       downloads_market | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  ----------------------+----------------------------------------------------------------
                   lndaily_vaccinations |  -9635.087   1313.795    -7.33   0.000     -12214.8   -7055.379
                            lnnew_cases |   10766.09   1037.934    10.37   0.000     8728.054    12804.13
                    stay_at_home_policy |   5753.986   2645.987     2.17   0.030     558.4476    10949.52
                       facial_coverings |   7502.247   1446.861     5.19   0.000     4661.257    10343.24
                            residential |  -7694.958   873.4544    -8.81   0.000    -9410.033   -5979.883
                             workplaces |   446.7504   338.1463     1.32   0.187     -217.218    1110.719
                   grocery_and_pharmacy |    334.679   358.5211     0.93   0.351    -369.2965    1038.654
                  retail_and_recreation |  -2114.613   360.6698    -5.86   0.000    -2822.807   -1406.418
                                  _cons |   300760.2   20966.71    14.34   0.000     259590.9    341929.5
                  I think I can't use xtreg because of the small number of observations... Do you think reg with vce(robust) is fine? I also applied the logarithm to new cases, as it seemed to have a more linear relationship with downloads.
                  Fixed effects also cannot be used because of the number of observations, right? This is why I applied vce(robust). Would you do anything different here?

                  I will focus on the period 31.01.2020 - 21.10.2022, as this is the point in time when officials stopped publishing mobility data and vaccination data on a daily basis.

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input long downloads_market float(lndaily_vaccinations lnnew_cases) byte(retail_and_recreation grocery_and_pharmacy workplaces residential stay_at_home_policy facial_coverings)
                  341189 14.209035 12.009072 -27 -16 -18  7 2 4
                  276702 14.212294 11.840955 -29 -19 -37 14 2 4
                  273405 14.216315   11.6257 -26 -15 -36 13 2 4
                  261750  14.23675   11.7679 -20  -9 -31 11 2 4
                  258194  14.27234  11.70627 -23 -11 -32 11 2 4
                  271123 14.311616 11.596062 -24 -13 -30 11 2 4
                  335024 14.336114 11.716846 -21  -6 -11  6 2 4
                  331554  14.33439 11.734684 -31 -14 -17  6 2 4
                  268429 14.348606  11.60987 -26 -15 -33 11 2 4
                  258350 14.370917 11.439612 -25 -12 -33 12 2 4
                  258774 14.384303 11.298816 -22 -10 -32 11 2 4
                  258242 14.389502 11.423186 -26 -12 -34 13 2 4
                  272266 14.408416 11.496176 -24 -10 -33 13 2 4
                  337155 14.421847 11.508395 -23  -4 -15  8 2 4
                  343975 14.433304   11.4869 -24 -10 -20  7 2 4
                  278952  14.40295 11.353004 -34 -21 -52 18 2 4
                  261930 14.374315 11.118593 -33 -17 -45 16 2 4
                  253641  14.34583 10.969336 -27 -11 -39 13 2 4
                  252157 14.297775  10.94854 -33 -19 -42 17 2 4
                  265697  14.24027 11.063038 -26 -15 -35 14 2 4
                  334151  14.22454  11.13558 -19  -8 -10  6 2 4
                  339249  14.22552 11.206074 -19 -11 -13  5 2 4
                  264264 14.250896 11.089958 -22 -13 -31 11 2 4
                  249671 14.280054  10.94426 -18  -8 -31 10 2 4
                  243892 14.329224 10.929906 -15  -8 -30 10 2 4
                  242057  14.41776 11.158562 -17  -9 -30 10 2 4
                  251278 14.503937 11.189382 -18 -11 -29 10 2 4
                  311802  14.55191 11.184394 -15  -6 -10  5 2 4
                  308777   14.5692 11.201962 -16 -10 -13  4 2 4
                  239650 14.610542 11.131636 -17  -9 -30  9 2 4
                  224399  14.64404   10.8699 -16  -7 -30 10 2 4
                  220762  14.66751 10.841814 -12  -4 -30  9 2 4
                  223473  14.68345 10.892303 -15  -6 -30  9 1 4
                  239508 14.690618 11.098062 -16  -8 -29  9 1 4
                  293563 14.694923  11.07997 -13  -4 -11  4 1 4
                  293945 14.699476 11.068106 -12  -6 -13  3 1 4
                  231257  14.70397 10.963185 -16  -7 -31  9 1 4
                  219047 14.718522  10.68643 -14  -5 -31  9 1 4
                  216059 14.731704 10.662024 -13  -6 -30  9 1 4
                  223494 14.746383 10.898368 -14  -5 -30  9 1 4
                  240834   14.7582  11.03093 -14  -7 -30  9 1 4
                  300223 14.765597 10.998158 -10  -2 -11  3 1 4
                  306224  14.76848   11.0701 -11  -7 -15  4 1 4
                  247223 14.774426 10.864254 -14  -7 -33 10 1 4
                  229951 14.783678 10.762785 -11  -3 -32  9 1 4
                  219173 14.785355 10.681252  -4  -2 -31  8 1 4
                  219060  14.78174  10.85526 -10  -5 -32 10 1 4
                  237601   14.7745  10.99521 -10  -6 -31  8 1 4
                  297082 14.772497  11.05246  -6  -2 -10  2 1 4
                  303143   14.7748 11.035244  -4  -3 -12  2 1 4
                  248635 14.778866  10.97603  -9  -5 -30  8 1 4
                  258886 14.787873 10.719472  -9  -4 -30  8 1 4
                  251465 14.808052 10.783695  -9  -5 -30  9 1 4
                  244261  14.83017 10.962735 -10  -4 -30  9 1 4
                  252701 14.855364 11.127572 -10  -6 -30  8 1 4
                  315511 14.870922 11.087772  -8  -2 -10  2 1 4
                  318479 14.880028 11.223027  -8  -5 -14  3 1 4
                  255835  14.90327 11.054107  -8  -3 -31  8 1 4
                  241676 14.928617 10.882866  -7  -1 -31  8 1 4
                  235200  14.95725  10.96844  -6  -2 -31  8 1 4
                  234764   14.9912  11.03148  -4   3 -31  8 1 4
                  279313 14.983832 11.150045  -6   3 -40 10 1 4
                  341929 14.976334 11.207555  -6  11 -14  2 1 4
                  347939 14.956147 11.162601 -28 -11 -23  1 1 4
                  263302 14.968364  11.06186  -6  -1 -34  8 1 4
                  247202 14.985447 10.730335  -6   0 -31  7 1 4
                  239936  14.99374 10.940225  -4   0 -31  7 1 4
                  240578 14.990297 11.061265  -6  -1 -30  8 1 4
                  261429  15.01943 11.220258 -10  -5 -30  7 1 4
                  326151 15.044286 11.231013  -7  -2 -12  2 1 4
                  320046 15.070593 11.243935  -6  -4 -14  2 1 4
                  258481  15.06747 11.128894  -9  -3 -29  7 1 4
                  244370 15.047266 10.882546  -8  -1 -29  7 1 4
                  243909 15.019733 11.009324  -7  -3 -30  8 1 4
                  244612 14.987386  11.18728 -10  -3 -29  8 1 4
                  259203  14.95588  11.16555 -11  -6 -29  8 1 4
                  326913 14.931624 11.177424  -8  -2 -12  2 1 4
                  331781  14.91909 11.171477  -5  -2 -13  2 1 4
                  254606  14.89829  11.25431 -10  -3 -29  8 1 4
                  239776  14.88633  10.80643  -8  -1 -29  8 1 4
                  236824  14.87565   10.4711  -9  -3 -29  8 1 4
                  237560  14.86318  10.95637  -9  -2 -28  8 1 4
                  256040 14.847864 11.051667 -11  -5 -28  7 1 4
                  323971 14.839994  11.02645  -9  -2 -12  2 1 4
                  322041 14.832767 11.041625  -6  -3 -13  2 1 4
                  265313 14.822848 10.853174  -9  -2 -29  7 1 4
                  253175 14.805206  10.61759  -8   0 -29  7 1 4
                  249020  14.78408 10.315564  -7  -1 -29  7 1 4
                  246365 14.759535 10.826018  -8  -1 -29  8 1 4
                  260354 14.737804 10.904046  -8  -1 -28  6 1 4
                  330265  14.71812 10.926712  -6   2 -12  1 1 4
                  332922 14.711512 10.920727  -4   1 -13  1 1 4
                  251693  14.68623 10.786263  -8   1 -29  7 1 4
                  239938 14.662437 10.507612  -7   3 -29  7 1 4
                  234723 14.629167 10.402838  -4   3 -29  7 1 4
                  231805  14.59519 10.658483  -6   3 -28  6 1 4
                  250938 14.568467 10.704165  -7   1 -27  6 1 4
                  304681  14.54604 10.661767  -3  11 -12  0 1 4
                  309428 14.527622 10.706408  -2   7 -14  0 1 4
                  258221  14.50202  10.51924  -9   0 -29  7 1 4
                  end



                  • #10
                    Pauline:
                    1) the only issue with the -fe- estimator is that it does not return the coefficients of time-invariant predictors;
                    2) the way you coded -regress- means that you're investigating a cross-sectional study (that is, one wave of data for each id) correcting the standard errors for heteroskedasticity. All in all, this has nothing to do with a -fe- estimator, nor does it work around -fe- estimator nuisances;
                    3) you can (inefficiently) apply the -fe- estimator to -regress- if you plug in, among the predictors, a categorical variable for the panel -i.id- (possibly, -i.market- in your case);
                    4) to go -regress- taking serial correlation of the epsilon into account, you should go (without other details, I clustered the standard errors on -lndaily_vaccinations-):
                    Code:
                    . reg downloads_market lndaily_vaccinations lnnew_cases stay_at_home_policy facial_coverings residential workplaces grocery_and_pharmacy retail_and_recreation, vce(cluster lndaily_vaccinations )
                    note: facial_coverings omitted because of collinearity.
                    
                    Linear regression                               Number of obs     =        100
                                                                    F(7, 99)          =      56.07
                                                                    Prob > F          =     0.0000
                                                                    R-squared         =     0.8045
                                                                    Root MSE          =      16922
                    
                                              (Std. err. adjusted for 100 clusters in lndaily_vaccinations)
                    ---------------------------------------------------------------------------------------
                                          |               Robust
                         downloads_market | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    ----------------------+----------------------------------------------------------------
                     lndaily_vaccinations |   4268.889   12952.85     0.33   0.742    -21432.38    29970.16
                              lnnew_cases |    32054.1   9656.118     3.32   0.001     12894.27    51213.93
                      stay_at_home_policy |   3526.275   10576.96     0.33   0.740    -17460.71    24513.26
                         facial_coverings |          0  (omitted)
                              residential |  -7102.283   1637.083    -4.34   0.000    -10350.61   -3853.955
                               workplaces |   1056.072   632.1225     1.67   0.098    -198.1965     2310.34
                     grocery_and_pharmacy |   1109.773   760.2271     1.46   0.148    -398.6822    2618.229
                    retail_and_recreation |  -2927.162   628.2612    -4.66   0.000    -4173.768   -1680.555
                                    _cons |  -104280.8   199443.8    -0.52   0.602    -500020.4    291458.9
                    ---------------------------------------------------------------------------------------
                    5) however, your model seems misspecified as per the results of -linktest-, where the squared prediction -_hatsq- is statistically significant (more details in the related entry in the Stata .pdf manual):
                    Code:
                    . linktest
                    
                          Source |       SS           df       MS      Number of obs   =       100
                    -------------+----------------------------------   F(2, 97)        =    211.20
                           Model |  1.0956e+11         2  5.4782e+10   Prob > F        =    0.0000
                        Residual |  2.5161e+10        97   259387410   R-squared       =    0.8132
                    -------------+----------------------------------   Adj R-squared   =    0.8094
                           Total |  1.3472e+11        99  1.3609e+09   Root MSE        =     16106
                    
                    ------------------------------------------------------------------------------
                    downloads_~t | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                            _hat |  -.8765903   .8800735    -1.00   0.322    -2.623293     .870112
                          _hatsq |   3.32e-06   1.56e-06     2.14   0.035     2.35e-07    6.41e-06
                           _cons |     260764   122823.1     2.12   0.036     16994.08    504533.8
                    ------------------------------------------------------------------------------
                    
                    .
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      So I cannot use my regression as I used it above? Is it necessary to employ fixed effects for this time period?

                      So basically you're saying my data is not "sophisticated" enough for this regression? Should I stop here, as I will not find more publicly available data, or is there a way to still do the regression and investigate the two hypotheses?
                      The number of observations shrank to only 100 in the regression you showed above.



                      • #12
                        Pauline:
                        1) if you have only one wave of data for each -id-, you can only use -regress- (assuming your regressand is continuous), as panel data regression requires at least two observations per -id-;
                        2) you invoked -robust- standard errors, which, in -regress-, take only heteroskedasticity into account. To take serial correlation into account as well, you should go -vce(cluster clusterid)-. Obviously, you should first have detected heteroskedasticity and/or serial correlation in your data;
                        3) the -linktest- result is telling you that your regression may need more predictors and/or interactions among its predictors and/or a different functional form of the regressand (say, natural log).

                        What I wrote above, as you wisely pointed out, holds for the excerpt you shared. It may well be (and I do hope so) that some issues will disappear when your regression is run on the whole sample.
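A minimal sketch of how heteroskedasticity and serial correlation might be checked before choosing the standard errors, assuming the data form a single daily time series. The built-in -sp500- dataset and its variables -volume- and -change- are stand-ins (assumptions), not the data from this thread; the index -t- and the lag length of 7 are likewise illustrative choices.

```stata
* Illustrative only: sp500 stands in for the daily downloads data (assumption)
sysuse sp500, clear

* Consecutive index so the series has no gaps (the dates skip weekends)
generate t = _n
tsset t

* Plain OLS first; the tests below are run after default standard errors
regress volume change

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest

* Breusch-Godfrey test for serial correlation at lag orders 1 to 7 (assumed)
estat bgodfrey, lags(1/7)

* Newey-West standard errors, robust to heteroskedasticity and to
* autocorrelation up to the chosen lag (7 is an assumption)
newey volume change, lag(7)
```

-newey- is shown here as one possible alternative to -vce(cluster clusterid)- for a single time series; it is not what was used in this thread.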
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Dear Carlo

                          1) Okay, I have only one observation per -id-, thank you.

                          2)
                          Code:
                          reg downloads_market lndaily_vaccinations lnnew_cases stay_at_home_policy facial_coverings residential workplaces grocery_and_pharmacy retail_and_recreation, vce(cluster lndaily_vaccinations)
                          
                          Linear regression                               Number of obs     =        623
                                                                          F(8, 622)         =      77.65
                                                                          Prob > F          =     0.0000
                                                                          R-squared         =     0.6249
                                                                          Root MSE          =      23315
                          
                                                    (Std. err. adjusted for 623 clusters in lndaily_vaccinations)
                          ---------------------------------------------------------------------------------------
                                                |               Robust
                               downloads_market | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                          ----------------------+----------------------------------------------------------------
                           lndaily_vaccinations |  -10448.79   1894.866    -5.51   0.000     -14169.9   -6727.684
                                    lnnew_cases |   10812.03   1098.477     9.84   0.000     8654.854     12969.2
                            stay_at_home_policy |   5828.911   2764.407     2.11   0.035     400.2091    11257.61
                               facial_coverings |   8076.785   1820.666     4.44   0.000     4501.389    11652.18
                                    residential |  -7650.552   880.6573    -8.69   0.000    -9379.974   -5921.131
                                     workplaces |   473.3282   359.8937     1.32   0.189    -233.4258    1180.082
                           grocery_and_pharmacy |   503.1539   432.3106     1.16   0.245    -345.8112    1352.119
                          retail_and_recreation |  -2211.527   394.3143    -5.61   0.000    -2985.876   -1437.178
                                          _cons |   309402.6   23762.65    13.02   0.000     262737.8    356067.3
                          Perfect, this worked. I am actually not sure whether I have time-invariant predictors in my dataset, so I might simply leave -fe- out...

                          "you can (inefficiently) apply the -fe- estimator to -regress- if you plug in, among the predictors, a categorical variable for the panel -i.id- (possibly, -i.market- in your case)"
                          I did not understand this part. For market, I could break it down further into the leading apps of the dating market, as I have the daily download figures for these as well. Is this what you were thinking of?

                          3) How do I best test if I should use a different functional form? Unfortunately I do not have access to additional data / predictors

                          Many thanks for your patience. Means a lot. (I am really trying to get wiser)





                          • #14
                            Pauline:
                            1) one observation per -id- means that a panel data regression is out of the question in your case;
                            2) by "inefficiently" I meant that, while you can apply the -fe- estimator via -regress-, you get less information than with -xtreg, fe-, as you can see from the following toy example:
                            Code:
                            . use "https://www.stata-press.com/data/r17/nlswork.dta"
                            (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                            
                            . xtreg ln_wage c.age##c.age if idcode<=3, fe
                            
                            Fixed-effects (within) regression               Number of obs     =         39
                            Group variable: idcode                          Number of groups  =          3
                            
                            R-squared:                                      Obs per group:
                                 Within  = 0.6382                                         min =         12
                                 Between = 0.8744                                         avg =       13.0
                                 Overall = 0.2765                                         max =         15
                            
                                                                            F(2,34)           =      29.99
                            corr(u_i, Xb) = -0.2473                         Prob > F          =     0.0000
                            
                            ------------------------------------------------------------------------------
                                 ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                     age |   .2512762   .0450106     5.58   0.000     .1598037    .3427487
                                         |
                             c.age#c.age |  -.0037603   .0007625    -4.93   0.000    -.0053098   -.0022107
                                         |
                                   _cons |  -2.189815   .6402959    -3.42   0.002    -3.491053   -.8885773
                            -------------+----------------------------------------------------------------
                                 sigma_u |  .31366066
                                 sigma_e |  .19867104
                                     rho |  .71367959   (fraction of variance due to u_i)
                            ------------------------------------------------------------------------------
                            F test that all u_i=0: F(2, 34) = 29.72                      Prob > F = 0.0000
                            
                            . reg ln_wage c.age##c.age i.idcode if idcode<=3
                            
                                  Source |       SS           df       MS      Number of obs   =        39
                            -------------+----------------------------------   F(4, 34)        =     24.28
                                   Model |  3.83375281         4  .958438203   Prob > F        =    0.0000
                                Residual |  1.34198615        34  .039470181   R-squared       =    0.7407
                            -------------+----------------------------------   Adj R-squared   =    0.7102
                                   Total |  5.17573896        38  .136203657   Root MSE        =    .19867
                            
                            ------------------------------------------------------------------------------
                                 ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                     age |   .2512762   .0450106     5.58   0.000     .1598037    .3427487
                                         |
                             c.age#c.age |  -.0037603   .0007625    -4.93   0.000    -.0053098   -.0022107
                                         |
                                  idcode |
                                      2  |  -.4231615   .0816747    -5.18   0.000    -.5891444   -.2571786
                                      3  |  -.6126416   .0809386    -7.57   0.000    -.7771285   -.4481546
                                         |
                                   _cons |   -1.82398   .6366167    -2.87   0.007    -3.117741   -.5302195
                            ------------------------------------------------------------------------------
                            
                            .
                            3) you may want to try to log your dependent variable and see what happens and/or you may want to consider interacting some of your predictors.
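A minimal sketch of how the two suggestions in point 3) might be compared, using the built-in -auto- dataset as a stand-in (an assumption; -price-, -mpg- and -weight- are not variables from this thread). It fits a model in levels, then a model with a logged dependent variable and an interaction, and runs the Ramsey RESET and link tests after each.

```stata
* Illustrative only: auto stands in for the downloads data (assumption)
sysuse auto, clear

* Baseline in levels, followed by the Ramsey RESET and the link test
regress price mpg weight
estat ovtest
linktest

* Logged dependent variable plus an interaction between the two predictors
generate ln_price = ln(price)
regress ln_price c.mpg##c.weight
estat ovtest

* Clear the helper variables left by the first link test, then rerun it
capture drop _hat _hatsq
linktest
```

If -_hatsq- is no longer significant after the second model, that would suggest the change in functional form helped; the same comparison could be run with -downloads_market- and -lndownloads_market- as the dependent variable.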
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              To whoever may still be reading this,

                              I talked with my supervisor and she just said I should group the data by week for the standard errors, but she didn't want to look at any code as she's not a Stata specialist... So my current regression looks like this:
                              Code:
                              reg downloads_market lndaily_vaccinations lnnew_cases int_mobility_measures residential workplaces grocery_and_pharmacy leisure, vce(cluster week_cluster)
                              with following outcome:
                              Code:
                              Linear regression                               Number of obs     =        671
                                                                              F(8, 96)          =      64.66
                                                                              Prob > F          =     0.0000
                                                                              R-squared         =     0.6325
                                                                              Root MSE          =      23054
                              
                                                                (Std. err. adjusted for 97 clusters in week_cluster)
                              --------------------------------------------------------------------------------------
                                                   |               Robust
                                  downloads_market | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                              ---------------------+----------------------------------------------------------------
                              lndaily_vaccinations |  -9635.087   1617.302    -5.96   0.000    -12845.41   -6424.768
                                       lnnew_cases |   10766.09   1292.844     8.33   0.000      8199.82    13332.37
                               stay_at_home_policy |   5753.986   3507.658     1.64   0.104     -1208.66    12716.63
                                  facial_coverings |   7502.247    1967.25     3.81   0.000     3597.287    11407.21
                                       residential |  -7694.958   1042.531    -7.38   0.000    -9764.366   -5625.551
                                        workplaces |   446.7504   355.9763     1.26   0.213     -259.857    1153.358
                              grocery_and_pharmacy |    334.679   368.6927     0.91   0.366    -397.1702    1066.528
                                           leisure |  -2114.613   346.5121    -6.10   0.000    -2802.434   -1426.792
                                             _cons |   300760.2   24848.71    12.10   0.000     251435.9    350084.5
                              --------------------------------------------------------------------------------------
                              
                              .
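How -week_cluster- was built is not shown in this post. As a sketch only, a weekly cluster identifier might be derived from a daily date variable as below; the built-in -sp500- dataset and its variables -date-, -volume- and -change- are stand-ins (assumptions), not the thread's data.

```stata
* Illustrative only: sp500 stands in for the daily downloads data (assumption)
sysuse sp500, clear

* Map each daily date to its Stata week; all days in a week share one id
generate week_cluster = wofd(date)
format week_cluster %tw

* Standard errors clustered by week
regress volume change, vce(cluster week_cluster)

* Number of weekly clusters actually used
display e(N_clust)
```

Note that Stata's weekly dates always start week 1 on 1 January and count 52 weeks per year, so these weeks need not line up with calendar Monday-to-Sunday weeks.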
                              I've done the link test, I've plotted the variables against the residuals, but the model still has misspecification. I've tried to see if I notice any effect when I apply a log to the dependent variable, I've tried interacting different variables, I've tried logging other independent variables, but to be honest, I don't see any improvement.... Maybe I just don't know what to look for, but I've never done this before... how can I address this misspecification? Do you have any other advice to work on this problem in a more structured way?

                              Thank you, as always very much appreciated.

                              Best, Pauline
