  • Failing to understand how to pool and perform logit on different datasets (sadly this topic is not covered in the guide, as far as I can see)

    Dear Statalists,
    I'm trying to replicate the Ou and Penman study on a bank index over different years. In summary: take a set of indicators from the previous fiscal year and run a logistic regression on a binary variable (abnormal return = 1, no abnormal return = 0); once the model is estimated, predict the probability of 1 and 0. I estimated this model separately for different years, but there is a problem: each year has very few banks relative to the number of indicators. So I was wondering whether there is a way to pool all the variables and observations (the variables are the same in every year I have taken into account) and estimate a single model on this new "pooled sample". Thank you in advance; I'll gladly cite in my thesis whoever is able to help me.
    Best regards

  • #2
    Filippo:
    the issue you're complaining about seems to rest on missing values.
    The model you use (pooled -logit- or, if you have a panel dataset, -xtlogit-, which sounds better provided that your dataset shows evidence of a group-wise effect) has no bearing on missing values, which will be excluded from the -e(sample)- in any case.
    That said, you may want to consider missing-value imputation.
    Otherwise, you can stick with your dataset and explain that the final sample is reduced due to the presence of missing values.
    I'd not sponsor a complete case analysis (that is, considering only those observations with no missing values), as you may end up with a resulting sample that has nothing to do with the original one.
    Last but not least, see what others in your research field did in the past to deal with the very same issue.
    Kind regards,
    Carlo
    (Stata 19.0)
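
    A minimal sketch of how one might check whether missing values are shrinking the estimation sample, as discussed in #2. The variable names (abnormal, revenue, dps, debt, pb) are hypothetical placeholders and do not come from the poster's data.

```stata
* Hypothetical variable names: abnormal is the 0/1 outcome,
* revenue, dps, debt and pb are the indicators
misstable summarize abnormal revenue dps debt pb

* -logit- silently drops any observation with a missing value
* in the outcome or in any regressor
logit abnormal revenue dps debt pb

* Observations actually used in the estimation
count if e(sample)

* Observations left out of the estimation sample
count if !e(sample)
```

    If the second count is zero, missing values are not the reason for the small sample.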



    • #3
      Dear Carlo, maybe I wasn't clear enough. I have no missing values.
      Let me try to explain what my work is about: it's January 31st, 2021 and you decide to collect some balance sheet indicators across a discrete number of banks (Nasdaq banks, in this particular case) from the year 2021. Now that you have all your indicators (no missing values), you run a logistic regression on a binary variable that I created (how it was created is not important here) and predict the probabilities. Once you have this, repeat these steps for the previous fifteen years: gather data, regress, predict.

      Now my problem is: since every year I have about 300 firms and around 40 indicators, I was wondering whether there is a way to pool my datasets or my models, in order to increase n and get more significant results.
      Thank you, Professor; by the way, I'm a tennis watcher like you!



      • #4
        Filippo:
        thanks for clarifying.
        It seems to me that you may have:
        1) retrospective repeated cross-sectional data (the same indicators each year, but not the same sample of banks): in this case, you can go pooled logit;
        2) retrospective panel data (the same indicators measured on the same sample of banks each year): in this case you can go -xtlogit- and check whether your dataset shows any evidence of a group-wise effect;
        3) in both cases, the wave of data is annual;
        4) I'm under the impression that you are asking for something like -collapse-. However, I cannot see how this could help you;
        5) I would be more interested in applying the right tool to your analysis than having statistically significant results;
        6) if the points above do not help, please share an excerpt/example of your dataset via -dataex- (you can change the name of your variables, if confidential).

        Happy to read that you're a tennis watcher too, but please stick with Carlo only (I'm simply one of this clan, like you). Thanks.

        Kind regards,
        Carlo
        (Stata 19.0)
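
        A hedged sketch of option 1) above (pooled logit on repeated cross-sections) and of the -dataex- request in point 6). It assumes the yearly files are already stacked in one dataset; the variable names (abnormal, revenue, dps, debt, pb, year, bank_id) are hypothetical. The year dummies and the clustered standard errors are one common choice, not something prescribed in the thread.

```stata
* Assumes one stacked dataset with a numeric year and a numeric
* bank identifier bank_id (both names are hypothetical)

* Pooled logit: all years in one model, with year dummies
* and standard errors clustered on bank
logit abnormal revenue dps debt pb i.year, vce(cluster bank_id)

* Predicted probability of abnormal == 1 for the estimation sample
predict p_abnormal if e(sample), pr

* Share a small excerpt of the data on the forum (point 6)
dataex bank_id year abnormal revenue dps debt pb in 1/20
```

        The predicted probability of 0 is simply 1 minus p_abnormal.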



        • #5
          Bank name  Binary  Revenue  Dividend per share  Debt  Price/Book
          bank 1     1       656      676776              767   7676677
          bank 2     0       776      7676766             776   5565656
          bank 3     0       468      34678               877   8788
          The sample is a bit tricky, a mix of 1 and 2: retrospective repeated cross-sectional data, but I had to change my observations every five years. The big problem here is that there is no guide whatsoever on how to perform a pooled logit, or on how to construct the database so that it can be read by the software. I'd love to run a panel data analysis every five years or a pooled logit, but I lack the skill and the material to learn.


          Here is an example of one of the datasets (this one refers to one particular year and obviously has many more observations and variables).

          You are being very kind to me. If you don't have the time to help me, that's OK; you have already done so much for me!

          Filippo



          • #6
            Filippo:
            first, I would -append- the different files to create a single dataset.
            If you want to create a panel dataset:
            1) the banks should be the same in each wave (attrition can happen, though);
            2) assuming that it is not already present in your dataset, you should create a -timevar- (say: 2005, 2010, 2015, ...);
            3) -xtset- your dataset with -panelid- (mandatory) and -timevar- (optional but recommended);
            4) go -xtlogit- (conditional fe or re);
            5) if you do not have evidence of a panel-wise effect, go pooled logit.
            Kind regards,
            Carlo
            (Stata 19.0)
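
            The steps in #6 can be sketched roughly as follows. This is only an illustration under stated assumptions: the file names (banks_2005.dta, ...), the years, and the variable names (bank_name, abnormal, revenue, dps, debt, pb) are all hypothetical, and each file is assumed to hold one row per bank with identically named variables.

```stata
* Hypothetical yearly files: banks_2005.dta, banks_2010.dta, banks_2015.dta
clear
tempfile pooled
save `pooled', emptyok

foreach y in 2005 2010 2015 {
    use "banks_`y'.dta", clear
    generate int year = `y'        // step 2: time variable, if not already present
    append using `pooled'          // stack on top of the years read so far
    save `pooled', replace
}
use `pooled', clear

* Numeric panel identifier built from the (hypothetical) string variable bank_name
egen long bank_id = group(bank_name)

* Step 3: declare the panel structure
xtset bank_id year

* Step 4: panel logit, random effects and conditional fixed effects
xtlogit abnormal revenue dps debt pb, re
xtlogit abnormal revenue dps debt pb, fe

* Step 5: pooled logit, if there is no evidence of a panel-wise effect
logit abnormal revenue dps debt pb i.year, vce(cluster bank_id)
```

            The output of -xtlogit, re- ends with a likelihood-ratio test of rho = 0, which is one way to judge whether a panel-wise effect is present before falling back on the pooled model.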



            • #7
              Dear Carlo,
              I sent you a private message so as not to clutter the thread.
