Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • repeated time values within panel: combining rows with the same date

    Hi all,

    My question is about a panel data set for which in would like to regress a certain monthly score on holding period returns of equities. I'm getting to know Stata step by step and have done some reshaping and cleaning already but I cant seem to find the right way to handle this last part. Im dealing with 4 different monthly scores (0-100) and monthly holding period returns for 1000 equities in a time frame of 10 years.
    The problem is the structure of the current dataset has a separate time variable for each combination of date - firm - 1/4 scores, which causes 4 identical date observations per company & return combination. I would like to ask for a way to delete the 'missing values' and combine the 4 score variables with 1 date variable row per firm.
    The data currently looks like this:
    Click image for larger version

Name:	72C30136-65E5-42C3-A04A-013DEC2ABE73.jpeg
Views:	1
Size:	461.7 KB
ID:	1683217

    I think it might be important to mention that the data is not complete for all dates or scores for some firms, also the order of the (non-missing) scores is not consistent.

    Appreciate any help or tips, thank you very much in advance.
    Last edited by Sarah Sarah; 24 Sep 2022, 16:34.

  • #2
    Your data are not suitable for what you are asking. There are two problems, one of which is, I suspect, a typo, but the other is structural.

    The probable typo is the fact that the value of RET is not always the same for the same cusip on the same date. Look at cusip 00846U10 for date 2019m10. While the value appears to be -.011483718, for the observation where governance is not missing, there is an extra . at the end.

    The structural problem arises from the variable combined, which is pretty much always different among the observations for any given cusip on the same date. The only exception is cusip 04135620 in 2017m4. So it is unclear which of those values of combined you want to retain when you reduce to one observation per cusip-date combination, or how you want to reconcile the differences. Or maybe you don't want to retain the variable combined?

    Anyway, something went wrong in creating this data set and you cannot proceed with it. You need to fix these inconsistency problems first. Go back over the data management that created this data to see how this situation arose and fix that, along with any other errors you may find in the process.

    Since you are going to need to review the data management that created this data, let me also point out that the kind of data you are showing with different variables having their only non-missing values in different observations typically results from having combined different data sets using -append- when -merge- would have been appropriate. So if you find that aspect of the creation of this data and change it, you won't even need to do what you asked for in #1. (But that still won't exempt you from fixing the problem of the inconsistency of the variable combined, and the probable typo in ret.)

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment

    Working...
    X