  • Unbalanced Panel Data - Regional Code for only the last year

    Dear professors, fellow PhD candidates, and members of this forum: I have only just subscribed to the forum, but your questions and answers have helped me throughout my academic years.

    I am currently doing research for my PhD thesis. I am using an unbalanced panel dataset that includes firm-level data along with each firm's industry, the survey year, etc.

    I am using two different surveys covering 2009 to 2019 (11 years in total). My question is this: the statistical agency recently added regional codes to its 2019 dataset, and these codes did not exist in the datasets for the previous 10 years. We have firm IDs, but I do not know how to fill in the empty region cells in the earlier years' data. How can I assign each firm's region to its observations from the previous years?

    I have used many dummies so far and I am very used to creating them; for this problem, however, I am struggling.

    Thank you for your understanding. I hope I have explained the situation clearly.

    Sincerely from Istanbul, Turkey.

  • #2
    I don't see any role for dummy variables in this. Assuming you have a variable firm_id, and a numeric variable called region, and the data from all of the survey years are in a single data set, then it's just:
    Code:
    by firm_id (region), sort: replace region = region[1]
    That said, you did not show example data so this code is written for the data set I imagine you have. As such, it is untested, and might not work in your real data. In the future, when asking for help with code, always show example data. And use the -dataex- command to do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
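
    To make the behavior of that one-liner concrete, here is a minimal sketch on made-up toy data. The values, the variable year, and the storage types are assumptions for illustration only, not the original poster's data. Because numeric missing values sort last in Stata, region[1] within each firm is the observed 2019 code, if the firm has one.

    ```stata
    clear
    * Hypothetical toy panel: region is observed only in 2019
    input long firm_id int year byte region
    1 2017 .
    1 2018 .
    1 2019 3
    2 2018 .
    2 2019 5
    3 2017 .
    3 2018 .
    end

    * Safety check: no firm should carry two different nonmissing region codes
    bysort firm_id (region): assert region == region[1] if !missing(region)

    * Missing values sort last, so region[1] is the firm's observed code (if any)
    by firm_id (region), sort: replace region = region[1]

    * Restore panel order and inspect the result
    sort firm_id year
    list, sepby(firm_id)
    ```

    Note that a firm that does not appear in the 2019 survey (firm 3 in this toy example) still has a missing region afterwards, since there is nothing to copy from. Also, if region is a string variable rather than numeric, empty strings sort first, so this approach would not work as written.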

    As an aside, it is unfortunate that you are comfortable working with dummy variables, because, unless you are using an ancient version of Stata, there is almost no need for them. Nearly all of what you do with dummy variables can be done more easily, more quickly, more transparently, and with less risk of making mistakes, by using factor-variable notation instead. There are other advantages to factor-variable notation, too, but I won't go into them here. I refer you to -help fvvarlist-.
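
    As a rough illustration of the difference, here is a sketch using the auto dataset that ships with Stata. It is not the original poster's data, and the models are chosen only to show the syntax. Both regressions estimate the same model; the second needs no hand-made indicator variables.

    ```stata
    sysuse auto, clear

    * Hand-made dummies: creates rep_1 ... rep_5, then omits rep_1 as the base
    tabulate rep78, generate(rep_)
    regress price mpg rep_2 rep_3 rep_4 rep_5

    * Factor-variable notation: same model, no new variables in the dataset
    regress price mpg i.rep78
    ```

    With a filled-in region variable, the analogous specification would simply include i.region among the regressors.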
