
  • Duplicates: 1 copy of 90814 observations and 0 surplus

    Hello, I need help turning a dataset into panel data. I am using a dataset from the OECD https://stats.oecd.org/Index.aspx?Da...=AIR_EMISSIONS and am looking only at the observations for sulphur. Using the command "duplicates report" I get 1 copy of 90814 observations and 0 surplus. So now I can't turn them into panel data because of duplicates. But if I drop the duplicates, only observations from one country remain. What can I do to fix this problem?

  • #2
    That's good news -- so far as it goes.

    If there is one copy of each observation, it is ipso facto unique.

    Consider this silly example. An identifier created in this way produces the same kind of summary report.

    Code:
    . clear
    
    . set obs 10
    number of observations (_N) was 0, now 10
    
    . gen id = _n
    
    . duplicates report id
    
    Duplicates in terms of id
    
    --------------------------------------
       copies | observations       surplus
    ----------+---------------------------
            1 |           10             0
    --------------------------------------
    
    . list id
    
         +----+
         | id |
         |----|
      1. |  1 |
      2. |  2 |
      3. |  3 |
      4. |  4 |
      5. |  5 |
         |----|
      6. |  6 |
      7. |  7 |
      8. |  8 |
      9. |  9 |
     10. | 10 |
         +----+
    
    .
    The issue with that dataset is just that you need a reshape to get where you need to be: as downloaded, country and year alone do not identify the observations. I downloaded the csv, after which this worked for me, although look at the file to see the metadata you might need.

    Code:
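    * hyphens are not allowed in variable names, and these values end up in names
    replace pol = subinstr(pol, "PM2-5", "PM25", . )
    
    * one string key per pollutant, variable and unit, joined by underscores
    egen which = concat(pol var unitcode), p(_)
    
    * drop the columns now encoded in which, and other descriptive columns
    drop pol* var* unit* flag* power* ref*
    
    * one row per country-year, one value variable per key
    reshape wide value, i(cou year) j(which) string
    
    * strip the value prefix from the new variable names
    rename value* *
    
    * numeric country identifier, labelled with the country codes
    egen id = group(cou), label 
    
    xtset id year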
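
    The same logic can be checked on a tiny made-up dataset. This is only a sketch: the values below are invented for illustration, and the variable names cou, year, pol and value are assumed to mirror the downloaded csv. It shows that country and year repeat while the data are long, that adding the pollutant code makes each row unique, and that xtset works once the data are reshaped.

    ```stata
    clear
    * toy long-format data: invented values, not OECD figures
    input str3 cou year str3 pol value
    "AAA" 2000 "SOX" 1.5
    "AAA" 2000 "NOX" 2.5
    "AAA" 2001 "SOX" 1.4
    "AAA" 2001 "NOX" 2.4
    "BBB" 2000 "SOX" 3.5
    "BBB" 2000 "NOX" 4.5
    "BBB" 2001 "SOX" 3.4
    "BBB" 2001 "NOX" 4.4
    end

    * country and year alone repeat, so they cannot be panel identifiers yet
    duplicates report cou year

    * together with the pollutant code they identify each row; isid errors if not
    isid cou year pol

    * go wide: one row per country-year, one value variable per pollutant
    reshape wide value, i(cou year) j(pol) string

    * now country and year are unique
    isid cou year

    egen id = group(cou), label
    xtset id year
    ```

    With the real file the key has more parts (pollutant, variable and unit), which is why they are concatenated into a single string before the reshape.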
