  • Dropping very specific cells over multiple columns

    Dear All,

    I would like some help editing a dataset, an example of which is included below:

    v103 v106 v109 v112 v115 v118
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17

    This is considered one group of data points.


    What I would like the table to look like is this:

    v103 v106 v109 v112 v115 v118
    14 0 0 0 0 0
    14 0 0 0 0 0
    14 0 0 0 0 0
    0 41 0 0 0 0
    0 41 0 0 0 0
    0 41 0 0 0 0
    0 0 22 0 0 0
    0 0 22 0 0 0
    0 0 22 0 0 0
    0 0 0 20 0 0
    0 0 0 20 0 0
    0 0 0 20 0 0
    0 0 0 0 23 0
    0 0 0 0 23 0
    0 0 0 0 23 0
    0 0 0 0 0 17
    0 0 0 0 0 17
    0 0 0 0 0 17

    The issue is that this is a very big dataset, with 13,176 rows and 36 columns in total, and one column could contain several groups like the one in the first table above. The resources I have found online only explain how to delete entire variables using drop. I did find an old Statalist post with a similar question, but that post asked about dropping missing values, while I'm trying to remove a specific set of values.

    Does anyone have any suggestions or advice on how to approach this problem? Thank you very much in advance!

    Lastly, I'm using Stata 15.1.



    Best,
    Helen

  • #2
    Well, you are not looking to -drop- anything here. You are looking to replace the existing values by zeroes. And that's a good thing, because in Stata you can drop whole observations or whole variables, but it is not possible to drop individual cells (or groups of cells other than observations or variables.)
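
    To make the distinction concrete, here is a minimal sketch, assuming the six-variable example dataset from the question is in memory. The particular observation ranges and variables are picked only for illustration, and -preserve-/-restore- is used so that the illustrative drops are undone afterwards.

    ```stata
    * replace overwrites selected cells; the number of observations and
    * variables stays the same
    replace v106 = 0 in 1/3

    * drop removes whole observations or whole variables, never single cells
    preserve
    drop in 4/L      // removes observations 4 through the last one
    drop v118        // removes the variable v118 entirely
    restore          // undo the two drops; they were for illustration only
    ```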

    There is an apparent pattern to what you want to do. Your variable names increment by 3, and you also want the non-zeroes to be retained in groups of three within a variable. It's also true that the variables you are showing are all actually constants. I don't know if this is coincidental to the example you show, or if it is, in fact, the general pattern. I will assume that at least the incrementing by threes is a general description of the situation. If it is not, please post back with a fuller explanation of what you are looking for.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(v103 v106 v109 v112 v115 v118)
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    14 41 22 20 23 17
    end
    
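    * For each i = 3, 6, ..., 18, the variable v(100 + i) keeps its values
    * only in observations i-2 through i; all other observations are set to 0.
    forvalues i = 3(3)18 {
        local varnum = `i' + 100
        replace v`varnum' = 0 if !inrange(_n, `=`i'-2', `i')
    }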
    }

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it.

    -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Finally, I am very curious why you want to do this. It is one of the oddest data management operations I have encountered and I am unable to picture what purpose it serves.
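
    The loop above handles a single 18-row group. The question mentions that a column could contain several such groups, so one possible generalization is sketched here. It rests on assumptions the thread does not confirm: that every group has the same length, that the groups are stacked one after another in the current sort order, and that each variable keeps exactly three consecutive rows per group. The helper variable pos and the local macros vlist, k, blocklen and j are names introduced only for this illustration.

    ```stata
    * Assumed: variables listed in the order their non-zero rows should appear
    local vlist v103 v106 v109 v112 v115 v118
    local k : word count `vlist'
    local blocklen = 3 * `k'            // rows per group (18 in the example)

    * Position of each observation within its group: 1, 2, ..., blocklen
    gen long pos = mod(_n - 1, `blocklen') + 1

    local j = 0
    foreach v of local vlist {
        local ++j
        * The j-th variable keeps its values in rows 3j-2 through 3j of each group
        replace `v' = 0 if !inrange(pos, 3*`j' - 2, 3*`j')
    }
    drop pos
    ```

    With the 18-row example this gives the same result as the loop above, since pos equals _n when there is only one group. If the groups differ in length or are identified by another variable, this sketch would need to be adapted, for example by computing the position within each group with -by-.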





    • #3
      Hi Clyde,

      Thank you very much for your response and my apologies for the wrong format of the example. It was my first time posting and I will keep that in mind for future posts!

      While the code was able to run, it wasn't completely what I was looking for. I ended up having to do everything manually because there is a time crunch on this project. I also agree the data management could have been better. Once again, I really appreciate your help!

      Best,
      Helen
