  • Appending variables within the same dataset

    Hi,

    I have a dataset that looks something like this:
    ID v1a v1b v1c
    1 1 1 2
    2 1 3
    3 2
    4 5
    5 3 3 1

    Now, I want to append v1b and v1c to v1a in order to combine them into a single variable, v1. The problem is that everything is within the same dataset, so I can't simply append another DTA file.

    Is there an easier way to do this other than to create new DTA files and then append?

    Thanks.

  • #2
    yes, see help reshape

    Code:
    // create the example data
    clear
    input ID     v1a     v1b     v1c
    1     1     1     2
    2     1     3   .    
    3     2     .   .    
    4     5     .   .    
    5     3     3     1
    end
    
    // do the reshape
    reshape long v1, i(ID) j(version) string
    
    // admire the result
    list, sepby(ID)
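
A possible follow-up, offered only as a sketch: reshape long keeps the cells that were empty in the wide layout as missing values of v1. Assuming those rows are not wanted in the stacked variable, they can be dropped afterwards. The example reuses the data from the code above.

```stata
// same example data as above
clear
input ID v1a v1b v1c
1 1 1 2
2 1 3 .
3 2 . .
4 5 . .
5 3 3 1
end

// stack v1a, v1b and v1c into a single variable v1
reshape long v1, i(ID) j(version) string

// assumption: rows that were empty in the wide layout are not needed
drop if missing(v1)

list, sepby(ID)
```

With this example data, 10 of the 15 reshaped rows remain after the drop.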
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Thanks so much!


      • #4
        I have a similar dataset and want to use this command. However, when I enter
        Code:
        insert id_pat_no v1 v2 v3
        , I get the error message "variable id_pat_no already defined". The ID is not a single number as was the case for David, but rather a series of 6 numbers. I would love some advice on how to proceed.

        Thanks!
        Yi


        • #5
          what is the -insert- command and where does it come from?


          • #6
            Originally posted by Rich Goldstein
            what is the -insert- command and where does it come from?
            Sorry, I mean -input-


            • #7
              Please provide an example of your data using dataex, so that people can answer this much more easily. See more on how and why to use dataex in the FAQ: https://www.statalist.org/forums/help#stata


              • #8
                Originally posted by Jorrit Gosens
                Please provide an example of your data using dataex, so that people can answer this much more easily. See more on how and why to use dataex in the FAQ: https://www.statalist.org/forums/help#stata
                I have tried the dataex command but my data set is too large. I know snapshots of data sets are generally discouraged but it is the only way I can think of. My data looks like this
                [Image attachment: Capture.PNG]

                I'd like to calculate a rate (case/risk) for each disease for each country automatically using a loop command that applies the
                Code:
                gen rate_disease = *case/*risk
                where the wildcard (*) is the various diseases included in the data.

                I really appreciate your help!


                • #9
                  You can select numbers of variables and observations to create a data example with dataex by doing e.g.,:
                  Code:
                  dataex Country Disease1_case Disease1_risk Disease2_case Disease2_risk in 1/20

                  Starting from your example, you could do:
                  Code:
                  foreach n of numlist 1/4 {
                      gen rate_disease_`n' = Disease`n'_case / Disease`n'_risk
                  }
                  edit: in the above code, replace the 4 with the number of diseases in your dataset
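
If the number of diseases is not known in advance, a wildcard-style loop is another option. This is only a sketch: the variable names and the small example dataset are hypothetical, and it assumes every variable ending in _case has a partner ending in _risk with the same prefix.

```stata
// hypothetical example data; the variable names are assumptions
clear
input str10 Country Disease1_case Disease1_risk Disease2_case Disease2_risk
"A" 2 100 0 50
"B" 5 200 1 0
end

// loop over every variable ending in _case and pair it with its _risk partner
foreach v of varlist *_case {
    // strip the "_case" suffix to recover the prefix, e.g. Disease1
    local stub = substr("`v'", 1, strlen("`v'") - strlen("_case"))
    generate rate_`stub' = `v' / `stub'_risk
}

list
```

The loop picks up any number of diseases, so nothing has to be edited when variables are added.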

                  However, you could also consider whether your data is in the correct format for your analyses. Most Stata analyses would require long formats. You could get there by doing:
                  Code:
                  reshape long Disease, i(Country) j(disease) string
                  ren Disease value
                  gen var = substr(disease, strpos(disease, "_") + 1, .)
                  replace disease = substr(disease,1,strpos(disease, "_")-1)
                  destring disease, replace
                  reshape wide value, i(Country disease) j(var) string
                  ren valuecase case
                  ren valuerisk risk
                  gen rate_disease = case/risk
                  Edit no 2: You will also need to think about which values to use when dividing by zero, which happens a lot in your data example
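
On the division by zero: Stata returns a missing value (.) when the denominator is zero, so no error is raised, but those rows are silently excluded from most later analyses. One way to keep track of them is to add a flag. This is a sketch on hypothetical long-format data, with variable names matching the reshaped layout above.

```stata
// hypothetical long-format data; the values are made up for illustration
clear
input str10 Country disease case risk
"A" 1 2 100
"A" 2 0 0
"B" 1 5 200
"B" 2 1 0
end

// division by zero yields missing (.) rather than an error
generate rate_disease = case / risk

// flag the rows where the population at risk is zero
generate byte zero_risk = (risk == 0)

list, sepby(Country)
```

Whether those rows should stay missing or be recoded is a substantive decision that depends on the analysis.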
                  Last edited by Jorrit Gosens; 02 Jul 2018, 03:31.
