Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate a new ROW (NOT COLUMN) with consecutive numbers or add a consecutive number to each element in an existing row

    Hello,
    I want to create a new ROW (not column) in my dataset which contains numbers from 1,2,....to my last variable or add a consecutive number to each Observation in one row (e.g. row 1 for first column: add 1, row 1 for second column: add 2.....) . However, I can't figure out how to do it. I experimented a lot with "forvalues" but can't figure out how to do it.
    I would be very thankful for help.

  • #2
    If somebody else understands what you want, I hope he or she will respond to you directly. But I can't visualize at all what you want. I suggest you post a short example of the data that you have, work out by hand how the result you want would look for that data, and then post the way the data looks after your addition to it. It is often easier to illustrate what is wanted than to explain it in words.

    When you do post example data, please be sure to use the -dataex- command to do so. You can install -dataex- in your Stata by running -ssc install datatex-. Read the help file (-help dataex-) for the simple instructions on how to use it. That way whatever you post can be quickly and easily imported to Stata with a simple copy/paste operation, enabling whoever helps you to test out their code before responding. And the Stata data set that they use will be a completely faithful replica of the Stata data you showed.

    Comment


    • #3
      Sorry for the confusion. As I don't have much time at the moment, I just quickly try to make clear what I am looking for without using datatex:
      If the first row in my dataset is e.g. 14 18 133
      then after the Transition I want it to be 141 182 1333

      Comment


      • #4
        So for data like your example, this works:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(vble_a vble_b vble_c vble_d vble_e vble_f vble_g vble_h vble_i vble_j vble_k vble_l vble_m vble_n vble_o vble_p vble_q vble_r vble_s vble_t)
         92  92  92  97 -86 101  96  79 -85  81  93 -84 105 -89  79  82 109 102  76 -86
        -89  93 -88 -90  91 102 -90  81  97  82  99  83 105  91  84 -90  74  94  79 -87
         91  92  74 -89  83 -90 106  98  91 -89 -90 -88 101  75 -86 102  98  79 -88  93
         92 -88  91 -85  93 -85 101 106  93  99  98 -85  78  92  83 -90 100 -88  97 100
        -86  97 -88 101 -89  95 114  95  82  93 -87  94  99 -89  92  79  82  77  92  82
        end
        
        //    CREATE A LIST OF ALL THE VARIABLES IN THE DATA SET
        list, noobs clean
        ds
        local all_vars `r(varlist)'
        
        local i = 1
        foreach v of varlist `all_vars' {
            local shift = floor(log10(`i')) + 1
            replace `v' = sign(`v')*((10^`shift')*abs(`v') + `i') in 1
            local ++i
        }
        
        list, noobs clean
        Notes:

        1. The data set I created here deliberately contains more than 9 variables, so you can see that it appropriately appends two digit numbers after variable 9. In fact, this will work even if you have thousands of variables; the correct number of digits will be appended to the end of the variable.

        2. This only works, and only makes sense to me, on the assumption that , as in this data set, all of the variables are numeric, and that the values in the first observation ("row") are integers.

        3. The code handles negative numbers correctly.

        That done, I have to say that I have been unable to imagine any circumstance in which this would be a useful thing to do. I'm really curious what this is all about, why you want to do this.

        Comment


        • #5
          Thanks a lot for your effort.
          Unfortunately, my variables are not numeric. My variable of interest is the ISIN from Datastream for German equities, so it begins with DE followed by 10 numbers, e.g. "DE0001262186".
          At the Moment I want to reshape the Datastream data (which is in wide Format, so that each ISIN occupies a new column) to fit better to the rest of my data which is in Long Format (date as the first column and then ISIN as the second column).
          In the datastream dataset my headers at the Moment for the columns are var1,var2.... and the first row of my dataset contains the ISIN numbers, whereas the following rows contain the stock data for each date. So my idea was to rename the variables from var1 to the respective ISIN. However, some ISINs occur 2 or more times in the data, so that the variables would not be unique anymore and Stata cannot rename them. Hence, my idea was to just add a consecutive number to the end of each ISIN, so that I can rename my variables to a unique ISIN in order to be able to reshape...
          Does that make sense?

          Comment


          • #6
            I want to reshape the Datastream data (which is in wide Format[...])
            It seems - reshape long - could fulfill your needs.
            Best regards,

            Marcos

            Comment


            • #7
              Providing an example (via -dataex- or just a copy / paste) would save more time (for you and those trying to help) than writing out your description of your data.

              Supposing you have data that look like:

              Code:
              clear
              set obs 3
              forval n = 1/7 {
                  tempvar a
                  g double `a' = int(1000 + runiform()*10000000)
                  format `a' %09.0f
                  tostring `a', replace u
                  **DE in row 1
                  g   var`n' = `"DE"'+ `a' in 1
                  **Date in row 2
                  replace var`n' = string(`=int(20900+runiform()*10)') in 2
                  **STock info in row 3
                  replace var`n' = string(`=runiform()*100') in 3
                  }
               
               **create issue where some isin's repeat:
                replace var1 = var2 in 1
              var1 var2 var3 var4 var5 var6 var7
              DE004088538 DE004088538 DE009789969 DE007795552 DE009664954 DE001120130 DE003404782
              20901 20904 20905 20905 20908 20901 20900
              93.6816 41.83935 91.27204 16.21693 91.33746 1.328232 74.69058
              with the DE var in the first row, date var in the second row and stock info in the third row, we can add the extra digit you are interested in (I prefixed it with an underscore for easy examination). This assumes they are sequential as you describe, if not your'll either have to sort the columns or modify the -forvalues- loop to seek beyond just `j'-1 (could iterate it to `j'-2...n):

              Code:
                **add extra digit
                  cap drop _*  
                  desc, sh  
                  di `"`r(k)' cols"'  
                 **you could also just specify the vars to loop over in order , or use a varlist defined as  var* ::
                forval j = 1/`r(k)' {
                  cap replace var`j' = var`j'+`"_`j'"' in 1 if var`j'[1]==var`=`j'-1'[1]
                  }
                  l var*, noobs clean
              var1 var2 var3 var4 var5 var6 var7
              DE004088538 DE004088538_2 DE009789969 DE007795552 DE009664954 DE001120130 DE003404782
              20901 20904 20905 20905 20908 20901 20900
              93.6816 41.83935 91.27204 16.21693 91.33746 1.328232 74.69058
              **now for your reshape to long format I'm using -sxpose- from SSC to transpose the string data:

              Code:
                  
              *xpose to reshape the data
              **could also use reshape long.
              ssc install sxpose, replace
              sxpose, clear destring
              format _var2 %td  
              l * , noobs clean
              _var1 _var2 _var3
              DE004088538 23mar2017 93.6816
              DE004088538_2 26mar2017 41.83935
              DE009789969 27mar2017 91.27204
              DE007795552 27mar2017 16.21693
              DE009664954 30mar2017 91.33746
              DE001120130 23mar2017 1.328232
              DE003404782 22mar2017 74.69058





              Last edited by eric_a_booth; 17 Jan 2017, 04:51.
              Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

              Comment


              • #8
                Thank you so much, you helped me a lot!

                Comment


                • #9
                  Eric's post did what you wanted, and even used some ancient code of this, so good; but I'll add that holding metadata within the dataset is profoundly nonStataish.

                  Comment

                  Working...
                  X