Generate a new ROW (NOT COLUMN) with consecutive numbers or add a consecutive number to each element in an existing row

Lara Schmidt

Join Date: Jan 2017

Posts: 7
#1

Generate a new ROW (NOT COLUMN) with consecutive numbers or add a consecutive number to each element in an existing row

16 Jan 2017, 14:50

Hello,
I want to create a new ROW (not column) in my dataset which contains numbers from 1,2,....to my last variable or add a consecutive number to each Observation in one row (e.g. row 1 for first column: add 1, row 1 for second column: add 2.....) . However, I can't figure out how to do it. I experimented a lot with "forvalues" but can't figure out how to do it.
I would be very thankful for help.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

16 Jan 2017, 15:18

If somebody else understands what you want, I hope he or she will respond to you directly. But I can't visualize at all what you want. I suggest you post a short example of the data that you have, work out by hand how the result you want would look for that data, and then post the way the data looks after your addition to it. It is often easier to illustrate what is wanted than to explain it in words.

When you do post example data, please be sure to use the -dataex- command to do so. You can install -dataex- in your Stata by running -ssc install datatex-. Read the help file (-help dataex-) for the simple instructions on how to use it. That way whatever you post can be quickly and easily imported to Stata with a simple copy/paste operation, enabling whoever helps you to test out their code before responding. And the Stata data set that they use will be a completely faithful replica of the Stata data you showed.
Comment
Lara Schmidt

Join Date: Jan 2017

Posts: 7
#3

16 Jan 2017, 15:37

Sorry for the confusion. As I don't have much time at the moment, I just quickly try to make clear what I am looking for without using datatex:
If the first row in my dataset is e.g. 14 18 133
then after the Transition I want it to be 141 182 1333
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

16 Jan 2017, 17:29

So for data like your example, this works:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(vble_a vble_b vble_c vble_d vble_e vble_f vble_g vble_h vble_i vble_j vble_k vble_l vble_m vble_n vble_o vble_p vble_q vble_r vble_s vble_t)
 92  92  92  97 -86 101  96  79 -85  81  93 -84 105 -89  79  82 109 102  76 -86
-89  93 -88 -90  91 102 -90  81  97  82  99  83 105  91  84 -90  74  94  79 -87
 91  92  74 -89  83 -90 106  98  91 -89 -90 -88 101  75 -86 102  98  79 -88  93
 92 -88  91 -85  93 -85 101 106  93  99  98 -85  78  92  83 -90 100 -88  97 100
-86  97 -88 101 -89  95 114  95  82  93 -87  94  99 -89  92  79  82  77  92  82
end

//    CREATE A LIST OF ALL THE VARIABLES IN THE DATA SET
list, noobs clean
ds
local all_vars `r(varlist)'

local i = 1
foreach v of varlist `all_vars' {
    local shift = floor(log10(`i')) + 1
    replace `v' = sign(`v')*((10^`shift')*abs(`v') + `i') in 1
    local ++i
}

list, noobs clean

Notes:

1. The data set I created here deliberately contains more than 9 variables, so you can see that it appropriately appends two digit numbers after variable 9. In fact, this will work even if you have thousands of variables; the correct number of digits will be appended to the end of the variable.

2. This only works, and only makes sense to me, on the assumption that , as in this data set, all of the variables are numeric, and that the values in the first observation ("row") are integers.

3. The code handles negative numbers correctly.

That done, I have to say that I have been unable to imagine any circumstance in which this would be a useful thing to do. I'm really curious what this is all about, why you want to do this.

Comment

Lara Schmidt

Join Date: Jan 2017

Posts: 7
#5

17 Jan 2017, 03:30

Thanks a lot for your effort.
Unfortunately, my variables are not numeric. My variable of interest is the ISIN from Datastream for German equities, so it begins with DE followed by 10 numbers, e.g. "DE0001262186".
At the Moment I want to reshape the Datastream data (which is in wide Format, so that each ISIN occupies a new column) to fit better to the rest of my data which is in Long Format (date as the first column and then ISIN as the second column).
In the datastream dataset my headers at the Moment for the columns are var1,var2.... and the first row of my dataset contains the ISIN numbers, whereas the following rows contain the stock data for each date. So my idea was to rename the variables from var1 to the respective ISIN. However, some ISINs occur 2 or more times in the data, so that the variables would not be unique anymore and Stata cannot rename them. Hence, my idea was to just add a consecutive number to the end of each ISIN, so that I can rename my variables to a unique ISIN in order to be able to reshape...
Does that make sense?
Comment
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

17 Jan 2017, 04:47

I want to reshape the Datastream data (which is in wide Format[...])

It seems - reshape long - could fulfill your needs.

Best regards,

Marcos
Comment

eric_a_booth

Join Date: Apr 2014
Posts: 292

17 Jan 2017, 04:48

Providing an example (via -dataex- or just a copy / paste) would save more time (for you and those trying to help) than writing out your description of your data.

Supposing you have data that look like:

Code:

clear
set obs 3
forval n = 1/7 {
    tempvar a
    g double `a' = int(1000 + runiform()*10000000)
    format `a' %09.0f
    tostring `a', replace u
    **DE in row 1
    g   var`n' = `"DE"'+ `a' in 1
    **Date in row 2
    replace var`n' = string(`=int(20900+runiform()*10)') in 2
    **STock info in row 3
    replace var`n' = string(`=runiform()*100') in 3
    }
 
 **create issue where some isin's repeat:
  replace var1 = var2 in 1

var1	var2	var3	var4	var5	var6	var7
DE004088538	DE004088538	DE009789969	DE007795552	DE009664954	DE001120130	DE003404782
20901	20904	20905	20905	20908	20901	20900
93.6816	41.83935	91.27204	16.21693	91.33746	1.328232	74.69058

with the DE var in the first row, date var in the second row and stock info in the third row, we can add the extra digit you are interested in (I prefixed it with an underscore for easy examination). This assumes they are sequential as you describe, if not your'll either have to sort the columns or modify the -forvalues- loop to seek beyond just `j'-1 (could iterate it to `j'-2...n):

Code:

  **add extra digit
    cap drop _*  
    desc, sh  
    di `"`r(k)' cols"'  
   **you could also just specify the vars to loop over in order , or use a varlist defined as  var* ::
  forval j = 1/`r(k)' {
    cap replace var`j' = var`j'+`"_`j'"' in 1 if var`j'[1]==var`=`j'-1'[1]
    }
    l var*, noobs clean

var1	var2	var3	var4	var5	var6	var7
DE004088538	DE004088538_2	DE009789969	DE007795552	DE009664954	DE001120130	DE003404782
20901	20904	20905	20905	20908	20901	20900
93.6816	41.83935	91.27204	16.21693	91.33746	1.328232	74.69058

**now for your reshape to long format I'm using -sxpose- from SSC to transpose the string data:

Code:

    
*xpose to reshape the data
**could also use reshape long.
ssc install sxpose, replace
sxpose, clear destring
format _var2 %td  
l * , noobs clean

_var1	_var2	_var3
DE004088538	23mar2017	93.6816
DE004088538_2	26mar2017	41.83935
DE009789969	27mar2017	91.27204
DE007795552	27mar2017	16.21693
DE009664954	30mar2017	91.33746
DE001120130	23mar2017	1.328232
DE003404782	22mar2017	74.69058

Last edited by eric_a_booth; 17 Jan 2017, 04:51.

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

Comment

Lara Schmidt

Join Date: Jan 2017

Posts: 7
#8

17 Jan 2017, 10:19

Thank you so much, you helped me a lot!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#9

17 Jan 2017, 10:52

Eric's post did what you wanted, and even used some ancient code of this, so good; but I'll add that holding metadata within the dataset is profoundly nonStataish.
Comment

Announcement