Hi,
I am new to Stata, and would appreciate some guidance on the following task:
I have a dataset of 5,000 series with 2 variables (var1 and var2). There are 4,000 observations in both var1 and var2. Each row for var1 and var2 are unique pairs.
I need to look for the string position of both var1 and var2 against each series. Then, I need to calculate the difference in the character position between var1 and var2 in each series.
Below is the code I am using to get to the sample data output as shown in the below table:
Code:
As you can see, I am using the foreach function to look for var1 and var2 against each of the series, but the problem with the foreach function is that it is only looping over unique values for var1 and var2 (e.g. var1 "211000" appears twice, however only one column "var1211000" is generated).
This prevents me from running the rest of the commands to calculate the difference in the character position between var1 and var2 in each of the series.
Question:
How do I update the foreach function to loop over all var1 and var2 observations (including those unique and repeated observations)?
If foreach is not the correct command to be used here, what is the correct command to use?
I was also thinking to concatenate var1 and var2, so that each observation is unique. Then, use the strpos function to look for the character position but I can only figure out how to look for the position for the whole string in each series (e.g. the 1st observation in var1_var2 "325620326190"). I do not know how to look for the position of the 1st 6 characters of var1_var2 (e.g. "325620326190" ), and then the position of the last 6 characters (e.g. "325620326190" ). Please advise.
These questions are really challenging for me, given that I am quite new to Stata, so it would be much appreciated if someone can correct my code.
Thanks a lot.
I am new to Stata, and would appreciate some guidance on the following task:
I have a dataset of 5,000 series with 2 variables (var1 and var2). There are 4,000 observations in both var1 and var2. Each row for var1 and var2 are unique pairs.
I need to look for the string position of both var1 and var2 against each series. Then, I need to calculate the difference in the character position between var1 and var2 in each series.
Below is the code I am using to get to the sample data output as shown in the below table:
Code:
levelsof var1, local(var1list) foreach i of local var1list{ generate var1`i'=0 replace var1`i'= strpos(series,"`i'") } levelsof var2, local(var2list) foreach i of local var2list{ generate var2`i'=0 replace var2`i'= strpos(series, "`i'") } order series var1 var2 var1325620 var1221100 var14A0000 var1327992 var1518200 var1211000 var1211000 var1327992 var2326190 var252A000 var2519130 var2326210 var2541512 var2333130 var2F01000 var2314900 local list1 "a1 a2 a3 a4 a5 a6 a7 a8" local list2 "b1 b2 b3 b4 b5 b6 b7 b8" local n : word count `list1' forvalues i = 1/`n' { local var1list : word `i' of `list1' local var2list : word `i' of `list2' generate diff`i' = `var1list' - `var2list' if `var1list' >0 & `var2list' >0 } |
As you can see, I am using the foreach function to look for var1 and var2 against each of the series, but the problem with the foreach function is that it is only looping over unique values for var1 and var2 (e.g. var1 "211000" appears twice, however only one column "var1211000" is generated).
This prevents me from running the rest of the commands to calculate the difference in the character position between var1 and var2 in each of the series.
Question:
How do I update the foreach function to loop over all var1 and var2 observations (including those unique and repeated observations)?
If foreach is not the correct command to be used here, what is the correct command to use?
I was also thinking to concatenate var1 and var2, so that each observation is unique. Then, use the strpos function to look for the character position but I can only figure out how to look for the position for the whole string in each series (e.g. the 1st observation in var1_var2 "325620326190"). I do not know how to look for the position of the 1st 6 characters of var1_var2 (e.g. "325620326190" ), and then the position of the last 6 characters (e.g. "325620326190" ). Please advise.
These questions are really challenging for me, given that I am quite new to Stata, so it would be much appreciated if someone can correct my code.
Thanks a lot.
series | var1 | var2 | var1_var2 | var1211000 | var1221100 | var1325620 | var1327992 | var14A0000 | var1518200 | var2314900 | var2326190 | var2326210 | var2333130 | var2519130 | var252A000 | var2541512 | var2F01000 |
322210322120322299322110 | 325620 | 326190 | 325620326190 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
322210322120113000115000325310327992 | 221100 | 52A000 | 22110052A000 | 0 | 0 | 0 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
322210322130113000115000325310327992 | 4A0000 | 519130 | 4A0000519130 | 0 | 0 | 0 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
322210322120325180334513 | 327992 | 326210 | 327992326210 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3222103221201130001150003253102123A0 | 518200 | 541512 | 518200541512 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3222103221301130001150003253102123A0 | 211000 | 333130 | 211000333130 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
F010004A0000531ORE22110021100033120033111033141921 2230 | 211000 | F01000 | 211000F01000 | 25 | 19 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
F010004A0000531ORE23030133341533142033141133141921 2230 | 327992 | 314900 | 327992314900 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
F010004A0000531ORE54130033592033142033141133141921 2230 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F010004A0000531ORE22110033361133211A33149033141921 2230 | 0 | 19 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F010004A0000531ORE22110021100033120033111048200033 1510331419212230 | 25 | 19 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100022110021100033120033111048200033151033141921 2230 | 13 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100022120021100033120033111048200033151033141921 2230 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100048100032411021100033120033111048200033151033 1419212230 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100048400032411021100033120033111048200033151033 1419212230 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100032411021100033120033111048200033151033141921 2230 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611132621033120033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611232621033120033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 1 | |||
F0100048400032621033120033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611233260033120033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033790033260033120033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100031161A1121A033260033120033111048200033151033 1419212230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F010004A0000531ORE23030133232033111048200033151033 1419212230 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F01000S0020323030133232033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611133639033637033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611233639033637033111048200033151033141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611133621133612033361833632033441A33141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611233621133612033361833632033441A33141921 2230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611132621031490031330031320031310011190011 5000325310327992 | 0 | 0 | 0 | 61 | 0 | 0 | 19 | 0 | 13 | 0 | 0 | 0 | 0 | 1 | |||
F0100033611232621031490031330031320031310011190011 5000325310327992 | 0 | 0 | 0 | 61 | 0 | 0 | 19 | 0 | 13 | 0 | 0 | 0 | 0 | 1 | |||
F0100048400032621031490031330031320031310011190011 5000325310327992 | 0 | 0 | 0 | 61 | 0 | 0 | 19 | 0 | 13 | 0 | 0 | 0 | 0 | 1 | |||
F0100032621031490031330031320031310011190011500032 5310327992 | 0 | 0 | 0 | 55 | 0 | 0 | 13 | 0 | 7 | 0 | 0 | 0 | 0 | 1 | |||
F0100042000031490031330031320031310011190011500032 5310327992 | 0 | 0 | 0 | 55 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100045200031490031330031320031310011190011500032 5310327992 | 0 | 0 | 0 | 55 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100031490031330031320031310011190011500032531032 7992 | 0 | 0 | 0 | 49 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |||
F0100045200031500031330031320031310011190011500032 5310327992 | 0 | 0 | 0 | 55 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Comment