  • Split string variables into different parts

    Hi,

    I am using Stata 16 and need some help splitting a string variable into parts and converting the resulting variables to numeric with destring. Here is an example of my dataset:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double layer_experiment_id str10 winning_direction float significance str48 confidence_interval double visitors_remaining
    7515963047 "increasing" .8525631 "[-0.011315283434926104,0.15331184328920291]"  110824
    7515963047 "increasing" .8347926 "[-0.013121378545019852,0.15630020535121444]"  121286
    7515963047 "increasing"  .817702 "[-0.014682448394555736,0.1571440523116106]"   129244
    7515963047 "increasing" .8186167 "[-0.014511827458561333,0.15741806315691703]"  128500
    7515963047 "increasing" .8239062 "[-0.014312635075250463,0.15741806315691703]"  125149
    7515963047 "increasing" .8313227 "[-0.013107209369136216,0.15742087721795878]"  121576
    7515963047 "increasing" .8364435 "[-0.012939726897998272,0.15742576069322942]"  120746
    7515963047 "increasing" .8494313 "[-0.011593382156786014,0.15742576069322942]"  113275
    7515963047 "increasing" .8379146 "[-0.012737186976698942,0.15742868899953863]"  119992
    7515963047 "increasing" .8482403 "[-0.008822315493478769,0.15742610435278426]"   95023
    7515963047 "increasing" .8453595 "[-0.009292098384872965,0.1574264480527468]"    98514
    7515963047 "increasing" .8363242 "[-0.009840139736476308,0.15742041939264778]"  102542
    7515963047 "increasing"    .8434 "[-0.009688547066134742,0.15411267947149532]"  101903
    7515963047 "increasing" .8398001 "[-0.012270255292740909,0.15323747342918248]"  117306
    7515963047 "increasing" .8422456 "[-0.012662633649594038,0.15678915440594493]"  117690
    7515963047 "increasing" .8371795 "[-0.013143825028654896,0.15741806315691703]"  120370
    7515963047 "increasing" .8347926 "[-0.012059379251766839,0.15741806315691703]"  114374
    7515963047 "increasing" .8219455 "[-0.010461243576041696,0.15742610435278426]"  104807
    7515963047 "increasing"   .99996 "[-0.13061092988680228,-0.053278674126343614]"      0
    7515963047 "increasing" .9999338 "[-0.13014782388304397,-0.05140273482650343]"       0
    7515963047 "increasing"  .999939 "[-0.13060880082251963,-0.051601518820200615]"      0
    7515963047 "increasing" .9999416 "[-0.13060880082251963,-0.05313239901277002]"       0
    7515963047 "increasing"   .99994 "[-0.13060880082251963,-0.053294837520368885]"      0
    7515963047 "increasing" .9999448 "[-0.13060880082251963,-0.05372885792210556]"       0
    7515963047 "increasing"  .999963 "[-0.13060880082251963,-0.05580802648197085]"       0
    7515963047 "increasing" .9999672 "[-0.13059864291782214,-0.05639872813792632]"       0
    7515963047 "increasing" .9999714 "[-0.13006975557508363,-0.057099399907118625]"      0
    7515963047 "increasing" .9999738 "[-0.13060880082251963,-0.057457865419031626]"      0
    7515963047 "increasing" .9999857 "[-0.13060880082251963,-0.060040134081082275]"      0
    7515963047 "increasing" .9999882 "[-0.13060880082251963,-0.06085642161526732]"       0
    7515963047 "increasing" .9999897 "[-0.13060880082251963,-0.06114470958389204]"       0
    7515963047 "increasing" .9999931 "[-0.13060880082251963,-0.0629306048966536]"        0
    7515963047 "increasing" .9999917 "[-0.13060880082251963,-0.06195356796856659]"       0
    7515963047 "increasing" .9999908 "[-0.13060880082251963,-0.061459980840425574]"      0
    7515963047 "increasing" .9999902 "[-0.13060880082251963,-0.06107432056494101]"       0
    7515963047 "increasing" .9999885 "[-0.13060880082251963,-0.06018022961705374]"       0
    7515963047 "increasing" .8525631 "[-0.02409417811530537,0.22052092285700242]"   152859
    7515963047 "increasing" .8347926 "[-0.027676702592605462,0.22052092285700242]"  166420
    7515963047 "increasing"  .817702 "[-0.025528966825073995,0.22052092285700242]"  172584
    7515963047 "increasing" .8186167 "[-0.02642408129880222,0.22052092285700242]"   176460
    7515963047 "increasing" .8239062 "[-0.02959104250314787,0.22052092285700242]"   173380
    7515963047 "increasing" .8313227 "[-0.029409129357849423,0.2205148766528585]"   168780
    7515963047 "increasing" .8364435 "[-0.025213250062753073,0.2205148766528585]"   154057
    7515963047 "increasing" .8494313 "[-0.02543933298287332,0.2205148766528585]"    155514
    7515963047 "increasing" .8379146 "[-0.026737729971561958,0.2205148766528585]"   160583
    7515963047 "increasing" .8482403 "[-0.02559119588624667,0.2205148766528585]"    156484
    7515963047 "increasing" .8453595 "[-0.026195430249354473,0.2205148766528585]"   158768
    7515963047 "increasing" .8363242 "[-0.025276336838065572,0.2205148766528585]"   165409
    7515963047 "increasing"    .8434 "[-0.02618777206100939,0.2205148766528585]"    160273
    7515963047 "increasing" .8398001 "[-0.02685770392630768,0.2205148766528585]"    162943
    7515963047 "increasing" .8422456 "[-0.02409417811530537,0.2205148766528585]"    152889
    7515963047 "increasing" .8371795 "[-0.026479178279915583,0.2205148766528585]"   162113
    7515963047 "increasing" .8347926 "[-0.027676702592605462,0.2205148766528585]"   166463
    7515963047 "increasing" .8219455 "[-0.030056285677219463,0.2205148766528585]"   174589
    end
    The variable "confidence_interval" is the string variable I want to split. I intend to generate two new variables, "lower_bound" and "upper_bound", from the split. I am not sure what code I should use to split the string variable.

    Any help in this area would be appreciated. Thanks!


  • #2
    Check out -[D] split -- Split string variables into parts-.


    • #3
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str48 confidence_interval
      "[-0.011315283434926104,0.15331184328920291]"
      "[-0.013121378545019852,0.15630020535121444]"
      "[-0.014682448394555736,0.1571440523116106]" 
      "[-0.014511827458561333,0.15741806315691703]"
      "[-0.014312635075250463,0.15741806315691703]"
      end
      
      . split confidence_interval, parse(,) gen(b)
      variables created as string: 
      b1  b2
      
      . destring b?, replace ignore("[]")
      b1: character [ removed; replaced as double
      b2: character ] removed; replaced as double
      
      . rename (b?) (lower_bound upper_bound)
      
      . format *bound %20.18f
      
      . l *bound
      
           +----------------------------------------------+
           |           lower_bound            upper_bound |
           |----------------------------------------------|
        1. | -0.011315283434926104   0.153311843289202915 |
        2. | -0.013121378545019852   0.156300205351214444 |
        3. | -0.014682448394555736   0.157144052311610605 |
        4. | -0.014511827458561333   0.157418063156917032 |
        5. | -0.014312635075250463   0.157418063156917032 |
           |----------------------------------------------|
      Last edited by Nick Cox; 23 Mar 2021, 04:16.
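
The two steps in #3 can probably be collapsed into one, because -split- has its own destring and ignore() options. What follows is a minimal sketch rather than a tested solution for the full dataset. It assumes the variable name confidence_interval and the stub b used in #3, and it assumes that every value has the form "[lower,upper]" as in the data example, so that b1 is the lower bound and b2 the upper bound.

```stata
* Sketch only: split on the comma, strip the square brackets,
* and convert the pieces to numeric in a single call
split confidence_interval, parse(,) generate(b) destring ignore("[]")

* b1 holds the text before the comma, b2 the text after it
rename (b1 b2) (lower_bound upper_bound)

* Display format only; the stored values are not changed
format lower_bound upper_bound %20.18f
list lower_bound upper_bound
```

The new variables are stored as doubles, which carry roughly 16 significant digits. Digits that %20.18f displays beyond that, such as the final 5 in 0.153311843289202915 in the listing in #3, are not part of the original string.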


      • #4
        I posted this twice. Sorry about that.
