  • removing text from string variable with date format

    Hello, I have a string variable with values like this: 13-03-2014(Late visited). How can I remove the non-date text from this variable and then convert it to a date variable? Thanks in advance.

  • #2
    I don't think anybody can help you with this; too much is left to the imagination. What "kind of those texts" do you want to remove from the variable? And how is the variable actually represented in Stata? Never mind what it looks like to your eyes. That's irrelevant--the actual Stata internal data is what matters.

    To get useful advice, please post back with example data using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
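
    As a minimal sketch of that workflow (the variable name var1 and the range 1/20 are placeholders, not taken from your data), the commands might look like this:

    Code:
    * only needed if -dataex- is not already part of your installation
    ssc install dataex
    * show the first 20 observations of the (hypothetical) variable var1
    dataex var1 in 1/20
    Then copy the output that -dataex- displays in the Results window into your post.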

    Also explain exactly what you want to do with the data--showing what you want the results to look like may be the most effective way.

    • #3
      Assuming the date part of the string is in a consistent format across observations (dd-mm-yyyy), this should do the trick:

      Code:
      gen date = date(ustrregexs(0),"DMY") if ustrregexm(var1,"\d{2}-\d{2}-\d{4}")
      Otherwise, please post back following the advice in #2.
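
      For reference, here is a self-contained sketch of the same idea on toy data. The variable name var1 and both example values are assumptions for illustration only; -format- is added so that the result displays as a date rather than as a number.

      Code:
      clear
      input str30 var1
      "13-03-2014(Late visited)"
      "no date here"
      end
      * ustrregexm() flags strings containing dd-mm-yyyy; ustrregexs(0) returns the matched text
      gen date = date(ustrregexs(0), "DMY") if ustrregexm(var1, "\d{2}-\d{2}-\d{4}")
      format %td date
      * observations with no match are left missing
      list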

      • #4
        Let me make a simplifying assumption here. If the date always appears at the beginning of the text, and always has the day followed by the month followed by the year, with optional text afterwards, then the following example demonstrates a useful technique.
        Code:
        . generate date = daily(datestr,"DMY#")
        
        . format %td date
        
        . list, noobs
        
          +--------------------------------------+
          |                  datestr        date |
          |--------------------------------------|
          | 13-03-2014(Late visited)   13mar2014 |
          +--------------------------------------+
        But to be honest, this string looks like it was taken from a free-form data entry field, where the date and comments were entered in whatever way the person collecting the data chose to enter the information. That is why it is so important that you show real data examples that illustrate the variability in your data.
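
        A quick way to see that variability yourself, sketched with the variable names datestr and date from the example above, is to look at the strings that failed to convert:

        Code:
        * nonempty strings that daily() could not turn into a date
        count if missing(date) & !missing(datestr)
        list datestr if missing(date) & !missing(datestr)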

        • #5
          Perhaps this example will start you in a useful direction.
          Code:
          capture program drop myreg
          program define myreg
              local name `1'
              macro shift
              regress `*'
              estimates store `name'
          end
          
          sysuse auto, clear
          myreg m1 price weight
          myreg m2 price weight length
          estimates dir
          Code:
          . myreg m1 price weight
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(1, 72)        =     29.42
                 Model |   184233937         1   184233937   Prob > F        =    0.0000
              Residual |   450831459        72  6261548.04   R-squared       =    0.2901
          -------------+----------------------------------   Adj R-squared   =    0.2802
                 Total |   635065396        73  8699525.97   Root MSE        =    2502.3
          
          ------------------------------------------------------------------------------
                 price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                weight |   2.044063   .3768341     5.42   0.000     1.292857    2.795268
                 _cons |  -6.707353    1174.43    -0.01   0.995     -2347.89    2334.475
          ------------------------------------------------------------------------------
          
          . myreg m2 price weight length
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(2, 71)        =     18.91
                 Model |   220725280         2   110362640   Prob > F        =    0.0000
              Residual |   414340116        71  5835776.28   R-squared       =    0.3476
          -------------+----------------------------------   Adj R-squared   =    0.3292
                 Total |   635065396        73  8699525.97   Root MSE        =    2415.7
          
          ------------------------------------------------------------------------------
                 price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                weight |   4.699065   1.122339     4.19   0.000     2.461184    6.936946
                length |  -97.96031    39.1746    -2.50   0.015    -176.0722   -19.84838
                 _cons |   10386.54   4308.159     2.41   0.019     1796.316    18976.76
          ------------------------------------------------------------------------------
          
          . estimates dir
          
          ----------------------------------------------------------------
                       |           Dependent  Number of        
                  Name | Command    variable     param.  Title 
          -------------+--------------------------------------------------
                    m1 | regress       price          2  Linear regression
                    m2 | regress       price          3  Linear regression
          ----------------------------------------------------------------
          The Stata Programming Reference Manual PDF included in your Stata installation and accessible through Stata's Help menu is your guide to learning about writing programs in Stata.

          • #6
            @Clyde Schechter Thank you so much for your advice on how to post a problem in the forum so that I can get the help I need with my data. Thanks also to Ali Atia and William Lisowski for your help; the code works nicely. Thanks, all of you!

            Best regards,
            Sophon


            • #7
              Post #5 above belonged in a different topic; I do not know how it came to be posted here. My apologies.

              • #8
                Dear Statalisters,
                My apologies. I have a query similar to post #1, about removing dates from a string variable. Below is an example. I want to remove "15 januari 2016" and "2014-07-03" from these strings. Any help is highly appreciated.
                Thanks

                input str109 doser
                "One tablet 3 times daily. Next dose 15 januari 2016."
                "0.5 -1 tablet when needed for pain. Max 2 tab per day, till 2014-07-03"
                end
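
                One possible starting point, offered as a sketch only: it assumes the dates appear either as yyyy-mm-dd or as a one- or two-digit day, a month name written in unaccented letters, and a four-digit year, which may not hold across the full data. The variable name cleaned is made up for illustration.

                Code:
                clear
                input str109 doser
                "One tablet 3 times daily. Next dose 15 januari 2016."
                "0.5 -1 tablet when needed for pain. Max 2 tab per day, till 2014-07-03"
                end
                * remove dates written as yyyy-mm-dd
                gen cleaned = ustrregexra(doser, "\d{4}-\d{2}-\d{2}", "")
                * remove dates written as day, month name, year (assumed pattern)
                replace cleaned = ustrregexra(cleaned, "\d{1,2} [A-Za-z]+ \d{4}", "")
                * trim leading and trailing blanks left behind
                replace cleaned = strtrim(cleaned)
                list cleaned
                Surrounding words such as "Next dose" and "till" are left in place, so further cleaning may be needed depending on what the result should look like.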
