  • regexm not recognizing characters copy-pasted from browse window

    Hi Statalist,

    I have had an admittedly minor but frustrating issue happen in several different contexts, and I am beginning to think I am missing something obvious.

    First, I import the data:

    Code:
    import delimited using "$root/EDULIT_DS_18092019122536422.csv", clear encoding("UTF-8")
    The dataset looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str109 indicator str52 country int year float value
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Afghanistan" 2013         .
    "Government expenditure on education, US$ (millions)"                                                           "Afghanistan" 2013  710.2747
    "Government expenditure on primary education, US$ (millions)"                                                   "Afghanistan" 2013  426.2217
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Afghanistan" 2013   116.178
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Afghanistan" 2013  66.92618
    "Government expenditure on secondary education, US$ (millions)"                                                 "Afghanistan" 2013 183.10417
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Afghanistan" 2013         .
    "Government expenditure on secondary and post-secondary non-tertiary vocational education only, US$ (millions)" "Afghanistan" 2013  21.40801
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Afghanistan" 2013  17.44303
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Afghanistan" 2013  83.50579
    "Government expenditure on education, US$ (millions)"                                                           "Afghanistan" 2014   756.962
    "Government expenditure on secondary education, US$ (millions)"                                                 "Afghanistan" 2014 192.20827
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Afghanistan" 2014         .
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Afghanistan" 2014  93.94871
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Afghanistan" 2014   70.2538
    "Government expenditure on secondary and post-secondary non-tertiary vocational education only, US$ (millions)" "Afghanistan" 2014  27.67649
    "Government expenditure on primary education, US$ (millions)"                                                   "Afghanistan" 2014  447.4138
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Afghanistan" 2014  23.39132
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Afghanistan" 2014 121.95446
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Afghanistan" 2015  19.40793
    "Government expenditure on primary education, US$ (millions)"                                                   "Afghanistan" 2015  367.4638
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Afghanistan" 2015  57.69989
    "Government expenditure on education, US$ (millions)"                                                           "Afghanistan" 2015  648.1357
    "Government expenditure on secondary and post-secondary non-tertiary vocational education only, US$ (millions)" "Afghanistan" 2015  22.88935
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Afghanistan" 2015 100.16199
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Afghanistan" 2015  103.4022
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Afghanistan" 2015         .
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Afghanistan" 2015         .
    "Government expenditure on secondary education, US$ (millions)"                                                 "Afghanistan" 2015  157.8619
    "Government expenditure on secondary and post-secondary non-tertiary vocational education only, US$ (millions)" "Afghanistan" 2016  22.57008
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Afghanistan" 2016 117.33885
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Afghanistan" 2016         .
    "Government expenditure on secondary education, US$ (millions)"                                                 "Afghanistan" 2016  175.8388
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Afghanistan" 2016  58.49995
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Afghanistan" 2016         .
    "Government expenditure on primary education, US$ (millions)"                                                   "Afghanistan" 2016  348.7321
    "Government expenditure on education, US$ (millions)"                                                           "Afghanistan" 2016  805.3478
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Afghanistan" 2016  23.93759
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Afghanistan" 2017  58.70909
    "Government expenditure on secondary and post-secondary non-tertiary vocational education only, US$ (millions)" "Afghanistan" 2017  13.97529
    "Government expenditure on secondary education, US$ (millions)"                                                 "Afghanistan" 2017 176.53076
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Afghanistan" 2017         .
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Afghanistan" 2017  15.08166
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Afghanistan" 2017 117.82167
    "Government expenditure on primary education, US$ (millions)"                                                   "Afghanistan" 2017  350.5486
    "Government expenditure on education, US$ (millions)"                                                           "Afghanistan" 2017  793.2651
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Afghanistan" 2017         .
    "Government expenditure on secondary education, US$ (millions)"                                                 "Albania"     2013  89.67648
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Albania"     2013   5.89411
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Albania"     2013  27.32272
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Albania"     2013  99.16112
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Albania"     2013  62.35376
    "Government expenditure on education, US$ (millions)"                                                           "Albania"     2013  452.1908
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Albania"     2013         .
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Albania"     2015   1.94695
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Albania"     2015  82.82124
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Albania"     2015  17.33905
    "Government expenditure on education, US$ (millions)"                                                           "Albania"     2015  391.4789
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Albania"     2015  67.14867
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Albania"     2015         .
    "Government expenditure on primary education, US$ (millions)"                                                   "Albania"     2015 222.23058
    "Government expenditure on secondary education, US$ (millions)"                                                 "Albania"     2015  84.48763
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Albania"     2016  89.75494
    "Government expenditure on education, US$ (millions)"                                                           "Albania"     2016   469.957
    "Government expenditure on secondary education, US$ (millions)"                                                 "Albania"     2016  119.2252
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Albania"     2016   7.65934
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Albania"     2016  32.99036
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Albania"     2016         .
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Albania"     2016  86.23484
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Algeria"     2014         .
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Andorra"     2013   7.85449
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Andorra"     2013  23.04739
    "Government expenditure on primary education, US$ (millions)"                                                   "Andorra"     2013  25.71746
    "Government expenditure on secondary education, US$ (millions)"                                                 "Andorra"     2013   28.2005
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Andorra"     2013   14.5636
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Andorra"     2013   3.67144
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Andorra"     2013    5.1531
    "Government expenditure on education, US$ (millions)"                                                           "Andorra"     2013  80.00748
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Andorra"     2014  15.36607
    "Government expenditure on education, US$ (millions)"                                                           "Andorra"     2014  100.4487
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Andorra"     2014  13.87391
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Andorra"     2014    6.1028
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Andorra"     2014    5.2323
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Andorra"     2014  37.48375
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Andorra"     2014         .
    "Government expenditure on secondary education, US$ (millions)"                                                 "Andorra"     2014  20.59837
    "Government expenditure on primary education, US$ (millions)"                                                   "Andorra"     2014  22.38988
    "Government expenditure on pre-primary education, US$ (millions)"                                               "Andorra"     2015  12.05604
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Andorra"     2015   7.05772
    "Government expenditure on tertiary education, US$ (millions)"                                                  "Andorra"     2015   5.00818
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Andorra"     2015    .09704
    "Government expenditure on primary education, US$ (millions)"                                                   "Andorra"     2015  20.19416
    "Government expenditure on education, US$ (millions)"                                                           "Andorra"     2015  91.47684
    "Government expenditure on education not specified by level, US$ (millions)"                                    "Andorra"     2015  31.33137
    "Government expenditure on lower secondary education, US$ (millions)"                                           "Andorra"     2015  15.73232
    "Government expenditure on secondary education, US$ (millions)"                                                 "Andorra"     2015  22.79003
    "Government expenditure on upper secondary education, US$ (millions)"                                           "Andorra"     2016   7.22343
    "Government expenditure on post-secondary non-tertiary education, US$ (millions)"                               "Andorra"     2016    .20803
    "Government expenditure on education, US$ (millions)"                                                           "Andorra"     2016  93.70532
    "Government expenditure on secondary education, US$ (millions)"                                                 "Andorra"     2016   22.4614
    end
    I copy-paste an observation from the indicator variable in the browse window, isolate the substring "US$ (millions)", assert that it matches every observation, and receive the following error:

    Code:
    . assert regexm(indicator, "US$ (millions)")
    4,294 contradictions in 4,294 observations
    assertion is false
    r(9);
    Notice that I have contradictions for every single observation, meaning that the copy-pasted string is not recognized even once.

    I thought maybe this was a unicode issue, since I am importing the csv at the top using UTF-8 encoding, but I tried ustrregexm to no avail:

    Code:
    . assert ustrregexm(indicator, "US$ (millions)")
    4,294 contradictions in 4,294 observations
    assertion is false
    r(9);
    I am a little perplexed. How is this possible / what am I missing here? Is there a problem with the odd characters "$" or "(" in my string?

    Thanks,

    Julian

  • #2
    In this kind of issue stating your operating system (not Windows here, perhaps) is often crucial. It may be best to go straight to StataCorp technical services, who in turn may have several specific questions to ask.



    • #3
      () has special meaning (used for grouping) in regular expressions.
      Code:
      assert regexm("US$ (millions)", "\(millions\)")



      • #4
        Originally posted by Bjarte Aagnes:
        () has special meaning (used for grouping) in regular expressions.
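        So has $; it anchors the match to the end of the string. Unfortunately, $ is also used to reference global macros in Stata, and a single backslash before $ is consumed by Stata's macro processor, so the backslash must be doubled for the regular expression to see it. We need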

        Code:
        assert regexm(indicator, "US\\$ \(millions\)")
        Best
        Daniel



        • #5
          Excellent points in #3 and #4 so why not turn to strpos()?
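
          A minimal sketch of that suggestion, assuming the string variable is called indicator as in #1. strpos() searches for a literal substring and returns 0 when it is absent, so neither the parentheses nor the $ needs escaping. The variable name in_usd_millions is made up for illustration.

          ```stata
          * Sketch only: assumes the dataset from #1 is in memory.
          * strpos() returns the position of the first literal match, or 0 if none.
          assert strpos(indicator, "US$ (millions)") > 0

          * Hypothetical follow-up: flag matching rows instead of asserting.
          generate byte in_usd_millions = strpos(indicator, "US$ (millions)") > 0
          ```

          The $ here is followed by a space, so it should not be read as a global macro reference; if in doubt, display the string first to check what Stata actually sees.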



          • #6
            Brilliant, thank you all. strpos() it is!
            Last edited by Julian Duggan; 20 Sep 2019, 15:36. Reason: all replaced both -- three people contributed :)
