Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • rangejoin: was expecting a numeric variable

    Hi all,

    I'm using rangejoin for the first time and can't figure out what I'm doing wrong.

    The first dataset contains the key-variable age (and a number of other variables), like this:
    Casenr age
    1. 19
    2. 21
    3. 22
    4. 24
    5. 26
    6. 26
    7. 34
    8. 35
    9. 37

    The second dataset contains an age-range (and some other variables), like this:

    casenr age2min age2plus

    1. 17 21
    2. 18 22
    3. 21 25
    4. 22 26
    5. 24 28

    6. 25 29
    7. 30 34
    8. 33 37
    9. 34 38
    10. 42 46


    I use the command:
    rangejoin age age2min age2plus using "dataset2"

    Which gives me the following error:
    was expecting a numeric variable, a number, or a system missing value for the interval low: age2min
    r(198);


    However, all three variables age, age2min and age2plus have format %9.0g. They do have some missings, which have system missing value (.)
    If I drop all cases with missings, I get the same error. What could be wrong?
    Last edited by Annette Scherpenzeel; 19 Oct 2023, 16:09.

  • #2
    The message is, unfortunately, misleading. The problem is that you are using the data sets in the wrong order. Whenever you use -rangejoin-, the first variable in the range must be a variable in the using data set, and the second and third variables must be in the master data set. -rangjoin- will then pair all observations from the two data sets where the value of the first variable in the using data set falls in the range given by the second and third variables in the using data set.

    So what you need to do is:
    Code:
    use second_dataset, clear
    rangejoin age age2min age2plus using first_dataset
    Unlike -merge- and -joinby-, -rangejoin- does not treat the two data sets interchangeably. It matters very much which is master and which is using.

    As an aside, in the future, when showing example data, please use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thanks Clyde. I just tried your suggestion, having the second dataset in memory and the first dataset as using. But then it gives the message invalid syntax. (for exactly the same command line as you gave).
      Last edited by Annette Scherpenzeel; 19 Oct 2023, 16:55.

      Comment


      • #4
        I cannot replicate your problem. Here is how the code runs successfully on my setup. I cannot help but think that you are doing something different from the commands I posted:
        Code:
        . clear
        
        . input Casenr age
        
                Casenr        age
          1. 1. 19
          2. 2. 21
          3. 3. 22
          4. 4. 24
          5. 5. 26
          6. 6. 26
          7. 7. 34
          8. 8. 35
          9. 9. 37
         10. end
        
        . save first_dataset, replace
        file first_dataset.dta saved
        
        .
        . clear
        
        . input casenr age2min age2plus
        
                casenr    age2min   age2plus
          1. 1. 17 21
          2. 2. 18 22
          3. 3. 21 25
          4. 4. 22 26
          5. 5. 24 28
          6. 6. 25 29
          7. 7. 30 34
          8. 8. 33 37
          9. 9. 34 38
         10. 10. 42 46
         11. end
        
        . save second_dataset, replace
        file second_dataset.dta saved
        
        .
        . use second_dataset, clear
        
        . rangejoin age age2min age2plus using first_dataset
          (using rangestat version 1.1.1)
        
        .
        . list, noobs clean
        
            casenr   age2min   age2plus   Casenr   age  
                 1        17         21        1    19  
                 1        17         21        2    21  
                 2        18         22        1    19  
                 2        18         22        2    21  
                 2        18         22        3    22  
                 3        21         25        2    21  
                 3        21         25        3    22  
                 3        21         25        4    24  
                 4        22         26        3    22  
                 4        22         26        4    24  
                 4        22         26        5    26  
                 4        22         26        6    26  
                 5        24         28        4    24  
                 5        24         28        5    26  
                 5        24         28        6    26  
                 6        25         29        5    26  
                 6        25         29        6    26  
                 7        30         34        7    34  
                 8        33         37        7    34  
                 8        33         37        8    35  
                 8        33         37        9    37  
                 9        34         38        7    34  
                 9        34         38        8    35  
                 9        34         38        9    37  
                10        42         46        .     .

        Comment


        • #5
          This is my exact code:

          clear
          use "${tussen_data}\werkbestand4.dta"
          *Define age range
          gen age2min=age-2
          gen age2plus=age+2
          rangejoin age age2min age2plus using "${tussen_data}\werkbestand6.dta"

          (the variable age is in both datasets. Purpose was to match it less strictly, as there might be imprecision in the age given)
          This is the dataex for the dataset "${tussen_data}\werkbestand4.dta":

          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte age
          19
          20
          23
          24
          26
          27
          32
          35
          36
          44
          45
          49
          57
          20
          15
          37
          26
          32
          33
          30
          33
          31
          30
          26
          41
          46
          24
          46
          37
          19
          43
          29
          29
          36
          49
          32
          39
          38
          17
          21
          42
          33
          30
          36
          26
          23
          34
          28
          32
          27
          40
          28
          59
          21
          17
          40
          35
          23
          31
          33
          40
          29
          54
          44
          34
          38
          37
          42
          43
          0
          43
          30
          30
          22
          17
          45
          34
          21
          19
          20
          17
          27
          31
          28
          16
          21
          21
          32
          23
          49
          45
          30
          15
          32
          24
          39
          45
          18
          24
          51
          end
          [/CODE]
          ------------------ copy up to and including the previous line ------------------

          Listed 100 out of 615 observations

          And for the dataset "${tussen_data}\werkbestand6.dta"
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float age
          19
          20
          23
          24
          26
          27
          32
          35
          36
          44
          45
          49
          57
          17
          20
          27
          25
          23
          56
          54
          28
          41
          35
          18
          15
          37
          54
          27
          58
          37
          59
          59
          24
          36
          46
          37
          27
          28
          45
          20
          0
          46
          18
          29
          26
          41
          16
          22
          26
          30
          36
          25
          52
          26
          37
          38
          52
          32
          33
          27
          22
          39
          20
          35
          30
          25
          38
          29
          35
          33
          33
          41
          22
          34
          17
          28
          45
          31
          19
          30
          26
          37
          18
          16
          33
          26
          32
          36
          61
          32
          21
          41
          47
          41
          46
          24
          46
          38
          39
          41
          end
          [/CODE]
          ------------------ copy up to and including the previous line ------------------

          Listed 100 out of 615 observations

          Comment


          • #6
            I don't know what to tell you. In the code and output below the only modification I have made to your code is that I have simplified the path to the directories, as I have no information about what $tussen_data is and would not have such a directory in my file system anyway. It runs with no error messages.
            Code:
            . * Example generated by -dataex-. To install: ssc install dataex
            . clear
            
            . input byte age
            
                      age
              1. 19
              2. 20
              3. 23
              4. 24
              5. 26
              6. 27
              7. 32
              8. 35
              9. 36
             10. 44
             11. 45
             12. 49
             13. 57
             14. 20
             15. 15
             16. 37
             17. 26
             18. 32
             19. 33
             20. 30
             21. 33
             22. 31
             23. 30
             24. 26
             25. 41
             26. 46
             27. 24
             28. 46
             29. 37
             30. 19
             31. 43
             32. 29
             33. 29
             34. 36
             35. 49
             36. 32
             37. 39
             38. 38
             39. 17
             40. 21
             41. 42
             42. 33
             43. 30
             44. 36
             45. 26
             46. 23
             47. 34
             48. 28
             49. 32
             50. 27
             51. 40
             52. 28
             53. 59
             54. 21
             55. 17
             56. 40
             57. 35
             58. 23
             59. 31
             60. 33
             61. 40
             62. 29
             63. 54
             64. 44
             65. 34
             66. 38
             67. 37
             68. 42
             69. 43
             70. 0
             71. 43
             72. 30
             73. 30
             74. 22
             75. 17
             76. 45
             77. 34
             78. 21
             79. 19
             80. 20
             81. 17
             82. 27
             83. 31
             84. 28
             85. 16
             86. 21
             87. 21
             88. 32
             89. 23
             90. 49
             91. 45
             92. 30
             93. 15
             94. 32
             95. 24
             96. 39
             97. 45
             98. 18
             99. 24
            100. 51
            101. end
            
            . save werkbestand4, replace
            file werkbestand4.dta saved
            
            .
            . * Example generated by -dataex-. To install: ssc install dataex
            . clear
            
            . input float age
            
                       age
              1. 19
              2. 20
              3. 23
              4. 24
              5. 26
              6. 27
              7. 32
              8. 35
              9. 36
             10. 44
             11. 45
             12. 49
             13. 57
             14. 17
             15. 20
             16. 27
             17. 25
             18. 23
             19. 56
             20. 54
             21. 28
             22. 41
             23. 35
             24. 18
             25. 15
             26. 37
             27. 54
             28. 27
             29. 58
             30. 37
             31. 59
             32. 59
             33. 24
             34. 36
             35. 46
             36. 37
             37. 27
             38. 28
             39. 45
             40. 20
             41. 0
             42. 46
             43. 18
             44. 29
             45. 26
             46. 41
             47. 16
             48. 22
             49. 26
             50. 30
             51. 36
             52. 25
             53. 52
             54. 26
             55. 37
             56. 38
             57. 52
             58. 32
             59. 33
             60. 27
             61. 22
             62. 39
             63. 20
             64. 35
             65. 30
             66. 25
             67. 38
             68. 29
             69. 35
             70. 33
             71. 33
             72. 41
             73. 22
             74. 34
             75. 17
             76. 28
             77. 45
             78. 31
             79. 19
             80. 30
             81. 26
             82. 37
             83. 18
             84. 16
             85. 33
             86. 26
             87. 32
             88. 36
             89. 61
             90. 32
             91. 21
             92. 41
             93. 47
             94. 41
             95. 46
             96. 24
             97. 46
             98. 38
             99. 39
            100. 41
            101. end
            
            . save werkbestand6, replace
            file werkbestand6.dta saved
            
            .
            . use "werkbestand4.dta"
            
            . *Define age range
            . gen age2min=age-2
            
            . gen age2plus=age+2
            
            . rangejoin age age2min age2plus using "werkbestand6.dta"
              (using rangestat version 1.1.1)
            Finally, since you have shown that age2min and age2plus are just the value of age with 2 subtracted or added, there is a slightly simpler syntax you can use:
            Code:
            . use werkbestand4.dta
            
            . rangejoin age -2 2 using "werkbestand6.dta"
              (using rangestat version 1.1.1)
            I don't expect this will change your error message problem, as I can see no reason for the error message you are getting in the first place. But, who knows? It might work where the other didn't.

            Sorry I can't be more helpful.

            Added: given that there is nothing apparently wrong with the data or the code, there are some standard things you can try. Do them in this order, and try the code again after each one until it either runs properly or you have exhausted them all. (They are listed in order of increasing burdensomeness.)
            1. Exit Stata, then relaunch Stata and try again.
            2. Update Stata. (If it says you are already updated, do a forced update.)
            3. If you originally acquired the code by copy/pasting from the Forum, delete that code and replace it by hand-typing the code in, being careful to get every character correct. (Code in the Forum editor is sometimes contaminated with non-printing characters that humans cannot see but Stata can, and they can wreak havoc when you try to run the code.)
            4. Uninstall -rangestat- and -rangejoin- and then reinstall them.
            5. Uninstall Stata, re-install it and immediately do an update.
            Last edited by Clyde Schechter; 19 Oct 2023, 19:18.

            Comment


            • #7
              Hi Clyde, thanks for your testing and suggestions. The ${tussen_data} just refers to a global defined as the (long) path where the datafiles are stored. It works fine in the rest of my syntax.

              I tried the suggestions 1,3 and 4. I cannot update or uninstall Stata, since it is an institute license and I don't have administrator rights. We have Stata 16.1 by the way.
              Nothing worked. This is what I got by using your suggestion to use -2 2 instead of the age2min and age2plus (typing it, not copying it):

              . clear

              . use "${tussen_data}\werkbestand4.dta"

              . rangejoin age -2 2 using "${tussen_data}\werkbestand6.dta"
              (using rangestat version 1.1.1)
              invalid syntax
              r(198);

              end of do-file

              r(198);

              I ran my original syntax again also, in which the range was in the wrong dataset (the using file), just as a test. Because Stata did not see that syntax as invalid but complained about the expected numeric values. I got this again:

              . clear

              . use "${tussen_data}\werkbestand6.dta"

              . *define age range
              . gen age2min=age-2
              (1 missing value generated)

              . gen age2plus=age+2
              (1 missing value generated)

              . save "${tussen_data}\werkbestand6.dta", replace
              file P:\RAM\RAM-059 Monitor gezondheidseffecten\03. Dataverwerking\Gepseudonimiseerd_vanafapril2023\we rkbestand6.dta saved

              . clear

              . use "${tussen_data}\werkbestand4.dta"

              . rangejoin age age2min age2plus using "${tussen_data}\werkbestand6.dta"
              (using rangestat version 1.1.1)
              was expecting a numeric variable, a number, or a system missing value for the interval low: age2min
              r(198);

              end of do-file

              r(198);


              Now I give up. I will program the range merge in an old-fashioned way, I saw some examples of that too in the forum. Thanks for your help and thinking!

              Best,
              Annette
              Last edited by Annette Scherpenzeel; 20 Oct 2023, 12:58.

              Comment


              • #8
                Thanks for the follow-up. I'm sorry I couldn't resolve your problem. If you ever do somehow find out what is causing this, it would be nice if you re-open this thread and post it so we can all learn.

                Comment


                • #9
                  Here's something weird from the output,
                  Code:
                  . save "${tussen_data}\werkbestand6.dta", replace
                  file P:\RAM\RAM-059 Monitor gezondheidseffecten\03. Dataverwerking\Gepseudonimiseerd_vanafapril2023\we rkbestand6.dta saved
                  Notice the whitespace in we rkbestand6.dta saved. Maybe it's just a mistake druing copy/paste, but makes me wonder what happens if you run the command in the folder, like:

                  Code:
                  clear
                  cd  "${tussen_data}\"
                  use werkbestand4.dta
                  rangejoin age -2 2 using werkbestand6.dta

                  Or save werkbestand4.dta and werkbestand6.dta to you local C:/drive, first create a folder called test on your c:/ drive then:

                  Code:
                  clear
                  use "${tussen_data}\werkbestand4.dta"
                  save c:/test/werkbestand4.dta
                  clear
                  use "${tussen_data}\werkbestand6.dta"
                  save c:/test/werkbestand6.dta
                  
                  clear
                  cd c:/test/
                  use werkbestand4.dta
                  rangejoin age -2 2 using werkbestand6.dta
                  Last edited by Hua Peng (StataCorp); 20 Oct 2023, 13:23.

                  Comment


                  • #10
                    Thanks Hua Peng. It was indeed something that must have happened during copy-paste, in the actual output the space is not present. To be sure, I tried your first suggestion (changing to the data folder), and it did not help. I cannot try the second suggestion as we are not supposed to and cannot transfer data from our secure data server to the local C:/drive

                    Comment


                    • #11
                      Meanwhile, I programmed it without using rangejoin, using a bit of a detour but it works.

                      Comment

                      Working...
                      X