  • Help with Stata Package

    Hi all,

    Currently working on a project where I am tasked to do up a Stata package. The package aims to standardise the length of a string variable by letting the user specify the desired final length and whether the padding character is added to the front or the back of each value.

    I would like some help with my current code (generated by ChatGPT, as instructed by my professor):

    * standardise.ado

    program define standardise, rclass
    version 17.0

    syntax varname [if] [in], Length(int) AddLocation(string)

    * Check if the specified variable exists
    confirm variable `varname'

    * Check if the specified variable is a string variable
    local vartype : type variable `varname'
    if "`vartype'" != "str" {
    di "Error: The specified variable is not a string variable."
    exit
    }

    * Check if AddLocation is either "front" or "back"
    if "`AddLocation'" != "front" & "`AddLocation'" != "back" {
    di "Error: AddLocation must be either 'front' or 'back'."
    exit
    }

    * Check if Length is specified
    if "`Length'" == "" {
    di "Error: Option Length(int) is required."
    exit
    }

    * Generate a new variable with the desired length
    local orig_len = length(`varname')
    local num_to_add = `Length' - orig_len
    local char_to_add = "0"
    local new_var = "`varname'_standardised"

    gen `new_var' = `varname'

    if "`AddLocation'" == "front" {
    replace `new_var' = "`char_to_add'" + substr(`new_var', 1, orig_len)
    }
    else {
    replace `new_var' = substr(`new_var', 1, orig_len) + "`char_to_add'"
    }

    * Add characters to reach the desired length
    forval i = 1/`num_to_add' {
    if "`AddLocation'" == "front" {
    replace `new_var' = "`char_to_add'" + `new_var'
    }
    else {
    replace `new_var' = `new_var' + "`char_to_add'"
    }
    }

    * Return the modified dataset
    return scalar newvar "`new_var'"
    end

    Thanks all!

  • #2
    I'm not going to clean up your chatbot-generated code for you, but to help get you started:
    Code:
    help syntax
    help string functions
    help macro



    • #3
      As Joseph Coveney is, I think, hinting, there are various errors and dubious choices here, of the kind any programmer new to the language might make. What follows is opinionated. My prejudice is that AI agents can waste your time, but there you go. I haven't tested everything, and some later comments are guesses based on experience rather than direct tests.

      Here is a command that works, in the sense that the following test script does what I want. It can naturally be improved.

      Code:
      program define standardise 
      version 17.0
      
      syntax varname(string) [if] [in], Length(int) ADDlocation(string) Generate(string)
      
      confirm new variable `generate'
      
      * Check if AddLocation is either "front" or "back"
      if "`addlocation'" == "front" {
          gen `generate' = (`length' - strlen(`varlist')) * "0" + `varlist'
      }  
      else if "`addlocation'" == "back" {
         gen `generate' = `varlist' + (`length' - strlen(`varlist')) * "0"  
      }
      else {
          di `"error: addlocation must be either "front" or "back""' 
          exit 198
      }
      
      end
      Code:
      clear
      input str5 test 
      "a"
      "bc"
      "def"
      "ghij"
      "klmno"
      end 
      
      standardise test, add(front) gen(testf) length(6)
      
      standardise test, add(back) gen(testb) length(7)
      
      standardise test, add(back) gen(testB) length(4)
      
      list 
      
      * should fail 
      standardise test, add(garbage) gen(testg) length(7)
      Output:

      Code:
           +----------------------------------+
           |  test    testf     testb   testB |
           |----------------------------------|
        1. |     a   00000a   a000000    a000 |
        2. |    bc   0000bc   bc00000    bc00 |
        3. |   def   000def   def0000    def0 |
        4. |  ghij   00ghij   ghij000    ghij |
        5. | klmno   0klmno   klmno00   klmno |
           +----------------------------------+
      Comments on your code:

      Code:
      program define standardise, rclass
      version 17.0
      
      syntax varname [if] [in], Length(int) AddLocation(string)
      0. You allow if and in here but do not implement them. That is a source of bugs.

      1. I think Stata's rules about capitalisation of option names in syntax rule out camelCase option names like AddLocation().
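      A minimal sketch of points 0 and 1, assuming Stata 17 and a hypothetical program name standardise_ifin. marksample builds a temporary 0/1 indicator from if and in, and the option contents are read from the lower-case locals length, addlocation and generate that syntax creates. Otherwise the logic follows the command given above.

      ```stata
      * Hypothetical program name standardise_ifin; a sketch, not a finished command
      capture program drop standardise_ifin
      program define standardise_ifin
          version 17.0
          syntax varname(string) [if] [in], Length(integer) ADDlocation(string) Generate(string)

          * temporary 0/1 indicator built from if and in only;
          * novarlist: values of the variable itself do not mark out observations
          marksample touse, novarlist

          confirm new variable `generate'

          * option contents arrive in lower-case locals: length, addlocation, generate
          if "`addlocation'" == "front" {
              gen `generate' = (`length' - strlen(`varlist')) * "0" + `varlist' if `touse'
          }
          else if "`addlocation'" == "back" {
              gen `generate' = `varlist' + (`length' - strlen(`varlist')) * "0" if `touse'
          }
          else {
              di as error `"addlocation() must be either "front" or "back""'
              exit 198
          }
      end

      clear
      input str5 test
      "a"
      "bc"
      "def"
      end

      * only observations 1 and 2 are padded
      standardise_ifin test in 1/2, add(front) gen(testf) length(4)
      list
      ```

      Observation 3 lies outside in 1/2, so testf is empty there.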

      Code:
      * Check if the specified variable exists
      confirm variable `varname'
      2. This is unnecessary as syntax has already been given the job of checking that you specified a variable name.

      3. It would not work anyway. Perhaps perversely, although syntax allows an argument varname, the result is returned in the local macro varlist. So varname would be empty at this point and the check would always fail. The error is repeated later in the code, but Stata would never get there until this error is fixed.
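      A small check of point 3, using a hypothetical throwaway program showvarname: after syntax varname, the variable name is in the local macro varlist, while a local named varname is never defined.

      ```stata
      * Hypothetical demo program: which local does syntax varname fill?
      capture program drop showvarname
      program define showvarname, rclass
          syntax varname
          * the name supplied goes into `varlist'; `varname' stays undefined
          return local fromvarlist "`varlist'"
          return local fromvarname "`varname'"
      end

      clear
      set obs 1
      gen str1 test = "a"

      showvarname test
      display "varlist: `r(fromvarlist)'"
      display "varname: `r(fromvarname)'"
      ```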

      Code:
      * Check if the specified variable is a string variable
      local vartype : type variable `varname'
      if "`vartype'" != "str" {
      di "Error: The specified variable is not a string variable."
      exit
      }
      4. This isn't going to work for another reason too: str is never returned by that macro function. The result is going to be something like str6, str18 or strL. That can be fixed but it is easier to give syntax the job of insisting on a string variable.
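      A sketch of point 4. The extended macro function takes the form : type varname (without the word variable), and for a string variable it returns the full storage type, such as str5. The first three characters can be tested by hand, but confirm string variable (or syntax varname(string), as in the command above) does the job more simply.

      ```stata
      clear
      input str5 test
      "a"
      end
      gen x = 1

      * the extended macro function is  : type varname ; for a string
      * variable it returns the full storage type, here str5, never plain str
      local vartype : type test
      display "`vartype'"

      * checking by hand means looking at the first three characters
      local isstr = substr("`vartype'", 1, 3) == "str"
      display `isstr'

      * simpler: confirm string variable fails (non-zero _rc) for numeric x
      capture confirm string variable x
      local rc = _rc
      display `rc'
      ```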

      Code:
      * Check if AddLocation is either "front" or "back"
      if "`AddLocation'" != "front" & "`AddLocation'" != "back" {
      di "Error: AddLocation must be either 'front' or 'back'."
      exit
      }
      5. The intent makes sense, but it would be better to exit with a non-zero return code. As before, only addlocation (all lower case) will work here.
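      A sketch of point 5 using two hypothetical programs, quietfail and loudfail. A bare exit leaves the return code at 0, so a calling do-file or capture cannot tell that anything went wrong; exit 198 (invalid syntax, the code used in the command above) can be detected.

      ```stata
      * Hypothetical programs contrasting a bare exit with exit 198
      capture program drop quietfail
      program define quietfail
          di "Error: something is wrong."
          exit
      end

      capture program drop loudfail
      program define loudfail
          di as error "something is wrong"
          exit 198
      end

      capture quietfail
      local rc_quiet = _rc
      capture loudfail
      local rc_loud = _rc
      display `rc_quiet' "  " `rc_loud'
      ```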

      Code:
      * Check if Length is specified
      if "`Length'" == "" {
      di "Error: Option Length(int) is required."
      exit
      }
      6. This is unnecessary as your syntax statement made length() a required option. See also #5.

      7. But it would not work as intended anyway. The local macro Length is not defined; syntax puts the option's contents in the local macro length. That is a misunderstanding of what capitalisation of option names in syntax specifies.
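      A sketch of points 6 and 7 with a hypothetical program showlength. Leaving out a required option already makes syntax fail, and because macro names are case-sensitive, only the lower-case local length holds the value supplied.

      ```stata
      * Hypothetical demo program for points 6 and 7
      capture program drop showlength
      program define showlength, rclass
          syntax , Length(integer)
          * macro names are case-sensitive: only `length' is defined
          return local lower "`length'"
          return local upper "`Length'"
      end

      * omitting the required option is an error raised by syntax itself
      capture showlength
      local rc = _rc
      display `rc'

      showlength, length(6)
      display "length: `r(lower)'   Length: `r(upper)'"
      ```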

      Code:
      * Generate a new variable with the desired length
      local orig_len = length(`varname')
      local num_to_add = `Length' - orig_len
      local char_to_add = "0"
      local new_var = "`varname'_standardised"
      8. The length of the string variable will be calculated from its value in observation 1 only. That is likely not obvious, but, looked at the other way round, a local macro can't hold an entire variable, that is, the length of the variable in each observation.

      9. Moreover, no allowance is made for the possibility that the length is itself not constant across observations. That may not arise in the intended application, but it is easy to accommodate.

      10. The character to add is wired into the program as "0". Given that, there is no need to put it in a local macro. If, the other way round, the intention is to allow other characters, with "0" just the default, excellent, but then that needs to be an option.

      11. There is no check that the new variable name really is new, within the limit on variable name length and otherwise legal. It's best to move such a check early in the program and to let the user specify the new name. Not everyone will want the lengthy suffix _standardised; it will trip up many users, not least those who prefer a different spelling.
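      A sketch of points 8 and 9. With =, the expression is evaluated once, and a bare variable name in it means that variable's value in observation 1. A new variable is needed to hold one length per observation.

      ```stata
      clear
      input str5 test
      "a"
      "klmno"
      end

      * with =, the expression is evaluated once; the variable name
      * refers to the value in observation 1 only
      local orig_len = strlen(test)
      display `orig_len'

      * a variable, not a local, holds one length per observation
      gen len = strlen(test)
      list
      ```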

      Code:
      gen `new_var' = `varname'
      
      if "`AddLocation'" == "front" {
      replace `new_var' = "`char_to_add'" + substr(`new_var', 1, orig_len)
      }
      else {
      replace `new_var' = substr(`new_var', 1, orig_len) + "`char_to_add'"
      }
      12. The substr() stuff is unnecessary as that's just the original variable, whose name is held in varlist.

      13. So here we added one "0" either at the front or at the back.

      Code:
      * Add characters to reach the desired length
      forval i = 1/`num_to_add' {
      if "`AddLocation'" == "front" {
      replace `new_var' = "`char_to_add'" + `new_var'
      }
      else {
      replace `new_var' = `new_var' + "`char_to_add'"
      }
      }
      14. But now we are adding several characters as needed. Won't that end up with one more character than needed?

      15. We don't need a loop here anyway. See my alternative.
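      A sketch of points 14 and 15. In the original, one "0" is added before the loop and num_to_add more inside it, one more than needed. Stata's string multiplication, n * "0", repeats "0" n times, so a single generate does the job. As the testB results above show, a count of zero or less produces an empty string, so values that are already long enough are left unchanged rather than truncated.

      ```stata
      clear
      input str5 test
      "a"
      "klmno"
      end

      * n * "0" repeats "0" n times, so one gen replaces the loop
      display 3 * "0"

      * a count of zero or less gives "", so "klmno" passes through unchanged
      gen testf = (4 - strlen(test)) * "0" + test
      list
      ```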

      Code:
      * Return the modified dataset
      16. The comment is needlessly unspecific. We are returning the name of the new variable.

      Code:
      return scalar newvar "`new_var'"
      17. OK. I wouldn't use a scalar here and if you specify the new name as an input, you don't need to return it, but those are choices.
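      A sketch of point 17, with a hypothetical program padname that merely copies the variable: a name is text, so an r-class program would usually return it with return local rather than as a scalar.

      ```stata
      * Hypothetical program: returns the new variable's name as text
      capture program drop padname
      program define padname, rclass
          syntax varname(string), Generate(string)
          confirm new variable `generate'
          gen `generate' = `varlist'
          * a name is text, so return it in a local, not a scalar
          return local newvar "`generate'"
      end

      clear
      set obs 1
      gen str1 test = "a"
      padname test, gen(test2)
      display "`r(newvar)'"
      ```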

      Code:
      end



      • #4
        I am guessing too, but I would be surprised if Econ was your family name. We ask people to use real names here.

