Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to create a string variable whose unique value is the name of a file in the directory and add new observations interactively

    Dear all,

    I am working with string data and I would like to import multiple Excel files and append them, creating in each of them a string variable whose unique value is the name of the corresponding file (so that I can identify from which file each set of observations come).

    In particular, in the working directory I have 56 files named as follows: 'NAME1.xlsx', 'NAME2.xlsx', ..., 'NAME56.xlsx' where 'NAME?' represents any word with length <= 32.

    I would like the variable for file 'NAME1.xlsx' to have only the value 'NAME1' for all observations of the 'NAME1.dta' file (which I will save interactively in a loop).

    wanted_var var2 var3
    NAME1 etc ...
    NAME1 etc ...
    NAME1 etc ...

    The same would be done for the file 'NAME2.dta' and so forth for all the 56 files

    wanted_var var2 var3
    NAME2 etc ...
    NAME2 etc ...
    NAME2 etc ...

    In the final version of the appended dataset, I would also like to add an empty row (i.e., a new observation) after each set of observations that come from each file. [I tried to use something like 'insobs 1, after(`wanted_var[_n] != wanted_var[_n-1]')]

    The dataset ideally would be like
    wanted_var var2
    NAME1 etc
    NAME1 etc
    NAME1 etc
    (empty row) etc
    NAME2 etc
    NAME2 etc
    (empty row) etc
    NAME3 etc
    NAME3 etc
    NAME3 etc
    NAME3 etc
    NAME3 etc
    (empty row) etc
    NAME4 etc
    NAME4 etc
    (and so on) ....

    Can you help to find a solution for that?

    Thank you very much!


  • #2
    Dear all,

    Searching for a solution these days I came across the user-written package 'multimport' from Alvaro Carril (available at SSC).

    The command 'multimport' does exactly what I was looking for: it imports and appends all the datasets and creates a numeric variable with a unique value for each set of observations whose label is the name of the Excel file from which data was imported.

    Obs: the values of 'wanted_var' are like 1, 2, 3,..., 56 (and each value has a label). In this case, the label for value 1 is 'NAME1.xlsx', for value 2 is 'NAME2.xlsx' and so on.

    In particular, this is how I ran the command

    multimport excel , dir("C:/Users/Otavio/Downloads") force clear

    So the final result is like

    wanted_var var2
    NAME1.xlsx etc
    NAME1.xlsx etc
    NAME1.xlsx etc
    NAME2.xlsx etc
    NAME2.xlsx etc
    NAME3.xlsx etc
    NAME3.xlsx etc
    NAME3.xlsx etc
    NAME3.xlsx etc
    NAME3.xlsx etc
    NAME4.xlsx etc
    NAME4.xlsx etc
    (and so on) ....

    The only task that I couldn't manage to solve yet is to include a new observation after the variable 'wanted_var' changes it values.

    Any advice is greatly appreciated.

    Thanks!

    Comment


    • #3
      I found out recently the command 'ingap' (from SSC) that adds blank rows interactively: it is great!

      Comment

      Working...
      X