  • My appended dataset displays labels from the first dataset

    Dear Statalisters,

    All my .dta files (one per country) are stored in the same folder. I am using this command to append them all:

    Code:
    cd "$input"
    local append: dir "." files "*.dta"
    
    *log using session1
    *precombine `append' // precombine is from SSC
    *log close
    
    append using `append', force

    But once I have my appended dataset, the variable that is supposed to denote the region of the surveyed individual gets its labels from the first dataset, i.e. the first country. Please have a look at my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int a1 float a2
    12 2
    12 2
    12 1
    12 1
    12 1
    12 1
    12 2
    12 1
    12 1
    12 1
    12 2
    12 1
    12 2
    12 1
    12 1
    12 1
    12 2
    12 2
    12 1
    12 1
    end
    label values a1 A1
    label values a2 a2
    label def a2 1 "Tirana", modify
    label def a2 2 "Durres and Shkoder", modify

    As you can see, a2 has labels for regions located in Albania, although 12 is the code for a country that isn't Albania. I suspect this is due to the option "force", but if I omit it, I get an error message saying that my variables do not have the same format across datasets. Do I have to manually format every single variable? What should I do to get an appended dataset that displays the appropriate labels?


  • #2
    Julia, it would be helpful to show two data examples (one from each of two countries) as they are before appending, so that we can understand the source of the problem.


    • #3
      Fei: You are right. Here are two data examples from Albania and Azerbaijan:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(a1 a2)
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 2
      44 4
      44 4
      44 4
      44 4
      44 4
      44 4
      44 4
      44 4
      44 4
      44 4
      end
      label values a1 A1
      label def A1 44 "Albania", modify
      label values a2 a2
      label def a2 2 "Durres and Shkoder", modify
      label def a2 4 "Elbasan and Korce", modify

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(a1 a2)
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      65 1
      end
      label values a1 A1
      label def A1 65 "Azerbaijan", modify
      label values a2 a2
      label def a2 1 "Baku & Apsheronski", modify

      In the appended dataset, here are the labels found for Azerbaijan:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int a1 float a2
      65 2
      65 1
      65 3
      65 2
      65 1
      65 3
      65 1
      65 4
      65 3
      65 4
      65 1
      65 2
      65 3
      65 2
      65 3
      65 2
      65 3
      65 1
      65 3
      65 3
      end
      label values a1 A1
      label def A1 65 "Azerbaijan", modify
      label values a2 a2
      label def a2 1 "Tirana", modify
      label def a2 2 "Durres and Shkoder", modify
      label def a2 3 "Fier and Vlore", modify
      label def a2 4 "Elbasan and Korce", modify

      As you can see, these are the labels found in the Albanian dataset (which is the first one in the list of files).


      • #4
        Julia, to my understanding, a value of "a2" does not uniquely identify a district. For example, there may be 1's in every country's data, but they refer to different districts. In that sense, you cannot append the datasets while "a2" is numeric. My solution would be to convert "a2" in every country's dataset into a string variable before appending, like this:

        Code:
        decode a2, generate(a2_str)
        drop a2
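
        To show how the two lines above might sit inside a loop over several files, here is a minimal sketch. It is only an illustration: the two toy datasets and the tempfile names albania, azerbaijan and combined are stand-ins invented for this example, not the real country files.

        ```stata
        * Toy stand-in for the Albanian file (values taken from the example in #3)
        clear
        input byte(a1 a2)
        44 2
        44 4
        end
        label define a2 2 "Durres and Shkoder" 4 "Elbasan and Korce"
        label values a2 a2
        tempfile albania
        save "`albania'"

        * Toy stand-in for the Azerbaijani file
        clear
        input byte(a1 a2)
        65 1
        end
        label define a2 1 "Baku & Apsheronski"
        label values a2 a2
        tempfile azerbaijan
        save "`azerbaijan'"

        * Start from an empty dataset, then decode each file and stack it
        clear
        tempfile combined
        save "`combined'", emptyok
        foreach f in "`albania'" "`azerbaijan'" {
            use "`f'", clear
            decode a2, generate(a2_str)   // store the label text as a string
            drop a2
            label drop a2                 // the country-specific label is no longer needed
            append using "`combined'"
            save "`combined'", replace
        }
        list a1 a2_str
        ```

        With real files, the foreach list would presumably come from the `dir` macro used in #1. Because a2_str is a string in every file, append should not need the force option for this variable; as far as I know, force is only required when a variable is string in one dataset and numeric in another.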


        • #5
          Fei Wang gives good advice. Chances are, the numeric values in each dataset were created using encode. Because encode always assigns values 1, 2, ... to the sorted list of strings (areas in this case), the numeric values depend on which areas were observed in each dataset. The best way to combine the datasets is to decode the variables in all datasets, combine the datasets, and then let encode create a value label for the combined variable in the final dataset.
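
          The last step, letting encode build one value label for the combined variable, could look roughly like the sketch below. The three rows are toy data typed in for illustration, and the variable name a2_str is assumed to match the one created in #4.

          ```stata
          * Toy combined dataset: region names already decoded to strings
          clear
          input byte a1 str20 a2_str
          44 "Durres and Shkoder"
          44 "Elbasan and Korce"
          65 "Baku & Apsheronski"
          end

          * encode numbers the distinct strings in sorted order and
          * creates a single value label (named a2) covering all countries
          encode a2_str, generate(a2)
          label list a2
          list a1 a2_str a2, nolabel
          ```

          One thing to watch for: if two countries happened to share a region name, encode would give both the same code. Combining the country name and region name into one string before encoding is one possible way around that.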
