
  • Issue with merge, variable already defined, variable exists three times

    Dear Statalist,

    I am trying to merge two datasets. More precisely, I am trying to merge data from the European Social Survey (ESS) with some other data. However, I always get the error "variable *** already defined", where *** is one of the variables in the ESS dataset. This affects not just one variable but several. First I got this error for the variable "dcsfwrk", so I dropped that variable. Then I got the error for "wrywprb", so I dropped that one too. Then I got the error for "trdawrk". You get the idea.

    The issue is that none of these variables is in the data I am trying to merge the ESS data with. The other dataset contains only 12 variables, so it is easy to keep an overview, and I am very sure that none of the problematic variables is in there.

    So I investigated further and discovered that each of the problematic variables appears three times in the ESS data. The ESS data seem to contain multiple variables with exactly the same name and exactly the same content. However, these extra variables only show up if I search for the variable name using the search bar on the right (see the attached screenshot). In the Data Editor (Browse), for example, the variable shows up only once. Other commands, such as tab, also work normally with these variables. So
    Code:
     tab dcsfwrk
    shows me the expected result.

    This is the first time I have seen something like that and I am not sure what to do. As I said, this same error affects multiple variables. I also downloaded a fresh dataset from the ESS website and I have exactly the same issue again without modifying the data in any way. The ESS data I am working with can be downloaded here: https://www.europeansocialsurvey.org/downloadwizard/

    Another strange thing is that I have to drop these variables three times in order to get rid of them. So I have to execute the code
    Code:
    drop dcsfwrk
    three times to drop this variable completely. Each drop gets rid of one of the instances of dcsfwrk.

    Here is also my merge code, although I do not believe that this is the issue.
    Code:
    use "other_data", clear
    merge 1:m year cntry using "ess_data_positions.dta"
    This is my first post here so please let me know if I should format my post in a different way or if you need more information. Thank you so much for your help.

    Best wishes,
    Jasper

    [Attached screenshot: dcsfwrk_times_three.png]

    Last edited by Jasper Jansen; 10 Nov 2021, 09:06.

  • #2
    The dataset you downloaded seems to be misconstructed - as if someone created a Stata dataset without actually using Stata, because Stata does not allow the user to create multiple variables with the same name. I am surprised that you don't get an error when you use the dataset.

    Have you tried
    Code:
    use dataset
    save newdataset, replace
    clear all
    use newdataset
    and then seen if the problem persists in the new copy of the data? I'm thinking perhaps that whatever has Stata confused may be resolved having Stata create a copy, especially since it didn't detect the problem when you read it in with the use command.
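
    One way to check whether Stata itself sees repeated names is to turn the variable list into a dataset and look for duplicates there. This is only a sketch: it assumes the file name from #1, and it is an assumption that the repeated names would actually appear in the output of describe for a file like this.

    ```stata
    * File name taken from post #1; adjust the path as needed
    use "ess_data_positions.dta", clear

    * Replace the data in memory with one observation per variable
    * (variables: position, name, type, isnumeric, format, vallab, varlab)
    describe, replace clear

    * Report and list any variable name that occurs more than once
    duplicates report name
    duplicates list name
    ```

    If duplicates report shows no surplus observations, Stata's own variable list holds each name only once, and the repetition would have to come from somewhere else.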



    • #3
      Thank you for the reply and please excuse my late reply.
      I was also thinking that the data must be misconstructed. I have never seen something like this before.

      Unfortunately, the problem is still there after trying your solution. So far the only solution that works is to drop each of the "tripled" variables twice.
      Maybe I will just send an email to the creators of the dataset.



      • #4
        Two thoughts come to mind.

        First, regardless of what is wrong with the dataset, it appears that Stata (or more precisely, the version of Stata you have installed) is not robust against the problems in the data. We note, for example, that if you try to import delimited a file with the same variable name used twice, Stata detects this and changes the name of the second such variable. Perhaps Stata should be taking similar actions when a Stata dataset is used - be a little less trusting, as it were. I think the folk at Stata Technical Services would find this an interesting problem - or else one they have encountered before! - and I would suggest you follow the guidance at https://www.stata.com/support/tech-support/ to bring it to their attention.

        Second, perhaps you could gain some insight by requesting another copy of your dataset, this time as a CSV file, and seeing whether it exhibits the same, or similar, issues. This has the advantage that you can open the file in a text editor, or for that matter, in Excel, and see what is going on.
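
        As a rough sketch of that check in Stata, assuming the CSV export were saved under the hypothetical name ess_data.csv with the variable names in the first row:

        ```stata
        * "ess_data.csv" is a hypothetical file name for the CSV export
        import delimited "ess_data.csv", varnames(1) clear

        * List every variable with its name and label
        describe

        * Search variable names and labels for one of the problem variables from #1
        lookfor dcsfwrk
        ```

        If the header row repeats a name, import delimited should rename the later copies as noted above, so comparing the describe output with the header row in a text editor would show where they are.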
