  • Generate a new binary variable from different variables in a dataset

    Hello everyone,
    I have a dataset with many “employment status” variables: for example, full-time employment is variable A, part-time employment is variable B, and unemployment is variable C. I need to combine them into one dichotomous variable (full-time or part-time employment = 1; unemployment = 0), but I don't know which command to use. I hope someone can give me some guidance. Thank you very much.

    By the way, the variables are stored as strings. Should I convert them with the "encode" command first?
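    A small sketch on made-up data (the variable names x, x_encoded and x_numeric are hypothetical) of why -destring- is usually the right conversion for strings holding 0/1 values while -encode- is not: -encode- assigns sequential integer codes starting at 1 to the distinct strings in sorted order and keeps the text as value labels, whereas -destring- reads the text itself as a number.

```stata
clear
* Hypothetical toy variable stored as text
input str1 x
"0"
"1"
"1"
end

* encode: codes follow the sorted distinct strings, so "0" -> 1 and "1" -> 2
encode x, gen(x_encoded)

* destring: the text is read as a number, so "0" -> 0 and "1" -> 1
destring x, gen(x_numeric)

* Show the underlying numeric codes rather than the value labels
list x x_encoded x_numeric, nolabel
```

    Because every encoded value is 1 or more, an expression such as (A | B) applied to encoded variables would be true for every nonmissing observation.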

    Thanks very much and have a nice day!

  • #2
    Assuming your variables A, B, and C are already dichotomous, you should first destring each one and then generate the new dichotomous variable using:

    Code:
    gen wanted = (A | B) & !C
    Last edited by Ali Atia; 13 Dec 2020, 14:47.
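    A minimal, self-contained sketch of the approach in #2, assuming A, B and C are strings holding "0"/"1" (the four toy observations are made up for illustration):

```stata
clear
* Hypothetical toy data: A = full-time, B = part-time, C = unemployed, stored as strings
input str1 A str1 B str1 C
"1" "0" "0"
"0" "1" "0"
"0" "0" "1"
"0" "0" "0"
end

* Convert the "0"/"1" strings to numeric 0/1
destring A B C, replace

* 1 if full-time or part-time and not unemployed, 0 otherwise
gen wanted = (A | B) & !C
list
```

    Note that the last observation, which is in none of the three statuses (for example, out of the labour force), also gets wanted = 0, the same as an unemployed person; if that distinction matters, such observations could be set to missing instead.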

    • #3
      You are not going to get useful advice without showing your data. Without knowing what values your original variables take, it is not possible to tell you how to generate the dummy.

      Check -dataex- and provide a sample of your data.
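      For reference, a minimal -dataex- call on Stata's bundled auto data (chosen only because it ships with Stata). -dataex- is built into recent versions of Stata; on older versions it can be installed with ssc install dataex.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Print the first 5 observations of two variables as code that recreates them
dataex make price in 1/5
```

      Applied to this thread, something like dataex pab0001_v3 pab0002 pab0003 pab0004 pab0011 in 1/20 would produce a listing that can be pasted into a post between [code] delimiters.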

      • #4
        @Joro Kolev, thank you very much for your reply.

        The variables in the database come from asking respondents, using an "activity calendar", about their employment status (among other things) in every month of the previous calendar year. Thus each employment status (full-time employed, part-time employed, unemployed, ...) has 12 values, one per month (1 = in this status that month, 0 = not in this status). They are already dichotomous, but I don't know how to combine five variables, each holding 12 monthly values, into one dichotomous variable.

        A description of the variables:
        pab0001_v3 = full-time employment, Jan-Dec prev. year
        pab0002 = part-time employment, Jan-Dec prev. year
        pab0003 = short-time work, Jan-Dec prev. year
        pab0011 = Mini-Job, Jan-Dec prev. year
        pab0004 = registered unemployed, Jan-Dec prev. year

        Using full-time employment (pab0001_v3) as an example, the values of each variable look like this (string data, 0/1 coding):
        [Screenshots of pab0001_v3 values: 2020-12-14 10.22.57.png, 2020-12-14 10.16.27.png]

        I hope I have described my problem clearly. Thank you very much for your help, and have a nice day.
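        If, as assumed in #6 (to be checked against the codebook), each month takes two characters with "01" meaning yes and "00" meaning no, then the value for month m is character 2m of the string. A sketch on one made-up string, with hypothetical new variables ft1-ft12 and ft_months, that extracts the monthly values without reshaping:

```stata
clear
* One made-up 24-character string: Jan-Mar coded "01", Apr-Dec coded "00"
input str24 pab0001_v3
"010101000000000000000000"
end

* Month m sits in characters 2m-1 and 2m; its 0/1 value is character 2m
forvalues m = 1/12 {
    gen ft`m' = real(substr(pab0001_v3, 2*`m', 1))
}

* Number of months in full-time employment during the year
egen ft_months = rowtotal(ft1-ft12)
list ft1-ft4 ft_months
```

        Here the first three months are coded "01", so ft1-ft3 are 1 and ft_months is 3.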

        • #5
          @Ali Atia, thanks a lot for your reply. I used the "encode" command to convert the strings to numbers and then tried the "generate" command, but it did not work. I don't know what's wrong; perhaps the screenshots below show the problem better. (I explained the variables in my reply above.)

          [Screenshots of the error messages: 2020-12-14 10.57.07.png, 2020-12-14 10.57.16.png]
          There are two types of errors shown above, but I don't know how to fix them. I look forward to your reply. Have a nice day.

          • #6
            With your data, I think a better approach would be to split each variable into single-month values (e.g., pab0001_1, pab0001_2, etc.), convert them with the destring command, and then reshape the data into long format, at which point it is much easier to generate the variable you're looking for. I'm not sure encode is helpful in this case. I'm a little confused by your data example (which is easier to understand if you paste the output from dataex into the forum between [code] delimiters, rather than as a screenshot). The pab0001 variable has 24 digits rather than 12: are 1s coded as 01 and 0s as 00? Assuming that's the case, here's something to try:

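            Code:
            * Split each 24-character string into 12 two-character months
            foreach v of varlist pab0001_v3 pab0002 pab0003 pab0004 pab0011 {
                forvalues x = 1(2)24 {
                    gen `v'`x' = substr(`v', `x', 2)         // two-character chunk for this month
                    replace `v'`x' = substr(`v'`x', 2, 1)    // keep the second digit ("01" -> "1")
                    destring `v'`x', replace
                }
            }
            * Drop the original strings, whose names are the same as the reshape stubs
            drop pab0001_v3 pab0002 pab0003 pab0004 pab0011
            gen id = _n
            * One row per person-month; j takes the odd values 1, 3, ..., 23
            reshape long pab0001_v3 pab0002 pab0003 pab0004 pab0011, i(id) j(month)
            replace month = (month + 1)/2                      // map 1, 3, ..., 23 to months 1-12
            * 1 if in any kind of employment that month and not registered unemployed
            gen emp = (pab0001_v3 | pab0002 | pab0003 | pab0011) & !pab0004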
            You'll need to clean values shown in #4 like -5 and 000000000000000000000808 before running this code.
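            Before running the code above, the unexpected values can be located. A sketch, assuming that a valid string is exactly twelve "00"/"01" pairs (what codes such as -5 or 08 actually mean should be checked in the survey documentation); the flag variable bad is hypothetical:

```stata
clear
* Made-up values, including the two problem cases mentioned above
input str24 pab0001_v3
"010101000000000000000000"
"-5"
"000000000000000000000808"
end

* Flag strings that are not exactly twelve "00"/"01" pairs
gen byte bad = !ustrregexm(pab0001_v3, "^(0[01]){12}$")
list

* One possible cleaning rule (an assumption): blank flagged strings so that
* the monthly values become missing after destring
replace pab0001_v3 = "" if bad
```

            Blanking is only one option; which rule is appropriate depends on what the codes mean. Without any cleaning, the loop above would read "-5" as January = 5, which the logical operators then treat as true.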
            Last edited by Ali Atia; 14 Dec 2020, 03:56.
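            One more point about the last line of the code in #6: Stata's logical operators treat a missing value as true, so a month whose status is partly missing can end up with emp = 1. A sketch on made-up long data (ft, pt and unemp are hypothetical stand-ins for the reshaped pab variables; emp_naive, emp_safe and emp_any are hypothetical names) comparing a naive and a missing-aware version, with an optional yearly indicator:

```stata
clear
* Made-up long data: one row per person-month, as after the reshape
input id month ft pt unemp
1 1 1 0 0
1 2 . 0 0
2 1 0 0 1
2 2 0 0 1
end

* Naive version: the missing ft in row 2 counts as "true", so emp_naive = 1
gen emp_naive = (ft | pt) & !unemp

* Missing-aware version: leave the result missing when any input is missing
gen emp_safe = (ft | pt) & !unemp if !missing(ft, pt, unemp)

* Optional: 1 if employed in at least one month with known status
bysort id (month): egen emp_any = max(emp_safe)
list
```

            egen's max() ignores missing values, so emp_any reflects only months whose status is known.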
