Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creation of an indicative variable

    Hello everyone.

    I hope you can help me with a problem I am currently working on. I have a database that contains employment and university information in Ecuador. The database is in a wide format and has a unique identifier per person. The main variable is the salary in each month worked. So I have a period from 2012 to 2022, I have at least 132 salary variables (salary1, salary2, salary3, etc). Suffix 1 refers to January 2012, while suffix 132 refers to December 2022. Now I have another variable that tells me the graduation date of each person. I need to create a variable that tells me the starting salary (the first salary received after graduation). I tried this syntax, but I have cases where the person did not receive a salary on the graduation date (I have missing values). So I need it to look for the first salary after their graduation date. My graduation date variable also uses the same logic in the period, it goes from 1 to 132. Here is an example of the format of my database. I welcome any comments or help.

    gen primer_sueldo = .
    forval i = 1/132 {
    replace primer_sueldo = sueldo`i' if tiempo >= `i' & primer_sueldo == .
    }


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id sueldo1 sueldo2 sueldo3 sueldo4 sueldo5 fecha_grad)
     1 1181    1209    1295   918     588    2
     2    .            .            .     804         .     1
     3    .          948         .         .      910    3
     4  630.    1099     934  1229   1181  10
     5  735       887         .   1079     929   7
     6    .        1204         .     682         .   10
     7  924     1186     978    616         .     3
     8  536          .     1028        .    1366    6
     9  586     1316     568  1140   1309    3
    10    .       1302     574  1150     694    1
    end

  • #2
    I don't know what tiempo is doing in the code, but consider this

    Code:
    clear
    input float(id sueldo1 sueldo2 sueldo3 sueldo4 sueldo5 fecha_grad)
     1 1181    1209    1295   918     588    2
     2    .            .            .     804         .     1
     3    .          948         .         .      910    3
     4  630.    1099     934  1229   1181  10
     5  735       887         .   1079     929   7
     6    .        1204         .     682         .   10
     7  924     1186     978    616         .     3
     8  536          .     1028        .    1366    6
     9  586     1316     568  1140   1309    3
    10    .       1302     574  1150     694    1
    end
    
    gen primer_sueldo = .
    forval i = 1/5 {
    replace primer_sueldo = sueldo`i' if  `i' >= fecha_grad & primer_sueldo == .
    }
    
    list
    
         +----------------------------------------------------------------------------+
         | id   sueldo1   sueldo2   sueldo3   sueldo4   sueldo5   fecha_~d   primer~o |
         |----------------------------------------------------------------------------|
      1. |  1      1181      1209      1295       918       588          2       1209 |
      2. |  2         .         .         .       804         .          1        804 |
      3. |  3         .       948         .         .       910          3        910 |
      4. |  4       630      1099       934      1229      1181         10          . |
      5. |  5       735       887         .      1079       929          7          . |
         |----------------------------------------------------------------------------|
      6. |  6         .      1204         .       682         .         10          . |
      7. |  7       924      1186       978       616         .          3        978 |
      8. |  8       536         .      1028         .      1366          6          . |
      9. |  9       586      1316       568      1140      1309          3        568 |
     10. | 10         .      1302       574      1150       694          1       1302 |
         +----------------------------------------------------------------------------+
    .
    For your full dataset, use 132 not 5.

    Comment


    • #3
      Thank you, apparently it worked perfectly. However, I have a query in the event that I want to create a variable called t1 that is the average of the 12 salaries after the hitch-up salary. Since I need to see the behavior of wages in the future after graduating. I need to do the same with at least 10 future periods. That is, create 10 variables called t1, t2, t3,..., t10 and that each one has the average of the next 12 salaries successively. Is this possible in any way?

      Specifically, I need my variable:

      T1: average of the 12 salaries following the down payment salary

      T2: average of the 12 salaries following the salary in T1

      T3: average of the 12 salaries following the salary in T2

      Until you reach at least T10.

      Any help would be perfect for me. Thank you

      Comment


      • #4
        I recommend strongly that you reshape long.

        Then you won't need loops ad hoc for each little problem.

        Sample calculations here use rangestat from SSC. The main point is that you can look at the next 12 observations. In the data example in #1 we don't have that many, but the principle is good.

        Code:
        clear
        input float(id sueldo1 sueldo2 sueldo3 sueldo4 sueldo5 fecha_grad)
         1 1181    1209    1295   918     588    2
         2    .            .            .     804         .     1
         3    .          948         .         .      910    3
         4  630.    1099     934  1229   1181  10
         5  735       887         .   1079     929   7
         6    .        1204         .     682         .   10
         7  924     1186     978    616         .     3
         8  536          .     1028        .    1366    6
         9  586     1316     568  1140   1309    3
        10    .       1302     574  1150     694    1
        end
        
        reshape long sueldo, i(id) j(time)
        
        list, sepby(id)
        
        rangestat (count) count=sueldo (mean) mean=sueldo, int(time 1 12) by(id)
        
        gen after = time > fecha_grad 
        
        list, sepby(id)
        
        
             +-----------------------------------------------------------+
             | id   time   sueldo   fecha_~d   count        mean   after |
             |-----------------------------------------------------------|
          1. |  1      1     1181          2       4      1002.5       0 |
          2. |  1      2     1209          2       3   933.66667       0 |
          3. |  1      3     1295          2       2         753       1 |
          4. |  1      4      918          2       1         588       1 |
          5. |  1      5      588          2       .           .       1 |
             |-----------------------------------------------------------|
          6. |  2      1        .          1       1         804       0 |
          7. |  2      2        .          1       1         804       1 |
          8. |  2      3        .          1       1         804       1 |
          9. |  2      4      804          1       0           .       1 |
         10. |  2      5        .          1       .           .       1 |
             |-----------------------------------------------------------|
         11. |  3      1        .          3       2         929       0 |
         12. |  3      2      948          3       1         910       0 |
         13. |  3      3        .          3       1         910       0 |
         14. |  3      4        .          3       1         910       1 |
         15. |  3      5      910          3       .           .       1 |
             |-----------------------------------------------------------|
         16. |  4      1      630         10       4     1110.75       0 |
         17. |  4      2     1099         10       3   1114.6667       0 |
         18. |  4      3      934         10       2        1205       0 |
         19. |  4      4     1229         10       1        1181       0 |
         20. |  4      5     1181         10       .           .       0 |
             |-----------------------------------------------------------|
         21. |  5      1      735          7       3         965       0 |
         22. |  5      2      887          7       2        1004       0 |
         23. |  5      3        .          7       2        1004       0 |
         24. |  5      4     1079          7       1         929       0 |
         25. |  5      5      929          7       .           .       0 |
             |-----------------------------------------------------------|
         26. |  6      1        .         10       2         943       0 |
         27. |  6      2     1204         10       1         682       0 |
         28. |  6      3        .         10       1         682       0 |
         29. |  6      4      682         10       0           .       0 |
         30. |  6      5        .         10       .           .       0 |
             |-----------------------------------------------------------|
         31. |  7      1      924          3       3   926.66667       0 |
         32. |  7      2     1186          3       2         797       0 |
         33. |  7      3      978          3       1         616       0 |
         34. |  7      4      616          3       0           .       1 |
         35. |  7      5        .          3       .           .       1 |
             |-----------------------------------------------------------|
         36. |  8      1      536          6       2        1197       0 |
         37. |  8      2        .          6       2        1197       0 |
         38. |  8      3     1028          6       1        1366       0 |
         39. |  8      4        .          6       1        1366       0 |
         40. |  8      5     1366          6       .           .       0 |
             |-----------------------------------------------------------|
         41. |  9      1      586          3       4     1083.25       0 |
         42. |  9      2     1316          3       3   1005.6667       0 |
         43. |  9      3      568          3       2      1224.5       0 |
         44. |  9      4     1140          3       1        1309       1 |
         45. |  9      5     1309          3       .           .       1 |
             |-----------------------------------------------------------|
         46. | 10      1        .          1       4         930       0 |
         47. | 10      2     1302          1       3         806       1 |
         48. | 10      3      574          1       2         922       1 |
         49. | 10      4     1150          1       1         694       1 |
         50. | 10      5      694          1       .           .       1 |
             +-----------------------------------------------------------+

        What "hitch-up" and "down payment" mean for you I don't know. I see no such mentions in #1.


        Comment


        • #5
          Thank you very much for the reply. However, due to data processing and computer capacity issues, they are asked to work with data in wide format. With hitch-up" and "down payment I mean the starting salary. My question is if keeping the database format in wide doesn't this case have a solution?

          Comment


          • #6
            Naturally there is a way to solve the problem with loops, but having shown you a relatively simple solution, I am not inclined to think up a more complicated one. Sorry if that's disappointing.

            Computer capacity is most unlikely to be an issue here as it's the same data in different layout.

            Comment

            Working...
            X