Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Nested to Long format

    Hi Gurus,

    my current dataset is like this:

    Code:
    clear
    input float n str307 basket
    1 `"{"sku":29,"quantity":1},{"sku":34,"quantity":5}"'                                                                                                                        
    2 `"{"sku":1,"quantity":1},{"sku":9,"quantity":1},{"sku":29,"quantity":1}"'                                                                                                  
    3 `"{"sku":4,"quantity":1}"'                                                                                                                                                 
    4 `"{"sku":28,"quantity":1}"'                                                                                                                                                
    5 `"{"sku":16,"quantity":1},{"sku":22,"quantity":1},{"sku":23,"quantity":1}"'                                                                                                
    6 `"{"sku":22,"quantity":1},{"sku":23,"quantity":1},{"sku":24,"quantity":1},{"sku":25,"quantity":1},{"sku":28,"quantity":1},{"sku":29,"quantity":1},{"sku":33,"quantity":2}"'
    end
    
    
    . list, noobs sep(0)
    
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | n                                                                                                                                                                    basket |
      |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
      | 1                                                                                                                           {"sku":29,"quantity":1},{"sku":34,"quantity":5} |
      | 2                                                                                                     {"sku":1,"quantity":1},{"sku":9,"quantity":1},{"sku":29,"quantity":1} |
      | 3                                                                                                                                                    {"sku":4,"quantity":1} |
      | 4                                                                                                                                                   {"sku":28,"quantity":1} |
      | 5                                                                                                   {"sku":16,"quantity":1},{"sku":22,"quantity":1},{"sku":23,"quantity":1} |
      | 6   {"sku":22,"quantity":1},{"sku":23,"quantity":1},{"sku":24,"quantity":1},{"sku":25,"quantity":1},{"sku":28,"quantity":1},{"sku":29,"quantity":1},{"sku":33,"quantity":2} |
      +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    and need to convert to Long format:

    Code:
    input float n str3 sku float quantity
    1 "29" 1
    1 "34" 5
    2 "1"  1
    2 "9"  1
    2 "29" 1
    3 "4"  1
    4 "28" 1
    5 "16" 1
    5 "22" 1
    5 "23" 1
    6 "22" 1
    6 "23" 1
    6 "24" 1
    6 "25" 1
    6 "28" 1
    6 "29" 1
    6 "33" 1
    end
    
    . list, noobs sep(0)
    
      +--------------------+
      | n   sku   quantity |
      |--------------------|
      | 1    29          1 |
      | 1    34          5 |
      | 2     1          1 |
      | 2     9          1 |
      | 2    29          1 |
      | 3     4          1 |
      | 4    28          1 |
      | 5    16          1 |
      | 5    22          1 |
      | 5    23          1 |
      | 6    22          1 |
      | 6    23          1 |
      | 6    24          1 |
      | 6    25          1 |
      | 6    28          1 |
      | 6    29          1 |
      | 6    33          1 |
      +--------------------+
    Does anyone have any tips for this? ps.Mata code is welcome !

  • #2
    This seems to accomplish what you want.
    Code:
    split basket, parse("},{") generate(item)
    drop basket
    reshape long item, i(n) j(j)
    drop if item==""
    drop j
    generate sku = ustrregexs(1)  if ustrregexm(item,"sku.:(\d+)")
    generate quantity = real(ustrregexs(1)) if ustrregexm(item,"quantity.:(\d+)")
    drop item
    list, sepby(n)
    describe
    Code:
    . list, sepby(n)
    
         +--------------------+
         | n   sku   quantity |
         |--------------------|
      1. | 1    29          1 |
      2. | 1    34          5 |
         |--------------------|
      3. | 2     1          1 |
      4. | 2     9          1 |
      5. | 2    29          1 |
         |--------------------|
      6. | 3     4          1 |
         |--------------------|
      7. | 4    28          1 |
         |--------------------|
      8. | 5    16          1 |
      9. | 5    22          1 |
     10. | 5    23          1 |
         |--------------------|
     11. | 6    22          1 |
     12. | 6    23          1 |
     13. | 6    24          1 |
     14. | 6    25          1 |
     15. | 6    28          1 |
     16. | 6    29          1 |
     17. | 6    33          2 |
         +--------------------+
    
    . describe
    
    Contains data
     Observations:            17                  
        Variables:             3                  
    ------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------------------------------------
    n               float   %9.0g                
    sku             str2    %9s                  
    quantity        float   %9.0g                
    ------------------------------------------------------------------------------------------------
    Sorted by: n
         Note: Dataset has changed since last saved.
    
    .
    Added in edit: n==6 sku=="33" shows 2 in my results, 1 in the desired output in post #1, but 2 in the JSON input data in post #1.
    Last edited by William Lisowski; 01 Feb 2023, 12:43.

    Comment


    • #3
      yes, a Typo . n==6 sku=="33" shows 2.
      .
      Thanks, Willian it worked and this time you took 18 minutes to solve it ! You are getting even better.

      Regex often is the way to go.

      Comment


      • #4
        I'm known here for arguing in favor of regex, but perhaps if I had had some experience with the community-contributed tools for processing JSON input I would have pointed to one of them. I'm hoping Stata will provide built-in solutions that don't depend on me integrating python into my installation before I need to deal directly with JSON.

        But this would be a good time for anyone reading this and familiar with JSON to demonstrate how it could be processed for this seemingly simple dataset.

        Comment

        Working...
        X