Unusual data grouping / transformation

I’m struggling with an algorithmic problem: how to transform or group data to get a specified output.

My input is a list of messages in a given order (from newest to oldest):

[
   {
      "id":5,
      "created_at":"2021-01-01 00:00:00",
      "message":"Lorem ipsum dolor sit amet...",
      "is_author":true,
      "meta_data":{
         
      }
   },
   {
      "id":4,
      "created_at":"2021-01-01 01:00:00",
      "message":"Lorem ipsum dolor sit amet...",
      "is_author":true,
      "meta_data":{
         
      }
   },
   {
      "id":3,
      "created_at":"2021-01-01 03:00:00",
      "message":"Lorem ipsum dolor sit amet...",
      "is_author":false,
      "meta_data":{
         
      }
   },
   {
      "id":2,
      "created_at":"2021-01-01 04:00:00",
      "message":"Lorem ipsum dolor sit amet...",
      "is_author":false,
      "meta_data":{
         
      }
   },
   {
      "id":1,
      "created_at":"2021-01-01 05:00:00",
      "message":"Lorem ipsum dolor sit amet...",
      "is_author":true,
      "meta_data":{
         
      }
   },
   {
      "id":0,
      "created_at":"2021-01-01 06:00:00",
      "message":"Lorem ipsum dolor sit amet...",
      "is_author":false,
      "meta_data":{
         
      }
   }
]

Desired output after transformation:

[
   {
      "is_author":true,
      "messages":[
         {
            "id":5,
            "created_at":"2021-01-01 00:00:00",
            "message":"Lorem ipsum dolor sit amet...",
            "meta_data":{
               
            }
         },
         {
            "id":4,
            "created_at":"2021-01-01 01:00:00",
            "message":"Lorem ipsum dolor sit amet...",
            "meta_data":{
               
            }
         }
      ]
   },
   {
      "is_author":false,
      "messages":[
         {
            "id":3,
            "created_at":"2021-01-01 03:00:00",
            "message":"Lorem ipsum dolor sit amet...",
            "meta_data":{
               
            }
         },
         {
            "id":2,
            "created_at":"2021-01-01 04:00:00",
            "message":"Lorem ipsum dolor sit amet...",
            "meta_data":{
               
            }
         }
      ]
   },
   {
      "is_author":true,
      "messages":[
         {
            "id":1,
            "created_at":"2021-01-01 05:00:00",
            "message":"Lorem ipsum dolor sit amet...",
            "meta_data":{
               
            }
         }
      ]
   },
   {
      "is_author":false,
      "messages":[
         {
            "id":0,
            "created_at":"2021-01-01 06:00:00",
            "message":"Lorem ipsum dolor sit amet...",
            "meta_data":{
               
            }
         }
      ]
   }
]

As you can see, each change in the is_author value starts a new group that collects the consecutive messages sharing that value.

Is there any efficient solution in JS or PHP to transform such data?

Answer

There’s much to be said for abstracting a bit. Here, we want to split our array up whenever something changes between two consecutive elements. That is the fundamental grouping, and if we think about it that way, we can layer the gathering of them into a single {is_author, messages} object on top of that grouping.

So we might write a generic function that splits our data into subarrays whenever some function of the previous and current items returns true. Then our main function would call that, passing a function that tests whether the is_author property is different, then reformat the generated groups after it returns. It could look like this:

const splitWhenever = (pred) => (xs) =>
  xs .length == 0 ? [] : xs .slice (1) .reduce (
    ((xss, x, i) => pred (xs [i], x) 
       ? [...xss, [x]] 
       : [...xss .slice (0, -1), [... xss [xss .length - 1], x]]
    ), [[xs [0]]]
  )

const transform = (input) => splitWhenever ((x, y) => x.is_author != y.is_author) (input)
  .map ((xs => ({
    is_author: xs [0] .is_author, 
    messages: xs .map (({is_author, ...rest}) => rest)
  })))

const input = [{id: 5, created_at: "2021-01-01 00:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}, {id: 4, created_at: "2021-01-01 01:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}, {id: 3, created_at: "2021-01-01 03:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}, {id: 2, created_at: "2021-01-01 04:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}, {id: 1, created_at: "2021-01-01 05:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}, {id: 0, created_at: "2021-01-01 06:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}]

console .log (transform (input))

This is more complex than the answer from navnath, but it builds on splitWhenever, which is reusable across this and other programs.
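
To illustrate the reuse claim, here is a small sketch that applies the same splitWhenever helper to two unrelated problems. The helper is copied from the answer so the snippet runs on its own; ascendingRuns and byInitial are hypothetical names introduced only for this example.

```javascript
// Same helper as in the answer: start a new subarray whenever
// pred (previous, current) returns true.
const splitWhenever = (pred) => (xs) =>
  xs .length == 0 ? [] : xs .slice (1) .reduce (
    (xss, x, i) => pred (xs [i], x)
      ? [...xss, [x]]
      : [...xss .slice (0, -1), [...xss [xss .length - 1], x]],
    [[xs [0]]]
  )

// Hypothetical use 1: break a list of numbers into ascending runs,
// splitting wherever the current value is smaller than the previous one.
const ascendingRuns = splitWhenever ((prev, curr) => curr < prev)

// Hypothetical use 2: group consecutive words that share a first letter.
const byInitial = splitWhenever ((a, b) => a [0] !== b [0])

console .log (ascendingRuns ([1, 2, 5, 3, 4, 2, 7]))
// → [[1, 2, 5], [3, 4], [2, 7]]

console .log (byInitial (['apple', 'avocado', 'banana', 'apricot']))
// → [['apple', 'avocado'], ['banana'], ['apricot']]
```

Note that, as with the messages, only consecutive items are grouped: 'apricot' ends up in its own group rather than joining the earlier 'a' words.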

In a follow-up to navnath’s answer, the OP asks, “[W]hat if I have processed a set of data and the application loads new data […] will it be possible to join the messages from the object id 6 to the existing group?” I commented there with a suggested change to do that with only minor tweaks to navnath’s code. But after rereading, I think that this new data was supposed to come before the existing data, because the existing ids are sorted descending and the new ones are larger than those. That means that my suggestion there would probably not do.

Here it takes a bit more work, as we have already split apart the grouping from the reformatting. This version, which still uses the same generic splitWhenever, will first flatten the existing structure back into the original format, prepend the new data, then run the whole transformation again. This may sound wasteful. Perhaps it is. It might be better to simply keep the original list, prepend to that, and then rerun the approach above. But since our transformation is reversible, this will work if it’s desired:

const splitWhenever = (fn) => (xs) =>
  xs .length == 0 ? [] : xs .slice (1) .reduce (
    ((xss, x, i) => fn (xs [i], x) 
       ? [...xss, [x]] 
       : [...xss .slice (0, -1), [... xss [xss .length - 1], x]]
    ), [[xs [0]]]
  )

const transform = (input, old = []) => splitWhenever ((x, y) => x.is_author != y.is_author) ([
  ...input, 
  ...old.flatMap (({is_author, messages}) => messages .map (msg => ({...msg, is_author})))
]).map ((xs => ({
  is_author: xs [0] .is_author, 
  messages: xs .map (({is_author, ...rest}) => rest)
})))

const input = [{id: 5, created_at: "2021-01-01 00:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}, {id: 4, created_at: "2021-01-01 01:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}, {id: 3, created_at: "2021-01-01 03:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}, {id: 2, created_at: "2021-01-01 04:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}, {id: 1, created_at: "2021-01-01 05:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}, {id: 0, created_at: "2021-01-01 06:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}]
const result = transform (input)
console .log ('Original data')
console .log (result)

const additional = [{id: 7, created_at: "2020-12-31 22:00:00", message: "Lorem ipsum dolor sit amet...", is_author: false, meta_data: {}}, {id: 6, created_at: "2020-12-31 22:00:00", message: "Lorem ipsum dolor sit amet...", is_author: true, meta_data: {}}]
const result2 = transform (additional, result)
console .log ('With additional results')
console .log (result2)
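
The simpler alternative mentioned earlier, keeping the original flat list and regrouping after each prepend, might look roughly like this. It is only a sketch: makeConversation and groupByAuthor are hypothetical names that are not part of the original answer, and the sample messages are trimmed to a few fields for brevity.

```javascript
// Same generic helper as in the answer.
const splitWhenever = (fn) => (xs) =>
  xs .length == 0 ? [] : xs .slice (1) .reduce (
    (xss, x, i) => fn (xs [i], x)
      ? [...xss, [x]]
      : [...xss .slice (0, -1), [...xss [xss .length - 1], x]],
    [[xs [0]]]
  )

// The grouping and reformatting step from the answer, given a name
// (hypothetical) so it can be rerun on the raw list.
const groupByAuthor = (msgs) =>
  splitWhenever ((x, y) => x .is_author != y .is_author) (msgs)
    .map ((xs) => ({
      is_author: xs [0] .is_author,
      messages: xs .map (({is_author, ...rest}) => rest)
    }))

// Hypothetical wrapper: holds the raw, ungrouped messages and regroups
// on demand, so nothing ever needs to be flattened back.
const makeConversation = (initial = []) => {
  let raw = [...initial]
  return {
    // Newer messages go in front, matching the newest-first ordering.
    prepend: (newer) => { raw = [...newer, ...raw] },
    grouped: () => groupByAuthor (raw)
  }
}

const conversation = makeConversation ([
  {id: 5, message: 'c', is_author: true},
  {id: 4, message: 'b', is_author: true},
  {id: 3, message: 'a', is_author: false}
])

conversation .prepend ([
  {id: 7, message: 'e', is_author: false},
  {id: 6, message: 'd', is_author: true}
])

console .log (conversation .grouped () .map ((g) => g .messages .map ((m) => m .id)))
// → [[7], [6, 5, 4], [3]]
```

Message 6 joins the existing group that starts with message 5 because both have is_author set to true, while message 7 forms a new group in front. The cost is keeping the raw list in memory alongside whatever grouped view is rendered.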
User contributions licensed under: CC BY-SA