What is the correct way to read a large JSON API r…

I currently have the following Guzzle 6 implementation returning a stream of JSON data containing user data:

$client = new GuzzleHttp\Client([
        'base_uri' => 'https://www.apiexample.com',
        'handler' => $oauthHandler,
        'auth'    => 'oauth',
        'headers' => [
            'Authorization' => 'Bearer xxxxxxxxxxxxxx',
            'Content-Type' => 'application/json',
            'Accept' => 'application/json',
        ],
]);

$res = $client->post('example');
$stream = GuzzleHttp\Psr7\stream_for($res->getBody());

The JSON responses look like:

{
    "name": "Users",
    "record": [
        {
            "title": "Consulting",
            "_name": "Users",
            "end date": "07/03/2020",
            "session number": "1",
            "start date": "09/02/2019",
            "course id": "2900",
            "first name": "John",
            "unique user number": "123456",
            "time": "08 AM",
            "last name": "Doe",
            "year": "19-20",
            "location name": "SD"
        },
        .........
    ],
    "@extensions": "activities,corefields,u_extension,u_stu_x,s_ncea_x,s_stu_crdc_x,c_locator,u_userfields,s_edfi_x"
}

This is being run for a number of clients using different API endpoints. Many of them return too many users for the entire JSON response to be loaded into RAM at once, which is why I am using a stream.

There may be a way to get the API to return chunks incrementally through multiple calls. However, from everything I have heard from the developers of the API, it appears that the response is intended to be consumed as one stream.

I am new to streaming an API response like this and am wondering what the correct approach is for iterating through the records. Looking at the Guzzle 6 docs, it appears that iteration happens by reading a given number of characters from the stream and working with that subsection:

http://docs.guzzlephp.org/en/stable/psr7.html#streams

use GuzzleHttp\Psr7;

$stream = Psr7\stream_for('string data');
echo $stream;
// string data
echo $stream->read(3);
// str
echo $stream->getContents();
// ing data
var_export($stream->eof());
// true
var_export($stream->tell());
// 11

I could potentially write something that parses the strings in subsections through pattern matching and incrementally writes the data to disk as I move through the response. But it seems like that would be error prone and something that would be part of Guzzle 6.

Can you provide an example of how something like this should work or point out where I might be missing something?

I appreciate it, thanks!

Answer

"But it seems like that would be error prone and something that would be part of Guzzle 6."

Nope, Guzzle is an HTTP client; it has nothing to do with parsing different response formats.

What you need is a JSON streaming parser. Please take a look at this SO question, and also at these libraries: https://github.com/salsify/jsonstreamingparser, https://github.com/clue/php-json-stream, https://github.com/halaxa/json-machine.
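
To illustrate what such a parser does, here is a minimal sketch using json-machine, one of the libraries listed above. It assumes halaxa/json-machine 1.x is installed via Composer; the inline JSON is a made-up stand-in for the real response, and the "/record" pointer is taken from the response shape shown in the question. Treat it as an illustration rather than a drop-in solution.

```php
<?php
// Sketch only: assumes halaxa/json-machine ^1.0 is installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use JsonMachine\Items;
use JsonMachine\JsonDecoder\ExtJsonDecoder;

// Tiny stand-in for the API response (hypothetical data).
// A real response would be read from a file or stream, not from a string.
$json = '{"name":"Users","record":['
      . '{"first name":"John","last name":"Doe"},'
      . '{"first name":"Jane","last name":"Roe"}'
      . '],"@extensions":"activities,corefields"}';

// The JSON pointer "/record" tells the parser to iterate only the items of that array.
$records = Items::fromString($json, [
    'pointer' => '/record',
    'decoder' => new ExtJsonDecoder(true), // decode each record as an associative array
]);

$names = [];
foreach ($records as $record) {
    // Each iteration decodes a single record, not the whole document.
    $names[] = $record['first name'] . ' ' . $record['last name'];
}

echo implode(', ', $names), PHP_EOL;
```

The point is that the loop body only ever sees one record at a time, so memory use should stay roughly constant regardless of how many users the response contains.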

In Guzzle you have two options:

1. Read the response stream manually (as you do currently), but this probably requires manual integration with a JSON streaming parser.
2. Stream the whole response to a temporary file (see the "sink" request option) and read this file later with a JSON streaming parser. Reading from a file should be supported by all of the libraries.
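
Both options can be sketched roughly as follows, assuming Guzzle 6 and json-machine 1.x are installed via Composer. The function names are hypothetical helpers made up for this example, the "example" endpoint comes from the question, and the client is assumed to be configured as in the question. The first variant relies on Guzzle's "stream" request option and the StreamWrapper class from guzzlehttp/psr7 to hand the parser a plain PHP stream resource.

```php
<?php
// Sketch only: assumes guzzlehttp/guzzle ^6 and halaxa/json-machine ^1.0 via Composer.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\StreamWrapper;
use JsonMachine\Items;
use JsonMachine\JsonDecoder\ExtJsonDecoder;

/** Parser options shared by both variants: iterate "/record", decode records as arrays. */
function recordOptions(): array
{
    return ['pointer' => '/record', 'decoder' => new ExtJsonDecoder(true)];
}

/** Option 1: parse directly from the response stream, without a temporary file. */
function recordsFromResponseStream(Client $client, string $uri): iterable
{
    // 'stream' => true asks Guzzle not to download the whole body up front.
    $response = $client->post($uri, ['stream' => true]);
    // Convert the PSR-7 stream into a plain PHP stream resource for the parser.
    $resource = StreamWrapper::getResource($response->getBody());
    return Items::fromStream($resource, recordOptions());
}

/** Option 2, step 1: let Guzzle write the body to a file via the "sink" option. */
function downloadToFile(Client $client, string $uri, string $path): void
{
    $client->post($uri, ['sink' => $path]);
}

/** Option 2, step 2: iterate the records in the saved file one at a time. */
function recordsFromFile(string $path): iterable
{
    return Items::fromFile($path, recordOptions());
}

// Usage (hypothetical; $client configured as in the question):
// $path = tempnam(sys_get_temp_dir(), 'users_');
// downloadToFile($client, 'example', $path);
// foreach (recordsFromFile($path) as $record) {
//     echo $record['first name'], PHP_EOL;
// }
// unlink($path);
```

The temporary-file variant costs disk space but lets you re-read the data or retry parsing without calling the API again; the direct variant avoids the file but ties parsing to the lifetime of the HTTP connection.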

What is the correct way to read a large JSON API response in Guzzle 6?
