I have strings with the following pattern:
adfadfadfadfadfadfafdadfa"externalId":"UCEjBDKfrqQI4TgzT9YLNT8g"afadfadfafadfdaffzfzfzxf
Basically, I need to find "externalId" and extract its value from between the quotes that follow. The length of the value can change, so it needs to be everything inside the two quotes. In this case the desired outcome is to return:
UCEjBDKfrqQI4TgzT9YLNT8g
Here’s what I have so far:
$test = file_get_contents('https://www.youtube.com/c/GhostTownLiving');
$test = htmlentities($test);
if (strpos($test, 'externalId') !== false) {
    echo 'true';
}
I tried Advanced HTML Dom, but since the externalId property in these YouTube channel pages is loaded via JavaScript, I couldn’t target it successfully.
Basically, I’m using htmlentities to return the code, and then I’d like to extract the externalId value.
How can I write a regex pattern to match that? Thank you!
Answer
Parse out the whole JSON, then decode it and traverse through to the value you’re after.
<?php
$test = file_get_contents('https://www.youtube.com/c/GhostTownLiving');

// match the ytInitialData JSON; stop early if the page layout has changed
if (!preg_match('#var ytInitialData = {(.*?)};</script>#', $test, $matches)) {
    exit('ytInitialData not found');
}

// add back the surrounding {}'s, and parse
$ytInitialData = json_decode('{'.$matches[1].'}');

// then you have that massive object easily accessible
echo $ytInitialData->metadata->channelMetadataRenderer->externalId; // UCEjBDKfrqQI4TgzT9YLNT8g
Though, if you can obtain that from the API, it’s friendlier than scraping.
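If you only want the regex the question literally asks for, here is a minimal sketch. It assumes the key and value appear in the raw text exactly as in the sample string, with straight double quotes. The helper name extractExternalId is made up for illustration. Note that htmlentities turns " into &quot;, so the pattern should be run on the raw response, before any htmlentities call.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the value
// that follows "externalId": in $text, or null when the key is not present.
function extractExternalId(string $text): ?string
{
    // [^"]+ captures everything up to the next double quote,
    // so the value can be of any length.
    if (preg_match('/"externalId"\s*:\s*"([^"]+)"/', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Shortened version of the sample string from the question.
$sample = 'adfadf"externalId":"UCEjBDKfrqQI4TgzT9YLNT8g"afadf';
echo extractExternalId($sample), PHP_EOL; // UCEjBDKfrqQI4TgzT9YLNT8g
```

This is more fragile than decoding the JSON, since it returns the first "externalId" it finds anywhere in the page, but it does not depend on the surrounding ytInitialData structure.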