PHP Regex Match Specific Pattern between Quotes

I have strings with the following pattern:

adfadfadfadfadfadfafdadfa"externalId":"UCEjBDKfrqQI4TgzT9YLNT8g"afadfadfafadfdaffzfzfzxf

Basically, I need to find "externalId" and extract its value from between the quotes that follow. The length of the value can vary, so the match needs to capture everything inside the two quotes. In this case the desired result is:

 UCEjBDKfrqQI4TgzT9YLNT8g

Here’s what I have so far:

$test = file_get_contents('https://www.youtube.com/c/GhostTownLiving');
$test = htmlentities($test);

if (strpos($test, 'externalId') !== false) {
    echo 'true';
}

I tried Advanced HTML Dom, but since the externalId property on these YouTube channel pages is loaded via JavaScript, I couldn't target it successfully.

Basically, I'm using htmlentities to return the page source, and then I'd like to extract the externalId value from it.

How can I write a regex pattern to match that? Thank you!

Answer

Parse out the whole JSON, then decode it and traverse through to the value you're after.

<?php
$test = file_get_contents('https://www.youtube.com/c/GhostTownLiving');

// match the ytInitialData JSON; stop if the page layout has changed
if (!preg_match('#var ytInitialData = {(.*?)};</script>#', $test, $matches)) {
    exit('ytInitialData not found');
}

// add back the surrounding {}'s, and parse
$ytInitialData = json_decode('{'.$matches[1].'}');

// then you have that massive object easily accessible
echo $ytInitialData->metadata->channelMetadataRenderer->externalId; // UCEjBDKfrqQI4TgzT9YLNT8g

Though, if you can obtain that value from the API, it's friendlier than scraping.
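
For the regex the question literally asks for, a minimal sketch might look like the following. The helper name extractExternalId is hypothetical, and the sample string is modeled on the one in the question. It assumes the raw page source is searched directly: htmlentities turns each double quote into &quot;, so a pattern written against literal quotes would not match the escaped output.

```php
<?php
// Hypothetical helper (not from the original post): returns the externalId
// value found in raw page source, or null when the key is absent.
function extractExternalId(string $html): ?string
{
    // Match the "externalId" key, allow optional whitespace around the colon,
    // then capture everything up to the next double quote.
    if (preg_match('/"externalId"\s*:\s*"([^"]+)"/', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Sample input modeled on the string in the question.
$sample = 'adfadfadf"externalId":"UCEjBDKfrqQI4TgzT9YLNT8g"afadfadf';
echo extractExternalId($sample), PHP_EOL; // UCEjBDKfrqQI4TgzT9YLNT8g
```

This is shorter than decoding the whole JSON, but it only works while the key and value sit next to each other as plain quoted strings, so the decode-and-traverse approach above is likely the more robust of the two.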

User contributions licensed under: CC BY-SA