I want to extract the value of a specific cell from a table in a web page. First I search for a string (here a player's name), and then I want to get the value of the associated <td> cell (here 94).
I can connect to the web page, find the table by its id and get all the values. I can also search for a specific string with preg_match, but I can't extract the value of the <td> cell.
What is the best way to extract a value from a table using a match expression?
Here is my script:
<?php
// Connect to the web page
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTMLFile('https://www.basketball-reference.com/leaders/trp_dbl_career.html');
$xpath = new DOMXPath($doc);

// Extract the table by its id
$table = $xpath->query("//*[@id='nba']")->item(0);

// See result in HTML
//$tableResult = $doc->saveHTML($table);
//print $tableResult;

// Get elements by tags and build a string
$str = "";
$rows = $table->getElementsByTagName("tr");
foreach ($rows as $row) {
    $cells = $row->getElementsByTagName('td');
    foreach ($cells as $cell) {
        $str .= $cell->nodeValue;
    }
}

// Search a specific string (here a player's name)
$player = preg_match('/LeBron James(.*)/', $str, $matches);

// Get the value
$playerValue = intval(array_pop($matches));
print $playerValue;
?>
Here is the HTML structure of the table:
<table id="nba">
  <thead><tr><th>Rank</th><th>Player</th><th>Trp Dbl</th></tr></thead>
  ...
  <tr>
    <td>5.</td>
    <td><strong><a href="/players/j/jamesle01.html">LeBron James</a></strong></td>
    <td>94</td>
  </tr>
  ...
</table>
Answer
DOM manipulation solution.
Loop over all the cells and stop as soon as a cell contains the value LeBron James; the next cell in the same row holds the number you want.
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTMLFile('https://www.basketball-reference.com/leaders/trp_dbl_career.html');
$xpath = new DOMXPath($doc);
$table = $xpath->query("//*[@id='nba']")->item(0);
$rows = $table->getElementsByTagName("tr");
$trpDbl = null;
foreach ($rows as $row) {
    $cells = $row->getElementsByTagName('td');
    foreach ($cells as $cell) {
        if (preg_match('/LeBron James/', $cell->nodeValue)) {
            // Skip whitespace text nodes to reach the next <td> element
            $next = $cell->nextSibling;
            while ($next !== null && $next->nodeType !== XML_ELEMENT_NODE) {
                $next = $next->nextSibling;
            }
            $trpDbl = $next !== null ? $next->nodeValue : null;
            // Leave both loops once the player is found
            break 2;
        }
    }
}
print($trpDbl);
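As a possible alternative to the nested loops, the lookup can be pushed into a single XPath query. This is only a sketch: it runs against the sample markup from the question instead of the live page, findValueAfter is a hypothetical helper name, and it assumes the player's name contains no single quote (which would break the query string).

```php
<?php
// Sample markup copied from the question; the live page is assumed to have the same shape.
$html = <<<'HTML'
<table id="nba">
  <thead><tr><th>Rank</th><th>Player</th><th>Trp Dbl</th></tr></thead>
  <tr>
    <td>5.</td>
    <td><strong><a href="/players/j/jamesle01.html">LeBron James</a></strong></td>
    <td>94</td>
  </tr>
</table>
HTML;

$doc = new DOMDocument;
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: returns the text of the <td> that follows the cell
// containing $name, or null when no such cell exists.
// Assumes $tableId and $name contain no single quotes.
function findValueAfter(DOMXPath $xpath, string $tableId, string $name): ?string
{
    $query = sprintf(
        "//table[@id='%s']//td[contains(normalize-space(.), '%s')]/following-sibling::td[1]",
        $tableId,
        $name
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

print findValueAfter($xpath, 'nba', 'LeBron James'); // prints 94
```

The following-sibling::td[1] step selects only element siblings, so whitespace text nodes between the cells do not get in the way.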
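Alternatively, a regular expression can match the whole cell that contains the name LeBron James. Here $str must hold the table's HTML (for example the output of $doc->saveHTML($table)), not the concatenated nodeValue text from the question, since that text contains no tags.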
$player = preg_match('/<td>(.*LeBron James.*)<\/td>/', $str, $matches);
If you also want to capture the value 94 from the next cell, you can use this expression:
$player = preg_match('/<td>(.*LeBron James.*)<\/td>\s*<td>(.*)<\/td>/', $str, $matches);
It returns two groups: $matches[1] is the cell with the player's name and $matches[2] is the value from the next cell.
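To see what the two groups contain, here is a small sketch that runs the expression against the sample row from the question. It assumes each <td> sits on its own line, as in that sample; because . does not match newlines, the greedy .* then stays inside one cell. If the real page puts a whole row on one line, the groups would capture more than intended.

```php
<?php
// Sample row copied from the question (one <td> per line is an assumption).
$html = <<<'HTML'
<tr>
  <td>5.</td>
  <td><strong><a href="/players/j/jamesle01.html">LeBron James</a></strong></td>
  <td>94</td>
</tr>
HTML;

$pattern = '/<td>(.*LeBron James.*)<\/td>\s*<td>(.*)<\/td>/';

$name = null;
$value = null;
if (preg_match($pattern, $html, $matches)) {
    // Group 1 still contains the <strong> and <a> markup, so strip it.
    $name = trim(strip_tags($matches[1]));
    // Group 2 is the text of the following cell.
    $value = intval($matches[2]);
}

print "$name: $value\n"; // prints "LeBron James: 94"
```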