Get word count for first line in doc/docx/pdf file

For my task, I have to get the total word count of an uploaded .doc, .docx or .pdf file. Then, I have to find the word count in the first line of the document and remove it from the total (since it is probably going to be the title).

I am using doccounter to get the total word count of a document, like this:

include "class.doccounter.php";

$doc = new DocCounter();
$doc->setFile("file.ext");

print_r($doc->getInfo());
echo ($doc->getInfo()->wordCount);

All that is left is to find the word count of the first line of the uploaded file. Any solutions including additional libraries or native implementations are welcome! Thank you!

Edit – Solution (Credit to Rustyjim):

$doc = new DocCounter();
$doc->setFile("file.pdf");
$text = $doc->getInfo()->toText; // Edited doccounter to return text as string
$array = explode("\n", $text); // each element holds one line of the text
echo $array[0]; // First line
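
To finish the original task (total word count minus the title line), the first line's words still have to be counted and subtracted. Below is a minimal sketch, assuming the document text is already available as a string, as in the edited `toText` property above. The helper name `wordCountWithoutFirstLine` is hypothetical and not part of doccounter. It counts both the total and the title with `str_word_count` so the two numbers follow the same counting rule, which may differ slightly from doccounter's own `wordCount`.

```php
<?php
declare(strict_types=1);

/**
 * Returns the word count of $text without its first non-empty line.
 * Hypothetical helper for illustration; not part of doccounter.
 */
function wordCountWithoutFirstLine(string $text): int
{
    // Split on any line ending (\n, \r\n, \r) so Windows-style files work too.
    $lines = preg_split('/\R/', $text);

    // Skip leading blank lines; treat the first line with content as the title.
    $titleWords = 0;
    foreach ($lines as $line) {
        if (trim($line) !== '') {
            $titleWords = str_word_count($line);
            break;
        }
    }

    // Count the whole text with the same function, then drop the title's words.
    return str_word_count($text) - $titleWords;
}

$text = "My Report Title\nThis is the first sentence.\nAnd a second one.";
echo wordCountWithoutFirstLine($text), PHP_EOL; // prints 9
```

Note that `str_word_count` is locale-dependent and treats digits and most non-ASCII characters as separators, so documents with numbers or non-Latin text may need a different counting rule.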

Answer

Maybe you can use explode on newlines like:

$array = explode("\n", $text); // $text is the document's text as a string

Then use the first element of the array to count the characters:

echo strlen($array[0]);
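
Since the question asks for words rather than characters, it may help to see the difference side by side. Here is a small sketch with a made-up first line: `strlen` returns the length in bytes, `str_word_count` returns the number of words, and splitting on whitespace is one possible alternative when the line contains digits or non-Latin text.

```php
<?php
$firstLine = "Annual Sales Report"; // made-up sample line

echo strlen($firstLine), PHP_EOL;         // 19 (bytes, not words)
echo str_word_count($firstLine), PHP_EOL; // 3 (words)

// Alternative: count whitespace-separated tokens instead.
$tokens = preg_split('/\s+/u', trim($firstLine), -1, PREG_SPLIT_NO_EMPTY);
echo count($tokens), PHP_EOL;             // 3
```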

Hope that helps

User contributions licensed under: CC BY-SA