Is there any way to find a specific pixel area that is surrounded by a black border with PHP and Imagick?

Question

I've been trying to use Imagick to turn PDF files in my PHP application into PNGs so that I can get Tesseract OCR's PHP library to scan only handwritten text in the documents. The handwritten text areas are surrounded by a black border in the documents, and there's a chance that they could b…

Accepted Answer

You can do that on the ImageMagick command line using connected components. I do not know whether PHP Imagick can do that, but you can check. Otherwise, just use PHP exec() to run my command.

The following is Unix syntax, with Ef82Whf_d.webp as the input image:

    bbox=`convert Ef82Whf_d.webp -threshold 75% -type bilevel -define connected-components:exclude-header=true -define connected-components:area-threshold=500 -define connected-components:keep-top=1 -define connected-components:verbose=true -define connected-components:mean-color=true -connected-components 8 null: | grep "gray(255)" | tail -n +2 | awk '{print $2}'`
    convert Ef82Whf_d.webp -crop $bbox +repage -shave 5x5 textbox_crop.png
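To make the grep/tail/awk part of that pipeline easier to follow, here is a minimal sketch that runs the same filter on a made-up listing instead of a real image. It assumes the verbose connected-components output has one object per line in the form "id: WxH+X+Y centroid area color", sorted by area from largest to smallest. The function names second_white_bbox and sample_listing, and every number in the sample, are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Same filter as in the command above: keep the white regions, skip the
# first (largest) one, which is assumed to be the page background, and
# print the bounding-box field (second column) of what remains.
second_white_bbox() {
    grep 'gray(255)' | tail -n +2 | awk '{print $2}'
}

# Made-up listing (hypothetical values, not taken from a real image):
#   object 0 = white page background
#   object 1 = black border around the text box
#   object 2 = white interior of the text box
sample_listing() {
    cat <<'EOF'
  0: 1000x1400+0+0 499.5,699.5 1269800 gray(255)
  1: 620x210+190+880 499.5,984.5 8200 gray(0)
  2: 610x200+195+885 499.5,984.5 122000 gray(255)
EOF
}

bbox=$(sample_listing | second_white_bbox)
echo "$bbox"
```

The printed geometry is what the second convert call would receive as its -crop argument. If a page has more than one bordered box, this filter prints one geometry per line, so the crop step would have to loop over them.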

Answer