I have a requirement to build a web scraping tool. The scraping part will be written in Python and the results will be displayed by a PHP application.
The results should be shown asynchronously in PHP while the Python scraper is still crawling the pages.
The client thinks Python is fast and the better choice for web scraping. Do you think mixing Python and PHP will still give fast results, or would it be faster/better to stick to PHP for the scraping as well?
And what are the preferred methods for exchanging data asynchronously between these two languages?
Thank you!
Answer
You can use Scrapy, which supports custom item exporters.
By subclassing the BaseItemExporter class, you can create an exporter that, for example, opens a WebSocket and sends each scraped item over it to your PHP application as soon as it is scraped. Alternatively, you could send the items with plain HTTP requests, or push them to a persistent message queue (such as RabbitMQ or Apache Kafka) and have the PHP application consume the items from the queue.
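As a rough illustration of the plain HTTP option, here is a minimal sketch that uses only the Python standard library. The endpoint URL and the function names `build_request` and `send_item` are made up for this example, and it assumes your PHP application exposes a script that accepts a JSON body via POST. You would call `send_item` for each item from wherever your scraper emits items, for instance from the `export_item` method of a custom exporter.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL of your own PHP receiver script.
PHP_ENDPOINT = "http://localhost:8000/receive.php"


def build_request(item, url=PHP_ENDPOINT):
    """Build a JSON POST request carrying one scraped item (a plain dict)."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_item(item, url=PHP_ENDPOINT, timeout=5):
    """POST one item to the PHP application and return the HTTP status code."""
    request = build_request(item, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

On the PHP side, the receiving script can read the raw body from `php://input` and decode it with `json_decode`. Note that one blocking request per item will slow the scraper down if the PHP endpoint is slow, which is one reason a message queue may be the better fit for larger crawls.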