Encoding issue with PHP while writing in a .csv file

I’m working with a PHP array that contains values parsed during an earlier scraping step (using Simple HTML DOM Parser). I can print / echo the values of this array, which contain special characters (é, à, è, etc.), without any trouble. The problem is the following:

When I use fwrite to save the values to a .csv file, some characters are not saved correctly. For example, Székesfehérvár is displayed correctly in my PHP view in HTML, but its accented characters come out broken in the .csv file that my PHP script generates.

I’ve already checked or tried several things in the PHP script:

  • The page I’m scraping seems to be UTF-8 encoded
  • My PHP script is also declared as UTF-8 in the header
  • I’ve tried a lot of iconv and mb_* encoding functions in different places in the code
  • NOTE: when I console.log my PHP array in JS (using json_encode), the characters are also broken. Could this be linked to the original encoding of the page I’m scraping?

Here’s the part of the script that writes the values to a .csv file:

JavaScript

I am currently stuck because I can’t save values with accented characters correctly.

Answer

The solution (provided by @misorude):

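When scraping HTML content from web pages, there is a difference between what is displayed in your debug output and what the script has actually scraped. A browser renders HTML entities as the characters they stand for, so an echoed value can look correct on the page even though the underlying string still contains entities. I had to use html_entity_decode so that PHP works with the true value of the HTML I’ve scraped, and not the browser’s interpretation of it.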
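
As a rough illustration of that fix, here is a minimal sketch that decodes each scraped value before writing it out. The helper names (decode_scraped_value, write_rows_to_csv), the sample row and the output path are hypothetical, and the entity-encoded sample string is only an assumption about what the scraped data may look like. It also uses fputcsv instead of the fwrite call from the question, purely for convenience.

```php
<?php
// Hypothetical helper: turn entity-encoded scraped text into real UTF-8 characters.
function decode_scraped_value(string $value): string
{
    // ENT_QUOTES | ENT_HTML5 decodes named and numeric entities, quotes included.
    return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Hypothetical helper: decode every cell, then write the rows to a CSV file.
function write_rows_to_csv(array $rows, string $path): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    foreach ($rows as $row) {
        // Separator, enclosure and escape are passed explicitly.
        fputcsv($handle, array_map('decode_scraped_value', $row), ',', '"', '\\');
    }
    fclose($handle);
}

// Sample data standing in for the scraped array (assumed shape).
$rows = [['Sz&eacute;kesfeh&eacute;rv&aacute;r', 'Hungary']];
write_rows_to_csv($rows, sys_get_temp_dir() . '/cities.csv');
```

If the file is meant to be opened in a spreadsheet application, the UTF-8 bytes may still be misread there. That is a separate display issue and not something this sketch addresses.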

To check that the values have been retrieved correctly before storing them somewhere, you could console.log them in JS and see whether they come through as expected:

PHP

JavaScript

Javascript (to test):

JavaScript
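
The console check described above could be sketched along these lines. This is not the code from the original answer: console_log_tag is a hypothetical helper and the sample value is assumed. It relies on the fact that browsers do not decode HTML entities inside a script element, so the console shows the string exactly as PHP holds it.

```php
<?php
// Hypothetical helper: build a <script> tag that logs a PHP value to the browser console.
function console_log_tag($value): string
{
    // JSON_UNESCAPED_UNICODE keeps "é" readable instead of "\u00e9";
    // JSON_HEX_TAG escapes < and > so the payload cannot close the <script> tag.
    $json = json_encode($value, JSON_UNESCAPED_UNICODE | JSON_HEX_TAG | JSON_THROW_ON_ERROR);
    return '<script>console.log(' . $json . ');</script>';
}

// Sample value standing in for one scraped entry (assumed shape).
$scraped = ['Sz&eacute;kesfeh&eacute;rv&aacute;r'];

// Raw value: the console shows the entities exactly as they were scraped.
echo console_log_tag($scraped), "\n";

// Decoded value: the console shows the real accented characters.
$decoded = array_map(
    fn(string $v): string => html_entity_decode($v, ENT_QUOTES | ENT_HTML5, 'UTF-8'),
    $scraped
);
echo console_log_tag($decoded), "\n";
```

Comparing the two console lines makes it visible whether a value still needs decoding before it is written to the file.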
User contributions licensed under: CC BY-SA