I'm trying to retrieve articles through the Wikipedia API using this code:
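```php
<?php
// Fetch the rendered HTML of a page through the MediaWiki parse API.
$url = 'https://en.wikipedia.org/w/api.php?action=parse&page=example&format=json&prop=text';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
// Placeholder User-Agent; Wikimedia's User-Agent policy asks clients to identify themselves.
curl_setopt($ch, CURLOPT_USERAGENT, 'MySite/1.0 (https://www.mysite.com/)');
$c = curl_exec($ch);
$json = json_decode($c);
$content = $json->{'parse'}->{'text'}->{'*'}; // the article HTML
```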
I can view the content on my website and everything is fine, but I have a problem with the links inside the retrieved article. If you open the URL, you can see that all the links are relative and start with `href="/`, so when someone clicks any link in the article, they are sent to www.mysite.com/wiki/... (a 404 error) instead of en.wikipedia.org/wiki/.... Is there any code I can add to the existing snippet to fix this?
Answer
This seems to be a shortcoming in the MediaWiki `action=parse` API. In fact, someone already filed a feature request asking for an option to make `action=parse` return full URLs.
As a workaround, you could either rewrite the links yourself (as adil suggests) or use `index.php?action=render` instead of the API.
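A minimal PHP sketch of the `action=render` route, assuming English Wikipedia over HTTPS. `renderUrl` and `fetchRenderedHtml` are hypothetical helper names, and the User-Agent string is a placeholder you should replace with your own site and contact details. Check the returned HTML on your own setup before relying on the link format.

```php
<?php
// Build the index.php?action=render URL for a page title.
// Assumption: English Wikipedia over HTTPS; change the host for other wikis.
function renderUrl(string $title): string
{
    return 'https://en.wikipedia.org/w/index.php?' . http_build_query([
        'action' => 'render',
        'title'  => $title,
    ]);
}

// Fetch the rendered page HTML (no API wrapper); returns null on failure.
function fetchRenderedHtml(string $title): ?string
{
    $ch = curl_init(renderUrl($title));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, if any
    // Placeholder User-Agent (hypothetical); identify your own site here.
    curl_setopt($ch, CURLOPT_USERAGENT, 'MySite/1.0 (https://www.mysite.com/)');
    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($html === false || $status !== 200) ? null : $html;
}
```

Usage would then be something like `$content = fetchRenderedHtml('Example');`, replacing the JSON decoding step, since the response is plain HTML rather than an API envelope.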
`action=render` only gives you the page HTML, with no API wrapper, but if that's all you want anyway, it should be fine. (For example, this is the method InstantCommons uses internally to show remote file description pages.)
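If you would rather keep using `action=parse`, the other workaround is rewriting the links yourself. The sketch below is one way to do that, not adil's actual code: it prefixes every root-relative `href="/...` with the wiki's base URL and leaves protocol-relative links (`href="//...`) and in-page anchors (`href="#...`) alone. It assumes the HTML uses double-quoted `href` attributes, as MediaWiki output normally does; `absolutizeWikiLinks` is a hypothetical name.

```php
<?php
// Rewrite root-relative links (href="/...") so they point at the wiki
// instead of the current site. Assumption: double-quoted href attributes.
function absolutizeWikiLinks(string $html, string $base = 'https://en.wikipedia.org'): string
{
    // The (?!/) lookahead skips protocol-relative "//host/..." links,
    // which already name a host.
    return preg_replace('#href="/(?!/)#', 'href="' . rtrim($base, '/') . '/', $html);
}
```

Applied to the question's code, this would be a single extra line after decoding: `$content = absolutizeWikiLinks($content);`. A regex is enough here because only the attribute prefix changes; for heavier rewriting, parsing the HTML with `DOMDocument` would be more robust.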