Get coordinates from script tag, image tag, link

I'm building a search engine for deals and I plot every deal on a map, so I need to scrape coordinates from websites.

The coordinates can be in a script tag, an image, a link, etc.

Is there any tool, script, or framework that can help me quickly get coordinates from websites? How should I do it: with PHP, XPath, regex, Node, CSS selectors? I'm looking for some quick "get the coordinates" script.

Doing this manually is too hard, because I need to scrape more than 10,000 websites, and on each site the coordinates are in a script tag, an image, a link, and so on.

Is there any automated solution?

Here are some cases:

 <a href="http://maps.google.com/maps?q=44.796637,20.480168" target="_blank">prikaži na karti</a>

<iframe frameborder="0" border="0" scrolling="no"  marginwidth="0" marginheight="0" title=""  src="http://www.kolektiva.rs/beograd/dailydeal/vendor/map/center/44.815123,20.469887/"></iframe>

<iframe frameborder="0" height="230" marginheight="0" marginwidth="0" scrolling="no" src="http://maps.google.com/maps/ms?ie=UTF8&amp;hl=el&amp;msa=0&amp;msid=207271638222613154872.00049df7bb569d7af0057&amp;ll=38.775499,23.483276&amp;spn=0.984971,1.257935&amp;z=8&amp;output=embed" width="230"></iframe>

You can add your own cases, because every site uses a different type of map.

So is there universal code for extracting coordinates, not only from these examples but from any text?

Answer

Hmm, I think you got a downvote because you seem to be looking for some kind of magic, or maybe because the question is not clear and precise enough.

Perhaps you should try to split your problem into several isolated, well-defined problems, because a universal harvester of geolocation coordinates from the web is a very specific application. Maybe it exists, I don't know, but it sounds like magic for now 🙂

So try to make a clear inventory of each case you may face, and then think about a possible extraction solution for each case.
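As a starting point for such an inventory, here is a minimal PHP sketch. It assumes the coordinates live in the href or src attributes of a, iframe and img elements, as in the samples from the question. The function name collectCandidateUrls is hypothetical, and script tags or other cases would need their own handling.

```php
<?php
// Hypothetical helper: returns every href/src value that might carry coordinates.
// Assumes the dom extension is available.
function collectCandidateUrls(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so silence libxml parse warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];

    // Attribute nodes come back in document order; entities such as &amp; are already decoded.
    foreach ($xpath->query('//a/@href | //iframe/@src | //img/@src') as $attr) {
        $urls[] = $attr->nodeValue;
    }

    return $urls;
}

$html = '<a href="http://maps.google.com/maps?q=44.796637,20.480168" target="_blank">map</a>'
      . '<iframe src="http://www.kolektiva.rs/beograd/dailydeal/vendor/map/center/44.815123,20.469887/"></iframe>';

print_r(collectCandidateUrls($html));
```

Each URL collected this way can then go through whatever extraction rule fits that particular case.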

That said, first of all, are you sure an IP-to-GPS-coordinates lookup won't be enough? If you have a URL for each of your deals, it may be. In that case, you may want to have a look here; the free databases are updated each month, which should be accurate enough. They provide APIs for a lot of dev environments, and you can try their service for free at this address (25 requests a day max).

Here is a quick tutorial for working with the GeoLiteCity database and Quova in PHP.

I think PHP has a GeoIP module as well, but I don't know whether it uses the MaxMind service or another one, and I can't access the PHP website right now because it seems to be down. Maybe try it later.

edit: you need to say what kind of source you will have for your deals. Are they on big corporate websites like eBay or Amazon, or similar? If so, you may first want to check whether they have a proper API from which you can easily retrieve GPS coordinates for each deal.

edit#2: OK, so from your samples, it seems that all your geolocation cases have the form 23.987463,12.098374: an optional minus sign, one to three digits, a dot, six digits, then a comma, then the same thing again (optional minus sign, one to three digits, a dot and six digits). So, a regular expression for matching this format would be:

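'-?[0-9]{1,3}\.[0-9]{6},-?[0-9]{1,3}\.[0-9]{6}'

(The dots are escaped so that they match a literal dot rather than any character.) Now, in PHP, the pattern also needs delimiters, so you should do something like:

preg_match_all('/-?[0-9]{1,3}\.[0-9]{6},-?[0-9]{1,3}\.[0-9]{6}/', $s, $out);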

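where your input string is in $s, and where $out receives an array with every coordinate pair found. Note that the pattern also matches the spn=0.984971,1.257935 parameter in the Google Maps iframe, which is the map's span rather than a location, so you will have to filter that one out.

With your three samples as input, the array in $out will look something like:

Array
(
    [0] => Array
        (
            [0] => 44.796637,20.480168
            [1] => 44.815123,20.469887
            [2] => 38.775499,23.483276
            [3] => 0.984971,1.257935
        )

)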
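To tie this together, here is a minimal sketch of the regex approach as a reusable function. The function name extractCoordinates is hypothetical. The range check (latitude within ±90, longitude within ±180) and the latitude-first ordering are assumptions added here to weed out obvious false positives; they are not part of the original pattern.

```php
<?php
// Hypothetical helper: returns [latitude, longitude] pairs found in arbitrary text.
function extractCoordinates(string $text): array
{
    // Same shape as the pattern above, with capture groups around each number.
    $pattern = '/(-?[0-9]{1,3}\.[0-9]{6}),(-?[0-9]{1,3}\.[0-9]{6})/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $coords = [];
    foreach ($matches as $m) {
        $lat = (float) $m[1];
        $lon = (float) $m[2];

        // Assumption: pairs are "lat,lon"; drop values outside the valid ranges.
        if (abs($lat) <= 90 && abs($lon) <= 180) {
            $coords[] = [$lat, $lon];
        }
    }

    return $coords;
}

$s = 'http://maps.google.com/maps?q=44.796637,20.480168 '
   . 'http://www.kolektiva.rs/beograd/dailydeal/vendor/map/center/44.815123,20.469887/';

print_r(extractCoordinates($s));
```

The range check will not catch everything: a value such as the spn parameter in a Google Maps URL still looks like a valid pair, so for known URL formats it is probably safer to read only specific parameters such as q or ll.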

Now, I'm not a PHP guy, and I can't try any code on my machine, so I suggest you experiment a bit and, if needed, ask new, clearer and more specific questions about any new problems you run into.

User contributions licensed under: CC BY-SA