How do I download txt web content using perl

I am trying to download data from this data page. I have tried a number of scripts I found by googling. On the data page I have to select the countries I want, one at a time. The one script which comes closest to what I want is:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $url = 'https://www.ogimet.com/ultimos_synops2.php?lang=en&estado=Zamb&fmt=txt&Send=Send';
my $file = 'Zamb.txt';
getstore($url, $file);

However, this script gives me the whole HTML page, not just the data. I would appreciate help downloading the data, if this is possible. I would also be happy to do it in PHP if that is an easier alternative.

Answer

The link returns the data as text wrapped in HTML. The simplest approach is to use HTML::TreeBuilder and HTML::FormatText to extract a text-only version.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;
use HTML::FormatText;

my $url = 'https://www.ogimet.com/ultimos_synops2.php?lang=en&estado=Zamb&fmt=txt&Send=Send';

# Fetch and parse the page, then render it as plain text.
# The huge rightmargin keeps the formatter from wrapping the long SYNOP lines.
my $text = HTML::FormatText->new(leftmargin => 0, rightmargin => 100000000000)
    ->format(HTML::TreeBuilder->new_from_url($url));

my $file = 'Zamb.txt';
open(my $fh, '>', $file) or die "Cannot open $file: $!";
print $fh $text;
close($fh);

This is the content of Zamb.txt afterwards.

 $ cat Zamb.txt
##########################################################
# Query made at 02/29/2020 18:15:54 UTC
##########################################################

##########################################################
# latest SYNOP reports from Zambia before 02/29/2020 18:15:54 UTC
##########################################################
202002291200 AAXX 29124 67855 42775 51401 10310 20168 3//// 48/// 85201
                   333 5//// 85850 83080=

My PHP fu isn't up to date, but I think the following should work:

<?php
$url = 'https://www.ogimet.com/ultimos_synops2.php?lang=en&estado=Zamb&fmt=txt&Send=Send';
// Fetch the page and strip the HTML tags.
$content = strip_tags(file_get_contents($url));
// Print everything from the first row of '#' characters onwards.
echo substr($content, strpos($content, '###############'));

Note: I seem to recall that there is a configuration option (allow_url_fopen) that can disable fetching URLs via file_get_contents, so YMMV.

However, on the same page there is a note:

NOTE: If you want to get simply files with synop reports in CSV format without HTML tags consider to use the binary getsynop

This would get you the same data in an easier-to-use format:

$ wget "https://www.ogimet.com/cgi-bin/getsynop?begin=$(date +%Y%m%d0000)&state=Zambia" -o /dev/null -O - | tail -1
67855,2020,02,29,12,00,AAXX 29124 67855 42775 51401 10310 20168 3//// 48/// 85201 333 5//// 85850 83080=
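
Since the question asked for Perl, here is a minimal sketch of fetching the same getsynop CSV with LWP::Simple. It is only a sketch: the URL and the begin/state parameters are copied from the wget example above, Zamb.csv is an arbitrary output filename, and UTC midnight is assumed for the begin timestamp (the shell example uses the local date instead).

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;
use POSIX qw(strftime);

# Build the begin=YYYYMMDD0000 parameter, i.e. midnight of the current day.
# gmtime (UTC) is an assumption here; the wget example uses the local date.
my $begin = strftime('%Y%m%d0000', gmtime);

my $url  = "https://www.ogimet.com/cgi-bin/getsynop?begin=${begin}&state=Zambia";
my $file = 'Zamb.csv';   # arbitrary output filename

# getstore returns the HTTP status code of the request.
my $status = getstore($url, $file);
die "Download failed with HTTP status $status\n" unless is_success($status);

print "Saved SYNOP CSV reports to $file\n";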