How to Summarise Auntie's Front Page
I have enhanced my automated news summariser and made it faster. I am sure this is not the most efficient way to solve the problem, but it works and runs reasonably fast. I have also standardised on PAR as the distribution format. If you still prefer the old method, the script's source is pasted below. To run the PAR file, type perl -MPAR BBC.par and let it do its thing.
As for the "Auntie" moniker: it is a nickname for the BBC, which is the script's news source.
use strict;
use warnings;
use XML::RSS::Parser::Lite;
use LWP::Simple;
use HTML::TreeBuilder;

# Fetch and parse the BBC front-page feed.
my $feed = "http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml";
my $xml = get($feed) or die "Could not fetch $feed\n";
my $rp = XML::RSS::Parser::Lite->new;
$rp->parse($xml);

# Walk every item in the feed, including the last one.
for (my $count = 0; $count < $rp->count(); $count++) {
    my $item = $rp->get($count);
    my $url = $item->get('url');
    my $html = get($url);
    next if not defined $html;    # skip articles that fail to download

    # Divider between one article and the next.
    print "\n--------------------------------------------------------------------------------\n" if $count != 0;

    my $h = HTML::TreeBuilder->new_from_content($html);
    for my $p ($h->look_down('_tag', 'p')) {
        my $paragraph = $p->as_text;
        next if not defined($paragraph);
        # Print only paragraphs longer than three words, which drops
        # captions, bylines and navigation text.
        my @words = split ' ', $paragraph;
        print "$paragraph " if @words > 3;
    }
    $h->delete;    # free the parse tree
}
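
The "summarising" step is really just a word-count filter, so it can be tried without touching the network. What follows is a minimal sketch of that filter in core Perl only. The helper names keep_paragraphs and format_summaries are my own inventions for illustration and are not part of the script or of any module, and the sample paragraphs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs that have more than
# $min_words whitespace-separated words, skipping undefined entries.
sub keep_paragraphs {
    my ($min_words, @paragraphs) = @_;
    my @kept;
    for my $text (@paragraphs) {
        next unless defined $text;
        $text =~ s/^\s+|\s+$//g;         # trim leading and trailing whitespace
        my @words = split ' ', $text;    # split on runs of whitespace
        push @kept, $text if @words > $min_words;
    }
    return @kept;
}

# Hypothetical helper: join one summary per article with an
# 80-character rule between them (none before the first or after the last).
sub format_summaries {
    my @summaries = @_;
    my $rule = '-' x 80;
    return join("\n$rule\n", @summaries) . "\n";
}

# Made-up sample input standing in for the <p> text of one article.
my @kept = keep_paragraphs(
    3,
    'Share this page',
    'Officials said the talks will resume on Monday.',
    undef,
    '   ',
);
print format_summaries(join ' ', @kept);
```

Keeping the filter separate from the fetching makes it easy to change the threshold of three words and see what it does to the output.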
