How to Distinguish Between a "Real" and "Scripting" Language
Anyone who has gone through a computational theory course will realise the headline is a trick question -- there isn't any difference. Yes, that's right, contrary to what EJR is saying: "PHP is the most widespread language for web apps but it is more a scripting language". While not Turing-complete out of the box, PHP can be tweaked into being Turing-complete -- just replace the default 30-second execution limit (max_execution_time) with an infinite one. This is a more theoretical argument, but an important one: anything that can be written in C#, VB.Net, or whatever can be written in PHP. However, I'm pretty sure that ASP is a fitting competitor to PHP in the webapp arena. Is Microsoft conceding that ASP is a dead horse and finally getting behind the LAMP stack, but replacing Linux with Windows?
Do Mobile OSes Matter?
Russ asks whether the mobile OS really matters and concludes that no, it does not. I don't know why Google seems to be getting away from its roots. Let me explain.
Since the beginning, Google has always been platform-agnostic. Sure, it may be Linux on the backend, but that was invisible to the user. Indeed, it was essentially blind to the client: it took HTTP requests in and returned HTTP responses. Even when they added APIs, they were always built on top of HTTP requests and responses. Google Maps was similarly built on HTTP. So are Gmail and AdSense. Even Google Earth is a glorified HTTP client (except it transfers SVG and looks like a desktop application). So, when I first heard about Android, I figured it was a platform-independent data interchange system, because control of a network derives not from controlling the endpoints but from controlling the communications medium (this is what Microsoft and Apple don't realise and what Google did, but doesn't seem to anymore). In other words, I thought that it would be a peer-to-peer version of Funambol. Apparently, I was wrong. But that's what I'd do if I were developing it. Whereas power on a network derives from the endpoints, control comes from the transport medium. Having started out devoted to controlling the transport medium (and having succeeded at that, on the desktop anyway), Google is duplicating it on the mobile, in hyperdrive. The problem is that it's a serialised process and is very hard to do in parallel. Time will tell if Google can pull it off. But, for now, I'm sticking with Symbian, whether owned by Nokia or not.
Google Reader Violates User Privacy?
Dare Obasanjo writes a couple of screenfuls on how Google Reader violates privacy, within which we find this gem of understanding: "the fact that there was URL obfuscation involved implies that they realized that users didn't want their Shared Items to be PUBLIC." For someone who worked on the Access Control technology behind Windows Live sharing initiatives from SkyDrive to Windows Live Spaces, this is an immensely daft statement. Larry Wall once wrote that one of the hallmarks of a decent computer programmer is laziness. My own explanation for this is that the obfuscated URIs were the default, and Google didn't bother to change them.
Gates on Productivity Software
Bill Gates pens an article on productivity software for the BBC. Hey Bill, software reaches its zenith of productivity if it has data interchange and interoperability!
Mario Monti Doesn't Give Up
The mild-mannered economist-turned-Brussels-antitrust-hawk, Mario Monti, is continuing his crusade against unregulated monopolies. Most recently, he's gone after Microsoft again, this time taking evidence from Opera regarding Internet Explorer. Go on, Mario. Show them that Europe is not to be messed with.
Why CNBC.com was Lost Anyway
Andy Beale carries the story that CNBC changed its online advert provider from Google to Microsoft. This isn't exactly Earth-shattering news, despite what Beale seems to suggest.
Again, a follow-the-money analysis shows this was inevitable. CNBC is the sister network of MSNBC, which is part owned by Microsoft, a company known for forcing its employees to eat their own dog food. So, where did you think they were going to go, Mr Beale?
How to Improve GMail's Spam Filter
GoogleMail has a spam filter second to none, but it could be better. The one failure that looks like very low-hanging fruit (at least from my view) is language identification. Indeed, I'm including code below to identify a given message's language and give it a confidence score:
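#!/usr/bin/env perl
use warnings;
use strict;
use diagnostics;
use Lingua::Identify qw/:language_identification/;
use Mail::POP3Client;

# Credentials are read from the command line here (an assumption;
# the original left $USER and $PASSWORD as undeclared placeholders).
my ($USER, $PASSWORD) = @ARGV;

my $pop = Mail::POP3Client->new(
    USER     => $USER,
    PASSWORD => $PASSWORD,
    HOST     => 'pop.gmail.com',
    PORT     => 995,
    USESSL   => 'true',
);
$pop->Connect;
my $count = $pop->Count;
my $debug = 1;

# POP3 message numbers run from 1 to $count inclusive.
for (my $counter = 1; $counter <= $count; $counter++) {
    # Fetch the body once; a while loop on Body() would never terminate.
    my $text = $pop->Body($counter);
    next unless defined $text && length $text;
    print $text if $debug;
    # In list context langof() returns language/confidence pairs, best match first.
    my ($language, $prob) = langof($text);
    next unless defined $language;
    print "Message $counter is $language, $prob probability.\n";
}
$pop->Close;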
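So, basically, you analyse the user's sent mail as a control group to determine which languages the user knows (the script above only labels whatever messages POP3 returns; restricting it to sent mail is left out). Then you store these languages, and any incoming message whose language doesn't match can be assumed to be spam. Everything else goes through the standard Bayesian filter as usual.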
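The whitelist step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the subroutine names known_languages and looks_foreign, the 0.2 share cut-off and the 0.3 confidence threshold are all made up for the example, and the [language, confidence] pairs are hard-coded stand-ins for what langof() from Lingua::Identify would return for each message.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build the set of languages the user writes in, given [language, confidence]
# pairs detected on sent mail. A language counts as "known" when it accounts
# for at least $min_share of the sent messages (hypothetical cut-off).
sub known_languages {
    my ($sent, $min_share) = @_;
    my $total = scalar @$sent;
    return +{} unless $total;
    my %seen;
    $seen{ $_->[0] }++ for @$sent;
    my %known = map { $_ => 1 }
                grep { $seen{$_} / $total >= $min_share } keys %seen;
    return \%known;
}

# Return 1 when an incoming message is confidently in a language the user
# never writes in, 0 otherwise (0 means: hand it to the Bayesian filter).
sub looks_foreign {
    my ($language, $confidence, $known, $min_confidence) = @_;
    return 0 unless defined $language;           # nothing detected
    return 0 if $confidence < $min_confidence;   # detector is unsure
    return $known->{$language} ? 0 : 1;
}

# Stand-in data; in practice these pairs would come from langof().
my @sent  = ( ['en', 0.61], ['en', 0.55], ['fr', 0.48], ['en', 0.70] );
my $known = known_languages(\@sent, 0.2);

for my $msg ( ['en', 0.66], ['ru', 0.58], ['zh', 0.12] ) {
    my ($lang, $conf) = @$msg;
    my $verdict = looks_foreign($lang, $conf, $known, 0.3)
        ? 'likely spam'
        : 'pass to Bayesian filter';
    print "$lang ($conf): $verdict\n";
}
```

Low-confidence detections are deliberately passed through rather than rejected, so that short or mixed-language messages are not thrown away on a weak guess.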