How to Clean Your Inbox

Posted by Prolific Programmer Tue, 24 Jun 2008 19:47:00 GMT

Lifehacker explains how to clear your inbox using GMail. Basically, you create a filter that marks all matching messages as read and apply it to all matching conversations. Remember to remove the filter when you're done.
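The filter trick amounts to applying a match-everything "mark as read" rule once to the mail already in the inbox. The toy model below is a minimal Java sketch that makes that logic concrete. InboxCleaner, Message and markAllRead are hypothetical names, not part of Gmail or JavaMail, and nothing here talks to a real mailbox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: Message and markAllRead are hypothetical names,
// not a Gmail or JavaMail API.
class InboxCleaner {
    static final class Message {
        final String subject;
        boolean read;

        Message(String subject, boolean read) {
            this.subject = subject;
            this.read = read;
        }
    }

    // Marks every unread message the filter matches as read and returns how many
    // changed, like applying a new filter to the conversations that already match it.
    static int markAllRead(List<Message> inbox, Predicate<Message> filter) {
        int changed = 0;
        for (Message m : inbox) {
            if (filter.test(m) && !m.read) {
                m.read = true;
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Message> inbox = new ArrayList<>();
        inbox.add(new Message("Lunch?", false));
        inbox.add(new Message("Build failed", true));
        inbox.add(new Message("Newsletter", false));
        // Match everything. The real Gmail filter should be deleted afterwards
        // so it does not keep marking new mail as read.
        Predicate<Message> matchAll = m -> true;
        System.out.println(markAllRead(inbox, matchAll) + " messages marked as read");
    }
}
```

Running main prints "2 messages marked as read", because the message that was already read is left alone.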

How to Search GMail from the Comfort of Your Command-Line

Posted by Prolific Programmer Fri, 04 Apr 2008 14:16:00 GMT

The command-line GMail search is working. The next step is to speed it up: it still takes almost a minute to search 317 messages. As with the last post, the code is below.

package com.prolificprogrammer.lucenegmail;
import java.io.File;
import java.util.logging.Logger;
import java.util.logging.Level;

import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;
import javax.mail.internet.InternetAddress;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchGMail {
    private static final Logger logger = Logger.getLogger(SearchGMail.class.getName());
    public static void main (String[] args) throws Exception {
	//logger.setLevel(java.util.logging.Level.FINE);
	try {
	    File path = new File(System.getProperty("java.io.tmpdir")+File.separator+"gmail.index");
	    path.mkdir();
	    path.deleteOnExit();
	    long starttime = System.currentTimeMillis();
	    IndexWriter index = new IndexWriter(path.getAbsolutePath(), new StandardAnalyzer(), true);
	    Session session = Session.getDefaultInstance(System.getProperties(), null);
	    Store store = session.getStore("pop3s");
	    store.connect("pop.gmail.com", args[0], args[1]);
	    logger.fine("Connected!");
	    Folder folder = store.getDefaultFolder();
	    folder = folder.getFolder("INBOX");
	    folder.open(Folder.READ_ONLY);
	    logger.fine("Opened INBOX");
	    Message[] messages = folder.getMessages();
	    int x;
	    for (x = 0; x != messages.length; x++) {
		try {
		    Document document = new Document();
		    String allField = ((InternetAddress)messages[x].getFrom()[0]).getAddress()+"\n"+messages[x].getSubject();
		    document.add(new Field("all", allField, Field.Store.YES, Field.Index.TOKENIZED));
		    // Stored but not indexed, so a boost on this field would have no effect on scoring.
		    document.add(new Field("messageNumber", Integer.toString(x), Field.Store.YES, Field.Index.NO));
		    index.addDocument(document);
		    logger.fine("Message "+x+" added.");
		} catch (OutOfMemoryError e) {
		    index.optimize();
		    continue;
		}
	    }
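	    index.optimize();
	    index.close();
	    // The mail server is no longer needed once the index is built.
	    folder.close(false);
	    store.close();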
	    
	    logger.info("Index Constructed -- now searching");
	    
	    IndexSearcher searcher = new IndexSearcher(path.getAbsolutePath());
	    Analyzer analyzer = new StandardAnalyzer();
	    String query = args[2];
	    QueryParser queryParser = new QueryParser("all", analyzer);
	    Query parsedQuery = queryParser.parse(query);
	    Hits hits = searcher.search(parsedQuery);
	    for (int i = 0; i!= hits.length();i++) {
		Document doc = hits.doc(i);
		System.out.println("Message "+doc.getField("messageNumber").stringValue()+" matches "+query+" with a score of "+hits.score(i));
	    }
	    searcher.close();
	    long endtime = System.currentTimeMillis();
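	    // File.length() is unspecified for a directory, so add up the index files instead.
	    long indexBytes = 0;
	    for (File f : path.listFiles()) {
		indexBytes += f.length();
	    }
	    logger.info("program took "+(endtime-starttime)+" milliseconds to search "+x+" messages, whose index occupies "+indexBytes+" bytes.");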
	    java.awt.Toolkit.getDefaultToolkit().beep();
	} catch (ArrayIndexOutOfBoundsException e) {
	    logger.severe("Usage: "+SearchGMail.class.getName()+" [google login] [password] [query]\nAll three are required.");
	    System.exit(-1);
	}
    }
}
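The listing above reports a single total for the whole run. That total mixes fetching headers over POP3, building the index and searching it. Before trying to speed anything up, it may help to know which of those steps takes most of the minute. Below is a minimal JDK-only sketch of a phase timer. PhaseTimer and the phase names are made up for illustration, and the comments only suggest where the calls could go in SearchGMail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: accumulates wall-clock time per named phase so the
// slow step can be identified before optimising it.
class PhaseTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();
    private String current;
    private long start;

    // Ends the previous phase (if any) and starts timing a new one.
    void begin(String phase) {
        end();
        current = phase;
        start = System.nanoTime();
    }

    // Ends the current phase and adds its duration to that phase's running total.
    void end() {
        if (current != null) {
            elapsedNanos.merge(current, System.nanoTime() - start, Long::sum);
            current = null;
        }
    }

    // Stops any running phase and returns total milliseconds per phase, in first-seen order.
    Map<String, Long> millisByPhase() {
        end();
        Map<String, Long> result = new LinkedHashMap<>();
        elapsedNanos.forEach((phase, nanos) -> result.put(phase, TimeUnit.NANOSECONDS.toMillis(nanos)));
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();
        // In SearchGMail's loop you could switch between "fetch headers" (around
        // getFrom()/getSubject()) and "index" (around addDocument()) on every
        // message. Totals accumulate per phase name.
        timer.begin("fetch headers");
        Thread.sleep(50);   // stand-in for real work
        timer.begin("index");
        Thread.sleep(10);
        timer.begin("search");
        Thread.sleep(5);
        timer.millisByPhase().forEach((phase, ms) -> System.out.println(phase + ": " + ms + " ms"));
    }
}
```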

How to Search Gmail from the comfort of your Keyboard 2

Posted by Prolific Programmer Fri, 04 Apr 2008 02:41:00 GMT

The Java code below uses Lucene 2.3.1 and JavaMail to provide a command-line search of your GMail inbox. It's quite slow, so I'd like to speed it up over time, but it does print progress updates as it runs (perhaps too many). Any and all suggestions are appreciated.

package com.prolificprogrammer.lucenegmail;
import java.io.File;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;
import javax.mail.internet.InternetAddress;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchGMail {
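    public static void main (String[] args) throws Exception {
	if (args.length < 3) {
	    System.err.println("Usage: "+SearchGMail.class.getName()+" [google login] [password] [query]");
	    System.exit(-1);
	}
	File path = new File(System.getProperty("java.io.tmpdir")+File.separator+"gmail.index");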
	path.mkdir();
	path.deleteOnExit();
	long starttime = System.currentTimeMillis();
	IndexWriter index = new IndexWriter(path.getAbsolutePath(), new StandardAnalyzer(), true);
	Session session = Session.getDefaultInstance(System.getProperties(), null);
	Store store = session.getStore("imaps");
	store.connect("imap.gmail.com", args[0], args[1]);
	System.err.println("Connected!");
	Folder folder = store.getDefaultFolder();
	folder = folder.getFolder("INBOX");
	folder.open(Folder.READ_ONLY);
	System.err.println("Opened INBOX");
	Message[] messages = folder.getMessages();
	System.err.println("Messages retrieved!");
	int x;
	for (x = 0; x != messages.length; x++) {
	    Document document = new Document();
	    String allField = ((InternetAddress)messages[x].getFrom()[0]).getAddress()+"\n"+messages[x].getSubject();
	    document.add(new Field("all", allField, Field.Store.YES, Field.Index.TOKENIZED));
	    document.add(new Field("messageNumber", Integer.toString(x), Field.Store.YES, Field.Index.NO));
	    index.addDocument(document);
	    System.err.println("Message "+x+" added.");
	}
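	index.optimize();
	index.close();
	// The mail server is no longer needed once the index is built.
	folder.close(false);
	store.close();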

	System.err.println("Ok, index constructed with "+x+" messages in "+path.getAbsolutePath()+", now searching it");

	IndexSearcher searcher = new IndexSearcher(path.getAbsolutePath());
	Analyzer analyzer = new StandardAnalyzer();
	String query = args[2]; // arguments are login, password, query: the query is the third
	QueryParser queryParser = new QueryParser("all", analyzer);
	Query parsedQuery = queryParser.parse(query);
	Hits hits = searcher.search(parsedQuery);
	for (int i = 0; i!= hits.length();i++) {
	    Document doc = hits.doc(i);
	    System.out.println(doc.getField("messageNumber").stringValue());
	}
	searcher.close();
	long endtime = System.currentTimeMillis();
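	// File.length() is unspecified for a directory, so add up the index files instead.
	long indexBytes = 0;
	for (File f : path.listFiles()) {
	    indexBytes += f.length();
	}
	// Parenthesise the subtraction: "..."+endtime-starttime does not compile.
	System.err.println("program took "+(endtime-starttime)+" milliseconds to search "+x+" messages, whose index occupies "+indexBytes+" bytes.");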
	java.awt.Toolkit.getDefaultToolkit().beep();
    }
}
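The listing above prints a line to standard error for every single message, which is probably where the "perhaps too many" updates come from. Below is a minimal JDK-only sketch of throttling that output. It assumes one report every 100 messages plus one at the end is enough. ProgressReporter and the interval are illustrative choices, not part of the original program.

```java
// Hypothetical helper: reports progress every `every` messages and on the
// last one, instead of printing one line per message.
class ProgressReporter {
    private final int total;
    private final int every;

    ProgressReporter(int total, int every) {
        if (total <= 0 || every <= 0) {
            throw new IllegalArgumentException("total and every must be positive");
        }
        this.total = total;
        this.every = every;
    }

    // Returns the line to print after `done` messages (a 1-based count), or null to stay quiet.
    String update(int done) {
        if (done % every == 0 || done == total) {
            return "Indexed " + done + " of " + total + " messages (" + (100 * done / total) + "%)";
        }
        return null;
    }

    public static void main(String[] args) {
        int total = 317;
        ProgressReporter progress = new ProgressReporter(total, 100);
        for (int x = 0; x < total; x++) {
            // ... build and add the Lucene document for message x here ...
            String line = progress.update(x + 1);
            if (line != null) {
                System.err.println(line);
            }
        }
    }
}
```

With 317 messages and an interval of 100, this prints four lines instead of 317.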

How to Search Your Gmail in One Command

Posted by Prolific Programmer Wed, 19 Mar 2008 04:22:00 GMT

So, tonight, aside from reminiscing about old flames, Tareeq and I got around to implementing a command-line search for Gmail, using JavaMail and Lucene and keeping no state whatsoever. It lets you type ./search.sh from:Tareeq subject:Tunis and returns the subject lines of the messages that match the query, sorted by score. This is my first Maven-managed project, and so far I like it much better than Ant. I haven't timed a run yet; maybe at the weekend?
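A field-qualified query like from:Tareeq subject:Tunis needs the sender and the subject to be searchable separately. The later listings above put both into a single "all" field instead. As a hypothetical JDK-only illustration of the idea, the sketch below matches from: and subject: terms against the corresponding fields. Bare terms match either field. The score is simply the number of matching terms, and the function returns subject lines best first. This uses no Lucene and none of Lucene's real scoring. FieldQuery, Mail and search are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy stand-in for a Lucene field query: OR semantics, score = number of matching terms.
class FieldQuery {
    static final class Mail {
        final String from;
        final String subject;

        Mail(String from, String subject) {
            this.from = from;
            this.subject = subject;
        }
    }

    // True if `term` (already lower-case) equals a whole word of `text`, ignoring case.
    // Words are runs of letters and digits, so "tareeq@example.com" yields "tareeq".
    private static boolean hasWord(String text, String term) {
        if (term.isEmpty()) return false;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.equals(term)) return true;
        }
        return false;
    }

    private static int score(Mail m, String[] terms) {
        int score = 0;
        for (String raw : terms) {
            String t = raw.toLowerCase(Locale.ROOT);
            boolean hit;
            if (t.startsWith("from:")) {
                hit = hasWord(m.from, t.substring("from:".length()));
            } else if (t.startsWith("subject:")) {
                hit = hasWord(m.subject, t.substring("subject:".length()));
            } else {
                hit = hasWord(m.from, t) || hasWord(m.subject, t);
            }
            if (hit) score++;
        }
        return score;
    }

    // Returns subjects of messages matching at least one term, highest score first.
    static List<String> search(List<Mail> mails, String query) {
        String[] terms = query.trim().split("\\s+");
        List<Mail> hits = new ArrayList<>();
        for (Mail m : mails) {
            if (score(m, terms) > 0) hits.add(m);
        }
        // List.sort is stable, so messages with equal scores keep their inbox order.
        hits.sort(Comparator.comparingInt((Mail m) -> score(m, terms)).reversed());
        List<String> subjects = new ArrayList<>();
        for (Mail m : hits) subjects.add(m.subject);
        return subjects;
    }

    public static void main(String[] args) {
        List<Mail> inbox = Arrays.asList(
                new Mail("Tareeq <tareeq@example.com>", "Flights to Tunis"),
                new Mail("Alice <alice@example.com>", "Tunis photos"),
                new Mail("Tareeq <tareeq@example.com>", "Dinner on Friday"));
        System.out.println(search(inbox, "from:Tareeq subject:Tunis"));
    }
}
```

Running main prints [Flights to Tunis, Tunis photos, Dinner on Friday]. The first message matches both terms, and the other two match one term each and keep their inbox order.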

How to Improve GMail's Spam Filter

Posted by Prolific Programmer Fri, 02 Nov 2007 21:44:00 GMT

GoogleMail has a spam filter second to none, but it could be better. One failure that is very low-hanging fruit (at least from where I sit) is language identification. The code below fetches each message in a GMail inbox over POP3, identifies its language and prints the probability of that guess:

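#!/usr/bin/env perl
use warnings;
use strict;
use diagnostics;
use Lingua::Identify qw/:language_identification/;
use Mail::POP3Client;

# Credentials come from the command line (undeclared $USER/$PASSWORD fail under strict).
my ($user, $password) = @ARGV;
die "Usage: $0 <google login> <password>\n" unless defined $password;

my $pop = Mail::POP3Client->new(USER     => $user,
                                PASSWORD => $password,
                                HOST     => 'pop.gmail.com',
                                PORT     => 995,
                                USESSL   => 'true'
                                );
$pop->Connect() >= 0 or die $pop->Message();
my $count = $pop->Count;
my $debug = 1;    # set to 0 to stop dumping each message body
# POP3 messages are numbered 1..Count, so the range must include the last one.
for my $counter (1 .. $count) {
    # Fetch the body once; looping on Body() would refetch the same message forever.
    my $text = $pop->Body($counter);
    next unless defined $text;
    print $text if $debug;
    # In list context langof returns (language, probability) pairs, most likely first.
    my ($language, $prob) = langof($text);
    next unless defined $language;
    print "Message $counter is $language, $prob probability.\n";
}
$pop->Close;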

So, basically, you analyse the user's sent mail as a control group to determine which languages the user writes in. You store those languages, and any incoming message in a language outside that set can be assumed to be spam. Then you apply the standard Bayesian filter to the rest.
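The scheme above can be sketched as a small gate in front of an existing spam filter. This is a minimal JDK-only sketch under assumptions. The language labels (such as "en") are assumed to come from an identifier like the Lingua::Identify script above. LanguageGate is a hypothetical name. The Bayesian filter is represented by a caller-supplied predicate rather than implemented here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the language gate: languages seen in sent mail are
// "known"; mail in any other language is treated as spam, everything else
// is passed to the (stand-in) Bayesian filter.
class LanguageGate {
    private final Set<String> knownLanguages = new HashSet<>();

    // Learns the user's languages from the labels of their sent mail (the control group).
    LanguageGate(List<String> sentMailLanguages) {
        knownLanguages.addAll(sentMailLanguages);
    }

    boolean isSpam(String incomingLanguage, String body, Predicate<String> bayesSaysSpam) {
        if (!knownLanguages.contains(incomingLanguage)) {
            return true;                      // a language the user never writes in
        }
        return bayesSaysSpam.test(body);      // otherwise defer to the Bayesian filter
    }

    public static void main(String[] args) {
        LanguageGate gate = new LanguageGate(Arrays.asList("en", "en", "fr"));
        // Toy stand-in for a trained Bayesian filter.
        Predicate<String> bayes = body -> body.toLowerCase().contains("lottery");
        System.out.println(gate.isSpam("zh", "hello", bayes));
        System.out.println(gate.isSpam("en", "You won the lottery", bayes));
        System.out.println(gate.isSpam("fr", "On se voit demain ?", bayes));
    }
}
```

Running main prints true, true and false. One design caveat: a single misidentified sent message would add its language to the known set, so a fuller version might require a language to appear several times before trusting it.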