Wednesday, September 3, 2008

[SQL] Case-insensitive string prefix matching with parameterized query

Apologies if this posts twice... I've run into issues with the
listserv lately.

I am implementing an autosuggest-style text input for my site, where a
user can start typing the name of a thing (call it a 'Foo'), and get a
list of all things whose name starts with the string the user typed.
For example, if the user types 'car', the database might return the
names 'Car', 'Caramel', 'Carbon', etc. I want the search to be case-
insensitive.

Just to have some code, here's some bare-bones info:

CREATE TABLE foo (
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL UNIQUE,
other_info VARCHAR
);

Initially, I just did a naive match using ilike:

SELECT name FROM foo WHERE name ilike 'car%';

This worked alright when the table was small, but now my foo table has
millions of rows, and the query takes far too long. So I created a
new index on name:

CREATE INDEX lower_name_idx ON foo (lower(name));

Now I can use the query:

SELECT name FROM foo WHERE lower(name) like 'car%';

which runs in 2ms, which is exactly what I want.

My problem comes when I execute this query from my application. I'm
using Hibernate to generate the query, the code for which looks like
this:

return session()
.createCriteria( Foo.class )
.add( Restrictions.like( "name", queryString,
MatchMode.START ).ignoreCase() )
.list();

This generates the proper query... almost. It creates a parameterized
query like this:

SELECT id, name, other_info FROM foo WHERE lower(name) like $1;

where "$1" gets substituted with (for example) 'car%'. This query
runs very slowly and doesn't use the index. I think what's happening
is that when the parameterized query gets created, the planner doesn't
know that it's going to be matching at the beginning of the string
(indicated by the trailing percent character), and so it doesn't use
the index. That information comes into play when the parameter gets
substituted, but by then the plan has been set.

I can get it to work if I directly create an HQL query like this:
String hql = "from Foo where lower(name) like '" +
queryString.toLowerCase() + "%'";

I could also generate a SQL query that does the same thing; the fact
that I used HQL for this example is incidental. The point is that the
query isn't parameterized, so it generates the plan I want. This
works fine, but I don't like doing string manipulations like this.
Plus I have to now take care to escape the string to avoid injection
attacks.

Is there any SQL or PostgreSQL string function I could use that would
allow me to have a parameterized query that uses my index?
Alternatively, is there another index definition that would work? Is
there any Hibernate-functionality I've overlooked that would do what I
want without having to do string manipulations? Is there something
simple I've missed?

Also, if string manipulation is the best way to do this, what all
would I have to make sure I've escaped? Quotes, single quotes, and
backslashes, but anything else I'm overlooking?

Thanks for any suggestions,
Chris

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

No comments: