Saturday, May 17, 2008

[GENERAL] tsearch2 on-demand dictionary loading & using functions in tsearch2

Hello,

I'd like to ask about two separate things regarding tsearch2 in
PostgreSQL 8.3.

Firstly, I've noticed that a dictionary is loaded on demand, separately
for each session, and apparently this behavior cannot be changed in any way.

If that's the case, would it be reasonable to ask for an option to allow
loading during Postgres startup, rather than during the first usage of
the dictionary in each distinct session?

I am currently working with ispell dictionaries for multiple languages,
each approx. 3MB in size. With a lookup in a single dictionary,
the first ts_lexize call takes over one second, which from a user's point of
view is quite a long time.

I see several benefits of the suggested approach:
* For those who do not use persistent connections of any sort, using
ispell dictionaries right now severely hurts application
responsiveness. Loading the dictionaries during database startup instead
would speed things up significantly.
* Considering the dictionary is loaded separately for each session, does
this also imply that each running backend has a separate copy of the
dictionary stored in memory? If that is the case, using e.g. 2 dictionaries,
each 3MB in size, on a database server with 20 backends running would eat
up as much as 120MB of RAM (2 x 3MB x 20), while if the server loaded the
dictionaries beforehand, the OS could (possibly) keep them shared in memory.

As for downsides, I only really see two:
* Tracking updates of dictionaries - but it's reasonable to believe
that new connections are opened more often than the dictionary gets
updated. Also, this might be easily solved by stat()-ing the dictionary
file before starting up a session, and only having the server reload it if
a change is detected.
* Possibly complicated to implement?
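
To illustrate the stat() idea from the first downside, here is a rough
sketch in plain C. It is not PostgreSQL code: DictCacheEntry,
dict_needs_reload() and dict_mark_loaded() are hypothetical names made up
for this example, and it assumes that comparing the file's modification
time and size is enough to notice an updated dictionary.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record of a dictionary file as it looked when last loaded. */
typedef struct {
    const char *path;   /* path to the dictionary file */
    time_t      mtime;  /* modification time recorded at last load */
    off_t       size;   /* file size recorded at last load */
    bool        loaded; /* false until the first successful load */
} DictCacheEntry;

/*
 * Returns true when the dictionary should be (re)loaded: either it was
 * never loaded, or its mtime or size differ from the recorded values.
 * If stat() fails (e.g. the file is missing), returns false so that any
 * previously loaded copy simply stays in use.
 */
bool dict_needs_reload(const DictCacheEntry *e)
{
    struct stat st;

    if (stat(e->path, &st) != 0)
        return false;
    if (!e->loaded)
        return true;
    return st.st_mtime != e->mtime || st.st_size != e->size;
}

/*
 * Records the file's current mtime and size after a successful load.
 * Returns false if the file could not be stat()-ed.
 */
bool dict_mark_loaded(DictCacheEntry *e)
{
    struct stat st;

    if (stat(e->path, &st) != 0)
        return false;
    e->mtime = st.st_mtime;
    e->size = st.st_size;
    e->loaded = true;
    return true;
}
```

The server would call dict_needs_reload() before handing a dictionary to a
new session, and dict_mark_loaded() right after (re)reading the file. The
cost per new connection is then a single stat() call instead of parsing
the whole 3MB file.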

As for my second question, is it possible to use functions in tsearch2?
For example, writing my own stemmer in PL/pgSQL or in C as a postgres
function.

Thanks in advance for any reply,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
