Tuesday, July 22, 2008

[GENERAL] Using ISpell dictionary - headaches...

Hi everybody.

Well... I have a problem installing and using an ISpell dictionary (the Thai one, to be precise) with the tsearch2 full-text search module.

What I am trying to do

I have a table containing a "title" field, and I want to fill a "vectors" field with the following command:
UPDATE thai_table SET vectors = to_tsvector('thai_utf8', coalesce(title,''));

How I installed the Thai dictionary

I put the "th_TH.dic" and "th_TH.aff" files (downloaded from http://wiki.services.openoffice.org/wiki/Dictionaries) in the "/usr/local/share/dicts/ispell/" folder, and then executed the following commands:

SET search_path = public;
BEGIN;

INSERT INTO pg_ts_dict (dict_name, dict_init, dict_initoption, dict_lexize, dict_comment)
VALUES (
        'th_spell_utf8',
        'spell_init(internal)',
        'DictFile="/usr/local/share/dicts/ispell/th_TH.dic",AffFile="/usr/local/share/dicts/ispell/th_TH.aff"',
        'spell_lexize(internal,internal,integer)',
        'Thai ISpell dict utf8 encoding'
    );

INSERT INTO pg_ts_cfg (ts_name, prs_name, locale) VALUES ('thai_utf8', 'default', 'th_TH.utf8');

INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'email', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'url', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'host', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'sfloat', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'version', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'uri', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'file', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'float', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'int', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'uint', '{simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'lword', '{th_spell_utf8,simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'nlword', '{th_spell_utf8,simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'word', '{th_spell_utf8,simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'part_hword', '{th_spell_utf8,simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'nlpart_hword', '{th_spell_utf8,simple}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) VALUES ('thai_utf8', 'lpart_hword', '{th_spell_utf8,simple}');

COMMIT;
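The sixteen pg_ts_cfgmap rows above follow a single pattern: ten token types go only to the simple dictionary, and six try th_spell_utf8 first and fall back to simple. As a hedged sketch (cfgmap_inserts is a hypothetical helper, not part of tsearch2), a short Python script can regenerate them, which makes it easier to compare the mapping with another configuration or to rebuild it under a different dictionary name.

```python
# Hypothetical helper (not part of tsearch2): rebuild the pg_ts_cfgmap
# INSERT statements listed above for a given config and spell dictionary.
from typing import List

# Token aliases mapped only to the 'simple' dictionary, in the order used above.
SIMPLE_ONLY = ["email", "url", "host", "sfloat", "version",
               "uri", "file", "float", "int", "uint"]
# Token aliases that try the spell dictionary first, then fall back to 'simple'.
SPELL_THEN_SIMPLE = ["lword", "nlword", "word",
                     "part_hword", "nlpart_hword", "lpart_hword"]


def cfgmap_inserts(cfg: str, spell_dict: str) -> List[str]:
    """Return one INSERT per token alias.

    Names are pasted in verbatim, so they are assumed to contain no quotes.
    """
    rows = [(alias, "{simple}") for alias in SIMPLE_ONLY]
    rows += [(alias, "{%s,simple}" % spell_dict) for alias in SPELL_THEN_SIMPLE]
    return [
        "INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name) "
        f"VALUES ('{cfg}', '{alias}', '{dicts}');"
        for alias, dicts in rows
    ]


if __name__ == "__main__":
    for stmt in cfgmap_inserts("thai_utf8", "th_spell_utf8"):
        print(stmt)
```

Running it prints the same sixteen statements as above, in the same order; the output can be pasted between BEGIN and COMMIT.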


What my problem is

The problem is that when I execute the query to fill my "vectors" field, the connection to the server is lost and psql prints:

la connexion au serveur a été coupée à l'improviste
        Le serveur s'est peut-être arrêté anormalement
        avant ou durant le traitement de la requête.
La connexion au serveur a été perdue. Tentative de réinitialisation: Echec.
!>


(In English: the server closed the connection unexpectedly. This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.)

I have no idea what is causing this, nor where to look for clues on how to fix it.

It *may* be because I'm using PostgreSQL 8.0.3 rather than the latest version (but I'm stuck with that version). I'm just hoping that one of you has met a similar problem and solved it, or knows of a site where installing an ISpell dictionary is described step by step, so that I can check whether I did something wrong somewhere...
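One basic check is whether the two files actually exist at the path given in the pg_ts_dict entry and whether they are really UTF-8, since the dictionary is registered as "Thai ISpell dict utf8 encoding". Whether an encoding mismatch could cause this particular crash is an assumption, not something established above. The sketch below uses a hypothetical helper, check_utf8, and only reads the files. It does not talk to PostgreSQL.

```python
# Hypothetical sanity check (not part of tsearch2): confirm the dictionary
# files named in the pg_ts_dict entry exist and decode cleanly as UTF-8.
import os

# Directory and file names taken from the DictFile/AffFile options above.
DICT_DIR = "/usr/local/share/dicts/ispell"
FILES = ["th_TH.dic", "th_TH.aff"]


def check_utf8(path: str) -> str:
    """Return 'ok', 'missing', or a note on where UTF-8 decoding fails."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first invalid sequence.
        return f"not UTF-8 (first bad byte at offset {exc.start})"
    return "ok"


if __name__ == "__main__":
    for name in FILES:
        path = os.path.join(DICT_DIR, name)
        print(f"{path}: {check_utf8(path)}")
```

A "not UTF-8" result would only mean the files use some other encoding. It does not show that this is what crashes the server.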

Many thanks for your attention,
Daniel Chiaramello
