Friday, May 23, 2008

Re: [GENERAL] tsearch2 on-demand dictionary loading & using functions in tsearch2

Hello,

We definitely came across this issue recently. When a new postgres
backend is started, it uses ~3MB of memory according to pmap. After
running several typical queries that our application generates, the
backend's memory consumption increases to 5-8MB, which is not critical
for us. But when one hits some FTS function with a token that requires
the ispell dictionaries to be loaded, we instantly get 26MB of consumed
memory in this backend.

Having 50 backends behind pgbouncer, each of them holding ~20MB of
redundant FTS data, is a serious penalty on some hardware: during peak
load the kernel evicts huge parts of its disk cache, which holds the
actually 'hot' data, to allocate more RAM for postgres backends, and
we get huge iowait as a result.
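
To put a number on it, here is a rough back-of-envelope sketch in
Python. The figures are the ones quoted above; the helper name
redundant_mb and the assumption that exactly one shared copy would
remain are only for illustration.

```python
# Back-of-envelope estimate using the figures quoted in this message.
BACKENDS = 50               # backends behind pgbouncer
AFTER_QUERIES_MB = (5, 8)   # backend size after typical queries (pmap)
WITH_ISPELL_MB = 26         # backend size once ispell dictionaries load


def redundant_mb(backends, per_backend_mb, shared_copies=1):
    """Memory freed if only `shared_copies` copies of the data were kept.

    Hypothetical helper: it assumes every backend currently holds its
    own private copy of the dictionary data.
    """
    return (backends - shared_copies) * per_backend_mb


# Per-backend growth attributable to the dictionaries.
fts_low_mb = WITH_ISPELL_MB - AFTER_QUERIES_MB[1]
fts_high_mb = WITH_ISPELL_MB - AFTER_QUERIES_MB[0]

if __name__ == "__main__":
    print("FTS data per backend: %d-%d MB" % (fts_low_mb, fts_high_mb))
    print("Held in total at ~20 MB each: %d MB" % (BACKENDS * 20))
    print("Freed with one shared copy: %d MB" % redundant_mb(BACKENDS, 20))
```

So roughly 1GB of RAM goes to identical copies of the same
dictionaries, and nearly all of it could go back to the disk cache.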

We are observing exactly this scenario on one of our servers now, and
the ability to save so much RAM by putting some FTS data in shared
memory would help here. We alter dictionaries once every couple of
months and would tolerate even a postgres restart after such a change.

--
Regards,
Ivan


>> This is probably a stupid question, but: with PostgreSQL's use of
>> shared memory, is it possible to load dictionaries into a small
>> reserved shm area when the first backend starts, then use the
>> preloaded copy in subsequent backends?
>>
>> That way the postmaster doesn't have to do any risky work.
>>
>> Anything that reduces backend startup costs and per-backend unshared
>> memory would have to be a good thing.
>>
>> I've found it useful in the past to share resources with an mmap()ped
>> file, too, especially if I want write protection from some or all
>> processes. If the postmaster forked a process to generate the
>> mmap()able compiled dictionary files on startup then it'd be pretty
>> safe from any misbehaviour of the dictionary compiling process.
>>
>> Then again, I can't say I've personally noticed the cost of loading
>> tsearch2 dictionaries.
>
> So the dictionary will be parsed on the first usage by the given backend,
> and from that moment on, all running backends and all backends that will be
> spawned afterwards will have access to the parsed dictionary structures
> thanks to the shm?
>
> That seems to solve all issues - speed, memory and updating. Would this be a
> way to go? Obviously, it might boil down to "write a patch", but if someone
> actually wrote a patch, would this approach be acceptable?
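
For illustration only, the "compile once, map read-only everywhere"
idea from the quoted messages could look roughly like the Python
sketch below. This is not PostgreSQL code and says nothing about how
a real patch would be structured: the file format and the function
names compile_dictionary, open_shared and lookup are made up for the
example, and it uses a plain mmap()ed file rather than a SysV shm
segment.

```python
import mmap
import os
import tempfile


def compile_dictionary(entries, path):
    """Done once, e.g. by a helper process: write the parsed dictionary.

    Toy format (an assumption for this sketch): each record is
    b"\\nword\\tstem", with a trailing newline closing the last record.
    """
    with open(path, "wb") as f:
        for word, stem in sorted(entries.items()):
            f.write(b"\n" + word.encode() + b"\t" + stem.encode())
        f.write(b"\n")


def open_shared(path):
    """Done in every worker: map the compiled file read-only.

    All processes mapping the same file share its pages through the
    kernel page cache instead of each holding a private parsed copy.
    """
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(mm, word):
    """Return the stem stored for `word`, or None if it is absent."""
    key = b"\n" + word.encode() + b"\t"
    start = mm.find(key)
    if start == -1:
        return None
    start += len(key)
    end = mm.find(b"\n", start)
    return mm[start:end].decode()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.dict")
    compile_dictionary({"running": "run", "dictionaries": "dictionary"}, path)
    shared = open_shared(path)
    print(lookup(shared, "running"))
    print(lookup(shared, "postgres"))
    shared.close()
```

Because the mapping is read-only, a misbehaving worker cannot corrupt
the copy the others see, which is the write protection mentioned in
the quoted message. Replacing the dictionary would mean writing a new
file and remapping it, which fits with changing dictionaries only once
every couple of months.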

