Saturday, July 19, 2008

Re: [HACKERS] phrase search

Sushant,

The problem with phrase search is not in the implementation but in the
theoretical basis. tsearch has a rich query language, and phrase search
should support all query operations, so we need an algebra for query
operations. We need more time to investigate this problem, but we have no
spare time for it right now. If you are interested, you might think in
this direction.

Oleg

On Sat, 19 Jul 2008, Sushant Sinha wrote:

> I looked at query operators for tsquery and here are some of the new
> query operators for position based queries. I am just proposing some
> changes and the questions I have.
>
> 1. What is the meaning of such a query operator?
>
> foo #5 bar -> true if the document has the word "foo" followed by "bar"
> at the 5th position.
>
> foo #<5 bar -> true if the document has the word "foo" followed by "bar"
> within 5 positions.
>
> foo #>5 bar -> true if the document has the word "foo" followed by "bar"
> after more than 5 positions.
>
> Some other ways it can be used:
> !(foo #<5 bar) -> true if the document never has "foo" followed by "bar"
> within 5 positions.
>
> etc .....
>
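
A minimal sketch of how the three proposed operators could be read, given the word positions a tsvector stores. The function name `positional_match` and the `cmp` encoding are hypothetical. The sketch also assumes that "distance" means the position of "bar" minus the position of "foo", and that `#<N` is a strict comparison; the proposal above does not pin either point down.

```python
from typing import Iterable


def positional_match(first: Iterable[int], second: Iterable[int],
                     cmp: str, n: int) -> bool:
    """True if some occurrence of the second word follows some occurrence
    of the first word at a distance that satisfies the operator.

    cmp is "=" for #N, "<" for #<N and ">" for #>N (hypothetical encoding).
    """
    tests = {
        "=": lambda d: d == n,       # exactly n positions later
        "<": lambda d: 0 < d < n,    # later, but fewer than n positions
        ">": lambda d: d > n,        # more than n positions later
    }
    ok = tests[cmp]
    second = list(second)            # allow re-iteration for each first position
    return any(ok(q - p) for p in first for q in second)


# Word positions for one document, as 1-based offsets.
doc = {"foo": [1, 20], "bar": [4, 30]}

print(positional_match(doc["foo"], doc["bar"], "=", 3))      # True  (1 -> 4)
print(positional_match(doc["foo"], doc["bar"], "=", 5))      # False
print(positional_match(doc["foo"], doc["bar"], "<", 5))      # True  (1 -> 4)
print(not positional_match(doc["foo"], doc["bar"], "<", 5))  # False: !(foo #<5 bar)
```

Under this reading the negated form simply inverts the existential check, so it is true only when no pair of occurrences satisfies the operator.
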
> 2. How to implement such query operators?
>
> Should we modify QueryItem to include additional distance information or
> is there any other way to accomplish it?
>
> Is the following list sufficient to accomplish this?
> a. Modify to_tsquery
> b. Modify TS_execute in tsvector_op.c to check the new operator
>
> Is anything needed in the rewrite subsystem?
>
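
To make the "distance information in the query item" idea concrete, here is a rough sketch of a query tree whose operator nodes carry an optional distance, with a toy recursive evaluator. `Val`, `Op` and `execute` are hypothetical stand-ins, not PostgreSQL's actual QueryItem structs or TS_execute; the sketch also assumes the positional operator accepts only two plain lexemes, which is exactly the restriction question 3 asks about.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Val:
    """Leaf node: a single lexeme (stands in for a QI_VAL item)."""
    lexeme: str


@dataclass
class Op:
    """Operator node; cmp and distance matter only when kind == '#'."""
    kind: str                         # '&', '|', '!' or '#'
    left: "Node"
    right: Optional["Node"] = None    # None for the unary '!'
    cmp: str = "="                    # '=', '<' or '>' (hypothetical encoding)
    distance: int = 0


Node = Union[Val, Op]
Positions = Dict[str, List[int]]      # lexeme -> positions in one document


def execute(node: Node, doc: Positions) -> bool:
    """Evaluate the query tree against one document's lexeme positions."""
    if isinstance(node, Val):
        return node.lexeme in doc
    if node.kind == "!":
        return not execute(node.left, doc)
    if node.kind == "&":
        return execute(node.left, doc) and execute(node.right, doc)
    if node.kind == "|":
        return execute(node.left, doc) or execute(node.right, doc)
    if node.kind == "#":
        # Assumption: both operands must be plain lexemes.
        if not (isinstance(node.left, Val) and isinstance(node.right, Val)):
            raise ValueError("positional operator needs two lexeme operands")
        check = {
            "=": lambda d: d == node.distance,
            "<": lambda d: 0 < d < node.distance,
            ">": lambda d: d > node.distance,
        }[node.cmp]
        firsts = doc.get(node.left.lexeme, [])
        seconds = doc.get(node.right.lexeme, [])
        return any(check(q - p) for p in firsts for q in seconds)
    raise ValueError(f"unknown operator {node.kind!r}")


doc = {"foo": [1], "bar": [4], "cup": [9]}
# (foo #<5 bar) & !tea
query = Op("&",
           Op("#", Val("foo"), Val("bar"), cmp="<", distance=5),
           Op("!", Val("tea")))
print(execute(query, doc))  # True
```

Because every node still returns a plain boolean, a nested operand such as `foo #5 (bar & cup)` has no obvious meaning here and is rejected; giving it one would require subexpressions to return positions rather than true/false.
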
> 3. Are these valid uses of the operators and if yes what would they
> mean?
>
> foo #5 (bar & cup)
>
> If not, should the operator be applied only to two QI_VALs?
>
> 4. If the operator only applies to two query items, can we create an
> index such that (foo, bar) -> documents[min distance, max distance]?
> How difficult is it to implement an index like this?
>
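
A rough sketch of the pair index described in question 4, using plain dictionaries rather than any real PostgreSQL index access method. The names `build_pair_index` and `candidates` are hypothetical, and distance is again assumed to be the position of the second lexeme minus the position of the first, counting only pairs where the second comes later.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Positions = Dict[str, List[int]]                      # lexeme -> positions
PairIndex = Dict[Tuple[str, str], Dict[int, Tuple[int, int]]]


def build_pair_index(docs: Dict[int, Positions]) -> PairIndex:
    """Map (first, second) -> {doc_id: (min distance, max distance)}."""
    index: PairIndex = defaultdict(dict)
    for doc_id, positions in docs.items():
        for a, pos_a in positions.items():
            for b, pos_b in positions.items():
                if a == b:
                    continue
                # Only count occurrences of b that come after a.
                dists = [q - p for p in pos_a for q in pos_b if q > p]
                if dists:
                    index[(a, b)][doc_id] = (min(dists), max(dists))
    return index


def candidates(index: PairIndex, a: str, b: str, cmp: str, n: int) -> List[int]:
    """Documents that may match `a #cmp n b`, judged from (min, max) alone."""
    hits = []
    for doc_id, (lo, hi) in index.get((a, b), {}).items():
        if cmp == "<" and lo < n:          # exact: the smallest distance decides
            hits.append(doc_id)
        elif cmp == ">" and hi > n:        # exact: the largest distance decides
            hits.append(doc_id)
        elif cmp == "=" and lo <= n <= hi: # necessary only: needs a recheck
            hits.append(doc_id)
    return sorted(hits)


docs = {
    1: {"foo": [1], "bar": [4]},
    2: {"foo": [1, 20], "bar": [3, 30]},
    3: {"bar": [1], "foo": [2]},
}
index = build_pair_index(docs)
print(index[("foo", "bar")])                       # {1: (3, 3), 2: (2, 29)}
print(candidates(index, "foo", "bar", "<", 3))     # [2]
print(candidates(index, "foo", "bar", "=", 5))     # [2], a false positive
```

The min/max pair answers `#<N` and `#>N` exactly, but for `#N` it can only narrow the candidates: document 2 has distances 2, 10 and 29, so it passes the range test for 5 without actually matching. The sketch also stores an entry for every ordered pair of distinct lexemes in a document, which grows quadratically with the number of distinct lexemes.
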
>
> Thanks,
> -Sushant.
>
> On Thu, 2008-06-05 at 19:37 +0400, Teodor Sigaev wrote:
>>> I can add index support and support for arbitrary distance between
>>> lexemes.
>>> It appears to me that supporting arbitrary boolean expressions will be
>>> complicated. Can we pull out something from TSQuery?
>>
>> I don't much like the idea of a separate interface for phrase search. Your
>> patch could be a module, used by people who really want phrase search.
>>
>> Introducing a new operator in tsquery lets us reuse the existing tsquery
>> infrastructure, such as concatenation (&&, ||, !!), the rewrite subsystem,
>> etc. New operations/types designed specifically for phrase search would
>> mean doing all that work again.
>>
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
