Wednesday, July 16, 2008

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

On Wed, 16 Jul 2008, Sushant Sinha wrote:

> I will add test queries and their results for the corner cases in a
> separate file. I guess the only thing I am confused about is what the
> behavior of headline generation should be when query items contain words
> shorter than ShortWord. I guess the answer is to ignore the ShortWord
> parameter, but let me know if the answer is any different.
>

ShortWord is about the headline text; it doesn't affect words in the query,
so you can't discard them from the query.
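The PostgreSQL documentation describes ShortWord as dropping words of that length or less at the start and end of a headline unless they are query terms. Below is a minimal Python sketch of that rule only. It assumes a hypothetical word list of `(text, is_query_word)` pairs, and `droppable` and `trim_edges` are illustrative names, not functions from the patch or from PostgreSQL's C source. It reuses the text and query from Oleg's corner case quoted below.

```python
# Hypothetical model: each word is (text, is_query_word). ShortWord trimming
# only touches the edges of a fragment and never drops a query word.

def droppable(word, short_word):
    text, is_query = word
    return not is_query and len(text) <= short_word

def trim_edges(words, start, end, short_word=3):
    """Shrink the inclusive range [start, end] by dropping short,
    non-query words from both ends; returns the new (start, end)."""
    while start < end and droppable(words[start], short_word):
        start += 1
    while end > start and droppable(words[end], short_word):
        end -= 1
    return start, end

# Text '1 2 3 4 5 1 2 3 1' with query '1&3': every word is "short",
# but only the non-query words at the edges are removed.
query = {"1", "3"}
words = [(t, t in query) for t in "1 2 3 4 5 1 2 3 1".split()]
s, e = trim_edges(words, 1, 6)
print(" ".join(t for t, _ in words[s:e + 1]))  # prints "3 4 5 1"
```

Short words inside the fragment ("4", "5") survive because only the edges are trimmed, and short query words ("1", "3") are never trimmed at all.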

> -Sushant.
>
> On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
>> Sushant,
>>
>> First, please provide simple test queries that demonstrate correct behavior
>> in the corner cases. This will help reviewers test your patch and
>> help you make sure your new version is OK. For example:
>>
>> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
>> ts_headline
>> ------------------------------------------------------
>> <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
>>
>> This select breaks your code:
>>
>> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
>> ts_headline
>> --------------
>> ... 2 ...
>>
>> and so on ....
>>
>>
>> Oleg
>> On Tue, 15 Jul 2008, Sushant Sinha wrote:
>>
>>> Attached a new patch that:
>>>
>>> 1. fixes the previous bug
>>> 2. better handles the case when the cover size is greater than MaxWords.
>>> Basically it divides a cover greater than MaxWords into fragments of
>>> MaxWords words, resizes each such fragment so that each end of the fragment
>>> is a query word, and then evaluates the best fragments based on the number
>>> of query words in each fragment. In case of a tie it picks the smaller
>>> fragment. This allows more query words to be shown with multiple fragments
>>> when a single cover is larger than MaxWords.
>>>
>>> The resizing of a fragment such that each end is a query word provides room
>>> for stretching both sides of the fragment. This (hopefully) better presents
>>> the context in which query words appear in the document. If a cover is
>>> smaller than MaxWords then the cover is treated as a fragment.
>>>
>>> Let me know if you have any more suggestions or anything is not clear.
>>>
>>> I have not yet added the regression tests. The regression test suite seemed
>>> to be only ensuring that the function works. How many tests should I be
>>> adding? Is there any other place that I need to add different test cases for
>>> the function?
>>>
>>> -Sushant.
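The fragment selection described in the July 15 patch note above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patch itself: positions are word indices, `is_query` marks query words, a cover is an inclusive `(start, end)` range, and `split_cover` and `best_fragments` are hypothetical names. It does not model the later stretching of fragments outward, MinWords, or how several covers are combined.

```python
def split_cover(is_query, cover_start, cover_end, max_words):
    """Split an inclusive cover into chunks of at most max_words words,
    shrink each chunk so both ends are query words, and return
    (start, end, query_count) for every chunk that contains a query word."""
    if cover_end - cover_start + 1 <= max_words:
        chunks = [(cover_start, cover_end)]  # small cover: one fragment
    else:
        chunks = [(s, min(s + max_words - 1, cover_end))
                  for s in range(cover_start, cover_end + 1, max_words)]
    fragments = []
    for s, e in chunks:
        hits = [i for i in range(s, e + 1) if is_query[i]]
        if hits:  # a chunk without query words adds nothing
            fragments.append((hits[0], hits[-1], len(hits)))
    return fragments

def best_fragments(fragments, max_fragments):
    """Rank by query-word count (descending); on a tie prefer the
    shorter fragment. Return the winners in document order."""
    ranked = sorted(fragments, key=lambda f: (-f[2], f[1] - f[0]))
    return sorted(ranked[:max_fragments])

words = "1 2 3 4 5 1 2 3 1".split()
is_query = [w in {"1", "3"} for w in words]
frags = split_cover(is_query, 0, 8, max_words=4)
best = best_fragments(frags, max_fragments=2)
print(" ... ".join(" ".join(words[s:e + 1]) for s, e, _ in best))
```

Treating the whole of Oleg's corner-case string as one cover with a hypothetical MaxWords of 4, the sketch picks the fragments `1 2 3` and `1 2 3`. That is only what this sketch computes, not what the patched `ts_headline` returns.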
>>>
>>>
>>>> Nice. But it would be good to resolve the following issues:
>>>> 1) The patch contains mistakes; I didn't investigate or read it carefully. Get
>>>> http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and load it into a db.
>>>>
>>>> Queries
>>>> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
>>>> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
>>>>
>>>> and
>>>>
>>>> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
>>>> from apod;
>>>>
>>>> crash postgresql :(
>>>>
>>>> 2) Please include documentation and regression tests in your patch.
>>>>
>>>>
>>>>> Another change I was thinking about:
>>>>>
>>>>> Right now if cover size > max_words then I just cut the trailing words.
>>>>> Instead I was thinking that we should split the cover into more
>>>>> fragments such that each fragment contains a few query words. Then each
>>>>> fragment will not contain all query words but will show more occurrences
>>>>> of query words in the headline. I would like to know what your opinion
>>>>> on this is.
>>>>>
>>>>
>>>> Agreed.
>>>>
>>>>
>>>> --
>>>> Teodor Sigaev E-mail: teodor@sigaev.ru
>>>> WWW: http://www.sigaev.ru/
>>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>

Regards,
Oleg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
