Wednesday, July 16, 2008

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

I will add test queries and their results for the corner cases in a
separate file. I guess the only thing I am confused about is what should
be the behavior of headline generation when Query items have words of
size less than ShortWord. I guess the answer is to ignore ShortWord
parameter but let me know if the answer is any different.

-Sushant.

On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
> Sushant,
>
> first, please, provide simple test queries, which demonstrate the right work
> in the corner cases. This will helps reviewers to test your patch and
> helps you to make sure your new version is ok. For example:
>
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
> ts_headline
> ------------------------------------------------------
> <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
>
> This select breaks your code:
>
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
> ts_headline
> --------------
> ... 2 ...
>
> and so on ....
>
>
> Oleg
> On Tue, 15 Jul 2008, Sushant Sinha wrote:
>
> > Attached a new patch that:
> >
> > 1. fixes previous bug
> > 2. better handles the case when cover size is greater than the MaxWords.
> > Basically it divides a cover greater than MaxWords into fragments of
> > MaxWords, resizes each such fragment so that each end of the fragment
> > contains a query word and then evaluates best fragments based on number of
> > query words in each fragment. In case of tie it picks up the smaller
> > fragment. This allows more query words to be shown with multiple fragments
> > in case a single cover is larger than the MaxWords.
> >
> > The resizing of a fragment such that each end is a query word provides room
> > for stretching both sides of the fragment. This (hopefully) better presents
> > the context in which query words appear in the document. If a cover is
> > smaller than MaxWords then the cover is treated as a fragment.
> >
> > Let me know if you have any more suggestions or anything is not clear.
> >
> > I have not yet added the regression tests. The regression test suite seemed
> > to be only ensuring that the function works. How many tests should I be
> > adding? Is there any other place that I need to add different test cases for
> > the function?
> >
> > -Sushant.
> >
> >
> > Nice. But it will be good to resolve following issues:
> >> 1) Patch contains mistakes, I didn't investigate or carefully read it. Get
> >> http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz<http://www.sai.msu.su/%7Emegera/postgres/fts/apod.dump.gz>and load in db.
> >>
> >> Queries
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
> >>
> >> and
> >>
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod;
> >>
> >> crash postgresql :(
> >>
> >> 2) pls, include in your patch documentation and regression tests.
> >>
> >>
> >>> Another change that I was thinking:
> >>>
> >>> Right now if cover size > max_words then I just cut the trailing words.
> >>> Instead I was thinking that we should split the cover into more
> >>> fragments such that each fragment contains a few query words. Then each
> >>> fragment will not contain all query words but will show more occurrences
> >>> of query words in the headline. I would like to know what your opinion
> >>> on this is.
> >>>
> >>
> >> Agreed.
> >>
> >>
> >> --
> >> Teodor Sigaev E-mail: teodor@sigaev.ru
> >> WWW:
> >> http://www.sigaev.ru/
> >>
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: