Wednesday, June 18, 2008

Re: [GENERAL] Problem with volatile function

Artacus wrote:
> So my understanding of volatile functions is that volatile functions can
> return different results given the same input.
>
> I have a function random(int, int) that returns a random value between
> $1 and $2. I want to use it in a query to generate values. But it only
> evaluates once per query and not once per row like I need it to.
>
> -- This always returns the same value
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
> ts.gender = f_name.gender
> WHERE f_name.counter = random(1,300)

Personally I'd expect that to only evaluate once. It's saying "where
f_name.counter in this row is equal to some single random value
generated at the start of the query". The parameters of the random()
function do not depend on the input, so Pg evaluates it once rather than
with each row. That's not to do with volatility; rather, it's the
semantics of your query.

It's like the difference between correlated and uncorrelated subqueries.
A subquery that doesn't reference outside context is only evaluated
once, ie it's uncorrelated. Only if the subquery contains references to
values in rows in the outside query is it evaluated once for each row.

If Pg re-evaluated your random() function for every record, how would
you then write "fetch all records that have f_name.counter equal to the
same randomly selected value" ?

You can force Pg to re-evaluate random() by adding a dummy parameter
that depends on the input record, or (probably better) by writing a
variant of it that tests the input against a randomly generated value
and returns a boolean. Eg:

SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
FROM tmp_students ts
JOIN names AS f_name ON
ts.gender = f_name.gender
WHERE random_equals(1,300,f_name.counter)


random_equals would be called once per input record. It would generate a
random number between 1 and 300, and test the result to see if it was
equal to the argument value f_name.counter, then return the result of
the test.

Alternately, you could pair each result up with a random value and
re-check the results with a subquery (slower):

SELECT sis_id, bldg_id, fname, lname
FROM
(SELECT
ts.sis_id,
bldg_id,
f_name.name AS fname,
l_name.name AS lname,
f_name.counter AS counter,
random(1,300) AS r
FROM tmp_students ts
JOIN names AS f_name ON
ts.gender = f_name.gender
)
AS x
WHERE r = counter;

... but that's pretty ugly.

> --As does this
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
> ts.gender = f_name.gender
> AND ts.counter = random(1,100)

Same deal as above.

>
> -- This generates different numbers
> SELECT random(1,100), s.*
> FROM usr_students s

... because you've used random() in the result list. That's where
VOLATILE and STABLE etc will make a difference.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: