Wednesday, June 18, 2008

[GENERAL] Database design questions

Hi list.

There are some database design-related issues I've pondered about for some time.

But first:

* Is this the correct list to ask these questions on?

* Should I split this into separate threads instead of 1 thread for
all my questions?

Assuming there isn't a problem, here are my questions:

==========

Question 1: Storing app defaults.

If you have a table like this:

table1
- id
- field1
- field2
- field3

table2
- id
- table1_id
- field1
- field2
- field3

table1 & table2 are setup as 1-to-many.

If I want to start providing user-customizable defaults to the
database (ie, we don't want apps to update database schema), is it ok
database design to add a table2 record, with a NULL table1_id field?

In other words, if table1 has no matching table2 record, then the app
will use the table2 record with a NULL table1_id field to get
defaults.

This looks messy however. Is there a better way to do it?

A few other ways I can think of:

1) Have an extra table1 record (with string fields containing
'DEFAULT'), against which the extra table2 record is linked.

2) Have a new table, just for defaults, like this:

table2_defaults
- field1
- field2
- field3

Which is the cleanest way? Is there another method I should use instead?

==========

Question 2: Backwards-compatible field addition

If you have an existing table, and apps which use it, then how do you
add new fields to the table (for new apps), but which might affect
existing apps negatively?

eg: I start with a table like this:

table1
- id
- field1
- field2
- field3

Later, I want to add a use case, where there is new behaviour, if a
new field is set in the table, like this:

table1
- id
- field1
- field2
- field3
- field4 - NEW - if unset, do old behaviour. if set, do something else

The problem is, that existing apps (besides your new app) won't know
about field4, so they will keep using the old behaviour for new
records (where field4 is set), which you don't want.

The most obvious thing to do is to update all apps using table1, so
they also check the value of field4.

Is there another, more backwards-compatible way to add field4 for the
new behaviour, without having to update all the apps?

A few things I can think of:

1) table1 becomes a view of an updated table, with a 'WHERE field4 IS
NULL' clause.

Problem with this is that some RDBMS (Postgresql specifically) don't
let you run update statements on views.

2) Apps use stored procedures for all database access.

Maybe ok for new apps, not so much for existing apps which use regular SQL.

3) All apps use the same library for accessing database

Then you update the library and all apps automagically know about the
extra field. Again, maybe ok for new apps, not so much for existing
apps.

4) Make a new table (copy of the old one), with the extra field.

Then your app checks both tables, or just the new one if applicable.

This can work, but you may end up with a lot of app-specific tables,
where the main difference between the tables is extra columns, and
which apps use the tables.

5) Have a 'db version' column in the table. Older apps only operate on
records at or before the version the programmer knew about at the
time.

This can work, but it seems like a very non-standard, hackish way of
designing database tables. Also it's a pain for all apps to have to
hardcode a db version number.

6) Find a clever way to use table inheritance

I haven't thought it through, but here are some docs I've read on the subject:

http://www.postgresql.org/docs/8.1/static/ddl-inherit.html

Any other ideas?

==========

Question 3: Temporal databases

http://en.wikipedia.org/wiki/Temporal_database

I haven't used them before, but I like the idea of never
deleting/updating records so you have a complete history (a bit like
source code version control).

How well do temporal databases work? Do RDBMS (ie Postgresql) need
add-ons to make it effective, or can you just add extra temporal
columns to all your tables and add them to your app queries? Does this
increase app complexity and increase server load a lot?

Are there Python libraries which simplify this? (eg: add-ons for
Elixir or SQLAlchemy).

Or should apps all implement their own 'temporal data access' module,
which transparently uses the current date & time until queried for
historical data?

==========

Question 4: Data synchronization

2 cases I'm interested in:

1) Migrating data from one database to another

2) Distributing data over many databases, and later merging

In what ways can you design tables to easier facilitate the above cases?

I am aware of multi-master replication software, as described here:

http://en.wikipedia.org/wiki/Multi-master_replication

For this question, I'm more interested in schema design, so that a
home-brewed database synchronization can perform synchronization.

I have some experience with this. One of my previous projects was to
reliably migrate data from one database to another, where the 2
databases had problems like:

- Many foreign keys weren't enforced

- Some fields needed special treatment (eg: should be unique, or
behave like a foreign key ref, even if db schema doesn't specify it.
In other cases they need to be updated during the migration).

- Most auto-incrementing primary keys (and related foreign key
references) needed to be updated during migration, because they are
already used in the destination database for other records.

- Many tables are undocumented, some fields have an unknown purpose

- Some tables didn't have fields that can be used as a 'natural' key
for the purpose of migration (eg: tables which only exist to link
together other tables, or tables where there are duplicate records).

I wrote a Python script (using SQLAlchemy and Elixir) to do the above
for our databases.

Are there any existing migration tools which could have helped with
the above? (it would have required a *lot* of user help).

Are there recommended ways of designing tables so that synchronization
is easier?

The main thing I've read about is ensuring that all records have a
natural key of some kind, eg GUID. Also, your migration app needs to
have rules for conflict resolution.

==========

Thanks in advance for any suggestions :-)

David.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: