Benefits of a primary key with GoldenGate

GoldenGate supports tables with and without a primary key, but if you’re replicating tables that don’t have a primary key, there are some important factors to consider.

Take my sample table, created as follows.

create table size_test (idno number, data1 varchar2(50), data2 varchar2(4000), data3 varchar2(50), data4 varchar2(30));
alter table size_test add constraint pk_size_test primary key (idno);

This table perhaps overemphasises the problem because it has a lot of character columns (and in particular one large one) but this is often the case with real world data.

I populated it with 200,000 rows as follows –

create sequence seq_sizetest;

begin
  for counter in 1..200000
  loop
    insert into size_test values (
      seq_sizetest.nextval,
      dbms_random.string('A',dbms_random.value(0,50)),
      dbms_random.string('A',dbms_random.value(0,4000)),
      dbms_random.string('A',dbms_random.value(0,50)),
      dbms_random.string('A',dbms_random.value(0,30)));
  end loop;
  commit;
end;
/

The table ends up being 560MB in size, and the primary key is 4MB in size.
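For reference, those figures came from the data dictionary; something like this, run as the table owner, reproduces them.

select segment_name, round(bytes/1024/1024) mb
from user_segments
where segment_name in ('SIZE_TEST', 'PK_SIZE_TEST');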

Trail File Size

First of all, I ran 1000 updates using the primary key as follows (minimum and maximum key values taken from my data beforehand).

begin
  for counter in 1..1000
  loop
    update size_test set data3 = dbms_random.string('A',40) where idno = trunc(dbms_random.value(1514,201514));
  end loop;
  commit;
end;
/

This completes quickly on the source and generates 152KB of trail files.

Now I drop the primary key and re-create the supplemental log groups (GoldenGate is now forced to log every column and reports OGG-00869 when running the add trandata command). I also stop and start the extract so that we create a new trail file. We run another 1000 row update.
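The drop and trandata refresh steps look something like this (the GGSCI login details are placeholders for your own GoldenGate user).

SQL> alter table size_test drop constraint pk_size_test;

GGSCI> dblogin userid <gg_user>, password <password>
GGSCI> delete trandata test.size_test
GGSCI> add trandata test.size_test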

update size_test set data3 = dbms_random.string('A',40) where rownum < 1001;

Again it runs quite quickly on the source (I’ve used a different update operation to avoid 1000 full table scans). This time the trail file is 4276KB in size – nearly 30 times larger than when the primary key was in place.

None of this should be a great surprise. GoldenGate has to log every column if a primary key isn’t present. Clearly that is going to take far more space in the trail file.

Performance on target database

Another concern is how the data is applied on the target. Our non-PK source update ran quite quickly – it was a set based operation on the top 1000 rows of the table and just took a second or two to run. On the target though, it’ll be a different matter entirely.

Replicat is going to run 1000 separate update statements, something like this.

update size_test set data3 = :record.data3
 where idno  = :record.idno
   and data1 = :record.data1
   and data2 = :record.data2
   and data3 = :record.data3
   and data4 = :record.data4;

There is no index on these predicates so it is going to cause Oracle to perform 1000 full table scans on our 560MB table. Not good. The lag will increase, the load on the target database will clearly increase and people will wonder why their data isn’t replicating in a timely manner.

Solutions

So to fix this situation you could create an all-column index on the target database. That way when Replicat runs the update statement with every column in the predicates, it at least has an index to run through. This is probably not a particularly desirable solution though. The index will probably be as big as the table itself (568MB for the index in my case vs the 560MB table size) and it is going to take a long time to build.
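If you did go down this route, the index itself is a one-liner (the index name here is my own invention).

create index ix_size_test_all on size_test (idno, data1, data2, data3, data4);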

Another solution is to put a primary key (or at least a unique index) in place. MOS article 1271578.1 explains how to create a column on the source which will be populated by an Oracle-generated unique value. You can configure it to have a default value so that you don’t need to change your application, and you can back-populate the data without locking the entire table.
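I won’t reproduce the MOS note here, but the general shape of that approach – with column and constraint names of my own invention – is along these lines.

alter table size_test add (gg_row_id raw(16) default sys_guid());
-- back-populate existing rows (in batches on a big table)
update size_test set gg_row_id = sys_guid() where gg_row_id is null;
commit;
alter table size_test add constraint uk_size_test_ggid unique (gg_row_id);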

Yet another solution is to tell GoldenGate about a business rule which performs as if it were a primary key. Say for example we know that IDNO is unique but we’re not allowed to create a primary or unique key index on it. In this case we can add the supplemental logging as follows.

GGSCI> add trandata test.size_test nokey cols idno

This tells GoldenGate we don’t want it to use the primary key to calculate which columns to include (because we know it’ll choose to use all columns) and we’re telling it that we only want to use the IDNO column. This will create a supplemental log group with just the IDNO in it. Next we configure the extract to tell it about this business-rule key.

TABLE TEST.SIZE_TEST, KEYCOLS (IDNO);
TABLE TEST.*;

We’ll want to put something similar in for Replicat in the MAP clause, but we’re still going to cause a full table scan for each update on the target unless we can create an index on this column. Perhaps your business is more relaxed about the modifications which can be made on the target database?
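The Replicat side might look like this (assuming the schema names match the source), together with an ordinary index on the key column if you’re allowed to create one.

MAP TEST.SIZE_TEST, TARGET TEST.SIZE_TEST, KEYCOLS (IDNO);
MAP TEST.*, TARGET TEST.*;

And on the target database:

create index ix_size_test_idno on size_test (idno);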

Summary

So if you’re replicating tables which don’t have a primary key, you need to take into account the following.

– Trail file (and redo) size. Every update logs every column of the row, in an uncompressed format.
– Replicat performance. Every update on the target is going to generate a full table scan.


Comments

  • Paul Steffensen  On November 19, 2013 at 12:42 AM

    Assuming the test.size_test table had no primary key or unique indexes then in your extract parameter file if you had:

    TABLE test.size_test, KEYCOLS(idno);

    Then according to the manual:

    “A KEYCOLS clause is checked when processing starts and prevents ADD TRANDATA from logging all of the columns of the table when it determines there is no primary or unique key.”

    I’m not sure how it would do this assuming GoldenGate has already logged all columns in the redo log as a result of ADD TRANDATA test.size_test command.
    Maybe the extract process just takes columns listed in the KEYCOLS. That would reduce the size of the trail file but the redo logs would still be large.

    If I have time I’ll try and play with this parameter combination.
    Great blog by the way.

    • matthewdba  On November 19, 2013 at 7:54 AM

      Hi Paul,

      I see the paragraph you’re referencing in the 11.1.1.1 documentation but it isn’t clear to me what they mean exactly or how it would work in practice. The KEYCOLS clause in the extract parameter file isn’t evaluated at the time you’re adding supplemental log groups via the add trandata command.

      I see this paragraph has been removed in subsequent GoldenGate documentation, so I assume they realised it was slightly confusing.

      Using “add trandata <owner>.<table> nokey cols (col1, col2, colN)” in conjunction with the extract parameter KEYCOLS does the job nicely though.

      Thanks for the comments and for reading!
