GoldenGate – bounded recovery and log retention

GoldenGate Extract includes a feature called Bounded Recovery, but you still need to keep a certain amount of historical archive logs on disk. The GoldenGate documentation recommends keeping 8 hours of archive logs on disk, and this article explains why that is important.

I had a situation recently where a customer was stopping the GoldenGate Extract process in the evening so that some business processing could take place. It was restarted about an hour later, but Extract then started looking for archive logs which were 9 hours old. These archive logs had been backed up and deleted by then, and the question was naturally raised as to why GoldenGate needs such old archive logs.

GoldenGate only writes committed transactions to its trail files, but this doesn’t mean it isn’t working with your uncommitted transactions in the meantime. It is constantly reading ahead in the redo for the uncommitted records, caching them in various temporary files on disk so that it can simply merge them into the trail file once they’re committed. This is fine, but when Extract is stopped while a transaction is still open (and so has not yet been merged into the trail file), Bounded Recovery steps in to reduce the time taken to pick up where it left off.

A simple explanation of bounded recovery is that it records the state of long running (uncommitted) transactions in its own state files, so that it isn’t necessary to read back through hours of archive logs in the event that the Extract process is stopped. By default, it does this every 4 hours (governed by the BR checkpoint interval), and a transaction is considered long running if it is older than this 4 hour period.

It is important to understand though that as far as bounded recovery goes, a transaction doesn’t become long running as soon as it is older than 4 hours. It is only considered long running when it is older than 4 hours at the next BR checkpoint. Therefore, in a worst case scenario it might be 3h 59m 59s old at the first BR checkpoint and therefore 7h 59m 59s old at the second checkpoint (and only then recorded). This is why Oracle says you should keep 8 hours of archive logs on disk.
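The worst case arithmetic above can be sketched in a few lines of Python. This is only a simplified model of the rule as described in this article, not GoldenGate code: the function name `age_when_first_recorded` is made up for illustration, and it assumes BR checkpoints fire at exactly regular intervals.

```python
from datetime import timedelta


def age_when_first_recorded(age_at_next_checkpoint: timedelta,
                            interval: timedelta) -> timedelta:
    """Return how old a still-open transaction is at the BR checkpoint
    that first records it (hypothetical helper, simplified model).

    A transaction is only recorded at a checkpoint if it is older than
    the interval at that moment; otherwise it waits a full interval
    for the next checkpoint.
    """
    age = age_at_next_checkpoint
    while age <= interval:
        age += interval
    return age


interval = timedelta(hours=4)
just_missed = timedelta(hours=3, minutes=59, seconds=59)
print(age_when_first_recorded(just_missed, interval))  # 7:59:59
```

However small the gap by which a transaction misses one checkpoint, it is carried unrecorded for a whole further interval, which is where the "2 x interval" retention figure comes from.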

Take the following example –

A long running transaction started at 12:49 and the next BR checkpoint was at 16:18, at which time the transaction was 3h29m old. Therefore it wasn’t saved in the BR state files. The Extract process was then stopped at 20:08, ten minutes before the following BR checkpoint was due, and restarted around an hour later. As the 12:49 transaction was never saved into the BR state files, Extract requires logs which are over 8 hours old.

If the customer had waited another 10 minutes to stop the Extract process, they would have been fine and only required 4 hours of archive logs to be on disk!
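The timeline of this example can be checked with a short Python sketch. It is a simplified model of the behaviour described above rather than anything GoldenGate itself runs: the helper `recorded_before_stop` is hypothetical, the calendar date is arbitrary, the restart time of 21:08 is taken from "around an hour later", and BR checkpoints are assumed to fire exactly every 4 hours from 16:18.

```python
from datetime import datetime, timedelta

BR_INTERVAL = timedelta(hours=4)  # default interval as described in the article


def recorded_before_stop(txn_start, first_checkpoint, stop_time,
                         interval=BR_INTERVAL):
    """True if some BR checkpoint at or before stop_time saw the
    transaction older than the interval (hypothetical helper)."""
    checkpoint = first_checkpoint
    while checkpoint <= stop_time:
        if checkpoint - txn_start > interval:
            return True
        checkpoint += interval
    return False


day = datetime(2000, 1, 1)  # arbitrary date, only the times matter
txn_start = day.replace(hour=12, minute=49)
first_checkpoint = day.replace(hour=16, minute=18)
stop_time = day.replace(hour=20, minute=8)
restart_time = day.replace(hour=21, minute=8)  # assumed: one hour after the stop

# Stopped at 20:08: the 20:18 checkpoint never ran, so nothing was recorded.
print(recorded_before_stop(txn_start, first_checkpoint, stop_time))
# Stopped ten minutes later: the 20:18 checkpoint records the transaction.
print(recorded_before_stop(txn_start, first_checkpoint,
                           stop_time + timedelta(minutes=10)))
# Age of the oldest redo needed at restart when nothing was recorded.
print(restart_time - txn_start)  # 8:19:00
```

In this model the only thing that changes between the two runs is whether the 20:18 checkpoint happened before the stop, which matches the ten minute difference noted above.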

There are actually a few possible solutions to this problem –

1. Ensure that at least 8 hours of archive logs are available on disk. This is the ideal solution as it requires no configuration changes. If bounded recovery fails, you’re potentially going to need older logs (check the output of INFO EXTRACT with the SHOWCH option), in which case you would have to restore them from backups – this is unavoidable.

2. Reduce the BR checkpoint interval. This is not recommended, and Oracle recommends opening a support ticket before making this change. There is some overhead in making a BR checkpoint (and much of the work is re-processed for long running transactions each time this checkpoint runs). It is a high price to pay just to save some disk space.

3. Issue a manual “immediate” checkpoint prior to shutting down the Extract process (in GGSCI, the BR BRCHECKPOINT IMMEDIATE option of SEND EXTRACT). This can potentially halve the amount of archive logs you need to retain.

The best advice is to not change anything, and just follow the simplified advice of keeping at least 8 hours of archive logs on disk.


Comments

  • Paul Steffensen  On May 27, 2014 at 10:22 PM

    Good post Matt,
    The key point, as you’ve said, is that a transaction only becomes “long running” if it’s older than the BR interval at the next BR checkpoint. Hence the recommendation to keep 2 x BR interval of archive logs available.
