Using checkpoint table is the preferred approach.To avoid specifying checkpoint table name each time creating a new replicat, you can set the checkpoint table info in the ./GLOBALS file. (i.e. CHECKPOINTTABLE test.checkpoint_tbl)
Checkpoints are used in all of the GoldenGate processes [3]:
- Extract (Integrated): checkpoints are kept in both the source database and the checkpoint file (*.cpe file). The checkpoint files are used with higher priority. There are three types of checkpoints: Recovery Checkpoint showing the start of the oldest unprocessed (open) transaction, Read Checkpoint (or Current Checkpoint) indicating the current read position related to the database log file, and Write Checkpoint showing the last transaction position that is saved into the trail file.
- Pump: checkpoints are kept in the checkpoint file. There are two types of checkpoints: Read Checkpoint showing the current read position from the trail file, and Write Check showing the last transaction that are saved to the trail file on the target.
- Replicat: the checkpoint can be stored in a database or a checkpoint file ( *.cpr file). The checkpoint table is shared by all replicat groups in one Oracle GoldenGate instance. There is one checkpoint: Read Checkpoint showing the position of the last record in the trail file read by the replicat and has been delivered to the target. For recoverability, you also need to know high watermark (LOG_CSN) and low watermark of replicat(LOG_CMPLT_CSN). Any transaction with a CSN greater than the high watermark has not been processed, and any transaction with a CSN lower than the low watermark has already been processed. The role of replicat checkpoint is when the replicat process is restarted, Oracle GoldenGate can apply only transactions since the last checkpoint.
- Checkpoint File: a binary file (*.cpe - Extract, *.cpb, *.cpr - Replicat) is stored in the $OGG_HOME/dirchk/ directory.
- Checkpoint Table: a table created in the database. The checkpoint table can be shared across processes or per each process.The advantage of the checkpoint table is that the table can be flashed back along with the data.
Yes, but you have to turn off the checkpoint table option with the NODBCHECKPOINT when adding the replicat. For example: "ADD REPLICAT sales, EXTTRAIL dirdat\rt, NODBCHECKPOINT"
The followings are examples on how to create and manage checkpoint tables for the replicat.
2.1 Create Checkpoint Table
GGSCI 1> dblogin sourcedb test@localhost:3305 userid root password mysql Successfully logged into database. GGSCI 2> create checkpointtable checkpoint_56_tbl Successfully created the checkpoint table checkpoint_56_tbl.
GGSCI 6> info checkpointtable checkpoint_tbl Checkpoint table checkpoint_tbl created 2014-07-22 20:29:17.
GGSC 1> dblogin sourcedb test@localhost:3305 userid root password mysql Successfully logged into database. GGSCI 2> delete checkpointtable checkpoint_56_tbl This checkpoint table may be required for other installations. Are you sure you want to delete this checkpoint table? yes Successfully deleted checkpoint table checkpoint_56_tbl.
mysql> desc checkpoint_tbl +----------------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------+---------------+------+-----+---------+-------+ | group_name | varchar(8) | NO | PRI | NULL | | | group_key | decimal(19,0) | NO | PRI | NULL | | | seqno | decimal(10,0) | YES | | NULL | | | rba | decimal(19,0) | NO | | NULL | | | audit_ts | varchar(29) | YES | | NULL | | | create_ts | datetime | NO | | NULL | | | last_update_ts | datetime | NO | | NULL | | | current_dir | varchar(255) | NO | | NULL | | | log_bsn | varchar(128) | YES | | NULL | | | log_csn | varchar(128) | YES | | NULL | | | log_xid | varchar(128) | YES | | NULL | | | log_cmplt_csn | varchar(128) | YES | | NULL | | | log_cmplt_xids | varchar(2000) | YES | | NULL | | | version | decimal(3,0) | YES | | NULL | | +----------------+---------------+------+-----+---------+-------+ 14 rows in set (0.00 sec)
Let's discuss how checkpoints are used during the Oracle GoldenGate recovery. The example scenario is:
- The network connection between the source and target is broken. Oracle GoldenGate data pump abends after several retries for the reconnection(defined in the tcperrs files [4]) . The extract keeps running and creating new trail files. After some time of this network outage, the machine hosting the GoldenGate extract runs out of disk space, and the Oracle GoldenGate extract process abends.
- Frist, the network connect is fixed. The pump is restarted so that the existing trail files then can be delivered to the target database. The pump process will use the write checkpoint on the target to resend the trail files. After the transactions are applied to the target, the old trail files are then purged. (If fixing network connect takes time and you have to start GoldenGate capture immediately, you can consider to add new disk spaces. )
- Restart the extract. The extract will scan the source redo/archive log files from the recovery checkpoint to recover long running open transactions. After reaching to the read checkpoint, all transactions are then captured, the new trail files are created, and the write checkpoint moves forward.
- Make sure no trail file is purged before replication completed on the target. You need to configure GoldenGate manager to purge trail files based on checkpoints using PURGEOLDEXTRACTS ...USECHECKPOINTS configuration.
- Make sure you keep all the log files/archive log files till the recovery checkpoint. For Oracle database, integrated capture by default enables RMAN to hold on source DB from removing archive logs if they are still needed by Oracle GoldenGate replication.
- Introducing Oracle Data Integrator and Oracle GoldenGate by Marco Ragogna
- Oracle GoldenGate Extract Recovery Process.
- About Checkpoints (Oracle GoldenGate 12.1.2 Documentation Appendix E)
- Oracle GoldenGate: Automated Recovery From TCP/IP Network Errors, Oracle A-Team Blog
- Application checkpointing, Wikipidia