DB2 i allows users to define a character set for each table column, which is also called the Column-level Character Set Encoding. For example, the following table in DB2 i has two character encodings:
CREATE TABLE DEPARTMENT (DEPTNO CHAR(3) CCSID 37 NOT NULL, DEPTNAME VARCHAR(36) CCSID 37 NOT NULL, DEPTNAMECH VARCHAR(20) CCSID 935 NOT NULL, PRENDATE DATE DEFAULT NULL)
The DEPTNO and DEPTNAME are in with US English (CCISD 37). The DEPTNAMECH is in simplified Chinese (CCSID 935). In this blog, let's me discuss how you can configure Oracle GoldenGate to correctly replicate data from DB2 i database with column-level character set encodings.
Understanding the Universal Character Set Encoding
Before we discuss how to use Oracle GoldenGate, let's look at the Universal Character Set Encoding used by DB2 databases including DB2i, DB2 LUW, and DB2 z/OS.
The Universal Character Set Encoding defines code pages which are tables mapping of characters' alphanumeric code to its binary representation. The group of code pages also define the well known American Standard Code for Information Interchange (ASCII), Extended Binary Coded Decimal Information Code (EBCDIC) and Unicode character encoding standard (UNICODE) character encoding. Code pages are independent of each other. Therefore, the same characters can be represented by different code points (binary representation) in different CCSIDs.
In the mainframe (z/OS, OS/390®) and iSeries (i5/OS™, OS/400®), the term Coded Character Set Identifier (CCSID) is used instead of the code page. A CCSID is a 16-bit unsigned integer that uniquely identifies a particular code page. [1] The followings are some commonly used CCSID:
Before we discuss how to use Oracle GoldenGate, let's look at the Universal Character Set Encoding used by DB2 databases including DB2i, DB2 LUW, and DB2 z/OS.
The Universal Character Set Encoding defines code pages which are tables mapping of characters' alphanumeric code to its binary representation. The group of code pages also define the well known American Standard Code for Information Interchange (ASCII), Extended Binary Coded Decimal Information Code (EBCDIC) and Unicode character encoding standard (UNICODE) character encoding. Code pages are independent of each other. Therefore, the same characters can be represented by different code points (binary representation) in different CCSIDs.
In the mainframe (z/OS, OS/390®) and iSeries (i5/OS™, OS/400®), the term Coded Character Set Identifier (CCSID) is used instead of the code page. A CCSID is a 16-bit unsigned integer that uniquely identifies a particular code page. [1] The followings are some commonly used CCSID:
- CCSID 37 - US-English Code Page
- CCSID 273 - German Code Page
- CCSID 1208 - Unicode
- CCSID 65535 - A special value indicating data is hex and should not be converted
Configuring Oracle GoldenGate to Handle Character Set Encoding
For Oracle GoldenGate 12.2 and earlier versions, Oracle GoldenGate for DB2 i by default converts all of the character data to Unicode when capturing the data from the DB2 i databases. However, the conversion impacts the extraction performance.
Starting Oracle GoldenGate 12.3, Oracle GoldenGate for DB2 i starts to support column level character set encoding. By default, the character data are still converted to Unicode. But you can you can use CHARSET and COLCHARSET on the TABLE statement. to overwrite the character set or use COLCHARSET(PASSTHRU) on the TABLE to avoid any character conversion. (Also refer to BINDCHARFORBITASCHAR)
For Oracle GoldenGate 12.2 and earlier versions, Oracle GoldenGate for DB2 i by default converts all of the character data to Unicode when capturing the data from the DB2 i databases. However, the conversion impacts the extraction performance.
Starting Oracle GoldenGate 12.3, Oracle GoldenGate for DB2 i starts to support column level character set encoding. By default, the character data are still converted to Unicode. But you can you can use CHARSET and COLCHARSET on the TABLE statement. to overwrite the character set or use COLCHARSET(PASSTHRU) on the TABLE to avoid any character conversion. (Also refer to BINDCHARFORBITASCHAR)
From GoldenGate 12.3 Documentation [6]:
The Extract TRAILCHARSETUNICODE parameter has been deprecated because the default behavior of a DB2 for I Extract is now (in 12.3) reverted to convert all character data to Unicode. The character conversion efficiency has been significantly improved., however if a table must be sent without conversion, COLCHARSET(PASSTHRU) may be used on the TABLE statement.
The Extract TRAILCHARSETUNICODE parameter has been deprecated because the default behavior of a DB2 for I Extract is now (in 12.3) reverted to convert all character data to Unicode. The character conversion efficiency has been significantly improved., however if a table must be sent without conversion, COLCHARSET(PASSTHRU) may be used on the TABLE statement.
The following is an example configuration in the extract parameter file:
MAP SchemaName.DEPARTMENT, COLCHARSET(ibm-37, DEPTNO ,DEPTNAME), COLCHARSET(ibm-935, DEPTNAMECH)
You can find the supported character set encoding names in Oracle GoldenGate 12.3 documentation: A2 Supported Character Sets - Non-Oracle.
Resources