Performing Data Validation v6.2
The current working directory from which you invoke the Data Validator script runValidation.sh
(runValidation.bat
for Windows) must be the bin subdirectory containing the script (that is, XDB_HOME/bin
).
For example, if the xDB Replication Server is installed into its default directory location, then issue the following command before invoking the Data Validator:
cd /opt/PostgreSQL/EnterpriseDB-xDBReplicationServer/bin
Similarly for Windows hosts, issue the following:
cd C:\Program Files\edb\EnterpriseDB-xDBReplicationServer\bin
The general command format for invoking the Data Validator is the following:
./runValidation.sh { –ss | --source-schema } schema_name [ option ] ...
schema_name
is the name of the schema in the source database containing the tables to be validated. The choices for option are listed later in this section within the Options subsection.
For Windows hosts, the command format is the following:
runValidation { –ss | --source-schema } schema_name [ option ] ...
The following option displays the Data Validator version:
./runValidation.sh { –v | --version }
On Linux the version is displayed as follows:
$ ./runValidation.sh --version EnterpriseDB DataValidator Build 3
On Windows the version is displayed as follows:
C:\Program Files\edb\EnterpriseDB-xDBReplicationServer\bin>runValidation -v EnterpriseDB DataValidator Build 3
The following option displays the help information.
./runValidation.sh { –h | --help }
This is shown by the following:
$ ./runValidation.sh --help Usage: runValidation.sh (-v | --version) | (-h | --help) runValidation.sh (-ss | --source-schema) SOURCE_SCHEMA [OPTIONS] CONNECTION_INFO_FILE OPTIONS: (-ts | --target-schema) target-schema-name (-it | --include-tables) comma-seperated-tables-name (-et | --exclude-tables) comma-seperated-tables-name (-ld | --logging-dir) logging-dir-path (-ds | --display-summary) (true|false) (-srs | --skip-rowsonlyin-source) (true|false) (-srt | --skip-rowsonlyin-target) (true|false) (-srb | --skip-rowsin-both) (true|false) (-fs | --fetch-size) row count (-bs | --batch-size) row count (-sdbms | --source-dbms) source database type (-sh | --source-host) source database server name/IP (-sp | --source-port) source database server port (-sdb | --source-database) source database name (-su | --source-user) source database user id (-spw | --source-password) source database user password (-tdbms | --target-dbms) target database type (-th | --target-host) target database server name/IP (-tp | --target-port) target database server port (-tdb | --target-database) target database name (-tu | --target-user) target database user id (-tpw | --target-password) target database user password
The general syntax for all options except for --version
and --help
is shown by the following:
./runValidation.sh –ss schema [ -ts schema ] [ -it table_1 [,table_2 ] ... ] [ -et table_1 [,table_2 ] ... ] [ -srs { true | false } ] [ -srt { true | false } ] [ -srb { true | false } ] [ -ld log_directory_path ] [ -ds { true | false } ] [ -sdbms database_type ] [ -sh host ] [ -sp port ] [ -sdb dbname ] [ -su user ] [ -spw password ] [ -tdbms database_type ] [ -th host ] [ -tp port ] [ -tdb dbname ] [ -tu user ] [ -tpw password ] [ -bs row_count ] [ -fs row_count ]
For clarity, the preceding syntax diagram shows only the single-character form of the option. The Options subsection lists both the single-character and multi-character forms of the options.
Specification of any database connection option (-sdbms
through -tpw
listed in the preceding syntax diagram) overrides the corresponding parameter in the datavalidator.properties
file. Installation and Configuration for information on the datavalidator.properties
file.
Options
-ss, --source-schema schema
The schema of the source database containing the tables to be compared against the target database.
-ts, --target-schema schema
The schema of the target database containing the tables to be compared against the source database. If omitted, the schema of the target database is the same schema as specified for the source database with the -ss option.
-it, --include-tables table_1 [,table_2 ] ...
The tables within the source schema that are to be included for comparison. If omitted, all tables within the source schema are compared against tables in the target schema with the exception of those tables excluded from comparison using the
-et
option. Note: There must be no white space between the comma and table names.
-et, --exclude-tables table_1 [,table_2 ] ...
The tables within the source schema that are to be excluded from comparison. If omitted, only those tables specified with the
-it
option are included for comparison. If both the-it
and-et
options are omitted, all source schema tables are included for comparison. Note: There must be no white space between the comma and table names.
-srs, --skip-rowsonlyin-source { true | false }
When true is specified, the logging of differences for rows that exist only in the source database table are skipped. The default is false.
-srt, --skip-rowsonlyin-target { true | false }
When true is specified, the logging of differences for rows that exist only in the target database table are skipped. The default is false.
-srb, --skip-rowsin-both { true | false }
When true is specified, the logging of differences for rows that exist both in the source and target database tables with the same primary key, but with different non-primary key values are skipped. The default is false.
-ld, --logging-dir log_directory_path
Directory path to where the Data Validator log and diff files are to be created and stored. If
log_directory_path
does not exist, Data Validator attempts to create it. If a full directory path is not specifiedlog_directory_path
is created or assumed to be located relative to theXDB_HOME/bin
subdirectory where therunValidation.sh
script is invoked. (That is, the logs directory isXDB_HOME/bin/log_directory_path
.) Be sure the operating system account used to invoke therunValidation.sh
script has the privileges to create the directory if it does not already exist, or to create files in the specified directory if it does already exist. If omitted, the default is theXDB_HOME/bin/logs
directory.
-ds, --display-summary { true | false }
Specify true to display only the Data Validator summary. This omits the source and target database connection information as well as the detailed breakdown of the results by source database table. Specify false to display all of the Data Validator results. The type and amount of information that is displayed at the command line console when the Data Validator is invoked is the same information that is also stored in the log file for that run. If omitted, the default is false (that is, all of the Data Validator results is displayed).
-sdbms, --source-dbms database_type
The type of the source database server. Supported types are oracle, enterprisedb, sqlserver, sybase, and mysql.
-sh, --source-host host
The IP address or server name of the host on which the source database server is running.
-sp, --source-port port
The port number on which the source database server is listening for connections.
-sdb, --source-database dbname
The database name of the source database.
-su, --source-user user
The database user name for connecting to the source database.
-spw, --source-password password
The password of the source database user in unencrypted form.
-tdbms, --target-dbms database_type
The type of the target database server. Supported types are enterprisedb and oracle.
-th, --target-host host
The IP address or server name of the host on which the target database server is running.
-tp, --target-port port
The port number on which the target database server is listening for connections.
-tdb, --target-database dbname
The database name of the target database.
-tu, --target-user user
The database user name for connecting to the target database.
-tpw, --target-password password
The password of the target database user in unencrypted form.
-bs, --batch-size row_count
The -bs option specifies the number of rows to group in a batch to be used for comparison across the source and target database tables. For example, if a table contains 1000 rows, then a
-bs
setting of 100 requires 10 batch iterations to complete the comparison across the source and target databases. The Data Validator reads 100 rows, both from the source and target tables, and adds them in source and target buffers. The validation thread then reads the 100 rows from the source and target buffers and performs the comparison. It will then move to read and prepare the next 100 rows for comparison and so on. Note that the actual database round trips required to bring in 100 rows from the database depends on the-fs
option for the fetch size. For example, an-fs
setting of 100 needs just one round trip whereas an-fs
setting of 10 requires 10 database round trips.
-fs, --fetch-size row_count
Performing data validation for tables that are quite large in size may cause the Data Validator to terminate with an out of heap space error when using the default fetch size of 5000 rows. Use the
-fs
option to specify a smaller fetch size to help avoid the out of heap space issue. The result set iteration will bring in as many rows as represented by therow_count
value in a single database round trip.
Examples
The following examples use an Oracle source database and an Advanced Server target database to compare the tables in schema EDB
on Oracle against the tables in schema public
in Advanced Server.
The following lists the tables in schema EDB along with the content of tables DEPT
and EMP
in the Oracle source database:
SQL> SELECT table_name FROM user_tables; TABLE_NAME ------------------------------ ORATAB DEPT EMP JOBHIST SQL> SELECT * FROM dept; DEPTNO DNAME LOC ---------- -------------- ------------- 10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATIONS BOSTON 50 FINANCE CHICAGO SQL> SELECT * FROM emp; EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- 7369 SMITH CLERK 7902 17-DEC-80 800 20 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7566 JONES MANAGER 7839 02-APR-81 2975 20 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 7839 KING PRESIDENT 17-NOV-81 5000 10 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 7900 JAMES CLERK 7698 03-DEC-81 950 30 7902 FORD ANALYST 7566 03-DEC-81 3000 20 7934 MILLER CLERK 7782 23-JAN-82 1300 10 9001 SMITH ANALYST 7566 8500 20 9002 ROGERS SALESMAN 7698 8000 4000 30 16 rows selected.
The following lists the tables in schema public along with the content of tables dept
and emp
in the Advanced Server edb
database:
edb=# \dt List of relations Schema | Name | Type | Owner --------+---------+-------+-------------- public | dept | table | enterprisedb public | emp | table | enterprisedb public | jobhist | table | enterprisedb (3 rows) edb=# SELECT * FROM dept; deptno | dname | loc --------+------------+---------- 10 | ACCOUNTING | NEW YORK 20 | RESEARCH | DALLAS 30 | SALES | CHICAGO 40 | OPERATIONS | BOSTON (4 rows) edb=# SELECT * FROM emp; empno | ename | job | mgr | hiredate | sal | comm | deptno -------+--------+-----------+------+--------------------+---------+---------+-------- 7369 | SMITH | CLERK | 7902 | 17-DEC-80 00:00:00 | 800.00 | | 20 7499 | ALLEN | SALESMAN | 7698 | 20-FEB-81 00:00:00 | 1600.00 | 300.00 | 30 7521 | WARD | SALESMAN | 7698 | 22-FEB-81 00:00:00 | 1250.00 | 500.00 | 30 7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00 | 2975.00 | | 20 7654 | MARTIN | SALESMAN | 7698 | 28-SEP-81 00:00:00 | 1250.00 | 1400.00 | 30 7698 | BLAKE | MANAGER | 7839 | 01-MAY-81 00:00:00 | 2850.00 | | 30 7782 | CLARK | MANAGER | 7839 | 09-JUN-81 00:00:00 | 2450.00 | | 10 7788 | SCOTT | ANALYST | 7566 | 19-APR-87 00:00:00 | 3000.00 | | 20 7839 | KING | PRESIDENT | | 17-NOV-81 00:00:00 | 5000.00 | | 10 7844 | TURNER | SALESMAN | 7698 | 08-SEP-81 00:00:00 | 1500.00 | 0.00 | 30 7876 | ADAMS | CLERK | 7788 | 23-MAY-87 00:00:00 | 1100.00 | | 20 7900 | JAMES | CLERK | 7698 | 03-DEC-81 00:00:00 | 950.00 | | 30 7902 | FORD | ANALYST | 7566 | 03-DEC-81 00:00:00 | 3000.00 | | 20 7934 | MILLER | CLERK | 7782 | 23-JAN-82 00:00:00 | 1300.00 | | 10 9001 | SMITH | SALESMAN | 7698 | | 8000.00 | 4000.00 | 30 9002 | ROGERS | SALESMAN | 7698 | | 9500.00 | 4000.00 | 30 (16 rows)
Note the following differences:
- The Oracle
EDB
schema contains one additional table namedORATAB
that does not exist in the Advanced Server public schema. - The Oracle
DEPT
table contains one extra row withDEPTNO 50
that does not exist in the Advanced Server dept table. - The rows in the
EMP
table withEMPNO
values9001
and9002
have column values that differ between the Oracle and Advanced Server tables - In this example, the
JOBHIST
table contains identical rows for both the Oracle and Advanced Server tables.
The content of the datavalidator.properties
file is set as follows:
############################################################### Source database connection ############################################################### #source_dbms=(enterprisedb | oracle | sqlserver | sybase | mysql) source_dbms=oracle source_host=192.168.2.23 source_port=1521 source_database=xe source_user=edb source_password=password ############################################################### Target database connection ############################################################### #target_dbms=(enterprisedb | oracle) target_dbms=enterprisedb target_host=localhost target_port=5444 target_database=edb target_user=enterprisedb target_password=password
The following example compares all tables in the Oracle EDB
schema against the Advanced Server public schema.
The Data Validator log files are created in directory /home/user/datavalidator_logs
as specified with the -ld
option. The operating system account used to invoke the runValidation.sh
script has write access to the /home/user
directory so the Data Validator can create the datavalidator_logs
subdirectory.
$ cd /opt/PostgreSQL/EnterpriseDB-xDBReplicationServer/bin $ pwd /opt/PostgreSQL/EnterpriseDB-xDBReplicationServer/bin $ ./runValidation.sh -ss edb -ts public -ld /home/user/datavalidator_logs EnterpriseDB DataValidator Build 3 ---------------------------------------------------------------------------- Source and target databases connection information ---------------------------------------------------------------------------- Source database: DBMS: ORACLE Host: 192.168.2.23 Port: 1521 Database: xe User: edb Target database: DBMS: ENTERPRISEDB Host: localhost Port: 5444 Database: edb User: enterprisedb ---------------------------------------------------------------------------- Databases data validation process started... ---------------------------------------------------------------------------- Validating Table DEPT Rows validated: 5 Finished validating table DEPT with 1 errors. Logging errors details in the diff file... Validating Table EMP Rows validated: 16 Finished validating table EMP with 2 errors. Logging errors details in the diff file... Validating Table JOBHIST Rows validated: 17 Finished validating table JOBHIST with 0 errors. Validating Table ORATAB Table not validated as it does not exist on the target database. DataValidator found 3 errors across source and target databases. For detailed error report see datavalidator_20150713-144417.diff file. ---------------------------------------------------------------------------- Data validation process has completed. ---------------------------------------------------------------------------- *************************************************************** DataValidator Summary *************************************************************** All tables count: 4 Validated tables count: 3 Rows count: 38 Errors count: 3 Missing tables on the target database count: 1 Tables list: - EDB.ORATAB Tables having only unsupported datatypes count: 0 Tables having primary key limitation count: 0 Total time(s): 0.678 Rows per second: 56 ***************************************************************
The Data Validator output indicates the following:
- There is one error in the
DEPT
table (the missing row). - There are two errors in the
EMP
table (the two rows with mismatching column values) - The
JOBHIST
table contains no errors. - The
ORATAB
table does not exist on the target database.
The following shows the files created in the Data Validator logs directory:
$ pwd /home/user/datavalidator_logs $ ls -l total 24 -rw-rw-r-- 1 user user 18999 Aug 13 15:44 datavalidator_20150713-144417.diff -rw-rw-r-- 1 user user 2133 Aug 13 15:44 datavalidator_20150713-144417.log
The log file contains the same content as displayed when the Data Validator is invoked. The diff file compares the differences where errors were detected.
The following is the diff file as displayed in a text editor:
Figure 9-1: Data Validator diff file
The following example includes only tables dept
and emp
with the -it
option when comparing the Oracle EDB schema against the Advanced Server public schema.
$ cd /opt/PostgreSQL/EnterpriseDB-xDBReplicationServer/bin $ pwd /opt/PostgreSQL/EnterpriseDB-xDBReplicationServer/bin $ ./runValidation.sh -ss edb -ts public -ld /home/user/datavalidator_logs -it dept,emp EnterpriseDB DataValidator Build 3 ---------------------------------------------------------------------------- Source and target databases connection information ---------------------------------------------------------------------------- Source database: DBMS: ORACLE Host: 192.168.2.23 Port: 1521 Database: xe User: edb Target database: DBMS: ENTERPRISEDB Host: localhost Port: 5444 Database: edb User: enterprisedb ---------------------------------------------------------------------------- Databases data validation process started... ---------------------------------------------------------------------------- Validating Table DEPT Rows validated: 5 Finished validating table DEPT with 1 errors. Logging errors details in the diff file... Validating Table EMP Rows validated: 16 Finished validating table EMP with 2 errors. Logging errors details in the diff file... DataValidator found 3 errors across source and target databases. For detailed error report see ``datavalidator_20150714-123353.diff`` file. ---------------------------------------------------------------------------- Data validation process has completed. ---------------------------------------------------------------------------- *************************************************************** DataValidator Summary *************************************************************** All tables count: 2 Validated tables count: 2 Rows count: 21 Errors count: 3 Missing tables on the target database count: 0 Tables having only unsupported datatypes count: 0 Tables having primary key limitation count: 0 Total time(s): 0.539 Rows per second: 39 ***************************************************************
The following example excludes tables ORATAB
and jobhist
with the -et
option when comparing the Oracle EDB schema against the Advanced Server public schema. The -ds
true option results in the display of only the Data Validator summary.
$ ./runValidation.sh -ss edb -ts public -ld /home/user/datavalidator_logs -et ORATAB,jobhist -ds true Databases data validation process started... *************************************************************** DataValidator Summary *************************************************************** All tables count: 2 Validated tables count: 2 Rows count: 21 Errors count: 3 Missing tables on the target database count: 0 Tables having only unsupported datatypes count: 0 Tables having primary key limitation count: 0 Total time(s): 0.535 Rows per second: 39 ***************************************************************
For this run, the corresponding log file contains only the Data Validator summary, omitting the source and target database connection information along with the error breakdown by table.