Loading tables from an external data source (offline snapshot) v7

You might want to initially load your target tables (subscription tables of a single-master replication system, or non-MDN nodes of a multi-master replication system) using a method other than the snapshot replication functionality of Replication Server. This method is referred to as using an offline snapshot.

For example, you might at first load the tables by running the Migration Toolkit from the command line or by using a backup from an external data source. When you load the target tables using an offline snapshot, you must take into account special preparations for the following deviations from the default target table creation and loading process:

  • In the typical default scenario, Replication Server creates the target table definitions when you define the subscription in a single-master replication system or add another primary node in a multi-master replication system. When using an offline snapshot, creating the target table definitions is your responsibility. You must therefore prevent Replication Server from creating the target table definitions.
  • In the typical default scenario, Replication Server performs synchronization replication using batches of SQL statements. If any statement in a batch results in an error, all statements in the batch are rolled back. When using an offline snapshot, the external data source used to load the target tables might already have transactions applied to it that are also recorded in the shadow tables of the source tables. In this case, you must perform the first synchronization replication in non-batch mode. This condition can result in a statement failure in certain cases.

Non-batch mode synchronization

Synchronization replications are done in batches of updates, each batch committed in a separate transaction. If any single update in a batch fails, all the updates in the batch are rolled back.

This process has the following implications.

Prior to and during the time when the offline snapshot is in progress, there might be updates to the source tables, which are recorded in the source tables’ shadow tables. After the offline snapshot completes, there might be more updates to the source tables that are also recorded in the shadow tables.

Since Replication Server does not know about the external data source used to load the target tables, Replication Server doesn't know whether any of the updates made to the source tables during or after the offline snapshot were already included in the data used to load the target tables.

As a result, the shadow tables might include a mixture of duplicate updates that were already applied to the target tables as well as new updates that weren't applied to the target tables.

If you then perform synchronization replication, the publication server attempts to apply all updates recorded in the shadow tables in batches.

If one of the updates was inserting a new row and this new row is already in the target table loaded from the offline snapshot, a duplicate key error results when the publication server attempts to apply the batch containing the INSERT statement for this row. The duplicate key error forces the rollback of the entire batch. This causes the exclusion of updates in the batch that might not yet have been carried over to the target tables. The source tables and target tables are now inconsistent, since there were updates to the source tables that weren't applied to the target tables.

Note

The effects of applying UPDATE and DELETE statements in the batch to a target table that was already changed by these updates doesn't cause the same problem as repeated application of INSERT statements. The UPDATE statement changes the row to the same values a second time. When a DELETE statement doesn't affects any rows, this isn't considered an error by the database server. No rollback of the batch occurs.

The solution to the potential rollback of a batch is to apply the shadow table updates in non-batch mode. That is, commit each SQL statement individually. In that way, if inserting a row fails due to a duplicate key error, that statement alone is rolled back. The error doesn't affect the other shadow table updates that must be applied since all updates are enclosed in their own, individual transactions.

The batchInitialSync configuration option controls whether the first synchronization replication occurs in batch or non-batch mode. Suppose you're using an offline snapshot in an active replication system where updates are occurring to the source tables. Transactions are thus accumulating in the shadow tables for the trigger-based method. In this case, we recommend setting batchInitialSync to false to perform the first synchronization replication in non-batch mode.

Note

You can't use an offline snapshot to add a subscription or a primary node to an active replication system that uses the log-based method. For the log-based method, you can use offline snapshots only to first configure the system. You can't use them to update the system with additional nodes after the publication database or primary node is actively receiving transactions.

Suppose you're using offline snapshots to first create the entire replication system that has yet to be activated. The content of the offline snapshots are all assumed to be consistent for the source and target tables. You can then leave batchInitialSync with its default setting of true. It is assumed that the first synchronization replication will not apply any duplicate updates.

Offline snapshot configuration options

The following are the configuration options that you need to modify when using an offline snapshot.

Note

These options apply only to the publication server.

offlineSnapshot

You must set the offlineSnapshot to true before creating the subscription for a single-master replication system or before adding the primary node for a multi-master replication system.

offlineSnapshot={true | false}

The default value is false.

When set to true, the offlineSnapshot option prevents the usual creation of the subscription schema and table definitions when the subscription is defined in a single-master replication system. It is assumed that you're creating the subscription table definitions and loading them from an external source other than the publication.

When adding the primary node in a multi-master replication system, leave the Replicate Publication Schema and Perform Initial Snapshot boxes cleared (see Creating more primary nodes).

When offlineSnapshot is set to true, this configuration has the direct effect in the control schema by setting column has_initial_snapshot to a value of O. This setting indicates an offline snapshot is used for the target subscription or primary node represented by the row. Column has_initial_snapshot is set in table _edb_replicator_pub.xdb_publication_subscriptions for a single-master replication system and in table _edb_replicator_pub.xdb_mmr_pub_group for a multi-master replication system.

After the first replication completes to the target subscription or primary node, has_initial_snapshot is changed to Y by Replication Server.

The setting of has_initial_snapshot influences the behavior of the batchInitialSync option.

batchInitialSync

The batchInitialSync option controls whether the first synchronization after loading the target tables from an offline snapshot is done in batch mode (the default) or non-batch mode.

Set the batchInitialSync option to false to perform synchronization replication in non-batch mode.

You must set the offlineSnapshot configuration option to true prior to creating the subscription or adding another primary node. A non-batch mode synchronization occurs only if batchInitialSync is false and the has_initial_snapshot column in the control schema is set to a value of O as described for the offlineSnapshot option.

batchInitialSync={true | false}

The default value is true.

Single-master replication offline snapshot

You can use an offline snapshot to first load the subscription tables of a single-master replication system. For a publication that's intended to have multiple subscriptions, you can create some of the subscriptions using the default Replication Server snapshot replication process as described in Performing snapshot replication. You can create other subscriptions from an offline snapshot.

To create a subscription from an offline snapshot:

  1. Register the publication server, add the publication database definition, and create the publication as described in Creating a publication.

  2. Register the subscription server and add the subscription database definition as described in Registering a subscription server and Adding a subscription database.

    Note

    You must perform steps 3 and 4 before creating the subscription. You can repeat steps 5 through 9 each time you want to create another subscription from an offline snapshot.

  3. Modify the publication server configuration file if these options aren't already set as described by the following:

    • Change the offlineSnapshot option to true. When you restart the publication server or reload the publication server's configuration via reloadconf, offlineSnapshot set to true has two effects. One is that creating a subscription doesn't create the schema and subscription table definitions in the subscription database as is done with the default setting. The other is that creating a subscription sets a column in the control schema indicating an offline snapshot is used to load this subscription.
    • Set the batchInitialSync option to the appropriate setting for your situation as discussed at the end of Non-batch mode synchronization.
  4. If you modified the publication server configuration file in Step 3, reload configuration of the publication server. See "Reloading the Publication or Subscription Server Configuration File (reloadconf)" for directions on reloading the publication server's configuration.

  5. In the subscription database, create the schema and the subscription table definitions, and load the subscription tables from your offline data source. The subscription database user name used in Adding a subscription database must have full privileges over the database objects created in this step. Also review the beginning of Adding a subscription database regarding the rules as to how Replication Server creates the subscription definitions from the publication for each database type. You must follow these same conventions when you create the target definitions manually.

  6. Add the subscription as described in Adding a subscription.

  7. Perform an on-demand synchronization replication. See Performing synchronization replication to learn how to perform an on-demand synchronization replication.

  8. If you aren't planning to load any other subscriptions using an offline snapshot at this time, change the offlineSnapshot option back to false and the batchInitialSync option to true in the publication server configuration file.

  9. If you modified the publication server configuration file in step 8, reload configuration of the publication server.

Multi-master replication offline snapshot

You can use an offline snapshot to first load the primary nodes of a multi-master replication system. You can load some of the primary nodes using the Replication Server snapshot replication functionality when defining the primary node as described in Creating more primary nodes or by using an on-demand snapshot as described in Performing snapshot replication. You can load other primary nodes from an offline snapshot.

Note

Offline snapshots aren't supported for a multi-master replication system that's actively in use. Any changes on an active primary node are lost during the offline snapshot process of dumping or restoring the data of another node.

To create a primary node from an offline snapshot:

  1. Register the publication server, add the primary definition node, and create the publication as described in Creating a publication.

    Note

    You must perform the steps 3 and 4 before adding a primary node to be loaded by an offline snapshot. You can repeat Steps 5 through 10 each time you want to create another primary node from an offline snapshot.

  2. Be sure there's no schedule defined on the replication system. If there is, remove the schedule until you complete this process. See Removing a schedule for details.

  3. Modify the publication server configuration file so the options are set as follows:

    • Set the offlineSnapshot option to true. When you restart the publication server or reload the publication server's configuration via reloadconf, this setting has the effect that adding a primary node sets a column in the control schema indicating an offline snapshot is used to load this primary node.
    • Set the batchInitialSync option to the appropriate setting for your situation as discussed at the end of Non-batch mode synchronization.
  4. If you modified the publication server configuration file in step 3, reload configuration of the publication server. See "Reloading the Publication or Subscription Server Configuration File (reloadconf)" for directions to reload the publication server's configuration.

  5. In the database to use as the new primary node, create the schema and the table definitions, and load the tables from your offline data source.

  6. Add the primary node as described in Creating more primary nodes with the options Replicate Publication Schema and Perform Initial Snapshot cleared.

  7. Perform an initial on-demand synchronization. See Performing synchronization replication to learn how to perform an on demand-synchronization.

  8. If you aren't planning to load any other primary nodes using an offline snapshot at this time, change the offlineSnapshot option back to false and the batchInitialSync option to true in the publication server configuration file.

  9. If you modified the publication server configuration file in step 8, reload configuration of the publication server.

  10. Add the schedule again if you removed it. See Creating a schedule to learn how to create a schedule.