14.17. RDF Graph Replication
This section demonstrates how to replicate graphs from one Virtuoso instance to one or more
other Virtuoso instances, using the RDF Replication feature.
Terms used in this section:
- Host Virtuoso Instance, aka the publisher: the instance where we
will insert RDF data into a Named Graph, then create a publication of this graph.
- Destination Virtuoso Instance, aka the subscriber: the instance
which will subscribe to the publication from the Host Virtuoso Instance.
The basic outline:
- First, use the Virtuoso Conductor on a Host Virtuoso Instance to publish a named
graph.
- Then, use the Virtuoso Conductor on a Destination Virtuoso Instance to subscribe
to deltas from the published graph.
- Finally, see how a change in the publisher's graph will appear in the subscriber's
graph.
14.17.1. Replication Topologies
Typical replication topologies are Chains, Stars and Bi-directional. They can be achieved with
Virtuoso, by repeating the "Publish" and/or "Subscribe" steps on each relevant node.
14.17.1.1. Star Replication Topology
In a Star, there is one Publisher, and many Subscribers.
To set up a Star, follow these steps:
- Configure Instance #1 to Publish.
- Configure Instance #2 to Subscribe to #1.
- Repeat as necessary.
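The star layout described above can be written down as a list of subscriptions. The following is a minimal Python sketch for planning purposes only; it is not part of Virtuoso, and the function name `star_subscriptions` and the instance names are illustrative assumptions.

```python
def star_subscriptions(publisher, subscribers):
    """Return (subscriber, publisher) pairs for a star topology.

    Every subscriber connects directly to the single publisher.
    """
    if publisher in subscribers:
        raise ValueError("the publisher cannot subscribe to itself in a star")
    return [(subscriber, publisher) for subscriber in subscribers]


if __name__ == "__main__":
    # Hypothetical instance names; substitute your own.
    for subscriber, publisher in star_subscriptions("db1", ["db2", "db3"]):
        print(f"Configure {subscriber} to subscribe to {publisher}")
```

Each pair corresponds to one pass through the "Subscribe" step on the named subscriber.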
14.17.1.1.2. Star Replication Topology Example
The following How-To walks you through setting up Virtuoso RDF Graph Replication in a Star Topology.
Prerequisites
Database INI Parameters
Suppose there are 3 Virtuoso instances with the following INI parameter values, respectively:
- virtuoso1.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1
...
- virtuoso2.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2
...
- virtuoso3.ini:
...
[Database]
DatabaseFile = virtuoso3.db
TransactionFile = virtuoso3.trx
ErrorLogFile = virtuoso3.log
...
[Parameters]
ServerPort = 1113
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8893
...
[URIQA]
DefaultHost = localhost:8893
...
[Replication]
ServerName = db3
...
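The three fragments above differ only in the instance number, so they can be generated rather than typed by hand. The following is a minimal sketch using Python's standard `configparser`; the helper name `instance_ini` is an assumption, and the output contains only the settings listed above, so it would still need to be merged into a complete virtuoso.ini.

```python
import configparser
import io


def instance_ini(n):
    """Build the INI fragment shown above for instance number n (1, 2, 3, ...)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # preserve key case such as "ServerPort"
    cfg["Database"] = {
        "DatabaseFile": f"virtuoso{n}.db",
        "TransactionFile": f"virtuoso{n}.trx",
        "ErrorLogFile": f"virtuoso{n}.log",
    }
    cfg["Parameters"] = {"ServerPort": str(1110 + n), "SchedulerInterval": "1"}
    cfg["HTTPServer"] = {"ServerPort": str(8890 + n)}
    cfg["URIQA"] = {"DefaultHost": f"localhost:{8890 + n}"}
    cfg["Replication"] = {"ServerName": f"db{n}"}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"; fragment for virtuoso{n}.ini")
        print(instance_ini(n))
```

The port arithmetic (1110 + n, 8890 + n) simply mirrors the values used in this example; any scheme that keeps the ports and ServerName distinct per instance would do.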
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for each of db1, db2, db3, with names db1, db2 and db3, respectively.
Install Conductor package
On each of the 3 Virtuoso instances install the conductor_dav.vad package.
Create a Publication on the Host Virtuoso Instance db1
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- A publication with the name RDF Publication should be created.
- Click the link which is the publication name.
- You will be shown the publication items page.
- For Graph IRI, enter the IRI of the graph to publish (http://example.org in this example).
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Insert Data into a Named Graph on the Host Virtuoso Instance
There are several ways to insert data into a Virtuoso Named Graph. In this example, we
will use the Virtuoso Conductor's Import RDF feature:
- In the Virtuoso Conductor, go to RDF -> RDF Store Upload
- In the form:
- Tick the box for Resource URL and enter your resource URL, e.g.:
http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this
- For Named Graph IRI, enter the target graph (http://example.org in this example).
- Click Upload
- A successful upload will result in a confirmation message.
- Check the inserted triples by executing a query like the following against the SPARQL endpoint, http://cname:port/sparql:
SELECT *
FROM <http://example.org>
WHERE { ?s ?p ?o }
- See how many triples have been inserted in your graph:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
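The count check above can also be scripted. The following is a minimal Python sketch, assuming the endpoint accepts the standard SPARQL protocol `query` parameter over HTTP GET and returns SPARQL JSON results when asked via the `Accept` header; the function names are illustrative, and the endpoint URL in the usage note is taken from this example's INI settings.

```python
import json
import urllib.parse
import urllib.request


def count_query_url(endpoint, graph):
    """Build a GET URL that runs the triple-count query against `endpoint`."""
    query = f"SELECT COUNT(*) FROM <{graph}> WHERE {{ ?s ?p ?o }}"
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_count(results_json):
    """Extract the single integer from a SPARQL JSON results document."""
    data = json.loads(results_json)
    var = data["head"]["vars"][0]  # name of the only result column
    return int(data["results"]["bindings"][0][var]["value"])


def fetch_count(endpoint, graph):
    """Run the count query over HTTP and return the number of triples."""
    request = urllib.request.Request(
        count_query_url(endpoint, graph),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_count(response.read().decode("utf-8"))
```

For instance, `fetch_count("http://localhost:8891/sparql", "http://example.org")` would query db1 in this setup, provided the instance is running and the endpoint is reachable.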
Subscribe to the Publication on a Destination Virtuoso Instance (db2, db3, etc.)
- Go to Conductor -> Replication -> Transactional -> Subscriptions
- Click New Subscription
- Specify a new Data Source, or select the target data source from the available Connected Data Sources.
- Click Publications list
- Select the RDF Publication and click List Items
- Click Subscribe
- The subscription will be created
- Click Sync
- Check the retrieved triples by executing the following query:
SELECT *
FROM <http://example.org>
WHERE {?s ?p ?o}
- See how many triples have been inserted into your graph by executing the following query:
SELECT COUNT(*)
FROM <http://example.org>
WHERE {?s ?p ?o}
These steps may be repeated for any number of Subscribers.
Insert Triples into the Host Virtuoso Instance Graph and check availability at Destination Virtuoso Instance Graph
- To check the starting count, on the Destination Virtuoso Instance SPARQL Endpoint, execute:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- On the Host Virtuoso Instance go to Conductor -> Database -> Interactive SQL and execute the following statement:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Services>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Clients>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/SPARQL>
} ;
- To confirm that the triple count has increased by the number of inserted triples, execute the following on the Destination Virtuoso Instance SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
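A subscriber only shows new triples once it has synced, so a scripted check usually needs to retry for a while before concluding that replication failed. The following is a minimal Python sketch of such a retry loop; `wait_for_count` and its `fetch` argument are hypothetical names, where `fetch` stands for any zero-argument callable that returns the subscriber's current triple count.

```python
import time


def wait_for_count(fetch, expected, timeout=120.0, interval=5.0, sleep=time.sleep):
    """Poll `fetch()` until it returns `expected` or `timeout` seconds pass.

    Returns True on success, False if the timeout is reached first.
    `sleep` is injectable so the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        if fetch() == expected:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Simulated subscriber that catches up on the third poll.
    counts = iter([55, 55, 58])
    ok = wait_for_count(lambda: next(counts), 58, interval=0.01)
    print("replicated" if ok else "timed out")
```

The timeout and interval defaults are arbitrary; choose values that match the sync interval configured for the subscription.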
14.17.1.2. Chain Replication Topology
In a Chain, there is one original Publisher, to which there is only one Subscriber. That
Subscriber may also serve as a Publisher, again with only one Subscriber. The chain ends with
a Subscriber which does not Publish.
To set up a Chain, follow these steps:
- Configure Instance #1 to Publish.
- Configure Instance #2 to Subscribe to #1.
- Configure Instance #2 to Publish.
- Configure Instance #3 to Subscribe to #2.
- Repeat as necessary.
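The chain steps above assign each instance a role that depends only on its position. The following is a minimal Python sketch that derives those roles from an ordered list of instances; it is a planning aid rather than anything Virtuoso provides, and the name `chain_roles` is an assumption.

```python
def chain_roles(instances):
    """Return a {name: roles} mapping for a chain topology.

    Each instance subscribes to the one before it and publishes to the
    one after it, so the first only publishes and the last only subscribes.
    """
    if len(instances) < 2:
        raise ValueError("a chain needs at least two instances")
    roles = {}
    for i, name in enumerate(instances):
        roles[name] = {
            "publishes": i < len(instances) - 1,
            "subscribes_to": instances[i - 1] if i > 0 else None,
        }
    return roles


if __name__ == "__main__":
    for name, role in chain_roles(["db1", "db2", "db3"]).items():
        print(name, role)
```

Reading the result top to bottom gives the order in which to run the "Publish" and "Subscribe" steps.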
14.17.1.2.2. Chain Replication Topology Example
The following How-To walks you through setting up Virtuoso RDF Graph Replication in a
Chain Topology.
Prerequisites
Database INI Parameters
Suppose there are 3 Virtuoso instances with the following INI parameter values, respectively:
- virtuoso1.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1
...
- virtuoso2.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2
...
- virtuoso3.ini:
...
[Database]
DatabaseFile = virtuoso3.db
TransactionFile = virtuoso3.trx
ErrorLogFile = virtuoso3.log
...
[Parameters]
ServerPort = 1113
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8893
...
[URIQA]
DefaultHost = localhost:8893
...
[Replication]
ServerName = db3
...
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for each of db1, db2, db3, with names db1, db2 and db3, respectively.
Install Conductor package
On each of the 3 Virtuoso instances install the conductor_dav.vad package.
Create Publication on db1
- Go to http://localhost:8891/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created.
- Click the link which is the publication name.
- You will be shown the publication items page.
- For Graph IRI, enter the IRI of the graph to publish (http://example.org in this example).
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db2 to db1's Publication
- Log in at http://localhost:8892/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db1
- Enter the dba user credentials for db1
- Click "Add Data Source"
- As a result, db1 will be shown in the "Connected Data Sources" list.
- Select db1 in the "Connected Data Sources" list and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Create Publication on db2
- Go to http://localhost:8892/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created.
- Click the link which is the publication name.
- You will be shown the publication items page.
- For Graph IRI, enter the IRI of the graph to publish (http://example.org in this example).
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db3 to db2's Publication
- Log in at http://localhost:8893/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db2
- Enter the dba user credentials for db2
- Click "Add Data Source"
- As a result, db2 will be shown in the "Connected Data Sources" list. Select it and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Insert Data into a Named Graph on the db1 Virtuoso Instance
- Log in at http://localhost:8891/conductor
- Go to RDF -> RDF Store Upload
- In the form:
- Tick the box for Resource URL and enter your resource URL, e.g.:
http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this
- For Named Graph IRI, enter the target graph (http://example.org in this example).
- Click Upload
- A successful upload will result in a confirmation message.
- Check the count of the inserted triples by executing a query like the following against the SPARQL endpoint,
http://localhost:8891/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 55 as total.
Check data on the Destination instances db2 and db3
- To check the starting count, execute the following on the SPARQL Endpoint of each of the Destination Virtuoso Instances db2 and db3:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 55 as total.
Add new data on db1
- Disconnect db2 and db3.
- On the Host Virtuoso Instance db1, go to Conductor -> Database -> Interactive SQL and enter the following statements:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Services>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Clients>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/SPARQL>
} ;
- Click "Execute"
- As a result, the triples will be inserted.
- Check the count of triples in the db1 graph by executing a query like the following against the SPARQL endpoint,
http://localhost:8891/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 58 as total.
Check data on the Destination instances db2 and db3
- Start instances db2 and db3
- To confirm that the triple count has increased by the number of inserted triples, execute the following on the SPARQL Endpoints of the Destination Virtuoso Instances db2 and db3:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 58 as total.
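The counts in this walkthrough (55, then 58 once db2 and db3 are back) follow from each subscriber applying only the changes it has not yet seen from its upstream publisher. The following Python sketch is a toy model of that behaviour under simplifying assumptions; it is not how Virtuoso implements replication, and the `Node` class and its fields are invented for illustration.

```python
class Node:
    """Toy stand-in for a Virtuoso instance holding one replicated graph."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream  # node this one subscribes to, if any
        self.triples = set()
        self.log = []             # changes made here, in order, for subscribers
        self.applied = 0          # number of upstream log entries already applied

    def insert(self, triple):
        """Add a triple locally and record it for downstream subscribers."""
        if triple not in self.triples:
            self.triples.add(triple)
            self.log.append(triple)

    def sync(self):
        """Apply upstream changes that have not been seen yet (the deltas)."""
        if self.upstream is None:
            return
        for triple in self.upstream.log[self.applied:]:
            self.insert(triple)
        self.applied = len(self.upstream.log)


if __name__ == "__main__":
    db1 = Node("db1")
    db2 = Node("db2", upstream=db1)
    db3 = Node("db3", upstream=db2)
    for i in range(55):                    # stands in for the initial upload
        db1.insert(("s", "p", i))
    db2.sync(); db3.sync()
    for obj in ("Web_Services", "Web_Clients", "SPARQL"):
        db1.insert(("kidehen", "interest", obj))
    db2.sync(); db3.sync()                 # db2 first, so db3 can see the changes
    print(len(db1.triples), len(db2.triples), len(db3.triples))
```

In this model db3 cannot receive a change before db2 has applied it, which is the defining property of the chain.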
14.17.1.3. Bi-directional Replication Topology
14.17.1.3.1. Bi-directional Replication Topology Example
The following How-To walks you through setting up Virtuoso RDF Graph Replication in a
Bi-directional Topology.
db1 <---- db2
db1 ----> db2
Prerequisites
Database INI Parameters
Suppose there are 2 Virtuoso instances with the following INI parameter values, respectively:
- virtuoso1.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1
...
- virtuoso2.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2
...
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for db1 and db2 with names db1 and db2 respectively.
Install Conductor package
On each of the 2 Virtuoso instances install the conductor_dav.vad package.
Create Publication on db2
- Go to http://localhost:8892/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created.
- Click the link which is the publication name.
- You will be shown the publication items page.
- For Graph IRI, enter the IRI of the graph to publish (http://example.org in this example).
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db1 to db2's Publication
- Log in at http://localhost:8891/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db2
- Enter the dba user credentials for db2
- Click "Add Data Source"
- As a result, db2 will be shown in the "Connected Data Sources" list.
- Select db2 in the "Connected Data Sources" list and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Create Publication on db1
- Go to http://localhost:8891/conductor and log in as dba
- Go to Conductor -> Replication -> Transactional -> Publications
- Click Enable RDF Publishing
- As a result, a publication with the name RDF Publication should be created.
- Click the link which is the publication name.
- You will be shown the publication items page.
- For Graph IRI, enter the IRI of the graph to publish (http://example.org in this example).
- Click Add New
- The item will be created and shown in the list of items for the currently viewed publication.
Create subscription from db2 to db1's Publication
- Log in at http://localhost:8892/conductor
- Go to Replication -> Transactional -> Subscriptions
- Click New Subscription
- From the "Specify new data source" list, select Data Source db1
- Enter the dba user credentials for db1
- Click "Add Data Source"
- As a result, db1 will be shown in the "Connected Data Sources" list. Select it and click "Publications list"
- As a result, the list of available publications for the selected data source will be shown. Select the one named "RDF Publication" and click "List Items".
- As a result, the "Confirm subscription" page will be shown.
- The default sync interval is 10 minutes. For testing purposes, we will change it to 1 minute.
- Click "Subscribe"
- The subscription will be created.
Insert Data into a Named Graph on the db2 Virtuoso Instance
- Log in at http://localhost:8892/conductor
- Go to RDF -> RDF Store Upload
- In the form:
- Tick the box for Resource URL and enter your resource URL, e.g.:
http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this
- For Named Graph IRI, enter the target graph (http://example.org in this example).
- Click Upload
- A successful upload will result in a confirmation message.
- Check the count of the inserted triples by executing a query like the following against the SPARQL endpoint,
http://localhost:8892/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 55 as total.
Check data on the Destination instance db1
- To check the starting count, execute from db1's SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 55 as total.
Add new data on db2
- Disconnect db1.
- On the Host Virtuoso Instance db2, go to Conductor -> Database -> Interactive SQL and enter the following statement:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Services>
} ;
- Click "Execute"
- As a result, the triple will be inserted.
- Check the count of triples in the db2 graph by executing a query like the following against the SPARQL endpoint,
http://localhost:8892/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 56 as total.
Check data on the Destination instance db1
- Start instance db1
- To confirm that the triple count has increased by the number of inserted triples, execute the following statement on db1's SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 56 as total.
Add new data on db1
- Disconnect db2.
- On the Host Virtuoso Instance db1, go to Conductor -> Database -> Interactive SQL and enter the following statements:
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/Web_Clients>
} ;
SPARQL INSERT INTO GRAPH <http://example.org>
{
<http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com#this>
<http://xmlns.com/foaf/0.1/interest>
<http://dbpedia.org/resource/SPARQL>
} ;
- Click "Execute"
- As a result, the triples will be inserted.
- Check the count of triples in the db1 graph by executing a query like the following against the SPARQL endpoint,
http://localhost:8891/sparql:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 58 as total.
Check data on the Destination instance db2
- Start instance db2
- To confirm that the triple count has increased by the number of inserted triples, execute the following statement on db2's SPARQL Endpoint:
SELECT COUNT(*)
FROM <http://example.org>
WHERE { ?s ?p ?o }
- Should return 58 as total.
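The expected totals in this walkthrough (55, 56, then 58 on both instances) can be checked with simple set arithmetic. The following Python sketch models each graph as a set of triples and a sync as copying whatever the target lacks; this is a deliberately simplified abstraction and not a description of Virtuoso's replication mechanism, and the `sync` helper is an invented name.

```python
def sync(source, target):
    """Copy triples that `target` is missing from `source`; return how many."""
    missing = source - target
    target.update(missing)
    return len(missing)


if __name__ == "__main__":
    who = "kidehen"
    db2 = {(who, "p", i) for i in range(55)}  # stands in for the initial upload
    db1 = set()
    sync(db2, db1)                             # db1 subscribes to db2

    db2.add((who, "interest", "Web_Services"))
    sync(db2, db1)
    print(len(db1), len(db2))

    db1.add((who, "interest", "Web_Clients"))
    db1.add((who, "interest", "SPARQL"))
    sync(db1, db2)                             # db2 subscribes to db1
    print(len(db1), len(db2))
```

Because copying an already-present triple changes nothing, a further sync in either direction moves no data in this model, so changes do not bounce back and forth.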
14.17.2. Set up RDF Replication via procedure calls
14.17.2.1. Example
The following example shows how to use SQL procedures to set up Virtuoso RDF Graph Replication in a Chain Topology.
This can also be done through the HTTP-based Virtuoso Conductor.
14.17.2.1.2. Prerequisites
Database INI Parameters
Suppose there are 3 Virtuoso instances on the same machine.
The first instance holds the master copy of the data and publishes its changes to all other instances that subscribe to this master.
The second instance subscribes to the publication of the master copy, but also publishes all of these changes to any instance that subscribes to it.
The third instance only subscribes to the publication of the second instance.
Each of these 3 servers needs unique ports and a unique ServerName and DefaultHost for this replication scheme to work properly. Although not needed, this example also sets separate names for the database and related files. This results in the following INI parameter values (only changes are shown; the rest can remain at their defaults):
- repl1/virtuoso.ini:
...
[Database]
DatabaseFile = virtuoso1.db
TransactionFile = virtuoso1.trx
ErrorLogFile = virtuoso1.log
...
[Parameters]
ServerPort = 1111
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8891
...
[URIQA]
DefaultHost = localhost:8891
...
[Replication]
ServerName = db1-r
...
- repl2/virtuoso.ini:
...
[Database]
DatabaseFile = virtuoso2.db
TransactionFile = virtuoso2.trx
ErrorLogFile = virtuoso2.log
...
[Parameters]
ServerPort = 1112
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8892
...
[URIQA]
DefaultHost = localhost:8892
...
[Replication]
ServerName = db2-r
...
- repl3/virtuoso.ini:
...
[Database]
DatabaseFile = virtuoso3.db
TransactionFile = virtuoso3.trx
ErrorLogFile = virtuoso3.log
...
[Parameters]
ServerPort = 1113
SchedulerInterval = 1
...
[HTTPServer]
ServerPort = 8893
...
[URIQA]
DefaultHost = localhost:8893
...
[Replication]
ServerName = db3-r
...
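Since the scheme depends on the ports, DefaultHost and ServerName being distinct across instances, it can be worth checking the files before starting the servers. The following is a minimal Python sketch using the standard `configparser`; the function name `find_conflicts` is an assumption, and it has only been reasoned through against fragments like the ones above (without the "..." placeholder lines), not against every possible virtuoso.ini.

```python
import configparser
from collections import Counter


def find_conflicts(ini_texts):
    """Return sorted (kind, value) pairs that occur in more than one place.

    SQL and HTTP ports share one pool, because two servers on the same
    machine cannot listen on the same port regardless of protocol.
    """
    pools = {"port": Counter(), "DefaultHost": Counter(), "ServerName": Counter()}
    for text in ini_texts:
        cfg = configparser.ConfigParser(interpolation=None, strict=False)
        cfg.optionxform = str  # keep key case as written in the file
        cfg.read_string(text)
        pools["port"][cfg.get("Parameters", "ServerPort")] += 1
        pools["port"][cfg.get("HTTPServer", "ServerPort")] += 1
        pools["DefaultHost"][cfg.get("URIQA", "DefaultHost")] += 1
        pools["ServerName"][cfg.get("Replication", "ServerName")] += 1
    return sorted(
        (kind, value)
        for kind, counter in pools.items()
        for value, seen in counter.items()
        if seen > 1
    )
```

An empty list means no clashes were found among the checked settings; each returned pair names a value that would need to be changed in one of the files.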
Database DSNs
Use the ODBC Administrator on your Virtuoso host (e.g., on Windows, Start menu -> Control Panel -> Administrative Tools -> Data Sources (ODBC); on Mac OS X, /Applications/Utilities/OpenLink ODBC Administrator.app) to create a System DSN for each of db1, db2, db3, with names db1, db2 and db3, respectively.
14.17.2.1.3. Configure Publishers and Subscribers
- Run the databases by starting start.sh, which has the following content:
cd repl1
virtuoso -f &
cd ../repl2
virtuoso -f &
cd ../repl3
virtuoso -f &
cd ..
- Use the isql command to execute the following rep.sql file:
--
-- connect to the first database which is only a publisher
--
set DSN=localhost:1111;
reconnect;
--
-- start publishing the graph http://test.org
--
DB.DBA.RDF_REPL_START();
DB.DBA.RDF_REPL_GRAPH_INS ('http://test.org');
--
-- connect to the second database in the chain, which is both a publisher and a subscriber
--
set DSN=localhost:1112;
reconnect;
--
-- start publishing the graph http://test.org
--
DB.DBA.RDF_REPL_START();
DB.DBA.RDF_REPL_GRAPH_INS ('http://test.org');
--
-- contact the first database
--
repl_server ('db1-r', 'db1', 'localhost:1111');
--
-- subscribe to its RDF publication(s)
--
repl_subscribe ('db1-r', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
--
-- bring the replication service online
--
repl_sync_all();
--
-- and set scheduler to check every minute
--
DB.DBA.SUB_SCHEDULE ('db1-r', '__rdf_repl', 1);
--
-- connect to the third database in the chain, which is only a subscriber
--
set DSN=localhost:1113;
reconnect;
--
-- uncomment next 2 commands if this database should also be a publisher
--
--DB.DBA.RDF_REPL_START();
--DB.DBA.RDF_REPL_GRAPH_INS ('http://test.org');
--
-- contact second database
--
repl_server ('db2-r', 'db2', 'localhost:1112');
--
-- subscribe to its RDF publication(s)
--
repl_subscribe ('db2-r', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');
--
-- bring the replication service online
--
repl_sync_all();
--
-- and set schedule to check every minute
--
DB.DBA.SUB_SCHEDULE ('db2-r', '__rdf_repl', 1);
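The script above repeats the same pattern per instance, so for longer chains it can be generated. The following is a minimal Python sketch that emits only the commands already used in rep.sql; the function name `chain_script` is an assumption, and the trailing `repl_subscribe` arguments are copied verbatim from the script above rather than interpreted.

```python
def chain_script(nodes, graph, subscribe_args=("dav", "dav", "dba", "dba"), minutes=1):
    """Generate isql commands for a chain of (server_name, dsn, address) nodes.

    `subscribe_args` holds the four trailing repl_subscribe arguments exactly
    as they appear in rep.sql above.
    """
    lines = []
    last = len(nodes) - 1
    extra = ", ".join(f"'{arg}'" for arg in subscribe_args)
    for i, (server_name, dsn, address) in enumerate(nodes):
        lines += [f"set DSN={address};", "reconnect;"]
        if i < last:  # every node except the last publishes the graph
            lines += [
                "DB.DBA.RDF_REPL_START();",
                f"DB.DBA.RDF_REPL_GRAPH_INS ('{graph}');",
            ]
        if i > 0:  # every node except the first subscribes to its predecessor
            up_name, up_dsn, up_address = nodes[i - 1]
            lines += [
                f"repl_server ('{up_name}', '{up_dsn}', '{up_address}');",
                f"repl_subscribe ('{up_name}', '__rdf_repl', {extra});",
                "repl_sync_all();",
                f"DB.DBA.SUB_SCHEDULE ('{up_name}', '__rdf_repl', {minutes});",
            ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    chain = [
        ("db1-r", "db1", "localhost:1111"),
        ("db2-r", "db2", "localhost:1112"),
        ("db3-r", "db3", "localhost:1113"),
    ]
    print(chain_script(chain, "http://test.org"), end="")
```

The generated text omits the comments but should otherwise match the command sequence of rep.sql for the three-instance chain; review it before running it against real servers.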