Technical Blog

2 Posts tagged with the failover tag


The performance team at Elastic Path recently put Elastic Path 6.1.1 through Oracle RAC validation and made it out the other side unscathed. The best part is that out-of-the-box EP needs no code changes to fully support Oracle RAC with Fast-Connection-Failover (FCF).


The benefits of Oracle RAC (Real Application Clusters, or Oracle database clustering in simple terms) are three-fold: performance, scalability, and reliability. Which one matters the most to you depends on your needs, but usually having the assurance of database failover is the most valuable, with scalability and performance coming a close second.


In our testing, we used WebLogic 10.0.1 and Oracle 11g (Release 1) on physical machines with Intel Xeon quad-core 2.5 GHz CPUs and 8 GB of RAM. The OS was 64-bit Red Hat Enterprise Linux 5. In-house, we can typically push a single Oracle node to capacity with three EP storefront nodes. For our validation testing, a four-storefront configuration used roughly 50-60% of the capacity of a two-node RAC configuration. The following is a rough guide for setting up RAC for EP.

 

RAC Configuration w/ EP

Deploying and configuring Oracle Clusterware and Oracle 11g was fairly straightforward and required no special configuration for Elastic Path, other than the standard RAC connection configuration outlined below. Oracle's online documentation for the Clusterware setup is excellent and very detailed when you need to drill down.


Once the Clusterware and database are up and running and your data has been populated, there are several ways to set up RAC with WebLogic; see the WebLogic documentation for details. WebLogic recommends using multi data sources to connect to the RAC nodes. This method handles failover and load balancing at the application level, which is more effective because it can use WebLogic's health monitors, and failover completes more quickly than with connect-time failover or when it is left to the Clusterware. Set up one data source for each RAC node. The configuration example below is based on a two-node setup: one data source per node, plus the multi data source that pools them.



WebLogic Data Source Example XML

<!-- Data source for the first RAC node -->
<jdbc-data-source>
  <name>jdbcPool</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@lcqsol24:1521:snrac1</url>
    <driver-name>oracle.jdbc.OracleDriver</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>wlsqa</value>
      </property>
    </properties>
    <password-encrypted>{3DES}aP/xScCS8uI=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <!-- Test each connection before handing it to the application -->
    <test-connections-on-reserve>true</test-connections-on-reserve>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <!-- Must differ from jdbcDataSource, the JNDI name of the multi data source -->
    <jndi-name>jdbcDataSource1</jndi-name>
  </jdbc-data-source-params>
</jdbc-data-source>


<jdbc-data-source>
  <name>jdbcPool2</name>
  <jdbc-driver-params>
    <url>jdbc:oracle:thin:@lcqsol25:1521:SNRAC2</url>
    <driver-name>oracle.jdbc.OracleDriver</driver-name>
    <properties>
      <property>
        <name>user</name>
        <value>wlsqa</value>
      </property>
    </properties>
    <password-encrypted>{3DES}aP/xScCS8uI=</password-encrypted>
  </jdbc-driver-params>
  <jdbc-connection-pool-params>
    <test-connections-on-reserve>true</test-connections-on-reserve>
    <test-table-name>SQL SELECT 1 FROM DUAL</test-table-name>
  </jdbc-connection-pool-params>
  <jdbc-data-source-params>
    <jndi-name>jdbcDataSource2</jndi-name>
    <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol>
  </jdbc-data-source-params>
</jdbc-data-source>


<jdbc-data-source>
  <name>jdbcNonXAMultiPool</name>
  <jdbc-data-source-params>
    <jndi-name>jdbcDataSource</jndi-name>
    <algorithm-type>Failover</algorithm-type>
    <data-source-list>jdbcPool,jdbcPool2</data-source-list>
    <failover-request-if-busy>true</failover-request-if-busy>
  </jdbc-data-source-params>
</jdbc-data-source>
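
The multi data source above uses the Failover algorithm: connection requests go to the first data source in the list and move to the next one only when the first is unavailable (or busy, because failover-request-if-busy is set). WebLogic's other multi data source algorithm is Load-Balancing, which spreads connection requests across the listed data sources. As a rough sketch only, reusing the names from the example above and not validated as part of this testing, the load-balancing variant would look something like this:

```xml
<jdbc-data-source>
  <name>jdbcNonXAMultiPool</name>
  <jdbc-data-source-params>
    <jndi-name>jdbcDataSource</jndi-name>
    <!-- Load-Balancing instead of Failover: requests are spread across the members -->
    <algorithm-type>Load-Balancing</algorithm-type>
    <data-source-list>jdbcPool,jdbcPool2</data-source-list>
  </jdbc-data-source-params>
</jdbc-data-source>
```

The failover-request-if-busy element is left out here because it only applies to the Failover algorithm.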


 

Fast-Connection-Failover

WebLogic also supports Fast-Connection-Failover (FCF). With FCF, the data source receives event notifications from the Oracle RAC nodes, such as notification and cleanup of invalid connections, load-balancing events, and node failures. To enable FCF, you must change the Oracle JDBC driver class and add a couple of additional properties to the data source connection so that it knows how to receive the ONS (Oracle Notification Service) messages.

 

To enable FCF on a data source:

  1. In the WebLogic console, under the data source:
    1. In Driver Class Name, set the driver class to oracle.jdbc.pool.OracleDataSource.
    2. In Properties, set the ONS configuration string to subscribe to RAC's ONS messages, for example: ONSConfiguration=nodes=hostname1:port1,hostname2:port2
  2. Finally, make sure that ONS is properly configured on the RAC nodes and you have no blocking firewalls on those ports on either the RAC nodes or the application server nodes.
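
For illustration, steps 1.1 and 1.2 might end up in the data source XML roughly as shown below. This is a sketch rather than a tested descriptor: it reuses the URL and user from the first data source in the example earlier in this post, keeps hostname1:port1 and hostname2:port2 as placeholders for your own ONS hosts and ports, and assumes that properties entered in the console are stored as property elements in the same way as the user property.

```xml
<jdbc-driver-params>
  <url>jdbc:oracle:thin:@lcqsol24:1521:snrac1</url>
  <!-- Step 1.1: driver class changed from oracle.jdbc.OracleDriver -->
  <driver-name>oracle.jdbc.pool.OracleDataSource</driver-name>
  <properties>
    <property>
      <name>user</name>
      <value>wlsqa</value>
    </property>
    <!-- Step 1.2: ONS nodes to subscribe to; replace the placeholder hosts and ports -->
    <property>
      <name>ONSConfiguration</name>
      <value>nodes=hostname1:port1,hostname2:port2</value>
    </property>
  </properties>
  <password-encrypted>{3DES}aP/xScCS8uI=</password-encrypted>
</jdbc-driver-params>
```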

Search Server Failover

Posted by Alan Schroder Jan 21, 2009

Every Elastic Path administrator knows: if your search server goes down, your storefront is down and so is Commerce Manager. You can make your Elastic Path deployment more resilient by setting up search server failover.

 

In a nutshell, you set up two machines to run the search server web application. One is the main search server (master). The other is a backup (slave). Both machines are running behind a load balancer. The master search server uses rsync to synchronize multiple index files to the slave machine and then commits these new indexes into the slave's indexes. This essentially creates a duplicate of the master on the slave and, if the master server goes down, the slave server will be able to handle all requests.

 

The failover configuration consists of three components:

  • The master server
  • The slave server
  • The load balancer

The load balancer is a PC running Apache web server with the mod_proxy module. During normal operation, it receives search requests and forwards them to the master. The master server contains the search server web application, the index builders, and the indexes. The slave server also contains the search server web application, but it does not build its own indexes. Instead, the master uses rsync to replicate the search index files to the slave, and the data in the updated index files is then committed into the slave search server's indexes.

SSFailover_Figure1.png

If the master becomes unavailable, due to a failure or planned downtime, the load balancer redirects search requests to the slave.

SSFailover_Figure2.png

To make it all work, some scripts must be run by cron jobs on the master and the slave. Note that these scripts use rsync and Unix hard links, so they only work in Linux/Unix environments.

 

You also need to set the searchHost setting in Elastic Path to point to the proxy server.

 

You can download the scripts and the setup documentation from the downloads page.
