Technical Blog


If some of the recent performance-related posts by Linda, our much-cherished Get Elastic blogger extraordinaire, are starting to worry you, fear not! With the newly released Elastic Path 6.2, the Product Development and Performance teams at Elastic Path have done a fantastic job of ramping up the product's out-of-the-box performance. I'll let them brag about the numbers at a later date. Today's post digs into the guts of the caching introduced into the system, and shows how you can start tweaking some of the configuration settings to squeeze every millisecond out of page response times.

 

Within Elastic Path, there are now two caches that are available to be tweaked:

  • Application-level: Sitting between the view layer and the data access layer
  • Persistence-level: A Level-2 cache, within the OpenJPA ORM framework

 

Application Level Cache

By default, all products loaded within the Storefront application via the StoreProductService are now served from an Ehcache-backed cache. Each application is responsible for loading products via a ProductRetrieveStrategy. You'll notice the storefront has two new configurations to facilitate using Ehcache:

 

Cache.xml:
     <bean id="productCache" class="org.springframework.cache.ehcache.EhCacheFactoryBean">
          <property name="timeToLive" value="600"/>
          <property name="timeToIdle" value="600"/>
     </bean>

     <bean id="cachingProductRetrieveStrategy" class="com.elasticpath.sfweb.service.impl.EhCacheProductRetrieveStrategyImpl">
          <property name="productService" ref="productService" />
          <property name="cache" ref="productCache" />
     </bean>

 

ServiceSF.xml:
    <alias name="cachedSettingsReader" alias="settingsReader"/>
    <alias name="cachingProductRetrieveStrategy" alias="productRetrieveStrategy"/>

 

You'll note that the storefront now uses aliases in its Spring configuration to override the bean definitions of the same name in the default core service.xml. This allows the storefront to set up caching-specific classes. To tune the cache, update the productCache bean definition with optimized Ehcache settings. By default, Spring's EhCacheFactoryBean initializes the cache to allow overflow to disk, use LRU eviction and limit the in-memory size to 10,000 objects. For catalogs with a larger product mix, these settings should be tuned, and the cache potentially moved to a distributed cache via Terracotta, to keep the JVM heap size at a reasonable level.
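To make the "LRU eviction with a bounded in-memory size" behaviour concrete, here is a minimal sketch in plain Java. It is not how Ehcache is implemented; the class name LruCacheSketch is hypothetical, and it only illustrates what happens when a size-limited cache is full and a new entry arrives.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only: a size-bounded map that evicts the least recently used entry. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put: drop the least recently used entry once the limit is exceeded
        return size() > maxEntries;
    }
}
```

With a limit of two entries, putting "a" and "b", reading "a", then putting "c" evicts "b", because "b" is the entry that was touched least recently. A catalog much larger than the configured limit will churn in the same way, which is why the limit is worth tuning.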

 

You can either add new properties here to tweak these values, or add an ehcache.xml configuration file to the classpath and reference a specific cache name from the productCache definition. A quick tip for monitoring cache statistics while tuning the settings during load tests is to set up JMX monitoring for the storefront cache. This takes two steps:

 

Running the appserver with remote JMX enabled (no authentication for a non-production environment):

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6969 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

 

Configuring the JMX beans for Ehcache in cache.xml. Note in this case we are explicitly setting up a cache manager, which we can also use to specify a custom Ehcache configuration file instead of the default ehcache.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd ">

    <bean id="productCache" class="org.springframework.cache.ehcache.EhCacheFactoryBean">
        <property name="timeToLive" value="600"/>
        <property name="timeToIdle" value="600"/>
        <property name="cacheManager" ref="cacheManager" />
    </bean>

    <bean id="cachingProductRetrieveStrategy" class="com.elasticpath.sfweb.service.impl.EhCacheProductRetrieveStrategyImpl">
        <property name="productService" ref="productService" />
        <property name="cache" ref="productCache" />
    </bean>

    <bean id="cacheManager"
        class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
    </bean>

    <!-- Spring initialization of ehCache's mbeans -->
    <bean id="ehCacheMBeanRegistration"
        class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
        <property name="staticMethod"
            value="net.sf.ehcache.management.ManagementService.registerMBeans" />
        <property name="arguments">
            <list>
                <ref bean="cacheManager" />
                <ref bean="mbeanServer" />
                <value>true</value>
                <value>true</value>
                <value>true</value>
                <value>true</value>
            </list>
        </property>
    </bean>

    <bean id="mbeanServer" class="org.springframework.jmx.support.MBeanServerFactoryBean">
        <property name="locateExistingServerIfPossible" value="true" />
    </bean>

</beans>

 

Connecting with jConsole lets you check that the cache settings are properly configured and view the statistics, as in the image below. You should see the statistics update as you browse the storefront and load more products:

screen-capture-1.png
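If you would rather check from code than from jConsole, the registered MBeans can be listed with the standard javax.management API. The sketch below is only an illustration: the class name EhcacheMBeanLister is hypothetical, and the assumption that Ehcache registers its MBeans under the net.sf.ehcache domain should be verified against your Ehcache version.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper: lists the MBeans registered under a given JMX domain. */
final class EhcacheMBeanLister {

    /** Returns the sorted canonical names of all MBeans in the given domain. */
    static Set<String> listMBeanNames(MBeanServerConnection connection, String domain) {
        Set<String> names = new TreeSet<String>();
        try {
            // "<domain>:*" matches every MBean registered in that domain
            ObjectName pattern = new ObjectName(domain + ":*");
            for (ObjectName name : connection.queryNames(pattern, null)) {
                names.add(name.getCanonicalName());
            }
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException("Invalid JMX domain: " + domain, e);
        } catch (IOException e) {
            throw new IllegalStateException("Could not query the MBean server", e);
        }
        return names;
    }
}
```

Inside the storefront JVM you could pass ManagementFactory.getPlatformMBeanServer() as the connection. From another process, a connection can be obtained through JMXConnectorFactory using a service URL of the form service:jmx:rmi:///jndi/rmi://localhost:6969/jmxrmi, matching the port from the JVM flags above. Calling listMBeanNames(connection, "net.sf.ehcache") should then return the cache MBeans if registration worked, under the domain assumption stated above.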

 

 

Persistence Level Cache

As part of the Elastic Path 6.2 release, the bundled OpenJPA library has been upgraded to version 1.2.1, which fixes some data cache issues present in the previous 1.0.1 version. Please see the OpenJPA 1.2.1 documentation for full details on the native OpenJPA data cache. Here we'll go over the changes that enable the data cache using the native implementation.

 

Annotations:

All transactional persistent entities (anything submitted or updated regularly as an online transaction, such as orders, payments and customers) have a new data cache annotation that marks them as non-cacheable. By default, an entity is included in the data cache unless this annotation is present and explicitly disables caching. All relatively static data, such as catalog entities, should be part of the cache and therefore does not carry this annotation.

 

@DataCache(enabled = false)

 

Persistence.xml Configuration:

As part of the persistence unit configuration, three new properties are configured by default:

            <property name="openjpa.DataCache" value="true"/>
            <property name="openjpa.RemoteCommitProvider" value="sjvm"/>
            <property name="openjpa.DataCacheTimeout" value="1000"/>

 

These values are used for:

  • openjpa.DataCache - enables the cache and specifies the cache properties. We recommend tweaking this value so that the cache size accommodates the size of the underlying data, for example: true(CacheSize=25000, SoftReferenceSize=0)
  • openjpa.RemoteCommitProvider - specifies the commit provider. For a clustered setup, evictions should be handled by configuring this setting to use JMS- or TCP-based evictions.
  • openjpa.DataCacheTimeout - the maximum time to live, in milliseconds, for entities in the cache

 

We also suggest tuning the query cache, which is enabled with default values when the DataCache property is set to true:

<property name="openjpa.DataCache" value="true(CacheSize=25000, SoftReferenceSize=0)"/>
<property name="openjpa.RemoteCommitProvider" value="sjvm"/>
<property name="openjpa.QueryCache" value="CacheSize=25000, SoftReferenceSize=0"/>

 

Tip: Catching Cache Hits and Misses

If caching is enabled and you're still seeing a large number of database queries from the storefront server, you can enable data cache Log4j tracing and grep the logged hits and misses to track down which entities and queries are mistakenly hitting the database. Most of the time, these can be traced to accidental evictions or to entities within the inheritance hierarchy being excluded from the cache.

 

log4j.category.openjpa.DataCache=TRACE

 

This produces log entries similar to the ones below, which are easy to grep in order to count hits and misses and track down which entities are creating problems:

DEBUG 2009-09-10 12:56:33,918 org.apache.renamed.openjpa.lib.log.CommonsLogFactory$LogAdapter.trace(CommonsLogFactory.java:76) - Cache hit while looking up key "com.elasticpath.domain.attribute.impl.ProductTypeProductAttributeImpl-1".
DEBUG 2009-09-10 12:56:33,926 org.apache.renamed.openjpa.lib.log.CommonsLogFactory$LogAdapter.trace(CommonsLogFactory.java:76) - Cache miss while looking up key "org.apache.renamed.openjpa.datacache.QueryKey@b7ee9cab[query:[SELECT ps.skuCodeInternal FROM ProductSkuImpl ps
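If grep alone gets unwieldy, the tallying can be scripted. The following is a minimal sketch that assumes the message format shown in the two sample lines above; the class name CacheLogTally and the choice to lump all query cache keys into a single "query" bucket are illustrative assumptions, not part of Elastic Path or OpenJPA.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts data cache hits and misses per entity class from trace log lines. */
final class CacheLogTally {

    // Captures "hit" or "miss" and the cache key that follows the opening quote
    private static final Pattern LOOKUP =
            Pattern.compile("Cache (hit|miss) while looking up key \"([^\"]*)");

    /** Returns counts keyed by "hit <entity class>", "miss <entity class>", "hit query" or "miss query". */
    static Map<String, Integer> tally(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            Matcher matcher = LOOKUP.matcher(line);
            if (!matcher.find()) {
                continue; // not a cache lookup line
            }
            String key = matcher.group(2);
            if (key.contains("QueryKey@")) {
                key = "query"; // group all query cache lookups together
            } else {
                key = key.replaceFirst("-\\d+$", ""); // strip the trailing entity id
            }
            String bucket = matcher.group(1) + " " + key;
            Integer previous = counts.get(bucket);
            counts.put(bucket, previous == null ? 1 : previous + 1);
        }
        return counts;
    }
}
```

Feeding it the lines of a storefront log file gives one counter per entity class, so an entity with many misses and few hits stands out quickly.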

 

OpenJPA Cache Plugins

The native data cache within OpenJPA is designed to be swappable via plugin configuration. This makes it possible to swap in alternative caching technologies such as Ehcache or Oracle Coherence to meet more demanding scalability requirements, as both Ehcache/Terracotta and Oracle Coherence support distributed caching setups. We've tested Ehcache and Coherence plugins internally with favourable results.

 

An Ehcache plugin is provided by the Ehcache group. However, this version must be repackaged to match the custom package names of the Elastic Path specific org.apache.renamed.openjpa jar. Once a repackaged copy of the Ehcache-OpenJPA jar is on the classpath, the configuration changes to:

<property name="openjpa.DataCacheManager" value="ehcache"/>

 

Similarly, OpenJPA committer Pinaki has posted the initial workings of an Oracle Coherence plugin on his blog, along with some additional JPA caching insights. Elastic Path with OpenJPA and Oracle Coherence is probably worth an entire blog entry in itself (coming soon!), but the same configuration approach applies, along with the need to repackage the code to point at the Elastic Path OpenJPA jar:

 

<property name="openjpa.DataCacheManager" value="coherence"/>

 

So there it is: two new caching mechanisms to tweak and fine-tune as part of your load tests, which should show up favourably in storefront page load times, and hopefully in conversion rates and green dollar signs. Feel free to ask about our load testing and tuning war stories. We'd be happy to talk about our hands-on experience with the recent caching work.


We had a client who needed to display entire product categories (several hundred products in all) in a set of drop down list boxes on their Elastic Path Commerce storefront homepage. The data that had to be loaded was quite minimal, just the product codes, display names and a few localized attributes.

 

In this type of situation, using the default product loader to retrieve this information is not the best approach. Each Product domain object contains a large amount of data, such as prices, SKUs, attributes, inventory and recommendations. Loading all these details requires a significant number of database queries, so applying this approach to a homepage with hundreds of products would result in extremely poor performance.

 

For this customer, a better solution was to create a set of lightweight product display classes, and have these mapped directly to some custom JPA native queries. These queries were tailored to return only the specific product details needed and therefore avoid loading any unnecessary information.  To further reduce the number of database trips, we also introduced a timed cache in the storefront controllers which would store frequently accessed catalog items for a set period of time before being refreshed. 

 

The combination of these two techniques reduced their page response times from tens of seconds (using the default product loader) to under a few seconds even in the worst case scenario where a stale cache had to be refreshed.   Since most of the time the information would be available in the cache, the amount of database overhead was kept to a minimum.

 

If your storefront scenario has similar requirements, then the following code examples may also be useful for your project.

 

Using Native SQL Queries with JPA

To get all product codes and product names in a particular category, you can use the EntityManager's createNativeQuery method to create a native query with a WHERE clause that passes in the language string and a specific category UID.

 

final String productNamesByCatSql =
     "SELECT tp.code, tpldf.display_name AS displayName"
     + " FROM tproduct AS tp"
     + " INNER JOIN tproductldf AS tpldf"
     + " ON tpldf.product_uid = tp.uidpk"
     + " INNER JOIN tproductcategory AS tpc"
     + " ON tp.uidpk = tpc.product_uid"
     + " WHERE tpldf.LOCALE = ?1"
     + " AND tpc.category_uid = ?2";

 

long categoryUid = getCategoryUid(request);
long start = System.currentTimeMillis();

// create a native SQL query whose rows are mapped to the lightweight display bean
final Query query = entityManager.createNativeQuery(productNamesByCatSql, ProductDisplayBeanImpl.class);

// retrieve a simplified product list for a given category
// (parameter 1 is the language string, parameter 2 is the category UID)
final List<ProductDisplayBean> custProducts = (List<ProductDisplayBean>)
     query.setParameter(1, shoppingCart.getLocale().getLanguage())
     .setParameter(2, Long.valueOf(categoryUid))
     .getResultList();

 

if (LOG.isDebugEnabled()) {
     long elapsedTimeMillis = System.currentTimeMillis() - start;
     LOG.debug("No. of products returned = " + custProducts.size() + ", elapsed time (ms) = " + elapsedTimeMillis);
     for (ProductDisplayBean productBean : custProducts) {
          LOG.debug("[" + productBean.getCode() + ":" + productBean.getDisplayName() + "]");
     }
}

 

Like JPQL queries, native queries can be named for easy reuse using the @NamedNativeQuery annotation. However, it's probably a better idea to externalize all the native query strings into an *orm.xml file so that the SQL can be updated without having to change the class files. If you do that, you'll have to change all the createNativeQuery() calls to createNamedQuery().

 

A very simple display bean containing only the product name and code is needed just to pass the information to the storefront.

 

 

 

public interface ProductDisplayBean {

     public String getCode();
     public void setCode(String code);
     public String getDisplayName();
     public void setDisplayName(String displayName);
}

public class ProductDisplayBeanImpl implements ProductDisplayBean {

     public static final long serialVersionUID = 5000000001L;
     private String code;
     private String displayName;

     public String getCode() {
          return code;
     }

     public void setCode(String code) {
          this.code = code;
     }

     public String getDisplayName() {
          return displayName;
     }

     public void setDisplayName(String displayName) {
          this.displayName = displayName;
     }
}
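A note on mapping: passing a result class to createNativeQuery generally expects that class to be mapped as a JPA entity. If you would rather keep the display bean as a plain class, one alternative is to create the native query without a result class, in which case each row comes back as an Object[] holding the selected columns in order, and to copy the values across yourself. The sketch below assumes that approach; ProductRowMapper and its nested Row class are hypothetical stand-ins so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: copies raw native query rows into simple display objects. */
final class ProductRowMapper {

    /** Minimal stand-in for the display bean, holding only a code and a display name. */
    static final class Row {
        final String code;
        final String displayName;

        Row(String code, String displayName) {
            this.code = code;
            this.displayName = displayName;
        }
    }

    /** Expects each Object[] to hold the product code at index 0 and the display name at index 1. */
    static List<Row> toRows(List<Object[]> resultList) {
        List<Row> rows = new ArrayList<Row>(resultList.size());
        for (Object[] columns : resultList) {
            rows.add(new Row(String.valueOf(columns[0]), String.valueOf(columns[1])));
        }
        return rows;
    }
}
```

The column order must match the SELECT list of the SQL, so this mapping has to be kept in step with any change to the query.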

Caching Frequently Accessed Items

 

Once you've retrieved a set of items from the catalog, you can easily cache these results at the application layer using either a third party solution, such as Ehcache (http://ehcache.sourceforge.net), or your own caching mechanism.  Elastic Path 6.1 already provides some out-of-the-box caching classes that you can use in your own code.  We normally use our SimpleTimeoutCache class to hold a list of frequently retrieved products that don't need to be updated very often. The following code sample shows how you could integrate SimpleTimeoutCache with your product retrieval service class.

 

private static final long CACHE_TIMEOUT = 30000;  // cache expires after 30 seconds (value in milliseconds)

// the cached value type must match the list returned by the product retrieve strategy
private final SimpleTimeoutCache<String, List<StoreProduct>> productsCache = new SimpleTimeoutCache<String, List<StoreProduct>>(CACHE_TIMEOUT);

...

String storeCode = shoppingCart.getStore().getCode();
List<StoreProduct> products = productsCache.get(storeCode);

// if there is no entry for the store code in the cache (or it has expired),
// then load the products from the database and update the cache
if (products == null) {
     final IndexSearchResult productResults = retrieveProducts(browsingRequest, category);
     List<Long> productUids = getProductsUsingPageNumber(pageNumber, storeCode, productResults);
     products = getProductRetrieveStrategy().retrieveProducts(productUids,
          shoppingCart, productLoadTuner);
     productsCache.put(storeCode, products);
}

 

The timeout value can be adjusted for your requirements. A larger value will reduce the number of trips to the database but the trade-off is a longer wait time for catalog product changes to appear.
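If you want to see the trade-off in isolation, here is a minimal sketch of a timed cache in plain Java. It is not the Elastic Path SimpleTimeoutCache implementation; TimedCacheSketch and its Clock interface are hypothetical names, and the injectable clock exists only to make expiry easy to demonstrate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only: entries become invisible once they are older than the timeout. */
final class TimedCacheSketch<K, V> {

    /** Source of the current time, injectable so expiry can be exercised without sleeping. */
    interface Clock {
        long nowMillis();
    }

    private static final class Entry<V> {
        final V value;
        final long storedAtMillis;

        Entry(V value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<K, Entry<V>>();
    private final long timeoutMillis;
    private final Clock clock;

    TimedCacheSketch(long timeoutMillis, Clock clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<V>(value, clock.nowMillis()));
    }

    /** Returns the cached value, or null if there is none or it has expired. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.nowMillis() - entry.storedAtMillis >= timeoutMillis) {
            entries.remove(key, entry); // remove only if it is still the same stale entry
            return null;
        }
        return entry.value;
    }
}
```

With a 30000 ms timeout, a value stored at time 0 is still returned at 29999 ms and is reported as missing from 30000 ms onward, at which point the calling code reloads it from the database, exactly as in the null check shown earlier.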
