Technical Blog

3 Posts tagged with the load tag

In Star Trek there is a test named the Kobayashi Maru given to all Starfleet Academy cadets. In this test the civilian vessel Kobayashi Maru is disabled and trapped in the Klingon Neutral Zone, and the cadet must choose whether to leave the ship to certain destruction or to attempt a rescue but risk the lives of their own ship and crew and possibly provoke the Klingons into an all-out war. If the cadet chooses to attempt a rescue the simulation inevitably ends in battle and complete destruction of the cadet's ship. The test is designed as a no-win situation to test the cadet's character.

 

How is this relevant to OpenJPA and EP you ask? In our case the Kobayashi Maru is your e-commerce store - stuck in the neutral zone between development and go live, disabled due to performance issues caused by a large object graph. The cadet is you, the developers. The Klingons represent the project stakeholders who will fight you to meet deadlines whilst requiring quality maintainable and functional code. What makes it a no-win situation is there is no simple way to tell OpenJPA to load just specific parts of the object graph. You have the following options:

 

  • Try to use the EP load tuners to flag which parts of the object graph you are not interested in.
    • For example, the ProductLoadTuner lets you specify setLoadingCategories(false) to avoid loading all of the categories that a product belongs to. Seems sound, right? The only problem is it doesn't work! For legacy reasons there are many service methods and queries that load a product (either directly or as part of another object graph) and expect the categories - so a long time ago the JPA annotations on ProductImpl.getProductCategories() was set to FetchType.EAGER. This means JPA will always load the categories. If you change the annotation to use FetchType.LAZY then you would need to find everywhere in the system that loads a product expecting to see categories and change it to use the appropriate load tuner. You don't have that much time, and so you are defeated by the Klingons.
  • Create a new OpenJPA FetchGroup which is a way of defining exactly which fields to load.
    • This is the strategy that has already been used in parts of the code - for example when building the product index. The drawbacks of this approach is that a fetch group will only load the specified fields, and you can only define a fetch group through annotations in the same classes that the fields are defined in. If you are trying to use Binary Based Development then this option is already disqualified. If you are happy to make core changes, then you need to edit every class in the object graph to define the fields that should be loaded as part of your fetch group. A quick search shows the PRODUCT_INDEX fetch group is defined across 20 classes! Miss one field in one class and you get an unexpected NullPointerException. You don't have time to do all the testing required to ensure you have all the necessary fields, and so you are either defeated by the Klingons or your ship implodes later due to a null pointer in the warp core.
  • Change the OpenJPAFetchPlanHelperImpl to remove the default fetch group.
    • This is a special fetch group created by OpenJPA (FetchGroup.NAME_DEFAULT) which includes all basic fields, *ToOne fields, and other fields marked with FetchType.EAGER. Remove this and you then need to add back all the fields that you want to load. So you are most likely in the same situation as above. Even if you try to do something clever like having a spring injected list of fields to include, this will need to be maintained in the future when new fields are added to any of the classes involved, and you still need to do exhaustive testing to ensure no essential field was missed. So the Klingons win again.

 

 

I am now going to play the role of Captain James Kirk. Kirk stated "I don't believe in the no-win scenario", and his response to the Kobayashi Maru test was to subtely reprogram the simulator to make it possible to rescue the vessel. I'm going to reprogram our simulator using some of the OpenJPA API.

 

We can use the OpenJPA metadata API to recreate the "default" fetch plan in such a way as to allow ad-hoc addition and removal from this plan. Here's a code snippet that does this:

 

protected Set<String> getDefaultFetchFields(final Class< ? > persistenceClass, final EntityManager entityManager, final Set<Class<?>> configuredObjects) {
     Set<String> fetchFields = new HashSet<String>();
 
     if (configuredObjects == null) {
          configuredObjects = new HashSet<Class<?>>();
     } else if (configuredObjects.contains(persistenceClass)) {
          return fetchFields;
     }
     configuredObjects.add(persistenceClass);
 
     ClassMetaData classMetaData = JPAFacadeHelper.getMetaData(entityManager, persistenceClass);
     FieldMetaData[] fields = classMetaData.getFields();
     for (FieldMetaData field : fields) {
          if (field.isInDefaultFetchGroup()) {
               fetchFields.add(field.getFullName(false));
               if (field.isTypePC()) {
                    fetchFields.addAll(getDefaultFetchFields(field.getType(), entityManager, configuredObjects));
               } else if (field.getElement() != null && field.getElement().isTypePC()) {
                    getDefaultFetchFields(field.getElement().getType(), entityManager, configuredObjects);
               }
          }
     }
 
     return fetchFields;
}

 

The first block keeps track of any classes we have already processed since there can be self-references or bi-directional relationships in an object graph. We then use OpenJPA's JPAFacadeHelper class to get the metadata for the given class. This provides a collection of persistence fields - both in the current class and any of its superclasses. We loop through these fields and examine each one in turn. The field metadata will tell us if the field is in the default fetch group, and if so we add it to the list of fields to include. Next, we check if the field is itself a PersistenceCapable class, in which case we recurse down into the fields of that class, and so on. For a *ToMany relationship the field will have an element which may be PersistenceCapable so we recurse into that similarly.

 

The results of all this is a list of (fully qualified) field names that are in the default fetch group for the entire object graph of the given class. We can then tweak this list, removing any fields we don't want loaded and adding other (non-default) fields we specifically want to load. We can then tell OpenJPA to remove the default fetch group, and give it our list as the definitive list of fields to load. For example:

 

Set<String> productFetchFields = getDefaultFetchFields(beanFactory.getBeanImplClass(ContextIdNames.PRODUCT), entityManager, null);
fetchPlan.removeFetchGroup(FetchGroup.NAME_DEFAULT);
fetchPlan.addFields(productFetchFields);
fetchPlan.removeField(ProductImpl.class, "productCategories"); 

 

We now have a fetch plan that tells OpenJPA to load a product without the categories!

 

Using this technique you can easily define to a fine degree exactly what gets loaded in your object graph and thus rescue the civilian vessel without arousing the wrath of the Klingons.

 

In my next mission as captain of this starship I will refactor the OpenJPAFetchPlanHelperImpl to use this technique which will enable the load tuners flags to actually work as expected!  I'll also add methods to that class (and interface) to allow easy removal of fields and also a method for removal of all fields (except the uidPk) of a class. Sometimes having that object "reference" on the object graph is sufficient without requiring any of the other fields, and it would be tiresome to remove the fields from the plan individually. For an encore I will replace all the hard-coded @FetchGroup annotations with a spring-injectable plan which is more extension-friendly (whilst still preserving current behaviour).

 

Feeback welcome, in the meantime Live Long and Prosper!

10 Comments Permalink

Recently, I needed to reproduce a problem that was only occurring when the storefront was under heavy load. As a developer, I cannot easily generate realistic load in my environment and I don't have access to a live production system, so I used JMeter to help me out. JMeter is a Java application for performing load tests on web applications. In this post, I'll show how to use JMeter to quickly set up a simple performance test against the Elastic Path Commerce storefront.

 

I wanted to see how long it would take to perform a search in the storefront for Canon products. In this case, I wanted JMeter to fire the following query a few hundred times using one or more threads: http://demo.elasticpath.com:8080/storefront/search.ep?categoryId=&keyWords=canon&submit=search

 

First, I created a thread group under the test plan tree node. (Right-click the test plan and choose Add->Thread Group.) If you want to simulate more than one user hitting the same search at the same time, you can set the number of threads in the thread group configuration.

img1_1.png

 

Next, I added a loop controller to my thread group. (Right-click the thread group and click Add->Logic Controller->Loop Controller.) Here, I defined how many times the loop will run. In my case, I checked the Infinite checkbox.

img2.png

 

Then, I added an HTTP request to the loop controller. (Right-clicking the loop controller and click Add->Sampler->HTTP Request.) I set the Server Name, Server Port, URL Path, chose GET in the dropdown and set three parameters in the request URL.

img3_1.png

 

At this point, I started the test by clicking Run->Start. JMeter starts to execute the HTTP request I defined in the loop controller. Since I set the loop count to Forever, it will continue to execute until I stop it.

 

Now, it would be a good idea to attach some reports to debug the execution. I tried a few reports and found that the View Results Tree presented all the information I needed. It shows the raw request sent by JMeter and the response data the server is providing. You can examine the server response to make sure the HTTP request you configured is doing what it is expected to do.

 

To add a View Results Tree report to your test plan, right-click your HTTP request test, and choose Add->Listener->View Results Tree.

You can finally see the results of each call on the report.

 

img4.png

 

You can build more complex tests, with several simultaneous requests over different URLs. For example, you can configure JMeter to first simulate a user logging in to your application (and hold session information), then a few requests to simulate the user navigating through the web application and then repeat this test a few hundred times using as many threads as you want (or your machine can handle).

 

Using JMeter, I was able to simulate a sufficently heavy load to reproduce the problem I wanted to reproduce and this post should help you do the same thing if you face a similar problem. For more details on JMeter, go to: http://jakarta.apache.org/jmeter/

 

Hope you enjoy!

2 Comments Permalink

Apache HTTP Server is a very effective tool for caching static content and, if configured properly, can improve performance of your Elastic Path deployment by up to 30%! Furthermore, Apache does a great job of load balancing a cluster of storefront nodes, giving you even more throughput and scalability, without resorting to expensive hardware load balancers. Obviously, Apache will never perform like a hardware load balancer, but it is a little more affordable (read: free). So really, what more can you ask for from an HTTP server?

 

In this post, we'll look at using Apache to load balance our storefront servers. We'll also look at enabling caching of static content at the Apache level, removing a lot of network and CPU load from our application servers and giving a faster load time to browsers. Before we begin, make sure you have the following:

 

  • Apache HTTP Server 2.2.10+ with either JBoss 4.2+ or Tomcat 5.5+ (using Apache with WebLogic is more complicated and requires the use of a specific Oracle-WebLogic Apache plug-in.)
  • Apache has been built with the following modules: mod_proxy, mod_proxy_ajp, mod_proxy_balancer, mod_cache, mod_disk_cache.

 

Configuring a Proxy and Static Content Cache

Let's start by creating a proxy server and caching static content at the Apache level. This is relatively easy to set up, but important to understand before moving on to load balancing. We'll assume Apache is the front-most facing component to the user's browser. The architecture will look something like the following diagram.

 

ApacheSimple.jpg

 

Let's examine a request working it's way through this architecture. A typical first request from a shopper's browser, such as viewing a product page, will flow through Apache (bypassing all caches since they're empty) and arrive at the application server. The application server will gather and serve the necessary HTML and subsequent embedded objects (images, js, css, etc). These objects will pass back through Apache and to the user's browser. The key process here, however, is that as these static objects pass back through Apache, Apache will cache them based on their cache control headers.

 

When a request comes in for the same product page (or any request for the same set of static HTML objects), Apache will serve the static objects straight back to the user's browser from its cache. Only the dynamic HTML and other dynamic content will come from the app server. Although a second load of the same page on the same user's browser will already be cached at the user's browser level, it will be very useful for new sessions that have an empty browser cache.

 

Unfortunately, there are a couple issues we need to think about before we can implement this setup, such as:

 

  • How do we communicate between Apache and the app server?
  • What protocol do we use between Apache and the app server, HTTP or AJP?
  • How do we support Acegi security, which is required by the storefront application servers?

 

Don't worry! We did a fair amount of performance testing to answer these questions, and came up with the following diagram.

 

ApacheProtocols.jpg

 

The key here, is that a) we're using AJP between Apache and the app server, a fast binary protocol, and b) we're using two separate AJP connectors on the app server, one non-secure for HTTP traffic and one considered "secure" for HTTPS traffic. This allows Acegi to know that a request is "secure" so that it will not try to redirect endlessly to a secure port (a typical problem we see). I'm putting "secure" in quotes because it's really no different than the insecure channel (it's not encrypted). It simply has additional header information stating it's a secure channel.

 

In order to implement this, there are a number of items to configure such as Apache's mod_proxy and mod_cache, as well as any cache control configuration that needs to be done on the application server.

 

mod_proxy

We need to allow requests that come in to Apache to pass through to the application server and then return to the user. This is done using Apache's mod_proxy module. The full mod_proxy documentation is here: http://httpd.apache.org/docs/2.2/mod/mod_proxy.html. It's a recommended read. We'll also be using the mod_proxy_ajp module for AJP support.

 

The first step is to enable the two AJP connectors on the application server, in server.xml (or jboss-server.xml):

<Connector enableLookups="false" port="8009" protocol="AJP/1.3"/>
<Connector enableLookups="false" port="8010" protocol="AJP/1.3" scheme="https" secure="true"/>

 

Note the secure parameters for port 8010. This fools Acegi into thinking that anything coming over this port with AJP is a secure connection and it will not redirect it.

 

The second step is to ensure Acegi knows it may receive connections over port 80 and its secure mapped port is then 443 (the typical HTTP and HTTPS ports). To do this, we edit the storefront web app's WEB-INF/conf/spring/security/acegi.xml file and add an additional port mapping to the portMapper bean as follows:

    <!-- port # are specified in default.xml -->
    <bean id="portMapper" class="org.acegisecurity.util.PortMapperImpl">
        <property name="portMappings">
            <map>
                <entry key="80"><value>443</value></entry>
                <entry key="8080"><value>8443</value></entry>
            </map>
        </property>
    </bean>

 

 

In the third and final step, we want to configure the HTTP and HTTPS virtual hosts on Apache to listen to ports 80 and 443.


LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

<VirtualHost 10.10.90.54:80>
        ServerName 10.10.90.54
        ProxyPreserveHost On
        ProxyPass /storefront ajp://10.10.90.54:8009/storefront keepalive=On
</VirtualHost>

<VirtualHost 10.10.90.54:443>
        ServerName 10.10.90.54
        # Enable/Disable SSL for this virtual host if you want to terminate SSL here
        ProxyPreserveHost On
        ProxyPass /storefront ajp://10.10.90.54:8010/storefront keepalive=On
</VirtualHost>

 

There's a lot going on here, so let's have a look at the HTTPS:443 virtual host as it's the more complex one here:

  1. Clearly, one would want to configure the virtual hosts to listen on the specific machine's port.
  2. Within here is where we would do any SSL termination before passing the request over AJP to the app server.
  3. "ProxyPreserveHost On" ensures the Host header is maintained as it's passed to the app server. This is required for Elastic Path 6.1 and later to be able to handle multi-store requests.
  4. The ProxyPass directive is the key here. This tells Apache to pass any requests coming in matching /storefront to the app server's AJP connector under /storefront.
  5. There are a large number of options for this directive, including maintaining keepalive, as we've done here.
  6. Note that the storefront server doesn't have to be the localhost. We'll see this later when we being load balancing.

 

At this point, after rebooting, you should be able to hit Apache on port 80 and pull up your storefront.

mod_cache

Next, we want to cache any static content we can on the Apache side. To do this, we'll use mod_cache, or more specifically mod_disk_cache. There is also mod_mem_cache, which is a memory based cache, but we've actually found better performance results with mod_disk_cache, plus the persistence of all cache files is a plus.

 

Adding the httpd.conf directives for a disk cache is fairly straightforward. Let's try the following:

 

CacheEnable disk /storefront/
CacheRoot /var/www/cache
CacheDirLevels 5
CacheDirLength 2
CacheIgnoreHeaders Set-Cookie

 

Looking at the lines in detail:

 

  1. Enable the disk cache on the URL /storefront/.
  2. Specify the cache location on the local disk, in this case /var/www/cache. You'll want to make sure the Apache user can write to that directory.
  3. The number of directory levels in the cache tree structure.
  4. The number of characters for each directory.
  5. Finally, we specify which headers we DO NOT want to cache. This is essential. If we don't set this for cookies, we will end up getting someone else's session!

 

At this point, after rebooting Apache, we will begin to cache any static objects with cache control headers. In order to expand on what is (or isn't cached), let's move on to the next section.

 

Cache Control Header Config

Finally, we want the application server, or more specifically, the deployed applications, to tell Apache if there's anything to cache. This is typically done by using cache control headers, such as max-age.

 

For the storefront web application, you can use the Caching Control Filter to add the max-age cache control header to requests for specific types of content (based on URL patterns). The Caching Control Filter configuration is in the storefront's conf/spring/web/filter-config.xml file, in the cachingControlFilter bean definition. The cachingControlEntries list contains bean definitions that represent the URL patterns to test and max-age value to set.

 

The following is an example of caching all /renderImage.image dynamic image calls, all /template-resources/ calls (css, js, etc) and any dynamic content assets under /content/:

 

<bean id="cachingControlFilter"
     class="com.elasticpath.commons.filter.impl.CachingControlFilter">
     <property name="cachingControlEntries">
          <list>
               <bean class="com.elasticpath.commons.filter.impl.CachingControlFilter$CachingControlEntry">
                    <property name="urlPattern">
                         <value>^.*renderImage\.image.*$</value>
                    </property>
                    <property name="maxAge">
                         <value>86400</value>
                    </property>
               </bean>
               <bean class="com.elasticpath.commons.filter.impl.CachingControlFilter$CachingControlEntry">
                    <property name="urlPattern">
                         <value>^.*template-resources.*$</value>
                    </property>
                    <property name="maxAge">
                         <value>86400</value>
                    </property>
               </bean>
               <bean class="com.elasticpath.commons.filter.impl.CachingControlFilter$CachingControlEntry">
                          <property name="urlPattern">
                                  <value>^.*content.*$</value>
                    </property>
                    <property name="maxAge">
                         <value>86400</value>
                    </property>
               </bean>
          </list>
     </property>
</bean>

 

 

Now, after restarting Apache, you should have a fully functioning Apache proxy with proper caching of static content.

 

 

Configuring Load Balancing

Load balancing is an easy extension once our proxy is set up. Essentially, with load balancing, instead of passing the request through to the same machine each time, we pass it to a cluster of machines (two or more) based on a certain algorithm. I recommend reading the complete Apache documentation on mod_proxy_balancer, which is the module we'll use to enable load balancing. It can be found here: http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html

 

Let's first lay out the Apache configuration, adding to our existing VirtualHost entries.

<VirtualHost 10.10.90.54:80>

        ServerName 10.10.90.54

        # ProxyPreserveHost On
        RequestHeader set Host mars.elasticpath.net

        <Proxy balancer://tomcatservers>
                BalancerMember ajp://localhost:9009 route=node1 loadfactor=90
                BalancerMember ajp://10.10.90.51:9009 route=node2 loadfactor=100
                BalancerMember ajp://10.10.90.52:9009 route=node3 loadfactor=100
                BalancerMember ajp://10.10.90.53:9009 route=node4 loadfactor=100
        </Proxy>

        ProxyPass /storefront balancer://tomcatservers/storefront stickysession=JSESSIONID nofailover=Off
        ProxyPass /server-status !

</VirtualHost>

<VirtualHost 10.10.90.54:443>

        ServerName 10.10.90.54

        LogLevel warn
        #CustomLog logs/ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"
        LogFormat "%h %l %u %t \"%r\" %>s %b" common
        CustomLog logs/ssl_access_log common
        ErrorLog logs/ssl_error_log

        #   SSL Engine Switch:
        #   Enable/Disable SSL for this virtual host.
        SSLEngine on
        SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
        SSLCertificateFile "/usr/local/apache2/conf/server.crt"
        SSLCertificateKeyFile "/usr/local/apache2/conf/server.key"

        #DocumentRoot    "/var/www/html/one"

        # ProxyPreserveHost On
        RequestHeader set HOST mars.elasticpath.net

        SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0

        <Proxy balancer://tomcatservers-ssl>
                BalancerMember ajp://localhost:9010 route=node1 loadfactor=90
                BalancerMember ajp://10.10.90.51:9010 route=node2 loadfactor=100
                BalancerMember ajp://10.10.90.52:9010 route=node3 loadfactor=100
                BalancerMember ajp://10.10.90.53:9010 route=node4 loadfactor=100
        </Proxy>

        ProxyPass /storefront balancer://tomcatservers-ssl/storefront stickysession=JSESSIONID nofailover=off

</VirtualHost>

 

We've seen the VirtualHost entries before, so let's just look at the Proxy balancer configuration in detail. We'll zoom in on this below for the insecure, port 80 connector:

 

...
        <Proxy balancer://tomcatservers>
                BalancerMember ajp://localhost:9009 route=node1 loadfactor=80
                BalancerMember ajp://10.10.90.51:9009 route=node2 loadfactor=100
                BalancerMember ajp://10.10.90.52:9009 route=node3 loadfactor=100
                BalancerMember ajp://10.10.90.53:9009 route=node4 loadfactor=100
        </Proxy>

        ProxyPass /storefront balancer://tomcatservers/storefront stickysession=JSESSIONID nofailover=Off
        ProxyPass /server-status !
...

 

Within the Proxy balancer directive, we've named our balancer <Proxy balancer://tomcatservers> and defined our load balancer members, for example BalancerMember ajp://10.10.90.51:9009 route=node2 loadfactor=100. In this case, we have 4 storefronts included, 1 being on the localhost (the same machine as the Apache install). We've opened up AJP port 9009 on these machines and configured all to have an even load factor, except the first node, which has slightly lower factor to give breathing room for Apache on the same machine.

 

The next directive, ProxyPass /storefront balancer://tomcatservers/storefront, we specify our ProxyPass to allow requests to /storefront* to pass to the balancer's /storefront*.  Note that we're also specifying the cookie name we want to keep stuck to each storefront node, so subsequent requests return to the same node. In our case, this is the JSESSIONID.

 

The last key setting in here is the route=nodeN on each BalancerMember. This is the name you configure for a node's jvmRoute within the app's server.xml. This allows Apache and the application server to identify which requests will go to which node. Without this setting (and/or the stickysession setting), the user's session may bounce between storefront nodes. This will cause strange behavior, like gettin bounced back to the homepage.

 

To set the jvmRoute within the server.xml, look for a commented-out line like the following:

 

<!-- You should set jvmRoute to support load-balancing via AJP ie :
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">        
-->  

 

Uncomment this and change jvmRoute="" to be the same as your BalancerMember entry (or vice versa). The same configuration as above is done for the secure connectors, which, in this case are on port 9010.

 

After rebooting Apache, you should be getting load balanced to a specific node in the cluster and stay on that node for subsequent requests. Your HTML assets will also be getting cached at the Apache layer as they pass through the proxy.

 

Now you can cache and load balance storefront servers with Apache HTTP Server. Go ahead and try it. Once you're set up, I would recommend tailing your Apache and app server access logs to watch your requests pass through Apache and your app server and ensure they're using sticky sessions correctly. Increasing the access log level on Apache and the app server to output cookie names/values is handy if you need to debug any sticky session config issues.

 

Some Final Considerations

  • There are some known issues around keep-alive and some older versions of Apache HTTP and Tomcat where the AJP connections between the two will not get released, causing the connection pool to fill and not allow new requests.
  • Consider using Apache's htcacheclean, which runs as a daemon or a one-time job, to control the size of your Apache cache on the disk. If your website has a small, finite number of cacheable HTML objects, this typically isn't a huge issue. On the other hand, if you have many GBs of assets and want to keep your cache to, say, 500 MB, htcacheclean is your tool. See the documentation for full details: http://httpd.apache.org/docs/2.2/programs/htcacheclean.html
  • Test, test, test. Make sure you do proper functional testing on a staging environment to ensure there are no strange redirects or odd behavior after putting another layer between your ecommerce site and the user. And just as importantly, proper performance testing will ensure there are no capacity issues between Apache and the app server. This will allow you to fine tune your connection pools for maximum performance, both on the Apache side and the app server side.
1 Comments Permalink