Technical Blog


Apache HTTP Server is a very effective tool for caching static content and, if configured properly, can improve performance of your Elastic Path deployment by up to 30%! Furthermore, Apache does a great job of load balancing a cluster of storefront nodes, giving you even more throughput and scalability, without resorting to expensive hardware load balancers. Obviously, Apache will never perform like a hardware load balancer, but it is a little more affordable (read: free). So really, what more can you ask for from an HTTP server?

 

In this post, we'll look at using Apache to load balance our storefront servers. We'll also look at enabling caching of static content at the Apache level, removing a lot of network and CPU load from our application servers and giving a faster load time to browsers. Before we begin, make sure you have the following:

 

  • Apache HTTP Server 2.2.10+ with either JBoss 4.2+ or Tomcat 5.5+ (using Apache with WebLogic is more complicated and requires the use of a specific Oracle-WebLogic Apache plug-in.)
  • An Apache build that includes the following modules: mod_proxy, mod_proxy_ajp, mod_proxy_balancer, mod_cache, mod_disk_cache.

 

Configuring a Proxy and Static Content Cache

Let's start by creating a proxy server and caching static content at the Apache level. This is relatively easy to set up, but important to understand before moving on to load balancing. We'll assume Apache is the front-most facing component to the user's browser. The architecture will look something like the following diagram.

 

ApacheSimple.jpg

 

Let's examine a request working its way through this architecture. A typical first request from a shopper's browser, such as viewing a product page, will flow through Apache (bypassing all caches since they're empty) and arrive at the application server. The application server will gather and serve the necessary HTML and the embedded objects it references (images, JS, CSS, etc.). These objects will pass back through Apache to the user's browser. The key point, however, is that as these static objects pass back through Apache, Apache will cache them based on their cache control headers.

 

When a later request comes in for the same product page (or any request for the same set of static objects), Apache will serve the static objects straight back to the user's browser from its cache. Only the dynamic HTML and other dynamic content will come from the app server. A second load of the same page in the same browser will usually be served from that browser's own cache, so the Apache cache pays off mostly for new sessions that start with an empty browser cache.
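To make the miss-then-hit flow concrete, here is a toy model in Python. It is only a sketch under loud assumptions: `TinyCachingProxy` and `fake_app_server` are made-up names, the cache is a dict rather than a disk cache, and the only header it looks at is `max-age`. The real mod_cache checks far more than this.

```python
import re
import time


def fake_app_server(url):
    """Hypothetical origin: static files get a max-age, pages do not."""
    if url.endswith((".css", ".js", ".jpg")):
        return {"Cache-Control": "max-age=86400"}, "static:" + url
    return {}, "dynamic:" + url


class TinyCachingProxy:
    """Toy model of a caching reverse proxy (not how mod_cache works inside)."""

    def __init__(self, origin):
        self.origin = origin  # callable: url -> (headers, body)
        self.store = {}       # url -> (expires_at, body)

    def get(self, url):
        entry = self.store.get(url)
        if entry and entry[0] > time.time():
            return "HIT", entry[1]  # served without touching the origin
        headers, body = self.origin(url)
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        if match and int(match.group(1)) > 0:
            self.store[url] = (time.time() + int(match.group(1)), body)
        return "MISS", body
```

Calling `get()` twice for a stylesheet gives a MISS and then a HIT, while a dynamic page with no `max-age` is a MISS every time, which is the behavior described above.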

 

Unfortunately, there are a couple of issues we need to think about before we can implement this setup, such as:

 

  • How do we communicate between Apache and the app server?
  • What protocol do we use between Apache and the app server, HTTP or AJP?
  • How do we support Acegi security, which is required by the storefront application servers?

 

Don't worry! We did a fair amount of performance testing to answer these questions, and came up with the following diagram.

 

ApacheProtocols.jpg

 

The key here is that a) we're using AJP, a fast binary protocol, between Apache and the app server, and b) we're using two separate AJP connectors on the app server, one non-secure for HTTP traffic and one considered "secure" for HTTPS traffic. This allows Acegi to know that a request is "secure" so that it will not try to redirect endlessly to a secure port (a typical problem we see). I'm putting "secure" in quotes because the channel is really no different from the insecure one (it's not encrypted). It simply carries additional header information stating that it's a secure channel.

 

In order to implement this, there are a number of items to configure such as Apache's mod_proxy and mod_cache, as well as any cache control configuration that needs to be done on the application server.

 

mod_proxy

We need to allow requests that come in to Apache to pass through to the application server and then return to the user. This is done using Apache's mod_proxy module. The full mod_proxy documentation is here: http://httpd.apache.org/docs/2.2/mod/mod_proxy.html. It's a recommended read. We'll also be using the mod_proxy_ajp module for AJP support.

 

The first step is to enable the two AJP connectors on the application server, in server.xml (or jboss-server.xml):

<Connector enableLookups="false" port="8009" protocol="AJP/1.3"/>
<Connector enableLookups="false" port="8010" protocol="AJP/1.3" scheme="https" secure="true"/>

 

Note the secure parameters for port 8010. This fools Acegi into thinking that anything coming over this port with AJP is a secure connection and it will not redirect it.

 

The second step is to ensure Acegi knows it may receive connections over port 80 and its secure mapped port is then 443 (the typical HTTP and HTTPS ports). To do this, we edit the storefront web app's WEB-INF/conf/spring/security/acegi.xml file and add an additional port mapping to the portMapper bean as follows:

    <!-- port # are specified in default.xml -->
    <bean id="portMapper" class="org.acegisecurity.util.PortMapperImpl">
        <property name="portMappings">
            <map>
                <entry key="80"><value>443</value></entry>
                <entry key="8080"><value>8443</value></entry>
            </map>
        </property>
    </bean>
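If it helps to see why both the "secure" connector and the port mapping matter, here is a simplified Python model of the redirect decision. This is not Acegi's code, just a sketch of the idea: `redirect_for`, the host name `shop.example.com` and the path `/storefront/checkout.ep` are made up for illustration.

```python
# Mirrors the portMappings in the portMapper bean: HTTP port -> HTTPS port.
PORT_MAPPINGS = {80: 443, 8080: 8443}


def redirect_for(is_secure, server_port, path, needs_secure=True,
                 host="shop.example.com"):
    """Return the HTTPS URL to redirect to, or None if the request may proceed.

    is_secure models the secure="true" flag on the AJP connector, and
    server_port is the port the shopper's browser connected to.
    """
    if not needs_secure or is_secure:
        return None
    https_port = PORT_MAPPINGS[server_port]  # KeyError if the port is unmapped
    port_part = "" if https_port == 443 else ":%d" % https_port
    return "https://%s%s%s" % (host, port_part, path)


# Plain HTTP on port 80 asking for a secure page: redirect to HTTPS on 443.
print(redirect_for(False, 80, "/storefront/checkout.ep"))
# Same page arriving over the connector flagged as secure: no redirect.
print(redirect_for(True, 443, "/storefront/checkout.ep"))
```

In this model, a request that arrives over the connector flagged as secure is never redirected, and a plain request on port 80 is sent to port 443 because of the extra mapping.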

 

 

In the third and final step, we want to configure the HTTP and HTTPS virtual hosts on Apache to listen to ports 80 and 443.


LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

<VirtualHost 10.10.90.54:80>
        ServerName 10.10.90.54
        ProxyPreserveHost On
        ProxyPass /storefront ajp://10.10.90.54:8009/storefront keepalive=On
</VirtualHost>

<VirtualHost 10.10.90.54:443>
        ServerName 10.10.90.54
        # Enable/Disable SSL for this virtual host if you want to terminate SSL here
        ProxyPreserveHost On
        ProxyPass /storefront ajp://10.10.90.54:8010/storefront keepalive=On
</VirtualHost>

 

There's a lot going on here, so let's have a look at the HTTPS:443 virtual host as it's the more complex one here:

  1. The VirtualHost line binds the virtual host to this machine's IP address and port (443 here).
  2. Within here is where we would do any SSL termination before passing the request over AJP to the app server.
  3. "ProxyPreserveHost On" ensures the Host header is maintained as it's passed to the app server. This is required for Elastic Path 6.1 and later to be able to handle multi-store requests.
  4. The ProxyPass directive is the key here. This tells Apache to pass any requests coming in matching /storefront to the app server's AJP connector under /storefront.
  5. There are a large number of options for this directive, including maintaining keepalive, as we've done here.
  6. Note that the storefront server doesn't have to be the localhost. We'll see this later when we begin load balancing.
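As a rough mental model of what ProxyPass does with a path, here is a small Python sketch. The function `proxy_target` and the sample paths are hypothetical, and the matching rule (a prefix match on whole path segments, with "!" meaning "don't proxy") is my simplified reading of the directive rather than Apache's actual code.

```python
# (prefix, target) pairs in the order Apache would read them.
# The single entry mirrors the HTTPS virtual host above.
PROXY_RULES = [
    ("/storefront", "ajp://10.10.90.54:8010/storefront"),
]


def proxy_target(path, rules=PROXY_RULES):
    """Return the backend URL for path, or None if the request is not proxied."""
    for prefix, target in rules:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if target == "!":
                return None  # explicit exclusion
            return target + path[len(prefix):]
    return None  # no rule matched, Apache handles the request itself
```

For example, `proxy_target("/storefront/browse.ep")` maps to the same path on the app server's AJP connector, while `/index.html` is left for Apache to handle.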

 

At this point, after restarting Apache and the application server, you should be able to hit Apache on port 80 and pull up your storefront.

mod_cache

Next, we want to cache any static content we can on the Apache side. To do this, we'll use mod_cache, or more specifically mod_disk_cache. There is also mod_mem_cache, which is a memory-based cache, but we've actually found better performance results with mod_disk_cache, and the cache files have the added benefit of persisting across restarts.

 

Adding the httpd.conf directives for a disk cache is fairly straightforward. Let's try the following:

 

CacheEnable disk /storefront/
CacheRoot /var/www/cache
CacheDirLevels 5
CacheDirLength 2
CacheIgnoreHeaders Set-Cookie

 

Looking at the lines in detail:

 

  1. Enable the disk cache on the URL /storefront/.
  2. Specify the cache location on the local disk, in this case /var/www/cache. You'll want to make sure the Apache user can write to that directory.
  3. The number of directory levels in the cache tree structure.
  4. The number of characters for each directory.
  5. Finally, we specify which headers we DO NOT want to cache. This is essential. If we don't set this for cookies, we will end up getting someone else's session!
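To picture what CacheDirLevels and CacheDirLength do to the layout under CacheRoot, here is an approximate Python sketch. The shape (a hashed key split into 5 directories of 2 characters each, with the remainder as the file name) follows the settings above, but the encoding is a stand-in: mod_disk_cache uses its own character set, so real paths will not match these. The function name `cache_path` and the sample URL are made up.

```python
import base64
import hashlib

CACHE_ROOT = "/var/www/cache"
DIR_LEVELS = 5   # CacheDirLevels
DIR_LENGTH = 2   # CacheDirLength


def cache_path(url, root=CACHE_ROOT, levels=DIR_LEVELS, length=DIR_LENGTH):
    """Spread a hashed cache key over `levels` directories of `length` chars."""
    if levels * length > 20:
        # Apache documents this limit for the two directives combined.
        raise ValueError("CacheDirLevels * CacheDirLength must not exceed 20")
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Stand-in encoding: 16 hash bytes become 22 filename-safe characters.
    name = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    dirs = [name[i * length:(i + 1) * length] for i in range(levels)]
    return "/".join([root] + dirs + [name[levels * length:]])


print(cache_path("/storefront/template-resources/css/site.css"))
```

More levels mean fewer files per directory at the cost of deeper paths, which is the trade-off these two directives let you tune.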

 

At this point, after restarting Apache, we will begin to cache any static objects with cache control headers. In order to expand on what is (or isn't) cached, let's move on to the next section.

 

Cache Control Header Config

Finally, we want the application server, or more specifically, the deployed applications, to tell Apache if there's anything to cache. This is typically done by using cache control headers, such as max-age.

 

For the storefront web application, you can use the Caching Control Filter to add the max-age cache control header to requests for specific types of content (based on URL patterns). The Caching Control Filter configuration is in the storefront's conf/spring/web/filter-config.xml file, in the cachingControlFilter bean definition. The cachingControlEntries list contains bean definitions that represent the URL patterns to test and max-age value to set.

 

The following is an example of caching all /renderImage.image dynamic image calls, all /template-resources/ calls (css, js, etc) and any dynamic content assets under /content/:

 

<bean id="cachingControlFilter"
     class="com.elasticpath.commons.filter.impl.CachingControlFilter">
     <property name="cachingControlEntries">
          <list>
               <bean class="com.elasticpath.commons.filter.impl.CachingControlFilter$CachingControlEntry">
                    <property name="urlPattern">
                         <value>^.*renderImage\.image.*$</value>
                    </property>
                    <property name="maxAge">
                         <value>86400</value>
                    </property>
               </bean>
               <bean class="com.elasticpath.commons.filter.impl.CachingControlFilter$CachingControlEntry">
                    <property name="urlPattern">
                         <value>^.*template-resources.*$</value>
                    </property>
                    <property name="maxAge">
                         <value>86400</value>
                    </property>
               </bean>
               <bean class="com.elasticpath.commons.filter.impl.CachingControlFilter$CachingControlEntry">
                    <property name="urlPattern">
                         <value>^.*content.*$</value>
                    </property>
                    <property name="maxAge">
                         <value>86400</value>
                    </property>
               </bean>
          </list>
     </property>
</bean>
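Before deploying patterns like these, it's worth checking what they actually match. The Python sketch below tries the three regular expressions against a few sample URLs. The URLs are invented for illustration, and returning the first matching entry is my assumption about how the filter picks a value, not something taken from the filter's source.

```python
import re

# (urlPattern, maxAge) pairs copied from the cachingControlEntries above.
CACHING_CONTROL_ENTRIES = [
    (r"^.*renderImage\.image.*$", 86400),
    (r"^.*template-resources.*$", 86400),
    (r"^.*content.*$", 86400),
]


def max_age_for(url):
    """Return the max-age of the first matching entry, or None if none match."""
    for pattern, max_age in CACHING_CONTROL_ENTRIES:
        if re.match(pattern, url):
            return max_age
    return None


for url in [
    "/storefront/renderImage.image?imageName=shoe.jpg&width=100",
    "/storefront/template-resources/css/site.css",
    "/storefront/content/banner.jpg",
    "/storefront/checkout.ep",
    "/storefront/content-policy.ep",
]:
    print(url, "->", max_age_for(url))
```

Note the last sample: the pattern `^.*content.*$` matches any URL that contains the word "content" anywhere, not just assets under /content/, so a tighter pattern may be safer if you have dynamic pages with "content" in their names.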

 

 

Now, after restarting the application server and Apache, you should have a fully functioning Apache proxy with proper caching of static content.

 

 

Configuring Load Balancing

Load balancing is an easy extension once our proxy is set up. Essentially, with load balancing, instead of passing the request through to the same machine each time, we pass it to a cluster of machines (two or more) based on a certain algorithm. I recommend reading the complete Apache documentation on mod_proxy_balancer, which is the module we'll use to enable load balancing. It can be found here: http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html

 

Let's first lay out the Apache configuration, adding to our existing VirtualHost entries.

<VirtualHost 10.10.90.54:80>

        ServerName 10.10.90.54

        # ProxyPreserveHost On
        RequestHeader set Host mars.elasticpath.net

        <Proxy balancer://tomcatservers>
                BalancerMember ajp://localhost:9009 route=node1 loadfactor=90
                BalancerMember ajp://10.10.90.51:9009 route=node2 loadfactor=100
                BalancerMember ajp://10.10.90.52:9009 route=node3 loadfactor=100
                BalancerMember ajp://10.10.90.53:9009 route=node4 loadfactor=100
        </Proxy>

        ProxyPass /storefront balancer://tomcatservers/storefront stickysession=JSESSIONID nofailover=Off
        ProxyPass /server-status !

</VirtualHost>

<VirtualHost 10.10.90.54:443>

        ServerName 10.10.90.54

        LogLevel warn
        #CustomLog logs/ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"
        LogFormat "%h %l %u %t \"%r\" %>s %b" common
        CustomLog logs/ssl_access_log common
        ErrorLog logs/ssl_error_log

        #   SSL Engine Switch:
        #   Enable/Disable SSL for this virtual host.
        SSLEngine on
        SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
        SSLCertificateFile "/usr/local/apache2/conf/server.crt"
        SSLCertificateKeyFile "/usr/local/apache2/conf/server.key"

        #DocumentRoot    "/var/www/html/one"

        # ProxyPreserveHost On
        RequestHeader set Host mars.elasticpath.net

        SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0

        <Proxy balancer://tomcatservers-ssl>
                BalancerMember ajp://localhost:9010 route=node1 loadfactor=90
                BalancerMember ajp://10.10.90.51:9010 route=node2 loadfactor=100
                BalancerMember ajp://10.10.90.52:9010 route=node3 loadfactor=100
                BalancerMember ajp://10.10.90.53:9010 route=node4 loadfactor=100
        </Proxy>

        ProxyPass /storefront balancer://tomcatservers-ssl/storefront stickysession=JSESSIONID nofailover=Off

</VirtualHost>

 

We've seen the VirtualHost entries before, so let's just look at the Proxy balancer configuration in detail. We'll zoom in on this below for the insecure, port 80 connector:

 

...
        <Proxy balancer://tomcatservers>
                BalancerMember ajp://localhost:9009 route=node1 loadfactor=90
                BalancerMember ajp://10.10.90.51:9009 route=node2 loadfactor=100
                BalancerMember ajp://10.10.90.52:9009 route=node3 loadfactor=100
                BalancerMember ajp://10.10.90.53:9009 route=node4 loadfactor=100
        </Proxy>

        ProxyPass /storefront balancer://tomcatservers/storefront stickysession=JSESSIONID nofailover=Off
        ProxyPass /server-status !
...

 

Within the Proxy balancer directive, we've named our balancer <Proxy balancer://tomcatservers> and defined our load balancer members, for example BalancerMember ajp://10.10.90.51:9009 route=node2 loadfactor=100. In this case, we have 4 storefronts included, 1 being on the localhost (the same machine as the Apache install). We've opened up AJP port 9009 on these machines and configured all of them to have an even load factor, except the first node, which has a slightly lower factor to give breathing room for Apache on the same machine.
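To get a feel for what those load factors mean, here is a quick Python calculation using the values from the full configuration above (90, 100, 100, 100). It assumes the balancer's default request-counting method, where each member's share of new requests is proportional to its loadfactor. Sticky sessions will skew the real numbers, so treat this as a rough expectation only. The function name `expected_shares` is made up.

```python
# route -> loadfactor, taken from the BalancerMember lines.
LOAD_FACTORS = {"node1": 90, "node2": 100, "node3": 100, "node4": 100}


def expected_shares(load_factors):
    """Return each route's expected fraction of requests (weights normalized)."""
    total = sum(load_factors.values())
    return {route: factor / total for route, factor in load_factors.items()}


for route, share in expected_shares(LOAD_FACTORS).items():
    print("%s: %.1f%%" % (route, share * 100))
```

With these values, node1 should see roughly 23% of new requests and each of the other nodes roughly 26%.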

 

In the next directive, ProxyPass /storefront balancer://tomcatservers/storefront, we tell Apache to pass requests for /storefront* to the balancer's /storefront*. Note that we're also specifying, via stickysession, the name of the cookie that keeps a shopper stuck to one storefront node, so subsequent requests return to the same node. In our case, this is JSESSIONID.

 

The last key setting in here is the route=nodeN on each BalancerMember. This is the name you configure as the node's jvmRoute in the app server's server.xml. It allows Apache and the application server to agree on which requests go to which node. Without this setting (and/or the stickysession setting), the user's session may bounce between storefront nodes. This will cause strange behavior, like getting bounced back to the homepage.
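Here is how route and stickysession fit together, as a simplified Python sketch. Once jvmRoute is set, Tomcat appends a dot and the route name to the session id (for example `0A1B2C.node3`), and the balancer uses that suffix to find the right member. The helper names and the sample session id are hypothetical, and the real module handles more cases (such as session ids carried in the URL).

```python
# route -> backend, taken from the BalancerMember lines for port 80.
MEMBERS = {
    "node1": "ajp://localhost:9009",
    "node2": "ajp://10.10.90.51:9009",
    "node3": "ajp://10.10.90.52:9009",
    "node4": "ajp://10.10.90.53:9009",
}


def route_from_session_id(session_id):
    """Return the route suffix of an id such as '0A1B2C.node3', or None."""
    _, dot, route = session_id.partition(".")
    return route if dot else None


def pick_member(cookies, members=MEMBERS, fallback=None):
    """Return the sticky backend for these cookies, or fallback if none applies."""
    route = route_from_session_id(cookies.get("JSESSIONID", ""))
    return members.get(route, fallback)


print(pick_member({"JSESSIONID": "0A1B2C.node3"}))
```

A request with no JSESSIONID cookie, or with a route that doesn't match any member, falls through to `fallback`, which is where the normal load-factor-based choice would take over.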

 

To set the jvmRoute within the server.xml, look for a commented-out line like the following:

 

<!-- You should set jvmRoute to support load-balancing via AJP ie :
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">        
-->  

 

Uncomment the Engine element and set its jvmRoute value to match the route of the corresponding BalancerMember entry (or vice versa). The same configuration is used for the secure connectors, which in this case are on port 9010.

 

After restarting Apache and the application servers, you should be load balanced to a specific node in the cluster and stay on that node for subsequent requests. Your static assets will also be cached at the Apache layer as they pass through the proxy.

 

Now you can cache and load balance storefront servers with Apache HTTP Server. Go ahead and try it. Once you're set up, I would recommend tailing your Apache and app server access logs to watch your requests pass through Apache and your app server and ensure they're using sticky sessions correctly. Increasing the access log level on Apache and the app server to output cookie names/values is handy if you need to debug any sticky session config issues.

 

Some Final Considerations

  • There are some known issues around keep-alive and some older versions of Apache HTTP and Tomcat where the AJP connections between the two will not get released, causing the connection pool to fill and not allow new requests.
  • Consider using Apache's htcacheclean, which runs as a daemon or a one-time job, to control the size of your Apache cache on the disk. If your website has a small, finite number of cacheable HTML objects, this typically isn't a huge issue. On the other hand, if you have many GBs of assets and want to keep your cache to, say, 500 MB, htcacheclean is your tool. See the documentation for full details: http://httpd.apache.org/docs/2.2/programs/htcacheclean.html
  • Test, test, test. Make sure you do proper functional testing on a staging environment to ensure there are no strange redirects or odd behavior after putting another layer between your ecommerce site and the user. And just as importantly, proper performance testing will ensure there are no capacity issues between Apache and the app server. This will allow you to fine tune your connection pools for maximum performance, both on the Apache side and the app server side.