This is a tutorial on how to set up an Apache reverse proxy for caching content from JFrog Artifactory. I had to learn how to do this for work to lessen the request load on the origin Artifactory server, and improve performance overall.
The Apache server will run on Alma Linux, but the steps are similar for other Linux distributions. I did this in AWS, but it will work regardless of where your clients and Artifactory are hosted.
Install Apache Packages
Update your local software with the relevant command. On Alma/CentOS/RHEL and similar, this is done with yum. Then install the Apache packages.
Note: If you're using an Artifactory server with HTTPS, mod_ssl is required!
    yum update
    yum install -y httpd httpd-tools mod_ssl
Enable the required modules in Apache (though they should be enabled by default). They are:
    cache
    cache_disk
    headers
    expires
    proxy
    proxy_http
    ssl # if using Artifactory with HTTPS
On Alma/CentOS/RHEL, this is done by going through /etc/httpd/conf.modules.d and finding the conf files containing the modules we'd like to enable. Once found, make sure that they're not commented out.
For example, the ssl module (mod_ssl.so) is loaded in /etc/httpd/conf.modules.d/00-ssl.conf. If I want that module loaded, this is what that conf file should look like:
    LoadModule ssl_module modules/mod_ssl.so
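Rather than opening each conf file by hand, the loaded modules can also be checked in one go. The following is a minimal sketch, not part of the original setup: it assumes `httpd -M` prints one `<name>_module` entry per line, and the function and variable names are my own.

```shell
#!/bin/sh
# Modules this tutorial relies on; drop "ssl" if Artifactory is served over plain HTTP.
REQUIRED_MODULES="cache cache_disk headers expires proxy proxy_http ssl"

# missing_modules (hypothetical helper) reads `httpd -M` style output on stdin
# and prints every required module that does not appear in it, one per line.
missing_modules() {
    loaded="$(cat)"
    for mod in $REQUIRED_MODULES; do
        if ! printf '%s\n' "$loaded" | grep -Eq "^[[:space:]]*${mod}_module[[:space:]]"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Usage on the proxy host:
#   httpd -M 2>/dev/null | missing_modules
```

If the command prints nothing, every module in the list is loaded. Anything it does print is a module whose LoadModule line is still missing or commented out.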
Once each module is loaded, start and enable the Apache service:
    systemctl start httpd
    systemctl enable httpd
Create Apache Configuration File
Add a new configuration to Apache, where we’ll define the caching behaviour. This file will
Set the file contents as below. I may explain these parameters in finer detail at a later date, but for those curious, they are all covered in Apache's documentation. Otherwise, these values should work fine for the majority of cases.
    # I. Cache Behaviour
    CacheEnable disk /
    CacheRoot /var/cache/httpd/mod_cache_disk/routing
    # Don't cache files larger than 1GB
    CacheMaxFileSize 1000000000
    # 1 Day Cache
    CacheDefaultExpire 86400
    CacheQuickHandler off
    CacheLock on
    CacheLockPath /tmp/mod_cache-lock
    CacheLockMaxAge 5

    # II. Cache Control Headers
    CacheHeader On
    ExpiresActive On

    # ignore upstream caching headers
    Header unset Expires
    Header unset Cache-Control
    Header unset Pragma
    CacheIgnoreCacheControl On

    # III. Reverse Proxy Settings
    SetEnv proxy-initial-not-pooled 1
    SetEnv force-proxy-request-1.0 1
    SetEnv proxy-nokeepalive 1

    # IV. Virtual Host Config
    <VirtualHost *:80>
        ServerName localhost
        # Set to Off if Artifactory doesn't use HTTPS
        # (Apache does not allow comments on the same line as a directive)
        SSLProxyEngine On

        AllowEncodedSlashes On
        RewriteEngine On

        # ProxyRequests is only used for forward proxying
        ProxyRequests Off
        ProxyPassReverseCookiePath / /

        # Trailing slashes match on both sides of each rule, as ProxyPass expects
        ProxyPass "/artifactory/" https://<artifactory-domain>/artifactory/ connectionTimeout=5 timeout=2400
        ProxyPassReverse "/artifactory/" https://<artifactory-domain>/artifactory/

        ProxyPass "/" https://<artifactory-domain>/ nocanon connectionTimeout=5 timeout=2400
        ProxyPassReverse "/" https://<artifactory-domain>/
    </VirtualHost>
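The only part of that file that changes between setups is the Artifactory host in the proxy rules. As a rough sketch of how the placeholder gets filled in, here is a small helper that prints the four rules for a given host. The function name and the example hostname are hypothetical, and I've written the rules with matching trailing slashes on both sides, which is what Apache's ProxyPass documentation recommends.

```shell
#!/bin/sh
# render_proxy_rules (hypothetical helper) prints the ProxyPass rules for one
# Artifactory host. $1 is the host name, $2 the scheme (defaults to https).
render_proxy_rules() {
    domain="$1"
    scheme="${2:-https}"
    cat <<EOF
ProxyPass "/artifactory/" ${scheme}://${domain}/artifactory/ connectiontimeout=5 timeout=2400
ProxyPassReverse "/artifactory/" ${scheme}://${domain}/artifactory/
ProxyPass "/" ${scheme}://${domain}/ nocanon connectiontimeout=5 timeout=2400
ProxyPassReverse "/" ${scheme}://${domain}/
EOF
}

# Usage (artifactory.example.com is a made-up host):
#   render_proxy_rules artifactory.example.com
```

After editing the real configuration file, running `apachectl configtest` checks the syntax, and `systemctl restart httpd` picks up the changes.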
Configure the Firewall
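Run the following commands to let Apache make outbound network connections under SELinux and to open the firewall to HTTP traffic. Note that the iptables commands set every chain's policy to ACCEPT and flush all existing rules, so adapt them if the host already has firewall rules you need to keep. The exact commands may differ slightly depending on your Linux distribution: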
    sudo /usr/sbin/setsebool -P httpd_can_network_connect 1
    iptables -P INPUT ACCEPT
    iptables -P FORWARD ACCEPT
    iptables -P OUTPUT ACCEPT

    iptables -t mangle -F
    iptables -F
    iptables -X

    iptables-save
Verify that the firewall is open by running iptables -nvL. The output should look similar to the following (ignoring the packet counts):
    Chain INPUT (policy ACCEPT 777K packets, 2230M bytes)
     pkts bytes target prot opt in out source destination

    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target prot opt in out source destination

    Chain OUTPUT (policy ACCEPT 744K packets, 2953M bytes)
     pkts bytes target prot opt in out source destination
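If you'd rather not eyeball that output, the chain policies can be pulled out with a little awk. This is a sketch under the assumption that each chain header has the form `Chain NAME (policy POLICY ...)` as in the output above. Both function names are my own.

```shell
#!/bin/sh
# chain_policies (hypothetical helper) reads `iptables -nvL` output on stdin
# and prints "<chain> <policy>" for each built-in chain header line.
chain_policies() {
    awk '$1 == "Chain" && $3 == "(policy" { print $2, $4 }'
}

# all_accept succeeds only if at least one chain header was seen and
# every policy found is ACCEPT.
all_accept() {
    chain_policies | awk '
        { n++; if ($2 != "ACCEPT") bad = 1 }
        END { if (n == 0 || bad) exit 1 }
    '
}

# Usage on the proxy host (needs root):
#   iptables -nvL | all_accept && echo "all chains default to ACCEPT"
```

This only inspects the default policies. It says nothing about individual rules, which is fine here because the commands above flush them all.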
Test it Out!
At this point, everything should be set up. Now if you download a file through this caching server, two things should happen:
- The file will download from the origin the first time.
- The file will be cached under /var/cache/httpd/mod_cache_disk/routing on the reverse proxy. This location was set by the CacheRoot directive in the configuration file.
Try this from another machine that can reach the reverse proxy. We'll wget a file through the proxy. It's a pretty big file (347 MB), so we can see the difference caching makes:
    $ wget http://<reverse-proxy-ip>/artifactory/dev-test/7/x86_64/mycustom.rpm
    --2022-08-17 13:50:03-- http://<reverse-proxy-ip>/artifactory/dev-test/7/x86_64/mycustom.rpm
    Connecting to <reverse-proxy-ip>... connected
    HTTP request sent, awaiting response... 200 OK
    Length: 364220416 (347M)
    Saving to mycustom.rpm

    100%[==========================================================================>] 364,220,416 27.2 MB/s in 14s

    2022-08-17 13:50:17 (25.7 MB/s) - 'mycustom.rpm' saved [364220416/364220416]
So the first download took 14 seconds. Now, if you look at the CacheRoot directory on the reverse proxy, you should see a new folder. In my case, it was named tP:
    $ ls /var/cache/httpd/mod_cache_disk/routing/
    tP
Dig deeper into that folder, and you'll find a .data file with the same number of bytes as the original file (364220416 bytes, or 347M):
    $ ls -l /var/cache/httpd/mod_cache_disk/routing/tP/ux
    total 355688
    -rw-------. 1 apache apache 364220416 Aug 17 13:50 FcmSKZFSflmYcCYhQQ.data
    -rw-------. 1 apache apache 605 Aug 17 13:50 FcmSKZFSflmYcCYhQQ.header
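The hashed folder names will differ on your machine, so hunting for the file by hand gets tedious. Here's a minimal sketch that walks the cache root and lists every cached body with its size, and another that checks whether a body of a given size exists. The function names are hypothetical, and it assumes cached bodies end in `.data` as in the listing above.

```shell
#!/bin/sh
# cached_bodies (hypothetical helper) prints "<size-in-bytes> <path>" for
# every cached response body (*.data file) found under the cache root in $1.
cached_bodies() {
    find "$1" -type f -name '*.data' | while IFS= read -r f; do
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done
}

# has_cached_size succeeds if some cached body under $1 is exactly $2 bytes long.
has_cached_size() {
    cached_bodies "$1" | awk -v want="$2" '
        $1 == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the proxy host (run as root, since the files belong to apache):
#   has_cached_size /var/cache/httpd/mod_cache_disk/routing 364220416 && echo cached
```

Matching on size alone is a quick sanity check, not proof that the bytes are identical. Comparing checksums would be the stricter test.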
Now back on the other server, attempt to download the file again.
    $ wget http://<reverse-proxy-ip>/artifactory/dev-test/7/x86_64/mycustom.rpm
    --2022-08-17 13:50:36-- http://<reverse-proxy-ip>/artifactory/dev-test/7/x86_64/mycustom.rpm
    Connecting to <reverse-proxy-ip>... connected
    HTTP request sent, awaiting response... 200 OK
    Length: 364220416 (347M)
    Saving to mycustom.rpm

    100%[==========================================================================>] 364,220,416 482 MB/s in 0.7s

    2022-08-17 13:50:37 (482 MB/s) - 'mycustom.rpm' saved [364220416/364220416]
It took less than a second (0.7s), much, much faster than 14s!
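Timing is a decent hint, but there's a more direct signal. Because the configuration sets CacheHeader On, mod_cache should add an X-Cache header to each response saying whether it was a hit or a miss. The sketch below pulls that status out of a set of response headers. The function name is mine, and I'm assuming the header looks like `X-Cache: HIT from <server>` as described in Apache's mod_cache documentation.

```shell
#!/bin/sh
# cache_status (hypothetical helper) reads HTTP response headers on stdin and
# prints the first word of the X-Cache header (e.g. HIT or MISS), or "none"
# if the header is absent.
cache_status() {
    tr -d '\r' | awk '
        BEGIN { s = "none" }
        tolower($1) == "x-cache:" { s = $2; exit }
        END { print s }
    '
}

# Usage from a client machine (<reverse-proxy-ip> is a placeholder):
#   curl -s -o /dev/null -D - \
#     "http://<reverse-proxy-ip>/artifactory/dev-test/7/x86_64/mycustom.rpm" | cache_status
```

On the first request I'd expect MISS, and HIT on the second. If it prints none, check that CacheHeader is actually switched on.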
Using this method, you can save a lot of time and bandwidth when delivering content to your clients. It also isn't limited to Artifactory as the source. I've used this reverse proxy setup to cache content from other Apache servers, for example.
Big thanks to Taylor Callsen's original article for getting me started. It's linked in the references below. [1]
[1] Taylor Callsen: Creating a Caching Proxy Server with Apache. https://taylor.callsen.me/creating-a-caching-proxy-server-with-apache