This is a tutorial on how to set up an Apache reverse proxy for caching content from JFrog Artifactory. I had to learn how to do this for work to lessen the request load on the origin Artifactory server, and improve performance overall.
The Apache server will run on Alma Linux, but the steps are similar for other Linux distributions. This was also done in AWS, but it will work regardless of where your clients and/or Artifactory are hosted.
Install Apache Packages
Update your local software with the relevant command. In the case of Alma/CentOS/RHEL, etc. this is done with yum. Then install the Apache packages.
Note: If you’re using an Artifactory server with HTTPS, mod_ssl is required!
yum update
yum install -y httpd httpd-tools mod_ssl
Enable the required modules in Apache (though they should be enabled by default). They are:
cache
cache_disk
headers
expires
proxy
proxy_http
ssl # if using Artifactory with HTTPS
In Alma/CentOS/RHEL, this is done by going through /etc/httpd/conf.modules.d and finding the conf files containing the modules we’d like to enable. Once found, make sure that they’re not commented out.
For example, the ssl module (mod_ssl.so) is loaded in /etc/httpd/conf.modules.d/00-ssl.conf. If I want that module loaded, this is what that conf file should look like:
LoadModule ssl_module modules/mod_ssl.so
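Rather than opening each conf file by hand, the loaded modules can also be checked in one pass. The following is a minimal sketch, assuming httpd -M prints one module per line in the form "name_module (shared)"; the function name missing_modules and the REQUIRED_MODULES variable are made up for this illustration.

```shell
# Module names from the list above (ssl only matters for an HTTPS origin).
REQUIRED_MODULES="cache cache_disk headers expires proxy proxy_http ssl"

# Hypothetical helper: reads `httpd -M` style output on stdin and prints
# each required module that does not appear in it.
missing_modules() {
    loaded=$(cat)
    for mod in $REQUIRED_MODULES; do
        # Loaded modules are listed as e.g. " cache_disk_module (shared)"
        if ! printf '%s\n' "$loaded" | grep -qw "${mod}_module"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Usage on the proxy host (needs httpd installed):
#   httpd -M 2>/dev/null | missing_modules
```

An empty result means every module in the list is loaded; any name that is printed still needs its LoadModule line uncommented.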
Once each module is loaded, start and enable the Apache service:
systemctl start httpd
systemctl enable httpd
Create Apache Configuration File
Add a new configuration to Apache, where we’ll define the caching behaviour. This file will be at /etc/httpd/conf.d/proxy_cache.conf.
touch /etc/httpd/conf.d/proxy_cache.conf
Set the file contents as below. These parameters might be explained in finer detail at a later date, but they are all covered in Apache’s documentation for those curious. Otherwise, these values should work fine for the majority of cases.
# I. Cache Behaviour
CacheEnable disk /
CacheRoot /var/cache/httpd/mod_cache_disk/routing
# Don't cache files larger than 1 GB
CacheMaxFileSize 1000000000
# Cache for 1 day (in seconds) when the origin gives no expiry
CacheDefaultExpire 86400
CacheQuickHandler off
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5

# II. Cache Control Headers
CacheHeader On
ExpiresActive On

# Ignore upstream caching headers
Header unset Expires
Header unset Cache-Control
Header unset Pragma
CacheIgnoreCacheControl On

# III. Reverse Proxy Settings
SetEnv proxy-initial-not-pooled 1
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1

# IV. Virtual Host Config
<VirtualHost *:80>
    ServerName localhost
    # Set to Off if Artifactory doesn't use HTTPS
    SSLProxyEngine On

    AllowEncodedSlashes On
    RewriteEngine On

    # Off disables forward proxying; this host is a reverse proxy only
    ProxyRequests Off
    ProxyPassReverseCookiePath / /

    # Path and target both end in "/" so the rest of the URL joins cleanly
    ProxyPass "/artifactory/" https://<artifactory-domain>/artifactory/ connectionTimeout=5 timeout=2400
    ProxyPassReverse "/artifactory/" https://<artifactory-domain>/artifactory/

    ProxyPass "/" https://<artifactory-domain>/ nocanon connectionTimeout=5 timeout=2400
    ProxyPassReverse "/" https://<artifactory-domain>/
</VirtualHost>
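One detail worth checking in any ProxyPass line: the mod_proxy documentation advises that if the path ends in a trailing slash, the target URL should too (and vice versa), otherwise the backend request can lose a slash. Here is a small sketch that flags lines where the two disagree. The function name check_proxy_slashes is made up for this illustration, and it assumes the path and target are the second and third whitespace-separated fields of the directive.

```shell
# Hypothetical helper: prints every ProxyPass/ProxyPassReverse line in the
# given file whose path and target URL disagree about the trailing slash.
check_proxy_slashes() {
    awk '
        $1 == "ProxyPass" || $1 == "ProxyPassReverse" {
            path = $2; target = $3
            # Drop optional double quotes around either argument
            gsub(/"/, "", path); gsub(/"/, "", target)
            if ((path ~ /\/$/) != (target ~ /\/$/))
                print FILENAME ": " $0
        }
    ' "$1"
}

# Usage on the proxy host:
#   check_proxy_slashes /etc/httpd/conf.d/proxy_cache.conf
```

No output means the slashes agree. After any edit to the file, apachectl configtest checks the syntax, and systemctl reload httpd applies the change.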
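Configure the Firewall
These commands open the firewall so that the proxy accepts HTTP connections. The first is an SELinux setting that lets Apache make outbound network connections; the iptables commands set every chain’s default policy to ACCEPT and flush all existing rules, so the host accepts all traffic, not only HTTP. They may be slightly different depending on your version of Linux:
# SELinux: allow httpd to open network connections to the origin
sudo /usr/sbin/setsebool -P httpd_can_network_connect 1
# Default policy ACCEPT on every chain
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT

# Flush all rules and delete user-defined chains
iptables -t mangle -F
iptables -F
iptables -X

# Print the resulting rule set
iptables-save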
Verify that the firewall is open by running iptables -nvL. The command result should be similar in output to below (ignoring the packet counts):
Chain INPUT (policy ACCEPT 777K packets, 2230M bytes)
 pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 744K packets, 2953M bytes)
 pkts bytes target prot opt in out source destination
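The same check can be scripted instead of read by eye. This is a rough sketch, assuming the chain headers keep the "Chain NAME (policy POLICY ...)" layout shown above; the function name non_accept_chains is made up for this illustration.

```shell
# Hypothetical helper: reads `iptables -nvL` output on stdin and prints the
# name of every built-in chain whose default policy is not ACCEPT.
non_accept_chains() {
    # Header lines look like:
    #   Chain INPUT (policy ACCEPT 777K packets, 2230M bytes)
    awk '$1 == "Chain" && $3 == "(policy" && $4 != "ACCEPT" { print $2 }'
}

# Usage on the proxy host (as root):
#   iptables -nvL | non_accept_chains
```

If nothing is printed, all three chains default to ACCEPT, which matches the expected output.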
Test it Out!
At this point, everything should be in place. Now if you attempt to download a file through this caching server, two things should happen:
- The file will download from the origin the first time.
- The file will be cached under /var/cache/httpd/mod_cache_disk/routing on the reverse proxy. This location is set by the CacheRoot directive in the configuration file.
Attempt this on another machine that can access the reverse proxy. We’ll try to wget a file through the host. It’s a pretty big file (347 MB), so we can see the difference caching can make.
$ wget http://54.248.12.6/artifactory/dev-test/7/x86_64/mycustom.rpm
--2022-08-17 13:50:03-- http://54.248.12.6/artifactory/dev-test/7/x86_64/mycustom.rpm
Connecting to 54.248.12.6... connected
HTTP request sent, awaiting response... 200 OK
Length: 364220416 (347M)
Saving to mycustom.rpm

100%[==========================================================================>] 364,220,416 27.2 MB/s in 14s

2022-08-17 13:50:17 (25.7 MB/s) - 'mycustom.rpm' saved [364220416/364220416]
So the first download took 14 seconds. Now, if we look at the CacheRoot directory on the reverse proxy, you should see a new folder. In my case, it was named tP:
$ ls /var/cache/httpd/mod_cache_disk/routing/
tP
Dig deeper in that folder, and you’ll find a .data file with the same number of bytes as the original file (364220416, or 347M):
$ ls -l /var/cache/httpd/mod_cache_disk/routing/tP/ux
total 355688
-rw-------. 1 apache apache 364220416 Aug 17 13:50 FcmSKZFSflmYcCYhQQ.data
-rw-------. 1 apache apache 605 Aug 17 13:50 FcmSKZFSflmYcCYhQQ.header
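Since every cached body lands in a .data file like this one, the space the cache takes up can be estimated by summing those files. A minimal sketch follows; the function name cache_data_bytes is made up for this illustration, and it counts only the .data bodies, not the small .header files.

```shell
# Hypothetical helper: prints the total size in bytes of all .data files
# found anywhere under the given cache root.
cache_data_bytes() {
    find "$1" -type f -name '*.data' | {
        total=0
        while IFS= read -r f; do
            size=$(wc -c < "$f")
            total=$((total + size))
        done
        echo "$total"
    }
}

# Usage on the proxy host (as root, since the files belong to apache):
#   cache_data_bytes /var/cache/httpd/mod_cache_disk/routing
```

mod_cache_disk does not cap the cache size on its own; Apache’s htcacheclean utility is the usual tool for pruning it.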
Now back on the other server, attempt to download the file again.
$ wget http://54.248.12.6/artifactory/dev-test/7/x86_64/mycustom.rpm
--2022-08-17 13:50:36-- http://54.248.12.6/artifactory/dev-test/7/x86_64/mycustom.rpm
Connecting to 54.248.12.6... connected
HTTP request sent, awaiting response... 200 OK
Length: 364220416 (347M)
Saving to mycustom.rpm

100%[==========================================================================>] 364,220,416 482 MB/s in 0.7s

2022-08-17 13:50:37 (482 MB/s) - 'mycustom.rpm' saved [364220416/364220416]
It took less than a second (0.7s), much, much faster than 14s!
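Timing is one signal; the response headers are another. With CacheHeader On, mod_cache adds an X-Cache header to each response, which as far as I know has the form "HIT from hostname" or "MISS from hostname". The sketch below pulls out that first word. The function name cache_status is made up for this illustration, and the exact header format should be treated as an assumption to confirm against your own server.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# first word of the X-Cache header (e.g. HIT or MISS), or NONE if absent.
cache_status() {
    awk '
        tolower($1) == "x-cache:" { print $2; found = 1; exit }
        END { if (!found) print "NONE" }
    ' | tr -d '\r'
}

# Usage from a client machine (-D - dumps the response headers to stdout):
#   curl -s -o /dev/null -D - http://54.248.12.6/artifactory/dev-test/7/x86_64/mycustom.rpm | cache_status
```

A first request would be expected to report MISS and a repeat request HIT, matching the two download times above.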
Final Notes
Using this method, you can save a great deal of time and bandwidth when delivering content to your clients. The approach isn’t limited to Artifactory as the source, either: I’ve used this reverse proxy setup to cache content from other Apache servers, for example.
Big thanks to Taylor Callsen’s original article for getting me started; it is linked in the references below. [1]
[1] Taylor Callsen: Creating a Caching Proxy Server with Apache. https://taylor.callsen.me/creating-a-caching-proxy-server-with-apache