Benchmarking Methodology for Network Security Device Performance

EANTC AG
Salzufer 14
Berlin 10587
Germany
balarajah@eantc.de

Benchmarking Methodology Working Group

This document provides benchmarking terminology and methodology for
next-generation network security devices including next-generation
firewalls (NGFW), intrusion detection and prevention solutions (IDS/IPS)
and unified threat management (UTM) implementations. The document aims
to strongly improve the applicability, reproducibility and transparency
of benchmarks and to align the test methodology with today’s
increasingly complex application use cases. The main areas covered in
this document are test terminology, traffic profiles and benchmarking
methodology for NGFWs to start with.

TBD

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

TBD.

Test setup defined in this document will be applicable to all of the
benchmarking test cases described in Section 7.

Testbed configuration MUST ensure that any performance implications
that are discovered during the benchmark testing aren’t due to the
inherent physical network limitations such as the number of physical
links and the forwarding performance capabilities (throughput and
latency) of the network devices in the testbed. For this reason, this
document recommends avoiding external devices such as switches and
routers in the testbed wherever possible.

In a typical deployment, the security devices (DUT/SUT) will not
have a large number of entries in MAC or ARP tables, which impact the
actual DUT/SUT performance due to MAC and ARP table lookup processes.
Therefore, depending on the number of IP addresses used on the client
and server side, it is recommended to connect Layer 3 device(s)
between test equipment and DUT/SUT as shown in figure 1.

If the test equipment is capable of emulating layer 3 routing
functionality and there is no need for test equipment port
aggregation, it is recommended to configure the test setup as shown in
figure 2.

A unique DUT/SUT configuration MUST be used for all of the
benchmarking tests described in section 7. Since each DUT/SUT will
have its own unique configuration, users SHOULD configure their device
with the same parameters that would be used in the actual deployment
of the device or a typical deployment. It is also mandatory to enable
all the security features on the DUT/SUT in order to achieve maximum
security coverage for a specific deployment scenario.

This document attempts to define the recommended security features
which SHOULD be consistently enabled for all test cases. The table
below describes the recommended feature sets which SHOULD be
configured on the DUT/SUT. In order to improve repeatability, a
summary of the DUT configuration including a description of all
enabled DUT/SUT features MUST be published with the benchmarking
results.

It is also recommended to configure a realistic number of access
policy rules on the DUT/SUT. This document attempts to determine the
number of access policy rules for three different classes of DUT/SUT.
The document classifies the DUT/SUT based on its performance
capability. The defined access rules MUST be configured from top to
bottom in the correct order. The configured access policy rules MUST
NOT block the test traffic used for the performance test.

In general, test equipment allows configuring parameters at
different protocol levels. These parameters influence the traffic
flows that are offered and thereby affect performance measurements.
This document attempts to explicitly specify which test equipment
parameters SHOULD be configurable; any such parameter(s) MUST be noted
in the test report.

This section specifies which parameters SHOULD be considered when
configuring emulated clients using test equipment. It also specifies
the recommended values for certain parameters.

The TCP stack SHOULD use a TCP Reno variant, which includes
congestion avoidance, back off and windowing, retransmission and
recovery on every TCP connection between client and server
endpoints. The default IPv4 and IPv6 MSS MUST be set to 1460 bytes
and 1440 bytes respectively, with TX and RX windows of 32768 bytes.
Delayed ACKs are permitted, but SHOULD be limited to either a 200 ms
delay timeout or 3000 bytes before a forced ACK. Up to 3 retries
SHOULD be allowed before a timeout event is declared. All traffic
MUST set the TCP PSH flag to high. The source port range SHOULD be
1024 – 65535. Internal timeout SHOULD be dynamically scalable per
RFC 793.

The sum of the client IP space SHOULD contain the following
attributes. The traffic blocks SHOULD consist of multiple unique,
contiguous static address blocks. A default gateway is permitted.
The IPv4 ToS byte should be set to ‘00’.

The following equation can be used to determine the required
total number of client IP addresses.

   Desired total number of client IPs =
      Target throughput [Mbit/s] / Throughput per IP address [Mbit/s]

   o  6-7 Mbit/s per IP = 1,400–1,700 IPs per 10 Gbit/s throughput

   o  0.1-0.2 Mbit/s per IP = 50,000–100,000 IPs per 10 Gbit/s
      throughput

Based on deployment and use case scenario, client IP addresses
SHOULD be distributed between IPv4 and IPv6. This document
recommends using the following ratio(s) between IPv4 and IPv6:

   o  100 % IPv4, no IPv6

   o  80 % IPv4, 20 % IPv6

   o  50 % IPv4, 50 % IPv6

   o  0 % IPv4, 100 % IPv6

The emulated web browser contains attributes that will
materially affect how traffic is loaded. The objective is to
emulate the attributes of a modern, typical browser to improve the
realism of the result set. The emulated browser must negotiate HTTP
1.1 with persistence. The browser will open up to 6 TCP connections
per server endpoint IP at any time, depending on how many sequential
transactions need to be processed. Within a TCP connection, multiple
transactions can be processed if the emulated browser has available
connections, for example where transactions to the same server
endpoint IP exceed 6 or are non-sequential. The browser must
advertise a User-Agent header. Headers will be sent uncompressed.
The browser should enforce content length validation.

The test traffic shall be a realistic blend of encrypted and
clear traffic. For encrypted traffic, the following attributes
shall define the negotiated encryption parameters. The tests must
use TLSv1.2 or higher with a record size of 16383 bytes and a
commonly used cipher suite and key strength. Session reuse or ticket
resumption may be used for subsequent connections to the same server
endpoint IP. The client endpoint must send TLS Extension SNI
information when opening up a security tunnel. Server certificate
validation should be disabled.

If the DUT/SUT doesn’t perform SSL inspection, cipher suite and
certificate selection for the test is irrelevant. However, it is
recommended to use current, non-deprecated certificates in order to
mimic real world traffic.

This document attempts to specify which parameters should be
considered when configuring emulated backend servers using test
equipment.

The TCP stack SHOULD use a TCP Reno variant, which includes
congestion avoidance, back off and windowing, retransmission and
recovery on every TCP connection between client and server
endpoints. The default IPv4 MSS MUST be set to 1460 bytes, with TX
and RX windows of at least 32768 bytes. Delayed ACKs are permitted
but SHOULD be limited to either a 200 ms delay timeout or 3k bytes
before a forced ACK. Up to 2 retries SHOULD be allowed before a
timeout event is declared. All traffic must set the TCP PSH flag to
high. The source port range SHOULD be 1024 – 65535. Internal timeout
should be dynamically scalable per RFC 793.

The server IP blocks should consist of unique, contiguous
static address blocks with one IP per server FQDN endpoint per
test port. The IPv4 ToS byte should be set to ‘00’. The source MAC
address of the server endpoints shall be the same, emulating routed
behavior. Each server FQDN should have its own unique IP address.
The server IP addressing should be fixed to the same number of
FQDN entries.

The emulated server pool for HTTP should listen on TCP port 80
and emulate HTTP version 1.1 with persistence. For an HTTPS server,
the pool must have the same basic attributes as an HTTP server
pool plus attributes for SSL/TLS. The server must advertise a
server type. For an HTTPS server, TLS 1.2 or higher must be used
with a record size of 16383 bytes and ticket resumption or Session
ID reuse enabled. The server must listen on TCP port 443. The server
shall serve a 2048-bit server SSL certificate to the client. The
HTTPS server is also required to check the Host SNI information
against the Fully Qualified Domain Name (FQDN). Client certificate
validation should be disabled.

If the DUT/SUT doesn’t perform SSL inspection, cipher suite and
certificate selection for the test is irrelevant. However, it is
recommended to use current, non-deprecated certificates in order to
mimic real world traffic.

This section describes the traffic pattern between the client and
server endpoints. At the beginning of the test, the server endpoint
initializes and will be in a ready-to-accept-connections state,
including initialization of the TCP stack as well as bound HTTP and
HTTPS servers. When a client endpoint is needed, it will initialize
and be given attributes such as the MAC and IP address. The behavior
of the client is to sweep through the given server IP space,
sequentially generating a service recognizable by the DUT. Thus, a
balanced mesh between client endpoints and server endpoints will be
generated in a client port / server port combination. Each client
endpoint performs the same actions as other endpoints, with the
difference being the source IP of the client endpoint and the target
server IP pool. The client shall use Fully Qualified Domain Names in
Host headers and for TLS 1.2 Server Name Indication (SNI).

Client endpoints are independent of other clients that are
concurrently executing. This section describes how a client
endpoint steps through different services when it initiates traffic.
Once initialized, the user should randomly hold (perform no
operation) for a few milliseconds to allow for better randomization
of the start of client traffic. The client will then either open a
new TCP connection or connect to a TCP persistence stack still open
to that specific server. At any point that the service profile may
require encryption, a TLS 1.2 encryption tunnel will form,
presenting the URL request to the server. The server will then
perform an SNI name check with the proposed FQDN compared to the
domain embedded in the certificate. Only when correct will the
server process the object. The initial object to the server does not
have a fixed size; its size is based on, for example, the URL path
length. Up to six additional sub-URLs (objects on the service page)
may be requested simultaneously. This may or may not be to the same
server IP as the initial URL. Each sub-object will also use a
canonical FQDN and URL path, as observed in the traffic mix used.
The traffic mix in the appendix table is represented by the actions
of each and every client endpoint. Therefore the instantaneous
percentage of the mix will vary, but the overall mix through the
duration of the test will be fixed. This is based on the number of
active users, TCP recovery mechanism, etc.

This section describes the loading of traffic. A traffic load
profile has five distinct phases: init, ramp up, sustain, ramp
down/close, and collection.

Within the init phase, test bed devices including the client and
server endpoints should negotiate layer 2-3 connectivity such as MAC
learning and ARP. Only after successful MAC learning or ARP
resolution shall the test iteration move to the next phase. No
measurements are made in this phase. The minimum recommended time
for the init phase is 5 seconds. During this phase the emulated
clients SHOULD NOT initiate any sessions with the DUT/SUT; in
contrast, the emulated servers should be ready to accept requests
from the DUT/SUT or from emulated clients.

In the ramp up phase, the test equipment should start to generate
the test traffic. It should use a set approximate number of unique
client IP addresses to actively generate traffic. The traffic should
ramp from zero to the desired target throughput objective. The
duration of the ramp up phase must be configured long enough so that
the test equipment does not overwhelm the DUT/SUT’s supported
performance metrics, namely connection setup rate, concurrent
connections and application transactions. The recommended time
duration for the ramp up phase is 180-300 seconds. No measurements
are made in this phase.

In the sustain phase, the test equipment should continue generating
traffic at a constant rate for a constant number of active client IPs.
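
As a rough illustration of the init, ramp up and sustain behaviour, the offered load over time could be modelled as shown here. This is a minimal sketch and not part of the methodology: the function name `offered_load`, its parameters and the linear ramp shape are assumptions made for illustration (the text only requires that traffic ramps from zero to the target objective).

```python
def offered_load(t_s: float, target: float, init_s: float,
                 ramp_up_s: float, sustain_s: float) -> float:
    """Return the load the test equipment should offer at time t_s.

    All names are hypothetical. `target` is the target throughput
    objective in any unit; the result uses the same unit.
    """
    if t_s < init_s:
        # Init phase: only layer 2-3 negotiation, no client sessions.
        return 0.0
    t = t_s - init_s
    if t < ramp_up_s:
        # Ramp up phase: assumed linear ramp from zero to the target.
        return target * t / ramp_up_s
    t -= ramp_up_s
    if t < sustain_s:
        # Sustain phase: constant rate.
        return target
    # Later phases are outside this sketch: no new load is offered.
    return 0.0
```

For example, with a 5 second init phase, a 180 second ramp up and a 600 second sustain phase, `offered_load(95, 1000, 5, 180, 600)` is halfway up the ramp and yields 500.0.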
The recommended time duration for the sustain phase is 600 seconds.
This is the phase where measurements occur.

In the ramp down/close phase, no new connections are established
and no measurements are made. The recommended duration of this phase
is 180-300 seconds.

The last phase is administrative and is when the tester merges and
collates the report data.

This section recommends steps to control the test environment and
test equipment, specifically focusing on virtualized environments and
virtualized test equipment.

Ensure that any ancillary switching or routing functions between
the system under test and the test equipment do not limit the
performance of the traffic generator. This is especially important
for virtualized components (vSwitches, vRouters).

Verify that the performance of the test equipment matches and
reasonably exceeds the expected maximum performance of the system
under test.

Assert that the test bed characteristics are stable during the
whole test session. A number of factors might influence stability,
specifically for virtualized test beds, for example additional
workloads in a virtualized system, load balancing and movement of
virtual machines during the test, or simple issues such as
additional heat created by high workloads leading to an emergency
CPU performance reduction.

Test bed reference pre-tests help to ensure that the desired traffic
generator aspects such as maximum throughput and the network performance
metrics such as maximum latency and maximum packet loss are met.

Once the desired maximum performance goals for the system under test
have been identified, a safety margin of 10 % SHOULD be added for
throughput and subtracted for maximum latency and maximum packet
loss.

Test bed preparation can be performed either by configuring the DUT
in the most trivial setup (fast forwarding) or without the presence
of the DUT.

This section describes how the final report should be formatted and
presented. The final test report may have two major sections:
introduction and results. The following attributes should be
present in the introduction section of the test report.

   o  The name of the NetSecOPEN traffic mix must be prominent.

   o  The time and date of the execution of the test must be
      prominent.

   o  Summary of testbed software and hardware details

      *  DUT Hardware/Virtual Configuration

         +  This section should clearly identify the make and model
            of the DUT.

         +  The port interfaces, including speed and link
            information, must be documented.

         +  If the DUT is a virtual VNF, interface acceleration such
            as DPDK and SR-IOV must be documented as well as cores
            used, RAM used, and the pinning / resource sharing
            configuration. The hypervisor and version must be
            documented.

         +  Any additional hardware relevant to the DUT such as
            controllers must be documented.

      *  DUT Software

         +  The operating system name must be documented.

         +  The version must be documented.

         +  The specific configuration must be documented.

      *  DUT Enabled Features

         +  Specific features, such as logging, NGFW, DPI must be
            documented.

         +  Attributes of those features must be documented.

         +  Any additional relevant information about features must
            be documented.

      *  Test equipment hardware and software

         +  Test equipment vendor name

         +  Hardware details including model number, interface type

         +  Test equipment firmware and test application software
            version

   o  Results Summary / Executive Summary

      Results should resemble a pyramid in how they are reported,
      with the introduction section documenting the summary of
      results in a prominent, easy to read block.

In the result section of the test report, the following
attributes should be present for each test scenario.

   o  KPIs must be documented separately for each test scenario.
      The format of the KPI metrics should be presented as described
      in section 6.1.

   o  The next level of detail should be graphs showing each of
      these metrics over the duration (sustain phase) of the test.
      This allows the user to see the measured performance stability
      changes over time.

This section lists KPIs for the overall benchmarking test scenarios.
All KPIs MUST be measured over the whole period of the sustain phase
as described in section 4.3.4.
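
The restriction to the sustain phase can be illustrated with a short sketch that averages a KPI only over samples taken inside that phase. This is an assumption-laden example, not a prescribed implementation: the `Sample` type, the function name `sustain_average` and the idea that the test equipment exports timestamped readings are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass
class Sample:
    t_s: float    # seconds since the start of the test (hypothetical)
    value: float  # KPI reading from the test equipment statistics


def sustain_average(samples: Iterable[Sample],
                    sustain_start_s: float,
                    sustain_end_s: float) -> float:
    """Average a KPI over the samples that fall in the sustain phase."""
    window = [s.value for s in samples
              if sustain_start_s <= s.t_s <= sustain_end_s]
    if not window:
        raise ValueError("no samples fall inside the sustain phase")
    return mean(window)
```

Readings taken during the init, ramp up or ramp down/close phases are simply ignored, so they cannot skew the reported average.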
All KPIs MUST be measured from test equipment statistics only.

   o  TCP Concurrent Connection Capacity

      This key performance indicator will measure the average
      concurrent open TCP connections in the sustaining period.

   o  TCP Connection Setup Rate

      This key performance indicator will measure the average
      established TCP connections per second in the sustaining
      period. For the session setup rate benchmarking test scenario,
      the KPI will measure average established and terminated TCP
      connections per second simultaneously.

   o  Application Transaction Rate

      This key performance indicator will measure the average
      successful transactions per second in the sustaining period.

   o  TLS Handshake Rate

      This key performance indicator will measure the average TLS 1.2
      or higher session formation rate within the sustaining period.

   o  URL Response time / Time to Last Byte (TTLB)

      This key performance indicator will measure the minimum,
      average and maximum per URL response time in the sustaining
      period as well as the average variance in the same period.

   o  Application Transaction Time

      This key performance indicator will measure the minimum,
      average and maximum amount of time to receive all objects from
      the server.

   o  Time to First Byte (TTFB)

      This key performance indicator will measure the minimum,
      average and maximum time to first byte. TTFB is the elapsed
      time between sending the SYN packet from the client and
      receiving the first byte of application data from the DUT/SUT.
      TTFB SHOULD be expressed in milliseconds.

   o  TCP Connect Time

      This key performance indicator will measure the minimum,
      average and maximum TCP connect time. It is the time elapsed
      between the client sending a SYN packet and receiving the
      SYN/ACK. TCP connect time SHOULD be expressed in milliseconds.

To determine the average throughput performance of the DUT/SUT
when using the application traffic mix defined in section 7.1.3.3.

Test bed setup MUST be configured as defined in section 4. Any test
scenario specific test bed configuration changes must be documented.

In this section, test scenario specific parameters SHOULD be
defined.

Test equipment configuration parameters MUST conform to the
requirements defined in section 4.3. The following parameters MUST
be noted for this test scenario:

   o  Client IP address range

   o  Server IP address range

   o  Traffic distribution ratio between IPv4 and IPv6

   o  Traffic load objective or specification type (e.g. Throughput,
      SimUsers, etc.)

   o  Target throughput: It can be defined based on requirements.
      Otherwise it represents the aggregated line rate of the
      interface(s) used in the DUT/SUT.

   o  Initial throughput: Initial throughput can be up to 10% of
      the “Target throughput”.

DUT/SUT parameters MUST conform to the requirements defined in
section 4.2. Any configuration changes for this specific test
scenario MUST be documented.

The test scenario MUST be run with a single application traffic mix
profile. The name of the NetSecOPEN traffic mix MUST be
documented.

The following test criteria are defined as test results
acceptance criteria:

   a.  The number of failed application transactions MUST be less
       than 0.01%.

   b.  The number of TCP connections terminated due to unexpected TCP
       RST sent by the DUT/SUT MUST be less than 0.01%.

   c.  The maximum deviation (max. dev) of application transaction
       time / TTLB (Time To Last Byte) MUST be less than X (e.g. 2,
       TBD). The following equation MUST be used to calculate the
       deviation of application transaction time or TTLB.

          max. dev = max((avg_latency - min_latency),
                         (max_latency - avg_latency))
                     / initial_latency

       Here, the initial latency is calculated using the following
       equation. For this calculation, the latency values (min’,
       avg’ and max’) MUST be measured during test procedure step 1
       as defined in section 7.1.4.1. The variable latency
       represents application transaction time or TTLB.

          initial_latency := min((avg’_latency - min’_latency),
                                 (max’_latency - avg’_latency))

   d.  The maximum value of TCP connect time must be less than (TBD)
       ms (beta tests are required to determine the value). The
       definition of TCP connect time can be found in section 6.2.

   e.  The maximum value of Time to First Byte must be less than 2 *
       TCP connect time.

Test acceptance criteria for this test scenario MUST be
monitored during the sustain phase of the traffic load profile
only.

The following KPI metrics MUST be reported for this test
scenario.

Mandatory KPIs: average throughput, maximum concurrent TCP
connections, TTLB/application transaction time (minimum, average
and maximum) and average application transaction rate

Optional KPIs: average TCP connection setup rate, average TLS
handshake rate, TCP connect time and TTFB

The test procedure is designed to measure the throughput
performance of the DUT/SUT during the sustain phase of the traffic
load profile. The test procedure consists of three major steps.

Step 1:

Verify the link status of all connected physical interfaces. All
interfaces are expected to be in “UP” status.

Configure the traffic load profile of the test equipment to
generate test traffic at the “initial throughput” rate as described
in the parameters section. The DUT/SUT SHOULD reach the “initial
throughput” during the sustain phase. Measure all KPIs as defined
in section 7.1.3.5. The KPIs measured during the sustain phase MUST
meet acceptance criteria “a” and “b” defined in section 7.1.3.4.

If the KPI metrics do not meet the acceptance criteria, the
test procedure MUST NOT be continued to step 2.

Step 2:

Configure the test equipment to generate traffic at the “Target
throughput” rate defined in the parameter table. The test
equipment SHOULD follow the traffic load profile definition as
described in section 4.3.4. The test equipment SHOULD start to
measure and record all specified KPIs. The interval between KPI
measurements MUST be less than 5 seconds. Continue the test until
all traffic profile phases are completed.

The DUT/SUT is expected to reach the desired target throughput
during the sustain phase. In addition, the measured KPIs must meet
all acceptance criteria. Proceed to step 3 if the KPI metrics do
not meet the acceptance criteria.

Step 3:

Use a binary search algorithm to configure the desired traffic
load profile for each test iteration.

Determine the maximum and average achievable throughput within
the acceptance criteria.

TBD: Resolution := 0.01 * Target throughput and Backoff := 50%
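
The binary search in step 3 could look roughly like the sketch below. It is only one possible reading of the still-TBD parameters: it assumes that "Resolution" is the interval width at which the search stops, that a 50% backoff means each iteration tests the midpoint of the remaining interval, and that the target throughput has already failed in step 2. The function name `find_max_throughput` and the `meets_criteria` callback (which would run one full test iteration and check the acceptance criteria) are hypothetical.

```python
from typing import Callable


def find_max_throughput(target: float,
                        meets_criteria: Callable[[float], bool],
                        resolution_fraction: float = 0.01) -> float:
    """Binary-search the highest rate that passes the acceptance criteria.

    `target` is the target throughput, assumed to have failed already.
    `meets_criteria(rate)` is a hypothetical hook that runs one test
    iteration at `rate` and returns True if all criteria were met.
    """
    resolution = resolution_fraction * target
    low = 0.0       # highest rate known (or assumed) to pass
    high = target   # lowest rate known to fail
    while high - low > resolution:
        mid = low + 0.5 * (high - low)  # 50% backoff into the interval
        if meets_criteria(mid):
            low = mid
        else:
            high = mid
    return low
```

With a resolution of 1% of the target, the interval is halved seven times before it becomes narrow enough, so at most seven additional test iterations are needed in this sketch.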
This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an
RFC.

tbd