Sampling Mode

Sampling mode makes it possible to apply the Core Rule Set to a limited percentage of traffic only. This may be useful in certain scenarios when enabling CRS for the first time, as this page explains.

Introduction to Sampling Mode

The Core Rule Set’s sampling mode mechanism was first introduced in version 3.0.0 in 2016. Although the feature has been available since then, it’s rarely used in practice, partly because it is one of the lesser-known features of CRS.

When deploying ModSecurity and CRS in front of an existing web service for the first time, it’s difficult to predict what’s going to happen when CRS is turned on. A well-developed test environment can help, but it’s rare to find an installation where real world traffic can be reproduced 1:1 on a test setup. As such, fully enabling ModSecurity and CRS can be something of a leap into the unknown, and potentially very disruptive. This scenario prompted the introduction of CRS 3’s sampling mode.

Sampling mode makes it possible to run CRS on a limited percentage of traffic. The remaining traffic will bypass the rule set. If it turns out that ModSecurity is extremely disruptive or if the rules are too resource heavy for the server, only a limited percentage of the total traffic can be negatively affected (for example, only 1%). This significantly reduces the potential impact and risk of enabling CRS, especially when the logs are being monitored and the deployment can be rolled back if the alerts start to pile up.

With a low sampling percentage, CRS offers relatively little security, since most requests bypass the rule set. The idea, however, is to increase the percentage over time, from 1% to 2%, to 5%, 10%, 20%, 50%, and ultimately to 100%, where the rules are applied to all traffic.

The default sampling percentage of CRS is 100%. As such, if sampling is not of interest then the option can be safely ignored.

Applying Sampling Mode

Sampling mode is controlled by setting the sampling percentage. This is defined in the crs-setup.conf configuration file and can be found in the rule with ID 900400. To use sampling mode, uncomment this rule and set the variable tx.sampling_percentage to the desired value:

SecAction "id:900400,\
  phase:1,\
  pass,\
  nolog,\
  setvar:tx.sampling_percentage=50"

To test sampling mode, set the sampling percentage to 50 (which represents 50%), reload the server, and issue a few requests featuring a payload resembling an exploit. For example:

$ curl -v "http://localhost/index.html?test=/etc/passwd"

  • If the Core Rule Set is applied to the transaction (and the inbound anomaly threshold is set to 10 or lower) then a 403 Forbidden status code will be returned, since the request causes two critical rules to match, by default.
  • If sampling mode is triggered for the transaction (with a 50% probability) then the rule set will be bypassed and an ordinary response will be received, e.g. a 200 OK status code.
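
Issuing the request many times makes the split between blocked and bypassed requests easier to see. The following is a minimal Python sketch of that idea, not part of CRS. The URL mirrors the curl example above, and the function names (`fetch_status`, `tally`, `probe`) and the default of 20 requests are assumptions made for illustration.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import urlopen

# Hypothetical test URL, mirroring the curl example; adjust to the deployment.
TEST_URL = "http://localhost/index.html?test=/etc/passwd"


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a single GET request."""
    try:
        with urlopen(url, timeout=5) as response:
            return response.status
    except HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still available.
        return err.code


def tally(statuses) -> Counter:
    """Count how often each status code was seen."""
    return Counter(statuses)


def probe(url: str = TEST_URL, count: int = 20) -> Counter:
    """Send `count` requests and tally the resulting status codes."""
    return tally(fetch_status(url) for _ in range(count))
```

With a sampling percentage of 50, calling `probe()` should return roughly similar counts for 403 and 200, although the exact split varies from run to run.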

In the latter case, where sampling mode is triggered and CRS is bypassed, an alert like the following can be found in the error log:

[Wed Jan 01 00:00:00.123456 2022] [:error] [pid 3728:tid 139664291870464] [client 10.0.0.1:0] [client 10.0.0.1] ModSecurity: Warning. Match of "lt %{tx.sampling_percentage}" against "TX:sampling_rnd100" required. [file "/etc/crs/rules/REQUEST-901-INITIALIZATION.conf"] [line "434"] [id "901450"] [msg "Sampling: Disable the rule engine based on sampling_percentage 50 and random number 81"] [ver "OWASP_CRS/3.3.2"] [hostname "www.example.com"] [uri "/index.html"] [unique_id "YgBKx4BqNoKe-XoGhPCPtAAAAIQ"]

Here, CRS reports that it disabled the rule engine because the random number (81) was not below the sampling percentage (50). For each transaction, the rule set generates a random integer in the range 0-99. If that number is greater than or equal to the sampling percentage then the WAF is disabled for the remainder of the transaction.
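
The decision can be sketched in a few lines of Python. This is an illustration of the comparison only, following the `lt` operator shown in the log entry. The function names are hypothetical, and Python's pseudo-random generator stands in for the per-transaction random number that CRS derives itself.

```python
import random


def crs_applies(rnd100: int, sampling_percentage: int) -> bool:
    """Return True if the rule set inspects the request.

    The engine stays on only when the random number (0-99) is
    strictly below the sampling percentage.
    """
    return rnd100 < sampling_percentage


def simulate(sampling_percentage: int, requests: int = 100_000, seed: int = 1) -> float:
    """Estimate the fraction of requests that would be inspected."""
    rng = random.Random(seed)  # stand-in for the per-transaction number
    inspected = sum(
        crs_applies(rng.randrange(100), sampling_percentage)
        for _ in range(requests)
    )
    return inspected / requests
```

For the log entry above, `crs_applies(81, 50)` is `False`, so the request bypasses the rule set. A percentage of 0 never inspects a request and a percentage of 100 always does, which matches the behaviour described for those two settings.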

Warning

As sampling mode works by selectively disabling the ModSecurity WAF engine, if other rule sets are installed then they will be bypassed too.

  • For requests where the rule set is bypassed, a log entry is emitted by rule 901450.
  • For the other requests, those without a corresponding 901450 entry, the rule set is applied normally.
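
When monitoring a deployment, it can help to count the bypass entries in the error log. The sketch below assumes log lines in the format shown in the example entry above. The regular expression and the function names are illustrative, not part of CRS or ModSecurity.

```python
import re

# Matches the rule ID and the two numbers in the 901450 message text.
BYPASS_RE = re.compile(
    r'\[id "901450"\].*?sampling_percentage (\d+) and random number (\d+)'
)


def parse_bypass(line: str):
    """Return (sampling_percentage, random_number) for a 901450 entry,
    or None if the line is not a sampling bypass entry."""
    match = BYPASS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def count_bypassed(lines) -> int:
    """Count the log lines that record a bypassed request."""
    return sum(1 for line in lines if parse_bypass(line) is not None)
```

Passing an open error log file to `count_bypassed` gives the number of bypassed requests, provided that rule 901450 is still logging.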

Disabling Sampling Mode Log Entries

If the log entries generated by rule 901450 seem excessive in volume then the rule can be silenced by applying the following directive, either after the CRS include statement or in the file RESPONSE-999-EXCLUSION-RULES-AFTER-CRS.conf:

SecRuleUpdateActionById 901450 "nolog"

Rollback

If CRS is deployed in front of a large service with a low sampling rate and the logs still start piling up, it may be desirable to disable CRS completely. Rather than carrying out a full rollback of the deployment, the quickest solution is to set tx.sampling_percentage to 0, which means that every request will bypass the WAF (a sampling percentage of 0%: “sample 0% of traffic”). This takes effect once the web server has been reloaded so that it picks up the modified configuration. ModSecurity and CRS remain installed and ready for use, but completely disabled.

Random Number Generation

ModSecurity has no built-in functionality to return random numbers, forcing CRS to find entropy for itself. It does this by taking advantage of the fact that the UNIQUE_ID variable, which identifies each request with a token that’s guaranteed to be unique, has a random element to it. This is the entropy that’s used for sampling.

Rule 901410 hashes the unique ID and encodes the result as a string of hexadecimal characters. The first two decimal digits in the string are then extracted to form a random number from 0 to 99. In the extremely rare case where the hex-encoded hash doesn’t contain a digit, a fallback routine takes the last digits of the DURATION variable instead.

The random numbers generated using this method are not cryptographically secure, but they are sufficient for the purposes of sampling.
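
The derivation described above can be approximated in Python. This is a sketch under stated assumptions, not the rule set's actual implementation. SHA-1 is assumed as the hash function, the duration is assumed to be a string of decimal digits, and the fallback is modelled by appending the last two duration digits to the hash before searching for digits. All function names are hypothetical.

```python
import hashlib
import re


def rnd100_from(hex_digest: str, duration: str) -> int:
    """Build a number from 0 to 99 out of the first two decimal digits.

    Assumption: the last two digits of the duration are appended, so
    two digits are always available even if the hash holds only a-f.
    """
    candidate = hex_digest + duration.zfill(2)[-2:]
    digits = re.findall(r"[0-9]", candidate)
    return int("".join(digits[:2]))


def sampling_rnd100(unique_id: str, duration: str) -> int:
    """Hash the unique ID (SHA-1 assumed) and derive the sampling number."""
    digest = hashlib.sha1(unique_id.encode("utf-8")).hexdigest()
    return rnd100_from(digest, duration)
```

For instance, `rnd100_from("ab3c9f", "1234")` skips the letters and yields 39, while `rnd100_from("abcdef", "1234")` finds no digit in the hash and falls back to 34 from the duration.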