Data Enrichment
  • 13 Nov 2023


Data Classification Enrichment allows you to add enrichment data to each request. Every time HUMAN handles a request, the enriched data is added to the server-to-server call and is also sent as an additional cookie called _pxde.

A dictionary for each data type can be downloaded from the Portal. The dictionary is provided as a JSON object and can be consumed by any process that runs after the data is received.

The Data Classification Enrichment feature is opt-in only. Each toggle controls a specific data enrichment option, and once at least one toggle is active, the _pxde cookie is sent.

The toggle controls can be found by going to Bot Defender Product Settings -> Data & Configuration -> Data classification enrichment.


The data is delivered as key-value pairs only.


Note
Make sure to click ‘Save Changes’ to save your configuration.

The data enrichment cookie (_pxde) is a non-essential analytics cookie. If the user declines non-essential cookies, you can dynamically disable the _pxde cookie.

To disable the _pxde cookie:

  • Add this variable assignment to the JavaScript snippet deployed on your application pages:
    window._pxPreventAnalyticsCookie = true;
    Example
    <script type="text/javascript">
        (function(){
            window._pxPreventAnalyticsCookie = true;
            var p = document.getElementsByTagName('script')[0],
                s = document.createElement('script');
            s.async = 1;
            s.src = '/xxxxxxxxxxx/.init.js';
            p.parentNode.insertBefore(s,p);
        }());
    </script>
    

To enable the _pxde cookie:

  • Change the value of the variable assignment to false:
    window._pxPreventAnalyticsCookie = false;
    or
    delete the variable assignment.
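As a minimal sketch, the flag can be driven by the user's consent decision before the HUMAN JS snippet runs. Only `window._pxPreventAnalyticsCookie` comes from this article; the `applyPxdeConsent` helper and its `hasAnalyticsConsent` parameter are hypothetical names. The global object is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: sets the HUMAN analytics-cookie flag from a consent decision.
// In a page, call it before the HUMAN JS snippet runs: applyPxdeConsent(window, userAccepted).
function applyPxdeConsent(globalObj, hasAnalyticsConsent) {
  // true prevents the non-essential _pxde cookie; false allows it.
  globalObj._pxPreventAnalyticsCookie = !hasAnalyticsConsent;
  return globalObj._pxPreventAnalyticsCookie;
}

// Usage outside a browser, with a plain object standing in for window.
const fakeWindow = {};
applyPxdeConsent(fakeWindow, false);
console.log(fakeWindow._pxPreventAnalyticsCookie); // true
```

Setting the flag explicitly to `false` when consent is given is equivalent to omitting the assignment, as described above.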

Retrieving Enriched Data

On the Enforcer
The enriched data is retrieved using a hook function. See the documentation for the relevant Enforcer.

On the Client Side
The enriched data is retrieved on the client side by running the following initialization code:

  px.Events.on('enrich', function (value) {
    // value - the enriched data, in the form of <HMAC>:<Base64 encoded data>
    const base64Data = value.split(":")[1]; // split value to get the base64 encoded data
    const dataStr = atob(base64Data); // base64 decode the enrichment data
    const data = JSON.parse(dataStr); // get the data as JSON
    console.log('DATA', data);
  });

The initialization code above should be placed before the HUMAN JS snippet on each site page. The event is triggered each time the cookie is updated.

From the Enriched Data Cookie (_pxde)
The cookie value is built as <HMAC>:<Base64 encoded data>. The HMAC can be used to verify that the enriched data is valid.
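The sketch below shows how a Node.js server might read the `_pxde` cookie, check its HMAC, and decode the payload, using the `<HMAC>:<Base64 encoded data>` format described in the client-side example. The HMAC details are assumptions not stated in this article: it assumes HMAC-SHA256 computed over the Base64 string, keyed with a shared secret, and hex-encoded. Confirm the actual algorithm, key, and signed input with HUMAN before relying on it. The function names and the sample secret are hypothetical.

```javascript
const crypto = require("crypto");

// Extract a cookie value by name from a raw Cookie header string.
function getCookie(cookieHeader, name) {
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) {
      // Cookie values may be URL-encoded.
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return null;
}

// ASSUMPTION: HMAC-SHA256 over the Base64 part, hex digest, keyed with `secret`.
// Returns the decoded object, or null if the value is malformed or the HMAC does not match.
function verifyAndDecodePxde(value, secret) {
  if (typeof value !== "string") return null;
  const sep = value.indexOf(":");
  if (sep === -1) return null; // not in <HMAC>:<Base64 encoded data> form
  const hmac = value.slice(0, sep);
  const base64Data = value.slice(sep + 1);

  const expected = crypto.createHmac("sha256", secret).update(base64Data).digest("hex");
  const a = Buffer.from(hmac);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) return null;

  return JSON.parse(Buffer.from(base64Data, "base64").toString("utf8"));
}

// Usage with a locally signed sample value (hypothetical secret and payload).
const secret = "example-secret";
const payload = Buffer.from(JSON.stringify({ f_kb: 1 })).toString("base64");
const sig = crypto.createHmac("sha256", secret).update(payload).digest("hex");
const header = `other=1; _pxde=${sig}:${payload}`;
console.log(verifyAndDecodePxde(getCookie(header, "_pxde"), secret)); // { f_kb: 1 }
```

A constant-time comparison is used so that the check does not leak how many leading characters of the HMAC matched.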

Note
If you are running an Enforcer, it is recommended that you run a hook function. This ensures that all available enriched data is returned.

Enforcers (and versions) supporting built-in Data Classification Enrichment:

  • Apache - C Module (v0.10.1 and above)
  • AWS Lambda (v2.13.0 and above)
  • Cloudflare Worker (v1.5.0 and above)
  • Fastly VCL (v2.16.0 and above)
  • Go (v2.0.0 and above)
  • Java (v5.3.0 and above)
  • NGINX (v4.1.0 and above)
  • NGINX - C Module (v0.10.1 and above)
  • Node Express (v4.0.0 and above)
  • PHP (v2.10.0 and above)
  • Python (v2.1.0 and above)

The JS Sensor version 3.19.1 and above supports built-in Data Classification Enrichment.

Available Enrichment Data

Access Control

The following access control rules are defined in the policy:

  • All custom rules
  • All known bots
  • All IP classifications

The cookie size for Access Control is approximately 200 bytes.
If no access control filter matched the request, the cookie is empty and only the timestamp is sent.
The following fields are available from this enrichment:

  • Timestamp - the creation time of the cookie.
  • f_type - the access control rule type:
    • = whitelist
    • = blacklist
  • f_id - the access control rule ID. The ID is detailed in the access control dictionary in the HUMAN Console. Possible values can come from the following types of rules:
    • Custom rule
    • Good known bot
    • IP classification
  • f_origin - whether the rule is defined as a Custom Rule or as a HUMAN rule. Good known bots and IP Classification rules are always marked as “px”.
  • f_kb - specifies whether the request was made by a known bot:
    • 1 = known bot
    • 0 = other
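A decoded Access Control payload could be summarized as shown below. This is a sketch that assumes the decoded data is a flat object keyed by the field names listed above, that `f_kb` is a number, and that an `f_origin` of "px" denotes a HUMAN rule. The `describeAccessControl` name is hypothetical, and `f_type` is passed through unchanged because its encoded values are not reproduced here. A missing `f_id` is treated as the case where no access control filter matched and only the timestamp was sent.

```javascript
// Hypothetical helper: summarizes Access Control enrichment fields.
// Returns null when no access control rule matched (only the timestamp was sent).
function describeAccessControl(data) {
  if (data == null || data.f_id === undefined) return null;
  return {
    ruleId: data.f_id, // look this up in the access control dictionary
    ruleType: data.f_type, // whitelist or blacklist rule type, as encoded
    isHumanRule: data.f_origin === "px", // ASSUMPTION: "px" marks HUMAN-defined rules
    isKnownBot: data.f_kb === 1, // 1 = known bot, 0 = other
  };
}

console.log(describeAccessControl({})); // null
```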

IP Categorization

The categories are defined across different service types and include Cloud and Proxy, as well as other general categories.
The following field is available from this enrichment:

  • ipc_id - an array of IP Categorization IDs. The IDs are detailed in the IP Categorization dictionary in the HUMAN Console.
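The IDs in `ipc_id` only become meaningful once they are mapped through the IP Categorization dictionary. A sketch of that lookup follows, assuming (this is not documented here) that the downloaded dictionary is a JSON object keyed by ID with a category name as each value. The function name and the sample dictionary entries are hypothetical.

```javascript
// ASSUMPTION: the dictionary is a JSON object mapping each ID (as a string key)
// to a human-readable category name.
function resolveIpCategories(data, dictionary) {
  const ids = Array.isArray(data.ipc_id) ? data.ipc_id : [];
  // Unknown IDs are kept rather than silently dropped.
  return ids.map((id) => dictionary[String(id)] ?? `unknown (${id})`);
}

// Hypothetical dictionary contents and IDs, for illustration only.
const ipcDictionary = JSON.parse('{"1": "Cloud", "2": "Proxy"}');
console.log(resolveIpCategories({ ipc_id: [2, 7] }, ipcDictionary)); // [ 'Proxy', 'unknown (7)' ]
```

The same lookup pattern would apply to the other ID-based fields, such as `f_id`, against their own dictionaries.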

Incident Types
Incident Types provide expanded reasoning that gives further insight into why HUMAN identified the traffic as automated. The list of incident types can be found here. The following field is available from this enrichment:

  • inc_id - an array of incident types.

Captcha Bypass

The Captcha Bypass filter holds a variable called cgp (“Captcha Grace Period”), which indicates whether the request was allowed because the user successfully solved a captcha and is still within the grace period. The two possible values are:

  • 1 = in bypass
  • 0 = not in bypass

Access Tokens

Access Tokens are used to allow traffic generated by friendly applications or users. This enrichment is provided when the traffic passed because f_type = 'whitelist' and the filter reason was 'access_token'.

  • f_access_token - the access token name. 


Credential Intelligence

The following fields are available from this enrichment:

  • breached_account - indicates whether the credentials used in the activity were identified as compromised. If so, the value is 1; otherwise, the field is not present.
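Because `breached_account` is only present when the credentials were flagged, a consumer should test for the value 1 rather than for a 0. The sketch below shows one possible login-flow decision; the `credentialAction` function and the returned action names are hypothetical. Note that an absent field may also mean the enrichment is not enabled, so "allow" here only means "not flagged".

```javascript
// Hypothetical login-flow decision based on the Credential Intelligence enrichment.
// breached_account is only present (with value 1) when the credentials were
// identified as compromised, so an absent field means "not flagged".
function credentialAction(data) {
  if (data && data.breached_account === 1) {
    return "require-password-reset"; // hypothetical action name
  }
  return "allow";
}

console.log(credentialAction({ breached_account: 1 })); // require-password-reset
console.log(credentialAction({})); // allow
```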

Please contact your account manager for more details on Credential Intelligence or review the product information on our site.

