SYNOPSIS

Input

  1. A record stream originating from the Extract HTTP tool.

  2. A record set. Optional.

Output

The original record is output with four fields appended.

DESCRIPTION

This tool determines whether a hit is to be considered a page impression.

An exclude-set can be supplied containing URLs that must never be considered a page. The tool then sets is_page to 0 for records whose URL matches an entry in the exclude-set.

Updates to the exclude-set might not take effect immediately, because the tool maintains a cache that is refreshed periodically. The effect should, however, be visible within approximately 60 seconds.
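
The caching behaviour described above can be pictured with a minimal Python sketch. It is not the tool's actual implementation: the class name CachedExcludeSet, the loader callable, and the injectable clock are all hypothetical, and the 60-second refresh interval is simply taken from the approximate figure quoted above.

```python
import time
from typing import Callable, Iterable, Optional, Set


class CachedExcludeSet:
    """Hypothetical cache of excluded URLs, reloaded at most once per interval."""

    def __init__(self,
                 loader: Callable[[], Iterable[str]],
                 refresh_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader                  # assumed to return the current URLs
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._urls: Set[str] = set()
        self._loaded_at: Optional[float] = None  # None means "never loaded"

    def contains(self, url: str) -> bool:
        now = self._clock()
        stale = (self._loaded_at is None
                 or now - self._loaded_at >= self._refresh_seconds)
        if stale:
            # Reload the whole set; changes made since the last reload
            # only become visible at this point.
            self._urls = set(self._loader())
            self._loaded_at = now
        return url in self._urls
```

In this sketch a URL added to the underlying record set is not seen until the next reload, which is why an update can take up to one refresh interval to show up in the output.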

The fields appended to the output are these:

Field Name    Value
is_page       1 if the hit is determined to be a page impression, 0 otherwise.
parent_page   The URL of the page from which this hit is derived.
sequence      A sequence number used to keep track of a unique client's
              behavior. Incremented for each page impression generated by a
              unique client.
confidence    The likelihood that the hit is a page impression, in the range
              0-1. The closer the value is to 1, the more likely the hit is a
              page impression.
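
As an illustration of how the four fields relate to each other, the following Python sketch appends them to a record. It makes several assumptions that the description above does not confirm: the page decision, parent page, and confidence are passed in as arguments (how the tool computes them is not shown here), the function name append_fields and the counters dictionary are hypothetical, and a non-page hit is assumed to carry the client's current sequence number.

```python
from typing import Dict


def append_fields(record: Dict[str, object], identifier: str, is_page: bool,
                  parent_page: str, confidence: float,
                  counters: Dict[object, int]) -> Dict[str, object]:
    """Return a copy of record with the four output fields appended."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the range 0-1")
    client = record[identifier]  # value of the Identifier field
    if is_page:
        # Incremented for each page impression generated by this client.
        counters[client] = counters.get(client, 0) + 1
    out = dict(record)
    out["is_page"] = 1 if is_page else 0
    out["parent_page"] = parent_page
    # Assumption: a non-page hit reports the client's current sequence number.
    out["sequence"] = counters.get(client, 0)
    out["confidence"] = confidence
    return out
```

Keeping one counter per identifier value is what makes the sequence number track each client separately.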

OPTIONS

Identifier

The field used to uniquely identify a client.

Cache size (Mb)

Maximum allowed cache size (1-15 Mb).

Use exclude-set

With this option the tool looks up each URL in a record set to determine whether it is to be excluded as a possible page impression. Typically, the record set is a table read by the Lookup/ODBC tool.

Field

The field of the exclude-set to match URLs against. The values of this field must be full URLs.
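
The role of the Field option can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names build_exclude_urls and apply_exclude are hypothetical, the hit's URL is assumed to live in a field called "url", and matching is assumed to be an exact string comparison against the full URL.

```python
from typing import Dict, Iterable, Set


def build_exclude_urls(rows: Iterable[Dict[str, str]], field: str) -> Set[str]:
    """Collect the full URLs held in the chosen field of the exclude-set."""
    return {row[field] for row in rows if row.get(field)}


def apply_exclude(record: Dict[str, object], exclude_urls: Set[str],
                  url_field: str = "url") -> Dict[str, object]:
    """Return a copy of record with is_page forced to 0 on an excluded URL."""
    out = dict(record)
    # Assumption: exact comparison, so the exclude-set must hold full URLs.
    if out.get(url_field) in exclude_urls:
        out["is_page"] = 0
    return out
```

Under this assumption an entry such as "/ads.html" would not match a hit on "http://example.com/ads.html", which is consistent with the requirement that the values be full URLs.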