LinkScan

LinkScan for Unix. Reference Manual.

Section 18


LinkScan Profiler

[Not available in LinkScan Workstation]

The LinkScan Profiler may be used to help identify pages that contain or link to "inappropriate" [1] content. The Profiler operates on a rule-based scoring system.

The profile.txt file in the main LinkScan directory defines the actual rules and their associated scores. The default profile.txt file contains some minimal profiling criteria based on the Platform for Internet Content Selection (PICS) standard. Under this standard, many sites include self-ratings in their web pages via META tags. The LinkScan Profiler specifically supports the RSAC, ICRA and SafeSurf implementations. See the following References.

A much more comprehensive set of rules is available free of charge from Elsop. Because that version of the profile.txt file includes a significant amount of profane and offensive language, it is distributed separately, and only after we receive satisfactory evidence of age and a waiver. To obtain a copy of this file, please send an e-mail message such as:

To: linkscan@elsop.com
From: myname@example.com
Subject: Profiler Request

Please send me a copy of the LinkScan Profiler rules.
I confirm that:

1. I am over 21 years old.

2. I understand that the LinkScan Profiler rules
   contain a significant quantity of profane and
   offensive language including explicit sexual
   depictions.

3. I understand and agree that the LinkScan Profiler
   rules are subject to the same License Agreement
   and restrictions of use as LinkScan itself.

4. I confirm that I will use the LinkScan Profiler
   rules only in conjunction with LinkScan and in
   accordance with the LinkScan License Agreement.
   I shall not re-distribute the Profiler rules to
   any other person or organization.

The message must be sent from a verifiable corporate e-mail address. Mail sent via semi-anonymous services such as yahoo.com, MSN and AOL is not acceptable. If necessary, we will contact you to make alternative arrangements, but Elsop will not supply the LinkScan Profiler files until we are satisfied that the request is legitimate and made by an adult.

Configuring the Profiler

In a typical configuration, you will need to add the following commands to the Project linkscan.cfg file. On Windows systems they are available via the Advanced Tab of the Project Planning Property Sheet:


Profiler = 2
Profilerlog = 1
Profilermax = 200

The Profiler command enables the LinkScan Profiler. Valid options are:

The Profilerlog command enables a detailed trace indicating exactly which profiling rules were triggered. The log is maintained in the file:

.../LinkScan/Projectname/data/linkscan.red

The Profilermax command sets the trigger threshold for the LinkScan Profiler. The default and recommended setting is 200. Reduce this to 100 to make the Profiler more sensitive. Increase the value to 300 or more to make it less sensitive.
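
To illustrate how the threshold affects sensitivity, here is a minimal Python sketch. It rests on an assumption that this manual does not confirm: that the scores of all triggered rules are added together and that a page is flagged once the total reaches Profilermax. The function name is_flagged and the sample scores are hypothetical.

```python
from typing import Iterable


def is_flagged(rule_scores: Iterable[int], profilermax: int = 200) -> bool:
    """Return True when the total score reaches the threshold.

    Assumption: scores are summed and compared with ">=". LinkScan's
    actual combination logic is not described in this section.
    """
    return sum(rule_scores) >= profilermax


# Hypothetical page that triggered two rules worth 80 and 70 points.
scores = [80, 70]
for threshold in (100, 200, 300):
    print(threshold, is_flagged(scores, threshold))
```

Under that assumption, the same page is flagged at a threshold of 100 but passes at 200 or 300, which is why a lower Profilermax value makes the Profiler more sensitive.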

Note: When enabled, the Profiler will force the following settings:


Fetchext = 1
Followext = 1

The Followext command instructs LinkScan to follow redirections when validating external links. This is the default setting. The Fetchext command instructs LinkScan to fetch the body of a document referenced via an external link. Normally, LinkScan seeks to validate external links without retrieving the document bodies. Fetching the bodies enables LinkScan to profile their content, but note that it significantly increases the bandwidth and processing required.

Initially, we recommend you complete a full scan with the settings shown above (at the top of this document) and manually review the linkscan.red log file. We think you will find this informative. More importantly, you will be able to decide what threshold to use for subsequent check-ups and whether you want to enable, disable or modify any of the existing rules. Some users may want to whitelist all .gov sites, for example.

Ultimately, only you can decide which links are appropriate for your site and consistent with your editorial policies. Material that is entirely appropriate for a current affairs website may be highly undesirable for a site specifically intended for younger children.

Hence you may want or need to review the active rules in the profile.txt file.

Proxy Servers and Firewalls

When LinkScan is operated behind a Proxy Server or Firewall that implements content-based access control policies, be aware that your proxy/firewall will likely prevent LinkScan from accessing the restricted sites. In this case, you will need to implement a Profiler rule that enables LinkScan to detect that access was denied.

The Bess proxy system is widely used by many schools and some Internet Service Providers. When access is denied, the Bess system typically adds a special HTTP header:

Pragma: BESSBLOCK

The SonicWALL systems typically replace an offending page with a page that includes the phrase "Blocked By SonicWALL". The following header (H) and body (B) rules will detect those conditions:


H BESS-01    2000   pragma: bessblock
B SWALL-01   2000   blocked by sonicwall
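
The way such rules might be applied can be sketched in Python. This is an illustration only, not LinkScan's implementation: it assumes each rule line consists of a type (H or B), a rule name, a score and a phrase, as the two lines above suggest, and that matching is a case-insensitive substring test. The names Rule, parse_rule and score_response are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    kind: str    # "H" = match against headers, "B" = match against body
    name: str    # rule identifier, e.g. "BESS-01"
    score: int   # points added when the rule triggers
    phrase: str  # lowercase phrase to search for


def parse_rule(line: str) -> Rule:
    """Parse one rule line (assumed layout: kind, name, score, phrase)."""
    kind, name, score, phrase = line.split(None, 3)
    return Rule(kind, name, int(score), phrase.strip().lower())


def score_response(rules: List[Rule], headers: Dict[str, str], body: str) -> int:
    """Add up the scores of every rule whose phrase occurs in the response."""
    header_text = "\n".join(f"{k}: {v}" for k, v in headers.items()).lower()
    body_text = body.lower()
    total = 0
    for rule in rules:
        haystack = header_text if rule.kind == "H" else body_text
        if rule.phrase in haystack:
            total += rule.score
    return total


rules = [
    parse_rule("H BESS-01    2000   pragma: bessblock"),
    parse_rule("B SWALL-01   2000   blocked by sonicwall"),
]
print(score_response(rules, {"Pragma": "BESSBLOCK"}, ""))
print(score_response(rules, {}, "<h1>Blocked By SonicWALL</h1>"))
```

A score of 2000 is far above any Profilermax setting discussed in this section, so a blocked page would be reported regardless of the threshold chosen.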

References

[1] Definition of Inappropriate

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it...

With apologies to:
Mr. Justice Stewart
United States Supreme Court
JACOBELLIS v. OHIO, 378 U.S. 184 (1964)

LinkScan Version 11.6
© Copyright 1997-2006 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
