LinkScan

LinkScan for Unix. Reference Manual

 

LinkScan for Unix. Reference Manual. Table of Contents


    Part I. LinkScan Core Capabilities

  1. Introduction to LinkScan
  2. Essential LinkScan Concepts
  3. New LinkScan Installations
  4. Upgrading Existing LinkScan Installations
  5. Basic Scanning
  6. Examining the Results
  7. LinkScan Status and Error Codes
  8. Scheduling LinkScan
  9. File System Scanning and Orphaned Files
  10. Import Scanning
  11. Advanced and Custom Scanning
  12. Advanced, Custom and Command Line Results
  13. LinkScan Enterprise/Unlimited Extensions
  14. LinkScan Support
  15. Known Problems and Limitations

    Part II. Companion Programs

  16. LinkScan Dispatch
  17. LinkScan Excel
  18. LinkScan Profiler
  19. LinkScan QuickCheck
  20. LinkScan Recorder
  21. LinkScan TapMap
  22. LinkScan WebServer
  23. LinkScan Pinger
  24. Weblint Man Page

    Part III. Appendixes

  25. Glossary of Terms
  26. LinkScan Quick Reference Card
  27. LinkScan and Various Web Servers
  28. LinkScan File Formats
  29. LinkScan Application Notes
  30. LinkScan Revision History
  31. LinkScan License Agreement


Note: This Reference Manual is divided into multiple documents for ease and speed of navigation. However, the contents are also available as a single document suitable for searching and/or printing as the Single Document LinkScan Reference Manual.

LinkScan for Unix. Reference Manual. Section 1

Introduction to LinkScan

LinkScan™ is an industrial-strength link checking and website management tool. It saves time and money by automating the quality assurance testing of virtually any website or web-based application.

LinkScan is built around applicable open systems standards. Hence it integrates easily with many other content development, management and testing applications as well as general purpose computer tools. It operates on all Microsoft Windows and Unix/Linux platforms and is professionally supported.

LinkScan users include Fortune 1000 companies such as Hewlett Packard, government agencies like NASA, as well as many smaller businesses.

New users will find that LinkScan is extremely simple to install, configure and use. And the more experienced user will appreciate the vast array of customization features built into the system. Together, these attributes make LinkScan ideal for:

Five LinkScan Editions

LinkScan is available in five different editions all based upon the same core technology:

The above descriptions are neither complete nor comprehensive. You must read the LinkScan License Agreement for a complete definition of the products and your other rights and obligations.

Using LinkScan

The steps involved in using LinkScan include:

  1. Installing and Configuring LinkScan for your environment
  2. Planning the specific test scenario(s) that you wish to execute
  3. Scanning the website to create a LinkScan Database
  4. Examining the results from the LinkScan Database

Each of these steps is described in this Reference Manual. However, we recommend that new users get a fast start by jumping to one of the following pages:

LinkScan for Unix. Reference Manual. Section 2

Essential LinkScan Concepts

This section introduces some important concepts and terms that are used throughout the remainder of this Reference Manual. These are:

  1. LinkScan Projects
  2. LinkScan Owners
  3. LinkScan Usernames
  4. Scanning Methods
  5. Documents and Links
  6. LinkScan Directory and File Structure
  7. LinkScan Configuration Files
  8. Perl Regular Expressions
  9. relative-path and relative-path-expression

2.1 LinkScan Projects

LinkScan is able to scan multiple websites. You may also scan the same website multiple times with different configuration options. In each case, LinkScan creates a unique and corresponding LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.

If multiple Projects are defined, users and administrators must select a Project when scanning. Users must also select a Project when viewing the results.

Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.

For additional information concerning Projects, how to create them and how to scan them, see Basic Scanning.

2.2 LinkScan Owners

Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned between Owners in a variety of ways:

The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility. LinkScan Owners are defined via the LinkScan Configuration Files, discussed below. By default, LinkScan will create and assign Owners as follows:

This enables users to browse the results selectively so that the reports are smaller and more relevant to their needs. They're also produced more rapidly.

2.3 LinkScan Usernames

LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.

When activated, users may be required to log in to the LinkScan system using a predefined LinkScan Username and associated password. The Username defines the Projects and Owners that an individual user is permitted to access.

Those wishing to enable these access control features should see LinkScan Access Controls.

2.4 Scanning Methods

LinkScan supports three different scanning methods:

Network HTTP scanning is generally the best method for sites with a large amount of dynamic content (.jsp files, .asp files, etc.). The File System Scanning method enables tracking of "orphaned" files (files that exist on the server but are not currently linked to) and is more appropriate for sites with limited dynamic content.

2.5 Documents and Links

The LinkScan software, and this document, both maintain a strong distinction between Documents and Links.

Hence an HTML file is a Document containing Links. Dynamically generated web pages, PDF and Flash Files as well as Import Files may also be considered Documents since LinkScan can examine those files for the presence of Links. Images (such as .gif and .jpg files) are not considered documents.

References to sites other than the one being scanned (External Links) are not documents either, since LinkScan does not examine the content of those files for the presence of Links.

2.6 LinkScan Directory and File Structure

The LinkScan system is made up of a number of different file types:

In a basic LinkScan installation these files are organized within the following directory structure:

2.7 LinkScan Configuration Files

LinkScan's operation is controlled by a number of different configuration files. When running LinkScan via the Windows Graphical User Interface, these files are largely hidden. However, they still control the execution of the program and you may find it useful to view the raw configuration files from time to time. On Unix systems, these files represent the primary method of configuring LinkScan. All of the files are formatted in plain ASCII text and may be viewed and modified using the editor of your choice (e.g. Windows Notepad, Unix vi, emacs, pico, nedit, etc.).

The most important configuration files are:

This approach provides tremendous flexibility. It means you can establish Global Settings in the Global Configuration File that apply to all Projects. You may then override (single-valued) settings or supplement (multi-valued) settings with additional commands in the Project Configuration File(s), which apply only to the Project concerned.

Some additional configuration/control files are discussed elsewhere in this manual. They are used by LinkScan (i.e. do not delete them!) but it is rarely necessary for users to examine or modify them.

All of the configuration files include extensive comments. Comments are signified by the pound sign (#), like this:


# This line contains only a comment

Realcommand = 1   # This comment could describe Realcommand
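To illustrate the comment convention, here is a minimal Python sketch of how such lines could be read. It is not LinkScan's own parser (LinkScan itself is written in Perl). The function name parse_setting is hypothetical, and the sketch assumes that a # always starts a comment and never appears inside a value.

```python
def parse_setting(line):
    """Return (name, value) for a 'Name = value' line, or None otherwise.

    Assumption: '#' always starts a comment and never occurs inside a value.
    """
    code = line.split("#", 1)[0].strip()   # drop the trailing comment, if any
    if not code or "=" not in code:
        return None                        # blank line, comment, or no '='
    name, value = code.split("=", 1)
    return name.strip(), value.strip()


print(parse_setting("# This line contains only a comment"))
print(parse_setting("Realcommand = 1   # This comment could describe Realcommand"))
```

The first call returns None and the second returns the pair ('Realcommand', '1'). Directives written without an equals sign are outside the scope of this sketch.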

2.8 Perl Regular Expressions

LinkScan incorporates a vast array of customization features many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions on Unix systems, see man perlre. HTML versions are available at many locations including:

http://www.perl.com/doc/manual/html/pod/perlre.html

We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E.F. Friedl, and published by O'Reilly [ISBN: 1-56592-257-3].

2.9 relative-path and relative-path-expression

We make extensive reference to these terms in the customization sections of this manual and they are introduced here for your convenience.

Let us assume that we are scanning the website:

http://www.example.com/

An individual document within that website might be:

http://www.example.com/products/widget.html

LinkScan will refer to that page using its relative-path, which in this case, is:

products/widget.html

A relative-path-expression is a Perl Regular Expression that matches a relative-path. For example, all of the following will match our widget page:


products/widget.html      # Also matches products/widgetXhtml
products/widget\.html$    # Does not match anything else
(|.*/)widget\.html$       # Matches widget.html in any directory
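The three expressions can be tried out with a short Python sketch. Python's re module gives these particular patterns the same meaning as Perl does. The sketch assumes that an expression is anchored at the start of the relative-path (hence re.match); the comments above suggest this, but it is an assumption here. The extra test paths are made up for illustration.

```python
import re

expressions = [
    r"products/widget.html",
    r"products/widget\.html$",
    r"(|.*/)widget\.html$",
]
paths = [
    "products/widget.html",
    "products/widgetXhtml",   # hypothetical path: unescaped '.' matches 'X'
    "archive/widget.html",    # hypothetical path in another directory
]

def matches(expression, relative_path):
    # re.match anchors at the start of the string (assumed LinkScan behavior).
    return re.match(expression, relative_path) is not None

for expression in expressions:
    hits = [p for p in paths if matches(expression, p)]
    print(f"{expression:28} -> {hits}")
```

Under these assumptions the first expression matches the first two paths, the second matches only products/widget.html, and the third matches widget.html in either directory.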

LinkScan for Unix. Reference Manual. Section 3

New LinkScan Installations

This section describes the prerequisites for LinkScan and leads into step-by-step instructions for performing a new installation.

  1. Hardware Requirements
  2. Prerequisites
  3. Installation Step-by-Step

3.1 Hardware Requirements

LinkScan is supported on a wide variety of platforms including:

We do not recommend Windows 95/98/ME for scanning large websites of more than 5000 documents. Although LinkScan has been tested on websites of significantly greater size, performance and stability will be much improved when running under operating systems with a true multi-processing implementation such as Windows NT/2000/XP/Vista or Linux/Unix.

Disk and memory requirements depend almost exclusively on the size and nature of the website(s) to be analyzed. However, the following guidelines are intended to assist users with their capacity planning needs:

3.2 Prerequisites

To successfully install and configure LinkScan on your computer you must have:

  1. An appropriate version of Perl Version 5 installed on your computer. You may download a version suitable for your system via:

    http://www.elsop.com/perl/

  2. A copy of the LinkScan software and a LinkScan License Key. Both are available from:

    http://www.elsop.com/linkscan/dleval.cgi

3.3 Installation Step-by-Step

We recommend that new users get a fast start by jumping to one of the following pages:

LinkScan for Unix. Reference Manual. Section 4

Upgrading Existing LinkScan Installations

This section describes how to upgrade an existing LinkScan installation to LinkScan Version 12.1.

LinkScan for Unix. Reference Manual. Section 5

Basic Scanning with the Command Line Interface

This section describes how to create, configure and scan a LinkScan Project using the command line interface.

Before executing the LinkScan programs you must set the current working directory:

web:/> cd /usr/www/htdocs/linkscan/
web:/usr/www/htdocs/linkscan>

Creating a New Project

To create a new Project, simply execute the main LinkScan program (linkscan.pl) with the -newproject command line option:

web:/usr/www/htdocs/linkscan> perl linkscan.pl -newproject newproj

[...]

This Will Create the New LinkScan Project: newproj

The answers to the following questions are accepted verbatim without
validation. Please type carefully. <Control-C> to abort and start again.


Enter Homedir: 
Enter Home URL: http://www.example.com/index.html
Enter Organization: My Department
Enter Project Description: My First Test
** Status: Project newproj Created Successfully
web:/usr/www/htdocs/linkscan>

Configuring a Project

To configure a Project, simply edit the appropriate Project configuration file using your editor of choice:

web:/usr/www/htdocs/linkscan> vi ./newproj/linkscan.cfg

Note that lines starting with a pound sign (#) are comments.

In the simple case of scanning a website using the normal Network (HTTP) Scanning Method, you would only need to configure Homeurl with the URL to the root of the website, and Homefile with the filename (relative to server root) of the starting page. Be sure to leave Homedir blank since this will force LinkScan to use Network (HTTP) Scanning.

[...]
Homedir = 
Homeurl = http://www.example.com/
Mirrorurl = 
Homefile = index.html
Projectdesc = My First Test
Organization = My Department
[...]

This will scan the entire site www.example.com from its starting page, index.html. The Homeurl parameter should always be the "root" URL of the site being scanned. To start a scan in a sub-level area, add the path to the Homefile parameter. For example, using the same Homeurl as above, and setting:


Homefile = recommendations/external/index.html

would start the scan at:

http://www.example.com/recommendations/external/index.html
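The way Homeurl and Homefile combine into a starting URL can be sketched in Python. This is only an illustration of the relationship described above, not LinkScan's code; the helper name start_url is hypothetical.

```python
from urllib.parse import urljoin

def start_url(homeurl, homefile):
    # Homefile is given relative to the server root named by Homeurl.
    return urljoin(homeurl, homefile)

print(start_url("http://www.example.com/", "index.html"))
print(start_url("http://www.example.com/", "recommendations/external/index.html"))
```

The second call yields http://www.example.com/recommendations/external/index.html, the starting page shown above.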

Scanning a Project

To scan a Project, simply execute the main LinkScan program. You may specify the Project on the command line as shown below. Otherwise LinkScan will prompt you to select from the list of valid Projects.

web:/usr/www/htdocs/linkscan> perl linkscan.pl -project newproj

LinkScan Enterprise Version 12.1 Unix.

[...]

** Status: LinkScan is Starting Processes...
** Status: Started 3 Processes...
** Status: LinkScan is Scanning Internal Links...
Processing  URL: 
Processing  URL: about.html
Processing  URL: linkscan/
Processing  URL: linkscan/dleval.cgi
Processing  URL: linkscan/order.cgi
Processing  URL: linkscan/support.html
[...]

You have now completed a scan of the website and LinkScan has created a Database for that Project. Next you will want to examine the findings by following the steps described in Examining the Results.

Command Line Options

Run the main LinkScan program with the -help option to see a short listing of the available command line switches:

web:/usr/www/htdocs/linkscan> perl linkscan.pl -help
LinkScan Version 12.1 Unix
Copyright 1997-2010 Electronic Software Publishing Corporation

USAGE: linkscan.pl  {-help} {-alllinks} {-fast} {-home pathname} {-http}
       {-newproject name} {-noexternal} {-noorphans} {-project name}
       {-quiet} {-remote URL} {-retest}

-help            Displays this message
-alllinks        Check all external links [Override: Maxgoodhours etc]
-fast            Use larger number of processes to speed testing
-home pathname   Specify starting page [Override: Homefile in linkscan.cfg]
-http            Use HTTP navigation [Equiv: Execute .* and -noorphans]
-newproject name Create a new LinkScan Project
-noexternal      Test internal links only [Default: Internal and External]
-noorphans       Disable checking for orphaned files
-project name    Select a LinkScan Project
-quiet           Reduce verbosity of progress/status messages
-remote URL      Specify Remote Site [Equiv: -http; Override: Homeurl/Homefile]
-retest          Repeat last test, rechecking only those links that failed
Detailed Help [Y/N]:n
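When LinkScan is driven from another script, the command line can be assembled programmatically. The following Python sketch builds an argument list using only switches that appear in the -help listing above. The helper name linkscan_command is hypothetical, and whether a particular combination of switches is meaningful is not checked here.

```python
def linkscan_command(project, noexternal=False, quiet=False, retest=False):
    """Build an argument list from switches shown in the -help listing."""
    argv = ["perl", "linkscan.pl", "-project", project]
    if noexternal:
        argv.append("-noexternal")   # test internal links only
    if quiet:
        argv.append("-quiet")        # reduce progress/status messages
    if retest:
        argv.append("-retest")       # recheck only links that failed last time
    return argv

print(" ".join(linkscan_command("newproj", noexternal=True, quiet=True)))
```

The resulting list could be passed to subprocess.run together with cwd set to the LinkScan installation directory, since the current working directory must be set before LinkScan is executed.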

LinkScan for Unix. Reference Manual. Section 6

Examining the Results

Once a Project has been scanned and a database created, a wide range of different reports are available.

This document describes those reports and how to view them interactively using a simple web browser-based interface. Note that a batch command-line interface is also available. See Section 12 of this manual.

To view the reports interactively:

Users will need to point a web browser at the LinkScan Main Menu which typically resides at:

http://your.server.name/linkscan/linkscan.cgi
or
http://your.server.name/cgi-bin/linkscan.cgi

The first time you access the results, you will be presented with the LinkScan Login and Preferences Menu. Simply click Login Now. No username is required unless you later decide to enable various LinkScan security features.

Once you have logged in, you will be presented with the LinkScan Main Menu.

Report Selection

You must select one of the individual Reports and submit the form by pressing Select Report.

A help page is available for each type of LinkScan Report. You may view the appropriate help page at any time by using the Help option on the context-sensitive LinkScan Toolbar. You may also use the [?] links on the LinkScan Main Menu, or the links provided in the summary table below.

The most frequently used reports have been organized in the left hand column; we suggest new users start there. Also, many of the reports incorporate hyperlinks to other reports. This means you can use a drill-down paradigm to view more detail associated with a specific problem or document. For example, some users may never explicitly select a LinkScan/QuickCheck Report. But they will likely view reports of that type by following the [Src] links from other reports.

Summary of Available Reports

Report                            Description
Project Summary Report            Summary statistics for the current project
Summary of All Projects Report    Summary statistics for all configured projects
Problem Documents Report          List documents containing potential problems
Selected Status Codes Report      List errors of specific types
Document Detail Report            List all/selected documents
All Pages Linking To ... Report   Find pages that link to...
Critical Errors Report            List most critical errors
Orphaned Files Report             List orphaned files
Detailed Errors Report            List all/selected errors
External History Report           View history of an external link
Changed Documents Report          Compare two scans of the current project
Redirections Report               List a summary of redirections
Search Documents Report           Ad hoc searching: document-centric
System Configuration Report       Display current LinkScan configuration settings
Search Links Report               Ad hoc searching: link-centric
LinkScan/QuickCheck               View source code and detailed analysis of a document
SiteMap Report                    Display LinkScan SiteMap
LinkScan/TapMap                   Display LinkScan TapMap

Owner Selection

The LinkScan Main Menu may include an Owner Selection Box. If enabled, this option will allow you to select a sub-set of the website to which subsequent reports will apply.

In a default configuration, the Owner Selection Box will include an entry for each top-level directory scanned, in addition to the special entry "All". "All" is the default selection, in which case subsequent reports apply to the entire website scanned.

Note however, that the LinkScan Administrator may configure and customize the manner in which Owners are created. Hence your installation may appear and behave somewhat differently from that described herein.

SubMenu Selection

In many cases, when you submit the form by pressing Select Report you will be presented with a second menu of options. Initially, we suggest you accept the default options which have been carefully designed to produce excellent results in the vast majority of situations. However, to learn more, you may use the context-sensitive Help button on the LinkScan Toolbar at any time.

LinkScan Toolbar

Each of the LinkScan Menus and Reports includes a common LinkScan Toolbar. It contains a number of links:

 Main Menu   Preferences   Advanced   Help   Reference   HowTo   Card 

The Main Menu link will always return you to the LinkScan Main Menu.

The Preferences link will always take you to the LinkScan Login and Preferences Menu.

The Advanced link appears when appropriate and it will cause the current menu to be redrawn with additional options.

The Help link will display an appropriate section of the LinkScan Documentation depending upon the current context.

The Reference link will display the table of contents for the LinkScan Reference Manual.

The HowTo link will display a brief How To Guide with instructions for completing certain Common Tasks.

The Card link will display the LinkScan Quick Reference Card.

LinkScan for Unix. Reference Manual. Section 7

LinkScan Status and Error Codes

The following section describes each of the LinkScan Error and Status Codes. Each Status Code is assigned to one of six Severities:

Symbol  Code  Severity        Explanation
*       0     Unknown         LinkScan has not tested or was unable to test this link
*       1     Error           LinkScan found a hard error on this link
*       2     Possible Error  There may be a problem with this link. It should be retested at a later time
*       3     Warning         LinkScan found something unusual about this link. Manual inspection highly recommended
*       4     Advisory        This link is probably ok, but manual inspection recommended
*       5     No Error        This is a good link

The Severity associated with any specific Error or Status Code may be customized by the LinkScan Administrator through the use of the Statuscode option.

Status codes in the range 0-99 are generated exclusively by LinkScan and generally refer to the status of local links (HTML files, Non-HTML files, etc.).

Status codes in the range 100-699 are defined exclusively by the HyperText Transfer Protocol.

Status codes in the range 800-3099 are generated exclusively by LinkScan and generally refer to Networking Problems (Failed DNS lookups, failure to connect to a remote server or timeouts) as well as some other LinkScan detected warning or advisory messages.
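The three numeric ranges described above can be expressed as a small Python lookup, which may be handy when post-processing report output. This is a sketch only; the function name and the category labels are made up for illustration.

```python
def code_origin(code):
    """Classify a status code by the numeric ranges described above."""
    if 0 <= code <= 99:
        return "LinkScan (local links)"
    if 100 <= code <= 699:
        return "HTTP"
    if 800 <= code <= 3099:
        return "LinkScan (network and other checks)"
    return "undefined"   # outside every documented range

for code in (6, 404, 902, 3001, 750):
    print(code, code_origin(code))
```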

* No Status (0)

* HTML File (1)

* Error: Bad HTML File (2)

* Non-HTML File (3)

* Error: Bad non-HTML File (4)

* Anchor (5)

* Error: Bad Anchor (6)

* Warning: Orphaned HTML File (7)

* Warning: Orphaned non-HTML File (8)

* Imagemap File (9)

* Error: Bad Imagemap File (10)

* Valid Mailto Link (11)

* Possible Error: Invalid Mailto Link (12)

* Warning: Missing / (13)

* Warning: Unprocessed SSI (14)

* PDF File (15)

* Error: Bad PDF File (16)

* Warning: No Closing /a (17)

* Error: Invalid Scheme (18)

* Advisory: No Alt/Height/Width (20)

* Flash File (21)

* Error: Bad Flash File (22)

* Text File (23)

* Error: Bad Text File (24)

* Javascript File (25)

* Error: Bad Javascript File (26)

* XML File (27)

* Error: Bad XML File (28)

* Error: HTML Syntax (99)

* Continue (100)

* Switching Protocols (101)

* Good URL (200, 201, 202, 203, 205, 206)

* Error: No Content (204)

* Error: Multiple Choices (300)

* Error: Moved Permanently (301)

* Advisory: Moved Temporarily (302)

* Error: Network/Server Error (303, 304)

* Error: Use Proxy (305)

* Error: Unused (306)

* Warning: Temporary Redirect (307)

* Error: Network/Server Error (400)

* Warning: Unauthorized (401)

* Warning: Payment Required (402)

* Error: Forbidden (403)

* Error: Not Found (404)

* Error: Method Not Allowed (405)

* Error: Not Acceptable (406)

* Error: Proxy Authentication Required (407)

* Possible Error: Request Timed Out (408)

* Error: Conflict (409)

* Error: Gone (410)

* Error: Length Required (411)

* Error: Precondition Failed (412)

* Error: Request Entity Too Large (413)

* Error: Request URI Too Large (414)

* Error: Unsupported Media Type (415)

* Possible Error: Server Error (500)

* Possible Error: Not Implemented (501)

* Possible Error: Bad Gateway (502)

* Possible Error: Service Unavailable (503)

* Possible Error: Gateway Timed Out (504)

* Possible Error: HTTP Version Not Supported (505)

* Possible Error: Network/Server Error (600, 601, 602, 603)

* Advisory: Skipped - Recently Tested (800)

* Possible Error: Skipped - Bad Server (801)

* Advisory: Skipped - FTP Limit (802)

* Advisory: Skipped - CGI Limit (803)

* Possible Error: No DNS Entry (900)

* Possible Error: DNS Timeout (901)

* Possible Error: Connect Error (902)

* Possible Error: Connect Timeout (903)

* Warning: Missing / (904)

* Warning: Probably OK (905)

* Warning: Contains an IP Address (906)

* Error: Multiple Redirections (907)

* Warning: Missing / (908)

* Error: Disconnected (909)

* Warning: Location Not Absolute (910)

* Error: Unsafe Character (911)

* Advisory: SSL Server Path Not Checked (912)

* Advisory: Simulated Redirect (913)

* Warning: Meta Redirect (914)

* Warning: Meta Loc not Absolute (915)

* Advisory: LDAP Server Query Not Checked (916)

* Error: No Headers Seen (917)

* Possible Error: Timeout Header (930)

* Possible Error: Timeout Body (931)

* Possible Error: Timeout Unknown (932)

* Warning: Body Truncated (933)

* Error: Error Creating Socket (990)

* Error: SSL Error (991)

* Error: Post Data Not Found (992)

* Error: Unknown (999)

* Error: FTP Error (1000)

* Error: Bad Syntax (2000)

* Error: SMTP No Such User (2001)

* Warning: SMTP Mailbox Full (2002)

* Possible Error: SMTP Failure (2003)

* Error: Errordoc Match (3000)

* Error: Errorbody Match (3001)

* Error: Profiler Match (3002)
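Entries in the listing above follow a regular shape: an optional severity prefix, a label, and one or more numeric codes in parentheses. The Python sketch below parses one such line as it is printed in this manual. It is an illustration of the listing format only, not a LinkScan file format, and the name parse_entry is hypothetical. Entries printed without a prefix yield a severity of None.

```python
import re

# Optional "Severity: " prefix, then a label, then codes in parentheses.
ENTRY = re.compile(
    r"^\*\s*(?:(Error|Possible Error|Warning|Advisory):\s*)?(.+?)\s*\(([\d,\s]+)\)$"
)

def parse_entry(line):
    """Return (severity, label, [codes]) for one listing line, or None."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    severity, label, codes = m.groups()
    return severity, label, [int(c) for c in codes.split(",")]

print(parse_entry("* Error: Not Found (404)"))
print(parse_entry("* Good URL (200, 201, 202, 203, 205, 206)"))
```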

LinkScan for Unix. Reference Manual. Section 8

Scheduling LinkScan on Unix Systems

The following example is provided to assist those users who wish to run LinkScan as a cron job. The crontab system is a standard Unix utility that enables jobs to be executed automatically according to some regular schedule. On most Unix systems, see man crontab or man 5 crontab for help.

  1. Save any existing configured cron jobs to a file (for example, cron.job) using the following shell command:

    crontab -l > cron.job
    
  2. Edit the file cron.job and append an additional entry for LinkScan containing something like:

    40 8 * * 0,1,2,3,4,5,6 /usr/linkscan/linkscan.cron
    

    This will execute /usr/linkscan/linkscan.cron at 8:40 a.m. each day (the day-of-week list 0,1,2,3,4,5,6 covers every day of the week). Adjust the pathname to linkscan.cron accordingly.

  3. Submit this to the crontab system with the following shell command:

    crontab cron.job
    

    You can check that it's been scheduled with:

    crontab -l
    
  4. Edit the linkscan.cron file -- the following example file is automatically installed in the LinkScan directory:

    #!/bin/sh
    
    # Set current working directory
    cd /usr/linkscan/
    
    # Execute LinkScan
    /usr/local/bin/perl linkscan.pl -project proja
    /usr/local/bin/perl linkscan.pl -project projb
    
    # Execute LinkScan/Dispatch (if required)
    /usr/local/bin/perl dispatch.pl -project proja -options
    
    # Execute command line reports (if required)
    # Must set environment variable for these
    # setenv linkscan linkscan
    export linkscan=linkscan
    /usr/local/bin/perl linkscan.cgi -project proja -options
    

    See the following for a summary of the available command line switches/options:

    Please note the following points:

LinkScan for Unix. Reference Manual. Section 9

File System Scanning and Orphaned Files

LinkScan incorporates the ability to examine the files on your local hard drive and interpret them in a manner very similar to a web server. This capability has two major applications:

Configuration is significantly more complex than for normal HTTP Scanning. In particular, you must configure the following items:

If you do not configure the File System Pathnames, LinkScan will automatically use HTTP Scanning. It will also disable the Orphaned File checking.

If you wish to use HTTP Scanning with Orphaned File checking enabled, configure the File System Pathnames (which enables orphan checking) and then set Http = 1.

This is best illustrated by example:

# Map the server root
# http://www.example.com/index.html  <==> /usr/www/htdocs/index.html

Homeurl = http://www.example.com/
Homedir = /usr/www/htdocs/
Homefile = index.html

# http://www.example.com/cgi-bin/    <==> /usr/www/cgi-bin/
# http://www.example.com/~username/  <==> /home/username/public_html/

Alias cgi-bin/ /usr/www/cgi-bin/
Alias ~([^/]+)/ /home/$1/public_html/

# Hide hidden files and directories from the Orphans Report

Noorphans (\.|.*/\.)

# The following are significant (but default) settings

Execute cgi-bin/             # Test cgi-bin/ via HTTP
Execute (?i).*\.(cgi|asp)$   # Test .cgi and .asp files via HTTP

Htmlfiles = html, shtml, htm
Mapfiles = map
Pdffiles = 
Flashfiles = swf
Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm

Indexoptions = 0             # Disallow directory listings
Expandssi = 1                # Expand Server Side Includes
Autohttp = 0                 # Disable automatic HTTP retry
Maxdirlevels = 10            # Don't explore file system beyond 10 levels
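The mapping that the Homedir and Alias settings above describe can be sketched in Python. This is one plausible reading of the example, not LinkScan's implementation: it assumes each Alias expression is matched at the start of the relative-path, that $1 is replaced by the first captured group, and that the unmatched remainder of the path is appended. The function name to_pathname is hypothetical.

```python
import re

HOMEDIR = "/usr/www/htdocs/"
# (expression, replacement) pairs copied from the Alias lines above.
ALIASES = [
    (r"cgi-bin/", "/usr/www/cgi-bin/"),
    (r"~([^/]+)/", "/home/$1/public_html/"),
]

def to_pathname(relative_path):
    """Map a relative-path to a file system pathname (illustrative only)."""
    for pattern, replacement in ALIASES:
        m = re.match(pattern, relative_path)   # assumed: anchored at the start
        if m:
            target = replacement
            for i, group in enumerate(m.groups(), start=1):
                target = target.replace(f"${i}", group)
            return target + relative_path[m.end():]
    return HOMEDIR + relative_path             # no Alias applies

print(to_pathname("index.html"))
print(to_pathname("cgi-bin/order.cgi"))
print(to_pathname("~username/notes.html"))
```

Under these assumptions the three calls reproduce the URL-to-pathname correspondences given in the comments of the example configuration.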

On Unix systems only, the Alias directive supports the special !HOME expression:

Alias ~([^/]+)(/|$) !HOME/public_html/

A reference to ~someuser/ will be Aliased to !HOME/public_html/. Then, !HOME will be replaced by someuser's Home Directory, which is determined via a lookup of /etc/passwd.

Remote File Systems

In some cases, the file system directories containing the web site may reside on a physically different computer from LinkScan. In these cases, LinkScan will support Network File System pathnames (subject to any locally imposed security controls).

In other cases, the file system of the remote system may not be visible via the network, quite possibly for security reasons. LinkScan will be unable to scan the remote computer using the File System Scanning Method. You must use HTTP Scanning.

However, it is still possible to enable Orphaned File checking. In summary, you will need to execute a small, self-contained Perl program on the remote computer. It will assemble a "picture" of the file system and save it as a simple ASCII file. That file may be transferred to the LinkScan computer using FTP (or any other more secure technique) and used to perform the orphan analysis in lieu of direct access to the remote server.

  1. Fully configure the selected Project as if you were using File System Scanning on your local machine. However, when setting the pathname to the root of the target web server (and any associated Aliases), use the pathname conventions applicable to the remote server.

  2. In the Project configuration file, force LinkScan to use normal HTTP Scanning by setting:

    
    Http = 1
    
  3. Set the Orphanfile setting in the Project configuration file to the full pathname of a file on your local computer. For example:

    
    Orphanfile = /usr/linkscan/someproject/orphans.list
    
  4. Transfer the following files to the remote server:

    
    /usr/linkscan/lsfind.pl
    /usr/linkscan/someproject/linkscan.cfg
    
  5. On the remote server, execute the lsfind.pl program:

    
    perl lsfind.pl orphans.list
    
  6. Transfer the orphans.list file back to the LinkScan machine.

  7. Initiate a scan of the target website in the normal manner. LinkScan will use the orphans.list file from the remote server in lieu of scanning the file system on the local server.
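The idea behind the orphan analysis in the steps above can be sketched in Python: take a listing of the files that exist on disk and subtract the files that the scan found links to. The sketch is conceptual only. It does not reproduce lsfind.pl or the contents of orphans.list, whose format is not described in this section, and the function names are hypothetical.

```python
import os
import tempfile

def list_files(root):
    """Return the set of file paths under root, relative to root, with '/' separators."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return found

def orphans(files_on_disk, linked_paths):
    # A file is orphaned if it exists on disk but nothing scanned links to it.
    return sorted(set(files_on_disk) - set(linked_paths))

# Small demonstration with a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "products"))
    for rel in ("index.html", "products/widget.html", "products/old.html"):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("<html></html>")
    on_disk = list_files(root)
    print(orphans(on_disk, {"index.html", "products/widget.html"}))
```

In the demonstration, products/old.html exists on disk but is not in the set of linked paths, so it is the only file reported.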

LinkScan for Unix. Reference Manual. Section 10

Import Scanning

The LinkScan Import function may be used to:

When processing a list of Links each URL is checked in turn and its status stored in the LinkScan database. When processing a list of Documents, each document and every link within that document is checked and its status stored.

The import function offers enormous flexibility. To use this feature, carry out the following steps:

  1. Prepare the Import File

    LinkScan will import a simple ASCII file of the following format:

    URL ... one or more tab characters ... URL-Description

    URLs may be absolute, or relative to the Home URL for the current server. The URL-Description is imported and carried through to the LinkScan Reports for identification purposes. You may use any ASCII string, for example a database record number.

    Import files may also include URLs using the extended LinkScan conventions for form submissions (GET, POST and Multi-Part POST). See How to Submit Forms.

    An alternative field separator may be specified by including a special command as the first line of the file:

    ## \s+

    The command starts with '##' in column one followed by a Perl expression that specifies the field delimiter. In the example above, '\s+' means one or more whitespace characters (tab or space).

    Lines with a '#' in column one, and blank lines, are ignored as comments.

    To use the Import Function, open the linkscan.cfg file for the appropriate Project, and edit the Importfile setting. Supply the full pathname to the prepared ASCII import file. For example:

    
    Importfile = /usr/home/linkscan/importfiles/test.txt
    

    Then select the import mode by changing the Import setting. Valid values are:

    Import = 0    Import mode disabled
    Import = 1    Import a list of links
    Import = 2    Import a list of documents
    Import = 3    Import a list of documents with caching disabled

    When using Import Documents, LinkScan will by default check each document listed in the Import file, but it will not follow the links in those documents to scan the entire site. Optionally, you may set Maxclicks to force LinkScan to execute a deeper scan. For example, with Maxclicks = 3, LinkScan will check the Import File, the documents listed in the Import File, and the children (but not the grandchildren) of those documents.

  2. Special Considerations

    LinkScan de-duplicates the list of links within an Import Document list. This means that LinkScan will validate each unique URL within the list only one time.

    However, you may force LinkScan to process an Import Sequence so that the same URL or document is checked more than once. This may be achieved by adjusting the URLs to make them appear unique, which also provides a means to differentiate the test results for each step. Simply add dummy name-value pairs to the query string of each URL:

    http://www.example.com/cookie_sensitive?dummyseq=1
    [...]
    http://www.example.com/set_cookie
    [...]
    http://www.example.com/cookie_sensitive?dummyseq=2

    If the URLs already include a query string, simply append the additional parameter to the existing query and change:

    http://www.example.com/foo?name=value

    to:

    http://www.example.com/foo?name=value&dummyseq=1

    Normally, LinkScan maintains the status of each link in a cache while it scans a site. This dramatically improves performance since LinkScan does not need to re-check commonly used images and other components over and over. However, caching may be undesirable with some stateful sequences, for example when the same URL produces a completely different result before and after a cookie is set.

    In those situations, you may use a special option (Import = 3) which will force LinkScan to flush its cache after each imported document has been validated.
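The Import File format and the dummy-parameter technique described above can be sketched in Python. This is an illustration only: the helper names add_dummy_seq and build_import_lines are hypothetical, and the parameter name dummyseq is simply the one used in the examples above. LinkScan itself only needs the resulting text file.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def add_dummy_seq(url, seq, name="dummyseq"):
    """Append name=seq to the query string, creating one if needed."""
    parts = urlsplit(url)
    pair = f"{name}={seq}"
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit(parts._replace(query=query))


def build_import_lines(entries):
    """Return 'URL<TAB>Description' lines for (url, description) pairs.

    URLs that occur more than once receive a numbered dummy parameter
    so that each occurrence looks unique and is not de-duplicated.
    """
    totals = Counter(url for url, _ in entries)
    seen = Counter()
    lines = []
    for url, description in entries:
        if totals[url] > 1:
            seen[url] += 1
            url = add_dummy_seq(url, seen[url])
        lines.append(f"{url}\t{description}")
    return lines


steps = [
    ("http://www.example.com/cookie_sensitive", "before cookie"),
    ("http://www.example.com/set_cookie", "set cookie"),
    ("http://www.example.com/cookie_sensitive", "after cookie"),
]
import_lines = build_import_lines(steps)
```

Writing the lines, one per row, to the file named by the Importfile setting would give an Import File in the default tab-separated format.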

LinkScan for Unix. Reference Manual. Section 11

Advanced and Custom Scanning

LinkScan incorporates many powerful customization features described below.

  1. How to control the scope of a scan
  2. How to handle authentication schemes
  3. How to scan additional pages and submit forms
  4. How to validate JavaScript and drop-down lists
  5. How to handle special Error documents
  6. How to manipulate URLs on-the-fly
  7. How to emulate different browser types
  8. How to remap different hosts
  9. How to assign documents to Owners
  10. How to process additional per-document data
  11. How to control the testing of external links
  12. Other miscellaneous customizations

Hint: We strongly recommend that you read Essential LinkScan Concepts before studying this section of the Reference Manual.

11.1 How to control the scope of a scan

You may use any combination of the following commands to include or exclude specific areas of the target website.


Exclude relative-path-expression
Exclude absolute-url-expression
Nofollow relative-path-expression
Onlyfollow relative-path-expression
Onlyinclude relative-path-expression
Maxlevels depth
Maxclicks depth

Exclude: The Exclude command may be used to completely ignore specific links. You may supply a relative-path-expression to exclude Internal Links, or an absolute-url-expression to exclude External Links.

Nofollow: The Nofollow command may be used to provide even finer control over LinkScan's behavior. When LinkScan encounters a link matching a Nofollow command, it will validate the link (and check for any <a name = ... > tags if appropriate). However, it will not test any links that lead from the target document.

For greater flexibility and completeness, the Onlyinclude and Onlyfollow commands are also supported.

Onlyinclude: is logically equivalent to "Exclude everything except".

Onlyfollow: is logically equivalent to "Nofollow everything except".

Maxlevels: A command such as Maxlevels = 3 will limit the depth of the scan to three directory levels under server root.

Maxclicks: A command such as Maxclicks = 3 will limit the depth of the scan based on the number of clicks from the start of the scan. In order to more closely model the real user experience, LinkScan does not include clicks that result from following framesets or redirections.

The following rules of precedence apply when using multiple commands in combination:


Example 1:

Exclude http://www.domain.com/
Exclude test/

All links to "http://www.domain.com/" and all files in the local "test/" subdirectory will be ignored by LinkScan.


Example 2:

Nofollow user2/

LinkScan will check the links to files in the "user2/" directory, but it will not examine the content of any documents within the "user2/" directory or test any of the links contained within them.


Example 3:

Onlyfollow user1/

LinkScan will check the documents in the local "user1/" subdirectory and test the links to files in other local directories. However, LinkScan will not examine the content of any documents that lie outside of the local "user1/" directory or test any of the links contained within them.
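The difference between Exclude and Nofollow in Examples 1 and 2 can be modelled with a short Python sketch. It rests on two assumptions that this section does not confirm: that each expression is matched from the start of the link, and that Exclude is tested before Nofollow. The function name classify is hypothetical.

```python
import re


def classify(link, exclude=(), nofollow=()):
    """Return how a link would be treated under the assumed rules."""
    # Assumption: expressions are anchored at the start of the link.
    if any(re.match(pattern, link) for pattern in exclude):
        return "ignored"
    if any(re.match(pattern, link) for pattern in nofollow):
        return "validated, not followed"
    return "validated and followed"


exclude_rules = [r"http://www.domain.com/", r"test/"]
nofollow_rules = [r"user2/"]

print(classify("test/page.html", exclude_rules, nofollow_rules))
print(classify("user2/index.html", exclude_rules, nofollow_rules))
print(classify("user1/index.html", exclude_rules, nofollow_rules))
```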

Dynamic content

On websites that incorporate a high proportion of dynamic content, it may not be productive to test every script with a large number of query parameters or other variations. The following controls are provided.

Maxcgi: The maximum number of times any single URL should be probed with different query parameters. This prevents LinkScan from trying to validate a CGI script or dynamic page with a potentially infinite number of query parameters.
[Default: Maxcgi = 100 ]

Taglimit: The Taglimit command may be used to provide even finer control over the number of times clusters of URLs are probed. Syntax and example:


Syntax:

Taglimit relative-path-expression maxnumber

Example:

Taglimit scripts/DatabaseLookup.asp 20

LinkScan will only attempt to parse 20 documents matching the pattern "scripts/DatabaseLookup.asp". Any further links matching the specified pattern will be completely ignored.
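The counting behaviour described for Taglimit can be illustrated with a small Python model. This is a sketch under stated assumptions, not LinkScan's implementation: the class name TagLimiter is hypothetical, and the pattern is assumed to be matched from the start of the relative path.

```python
import re
from collections import defaultdict


class TagLimiter:
    """Allow at most N links per pattern; later matches are ignored."""

    def __init__(self, limits):
        # limits: list of (relative-path-expression, maxnumber) pairs
        self.limits = [(re.compile(pattern), maximum) for pattern, maximum in limits]
        self.counts = defaultdict(int)

    def allow(self, path):
        for pattern, maximum in self.limits:
            if pattern.match(path):
                self.counts[pattern.pattern] += 1
                return self.counts[pattern.pattern] <= maximum
        return True  # links matching no Taglimit pattern are unaffected


limiter = TagLimiter([(r"scripts/DatabaseLookup.asp", 20)])
results = [limiter.allow(f"scripts/DatabaseLookup.asp?id={i}") for i in range(25)]
```

In this model the first 20 matching links are allowed and the remaining five are rejected.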

11.2 How to handle authentication schemes

Many websites include some form of access control or user authentication features. These typically rely on HTTP Authentication, NTLM Authentication, or cookie-based schemes.

In the case of HTTP or NTLM Authentication, when a user attempts to access a protected area, their browser will present a challenge in the form of a pop-up dialog box that requires a username and password to be entered. In the case of cookie-based arrangements, the user is normally required to login by filling out an HTML form and submitting it.

HTTP Authentication

For sites that require HTTP Authentication, you must configure LinkScan with an appropriate Auth command:


Syntax:

Auth server-name "realm-name" username password

Examples:

Auth www.example.com "" guestuser xxxxxx
Auth app.example.com "Controlled Access" guestuser xxxxxx

You must include a realm-name (enclosed in double-quotes) but it may be empty. In that case, LinkScan will use the configured username and password for any realm on the target server. This is the recommended approach unless your server uses multiple realms with different access control rules for different portions of the website.

NTLM Authentication

Some Intranet websites utilize the proprietary and undocumented Microsoft NTLM protocol to authenticate users. LinkScan (on Windows systems only) may be configured to scan such sites.

Note: This may result in other minor artifacts in the results of the scan since LinkScan will use the Microsoft Windows implementation of the HTTP protocol versus the (stricter) native LinkScan implementation.

Cookie-based Authentication

HTTP access to some sites is controlled via authentication schemes requiring Cookies.

LinkScan will automatically accept and return all valid cookies received during the course of a scan. However, to gain access to the site, you may need to configure LinkScan to ensure that the appropriate cookies are set. This may be achieved by one of two techniques: submitting a login form, or preloading LinkScan's stored cookies.

The submission of a login form may be configured using the Extrahome command (described in the next section). Alternatively, you may initialize LinkScan's collection of stored cookies (the Cookie Jar) with one or more permanent Cookies by using the Cookie command:


Syntax:

Cookie server-name cookiename=cookievalue

Example:

Cookie www.elsop.com LinkScan=cookie_value;

Note: Do not enter space characters around the '=' character.

The server-name is the name of the server to be tested. For security reasons and in compliance with the applicable standards, LinkScan will only send the cookie when the specified server-name exactly matches the hostname portion of the requested URL. In this context, server names and their corresponding IP addresses are considered to be different (consistent with all major browsers). The cookie names and values must be reverse engineered from your server code, or "discovered" via your browser by enabling its "Prompt before accepting cookies" option or by examining the cookies stored on disk.
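The exact-match rule for the Cookie command can be expressed as a short Python check. This is only a sketch of the rule as stated above. The function name cookie_applies is hypothetical, and the case-insensitive comparison of hostnames is an assumption.

```python
from urllib.parse import urlsplit


def cookie_applies(server_name, url):
    """True when the configured server-name equals the URL's hostname."""
    host = urlsplit(url).hostname  # lower-cased by urlsplit, or None
    return host is not None and host == server_name.lower()


print(cookie_applies("www.elsop.com", "http://www.elsop.com/index.html"))
print(cookie_applies("www.elsop.com", "http://192.0.2.10/index.html"))
```

The second call uses an IP address, so under this rule the cookie would not be sent even if that address belonged to the named server.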

Hint 1: Sites with especially complex schemes (multiple levels of access control, subscription expirations etc.) might consider configuring their server and/or scripts to recognize a "super-user-cookie" specifically for testing purposes. This approach may also be used to trigger test points within server-based scripts and greatly improve the meaningful testability of complex dynamic content.

Hint 2: HTTP Authentication and Cookie related transactions are logged by LinkScan during the course of the scan. You may examine the following file to view the log: .../LinkScan/Projectname/data/linkscan.red

11.3 How to scan additional pages and submit forms

You may configure LinkScan to examine additional documents that would not normally be found during the scan and might otherwise be reported as orphaned files. The same technique may be used to submit forms on your website with specific data values for testing purposes. This is achieved with the Extrahome command:


Syntax:

Extrahome relative-path-expression

Examples:

Extrahome somedir/staticdoc.html
Extrahome cgi-bin/getscript.cgi?Var1=aaa&Var2=bbb

The second example above includes a query string and is therefore equivalent to a FORM submission using the GET method. In addition, LinkScan includes support for special conventions that allow users to specify FORM submission operations using the POST method, including the Multi-Part POST, frequently used to upload files from a client to the server.


Examples:

Extrahome cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret

Extrahome upload.cgi???(postedfile;C:\LinkScan10\post\test.jpg;image/jpeg)

Extrahome upload.cgi???Name1=Val1&(postedfile;/usr/home/test/test.jpg;image/jpeg)&Name2=Val2

Hint 1: Use the LinkScan Recorder to automatically capture the correctly constructed URLs.

Hint 2: When using the Extrahome command to submit a login form to provide access to a site, you may also need to configure LinkScan so that it doesn't immediately "click" any LOGOUT button which would invalidate the newly created session.
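Judging only from the Extrahome examples above, the number of '?' characters appears to select the submission method: one for GET, two for POST and three for Multi-Part POST. The Python sketch below splits a target on that basis. The mapping is an inference from the examples rather than a documented rule, and the function name split_extrahome is hypothetical.

```python
def split_extrahome(target):
    """Split an Extrahome target into (path, method, data)."""
    # Test the longest marker first so '???' is not mistaken for '??'.
    markers = (("???", "POST (multipart)"), ("??", "POST"), ("?", "GET"))
    for marker, method in markers:
        path, found, data = target.partition(marker)
        if found:
            return path, method, data
    return target, "GET", ""  # no query data at all


print(split_extrahome("cgi-bin/getscript.cgi?Var1=aaa&Var2=bbb"))
print(split_extrahome("cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret"))
```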

11.4 How to validate JavaScript and drop-down lists

LinkScan may be configured to interpret the contents of drop-down lists as links to other pages. The HTML specification does not define a standard method for indicating that a drop-down list contains hyperlinks (as opposed to regular data). Hence LinkScan needs some other "cue" and may be triggered by pattern matching of attributes within the SELECT tag. Consider, for example, the following:


<select name="URLLIST">
<option value="/products/" Selected> Relative URL to Products
<option value="http://www.mydomain.com/services/"> Absolute URL to Services
</select>

To instruct LinkScan to treat the contents of the drop-down list as URLs, use the following command:


Selecturl URLLIST

LinkScan will examine all SELECT tags and look for a Regular Expression match on the NAME attribute. If the match is successful (URLLIST in this example) LinkScan will treat each OPTION tag within the list as a hyperlink and validate it accordingly.
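The effect of Selecturl on the sample markup can be approximated with Python's standard HTML parser. This is a minimal sketch, assuming the expression is searched for anywhere within the NAME attribute. The class SelectUrlExtractor and the function select_urls are hypothetical names, not part of LinkScan.

```python
import re
from html.parser import HTMLParser


class SelectUrlExtractor(HTMLParser):
    """Collect OPTION values from SELECT tags whose NAME matches."""

    def __init__(self, name_pattern):
        super().__init__()
        self.name_pattern = re.compile(name_pattern)
        self.in_matching_select = False
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "select":
            name = attributes.get("name") or ""
            self.in_matching_select = bool(self.name_pattern.search(name))
        elif tag == "option" and self.in_matching_select:
            value = attributes.get("value")
            if value:
                self.urls.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_matching_select = False


def select_urls(html, name_pattern):
    parser = SelectUrlExtractor(name_pattern)
    parser.feed(html)
    parser.close()
    return parser.urls


sample = """<select name="URLLIST">
<option value="/products/" Selected> Relative URL to Products
<option value="http://www.mydomain.com/services/"> Absolute URL to Services
</select>"""
print(select_urls(sample, "URLLIST"))
```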

LinkScan includes the ability to validate links contained within JavaScript code. A relatively simple pattern matching technique is used -- LinkScan does not contain a full JavaScript interpreter. This means that LinkScan may "miss" some links or find "false positive errors" especially if the code creates the hyperlink references dynamically at run-time. The following Scriptmatch and Scriptnomatch commands give excellent results in most cases. However, you can customize the matching rules by changing these expressions and/or adding new ones.


Scriptmatch = (\w+://\S+|\S+/$|\S+\?\S+|\S+\.([a-z]{2,3}|[js]?html?|Z)$)
Scriptnomatch = .*([\(\)\[\]\{\}\']|document\.\S+|\.(src|com)$)

Some JavaScript constructs may still produce false errors. You may force LinkScan to ignore complete script blocks that match a specified pattern. For example:


Scriptexclude function\s+ZoomWindow

The above command will force LinkScan to ignore script blocks that contain a definition for the ZoomWindow function.

11.5 How to handle special Error documents

Many websites are constructed with special user-friendly error pages, sometimes known as "custom-404 documents". Some servers will deliver the error document directly whereas others may force a redirection to a specific error document. In either case, an issue arises if your server delivers the error document with a 200 OK response code. LinkScan (or any other link checker) would not be able to detect the error condition.

A similar issue arises with some dynamically generated documents. For example, a Java applet may encounter a run-time error condition after it has already sent a 200 OK response code to the client.

Hence LinkScan supports two special commands that may be used to detect such conditions and force a 404 Not Found error, regardless of the HTTP response code produced by the server/application. The first, Errordoc, is used with servers that force a redirection; it operates by pattern matching on the HTTP Location: header. The second, Errorbody, operates by pattern matching on the document bodies.


Syntax:

Errordoc pattern
Errorbody pattern

Examples:

Errordoc special/notfound\.html
Errorbody (?i).*runtime\serror

In the Errordoc example, LinkScan will report as 404 Not Found any URL that is redirected to http://your.server/special/notfound.html. In the Errorbody example, LinkScan will report as 404 any document that contains the string runtime error in the document body. Note the (?i) makes the pattern match case-insensitive.

Hint: The Errorbody pattern match is carried out on the entire document, including comments. Developers might consider including a standard error string within comment tags that may be used to trigger the Errorbody match.
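The following Python sketch models how the two example patterns would override a server's response code. It is an illustration under assumptions: the function name effective_status is hypothetical, and the patterns are assumed to be searched for anywhere within the Location: header or the document body.

```python
import re

ERRORDOC = r"special/notfound\.html"
ERRORBODY = r"(?i).*runtime\serror"


def effective_status(status, location, body):
    """Return 404 when either pattern matches, else the server's status."""
    if location and re.search(ERRORDOC, location):
        return 404  # redirected to the custom error document
    if body and re.search(ERRORBODY, body):
        return 404  # error text found in the body, comments included
    return status


print(effective_status(302, "http://your.server/special/notfound.html", ""))
print(effective_status(200, None, "<html>\n<!-- Runtime Error -->\n</html>"))
print(effective_status(200, None, "<html>Welcome</html>"))
```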

11.6 How to manipulate URLs on-the-fly

One of the most powerful (and complex) customization features of LinkScan concerns the real-time manipulation of links during the course of the scan. This is typically used to control the testing of sites with complex dynamic content. The basic commands available are:


Sessionmatch expression
Substitute relative-path-expression expression
Substituteraw relative-path-expression expression
Substitutescript relative-path-expression expression

The Sessionmatch command is used to manipulate Session numbers. The Substitute command is used to perform transformations on resolved links. The Substituteraw is used to perform transformations on unresolved links (i.e. the raw contents of a tag or tag attribute). The Substitutescript is used to perform transformations of blocks of JavaScript code.

We shall consider a number of examples which may be adapted according to your specific needs.

Example 1

Consider a site that produces links such as:


http://www.example.com/page1.asp
http://www.example.com/page1.asp?Print

It is entirely possible that page1.asp has been designed in such a manner that it delivers the same basic content with minor variations in formatting depending upon the presence or absence of the Print query string. One might configure LinkScan with:


Substitute (.*\.asp)\?Print $1

Whenever LinkScan encounters a link matching the specified pattern it will make the substitution indicated before it tries to validate or follow that link. In this example, a link to:

http://www.example.com/page1.asp?Print

will immediately be transformed to:

http://www.example.com/page1.asp

Note, however, this is not the same as Excluding links which contain the Print query string; that would cause LinkScan to simply ignore the link. In this case, LinkScan will process the link but transform it on-the-fly during the scan.

Example 2

Next we will consider a significantly more complex scenario.


Sessionmatch .*&token=([^&]+)
Substitute (.*&token=)[^&]*(.*)$ $1!S$2

In this case, we use the special Sessionmatch command to capture and save the first value of the query parameter token that LinkScan sees. This is most likely some kind of session number assigned by the target server immediately following the submission of a login form. The Substitute command then instructs LinkScan to replace all subsequent values of token with the saved value (represented by the special parameter !S).

In this scenario, LinkScan ensures that the value of token can never change during the course of the scan from the originally assigned value.
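The capture-and-replace behaviour of Example 2 can be emulated in Python as follows. This is a sketch only: the class name SessionPinner is hypothetical, the saved value stands in for the special !S parameter, and the expressions are assumed to be matched from the start of the link.

```python
import re


class SessionPinner:
    """Remember the first token value seen and reuse it afterwards."""

    def __init__(self):
        self.saved = None  # plays the role of !S

    def rewrite(self, link):
        if self.saved is None:
            # Sessionmatch: capture the first token value encountered.
            match = re.match(r".*&token=([^&]+)", link)
            if match:
                self.saved = match.group(1)
        if self.saved is None:
            return link
        # Substitute: replace any token value with the saved one.
        return re.sub(
            r"(.*&token=)[^&]*(.*)$",
            lambda m: m.group(1) + self.saved + m.group(2),
            link,
            count=1,
        )


pinner = SessionPinner()
first = pinner.rewrite("app?page=1&token=abc123&x=1")
second = pinner.rewrite("app?page=2&token=zzz&x=2")
```

Here the second link is rewritten to carry token=abc123, the value captured from the first link.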

Example 3

Next we'll consider a JSP site that produces URLs with the following structure:


http://www.example.com/content.jsp?A=123&B=456&C=789&D=XYZ

It may not be productive or efficient for LinkScan to scan all of the pages using every combination and permutation of values for the parameters A, B, C, D, etc. We can control that by manipulating the individual name-value pairs during the scan. For example:


Substitute (content\.jsp\?.*)&B=[^&]*(.*) $1&B=456$2
Substitute (content\.jsp\?.*)&C=[^&]*(.*) $1$2
Taglimit content\.jsp\?.*&D= 20

The first command fixes the value of B=456. Whatever value the parameter B takes on during the scan, LinkScan will force the value back to 456. The second command deletes any references to the C parameter from every link that it finds. We have also included the third Taglimit command; this will cause LinkScan to completely ignore the twenty-first and subsequent links that include a D parameter. In other words, in this case, we only want to test a representative sample (20) of links that include a D parameter.
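The two Substitute commands of Example 3 can be tried out with Python's re module, which uses \g<1> where the commands use $1. The sketch writes each value pattern as [^&]* so that the whole parameter value is consumed. The function name normalize is hypothetical.

```python
import re


def normalize(link):
    """Pin B to 456 and drop C, mirroring the two Substitute commands."""
    link = re.sub(r"(content\.jsp\?.*)&B=[^&]*(.*)", r"\g<1>&B=456\g<2>", link, count=1)
    link = re.sub(r"(content\.jsp\?.*)&C=[^&]*(.*)", r"\g<1>\g<2>", link, count=1)
    return link


print(normalize("content.jsp?A=123&B=999&C=789&D=XYZ"))
```

Note that a value pattern matching only a single character would leave the remainder of a multi-character value in the rewritten link, which is why the quantifier matters here.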

Example 4

For our next example, we shall consider a site that generates pages containing some links with the following structure:


http://www.example.com/cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F

Rather than linking directly to Yahoo!, this page links to a script that generates a frameset that includes the referenced page. In a default configuration, LinkScan will happily follow the link, validating the frameset and the ultimate link to Yahoo!. However, it may not be productive to do that for potentially thousands of links. Furthermore, in the (extremely unlikely) event that the link to http://www.yahoo.com/ was broken, the error would appear in one of the GenerateFrame documents and not the original referring document. In order to repair that link, one would have to backtrack through the frameset to locate the original source of the trouble.

Hence we can apply more Substitute magic:


Substitute cgi-bin/GenerateFrame.*&Link=([^&]+).* !U$1

This command will extract the value of the Link= parameter, and the special !U token instructs LinkScan that the string needs to be un-encoded. So the original link:

http://www.example.com/cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F

is transformed on-the-fly to:

http%3A%2F%2Fwww.yahoo.com%2F

and then decoded to:

http://www.yahoo.com/

And this means LinkScan can validate the link to Yahoo! directly without checking the GenerateFrame script many, many times. Furthermore, any errors will be flagged against the original document (and not one or more steps removed).
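The extract-and-decode step of Example 4 can be reproduced in Python with re and urllib.parse.unquote. This is a sketch of the transformation only. The function name extract_target is hypothetical, and unquote stands in for the !U token.

```python
import re
from urllib.parse import unquote


def extract_target(link):
    """Replace a GenerateFrame link with its decoded Link= value."""
    match = re.match(r"cgi-bin/GenerateFrame.*&Link=([^&]+).*", link)
    if not match:
        return link  # other links pass through unchanged
    return unquote(match.group(1))


print(extract_target("cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F"))
```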

Example 5

Next, for illustration, we include the complete configuration for a large and very complex real-world dynamic site:


# Set the CGI limit to be very large
# Include all file types on the Map

Maxcgi = 10000
Mapinclude .*

# Force &A=B and insert it immediately after the '?'

Substitute (cgi-bin.*[&\?])A=[^&=]*&*(.*) $1$2
Substitute (cgi-bin.*\?)(.*) $1A=B&$2

# Discard null and undefined values

Substitute (cgi-bin.*)&B=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&C=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&D=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&R=(null|undefined)(.*) $1$3

# For 'category', take the &C= if present, otherwise the &B=

Substitute (cgi-bin/bv/scripts/category.*\?A=B).*?(&C=[^&=]*).* $1$2
Substitute (cgi-bin/bv/scripts/category.*\?A=B).*?(&B=[^&=]*).* $1$2

# For 'content', take the &D= or &R= if present (call it &D=). Otherwise take the &B=

Substitute (cgi-bin/bv/scripts/content.*\?A=B).*?&[DR]=([^&=]*).* $1&D=$2
Substitute (cgi-bin/bv/scripts/content.*\?A=B).*?(&B=[^&=]*).* $1$2

# For 'frame', take the &D= or &R= if present (call it &D=). Otherwise take the &B=

Substitute (cgi-bin/bv/scripts/frame.*\?A=B).*?&[DR]=([^&=]*).* $1&D=$2
Substitute (cgi-bin/bv/scripts/frame.*\?A=B).*?(&B=[^&=]*).* $1$2

# For 'mailing...', take the &R=

Substitute (cgi-bin/bv/scripts/mailing.*\?A=B).*?(&R=[^&=]*).* $1$2

# For 'contact', take the &B=, &C= and &Comments

Substitute (cgi-bin/bv/scripts/contact.*\?A=B).*?(&B=[^&=]*).*?(&C=[^&=]*).*?(&Comments=[^&=]*).* $1$2$3$4

# Mark redirects to Error page as 404
# Mark documents containing 'Error Code:' as 404

Errordoc cgi-bin/bv/scripts/error.jsp
Errorbody Error\s+Code:[^\n<]*

# Hide some frequently arising errors

Noforms = 1
Exclude images/arrow.gif

Example 6

Next we will consider a reference to a JavaScript function:


<a href="javascript:MyFunction(4,5,6);">

The following Substitutescript command:


Substitutescript .*:MyFunction\((\d+),(\d+),(\d+)\) '/somepage.jsp?Par1=$1&Par2=$2&Par3=$3'

will transform the function call into the following link which will then be validated/processed by LinkScan.


/somepage.jsp?Par1=4&Par2=5&Par3=6
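The same transformation can be checked in Python. This is a sketch that assumes the expression is matched from the start of the href value. The function name script_to_link is hypothetical.

```python
import re

PATTERN = r".*:MyFunction\((\d+),(\d+),(\d+)\)"


def script_to_link(href):
    """Turn a MyFunction(a,b,c) call into the equivalent page link."""
    match = re.match(PATTERN, href)
    if not match:
        return None
    return "/somepage.jsp?Par1={}&Par2={}&Par3={}".format(*match.groups())


print(script_to_link("javascript:MyFunction(4,5,6);"))
```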

Synthesizing Additional Links

The Substitute commands may be used to modify existing links on-the-fly. However, a variation of this, the Insertlink command, may be used to insert additional links into specified documents in order to achieve a specific test coverage. Again, it is best illustrated by example:


Insertlink .*complex\.jsp\?.*SPVAR= -
Insertlink (.*complex\.jsp\?.*) /$1&ALTMODE=1 +

As each document is scanned, LinkScan will process all Insertlink commands (in the order specified). The URL of the scanned document is matched against the first parameter of each Insertlink command. In the case of the first command above, a link to:

complex.jsp?VAR=1&SPVAR=2

will match the expression and LinkScan will abort all Insertlink processing for this document (signified by the minus character).

However, a link to:

complex.jsp?VAR=1

does not match the expression. Processing will continue to the second command. This does match the expression and LinkScan will insert a link into this document (signified by the plus character). Hence, when LinkScan processes:

complex.jsp?VAR=1

It will insert into that document, the following link:

complex.jsp?VAR=1&ALTMODE=1
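The ordered processing of Insertlink commands can be modelled in Python as below. This is a sketch under assumptions: the function name inserted_links is hypothetical, expressions are assumed to be matched from the start of the document URL, and processing is assumed to continue to later commands after a '+' match. The generated link keeps the leading '/' from the second command's template.

```python
import re

# (expression, template, action) in the order the commands are listed
RULES = [
    (r".*complex\.jsp\?.*SPVAR=", None, "-"),
    (r"(.*complex\.jsp\?.*)", r"/\g<1>&ALTMODE=1", "+"),
]


def inserted_links(document_url, rules=RULES):
    """Return the extra links that would be inserted for a document."""
    links = []
    for expression, template, action in rules:
        match = re.match(expression, document_url)
        if not match:
            continue
        if action == "-":
            break  # abort all Insertlink processing for this document
        links.append(match.expand(template))
    return links


print(inserted_links("complex.jsp?VAR=1&SPVAR=2"))
print(inserted_links("complex.jsp?VAR=1"))
```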

Hint: Clearly, the Substitute command requires a good working knowledge of Perl Regular Expressions. If you need assistance, the LinkScan engineers will be happy to help. Please write to linkscan@elsop.com describing, in as much detail as possible, the transformations you are seeking to achieve.

11.7 How to emulate different browser types

Most web browsers advertise their identity by including a User-Agent header with every request that they make. LinkScan also sends a User-Agent header. For example, the versions of Netscape Navigator, Microsoft Internet Explorer and LinkScan installed on the writer's computer send, respectively:


User-Agent: Mozilla/4.08 [en] (WinNT; I ;Nav)
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
User-Agent: LinkScan Enterprise/12.1 Windows

Some websites are constructed in a manner that is browser sensitive. They may, for example, deliver customized pages depending on the user's browser type. Hence LinkScan may be customized to emulate different browser types using the Extraheader command:


Syntax:

Extraheader literal-header-string

Example:

Extraheader User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

In this example, LinkScan will advertise itself as Microsoft Internet Explorer version 5.5 running under Windows 2000.

In fact, the Extraheader command may be used to add any arbitrary HTTP headers to every request that LinkScan sends. A common application involves those servers which look for a language preference in the HTTP headers in order to deliver pages in the appropriate language. For example, the following command instructs LinkScan to include an English Language preference header with each request:


Extraheader Accept-Language: en

11.8 How to remap different hosts

Sometimes a single website may contain links such as:


http://www.example.com/
http://www2.example.com/

where www.example.com and www2.example.com resolve to the same host IP address. However, LinkScan would consider www2.example.com to be an External Link and not part of the www.example.com Project. Hence the Hostalias command may be used to assign more than one name to the current server. Syntax and example:


Syntax:

Hostalias from-server-url to-server-url

Example:

Hostalias http://www2.example.com/  http://www.example.com/

A similar issue arises when scanning development or staging servers. For example, you may wish to scan the site:


http://staging.example.com/

but the site may contain one or more absolute links to http://www.example.com/. In this case, you can use the Mirrorurl command.


Syntax:

Mirrorurl absolute-url

Example:

Homeurl = http://www.example.com/
Mirrorurl = http://staging.example.com/

In this case, LinkScan will resolve all links as if it were scanning http://www.example.com/. However, all actual HTTP requests will be directed to http://staging.example.com/. This provides a convenient mechanism for scanning development and staging copies of a production website.
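The split between resolving links against Homeurl and sending requests to Mirrorurl can be sketched in Python. This is an illustration of the idea only. The function name request_url is hypothetical, and the simple prefix replacement is an assumption about how the mapping might work.

```python
from urllib.parse import urljoin

HOMEURL = "http://www.example.com/"
MIRRORURL = "http://staging.example.com/"


def request_url(link, base):
    """Resolve a link as if on Homeurl, then redirect it to Mirrorurl."""
    resolved = urljoin(base, link)
    if resolved.startswith(HOMEURL):
        return MIRRORURL + resolved[len(HOMEURL):]
    return resolved  # external links are requested as-is


print(request_url("/about.html", "http://www.example.com/index.html"))
print(request_url("http://www.example.com/news/", "http://www.example.com/index.html"))
```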

11.9 How to assign documents to Owners

You may define the ownership of any given document or file in one of several ways. Ownership directives are evaluated in the order specified with the last match taking precedence. Note that the file ownership attribute is case sensitive.

  1. By the Unix File System ownership attribute. Note: this is not supported on Windows systems

  2. By the Defaultowner command. The syntax for the Defaultowner command is:

    Defaultowner owner-name

  3. By pattern matching with one or more Owner commands. The syntax for the Owner command is:

    Owner relative-path-expression owner-name
    OR
    Ownerq relative-path-expression owner-name

    The Owner command operates on the pathname portion of the URL and does not process any query string (following a "?" character). The Ownerq command operates on the entire URL including any query string.

    LinkScan also supports a special variation of the Owner command. This will automatically assign every file an owner-name based on the name of the directory in which it resides. The syntax is:

    Owner *integer

    The default setting (Owner *1) will assign each document to an Owner based on the top-level directory name (i.e. under "www root"). A setting of Owner *2 will cause LinkScan to assign Ownership based on the first two directory names. For example:

    http://www.example.com/first/second/third/index.html

    Will be assigned to the Owner first_second.

  4. By using preexisting META tags in your HTML documents. For example, if your existing documents already contain tags of the form:

    <META NAME="CONTENT_OWNER" CONTENT="Malcolm Hoar">

    You may set the Owner to 'Malcolm Hoar' by configuring a suitable pattern. e.g.:

    Ownertags = ^meta\s+name\s*=\s*"content_owner"\s+content\s*=\s*"([^"]+)

  5. Finally, once an Owner has been assigned to the file or document, you may manipulate the Owner string with a simple pattern substitution:

    Owneralias .*?([a-zA-Z0-9]+)[\s\.\)]*$ \L$1

    This example would take the string 'Malcolm Hoar' and convert the ownership to 'hoar'. This technique may be used to deal with synonyms such as 'M. Hoar.', 'Malcolm C Hoar '.
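Two of the rules above, the directory-based Owner *integer setting and the Owneralias substitution, can be illustrated in Python. This is a sketch rather than LinkScan's implementation: the function names owner_from_dirs and apply_owner_alias are hypothetical, and str.lower() stands in for the \L modifier.

```python
import re
from urllib.parse import urlsplit


def owner_from_dirs(url, depth):
    """Join the first `depth` directory names with underscores."""
    directories = urlsplit(url).path.split("/")[1:-1]  # drop '' and the file name
    return "_".join(directories[:depth])


def apply_owner_alias(owner):
    """Reduce an owner string to its last word, lower-cased."""
    match = re.match(r".*?([a-zA-Z0-9]+)[\s\.\)]*$", owner)
    return match.group(1).lower() if match else owner


print(owner_from_dirs("http://www.example.com/first/second/third/index.html", 2))
print(apply_owner_alias("Malcolm Hoar"))
```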


Example:

Defaultowner elsop         # Set default
Owner *1                   # Assign Owner based on top level dir ...
Owner wrc/humor/ humor     # But, make this subdir look like top-level
Owner .*\.cgi$ webmaster   # And give all *.cgi files to webmaster

When using LinkScan Dispatch to create reports for delivery by Electronic mail, you may define associations between Owners and Addresses with the Mailalias command. The syntax is:

Mailalias expression list-of-addresses

list-of-addresses may be a comma separated list of addressees if you wish to distribute the report to multiple recipients. Use Mailalias owner-name null to skip a specific Owner.


Example:

Defaultowner elsop         # Set default
Owner *1                   # Assign Owner based on top level dir ...
Owner wrc/humor/ humor     # But, make this subdir look like top-level
Owner .*\.cgi$ webmaster   # And give all *.cgi files to webmaster

Mailalias elsop            malch@elsop.com, ken@elsop.com
Mailalias links            ken@elsop.com
Mailalias linkscan         malch@elsop.com
Mailalias wrc              ken@elsop.com
Mailalias humor            ken@elsop.com
Mailalias test             null

If no Mailaliases are defined, Dispatch will address the reports to Ownername @ Mailhost.

11.10 How to process additional per-document data

Facilities are provided to extract additional data from each document scanned, store those data in the LinkScan database and create various reports. The additional data are typically collected from the META tags in each HTML document.

Supported commands are provided for data extraction, substitution/manipulation and formatting:


# Userdata [123] match-expression expression
# Userdatafmt [123] [DHLTX] integer[LRC] caption
# D=date; H=hot links; L=link; T=truncate to format; X=normal
# Userdatasub [123] expression expression

The following example illustrates the use of these commands to extract and process an employee badge number from document META tags:


Userdata 1 (?i)<meta\s[^>]*employee\s*=\s*"\s*(#?\d+)\s*" $1
Userdatasub 1 #?(\d+) $1
Userdatafmt 1 X 6R Badge-Number

In the above example, we use the first of the three available userdata fields. The first command extracts the badge number from the document META tag. The second command performs a substitution on the matched data to remove an optional pound symbol from the badge number. The third command defines the formatting attributes; X defines a simple text field; 6R specifies a six-character, right-adjusted layout and Badge-Number defines a simple caption.
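The three steps (extract, substitute, format) can be walked through in Python. This is a sketch only: the function name badge_field and the sample META tag are hypothetical, and rjust(6) is used to approximate the 6R layout.

```python
import re


def badge_field(html):
    """Extract a badge number, strip any '#', right-adjust to 6 columns."""
    # Userdata: capture the employee value from the META tag.
    match = re.search(r'(?i)<meta\s[^>]*employee\s*=\s*"\s*(#?\d+)\s*"', html)
    if not match:
        return None
    # Userdatasub: remove the optional pound symbol.
    value = re.sub(r"#?(\d+)", r"\g<1>", match.group(1))
    # Userdatafmt 6R: six characters wide, right-adjusted.
    return value.rjust(6)


print(repr(badge_field('<META NAME="info" EMPLOYEE="#1234">')))
```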

During the course of the scan, the employee badge numbers are extracted from each document and stored in the LinkScan database. In fact, the userdata fields are stored in a separate file:


PATH-TO-LINKSCAN/Project-name/data/linkscan.usr

This means that it is relatively simple to post-process the data before creating reports. For example, in this case, one might translate the badge numbers to employee names via a lookup on an employee database. The linkscan.usr file is a simple ASCII file with <Control-G> field delimiters.
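As an illustration of such post-processing, the Python sketch below splits Control-G delimited records and replaces a badge number with a name. The column layout of linkscan.usr is not described in this section, so the field position, the sample record and the lookup table are all assumptions made for the example.

```python
DELIMITER = "\x07"   # Control-G (ASCII BEL), the field delimiter named above


def translate_field(lines, column, lookup):
    """Yield each record with the value in `column` mapped through `lookup`.

    `column` is a zero-based field index. Which index holds a given
    userdata value is an assumption; check it against your own file.
    """
    for line in lines:
        fields = line.rstrip("\n").split(DELIMITER)
        if column < len(fields):
            fields[column] = lookup.get(fields[column], fields[column])
        yield DELIMITER.join(fields)


# Hypothetical record: document path followed by one userdata field.
records = ["staff/jones.html\x074711\n"]
employees = {"4711": "A. Jones"}          # stand-in for an employee database
print(list(translate_field(records, 1, employees)))
```

Values with no entry in the lookup table are passed through unchanged, so an incomplete employee list does not lose data.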

The final data may be searched/viewed using the Search Documents Report and/or Changed Document Report.

11.11 How to control the testing of external links

LinkScan includes the capability to maintain a History File containing the date/time tested and status of all external links. This feature may be enabled and controlled via various settings in linkscan.sys.

A Site History Report, available from the main LinkScan Reports Menu, may be used to examine the historic behavior of doubtful links.

Once enabled, the LinkScan History file may be used to avoid testing links to remote servers with an excessive frequency. Appropriate use of the following controls will help ensure that you do not impose unnecessary loads on the network or the remote servers your links access. This feature enables you to be a responsible user of the network. But equally important, it can significantly speed up the testing of large projects. Note: The Site History Feature must be enabled (Maxhist > 0) for these settings to be effective:

Masterhist: Normally, LinkScan will maintain a History file on a per-Project basis. Enabling this feature will force LinkScan to maintain a single History file (in the LinkScan directory) for all Projects. Concurrency control is provided to ensure that the file is not damaged when scanning two or more Projects simultaneously.
[Default: Masterhist = 0 (Disabled) ]

Maxhist: The maximum number of entries maintained in the History File for each external link.
[Default: Maxhist = 0 (Disabled) ]

Maxgoodhours: The maximum number of hours between attempts to retest good external links. The scanning of URLs that have been checked within the specified period is skipped and the LinkScan Reports display the Status Code from the prior test.
[Default: Maxgoodhours = 0 (Disabled) ]

Maxbadhours: The maximum number of hours between attempts to retest bad external links. The scanning of URLs that have been checked within the specified period is skipped and the LinkScan Reports display the Status Code from the prior test.
[Default: Maxbadhours = 0 (Disabled) ]
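
The effect of Maxgoodhours and Maxbadhours can be pictured as an age test on the most recent History entry for a link. The sketch below is a Python illustration of that reading of the settings, not LinkScan's actual code. The function and parameter names are invented for the example, and it assumes that a value of 0 means the link is always retested.

```python
from datetime import datetime, timedelta


def needs_retest(last_tested, last_was_good, now, maxgoodhours=0, maxbadhours=0):
    """Return True if an external link should be tested again.

    Assumption: a limit of 0 (the documented default) disables skipping.
    """
    limit = maxgoodhours if last_was_good else maxbadhours
    if limit <= 0 or last_tested is None:
        return True
    return now - last_tested >= timedelta(hours=limit)


now = datetime(2010, 6, 1, 12, 0)
checked = now - timedelta(hours=5)
print(needs_retest(checked, True, now, maxgoodhours=24))   # good link, tested recently
print(needs_retest(checked, False, now, maxbadhours=4))    # bad link, limit has passed
```

Choosing a larger Maxgoodhours than Maxbadhours retests broken links more often than working ones, which is usually what a site maintainer wants.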

In addition, the following options are available via linkscan.cfg:

Noexternal: Disable the checking of all External links.
[Default: Noexternal = 0 (Disabled) ]

Fetchext: Fetch the document bodies when checking External links. Enabling this option incurs a significant performance and bandwidth overhead. Typically, it is only used in conjunction with the LinkScan Profiler which will enable Fetchext automatically when required.
[Default: Fetchext = 0 (Disabled) ]

Followext: Follow all HTTP redirections.
[Default: Followext = 1 (Enabled) ]

Maxdns: Limit the total number of failed DNS lookups performed on a given hostname. After more than Maxdns failed lookups on the same host, all subsequent links to that host are assumed to be bad. This avoids an excessive number of timeouts while trying to resolve the same hostname.
[Default: Maxdns = 3 ]

Retryext: When enabled, LinkScan will track all External links that appear to fail due to network related errors (e.g. DNS, connect and timeout errors). These links will be retested at the end of the scan. This tends to reduce the number of transient errors reported but the scan may require a little more time to complete.
[Default: Retryext = 0 (Disabled) ]

Showredirext: Enable this option when you want LinkScan to warn/report on redirections and store the status of the final (redirected) link.
[Default: Showredirext = 0 (Disabled) ]
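
The Maxdns limit described above amounts to a per-host failure counter. The following Python sketch shows one way such a counter could behave. It is not taken from LinkScan: the class and method names are hypothetical, and "more than Maxdns failed lookups" is read as a count strictly greater than Maxdns.

```python
from collections import Counter


class DnsFailureTracker:
    """Count failed lookups per hostname and stop retrying past a limit."""

    def __init__(self, maxdns=3):
        self.maxdns = maxdns
        self.failures = Counter()

    def record_failure(self, host):
        self.failures[host.lower()] += 1

    def assume_bad(self, host):
        # "After more than Maxdns failed lookups" is read here as count > Maxdns.
        return self.failures[host.lower()] > self.maxdns


tracker = DnsFailureTracker(maxdns=3)
for _ in range(4):
    tracker.record_failure("gone.example.com")
print(tracker.assume_bad("gone.example.com"), tracker.assume_bad("www.example.com"))
```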

How to control the hits on any one server

You may also control the number of hits per server with the following commands in linkscan.sys.

Maxservertries: The maximum number of links that should be tested on any given server when that server is apparently "dead". Once this limit is exceeded, all other links to that server are skipped and assigned a URL Skipped - Bad Server (801) Status Code.
[Default: Maxservertries = 25 ]

Maxftp: The maximum number of links to any single FTP server that should be validated. Once this limit is exceeded, all other FTP links to that server are skipped and assigned a URL Skipped - FTP Limit (802) Status Code.
[Default: Maxftp = 25 ]

FTPUser and FTPPass: Define the username and password that LinkScan will use when validating links to FTP sites.
[Default: FTPUser = anonymous; FTPPass = me@example.com ]

Active Validation of mailto: Links

In a default configuration, LinkScan performs a simple syntax check on mailto: links. Active checking of mailto: links may be configured -- LinkScan uses our Mailvet™ technology to contact the mail servers associated with the specified address and attempts to establish the validity of the address without actually sending a message. To enable this feature:

  1. Ensure the Perl Module Net::DNS is installed on your computer. The Net::DNS Module is available from http://www.net-dns.org/
  2. Configure the Hostname setting in linkscan.sys. This value is used for the SMTP HELO message and, for maximum accuracy, should match the Reverse DNS hostname of your computer. If your computer does not have a Reverse DNS entry, some mail servers configured with anti-SPAM measures may produce false errors.
  3. Configure the Mailfrom setting in linkscan.sys. This value is used for the SMTP MAIL FROM message and, for maximum accuracy, should be a valid (deliverable) return address.
  4. Set Checkmailto = 1 in linkscan.cfg.
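
To show where the Hostname and Mailfrom settings end up, the sketch below lists the commands a generic SMTP address probe would send. This illustrates the general technique only. How Mailvet actually conducts the conversation is not documented here, and the function name and sample values are hypothetical.

```python
def smtp_probe_commands(hostname, mailfrom, recipient):
    """Return the SMTP commands for a probe that stops before DATA.

    No message body is ever transmitted, so nothing is delivered.
    """
    return [
        f"HELO {hostname}",            # Hostname setting in linkscan.sys
        f"MAIL FROM:<{mailfrom}>",     # Mailfrom setting in linkscan.sys
        f"RCPT TO:<{recipient}>",      # address taken from the mailto: link
        "QUIT",
    ]


for command in smtp_probe_commands("scanner.example.com",
                                   "webmaster@example.com",
                                   "someone@example.org"):
    print(command)
```

The reply to the RCPT TO command is what indicates whether the remote server accepts the address, which is why a plausible HELO name and return address matter.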

On some systems, Net::DNS may not correctly identify the default name servers from your operating system configuration. If you encounter difficulties, please run the following test script:

perl ./utils/dns.pl

You may also configure DNS name server addresses in linkscan.sys by adding an entry such as:


Nameservers = 10.10.10.10, 10.10.10.20

11.12 Other miscellaneous customizations

This section deals with a few other miscellaneous commands:

LinkScan for Unix. Reference Manual. Section 12

Advanced, Custom and Command Line Reports

This Section covers:

  1. Customizing the appearance of LinkScan Menus and Reports
  2. Adding hyperlinks to other applications
  3. Mailing LinkScan reports from a browser
  4. Customizing the LinkScan SiteMap and TapMap
  5. Customizing the LinkScan Status Codes
  6. Creating Reports from the Command Line

12.1 Customizing the appearance of LinkScan Menus and Reports

You may change the appearance of the LinkScan Menus and Reports by creating one or more of the following header/footer files in the LinkScan installation directory:

The link*.* files are used when interactive reports are displayed or static reports are written to disk. The mail*.* files are used when the report is automatically sent via e-mail. The *.html files are used for HTML formatted reports and the *.txt files for plain ASCII text reports.

The *.html files may contain any valid HTML and they will be inserted at the top and bottom of each Menu and Report, respectively. The files linkhead.html and mailhead.html should include at least the following tags:


<html><head>
<title>Your title here</title>
</head><body><nobr>

There is no need to close out the <body> or <html> tags in linkfoot.html or mailfoot.html. LinkScan will always insert a Copyright notice and version stamp after the main body of the report and close out the document with </body></html>.

12.2 Adding hyperlinks to other applications

If the following optional directives are specified in linkscan.cfg, LinkScan will add [Edit] hyperlinks at various points throughout the reports:


Editlink = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT
Editdoc  = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT

The linking URL is constructed from the Editlink and Editdoc settings. Those settings may include the optional tokens !URL, !CAP or !STAT.

These tokens are replaced with %encoded strings containing:

In the case of Internal links (same scheme/host/port as Homeurl) the URL is relative. e.g.

http://foo/bar.cgi?Url=resume.html&Cap=My%20Resume&Status=200

In the case of External links, the URL is absolute. e.g.

http://foo/bar.cgi?Url=http://www.example.com/xyz%3F123&Cap=External=&Status=404
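
The token substitution can be sketched in a few lines of Python. This is an approximation for illustration: the function name is hypothetical, and the choice of which characters to leave unencoded in the URL (colon and slash) is an assumption inferred from the External link example above.

```python
from urllib.parse import quote


def expand_edit_link(template, url, caption, status):
    """Substitute the !URL, !CAP and !STAT tokens with percent-encoded values."""
    values = {
        "!URL": quote(url, safe=":/"),     # keep scheme and path separators readable
        "!CAP": quote(caption, safe=""),
        "!STAT": quote(str(status), safe=""),
    }
    for token, value in values.items():
        template = template.replace(token, value)
    return template


TEMPLATE = "http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT"
print(expand_edit_link(TEMPLATE, "resume.html", "My Resume", 200))
print(expand_edit_link(TEMPLATE, "http://www.example.com/xyz?123", "External", 404))
```

A receiving script such as the bar.cgi placeholder would decode the three parameters and, for example, open the named document in an editor.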

12.3 Mailing LinkScan reports from a browser

A user viewing any LinkScan report with a browser may send a copy of that report to any valid e-mail address.

To enable this feature, you must:

12.4 Customizing the LinkScan SiteMap and TapMap

LinkScan incorporates features that enable the automatic generation of customized, publication quality tables of contents for your Projects. Two types of Maps may be created:

When creating Maps based on Link Order, the presence of cross-links may distort the structure of the report in ways which you find undesirable. Therefore, LinkScan incorporates features that enable you to "manipulate" or override the LinkScan algorithm.

You may customize the structure and content of the SiteMap/TapMap with the following commands in the linkscan.cfg configuration file. Note that the Mapmove command only affects Maps based on Link Order (not the Maps based on Directory Structure).


Mapdefaulttitle [ string ] [ !PATH | !FILE ] [ string ]
Mapinclude relative-path-expression
Maphide relative-path-expression
Maptitle relative-path, Alternative Title
Mapmove relative-path, relative-path, position, [Alternative Title]

By default, all HTML type files are included on the SiteMap/TapMap. The Mapinclude and Maphide commands may be used to modify this behavior as illustrated in the following example:


Examples:

Mapdefaulttitle Pathname: !PATH; Filename: !FILE
Mapinclude .*
Maphide (?i).*\.(gif|jpg)$
Maphide first-doc.html#Top
Maptitle second-doc.html, An Alternative Title for second-doc.html
Mapmove third-doc.html, index.html, 5, Alternative Title

The above example will:

Note that the Mapinclude and Maphide commands accept Regular Expressions. The Mapdefaulttitle, Maptitle and Mapmove commands require exact values.
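
Because Mapinclude and Maphide take Regular Expressions, it can help to test candidate expressions against a few sample paths first. The Python sketch below does that under two assumptions that this section does not confirm: that a Maphide match takes precedence over a Mapinclude match, and that expressions are matched from the start of the relative path. The function name and sample paths are hypothetical.

```python
import re

# Expressions copied from the example above.
MAPINCLUDE = [r".*"]
MAPHIDE = [r"(?i).*\.(gif|jpg)$", r"first-doc.html#Top"]


def on_map(path, include=MAPINCLUDE, hide=MAPHIDE):
    """Return True if `path` would appear on the map under the stated assumptions."""
    included = any(re.match(rx, path) for rx in include)
    hidden = any(re.match(rx, path) for rx in hide)
    return included and not hidden


for path in ("index.html", "images/Logo.GIF", "first-doc.html#Top"):
    print(path, on_map(path))
```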

12.5 Customizing the LinkScan Status Codes

Each link validated by LinkScan is assigned a specific LinkScan Error or Status Code. And, every Status Code is associated with a Severity. You may customize the Severity associated with any Status Code by using the Statuscode command. The command syntax is:


Statuscode statuscode, severitycode

The following Severity codes are valid:

Code  Severity        Explanation
0     Unknown         LinkScan has not tested or was unable to test this link
1     Error           LinkScan found a hard error on this link
2     Possible Error  There may be a problem with this link. It should be retested at a later time
3     Warning         LinkScan found something unusual about this link. Manual inspection highly recommended
4     Advisory        This link is probably ok, but manual inspection recommended
5     No Error        This is a good link

Examples:

Statuscode = 301,3    # 301 (Moved Permanently) from Error to Warning
Statuscode = 7,4      #   7 (Orphaned HTML File) to Advisory
Statuscode = 8,4      #   8 (Orphaned non-HTML File) to Advisory

The above commands will downgrade all 301 status codes from Errors to Warnings, and all Orphaned Files from Warnings to Advisories.
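
A quick way to review a set of Statuscode overrides is to parse them into a table of status code to severity name. The Python sketch below does this for commands written in the "Statuscode = code,severity" form used in the examples. It is an illustration rather than LinkScan's parser, and the function name is hypothetical.

```python
SEVERITY_NAMES = {0: "Unknown", 1: "Error", 2: "Possible Error",
                  3: "Warning", 4: "Advisory", 5: "No Error"}


def parse_statuscode(line):
    """Parse 'Statuscode = code,severity  # comment' into a pair of ints."""
    body = line.split("#", 1)[0]                  # drop the trailing comment
    keyword, _, values = body.partition("=")
    if keyword.strip().lower() != "statuscode":
        raise ValueError(f"not a Statuscode command: {line!r}")
    code, severity = (int(part) for part in values.split(","))
    if severity not in SEVERITY_NAMES:
        raise ValueError(f"unknown severity {severity}")
    return code, severity


config = [
    "Statuscode = 301,3    # 301 (Moved Permanently) from Error to Warning",
    "Statuscode = 7,4      #   7 (Orphaned HTML File) to Advisory",
]
overrides = dict(parse_statuscode(line) for line in config)
print({code: SEVERITY_NAMES[sev] for code, sev in overrides.items()})
```

Rejecting severity values outside 0 to 5 catches the most likely typing mistake before the configuration is used in a scan.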

12.6 Creating Reports from the Command Line

Command line reports are provided to address the following requirements:

To enable command line reporting, you must create an environment variable called linkscan and set it to any non-null value. Depending on your system/shell the command is:

Unix users may wish to add the appropriate command to their .login or .cshrc files so that the environment variable is automatically initialized at each login.

When LinkScan Reports are generated via the normal browser-based interface, users select the type and style of report by completing and submitting normal HTML forms. Other techniques are required in order to make these selections from the command line interface and several options are provided:

  1. You may specify your selections in a configuration file. An example file with sensible defaults -- linkscan.rep -- is placed in each Project directory automatically.

  2. You may also select a specific report using the interactive browser-based interface and copy/paste the URL to the command line interface (since your selections are already embedded within the name-value pairs on the query string).

Simply execute the program linkscan.cgi and it will prompt you for some or all of the following parameters:

Alternatively, you may specify any or all of these parameters on the command line, as shown by the -help switch:

web:/usr/local/www/data/linkscan> perl linkscan.cgi -help

LinkScan Version 12.1
Copyright 1997-2010 Electronic Software Publishing Corporation

USAGE: linkscan  {-help} {-type type} {-project name} {-owner owner}
                 {-repfile file} {-query string} {-outfile path}
                 {-tty} {-mailto address} {-format n}

-help            Displays this message
-type type       Select report type
-project name    Specify a LinkScan Project
-owner owner     Specify a LinkScan Owner
-repfile file    Specify a filename with the reporting options
-query string    Specify all options in the form of an encoded URL
-outfile path    Specify an output filename
-tty             Output to terminal
-mailto address  Send report to email address
-format n        1=Full HTML; 2=HTML; 3=Plain; 4=text

Detailed Help [Y/N]:

Where the parameter to -type is one of:


Examples:

perl linkscan.cgi -type d -project default -outfile myreport.html

perl linkscan.cgi -query 

Also see the Sections of this Manual covering LinkScan Dispatch and LinkScan QuickCheck. Note there is no command-line interface to LinkScan TapMap due to its interactive nature.
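
For scheduled jobs it can be convenient to wrap the command line report in a small script that also sets the linkscan environment variable. The Python sketch below builds the first example command shown above and runs it with that variable set. The helper names are hypothetical, and the script assumes perl is on the PATH and that cwd points at the LinkScan installation directory.

```python
import os
import subprocess


def report_command(report_type, project, outfile):
    """Build the argument list for a command line report."""
    return ["perl", "linkscan.cgi", "-type", report_type,
            "-project", project, "-outfile", outfile]


def report_environment(base=None):
    """Copy the environment and set `linkscan` to a non-null value."""
    env = dict(os.environ if base is None else base)
    env["linkscan"] = "1"
    return env


def run_report(report_type, project, outfile, cwd="."):
    """Run linkscan.cgi from `cwd`, the LinkScan installation directory."""
    return subprocess.run(report_command(report_type, project, outfile),
                          cwd=cwd, env=report_environment(), check=True)


print(" ".join(report_command("d", "default", "myreport.html")))
```

Calling run_report("d", "default", "myreport.html", cwd=...) would then execute the report. Passing the arguments as a list avoids shell quoting problems with unusual Project names.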

LinkScan for Unix. Reference Manual. Section 13

LinkScan Enterprise/Unlimited Extensions

LinkScan Enterprise and LinkScan Unlimited incorporate the additional option to scan multiple hosts (or virtual hosts) within a single LinkScan Project. The following parameters must be configured in linkscan.cfg for each host:


Host1.URL    = http://www.example.com/
Host1.Short  = www:

Each host must be configured with a one or two digit number in the range 1 to 99. In this context, '1' and '01' are considered to be equivalent.

The URL setting specifies the URL of a specific host. The Short setting specifies an abbreviated form of the URL which is used to save real-estate on the various LinkScan Reports.

In addition, the following per-host parameters are optional:


Host1.Mirror = http://dev.example.com/
Host1.Nocase = 1
Host1.Path   = /usr/vhosts/devex/

The Path setting sets the File System root for this host. The Mirror setting specifies an alternate URL to be used for all HTTP requests. All tags are resolved using the URL setting but any physical HTTP requests are directed to the host specified by the Mirror setting (typically a development/staging server). The Nocase setting may be set to a positive integer to indicate that the specified host uses case insensitive pathnames (i.e. index.html and INDEX.HTML are considered identical).
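
The division of labor between the URL and Mirror settings, and the effect of Nocase, can be illustrated with the Python sketch below. It reflects one reading of the description above and is not LinkScan's implementation. Both function names are hypothetical.

```python
def request_url(url, host_url, mirror=None):
    """Return the URL actually fetched for a link resolved against `host_url`."""
    if mirror and url.startswith(host_url):
        return mirror + url[len(host_url):]
    return url


def same_document(path_a, path_b, nocase=0):
    """Compare two pathnames, ignoring case when Nocase is a positive integer."""
    if nocase > 0:
        return path_a.lower() == path_b.lower()
    return path_a == path_b


print(request_url("http://www.example.com/docs/a.html",
                  "http://www.example.com/", "http://dev.example.com/"))
print(same_document("index.html", "INDEX.HTML", nocase=1))
```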

In addition, when operating in multi-host mode, all of the LinkScan commands that normally include host-relative expressions, must be modified to use Absolute URLs. For example:

Exclude serverlogs/

Should be specified as:

Exclude http://www.example.com/serverlogs/

We can put all of this together with the following example:


# Hostalias -- maps all https: references back to http:
# Extrahome -- submits login form (?? selects POST method)
# Exclude   -- prevents premature logout
# Maxcgi    -- large value to test many query strings

Homeurl = http://www.example.com/
Host1.URL = http://www.example.com/
Host1.Short = www:
Host2.URL = http://app.example.com/
Host2.Short = app:

Hostalias https://www.example.com http://www.example.com
Hostalias https://app.example.com http://app.example.com
Extrahome = http://app.example.com/login??username=xxx&password=yyy
Exclude .*LOGOFF
Maxcgi = 5000

The behavior of the Owner *N command is automatically modified when scanning multiple hosts within a single Project. Ownership is assigned based on the Short name for that host and the top level directory name within that host. Hence, the document:

http://www.example.com/somedir/somefile.html

is assigned to Owner www:somedir.
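
The naming rule can be sketched in Python as follows. The host table mirrors the two-host example above. The function name is hypothetical, and the result for a document that sits directly in the host root (the Short name alone) is an assumption.

```python
from urllib.parse import urlsplit

HOSTS = {"http://www.example.com/": "www:",    # Host1.URL -> Host1.Short
         "http://app.example.com/": "app:"}    # Host2.URL -> Host2.Short


def owner_for(url, hosts=HOSTS):
    """Return Short name + top level directory, or None for unknown hosts."""
    for host_url, short in hosts.items():
        if url.startswith(host_url):
            path = urlsplit(url).path.lstrip("/")
            top_dir = path.split("/", 1)[0] if "/" in path else ""
            return short + top_dir
    return None


print(owner_for("http://www.example.com/somedir/somefile.html"))
```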

LinkScan for Unix. Reference Manual. Section 14

LinkScan Support

Technical Support is available via e-mail from Electronic Software Publishing Corporation at mailto:linkscan@elsop.com.

Also see the Support Section of our website at:

When contacting the LinkScan engineers, please try to provide as much of the following information as you can:

LinkScan for Unix. Reference Manual. Section 15

Known Problems and Limitations

LinkScan for Unix. Reference Manual. Section 16

LinkScan Dispatch

[Not available in LinkScan Workstation]

LinkScan Dispatch may be used to create specific reports for each Owner in a Project. The reports may be formatted in either plain text or HTML. They may be saved to disk as static files or dispatched via electronic mail to selected addresses. Before using LinkScan Dispatch you must:

  1. Configure the LinkScan to Email Interface if you wish to distribute any reports via email.

  2. Ensure that you have appropriate document Ownership rules defined. Note that, in a default configuration, LinkScan will create and assign Owners based on the top-level directory names immediately beneath the website root. See also How to assign documents to Owners.

  3. Ensure that you have configured Mailhost in linkscan.cfg. Note that, by default, e-mail reports are sent to Owner@Mailhost. Use the Mailalias command to map specific Owners to specific e-mail addresses. See How to assign documents to Owners.

  4. Successfully complete a scan of the selected website.

  5. Execute dispatch.pl to create the LinkScan Dispatch reports.

Note that LinkScan Dispatch supports the following command line options:

web:/usr/www/htdocs/linkscan> perl dispatch.pl -help     

LinkScan/Dispatch Version 12.1
Copyright 1997-2010 Electronic Software Publishing Corporation

USAGE: dispatch [{-help}] | [{-mail} {-test} {-project name}]
                [-type x {-repfile file} {-outfile file} {-format n}]

-help            Displays this message
-mail            Mails report to user versus storing in saved file
-project name    Specify project name
-test            Send mail to STDOUT -- no mail is sent
-type [xeskdbco] Select report type
-repfile file    Specify a filename with the reporting options
-outfile file    Output filename
-format n        1=Full HTML; 2=HTML; 3=Plain; 4=text
Report Types:
-type x = Project Summary Report
-type e = Problem Documents Report
-type s = Document Detail Report
-type k = Critical Errors Report
-type d = Detailed Errors Report
-type b = Changed Documents Report
-type c = Selected Status Codes Report
-type o = Orphaned Files Report

Detailed Help [Y/N]:

Examples


perl dispatch.pl -project myproj -type k -format 4 -mail

In the example above, Dispatch will create a Critical Errors Report for each Owner within Project myproj and deliver them via e-mail in TEXT format.

The following style of command-line options is also supported for compatibility with pre-9.0 versions of LinkScan/Dispatch.


perl dispatch.pl -project myproj -errors 4 -mail 

In the example above, Dispatch will create a Detailed Report for each Owner within Project myproj and deliver them via e-mail in TEXT format.

Adding Custom Headers/Footers to LinkScan Dispatch Reports

When creating Dispatch Reports in plain text format, the following files are automatically inserted into the header and footer of each report:


mailhead.txt
mailfoot.txt

When creating Dispatch Reports in HTML format, the following files are automatically inserted into the header and footer of each report:


mailhead.html
mailfoot.html

LinkScan for Unix. Reference Manual. Section 17

LinkScan Excel

LinkScan is shipped with a Microsoft Excel spreadsheet including some macros. This may be used to import portions of the LinkScan database into Excel for further analysis. The macros are compatible with the following versions of Microsoft Excel:

  1. Open the following file (or a copy of this file if you want to preserve a clean master version) in Microsoft Excel:

    Excel 97: C:\LinkScan10\utils\LinkScan97.xls

    Excel 2000 or later: C:\LinkScan10\utils\LinkScan.xls

  2. Select the Control Sheet and, if necessary, adjust the value of Cell C2. This Cell must contain the pathname to your LinkScan installation folder (e.g. C:\LinkScan10\).

  3. Select the first cell of an empty worksheet. Note that the LinkScan Import Macro always places the imported data starting at the currently selected cell of the current worksheet. Note that the Import Macro will not permit you to import data into the Control Sheet.

  4. Execute the macro LinkScanImport:

    Tools | Macro | Macros... | LinkScanImport | Run

    You may also bind this macro to an Excel Function Key, Menu Item and/or Toolbar.

  5. The LinkScan Macro will display a dialog that allows you to select a LinkScan Project and an Import Function:

    Excel Screenshot

  6. Depending on the Import Function selected, you may be presented with further options. Following confirmation, the selected data will be imported and you may use the full range of Excel features to manipulate the data.

  7. Note that the Control Sheet of the LinkScan.xls workbook is reserved. This spreadsheet is used to control the LinkScan macros. For each Import Function, the sheet defines:

    You may modify the Control Sheet to customize the column order and headings etc. However, care is required, since the macro performs very limited validation on those data values.

LinkScan for Unix. Reference Manual. Section 18

LinkScan Profiler

[Not available in LinkScan Workstation]

The LinkScan Profiler may be used to help identify pages that contain or link to "inappropriate" [1] content. The Profiler operates on a rule-based scoring system.

The profile.txt file in the main LinkScan directory defines the actual rules and associated scores. The default profile.txt file contains some minimal profiling criteria based on the Platform for Internet Content Selection (PICS) standard. Under this standard, many sites include self-ratings in their web pages via META tags. The LinkScan Profiler specifically supports the RSAC, ICRA and SafeSurf implementations. See the following References.

A much more comprehensive set of rules is available free of charge from Elsop. Since this implementation of the profile.txt file includes a significant amount of profane and offensive language, it is distributed separately once we receive satisfactory evidence of age verification and a waiver. To obtain a copy of this file, please send e-mail such as:

To: linkscan@elsop.com
From: myname@example.com
Subject: Profiler Request

Please send me a copy of the LinkScan Profiler rules.
I confirm that:

1. I am over 21 years old.

2. I understand that the LinkScan Profiler rules
   contain a significant quantity of profane and
   offensive language including explicit sexual
   depictions.

3. I understand and agree that the LinkScan Profiler
   rules are subject to the same License Agreement
   and restrictions of use as LinkScan itself.

4. I confirm that I will use the LinkScan Profiler
   rules only in conjunction with LinkScan and in
   accordance with the LinkScan License Agreement.
   I shall not re-distribute the Profiler rules to
   any other person or organization.

The message must be sent from a verifiable corporate Email address. Mail sent via semi-anonymous services such as yahoo.com, MSN and AOL is not acceptable. If necessary, we will contact you to make alternative arrangements but Elsop will not supply the LinkScan Profiler files until we are satisfied that the request is made by an adult and is legitimate.

Configuring the Profiler

In a typical configuration, you will need to add the following commands to the Project linkscan.cfg file. On Windows systems they are available via the Advanced Tab of the Project Planning Property Sheet:


Profiler = 2
Profilerlog = 1
Profilermax = 200

The Profiler command enables the LinkScan Profiler. Valid options are:

The Profilerlog command enables a detailed trace indicating exactly what profiling rules were triggered. The log is maintained in the file:

.../LinkScan/Projectname/data/linkscan.red

The Profilermax command sets the trigger threshold for the LinkScan Profiler. The default and recommended setting is 200. Reduce this to 100 to make the Profiler even more sensitive. Increase the value to 300 or more to reduce the sensitivity.

Note: When enabled, the Profiler will force the following settings:


Fetchext = 1
Followext = 1

The Followext command instructs LinkScan to follow redirections when validating the external links. This is the default setting. The Fetchext command instructs LinkScan to fetch the body of a document referenced via an external link. Normally, LinkScan seeks to validate external links without retrieving the document bodies. Fetching the bodies enables LinkScan to profile the content, but note that it significantly increases the amount of bandwidth and processing required.

Initially, we recommend you complete a full scan with the settings shown above (at the top of this document) and manually review the linkscan.red log file. We think you will find this informative. More importantly, you will be able to decide what threshold to use for subsequent check-ups and whether you want to enable/disable/modify any of the existing rules. Some users may want to whitelist all .gov sites for example.

At the end of the day, only you can decide what links are appropriate for your site and consistent with your editorial policies. Material that may be entirely appropriate for a current affairs website may also be highly undesirable for a site specifically intended for younger children.

Hence you may want/need to review the active rules in the profile.txt file.

Proxy Servers and Firewalls

When LinkScan is operated behind a Proxy Server or Firewall that implements content-based access control policies, be aware that the proxy/firewall will likely prevent LinkScan from accessing blocked sites. In this case, you will need to implement a Profiler rule that enables LinkScan to detect that access was denied.

The Bess proxy system is widely used by many schools and some Internet Service Providers. When access is denied, the Bess system typically adds a special HTTP header: Pragma: BESSBLOCK. The SonicWALL systems typically replace an offending page with a page that includes the phrase "Blocked By SonicWALL". The following header (H) and body (B) rules will detect those conditions:


H BESS-01    2000   pragma: bessblock
B SWALL-01   2000   blocked by sonicwall
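
The two rules above can be used to illustrate rule-based scoring in general. The Python sketch below parses the rule lines, adds up the scores of the rules that match a response, and compares the total with the Profilermax threshold. The real matching semantics of profile.txt are not specified in this section, so the sketch assumes case-insensitive substring matching and a trigger when the total reaches the threshold. All function names are hypothetical.

```python
def parse_rule(line):
    """Split 'TYPE ID SCORE pattern' into its four parts."""
    kind, rule_id, score, pattern = line.split(None, 3)
    return kind, rule_id, int(score), pattern.strip().lower()


def profile_score(rules, headers, body):
    """Sum the scores of every rule whose pattern occurs in its target text."""
    targets = {"H": headers.lower(), "B": body.lower()}
    return sum(score for kind, _, score, pattern in rules
               if pattern in targets.get(kind, ""))


RULES = [parse_rule("H BESS-01    2000   pragma: bessblock"),
         parse_rule("B SWALL-01   2000   blocked by sonicwall")]
PROFILERMAX = 200   # the recommended setting from "Configuring the Profiler"

score = profile_score(RULES, "Pragma: BESSBLOCK\r\n", "<html>Access denied</html>")
print(score, score >= PROFILERMAX)
```

A score of 2000 is far above any sensible Profilermax value, so a blocked page is always flagged regardless of the sensitivity chosen.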

References

Definition of Inappropriate

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it...

With apologies to:
Mr. Justice Stewart
United States Supreme Court
JACOBELLIS v. OHIO, 378 U.S. 184 (1964)

LinkScan for Unix. Reference Manual. Section 19

LinkScan QuickCheck

LinkScan QuickCheck serves two functions:

  1. It is invoked automatically via hyperlinks from some of the other LinkScan Reports to display a highly detailed report for a single document.

  2. It may be invoked directly from the main LinkScan Reports Menu and used to check (or recheck) a single document or link.

Each QuickCheck Report includes several items of information that are transparently integrated:

QuickCheck has a strong affinity for the LinkScan database. If the data are available in the database associated with the currently selected Project, QuickCheck will seek to ascertain the status of each link using the database and the status found during the last full scan. If this is not available, or the requested document lies outside the scope of the current Project, QuickCheck will perform a full link analysis on that document in real-time.

If QuickCheck has pulled the link status data from the database, the user may force a fresh, real-time scan of that document. This is useful when, for example, you want to recheck a single document after making changes to it. Simply use the Recheck Now option included on each Report.

HTML Syntax Checking

By default, LinkScan QuickCheck will invoke the Weblint program to check for any HTML syntax errors. Weblint validates against the HTML 3.2 specifications.

QuickCheck includes a mechanism that permits integration with other HTML validators and the OpenSP program in particular. The OpenSP program permits validation against any SGML Document Type Definition (DTD). For more on OpenSP, see http://sourceforge.net/projects/openjade/.

LinkScan for Windows includes a copy of the OpenSP program together with a small number of DTDs including HTML 3.2, HTML 4.01 and XHTML 1.0. Unix users must download the OpenSP source code from the above URL and compile it. Additional DTDs are available from many public sources such as the World Wide Web Consortium. One large (but not terribly well organized) collection is known as sgml-lib.

To enable OpenSP, simply add the following commands to the linkscan.sys file, adjusting the file system pathnames as appropriate:

Windows Systems

Checkerpath = C:/LinkScan10/OpenSP/onsgmls.exe
Checkeroptions = onsgmls -s -c C:/LinkScan10/SGML/catalog
Checkerformat = ^.*?:(\d+):\d+:(?:E:)?\s*(.*)

Unix Systems

Checkerpath = /usr/local/bin/onsgmls
Checkeroptions = -s; -c; /usr/local/SGML/catalog
Checkerformat = ^.*?:(\d+):\d+:(?:E:)?\s*(.*)

Note: the Checkeroptions directive may also be overridden on a per-Project basis by inserting a command in the Project linkscan.cfg file. This enables users to use different options and SGML catalogs with different LinkScan Projects.

The Checkerformat command should not normally be changed. It is used to control the parsing of the checker program output. The Perl Regular Expression places line numbers into $1 and the error message into $2.
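
To see what the expression captures, the Python sketch below applies it to a sample message line. The sample is illustrative and follows the program:file:line:column:type: message layout that onsgmls normally uses. The function name is hypothetical, and Python's re module is assumed to treat this expression the same way Perl does.

```python
import re

# Expression copied from the Checkerformat setting above.
CHECKERFORMAT = re.compile(r'^.*?:(\d+):\d+:(?:E:)?\s*(.*)')


def parse_checker_line(line):
    """Return (line_number, message) or None if the line does not match."""
    match = CHECKERFORMAT.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Illustrative line in the program:file:line:column:type: message layout.
sample = 'onsgmls:index.html:12:5:E: end tag for "P" omitted'
print(parse_checker_line(sample))
```

If a different validator is substituted for OpenSP, a test like this is a quick way to confirm that a revised expression still yields a line number and a message.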

Additional Background

We also found the following references provided valuable primers on some of the applicable SGML/XML concepts, and the organization of a suitable catalog configuration file in particular:

Some Solaris users have reported difficulty building OpenSP from sources. James Clark's SP program will likely prove easier to build. As a precursor to OpenSP, it is largely plug-compatible. However, there is one significant limitation: SP does not support DTDDECL directives in the catalog.

LinkScan QuickCheck Command Line Interface

You may also run LinkScan QuickCheck from the command line in exactly the same manner as the linkscan.cgi program, as shown below:

web:/usr/www/htdocs/linkscan> perl quick.cgi -help        

LinkScan/QuickCheck Version 12.1
Copyright 1997-2010 Electronic Software Publishing Corporation

USAGE: quick.cgi {-help} {-url URL} {-project name}
                 {-repfile file} {-outfile path} {-tty}
                 {-mailto address} {-format n} {-now} {-http}

-help            Displays this message
-url URL         Specify the URL to be scanned
-project name    Specify a Project. Equivalent to -site
-repfile file    Specify a filename with the reporting options
-outfile path    Specify an output filename
-tty             Output to terminal
-mailto address  Send report to email address
-format n        1=Full HTML; 2=HTML; 3=Plain; 4=text
-now             Perform real-time check
-http            Force HTTP Access

Detailed Help [Y/N]:

Example:

perl quick.cgi -project default -url http://www.example.com/index.html -tty

The above example will run QuickCheck against http://www.example.com/index.html, reading the options from linkscan.rep and displaying the results on the terminal.
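
When QuickCheck is driven from another script, it can help to assemble the argument list programmatically. The following Python sketch uses only options that appear in the usage text above. The function name and its defaults are hypothetical, and the sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_quickcheck_command(url, project="default", tty=True, mailto=None):
    """Assemble an argument list using only options from the usage text."""
    cmd = ["perl", "quick.cgi", "-project", project, "-url", url]
    if tty:
        cmd.append("-tty")  # display the results on the terminal
    if mailto:
        cmd.extend(["-mailto", mailto])  # send the report by e-mail
    return cmd


cmd = build_quickcheck_command("http://www.example.com/index.html")
print(shlex.join(cmd))
```

Keeping the command as a list, rather than a single string, avoids shell quoting problems if it is later passed to a process launcher.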

LinkScan for Unix. Reference Manual. Section 20

LinkScan Recorder

Introduction

The LinkScan™ Recorder is a Windows feature that fully integrates with LinkScan and Microsoft Internet Explorer. [Unix users see below].

The Recorder may be used to capture real web browsing sessions, such as a complex order entry sequence. The captured recording includes all of the data entered into any associated forms. LinkScan may then be configured to replay the recording on demand, validating every link on each form and results page in the sequence.

Hence LinkScan and the LinkScan Recorder provide powerful and convenient capabilities for the rapid and comprehensive regression testing of complex transaction-based systems.

Applications

The principal applications of the LinkScan Recorder are:

  1. To capture user-sequences, such as an on-line shopping or purchase procedure. These are typically complex sequences that are time-consuming to test regularly and comprehensively. They also tend to be some of the most important pages on a website or Intranet application.

    Once a sequence has been recorded, you may use the LinkScan Recorder to replay it and display the results in an Internet Explorer Window. More importantly, LinkScan may be configured to automatically replay the same steps and validate every link on each page in the sequence.

  2. To capture cookies and pre-load those values into LinkScan's internal cookie jar at the commencement of a scan. This may be used to achieve user authentication or other effects. Note however, that you may need to capture new values before each scan if the cookies are session-based and/or have some built-in expiration.

  3. To capture special URL's that are used to define the start of a site scan. This is typically required when the site uses a login page and cookie arrangement for access control.

    Note: forms-based login procedures are completely different from HTTP authentication schemes. In the first case, users fill out a regular HTML form. In the latter case, the user's browser presents an authentication challenge within a pop-up dialog box.

Using the LinkScan Recorder

Access the LinkScan Recorder by selecting the Recorder Tab on the main LinkScan Window. The LinkScan Recorder panel looks like this:

[Figure: LinkScan Recorder panel]

The upper half of the interface displays the links associated with the current recording together with a number of simple command buttons:

The lower half of the interface displays the cookies associated with the current selected link. The following buttons are available:

Note also that the LinkScan Browser Panel displays a button to indicate whether the LinkScan recorder is currently active (i.e. recording). Press the button to pause/restart the current recording session.

Saving a Recording to Disk

Once you have completed a recording, use the Save button to write the recording to disk. The Save (and Load) dialogs offer several options:

In all cases, the saved data are stored in plain ASCII text and may be edited using Windows Notepad or any other similar program.

Using a Saved Recording as Part of a Scan

You may turn this feature on and off by opening the Project Planning property sheet and selecting the Login Tab.

LinkScan Recorder and Unix Systems

The LinkScan Recorder is a Microsoft Windows application and does not run on Unix systems. A special distribution that permits LinkScan/Unix clients to install the Recorder on a Windows workstation is in preparation but not available at the time of writing. Email <linkscan@elsop.com> for the latest status and to request a copy when available.

Special Considerations

The following points are worthy of note and consideration:

  1. The data captured by the LinkScan Recorder includes POSTED form values that are normally invisible/hidden. The name-value pairs are represented using the special LinkScan URL convention based on the double question-mark. Hence forms utilizing the GET method are represented in the normal manner, for example:

    http://www.example.com/form.cgi?Name=John%20Doe&Country=USA

    Whereas, forms utilizing the POST method are represented thus:

    http://www.example.com/form.cgi??Name=John%20Doe&Country=USA
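
The double question-mark convention can be decoded mechanically. The Python sketch below splits a recorded URL into a method, an action URL and the decoded name-value pairs. The function name is hypothetical, and treating the first `??` as the separator is an assumption based only on the two examples above.

```python
from urllib.parse import parse_qsl


def split_recorded_url(url):
    """Return (method, action_url, fields) for a recorded URL."""
    if "??" in url:
        # Double question-mark: the pairs were sent with the POST method.
        action, _, query = url.partition("??")
        method = "POST"
    elif "?" in url:
        # Single question-mark: an ordinary GET query string.
        action, _, query = url.partition("?")
        method = "GET"
    else:
        return ("GET", url, [])
    return (method, action, parse_qsl(query, keep_blank_values=True))


get_form = split_recorded_url(
    "http://www.example.com/form.cgi?Name=John%20Doe&Country=USA")
post_form = split_recorded_url(
    "http://www.example.com/form.cgi??Name=John%20Doe&Country=USA")
```

Both calls yield the same action URL and fields; only the method differs.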

LinkScan for Unix. Reference Manual. Section 21

LinkScan TapMap

This hyperlink activates the LinkScan TapMap - an interactive and highly dynamic variation of the LinkScan SiteMap. TapMap is an expandable and collapsible SiteMap that allows viewers to tap down through the various levels of a website to easily navigate and explore the website by clicking on a few control icons.

See TapMap Overview and Legend for a brief description of the TapMap control icons.

LinkScan for Unix. Reference Manual. Section 22

LinkScan WebServer

The LinkScan WebServer is a small, easy-to-configure, HTTP compliant webserver. It enables interactive query and reporting capabilities from the LinkScan database via a standard web browser interface. The LinkScan WebServer supports a surprisingly large number of features found in more complex products, but with the emphasis on simplicity. Features include:

The LinkScan WebServer operates on Windows Systems only. Unix users should see LinkScan and Various Web Servers.

Installation and Operation

The LinkScan WebServer is installed and configured automatically when you install LinkScan on a Windows System.

Additional configuration options are available via the LinkScan System Options Property Sheet.

LinkScan for Unix. Reference Manual. Section 23

LinkScan Pinger

LinkScan Pinger is a small self-contained utility that may be used to periodically check a list of URL's and raise e-mail alarms if certain error conditions arise.

On each pass, the LinkScan Pinger will access each of the supplied URL's and log the results to a simple text file. Optionally, it may be configured to send e-mail alarms to one or more addresses if certain error thresholds are exceeded. In addition to generating alarms based on link status, the LinkScan Pinger may also be configured such that the document body for a given URL *must contain* (or must not contain) a specific string/expression.

This means the Pinger may be used to ensure the availability of back-end databases and other services as well as the uptime of the basic network/webserver functions.

In order to use the LinkScan Pinger you must:

We have designed the LinkScan Pinger configuration file to be extremely simple yet flexible. In many cases, it is only necessary to enter a list of URL's to be checked. Optionally, an email address (or comma-separated list of addresses) may be entered if alarm messages are to be generated.

# Pinglog      = Pinger log file
# Pingsecs     = Interval (seconds) between "pings" (perl pinger.pl -repeat)
# Probe        = Diagnostic trace; record HTTP headers in Pinglog
# Followext    = Follow redirections

Pinglog = pinger.log
Pingsecs = 600
Probe = 0
Followext = 0

# Pingmail     = E-mail address (comma-separated list) to receive alarm messages
# Pingsubj     = Subject line for e-mail alarm messages
# Pingsev      = Establish alarm thresholds

Pingmail = 
Pingsubj = LinkScan Pinger Alarm
Pingsev = 0,1    # One or more Status Unknown
Pingsev = 1,1    # One or more Errors
Pingsev = 2,1    # One or more Possible Errors
Pingsev = 3,2    # Two or more Warnings
Pingsev = 4,2    # Two or more Advisories

# Url          = Links to be "pinged" on each pass
# Url          = absolute-url [must-contain-expr  must-not-contain-expr]
# URL's may be followed by one or two optional Regular Expressions
# These are matched against the document body. In the following example
# the page returned from http://www.yahoo.com/ must match the string "Yahoo".
# And it must not match the expression "not\sfound"
#
# Url = http://www.yahoo.com/ Yahoo not\sfound
# Url = http://www.google.com/

Url = 
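
To make the Url and Pingsev entries more concrete, the following Python sketch shows one way the two optional expressions and the severity thresholds could be evaluated. It is an illustration only: the function names are hypothetical, Python's regular expression syntax differs from Perl's in some details, and the rule that an alarm is raised as soon as any single threshold is reached is an assumption drawn from the comments in the sample file.

```python
import re


def parse_url_line(value):
    """Split 'absolute-url [must-contain-expr must-not-contain-expr]'."""
    parts = value.split()
    url = parts[0]
    must = parts[1] if len(parts) > 1 else None
    must_not = parts[2] if len(parts) > 2 else None
    return url, must, must_not


def body_ok(body, must, must_not):
    """True when the body matches `must` (if given) and not `must_not`."""
    if must is not None and not re.search(must, body):
        return False
    if must_not is not None and re.search(must_not, body):
        return False
    return True


def alarm_needed(counts, thresholds):
    """Both arguments map a severity number to an integer count."""
    return any(counts.get(sev, 0) >= limit
               for sev, limit in thresholds.items())


url, must, must_not = parse_url_line(r"http://www.yahoo.com/ Yahoo not\sfound")

# Severity -> minimum count, copied from the Pingsev lines above.
thresholds = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2}
```

With these thresholds a single Warning (severity 3) would not raise an alarm, whereas a single Error (severity 1) would.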

To execute the LinkScan Pinger:

perl pinger.pl [-repeat] [-test]

none       Test each configured URL once only

-repeat    Cycle continuously testing each URL every "Pingsecs" seconds.

-test      Single pass, forcing at least one error to generate an e-mail alarm

LinkScan for Unix. Reference Manual. Section 24

Weblint Man Page


weblint 1.020                                   weblint 1.020 

NAME

weblint - pick fluff off web pages (HTML)

SYNOPSIS

weblint [ -d id ] [ -e id ] [ -f filename ] [ -i ] [ -l ] [ -s ] [ -stderr ] [ -t ] [ -todo ] [ -help ] [ -U ] [ -urlget command ] [ -v ] [ -version ] [ -warnings ] [ -x extension ] file1 .. fileN

DESCRIPTION

Weblint is a Perl script which picks fluff off HTML pages. Files to be checked are passed on the command-line:

    % weblint foobar.html ./dodgy-files/ index.html

If any of the arguments are directories weblint will recurse in the directory, and check any HTML files found. If an argument is a URL, then weblint will get the file using a URL retrieval program, and then check the file:

    % weblint http://www.foobar.com/

By default weblint will use lynx to retrieve URLs, but this can be over-ridden. A filename of `-' specifies that weblint should read from standard input:

    % lynx -source http://www.foobar.com/ | weblint -

Warnings are generated a la lint:

    home.html(9): unmatched </A> (no matching <A> seen).

Weblint includes the following features:

  + by default checks for HTML 3.2 (Wilbur)
  + 46 different checks and warnings
  + Warnings can be enabled/disabled individually, as per your preference
  + basic structure and syntax checks
  + warnings for use of unknown elements and element attributes.
  + context checks (where a tag must appear within a certain element).
  + overlapped or illegally nested elements.
  + do IMG elements have ALT text?
  + flags obsolete elements.
  + support for user and site configuration files
  + stylistic checks
  + checks for html which is not portable across all browsers
  + flags markup embedded in comments, since this can confuse some browsers
  + support for Netscape, and Microsoft HTML extensions
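
Because the lint-style warnings follow a regular file(line): message shape, they are easy to post-process. The Python sketch below parses one such line. The pattern is inferred from the single example in the description above and is an assumption, not a documented weblint format; the short and terse message styles are not handled.

```python
import re

# Assumed shape of a lint-style warning: "<file>(<line>): <message>"
LINT_LINE = re.compile(r"^(?P<file>.+?)\((?P<line>\d+)\):\s*(?P<message>.*)$")


def parse_lint_warning(text):
    """Return (filename, line_number, message), or None if no match."""
    match = LINT_LINE.match(text)
    if match is None:
        return None
    return match.group("file"), int(match.group("line")), match.group("message")


warning = parse_lint_warning(
    "home.html(9): unmatched </A> (no matching <A> seen).")
```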

OPTIONS

-d warning-identifier
    Disable the warning associated with the identifier. Multiple identifiers can be specified, with a comma between identifiers.

-e warning-identifier
    Enable the warning associated with the identifier. Multiple identifiers can be specified, with a comma between identifiers.

-f config-file
    Specify a weblint configuration file which should be used in place of the user's default config file, or the site configuration file.

-help
    Show a short usage summary.

-i
    Ignore case of element tags.

-l
    When recursing in directories, ignore any files which are symlinks (also known as soft links). This will also cause files on the command-line to be ignored if they are symlinks, unless only one file is given.

-pedantic
    Turn on all warnings except the case-sensitive and bad-link warnings.

-s
    Generate `short' warning messages, which do not include the filename.

-stderr
    Print warning messages to STDERR rather than STDOUT.

-t
    Enable terse warning mode, which is mainly useful for the weblint testsuite.

-U
    Same as -help.

-urlget command
    The command which should be used to retrieve HTML pages specified by URL.

-v
    Display the version number.

-version
    Display the version number.

-todo
    This prints out the URL for the online version of the weblint ToDo list. This includes known bugs, and requested/planned features.

-warnings
    List all supported warnings, with warning identifier, and whether the warning is enabled.

-x extension
    Include checks for the specified HTML extension; multiple extensions can be specified, separated with a comma. Currently the only extensions supported are Netscape and Microsoft. This can also be set in your weblint configuration file, described below.

HTML EXTENSIONS

Unless you specify otherwise, weblint assumes you are using HTML 3.2. Weblint supports the Netscape and Microsoft HTML extensions in addition. For example, weblint will complain that the BLINK element is not known, unless you enable the Netscape extension. The following extensions are currently supported:

Netscape
    The HTML extensions supported by the Netscape browser, version 4.

Microsoft
    The HTML extensions supported by Microsoft Internet Explorer, version 4.

To enable an extension, you can either use the -x command-line switch:

    % weblint -x Netscape foobar.html

Or you can use the extension keyword in your .weblintrc:

    # enable the Microsoft extensions
    extension Microsoft

CONFIGURATION FILE

Weblint can be configured using a file .weblintrc in your home directory (or a file referenced by the WEBLINTRC environment variable). This file can be used to enable or disable specific warnings, set weblint variables, and include HTML extensions, as described above.

Each warning has a short identifier string, used to refer to the warning in config files, and from the command-line. For example, if you want to enable the check for tags in upper-case, but disable the check for obsolete elements, then you would include the following lines in your .weblintrc:

    # specify the command used to retrieve URLs (-urlget switch)
    set url-get = lynx -source

    # the style of warning message to generate (lint, short, or terse)
    set message-style = lint

    # enable warning for tags not in upper-case
    enable upper-case

    # disable the warning for obsolete tags
    disable obsolete

    # enable the Netscape HTML extensions
    extension Netscape

    # when recursing in a directory,
    # ignore files which are symlinks (also known as soft links)
    ignore symlinks

The keywords can be followed by any number of arguments, separated by spaces or tabs. Anything following a `#' is treated as a comment.

A sample configuration file is included in the weblint distribution (as of version 1.004), which mirrors the configuration built-in to weblint.

Weblint also supports a site configuration file. If a user does not have a personal configuration file, then weblint will check for a local site configuration file. To provide such a file, create a directory such as /usr/local/weblint, and create a file global.weblintrc. You need to edit the weblint script and modify the $SITE_DIR variable, which you will find near the top of the file. For example:

    $SITE_DIR = '/usr/local/weblint';

At some point in the future there will be configuration support for weblint, so you won't have to modify the script directly yourself.

If you have a site configuration file, then users can inherit the site defaults by adding the following line at the top of their .weblintrc file:

    use global weblintrc

WARNINGS

All warnings generated by weblint are listed below, along with the associated identifier, and whether the warning is enabled or disabled by default.

  tag <...> is not in upper case.
      Identifier: upper-case   Default: disabled
  tag <...> is not in lower case.
      Identifier: lower-case   Default: disabled
  foo attribute is required for <...>
      Identifier: required-attribute   Default: enabled
  expected an attribute for <...>
      Identifier: expected-attribute   Default: enabled
  unknown element <...>
      Identifier: unknown-element   Default: enabled
  unknown attribute `...' for element <...>.
      Identifier: unknown-attribute   Default: enabled
  should not have whitespace between `<' and `...>'
      Identifier: leading-whitespace   Default: enabled
  bad form to use `here' as an anchor!
      Identifier: here-anchor   Default: enabled
  no <TITLE> in HEAD element.
      Identifier: require-head   Default: enabled
  tag <...> should only appear once. I saw one on line XX!
      Identifier: once-only   Default: enabled
  <BODY> but no <HEAD>.
      Identifier: body-no-head   Default: enabled
  outer tags should be <HTML> .. </HTML>.
      Identifier: html-outer   Default: enabled
  <...> can only appear in the HEAD element.
      Identifier: head-element   Default: enabled
  <...> cannot appear in the HEAD element.
      Identifier: non-head-element   Default: enabled
  <...> is obsolete.
      Identifier: obsolete   Default: enabled
  unmatched </...> (no matching <...> seen).
      Identifier: mis-match   Default: enabled
  IMG does not have ALT text defined.
      Identifier: img-alt   Default: enabled
  <...> cannot be nested.
      Identifier: nested-element   Default: enabled
  Did not see <LINK REV=MADE HREF=mailto:...> in HEAD.
      Identifier: mailto-link   Default: disabled
  </...> on line XX seems to overlap <...>, opened on line YY.
      Identifier: element-overlap   Default: enabled
  no closing </...> seen for <...> on line XX.
      Identifier: unclosed-element   Default: enabled
  markup embedded in a comment can confuse some browsers.
      Identifier: markup-in-comment   Default: enabled
  odd number of quotes in element <...>.
      Identifier: odd-quotes   Default: enabled
  heading <H?> follows <H?> on line N.
      Identifier: heading-order   Default: enabled
  target for anchor
      Identifier: bad-link   Default: disabled
  unexpected < in <...> -- potentially unclosed element.
      Identifier: unexpected-open   Default: enabled
  illegal context for <...> - must appear in <...> element.
      Identifier: required-context   Default: enabled
  unclosed comment (comment should be: <!-- ... -->
      Identifier: unclosed-comment   Default: enabled
  element <...> is not a container -- </...> not legal.
      Identifier: illegal-closing   Default: enabled
  <...> is physical font markup -- use logical (such as XXX)
      Identifier: physical-font   Default: disabled
  attribute XYZ is repeated in element <...>
      Identifier: repeated-attribute   Default: enabled
  empty container element <...>
      Identifier: empty-container   Default: enabled
  use of ' for attribute value delimiter is not supported by all browsers (attribute XYZ of tag ABC)
      Identifier: attribute-delimiter   Default: enabled
  closing tag <...> should not have any attributes specified.
      Identifier: closing-attribute   Default: enabled
  directory DIR does not have an index file (index.html)
      Identifier: directory-index   Default: enabled
  <...> must immediately follow <...>
      Identifier: must-follow   Default: enabled
  setting WIDTH and HEIGHT attributes on IMG tag can improve rendering performance on some browsers
      Identifier: img-size   Default: disabled
  leading/trailing whitespace in content of container element ...
      Identifier: container-whitespace   Default: disabled
  first element was not DOCTYPE specification
      Identifier: require-doctype   Default: disabled
  `>' should be represented as `&gt;'
      Identifier: literal-metacharacter   Default: enabled
  malformed heading - open tag is <H?>, but closing is </H?>
      Identifier: heading-mismatch   Default: enabled
  illegal context, <...>, for text; should be in XXX.
      Identifier: bad-text-context   Default: enabled
  illegal value for AAA attribute of XXX (...)
      Identifier: attribute-format   Default: enabled
  <...> is extended markup (use '-x <extension>' to allow this).
      Identifier: extension-markup   Default: enabled
  attribute `...' for <...> is extended markup (use '-x <extension>' to allow this).
      Identifier: extension-attribute   Default: enabled
  value for attribute XYZ (xyz-value) of element FOOBAR should be quoted (i.e. XYZ='xyz-value')
      Identifier: quote-attribute-value   Default: enabled
  you should use '&gt;' in place of '>', even in a PRE element.
      Identifier: meta-in-pre   Default: enabled
  <A> should be inside <H?>, not <H?> inside <A>.
      Identifier: heading-in-anchor   Default: enabled
  The HTML spec. recommends the TITLE be no longer than 64 characters.
      Identifier: title-length   Default: enabled

TESTSUITE

A simple regression testsuite is included with weblint, in the Perl script test.pl. You can run the testsuite with either of the following commands:

    % make test
    % ./test.pl

The results are printed to STDERR, with a more complete report generated in test.log. All tests should pass. If any tests fail, please email test.log to the address given in the AUTHOR section below.

ENVIRONMENT VARIABLES

WEBLINTRC
    If this variable is defined, and references a file, then weblint will read the referenced file for the user's configuration, rather than $HOME/.weblintrc.

TMPDIR
    The directory where weblint will create temporary working files. Defaults to /usr/tmp.

FILES

$HOME/.weblintrc
    The user's configuration file. See the section `CONFIGURATION FILE'.

SEE ALSO

perl(1)

VERSION

This man page describes weblint 1.020.

AVAILABILITY

ftp://ftp.cre.canon.co.uk/pub/weblint/weblint.tar.gz
http://www.cre.canon.co.uk/~neilb/weblint/

KNOWN BUGS

The list of known bugs can be found on the weblint home page:

    http://www.cre.canon.co.uk/~neilb/weblint/todo/

Certain versions of Perl have bugs which are triggered by weblint. You shouldn't experience problems if you have 4.036, or 5.002.

AUTHOR

Neil Bowers, Canon Research Centre Europe neilb@cre.canon.co.uk

CONTRIBUTIONS

Lots of people have contributed to weblint, in the form of suggestions, bug reports, fixes, and contributed code. Please email me if your name should appear in the roll call below.

Abigail <abigail@mars.ic.iaf.nl>; Anthony Thyssen <anthony@cit.gu.edu.au>; Axel Boldt <axel@uni-paderborn.de>; Barry Bakalor <barry@hal.com>; Bill Arnett <billa@netcom.com>; Bob Friesenhahn <bfriesen@simple.dallas.tx.us>; Mark Gates <mr-gates@uiuc.edu>; Bruce Speyer <bspeyer@texas-one.org>; Chris Siebenmann <cks@hawkwind.utcs.toronto.edu>; Clay Webster <clay@unipress.com>; Dana Jacobsen <dana@acm.org>; David Begley <david@bacall.nepean.uws.edu.au>; David J. MacKenzie <djm@va.pubnix.com>; Douglas Brick <dbrick@u.washington.edu>; Gil Citro; Eric de Mund <ead@ixian.com>; Richard Finegold <goldfndr@eskimo.com>; Joerg Heitkoetter <Joerg.Heitkoetter@germany.eu.net>; David Koblas <koblas@homepages.com>; John Labovitz <johnl@ora.com>; Eric Maryniak <E.Maryniak@rgd.nl>; John F. Whitehead <jfw@wral-tv.com>; Juergen Schoenwaelder <schoenw@ibr.cs.tu-bs.de>; Frank Steinke <fsteinke@zeta.org.au>; Larry Virden <lvirden@cas.org>; Paul Black <black@lal.cs.byu.edu>; Doug Grinbergs <dougg@qualcomm.com>; Philip Hallstrom <philip@wolfe.net>; Craig Leres <leres@ee.lbl.gov>; Richard Lloyd <R.K.Lloyd@csc.liv.ac.uk>; Charles F. Randall <crandall@dmacc.cc.ia.us>; Robert Schmunk <pcrxs@nasagiss.giss.nasa.gov>; Jeff Schave <schave@engr.wisc.edu>; Jon Thackray <jrmt@uk.gdscorp.com>; Jens Thordarson <thordurh@rhi.hi.is>; Ryan Waldron <rew@nuance.com>; Thomas Leavitt <leavitt@webcom.com>; Tom Neff <tneff@panix.com>; Victor Parada <vparada@inf.utfsm.cl>; Erick Branderhorst <branderhorst@fgg.eur.nl>; Bryan O'Sullivan <bos@serpentine.com>; Alan J. Flavell <FLAVELL@v2.ph.gla.ac.uk>; Raphael Manfredi <Raphael_Manfredi@grenoble.hp.com>; Keith Iosso <a-keithi@microsoft.com>; Chris Lambert <lambertc@sharelink.com>; Tristan Savatier <tristan@creative.net>; Phil Hooper <hooper@bcci.eng.sun.com>; Gerald Viers <grviers@csupomona.edu>; Dean Brissinger <brissing@bvsd.k12.co.us>; Dave Schmitt <dschmi1@gl.umbc.edu>; John Van Essen <vanes002@maroon.tc.umn.edu>; Brandon Bell <brandon@arcs.bcit.bc.ca>; Fumio Moriya and Toshiaki Nomura <dsfrsoft@oai6.yk.fujitsu.co.jp>; Vincent Lefevre <vlefevre@ens-lyon.fr>; Jason Mathews <mathews@nssdc.gsfc.nasa.gov>; Lars Balker Rasmussen <lbr@mjolner.dk>; Richard L. Hawes <rhawes@dmapub.dma.org>.

LinkScan for Unix. Reference Manual. Section 25

Glossary of Terms

This section defines some LinkScan constructs and related terminology with reference to various standards, where appropriate:

1. Projects 2. Owners 3. Usernames
4. Virtual Hosts 5. Pathnames 6. Pathname Expressions
7. Home Directory 8. LinkScan Directory 9. Project Directory
10. Uniform Resource Locators (URL's) 11. Internal Links 12. External Links
13. Orphaned Files 14. HyperText Markup Language (HTML) 15. HyperText Transfer Protocol (HTTP)
16. File Transfer Protocol (FTP) 17. HTTP Scanning 18. File System Scanning
19. Import Scanning 20. Perl Regular Expressions 21. Content-Type/MIME
22. Date and Time Last-Modified 23. Document Weight 24. Click Depth

25.1 Projects

LinkScan is able to scan multiple websites. It can also scan the same website multiple times with different configuration options. In each case, LinkScan creates a unique and corresponding LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.

Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.

Hence users must always select a Project when scanning a website, and they must select a Project when viewing the results.

25.2 Owners

Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned between Owners in a variety of ways:

The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility.

25.3 Usernames

LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.

When activated, users may be required to log in to the LinkScan system using a pre-defined LinkScan Username and associated password. The Username will define the Projects and Owners that an individual user is permitted to access.

25.4 Virtual Hosts

A Virtual Host is the Fully Qualified Domain Name (or IP address) of a network host configured on your server. Many servers are configured for a single Virtual Host but others are configured to support multiple Virtual Hosts. You must define at least one LinkScan Project for each Virtual Host that you wish to test.

25.5 Pathnames

Pathnames are used to refer to directory structures. They may be Relative or Absolute. Note also that Pathnames are used in the URL context and the File System context. For example:

/usr/www/htdocs/products/widget.html          # Absolute pathname, file system context
C:/www/products/widget.html                   # Absolute pathname, file system context
http://www.example.com/products/widget.html   # Absolute URL
../products/widget.html                       # Relative link, URL or file system context

LinkScan makes extensive use of a normalized representation such that the documents referred to above would be referenced as:

products/widget.html

This offers the advantages of brevity and consistency, since products/widget.html may typically be used to refer to both:

C:/www/products/widget.html and
http://www.example.com/products/widget.html

The normalized format is referred to in this document as relative-path.
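
The normalization described above can be pictured as stripping a known base from the front of a reference. The Python sketch below is a simplified illustration; the function name, the base URL and the home directory are assumptions chosen to match the examples, and details such as case sensitivity and relative references are not handled.

```python
def to_relative_path(reference, home_url, home_dir):
    """Strip the Project's base URL or home directory from a reference.

    Returns None when the reference lies outside both bases.
    """
    ref = reference.replace("\\", "/")
    for base in (home_url, home_dir):
        # Ensure each base ends with exactly one slash before comparing.
        base = base.replace("\\", "/").rstrip("/") + "/"
        if ref.startswith(base):
            return ref[len(base):]
    return None


# Assumed bases, chosen to match the examples in this section.
HOME_URL = "http://www.example.com/"
HOME_DIR = "C:/www"

from_file = to_relative_path("C:/www/products/widget.html", HOME_URL, HOME_DIR)
from_url = to_relative_path(
    "http://www.example.com/products/widget.html", HOME_URL, HOME_DIR)
```

Both calls produce products/widget.html, which is the property that lets one relative-path stand for the file and the URL alike.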

25.6 Pathname Expressions

Many LinkScan customization features refer to relative-path-expression. That is a Perl Regular Expression matching a relative-path.
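
For illustration, the Python snippet below applies a sample expression to a few relative-paths. The expression and the paths are hypothetical, and Python's `re` module is used as a stand-in for Perl, so constructs specific to Perl may behave differently.

```python
import re

# Invented relative-paths for a small Project.
paths = [
    "products/widget.html",
    "products/images/widget.gif",
    "support/faq.html",
]

# Hypothetical relative-path-expression:
# anything under products/ that ends in .html
expression = re.compile(r"^products/.*\.html$")

matching = [p for p in paths if expression.search(p)]
```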

25.7 Home Directory

The directory on your server that is considered to be the root directory of your HTTP server. Sometimes known as www root.

25.8 LinkScan Directory

The directory on your computer where LinkScan is installed.

25.9 Project Directory

A subdirectory of the LinkScan Directory containing the configuration and data files associated with a specific Project.

25.10 Uniform Resource Locators (URL's)

The various Uniform Resource Locator formats are defined in RFC 2396.

25.11 Internal Links

Internal Links are defined as links to the current Project.


Examples:

<a href="filename.html">This is an Internal Link</a>

<a href="http://www.elsop.com/index.html">This is an Internal
Link if the current Project is http://www.elsop.com/</a>

25.12 External Links

External Links are defined as links specified using an Absolute URL to any Project other than the current Project.


Example:

<a href="http://www.otherdomain.com/">This is an External Link</a>
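
The distinction between Internal and External Links can be sketched as a comparison between a resolved link and the Project's base URL. The Python function below is a simplified assumption about how such a test might look; LinkScan's actual rules (for example, host aliases) may differ, and the function name is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def is_internal(href, page_url, project_url):
    """True if href, resolved against page_url, falls under project_url."""
    target = urlsplit(urljoin(page_url, href))
    project = urlsplit(project_url)
    same_host = (target.scheme, target.netloc.lower()) == (
        project.scheme, project.netloc.lower())
    # A Project URL without a path is treated as covering the whole host.
    return same_host and target.path.startswith(project.path or "/")


PAGE = "http://www.elsop.com/index.html"
PROJECT = "http://www.elsop.com/"

relative_link = is_internal("filename.html", PAGE, PROJECT)
absolute_same = is_internal("http://www.elsop.com/index.html", PAGE, PROJECT)
other_domain = is_internal("http://www.otherdomain.com/", PAGE, PROJECT)
```

The first two links from the examples above are classified as internal and the third as external.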

25.13 Orphaned Files

Orphaned Files are defined as files present in the Home Directory (or any subdirectory thereof) which cannot be reached via one or more internal links from the Home Page.
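
Put another way, the orphans are the files on disk minus the set of files reachable from the Home Page. The Python sketch below computes that difference over a small, invented link map. It illustrates the definition only and is not a description of LinkScan's own implementation.

```python
def find_orphans(files_on_disk, links, home_page):
    """Return files that cannot be reached from home_page.

    files_on_disk: relative paths present under the Home Directory.
    links: dict mapping a relative path to the paths it links to.
    """
    reachable = set()
    pending = [home_page]
    while pending:
        page = pending.pop()
        if page in reachable:
            continue  # already visited; avoids looping on cycles
        reachable.add(page)
        pending.extend(links.get(page, []))
    return sorted(set(files_on_disk) - reachable)


# Invented sample data for a four-file website.
files = ["index.html", "products/widget.html",
         "old/promo.html", "old/banner.gif"]
links = {
    "index.html": ["products/widget.html"],
    "products/widget.html": ["index.html"],
    "old/promo.html": ["old/banner.gif"],
}
orphans = find_orphans(files, links, "index.html")
```

Note that old/banner.gif is still an orphan even though old/promo.html links to it, because neither file can be reached from the Home Page.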

25.14 HyperText Markup Language (HTML)

The HyperText Markup Language (HTML 3.2) lies at the heart of the World Wide Web.

LinkScan attempts to parse the HTML source code according to the published standards. However, as with all web browsers, the results can be unpredictable when the HTML source code deviates from the specifications. Experience with LinkScan indicates that the following points are worthy of note.

25.15 HyperText Transfer Protocol (HTTP)

The HyperText Transfer Protocol (HTTP 1.0) has been used for World Wide Web communications since 1990. In January 1997, the first specifications for HTTP 1.1 were published. LinkScan exploits many HTTP features to establish the status of the external links.

In most cases LinkScan is able to definitively establish the status of any given link. However, at any moment in time a small proportion of links (typically around 5%) are temporarily unavailable. In such cases, LinkScan will make two attempts to reach the site before flagging those URL's as "Possible Errors" to be retested at a later time (automatically or manually).

An even smaller percentage of sites are accessible via a web browser but fail to return message headers in accordance with the HTTP specifications. In many cases, LinkScan is still able to establish the status, but a few sites are so grossly non-compliant that LinkScan will return an "Unknown Error" to flag them for manual testing. In tests, only one or two sites per thousand fell into this category.
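
The two-attempt behaviour described above can be sketched as a small retry loop. In the following Python illustration the `fetch` callable, the status labels other than "Possible Error", and the rule that any status below 400 counts as good are all assumptions made for the example.

```python
def check_link(url, fetch, attempts=2):
    """Classify a link using up to `attempts` tries.

    fetch(url) is a caller-supplied function that returns an HTTP
    status code, or raises OSError when the site cannot be reached.
    """
    for _ in range(attempts):
        try:
            status = fetch(url)
        except OSError:
            continue  # unreachable on this attempt; try again
        return "OK" if status < 400 else "Error"
    # Every attempt failed to reach the site: flag for a later retest.
    return "Possible Error"
```

Passing the fetch function in as a parameter keeps the sketch free of network access and makes the retry rule easy to exercise with a stub.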

25.16 File Transfer Protocol (FTP)

The File Transfer Protocol (FTP) is a relatively old standard, compared to HTTP. See RFC 959.

25.17 HTTP Scanning

Typically, LinkScan accesses the scanned website via the Network and HTTP. This is an appropriate method in most cases.

25.18 File System Scanning

Optionally, LinkScan may be configured to access part or all of the scanned website by direct access to the website files on your computer's file system. This offers several advantages and disadvantages:

Note that LinkScan may also be configured to scan a site using a combination of both the HTTP and File System methods. This powerful capability may be used, for example, to enable HTTP Scanning of website content and the comparison of the results with those from File System Scanning to reconcile the Orphaned Files.

25.19 Import Scanning

In addition to HTTP Scanning and File System Scanning, LinkScan supports a third mode of operation: Import Scanning. This is used to validate lists of Documents or Links that are imported from simple text files. The Import Lists may be prepared manually but it is more common for them to be exported from a database management system or other application.

25.20 Perl Regular Expressions

LinkScan incorporates a vast array of customization features many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions on Unix systems, see man perlre. HTML versions are available at many locations including:

http://www.perl.com/doc/manual/html/pod/perlre.html

We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E.F. Friedl, and published by O'Reilly [ISBN: 1-56592-257-3].

25.21 Content-Type/MIME

When files are served via the Hypertext Transfer Protocol (HTTP) the normal conventions with respect to file extensions do not apply. The content of the file is defined by an HTTP Content-Type header (a.k.a. MIME type). Common examples include:

Content-Type: text/html
Content-Type: image/gif
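
A client that follows this convention decides how to treat a response from the header value rather than from the file extension. The Python sketch below extracts the media type from a Content-Type value, ignoring any parameters such as charset. The function names are hypothetical.

```python
def media_type(content_type_header):
    """Return the lower-cased type/subtype, ignoring any parameters."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_html(content_type_header):
    """True when the header identifies an HTML document."""
    return media_type(content_type_header) == "text/html"


with_charset = media_type("text/html; charset=ISO-8859-1")
```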

25.22 Date and Time Last-Modified

LinkScan always attempts to store a date/time stamp with each document to indicate when the file was last modified. When scanning via the File System, LinkScan is able to capture this data directly from the operating system. However, when LinkScan does not have direct access to the server File System, it looks for an HTTP Last-Modified header. Most web servers supply this when serving static HTML documents (without Server Side Includes). However, it is typically not supplied when serving dynamic pages, in which case the data may not be available. Note, however, that LinkScan does have the ability to extract information of this type from META tags when available -- see How to process additional per-document data.
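
The order of preference described above (file system first, then the HTTP header, otherwise unavailable) can be sketched as follows. This Python example is illustrative only: the function name and argument shapes are assumptions, and the META tag fallback is not shown.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_modified(headers=None, file_mtime=None):
    """Prefer the file system timestamp, then the HTTP header, else None."""
    if file_mtime is not None:
        return datetime.fromtimestamp(file_mtime, tz=timezone.utc)
    value = (headers or {}).get("Last-Modified")
    if value:
        try:
            return parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None  # malformed header value
    return None  # typical for dynamic pages


stamp = last_modified(
    headers={"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
```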

25.23 Document Weight

LinkScan calculates the total weight of each document. This calculation is based on the total in-line byte count and takes account of:

25.24 Click Depth

LinkScan tracks and stores the depth of each document during the course of the scan. The depth reflects the number of hyperlinks the user must click to reach the target starting from the initial URL. Note that LinkScan uses a shallowest-first algorithm to scan a site. In general, the click-count is not incremented when following:

LinkScan for Unix. Reference Manual. Section 26

LinkScan Quick Reference Card

Basic Casesensitive Homefile Homeurl Http
  Organization Projectdesc    

CustomReport Displaylang Editdoc Editlink Jisencode
  Reportsdir Statuscode    

CustomScan Auth Autoencspace Closeatag Collectmeta
  Cookie Errorbody Errorbodyext Errordoc
  Execute Extraheader Extrahit Extrahome
  Followframes Gsmchangefreq Gsmlevels Hostalias
  Imgtags Insertlink Maxdocbytes Maxredir
  Mimetypes Mirrorurl Noforms Noindex
  Probe Profiler Profilerlog Profilermax
  Relaxanchor Sessionmatch Showredirectext Substitute
  Substituteraw Substitutescript Usecookiefile Useloginfile
  Userdata Userdatafmt Userdatasub Xmeta
  Xmlmatch Xmlnomatch    

Database Tagonce      

Dispatch Dispatchsort Mailalias Mailhost Mailnoerr
  Maxsev Sendmailpath    

External Checkmailto FTPPass FTPUser Fetchext
  Followext Hostname Mailfrom Masterhist
  Maxbadhours Maxdns Maxftp Maxgoodhours
  Maxhist Maxservertries Nameservers Noexternal
  Retryext      

File Alias Autohttp Checkorphans Defaultpages
  Expandssi Flashfiles Homedir Htmlfiles
  Indexoptions Mapfiles Maxdirlevels Noorphan
  Noorphans Onlyorphans Orphanfile Pdffiles
  Redirect Textfiles    

Import Import Importfile    

JavaScript Scriptdisable Scriptexclude Scriptmatch Scriptnomatch
  Selecturl      

Misc Unsafechar      

Owner Defaultowner Owner Owneralias Ownerq
  Ownertags      

Scope Exclude Excludecookie Excludehidden Mask
  Maxcgi Maxclicks Maxdocs Maxlevels
  Nofollow Onlyfollow Onlyinclude Taglimit
         

Security Access Httpauth Linkscancookie Mailto
  Noprojectlist Nostaticmenu Notapmapoptions Winhttp
         

SiteMap Mapdefaulttitle Mapext Maphide Mapinclude
  Mapmove Maptitle    

System Cgibinurl Docsurl Httpsproxyport Httpsproxyserver
  Key LicenseNumber Licensee Linespeed
  Linkscandir Linkscanurl Longurls Masterport
  Msiis Noproxy Perlpath Proxyauth
  Proxymatch Proxyport Proxyserver Slaves1
  Slaves2 Slavesfast1 Slavesfast2 Smtphost
  Timeout1 Timeout2 Weblintoptions Weblintpath

Access [1] Syntax: Access username : password : project-list : owner-list : menu-options
Category: Security Default: Access * : * : * : * : *
Type: Multi-valued Used by: linkscan.sys
 
Activates the Access Controls on the LinkScan Reports. Not enabled by default; see references.
 
Alias [1] Syntax: Alias relative-path-expression absolute-path-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
The Alias command maps a URL to a physical file system path. This is required when, for example, a specific directory does not reside under the normal webserver root directory. It is important to ensure that the forward slash symbols are balanced exactly as shown in the example.
Alias cgi-bin/ /usr/www/cgi-bin/
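
The Python sketch below illustrates why the slashes must balance. It is a simplification and not LinkScan's code: it treats each Alias as a literal prefix (the real arguments are expressions), and the names resolve_alias, aliases and homedir are hypothetical.

```python
def resolve_alias(relative_path, aliases, homedir):
    """Map a site-relative URL path to a file system path.

    aliases is a list of (url_prefix, file_system_prefix) pairs. The first
    matching prefix is swapped for its file system counterpart; otherwise
    the path is taken relative to homedir.
    """
    for url_prefix, fs_prefix in aliases:
        if relative_path.startswith(url_prefix):
            # The trailing slash on both prefixes keeps the result well formed.
            return fs_prefix + relative_path[len(url_prefix):]
    return homedir + relative_path


if __name__ == "__main__":
    aliases = [("cgi-bin/", "/usr/www/cgi-bin/")]
    print(resolve_alias("cgi-bin/search.cgi", aliases, "/usr/www/htdocs/"))
    print(resolve_alias("index.html", aliases, "/usr/www/htdocs/"))
```

If the trailing slash were present on only one side of the Alias, the substituted path would either lose or gain a path separator.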
 
Auth [1] Syntax: Auth server-name "realm-name" username password
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Configures LinkScan to use HTTP Basic Authentication. Note that server-name must be specified as a hostname and not as a URL. The realm-name must be specified and quoted. However, it may be empty, in which case LinkScan will use the supplied username/password for any realm-name on server-name.
Auth www.example.com "" guestuser xxxxxx
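
For background, HTTP Basic Authentication sends the username and password, joined by a colon and Base64-encoded, in an Authorization request header. The Python sketch below shows that encoding; the function name basic_auth_header is hypothetical and the code is not taken from LinkScan.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header line used by HTTP Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("guestuser", "xxxxxx"))
```

Because Base64 is an encoding and not encryption, credentials supplied this way are only as private as the connection that carries them.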
 
Autoencspace [1] Syntax: Autoencspace = boolean
Category: CustomScan Default: Autoencspace = 0
Type: Single-valued Used by: linkscan.cfg
 
When Autoencspace = 1 LinkScan will automatically encode any unencoded space characters in a URL as "%20" thereby mirroring the behavior of Microsoft Internet Explorer. We do not recommend the use of this option (since it masks real errors in the HTML documents) but it has been provided in response to user requests.
 
Autohttp [1] Syntax: Autohttp = boolean
Category: File Default: Autohttp = 0
Type: Single-valued Used by: linkscan.cfg
 
When Autohttp = 1 LinkScan will automatically attempt HTTP access on any link that cannot be found/validated when using File System Scanning.
 
Casesensitive Syntax: Casesensitive = boolean
Category: Basic Default: Casesensitive = 1
Type: Single-valued Used by: linkscan.cfg
 
When Casesensitive = 1 LinkScan assumes that all pathnames are case-sensitive (normally appropriate when scanning Unix-based servers). When Casesensitive = 0 LinkScan forces all pathnames to lower case (normally appropriate when scanning Windows-based servers).
 
Cgibinurl [1] Syntax: Cgibinurl = absolute-url
Category: System Default: Cgibinurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which the LinkScan CGI scripts reside. Required in order that the LinkScan CGI scripts can link to each other.
 
Checkmailto [1] Syntax: Checkmailto = boolean
Category: External Default: Checkmailto = 0
Type: Single-valued Used by: linkscan.cfg
 
When Checkmailto = 1 enable active checking of mailto: links. Several other items must be configured when using this feature. See references.
 
Checkorphans [1] Syntax: Checkorphans relative-path
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Forces LinkScan to scan the directory specified by relative-path for Orphaned Files.
 
Closeatag Syntax: Closeatag = boolean
Category: CustomScan Default: Closeatag = 1
Type: Single-valued Used by: linkscan.cfg
 
When Closeatag = 0 do not generate errors for <A HREF=...> tags without a corresponding </A> tag.
 
Collectmeta Syntax: Collectmeta = boolean
Category: CustomScan Default: Collectmeta = 0
Type: Single-valued Used by: linkscan.cfg
 
When Collectmeta = 1 save all document <META> tags to the file: LinkScan/project_dir/data/linkscan.met
 
Cookie [1] Syntax: Cookie server-name cookie-name=cookie-value
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Pre-load LinkScan with Cookies. Note that server-name must be specified as a hostname and not as a URL. Do not enter spaces around the "=" sign. Prefix the domain name with a period to create a wildcard, as shown in the example.
Cookie .example.com USERID=1234
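
The Python sketch below shows one plausible reading of the wildcard rule: a leading period makes the cookie apply to every host in that domain. Whether LinkScan also sends such a cookie to the bare domain (example.com) is an assumption here, and cookie_applies is a hypothetical name.

```python
def cookie_applies(cookie_domain, host):
    """Return True when a pre-loaded cookie should be sent to host."""
    cookie_domain = cookie_domain.lower()
    host = host.lower()
    if cookie_domain.startswith("."):
        # Wildcard form: match any host ending in the domain.
        # Assumption: the bare domain itself is also accepted.
        return host == cookie_domain[1:] or host.endswith(cookie_domain)
    # Non-wildcard form: the hostname must match exactly.
    return host == cookie_domain


if __name__ == "__main__":
    for host in ("www.example.com", "shop.example.com", "www.example.org"):
        print(host, cookie_applies(".example.com", host))
```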
 
Defaultowner [1] Syntax: Defaultowner = owner-name
Category: Owner Default: none
Type: Single-valued Used by: linkscan.cfg
 
Establishes a default Owner.
 
Defaultpages [1] Syntax: Defaultpages = filename [, filename]...
Category: File Default: Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm
Type: Single-valued Used by: linkscan.cfg
 
When configured to use File System Scanning and LinkScan encounters a link to a directory without a specific filename, it searches for documents with these filenames (in the order specified).
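
The ordered search can be pictured with the following Python sketch. It is illustrative only; find_default_page is a hypothetical name and the code is not LinkScan's.

```python
import os


def find_default_page(directory, default_pages):
    """Return the first existing default page in directory, or None.

    default_pages is tried in the order given, mirroring the order of the
    filenames listed in the Defaultpages command.
    """
    for name in default_pages:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    pages = ["index.html", "index.shtml", "index.htm",
             "home.html", "home.shtml", "home.htm"]
    print(find_default_page(".", pages))
```

Because the first match wins, a directory containing both index.htm and home.html resolves to index.htm under the default list.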
 
Dispatchsort [1] Syntax: Dispatchsort = integer
Category: Dispatch Default: Dispatchsort = 1
Type: Single-valued Used by: linkscan.cfg
 
Defines the sort sequence for LinkScan Dispatch Reports.
1 = By referer; 2 = By status code; 3 = By links alphabetically
 
Displaylang Syntax: Displaylang = boolean
Category: CustomReport Default: Displaylang = 1
Type: Single-valued Used by: linkscan.cfg
 
Enable when scanning Japanese language websites. The following META tag will be included in each of the LinkScan reports:
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">
See also Jisencode.
 
Docsurl [1] Syntax: Docsurl = absolute-url
Category: System Default: Docsurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which the LinkScan documentation resides. Required in order that the LinkScan CGI scripts can link to the documentation and associated images.
 
Editdoc [1] Syntax: Editdoc = URL
Category: CustomReport Default: none
Type: Single-valued Used by: linkscan.cfg
 
Adds a linking URL to the LinkScan Reports. The URL may include the optional tokens !URL, !CAP or !STAT. The tokens are replaced with %-encoded strings containing:
!URL: The URL of the target resource
!CAP: The Title or Caption (as appropriate) associated with the target resource
!STAT: The Status Code of the target resource
Editdoc = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT
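
The token substitution described above might look like the following Python sketch. The exact set of characters that LinkScan percent-encodes is not documented here, so encoding everything except unreserved characters is an assumption, and expand_edit_url is a hypothetical name.

```python
from urllib.parse import quote


def expand_edit_url(template, url, caption, status):
    """Replace the !URL, !CAP and !STAT tokens with percent-encoded values."""
    values = {"!URL": url, "!CAP": caption, "!STAT": str(status)}
    result = template
    for token, value in values.items():
        # safe="" also encodes "/", ":" and "!", so an inserted value can
        # never be mistaken for another token or break the query string.
        result = result.replace(token, quote(value, safe=""))
    return result


if __name__ == "__main__":
    template = "http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT"
    print(expand_edit_url(template, "http://www.example.com/a b.html", "Home Page", 404))
```

The same substitution applies to the Editlink command, which accepts the same tokens.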
 
Editlink [1] Syntax: Editlink = URL
Category: CustomReport Default: none
Type: Single-valued Used by: linkscan.cfg
 
Adds a linking URL to the LinkScan Reports. The URL may include the optional tokens !URL, !CAP or !STAT. The tokens are replaced with %-encoded strings containing:
!URL: The URL of the target resource
!CAP: The Title or Caption (as appropriate) associated with the target resource
!STAT: The Status Code of the target resource
Editlink = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT
 
Errorbody [1] Syntax: Errorbody = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Any document with a body that matches expression is marked with a 3001 Errorbody Match status code regardless of the actual server status. Applies to Internal Documents only.
Errorbody (?i)runtime\s+error
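
To show what the example expression matches, the Python sketch below applies it to two document bodies. Python's re module accepts this particular Perl expression unchanged; the function name status_for is hypothetical and the code is not LinkScan's.

```python
import re

# The expression from the example above: case-insensitive "runtime",
# one or more whitespace characters, then "error".
ERRORBODY = re.compile(r"(?i)runtime\s+error")


def status_for(body, server_status):
    """Return 3001 when the body matches the expression, else the server status."""
    if ERRORBODY.search(body):
        return 3001
    return server_status


if __name__ == "__main__":
    print(status_for("<h1>Microsoft VBScript Runtime  Error</h1>", 200))
    print(status_for("<h1>Welcome</h1>", 200))
```

This is useful for applications that report failures inside a page served with a normal 200 status.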
 
Errorbodyext [1] Syntax: Errorbodyext = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Any document with a body that matches expression is marked with a 3001 Errorbody Match status code regardless of the actual server status. Applies to External Links only.
Note: Using this option will enable the Fetchext option. There may be significant performance penalties since LinkScan must retrieve the document bodies when validating external links.
Errorbodyext (?i)<meta[^>]+refresh.*?>
 
Errordoc [1] Syntax: Errordoc = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Any URL that is redirected to a location that matches expression is marked with a 3000 Errordoc Match status code regardless of the actual server status.
Errordoc special/notfound\.html
 
Exclude Syntax: Exclude relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Links matching relative-path-expression are completely ignored by LinkScan.
Exclude archives/
 
Excludecookie Syntax: Excludecookie expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Cookies matching expression are completely ignored by LinkScan. Expression must either match the cookie name OR the following semi-colon delimited string of cookie attributes: "domain;port;path;cookiename;cookievalue;expires;setbypage"
Excludecookie [^;]*;[^;]*;[^;]*;[^;]*;SESSIONID
 
Excludehidden Syntax: Excludehidden = boolean
Category: Scope Default: Excludehidden = 0
Type: Single-valued Used by: linkscan.cfg
 
Exclude links hidden by a null (empty) anchor.
 
Execute Syntax: Execute relative-path-expression
Category: CustomScan Default: Execute cgi-bin/, Execute (?i).*\.(cgi|asp)$
Type: Multi-valued Used by: linkscan.cfg
 
Links matching relative-path-expression are accessed using Network (HTTP) Scanning.
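
The Python sketch below tests a few relative paths against the two default Execute expressions. It assumes that expressions are anchored at the start of the relative path, which the leading .* in the second default suggests but this manual section does not state. The name uses_http is hypothetical.

```python
import re

# The two default Execute expressions listed above.
EXECUTE = [r"cgi-bin/", r"(?i).*\.(cgi|asp)$"]


def uses_http(relative_path, patterns=EXECUTE):
    """Return True when the path matches any Execute expression.

    Assumption: each expression is matched from the start of the path.
    """
    return any(re.match(pattern, relative_path) for pattern in patterns)


if __name__ == "__main__":
    for path in ("cgi-bin/search", "scripts/Form.ASP", "docs/index.html"):
        print(path, uses_http(path))
```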
 
Expandssi [1] Syntax: Expandssi = boolean
Category: File Default: Expandssi = 1
Type: Single-valued Used by: linkscan.cfg
 
When Expandssi = 1 and File System Scanning is enabled LinkScan will process Server Side Includes (SSIs) constructed using the Apache Include Virtual conventions.
 
Extraheader [1] Syntax: Extraheader http-header
Category: CustomScan Default: Extraheader User-Agent: LinkScan Enterprise/12.1 Windows
Type: Multi-valued Used by: linkscan.cfg
 
Configures additional HTTP headers that LinkScan will send with every request. Mainly used to emulate different browser types.
Extraheader User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
 
Extrahit [1] Syntax: Extrahit relative-path
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Instructs LinkScan to access the specified URL at the start of a scan. May be used to submit forms with specified data values. Note: with Extrahome, LinkScan will access the specified page *before* the start of a scan *and* a second time during the scan. With Extrahit, LinkScan will access the specified page only once, during a scan. See example and references.
Extrahit cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret
 
Extrahome [1] Syntax: Extrahome relative-path
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Instructs LinkScan to access the specified URL at the start of a scan. May be used to submit forms with specified data values. Note: with Extrahome, LinkScan will access the specified page *before* the start of a scan *and* a second time during the scan. With Extrahit, LinkScan will access the specified page only once, during a scan. See example and references.
Extrahome cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret
 
FTPPass [1] Syntax: FTPPass = password
Category: External Default: FTPPass = me@example.com
Type: Single-valued Used by: linkscan.sys
 
Sets the password to use when validating links to FTP sites.
 
FTPUser [1] Syntax: FTPUser = username
Category: External Default: FTPUser = anonymous
Type: Single-valued Used by: linkscan.sys
 
Sets the username to use when validating links to FTP sites.
 
Fetchext [1] Syntax: Fetchext = boolean
Category: External Default: Fetchext = 0
Type: Single-valued Used by: linkscan.cfg
 
Instructs LinkScan to fetch the document bodies when checking External links. Normally used in conjunction with the LinkScan Profiler.
 
Flashfiles [1] Syntax: Flashfiles = file-extension [, file-extension]...
Category: File Default: Flashfiles = swf
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted using the Flash/Shockwave format. When using Network (HTTP) Scanning, a non-blank entry causes LinkScan to interpret any link with a Content-Type: application/x-shockwave-flash header using the Flash/Shockwave format.
 
Followext [1] Syntax: Followext = boolean
Category: External Default: Followext = 1
Type: Single-valued Used by: linkscan.cfg
 
When Followext = 1 LinkScan follows redirections when scanning External links.
 
Followframes Syntax: Followframes = boolean
Category: CustomScan Default: Followframes = 0
Type: Single-valued Used by: linkscan.cfg
 
When Followframes = 1 LinkScan will always follow links within framesets (regardless of any Nofollow commands).
 
Gsmchangefreq [1] Syntax: Gsmchangefreq = string
Category: CustomScan Default: Gsmchangefreq = weekly
Type: Single-valued Used by: linkscan.cfg
 
Update frequency for XML Google Sitemap.
 
Gsmlevels [1] Syntax: Gsmlevels = integer
Category: CustomScan Default: Gsmlevels = 0
Type: Single-valued Used by: linkscan.cfg
 
Maximum levels to include in XML Google Sitemap.
 
Homedir [1] Syntax: Homedir = absolute-path
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the absolute pathname to the directory/folder containing the root of the target website. Only applicable when File System Scanning and Orphan File detection are enabled. Note that Homedir must point at the root of the site and not a sub-directory thereof.
Homedir = C:/www/
 
Homefile [1] Syntax: Homefile = relative-url
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the initial document for the start of a scan (relative to Homeurl and Homedir).
Homefile = index.html
 
Homeurl [1] Syntax: Homeurl = absolute-url
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the base-URL for the start of a scan. Do not append additional directory or file names to the URL (use Homefile instead). Homeurl must point at the root of the target website.
Homeurl = http://www.example.com/
 
Hostalias [1] Syntax: Hostalias from-absolute-url to-absolute-url
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Establishes synonyms for the same host.
Hostalias http://www2.example.com/ http://www.example.com/
 
Hostname Syntax: Hostname = hostname
Category: External Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the Hostname to use for HELO messages. Only used when active mailto: checking is enabled.
 
Htmlfiles [1] Syntax: Htmlfiles = file-extension [, file-extension]...
Category: File Default: Htmlfiles = html, shtml, htm
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted as an HTML document. When using Network (HTTP) Scanning, any link with a Content-Type: text/html header is interpreted as indicating HTML format.
 
Http Syntax: Http = boolean
Category: Basic Default: Http = 1
Type: Single-valued Used by: linkscan.cfg
 
When Http = 1 LinkScan uses Network (HTTP) Scanning for the entire target website. Note that this will disable Orphaned File checking. To enable Orphan checking, you must set Http = 0 and configure Homedir. Use Execute .* to force HTTP Scanning with Orphan File checking.
 
Httpauth Syntax: Httpauth = env-var
Category: Security Default: Httpauth = REMOTE_USER
Type: Single-valued Used by: linkscan.sys
 
Sets the system Environment variable name to use in conjunction with the LinkScan access controls and HTTP user authentication. Not required unless you enable LinkScan Access Controls.
 
Httpsproxyport [1] Syntax: Httpsproxyport = integer
Category: System Default: Httpsproxyport = 80
Type: Single-valued Used by: linkscan.sys
 
Sets the Port Number associated with Httpsproxyserver.
 
Httpsproxyserver [1] Syntax: Httpsproxyserver = hostname
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the Hostname or IP address of your HTTPS Proxy Server (if any). Do not enter a URL address. Not required on Windows systems since LinkScan includes native support for the Secure Sockets Layer (SSL) and https:// addresses.
 
Imgtags Syntax: Imgtags = [AHW]
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Enables additional checking of <IMG SRC=...> tags for Alt, Height and Width attributes.
 
Import Syntax: Import = 0 | 1 | 2 | 3
Category: Import Default: none
Type: Single-valued Used by: linkscan.cfg
 
Instructs LinkScan to use Import Scanning.
Import = 1; Import ASCII list of links
Import = 2; Import ASCII list of documents
Import = 3; Import ASCII list of documents (with de-caching)
 
Importfile Syntax: Importfile = absolute-path
Category: Import Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the absolute pathname to the ASCII file to be processed when Import Scanning is selected.
 
Indexoptions [1] Syntax: Indexoptions = boolean
Category: File Default: Indexoptions = 0
Type: Single-valued Used by: linkscan.cfg
 
When Indexoptions = 1 and File System Scanning is enabled, LinkScan will create a directory listing when no Defaultpages (e.g. index.html) are present.
 
Insertlink [1] Syntax: Insertlink document-match new-document [-|+|*]
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
May be used to insert synthetic links into a scanned document.
 
Jisencode Syntax: Jisencode = boolean
Category: CustomReport Default: Jisencode = 0
Type: Single-valued Used by: linkscan.cfg
 
Enable when scanning Japanese language websites. Pages containing JIS, Shift-JIS and/or EUC-JP encoded Japanese characters will be normalized to EUC-JP. See also Displaylang.
 
Key [1] Syntax: Key = special-key
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the LinkScan License Key -- supplied by Elsop.
 
LicenseNumber [1] Syntax: LicenseNumber = integer (10-digit)
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the LinkScan License Number -- supplied by Elsop.
 
Licensee [1] Syntax: Licensee = name
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Name of your Company or Department.
 
Linespeed [1] Syntax: Linespeed = integer
Category: System Default: Linespeed = 1
Type: Single-valued Used by: linkscan.sys
 
Sets a default linespeed for the calculation of document load times on the Summary/Detail Report.
 
Linkscancookie Syntax: Linkscancookie = boolean
Category: Security Default: Linkscancookie = 0
Type: Single-valued Used by: linkscan.sys
 
Define the type of Cookie used by the LinkScan Reporting System (i.e. linkscan.cgi) for storing user preferences. 0=Permanent cookie; 1=Session cookie; 2=No cookie
 
Linkscandir [1] Syntax: Linkscandir = absolute-path
Category: System Default: Linkscandir = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the absolute pathname to the directory in which LinkScan is installed.
 
Linkscanurl [1] Syntax: Linkscanurl = absolute-url
Category: System Default: Linkscanurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which LinkScan is installed.
 
Longurls Syntax: Longurls = boolean
Category: System Default: Longurls = 0
Type: Single-valued Used by: linkscan.sys
 
Force LinkScan CGIs to generate long URLs with the Pref parameter.
 
Mailalias [1] Syntax: Mailalias expression address [, address]...
Category: Dispatch Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Sets associations between Owners matching expression and a comma separated list of e-mail addresses.
Mailalias Products sales@example.com, product-manager@example.com
 
Mailfrom [1] Syntax: Mailfrom = username
Category: External Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the address to use for FROM messages. Only used when active mailto: checking is enabled.
 
Mailhost [1] Syntax: Mailhost = hostname
Category: Dispatch Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the default hostname for LinkScan Dispatch reports sent via e-mail. By default, all reports are mailed to Owner@Mailhost. See Mailalias if you need more control.
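
The interaction between Mailhost and Mailalias can be sketched in Python as follows. The sketch assumes that the first matching Mailalias wins and that expressions are matched from the start of the Owner name; neither detail is confirmed by this section. The name recipients is hypothetical.

```python
import re


def recipients(owner, mailhost, mailaliases):
    """Return the e-mail addresses that receive the Dispatch report for owner.

    mailaliases is a list of (expression, [address, ...]) pairs taken from
    Mailalias commands. Without a match, the default Owner@Mailhost is used.
    """
    for expression, addresses in mailaliases:
        if re.match(expression, owner):
            return list(addresses)
    return [f"{owner}@{mailhost}"]


if __name__ == "__main__":
    aliases = [("Products", ["sales@example.com", "product-manager@example.com"])]
    print(recipients("Products", "example.com", aliases))
    print(recipients("webmaster", "example.com", aliases))
```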
 
Mailnoerr [1] Syntax: Mailnoerr = boolean
Category: Dispatch Default: Mailnoerr = 0
Type: Single-valued Used by: linkscan.cfg
 
When Mailnoerr = 1 LinkScan Dispatch will e-mail reports to their respective Owners even when no broken links were detected.
 
Mailto [1] Syntax: Mailto = integer
Category: Security Default: Mailto = 0
Type: Single-valued Used by: linkscan.sys
 
Enable Mailto forms on the LinkScan reports. Setting Mailto=2 will add a comment box to the form. The Mailto option requires that the LinkScan to Email Interface be configured.
 
Mapdefaulttitle [1] Syntax: Mapdefaulttitle [ string ] [ !PATH | !FILE ] [ string ]
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Defines a default Title for SiteMap/TapMap; used when no actual <title> tags were seen. The special tokens !PATH and !FILE are replaced with the actual pathnames or filenames, respectively.
Mapdefaulttitle = No title tags in !PATH
 
Mapext [1] Syntax: Mapext boolean
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Include External Links on the SiteMap.
Mapext = 1
 
Mapfiles [1] Syntax: Mapfiles = file-extension [, file-extension]...
Category: File Default: Mapfiles = map
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted as a server-side image map file.
 
Maphide [1] Syntax: Maphide relative-path-expression
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Documents matching relative-path-expression are hidden from the SiteMap and TapMap.
Maphide .*messages/
 
Mapinclude [1] Syntax: Mapinclude relative-path-expression
Category: SiteMap Default: Mapinclude HTML Documents
Type: Multi-valued Used by: linkscan.cfg
 
Documents matching relative-path-expression are included in the SiteMap and TapMap. By default, only HTML documents are included; links to images and other file types are hidden. You may include all files by using, for example:
Mapinclude .*
 
Mapmove [1] Syntax: Mapmove relative-document-path, new-parent-relative-path, position [, new-title]
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Used to customize the SiteMap and TapMap by forcing specific documents to be assigned to different positions in the hierarchy.
Mapmove child.html, parent.html, 1
 
Maptitle [1] Syntax: Maptitle relative-document-path, string
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Replace the actual title of document relative-document-path with string.
 
Mask Syntax: Mask = relative-path-expression
Category: Scope Default: none
Type: Single-valued Used by: linkscan.cfg
 
Directly equivalent to Onlyinclude except that Mask is single-valued.
 
Masterhist Syntax: Masterhist = boolean
Category: External Default: Masterhist = 1
Type: Single-valued Used by: linkscan.sys
 
When Masterhist = 1 LinkScan maintains the status of external links in a global history file shared between all Projects.
 
Masterport Syntax: Masterport = port#
Category: System Default: Masterport = 8010
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Defines a TCP/IP Port Number on your computer. LinkScan uses this Port and the following "N" ports for its own interprocess communication. "N" is defined by the maximum number of Slave processes used during the scan. You will not normally need to change this unless the default Port is being used by another application.
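
When checking for conflicts with other applications, it can help to list the full port range. The Python sketch below does this arithmetic; ports_in_use is a hypothetical name, and the Slave count in the example is arbitrary.

```python
def ports_in_use(masterport, max_slaves):
    """List the ports used: Masterport itself plus the following max_slaves ports."""
    return list(range(masterport, masterport + max_slaves + 1))


if __name__ == "__main__":
    # Default Masterport with, for example, four Slave processes.
    print(ports_in_use(8010, 4))
```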
 
Maxbadhours [1] Syntax: Maxbadhours = integer
Category: External Default: Maxbadhours = 0
Type: Single-valued Used by: linkscan.sys
 
Do not check Bad External links more frequently than once every integer hours.
 
Maxcgi [1] Syntax: Maxcgi = integer
Category: Scope Default: Maxcgi = 100
Type: Single-valued Used by: linkscan.cfg
 
Controls the maximum number of times any given base URL will be tested with different query strings. Avoids the potential for excessive and potentially infinite iteration over many query strings. See also the Taglimit option, which provides even finer control.
 
Maxclicks [1] Syntax: Maxclicks = integer
Category: Scope Default: Maxclicks = 0
Type: Single-valued Used by: linkscan.cfg
 
Limit the scope of a scan to "N" click levels deep.
 
Maxdirlevels [1] Syntax: Maxdirlevels = integer
Category: File Default: Maxdirlevels = 10
Type: Single-valued Used by: linkscan.cfg
 
Do not scan the File System more than integer directory levels deep when scanning for Orphaned Files. Avoids recursion issues with Symlinks on Unix systems.
 
Maxdns [1] Syntax: Maxdns = integer
Category: External Default: Maxdns = 3
Type: Single-valued Used by: linkscan.cfg
 
Defines the maximum number of HTTP redirections to be followed when fetching a given URL (detect/protect potential loops).
 
Maxdocbytes [1] Syntax: Maxdocbytes = integer
Category: CustomScan Default: Maxdocbytes = none
Type: Single-valued Used by: linkscan.cfg
 
Defines the maximum size of a document body that will be fetched when scanning a remote server. Typically used to prevent excessive delays while LinkScan fetches very large PDF documents.
 
Maxdocs Syntax: Maxdocs = integer
Category: Scope Default: Maxdocs = 0
Type: Single-valued Used by: linkscan.cfg
 
Forces LinkScan to check (completely) the first Maxdocs pages only. Useful for quickly checking the first "N" pages of a website.
 
Maxftp [1] Syntax: Maxftp = integer
Category: External Default: Maxftp = 25
Type: Single-valued Used by: linkscan.cfg
 
Do not test more than integer links to any one FTP server. This prevents excessive/inappropriate loads on the remote server. The FTP protocol carries significantly more overhead than HTTP.
 
Maxgoodhours [1] Syntax: Maxgoodhours = integer
Category: External Default: Maxgoodhours = 4
Type: Single-valued Used by: linkscan.sys
 
Do not check Good External links more frequently than once every integer hours.
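
The combined effect of Maxgoodhours and Maxbadhours can be sketched in Python as below. The function name needs_recheck and its arguments are hypothetical; the defaults are the documented ones (Maxgoodhours = 4, Maxbadhours = 0).

```python
from datetime import datetime, timedelta


def needs_recheck(last_checked, was_good, now, maxgoodhours=4, maxbadhours=0):
    """Return True when an External link is due to be tested again."""
    hours = maxgoodhours if was_good else maxbadhours
    return now - last_checked >= timedelta(hours=hours)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    two_hours_ago = now - timedelta(hours=2)
    print(needs_recheck(two_hours_ago, was_good=True, now=now))
    print(needs_recheck(two_hours_ago, was_good=False, now=now))
```

With the defaults, a link that tested Good two hours ago is not retested, while a Bad link is retested on every scan.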
 
Maxhist Syntax: Maxhist = integer
Category: External Default: Maxhist = 10
Type: Single-valued Used by: linkscan.sys
 
For External links, store the last integer results in the History file.
 
Maxlevels Syntax: Maxlevels = integer
Category: Scope Default: Maxlevels = 0
Type: Single-valued Used by: linkscan.cfg
 
Limit the scope of a scan to "N" directory levels.
 
Maxredir Syntax: Maxredir = integer
Category: CustomScan Default: Maxredir = 5
Type: Single-valued Used by: linkscan.cfg
 
Defines the maximum number of HTTP redirections to be followed when fetching a given URL (detect/protect potential loops).
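
The loop protection can be illustrated with the Python sketch below, in which a dictionary stands in for the server's redirect responses. The names follow_redirects and redirects are hypothetical and the code is not LinkScan's.

```python
def follow_redirects(url, redirects, maxredir=5):
    """Follow redirections and return the final URL.

    redirects maps a URL to the location it redirects to. Returns None when
    more than maxredir redirections would be needed, which is how a
    redirect loop shows up.
    """
    hops = 0
    while url in redirects:
        if hops >= maxredir:
            return None
        url = redirects[url]
        hops += 1
    return url


if __name__ == "__main__":
    chain = {"/old": "/newer", "/newer": "/newest"}
    loop = {"/a": "/b", "/b": "/a"}
    print(follow_redirects("/old", chain))
    print(follow_redirects("/a", loop))
```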
 
Maxservertries [1] Syntax: Maxservertries = integer
Category: External Default: Maxservertries = 25
Type: Single-valued Used by: linkscan.cfg
 
When validating External links, abort testing of all links to a host that has already recorded more than integer errors. This prevents LinkScan from attempting to check many links to a host that may be temporarily unavailable (and hence multiple timeout delays).
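
A minimal Python sketch of this per-host cut-off follows. The probe callback, the result labels and the name check_external are all hypothetical; the sketch only illustrates the counting rule described above.

```python
from collections import Counter
from urllib.parse import urlsplit


def check_external(urls, probe, maxservertries=25):
    """Test each URL with probe(url), skipping hosts with too many errors.

    probe is a caller-supplied function returning True for a good link.
    Once a host has recorded more than maxservertries errors, its
    remaining links are marked "skipped" without being tested.
    """
    errors = Counter()
    results = {}
    for url in urls:
        host = urlsplit(url).hostname
        if errors[host] > maxservertries:
            results[url] = "skipped"
            continue
        ok = probe(url)
        results[url] = "ok" if ok else "error"
        if not ok:
            errors[host] += 1
    return results


if __name__ == "__main__":
    links = [f"http://down.example.com/page{i}.html" for i in range(5)]
    print(check_external(links, probe=lambda url: False, maxservertries=2))
```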
 
Maxsev [1] Syntax: Maxsev = severity
Category: Dispatch Default: Maxsev = 3
Type: Single-valued Used by: linkscan.cfg
 
Defines the maximum severity level to be included in the LinkScan Dispatch Reports.
 
Mimetypes Syntax: Mimetypes mime-type [D|H|J|S|T]
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Enables the scanning (via HTTP) of additional document types based on their MIME (Content-type) header. Analogous to the File System Scanning equivalents: Htmlfiles, Mapfiles, Pdffiles, Flashfiles and Textfiles. Documents are interpreted as follows: D=PDF, H=HTML, J=JavaScript, S=Shockwave/Flash, T=Text.
Mimetypes application/x-javascript J
 
Mirrorurl [1] Syntax: Mirrorurl = absolute-url
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Instructs LinkScan to send all HTTP requests to the Mirrorurl address even though, logically, it behaves as if it is scanning a different host.
Mirrorurl = http://staging.example.com/
 
Msiis [1] Syntax: Msiis = boolean
Category: System Default: Msiis = 0
Type: Single-valued Used by: linkscan.sys
 
Set Msiis = 1 when you are using LinkScan in conjunction with a Microsoft IIS/PWS installation running on your computer. This enables a workaround to an IIS bug.
 
Nameservers [1] Syntax: Nameservers = ipaddress [, ipaddress]...
Category: External Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets default name servers. Only used when active mailto: checking is enabled. See references.
 
Noexternal Syntax: Noexternal = boolean
Category: External Default: Noexternal = 0
Type: Single-valued Used by: linkscan.cfg
 
When Noexternal = 1 disable validation of all External links.
 
Nofollow [1] Syntax: Nofollow relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Do not analyze documents matching relative-path-expression. LinkScan will validate links to pages matching this pattern but it will ignore all links flowing out of pages matching this pattern.
 
Noforms [1] Syntax: Noforms = boolean
Category: CustomScan Default: Noforms = 0
Type: Single-valued Used by: linkscan.cfg
 
When Noforms = 1 do not validate links found within <FORM ACTION=...> tags.
 
Noindex [1] Syntax: Noindex = boolean
Category: CustomScan Default: Noindex = 0
Type: Single-valued Used by: linkscan.cfg
 
Ignore links contained within <NOINDEX></NOINDEX> code blocks unless they are unique (i.e. new and not already seen during the current scan).
 
Noorphan [1] Syntax: Noorphan = boolean
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
Do not scan for Orphaned Files (equiv. -noorphans).
 
Noorphans [1] Syntax: Noorphans relative-path-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Do not scan directories matching relative-path-expression for Orphaned Files.
 
Noprojectlist Syntax: Noprojectlist = boolean
Category: Security Default: Noprojectlist = 0
Type: Single-valued Used by: linkscan.sys
 
When Noprojectlist = 1 prompt for the Project instead of displaying a drop-down list.
 
Noproxy [1] Syntax: Noproxy = hostname-expression [, hostname-expression]...
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Bypass any configured Proxy Server and use direct Network (HTTP) access to any hosts matching hostname-expression.
 
Nostaticmenu Syntax: Nostaticmenu = boolean
Category: Security Default: Nostaticmenu = 0
Type: Single-valued Used by: linkscan.sys
 
When Nostaticmenu = 1 disable the LinkScan Toolbar on command-line generated reports.
 
Notapmapoptions Syntax: Notapmapoptions = boolean
Category: Security Default: Notapmapoptions = 0
Type: Single-valued Used by: linkscan.sys
 
When Notapmapoptions = 1 disable the Options Menu on LinkScan/TapMap.
 
Onlyfollow Syntax: Onlyfollow relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Only scan areas of the website matching relative-path-expression. Validate but do not follow all other Internal links.
 
Onlyinclude Syntax: Onlyinclude relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Only scan areas of the website matching relative-path-expression. Completely ignore all other Internal links.
 
Onlyorphans [1] Syntax: Onlyorphans relative-path-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Only scan directories matching relative-path-expression for Orphaned Files.
 
Organization Syntax: Organization = string
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
Name of the organization/department associated with this Project (will appear on the subsequent reports).
 
Orphanfile [1] Syntax: Orphanfile = absolute-path
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
Specifies the absolute pathname to a file containing data regarding orphaned files, created by the lsfind utility. See references.
 
Owner [1] Syntax: Owner relative-path-expression owner-name
Category: Owner Default: Owner *1
Type: Multi-valued Used by: linkscan.cfg
 
Set document ownership. Documents with pathnames matching relative-path-expression are assigned to owner-name.
Owner mydirectory/ ownedbyme
 
Owneralias [1] Syntax: Owneralias expression owner-name
Category: Owner Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Used to manipulate Ownernames. Normally used in conjunction with Ownertags. See references.
 
Ownerq [1] Syntax: Ownerq relative-path-expression owner-name
Category: Owner Default: Ownerq *1
Type: Multi-valued Used by: linkscan.cfg
 
Set document ownership. Documents with pathnames matching relative-path-expression are assigned to owner-name. Unlike the Owner command which operates on the pathname portion of the URL, Ownerq operates on the full URL including any query string.
Ownerq somescript\?.*SomeOwnerParam=([^&]+) $1
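
The Ownerq example above can be tried outside LinkScan. The following is a minimal Python sketch, assuming the expression behaves like an ordinary Perl-style regular expression searched anywhere in the URL and that $1 refers to the first capture group. The sample URL and the function name are hypothetical.

```python
import re
from typing import Optional

# Expression from the Ownerq example; the capture group plays the role of $1.
OWNERQ_PATTERN = re.compile(r"somescript\?.*SomeOwnerParam=([^&]+)")


def owner_from_url(url: str) -> Optional[str]:
    """Return the owner name captured from the query string, or None."""
    match = OWNERQ_PATTERN.search(url)
    return match.group(1) if match else None


# Hypothetical URL used only for illustration.
sample = "http://www.example.com/cgi-bin/somescript?id=7&SomeOwnerParam=alice&lang=en"
print(owner_from_url(sample))  # prints "alice"
```

Because the group is [^&]+, the captured owner name stops at the next & in the query string.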
 
Ownertags [1] Syntax: Ownertags = expression
Category: Owner Default: none
Type: Single-valued Used by: linkscan.cfg
 
Used to assign document Ownership based on META tags. See references.
 
Pdffiles [1] Syntax: Pdffiles = file-extension [, file-extension]...
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted using the PDF Document format. When using Network (HTTP) Scanning, a non-blank entry causes LinkScan to interpret any link with a Content-Type: application/pdf header using the PDF Document format.
 
Perlpath [1] Syntax: Perlpath = absolute-path
Category: System Default: Perlpath = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Absolute pathname to the Perl executable on your computer.
 
Probe [1] Syntax: Probe = integer
Category: CustomScan Default: Probe = 4
Type: Single-valued Used by: linkscan.cfg
 
Enable LinkScan diagnostic trace -- written to .../project-name/data/linkscan.red. The following bit-wise switches may be logically OR'ed:
1 = Trace full HTTP Headers
2 = Trace full HTTP Headers and (HTML) Document Bodies
4 = Trace all Cookies, Auth Requests and Sessionmatch operations
8 = Reserved for LinkScan Technical Support
128 = Disable all buffering on linkscan.red
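
As a minimal sketch of how the bit-wise switches combine, the following Python snippet ORs the documented values together to produce the integer for the Probe setting. The constant and function names are illustrative labels, not LinkScan identifiers.

```python
# Documented Probe switch values; the names are illustrative only.
TRACE_HEADERS = 1           # Trace full HTTP Headers
TRACE_HEADERS_BODIES = 2    # Trace full HTTP Headers and (HTML) Document Bodies
TRACE_COOKIES_AUTH = 4      # Trace all Cookies, Auth Requests and Sessionmatch operations
DISABLE_BUFFERING = 128     # Disable all buffering on linkscan.red


def probe_value(*switches: int) -> int:
    """Combine the selected switches with a bit-wise OR."""
    value = 0
    for switch in switches:
        value |= switch
    return value


# Full headers plus cookie tracing, with buffering disabled: 1 | 4 | 128.
setting = probe_value(TRACE_HEADERS, TRACE_COOKIES_AUTH, DISABLE_BUFFERING)
print(f"Probe = {setting}")  # prints "Probe = 133"
```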
 
Profiler [1] Syntax: Profiler = integer
Category: CustomScan Default: Profiler = 0
Type: Single-valued Used by: linkscan.cfg
 
Enables the LinkScan Profiler.
Profiler = 1 # Profile internal links
 
Profilerlog [1] Syntax: Profilerlog = integer
Category: CustomScan Default: Profilerlog = 0
Type: Single-valued Used by: linkscan.cfg
 
Enables a detailed trace of the LinkScan Profiler results. The log is written to: .../LinkScan/Projectname/data/linkscan.red
 
Profilermax [1] Syntax: Profilermax = integer
Category: CustomScan Default: Profilermax = 200
Type: Single-valued Used by: linkscan.cfg
 
Sets the trigger level threshold for the LinkScan Profiler.
 
Projectdesc Syntax: Projectdesc = string
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
A description for this Project (will appear on the subsequent reports).
 
Proxyauth [1] Syntax: Proxyauth = "username:password"
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the username and password to use in conjunction with a Proxy Server that requires authentication (if any).
Proxyauth = "mylogin:mysecretpass"
 
Proxymatch [1] Syntax: Proxymatch [http|https|*] [host:port|direct] ["user:pass"] host1, host2...
Category: System Default: none
Type: Multi-valued Used by: linkscan.sys
 
The Proxymatch command may be used to configure complex proxy rules that are not handled by the (simpler) Proxyserver/Proxyport commands. Multiple Proxymatch commands are evaluated in the order specified with the last match assuming precedence.
 
Proxyport [1] Syntax: Proxyport = integer
Category: System Default: Proxyport = 80
Type: Single-valued Used by: linkscan.sys
 
Sets the Port number to use in conjunction with your Proxy Server (if any).
 
Proxyserver [1] Syntax: Proxyserver = hostname
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the Hostname or IP address of your HTTP Proxy Server (if any). Do not enter a URL address.
 
Redirect Syntax: Redirect relative-path-expression absolute-url-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Used to simulate a webserver configured redirection when using File System Scanning.
Redirect documents/oldpage.html http://www.example.com/html/newpage.html
 
Relaxanchor Syntax: Relaxanchor = boolean
Category: CustomScan Default: Relaxanchor = 0
Type: Single-valued Used by: linkscan.cfg
 
Enable relaxed anchor checking. Anchor checks are made case insensitive. Superfluous '#' characters at the beginning of the NAME attribute are ignored.
 
Reportsdir [1] Syntax: Reportsdir = absolute-path
Category: CustomReport Default: Reportsdir = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the path to the directory in which the LinkScan reports are created. Only used when generating reports from the command-line.
 
Retryext [1] Syntax: Retryext = boolean
Category: External Default: Retryext = 0
Type: Single-valued Used by: linkscan.cfg
 
When Retryext=1, LinkScan will track all External links that appear to fail due to network related errors (e.g. DNS, connect and timeout errors). These links will be retested at the end of the scan. This tends to reduce the number of transient errors reported but the scan may require a little more time to complete.
 
Scriptdisable Syntax: Scriptdisable = boolean
Category: JavaScript Default: Scriptdisable = 0
Type: Single-valued Used by: linkscan.cfg
 
Disable checking of links embedded within JavaScript. Equivalent to: Scriptexclude .*
 
Scriptexclude [1] Syntax: Scriptexclude expression
Category: JavaScript Default: none
Type: Multi-valued Used by: linkscan.cfg
 
JavaScript code blocks matching expression are discarded and not scanned for links.
 
Scriptmatch [1] Syntax: Scriptmatch expression
Category: JavaScript Default: Scriptmatch (\w+://\S+|\S+/$|\S+\?\S+|\S+\.([a-z]{2,3}|[js]?html?|Z)$)
Type: Multi-valued Used by: linkscan.cfg
 
Patterns used to control the scanning of JavaScript constructs. You should not normally need to change these from their defaults.
 
Scriptnomatch [1] Syntax: Scriptnomatch expression
Category: JavaScript Default: Scriptnomatch .*([\(\)\[\]\{\}\']|document\.\S+|\.(src|com)$)
Type: Multi-valued Used by: linkscan.cfg
 
Patterns used to control the scanning of JavaScript constructs. You should not normally need to change these from their defaults.
 
Selecturl [1] Syntax: Selecturl expression
Category: JavaScript Default: none
Type: Multi-valued Used by: linkscan.cfg
 
The contents of select tags (drop-down lists) with a name attribute matching expression are processed as links rather than as arbitrary data.
 
Sendmailpath [1] [2] Syntax: Sendmailpath = absolute-path
Category: Dispatch Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the absolute pathname to the sendmail executable on your computer.
 
Sessionmatch [1] Syntax: Sessionmatch = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Used to capture, save and manipulate items such as session numbers. See references.
 
Showredirectext Syntax: Showredirectext = boolean
Category: CustomScan Default: Showredirectext = 0
Type: Single-valued Used by: linkscan.cfg
 
When checking External links, LinkScan will report any redirections and report the status of the final (redirected) link.
 
Slaves1 Syntax: Slaves1 = integer
Category: System Default: Slaves1 = 3
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the Internal links.
 
Slaves2 Syntax: Slaves2 = integer
Category: System Default: Slaves2 = 3
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the External links.
 
Slavesfast1 Syntax: Slavesfast1 = integer
Category: System Default: Slavesfast1 = 5
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the Internal links with the -fast option.
 
Slavesfast2 Syntax: Slavesfast2 = integer
Category: System Default: Slavesfast2 = 12
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the External links with the -fast option.
 
Smtphost [1] Syntax: Smtphost = hostname
Category: System Default: Smtphost = 12
Type: Single-valued Used by: linkscan.sys
 
Sets the SMTP hostname used for the distribution of emailed reports (Windows systems only).
 
Statuscode [1] Syntax: Statuscode statuscode, severity
Category: CustomReport Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Modifies the severity associated with statuscode.
1=Error; 2=Possible Error; 3=Warning; 4=Advisory; 5=Good.
Statuscode = 301,3 # 301 (Moved Permanently) from Error to Warning
 
Substitute [1] Syntax: Substitute relative-path-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Manipulate links on-the-fly. See references.
 
Substituteraw [1] Syntax: Substituteraw relative-path-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Manipulate links on-the-fly. See references.
 
Substitutescript [1] Syntax: Substitutescript relative-path-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Manipulate links on-the-fly. See references.
 
Taglimit [1] Syntax: Taglimit relative-path-expression integer
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
When integer links matching relative-path-expression have been scanned, LinkScan ignores all subsequent matching links.
 
Tagonce [1] Syntax: Tagonce relative-path-expression
Category: Database Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Links matching relative-path-expression are stored only once, regardless of how many references are seen. Typically used to prevent thousands of references to "blank/filler" images from adding excessive bulk to the LinkScan database.
Tagonce .*blank\.gif$
 
Textfiles [1] Syntax: Textfiles = file-extension [, file-extension]...
Category: File Default: Textfiles = txt
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted as a plaintext document. When using Network (HTTP) Scanning, any link with a Content-Type: text/plain header is interpreted as indicating plaintext format.
 
Timeout1 Syntax: Timeout1 = integer
Category: System Default: Timeout1 = 20
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Timeout (in seconds) for first attempt to contact site.
 
Timeout2 Syntax: Timeout2 = integer
Category: System Default: Timeout2 = 40
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Timeout (in seconds) for second attempt to contact site.
 
Unsafechar [1] Syntax: Unsafechar = string
Category: Misc Default: Unsafechar = <>`"
Type: Single-valued Used by: linkscan.cfg
 
Unsafe characters. Do not escape these.
 
Usecookiefile Syntax: Usecookiefile = boolean
Category: CustomScan Default: Usecookiefile = 1
Type: Single-valued Used by: linkscan.cfg
 
If enabled, LinkScan will pre-load its cookie-jar from the file cookies.txt in the current Project directory.
 
Useloginfile Syntax: Useloginfile = boolean
Category: CustomScan Default: Useloginfile = 1
Type: Single-valued Used by: linkscan.cfg
 
If enabled, LinkScan will process any links contained within the file login.txt in the current Project directory, prior to the start of the scan.
 
Userdata Syntax: Userdata [123] match-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Extract user specified data from document (e.g. from META tags).
Userdata 1 (?i)<meta[^>]*emp-badge-no\s*=\s*"(\d+) $1
 
Userdatafmt Syntax: Userdatafmt [123] [DHLTX] integer[LRC] caption
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Format user specified data. D=date; H=hot links; L=link; T=truncate to format; X=normal
20R=20 chars right adjusted; 40L=40 chars left adjusted
Userdatafmt 1 X 10R Badge Number
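
The Userdata and Userdatafmt examples above can be illustrated together. This is a rough Python sketch, assuming the match-expression is searched as a Perl-style regular expression with $1 as the first capture group, and that C in the width specifier means centered (the text documents only L and R). The sample HTML and the function names are hypothetical.

```python
import re

# match-expression from the Userdata example; group 1 corresponds to $1.
USERDATA_1 = re.compile(r'(?i)<meta[^>]*emp-badge-no\s*=\s*"(\d+)')


def extract_userdata(html: str) -> str:
    """Return the captured value, or an empty string when nothing matches."""
    match = USERDATA_1.search(html)
    return match.group(1) if match else ""


def adjust(value: str, width: int, align: str) -> str:
    """Pad value to width characters: L=left, R=right, C=centered (assumed)."""
    if align == "R":
        return value.rjust(width)
    if align == "C":
        return value.center(width)
    return value.ljust(width)


# Hypothetical document fragment used only for illustration.
page = '<HTML><HEAD><META NAME="staff" EMP-BADGE-NO = "12345"></HEAD></HTML>'
badge = extract_userdata(page)
print("[" + adjust(badge, 10, "R") + "]")  # 10R: 10 chars right adjusted
```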
 
Userdatasub Syntax: Userdatasub [123] expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Perform RegExp manipulations on user data fields.
 
Weblintoptions [1] [2] Syntax: Weblintoptions = string
Category: System Default: Weblintoptions = -d extension-markup,extension-attribute
Type: Single-valued Used by: linkscan.sys
 
Sets command-line options that are automatically passed to weblint.
 
Weblintpath [1] [2] Syntax: Weblintpath = absolute-path
Category: System Default: Weblintpath = C:/LinkScan/weblint/weblint
Type: Single-valued Used by: linkscan.sys
 
Sets the full pathname to the weblint executable.
 
Wildtlds [1] Syntax: Wildtlds = comma separated list of TLD's
Category: System Default: Wildtlds = com,net
Type: Single-valued Used by: linkscan.sys
 
Enable checks for wildcard records in the listed Top Level Domains (TLD's). Prevents false negatives on DNS lookups caused by TLD wildcard records.
 
Winhttp Syntax: Winhttp = boolean
Category: Security Default: Winhttp = 0
Type: Single-valued Used by: linkscan.cfg
 
Use the native Microsoft Windows implementation of HTTP. Useful when "NTLM" authentication is required.
 
Xmeta Syntax: Xmeta = expression
Category: CustomScan Default: Xmeta = 0
Type: Single-valued Used by: linkscan.cfg
 
Extract an extra meta tag (matching expression) from each HTML document. Only effective in conjunction with Collectmeta.
 
Xmlmatch Syntax: Xmlmatch expression
Category: CustomScan Default: Xmlmatch 0
Type: Multi-valued Used by: linkscan.cfg
 
Define patterns for link extraction from XML documents.
 
Xmlnomatch Syntax: Xmlnomatch expression
Category: CustomScan Default: Xmlnomatch 0
Type: Multi-valued Used by: linkscan.cfg
 
Define exclusion patterns for link extraction from XML documents.
 

LinkScan for Unix. Reference Manual. Section 27

LinkScan and Various Web Servers

This section discusses the use of LinkScan in conjunction with various web servers and the associated security implications:

  1. Web Server Requirements
  2. LinkScan and Apache
  3. LinkScan Access Controls
  4. LinkScan Security Considerations

27.1 Web Server Requirements

When LinkScan is used to scan a website, the results are stored in the LinkScan database. Reports are created by executing queries against that database with several CGI programs that are supplied with LinkScan.

Hence, LinkScan will normally require that web server software be installed, configured and running on the installation computer. Note that LinkScan doesn't require access to a local web server in order to scan a web site. But a local web server is usually required to view the results of that scan.

  • On Unix Systems the LinkScan installation procedure assumes the availability of an existing web server, often the Apache system. See LinkScan Installation and Startup Guide for Unix Systems.

    The remainder of this section describes the use of LinkScan with various web servers and discusses the associated security considerations.

    27.2 LinkScan and Apache

    When using LinkScan with Apache (and most other web servers) two sets of considerations must be addressed:

    Apache Requirements

    Apache normally requires that several conditions be satisfied before it will execute the LinkScan CGI programs -- or any other CGI program, for that matter:

    1. The CGI programs must be installed in a directory that is configured to permit CGI executions. This is typically a cgi-bin directory configured with an Apache ScriptAlias directive. However, any directory may be configured to permit CGI executions with the Apache Options ExecCGI directive.
    2. The CGI programs must have an appropriate file extension. Typically you will need an Apache AddHandler cgi-script .cgi directive.
    3. The CGI program and the directory in which it resides will require appropriate permissions. Typically, one would use 711 for the directory and 755 for the CGI file.
    4. The CGI program must not be owned by nobody.
    5. The CGI program must include a valid shebang header pointing at the Perl 5 executable on your computer. For example:

      #!/usr/local/bin/perl

    Unless all of the above are satisfied, Apache will refuse to execute the CGI program and you will likely receive a 500 Server Error or 403 Forbidden response.
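
Some of the conditions above (permissions, ownership and the shebang header) can be checked before involving Apache at all. The following Python sketch is one possible pre-flight check for Unix systems. It is not part of LinkScan, the function name is hypothetical, and it does not inspect the Apache configuration itself (items 1 and 2).

```python
import os
import pwd
import stat
from typing import List


def check_cgi(path: str) -> List[str]:
    """Return a list of likely problems with the CGI program at path."""
    problems = []
    info = os.stat(path)
    directory = os.path.dirname(os.path.abspath(path))

    # Item 3: 755 on the file means everyone can read and execute it.
    if stat.S_IMODE(info.st_mode) & 0o555 != 0o555:
        problems.append("file is not readable and executable by all (expected 755)")
    # Item 3: 711 on the directory means everyone can traverse it.
    if stat.S_IMODE(os.stat(directory).st_mode) & 0o111 != 0o111:
        problems.append("directory is not searchable by all (expected 711)")

    # Item 4: the file must not be owned by the user "nobody".
    try:
        if pwd.getpwuid(info.st_uid).pw_name == "nobody":
            problems.append("file is owned by nobody")
    except KeyError:
        pass  # the uid has no passwd entry, so the owner cannot be named

    # Item 5: the first line must be a shebang naming an existing interpreter.
    with open(path, "rb") as handle:
        first_line = handle.readline().decode("ascii", "replace").strip()
    interpreter = first_line[2:].split()[:1]
    if not first_line.startswith("#!") or not interpreter:
        problems.append("missing or empty shebang line")
    elif not os.path.isfile(interpreter[0]):
        problems.append("shebang interpreter not found: " + interpreter[0])
    return problems
```

An empty list means none of the checked conditions failed. Apache may still refuse to run the program if the directory is not configured for CGI execution.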

    LinkScan Requirements

    LinkScan imposes certain additional (minimal) requirements:

    1. In the linkscan.sys configuration file, the Cgibinurl setting must be configured to point at the directory into which the LinkScan CGI programs have been installed. This is required in order that the LinkScan CGI programs can link to each other. For example: Cgibinurl = http://www.example.com/cgi-bin/
    2. In the linkscan.sys configuration file, the Docsurl setting must be configured to point at a directory containing the LinkScan documentation and associated images. For example: Docsurl = http://www.example.com/linkscan/docs/
    3. An additional requirement is imposed if (and only if) the LinkScan CGI programs are installed in a directory other than the main LinkScan directory (for example, if you moved them to a cgi-bin directory). In this case, the LinkScan CGI's will need to know where to find the rest of the LinkScan configuration files and databases. In the directory containing the LinkScan CGI programs, create a hidden file called .linkscan. This file needs to contain a single line entry with the full pathname to the main LinkScan directory. For example:

      /usr/linkscan/

      Be sure to include the leading and trailing forward-slash characters and make the file world readable (chmod 644 .linkscan).
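
The .linkscan file described above can also be created by a short script. This Python sketch writes the single-line entry, makes sure the path has leading and trailing slashes, and applies mode 644. The function name is hypothetical, and ending the line with a newline is an assumption.

```python
import os


def write_linkscan_pointer(cgi_dir: str, linkscan_dir: str) -> str:
    """Create the hidden .linkscan file in cgi_dir and return its path."""
    if not linkscan_dir.startswith("/"):
        raise ValueError("the LinkScan directory must be an absolute path")
    if not linkscan_dir.endswith("/"):
        linkscan_dir += "/"  # the entry needs a trailing forward slash
    pointer = os.path.join(cgi_dir, ".linkscan")
    with open(pointer, "w") as handle:
        handle.write(linkscan_dir + "\n")
    os.chmod(pointer, 0o644)  # world readable, as with chmod 644 .linkscan
    return pointer
```

For example, write_linkscan_pointer("/usr/www/cgi-bin", "/usr/linkscan") would write /usr/linkscan/ to /usr/www/cgi-bin/.linkscan. Both directory names here are placeholders.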

    Although the above guidelines are presented in the specific context of the Apache web server, the basic principles are quite generic and may easily be adapted to almost any web server. Note also that LinkScan provides considerable flexibility; you may install the LinkScan CGI programs in one directory, the documentation in another and the main LinkScan system including the databases in a third. Indeed, LinkScan may easily be configured to run in chroot and other similar environments.

    27.3 LinkScan Access Controls

    LinkScan includes some basic Access Controls that may be configured using the Access command in the configuration file linkscan.sys in the LinkScan directory. These access controls apply to CGI access only. It is assumed that standard operating system features will be used to control access by shell (command line) users.

    
    Access username : password : project-list : owner-list : menu-options
    

    An asterisk character may be used as a wildcard for any or all of the above parameters.

    Indeed, a default LinkScan installation will create the following entry in the linkscan.sys file, providing unrestricted access:

    
    Access = * : * : * : * : *
    

    Facilities are also provided to integrate with HTTP Authentication Schemes. LinkScan will check for the Environment Variable specified by the Httpauth parameter in linkscan.sys (normally REMOTE_USER). If this variable is present, it will be used to set the current Username. LinkScan will assume that the user has already authenticated with the HTTP server and it will not check the password field in linkscan.sys.

    Example: Here we have configured two users with different passwords. User 'admin' has unrestricted access, but user 'webmaster' may only access the two Projects specified. Also, the "Site History" and "System Configuration" Reports are not available to 'webmaster'.

    
    Access = admin : root : * : * : *
    Access = webmaster : html : www.example.com,devel.example.com : * : sxdcmoaqt
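
To make the field layout of these entries concrete, the following Python sketch splits an Access line into its five colon-separated fields and applies the asterisk wildcard to the project list. It only illustrates the documented syntax. How LinkScan itself parses and evaluates the entries is not described here, and the function and key names are hypothetical.

```python
from typing import Dict

ACCESS_FIELDS = ["username", "password", "projects", "owners", "menu_options"]


def parse_access(line: str) -> Dict[str, str]:
    """Split one Access entry into its five colon-separated fields."""
    body = line.strip()
    if not body.startswith("Access"):
        raise ValueError("not an Access entry")
    body = body[len("Access"):].lstrip()
    if body.startswith("="):
        body = body[1:]
    parts = [part.strip() for part in body.split(":")]
    if len(parts) != len(ACCESS_FIELDS):
        raise ValueError("expected five colon-separated fields")
    return dict(zip(ACCESS_FIELDS, parts))


def may_access_project(entry: Dict[str, str], project: str) -> bool:
    """Apply the project-list field, treating * as a wildcard."""
    allowed = entry["projects"]
    return allowed == "*" or project in [p.strip() for p in allowed.split(",")]


entry = parse_access(
    "Access = webmaster : html : www.example.com,devel.example.com : * : sxdcmoaqt"
)
print(may_access_project(entry, "devel.example.com"))  # prints "True"
print(may_access_project(entry, "intranet.example.com"))  # prints "False"
```

Note that this simple split would mishandle a field that itself contains a colon.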
    

    27.4 LinkScan Security Considerations

    LinkScan incorporates some simple access controls on the various Reporting options and selections when run as CGI scripts. No LinkScan-specific access controls are applied when accessing LinkScan via a shell (command line) interface; it is assumed that normal operating system access controls apply. The LinkScan access controls are subject to the many and varied limitations inherent within the CGI protocol (see the WWW CGI Security FAQ and other sources for further discussion). In summary, if your HTTP server can access any specific file, then any user with HTTP access to your server may be able to access that file. The LinkScan security features are provided as a convenience but they are no substitute for other more robust system-level security controls such as:

    We highly recommend that you configure HTTP Authentication of the LinkScan directory. Other measures you may wish to consider include:

    LinkScan for Unix. Reference Manual. Section 28

    LinkScan File Formats

    
    The following notes describe the format of many of
    the LinkScan database files stored in:
    
    ...LinkScan/ProjectName/data/
    ...LinkScan/ProjectName/hist/
    
    Each file is created in (mainly) ASCII format,
    with one Record per Line. Each Record contains
    a number of Fields, delimited with <Control-G>
    characters (Octal: 007). The Fields associated
    with each Record type are outlined below.
    
    idx.dat
    =======
    Establishes the mapping between an "idx" number and each
    unique Document/Link/URL examined by LinkScan.
    
     0 = idx
     1 = URL
     2 = Document Title
    
    
    doc.dat
    =======
    Contains the attributes and characteristics for each unique
    Document/Link/URL examined by LinkScan.
    
     0 = idx (see idx.dat)
     1 = URL
     2 = Owner Code (see linkscan.own)
     3 = Clicks
     4 = Link Type (see below)
     5 = Content-Type (MIME)
     6 = Link Status Code (see codes.txt)
     7 = Extended Status (normally blank)
     8 = Location for Redirect (see idx.dat)
     9 = Original Status Code (pre-redirect)
    10 = Content-Length (size in bytes)
    11 = Last-Modified (date/time)
    12 = Reserved
    13 = File System Pathname
    14 = Document Title
    15 = In-line bytes (page weight)
    16 = Number of Errors in this document
    17 = Number of Warnings in this document
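
A record in this layout can be split with a few lines of code. The following Python sketch maps the doc.dat fields listed above onto a dictionary, using the Control-G (octal 007) delimiter. The key names are illustrative labels for the documented field positions, the sample record is invented for the example, and reading the file as Latin-1 is an assumption made because the files are only "mainly" ASCII.

```python
from typing import Dict, Iterator

# Illustrative labels for doc.dat field positions 0 through 17.
DOC_FIELDS = [
    "idx", "url", "owner_code", "clicks", "link_type", "content_type",
    "status_code", "extended_status", "redirect_location",
    "original_status_code", "content_length", "last_modified", "reserved",
    "pathname", "title", "inline_bytes", "errors", "warnings",
]


def parse_doc_record(line: str) -> Dict[str, str]:
    """Split one doc.dat line on Control-G and label the fields."""
    values = line.rstrip("\r\n").split("\x07")
    return dict(zip(DOC_FIELDS, values))


def read_doc_file(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per record (one record per line)."""
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            if line.strip():
                yield parse_doc_record(line)


# Invented sample record, for illustration only.
sample = "\x07".join([
    "12", "http://www.example.com/index.html", "1", "0", "H", "text/html",
    "200", "", "", "", "5120", "", "", "/usr/www/index.html", "Home",
    "20480", "2", "1",
]) + "\n"
record = parse_doc_record(sample)
print(record["url"], record["errors"])
```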
    
    
    orp.dat
    =======
    Contains information concerning all Orphaned Files.
    
     0 = URL
     1 = File System Pathname
     2 = Symlink (0=No; 1=Followed symlink; 2=Is symlink)
     3 = File Size
     4 = Date/Time last modified
     5 = Owner Code (see linkscan.own)
     6 = Link Type (see below)
     7 = Link Status Code (see codes.txt)
    
    
    mad.dat and map.dat
    ===================
    Contain the LinkScan SiteMap Data
    mad.dat -- directory order
    map.dat -- link order
    
     0 = Level in Map
     1 = Dot-Decimal Notation
     2 = Document URL
     3 = Document Title
     4 = Owner Code (see linkscan.own)
     5 = Content-Length (size in bytes)
     6 = Last-Modified (date/time)
     7 = Total # of child documents for this node
    
    
    lnk.dat
    =======
    Contains the attributes of every link considered by LinkScan.
    
     0 = Owner Code (see linkscan.own)
     1 = From URL (see idx.dat)
     2 = Line Number (times 10)
     3 = To URL (see idx.dat)
     4 = Link Type Code (see below)
     5 = Link Status Code (see codes.txt)
     6 = Extended Status (normally blank)
     7 = cnt
     8 = Link Caption/Description
     9 = File Size (in-line images only)
    10 = Redirect location (see idx.dat)
    
    
    err.dat
    =======
    Subset of lnk.dat file, excluding records relating to all
    good links.
    
    
    linkscan.own
    ============
    Establishes the mapping between the Owner Code and Owner Name.
    
    0 = Owner Name
    1 = Owner Code
    
    
    linkscan.sum
    ============
    Summary Statistics Data (Note this file is TAB delimited)
    
     0 = Version
     1 = Date and time of scan
     2 = Total Documents
     3 = Missing Documents
     4 = Documents Containing Errors
     5 = Total Other Files
     6 = Missing Other Files
     7 = Total Anchors
     8 = Missing Anchors
     9 = Total External Links
    10 = External Links Tested This Scan
    11 = External Links with Errors
    12 = External Links with Possible Errors
    13 = External Links with Warnings
    14 = Total Orphans
    
    
    linkscan.tim
    ============
    HTTP Transaction Times (Note this file is TAB delimited)
    
    0   URL fetched
    1   HTTP status code (200, 404 etc)
    2   Document size (bytes)
    3   Document Body flag (0=not available; 1=available but not fetched;
                            2=available and fetched)
    4   Transaction time (milliseconds)
    5   Redirect location
    
    Notes:
    * Transaction Time includes time to follow any redirects.
    * Time includes time to fetch document body on HTML
      and similar MIME types only.
    * On other file types (images for example) the transaction
      time does NOT include the body download. But it does
      measure the time and network/server latency for the
      exchange of full request and response headers. The
      additional time could be computed from the file size
      and a knowledge of the available connection bandwidth.
      It's likely to be quite accurate given that the HTTP
      server has only to push the data from an already found
      file down an already open socket, to the client. Since
      most image file formats incorporate compression, you're
      unlikely to see any further savings even if the
      connection type supported such a scheme.
    * Timing will be impacted by # of processes used for
      the scan and also, to some extent, the relative
      performance of the target server and the LinkScan
      machine.
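
The notes above suggest that the extra download time for an unfetched body could be computed from the file size and the connection bandwidth. The following Python sketch shows that arithmetic for one linkscan.tim record. It assumes the size, flag and time fields hold plain integers, the key and function names are hypothetical, and the sample record and bandwidth figure are invented.

```python
from typing import Dict, Union

Record = Dict[str, Union[str, int]]


def parse_tim_record(line: str) -> Record:
    """Split one TAB-delimited linkscan.tim line into labelled fields."""
    fields = line.rstrip("\r\n").split("\t")
    return {
        "url": fields[0],
        "status": fields[1],
        "size_bytes": int(fields[2]),
        "body_flag": int(fields[3]),  # 0, 1 or 2 as documented above
        "time_ms": int(fields[4]),
        "redirect": fields[5] if len(fields) > 5 else "",
    }


def estimated_total_ms(record: Record, bandwidth_bits_per_s: float) -> float:
    """Add an estimated body transfer time when the body was not fetched."""
    if record["body_flag"] != 1:  # only "available but not fetched" needs it
        return float(record["time_ms"])
    transfer_ms = record["size_bytes"] * 8 / bandwidth_bits_per_s * 1000
    return record["time_ms"] + transfer_ms


# Invented record: a 250,000 byte image whose headers took 120 ms.
line = "http://www.example.com/logo.png\t200\t250000\t1\t120\t\n"
record = parse_tim_record(line)
# 250,000 bytes = 2,000,000 bits, which is 2000 ms at 1 Mbit/s.
print(estimated_total_ms(record, 1_000_000))  # prints "2120.0"
```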
    
    
    
    hist/xxxxxx/dat
    ===============
    
    History Data -- New File Created for Each Scan
    
     0 = Document URL
     1 = Owner Name
     2 = Document Type Code (see below)
     3 = Clicks
     4 = Content-Type (MIME)
     5 = Document Status Code (see codes.txt)
     6 = Content-Length (size in bytes)
     7 = Last-Modified (date/time)
     8 = Document Title
    
    
    Document Type Codes
    ===================
    
     H = HTML Document
     D = PDF Document
     J = JavaScript Document
     M = Image Map
     S = Flash Document
     T = Text Document
     Y = Reserved
     Z = Import Document
    
     F = Other File Type
     I = In-line image
     N = Document with Nofollow rule
     O = Orphaned Document
     P = Orphaned File
    
     A = Anchor
     R = Redirection (internal)
    
     U = External link
     V = Redirection (external)
     X = Reserved (typically mailto: or invalid characters)
    
    

    LinkScan for Unix. Reference Manual. Section 29

    LinkScan Application Notes

    1. LinkScan to Email Interface
    2. Testing Wireless Servers with LinkScan
    3. Testing Secure Servers with LinkScan
    4. Testing Japanese Language Sites with LinkScan
    5. Google Sitemaps
    6. XML Documents

    29.1 LinkScan to Email Interface

    LinkScan incorporates several functions that relate to electronic mail. These include:

    Some or all of the following parameters must be configured in order to use these functions:

    Windows Systems -- linkscan.sys

    Sendmailpath = perl utils/sendmail.pl
    Smtphost = smtp.example.com
    Hostname = www.example.com
    Mailfrom = LinkScan@example.com
    Nameservers =
    [...]
    Mailto = 1
    

    Unix Systems -- linkscan.sys

    Sendmailpath = /usr/lib/sendmail -t
    Smtphost = 
    Hostname = www.example.com
    Mailfrom = LinkScan@example.com
    Nameservers =
    [...]
    Mailto = 1
    

    linkscan.cfg

    For completeness, we address two related settings in the linkscan.cfg file:

    Mailhost = example.com
    Checkmailto = 0
    

    29.2 Testing Wireless Servers with LinkScan

    LinkScan includes support for the Wireless Application Protocol (WAP) and Wireless Markup Language (WML). This allows LinkScan to validate wireless sites via an HTTP gateway. Typically, you will need to add the following configuration commands to linkscan.cfg:

    
    Extraheader User-Agent: Nokia7110/1.0 (04.80)
    Mimetypes text/vnd.wap.wml H
    

    This will cause LinkScan to send an appropriate User-Agent header with each request and to parse/follow documents with a MIME/Content-Type of text/vnd.wap.wml.

    29.3 Testing Secure Servers with LinkScan

    LinkScan may be configured to test websites hosted on secure servers running the Secure Sockets Layer (SSL), i.e. sites with URLs of the form https://www.example.com/.

    On the Microsoft Windows platforms, you need only specify the URL of the site to be scanned. LinkScan includes native support for the Secure Sockets Layer.

    On Unix systems, you will need to install additional software to handle the SSL encryption. The required packages are OpenSSL and the Net::SSLeay Perl module.

    At the time of writing LinkScan has been tested with OpenSSL version 0.9.6 and Net::SSLeay version 1.05.

    Installation of both packages is very straightforward if you have root access:

    
    
    cd $HOME/openssl-0.9.6
    ./config
    make
    make test
    make install   # See Note 1
    
    cd $HOME/Net_SSLeay.pm-1.05
    perl Makefile.PL
    make
    make test      # See Note 2
    make install   # See Note 1
    

    Note 1: The make install steps may fail if you do not have root access. In that case, you may install and run these packages from a user directory by using something like this:

    
    cd $HOME/openssl-0.9.6
    ./config --openssldir=$HOME/myopenssl
    make
    make test
    make install
    
    cd $HOME/Net_SSLeay.pm-1.05
    perl Makefile.PL $HOME/myopenssl
    make
    make test
    mv ./blib/lib/Net/ /usr/www/linkscan/
    mv ./blib/lib/auto/ /usr/www/linkscan/
    

    Note 2: The make test on Net::SSLeay will produce a number of errors. In general, you can safely ignore them.

    Once the module Net::SSLeay has been successfully installed, LinkScan will be able to scan https://... sites without any additional configuration changes.
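
    Before scanning, it can be useful to confirm that Perl can actually load the module. The sketch below is one way to do that from Python and is not part of LinkScan; the function name and the lib_dir parameter are hypothetical. It assumes a perl interpreter on the PATH and uses only the standard -I, -M and -e switches.

```python
import shutil
import subprocess
from typing import Optional


def net_ssleay_version(perl: str = "perl",
                       lib_dir: Optional[str] = None) -> Optional[str]:
    """Return the Net::SSLeay version Perl can load, or None if unavailable."""
    perl_path = shutil.which(perl)
    if perl_path is None:
        return None  # no Perl interpreter found on PATH
    cmd = [perl_path]
    if lib_dir is not None:
        # Extra module search path, e.g. a user-directory install (Note 1).
        cmd.append("-I" + lib_dir)
    cmd += ["-MNet::SSLeay", "-e", "print $Net::SSLeay::VERSION"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return None  # the module could not be loaded
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(net_ssleay_version())
```

    If the module was moved into the LinkScan directory as in Note 1, passing that directory as lib_dir makes the check look there as well.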

    Disclaimer

    Each of the above referenced programs (with the exception of LinkScan) is maintained by parties other than Electronic Software Publishing Corporation. You are solely responsible for your use of those products and your compliance with any applicable software license agreements. Several of the referenced products contain encryption algorithms, the distribution and use of which may be subject to various laws and regulations. You are solely responsible for compliance.

    29.4 Testing Japanese Language Sites with LinkScan

    When scanning sites that contain (in whole or in part) Japanese pages, include the following directives in the Project configuration file (on Windows systems, via the Advanced Tab of the Project Planning Property Sheet):

    
    Jisencode = 1
    Displaylang = EUC-JP
    

    Pages containing JIS, Shift-JIS and/or EUC-JP encoded Japanese characters will be normalized to EUC-JP. This means, for example, that the TITLE tags extracted from different documents may be combined in a single summary document (e.g. the LinkScan SiteMap) even though the original pages were constructed with different encodings.
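
    The effect of this normalization can be sketched in a few lines of Python. This is an illustration only, not LinkScan's implementation: it assumes the source encoding of each page is already known (how LinkScan detects it is not described here), and the function and table names are hypothetical.

```python
# Python codec names for the three encodings mentioned above.
CODECS = {
    "jis": "iso2022_jp",
    "shift-jis": "shift_jis",
    "euc-jp": "euc_jp",
}


def normalize_to_euc_jp(data: bytes, source: str) -> bytes:
    """Decode bytes from the named source encoding and re-encode as EUC-JP."""
    text = data.decode(CODECS[source.lower()])
    return text.encode("euc_jp")


if __name__ == "__main__":
    title = "リンク"
    # The same title as it would appear in three differently encoded pages.
    variants = {name: title.encode(codec) for name, codec in CODECS.items()}
    normalized = {normalize_to_euc_jp(raw, name) for name, raw in variants.items()}
    # All three inputs collapse to a single EUC-JP byte string.
    print(len(normalized))
```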

    The encoding type of each document is stored in the LinkScan database together with the MIME type (Content-Type). The Search Documents Report may be used to search/display this data and help enforce consistent encoding standards across mixed language sites.

    29.5 Google Sitemaps

    LinkScan automatically creates an XML Sitemap file in a format suitable for submission to Google Sitemaps. For more background, see the Google Webmaster Help Center.

    The file is named sitemap.xml and resides in the Project subdirectory of the LinkScan installation directory.

    The file is formatted in compliance with the Google Sitemaps Protocol. However, Google recommends that the file be compressed using gzip. The gzip utility is standard on most UNIX systems. Windows users may download a free command line implementation of gzip from http://www.gzip.org/.
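
    Where the gzip command is not available, the compression step can also be scripted. The following Python sketch writes a compressed copy next to the original file; the function name is hypothetical and the path to sitemap.xml is whatever applies to your Project directory.

```python
import gzip
import shutil
from pathlib import Path


def gzip_sitemap(sitemap: Path) -> Path:
    """Write a gzip-compressed copy next to the file and return its path."""
    target = sitemap.with_name(sitemap.name + ".gz")
    with sitemap.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


if __name__ == "__main__":
    # Assumes sitemap.xml is in the current working directory.
    print(gzip_sitemap(Path("sitemap.xml")))
```

    Unlike running gzip directly, this leaves the original sitemap.xml in place.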

    LinkScan produces the sitemap.xml file with the following Google-defined fields for each web page listed:

    In addition, LinkScan will optionally limit the scope of the Google Sitemap to the first "N" levels (as defined by the LinkScan Link Order SiteMap). This may be defined by adding a Gsmlevels command to the Project linkscan.cfg file [Windows users: add this command via the Advanced Tab of the Project Planning Property Sheet].

    29.6 XML Documents

    At version 11.6, LinkScan is able to parse and extract links from the following document types:

    The following paragraphs describe how to use LinkScan to scan XML (or other similarly formatted) documents. Activating and configuring the XML parser involves two basic steps.

    1. First, LinkScan must be told to route documents of the appropriate type to the XML parser for analysis. On UNIX systems this may be done with the Mimetypes and Filetypes directives in the linkscan.cfg file.

      Mimetypes text/xml X
      
      Filetypes xml X
      

      On Windows systems, these options may be set via the Mimes and Files Tabs of the Project Planning Property Sheet.

      The former is used with HTTP Scanning and it will route all documents with a Content-Type: text/xml header to the XML parser. The latter is used with File System Scanning and it will route all files with a .xml file extension to the new XML parser.

    2. Second, LinkScan must be told how to extract links from the XML document. This is done via Regular Expressions and is best illustrated by example. Suppose we have an XML document organized like this:

      <?xml version="1.0" encoding="ISO-8859-15"?>
      <link>
        <linkUrl>http://www.elsop.com/</linkUrl>
        <linkText>LinkScan</linkText>
        <linkTarget>_blank</linkTarget>
        <linkRef>000012345678</linkRef>
      </link>
      

      We construct an Xmlmatch directive and add it to the linkscan.cfg file:

      Xmlmatch  = <linkUrl>([^<]+)</linkUrl>.*?<linkText>([^<]+)</linkText> $1 $2
      

      LinkScan will now extract the link (http://www.elsop.com/) and the associated caption (LinkScan) from that XML file.

    The new parser means that LinkScan can now be used to quickly and accurately extract links from XML and similarly formatted data files.
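
    To experiment with a candidate expression before adding it to linkscan.cfg, the same pattern can be tried against a sample document. The Python sketch below applies the expression from the Xmlmatch example to the XML shown above. It is not LinkScan code, and the use of DOTALL matching (so that ".*?" can cross line breaks) is an assumption about how the pattern is applied.

```python
import re

XML_DOC = """<?xml version="1.0" encoding="ISO-8859-15"?>
<link>
  <linkUrl>http://www.elsop.com/</linkUrl>
  <linkText>LinkScan</linkText>
  <linkTarget>_blank</linkTarget>
  <linkRef>000012345678</linkRef>
</link>
"""

# Same expression as the Xmlmatch example. DOTALL lets ".*?" span the
# line break between the two elements (an assumption, see above).
LINK_RE = re.compile(
    r"<linkUrl>([^<]+)</linkUrl>.*?<linkText>([^<]+)</linkText>", re.DOTALL
)


def extract_links(xml_text: str) -> list:
    """Return (url, caption) pairs, i.e. the $1 and $2 captures."""
    return LINK_RE.findall(xml_text)


if __name__ == "__main__":
    for url, caption in extract_links(XML_DOC):
        print(url, caption)
```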

    LinkScan for Unix. Reference Manual. Section 30

    LinkScan Revision History

    New in LinkScan 12.1

    New in LinkScan 12.0

    New in LinkScan 11.7

    New in LinkScan 11.6

    New in LinkScan 11.5

    New in LinkScan 11.4

    New in LinkScan 11.3

    New in LinkScan 11.2a

    On September 15 and 16, 2003, changes were made to the Internet Domain Name Service (DNS) by VeriSign, Inc. VeriSign is the company responsible for managing all .com and .net addressing.

    In short, VeriSign created wildcard records such that DNS lookups on a host within an invalid .com or .net domain will resolve to the IP address of a VeriSign operated server. Hence an invalid URL can direct web browsers to a valid web page published by VeriSign.

    In the past, LinkScan would typically report a Possible Error on such links: 900 No DNS Entry. As a result of these changes LinkScan will see a valid web page and report no error at all. Users should be aware that other link checkers (and products that perform similar tasks) may also be impacted by VeriSign's actions.

    Elsop urges all users to install LinkScan Version 11.2a immediately. This version incorporates enhancements which will detect URLs that would otherwise trigger the wildcard records so that LinkScan will once again correctly report an error.

    No configuration changes are required; the new wildcard detection logic is enabled automatically for all URLs within the .com and .net Top Level Domains (TLDs).

    However, users may optionally enable wildcard detection on other TLDs such as cc. Simply add a directive to linkscan.sys such as:

    Wildtlds = com, net, cc
    

    Users who wish to disable this logic (e.g. in the event that VeriSign withdraws the wildcard records) may add this directive to linkscan.sys:

    Wildtlds = 0
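
    The manual does not describe how LinkScan detects wildcard records. One common technique, sketched below in Python purely as an illustration, is to look up a few random host names that are almost certainly unregistered: if they all resolve, the TLD is answering with a wildcard. The function name, the probes parameter and the injectable resolve argument are all hypothetical.

```python
import secrets
import socket
from typing import Callable

Resolver = Callable[[str], str]


def tld_has_wildcard(tld: str, resolve: Resolver = socket.gethostbyname,
                     probes: int = 3) -> bool:
    """Return True if random, almost certainly unregistered names resolve."""
    for _ in range(probes):
        # 32 random hex characters: vanishingly unlikely to be registered.
        host = f"{secrets.token_hex(16)}.{tld}"
        try:
            resolve(host)
        except OSError:
            # socket.gaierror is a subclass of OSError: the lookup failed
            # in the normal way, so no wildcard record is answering.
            return False
    return True


if __name__ == "__main__":
    for tld in ("com", "net", "cc"):
        print(tld, tld_has_wildcard(tld))
```

    Passing the resolver as an argument keeps the check testable without network access.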
    

    New in LinkScan 11.2

    New in LinkScan 11.1

    New in LinkScan 11.0

    New in LinkScan 10.0

    New in LinkScan 9.0

    New in LinkScan 8.2

    At LinkScan 8.2 we have consolidated several minor bug fixes and a large number of customer-generated suggestions for improvements and enhancements. We thank all of those users who contributed suggestions. Some of the highlights include:

    New in LinkScan 8.1

    At LinkScan 8.1 we have consolidated several minor bug fixes and a large number of customer-generated suggestions for improvements and enhancements. Although each individual change is relatively minor in scope, the aggregate of them all represents a significant improvement to the product. We thank all of those users who contributed suggestions and urge customers to install this greatly improved release at the earliest opportunity. In total, we have made approximately 60 changes and enhancements. Some of the highlights include:

    New in LinkScan 8.0

    New in LinkScan 7.4

    New in LinkScan 7.3

    New in LinkScan 7.2

    New in LinkScan 7.1

    New in LinkScan 7.0

    New in LinkScan 6.1

    New in LinkScan 6.0

    LinkScan 6.0 includes some significant changes to the scanning modules. For Windows users:

    These changes eliminate prior restrictions due to limitations of the Perl implementation for Windows and can greatly improve performance.

    For Unix users:

    New in LinkScan 5.5

    The configuration file format changes are summarized below:

    We have found that these changes greatly simplify system configuration and administration in complex multi-Project scenarios. The automatic conversion script will attempt to normalize the global and project-specific linkscan.cfg files. However, users may find they can achieve further simplification with a few minutes of manual inspection and editing.

    New in LinkScan 5.4

    LinkScan 5.4 is primarily a maintenance release that consolidates several minor bug fixes and enhancements. It includes changes for the new LinkScan Server and LinkScan Workstation products as well as infrastructure to support new upcoming enhancements.

    New in LinkScan 5.3

    New in LinkScan 5.2

    At LinkScan 5.2 we have improved HTTP navigation (the Execute command) for validating dynamic content (CGI scripts, Server Side Includes etc.), enhanced several of the LinkScan Reports and added some completely new reporting options. Some of the specific enhancements include:

    New in LinkScan 5.1

    LinkScan 5.0 was a major new release. At LinkScan 5.1 we have consolidated several minor bug fixes and a number of improvements designed to further simplify LinkScan administration. The following items are worthy of note:

    New in LinkScan 5.0

    New in LinkScan 4.2

    At LinkScan 4.2, we have focused on enhancements to the various reporting modules with both new and more consistent options.

    New in LinkScan 4.1

    The following changes and enhancements were incorporated in LinkScan version 4.1:

    New in LinkScan 4.0

    The following changes and enhancements were incorporated in LinkScan version 4.0:

    New in LinkScan 3.2

    The following changes and enhancements were incorporated in LinkScan version 3.2:

    New in LinkScan 3.1

    The following changes and enhancements were incorporated in LinkScan version 3.1:

    New in LinkScan 3.0

    The following changes and enhancements were incorporated in LinkScan version 3.0:

    New in LinkScan 2.1

    The following changes and enhancements were incorporated at LinkScan version 2.1:

    New in LinkScan 2.0

    The following changes and enhancements were incorporated at LinkScan version 2.0:

    New in LinkScan 1.2

    The following changes and enhancements were incorporated at LinkScan version 1.2:

    New in LinkScan 1.1

    The following changes and enhancements were incorporated at LinkScan version 1.1:

    LinkScan for Unix. Reference Manual. Section 31

    LinkScan End-User License Agreement
    Including LinkScan Workstation, LinkScan Server,
    LinkScan ServerPro and LinkScan Enterprise

    This license agreement is proof of license. Please treat it as valuable property.

    IMPORTANT - READ CAREFULLY: This End-User License Agreement ("Agreement") is a legal agreement between you (hereinafter "Licensee" or "you") and Electronic Software Publishing Corporation (hereinafter "Licensor") for the Licensor's software products identified above, and any upgrades which may be acquired by you for the identified products from time to time, which may include associated software components, media, printed materials, and "online" or electronic documentation (hereinafter "Product"). By downloading, installing, copying, or otherwise using the Product, you agree to be bound by the terms of this Agreement. If you do not agree to the terms of this Agreement, do not download, install or use the Product.

    1. GRANT OF LICENSE.

    Subject to payment of applicable license fee(s), Electronic Software Publishing Corporation hereby grants to you a non-exclusive non-sublicensable, non-transferable license to use its Product or grants you a license to use the Product free of charge for purposes of evaluating the Product for an evaluation period that is limited to a single one-time trial period of fifteen (15) days. You may use the Product only in the manner described herein. If you initially acquired a copy of the Product without purchasing a license and you wish to purchase a license you may do so by contacting the Licensor via the Internet at http://www.elsop.com/linkscan/ or linkscan@elsop.com.

    If Licensor discovers and/or determines that a Licensee has used the Product on more than a single computer or has scanned more than the number of computers licensed for scanning or in an unauthorized manner, Licensor has the right to demand immediate payment of any amounts that the Licensee should have paid and did not previously pay or to terminate the License. Termination of the License may include, but not be limited to, disabling the licensed Product. Upon termination of license, Licensee shall destroy all copies of the Product in its possession. Licensee is liable for all legal and other expenses associated with the collection of these payments.

    2. SCOPE OF GRANT.

    Licensee may install and use a single copy of the Product on a single computer at a secure Location owned or leased by the Licensee. Licensee may maintain another copy of the Product for archival purposes, provided any copy must contain all of the original Product's proprietary notices.

    LinkScan is offered as four different products: LinkScan Workstation, LinkScan Server, LinkScan ServerPro, and LinkScan Enterprise. The terms: "LinkScan Workstation", "LinkScan Server", "LinkScan ServerPro", and "LinkScan Enterprise" when used in reference to our Product as in "LinkScan Server" do not mean a physical or virtual server, but simply reference different products. The permitted uses of each product are described below.

    The term Location is used in the following text and it is defined as the Licensee's premises (one company or institution) in the same building or campus with a contiguous boundary at the same physical postal address. A Location does not include branch locations or affiliated organizations. This is also the definition of a Location Block (LocBlock).

    The terms "web pages" or documents are pages that are located on your server that you are scanning. The limits on documents described in this agreement refer to the total number of documents that can be scanned with your use of our product. A document may contain numerous links to images and other HTML pages. You may scan an unlimited number of links with all our products.

    A. LinkScan Workstation - You are licensed to scan up to 500 unique web pages on a single physical computer that is owned or leased by you at one Location. The web pages may be on the computer on which the Product is installed or it may be a remote physical computer, but not both. You must buy additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than 500 unique web pages or other computers, you must obtain additional license(s) or upgrade to another product.

    B. LinkScan Server - You are licensed to scan up to 5,000 unique web pages on a single physical computer that is owned or leased by you at one Location. The web pages may be on the computer on which the Product is installed or it may be a remote physical computer, but not both. You must buy additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than 5,000 unique web pages or other computers, you must obtain additional license(s) or upgrade to another product.

    C. LinkScan ServerPro - You are licensed to scan up to 15,000 unique web pages on a single physical computer that is owned or leased by you at one Location. The web pages may be on the computer on which the Product is installed or it may be a remote physical computer, but not both. You must buy additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than 15,000 unique web pages or other computers, you must obtain additional license(s) or upgrade to another product.

    D. LinkScan Enterprise - You are licensed to scan up to 50,000 unique web pages (documents) on up to ten (10) physical computers that are owned or leased by you at one Location. If you wish to scan more than 10 computers, you will have to purchase one or more additional LinkScan Enterprise Licenses.

    D.1. If you wish to scan more than 50,000 unique documents with a copy of LinkScan Enterprise, you must purchase additional Document Blocks (DocBlocks) each of which allows you to scan an additional 50,000 unique documents.

    D.2. If you wish to scan computers at more than one location, you must purchase new LinkScan Enterprise licenses for those locations or if you want to scan more locations using one copy of LinkScan Enterprise, you may purchase additional Location Blocks (LocBlocks).

    E. LinkScan Unlimited - You are licensed to scan an unlimited number of unique web pages (documents) on any number of physical computers that are owned or leased by you.

    3. USE RESTRICTIONS.

    Licensor shall issue to Licensee a Registration Key and Password which may only be installed on the single computer designated in the registration process. The Licensee may transfer the Product to another designated computer owned or leased by the Licensee and re-register the Product for that computer provided the original copy of the Product on the original designated computer is destroyed after the move of the Product has been accomplished. You also agree to not transfer to any other party the Registration Key and Password issued for the original computer. Licensor has the explicit right to monitor the use of the Product by the Licensee in order to enforce the provisions of this agreement.

    Licensee agrees that it will not use or permit the Product to be used in any manner, whether directly or indirectly, that would enable Licensee's customers or any other person or entity to use the Product. However, Licensee may publish copies of SiteMaps and/or TapMaps produced by the Product for public consumption.

    Licensee agrees that the Product is based on and includes trade secrets and proprietary know-how belonging to Licensor and is being made available to Licensee in confidence and solely on the basis of a confidential relationship with Licensor.

    Licensee may not: permit other individuals to use the Product except under the terms listed above; modify, translate, reverse engineer, decompile, disassemble (except to the extent applicable laws specifically prohibit such restriction), or create derivative works based on the Product (including the Product's screen displays); copy the Product (except as specified above); or remove any proprietary notices or labels on the Product. If the licensee does any of the aforementioned activities in this paragraph and has not purchased a license then licensee agrees to immediately pay Licensor the License fee and to comply with all of its terms.

    Licensee may not use the Product to provide timesharing, service bureau, or similar services to any other party. Licensees who are Internet Service Providers are explicitly prohibited from providing the Product or use of the Product to their customers or any other parties.

    Licensee may not allow other parties to use the Product or the Registration Key or Password associated with the Product. Licensee may not allow any other person to do anything that is prohibited by this Agreement.

    Licensee shall not make any portion of the Product available to a third party, rent, lease, sell, sublicense, assign, or otherwise transfer the Product, any portion thereof, or any output generated by the Product to a third party, and shall not convey for commercial purposes any information arising from the use of the product to any third person, or use the Product for a purpose other than that for which it is intended (as evidenced by the documentation). Recipient further agrees to treat the Product with at least the same degree of care as that with which it treats its own confidential or proprietary information.

    4. COPYRIGHT.

    The Product (including any images, applets, animations, and text incorporated into the Product) is owned by Licensor and is protected by copyright laws and international copyright treaties, as well as other intellectual property laws and treaties. The Product is licensed, not sold. All title, including but not limited to copyrights, in and to the Product and any copies thereof are owned by Licensor. You must treat the Product and any printed materials that may accompany the Product like any other copyrighted material. You may not copy the Product or any printed material that may accompany the Product. Licensor reserves all rights not expressly granted.

    5. SOURCE AND BINARY CODE.

    This is PROPRIETARY SOURCE AND BINARY CODE of Licensor; the contents of this file may not be disclosed to third parties, copied or duplicated in any form, in whole or in part, without the prior written permission of Licensor.

    Permission is hereby granted solely to the licensee for use of this source code in its unaltered state. This source code may not be modified by Licensee except under direction of Licensor. Neither may this source code be given under any circumstances to other parties in any form, including source or binary. Licensee shall not reverse engineer, decompile or disassemble any portion of the Product's code. Modification of this source code by Licensee shall automatically terminate this License as per Section 11. Divulging the exact or paraphrased contents of this source code to unlicensed parties either directly or indirectly constitutes violation of federal and international copyright and trade secret laws, and will be duly prosecuted to the fullest extent permitted under law.

    6. DELIVERABLES.

    Licensee may acquire the Product in machine readable form by downloading it electronically from the Licensor's computer (website server) to his computer. The Product will not be delivered in any other form or manner. The Licensor shall deliver to the Licensee by Electronic Mail within a reasonable time after the Licensee has paid for the Product a Registration Key and Password which enables the Product to operate. Reasonable within this context means within three business days of receipt of payment.

    7. DISCLAIMER OF WARRANTY AND LIMITED WARRANTY.

    THE PRODUCT IS DEEMED ACCEPTED BY LICENSEE, AND IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, LICENSOR FURTHER DISCLAIMS ALL WARRANTIES, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. LICENSOR DOES NOT WARRANT, GUARANTEE, OR MAKE ANY REPRESENTATIONS REGARDING THE PERFORMANCE, USE OR RESULTS OF THE USE OF THE PRODUCT IN TERMS OF CORRECTNESS, ACCURACY, RELIABILITY, CURRENTNESS, OR OTHERWISE. IN NO EVENT SHALL LICENSOR OR ITS SUPPLIERS BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL, DIRECT, SPECIAL, PUNITIVE, OR OTHER DAMAGES WHATSOEVER (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY LOSS) ARISING OUT OF THIS AGREEMENT OR THE USE OF OR INABILITY TO USE THE PRODUCT, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. YOU ASSUME THE ENTIRE RISK AS TO RESULTS AND PERFORMANCE OF THE PRODUCT. IF THE PRODUCT IS DEFECTIVE, YOU, AND NOT LICENSOR OR ITS DEALERS, DISTRIBUTORS, AGENTS, SUPPLIERS, OR EMPLOYEES, ASSUME THE ENTIRE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

    THE ABOVE IS THE ONLY WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, THAT IS MADE BY LICENSOR REGARDING THE PRODUCT, NO ORAL OR WRITTEN INFORMATION OR ADVICE GIVEN BY LICENSOR, ITS DEALERS, DISTRIBUTORS, AGENTS, SUPPLIERS, OR EMPLOYEES SHALL CREATE A WARRANTY, OR BIND LICENSOR, AND YOU MAY NOT RELY ON ANY SUCH INFORMATION OR ADVICE. THIS WARRANTY GIVES YOU SPECIFIC LEGAL RIGHTS. YOU MAY HAVE OTHER RIGHTS WHICH VARY FROM STATE TO STATE. NO LICENSOR DEALER, AGENT, SUPPLIER, OR EMPLOYEE IS AUTHORIZED TO MAKE ANY MODIFICATIONS, EXTENSIONS, OR ADDITIONS TO THIS WARRANTY. IF ANY MODIFICATIONS ARE MADE TO THE PRODUCT BY YOU OR IF YOU VIOLATE THE TERMS OF THIS AGREEMENT, THEN THIS WARRANTY SHALL IMMEDIATELY BE TERMINATED. THIS WARRANTY SHALL NOT APPLY IF THE PRODUCT IS USED ON OR IN CONJUNCTION WITH HARDWARE OR PRODUCT OTHER THAN THE UNMODIFIED VERSION OF HARDWARE AND PRODUCT WITH WHICH THE PRODUCT WAS DESIGNED TO BE USED AS DESCRIBED IN THE DOCUMENTATION.

    8. TITLE.

    Title, ownership rights, and intellectual property rights in the Product shall remain in Licensor and/or its suppliers. You understand that the Product is licensed and not sold to you. The Product is protected by the copyright laws and treaties. Title and related rights in the content accessed through the Product is the property of the applicable content owner and may be protected by applicable law. This License gives you no rights to such content.

    9. SUPPORT AND MAINTENANCE.

    Licensor offers no support (including technical support) or maintenance of this Product. Licensee, at its option, may negotiate for Support and Maintenance from Licensor and/or its suppliers through a separate agreement. Licensor may, at its option, publish on its website a list of Frequently Asked Questions (FAQ) concerning the Product without obligation to continue doing so or to maintain said list. Licensor may, at its option, offer and/or provide technical support or assistance for the Product without obligation to continue doing so.

    10. LIMITATIONS ON LICENSOR'S OBLIGATIONS.

    Licensee understands and agrees that Licensor may develop and market new or different computer programs which use part or all of the Product and which performs all of the functions performed by the Product. Nothing contained in this Agreement gives Licensee any rights with respect to such new or different computer programs.

    11. TERMINATION.

    The license will terminate automatically if you fail to comply with the limitations and restrictions described herein or if you are delinquent in making any payments for the Product of any sum due under this Agreement. On termination, you must destroy all copies of the Product. Licensor may also terminate this Agreement if you violate it. You must destroy all copies of the Product in your possession or control promptly upon termination. Upon Licensor's request, you must certify in writing that you have complied with your obligations under this Section and otherwise under this Agreement. Termination by Licensor will not limit any of its other rights or remedies under this Agreement or at law or in equity. Any provision of this Agreement that by its sense and context is intended to survive termination of this Agreement will survive termination.

    12. LIMITATIONS ON LICENSOR'S LIABILITY AND UPON TIME TO SUE.

    UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, TORT, CONTRACT, OR OTHERWISE, SHALL LICENSOR OR ITS SUPPLIERS OR RESELLERS BE LIABLE TO YOU OR ANY OTHER PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES. IN NO EVENT WILL LICENSOR BE LIABLE FOR ANY DAMAGES IN EXCESS OF THE PRICE PAID FOR SUCH LICENSE, EVEN IF LICENSOR SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. FURTHERMORE, SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS LIMITATION AND EXCLUSION MAY NOT APPLY TO YOU. NO ACTION, REGARDLESS OF FORM, ARISING OUT OF ANY OF THE TRANSACTIONS UNDER THIS AGREEMENT MAY BE BROUGHT BY LICENSEE MORE THAN ONE YEAR AFTER SUCH ACTION ACCRUED.

    13. TRADEMARKS.

    "Electronic Software Publishing Corporation", the Electronic Software Publishing Corporation logo, "Elsop", "LinkScan", the LinkScan logo, "LinkScan QuickCheck", "LinkScan Dispatch", "MailVet", and all other trademarks which identify the Licensed Program or the company are the trademarks, and in some jurisdictions may be registered trademarks, of the Electronic Software Publishing Corporation.

    14. EXPORT CONTROLS.

    You agree that none of the Product or underlying information or technology will be downloaded or otherwise exported or re-exported (i) into (or to a national or resident of) Cuba, Iraq, Libya, Federal Republic of Yugoslavia (Serbia and Montenegro, U.N. Protected Areas and areas of Republic of Bosnia and Herzegovina under the control of Bosnian Serb forces), North Korea, Iran, Syria or any other country to which the U.S. has embargoed goods; or (ii) to anyone on the U.S. Treasury Department's list of Specially Designated Nationals or the U.S. Commerce Department's Table of Deny Orders. You warrant and represent that neither the U.S.A. Bureau of Export Administration nor any other federal agency has suspended, revoked or denied your export privileges. By downloading or using the Product, you are agreeing to the foregoing and you are representing and warranting that you are not located in, under the control of, or a national or resident of any such country or on any such list.

    In addition, if the licensed Product is identified as a not-for-export product (for example, in the registration process or in the installation process), then the following applies: Except for export to Canada for use in Canada by Canadian citizens, the Product and any underlying technology may not be exported outside the United States or to any foreign entity or "foreign person" as defined by U.S. government regulations, including without limitation, anyone who is not a citizen, national or lawful permanent resident of the United States. By downloading or using the Product, you are agreeing to the foregoing and you are warranting that you are not a "foreign person" or under the control of a foreign person.

    15. ENTIRE AGREEMENT.

    This Agreement constitutes the entire agreement between the parties in connection with the subject matter hereof and supersedes all prior and contemporaneous agreements, understandings, negotiations and discussions, whether oral or written, of the parties, and there are no warranties, representations and/or agreements between the parties in connection with the subject matter hereof except as specifically set forth or referred to herein.

    16. GOVERNING LAW; SEVERABILITY.

    This Agreement represents the complete agreement concerning this license and may be amended only by a writing executed by both parties. If any provision of this Agreement is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. This Agreement shall be governed by California law, without reference to conflicts of law principles. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. THE ACCEPTANCE OF ANY PURCHASE ORDER PLACED BY YOU IS EXPRESSLY MADE CONDITIONAL ON YOUR ASSENT TO THE TERMS SET FORTH HEREIN, AND NOT THOSE IN YOUR PURCHASE ORDER. Any suit to enforce the terms of this Agreement may be brought in either the United States District Court of the Northern District of California or the California Superior Court in and for the County of Santa Clara, as appropriate, and you consent to the jurisdiction and venue of such court. If either party brings any action to enforce any rights arising out of or relating to this Agreement (whether or not suit is filed), the prevailing party shall be entitled to recover its costs and expenses related to such action, including reasonable attorneys' fees except as provided under section 1: Grant of License. All terms of this Agreement which, by their nature, are intended to survive termination of this Agreement shall survive any such termination.

    17. COMPLIANCE WITH THE LAW.

    Licensee agrees that it will comply with all federal, state and local laws and regulations governing the use of the Product.

    18. RETURN AND REFUND POLICY.

    The licensor allows no returns and will make no refunds.

    19. TAXES.

    In addition to all license fees paid by Licensee in acquiring this license, Licensee shall pay or reimburse Licensor for all federal, state, local or other taxes not based on Licensor's net income or net worth, including, but not limited to, sales, use, value-added, privilege and property taxes, or amounts levied in lieu thereof, based on charges payable under this Agreement or based on the Product, its use or any services performed hereunder, whether such taxes are now or hereafter imposed under the authority of any federal, state, local or other taxing jurisdiction.

    20. U.S. GOVERNMENT RESTRICTED RIGHTS.

    Use, duplication or disclosure by an agency, agent, unit, or instrumentality of the United States Government is subject to restrictions set forth in subparagraphs (a) through (d) of the Commercial Computer-Restricted Rights clause at FAR 52.227-19 when applicable, or in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013, and in similar clauses in the NASA FAR Supplement. Contractor/manufacturer is Electronic Software Publishing Corporation, 1361 Shelby Creek Court, San Jose, CA 95120 USA

    License Version 2007-03 Revision Date: March 15, 2007 (c) Copyright 1997-2010 Electronic Software Publishing Corporation (Elsop) LinkScan (TM) and Elsop (TM) are Trademarks of Electronic Software Publishing Corporation

    LinkScan for Unix. Single Document Reference Manual
    LinkScan Version 12.1
    © Copyright 1997-2010 Electronic Software Publishing Corporation (Elsop)
    LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
