LinkScan

LinkScan for Unix -- Common Tasks

 

Introduction

  1. Install LinkScan

  2. Upgrade LinkScan

  3. Deinstall LinkScan

  4. Address License Key Problems

  5. Include/Exclude Different Documents or Links

  6. Suppress Specific Errors or Warnings

  7. Scan HTML Documents on my Local Hard Drive

  8. Scan a List of Links (e.g. from a database)

  9. Scan Sites that Require a Login/Authentication

  10. Send LinkScan Reports via Email

  11. Schedule LinkScan to Run Automatically

  12. Transfer the LinkScan Results to another Database System

  13. Find Links to Inappropriate Sites

1. Install LinkScan

See:

2. Upgrade LinkScan

See:

3. Deinstall LinkScan

Delete the LinkScan installation directory and everything within it.

4. Address License Key Problems

If you receive an Invalid/Corrupt/Expired License Key Error, please mail the exact error message to key@elsop.com.

5. Include/Exclude Different Documents or Links

The Include and Exclude rules are written as Perl regular expressions. Comprehensive documentation for Perl regular expressions is widely available online; for example, see http://www.perldoc.com/perl5.6.1/pod/perlre.html. Some simple examples are described below.

You may add various rules to the Project configuration file, linkscan.cfg. Windows users can click Edit and then Advanced to open the appropriate linkscan.cfg file in a Notepad window.

Each rule or command must be entered on a new line, starting in column one. Lines starting with a pound sign ("#") are treated as comments and ignored. The position of a line within the linkscan.cfg file does not matter; we suggest making additions near the top of the file so you can find them again quickly.
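As a rough illustration of how a regular-expression rule selects URLs, the following Python sketch tests a few URLs against a single pattern. Python's re module is used here only as a stand-in for Perl regular expressions (the two agree on the basic syntax shown). The pattern, the sample URLs, and the helper name is_excluded are hypothetical, and the assumption that the pattern may match anywhere in the URL is not taken from the LinkScan documentation.

```python
import re

# Hypothetical pattern: any URL containing /archive/ or ending in .zip
EXCLUDE_PATTERN = re.compile(r"/archive/|\.zip$")


def is_excluded(url: str) -> bool:
    """Return True when the pattern matches anywhere in the URL.

    Search (unanchored) semantics are an assumption made for this sketch.
    """
    return EXCLUDE_PATTERN.search(url) is not None


sample_urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/archive/1999/news.html",
    "http://www.example.com/downloads/tool.zip",
]

for url in sample_urls:
    verdict = "excluded" if is_excluded(url) else "scanned"
    print(url, "->", verdict)
```

Trying a pattern against a handful of representative URLs in this way can help catch an overly broad expression before it is added to linkscan.cfg.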

Additional Rules

In addition to the basic Exclude rule, LinkScan supports some powerful variations:

6. Suppress Specific Errors or Warnings

Each link validated by LinkScan is assigned a specific LinkScan Error or Status Code, and every Status Code is associated with a Severity. You may customize the Severity associated with any Status Code by using the Statuscode command. The command syntax is:


Statuscode statuscode, severitycode

The following Severity codes are valid:

Code  Severity        Explanation
0     Unknown         LinkScan has not tested or was unable to test this link
1     Error           LinkScan found a hard error on this link
2     Possible Error  There may be a problem with this link. It should be retested at a later time
3     Warning         LinkScan found something unusual about this link. Manual inspection highly recommended
4     Advisory        This link is probably ok, but manual inspection recommended
5     No Error        This is a good link

Examples:

Statuscode = 301,3    # 301 (Moved Permanently) from Error to Warning
Statuscode = 7,4      #   7 (Orphaned HTML File) to Advisory
Statuscode = 8,4      #   8 (Orphaned non-HTML File) to Advisory

The above commands will downgrade all 301 status codes from Errors to Warnings, and all Orphaned Files from Warnings to Advisories.
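The effect of these commands can be pictured as a small override table from Status Code to Severity. The Python sketch below reads Statuscode lines written in the "Statuscode = code,severity" form used in the examples and builds that table. It is only an illustration: the function name parse_statuscode_lines is hypothetical, the handling of trailing "#" comments is inferred from the examples, and nothing here reflects how LinkScan itself reads linkscan.cfg.

```python
# Severity codes as listed in the table above.
SEVERITY_NAMES = {
    0: "Unknown",
    1: "Error",
    2: "Possible Error",
    3: "Warning",
    4: "Advisory",
    5: "No Error",
}


def parse_statuscode_lines(lines):
    """Build {status_code: severity_code} from Statuscode lines (hypothetical helper)."""
    overrides = {}
    for line in lines:
        if line.startswith("#"):
            continue  # comment line: "#" in column one
        body = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not body.startswith("Statuscode"):
            continue  # some other command
        _, _, value = body.partition("=")
        status_text, severity_text = value.split(",")
        status, severity = int(status_text), int(severity_text)
        if severity not in SEVERITY_NAMES:
            raise ValueError(f"invalid severity code: {severity}")
        overrides[status] = severity
    return overrides


config_lines = [
    "Statuscode = 301,3    # 301 (Moved Permanently) from Error to Warning",
    "Statuscode = 7,4      #   7 (Orphaned HTML File) to Advisory",
    "Statuscode = 8,4      #   8 (Orphaned non-HTML File) to Advisory",
]

for status, severity in parse_statuscode_lines(config_lines).items():
    print(status, "->", SEVERITY_NAMES[severity])
```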

7. Scan HTML Documents on my Local Hard Drive

Your Project configuration file, linkscan.cfg, should look something like this:

Homedir = /usr/www/htdocs/
Homeurl = http://www.example.com/
Mirrorurl = 
Homefile = index.html
[...]
Http = 0
[...]
Htmlfiles = html, shtml, htm
Mapfiles = map
Pdffiles = 
Flashfiles = swf
Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm
Indexoptions = 0

Note the following points:

8. Scan a List of Links (e.g. from a database)

See Import Scanning.

9. Scan Sites that Require a Login/Authentication

Many websites include some form of access control or user authentication features. In general, these arrangements use one of two mechanisms defined by the HTTP protocols, both of which are supported by LinkScan: HTTP Authentication and cookie-based authentication.

In the case of HTTP Authentication, when a user attempts to access a protected area, the browser presents a challenge in the form of a pop-up dialog box that requires a username and password to be entered. In the case of cookie-based arrangements, the user is normally required to log in by filling out an HTML form and submitting it.

HTTP Authentication

For sites that require HTTP Authentication, you must configure LinkScan with an appropriate Auth command:


Syntax:

Auth server-name "realm-name" username password

Examples:

Auth www.example.com "" guestuser xxxxxx
Auth app.example.com "Controlled Access" guestuser xxxxxx

You must include a realm-name (enclosed in double-quotes) but it may be empty. In that case, LinkScan will use the configured username and password for any realm on the target server. This is the recommended approach unless your server uses multiple realms with different access control rules for different portions of the website.
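The realm rule can be sketched in a few lines of Python. The code below parses Auth lines in the syntax shown above and looks up credentials for a given server and realm, treating an empty realm-name as matching any realm. The function names are hypothetical, and the choice to prefer an exact realm match over an empty one is an assumption made for this illustration, not documented LinkScan behavior.

```python
import shlex


def parse_auth_line(line):
    """Split 'Auth server "realm" user password' into its four values."""
    keyword, server, realm, username, password = shlex.split(line)
    if keyword != "Auth":
        raise ValueError(f"not an Auth command: {line!r}")
    return server, realm, username, password


def find_credentials(auth_lines, server, realm):
    """Return (username, password) for a server/realm challenge, or None."""
    any_realm = None
    for line in auth_lines:
        rule_server, rule_realm, username, password = parse_auth_line(line)
        if rule_server != server:
            continue
        if rule_realm == realm:
            return username, password  # exact realm match (assumed to win)
        if rule_realm == "" and any_realm is None:
            any_realm = (username, password)  # empty realm: applies to any realm
    return any_realm


rules = [
    'Auth www.example.com "" guestuser xxxxxx',
    'Auth app.example.com "Controlled Access" guestuser xxxxxx',
]

print(find_credentials(rules, "www.example.com", "Members Only"))
print(find_credentials(rules, "app.example.com", "Some Other Realm"))
```

In this sketch the first lookup succeeds because the rule for www.example.com has an empty realm, while the second finds nothing because app.example.com is configured only for the "Controlled Access" realm.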

Cookie-based Authentication

HTTP access to some sites is controlled via authentication schemes requiring Cookies. For more information regarding Cookies see the Netscape Cookie Specification at http://www.netscape.com/newsref/std/cookie_spec.html.

LinkScan will automatically accept and return all valid cookies received during the course of a scan. However, to gain access to the site, you may need to configure LinkScan to ensure that the appropriate cookies are set. This may be achieved by one of two techniques: submitting a login form, or presetting the required cookies.

The submission of a login form may be configured using the Extrahome command (described in the next section). Alternatively, you may initialize LinkScan's collection of stored cookies (also known as the Cookie Jar) with one or more permanent Cookies by using the Cookie command:


Syntax:

Cookie server-name cookiename=cookievalue

Example:

Cookie www.elsop.com LinkScan=cookie_value;

Note: Do not enter space characters around the '=' character.

The server-name is the name of the server to be tested. For security reasons and in compliance with the applicable standards, LinkScan will only send the cookie when the specified server-name exactly matches the hostname portion of the requested URL. In this context, server names and their corresponding IP addresses are considered to be different (consistent with all major browsers). The cookie names and values must be reverse engineered from your server code or "discovered" via your browser, either by enabling its "Prompt before accepting cookies" option or by examining the cookies stored on disk.
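The hostname-matching rule can be illustrated with a short Python sketch. The function name cookie_applies is hypothetical, the IP address is a documentation placeholder, and the case-insensitive comparison is an assumption (hostnames are conventionally case-insensitive); the sketch models the rule described above rather than LinkScan's own code.

```python
from urllib.parse import urlsplit


def cookie_applies(server_name: str, url: str) -> bool:
    """Return True when server_name equals the hostname portion of url.

    No domain-suffix matching and no name-to-IP resolution is attempted,
    so a server name and its IP address are treated as different hosts.
    """
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
    return host is not None and host == server_name.lower()


for url in (
    "http://www.elsop.com/linkscan/",  # same hostname
    "http://elsop.com/linkscan/",      # different hostname
    "http://192.0.2.10/linkscan/",     # placeholder IP address, not a name
):
    print(url, "->", cookie_applies("www.elsop.com", url))
```

If a scan reaches the same site through more than one hostname, a rule like this implies that each hostname would need its own Cookie command.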

Hint 1: Sites with especially complex schemes (multiple levels of access control, subscription expirations etc.) might consider configuring their server and/or scripts to recognize a "super-user-cookie" specifically for testing purposes. This approach may also be used to trigger test points within server-based scripts and greatly improve the meaningful testability of complex dynamic content.

Hint 2: HTTP Authentication and Cookie related transactions are logged by LinkScan during the course of the scan. You may examine the following file to view the log: .../LinkScan/Projectname/data/linkscan.red

Submitting a Login Form

LinkScan may be configured to submit a form using either the GET or POST methods. Pages that require the GET method are specified with a normal URL and query string. Pages that require the POST method are specified in a similar manner except that the query character (?) is replaced with a double-query (??).


Syntax:

Extrahome relative-path-expression

Example:

Extrahome login.jsp??Name=Malcolm%20Hoar&Password=secret
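To make the ? versus ?? convention concrete, the Python sketch below splits an Extrahome expression into the request method, the path, and the decoded form fields. The function name split_extrahome and the search.jsp example are hypothetical, and the code only models the convention described above; it does not show how LinkScan performs the request.

```python
from urllib.parse import parse_qsl


def split_extrahome(expression: str):
    """Return (method, path, fields) for an Extrahome relative-path expression."""
    if "??" in expression:
        # Double-query: the fields are submitted with POST
        path, _, query = expression.partition("??")
        method = "POST"
    elif "?" in expression:
        # Ordinary query string: a normal GET request
        path, _, query = expression.partition("?")
        method = "GET"
    else:
        path, query, method = expression, "", "GET"
    return method, path, parse_qsl(query)  # parse_qsl decodes %20 and similar escapes


print(split_extrahome("login.jsp??Name=Malcolm%20Hoar&Password=secret"))
print(split_extrahome("search.jsp?q=test"))
```

Note that field values must be URL-encoded in the expression (for example, %20 for a space), exactly as they would be in an ordinary query string.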

Hint 1: Use the LinkScan Recorder to automatically capture the correctly constructed URLs.

Hint 2: When using the Extrahome command to submit a login form to provide access to a site, you may also need to configure LinkScan so that it doesn't immediately "click" any LOGOUT button which would invalidate the newly created session. For example:


Extrahome login.jsp??Name=Malcolm%20Hoar&Password=secret
Exclude .*logout.jsp

10. Send LinkScan Reports via Email

First, you must configure the LinkScan to Email Interface. Once this has been completed:

11. Schedule LinkScan to Run Automatically

See:

12. Transfer the LinkScan Results to another Database System

LinkScan was designed from the outset to be a highly open system. Hence it is a straightforward matter to export portions of the LinkScan database into other database management systems for further analysis.

For many users, the simplest method of achieving this is via LinkScan Excel. Once a table of data has been imported into a LinkScan Excel spreadsheet, the data can easily be pushed into another relational database management system (RDBMS) such as Microsoft Access, Microsoft SQL Server or Oracle.

Others may wish to access the LinkScan database structures directly via their own program code. Extracting the required data is a relatively simple programming task in most programming languages, including Perl, C, C++, Java or Visual Basic. Those users should study the brief description of the LinkScan File Formats. Note that small changes in the file formats may arise when you install new versions of LinkScan. Such changes are generally minor and infrequent.

13. Find Links to Inappropriate Sites

Activate the LinkScan Profiler.

LinkScan Version 11.6
© Copyright 1997-2006 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
