LinkScan

LinkScan for Unix. Reference Manual.

Section 2


Essential LinkScan Concepts

This section introduces some important concepts and terms that are used throughout the remainder of this Reference Manual. These are:

  1. LinkScan Projects
  2. LinkScan Owners
  3. LinkScan Usernames
  4. Scanning Methods
  5. Documents and Links
  6. LinkScan Directory and File Structure
  7. LinkScan Configuration Files
  8. Perl Regular Expressions
  9. relative-path and relative-path-expression

2.1 LinkScan Projects

LinkScan can scan multiple websites. You may also scan the same website multiple times with different configuration options. In each case, LinkScan creates a separate LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.

If multiple Projects are defined, users and administrators must select a Project when scanning. Users must also select a Project when viewing the results.

Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.

For additional information concerning Projects, how to create them, and how to scan them, see Basic Scanning.

2.2 LinkScan Owners

Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned between Owners in a variety of ways:

The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility. LinkScan Owners are defined via the LinkScan Configuration Files, discussed below. By default, LinkScan will create and assign Owners as follows:

This enables users to browse the results selectively, so that the reports are smaller, more relevant to their needs, and produced more rapidly.

2.3 LinkScan Usernames

LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.

When these controls are activated, users may be required to log in to the LinkScan system using a predefined LinkScan Username and associated password. The Username defines the Projects and Owners that an individual user is permitted to access.

Those wishing to enable these access control features should see LinkScan Access Controls.

2.4 Scanning Methods

LinkScan supports three different scanning methods:

Network HTTP scanning is generally the best method for sites with a large amount of dynamic content (.jsp files, .asp files, etc.). File System Scanning enables tracking of "orphaned" files (files that are not currently linked to) and is more appropriate for sites with limited dynamic content.
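The notion of an "orphaned" file can be illustrated with a small Python sketch. This is not LinkScan's implementation; the function name and the sample paths are hypothetical. The idea is simply that a scan which can list every file on disk is able to subtract the set of files reached through links, and whatever remains is orphaned.

```python
def find_orphans(files_on_disk, linked_files):
    """Return files that exist on disk but that no scanned document links to."""
    return sorted(set(files_on_disk) - set(linked_files))


# Hypothetical data, for illustration only.
on_disk = ["index.html", "products/widget.html", "old/promo.html"]
linked = ["index.html", "products/widget.html"]

orphans = find_orphans(on_disk, linked)
print(orphans)
```

A scan that discovers files only by following links has no list of files on disk to compare against, which is one way to see why orphan tracking is tied to File System Scanning.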

2.5 Documents and Links

The LinkScan software, and this document, both maintain a strong distinction between Documents and Links.

Hence an HTML file is a Document containing Links. Dynamically generated web pages, PDF files, Flash files, and Import Files may also be considered Documents, since LinkScan can examine those files for the presence of Links. Images (such as .gif and .jpg files) are not considered Documents.

References to sites other than the one being scanned (External Links) are not Documents either, since LinkScan does not examine the content of those files for the presence of Links.

2.6 LinkScan Directory and File Structure

The LinkScan system is made up of a number of different file types:

In a basic LinkScan installation these files are organized within the following directory structure:

2.7 LinkScan Configuration Files

LinkScan's operation is controlled by a number of different configuration files. When running LinkScan via the Windows Graphical User Interface, these files are largely hidden. However, they still control the execution of the program, and you may find it useful to view the raw configuration files from time to time. On Unix systems, these files are the primary method of configuring LinkScan. All of the files are plain ASCII text and may be viewed and modified using the editor of your choice (e.g. Windows Notepad, Unix vi, emacs, pico, nedit, etc.).

The most important configuration files are:

This approach provides tremendous flexibility. You can establish Global Settings in the Global Configuration File that apply to all Projects. You may then override single-valued settings, or supplement multi-valued settings, with additional commands in the Project-specific Project Configuration File(s).

Some additional configuration/control files are discussed elsewhere in this manual. They are used by LinkScan (so do not delete them!) but it is rarely necessary for users to examine or modify them.
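The override/supplement behavior described above can be sketched in Python as follows. This is only an illustration of the concept, not LinkScan's own code: the function, the command names (Timeout, Exclude) and the way multi-valued commands are identified are all assumptions made for the example.

```python
def merge_settings(global_cfg, project_cfg, multi_valued):
    """Combine global and project settings.

    Single-valued settings: the project value replaces the global one.
    Multi-valued settings: project values are added to the global ones.
    """
    merged = {}
    for name, value in global_cfg.items():
        # Copy lists so the global settings are never modified in place.
        merged[name] = list(value) if name in multi_valued else value
    for name, value in project_cfg.items():
        if name in multi_valued:
            merged.setdefault(name, []).extend(value)
        else:
            merged[name] = value
    return merged


# All command names below are hypothetical, not real LinkScan commands.
global_cfg = {"Timeout": 30, "Exclude": ["cgi-bin/"]}
project_cfg = {"Timeout": 60, "Exclude": ["archive/"]}

merged = merge_settings(global_cfg, project_cfg, multi_valued={"Exclude"})
print(merged)
```

In this sketch the Project's Timeout replaces the global value, while the Project's Exclude entry is added to the global list rather than replacing it.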

All of the configuration files include extensive comments. A comment begins with the pound sign (#) and runs to the end of the line, like this:


# This line contains only a comment

Realcommand = 1   # This comment could describe Realcommand
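As a rough illustration of how such lines might be read, the Python sketch below strips comments and splits a command into its name and value. It assumes that every "#" starts a comment and that commands take the form "Name = value"; the real configuration files may well permit "#" inside a value (for example within a regular expression), which this sketch does not handle. The function name is hypothetical.

```python
def parse_config_line(line):
    """Return (name, value) for a command line, or None for blank/comment-only lines.

    Assumption: '#' always starts a comment and commands look like 'Name = value'.
    """
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    name, _, value = text.partition("=")
    return name.strip(), value.strip()


comment_only = parse_config_line("# This line contains only a comment")
command = parse_config_line("Realcommand = 1   # This comment could describe Realcommand")
print(comment_only)
print(command)
```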

2.8 Perl Regular Expressions

LinkScan incorporates a wide range of customization features, many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions on Unix systems, see man perlre. HTML versions are available at many locations, including:

http://www.perl.com/doc/manual/html/pod/perlre.html

We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E. F. Friedl, published by O'Reilly [ISBN: 1-56592-257-3].

2.9 relative-path and relative-path-expression

We make extensive reference to these terms in the customization sections of this manual and they are introduced here for your convenience.

Let us assume that we are scanning the website:

http://www.example.com/

An individual document within that website might be:

http://www.example.com/products/widget.html

LinkScan will refer to that page using its relative-path, which, in this case, is:

products/widget.html
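The relationship between the scanned website, the document URL and the relative-path can be sketched in Python. The helper name is hypothetical and the sketch only covers the simple case shown here, where the document URL begins with the URL of the scanned website.

```python
def relative_path(base_url, document_url):
    """Strip the scanned website's base URL from a document URL."""
    if not document_url.startswith(base_url):
        raise ValueError("document is outside the scanned website")
    return document_url[len(base_url):]


path = relative_path(
    "http://www.example.com/",
    "http://www.example.com/products/widget.html",
)
print(path)
```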

A relative-path-expression is a Perl Regular Expression that matches a relative-path. For example, all of the following will match our widget page:


products/widget.html      # Also matches products/widgetXhtml
products/widget\.html$    # Does not match anything else
(|.*/)widget\.html$       # Matches widget.html in any directory
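To experiment with the three expressions above, a short Python sketch such as the following may help. Python's re module accepts the same syntax as Perl for these particular patterns. The sketch uses re.match, which anchors each expression at the start of the relative-path; that anchoring is an assumption made here so that the results agree with the comments above, and the sample paths other than products/widget.html are hypothetical.

```python
import re

PATTERNS = [
    r"products/widget.html",
    r"products/widget\.html$",
    r"(|.*/)widget\.html$",
]

# Sample relative-paths; only the first comes from the example above.
PATHS = [
    "products/widget.html",
    "products/widgetXhtml",
    "widget.html",
    "support/widget.html",
]


def matching_paths(pattern, paths):
    """Return the paths matched by pattern, anchored at the start of each path."""
    # Assumption: expressions are applied from the start of the relative-path.
    return [p for p in paths if re.match(pattern, p)]


results = {pattern: matching_paths(pattern, PATHS) for pattern in PATTERNS}
for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Under that assumption, the unescaped dot in the first expression is what lets it match products/widgetXhtml, while the escaped dot and trailing $ in the second restrict it to the widget page alone.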

LinkScan Version 12.1
© Copyright 1997-2010 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
