LinkScan

LinkScan for Unix. Reference Manual.

Section 10


Import Scanning

The LinkScan Import function may be used to check a list of links, or a list of documents, supplied in an import file.

When processing a list of Links, each URL is checked in turn and its status is stored in the LinkScan database. When processing a list of Documents, each document and every link within that document is checked, and the status of each is stored.

The import function offers enormous flexibility. To use this feature, carry out the following steps:

  1. Prepare the Import File

    LinkScan will import a simple ASCII file of the following format:

    URL ... one or more tab characters ... URL-Description

    URLs may be absolute, or relative to the Home URL for the current server. The URL-Description is imported and carried through to the LinkScan Reports for identification purposes. You may use any ASCII string, for example a database record number.

    Import files may also include URLs that use the extended LinkScan conventions for form submissions (GET, POST and Multi-Part POST). See How to Submit Forms.

    An alternative field separator may be specified by including a special command as the first line of the file:

    ## \s+

    The command starts with '##' in column one, followed by a Perl regular expression that specifies the field delimiter. In the example above, '\s+' means one or more whitespace characters (tab or space).

    Lines with a '#' in column one, and blank lines, are ignored as comments.
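
    The format rules above can be sketched as a small parser. This is only an illustration in Python, not LinkScan's own code: the function name parse_import_lines is hypothetical, the delimiter is compiled with Python's re module (which agrees with Perl for simple patterns such as '\s+' but not for every Perl expression), and splitting only at the first delimiter so that the description may contain spaces is an assumption.

```python
import re

def parse_import_lines(lines):
    """Return (url, description) pairs from the lines of an import file.

    Hypothetical helper: models the documented format only.
    """
    delimiter = re.compile(r"\t+")  # default: one or more tab characters
    entries = []
    for index, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        # A '##' command is only recognised on the first line of the file.
        if index == 0 and line.startswith("##"):
            delimiter = re.compile(line[2:].strip())
            continue
        # Blank lines and lines with '#' in column one are comments.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: split once, so the description keeps any later delimiters.
        parts = delimiter.split(line, maxsplit=1)
        description = parts[1] if len(parts) > 1 else ""
        entries.append((parts[0], description))
    return entries

sample = [
    "# nightly checks\n",
    "http://www.example.com/index.html\tHome page\n",
    "\n",
    "/products/list.html\t\trecord 1042\n",
]
entries = parse_import_lines(sample)
```

    Running a check like this over an import file before a scan is one way to confirm that every line splits into a URL and a description as intended.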

    To use the Import Function, open the linkscan.cfg file for the appropriate Project, and edit the Importfile setting. Supply the full pathname to the prepared ASCII import file. For example:

    
    Importfile = /usr/home/linkscan/importfiles/test.txt
    

    Then select the import mode by changing the Import setting. Valid values are:

    Import = 0    Import mode disabled
    Import = 1    Import a list of links
    Import = 2    Import a list of documents
    Import = 3    Import a list of documents with caching disabled

    When using Import Documents, LinkScan will by default check each document listed in the Import File, but it will not follow the links in those documents to scan the entire site. Optionally, you may set Maxclicks to force LinkScan to execute a deeper scan. For example, with Maxclicks = 3, LinkScan will check the Import File, the documents listed in the Import File, and the children (but not the grandchildren) of those documents.
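
    The Maxclicks example can be pictured as a level count in which the Import File is level 1, the documents it lists are level 2, and their children are level 3. The sketch below is a simplified Python model of that counting, not LinkScan's implementation. The function documents_scanned and the links_in mapping are hypothetical, and treating the default behaviour as a depth of 2 is an inference from the example above rather than a documented value.

```python
def documents_scanned(import_urls, links_in, maxclicks):
    """Return {url: level} for each document this model would open and scan.

    links_in is a hypothetical mapping from a document URL to the URLs it links to.
    """
    levels = {}
    frontier = list(import_urls)
    level = 2  # level 1 is the Import File itself
    while frontier and level <= maxclicks:
        next_frontier = []
        for url in frontier:
            if url in levels:
                continue  # already scanned at a shallower level
            levels[url] = level
            next_frontier.extend(links_in.get(url, []))
        frontier = next_frontier
        level += 1
    return levels

links_in = {
    "/a.html": ["/b.html", "/c.html"],  # children of the imported document
    "/b.html": ["/d.html"],             # grandchild of the imported document
}
shallow = documents_scanned(["/a.html"], links_in, maxclicks=2)
deeper = documents_scanned(["/a.html"], links_in, maxclicks=3)
```

    In this model, shallow contains only /a.html, while deeper also contains the children /b.html and /c.html but not the grandchild /d.html.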

  2. Special Considerations

    LinkScan de-duplicates the list of links within an Import Document list. This means that LinkScan will validate each unique URL within the list only once.

    However, you may force LinkScan to process an Import Sequence in which the same URL or document is checked more than once. To do so, make each occurrence appear unique by adding a dummy name-value pair to the query string of the URL. This also provides a means of differentiating the test results for each step:

    http://www.example.com/cookie_sensitive?dummyseq=1
    [...]
    http://www.example.com/set_cookie
    [...]
    http://www.example.com/cookie_sensitive?dummyseq=2

    If a URL already includes a query string, simply append the additional parameter to the existing query and change:

    http://www.example.com/foo?name=value

    to:

    http://www.example.com/foo?name=value&dummyseq=1
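
    When an Import Sequence is generated by a script, the dummy parameter can be added automatically. The following Python sketch uses only the standard urllib.parse module. The helper names add_dummy_param and make_unique are hypothetical, and the sketch handles ordinary URLs only; it makes no attempt to handle the extended LinkScan conventions for form submissions.

```python
from urllib.parse import urlsplit, urlunsplit

def add_dummy_param(url, seq, name="dummyseq"):
    """Append name=seq to the query string of url, keeping any existing query."""
    parts = urlsplit(url)
    pair = f"{name}={seq}"
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit(parts._replace(query=query))

def make_unique(urls):
    """Number every URL that occurs more than once; leave the others unchanged."""
    totals = {}
    for url in urls:
        totals[url] = totals.get(url, 0) + 1
    seen = {}
    result = []
    for url in urls:
        if totals[url] > 1:
            seen[url] = seen.get(url, 0) + 1
            result.append(add_dummy_param(url, seen[url]))
        else:
            result.append(url)
    return result

sequence = make_unique([
    "http://www.example.com/cookie_sensitive",
    "http://www.example.com/set_cookie",
    "http://www.example.com/cookie_sensitive",
])
```

    Here sequence reproduces the three-step example above: the two cookie_sensitive entries gain dummyseq=1 and dummyseq=2, and set_cookie is left as it is.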

    Normally, LinkScan maintains the status of each link in a cache while it scans a site. This dramatically improves performance, since LinkScan does not need to re-check commonly used images and other components over and over. However, caching may be undesirable with some stateful sequences, for example when the same URL produces a completely different result before and after a cookie is set.

    In those situations, you may use a special option (Import = 3) which will force LinkScan to flush its cache after each imported document has been validated.
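
    The effect of flushing the cache can be seen in a toy model. The Python sketch below is not LinkScan code: FakeSite, run_sequence and the URLs are all hypothetical, and the site simply answers 403 for /account until /login has been requested. It shows why a cached status can be stale in a stateful sequence and how clearing the cache after each document, as described for Import = 3, changes the outcome.

```python
class FakeSite:
    """Hypothetical stateful site: /account needs a cookie that /login sets."""
    def __init__(self):
        self.logged_in = False

    def status(self, url):
        if url == "/login":
            self.logged_in = True
            return 200
        if url == "/account":
            return 200 if self.logged_in else 403
        return 200

def run_sequence(documents, links_in, fetch_status, flush_after_each=False):
    """Check each document and its links, reusing cached statuses where possible."""
    cache = {}
    results = []
    for doc in documents:
        for url in [doc] + links_in.get(doc, []):
            if url not in cache:
                cache[url] = fetch_status(url)
            results.append((doc, url, cache[url]))
        if flush_after_each:
            cache.clear()  # models flushing the cache after each imported document
    return results

documents = ["/step1", "/login", "/step2"]
links_in = {"/step1": ["/account"], "/step2": ["/account"]}

cached = run_sequence(documents, links_in, FakeSite().status)
flushed = run_sequence(documents, links_in, FakeSite().status, flush_after_each=True)
```

    With the cache kept, the final entry of cached still reports the 403 recorded before the login. With the cache flushed after each document, the final entry of flushed reports 200, because /account is checked again after the cookie is set.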

LinkScan Version 12.1
© Copyright 1997-2010 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation