6.3 Automatic Link Generation - the Link Apprentice

Mark Bernstein [Ber90] describes a link apprentice for the automatic generation of links that delivers surprisingly effective results using a shallow link detection technique. The link apprentice does not do any semantic analysis of the underlying text, but uses a simple pattern matching and string comparison algorithm. Obviously a system working this way can never expect to generate all links automatically, because if two passages of text express the same idea differently (using different phrases and words), the link apprentice has no way of figuring out the link. If, on the other hand, unrelated concepts are expressed in similar words, the link apprentice computes an erroneous link.

Bernstein integrated his link apprentice into the hypertext system Hypergate(TM) [Ber90]. In Hypergate each word and each left-substring of a word that occurs on a hypertext node are hashed into a Bloom filter hash table [Knu73b] associated with that node. Similarities between two hypertext nodes are computed based on the dot product of their hash tables. In each run, the apprentice scans the whole document and presents the twenty pages which are the most similar to the one the user is currently reading. Bernstein identifies three potential uses for his link apprentice:

Bernstein compared links generated by the link apprentice to manually generated links and found that most of the time automatically generated links are at least as good as the manually generated ones. In about 10 % of the cases, the link apprentice produced meaningless links. This means, that all automatically generated links have to be verified manually, but that the link apprentice offers a valuable tool assisting the hypertext author in finding the best links. It is also well suited for automatically generating first approximations of guided tours (see the chapter about sequentialization). But to get a truly versatile automatic linking facility, there is no way around not only comparing strings, but also analyzing the contents of a document. The VISAR system presented in the chapter about similarity is representative for this kind of system.