HAKMEM -- PROGRAMMING ALGORITHMS, HEURISTICS -- DRAFT, NOT YET PROOFED

Beeler, M., Gosper, R.W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. Retyped and converted to html ('Web browser format) by Henry Baker, April, 1995.

PROGRAMMING ALGORITHMS, HEURISTICS

Previous Up Next

ITEM 176 (Gosper):

The "banana phenomenon" was encountered when processing a character string by taking the last 3 letters typed out, searching for a random occurrence of that sequence in the text, taking the letter following that occurrence, typing it out, and iterating. This ensures that every 4-letter string output occurs in the original. The program typed BANANANANANANANANA.... We note an ambiguity in the phrase, "the Nth occurrence of". In one sense, there are five 00's in 0000000000; in another, there are nine. The editing program TECO finds five. Thus it finds only the first ANA in BANANA, and is thus obligated to type N next. By Murphy's Law, there is but one NAN, thus forcing A, and thus a loop. An option to find overlapped instances would be useful, although it would require backing up N-1 characters before seeking the next N character string.

ITEM 177 (Gosper): DRAWING CURVES INCREMENTALLY

Certain plotters and displays are constrained to approximate curves by a sequence of king-moves between points on a lattice.

Many curves and contours are definable by F(X,Y) = 0 with F changing sign on opposite sides of the curve. The following algorithm will draw most such curves more accurately than polygonal approximations and more easily than techniques which search for a "next" X and Y just one move away.

We observe that a good choice of lattice points is just those for which F, when evaluated on one of them, has opposite sign and smaller magnitude than on one or more of its four immediate neighbors. This tends to choose the nearer endpoint of each graph paper line segment which the curve crosses, if near the curve F is monotone with distance from the curve.

First, divide the curve into arcs within which the curve's tangent lies within one 45 degree semiquadrant. We can show that for reasonable F, only two different increments (say north and northwest) are needed to visit the desired points.

Thus, we will be changing one coordinate (incrementing Y) every step, and we have only to check whether changing the other (decrementing X) will reduce the magnitude of F. (If F increases with Y, F(X,Y+1) > -F(X-1,Y+1) means decrement X.) F can often be manipulated so that the inequality simplifies and so that F is easily computed incrementally from X and Y.

As an example, the following computes the first semiquadrant of the circle

     2    2    2
F = X  + Y  - R  = 0.

C0:    F <- 0, Y <- 0, X <- R

C1:    F <- F+2Y+1, Y <- Y+1

C2:    if F >= X, F <- F-2X+1, X <- X-1

C3:    if Y < X-1, go to C1

C4:    (Link to next arc) if Y = X-1, Y <- Y+1, X <- X-1

This can be bummed by maintaining Z = 2Y+1 instead of Y. Symmetry may be used to compute all eight semiquadrants at once, or the loop may be closed at C2 and C3 with two PUSHJ's to provide the palindrome of decisions for the first quadrant. There is an expression for the number of steps per quadrant, but it has a three-way conditional dependent upon the midpoint geometry. Knowing this value, however, we can replace C3 and C4 with a simple loop count and an odd-even test for C4.

The loop must be top-tested (C3 before C1) if the "circle" R = 1, with four diagonal segments, is possible.

All this suggests that displays might be designed with an increment mode which accepts bit strings along with declarations of the form: "0 means north, 1 means northwest". 1100 (or 0011) will not occur with a curve of limited curvature; thus, it could be used as an escape code, but this would be an annoying restriction.

See the following illustration of circles drawn this way.

[In case of a tie, i.e., F has equal magnitudes with opposite signs on adjacent points, do not choose both points but rather have some arbitrary yet consistent preference for, say, the outer one. The problem can't arise for C2 in the example because the inequality F >= X is really F > -(F-2X+1) or F > X-.5.]

ITEM 178 (Schroeppel, Salamin):

Suppose Y satisfies a differential equation of the form

P(X) Y(Nth derivative) + ..... + Q(X) = R(X)

where P, ..... Q, and R are polynomials in X

                                  2                2    2
(for example, Bessel's equation, X  Y'' + X Y' + (X  - N ) Y = 0)

and A is an algebraic number. Then Y(A) can be evaluated to N places in time proportional to N(ln N)^3.

Further, e^X and ln X or any elementary function can be evaluated to N places in N(ln N)^2 for X a real number. If F(X) can be evaluated in such time, so can the inverse of F(X) (by Newton's method), and the first derivative of F(X). Also, zeta(3) and gamma can be done in N(ln N)^3.

ITEM 179 (Gosper):

A program which searches a character string for a given substring can always be written by iterating the sequence fetch-compare-transfer (ILDB-CAIE-JRST on the PDP6/10) once for each character in the sought string. The destinations of the transfers (address fields of the JRST's) must, however, be computed as functions of the sought string. Let 0 1 2 3 4 S A S S Y 0 1 0 2 2 stand for the program T0: ILDB C,A ;C gets next char from pointer in A T1: CAIE C,"S ;skip if it's an S JRST T0 ;loop back on failure ILDB C,A ;next T2: CAIE C,"A ;skip if A JRST T1 ;could be an S ILDB C,A T3: CAIE C,"S JRST T0 ;S, A, non S, so start over ILDB C,A T4: CAIE C,"S JRST T2 ;could be SAS.ASSY ILDB C,A CAIE C,"Y JRST T2 ;could be SASS.ASSY ;found SASSY In other words, a number > 0 in the top row is a location in the program where the corresponding letter of the middle row is compared with a character of the input string. If it differs, the number in the bottom row indicates the location where comparison is to resume. If it matches, the next character of the middle row is compared with the next character of the input string.

Let J be a number in the to row and K be the number below J, so that TK is the address field of the Jth JRST. For each J = 1, 2, ... we compute K(J) as follows: K(1) = 0. Let P be a counter, initially 0. For each succeeding J, increment P. If the Pth letter = the Jth, K(J) = K(P). Otherwise, K(J) = P, and P is reset to 0. (P(J) is the largest number such that the first P characters match the last P character in the first J characters of the sought string.)

J=      0 1                             0 1 2 3 4 5
          M I S S I S S I P P I           I S S I S S I P P I
K(J)=     0 1 1 1 1 1 1 1 1 1 1           0 1 1 0 1 1 0 5 1 0

        0 1 2 3                         0 1 2 3
          C O C A C O L A                 S A S S A F R A S
          0 1 0 2 0 1 3 1                 0 1 0 2 1 3 1 1 0

To generalize this method to search for N strings at once, we produce a program of ILDB-CAIE-JRST's for each of the sought strings, omitting the initial ILDB from all but the first. We must compute the destination of the Jth JRST in the Ith program, TKM(I,J), which is the location of the Kth compare in the Mth program.

It might be reasonable to compile such an instruction sequence whenever a search is initiated, since alternative schemes usually require saving or backing up the character pointer.

ITEM 180 (Gosper):

A problem which may arise in machine processing of visual information is the identification of corners on a noisy boundary of a polygon. Assume you have a broken line. If it is a closed loop, find the vertex furthest from the centroid (or any place). Open the loop by making this place both endpoints and calling it a corner. We define the corner of a broken line segment to be the point the sum of whose distances from the endpoints is maximal. This will divide the segment in two, allowing us to proceed recursively, until our corner isn't much cornerier than the other along the line.

The perpendicular distance which the vector C lies from the line connecting the vectors A and B is just

(C - A) x (B - A)
----------------- ,
    2 |A - B|

but maximizing this can lose on very pointy V's. The distance sum hack can lose on very squashed Z's.

Previous Up Next