Apple Assembly Line - V5N2

Volume 5 -- Issue 2 -- November 1984

In This Issue...

18-Digit Arithmetic, Part 7
S-C Macro Assembler Version 2.0
Convert Two Decimal Digits to Binary
A Whole Megabyte for your Apple //e
65816 News
New DP18 Square Root Subroutine
Improvements to 80-Column Monitor Dump
Generating Cross Reference Text Files with DISASM
Macro Information by Example
Turning Bit-Masks into Indices

Apple II Troubleshooting Guide

We have just received a new book from Howard Sams: Apple II+/IIe Troubleshooting & Repair Guide, by Robert C. Brenner. At a glance, it looks like quite a good introduction to the Apple hardware and its potential problems. The first chapter is Basic Troubleshooting, followed by three chapters on Description, Operations, and Specific Troubleshooting for the II Plus, three more similar chapters on the //e, and two chapters on Preventive Maintenance and Advanced Troubleshooting. Here's a quote from the Introduction:

This book is a detailed troubleshooting and repair document. It is not a treatise on basic computer theory or a discussion of chip operation, registers, busses, and logic gates. It is an all "meat and potatoes" manual to enable the computer user to repair his or her own machine in those 95 percent of circumstances where knowledge and a good reference are enough to find and repair a failure.

List price of the Troubleshooting & Repair Guide is $19.95. Our price will be $18 + shipping.

Apple //e Reference Manual Source

We have located a mail- or phone-order source for the Apple manuals! A reader in New York City phoned to let us know that the McGraw-Hill Bookstore there carries the Apple publications. Apparently the bookstore is also a computer store and an Apple Dealer. The address is McGraw-Hill Bookstore, 1221 Sixth Ave., New York, NY 10020. The phone number is (212) 512-4100.

18-Digit Arithmetic, Part 7 -- Bob Sander-Cederlof

Last month we began the implementation of math functions, so it seems appropriate to continue in the same direction. This month we will reveal the LOG and EXP functions.

As always, I turned to "Computer Approximations" for some good algorithms. I mentioned this book last month, and several of you have tried to find copies.

Thanks to Trey Johnson, of Monolith Inc. in San Antonio, for the following information: John Wiley & Sons stopped publishing the book "Computer Approximations" in 1977. They sold the rights to Krieger Publishing Co., and it is now being published under the same title. Trey was quoted a price of $22.50 + shipping. Krieger's address is P. O. Box 9542, Melbourne, FL 32901; phone is (305) 724-9542.

"Computer Approximations" is the only book I have found which lists all the actual coefficients needed to produce good approximations for the whole variety of standard functions. Pages 189-339 are packed solid with nothing but numbers. For example, there are ten pages of numbers for the EXP function alone, providing over 100 different approximation formulas for the EXP function. The chapter covering EXP describes the math behind the approximations. You pick an algorithm according to the precision you need, the number base you are using (2, 10, or whatever), the tradeoff between speed and size, and the range of arguments you will be using. Each algorithm in the book has a number, and I indicate that number in the comments to the programs which follow.

Almost all of the approximations involve these steps:

     SIFT:  Check the argument for legal range and
            easy arguments.
     FOLD:  Reduce the range of the argument.
     POLY:  Use a polynomial or a ratio of polynomials
            to approximate the function in the reduced
            range.
   UNFOLD:  Expand the result by the reverse of the
            processes used to reduce the range.

When we first learned about logarithms in high school, we used tables in books. One set of tables converted normal numbers to logs, and the other converted logs back to normal numbers. The LOG function takes the place of the first set of tables, and the EXP function replaces the second. By the way, those high school logarithms were base 10 logs. The log of a number is the power to which you would have to raise 10 to equal the number. For example, the log base 10 of 1000 is 3; of the square root of 10 is .5.

Scientists prefer base "e" logs. "e" is an irrational number (as is pi) approximately equal to 2.71828182845904523536. Did the original scientists have 2.718281828... fingers? Maybe, if they had to chop firewood (logs?)! Anyway, EXP and LOG in Applesoft work with base e. LOG tells you to what power you would raise e to equal the argument, and EXP raises e to the power of the argument.

One great application of LOG and EXP is to raise any number to any power. Applesoft (as well as DP18) has an exponentiation operator "^" for this purpose, but the code inside does it by calling on EXP and LOG. Here are some mathematical symbols to indicate how it is done:

       let         z = x^y
      then     log z = log (x^y)
               log z = y log x
         exp (log z) = exp (y log x)
                 x^y = exp (y log x)

Here is the code for the exponentiation operator in DP18:

     *-------------------------------------
     *   EXPONENTIATION:  X ^ Y
     *      (DAC) = Y
     *      (ARG) = X
     *-------------------------------------
     DP.POWER
            JSR MOVE.DAC.TEMP3   SAVE DAC (POWER) IN TEMP3
            JSR SWAP.ARG.DAC
            JSR DP.LOG10         GET LOG X
            JSR MOVE.TEMP3.ARG   GET Y IN ARG
            JSR DMULT            Y LOG X
            JMP DP.EXP10         X ^ Y

Notice I used base 10 log and exp? That is because DP18 is basically decimal. In a binary floating point scheme such as is internal to Applesoft, base 2 log and exp would probably be used. After all, floating point notation is a kind of half-log half-normal notation.
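For readers who want to check the identity away from the Apple, here is a minimal sketch in Python (not part of DP18; the name power_via_logs is made up for illustration). It follows the same order as DP.POWER: take the base 10 log of X, multiply by Y, then raise 10 to the result. Ordinary binary floating point stands in for the 18-digit decimal format.

```python
import math

def power_via_logs(x, y):
    """Compute x**y as 10**(y * log10(x)); only valid for x > 0."""
    log_x = math.log10(x)      # corresponds to JSR DP.LOG10
    product = y * log_x        # corresponds to JSR DMULT
    return 10.0 ** product     # corresponds to JMP DP.EXP10

# Compare against Python's built-in power operator.
print(power_via_logs(2.0, 10.0), 2.0 ** 10.0)
```

Like the DP18 code, this sketch cannot handle a zero or negative X, because the logarithm is not defined there.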

Which leads to the topic of converting from one logarithmic base to another. If my internal subroutines work in base 10, how do I get LOG and EXP to base e? Some more math is due:

       suppose         e^x = 10^y
       then    log10 (e^x) = log10 (10^y)
                x log10(e) = y log10(10)
                x log10(e) = y

Log10(e) is a constant, approximately 0.43429448190325182765. So if I want to know what EXP(3) is, I can first get 3*log10(e) = 1.302..., and 10^1.302... = 20.0855...
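The EXP(3) example can be reproduced with a few lines of Python. This is only a sketch in binary floating point, with a made-up function name; the constant is the value of log10(e) quoted above.

```python
import math

LOG10_E = 0.43429448190325182765   # log10(e), as quoted in the text

def exp_via_base10(x):
    """e**x computed as 10**(x * log10(e))."""
    return 10.0 ** (x * LOG10_E)

# 3 * log10(e) is about 1.302, and 10 to that power is about 20.0855
print(3.0 * LOG10_E)
print(exp_via_base10(3.0), math.exp(3.0))
```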

EXP Function

Lines 1640-1660 of the program check for a zero argument, which is an easy case: e^0 = 1. Lines 1670-1700 multiply the argument by log10(e), so that EXP10 can be used.

Lines 1730-1740 again sift out the easy case of 10^0, in case DP.EXP10 was called directly.

Lines 1750-1790 begin the folding process. We can cut the range in half by folding all negative arguments on top of the positive range: EXP(-x) = 1/EXP(x).

Lines 1810-1820 further sift, by eliminating arguments of 100 or more. If the exponent of the argument is $43 or more, then the argument is 100 or more. Arguments that large are too large. (Indeed, any argument above 63 is too large.) The Applesoft ROM routine for OVERFLOW ERROR will let you know you tried it.

The arguments we have left will be in the range 0 < x < 100. We can further subdivide the range by separating the integer and fractional parts of the argument. Remember that 10^(x+y) = (10^x)*(10^y)? For illustration, suppose the argument is 3.75. Then 10^3.75 = 10^3 * 10^.75 = 5623.4132.... Lines 1830-2100 perform the separation. The variable INTPWR will get the integer part, which may range from 0 to 99. The corresponding digits are zeroed in DAC, and the resulting fraction is re-normalized. If the fractional part is zero, then 10 raised to the fractional part is 1; lines 2080-2100 sift out this special case. This section could be accomplished by using previously covered subroutines, such as DP.INT to get the integer part, and DSUB to get the fractional part. However, that would take considerably longer for only a slight savings in space.
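The separation step can be sketched in Python as follows. The representation is an assumption made for illustration: the mantissa is a plain list of 20 decimal digits rather than packed BCD bytes, the exponent byte is biased by $40 as in DP18, and a zero fraction is signalled by returning an exponent of 0. The function name is hypothetical.

```python
def split_int_frac(exponent, digits):
    """Split 0.d1d2d3... * 10**(exponent - 0x40) into integer and fraction.

    Returns (intpwr, new_exponent, new_digits); new_exponent is 0 when the
    fraction is zero (an assumed convention for this sketch).
    """
    if exponent >= 0x43:
        raise OverflowError("argument is 100 or more")
    n_int = max(exponent - 0x40, 0)       # 0, 1, or 2 integer digits
    digits = list(digits)
    intpwr = 0
    for i in range(n_int):                # collect and zero the integer digits
        intpwr = intpwr * 10 + digits[i]
        digits[i] = 0
    if not any(digits):                   # nothing left: fraction is zero
        return intpwr, 0, digits
    while digits[0] == 0:                 # re-normalize the fraction
        digits = digits[1:] + [0]
        exponent -= 1
    return intpwr, exponent, digits

# 3.75 is stored as .375 * 10^1, so the exponent byte is $41
print(split_int_frac(0x41, [3, 7, 5] + [0] * 17))
```

The assembly version works on the packed digits in DAC.HI directly, which is why it needs the nybble shifts and the multiply-by-ten sequence in lines 1980-2040.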

The active part of the argument has now been reduced to the range 0<x<1. The next adjustment will cut that in half. If the argument x<.5, this adjustment will be skipped. Lines 2120-2160 perform the test, and line 2170 saves the result of the test on the stack. We need the result later when we are unfolding. If x >= .5, then lines 2190-2210 subtract .5 from it. If x = .5, then the result after subtraction will be zero. In this case, the correct answer is a known constant, the square root of 10. Lines 2230-2270 load up that value and skip over the POLY part on down to the UNFOLDing. If not exactly .5, we now have a folded argument in the range 0<x<.5, with a flag on the stack indicating whether or not we subtracted .5 to get there. Later, if we DID subtract .5, we will multiply the result of POLY by the square root of 10 to unfold the answer.

We could have arbitrarily subtracted .5, changing the range from 0<x<1 to -.5<x<.5, with the same result. This would have saved the trouble of determining which side of .5 we were on, and of later deciding whether or not to multiply by SQR(10). However, it would also take longer for those cases already under .5, so I decided against it.

The POLY part is lines 2280-2520. This is a ratio of two polynomials, both 8th degree. However, because of derivational and computational reasons, it is actually written and calculated in a different form:

                  Q(x^2) + xP(x^2)
       POLY(x) =  ----------------
                  Q(x^2) - xP(x^2)

Lines 2290-2320 save x and compute x^2. Lines 2330-2380 call on POLY.N (covered last month) to compute the P polynomial, and then multiply the result by x. The constants are given in lines 1440-1490. So that you see the form, I will give it here with the coefficients rounded off:

       xP = 31x^7 + 4562x^5 + 134331x^3 + 760254x

Lines 2400-2430 compute the Q-polynomial, by calling POLY.1 (also covered last month). POLY.1 is used when the coefficient of the highest-degree term is 1. We get, approximately,

       Q = x^8 + 477x^6 + 29733x^4 + 408437x^2 + 660349

Lines 2440-2520 form the numerator and denominator and divide, giving us a very nice approximation to the function for the folded argument.
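To see the ratio at work, here is a sketch in Python that evaluates the same form, (Q + xP)/(Q - xP), with the coefficients copied from lines 1460-1550 of the listing. The helper horner is a stand-in for POLY.N and POLY.1, not a translation of them, and binary floating point keeps only about 16 of the digits.

```python
def horner(coeffs, t):
    """Evaluate a polynomial in t; coeffs are listed highest power first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * t + c
    return acc

# Coefficients from program lines 1460-1490 (P) and 1520-1550 (Q).
P_EXP = [31.341179401973048777, 4561.8283169465635848,
         134331.13473585559034, 760254.47944126539434]
Q_EXP = [1.0,  # implied leading coefficient, as with POLY.1
         477.05440300820798775, 29732.606558599683303,
         408436.97966773628236, 660348.65052714154491]

def exp10_reduced(x):
    """Approximate 10**x for 0 <= x < 0.5."""
    x2 = x * x
    xp = x * horner(P_EXP, x2)     # x * P(x^2)
    q = horner(Q_EXP, x2)          # Q(x^2)
    return (q + xp) / (q - xp)

for x in (0.1, 0.25, 0.49):
    print(x, exp10_reduced(x), 10.0 ** x)
```

Printing both columns side by side is a simple way to spot a mistyped coefficient.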

Lines 2530-2590 begin the unfolding process, by multiplying by SQR(10) if we previously folded .5<x<1 down to 0<x<.5.

Lines 2600-2660 take care of the integral portion of the original argument, by adding it to the EXPONENT of the result so far. This is equivalent to multiplying by the integral power of ten, but much faster. Isn't base ten nice?

The final adjustment is to take the reciprocal if the original argument was negative, done in lines 2670-2730.
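The whole SIFT, FOLD, POLY, UNFOLD sequence for 10^x can be sketched in Python as shown here. The kernel argument is a hypothetical hook for whatever handles the reduced range 0 <= f < .5; by default it simply uses Python's own power operator as a stand-in for the polynomial ratio. Where DP18 adds the integer part to the exponent byte, this sketch has to multiply by a power of ten.

```python
import math

SQRT10 = math.sqrt(10.0)

def exp10(x, kernel=lambda f: 10.0 ** f):
    """10**x by sift / fold / kernel / unfold; kernel handles 0 <= f < 0.5."""
    if x == 0.0:                      # SIFT: 10^0 = 1
        return 1.0
    negative = x < 0.0                # FOLD: EXP(-x) = 1/EXP(x)
    x = abs(x)
    if x >= 100.0:                    # SIFT: three or more integer digits
        raise OverflowError("argument too large")
    intpwr = int(x)                   # FOLD: integer part ...
    frac = x - intpwr                 # ... and fraction, 0 <= frac < 1
    upper_half = frac >= 0.5          # FOLD: bring fraction below .5
    if upper_half:
        frac -= 0.5
    result = kernel(frac)             # POLY (stand-in)
    if upper_half:                    # UNFOLD: 10^.5 = SQR(10)
        result *= SQRT10
    result *= 10.0 ** intpwr          # UNFOLD: integral power of ten
    return 1.0 / result if negative else result

print(exp10(3.75), exp10(-2.0))
```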

LOG Function

The LOG function is the inverse of the EXP function. Now if we could just run the 6502 backwards....

Log base e is related to log base 10 the same way the exp functions were:

       loge x = loge(10) * log10 (x)

Lines 2990-3040 call on the LOG10 subroutine and then multiply the result by the log base e of 10.

The LOG10 routine begins by sifting out the objectionable argument values, at lines 3100-3130. The argument MUST be positive, and MUST NOT be zero. Negative or zero arguments send you to Applesoft's ILLEGAL QUANTITY ERROR.

Lines 3140-3170 separate the exponent from the mantissa of the argument. The exponent represents the power of 10 multiplier, so as an integer it can just be added to the logarithm of the mantissa viewed as a fraction. The exponent is saved in INTPWR, to be processed later. Stuffing $40 in its place in DAC makes the range now .1<=x<1.

Lines 3180-3210 multiply the fraction by SQR(10), which changes the range to

       1
    -------  <= x < SQR(10)
    SQR(10)

This can be compensated for later by subtracting .5 from the logarithm of the folded argument.

Lines 3220-3320 further thrash the argument by forming an intermediate argument z = (x-1)/(x+1). This value z will be in the range -.52 < z < +.52, which is a nice symmetrical range to run through a ratio of polynomials. I get lost in the math that motivates this step.

The POLY part is again a ratio of two polynomials. Lines 3330-3440 calculate the numerator, which is approximately

     -15z^11 + 301z^9 - 1726z^7 + 4060z^5 - 4192z^3 + 1576z

The denominator, formed in lines 3450-3500, is approximately

     z^12-68z^10+764z^8-3200z^6+6122z^4-5432z^2+1815
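The same kind of check works for the logarithm. This Python sketch forms z = (x-1)/(x+1) and evaluates zP(z^2)/Q(z^2) with the coefficients copied from lines 2820-2950; the negative coefficients are the ones whose first byte has the high bit set ($C2, $C4). The function names are made up, and horner is a stand-in for POLY.N and POLY.1.

```python
import math

def horner(coeffs, t):
    """Evaluate a polynomial in t; coeffs are listed highest power first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * t + c
    return acc

# Coefficients from program lines 2820-2870 (P) and 2900-2950 (Q).
P_LOG = [-14.933418712310149868, 301.32347341474846138,
         -1725.5362650065303387, 4059.8331239447621513,
         -4192.3456020708107911, 1576.4334845112769255]
Q_LOG = [1.0,  # implied leading coefficient, as with POLY.1
         -67.696411904622452758, 763.57002300915579877,
         -3200.0879863666412225, 6121.6000417746878069,
         -5431.5949509257525735, 1814.9361207661630282]

def log10_reduced(x):
    """Approximate log10(x) for 1/SQR(10) <= x < SQR(10)."""
    z = (x - 1.0) / (x + 1.0)
    z2 = z * z
    return z * horner(P_LOG, z2) / horner(Q_LOG, z2)

def log10_fraction(f):
    """log10(f) for .1 <= f < 1: scale by SQR(10), then subtract .5."""
    return log10_reduced(f * math.sqrt(10.0)) - 0.5

print(log10_reduced(3.0), math.log10(3.0))
print(log10_fraction(0.5), math.log10(0.5))
```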

Dividing at line 3510 gives the logarithm of the value x. To unfold, we need to subtract .5, handled by lines 3860-3920. We also need to add as an integer the power of ten we saved in INTPWR. The latter is trickier, because we must convert a biased binary integer to a signed decimal floating point value.

Lines 3530-3600 un-bias INTPWR. If the exponent happens to be exactly $40, which in un-biased terms is 0, the rest of this step can be skipped (because the log of 10^0 is zero, adding nothing). If not, it is time to build a DP18 value in ARG. Line 3570 saves the sign in ARG.SIGN.

Lines 3610-3620 pre-clear ARG.HI, which is where we will be putting the one or two digits of INTPWR. Line 3630 assumes it will be a one-digit value, and lines 3640-3650 test that assumption. If it is one digit, lines 3730-3780 will shift the digit to the left nybble and store it in ARG.HI. If two digits, lines 3660-3720 will divide by ten to get the high digit as quotient and low digit as remainder. Then lines 3730-3780 will merge the two digits into ARG.HI.

Lines 3790-3840 complete the construction of ARG by storing the exponent and clearing the remaining mantissa bytes. Line 3850 adds the value to the results of the POLY step, lines 3870-3920 subtract .5, and the answer is ready.
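The conversion of INTPWR into a DP18 value can be sketched in Python like this. The function name and the returned tuple are illustrative only; the zero case, which the assembly code simply skips, is reported here with an exponent byte of 0.

```python
def intpwr_to_bcd(biased):
    """Turn a biased exponent byte ($40 means 0) into the pieces of a
    DP18 integer: (negative, exponent_byte, first_mantissa_byte)."""
    value = biased - 0x40              # un-bias
    negative = value < 0
    value = abs(value)
    if value == 0:
        return False, 0x00, 0x00       # nothing to add
    if value < 10:                     # one digit: left-justify it
        return negative, 0x41, value << 4
    tens = 0
    while value >= 10:                 # divide by ten by repeated subtraction
        value -= 10
        tens += 1
    return negative, 0x42, (tens << 4) | value

print(intpwr_to_bcd(0x47))             # exponent of +7
print(intpwr_to_bcd(0x40 - 25))        # exponent of -25
```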

  1000 *SAVE S.DP18 FUNC LOG
  1010 *--------------------------------
  1020 AS.OVRFLW  .EQ $E8D5
  1030 AS.ILLERR  .EQ $E199
  1040 *--------------------------------
  1050 POLY.1     .EQ $FFFF
  1060 POLY.N     .EQ $FFFF
  1070 DADD       .EQ $FFFF
  1080 DSUB       .EQ $FFFF
  1090 DMULT      .EQ $FFFF
  1100 DDIV       .EQ $FFFF
  1110 DP.TRUE    .EQ $FFFF
  1120 MOVE.YA.ARG.1  .EQ $FFFF
  1130 MOVE.YA.DAC.1       .EQ $FFFF
  1140 SWAP.DAC.ARG   .EQ $FFFF
  1150 MOVE.TEMP1.ARG      .EQ $FFFF
  1160 MOVE.TEMP2.ARG      .EQ $FFFF
  1170 MOVE.TEMP3.ARG      .EQ $FFFF
  1180 MOVE.DAC.ARG        .EQ $FFFF
  1190 MOVE.TEMP3.DAC      .EQ $FFFF
  1200 MOVE.DAC.TEMP1      .EQ $FFFF
  1210 MOVE.DAC.TEMP2      .EQ $FFFF
  1220 MOVE.DAC.TEMP3      .EQ $FFFF
  1230 NORMALIZE.DAC       .EQ $FFFF
  1240 *--------------------------------
  1250 DAC.EXPONENT .BS 1
  1260 DAC.HI       .BS 10
  1270 DAC.SIGN     .BS 1
  1280 *--------------------------------
  1290 ARG.EXPONENT .BS 1
  1300 ARG.HI       .BS 10
  1310 ARG.SIGN     .BS 1
  1320 *--------------------------------
  1330 SIGN         .BS 1
  1340 INTPWR       .BS 1
  1350 *--------------------------------
  1360 CON.ONE    .HS 41.10000.00000.00000.00000
  1370 CON.1HALF  .HS 40.50000.00000.00000.00000
  1380 CON.SQR10  .HS 41.31622.77660.16837.93320
  1390 *--------------------------------
  1400 *      EXP (DAC)    E^DAC
  1410 *             OR    10^DAC
  1420 *      #1446 IN HART, ET AL
  1430 *--------------------------------
  1440 P.EXP    .EQ *
  1450 P.EXP.N  .EQ 3
  1460          .HS 42.31341.17940.19730.48777
  1470          .HS 44.45618.28316.94656.35848
  1480          .HS 46.13433.11347.35855.59034
  1490          .HS 46.76025.44794.41265.39434
  1500 Q.EXP    .EQ *
  1510 Q.EXP.N  .EQ 4
  1520          .HS 43.47705.44030.08207.98775
  1530          .HS 45.29732.60655.85996.83303
  1540          .HS 46.40843.69796.67736.28236
  1550          .HS 46.66034.86505.27141.54491
  1560 *--------------------------------
  1570 CON.LOGE .HS 40.43429.44819.03251.82765
  1580 *--------------------------------
  1590 DP.EXP.NULL
  1600        JMP DP.TRUE  E^0 = 10^0 = 1.0
  1610 DP.EXP.OVERFLOW
  1620        JMP AS.OVRFLW
  1630 *--------------------------------
  1640 DP.EXPE
  1650        LDA DAC.EXPONENT
  1660        BEQ DP.EXP.NULL
  1670        LDA #CON.LOGE
  1680        LDY /CON.LOGE
  1690        JSR MOVE.YA.ARG.1
  1700        JSR DMULT    CHANGE TO 10^X
  1710 *--------------------------------
  1720 DP.EXP10
  1730        LDX DAC.EXPONENT       10^0 = 1
  1740        BEQ DP.EXP.NULL
  1750 *---HANDLE NEGATIVE POWERS-------
  1760        LDA DAC.SIGN SAVE FOR 1/EXP IF NEGATIVE
  1770        STA SIGN
  1780        LDA #0       GET ABS(X)
  1790        STA DAC.SIGN
  1800 *---SPLIT INTEGER & FRACTION-----
  1810        CPX #$43     THREE OR MORE INTEGER DIGITS?
  1820        BCS DP.EXP.OVERFLOW   YES, OVERFLOW
  1830        LDA #0       ...ALL FRACTIONAL
  1840        STA INTPWR
  1850        CPX #$41
  1860        BCC .3       ...NO INTEGRAL PART
  1870        LDA DAC.HI   ...1 OR 2 DIGITS
  1880        LSR
  1890        LSR
  1900        LSR
  1910        LSR
  1920        STA INTPWR
  1930        LDA DAC.HI
  1940        AND #$0F
  1950        STA DAC.HI
  1960        CPX #$41     ONE OR TWO DIGITS?
  1970        BEQ .2       ...ONE DIGIT INTEGER
  1980        LDA INTPWR    DIGIT*10
  1990        ASL
  2000        ASL
  2010        ADC INTPWR
  2020        ASL
  2030        ADC DAC.HI
  2040        STA INTPWR
  2050        LDX #0
  2060        STX DAC.HI
  2070 .2     JSR NORMALIZE.DAC   ADJUST REMAINING FRACTION
  2080        BNE .3              FRACTION NOT 0
  2090        JSR DP.TRUE         10^0 = 1
  2100        JMP .7
  2110 *---ADJUST FRACTION SO < .5------
  2120 .3     LDA DAC.EXPONENT
  2130        CMP #$40
  2140        BCC .4
  2150        LDA DAC.HI
  2160        CMP #$50
  2170 .4     PHP          REMEMBER...
  2180        BCC .5       ...ALREADY < .5
  2190        SBC #$50
  2200        STA DAC.HI
  2210        JSR NORMALIZE.DAC
  2220        BNE .5       ...REST OF FRACTION NOT 0
  2230        PLA          POP SAVED STATUS
  2240        LDA #CON.SQR10
  2250        LDY /CON.SQR10
  2260        JSR MOVE.YA.DAC.1
  2270        JMP .7
  2280 *---COMPUTE 10^.XXXX-------------
  2290 .5     JSR MOVE.DAC.TEMP1    SAVE X
  2300        JSR MOVE.DAC.ARG
  2310        JSR DMULT             GET X^2
  2320        JSR MOVE.DAC.TEMP2    SAVE X^2
  2330        LDA #P.EXP            COMPUTE P(X^2)
  2340        LDY /P.EXP
  2350        LDX #P.EXP.N
  2360        JSR POLY.N
  2370        JSR MOVE.TEMP1.ARG    COMPUTE XP(X^2)
  2380        JSR DMULT
  2390        JSR MOVE.DAC.TEMP3    SAVE XP(X^2)
  2400        LDA #Q.EXP            COMPUTE Q(X^2)
  2410        LDY /Q.EXP
  2420        LDX #Q.EXP.N
  2430        JSR POLY.1
  2440        JSR MOVE.DAC.TEMP2    SAVE Q(X^2)
  2450        JSR MOVE.TEMP3.ARG    NUMERATOR = Q+XP
  2460        JSR DADD              Q(X^2)+XP(X^2)
  2470        JSR MOVE.DAC.TEMP1    SAVE NUMERATOR
  2480        JSR MOVE.TEMP2.ARG    DENOMINATOR = Q-XP
  2490        JSR MOVE.TEMP3.DAC
  2500        JSR DSUB              Q(X^2)-XP(X^2)
  2510        JSR MOVE.TEMP1.ARG    10^.XXX = N/D
  2520        JSR DDIV
  2530 *---ADJUST BY SQR(10)------------
  2540        PLP          SEE IF ADJUSTMENT NEEDED
  2550        BCC .7       ...NO
  2560        LDA #CON.SQR10
  2570        LDY /CON.SQR10
  2580        JSR MOVE.YA.ARG.1
  2590        JSR DMULT
  2600 *---ADD INTEGRAL POWER-----------
  2610 .7     CLC
  2620        LDA DAC.EXPONENT
  2630        ADC INTPWR
  2640        BPL .8       ...NO OVERFLOW
  2650        JMP DP.EXP.OVERFLOW
  2660 .8     STA DAC.EXPONENT
  2670 *---ADJUST FOR SIGN--------------
  2680        LDA SIGN     GET ORIGINAL SIGN
  2690        BPL .9       POSITIVE, WE ARE DONE
  2700        LDA #CON.ONE NEGATIVE, FORM RECIPROCAL
  2710        LDY /CON.ONE
  2720        JSR MOVE.YA.ARG.1
  2730        JSR DDIV
  2740 .9     RTS
  2750 *--------------------------------
  2760 *      LN (DAC)  LOG E (DAC)
  2770 *           OR   LOG 10 (DAC)
  2780 *      #2330 IN HART, ET AL
  2790 *--------------------------------
  2800 P.LOG    .EQ *
  2810 P.LOG.N  .EQ 5
  2820          .HS C2.14933.41871.23101.49868
  2830          .HS 43.30132.34734.14748.46138
  2840          .HS C4.17255.36265.00653.03387
  2850          .HS 44.40598.33123.94476.21513
  2860          .HS C4.41923.45602.07081.07911
  2870          .HS 44.15764.33484.51127.69255
  2880 Q.LOG    .EQ *
  2890 Q.LOG.N  .EQ 6
  2900          .HS C2.67696.41190.46224.52758
  2910          .HS 43.76357.00230.09155.79877
  2920          .HS C4.32000.87986.36664.12225
  2930          .HS 44.61216.00041.77468.78069
  2940          .HS C4.54315.94950.92575.25735
  2950          .HS 44.18149.36120.76616.30282
  2960 *--------------------------------
  2970 CON.LN10 .HS 41.23025.85092.99404.56840
  2980 *--------------------------------
  2990 DP.LOGE
  3000        JSR DP.LOG10
  3010        LDA #CON.LN10     CONVERT LOG10 TO LN
  3020        LDY /CON.LN10
  3030        JSR MOVE.YA.ARG.1
  3040        JMP DMULT
  3050 *--------------------------------
  3060 DP.LOG.ERR
  3070        JMP AS.ILLERR
  3080 *--------------------------------
  3090 DP.LOG10
  3100        LDA DAC.SIGN      CHECK RANGE
  3110        BMI DP.LOG.ERR    ...NEGATIVE
  3120        LDA DAC.EXPONENT
  3130        BEQ DP.LOG.ERR    ...ZERO
  3140        STA INTPWR        SAVE POWER OF 10
  3150 *---ADJUST RANGE-----------------
  3160        LDA #$40          MAKE FRACTION .1 TO .9999
  3170        STA DAC.EXPONENT
  3180        LDA #CON.SQR10    1/SQR(10) ... SQR(10)
  3190        LDY /CON.SQR10
  3200        JSR MOVE.YA.ARG.1
  3210        JSR DMULT
  3220 *---FORM (X-1)/(X+1)-------------
  3230        JSR MOVE.DAC.TEMP1
  3240        JSR MOVE.DAC.ARG
  3250        JSR DP.TRUE       GET 1 IN DAC
  3260        JSR DSUB          X-1
  3270        JSR MOVE.DAC.TEMP2 SAVE IT
  3280        JSR DP.TRUE       GET 1 IN DAC
  3290        JSR MOVE.TEMP1.ARG
  3300        JSR DADD          X+1
  3310        JSR MOVE.TEMP2.ARG
  3320        JSR DDIV          (X-1)/(X+1)
  3330 *---NUMERATOR = Z*P(Z^2)---------
  3340        JSR MOVE.DAC.TEMP1 SAVE IT
  3350        JSR MOVE.DAC.ARG
  3360        JSR DMULT         Z^2
  3370        JSR MOVE.DAC.TEMP2 SAVE Z^2
  3380        LDA #P.LOG
  3390        LDY /P.LOG
  3400        LDX #P.LOG.N
  3410        JSR POLY.N
  3420        JSR MOVE.TEMP1.ARG
  3430        JSR DMULT         Z*P(Z^2)
  3440        JSR MOVE.DAC.TEMP1
  3450 *---DENOMINATOR = Q(Z^2)---------
  3460        LDA #Q.LOG
  3470        LDY /Q.LOG
  3480        LDX #Q.LOG.N
  3490        JSR POLY.1
  3500        JSR MOVE.TEMP1.ARG
  3510        JSR DDIV          Z*P(Z^2)/Q(Z^2)
  3520 *---ADD INTEGER POWER------------
  3530        SEC
  3540        LDA INTPWR        GET POWER OF 10
  3550        SBC #$40
  3560        BEQ .5            ...0, NO NEED TO ADD ANYTHING
  3570        STA ARG.SIGN
  3580        BCS .1            ...1 TO 63
  3590        EOR #$FF          MAKE IT POSITIVE
  3600        ADC #1
  3610 .1     LDY #0
  3620        STY ARG.HI
  3630        LDX #$41
  3640        CMP #10
  3650        BCC .3            1...9
  3660        INX               10...63
  3670 .2     STA ARG.HI        STORE REMAINDER
  3680        SBC #10
  3690        INY               INC. QUOTIENT
  3700        BCS .2            ...TRY ANOTHER SUBTRACTION
  3710        DEY               CORRECT QUOTIENT
  3720        TYA               GET QUOTIENT
  3730 .3     ASL               LEFT JUSTIFY
  3740        ASL
  3750        ASL
  3760        ASL
  3770        ORA ARG.HI        MERGE WITH NEXT DIGIT
  3780        STA ARG.HI
  3790        STX ARG.EXPONENT  $41 OR $42
  3800        LDX #9            CLEAR REST OF ARG
  3810        LDA #0
  3820 .4     STA ARG.HI,X
  3830        DEX
  3840        BNE .4
  3850        JSR DADD
  3860 *---SUBTRACT 0.5-----------------
  3870 .5     LDA #CON.1HALF
  3880        LDY /CON.1HALF
  3890        JSR MOVE.YA.ARG.1
  3900        LDA #$FF
  3910        STA ARG.SIGN
  3920        JMP DADD
  3930 *--------------------------------

S-C Macro Assembler Version 2.0 -- Bill Morgan

We are now accepting orders for the upgrade to S-C Macro Assembler Version 2.0. Here is a summary of the new features:

The big news, of course, is the ability to assemble 65C02, 65802, and 65816 opcodes. The new .OP directive switches between the 6502, Sweet-16, 65C02, and 65816 opcode sets.
All screen output now passes through one driver routine, which will be much easier to modify for other displays. Drivers are included for 40-column, //e and //c 80-column, and STB-80.
Typing a Control-C at the command prompt (:) emits CATALOG, leaving the cursor at the end of the line, to add slot and drive specifiers if needed.
There is a sort of Auto-SAVE function. Once you have created a comment line near the beginning of your source file containing the phrase "SAVE filename", typing ESC-S will emit that phrase and position the cursor at the end, so you can add a suffix or just press RETURN.
The COPY command asks "DELETE ORIGINAL?" If you type "Y", the effect will be that of a MOVE command.
The tape LOAD and SAVE commands have been removed, to make room for new features.
All operand expressions are calculated to 32 bits and .DA data values may be larger, to allow for the 65816's extended addressing capabilities.
You can force Zero Page or Absolute addressing modes by prefixing the operand with < or >.
Operand expressions may include bitwise logical operations. &, ! (or |), and ^ are AND, OR, and EOR.
Control-S functions as a case lock key, toggling upper/lower case entry.
The .BS directive allows you to specify the value of the fill byte generated. This directive now creates fill bytes in assemblies into memory, rather than to disk only.
The assembler tests for the "/" command character, to simplify use of the Laumer Research Full Screen Editor.
All object code bytes are vectored through a standard location, so you can intercept the assembler's output for special purposes.

Registered owners of S-C Macro Assembler will be able to purchase the upgrade to Version 2.0 for only $20.00. Just send us a check or charge card number, and you will be among the first to have the new version.

Convert Two Decimal Digits to Binary -- Bob Sander-Cederlof

I have recently been running into more and more uses for the decimal mode in the 6502. In the decimal mode, each byte contains a value from 0 to 99, with the ten's digit in the left nybble and the units digit in the right nybble.

The 6502 has built-in capability to add and subtract values in this format, with automatic carry when a nybble exceeds 9. If you have been following my series on 18-digit arithmetic, you have seen a lot of examples of its use.

A frequent problem that arises is conversion between the decimal form and the binary form of a number. I suppose I have written ten million different programs to do this kind of conversion, on at least a thousand different kinds of computers! (Ever notice that my exaggerations are always in decimal?)

For a small (byte-size) example, suppose a byte contains two decimal digits ($00-$99) and you want to convert it to binary ($00-$63). The first step is to separate the two digits into two different variables. Then the ten's digit should be multiplied by ten, and the unit's digit added.

Lines 1390-1510 in the listing perform these steps, but there are a few tricks. Lines 1410-1420 strip out the unit's digit and save it in LOW, and lines 1440-1450 save the high digit in HIGH. Notice that I did not shift the high digit down, so it is really the ten's digit times 16 (call it "tens*16").

Lines 1460-1490 multiply the tens*16 by 10/16. Then line 1500 adds the unit's digit.
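The shift-and-add sequence is easy to follow in a Python sketch that mirrors DEC.HEX.2 one instruction at a time. The function name is just the label spelled in Python style; no carries are lost because tens*20 never exceeds 180.

```python
def dec_hex_2(bcd):
    """Convert a packed two-digit BCD byte ($00-$99) to binary (0-99)."""
    low = bcd & 0x0F       # AND #$0F : unit's digit
    high = bcd & 0xF0      # AND #$F0 : tens*16
    a = high >> 2          # LSR, LSR : tens*4
    a = a + high           # ADC HIGH : tens*4 + tens*16 = tens*20
    a = a >> 1             # LSR      : tens*10
    return a + low         # ADC LOW  : add unit's digit

print(dec_hex_2(0x58))     # 58
```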

The program in lines 1010-1190 is a test driver, which calls the DEC.HEX.2 routine 100 times with successive values in the A-register between $00 and $99. DEC.HEX.2 returns with the converted value ($00-$63) in the A-register, and the test driver prints out the value. If everything is okay, the hexadecimal numbers from $00 through $63 will be displayed.

DEC.HEX.2 as written takes 18 bytes plus two variables in page zero. If the variables are not in page zero, the program will take an additional four bytes.

A faster program which takes only a few more bytes, and does not use any variables in RAM other than the stack, is shown in lines 1200-1340. Lines 1220-1260 convert the ten's digit into an index 0-9 in the X-register. Line 1270 retrieves the original number from the stack. Lines 1280-1290 add a value from the table, indexed by the ten's digit, giving a total which is the converted number.

The values in the table consist of one byte each, having been selected so that they subtract out the hexadecimal value of the ten's digit and add back the value of that digit-times-ten in binary. For example, if the original number was $58 (meaning decimal 58 in BCD storage format), we will add the value $E2 (which is 50-$50). $58+$E2 = $3A, which is the correct hexadecimal conversion.
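Here is the table method as a Python sketch, with the table built by the same rule as line 1320-1340 of the listing (ten times the digit, minus sixteen times the digit, kept to one byte). The masking with $FF imitates the 8-bit A-register throwing away the carry.

```python
# One correction byte per ten's digit: (10*t - $10*t) modulo 256.
TBL = [(10 * t - 0x10 * t) & 0xFF for t in range(10)]

def dec_hex(bcd):
    """Convert a packed two-digit BCD byte to binary by table lookup."""
    index = bcd >> 4                    # four LSRs, then TAX
    return (bcd + TBL[index]) & 0xFF    # CLC, ADC TBL,X in 8 bits

print(hex(TBL[5]))                      # 0xe2
print(hex(dec_hex(0x58)))               # 0x3a
```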

I recently worked on a consulting project which included a lot of mixed decimal and hexadecimal calculations. The project was implemented on a 6511 chip, which has only 192 bytes of RAM. That is total, including the stack! We also had 4096 bytes of EPROM. The system operates in a real-time mode with relatively high-speed interrupts occurring. With these constraints, every routine had to be written to use the minimum amount of RAM and to be as fast as possible. A few extra bytes of code would be all right, because 4096 bytes of EPROM was more than enough. In situations like this, programs like the one in lines 1200-1300 come in real handy.

  1000 *SAVE S.QUICK DEC-HEX
  1010 *--------------------------------
  1020 T      LDA #0
  1030        STA 0
  1040 .1     LDA 0
  1050        JSR DEC.HEX.2
  1060        JSR $FDDA
  1070        LDA #" "
  1080        JSR $FDED
  1090        JSR $FDED
  1100        SED
  1110        CLC
  1120        LDA 0
  1130        ADC #1
  1140        STA 0
  1150        CLD
  1160        CMP #0
  1170        BNE .1
  1180        RTS
  1190 *--------------------------------
  1200 DEC.HEX
  1210        PHA          SAVE BYTE
  1220        LSR
  1230        LSR
  1240        LSR
  1250        LSR
  1260        TAX          HI NYBBLE TO X
  1270        PLA          GET ORIG BYTE
  1280        CLC
  1290        ADC TBL,X
  1300        RTS
  1310 *--------------------------------
  1320 TBL    .DA #0-0,#10-$10,#20-$20,#30-$30
  1330        .DA #40-$40,#50-$50,#60-$60
  1340        .DA #70-$70,#80-$80,#90-$90
  1350 *--------------------------------
  1360 LOW    .EQ 1
  1370 HIGH   .EQ 2
  1380 *--------------------------------
  1390 DEC.HEX.2
  1400        PHA
  1410        AND #$0F     SAVE LOW NYBBLE
  1420        STA LOW
  1430        PLA
  1440        AND #$F0     GET HIGH NYBBLE
  1450        STA HIGH
  1460        LSR          /2
  1470        LSR          /4
  1480        ADC HIGH     /4*5
  1490        LSR          /8*5 = *10/16
  1500        ADC LOW      + LOW NYBBLE
  1510        RTS
  1520 *--------------------------------

A Whole Megabyte for your Apple //e -- Bob Sander-Cederlof

Both Applied Engineering and Saturn have announced 1 Mbyte cards for the //e. Saturn's, I understand, plugs into any slot 1-7; this of course makes it a little non-standard compared to other //e memory expanders when it comes to software access.

The new board from Applied Engineering, called RAM WORKS, fits in the //e auxiliary slot. You get 80 column text and double hi-res, with anywhere from 64K to 1 Megabyte of expansion RAM in 64K or 256K increments. You can buy RAM WORKS already expanded, or expand it yourself later. Prices: 64K = $179, 128K = $249, 256K = $449, 512K = $799, and 1Meg = $1499. The first 512K fits on a normal size card, about 6 inches long. The second 512K comes on a piggy-back card which attaches to the main card. Another option, an RGB video generator ($129), attaches to the front of the memory card.

The megabyte is divided into 16 chapters of 64K each. You select which one is active by storing a value from $00 to $0F in a register at $C073. Then the normal //e maze of soft switches lets you access the active chapter the same way you would access Apple's standard 64K card.

RAM WORKS has some new design ideas, for which patents are pending, including a power saving circuit and a video refresh circuit. The latter eliminates the annoying screen flicker that normally occurs when you switch chapters with older expansion cards.

Low cost software options available with RAM WORKS include disk emulation for DOS and ProDOS, and workspace expansion for AppleWorks. Standard ProDOS will turn Apple's RAM card into a half-size RAMdisk, but with RAM WORKS you get a full megabyte!

If you like the idea of souping up your //e, one of these boards plus a new 65802 processor may be just the ticket!

65816 News    Bill Morgan

Did you see the Infoworld article a few weeks ago (November 5 issue) about the 65816? That story mentioned a plug-in board for the Apple II containing a 65816 processor and extra RAM. Well, I spoke today with Larry Hittel of Com Log, producers of that board, and it does sound very interesting.

Com Log intended their board, the Apple16, to be a developers' tool, rather than a consumer item, or an Apple hot-rod device. They were therefore a little surprised and overwhelmed by the response to the Infoworld story: When I talked to Larry they had exactly one board in stock, and it was waiting for purchase order paperwork from Apple Computer. They are a month or two away from full production quantities.

The Apple16 board uses DMA (Direct Memory Access) to take control of the Apple, shutting down the 6502 and taking over the address bus. They have found that the DMA does not function properly in Apples earlier than Revision 4, due to problems with the bus driver chips on the motherboard.

The 65816 chips are designed to operate at 8 MHz and are currently testing out at 2-4 MHz, but, in order to maintain compatibility with the Apple, the Com Log processor is clocked at 1 MHz.

To the '816, the 64K of Apple memory, both RAM and ROM, is bank 0. Bank 1 echoes the Apple from 0-DFFF, but contains space for new EPROM at E000-FFFF. Banks 2 and 3 are reserved for more new EPROM. Banks 4-7 are the on-board RAM, consisting of one set of either 64K or 256K chips. Banks 8-255 are available on an expansion connector, intended for a future separate memory board. There is abort logic to provide an interrupt on access to non-existent memory.

Com Log is selling the boards now with no EPROMs. They are working on an operating system and an Applesoft interpreter, but those are still some time away. No price has been set for the firmware yet.

The current price of the Apple16 board is $395 with no RAM, $450 with 64K, and $795 with 256K. They are not expecting to have them available in production quantities until January or later, by which time the prices might change. Contact Com Log Corporation at 11056 N. 23rd Dr., Suite 104, Phoenix, AZ 85029. Phone (602) 248-0769.

That Infoworld story quoted an Apple spokesman as saying that the 65816 was to be used in an earlier project that had been shelved. That project is being dusted off and revived, now that the 65816 chips are really coming through. We've been hearing of it as the Apple //x. According to an article in the November 19 issue of Infoworld about an interview with Woz, the //x is still not a fixed design and will not be ready for market until 1986. There's always something new to look forward to!

New DP18 Square Root Subroutine    Bob Sander-Cederlof

Even after bending over backwards to be certain I had the best possible SQR implementation in the October AAL, I still found some ways to improve it. Last night I found some more information in a book called "Software Manual for the Elementary Functions", by William Cody and William Waite, Prentice-Hall, 1980.

They pointed out that in general an extra Newton iteration took less time than a complex method of getting an initial approximation which would be accurate enough to avoid one iteration. In other words, using a cubic polynomial like I did in October is just not worth it. Not worth the time, and not worth the space.

They further pointed out that it is best to compute the last Newton iteration in a slightly different fashion, to avoid shifting out the last significant digit. The normal iteration computes (x/y + y)*.5. Re-arrangement to y+(x/y-y)*.5 is better. Since it takes an extra step, it should only be used the last time.

To see the difference, consider the example below. I have used a precision of just 3 digits (instead of 18 or 20) to simplify the illustration:

     let x=.253, and y=.5
     then x/y=.506

     x/y+y=1.00 (truncating to 3 places)
     (x/y+y)*.5 = .500, which is wrong

     x/y-y=.006
     (x/y-y)*.5=.003
     y+(x/y-y)*.5 = .503, which is correct.

My new SQR version uses a much faster method for getting the first approximation. The first two digits of the argument (in DAC.HI) must be in the range from 10 to 99. I convert them to an index between $02 and $13 by shifting the first digit over three, and adding one if the second digit is 5 or more. In other words, 10-14 become $02, 15-19 become $03, on up to 95-99 becoming $13. Then I use that value as an index into a table which gives a good approximation to the first two digits of the square root. For example, any number between .10 and .14999...9 will get a first approximation of .35, and any number between .15 and .19999...9 will get .42. I store those two digits into DAC.HI, letting the remaining digits stay as they were. This method gives a first approximation which in the worst case still has at least the first digit correct.
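
To make the index arithmetic concrete, here is the same sequence of instructions traced by hand for one argument, .47..., so that DAC.HI holds $47. The register values in the comments are worked out on paper and are not part of the original listing.

```asm
*   TRACE FOR DAC.HI = $47
       LDA DAC.HI   A = $47
       AND #$0F     A = $07
       CMP #5       7 >= 5, SO CARRY IS SET
       PHP          SAVE CARRY
       LDA DAC.HI   A = $47
       LSR
       LSR
       LSR
       LSR          A = $04
       PLP          CARRY SET AGAIN
       ROL          A = $09  (4*2+1)
       TAX
       LDA SQR.TBL,X   8TH TABLE ENTRY, $69
```

The table entry $69 replaces the first two digits, so the iteration starts near .69 against a true root of about .686.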

It turns out the worst case is for numbers with odd exponents and the mantissa=1, such as 1 (which is .1*10^1), 100 (which is .1*10^3), and so on. Even in this worst case, four iterations give 20 digits of precision.

The end result of these changes is a faster and shorter program which is more accurate. Here is the new listing:

  1000 *SAVE S.NEW SQR ROUTINE
  1010 *--------------------------------
  1020 *      SQR (DAC)
  1030 *--------------------------------
  1040 ERR.SQ JMP AS.ILLERR  ILLEGAL QUANTITY
  1050 DP.SQR.0 RTS
  1060 DP.SQR LDA DAC.EXPONENT
  1070        BEQ DP.SQR.0  SQR(0)=0
  1080        LDA DAC.SIGN
  1090        BMI ERR.SQ   MUST BE POSITIVE
  1100        JSR MOVE.DAC.TEMP3 SAVE X
  1110 *---APPROX. ROOT OF .1 - 1-------
  1120        LDA DAC.HI   CONVERT TWO DIGITS TO BINARY
  1130        AND #$0F     SAVE LO DIGIT
  1140        CMP #5       01234 OR 56789
  1150        PHP          SAVE ANSWER
  1160        LDA DAC.HI   GET HI DIGIT
  1170        LSR
  1180        LSR
  1190        LSR
  1200        LSR          $01...$09
  1210        PLP          01234 OR 56789
  1220        ROL          $02...$13
  1230        TAX
  1240        LDA SQR.TBL,X
  1250        STA DAC.HI
  1260 *---TAKE HALF OF EXPONENT--------
  1270        LDA DAC.EXPONENT
  1280        SEC
  1290        SBC #$40     REMOVE OFFSET
  1300        ROR          DIVIDE BY TWO (KEEP SIGN)
  1310        PHP          SAVE ODD/EVEN BIT
  1320        CLC
  1330        ADC #$C0     RE-BIAS EXPONENT
  1340        STA DAC.EXPONENT
  1350        PLP
  1360        BCC .1       EVEN, DON'T MULT BY SQR(10)
  1370 *---ADJUST APPROX FOR ODD EXP----
  1380        LDA #CON.SQR10
  1390        LDY /CON.SQR10
  1400        JSR MOVE.YA.ARG.1
  1410        JSR DMULT
  1420 *---THREE NEWTON ITERATIONS------
  1430 .1     LDA #3
  1440        STA TEMP3
  1450 .2     JSR MOVE.DAC.TEMP2     TEMP2 = Y
  1460        JSR MOVE.TEMP3.ARG     GET X
  1470        JSR DDIV               X/Y
  1480        JSR MOVE.TEMP2.ARG
  1490        JSR DADD               X/Y+Y
  1500        LDA #CON.HALF
  1510        LDY /CON.HALF
  1520        JSR MOVE.YA.ARG.1
  1530        JSR DMULT              (X/Y+Y)/2
  1540        DEC TEMP3              ANY MORE?
  1550        BNE .2                 ...YES
  1560 *---ONE MORE NEWTON ITERATION----
  1570        JSR MOVE.DAC.TEMP2     TEMP2 = Y
  1580        JSR MOVE.TEMP3.ARG     GET X
  1590        JSR DDIV               X/Y
  1600        JSR MOVE.TEMP2.ARG
  1610        LDA #$FF
  1620        STA ARG.SIGN
  1630        JSR DADD               X/Y-Y
  1640        LDA #CON.HALF
  1650        LDY /CON.HALF
  1660        JSR MOVE.YA.ARG.1
  1670        JSR DMULT              (X/Y-Y)/2
  1680        JSR MOVE.TEMP2.ARG
  1690        JMP DADD               Y + (X/Y-Y)/2
  1700 *--------------------------------
  1710 SQR.TBL    .EQ *-2  (NO ENTRIES AT 0...1)
  1720            .HS 35.42.47.52.57.61.65.69.72
  1730            .HS 76.79.82.85.88.91.94.96.99
  1740 CON.SQR10  .HS 4131622776601683793320
  1750 CON.HALF   .HS 4050000000000000000000
  1760 *--------------------------------

Improvements to 80-Column Monitor Dump    Jan Eugenides

I found a little bug in the 80-column ASCII monitor dump, as presented in the September 1983 AAL (pages 27-28). It worked great in 80-column mode, but if I happened to be in 40-column mode when I used the monitor dump command, something strange happened.

Some time ago I incorporated the dump and Steve Knouse's monitor patches into an EPROM and installed it in my system. Everything seemed to be working fine, until one day.... I was working on a short Applesoft program, and I went into the monitor in 40-column mode to check a few program bytes. When I returned to Applesoft and listed the program, the first line had been changed. Huh?

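I eventually figured out that the problem had to do with the tab to column 60. In 40-column mode, with the cursor on the bottom line, column 60 falls 20 characters beyond the end of the screen, at $80C. That address is inside the Applesoft program, which normally starts at $801.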

The solution was to just print a few spaces rather than attempting to tab. This approach makes for more compatibility among various 80-column devices, too.

While I was at it, I even squeezed a byte out of the code.

[And I squeezed some more, saving a total of 11 bytes. Bob S-C]

Here is the modified routine:

  1000 *SAVE S.NEW 80 COL MONITOR DUMP
  1010 *--------------------------------
  1020 *   TO INSTALL,
  1030 *      1.  ASSEMBLE THIS PROGRAM
  1040 *      2.  ENTER THESE MONITOR COMMANDS
  1050 *    $C083 C083 FCC9<CC9.CFFM
  1060 *    $FDBE:C9 FC N FDA6:F N FDB0:F
  1070 *--------------------------------
  1080 *   BY JAN EUGENIDES & BOB S-C
  1090 *--------------------------------
  1100 CH     .EQ $24
  1110 A1     .EQ $3C,3D
  1120 A2     .EQ $3E,3F
  1130 A4     .EQ $42,43
  1140 BUFFER .EQ $2F0
  1150 PRBYTE .EQ $FDDA
  1160 COUT   .EQ $FDED
  1170 PRBLNK .EQ $F948
  1180 *--------------------------------
  1190        .OR $FCC9
  1200        .TA $CC9
  1210 *--------------------------------
  1220 PATCH  PHA          SAVE BYTE
  1230        LDA A1       COMPUTE INDEX
  1240        AND #$0F     0...F
  1250        TAX
  1260        PLA          GET BYTE AGAIN
  1270        STA BUFFER,X SAVE IN BUFFER
  1280        JSR PRBYTE   PRINT ON SCREEN
  1290        INX          GET # BYTES THIS LINE
  1300        STX A4       SAVE IN A4L
  1310        CPX #$10     END OF LINE?
  1320        BEQ .1       ...YES, PRINT ASCII CHARS
  1330        LDA A1       ...NO, SEE IF END OF RANGE
  1340        CMP A2
  1350        LDA A1+1
  1360        SBC A2+1
  1370        BCC .4       ...NO, RETURN
  1380 .1     JSR PRBLNK   PRINT 3 SPACES
  1390        LDX #0       PRINT ASCII CHARS FROM BUFFER
  1400 .2     LDA BUFFER,X GET CHAR
  1410        ORA #$80     MAKE NORMAL VIDEO
  1420        CMP #$A0     TRAP CONTROL CHARS
  1430        BCS .3       ...NOT CONTROL CHAR
  1440        LDA #$AE     ...CTRL, SUBSTITUTE "."
  1450 .3     JSR COUT     PRINT CHAR
  1460        INX          NEXT
  1470        CPX A4       END OF LIST?
  1480        BCC .2       ...NOT YET
  1490 .4     RTS          RETURN

Note the directions for installing the routine in a RAM card copy of the monitor, in lines 1020-1060. "$C083 C083 FCC9<CC9.CFFM" write-enables the RAM area and copies the dump code over the top of cassette I/O stuff. "$FDBE:C9 FC N FDA6:F N FDB0:F" patches the monitor dump command code to call the new patch, and also patches it to print 16 bytes per screen line.

If you want to use this routine in 40-column mode only, change line 1240 from "AND #$0F" to "AND #$07", line 1310 from "CPX #$10" to "CPX #$08", and leave out the patches at FDA6 and FDB0 in the previous paragraph.

Generating Cross Reference Text File with DISASM    Bob Kovacs

I received a phone call from Don Lancaster the other day. He had been using DISASM to probe the mysteries of AppleWriter, and was now preparing to document his findings. Although he liked the way DISASM generated a triple cross reference table, he preferred to have it in a form that could be used by his word processor (that is, on a text file). The cross reference table generated by DISASM is normally output to either the screen or a printer, so Don's only alternative was to manually type it into his word processor. There were hundreds of labels....

It turned out that a simple patch to DISASM will do the trick. All that is necessary is to change the JSR PASS2 which normally generates the source code listing to JSR XREF.

With the following patch installed, DISASM writes the cross reference table to your file when you respond "Y" to the prompt "GENERATE TEXT FILE?":

       $09A1:20 F1 0A
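
Decoded, those three bytes are a single instruction. The disassembly below is only a sketch: the article does not give the addresses of PASS2 or XREF, so reading $0AF1 as the entry to XREF is an inference from the patch bytes.

```asm
*   AT $09A1:
       JSR $0AF1    20 F1 0A  (REPLACES JSR PASS2)
```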

Back in the April issue of AAL, I described a method of using EXEC files with DISASM. A patch was required to the "YES/NO" routine to input the response via KEYIN rather than directly from the keyboard. Although the patch I gave in April works, KEYIN uses the Y-register as an index to the screen. My patch did not always wind up in the right place. So I have expanded the patch as follows:

       $0C57:EA A4 24 20 18 FD 09 80
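
Byte by byte, the expanded patch disassembles as follows. The opcodes are standard 6502. The remarks about CH and the monitor entry at $FD18 are one reading of the standard Apple II monitor and should be checked against your own ROM listing.

```asm
*   AT $0C57:
       NOP          EA
       LDY $24      A4 24     Y = CH (CURSOR COLUMN)
       JSR $FD18    20 18 FD  MONITOR INPUT, JUST AHEAD OF KEYIN ($FD1B)
       ORA #$80     09 80     FORCE HIGH BIT ON
```

Loading Y from CH before the call is what gives the input routine a valid index into the current screen line.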

I hope that this has not caused any inconvenience.

Macro Information by Example    Sandy Greenfarb

The following are three examples of macro use which I have found interesting and informative.

The first example, TEST, shows that you can use parameters in places other than the operand field. In this case, one of the parameters becomes part of an opcode name.

SETD shows how a macro can make more efficient code. If both bytes are the same, there is no need to have two LDA instructions.

MOVD copies two bytes from one variable to another. If you use MOVD to move two bytes one byte higher in RAM, MOVD will reverse the order in which the bytes are moved so that the data are not clobbered.

  1000 *SAVE S.MACRO EXAMPLES
  1010 *--------------------------------
  1020 *    BY SANDY GREENFARB
  1030 *--------------------------------
  1040 *
  1050 *    PARAMETERS CAN SUBSTITUTE ANYWHERE,
  1060 *      EVEN IN OPCODES
  1070 *--------------------------------
  1080        .MA TEST     VALUE,CONDITION,LABEL
  1090        CMP ]1
  1100        B]2 ]3
  1110        .EM
  1120 *
  1130        >TEST #3,CC,SMALLER
  1140        >TEST TYPE,EQ,SAME
  1150 *
  1160 TYPE   .DA #35
  1170 SAME   NOP
  1180 SMALLER NOP
  1190 *--------------------------------
  1200 *
  1210 *   MACROS CAN SIMPLIFY CODE FOR EFFICIENCY
  1220 *--------------------------------
  1230        .MA SETD     VALUE,VARIABLE
  1240        LDA #]1      LO-BYTE
  1250        STA ]2
  1260     .DO ]1/256*257-]1  ARE LOW AND HI EQUAL?
  1270        LDA /]1
  1280     .ELSE
  1290 *                   HI = LO-BYTE
  1300     .FIN
  1310        STA ]2+1
  1320        .EM
  1330 *
  1340        >SETD $1234,VALUE
  1350        >SETD $2323,VALUE
  1360 *
  1370 VALUE  .BS 2
  1380 *--------------------------------
  1400 *
  1410 *   MACROS CAN PREVENT PROGRAMMING MISTAKES
  1420 *      SUCH AS OVER-WRITING WHEN YOU COPY
  1430 *      ONE VARIABLE INTO ANOTHER.
  1440 *--------------------------------
  1450        .MA MOVD     VAR1,VAR2
  1460     .DO ]2-]1-1
  1470        LDA ]1       NO OVERLAP
  1480        STA ]2
  1490        LDA ]1+1
  1500        STA ]2+1
  1510     .ELSE
  1520        LDA ]1+1     THIS CODE BUILT WHEN THE
  1530        STA ]2+1     VARIABLES OVERLAP
  1540        LDA ]1
  1550        STA ]2
  1560     .FIN
  1570        .EM
  1580 *
  1590        >MOVD $11,$22
  1600        >MOVD $28,VALUE
  1610        >MOVD $11,$12
  1620 *--------------------------------
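
For reference, here is roughly what three of the calls above should expand into, worked out by hand from the macro definitions rather than copied from an assembler listing. The SETD test assumes that S-C evaluates expressions strictly left to right, so ]1/256*257-]1 means ((]1/256)*257)-]1.

```asm
*   >TEST #3,CC,SMALLER
       CMP #3
       BCC SMALLER
*   >SETD $2323,VALUE    ($23*257-$2323 = 0, SO .ELSE SIDE)
       LDA #$2323   LO-BYTE = $23
       STA VALUE
       STA VALUE+1  A-REG STILL HOLDS $23
*   >MOVD $11,$12        ($12-$11-1 = 0, SO .ELSE SIDE)
       LDA $11+1    HIGH BYTE FIRST
       STA $12+1
       LDA $11
       STA $12
```

In the MOVD case the old contents of $12 reach $13 before $12 is overwritten, which is the whole point of reversing the order.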

Turning Bit-Masks into Indices    Bob Sander-Cederlof

A few months ago I presented several ways to turn an index (0-7) into a bit mask (01, 02, 04,...,80). We got a lot of feedback, including some faster and better programs. Bruce Love suggested the possibility of the reverse transformation.

According to Bruce, who is a high school teacher in New Zealand, the method which uses the fewest bytes is the one I show in lines 1390-1450. In order to be fair in comparing different algorithms, I am going to count the RTS opcodes both for bytes and for cycles. With this in mind, Bruce's routine takes 8 bytes and from 16 to 65 cycles. This is certainly the smallest way, and it really is pretty fast.
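
Those figures can be checked by annotating the routine with the standard 6502 byte and cycle counts. The tally below assumes the loop does not straddle a page boundary, which would add a cycle to each taken branch.

```asm
SMALLEST.WAY
       LDX #8       2 BYTES, 2 CYCLES
.1     DEX          1 BYTE,  2 CYCLES
       ASL          1 BYTE,  2 CYCLES
       BCC .1       2 BYTES, 3 CYCLES TAKEN, 2 NOT
       TXA          1 BYTE,  2 CYCLES
       RTS          1 BYTE,  6 CYCLES
*   N PASSES THROUGH THE LOOP COST 7*N+9 CYCLES
*      MASK $80:  N=1, 16 CYCLES, RESULT 7
*      MASK $01:  N=8, 65 CYCLES, RESULT 0
```

Averaged over the eight possible masks, N is 4.5, which gives 7*4.5+9 = 40.5 cycles.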

Bruce mentioned that he had written several other programs to solve the same problem: one used the X-register, took 26 bytes with an average of 33.5 cycles; another without using X or Y took 28 bytes and an average of 39 cycles. Unfortunately, he did not include a copy of either of these.

I worked out four more methods, shown in the listing after Bruce's. I wrote a test driver which is in lines 1000-1310. The test driver calls each routine, printing the results of each, for all possible values of the bit-mask.

The following table summarizes the data for the five algorithms:

                                     # of cycles
                             bytes  min  max  ave
     SMALLEST.WAY               8    16   65  40.5
     WAY.WITH.X                26    25   42  33.5
     WAY.WITHOUT.X             23    14   30  22
     ANOTHER.WAY.WITHOUT.X     32    14   24  18.375
     STRAIGHT.TESTING.WAY      33    14   27  18.5

If the SMALLEST.WAY is not fast enough, I would probably go with the one named WAY.WITHOUT.X. It is almost as fast as the fastest, and is the shortest of the longer routines. Of course, some of you may come up with better and faster ones....

  1000 *SAVE S.MASK --> INDEX
  1010 *--------------------------------
  1020 TEST   LDY #$01
  1030 .1     TYA
  1040        JSR $FDDA
  1050        TYA
  1060        JSR SMALLEST.WAY
  1070        JSR HEX
  1080        TYA
  1090        JSR WAY.WITH.X
  1100        JSR HEX
  1110        TYA
  1120        JSR WAY.WITHOUT.X
  1130        JSR HEX
  1140        TYA
  1150        JSR ANOTHER.WAY.WITHOUT.X
  1160        JSR HEX
  1170        TYA
  1180        JSR STRAIGHT.TESTING.WAY
  1190        JSR HEX
  1200        JSR $FD8E
  1210        TYA
  1220        ASL
  1230        TAY
  1240        BCC .1
  1250        RTS
  1260 *--------------------------------
  1270 HEX    PHA 
  1280        LDA #"-"
  1290        JSR $FDED
  1300        PLA
  1310        JMP $FDDA
  1320 *--------------------------------
  1330 *   WAY WITH FEWEST BYTES
  1340 *      8 BYTES
  1350 *      MIN:  16 CYCLES
  1360 *      MAX:  65 CYCLES
  1370 *      AVE:  40.5 CYCLES
  1380 *--------------------------------
  1390 SMALLEST.WAY
  1400        LDX #8
  1410 .1     DEX
  1420        ASL
  1430        BCC .1
  1440        TXA
  1450        RTS
  1460 *--------------------------------
  1470 *   FASTER WAY USING X-REGISTER
  1480 *      26 BYTES
  1490 *      MIN: 25 CYCLES
  1500 *      MAX: 42 CYCLES
  1510 *      AVE: 33.5 CYCLES
  1520 *--------------------------------
  1530 WAY.WITH.X
  1540        LDX #0       KEEP INDEX IN X
  1550        CMP #$10     80-40-20-10 / 08-04-02-01
  1560        BCC .1       ...8,4,2,1
  1570        LSR          ...80,40,20,10
  1580        LSR          SHIFT OVER TO 8,4,2,1
  1590        LSR
  1600        LSR
  1610        LDX #4       AND BUMP INDEX BY 4
  1620 .1     CMP #$04     08-04 / 02-01
  1630        BCC .2       ...2,1
  1640        LSR          ...8,4
  1650        LSR          SHIFT OVER TO 2,1
  1660        INX          AND BUMP INDEX BY 2
  1670        INX
  1680 .2     LSR          02 / 01
  1690        BEQ .3       ...01
  1700        INX          ...02, BUMP INDEX
  1710 .3     TXA          GET RESULT
  1720        RTS
  1730 *--------------------------------
  1740 *   WAY WITHOUT USING X-REGISTER
  1750 *      23 BYTES
  1760 *      MIN: 14 CYCLES
  1770 *      MAX: 30 CYCLES
  1780 *      AVE: 22 CYCLES
  1790 *--------------------------------
  1800 WAY.WITHOUT.X
  1810        LSR          40-20-10-08-04-02-01-00
  1820        CMP #$04
  1830        BCC .2       ...2,1,0
  1840        BEQ .3       ...4, SHOULD BE 3
  1850        LSR          20-10-08-04
  1860        LSR          10-08-04-02
  1870        LSR          08-04-02-01
  1880        LSR          04-02-01-00
  1890        CMP #4
  1900        BCC .1       2,1,0 INTO 6,5,4
  1910        LDA #2       4 INTO 7
  1920 .1     ADC #4
  1930 .2     RTS
  1940 .3     SBC #1       4 INTO 3
  1950        RTS
  1960 *--------------------------------
  1970 *   ANOTHER WAY WITHOUT X-REGISTER
  1980 *      32 BYTES
  1990 *      MIN: 14 CYCLES
  2000 *      MAX: 24 CYCLES
  2010 *      AVE: 18.375 CYCLES
  2020 *--------------------------------
  2030 ANOTHER.WAY.WITHOUT.X
  2040        CMP #$08     80-40-20-10-08-04-02-01
  2050        BCC .5       ...4,2,1
  2060        BEQ .4       ...8, SHOULD BE 3
  2070        CMP #$40
  2080        BCC .2       ...20,10
  2090        BEQ .1       ...40
  2100        LDA #7
  2110        RTS
  2120 .1     LDA #6
  2130        RTS
  2140 .2     CMP #$20
  2150        BEQ .3
  2160        LDA #4
  2170        RTS
  2180 .3     LDA #5
  2190        RTS
  2200 .4     SBC #2
  2210 .5     LSR
  2220        RTS
  2230 *--------------------------------
  2240 *   STRAIGHTFORWARD TESTING APPROACH
  2250 *      33 BYTES
  2260 *      MIN:  14 CYCLES
  2270 *      MAX:  27 CYCLES
  2280 *      AVE:  18.5 CYCLES
  2290 *--------------------------------
  2300 STRAIGHT.TESTING.WAY
  2310        CMP #$08
  2320        BCC .5
  2330        BEQ .4
  2340        CMP #$20
  2350        BCC .3
  2360        BEQ .2
  2370        CMP #$80
  2380        BCC .1
  2390        LDA #7
  2400        RTS
  2410 .1     LDA #6
  2420        RTS
  2430 .2     LDA #5
  2440        RTS
  2450 .3     LDA #4
  2460        RTS
  2470 .4     LDA #3
  2480        RTS
  2490 .5     LSR          CONVERT 4,2,1 TO 2,1,0
  2500        RTS
  2510 *--------------------------------

Apple Assembly Line is published monthly by S-C SOFTWARE CORPORATION, P.O. Box 280300, Dallas, Texas 75228. Phone (214) 324-2050. Subscription rate is $18 per year in the USA, sent Bulk Mail; add $3 for First Class postage in USA, Canada, and Mexico; add $12 postage for other countries. Back issues are available for $1.80 each (other countries add $1 per back issue for postage).

All material herein is copyrighted by S-C SOFTWARE, all rights reserved. Unless otherwise indicated, all material herein is authored by Bob Sander-Cederlof. (Apple is a registered trademark of Apple Computer, Inc.)