/[dtapublic]/pubs/papers/20070526_cpu08_divu16_by_u16/cpu08div16by16a.tex
ViewVC logotype

Contents of /pubs/papers/20070526_cpu08_divu16_by_u16/cpu08div16by16a.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 10 - (show annotations) (download) (as text)
Fri Oct 7 03:23:35 2016 UTC (7 years, 11 months ago) by dashley
File MIME type: application/x-tex
File size: 22866 byte(s)
Commit of u16 = u16 / u16 analysis for the CPU08.
1 %$Header: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/dtaipubs/cron/2007/cpu08div16by16a/cpu08div16by16a.tex,v 1.12 2007/05/26 18:25:46 dashley Exp $
2 %
3 \documentclass[letterpaper,10pt,titlepage]{article}
4 %
5 %\pagestyle{headings}
6 %
7 \usepackage{amsmath}
8 \usepackage{amsfonts}
9 \usepackage{amssymb}
10 \usepackage[ansinew]{inputenc}
11 \usepackage[OT1]{fontenc}
12 \usepackage{graphicx}
13 %\usepackage{makeidx}
14 %
15 %Contributed by Robin Fairbairns of the comp.text.tex newsgroup. I have no
16 %idea how it works ... but it gives a working \bdiv.
17 %
18 \makeatletter
19 \def\bdiv{%
20 \nonscript\mskip-\medmuskip\mkern5mu%
21 \mathbin{\operator@font div}\penalty900\mkern5mu%
22 \nonscript\mskip-\medmuskip}
23 \makeatother
24 %
25 %
26 \begin{document}
27 \title{Notes on Economical 16/16 = 16 Unsigned Integer Division on the Freescale CPU08}
28 \author{\vspace{1cm}\\David T. Ashley\\\texttt{dta@e3ft.com}\\\vspace{1cm}}
29 \date{\vspace*{8mm}\small{Version Control $ $Revision: 1.12 $ $ \\
30 Version Control $ $Date: 2007/05/26 18:25:46 $ $ (UTC) \\
31 $ $RCSfile: cpu08div16by16a.tex,v $ $ \\
32 \LaTeX{} Compilation Date: \today{}}}
33 \maketitle
34
35 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
36 %
37 \begin{abstract}
38 This document contains my notes on attempting to optimize
39 the 16 = 16/16 unsigned integer division subroutine distributed with
40 a Freescale CPU08 compiler. I attempted to replace the standard
41 shift-compare-subtract algorithm with Knuth's algorithm.
42
43 In the end, the attempt was mostly a failure. There seemed to be no way
44 to gain the execution time advantages of Knuth's algorithm without increasing
45 FLASH size. (The goal would have been to obtain more favorable
46 execution time \emph{and} FLASH size, rather than a tradeoff.)
47 \end{abstract}
48
49 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
50
51 \section{Introduction and Overview}
52 \label{siov0}
53
54 The fundamental goal of practical computer integer arithmetic is to obtain
55 an exact result using machine instructions that are as fast and compact as
56 possible.
57
58 For three of the four fundamental operations (addition, subtraction, and
59 multiplication), it is intuitively obvious to most programmers how to
60 use existing machine instructions to operate on operands that are larger
61 than the machine instructions can accommodate.
62
63 The fourth fundamental operation (division), however, is less well understood
64 by a typical programmer than the other three. It is not obvious to most
65 programmers how to use machine division instructions to divide integers larger
66 than the native machine division instructions will accommodate.
67
68 In 2007, I noticed that the integer division library functions associated with
69 a particular \emph{C} compiler were of the shift-compare-subtract variety.
70 As the CPU08 has a 16/16=8 native division instruction, I suspected but was not
71 sure that the 16/16=16 division function could be improved.
72
73 Note that for a library function used by a \emph{C} compiler, it may not
74 be necessary (although it may be desirable) to calculate the quotient and the
75 remainder at the same time. An algorithm that calculates the quotient but
76 does not calculate the remainder, or vice-versa, may be acceptable for
77 use by compiled code.
78
79 Although I haven't examined the \emph{C} standards, I doubt that there is
80 any special requirement for the quotient or remainder calculated when
81 the denominator is zero. The only requirement is probably that the calculation
82 not continue indefinitely (i.e. not an infinite loop).
83
84 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
85
86 \section{Nomenclature and Notation}
87 \label{snom0}
88
89 I use the nomenclature ``16/16=16 division'' to denote a division function that
90 accepts a 16-bit unsigned numerator and a 16-bit unsigned denominator; and
91 produces either/both a 16-bit quotient and a 16-bit remainder.
92
93 I use $n$ to denote the numerator, $d$ to denote the denominator, $q$ to denote
94 the integer quotient, and $r$ to denote the remainder.
95
96 Any of the 16-bit quantities can be subscripted with ``H'' or ``L'' to denote
97 the most-significant or least-significant byte. For example,
98
99 \begin{equation}
100 \label{eq:snom0:01}
101 n = 2^8 n_H + n_L .
102 \end{equation}
103
104 Within any of the 16-bit quantities, bits are subscripted in the traditional fashion.
105 For example,
106
107 \begin{equation}
108 \label{eq:snom0:02}
109 n = \sum_{i=0}^{15} 2^i n_i .
110 \end{equation}
111
112 Within any of the 8-bit quantities, bits are also subscripted in the traditional fashion.
113 For example,
114
115 \begin{equation}
116 \label{eq:snom0:03}
117 n = 2^8 \sum_{i=0}^{7} 2^i n_{H_i} + \sum_{i=0}^{7} 2^i n_{L_i}
118 = \sum_{i=0}^{7} 2^{i+8} n_{H_i} + \sum_{i=0}^{7} 2^i n_{L_i} .
119 \end{equation}
120
121 Because only non-negative integers are involved, \emph{floor()} and
122 \emph{div} are used interchangeably, i.e.
123
124 \begin{equation}
125 \label{eq:snom0:04}
126 \left\lfloor \frac{a}{b} \right\rfloor = a \bdiv b, \;\; a,b \geq 0 .
127 \end{equation}
128
129 An identity that is frequently used in this document is
130
131 \begin{eqnarray}
132 \label{eq:snom0:05}
133 \frac{a}{b} & = & a \bdiv b + \frac{a \bmod b}{b} \\
134 \label{eq:snom0:06}
135 & = & \left\lfloor \frac{a}{b} \right\rfloor + \frac{a \bmod b}{b} .
136 \end{eqnarray}
137
138
139 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140
141 \section{Analysis of Existing Code}
142 \label{saec0}
143
144 This section presents an analysis of the behavior and characteristics of
145 the existing code.
146
147
148 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
149
150 \subsection{Listing}
151 \label{saec0:slec0}
152
153 The existing code is reproduced below, with line numbers.
154
155 \begin{verbatim}
156 001: c_umod:
157 002: bsr c_udiv ; divide
158 003: pshh ; transfer MSB
159 004: pula ; in place
160 005: sta c_reg
161 006: txa ; LSB in place
162 007: rts ; and return
163 008: ;
164 009: c_udiv:
165 010: psha ; save NL
166 011: pshh ; DH on the stack
167 012: tst 1,sp ; test zero
168 013: bne full ; full division
169 014: pulh ; clean stack
170 015: cpx c_reg ; compare DL with NH
171 016: bls half ; half division
172 017: lda c_reg ; NH
173 018: psha ; in
174 019: pulh ; H
175 020: pula ; NL in A
176 021: div ; DL in X, divide
177 022: clr c_reg ; QH is zero
178 023: fini:
179 024: pshh ; move RL
180 025: pulx ; in X
181 026: clrh ; RH is zero
182 027: rts ; and return
183 028: half:
184 029: lda c_reg ; NH in A
185 030: clrh ; 1st divide 8 X 8
186 031: div ; divide
187 032: sta c_reg ; QH in place
188 033: pula ; complete dividend with NL
189 034: div ; divide
190 035: bra fini ; complete remainder
191 036: full:
192 037: pshx ; save DL
193 038: ldx #8 ; counter
194 039: clra ; Extention
195 040: pshx ; save on
196 041: psha ; the stack
197 042: tsx ; stack addressed by H:X
198 043: bcl:
199 044: lsl 4,x ; shift E:NH:NL
200 045: rol c_reg ; left
201 046: rola
202 047: sta 0,x ; store E
203 048: cmp 3,x ; compare DH
204 049: blo next ; too low, continue
205 050: bne ok ; ok to subtract
206 051: lda c_reg ; compare NH
207 052: sub 2,x ; with DL
208 053: bhs ok2 ; ok, complete result
209 054: lda 0,x ; restore value
210 055: bra next ; and continue
211 056: ok:
212 057: lda c_reg ; substract D
213 058: sub 2,x ; from E:NH
214 059: ok2:
215 060: sta c_reg ; in place
216 061: lda 0,x
217 062: sbc 3,x
218 063: inc 4,x ; set result bit
219 064: next:
220 065: dbnz 1,x,bcl ; count down and loop
221 066: sta 0,x ; store E
222 067: pulh ; RH in place
223 068: ais #3 ; clean up stack
224 069: ldx c_reg ; RL in place
225 070: pula ; QL in place
226 071: clr c_reg ; QH is zero
227 072: rts ; and return
228 \end{verbatim}
229
230
231 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
232
233 \subsection{Algorithmic Analysis}
234 \label{saec0:saan0}
235
236 The algorithm is divided into two cases:
237
238 \begin{itemize}
239 \item Case I: $d_H = 0$ (lines 14-35).
240 \item Case II: $d_H > 0$ (lines 36-72).
241 \end{itemize}
242
243
244 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
245
246 \subsubsection{Case I}
247 \label{saec0:saan0:scas0}
248
249 In the case of $d_H = 0$, the division is 16/8:
250
251 \begin{eqnarray}
252 \label{eq:saec0:saan0:scas0:01}
253 \frac{2^8 n_H + n_L}{d_L} & = & \frac{2^8 n_H}{d_L} + \frac{n_L}{d_L} \\
254 \label{eq:saec0:saan0:scas0:02}
255 & = & 2^8 \left( n_H \bdiv d_L + \frac{n_H \bmod d_L}{d_L} \right) + \frac{n_L}{d_L} \\
256 \label{eq:saec0:saan0:scas0:03}
257 & = & 2^8 (n_H \bdiv d_L) + \frac{2^8 (n_H \bmod d_L) + n_L}{d_L}
258 \end{eqnarray}
259
260 (\ref{eq:saec0:saan0:scas0:03}) is an exact expression involving rational numbers.
261 However, we don't want to calculate the left side of (\ref{eq:saec0:saan0:scas0:03});
262 rather, we wish to calculate its \emph{floor()}\@. Applying the
263 \emph{floor()} function to both sides of (\ref{eq:saec0:saan0:scas0:03}) yields:
264
265 \begin{equation}
266 \label{eq:saec0:saan0:scas0:04}
267 \left\lfloor \frac{2^8 n_H + n_L}{d_L} \right\rfloor
268 =
269 2^8 (n_H \bdiv d_L) + \left\lfloor \frac{2^8 (n_H \bmod d_L) + n_L}{d_L} \right\rfloor .
270 \end{equation}
271
272 Note that (\ref{eq:saec0:saan0:scas0:04}) is in a form
273 that can be readily evaluated using a processor with 16/8 division
274 capability; so long as
275
276 \begin{equation}
277 \label{eq:saec0:saan0:scas0:05}
278 \frac{2^8 (n_H \bmod d_L) + n_L}{d_L} < 2^8 ,
279 \end{equation}
280
281 \noindent{}a fact that can be easily verified by the reader.
282
283 (\ref{eq:saec0:saan0:scas0:04}) can be readily evaluated by a processor with
284 16/8 division capability because such a division instruction always provides
285 both quotient and remainder. It is easy to see that (\ref{eq:saec0:saan0:scas0:04})
286 can be evaluated with a division, a re-staging of bytes, and a second division.
287
288 If (\ref{eq:saec0:saan0:scas0:04}) is evaluated as suggested, it needs to be verified
289 whether the remainder of the second division is the same as the remainder
290 of the larger division, i.e.
291
292 \begin{equation}
293 \label{eq:saec0:saan0:scas0:06}
294 (2^8 n_H + n_L) \bmod d_L =? ((2^8 \bmod d_L) + n_L) \bmod d_L.
295 \end{equation}
296
297 The question of whether (\ref{eq:saec0:saan0:scas0:06}) is an
298 equality is the question of whether
299
300 \begin{equation}
301 \label{eq:saec0:saan0:scas0:07}
302 ka \bmod b = (k (a \bmod b)) \bmod b .
303 \end{equation}
304
305 In order to prove or disprove (\ref{eq:saec0:saan0:scas0:07}),
306 decompose $a$ into $ib + j$. Then, since $kib \bmod b = 0$,
307
308 \begin{eqnarray}
309 \label{eq:saec0:saan0:scas0:08}
310 k (ib + j) \bmod b & = & k j \bmod b \\
311 \label{eq:saec0:saan0:scas0:09}
312 k j \bmod b & = & k j \bmod b .
313 \end{eqnarray}
314
315 Thus, if (\ref{eq:saec0:saan0:scas0:04}) is evaluated as suggested (with
316 two divisions), the final remainder will be the same as the remainder for the
317 original division. (\ref{eq:saec0:saan0:scas0:04}) will, in fact, deliver both
318 the quotient and remainder economically.
319
320 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
321
322 \subsubsection{Case II}
323 \label{saec0:saan0:scas1}
324
325 The case of $d_H > 0$ (\S{}\ref{saec0:slec0}, lines 36-72) is conventional
326 shift-compare-subtract division. Only eight iterations of the loop are required
327 because with $d_H > 0$, $d \geq 2^8$, and $n/d < 2^8$.
328
329
330 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
331
332 \subsection{Timing Analysis}
333 \label{saec0:stan0}
334
335 The code of \S{}\ref{saec0:slec0} is reproduced below, with instruction
336 timing (number of clocks) and FLASH requirements (number of bytes) added
337 as (clocks/bytes). It was determined that \emph{c\_reg} resides
338 in zero-page (i.e. \emph{direct}) memory.
339
340 \begin{verbatim}
341 001: c_umod:
342 002: bsr c_udiv 4/2 ; divide
343 003: pshh ; transfer MSB
344 004: pula 2/1 ; in place
345 005: sta c_reg 3/2
346 006: txa 1/1 ; LSB in place
347 007: rts 4/1 ; and return
348 008: ;
349 009: c_udiv:
350 010: psha 2/1 ; save NL
351 011: pshh 2/1 ; DH on the stack
352 012: tst 1,sp 4/3 ; test zero
353 013: bne full 3/2 ; full division
354 014: pulh 2/1 ; clean stack
355 015: cpx c_reg 3/2 ; compare DL with NH
356 016: bls half 3/2 ; half division
357 017: lda c_reg 3/2 ; NH
358 018: psha 2/1 ; in
359 019: pulh 2/1 ; H
360 020: pula 2/1 ; NL in A
361 021: div 7/1 ; DL in X, divide
362 022: clr c_reg 3/2 ; QH is zero
363 023: fini:
364 024: pshh 2/1 ; move RL
365 025: pulx 2/1 ; in X
366 026: clrh 1/1 ; RH is zero
367 027: rts 4/1 ; and return
368 028: half:
369 029: lda c_reg 3/2 ; NH in A
370 030: clrh 1/1 ; 1st divide 8 X 8
371 031: div 7/1 ; divide
372 032: sta c_reg 3/2 ; QH in place
373 033: pula 2/1 ; complete dividend with NL
374 034: div 7/1 ; divide
375 035: bra fini 3/2 ; complete remainder
376 036: full:
377 037: pshx 2/1 ; save DL
378 038: ldx #8 2/2 ; counter
379 039: clra 1/1 ; Extention
380 040: pshx 2/1 ; save on
381 041: psha 2/1 ; the stack
382 042: tsx 2/1 ; stack addressed by H:X
383 043: bcl:
384 044: lsl 4,x 4/2 ; shift E:NH:NL
385 045: rol c_reg 4/2 ; left
386 046: rola 1/1
387 047: sta 0,x 2/1 ; store E
388 048: cmp 3,x 3/2 ; compare DH
389 049: blo next 3/2 ; too low, continue
390 050: bne ok 3/2 ; ok to subtract
391 051: lda c_reg 3/2 ; compare NH
392 052: sub 2,x 3/2 ; with DL
393 053: bhs ok2 3/2 ; ok, complete result
394 054: lda 0,x 2/1 ; restore value
395 055: bra next 3/2 ; and continue
396 056: ok:
397 057: lda c_reg 3/2 ; substract D
398 058: sub 2,x 3/2 ; from E:NH
399 059: ok2:
400 060: sta c_reg 3/2 ; in place
401 061: lda 0,x 2/1
402 062: sbc 3,x 3/2
403 063: inc 4,x 3/2 ; set result bit
404 064: next:
405 065: dbnz 1,x,bcl 5/3 ; count down and loop
406 066: sta 0,x 2/1 ; store E
407 067: pulh 2/1 ; RH in place
408 068: ais #3 2/2 ; clean up stack
409 069: ldx c_reg 3/2 ; RL in place
410 070: pula 2/1 ; QL in place
411 071: clr c_reg 3/2 ; QH is zero
412 072: rts 4/1 ; and return
413 \end{verbatim}
414
415 There are three distinct timing cases for the \emph{c\_udiv} function:
416
417 \begin{enumerate}
418 \item $d_H = 0$ and $n_H < d_L$:
419 47 clocks are required, representing straight flow of the
420 instructions from line 10 through line 27.
421 \item $d_H = 0$ and $n_H \geq d_L$: 54 clocks are required.
422 \item $d_H > 0$ and every bit of the quotient is 1, in which case
423 400 clocks are required. This represents 22 clocks up through
424 line 42, 45 clocks $\times$ 8 in the lines from 43 through 65, and
425 18 clocks in the lines from 66 through 72.
426 \end{enumerate}
427
428
429 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
430
431 \subsection{FLASH/RAM Consumption Analysis}
432 \label{saec0:sfra0}
433
434 From \S{}\ref{saec0:stan0}, 93 bytes of FLASH are used. Only one byte
435 of RAM is used (\emph{c\_reg}, probably shared with other functions
436 as well).
437
438
439 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
440
441 \section{Potential Optimizations}
442 \label{spop0}
443
444
445 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
446
447 \subsection{Potential Case I Optimizations}
448 \label{spop0:scas0}
449
450 This section corresponds to \emph{Case I} of \S{}\ref{saec0:saan0:scas0}.
451
452 The most obvious observation about the code (\S{}\ref{saec0:slec0}) is that
453 division instructions are very inexpensive on the CPU08---7 clock cycles,
454 or about 2 typical instructions. Branching based on
455 $n_H \geq d_L$
456 (\S{}\ref{saec0:slec0}, lines 15-16)
457 may cost more in the test, the branch, and in other
458 data transfer overhead than is saved. It may make sense to apply the
459 full formula in (\ref{eq:saec0:saan0:scas0:04}) in all cases where
460 $d_H = 0$.
461
462 When $d_H = 0$, and if one assumes normal distribution of data, the expected
463 value of execution time is about $(47+54)/2 = 50.5$ clocks.\footnote{If the
464 data is assumed normally distributed, $n_H$ has about a 0.5 probability of being at least as large
465 as $d_L$.}
466
467 The code below combines two of the three timing cases into one by ignoring the
468 relationship between $n_H$ and $d_L$.
469
470 \begin{verbatim}
471 ;Condition at function entry:
472 ;N_H in 1,SP
473 ;N_L in A
474 ;D_H in H
475 ;D_L in X
476 ;
477 ;Condition at function exit:
478 ;Q_H in c_reg
479 ;Q_L in A
480 ;R_H in H
481 ;R_L in X
482 ;
483 c_udiv:
484 psha 2/1 ; save NL
485 pshh 2/1 ; DH on the stack
486 tst 1,sp 4/3 ; test zero
487 bne full 3/2 ; full division
488 ;
489 ;From here on we're committed to the division with
490 ;arbitary numerator, and denominator <= 255.
491 ; N_H at 3,sp
492 ; N_L at 2,sp
493 ; D_H at 1,sp
494 ;
495 clrh 1/1
496 lda 3,sp 4/3 ; H:A now contains N_H
497 div 7/1 ; divide
498 sta c_reg 3/2 ; QH in place
499 lda 2,sp 4/3 ; complete dividend with NL
500 div 7/1 ; divide. Q_L in A, R_L in H
501 pshh 2/1 ; move RL
502 pulx 2/1 ; in X
503 clrh 1/1 ; RH is zero
504 ais #3 2/2 ; clean stack
505 rts 4/1 ; and return
506 \end{verbatim}
507
508 Although the code does raise the minimum execution time from 47 to
509 48 clocks:
510
511 \begin{itemize}
512 \item It lowers the expected value of the $d_H=0$ execution time from
513 50.5 to 48 clocks.
514 \item It saves approximately 15 bytes of FLASH.
515 \end{itemize}
516
517 This optimization is recommended.
518
519
520 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
521
522 \subsection{Potential Case II Optimizations}
523 \label{spop0:scas1}
524
525 This section corresponds to \emph{Case II} of \S{}\ref{saec0:saan0:scas1}.
526
527 I sent an e-mail to an engineer at the compiler manufacturer indicating
528 that:
529
530 \begin{itemize}
531 \item I believed \emph{Case I} could be optimized as indicated earlier
532 in the document.
533 \item I believed \emph{Case II} could be optimized by applying Knuth's
534 algorithm.
535 \end{itemize}
536
537 The reply I received from the compiler manufacturer was that:
538
539 \begin{itemize}
540 \item There was agreeement about \emph{Case I}.
541 \item \emph{Case II} may be a little faster using Knuth's algorithm, but would
542 definitely be larger (code used to evaluate the application of Knuth's
543 algorithm was also provided).
544 \end{itemize}
545
546 In the test code provided by the compiler manufacturer, the approach used was
547 to obtain a trial quotient and then to subtract the divisor from the remainder
548 up to twice to adjust the quotient up to 2 counts downward (Knuth's algorithm).
549
550 I did try a different approach (to iterate on the quotient and to reconstruct
551 $q \times d$ with decreasing $q$). This approach promised to be slightly more
552 compact because $q \times d$ reconstruction was reutilized. However, it worked out
553 to occupy about 153 bytes rather than 103 for the shift-compare-subtract
554 algorithm (timing was not examined). (This test code is version-controlled
555 in the same directory as this \LaTeX{} document.)
556
557 At this point I agree with the
558 compiler manufacturer
559 that there is a tradeoff between size and speed (it seems
560 nearly impossible to get both).
561
562 In any reimplementation of this algorithm, will probably need to choose between
563 size and speed. I believe there is some possibility to reduce the reimplementation
564 from 153 bytes, but not down to 103 bytes.
565
566 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
567 \end{document}
568 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
569 % $Log: cpu08div16by16a.tex,v $
570 % Revision 1.12 2007/05/26 18:25:46 dashley
571 % Edits to incorporate final thoughts (for now) and test code results.
572 %
573 % Revision 1.11 2007/05/21 16:49:40 dashley
574 % \bdiv functionality as suggested by Robin Fairbairns added.
575 %
576 % Revision 1.10 2007/05/18 15:53:59 dashley
577 % Edits.
578 %
579 % Revision 1.9 2007/05/18 02:02:56 dashley
580 % Edits.
581 %
582 % Revision 1.8 2007/05/18 02:02:10 dashley
583 % Edits.
584 %
585 % Revision 1.7 2007/05/18 01:40:20 dashley
586 % Edits.
587 %
588 % Revision 1.6 2007/05/18 00:15:56 dashley
589 % Edits.
590 %
591 % Revision 1.5 2007/05/17 15:53:57 dashley
592 % Edits.
593 %
594 % Revision 1.4 2007/05/17 14:46:34 dashley
595 % Edits.
596 %
597 % Revision 1.3 2007/05/17 04:15:22 dashley
598 % Edits.
599 %
600 % Revision 1.2 2007/05/17 03:50:48 dashley
601 % Edits.
602 %
603 % Revision 1.1 2007/05/17 03:09:06 dashley
604 % Initial checkin.
605 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

dashley@gmail.com
ViewVC Help
Powered by ViewVC 1.1.25