/[dtapublic]/pubs/papers/20070526_cpu08_divu16_by_u16/cpu08div16by16a.tex
ViewVC logotype

Annotation of /pubs/papers/20070526_cpu08_divu16_by_u16/cpu08div16by16a.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 10 - (hide annotations) (download) (as text)
Fri Oct 7 03:23:35 2016 UTC (7 years, 11 months ago) by dashley
File MIME type: application/x-tex
File size: 22866 byte(s)
Commit of u16 = u16 / u16 analysis for the CPU08.
1 dashley 10 %$Header: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/dtaipubs/cron/2007/cpu08div16by16a/cpu08div16by16a.tex,v 1.12 2007/05/26 18:25:46 dashley Exp $
2     %
3     \documentclass[letterpaper,10pt,titlepage]{article}
4     %
5     %\pagestyle{headings}
6     %
7     \usepackage{amsmath}
8     \usepackage{amsfonts}
9     \usepackage{amssymb}
10     \usepackage[ansinew]{inputenc}
11     \usepackage[OT1]{fontenc}
12     \usepackage{graphicx}
13     %\usepackage{makeidx}
14     %
15     %Contributed by Robin Fairbairns of the comp.text.tex newsgroup. I have no
16     %idea how it works ... but it gives a working \bdiv.
17     %
18     \makeatletter
19     \def\bdiv{%
20     \nonscript\mskip-\medmuskip\mkern5mu%
21     \mathbin{\operator@font div}\penalty900\mkern5mu%
22     \nonscript\mskip-\medmuskip}
23     \makeatother
24     %
25     %
26     \begin{document}
27     \title{Notes on Economical 16/16 = 16 Unsigned Integer Division on the Freescale CPU08}
28     \author{\vspace{1cm}\\David T. Ashley\\\texttt{dta@e3ft.com}\\\vspace{1cm}}
29     \date{\vspace*{8mm}\small{Version Control $ $Revision: 1.12 $ $ \\
30     Version Control $ $Date: 2007/05/26 18:25:46 $ $ (UTC) \\
31     $ $RCSfile: cpu08div16by16a.tex,v $ $ \\
32     \LaTeX{} Compilation Date: \today{}}}
33     \maketitle
34    
35     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
36     %
37     \begin{abstract}
38     This document contains my notes on attempting to optimize
39     the 16 = 16/16 unsigned integer division subroutine distributed with
40     a Freescale CPU08 compiler. I attempted to replace the standard
41     shift-compare-subtract algorithm with Knuth's algorithm.
42    
43     In the end, the attempt was mostly a failure. There seemed to be no way
44     to gain the execution time advantages of Knuth's algorithm without increasing
45     FLASH size. (The goal would have been to obtain more favorable
46     execution time \emph{and} FLASH size, rather than a tradeoff.)
47     \end{abstract}
48    
49     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
50    
51     \section{Introduction and Overview}
52     \label{siov0}
53    
54     The fundamental goal of practical computer integer arithmetic is to obtain
55     an exact result using machine instructions that are as fast and compact as
56     possible.
57    
58     For three of the four fundamental operations (addition, subtraction, and
59     multiplication), it is intuitively obvious to most programmers how to
60     use existing machine instructions to operate on operands that are larger
61     than the machine instructions can accommodate.
62    
63     The fourth fundamental operation (division), however, is less well understood
64     by a typical programmer than the other three. It is not obvious to most
65     programmers how to use machine division instructions to divide integers larger
66     than the native machine division instructions will accommodate.
67    
68     In 2007, I noticed that the integer division library functions associated with
69     a particular \emph{C} compiler were of the shift-compare-subtract variety.
70     As the CPU08 has a 16/16=8 native division instruction, I suspected but was not
71     sure that the 16/16=16 division function could be improved.
72    
73     Note that for a library function used by a \emph{C} compiler, it may not
74     be necessary (although it may be desirable) to calculate the quotient and the
75     remainder at the same time. An algorithm that calculates the quotient but
76     does not calculate the remainder, or vice-versa, may be acceptable for
77     use by compiled code.
78    
79     Although I haven't examined the \emph{C} standards, I doubt that there is
80     any special requirement for the quotient or remainder calculated when
81     the denominator is zero. The only requirement is probably that the calculation
82     not continue indefinitely (i.e. not an infinite loop).
83    
84     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
85    
86     \section{Nomenclature and Notation}
87     \label{snom0}
88    
89     I use the nomenclature ``16/16=16 division'' to denote a division function that
90     accepts a 16-bit unsigned numerator and a 16-bit unsigned denominator; and
91     produces either/both a 16-bit quotient and a 16-bit remainder.
92    
93     I use $n$ to denote the numerator, $d$ to denote the denominator, $q$ to denote
94     the integer quotient, and $r$ to denote the remainder.
95    
96     Any of the 16-bit quantities can be subscripted with ``H'' or ``L'' to denote
97     the most-significant or least-significant byte. For example,
98    
99     \begin{equation}
100     \label{eq:snom0:01}
101     n = 2^8 n_H + n_L .
102     \end{equation}
103    
104     Within any of the 16-bit quantities, bits are subscripted in the traditional fashion.
105     For example,
106    
107     \begin{equation}
108     \label{eq:snom0:02}
109     n = \sum_{i=0}^{15} 2^i n_i .
110     \end{equation}
111    
112     Within any of the 8-bit quantities, bits are also subscripted in the traditional fashion.
113     For example,
114    
115     \begin{equation}
116     \label{eq:snom0:03}
117     n = 2^8 \sum_{i=0}^{7} 2^i n_{H_i} + \sum_{i=0}^{7} 2^i n_{L_i}
118     = \sum_{i=0}^{7} 2^{i+8} n_{H_i} + \sum_{i=0}^{7} 2^i n_{L_i} .
119     \end{equation}
120    
121     Because only non-negative integers are involved, \emph{floor()} and
122     \emph{div} are used interchangeably, i.e.
123    
124     \begin{equation}
125     \label{eq:snom0:04}
126     \left\lfloor \frac{a}{b} \right\rfloor = a \bdiv b, \;\; a,b \geq 0 .
127     \end{equation}
128    
129     An identity that is frequently used in this document is
130    
131     \begin{eqnarray}
132     \label{eq:snom0:05}
133     \frac{a}{b} & = & a \bdiv b + \frac{a \bmod b}{b} \\
134     \label{eq:snom0:06}
135     & = & \left\lfloor \frac{a}{b} \right\rfloor + \frac{a \bmod b}{b} .
136     \end{eqnarray}
137    
138    
139     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140    
141     \section{Analysis of Existing Code}
142     \label{saec0}
143    
144     This section presents an analysis of the behavior and characteristics of
145     the existing code.
146    
147    
148     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
149    
150     \subsection{Listing}
151     \label{saec0:slec0}
152    
153     The existing code is reproduced below, with line numbers.
154    
155     \begin{verbatim}
156     001: c_umod:
157     002: bsr c_udiv ; divide
158     003: pshh ; transfer MSB
159     004: pula ; in place
160     005: sta c_reg
161     006: txa ; LSB in place
162     007: rts ; and return
163     008: ;
164     009: c_udiv:
165     010: psha ; save NL
166     011: pshh ; DH on the stack
167     012: tst 1,sp ; test zero
168     013: bne full ; full division
169     014: pulh ; clean stack
170     015: cpx c_reg ; compare DL with NH
171     016: bls half ; half division
172     017: lda c_reg ; NH
173     018: psha ; in
174     019: pulh ; H
175     020: pula ; NL in A
176     021: div ; DL in X, divide
177     022: clr c_reg ; QH is zero
178     023: fini:
179     024: pshh ; move RL
180     025: pulx ; in X
181     026: clrh ; RH is zero
182     027: rts ; and return
183     028: half:
184     029: lda c_reg ; NH in A
185     030: clrh ; 1st divide 8 X 8
186     031: div ; divide
187     032: sta c_reg ; QH in place
188     033: pula ; complete dividend with NL
189     034: div ; divide
190     035: bra fini ; complete remainder
191     036: full:
192     037: pshx ; save DL
193     038: ldx #8 ; counter
194     039: clra ; Extention
195     040: pshx ; save on
196     041: psha ; the stack
197     042: tsx ; stack addressed by H:X
198     043: bcl:
199     044: lsl 4,x ; shift E:NH:NL
200     045: rol c_reg ; left
201     046: rola
202     047: sta 0,x ; store E
203     048: cmp 3,x ; compare DH
204     049: blo next ; too low, continue
205     050: bne ok ; ok to subtract
206     051: lda c_reg ; compare NH
207     052: sub 2,x ; with DL
208     053: bhs ok2 ; ok, complete result
209     054: lda 0,x ; restore value
210     055: bra next ; and continue
211     056: ok:
212     057: lda c_reg ; substract D
213     058: sub 2,x ; from E:NH
214     059: ok2:
215     060: sta c_reg ; in place
216     061: lda 0,x
217     062: sbc 3,x
218     063: inc 4,x ; set result bit
219     064: next:
220     065: dbnz 1,x,bcl ; count down and loop
221     066: sta 0,x ; store E
222     067: pulh ; RH in place
223     068: ais #3 ; clean up stack
224     069: ldx c_reg ; RL in place
225     070: pula ; QL in place
226     071: clr c_reg ; QH is zero
227     072: rts ; and return
228     \end{verbatim}
229    
230    
231     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
232    
233     \subsection{Algorithmic Analysis}
234     \label{saec0:saan0}
235    
236     The algorithm is divided into two cases:
237    
238     \begin{itemize}
239     \item Case I: $d_H = 0$ (lines 14-35).
240     \item Case II: $d_H > 0$ (lines 36-72).
241     \end{itemize}
242    
243    
244     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
245    
246     \subsubsection{Case I}
247     \label{saec0:saan0:scas0}
248    
249     In the case of $d_H = 0$, the division is 16/8:
250    
251     \begin{eqnarray}
252     \label{eq:saec0:saan0:scas0:01}
253     \frac{2^8 n_H + n_L}{d_L} & = & \frac{2^8 n_H}{d_L} + \frac{n_L}{d_L} \\
254     \label{eq:saec0:saan0:scas0:02}
255     & = & 2^8 \left( n_H \bdiv d_L + \frac{n_H \bmod d_L}{d_L} \right) + \frac{n_L}{d_L} \\
256     \label{eq:saec0:saan0:scas0:03}
257     & = & 2^8 (n_H \bdiv d_L) + \frac{2^8 (n_H \bmod d_L) + n_L}{d_L}
258     \end{eqnarray}
259    
260     (\ref{eq:saec0:saan0:scas0:03}) is an exact expression involving rational numbers.
261     However, we don't want to calculate the left side of (\ref{eq:saec0:saan0:scas0:03});
262     rather, we wish to calculate its \emph{floor()}\@. Applying the
263     \emph{floor()} function to both sides of (\ref{eq:saec0:saan0:scas0:03}) yields:
264    
265     \begin{equation}
266     \label{eq:saec0:saan0:scas0:04}
267     \left\lfloor \frac{2^8 n_H + n_L}{d_L} \right\rfloor
268     =
269     2^8 (n_H \bdiv d_L) + \left\lfloor \frac{2^8 (n_H \bmod d_L) + n_L}{d_L} \right\rfloor .
270     \end{equation}
271    
272     Note that (\ref{eq:saec0:saan0:scas0:04}) is in a form
273     that can be readily evaluated using a processor with 16/8 division
274     capability; so long as
275    
276     \begin{equation}
277     \label{eq:saec0:saan0:scas0:05}
278     \frac{2^8 (n_H \bmod d_L) + n_L}{d_L} < 2^8 ,
279     \end{equation}
280    
281     \noindent{}a fact that can be easily verified by the reader.
282    
283     (\ref{eq:saec0:saan0:scas0:04}) can be readily evaluated by a processor with
284     16/8 division capability because such a division instruction always provides
285     both quotient and remainder. It is easy to see that (\ref{eq:saec0:saan0:scas0:04})
286     can be evaluated with a division, a re-staging of bytes, and a second division.
287    
288     If (\ref{eq:saec0:saan0:scas0:04}) is evaluated as suggested, it needs to be verified
289     whether the remainder of the second division is the same as the remainder
290     of the larger division, i.e.
291    
292     \begin{equation}
293     \label{eq:saec0:saan0:scas0:06}
294     (2^8 n_H + n_L) \bmod d_L =? ((2^8 \bmod d_L) + n_L) \bmod d_L.
295     \end{equation}
296    
297     The question of whether (\ref{eq:saec0:saan0:scas0:06}) is an
298     equality is the question of whether
299    
300     \begin{equation}
301     \label{eq:saec0:saan0:scas0:07}
302     ka \bmod b = (k (a \bmod b)) \bmod b .
303     \end{equation}
304    
305     In order to prove or disprove (\ref{eq:saec0:saan0:scas0:07}),
306     decompose $a$ into $ib + j$. Then, since $kib \bmod b = 0$,
307    
308     \begin{eqnarray}
309     \label{eq:saec0:saan0:scas0:08}
310     k (ib + j) \bmod b & = & k j \bmod b \\
311     \label{eq:saec0:saan0:scas0:09}
312     k j \bmod b & = & k j \bmod b .
313     \end{eqnarray}
314    
315     Thus, if (\ref{eq:saec0:saan0:scas0:04}) is evaluated as suggested (with
316     two divisions), the final remainder will be the same as the remainder for the
317     original division. (\ref{eq:saec0:saan0:scas0:04}) will, in fact, deliver both
318     the quotient and remainder economically.
319    
320     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
321    
322     \subsubsection{Case II}
323     \label{saec0:saan0:scas1}
324    
325     The case of $d_H > 0$ (\S{}\ref{saec0:slec0}, lines 36-72) is conventional
326     shift-compare-subtract division. Only eight iterations of the loop are required
327     because with $d_H > 0$, $d \geq 2^8$, and $n/d < 2^8$.
328    
329    
330     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
331    
332     \subsection{Timing Analysis}
333     \label{saec0:stan0}
334    
335     The code of \S{}\ref{saec0:slec0} is reproduced below, with instruction
336     timing (number of clocks) and FLASH requirements (number of bytes) added
337     as (clocks/bytes). It was determined that \emph{c\_reg} resides
338     in zero-page (i.e. \emph{direct}) memory.
339    
340     \begin{verbatim}
341     001: c_umod:
342     002: bsr c_udiv 4/2 ; divide
343     003: pshh ; transfer MSB
344     004: pula 2/1 ; in place
345     005: sta c_reg 3/2
346     006: txa 1/1 ; LSB in place
347     007: rts 4/1 ; and return
348     008: ;
349     009: c_udiv:
350     010: psha 2/1 ; save NL
351     011: pshh 2/1 ; DH on the stack
352     012: tst 1,sp 4/3 ; test zero
353     013: bne full 3/2 ; full division
354     014: pulh 2/1 ; clean stack
355     015: cpx c_reg 3/2 ; compare DL with NH
356     016: bls half 3/2 ; half division
357     017: lda c_reg 3/2 ; NH
358     018: psha 2/1 ; in
359     019: pulh 2/1 ; H
360     020: pula 2/1 ; NL in A
361     021: div 7/1 ; DL in X, divide
362     022: clr c_reg 3/2 ; QH is zero
363     023: fini:
364     024: pshh 2/1 ; move RL
365     025: pulx 2/1 ; in X
366     026: clrh 1/1 ; RH is zero
367     027: rts 4/1 ; and return
368     028: half:
369     029: lda c_reg 3/2 ; NH in A
370     030: clrh 1/1 ; 1st divide 8 X 8
371     031: div 7/1 ; divide
372     032: sta c_reg 3/2 ; QH in place
373     033: pula 2/1 ; complete dividend with NL
374     034: div 7/1 ; divide
375     035: bra fini 3/2 ; complete remainder
376     036: full:
377     037: pshx 2/1 ; save DL
378     038: ldx #8 2/2 ; counter
379     039: clra 1/1 ; Extention
380     040: pshx 2/1 ; save on
381     041: psha 2/1 ; the stack
382     042: tsx 2/1 ; stack addressed by H:X
383     043: bcl:
384     044: lsl 4,x 4/2 ; shift E:NH:NL
385     045: rol c_reg 4/2 ; left
386     046: rola 1/1
387     047: sta 0,x 2/1 ; store E
388     048: cmp 3,x 3/2 ; compare DH
389     049: blo next 3/2 ; too low, continue
390     050: bne ok 3/2 ; ok to subtract
391     051: lda c_reg 3/2 ; compare NH
392     052: sub 2,x 3/2 ; with DL
393     053: bhs ok2 3/2 ; ok, complete result
394     054: lda 0,x 2/1 ; restore value
395     055: bra next 3/2 ; and continue
396     056: ok:
397     057: lda c_reg 3/2 ; substract D
398     058: sub 2,x 3/2 ; from E:NH
399     059: ok2:
400     060: sta c_reg 3/2 ; in place
401     061: lda 0,x 2/1
402     062: sbc 3,x 3/2
403     063: inc 4,x 3/2 ; set result bit
404     064: next:
405     065: dbnz 1,x,bcl 5/3 ; count down and loop
406     066: sta 0,x 2/1 ; store E
407     067: pulh 2/1 ; RH in place
408     068: ais #3 2/2 ; clean up stack
409     069: ldx c_reg 3/2 ; RL in place
410     070: pula 2/1 ; QL in place
411     071: clr c_reg 3/2 ; QH is zero
412     072: rts 4/1 ; and return
413     \end{verbatim}
414    
415     There are three distinct timing cases for the \emph{c\_udiv} function:
416    
417     \begin{enumerate}
418     \item $d_H = 0$ and $n_H < d_L$:
419     47 clocks are required, representing straight flow of the
420     instructions from line 10 through line 27.
421     \item $d_H = 0$ and $n_H \geq d_L$: 54 clocks are required.
422     \item $d_H > 0$ and every bit of the quotient is 1, in which case
423     400 clocks are required. This represents 22 clocks up through
424     line 42, 45 clocks $\times$ 8 in the lines from 43 through 65, and
425     18 clocks in the lines from 66 through 72.
426     \end{enumerate}
427    
428    
429     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
430    
431     \subsection{FLASH/RAM Consumption Analysis}
432     \label{saec0:sfra0}
433    
434     From \S{}\ref{saec0:stan0}, 93 bytes of FLASH are used. Only one byte
435     of RAM is used (\emph{c\_reg}, probably shared with other functions
436     as well).
437    
438    
439     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
440    
441     \section{Potential Optimizations}
442     \label{spop0}
443    
444    
445     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
446    
447     \subsection{Potential Case I Optimizations}
448     \label{spop0:scas0}
449    
450     This section corresponds to \emph{Case I} of \S{}\ref{saec0:saan0:scas0}.
451    
452     The most obvious observation about the code (\S{}\ref{saec0:slec0}) is that
453     division instructions are very inexpensive on the CPU08---7 clock cycles,
454     or about 2 typical instructions. Branching based on
455     $n_H \geq d_L$
456     (\S{}\ref{saec0:slec0}, lines 15-16)
457     may cost more in the test, the branch, and in other
458     data transfer overhead than is saved. It may make sense to apply the
459     full formula in (\ref{eq:saec0:saan0:scas0:04}) in all cases where
460     $d_H = 0$.
461    
462     When $d_H = 0$, and if one assumes normal distribution of data, the expected
463     value of execution time is about $(47+54)/2 = 50.5$ clocks.\footnote{If the
464     data is assumed normally distributed, $n_H$ has about a 0.5 probability of being at least as large
465     as $d_L$.}
466    
467     The code below combines two of the three timing cases into one by ignoring the
468     relationship between $n_H$ and $d_L$.
469    
470     \begin{verbatim}
471     ;Condition at function entry:
472     ;N_H in 1,SP
473     ;N_L in A
474     ;D_H in H
475     ;D_L in X
476     ;
477     ;Condition at function exit:
478     ;Q_H in c_reg
479     ;Q_L in A
480     ;R_H in H
481     ;R_L in X
482     ;
483     c_udiv:
484     psha 2/1 ; save NL
485     pshh 2/1 ; DH on the stack
486     tst 1,sp 4/3 ; test zero
487     bne full 3/2 ; full division
488     ;
489     ;From here on we're committed to the division with
490     ;arbitary numerator, and denominator <= 255.
491     ; N_H at 3,sp
492     ; N_L at 2,sp
493     ; D_H at 1,sp
494     ;
495     clrh 1/1
496     lda 3,sp 4/3 ; H:A now contains N_H
497     div 7/1 ; divide
498     sta c_reg 3/2 ; QH in place
499     lda 2,sp 4/3 ; complete dividend with NL
500     div 7/1 ; divide. Q_L in A, R_L in H
501     pshh 2/1 ; move RL
502     pulx 2/1 ; in X
503     clrh 1/1 ; RH is zero
504     ais #3 2/2 ; clean stack
505     rts 4/1 ; and return
506     \end{verbatim}
507    
508     Although the code does raise the minimum execution time from 47 to
509     48 clocks:
510    
511     \begin{itemize}
512     \item It lowers the expected value of the $d_H=0$ execution time from
513     50.5 to 48 clocks.
514     \item It saves approximately 15 bytes of FLASH.
515     \end{itemize}
516    
517     This optimization is recommended.
518    
519    
520     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
521    
522     \subsection{Potential Case II Optimizations}
523     \label{spop0:scas1}
524    
525     This section corresponds to \emph{Case II} of \S{}\ref{saec0:saan0:scas1}.
526    
527     I sent an e-mail to an engineer at the compiler manufacturer indicating
528     that:
529    
530     \begin{itemize}
531     \item I believed \emph{Case I} could be optimized as indicated earlier
532     in the document.
533     \item I believed \emph{Case II} could be optimized by applying Knuth's
534     algorithm.
535     \end{itemize}
536    
537     The reply I received from the compiler manufacturer was that:
538    
539     \begin{itemize}
540     \item There was agreeement about \emph{Case I}.
541     \item \emph{Case II} may be a little faster using Knuth's algorithm, but would
542     definitely be larger (code used to evaluate the application of Knuth's
543     algorithm was also provided).
544     \end{itemize}
545    
546     In the test code provided by the compiler manufacturer, the approach used was
547     to obtain a trial quotient and then to subtract the divisor from the remainder
548     up to twice to adjust the quotient up to 2 counts downward (Knuth's algorithm).
549    
550     I did try a different approach (to iterate on the quotient and to reconstruct
551     $q \times d$ with decreasing $q$). This approach promised to be slightly more
552     compact because $q \times d$ reconstruction was reutilized. However, it worked out
553     to occupy about 153 bytes rather than 103 for the shift-compare-subtract
554     algorithm (timing was not examined). (This test code is version-controlled
555     in the same directory as this \LaTeX{} document.)
556    
557     At this point I agree with the
558     compiler manufacturer
559     that there is a tradeoff between size and speed (it seems
560     nearly impossible to get both).
561    
562     In any reimplementation of this algorithm, will probably need to choose between
563     size and speed. I believe there is some possibility to reduce the reimplementation
564     from 153 bytes, but not down to 103 bytes.
565    
566     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
567     \end{document}
568     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
569     % $Log: cpu08div16by16a.tex,v $
570     % Revision 1.12 2007/05/26 18:25:46 dashley
571     % Edits to incorporate final thoughts (for now) and test code results.
572     %
573     % Revision 1.11 2007/05/21 16:49:40 dashley
574     % \bdiv functionality as suggested by Robin Fairbairns added.
575     %
576     % Revision 1.10 2007/05/18 15:53:59 dashley
577     % Edits.
578     %
579     % Revision 1.9 2007/05/18 02:02:56 dashley
580     % Edits.
581     %
582     % Revision 1.8 2007/05/18 02:02:10 dashley
583     % Edits.
584     %
585     % Revision 1.7 2007/05/18 01:40:20 dashley
586     % Edits.
587     %
588     % Revision 1.6 2007/05/18 00:15:56 dashley
589     % Edits.
590     %
591     % Revision 1.5 2007/05/17 15:53:57 dashley
592     % Edits.
593     %
594     % Revision 1.4 2007/05/17 14:46:34 dashley
595     % Edits.
596     %
597     % Revision 1.3 2007/05/17 04:15:22 dashley
598     % Edits.
599     %
600     % Revision 1.2 2007/05/17 03:50:48 dashley
601     % Edits.
602     %
603     % Revision 1.1 2007/05/17 03:09:06 dashley
604     % Initial checkin.
605     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

dashley@gmail.com
ViewVC Help
Powered by ViewVC 1.1.25