1 |
%$Header$ |
2 |
|
3 |
\chapter[\ccilzeroshorttitle{}]{\ccilzerolongtitle{}} |
4 |
|
5 |
\label{ccil0} |
6 |
|
7 |
\beginchapterquote{``If our ancestors had invented arithmetic by counting with |
8 |
their two fists or their eight fingers, instead of their |
9 |
ten `digits', we would never have to worry about |
10 |
writing binary-decimal conversion routines. |
11 |
(And we would perhaps never have learned as much about |
12 |
number systems.)''} |
13 |
{Donald E. Knuth, \cite[p. 319]{bibref:b:knuthclassic2ndedvol2}} |
14 |
|
15 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
16 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
17 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
18 |
\section{Introduction} |
19 |
%Section tag: INT0 |
20 |
\label{ccil0:sint0} |
21 |
|
22 |
Low-cost microcontrollers have no support for floating-point arithmetic, |
23 |
and so integer arithmetic and fixed-point arithmetic are used nearly exclusively |
24 |
in embedded systems. The ability to implement integer arithmetic |
25 |
economically is a critical skill in the development of embedded |
26 |
systems. |
27 |
|
28 |
Integer arithmetic algorithms are critically important in embedded |
29 |
systems for the following reasons: |
30 |
|
31 |
\begin{itemize} |
32 |
\item Mistakes in the implementation of arithmetic are frequently |
33 |
responsible for product problems. (Mistakes are not confined |
34 |
to obvious errors---errors such as filters which do not converge |
35 |
on their input are also responsible for product problems.) |
36 |
\item Floating-point arithmetic is not available or ill-advised |
37 |
for nearly all small embedded systems for the following reasons: |
38 |
\begin{itemize} |
39 |
\item Low-cost microcontrollers do not possess hardware support for |
40 |
floating-point arithmetic. |
41 |
\item Implementation of floating-point arithmetic in software is |
42 |
computationally expensive. |
43 |
\item Implementation of floating-point arithmetic in software may |
44 |
require large floating-point libraries, typically consuming |
45 |
1K-4K of ROM. |
46 |
\item Safety-critical software standards typically prohibit the |
47 |
use of floating-point arithmetic. |
48 |
\end{itemize} |
49 |
\item Integer arithmetic algorithms (other than addition and subtraction) |
50 |
are quite tedious and error-prone for a software developer to design, implement, and |
51 |
unit test. The implementation of such algorithms represents |
52 |
cost and risk. Cost and risk benefits are achieved if the algorithms in detail are |
53 |
available in advance (thus precluding design activities), or |
54 |
better yet if ready-to-use integer algorithm libraries are available. |
55 |
\end{itemize} |
56 |
|
57 |
This chapter describes the more fundamental principles and algorithms |
58 |
(representation, fixed-point arithmetic, treatment of overflow, comparison, |
59 |
addition, subtraction, multiplication, and division). A section |
60 |
(Section \ref{ccil0:smim0}) is also included |
61 |
on miscellaneous mappings involving integers which |
62 |
are not numerical in intent. |
63 |
Chapter %\cdtazeroxrefhyphen\cdtazerovolarabic{} |
64 |
TBD |
65 |
describes more complicated |
66 |
integer algorithms and techniques (discrete-time operations |
67 |
such as filtering, integration, and differentiation as well as more |
68 |
complex functions such as square root). The split between these two chapters |
69 |
is arbitrary; and in fact the material could have been divided differently |
70 |
or combined. |
71 |
|
72 |
Treatment of the topics in this chapter is largely in accordance with |
73 |
Knuth \cite{bibref:b:knuthclassic2ndedvol2}. The principal issues in |
74 |
the implementation of integer algorithms are: |
75 |
|
76 |
\begin{itemize} |
77 |
\item \textbf{How to use the arithmetic [or other] instructions provided by the machine to |
78 |
operate on larger operands.} Microcontrollers typically provide arithmetic |
79 |
instructions (comparison, shifting, addition, subtraction, and often but not |
80 |
always multiplication and/or division) that operate on 8-bit or 16-bit integers. |
81 |
A key question |
82 |
is how small-operand instructions ``scale up''---that is, if and how they can |
83 |
be used to assist in the implementation of integer arithmetic for much larger |
84 |
operands. |
85 |
\item \textbf{The order of the algorithm involved.} The order of algorithms |
86 |
is a complicated issue when applied to microcontroller work. Many sophisticated |
87 |
algorithms have a breakpoint below which they are less economical than |
88 |
an inferior algorithm. Some applications (such as generating cryptographic keys |
89 |
when integers thousands of bits long must be tested for primality) will |
90 |
benefit from sophisticated algorithms becuase the operand sizes are large enough |
91 |
to pass any such breakpoints. However, in microcontroller work, the need to manipulate |
92 |
integers longer than 64 bits is very rare; thus, the breakpoints that indicate the |
93 |
use of more sophisticated algorithms may not be reached. In microcontroller work, |
94 |
depending on the operand sizes, there are circumstances in which an |
95 |
$O(n^2)$ algorithm may be preferable to an $O(\log n)$ algorithm. Generally, |
96 |
the order of algorithms must be balanced against operand sizes. |
97 |
\end{itemize} |
98 |
|
99 |
We do diverge from Knuth in some areas. The most prominent divergence is |
100 |
in the proofs offered for some important theorems and lemmas. Knuth |
101 |
employs contrapositive proof formats in many circumstances, whereas we prefer |
102 |
to use linear proofs that are more understandable to engineers and microcontroller |
103 |
software developers. |
104 |
|
105 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
106 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
107 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
108 |
\section[Paradigms And Principles] |
109 |
{Paradigms And Principles Of Microcontroller Arithmetic} |
110 |
%Section tag: PPM0 |
111 |
\label{ccil0:sppm0} |
112 |
|
113 |
How should one think about microcontroller arithmetic? What principles |
114 |
guide us in its design and implementation? In this section, |
115 |
we provide some general principles and paradigms of thought. |
116 |
|
117 |
|
118 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
119 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
120 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
121 |
\subsection{Microcontroller Arithmetic As An Accident Of Silicon Design} |
122 |
%Subsection tag: MAS0 |
123 |
\label{ccil0:sppm0:smas0} |
124 |
|
125 |
In chapters |
126 |
\cfryzeroxrefhyphen{}\ref{cfry0}, |
127 |
\ccfrzeroxrefhyphen{}\ref{ccfr0}, |
128 |
and |
129 |
\cratzeroxrefhyphen{}\ref{crat0} |
130 |
we consider rational approximation, |
131 |
both in the form $h/k$ and $h/2^q$. Both forms of rational approximation |
132 |
tend to be effective because we know that all modern processors possess |
133 |
shift instructions, most possess integer multiply instructions, and many |
134 |
possess integer divide instructions. In other words, the design |
135 |
of the machine instruction set drives the strategies for implementation |
136 |
of arithmetic, and makes some strategies attractive. |
137 |
|
138 |
Similarly, the observation that all microcontrollers provide instructions |
139 |
for integer arithmetic creates the attractiveness of fixed-point arithmetic. |
140 |
|
141 |
Thus, we might view our approaches to microcontroller arithmetic as |
142 |
an ``accident'' of silicon design, or as being driven by silicon |
143 |
design. |
144 |
|
145 |
Generally, we seek to determine the best way to use the primitive |
146 |
operations provided by the machine (the instruction set) to |
147 |
accomplish the mappings of interest. |
148 |
|
149 |
The ``classic'' algorithms |
150 |
presented by Knuth |
151 |
Knuth (\cite[pp. 265-284]{bibref:b:knuthclassic2ndedvol2}) are especially |
152 |
designed to use the ``small'' addition, subtraction, multiplication, and |
153 |
division provided by the machine to add, subtract, multiply, and divide arbitrarily |
154 |
large integers. In |
155 |
\cite[pp. 265-266]{bibref:b:knuthclassic2ndedvol2}) Knuth writes: |
156 |
|
157 |
\begin{quote} |
158 |
\emph{The most important fact to understand about extended-precision numbers |
159 |
is that they may be regarded as numbers written in radix-$w$ notation, |
160 |
where $w$ is the computer's word size. For example, an integer that |
161 |
fills 10 words on a computer whose word size is $w=10^{10}$ has 100 |
162 |
decimal digits; but we will consider it to be a 10-place number to |
163 |
the base $10^{10}$. This viewpoint is justified for the same reason |
164 |
that we may convert, say, from binary to hexadecimal notation, |
165 |
simply by grouping the bits together.} |
166 |
|
167 |
\emph{In these terms, we are given the following primitive operations to work with:} |
168 |
|
169 |
\begin{itemize} |
170 |
\item \emph{a$_0$) addition or subtraction of one-place integers, giving a one-place |
171 |
answer and a carry;} |
172 |
\item \emph{b$_0$) multiplication of a one-place integer by another one-place integer, |
173 |
giving a two place answer;} |
174 |
\item \emph{c$_0$) division of a two-place integer by a one-place integer, |
175 |
provided that the quotient is a one-place integer, and yielding |
176 |
also a one-place remainder.} |
177 |
\end{itemize} |
178 |
|
179 |
\noindent{}\emph{By adjusting the word size, if necessary, nearly all computers |
180 |
will have these three operations available; so we will construct algorithms |
181 |
(a), (b), and (c) mentioned above in terms of the primitive operations |
182 |
(a$_0$), (b$_0$), and (c$_0$).} |
183 |
|
184 |
\emph{Since we are visualizing extended-precision integers as base $b$ numbers, it is |
185 |
sometimes helpful to think of the situation when $b = 10$, and to imagine |
186 |
that we are doing the arithmetic by hand. Then operation (a$_0$) is analogous |
187 |
to memorizing the addition table; (b$_0$) is analogous to memorizing the |
188 |
multiplication table, and (c$_0$) is essentially memorizing the multiplication |
189 |
table in reverse. The more complicated operations (a), (b), (c) on |
190 |
high-precision numbers can now be done using simple addition, subraction, |
191 |
multiplication, and long-division procedures that children are taught |
192 |
in elementary school.} |
193 |
\end{quote} |
194 |
|
195 |
The critical issue for implementation of integer arithmetic with large operands |
196 |
is how to use small-operand instructions to operate on larger operands---in other words, |
197 |
how to ``scale up'' the capability provided by the instruction set. |
198 |
|
199 |
|
200 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
201 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
202 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
203 |
\subsection{Microcontroller Arithmetic As A Mapping From Quantized Domain To |
204 |
Quantized Range} |
205 |
%Subsection tag: MAM0 |
206 |
\label{ccil0:sppm0:smam0} |
207 |
|
208 |
Microcontroller software accepts inputs which are quantized. In nearly all cases, |
209 |
this involves a mapping from $\vworkrealset$ to $\vworkintset$. Often, because |
210 |
microcontroller products are optimized for cost, the quantization hardware |
211 |
delivers quite poor precision, frequently less than 8 bits. |
212 |
|
213 |
When a quantized input is accepted, it defines an inquality. Knowledge of |
214 |
the quantized input (an integer) confines the actual input (a real |
215 |
number, before |
216 |
quantization) to an interval. With a low-cost hardware design, the |
217 |
interval can be fairly large. Usually, by adding cost, the |
218 |
interval can be made smaller. |
219 |
|
220 |
Microcontroller outputs tend to be quantized as well, so it is |
221 |
accurate to also characterize outputs as integers. For example, a PWM signal |
222 |
generated by a microcontroller or the output of a D/A converter is |
223 |
controlled by data that is an integer. Like inputs, often the ``granularity'' |
224 |
with which outputs can be controlled is quite coarse---again, 8 bits or |
225 |
less is not uncommon. |
226 |
|
227 |
Thus, we may view microcontroller software as a mapping from poor-quality |
228 |
inputs to poor-quality outputs. |
229 |
|
230 |
In such a framework, where the nature of inputs and outputs introduces |
231 |
substantial error, it is imperative not to introduce additional error |
232 |
in computer arithmetic. In other words, given inputs which are |
233 |
integers, the responsibility of the software is to choose the best |
234 |
integers as outputs. Usually this means that calculations should be |
235 |
devised so as not to lose any information (i.e. |
236 |
not to lose remainders, for example). Losing information is usually |
237 |
equivalent to not being able to make the most correct mapping from input |
238 |
to output. ``Lossy'' arithmetic can degrade the performance of a system, |
239 |
since poor arithmetic may compound an inexpensive hardware design. |
240 |
|
241 |
|
242 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
243 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
244 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
245 |
\subsection{Microcontroller Arithmetic As A Simulation Of Continuous Controllers} |
246 |
%Subsection tag: MAE0 |
247 |
\label{ccil0:sppm0:smae0} |
248 |
|
249 |
Control systems have not always employed digital controllers. |
250 |
Many books and web sites (see \cite{bibref:w:historycontrol01}, for example) |
251 |
discuss the historical development of feedback control. Controllers |
252 |
have not historically been digital, or even electronic. |
253 |
Early controllers for governing steam devices or windmills were |
254 |
ultimately mechanical, and relied upon inertia or other physical |
255 |
properties. It is possible to realize abstract notions |
256 |
(integrators, differentiators, gains) using hydraulic systems or other mechanical systems; |
257 |
and in fact hydraulic feedback controllers were used in early rockets |
258 |
and aircraft. Very naturally, abstract notions (integrators, |
259 |
differentiators, gains) can be implemented using analog |
260 |
electronic components. The most common implementations involve |
261 |
operational amplifiers, and the behavior of such implementations comes |
262 |
very close to the ideal mathematical models. |
263 |
|
264 |
Mechanical, hydraulic, and non-digital electronic controllers have |
265 |
one very desirable characteristic---\emph{clipping}. If, for example, |
266 |
one provides an analog differentiator with a $dV/dt$ which |
267 |
is too large, the output that the differentiator can |
268 |
provide is limited, usually by the supply voltage available to an operational |
269 |
amplifier. |
270 |
The differentiator \emph{must} clip. |
271 |
|
272 |
Clipping often leads to behavior which is close to what |
273 |
intuition would expect (i.e. we would present |
274 |
clipping as an occasional advantage). For example, if an input to |
275 |
an analog control system suffers a failure, the behavior |
276 |
the of the controller is limited, as is its internal |
277 |
state. Similarly, when the |
278 |
input is restored, the controller will usually recover |
279 |
in a reasonable time because the |
280 |
state of the controller (typically maintained in capacitors) is limited |
281 |
in the magnitude it can attain. |
282 |
|
283 |
We might view a digital controller as an emulation of |
284 |
an analog controller. We may want to cause the |
285 |
controller to have limits (i.e. rails) internally, for |
286 |
example to prevent excessive integrator ``windup''. We discuss |
287 |
this further in Section \ref{ccil0:sode0}. |
288 |
|
289 |
|
290 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
291 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
292 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
293 |
\section{Practical Design Issues} |
294 |
%Section tag: PDI0 |
295 |
\label{ccil0:spdi0} |
296 |
|
297 |
In this section, we consider practical issues surrounding the design |
298 |
and construction of a set of integer arithmetic subroutines. |
299 |
In practice, such a collection of subroutines is likely to be |
300 |
arranged into a library. The purpose of the library would be to |
301 |
free the clients (or callers) of the library from the complexity of |
302 |
large integer calculations. |
303 |
|
304 |
The design decisions surrounding the construction of a library vary in |
305 |
the objectivity with which they can be approached. Some design decisions |
306 |
(such as the best mechanism for passing parameters) can be approached |
307 |
rigorously because the measures of goodness are unequivocal (minimal ROM consumption |
308 |
or execution time, for example). However, other design decisions, particulary the |
309 |
decision of the exact nature of the interface between an arithmetic library and |
310 |
its clients, are more subjective. One size does \emph{not} fit all. |
311 |
|
312 |
|
313 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
314 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
315 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
316 |
\subsection{Parameter Passing And Temporary Storage Mechanisms} |
317 |
%Subsection tag: PPM0 |
318 |
\label{ccil0:spdi0:sppm0} |
319 |
|
320 |
In small microcontroller work, the desire to save ROM and execution time |
321 |
may lead to inelegant software construction. Because an arithmetic library |
322 |
used in microcontroller work may be called from many different places |
323 |
throughout ROM, serious thought should be given to optimizing the |
324 |
parameter passing mechanisms, even perhaps at the expense of elegance. |
325 |
The way in which the arithmetic library allocates temporary storage is |
326 |
also a point of concern, because the most elegant way of allocating temporary |
327 |
storage (on the stack) may either not be feasible (because of the possibility |
328 |
of stack overflow) or may not be efficient (because the addressing modes of |
329 |
the machine make data on the stack inefficient to address). In this section |
330 |
we discuss both parameter passing and temporary storage mechanisms. |
331 |
|
332 |
In the remainder of the discussion, we make the following assumptions |
333 |
about software architecture. |
334 |
|
335 |
\begin{enumerate} |
336 |
\item \textbf{The arithmetic library need not be re-entrant.} |
337 |
Most ``small'' microcontroller software loads use a non-preemptive |
338 |
scheduling paradigm, so this is a reasonable assumption. We also |
339 |
make the reasonable assumption that ISR's may not make calls into |
340 |
the arithmetic library. |
341 |
\item \textbf{Dynamic memory allocation, other than on the stack, |
342 |
is not allowed by the software architecture.} This is also |
343 |
a reasonable assumption in ``small'' microcontroller software. |
344 |
\end{enumerate} |
345 |
|
346 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
347 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
348 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
349 |
\subsubsection{Parameter Passing Mechanisms} |
350 |
%Subsubsection tag: PPM0 |
351 |
\label{ccil0:spdi0:sppm0:sppm0} |
352 |
|
353 |
If an arithmetic library exists in a microcontroller software load, |
354 |
it may be called many times throughout ROM. Thus, the parameter |
355 |
passing mechanisms chosen may have a large effect on ROM consumption |
356 |
(due to the setup required for each subroutine call multiplied by |
357 |
many instances throughout ROM) and execution time |
358 |
(because in microcontroller software longer instructions nearly always |
359 |
require more time). Because of the criticality of ROM consumption, |
360 |
parameter passing mechanisms that lack elegance may be attractive. |
361 |
|
362 |
In the category of parameter passing, we also include the way in which |
363 |
return value(s) are passed back to the caller. |
364 |
|
365 |
The following parameter-passing mechanisms may be employed: |
366 |
|
367 |
\begin{enumerate} |
368 |
\item \textbf{Pass by value as storage class \emph{automatic}.} |
369 |
The most common scenario is that the arithmetic |
370 |
library is written in assembly-language to be called from |
371 |
`C', and so the assembly-language subroutines must adhere to the |
372 |
parameter-passing conventions used by the compiler. |
373 |
This usually means that the entire input or output value |
374 |
will be passed in CPU registers or on the stack. Somewhat rarely, |
375 |
a compiler will pass parameters in static locations.\footnote{The |
376 |
usual reason for a `C' compiler to pass parameters in static locations |
377 |
is because the instruction set of the machine was not designed for |
378 |
higher-level languages, and references to [usually \emph{near}] memory |
379 |
are cheaper than stack references. Such compilers also typically |
380 |
analyze the calling tree of the program where possible and use this |
381 |
information to overlay the parameter-passing memory areas of |
382 |
subroutines that cannot be active simultaneously. Without the ability |
383 |
to analyze the calling tree and make overlay decisions based on it, |
384 |
memory would be exhausted, because each subroutine would need to have its |
385 |
own static storage for parameters and local variables.} |
386 |
\item \textbf{Pass by reference.} |
387 |
Typically, it is convenient to pass pointer(s) to area(s) of memory |
388 |
containing input operands, and also a pointer to an area of memory |
389 |
owned by the caller which is written with the result by the arithmetic subroutine. |
390 |
The efficiency of this approach depends on the compiler and the instruction |
391 |
set of the machine. If the instruction set of the machine cannot |
392 |
make effective use of pointers or stack frames, an arithmetic subroutine |
393 |
might be constructed so that it first copies the operands to a static area of memory |
394 |
reserved for the arithmetic library, then performs the necessary arithmetic |
395 |
operations on the operands in the static area, |
396 |
then copies the result(s) back to the area owned by the caller. |
397 |
\item \textbf{Pass by common data block.} |
398 |
In some cases, it may be preferable to reserve a block of memory in which to |
399 |
pass parameters to arithmetic library functions, and from which to retrieve |
400 |
results after an arithmetic library function returns. The allocation of such |
401 |
a static memory block may be done manually\footnote{Note to self: need to |
402 |
include gentleman's agreements on memory usage in my note on software architecture.} |
403 |
or automatically (by development tools which can analyze the function calling tree and |
404 |
manage the overlaying). |
405 |
\end{enumerate} |
406 |
|
407 |
|
408 |
|
409 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
410 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
411 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
412 |
\subsubsection{Temporary Storage Mechanisms} |
413 |
%Subsubsection tag: TSM0 |
414 |
\label{ccil0:spdi0:sppm0:stsm0} |
415 |
|
416 |
Need to indicate clearly on section on software architectures the |
417 |
primary temporary storage mechanisms: |
418 |
|
419 |
\begin{itemize} |
420 |
\item Stack. |
421 |
\item Memory block with overlay functionality. |
422 |
\end{itemize} |
423 |
|
424 |
Need to expand software architecture section to cover this, so don't |
425 |
discuss here. |
426 |
|
427 |
|
428 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
429 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
430 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
431 |
\subsection{Reporting Of Overflow, Underflow, And Domain Errors} |
432 |
%Subsection tag: OUD0 |
433 |
\label{ccil0:spdi0:soud0} |
434 |
|
435 |
Long integer data types used in microcontroller work are typically |
436 |
of a static size (they cannot grow in size in as operations are |
437 |
performed on them). The reason for the typical static sizes is that |
438 |
dynamic allocation (except for allocation and |
439 |
deallocation on the stack as subroutines |
440 |
are called and return) is rarely used in small microcontroller work. |
441 |
It will come about in the normal usage of an integer arithmetic |
442 |
library that an attempt will be made to operate on integers in |
443 |
a way which generates an overflow, generates an |
444 |
underflow, or |
445 |
represents a domain error (division by zero or |
446 |
square root of a negative integer, for example). |
447 |
An important design decision is how such normal exceptions should be |
448 |
handled. |
449 |
|
450 |
Possible design decisions in this area are: |
451 |
|
452 |
\begin{enumerate} |
453 |
\item \label{enum:ccil0:spdi0:soud0:01:01} |
454 |
\textbf{To design arithmetic subroutines so that exceptions are not |
455 |
possible.} |
456 |
For example, multiplying an $m$-word integer by an $n$-word integer |
457 |
will always generate an integer that will fit within $m+n$ words. |
458 |
If a multiplication subroutine is designed so that the caller must |
459 |
provide an $m$-word operand and an $n$-word operand and a pointer to |
460 |
an $(m+n)$-word area of memory for the result, an overflow cannot occur. |
461 |
Such a design decision essentially pushes overflow detection back up to |
462 |
the callers of arithmetic subroutines. |
463 |
\item \label{enum:ccil0:spdi0:soud0:01:02} |
464 |
\textbf{To design arithmetic subroutines so that exceptions are possible, |
465 |
but not to detect the exceptions, thus providing an implementation that |
466 |
will produce incorrect results with some operand data values.} |
467 |
For example, if an arithmetic subroutine is designed to add an $m$-word |
468 |
operand to another $m$-word operand to produce an $m$-word result, overflow |
469 |
is possible. A design decision to fail to detect such exceptions pushes |
470 |
the responsibility up to the callers of the arithmetic subroutines. |
471 |
Callers must devise a method for not calling arithmetic subroutines |
472 |
with data values that will cause an exception, or else to detect an exception |
473 |
when it has occurred. |
474 |
\item \label{enum:ccil0:spdi0:soud0:01:03} |
475 |
\textbf{To ``rail'' the result in response to an exception.} |
476 |
It was stated earlier that analog control system functional blocks |
477 |
built with operational |
478 |
amplifiers typically have an output which cannot go beyond the |
479 |
supply rails. One may implement similar behavior in arithmetic subroutines. |
480 |
In an addition subroutine which adds two $m$-word operands to produce an |
481 |
$m$-word result (with each word having $w$ bits), it would be natural to |
482 |
return $2^{mw}-1$ in the event of an overflow in a positive direction and |
483 |
$-2^{mw}$ in the event of an overlfow in a negative direction. Note that |
484 |
the caller will not be able to distinguish a ``rail'' value which represents |
485 |
a valid result from a ``rail'' value substituted to indicate an exception. |
486 |
\item \label{enum:ccil0:spdi0:soud0:01:04} |
487 |
\textbf{To reserve special result data values to indicate exceptions.} |
488 |
Depending on the arithmetic subroutine being implemented, it may be possible |
489 |
to reserve certain result data values to indicate exceptions. This approach |
490 |
is often awkward, as most mathematical subroutines are naturally defined so that |
491 |
all bit patterns in the memory reserved for the result are valid numbers. |
492 |
Additionally, with long result data values, it may not be economical to |
493 |
compare the result against the reserved exception values. Thus, this is seldom |
494 |
an optimal way to deal with exceptions. |
495 |
|
496 |
Additionally, if this approach is employed, the semantics of how exception |
497 |
values combine with other exception values and data values must be decided. |
498 |
\item \label{enum:ccil0:spdi0:soud0:01:05} |
499 |
\textbf{To return exception codes to the caller separate from the result data.} |
500 |
In the `C' language, pointers are often used to supply an arithmetic subroutine |
501 |
with the input operands and to provide the arithmetic subroutine with a location |
502 |
(which belongs to the caller) in which to store the result data. Thus, the return |
503 |
value of the arithmetic subroutine (normally assigned through the subroutine name) |
504 |
is often available to return exception codes. For example, |
505 |
a `C' function may be defined as |
506 |
|
507 |
\begin{verbatim} |
508 |
unsigned char int128_add(INT128 *result, |
509 |
INT128 *arg1, |
510 |
INT128 *arg2), |
511 |
\end{verbatim} |
512 |
|
513 |
leaving the returned \texttt{unsigned char} value available to return |
514 |
exception information. Note that this arrangement has the following advantages: |
515 |
|
516 |
\begin{enumerate} |
517 |
\item All bit patterns in the result data memory area area available |
518 |
as data bit patterns. |
519 |
\item The exception data is very economical to test, because it is placed |
520 |
in a machine-native data type. |
521 |
\item The exception data can easily be discarded or by the caller if desired. |
522 |
\item All decisions about how to handle exceptions are left to the caller. |
523 |
\end{enumerate} |
524 |
\item \label{enum:ccil0:spdi0:soud0:01:06} |
525 |
\textbf{To maintain exception data with each result integer.} |
526 |
It is possible to reserve bits for exception information which are part of the |
527 |
long integer data type. This exception state essentially conveys |
528 |
\emph{NaN}\footnote{\emph{N}ot \emph{a} \emph{n}umber.} information---integers with |
529 |
exception information set are not true numbers, but rather they are different from the |
530 |
true result in some way. As with (\ref{enum:ccil0:spdi0:soud0:01:04}), the |
531 |
semantics of how to combine NaN values and NaN values with ordinary non-NaN numbers |
532 |
must be defined. |
533 |
\item \label{enum:ccil0:spdi0:soud0:01:07} |
534 |
\textbf{To maintain a global exception state variable.} |
535 |
A variable or set of variables can be reserved which hold the |
536 |
exception information, if any, from the most recent call to |
537 |
a function in the arithmetic library. |
538 |
|
539 |
To save CPU cycles, the arithmetic library can |
540 |
be designed so that it will assign the global exception state variable only if an |
541 |
exception occurs---the caller then has the responsibility of clearing the exception |
542 |
state variable before making any call into the arithmetic library where the |
543 |
exception result is of interest. The interface between the caller and the library |
544 |
can be further optimized if the library only OR's data into the variable containing |
545 |
the exception state. Using this optimization, the caller can clear the exception state |
546 |
variable, make several calls into the arithmetic library, and then retrieve a meaningful |
547 |
exception state variable value summarizing several arithmetic operations. |
548 |
\item \label{enum:ccil0:spdi0:soud0:01:08} |
549 |
\textbf{Hybrid approaches.} |
550 |
The approaches (\ref{enum:ccil0:spdi0:soud0:01:01}) |
551 |
through (\ref{enum:ccil0:spdi0:soud0:01:07}) can be combined. |
552 |
For example, approach (\ref{enum:ccil0:spdi0:soud0:01:03}) |
553 |
might be combined with approach (\ref{enum:ccil0:spdi0:soud0:01:05}) |
554 |
so that exceptions are ``railed'', but the caller may also be made |
555 |
aware that an exception has occured. Many hybrid approaches are possible. |
556 |
\end{enumerate} |
557 |
|
558 |
|
559 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
560 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
561 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
562 |
\subsection{Semantics Of Combining Overflow, Underflow, And Domain Errors} |
563 |
%Subsection tag: CMB0 |
564 |
\label{ccil0:spdi0:scmb0} |
565 |
|
566 |
For control system arithmetic, some form of clipping as suggested |
567 |
in Section \ref{ccil0:sppm0:smae0} is probably the |
568 |
best approach. Definitely, an overflow should generate a result |
569 |
which is the largest representable integer, and an |
570 |
underflow should generate a result which is the smallest |
571 |
representable integer. |
572 |
|
573 |
In addition to treating an overflow by clipping, it may be |
574 |
advantageous to reserve a flag in the representation of a |
575 |
multiple-precision integer to record that an overflow has occured and been clipped. |
576 |
Some functions which accept the integer as input may be interested |
577 |
in the value of such a flag, where othere---perhaps most---may |
578 |
not. |
579 |
|
580 |
The correct course of action in the event of a domain error (such |
581 |
as division by zero) is less clear. It is noteworthy that in a |
582 |
normal control system, domain errors cannot occur (but overflows |
583 |
can). |
584 |
|
585 |
The best approach when a domain error is involved probably |
586 |
depends on the basis for the underlying calculation. For |
587 |
example, if integer division is used as part of a |
588 |
strategy for software ratiometric conversion, a value |
589 |
of zero in the denominator probably represents extreme electrical |
590 |
noise, and the most sane approach may be to replace the |
591 |
denominator by one. However, in other contexts it may be appropriate |
592 |
to think in terms of |
593 |
|
594 |
\begin{equation} |
595 |
\lim_{k \rightarrow 0^+} \frac{kn}{kd} |
596 |
\end{equation} |
597 |
|
598 |
\noindent{}or |
599 |
|
600 |
\begin{equation} |
601 |
\lim_{k \rightarrow 0^-} \frac{kn}{kd} . |
602 |
\end{equation} |
603 |
|
604 |
Put in other terms, there is not a clear ``one solution |
605 |
meets all needs'' approach to dealing with domain |
606 |
errors. |
607 |
|
608 |
As in the case of overflows, it may be advantageous to reserve a bit |
609 |
flag to signal that a domain error has occured and that the result |
610 |
is not valid or not reliable. Note that floating point chips |
611 |
(such as the 80x87) provide similar indications of domain errors. |
612 |
|
613 |
It may also be advantageous to adopt conventions for how |
614 |
overflow or domain error flags propagate through binary or |
615 |
unary operators. For example, if two numbers are multiplied, and |
616 |
one of the two has the overflow flag set, it may be wise set the |
617 |
overflow flag in the result. A scheme for how warning |
618 |
flags propagate may be beneficial. |
619 |
|
620 |
|
621 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
622 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
623 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
624 |
\subsection{Variable Versus Constant Subroutine Execution Time} |
625 |
%Subsection tag: VVC0 |
626 |
\label{ccil0:spdi0:svvc0} |
627 |
|
628 |
As a design goal of an embedded system, we seek to minimize the |
629 |
timing variability of software components. An arithmetic subroutine |
630 |
that with a high probability takes a short time to execute and with a |
631 |
low probability takes a long time to execute, and where the execution |
632 |
time is data dependent, is a serious risk. An embedded software product |
633 |
may pass all release testing, but then fail in the field because of |
634 |
specific data values used in calculations. |
635 |
|
636 |
A very conservative design goal would be to design every arithmetic subroutine |
637 |
to require exactly the same execution time, regardless of data values. |
638 |
This goal is not practical because machine instructions themselves |
639 |
usually have a variable execution time, particularly for multiplication |
640 |
and division instructions. A fallback goal would be to avoid |
641 |
large differences between minimum and maximum execution time, without |
642 |
increasing the maximum execution time. A very practical step to take |
643 |
(using division as an example) |
644 |
is to insert artifical delays into easily detectable exception cases (such as |
645 |
division by zero) so that the exception case takes as long as the |
646 |
minimum time for a division with valid operands. |
647 |
|
648 |
|
649 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
650 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
651 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
652 |
\section{Fixed-Point Arithmetic} |
653 |
%Section tag: FPA0 |
654 |
\label{ccil0:sfpa0} |
655 |
|
656 |
\emph{Fixed-point arithmetic}\index{fixed-point arithmetic} |
657 |
is a scheme for the representation |
658 |
of engineering quantities (conceptually real numbers with optional |
659 |
units) by integers so that calculations can be performed |
660 |
on these quantitites using [usually multiple-precision] |
661 |
integer arithmetic. |
662 |
|
663 |
In discussing fixed-point arithmetic, |
664 |
we must be careful to distinguish between the |
665 |
\emph{represented value} (the engineering quantity) |
666 |
and the \emph{representation} (the integer which represents |
667 |
the engineering quantity). In most cases, we must also |
668 |
be careful to devise a system to track the units of the |
669 |
represented values, as, especially with control systems, |
670 |
the units of represented values (due to integration and |
671 |
differentiation) can become very complex |
672 |
and mistakes are easy to make. |
673 |
|
674 |
Fixed-point arithmetic is the dominant paradigm of construction |
675 |
for calculations in small microcontroller systems. It may not be |
676 |
clear why this should be so or what advantages it offers [over |
677 |
floating-point arithmetic]. The reasons for |
678 |
this predominance are: |
679 |
|
680 |
\begin{itemize} |
681 |
\item Fixed-point calculations tend to be very efficient, because |
682 |
they make direct use of the integer arithmetic instructions in the |
683 |
microcontroller's instruction set. On the other hand, floating-point |
684 |
arithmetic operations tend to be much slower. |
685 |
|
686 |
\item Floating-point calculations typically require a floating-point |
687 |
library, which may consume at least several hundred bytes |
688 |
of ROM. |
689 |
|
690 |
\item Some safety-critical software standards prohibit the use of |
691 |
floating-point arithmetic because it can result in nebulous |
692 |
behavior. Fixed-point arithmetic avoids these concerns. |
693 |
\end{itemize} |
694 |
|
695 |
In order to carry out fixed-point arithmetic---that is, in order to |
696 |
operate on engineering quantities as integers---we |
697 |
require that the relationship between the |
698 |
represented value and the representation be of the form |
699 |
|
700 |
\begin{equation} |
701 |
\label{eq:ccil0:sfpa0:00} |
702 |
x = r_I u + \Psi, |
703 |
\end{equation} |
704 |
|
705 |
\noindent{}where $x \in \vworkrealset$ (possibly with units) is the represented |
706 |
value, $u \in \vworkintset$ is the representation, |
707 |
$r_I \in \vworkrealset$ is the scaling factor (possibly with |
708 |
units), and $\Psi \in \vworkrealset$ (possibly with units) is the offset. |
709 |
Note that the units of $r_I$, $\Psi$, and $x$ must match. |
710 |
|
711 |
We further require that $r_I \vworkdivides{} \Psi$\footnote{We \emph{are} |
712 |
aware that this is an abuse of nomenclature, as |
713 |
`$\vworkdivides$' (``divides'') is traditionally only applied to integers.} |
714 |
(i.e. that $\Psi$ be an |
715 |
integral multiple of $r_I$) |
716 |
so that the offset in the represented value corresponds to an integer |
717 |
in the representation. Without this restriction, we could not remove the |
718 |
offset from the representation with integer subtraction only. Note that |
719 |
we do \emph{not} require that $r_I$ or $\Psi$ be rational, although |
720 |
they must both be rational or both be irrational in order to satisfy |
721 |
(\ref{eq:ccil0:sfpa0:00}). |
722 |
|
723 |
In (\ref{eq:ccil0:sfpa0:00}), since we've required that $r_I \vworkdivides{} \Psi$, |
724 |
we can replace $\Psi$ by |
725 |
|
726 |
\begin{equation} |
727 |
\label{eq:ccil0:sfpa0:00b} |
728 |
\psi = \frac{\Psi}{r_I} |
729 |
\end{equation} |
730 |
|
731 |
\noindent{}to obtain |
732 |
|
733 |
\begin{equation} |
734 |
\label{eq:ccil0:sfpa0:01} |
735 |
x = r_I (u + \psi), |
736 |
\end{equation} |
737 |
|
738 |
\noindent{}where $\psi \in \vworkintset$ is the offset in representational |
739 |
counts. |
740 |
|
741 |
Note that we can also easily obtain the following relationships from the |
742 |
defining equations (\ref{eq:ccil0:sfpa0:00}), |
743 |
(\ref{eq:ccil0:sfpa0:00b}), and (\ref{eq:ccil0:sfpa0:01}). |
744 |
|
745 |
\begin{equation} |
746 |
\label{eq:ccil0:sfpa0:02} |
747 |
u = \frac{x - \Psi}{r_I} = \frac{x}{r_I} - \psi |
748 |
\end{equation} |
749 |
|
750 |
\begin{equation} |
751 |
\label{eq:ccil0:sfpa0:03} |
752 |
\psi = \frac{\Psi}{r_I} |
753 |
\Longleftrightarrow |
754 |
\Psi = r_I \psi |
755 |
\Longleftrightarrow |
756 |
r_I = \frac{\Psi}{\psi} |
757 |
\end{equation} |
758 |
|
759 |
|
760 |
For example, in a 16-bit signed integer (which inherently may range |
761 |
from -32768 to 32767 inclusive), one might used a fixed-point |
762 |
representation of 100 integer counts per $^\circ$C ($r_I = 0.01 \; ^\circ$C) |
763 |
with an offset of 100$^\circ$C ($\Psi$ = 100$^\circ$C or equivalently |
764 |
$\psi$ = 10000), giving |
765 |
a representational range from -227.68$^\circ$C to 427.67$^\circ$C inclusive. |
766 |
|
767 |
If $r_I = 2^N, \; N \in \vworkintset$, then the radix point of the represented |
768 |
value is positioned |
769 |
between two bits of the representation---this arrangement may have computational |
770 |
advantages |
771 |
if the whole and fractional parts of the represented value need to be separated |
772 |
easily in the representation. Note also that if $r_I = 2^{WN}$ where $W$ is the |
773 |
machine integer size of the computer (in bits), then the radix point of the represented value |
774 |
occurs between two addressable machine integers of the representation, which can be convenient. |
775 |
However, we do not require for our definition of a fixed-point representation that |
776 |
$r_I$ be an integral power of two; and in fact we do not even require by our |
777 |
definition that $r_I$ be rational. Note that in general our definition above is the |
778 |
weakest set of conditions so that real-world engineering values can be manipulated |
779 |
using integer operations performed upon the representation. |
780 |
|
781 |
|
782 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
783 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
784 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
785 |
\section{Representation Of Integers} |
786 |
%Section Tag: ROI0 |
787 |
\label{ccil0:sroi0} |
788 |
|
789 |
In this section, we discuss common representations of integers, both |
790 |
\emph{machine} integers and \emph{synthetic long} integers. |
791 |
By \emph{representation} we mean the mapping between the abstract |
792 |
mathematical notion of an integer and the way it is stored in the computer |
793 |
(voltage levels and the programming model). |
794 |
Although |
795 |
in Knuth's development of integer arithmetic |
796 |
\cite{bibref:b:knuthclassic2ndedvol2} |
797 |
it is assumed that |
798 |
integers may be represented in any base, we don't require such generality |
799 |
in practice and in this work we confine ourselves for the most |
800 |
part to $b=2^n$, and often to $n = 8i$, $i \in \vworkintsetpos$. The assumption |
801 |
of $b=2^n$ characterizes all modern digital computers, and we feel comfortable |
802 |
making this assumption throughout the work. However, the assumption |
803 |
$n = 8i$, $i \in \vworkintsetpos$ does not hold universally, and so we |
804 |
most often do not make this assumption. |
805 |
|
806 |
By \emph{machine} integer, we mean an integer upon which the computer |
807 |
can operate in a single instruction (such as to add, increment, |
808 |
load, store, etc.). For most microcontrollers, machine integers are |
809 |
either 8 or 16 bits in size. The representation of a machine integer |
810 |
is designed and specified by the microcontroller manufacturer. In principle, |
811 |
nothing would prevent a microcontroller manufacturer from devising |
812 |
and implementing a novel way of representing machine integers and supporting |
813 |
this novel representation with an instruction set. However, in |
814 |
practice, all machine integers are either simple unsigned integers or two's |
815 |
complement signed integers. In addition to the efficiency of these |
816 |
representations with respect to the design of digital logic, these representations |
817 |
are so standard and so pervasive that they are universally tacitly assumed. |
818 |
For example, ``\texttt{if (i \& 0x1)}'' is an accepted C-language |
819 |
idiom for ``if $i$ is odd'', and it is expected that such code will |
820 |
work on all platforms. |
821 |
|
822 |
By \emph{synthetic long} integer, we mean an integer of |
823 |
arbitrary\footnote{By \emph{arbitrary}, we do not necessarily mean that |
824 |
the integer can grow to be arbitrarily large in magnitude, or that |
825 |
its maximum size is not known at compile time. We mean \emph{arbitrarily} |
826 |
longer than a machine integer. Multiple-precision arithmetic libraries |
827 |
can be divided into two classes---those that fix the size of the integers |
828 |
at compile time, and those that use dynamic allocation and allow integers to |
829 |
grow as needed at run time. The former category |
830 |
is normally used for small microcontroller work, whereas the latter category |
831 |
(such as the GNU MP Library \cite{bibref:s:gnumultipleprecisionarithmeticlibrary}) |
832 |
is normally used in scientific and number theoretic calculation and on |
833 |
more powerful platforms than microcontrollers. The representational concepts |
834 |
we present here apply to both categories.} |
835 |
length that is formed by concatenating machine integers. There is some |
836 |
subjectivity in deciding the representation of multiple-precision integers, |
837 |
and we discuss in the subsections |
838 |
\ref{ccil0:sroi0:srou0} and |
839 |
\ref{ccil0:sroi0:sros0} which immediately follow. |
840 |
|
841 |
|
842 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
843 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
844 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
845 |
\subsection{Representation Of Unsigned Integers} |
846 |
%Subsection Tag: ROU0 |
847 |
\label{ccil0:sroi0:srou0} |
848 |
|
849 |
Unsigned machine integers are always represented as an ordered |
850 |
array of bits (Figure \ref{fig:ccil0:sroi0:srou0:00}). For an |
851 |
$m$-bit unsigned integer $u$, we denote these bits $u_{[m-1]}$ through |
852 |
$u_{[0]}$, with $u_{[m-1]}$ the most significant bit. The value |
853 |
of $u$ is the sum of the values of each bit multiplied |
854 |
by the power of 2 it represents: |
855 |
|
856 |
\begin{figure} |
857 |
\centering |
858 |
\includegraphics[width=4.6in]{c_cil0/uintrep1.eps} |
859 |
\caption{Representation Of Unsigned Machine Integers} |
860 |
\label{fig:ccil0:sroi0:srou0:00} |
861 |
\end{figure} |
862 |
|
863 |
\begin{equation} |
864 |
\label{eq:ccil0:sroi0:srou0:00} |
865 |
u = \sum_{i=0}^{m-1} u_{[i]} 2^i . |
866 |
\end{equation} |
867 |
|
868 |
In general, an $m$-bit unsigned integer can assume the values of |
869 |
0 through $2^m - 1$, so that |
870 |
|
871 |
\begin{equation} |
872 |
\label{eq:ccil0:sroi0:srou0:01} |
873 |
u \in \{0, \ldots{} , 2^m - 1 \} . |
874 |
\end{equation} |
875 |
|
876 |
Unsigned synthetic long integers are always represented as an array |
877 |
of unsigned machine integers. |
878 |
Consistent with the GMP \cite{bibref:s:gnumultipleprecisionarithmeticlibrary}, |
879 |
we call each element of the array a \emph{limb} and we call the size of |
880 |
each limb the \emph{limbsize}. This usage is very close to what Knuth |
881 |
calls the \emph{word}size $w$ in the excerpt presented in |
882 |
Section \ref{ccil0:sppm0:smas0}. |
883 |
|
884 |
Microcontroller processors are more likely than more powerful processors to have |
885 |
a non-orthogonal instruction set, and so the limbsize may not be consistent between |
886 |
operations in an arithmetic library. |
887 |
With some processors, the appropriate limbsize may vary depending on the operation being |
888 |
performed. |
889 |
For example, a microcontroller processor may be able to add two 16-bit integers to |
890 |
obtain a 16-bit result plus a carry, but only be able to multiply two 8-bit integers to |
891 |
obtain a 16-bit result (thus, the appropriate limbsize for addition may be |
892 |
16 bits while the appropriate limbsize for multiplication may be 8 bits). |
893 |
In such cases, it is usually most efficient to add using 16-bit limbs but |
894 |
multiply using 8-bit limbs. Adding using 8-bit limbs on a machine which will |
895 |
support 16-bit additions is not normally a good design decision---even if the |
896 |
machine supports an 8-bit addition instruction which is faster than the 16-bit addition |
897 |
instruction, $m/2$ 16-bit additions will nearly always be faster than |
898 |
$m$ 8-bit additions. Using different limbsizes within the same arithmetic library |
899 |
may require some consideration of alignment and |
900 |
endian issues, but these are implementation details |
901 |
which are easily solved. |
902 |
|
903 |
We view a synthetic long unsigned integer as an array of limbs (machine integers) |
904 |
of some size, and we agree that we will not address the array in any other way than |
905 |
by loading and storing limbs of this size.\footnote{Well \ldots{} not quite. |
906 |
In software for large computers (personal computers and workstations) with an |
907 |
orthogonal instruction set, we may be able to adhere to this rule. However, |
908 |
with microcontrollers, arithmetic libraries which are optimized |
909 |
may break this rule.} In particular, because |
910 |
computers may be ``big-endian'' or ``little-endian'', loading and storing |
911 |
smaller units than limbs may lead in |
912 |
a worst case to software defects or in a best case to non-portable code. |
913 |
|
914 |
Assume that $w$ is the number of bits in a limb. |
915 |
Notationally, we denote an unsigned |
916 |
synthetic long integer as an array of $m$ limbs |
917 |
$u_{m-1}$ through $u_0$, each containing $w$ bits, |
918 |
with $u_0$ the least significant machine integer. |
919 |
We may also define $b=2^w$ (consistent with Knuth's |
920 |
notation). |
921 |
The value of |
922 |
such a synthetic long machine integer is |
923 |
|
924 |
\begin{equation} |
925 |
\label{eq:ccil0:sroi0:srou0:02} |
926 |
u = \sum_{i=0}^{m-1} u_{i} 2^{wi} |
927 |
= |
928 |
\sum_{i=0}^{m-1} u_{i} b^i. |
929 |
\end{equation} |
930 |
|
931 |
As an alternative, we may write the value as the sum of the bit-values, |
932 |
|
933 |
\begin{equation} |
934 |
\label{eq:ccil0:sroi0:srou0:03} |
935 |
u = \sum_{i=0}^{wm-1} u_{[i]} 2^{i} . |
936 |
\end{equation} |
937 |
|
938 |
Naturally, the range of such a synthetic long integer is |
939 |
|
940 |
\begin{equation} |
941 |
\label{eq:ccil0:sroi0:srou0:04} |
942 |
u \in \{0, \ldots{} , 2^{wm} - 1 \} . |
943 |
\end{equation} |
944 |
|
945 |
In storing an unsigned synthetic long machine integer, the most natural way |
946 |
to order the array of limbs depends on whether dynamic memory allocation is |
947 |
used by the arithmetic library. In microcontroller work, where arithmetic |
948 |
library subroutines typically operate on fixed-size operands and produce |
949 |
fixed-size results, storing |
950 |
limbs most significant limb first (i.e. in `C', so that element \texttt{[0]} |
951 |
of the array of limbs contains the most significant limb) may be natural |
952 |
and convenient. However, this ordering would lead to computational waste in a library such |
953 |
as the GMP \cite{bibref:s:gnumultipleprecisionarithmeticlibrary} where integers |
954 |
may grow arbitrarily large and the library may need to reallocate long synthetic |
955 |
integers to contain more limbs, as each reallocation would need to be followed |
956 |
by a memory copy to align the integer's existing limbs to the end of the array. |
957 |
For libraries such as the GMP, it is more practical to store limbs |
958 |
least-significant limb first, as it eliminates the need to copy memory |
959 |
when reallocations are done. |
960 |
|
961 |
|
962 |
|
963 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
964 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
965 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
966 |
\subsection{Representation Of Signed Integers} |
967 |
%Subsection Tag: ROS0 |
968 |
\label{ccil0:sroi0:sros0} |
969 |
|
970 |
Signed machine integers are always represented in two's complement form on modern |
971 |
processors. This representation is universal because of the digital logic |
972 |
conveniences---the same addition and subtraction mappings which are correct |
973 |
for unsigned machine integers are also correct for signed machine integers, |
974 |
although the criteria for overflow and comparison are |
975 |
different. |
976 |
|
977 |
Most readers are familiar with two's complement representation, so we will not |
978 |
belabor it. However, we will present essential properties. |
979 |
When two's complement representation is used in an $m$-bit machine integer $u$: |
980 |
|
981 |
\begin{enumerate} |
982 |
\item All bit patterns with $u_{[m-1]} = 0$ represent non-negative integers, and |
983 |
represent the same integer as if the representation were unsigned. |
984 |
\item All bit patterns with $u_{[m-1]} = 1$ represent negative numbers; specifically |
985 |
$u_{[m-2:0]} - 2^{m-1}$; i.e. |
986 |
|
987 |
\begin{equation} |
988 |
\label{eq:ccil0:sroi0:sros0:00} |
989 |
u = - u_{[m-1]} 2^{m-1} + \sum_{i=0}^{m-2} u_{[i]} 2^i . |
990 |
\end{equation} |
991 |
|
992 |
\item $u \in \{-2^{m-1}, \ldots{}, 2^{m-1}-1 \}$. |
993 |
\item All bit patterns represent a unique integer. |
994 |
\item For any integer except $-2^{m-1}$, |
995 |
negation can be performed by forming the one's complement (complementing |
996 |
every bit), then adding one. To see why this is true algebraically, note that |
997 |
|
998 |
\end{enumerate} |
999 |
|
1000 |
|
1001 |
However, let us observe that the value of an |
1002 |
$m$-bit two's complement |
1003 |
machine integer is |
1004 |
|
1005 |
|
1006 |
In general, an $m$-bit signed machine integer can assume the values of |
1007 |
$-2^{m-1}$ through $2^{m-1} - 1$, so that |
1008 |
|
1009 |
\begin{equation} |
1010 |
\label{eq:ccil0:sroi0:sros0:01} |
1011 |
u \in \{-2^{m-1}, \ldots{} , 2^{m-1} - 1 \} . |
1012 |
\end{equation} |
1013 |
|
1014 |
There are [at least] two different representations of signed |
1015 |
multiple-precision integers: |
1016 |
|
1017 |
\begin{itemize} |
1018 |
\item Two's complement representation. |
1019 |
\item Sign-magnitude representation. |
1020 |
\end{itemize} |
1021 |
|
1022 |
There are two different representations commonly used |
1023 |
for signed multiple-precision integers because two's complement |
1024 |
representation is not ideal for multiplication and division, although |
1025 |
it is ideal for addition and subtraction. For multiple-precision |
1026 |
integer arithmetic, sign-magnitude representation is more common. |
1027 |
|
1028 |
In two's complement representation of multiple-precision integers, |
1029 |
the representation is the same as suggested by |
1030 |
(\ref{eq:ccil0:sroi0:sros0:00}), except |
1031 |
more bits are involved. For a two's complement representation |
1032 |
of a number consisting of $n$ machine integers with $W$ bits per |
1033 |
machine integer, |
1034 |
|
1035 |
\begin{equation} |
1036 |
\label{eq:ccil0:sroi0:sros0:02} |
1037 |
u = - u_{B(Wn-1)} 2^{Wn-1} + \sum_{i=0}^{Wn-2} u_{B(i)} 2^i . |
1038 |
\end{equation} |
1039 |
|
1040 |
Because we would like to know how to compare signed multiple-precision |
1041 |
integers in two's complement representation, we can gain some |
1042 |
insight into the representation by rewriting |
1043 |
(\ref{eq:ccil0:sroi0:sros0:02}) in terms of machine integers: |
1044 |
|
1045 |
\begin{equation} |
1046 |
\label{eq:ccil0:sroi0:sros0:03} |
1047 |
u = - u_{B(Wn-1)} 2^{Wn-1} |
1048 |
+ |
1049 |
\sum_{i=W(m-1)}^{Wn-2} u_{B(i)} 2^i |
1050 |
+ |
1051 |
\sum_{i=0}^{m-2} u_{i} 2^{Wi} . |
1052 |
\end{equation} |
1053 |
|
1054 |
(\ref{eq:ccil0:sroi0:sros0:03}) gives some insight into the |
1055 |
relative values of multiple-precision signed two's complement |
1056 |
integers with respect to the values of the machine integers |
1057 |
that comprise them. We discuss this further in |
1058 |
Section \ref{ccil0:scsi0}. |
1059 |
|
1060 |
In sign-magnitude representation of multiple-precision signed two's |
1061 |
complement integers, an integer $u$ is represented as a sign |
1062 |
bit (usually a value of one indicates negativity), and a magnitude, |
1063 |
which is $|u|$. Unlike two's complement representation, |
1064 |
sign-magnitude representation, has two representations of zero---a positive |
1065 |
one and a negative one. Either a canonical form for zero should be |
1066 |
adopted, or both values should be treated identically. |
1067 |
|
1068 |
Assuming that the sign bit is stored in the most significant bit position, |
1069 |
it is easy to deduce that the value of a multiple-precision |
1070 |
signed two's complement integer in sign-magnitude representation is |
1071 |
|
1072 |
\begin{equation} |
1073 |
\label{eq:ccil0:sros0:srou0:04} |
1074 |
u = (-1)^{u_{B(m-1)}} \sum_{i=0}^{m-2} u_{B(i)} 2^i . |
1075 |
\end{equation} |
1076 |
|
1077 |
Sign-magnitude representation is especially convenient because |
1078 |
it allows machine instructions which accept unsigned operands to be used |
1079 |
to make calculations involving signed integers. |
1080 |
|
1081 |
|
1082 |
|
1083 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1084 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1085 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1086 |
\section{Characteristics Of Practical Processors} |
1087 |
%Section tag: CPP |
1088 |
\label{ccil0:scpp0} |
1089 |
|
1090 |
Before discussing specific algorithms, it is necessary to |
1091 |
discuss the construction of practical processors---how such a processor |
1092 |
manipulates machine integers. We accept as a typical processor the |
1093 |
TMS-370C8, an 8-bit microcontroller manufactured by |
1094 |
Texas Instruments. |
1095 |
|
1096 |
\begin{figure} |
1097 |
\centering |
1098 |
\includegraphics[width=4.6in]{c_cil0/t370flag.eps} |
1099 |
\caption{Texas Instruments TMS-370C8 Flags} |
1100 |
\label{fig:ccil0:scpp0:00} |
1101 |
\end{figure} |
1102 |
|
1103 |
\begin{figure} |
1104 |
\centering |
1105 |
\includegraphics[width=4.6in]{c_cil0/t370cjmp.eps} |
1106 |
\caption{Texas Instruments TMS-370C8 Conditional Jump Instructions} |
1107 |
\label{fig:ccil0:scpp0:01} |
1108 |
\end{figure} |
1109 |
|
1110 |
A typical microcontroller allows operations on machine integers |
1111 |
in the following steps: |
1112 |
|
1113 |
\begin{itemize} |
1114 |
\item A machine instruction is performed on one or two machine |
1115 |
integer operands (for example: addition, subtraction, |
1116 |
multiplication, division, increment, decrement, complement, |
1117 |
negation, or comparison). This machine instruction may |
1118 |
produce a result, and usually sets a number of condition flags that |
1119 |
reflect the nature and validity of the result (Is it zero? |
1120 |
Is it negative? Did the result overflow?). As an |
1121 |
example, the condition |
1122 |
flags of the TMS-370C8 are shown in Figure \ref{fig:ccil0:scpp0:00}. |
1123 |
. |
1124 |
\item A conditional branch instruction is used to branch conditionally |
1125 |
based on the state of the condition flags. The definition of |
1126 |
the condition flags and the way in which the conditional |
1127 |
branch instruction utilizes them is designed to provide a |
1128 |
way to treat both unsigned and signed machine integers. |
1129 |
As an example, the way in which the conditional |
1130 |
jump instructions of the TMS-370C8 use the flags |
1131 |
is shown in Figure \ref{fig:ccil0:scpp0:01}. |
1132 |
\end{itemize} |
1133 |
|
1134 |
It is not too often necessary to understand in detail |
1135 |
the Boolean relationships that govern how machine integers are |
1136 |
added, subtracted, and compared; and how signed comparisons differ |
1137 |
from unsigned comparisons. In most cases, it is adequate to |
1138 |
rely on the design of the microcontroller. However, we do present |
1139 |
rudimentary observations in this section. |
1140 |
|
1141 |
|
1142 |
|
1143 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1144 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1145 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1146 |
\section{Comparison Of Integers} |
1147 |
|
1148 |
|
1149 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1150 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1151 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1152 |
\subsection{Comparison Of Unsigned Integers} |
1153 |
|
1154 |
|
1155 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1156 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1157 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1158 |
\subsection{Comparison Of Signed Integers} |
1159 |
%Section Tag: CSI0 |
1160 |
\label{ccil0:scsi0} |
1161 |
|
1162 |
|
1163 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1164 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1165 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1166 |
\section{Integer Addition} |
1167 |
%Section tag: IAD0 |
1168 |
\label{ccil0:siad0} |
1169 |
|
1170 |
Addition of two $m$-bit integers is a combinational function---that is, |
1171 |
the inputs uniquely determine the output. Addition of binary |
1172 |
numbers is performed |
1173 |
|
1174 |
|
1175 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1176 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1177 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1178 |
\subsection{Hardware Implementation Of Addition} |
1179 |
|
1180 |
|
1181 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1182 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1183 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1184 |
\subsection{Addition Of Unsigned Operands} |
1185 |
|
1186 |
|
1187 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1188 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1189 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1190 |
\subsection{Addition Of Signed Operands} |
1191 |
|
1192 |
|
1193 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1194 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1195 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1196 |
\section {Integer Subtraction} |
1197 |
|
1198 |
|
1199 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1200 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1201 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1202 |
\subsection{Hardware Implementation Of Subtraction} |
1203 |
|
1204 |
|
1205 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1206 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1207 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1208 |
\subsection{Subtraction Of Unsigned Operands} |
1209 |
|
1210 |
|
1211 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1212 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1213 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1214 |
\subsection{Subtraction Of Signed Operands} |
1215 |
|
1216 |
|
1217 |
|
1218 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1219 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1220 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1221 |
\section{Integer Multiplication} |
1222 |
|
1223 |
|
1224 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1225 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1226 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1227 |
\subsection{Hardware Implementation Of Multiplication} |
1228 |
|
1229 |
|
1230 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1231 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1232 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1233 |
\subsection{Multiplication Of Unsigned Operands} |
1234 |
|
1235 |
|
1236 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1237 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1238 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1239 |
\subsection{Multiplication Of Signed Operands} |
1240 |
|
1241 |
|
1242 |
|
1243 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1244 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1245 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1246 |
\section{Integer Division} |
1247 |
\label{ccil0:sidv0} |
1248 |
|
1249 |
\index{division}\index{integer division}In this section, |
1250 |
we discuss the best known methods of dividing integers using |
1251 |
typical microcontroller instruction sets. In general, given |
1252 |
two arbitrary integers $p$ and $q$, we are interested in determining |
1253 |
their quotient $q=\lfloor{}p/q\rfloor$ and remainder |
1254 |
$r=p\bmod{}q$ as economically as possible. |
1255 |
|
1256 |
|
1257 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1258 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1259 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1260 |
\subsection{Hardware Implementation Of Division} |
1261 |
|
1262 |
|
1263 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1264 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1265 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1266 |
\subsection{General Unsigned Division Without A Machine Division Instruction} |
1267 |
\label{ccil0:sidv0:sgdn0} |
1268 |
|
1269 |
|
1270 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1271 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1272 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
1273 |
\subsection{General Unsigned Division With A Machine Division Instruction} |
1274 |
\label{ccil0:sidv0:sgdu0} |
1275 |
|
1276 |
As mentioned many places in this work, efficiency in microcontroller software |
1277 |
involves phrasing computational problems in a way which makes good use of |
1278 |
the machine instruction set. In Section \ref{ccil0:sidv0:sgdn0} we discussed |
1279 |
the classic shift-compare-subtract algorithm for division. This algorithm |
1280 |
is far from efficient. A reasonable question to ask is whether we can |
1281 |
leverage ``small'' division capability (provided by the machine instruction set) |
1282 |
to accomplish ``large'' divisions (those which we require in practice). |
1283 |
It ends up that this is possible: the technique involved is effectively |
1284 |
to use machine division instructions to estimate the highest-order bits of |
1285 |
the quotient based on the highest-order bits of the dividend and divisor. |
1286 |
|
1287 |
Knuth's discussion of division |
1288 |
algorithms \cite[pp. 270-275]{bibref:b:knuthclassic2ndedvol2} is the |
1289 |
basis for most of the material in this subsection. However, Knuth has |
1290 |
a gift for terseness that is sometimes a curse for the reader, and so |
1291 |
we take more time than Knuth to explain certain results. |
1292 |
|
1293 |
First, as a starting point, we present \emph{Algorithm D} from |
1294 |
Knuth \cite[pp. 272-273]{bibref:b:knuthclassic2ndedvol2}. Then, |
1295 |
we justify the algorithm and explain why it is valid. Finally, |
1296 |
we supply implementation advice for microcontroller instruction sets. |
1297 |
|
1298 |
\begin{vworkalgorithmstatementpar}{Arbitrary Unsigned Division Using |
1299 |
Machine Unsigned Division Instructions} |
1300 |
\label{alg:ccil0:sidv0:sgdu0:01} |
1301 |
(From Knuth \cite[pp. 272-273]{bibref:b:knuthclassic2ndedvol2}) |
1302 |
Given nonnegative integers $u=(u_{m+n-1} \ldots{} u_1 u_0)_b$ |
1303 |
and $v=(v_{n-1} \ldots{} v_1 v_0)_b$, where |
1304 |
$v_{n-1} \neq 0$ and $n > 1$, we form the radix-$b$ quotient |
1305 |
$\lfloor{}u/v\rfloor{} = (q_m q_{m-1} \ldots{} q_0)_b$ and |
1306 |
the remainder $u \bmod v = (r_{n-1} \ldots{} r_1 r_0)_b$. When |
1307 |
$n=1$, the simpler algorithm of |
1308 |
Subsection \ref{ccil0:sidv0:sldm0} |
1309 |
should be used. |
1310 |
|
1311 |
\begin{algblvl0} |
1312 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:01} |
1313 |
[Normalize.] Set $d \gets \lfloor{}b/(v_{n-1} + 1)\rfloor$. |
1314 |
Then set $(u_{m+n} u_{m+n-1} \ldots{} u_1 u_0)_b$ equal to |
1315 |
$(u_{m+n-1} \ldots{} u_1 u_0)_b$ times $d$; similarly, |
1316 |
set $(v_{n-1} \ldots{} v_1 v_0)_b$ equal to |
1317 |
$(v_{n-1} \ldots{} v_1 v_0)_b$ times $d$. (Notice the introduction |
1318 |
of a new digit position $u_{m+n}$ at the left of |
1319 |
$u_{m+n-1}$; if $d=1$, all we need to do in this step is set |
1320 |
$u_{m+n} \gets 0$. On a binary computer it may be preferable |
1321 |
to choose $d$ to be a power of 2 instead of using the value |
1322 |
suggested here; any value of $d$ that results in |
1323 |
$v_{n-1} \geq \lfloor{}b/2\rfloor$ will suffice. See also |
1324 |
exercise 37.) |
1325 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:02} |
1326 |
[Initialize $j$.] Set $j \gets m$. (The loop on $j$, |
1327 |
steps |
1328 |
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03} |
1329 |
through |
1330 |
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:07}, |
1331 |
will be essentially a division of |
1332 |
$(u_{j+n} \ldots{} u_{j+1} u_j)_b$ by $(v_{n-1} \ldots{} v_1 v_0)_b$ to |
1333 |
get a single quotient digit $q_j$; see Figure \ref{fig:alg:ccil0:sidv0:sgdu0:01:01}.) |
1334 |
|
1335 |
\begin{figure} |
1336 |
\centering |
1337 |
\includegraphics[width=4.6in]{c_cil0/kdfc01.eps} |
1338 |
\caption{Flowchart For Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} (From \cite[p. 273]{bibref:b:knuthclassic2ndedvol2})} |
1339 |
\label{fig:alg:ccil0:sidv0:sgdu0:01:01} |
1340 |
\end{figure} |
1341 |
|
1342 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:03} |
1343 |
[Calculate $\hat{q}$.] Set |
1344 |
$\hat{q} \gets \lfloor{}u_{j+n}b + u_{j+n-1})/v_{n-1}\rfloor$ and |
1345 |
let $\hat{r}$ be the remainder, $(u_{j+n}b + u_{j+n-1}) \bmod v_{n-1}$. |
1346 |
Now test if $\hat{q} = b$ or $\hat{q} v_{n-2} > b\hat{r} + u_{j+n-2}$; |
1347 |
if so, decrease $\hat{q}$ by 1, increase $\hat{r}$ by $v_{n-1}$, and repeat |
1348 |
this test if $\hat{r} < b$. (The test of $v_{n-2}$ determines at high |
1349 |
speed most of the cases in which the trial value $\hat{q}$ is one too large, |
1350 |
and it eliminates \emph{all} cases where $\hat{q}$ is two too large; |
1351 |
see exercises 19, 20, 21.) |
1352 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:04} |
1353 |
[Multiply and subtract.] Replace $(u_{j+n} u_{j+n-1} \ldots{} u_j)_b$ by |
1354 |
|
1355 |
\begin{equation} |
1356 |
\nonumber |
1357 |
(u_{j+n} u_{j+n-1} \ldots{} u_j)_b - \hat{q} (0 v_{n-1} \ldots{} v_1 v_0)_b. |
1358 |
\end{equation} |
1359 |
|
1360 |
This computation consists of a simple multiplication by a one-place number, |
1361 |
combined with a subtraction. The digits $(u_{j+n}, u_{j+n-1}, \ldots{}, u_j)$ |
1362 |
should be kept positive; if the result of this step is actually negative, |
1363 |
$(u_{j+n} u_{j+n-1} \ldots{} u_j)_b$ should be left as the true |
1364 |
value plus $b^{n+1}$, namely as the $b$'s complement of the true value, and |
1365 |
a ``borrow'' to the left should be remembered. |
1366 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:05} |
1367 |
[Test remainder.] Set $q_j \gets \hat{q}$. If the result of step |
1368 |
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:04} was negative, go to |
1369 |
step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}; otherwise go on to |
1370 |
step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:07}. |
1371 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:06} |
1372 |
[Add back.] (The probability that this step is necessary is very small, on the |
1373 |
order of only $2/b$, as shown in exercise 21; test data to activate this step |
1374 |
should therefore be specifically contrived when debugging.) Decrease |
1375 |
$q_j$ by 1, and add $(0 v_{n-1} \ldots v_1 v_0)_b$ to |
1376 |
$(u_{j+n} u_{j+n-1} \ldots{} u_{j+1} u_j)_b$. (A carry will occur to the left of |
1377 |
$u_{j+n}$, and it should be ignored since it cancels with the borrow that |
1378 |
occured in step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:04}.) |
1379 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:07} |
1380 |
[Loop on $j$.] Decrease $j$ by one. Now if $j \geq 0$, go back to step |
1381 |
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}. |
1382 |
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:08} |
1383 |
[Unnormalize.] Now $(q_m \ldots{} q_1 q_0)_b$ is the desired quotient, and |
1384 |
the desired remainder may be obtained by dividing |
1385 |
$(u_{n-1} \ldots{} u_1 u_0)_b$ by $d$. |
1386 |
\end{algblvl0} |
1387 |
\end{vworkalgorithmstatementpar} |
1388 |
\vworkalgorithmfooter{} |
1389 |
|
1390 |
The general idea of Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} is that |
1391 |
digits (machine words) of the quotient $q$ can be successively estimated |
1392 |
based on the first digits of the dividend and divisor. Knuth |
1393 |
\cite[p. 271]{bibref:b:knuthclassic2ndedvol2} explores the properties of |
1394 |
the digit estimate |
1395 |
|
1396 |
\begin{equation} |
1397 |
\label{eq:ccil0:sidv0:sgdu0:01} |
1398 |
\hat{q} = \min \left( {\left\lfloor{\frac{u_n b + u_{n-1}}{v_{n-1}}}\right\rfloor, b-1} \right). |
1399 |
\end{equation} |
1400 |
|
1401 |
The first point to make about an estimate in the form of |
1402 |
(\ref{eq:ccil0:sidv0:sgdu0:01}) is that it can only be accomplished |
1403 |
efficiently if the machine-native division instruction supports |
1404 |
overflow detection, since it is possible that |
1405 |
$(u_n b + u_{n-1})/v_{n-1} \geq b$, even if |
1406 |
$u/v < b$, as is shown by the following example. |
1407 |
|
1408 |
\begin{vworkexamplestatement} |
1409 |
\label{ex:ccil0:sidv0:sgdu0:01:01} |
1410 |
Assume that we wish to apply the estimate of $\hat{q}$ provided by |
1411 |
(\ref{eq:ccil0:sidv0:sgdu0:01}) |
1412 |
to $u=16,776,704$ and $v=65,535$. Demonstrate that a machine division |
1413 |
overflow will occur when estimating the first digit, assuming a processor |
1414 |
that can divide a 16-bit dividend by an 8-bit divisor to produce an 8-bit |
1415 |
quotient. |
1416 |
\end{vworkexamplestatement} |
1417 |
\begin{vworkexampleparsection}{Solution} |
1418 |
Note that according to Knuth's intention, the word size on such a machine |
1419 |
is 8 bits. Thus, $b=256$. Note that $u/v = 255 + 255/256 < b = 256$, as |
1420 |
required by Knuth's precondition. However, although $u/v < b$, |
1421 |
$u = [255 \; 254 \; 0] [256^2 \; 256 \; 1]^T = [u_2 u_1 u_0] [b^2 b^1 b^0]^T$ and |
1422 |
$v = [255 \; 255] [256 \; 1]^T = [v_1 v_0] [b^1 b^0]^T$, so that calculating |
1423 |
an estimate $\hat{q}$ as required by (\ref{eq:ccil0:sidv0:sgdu0:01}), |
1424 |
$(u_n b + u_{n-1})/v_{n-1} = 65,534/255 = 256 + 254/255 \geq b$, is a division |
1425 |
overflow for a single machine division instruction. Thus, it follows that |
1426 |
a machine with division overflow detection can quickly determine that $b-1$ from |
1427 |
(\ref{eq:ccil0:sidv0:sgdu0:01}) is the minimum, whereas a machine without |
1428 |
division overflow |
1429 |
detection would have to use several additional machine instructions to make |
1430 |
this determination. |
1431 |
\end{vworkexampleparsection} |
1432 |
\vworkexamplefooter{} |
1433 |
|
1434 |
The second thing to establish about $\hat{q}$ as defined by |
1435 |
(\ref{eq:ccil0:sidv0:sgdu0:01}) is how ``good'' of an estimate |
1436 |
$\hat{q}$ is---how much information, exactly, about $q$ can we |
1437 |
obtain by examining the first two words of $u$ and the first |
1438 |
word of $v$? |
1439 |
|
1440 |
We first establish in the following lemma that our estimate of |
1441 |
$q$, $\hat{q}$, can be no less than $q$. |
1442 |
|
1443 |
\begin{vworklemmastatementpar}{\mbox{\boldmath$\hat{q} \geq q$}} |
1444 |
\label{lem:ccil0:sidv0:sgdu0:01} |
1445 |
The estimate of $q$ provided by (\ref{eq:ccil0:sidv0:sgdu0:01}), |
1446 |
$\hat{q}$ is always at least as great as the actual value of |
1447 |
$q$, i.e. $\hat{q} \geq q$. |
1448 |
\end{vworklemmastatementpar} |
1449 |
\begin{vworklemmaproof} |
1450 |
Knowledge of $u_{n}$, $u_{n-1}$, and $v_{n-1}$ necessarily confine |
1451 |
the intervals in which the actual values of $u$ and $v$ may be; |
1452 |
specifically:\footnote{In |
1453 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:04}) |
1454 |
and |
1455 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:07}), |
1456 |
we use statements of the form ``$x = x$'' as an idiom for |
1457 |
``$x$ is known''.} |
1458 |
|
1459 |
\begin{eqnarray} |
1460 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:01} |
1461 |
u & = & \sum_{i=0}^{n} u_i b^i \\ |
1462 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:02} |
1463 |
& = & u_n b^n + u_{n-1} b^{n-1} + \ldots{} + u_2 b^2 + u_1 b + u_0 \\ |
1464 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:03} |
1465 |
& = & (u_n b + u_{n-1}) b^{n-1} + \ldots{} + u_2 b^2 + u_1 b + u_0 |
1466 |
\end{eqnarray} |
1467 |
|
1468 |
\begin{eqnarray} |
1469 |
\nonumber & (u_n = u_n \wedge u_{n-1} = u_{n-1}) & \\ |
1470 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:04} |
1471 |
& \vworkvimp & \\ |
1472 |
\nonumber & (u_n b + u_{n-1}) b^{n-1} \leq u \leq (u_n b + u_{n-1}) b^{n-1} + b^{n-1} - 1 & |
1473 |
\end{eqnarray} |
1474 |
|
1475 |
\begin{eqnarray} |
1476 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:05} |
1477 |
v & = & \sum_{i=0}^{n-1} v_i b^i \\ |
1478 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:06} |
1479 |
& = & v_{n-1} b^{n-1} + v_{n-2} b^{n-2} + \ldots{} + v_2 b^2 + v_1 b + v_0 |
1480 |
\end{eqnarray} |
1481 |
|
1482 |
\begin{equation} |
1483 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:07} |
1484 |
(v_{n-1} = v_{n-1}) |
1485 |
\vworkhimp |
1486 |
v_{n-1} b^{n-1} \leq v \leq v_{n-1} b^{n-1} + b^{n-1} - 1 |
1487 |
\end{equation} |
1488 |
|
1489 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:04}) and |
1490 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:07}) reflect the uncertainties in the |
1491 |
values of $u$ and $v$ respectively because only the first digit(s) of |
1492 |
$u$ and $v$ are being considered in forming the estimate $\hat{q}$. |
1493 |
|
1494 |
By definition, the actual value of $q$ is $\lfloor{}u/v\rfloor$. For a |
1495 |
rational function $f(u,v) = u/v$ where $u \in [u_{min}, u_{max}]$ and |
1496 |
$v \in [v_{min}, v_{max}]$, the minimum value of $u/v$ occurs at |
1497 |
$u_{min}/v_{max}$, and the maximum value of $u/v$ occurs at |
1498 |
$u_{max}/v_{min}$. We can therefore write that |
1499 |
|
1500 |
\begin{equation} |
1501 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:08} |
1502 |
\left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1}}{v_{n-1} b^{n-1} + b^{n-1} - 1}}\right\rfloor |
1503 |
\leq |
1504 |
q |
1505 |
\leq |
1506 |
\left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1} + b^{n-1} - 1}{v_{n-1} b^{n-1}}}\right\rfloor . |
1507 |
\end{equation} |
1508 |
|
1509 |
In other words, knowledge of $u_{n}$, $u_{n-1}$, and $v_{n-1}$ confines $q$ to the |
1510 |
interval indicated in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}). We must prove that, |
1511 |
given a specific $u_{n}$, $u_{n-1}$, and $v_{n-1}$, $\hat{q}$ is at least as large as |
1512 |
the upper bound in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}); otherwise we could find a |
1513 |
$q$ such that $q > \hat{q}$. We can algebraically manipulate the upper bound in |
1514 |
in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}) to yield |
1515 |
|
1516 |
\begin{equation} |
1517 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:09} |
1518 |
\left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1}}{v_{n-1} b^{n-1} + b^{n-1} - 1}}\right\rfloor |
1519 |
\leq |
1520 |
q |
1521 |
\leq |
1522 |
\left\lfloor{\frac{u_n b + u_{n-1} + \frac{b^{n-1}-1}{b^{n-1}}}{v_{n-1}}}\right\rfloor . |
1523 |
\end{equation} |
1524 |
|
1525 |
In (\ref{eq:lem:ccil0:sidv0:sgdu0:01:09}), since $(b^{n-1}-1)/b^{n-1} < 1$ and since |
1526 |
$u_n b + u_{n-1}$ is an integer, we can conclude that |
1527 |
$\lfloor u_n b + u_{n-1} + (b^{n-1}-1)/b^{n-1} \rfloor = \lfloor u_n b + u_{n-1} \rfloor$ |
1528 |
and hence that |
1529 |
|
1530 |
\begin{equation} |
1531 |
\label{eq:lem:ccil0:sidv0:sgdu0:01:10} |
1532 |
q |
1533 |
\leq |
1534 |
\left\lfloor{\frac{u_n b + u_{n-1} + \frac{b^{n-1}-1}{b^{n-1}}}{v_{n-1}}}\right\rfloor |
1535 |
= |
1536 |
\left\lfloor{\frac{u_n b + u_{n-1}}{v_{n-1}}}\right\rfloor |
1537 |
= |
1538 |
\hat{q} . |
1539 |
\end{equation} |
1540 |
|
1541 |
Therefore, $\hat{q} \geq q$. |
1542 |
\end{vworklemmaproof} |
1543 |
\vworklemmafooter{} |
1544 |
|
1545 |
Lemma \ref{lem:ccil0:sidv0:sgdu0:01} establishes that |
1546 |
a digit estimate $\hat{q}$ based on the first digit of the |
1547 |
divisor $v$ can be no less than the actual digit $q$, i.e. |
1548 |
$\hat{q}-q \geq 0$. However, we must also establish an upper bound |
1549 |
on $\hat{q}-q$. |
1550 |
|
1551 |
Intuitively, based on |
1552 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:06}), we might guess that |
1553 |
if $v_{n-1}$ is small, the estimate $\hat{q}$ may be quite |
1554 |
poor, as the interval to which the actual value of $v$ is confined |
1555 |
may be quite large. This fact is the basis for the normalization |
1556 |
step [\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:01}] in Algorithm |
1557 |
\ref{alg:ccil0:sidv0:sgdu0:01}. We now prove a useful result |
1558 |
for how much $u, v$ must be normalized so that $\hat{q}-q \leq 2$. |
1559 |
|
1560 |
\begin{vworklemmastatementpar}{Normalization Requirement So That |
1561 |
\mbox{\boldmath$\hat{q} - q \leq 2$}} |
1562 |
\label{lem:ccil0:sidv0:sgdu0:02} |
1563 |
If $v_{n-1} \geq \lfloor b/2 \rfloor$ and $\hat{q}$ is chosen as |
1564 |
indicated in (\ref{eq:ccil0:sidv0:sgdu0:01}), then |
1565 |
$0 \leq \hat{q} - q \leq 2$. |
1566 |
\end{vworklemmastatementpar} |
1567 |
\begin{vworklemmaproof} |
1568 |
The lower limit on $\hat{q} - q$ is proved in Lemma \ref{lem:ccil0:sidv0:sgdu0:01}. |
1569 |
We now seek only to prove that $\hat{q} - q \leq 2$. |
1570 |
By definition of $\hat{q}$ and $q$, |
1571 |
|
1572 |
\begin{equation} |
1573 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:01} |
1574 |
\hat{q} - q = \left\lfloor {\frac{u_n b + u_{n-1}}{v_{n-1}}} \right\rfloor |
1575 |
- \left\lfloor {\frac{u}{v}} \right\rfloor |
1576 |
\end{equation} |
1577 |
|
1578 |
When only nonnegative integers are involved, |
1579 |
(\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}) |
1580 |
supplies an exact expression for the floor of a |
1581 |
ratio of integers. Using (\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}), |
1582 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:01}) can be decomposed into |
1583 |
|
1584 |
\begin{equation} |
1585 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:02} |
1586 |
\hat{q} - q = \frac{u_n b + u_{n-1}}{v_{n-1}} |
1587 |
- \frac{(u_n b + u_{n-1}) \bmod v_{n-1}}{v_{n-1}} |
1588 |
- \frac{u}{v} |
1589 |
+ \frac{u \bmod v}{v} . |
1590 |
\end{equation} |
1591 |
|
1592 |
\noindent{}Note that (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) is an exact |
1593 |
expression (rather than an |
1594 |
inequality). |
1595 |
|
1596 |
Note in (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) that |
1597 |
$(u_n b + u_{n-1}) \bmod v_{n-1} \in [0, v_{n-1}-1]$, and that in general |
1598 |
there is no reason to expect it cannot be zero. Thus, we can assume that |
1599 |
it \emph{is} zero, which will maximize $\hat{q}-q$. We can thus convert |
1600 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) into the inequality |
1601 |
|
1602 |
\begin{equation} |
1603 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:03} |
1604 |
\hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}} |
1605 |
- \frac{u}{v} |
1606 |
+ \frac{u \bmod v}{v} . |
1607 |
\end{equation} |
1608 |
|
1609 |
In (\ref{eq:lem:ccil0:sidv0:sgdu0:02:03}) we can also observe that |
1610 |
$(u \bmod v)/v \in [0, (v-1)/v]$. If we replace this expression with |
1611 |
``1'' (which is unattainable, but barely), this will change the relational |
1612 |
operator from ``$\leq$'' to ``$<$'': |
1613 |
|
1614 |
\begin{equation} |
1615 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:04} |
1616 |
\hat{q} - q < \frac{u_n b + u_{n-1}}{v_{n-1}} |
1617 |
- \frac{u}{v} |
1618 |
+ 1 . |
1619 |
\end{equation} |
1620 |
|
1621 |
The result we wish to show is that with $v_{n-1} \geq \lfloor b/2 \rfloor$, |
1622 |
$\hat{q}-q \leq 2$. To simplify the subsequent algebraic manipulations, note in |
1623 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:04}) that |
1624 |
|
1625 |
\begin{eqnarray} |
1626 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:05} |
1627 |
& \hat{q} - q \leq 2 & \\ |
1628 |
\nonumber & \vworkvertequiv & \\ |
1629 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:06} |
1630 |
& \displaystyle \frac{u_n b + u_{n-1}}{v_{n-1}} |
1631 |
- \frac{u}{v} |
1632 |
+ 1 \leq 3 & \\ |
1633 |
\nonumber & \vworkvertequiv & \\ |
1634 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:07} |
1635 |
& \displaystyle \frac{u_n b + u_{n-1}}{v_{n-1}} |
1636 |
- \frac{u}{v} \leq 2 & |
1637 |
\end{eqnarray} |
1638 |
|
1639 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:06}) may be counterintuitive, so |
1640 |
further explanation is offered here. Since $\hat{q} \in \vworkintset$ and $q \in \vworkintset$, |
1641 |
$\hat{q}-q \in \vworkintset$. Thus, proving |
1642 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:06}) or |
1643 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}) proves that |
1644 |
$\hat{q}-q \in \{ \ldots, -1, 0, 1, 2 \}$ (however, by |
1645 |
Lemma \ref{lem:ccil0:sidv0:sgdu0:01}, $\hat{q}-q \geq 0$, so in fact |
1646 |
what would be proved is that $\hat{q}-q \in \{ 0, 1, 2 \}$). For |
1647 |
algebraic simplicity, |
1648 |
we choose to prove (\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}). |
1649 |
|
1650 |
First, adjust numerator and denominator of the first term in |
1651 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}) by $b^{n-1}$ so that the terms more closely |
1652 |
resemble $u$ and $v$ in |
1653 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:02}) |
1654 |
and |
1655 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:06}): |
1656 |
|
1657 |
\begin{equation} |
1658 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:08} |
1659 |
\frac{u_n b^n + u_{n-1} b^{n-1}}{v_{n-1} b^{n-1}} |
1660 |
- \frac{u}{v} \leq 2 . |
1661 |
\end{equation} |
1662 |
|
1663 |
For logical implication to be maintained, |
1664 |
we must make the most pessimistic choices and assumptions possible in |
1665 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:08}) in order to maximize the value |
1666 |
of the left side of the inequality. |
1667 |
The first assumption to be made is the error in estimating |
1668 |
$u$ and $v$ based on their most significant digits. It can be |
1669 |
seen that (\ref{eq:lem:ccil0:sidv0:sgdu0:02:08}) will be maximized if: |
1670 |
|
1671 |
\begin{itemize} |
1672 |
\item We assume that $u = u_{n} b^n + u_{n-1} b^{n-1}$ (i.e. that we estimate |
1673 |
$u$ precisely). |
1674 |
\item We assume that $v = v_{n-1} b^{n-1} + b^{n-1} - 1$ (i.e. that |
1675 |
we underestimate $v$ by the maximum amount possible). |
1676 |
\item We minimize the value of $v_{n-1} b^{n-1}$. |
1677 |
\end{itemize} |
1678 |
|
1679 |
Assuming that $u$ is estimated precisely yields |
1680 |
|
1681 |
\begin{equation} |
1682 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:09} |
1683 |
\frac{u}{v_{n-1} b^{n-1}} |
1684 |
- \frac{u}{v} \leq 2 . |
1685 |
\end{equation} |
1686 |
|
1687 |
Assuming that $v$ is underestimated by the maximum amount possible |
1688 |
yields |
1689 |
|
1690 |
\begin{equation} |
1691 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:10} |
1692 |
\frac{u}{v - b^{n-1} + 1} |
1693 |
- \frac{u}{v} \leq 2 . |
1694 |
\end{equation} |
1695 |
|
1696 |
Finally, with $b$ and $v$ fixed, $u$ can be maximized by noting that |
1697 |
$u \leq bv - 1$ (by the problem assumption that the quotient is a single digit). |
1698 |
However, for algebraic simplicity, we make the substitution $u=bv$ (rather than |
1699 |
$u=bv-1$), since the weaker upper bound is strong enough to prove the |
1700 |
first result we seek. |
1701 |
|
1702 |
\begin{equation} |
1703 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:11} |
1704 |
\frac{bv}{v - b^{n-1} + 1} |
1705 |
- \frac{bv}{v} \leq 2 |
1706 |
\end{equation} |
1707 |
|
1708 |
\begin{equation} |
1709 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:12} |
1710 |
\frac{bv}{v - b^{n-1} + 1} |
1711 |
- b \leq 2 |
1712 |
\end{equation} |
1713 |
|
1714 |
Solving (\ref{eq:lem:ccil0:sidv0:sgdu0:02:12}) |
1715 |
for $v$ yields |
1716 |
|
1717 |
\begin{equation} |
1718 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:13} |
1719 |
v \geq \frac{b^n}{2} + b^{n-1} - \frac{1}{b} - 1 |
1720 |
\end{equation} |
1721 |
|
1722 |
Again using the assumption that $v$ is underestimated by the maximum amount |
1723 |
possible, we may make the substitution that $v = v_{n-1} b^{n-1} + b^{n-1} -1$, |
1724 |
leading to |
1725 |
|
1726 |
\begin{equation} |
1727 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:13b} |
1728 |
v_{n-1} \geq \frac{b}{2} - \frac{1}{2 b^{n-2}} . |
1729 |
\end{equation} |
1730 |
|
1731 |
There are two cases to consider: $b$ even and $b$ odd. If $b$ is even, the |
1732 |
proof is complete, as $\lfloor b/2 \rfloor = b/2$ and the choice of |
1733 |
$v_{n-1} = \lfloor b/2 \rfloor$ will automatically satisfy |
1734 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}). However, if $b$ is odd, |
1735 |
$\lfloor b/2 \rfloor = b/2 - 1/2 < b/2 - 1/2b^{n-2}$, violating |
1736 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}), |
1737 |
and so we need to |
1738 |
further examine this case. |
1739 |
|
1740 |
If $b$ is odd and $v_{n-1} = \lfloor b/2 \rfloor$, then |
1741 |
$v_{n-1} = b/2 - 1/2$, violating (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}). |
1742 |
However, any larger choice of $v_{n-1}$ (such as |
1743 |
$\lfloor b/2 \rfloor + 1$, $\lfloor b/2 \rfloor + 2$, etc.) satisfies |
1744 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}); so that it remains only to prove |
1745 |
the $v_{n-1} = \lfloor b/2 \rfloor = b/2 - 1/2$ |
1746 |
case. |
1747 |
|
1748 |
If $v_{n-1} = \lfloor b/2 \rfloor = b/2 - 1/2$, then |
1749 |
|
1750 |
\begin{eqnarray} |
1751 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:14} |
1752 |
v & \in & \left[ |
1753 |
\left( \frac{b}{2} - \frac{1}{2}\right) b^{n-1}, |
1754 |
\left( \frac{b}{2} - \frac{1}{2}\right) b^{n-1} + b^{n-1} - 1 |
1755 |
\right] \\ |
1756 |
\nonumber & = & |
1757 |
\left[ |
1758 |
\frac{b^n}{2} - \frac{b^{n-1}}{2}, |
1759 |
\frac{b^n}{2} + \frac{b^{n-1}}{2} -1 |
1760 |
\right] . |
1761 |
\end{eqnarray} |
1762 |
|
1763 |
Note in this case that the estimation error ($v - v_{n-1}b^{n-1}$) and the |
1764 |
value of $v$ are not independent; and in fact it is this aspect |
1765 |
of the problem that has led to the violation of |
1766 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}) with $b$ odd and |
1767 |
$v_{n-1} = \lfloor b/2 \rfloor$. |
1768 |
|
1769 |
In order to prove the case of $b$ odd and $v = \lfloor b/2 \rfloor = b/2 - 1/2$, |
1770 |
we must reexamine some simplifying assumptions made earlier in order to obtain |
1771 |
tighter inequalities. In (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}), we can no |
1772 |
longer accept the maximum of $(u \bmod v)/v$ as one; instead we construct the |
1773 |
tighter inequality |
1774 |
|
1775 |
\begin{eqnarray} |
1776 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:15} |
1777 |
& \displaystyle \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}} |
1778 |
- \frac{u}{v} |
1779 |
+ \frac{u \bmod v}{v} & \\ |
1780 |
\nonumber & \vworkvimp & \\ |
1781 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:16} |
1782 |
& \displaystyle \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}} |
1783 |
- \frac{u}{v} |
1784 |
+ \frac{v-1}{v} , & |
1785 |
\end{eqnarray} |
1786 |
|
1787 |
which leads to |
1788 |
|
1789 |
\begin{equation} |
1790 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:17} |
1791 |
\frac{u_n b + u_{n-1}}{v_{n-1}} |
1792 |
- \frac{u+1}{v} |
1793 |
< 2 . |
1794 |
\end{equation} |
1795 |
|
1796 |
In order to maximize the left-hand side of |
1797 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:17}), we assume that we |
1798 |
estimate $u$ exactly so that $u = u_n b^n + u_{n-1} b^{n-1}$, |
1799 |
yielding |
1800 |
|
1801 |
\begin{equation} |
1802 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:18} |
1803 |
\frac{u}{v_{n-1} b^{n-1}} |
1804 |
- \frac{u+1}{v} |
1805 |
< 2 . |
1806 |
\end{equation} |
1807 |
|
1808 |
We also assume that $u$ is the maximum value |
1809 |
possible, $u=bv-1$, leading to |
1810 |
|
1811 |
\begin{equation} |
1812 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:19} |
1813 |
\frac{bv-1}{v_{n-1} b^{n-1}} |
1814 |
- b |
1815 |
< 2 . |
1816 |
\end{equation} |
1817 |
|
1818 |
Finally, we assume that $v$ is the upper limit in |
1819 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:14}), |
1820 |
$v=b^n/2 + b^{n-1}/2 - 1$, and substitute the known value of |
1821 |
$v_{n-1}$ for the case being proved, $v_{n-1} = b/2-1/2$, yielding |
1822 |
|
1823 |
\begin{equation} |
1824 |
\label{eq:lem:ccil0:sidv0:sgdu0:02:20} |
1825 |
\frac{b \left( \frac{b^n}{2} + \frac{b^{n-1}}{2} - 1 \right) - 1} |
1826 |
{\left( \frac{b}{2} - \frac{1}{2} \right) b^{n-1}} |
1827 |
- b |
1828 |
< 2 . |
1829 |
\end{equation} |
1830 |
|
1831 |
Simplification of (\ref{eq:lem:ccil0:sidv0:sgdu0:02:20}) |
1832 |
will establish that it is always true. This completes the proof. |
1833 |
\end{vworklemmaproof} |
1834 |
\vworklemmafooter{} |
1835 |
|
1836 |
Lemmas \ref{lem:ccil0:sidv0:sgdu0:01} and |
1837 |
\ref{lem:ccil0:sidv0:sgdu0:02}, |
1838 |
standing alone, lead to a good implementation of |
1839 |
division without any further results. If it is known that |
1840 |
$0 \leq \hat{q} - q \leq 2$, Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} |
1841 |
can be trivially modified to only calculate |
1842 |
$\hat{q}$ (omitting the additional tests in Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}), |
1843 |
and then |
1844 |
to include up to two add-back steps |
1845 |
(duplication of Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}). |
1846 |
Although such an algorithm would be |
1847 |
satisfactory, it has the disadvantage that the add-back steps would |
1848 |
be executed very frequently, slowing the algorithm substantially, especially |
1849 |
for long operands. |
1850 |
We now show that the additional tests in Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03} of |
1851 |
Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} can eliminate |
1852 |
altogether the case of $\hat{q}-q = 2$ (Lemma \ref{lem:ccil0:sidv0:sgdu0:03}), and |
1853 |
can with a probability close to unity eliminate the |
1854 |
case of $\hat{q}-q = 1$ (Lemmas \ref{lem:ccil0:sidv0:sgdu0:04} |
1855 |
and \ref{lem:ccil0:sidv0:sgdu0:05}). Together these tests, which are present in |
1856 |
the statement of Algorithm \ref{alg:ccil0:sidv0:sgdu0:01}, |
1857 |
reduce add-back (Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}) to rare occurrence, and create a more |
1858 |
efficient algorithm than would be possible with the |
1859 |
results of Lemmas \ref{lem:ccil0:sidv0:sgdu0:01} and \ref{lem:ccil0:sidv0:sgdu0:02} alone. |
1860 |
|
1861 |
\begin{vworklemmastatementpar} |
1862 |
{\mbox{\boldmath$\hat{q} v_{n-2} \leq b \hat{r} + u_{j+n-2} \vworkhimp 0 \leq \hat{q} - q \leq 1$}} |
1863 |
\label{lem:ccil0:sidv0:sgdu0:03} |
1864 |
If the divisor normalization requirement ($v_{n-1} \geq \lfloor b/2 \rfloor$) as specified in |
1865 |
Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:01} of |
1866 |
Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} is met, then |
1867 |
|
1868 |
\begin{equation} |
1869 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:01} |
1870 |
\hat{q} v_{n-2} \leq b \hat{r} + u_{j+n-2} \vworkhimp 0 \leq \hat{q} - q \leq 1 . |
1871 |
\end{equation} |
1872 |
\end{vworklemmastatementpar} |
1873 |
\begin{vworklemmaproof} |
1874 |
For reference, note that: |
1875 |
|
1876 |
\begin{eqnarray} |
1877 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:02} |
1878 |
u & = & u_{j+n} b^{j+n} + u_{j+n-1} b^{j+n-1} + \ldots{} + u_1 b + u_0 \\ |
1879 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:03} |
1880 |
v & = & v_{n-1} b^{n-1} + v_{n-2} b^{n-2} + \ldots{} + v_1 b + v_0 |
1881 |
\end{eqnarray} |
1882 |
|
1883 |
By definition, we the remainder |
1884 |
$\hat{r}$ has the value |
1885 |
$\hat{r} = u_{j+n} b + u_{j+n-1} - \hat{q} v_{n-1}$. Substituting this value |
1886 |
into (\ref{eq:lem:ccil0:sidv0:sgdu0:03:01}) produces |
1887 |
|
1888 |
\begin{equation} |
1889 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:04} |
1890 |
\hat{q} v_{n-2} \leq b (u_{j+n} b + u_{j+n-1} - \hat{q} v_{n-1}) + u_{j+n-2} , |
1891 |
\end{equation} |
1892 |
|
1893 |
and solving for $\hat{q}$ yields |
1894 |
|
1895 |
\begin{equation} |
1896 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:05} |
1897 |
\hat{q} \leq \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}}. |
1898 |
\end{equation} |
1899 |
|
1900 |
It is known from Lemma \ref{lem:ccil0:sidv0:sgdu0:01} that the type of estimate |
1901 |
represented by the floor of the right-hand size of (\ref{eq:lem:ccil0:sidv0:sgdu0:03:05}) |
1902 |
can be no less than $q$, leading to |
1903 |
|
1904 |
\begin{eqnarray} |
1905 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:06} |
1906 |
& \displaystyle q \leq \hat{q} |
1907 |
\leq |
1908 |
\left\lfloor { \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} } \right\rfloor & \\ |
1909 |
\nonumber & \displaystyle \leq |
1910 |
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}}. & |
1911 |
\end{eqnarray} |
1912 |
|
1913 |
Because $q, \hat{q} \in \vworkintset$, it is only necessary to prove that |
1914 |
|
1915 |
\begin{equation} |
1916 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:07} |
1917 |
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} - |
1918 |
\left\lfloor \frac{u}{v} \right\rfloor |
1919 |
< 2 |
1920 |
\end{equation} |
1921 |
|
1922 |
in order to prove that $\hat{q}-q \leq 1$. Using |
1923 |
(\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}), |
1924 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:03:07}) can be rewritten as |
1925 |
|
1926 |
\begin{equation} |
1927 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:08} |
1928 |
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} - |
1929 |
\frac{u}{v} - |
1930 |
\frac{u \bmod v}{v} |
1931 |
< 2 . |
1932 |
\end{equation} |
1933 |
|
1934 |
In order for implication to hold, we must make the most pessimistic |
1935 |
assumptions about $u \bmod v$ (those which maximize it). The maximum value |
1936 |
of $u \bmod v$ is $v-1$, leading to |
1937 |
|
1938 |
\begin{equation} |
1939 |
\label{eq:lem:ccil0:sidv0:sgdu0:03:09} |
1940 |
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} - |
1941 |
\frac{u + 1}{v} |
1942 |
< 1 . |
1943 |
\end{equation} |
1944 |
|
1945 |
In order to maximize the left side of (\ref{}), |
1946 |
we must assume that $u$ is maximized, $v$ is minimized, |
1947 |
|
1948 |
|
1949 |
\end{vworklemmaproof} |
1950 |
\vworklemmafooter{} |
1951 |
|
1952 |
\begin{vworklemmastatementpar} |
1953 |
{\mbox{\boldmath$\hat{q} v_{n-2} > b \hat{r} + u_{n-2} \vworkhimp q < \hat{q}$}} |
1954 |
\label{lem:ccil0:sidv0:sgdu0:04} |
1955 |
Given $\hat{q} > 0$, an estimate of $q$, and the remainder |
1956 |
based on the estimate, $\hat{r} = u_n b + u_{n-1} - \hat{q} v_{n-1}$, |
1957 |
|
1958 |
\begin{equation} |
1959 |
\label{eq:lem:ccil0:sidv0:sgdu0:04:01} |
1960 |
\hat{q} v_{n-2} > b \hat{r} + u_{n-2} \vworkhimp \hat{q} > q . |
1961 |
\end{equation} |
1962 |
\end{vworklemmastatementpar} |
1963 |
\begin{vworklemmaproof} |
1964 |
We make the assumption that $v_{n-1}$ and $v_{n-2}$ are not both 0. |
1965 |
$v_{n-1} > 0$ is guaranteed by |
1966 |
the normalization of |
1967 |
$v$ in Algorithm \ref{alg:ccil0:sidv0:sgdu0:01}. |
1968 |
|
1969 |
\begin{equation} |
1970 |
\label{eq:lem:ccil0:sidv0:sgdu0:04:02} |
1971 |
\hat{q} v_{n-2} > b \hat{r} + u_{n-2} |
1972 |
\end{equation} |
1973 |
|
1974 |
\begin{equation} |
1975 |
\label{eq:lem:ccil0:sidv0:sgdu0:04:03} |
1976 |
\hat{q} v_{n-2} > b (u_n b + u_{n-1} - \hat{q} v_{n-1}) + u_{n-2} |
1977 |
\end{equation} |
1978 |
|
1979 |
Solving (\ref{eq:lem:ccil0:sidv0:sgdu0:04:03}) for $\hat{q}$ yields |
1980 |
|
1981 |
\begin{equation} |
1982 |
\label{eq:lem:ccil0:sidv0:sgdu0:04:04} |
1983 |
\hat{q} |
1984 |
> |
1985 |
\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}} |
1986 |
\geq |
1987 |
\left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor |
1988 |
. |
1989 |
\end{equation} |
1990 |
|
1991 |
Note that the right-hand term of |
1992 |
(\ref{eq:lem:ccil0:sidv0:sgdu0:04:04}) |
1993 |
is similar in form to the estimate $\hat{q}$ in |
1994 |
Lemma \ref{lem:ccil0:sidv0:sgdu0:01}, where it is proved that |
1995 |
$\hat{q} \geq q$. It is possible to use identical algebraic technique |
1996 |
as is used in Lemma \ref{lem:ccil0:sidv0:sgdu0:01} in order to prove that |
1997 |
|
1998 |
\begin{equation} |
1999 |
\label{eq:lem:ccil0:sidv0:sgdu0:04:05} |
2000 |
\left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor |
2001 |
\geq q, |
2002 |
\end{equation} |
2003 |
|
2004 |
and it follows that |
2005 |
|
2006 |
\begin{equation} |
2007 |
\label{eq:lem:ccil0:sidv0:sgdu0:04:06} |
2008 |
\hat{q} |
2009 |
> |
2010 |
\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}} |
2011 |
\geq |
2012 |
\left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor |
2013 |
\geq q, |
2014 |
\end{equation} |
2015 |
|
2016 |
and thus $\hat{q} > q$. |
2017 |
\end{vworklemmaproof} |
2018 |
\vworklemmafooter{} |
2019 |
|
2020 |
\begin{vworklemmastatementpar} |
2021 |
{Add-back occurs with approximate probability \mbox{\boldmath$2/b$}} |
2022 |
\label{lem:ccil0:sidv0:sgdu0:05} |
2023 |
The estimate of $q$ provided by (\ref{eq:ccil0:sidv0:sgdu0:01}), |
2024 |
\end{vworklemmastatementpar} |
2025 |
\begin{vworklemmaproof} |
2026 |
\end{vworklemmaproof} |
2027 |
\vworklemmafooter{} |
2028 |
|
2029 |
|
2030 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2031 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2032 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2033 |
\subsection{Division Of Signed Operands} |
2034 |
|
2035 |
|
2036 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2037 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2038 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2039 |
\subsection{Large Dividends With Machine-Native Divisors} |
2040 |
\label{ccil0:sidv0:sldm0} |
2041 |
|
2042 |
Division of arbitrary-sized operands (Section \ref{ccil0:sidv0:sgdu0}) is |
2043 |
a costly operation. In many practical applications, we are able to exploit |
2044 |
data sizes of operands or special relationships between the values of |
2045 |
operands to use the instruction set of the machine more effectively. |
2046 |
In this subsection, we investigate what optimizations we may achieve when: |
2047 |
|
2048 |
\begin{itemize} |
2049 |
\item We wish to calculate the quotient and remainder of |
2050 |
unsigned integers $p$ and $q$: $p/q$ and |
2051 |
$p \bmod{} q$; \emph{and} |
2052 |
\item The machine possesses unsigned division instructions |
2053 |
which provide both a quotient and a remainder from |
2054 |
a division; \emph{and} |
2055 |
\item The bitsize of the divisor $q$ is not larger than can |
2056 |
be accomodated (as a divisor) by machine division instructions. |
2057 |
\end{itemize} |
2058 |
|
2059 |
Processors which possess integer division instructions usually |
2060 |
possess one of two types of instructions: |
2061 |
|
2062 |
\begin{itemize} |
2063 |
\item Instructions where the the divisor, quotient and remainder are |
2064 |
$Q$ bits, but the dividend is $2Q$ bits (we call these |
2065 |
``large dividend'' instructions). For example, an |
2066 |
instruction which accepts a 16-bit dividend and an |
2067 |
8-bit divisor to produce an 8-bit quotient and an 8-bit remainder is |
2068 |
typical. |
2069 |
With such instructions, overflow is possible, and is always detectable. |
2070 |
However, in this subsection we never describe algorithms which detect |
2071 |
overflow---instead, we arrange for data values which cannot generate an |
2072 |
overflow. |
2073 |
\item Instructions where the dividend, divisor, quotient, and remainder |
2074 |
are all $Q$ bits (we call these ``small dividend'' instructions). |
2075 |
With such instructions, overflow is not possible. |
2076 |
\end{itemize} |
2077 |
|
2078 |
We call the bitsize $Q$ a \emph{chunk}. We use \emph{chunk} rather than |
2079 |
\emph{word} because the chunksize and wordsize in general are not |
2080 |
required to be the same. |
2081 |
|
2082 |
For the remainder of this discussion, we assume large dividend |
2083 |
instructions (the first category above). |
2084 |
The algorithms developed can be implemented on small dividend machines |
2085 |
by halving data sizes so that the divisor fills no more than half |
2086 |
of the available bits. |
2087 |
|
2088 |
We assume that we are interested in calculating the integer quotient |
2089 |
$\lfloor{}p/q\rfloor$ and remainder $p \bmod{} q$ of two unsigned |
2090 |
integers $p$ and $q$, where $p$ is of size $P$ bits and $q$ is |
2091 |
of size $Q$ bits. For simplicity and without detracting from the |
2092 |
generality of the solution, we assume that $Q \vworkdivides{} P$. |
2093 |
|
2094 |
We then seek to calculate |
2095 |
|
2096 |
\begin{eqnarray} |
2097 |
\label{eq:ccil0:sidv0:sldm0:001} |
2098 |
\frac{p}{q} & = & \frac{2^{P-Q} p_{[P/Q-1]} + 2^{P-2Q} p_{[P/Q-2]} + \ldots{} + 2^{Q} p_{[1]} + p_{[0]}}{q_{[0]}} \\ |
2099 |
\nonumber & = & \frac{\sum_{i=0}^{P/Q-1} 2^{iQ} p_{[i]}}{q_{[0]}} . |
2100 |
\end{eqnarray} |
2101 |
|
2102 |
\noindent{}Using the integer identity |
2103 |
|
2104 |
\begin{equation} |
2105 |
\label{eq:ccil0:sidv0:sldm0:002} |
2106 |
\frac{a}{b} = \left\lfloor{\frac{a}{b}}\right\rfloor + \frac{a \bmod b}{b} , |
2107 |
\end{equation} |
2108 |
|
2109 |
\noindent{}we can reform (\ref{eq:ccil0:sidv0:sldm0:001}) into |
2110 |
|
2111 |
\begin{eqnarray} |
2112 |
\nonumber |
2113 |
\frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\ |
2114 |
\label{eq:ccil0:sidv0:sldm0:003} |
2115 |
& + & 2^{P-Q} \frac{p_{[P/Q-1]}\bmod q_{[0]}}{q_{[0]}} |
2116 |
+ 2^{P-2Q} \frac{p_{[P/Q-2]}}{q_{[0]}} \\ |
2117 |
\nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} , |
2118 |
\end{eqnarray} |
2119 |
|
2120 |
\noindent{}which can be polished slightly to yield |
2121 |
|
2122 |
\begin{eqnarray} |
2123 |
\nonumber |
2124 |
\frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\ |
2125 |
\label{eq:ccil0:sidv0:sldm0:004} |
2126 |
& + & 2^{P-2Q} \frac{2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}}{q_{[0]}} \\ |
2127 |
\nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} . |
2128 |
\end{eqnarray} |
2129 |
|
2130 |
Note in (\ref{eq:ccil0:sidv0:sldm0:004}) that the first term, |
2131 |
$\lfloor{}p_{[P/Q-1]} / q_{[0]}\rfloor$, as well as |
2132 |
a portion of the second term, $p_{[P/Q-1]}\bmod q_{[0]}$, can be |
2133 |
calculated using a single machine division instruction with |
2134 |
$p_{[P/Q-1]}$ as the dividend and $q_{[0]}$ as the divisor. |
2135 |
Note also that multiplication of an integer by a power of 2 |
2136 |
can be achieved by placing the integer correctly within the |
2137 |
result. In this regard note that $Q=8$ or $Q=16$ are |
2138 |
the most typical cases, and so the placement can be achieved simply |
2139 |
by selecting the correct memory address. |
2140 |
|
2141 |
It is initially unclear whether we can evaluate or reduce the fraction in the |
2142 |
second term of (\ref{eq:ccil0:sidv0:sldm0:004}), |
2143 |
$[2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}] / q_{[0]}$, |
2144 |
using a single large dividend machine instruction, because |
2145 |
the upper chunk of the dividend is populated with non-zero bits |
2146 |
(specifically, $p_{[P/Q-1]}\bmod q_{[0]}$), and it seems that |
2147 |
a division overflow may be possible. However, with some thought, |
2148 |
it is clear that $p_{[P/Q-1]}\bmod q_{[0]} \leq q_{[0]} - 1$ and |
2149 |
$p_{[P/Q-2]} \leq 2^Q - 1$, thus the largest numerator possible |
2150 |
is $2^Q q_{[0]} - 1$, which, when divided by $q_{[0]}$, will result |
2151 |
in a quotient and remainder of $2^Q - 1$. Thus, no division overflow |
2152 |
can occur, and the fraction in the |
2153 |
second term of (\ref{eq:ccil0:sidv0:sldm0:004}) can be evaluated |
2154 |
using a large divisor integer machine instruction. |
2155 |
|
2156 |
The fraction in the |
2157 |
second term of (\ref{eq:ccil0:sidv0:sldm0:004}) can be simplified |
2158 |
using (\ref{eq:ccil0:sidv0:sldm0:002}) to yield: |
2159 |
|
2160 |
\begin{eqnarray} |
2161 |
\nonumber |
2162 |
\frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\ |
2163 |
\label{eq:ccil0:sidv0:sldm0:005} |
2164 |
& + & 2^{P-2Q} \left\lfloor{\frac{2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}}{q_{[0]}}}\right\rfloor \\ |
2165 |
\nonumber & + & 2^{P-2Q} \frac{(2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}) \bmod q_{[0]}}{q_{[0]}} \\ |
2166 |
\nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} . |
2167 |
\end{eqnarray} |
2168 |
|
2169 |
The process of combining adjacent terms can be continued until all |
2170 |
divisions and modulo operations necessary can be carried out using |
2171 |
long dividend division instructions. If we envision a |
2172 |
long-dividend division instruction as a functional block that |
2173 |
accepts a $2Q$-bit dividend and a $Q$-bit divisor to produce a |
2174 |
$Q$-bit quotient and a $Q$-bit remainder |
2175 |
(Figure \ref{fig:ccil0:sidv0:sldm0:00}), then we can draw the |
2176 |
entire division as outlined by (\ref{eq:ccil0:sidv0:sldm0:005}) |
2177 |
as shown in Figure \ref{fig:ccil0:sidv0:sldm0:01}. |
2178 |
|
2179 |
\begin{figure} |
2180 |
\centering |
2181 |
\includegraphics[width=4.6in]{c_cil0/lddvblk.eps} |
2182 |
\caption{Long Dividend Division Machine Instruction As A Functional Block} |
2183 |
\label{fig:ccil0:sidv0:sldm0:00} |
2184 |
\end{figure} |
2185 |
|
2186 |
\begin{figure} |
2187 |
\centering |
2188 |
\includegraphics[width=4.6in]{c_cil0/ldmnblk.eps} |
2189 |
\caption{Long Dividend/Machine-Native Divisor Division In Functional Block Form} |
2190 |
\label{fig:ccil0:sidv0:sldm0:01} |
2191 |
\end{figure} |
2192 |
|
2193 |
The following example illustrates how to apply the technique. |
2194 |
|
2195 |
\begin{vworkexamplestatement} |
2196 |
\label{ex:ccil0:sidv0:sldm0:01} |
2197 |
Implement 32/8 unsigned division on the TMS370C8 processor, which is |
2198 |
characterized by a 16/8 division instruction. |
2199 |
\end{vworkexamplestatement} |
2200 |
\begin{vworkexampleparsection}{Solution} |
2201 |
It would be possible to prepare an implementation directly from |
2202 |
Figure \ref{fig:ccil0:sidv0:sldm0:01}: however, it may be |
2203 |
more instructive to work through a solution without the |
2204 |
aid of this figure. |
2205 |
|
2206 |
In the case of the TMS370C8, the chunk size $Q$ is 8 bits; therefore |
2207 |
$Q=8$. The problem statement indicates that we must accept 32-bit dividends; |
2208 |
therefore $P=32$. Thus |
2209 |
|
2210 |
\begin{equation} |
2211 |
\label{eq:ex:ccil0:sidv0:sldm0:01:001} |
2212 |
p = 2^{24} p_{[3]} + 2^{16} p_{[2]} + 2^{8} p_{[1]} + p_{[0]} |
2213 |
\end{equation} |
2214 |
|
2215 |
\noindent{}and |
2216 |
|
2217 |
\begin{equation} |
2218 |
\label{eq:ex:ccil0:sidv0:sldm0:01:002} |
2219 |
q = q_{[0]} . |
2220 |
\end{equation} |
2221 |
|
2222 |
\noindent{}Thus the quotient and remainder we would like to determine, |
2223 |
$\lfloor p/q \rfloor$ and $p \bmod q$, can be obtained by repeated |
2224 |
application of (\ref{eq:ccil0:sidv0:sldm0:002}) as shown |
2225 |
in Equations (\ref{eq:ex:ccil0:sidv0:sldm0:01:003}) |
2226 |
through (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}). |
2227 |
|
2228 |
\begin{equation} |
2229 |
\label{eq:ex:ccil0:sidv0:sldm0:01:003} |
2230 |
\frac{p}{q} |
2231 |
= |
2232 |
\frac{2^{24} p_{[3]} + 2^{16} p_{[2]} + 2^{8} p_{[1]} + p_{[0]}}{q_{[0]}} |
2233 |
\end{equation} |
2234 |
|
2235 |
\begin{equation} |
2236 |
\label{eq:ex:ccil0:sidv0:sldm0:01:004} |
2237 |
\frac{p}{q} |
2238 |
= |
2239 |
2^{24} \frac{p_{[3]}}{q_{[0]}} |
2240 |
+ |
2241 |
2^{16} \frac{p_{[2]}}{q_{[0]}} |
2242 |
+ |
2243 |
2^{8} \frac{p_{[1]}}{q_{[0]}} |
2244 |
+ |
2245 |
\frac{p_{[0]}}{q_{[0]}} |
2246 |
\end{equation} |
2247 |
|
2248 |
\begin{equation} |
2249 |
\label{eq:ex:ccil0:sidv0:sldm0:01:005} |
2250 |
\frac{p}{q} |
2251 |
= |
2252 |
2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2253 |
+ |
2254 |
2^{24} \frac{(p_{[3]} \bmod q_{[0]})}{q_{[0]}} |
2255 |
+ |
2256 |
2^{16} \frac{p_{[2]}}{q_{[0]}} |
2257 |
+ |
2258 |
2^{8} \frac{p_{[1]}}{q_{[0]}} |
2259 |
+ |
2260 |
\frac{p_{[0]}}{q_{[0]}} |
2261 |
\end{equation} |
2262 |
|
2263 |
\begin{equation} |
2264 |
\label{eq:ex:ccil0:sidv0:sldm0:01:006} |
2265 |
\frac{p}{q} |
2266 |
= |
2267 |
2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2268 |
+ |
2269 |
2^{16} \frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}} |
2270 |
+ |
2271 |
2^{8} \frac{p_{[1]}}{q_{[0]}} |
2272 |
+ |
2273 |
\frac{p_{[0]}}{q_{[0]}} |
2274 |
\end{equation} |
2275 |
|
2276 |
\begin{eqnarray} |
2277 |
\label{eq:ex:ccil0:sidv0:sldm0:01:007} |
2278 |
\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2279 |
+ |
2280 |
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |
2281 |
\nonumber & + & |
2282 |
2^{16} \frac{(2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}}{q_{[0]}} |
2283 |
+ |
2284 |
2^{8} \frac{p_{[1]}}{q_{[0]}} |
2285 |
+ |
2286 |
\frac{p_{[0]}}{q_{[0]}} |
2287 |
\end{eqnarray} |
2288 |
|
2289 |
\begin{eqnarray} |
2290 |
\label{eq:ex:ccil0:sidv0:sldm0:01:008} |
2291 |
\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2292 |
+ |
2293 |
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |
2294 |
\nonumber & + & |
2295 |
2^{8} \frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}} |
2296 |
+ |
2297 |
\frac{p_{[0]}}{q_{[0]}} |
2298 |
\end{eqnarray} |
2299 |
|
2300 |
\begin{eqnarray} |
2301 |
\nonumber\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2302 |
+ |
2303 |
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |
2304 |
\label{eq:ex:ccil0:sidv0:sldm0:01:009} |
2305 |
& + & |
2306 |
2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor \\ |
2307 |
\nonumber & + & |
2308 |
2^{8} \frac{(2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}}{q_{[0]}} \\ |
2309 |
\nonumber & + & |
2310 |
\frac{p_{[0]}}{q_{[0]}} |
2311 |
\end{eqnarray} |
2312 |
|
2313 |
\begin{eqnarray} |
2314 |
\nonumber\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2315 |
+ |
2316 |
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |
2317 |
\label{eq:ex:ccil0:sidv0:sldm0:01:010} |
2318 |
& + & |
2319 |
2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor \\ |
2320 |
\nonumber & + & |
2321 |
\frac{2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}}{q_{[0]}} |
2322 |
\end{eqnarray} |
2323 |
|
2324 |
\begin{eqnarray} |
2325 |
\nonumber & \displaystyle \frac{p}{q} = 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |
2326 |
+ |
2327 |
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor & \\ |
2328 |
\label{eq:ex:ccil0:sidv0:sldm0:01:011} |
2329 |
& \displaystyle + |
2330 |
2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor & \\ |
2331 |
\nonumber & \displaystyle + |
2332 |
\left\lfloor{\frac{2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}}{q_{[0]}}}\right\rfloor + & \\ |
2333 |
\nonumber |
2334 |
& \displaystyle \frac{(2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}) \bmod q_{[0]}}{q_{[0]}} |
2335 |
& |
2336 |
\end{eqnarray} |
2337 |
|
2338 |
Note several things about the implementation suggested by |
2339 |
(\ref{eq:ex:ccil0:sidv0:sldm0:01:011}): |
2340 |
|
2341 |
\begin{itemize} |
2342 |
\item No addition or multiplication is required to calculate terms such as |
2343 |
$2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}$. The high-order byte of the |
2344 |
large dividend can be stuffed with $p_{[3]} \bmod q_{[0]}$ and |
2345 |
the low-order byte with $p_{[2]}$. |
2346 |
\item No addition or multiplication is required to calculate the |
2347 |
result $\lfloor p/q \rfloor$. |
2348 |
Note in (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}) that the results are |
2349 |
conveniently grouped as bytes with multipliers of $2^{24}$, |
2350 |
$2^{16}$, $2^8$, and $2^0=1$. The terms can simply be placed into |
2351 |
the appropriate byte, as a way of multplication by the appropriate |
2352 |
power of 2. Note also that each term is guaranteed to be |
2353 |
$\in \{0, 1, 2, \ldots{} , 255\}$, by the argument presented |
2354 |
earlier for (\ref{eq:ccil0:sidv0:sldm0:004}). Thus, the |
2355 |
addition will result in no carries, and each result byte can simply |
2356 |
be placed directly in the correct memory location---addition is |
2357 |
not necessary. |
2358 |
\item Four machine division instructions are required, and the remainder |
2359 |
is produced automatically by the fourth instruction. |
2360 |
\end{itemize} |
2361 |
|
2362 |
An implemenation for the TMS370C8 is supplied as Figure |
2363 |
\ref{fig:ex:ccil0:sidv0:sldm0:01:01}. A block diagram of the data |
2364 |
flow for this implementation is supplied as |
2365 |
Figure \ref{fig:ex:ccil0:sidv0:sldm0:01:02}. |
2366 |
\end{vworkexampleparsection} |
2367 |
\vworkexamplefooter{} |
2368 |
|
2369 |
\begin{figure} |
2370 |
\begin{verbatim} |
2371 |
;Assume that byte memory locations p3, p2, p1, and p0 contain the |
2372 |
;32-bit unsigned dividend, and byte q0 contains the 8-bit unsigned |
2373 |
;divisor. Assume also that the result quotient will be placed |
2374 |
;in byte memory locations d3, d2, d1, and d0; and that the |
2375 |
;remainder will be placed in the byte memory location r0. Further |
2376 |
;assume that all memory locations are in the register file (near). |
2377 |
CLR A ;High-order chunk of large divisor |
2378 |
;must be 0. |
2379 |
MOV p3, B ;Load the low-order chunk of divisor. |
2380 |
DIV q0, A ;Perform the first division. |
2381 |
MOV A, d3 ;Quotient becomes this part of the |
2382 |
;result. |
2383 |
MOV B, A ;Remainder becomes high-order chunk of |
2384 |
;next division. |
2385 |
MOV p2, B ;Next byte becomes low-order chunk. |
2386 |
DIV q0, A ;Do the second division. |
2387 |
MOV A, d2 ;Quotient becomes this part of the |
2388 |
;result. |
2389 |
MOV B, A ;Remainder becomes high-order chunk of |
2390 |
;next division. |
2391 |
MOV p1, B ;Next byte becomes low-order chunk. |
2392 |
DIV q0, A ;Do the third division. |
2393 |
MOV A, d1 ;Quotient becomes this part of the |
2394 |
;result. |
2395 |
MOV B, A ;Remainder becomes high-order chunk of |
2396 |
;next division. |
2397 |
MOV p0, B ;Next byte becomes low-order chunk. |
2398 |
DIV q0, A ;Do the fourth division. |
2399 |
MOV A, d0 ;Quotient becomes this part of the |
2400 |
;result. |
2401 |
MOV B, r0 ;This is the remainder, which could be used |
2402 |
;for rounding. |
2403 |
\end{verbatim} |
2404 |
\caption{TMS370C8 Code Snippet Illustrating Unsigned 32/8 |
2405 |
Division Using Unsigned 16/8 |
2406 |
Machine Instructions (Example \ref{ex:ccil0:sidv0:sldm0:01})} |
2407 |
\label{fig:ex:ccil0:sidv0:sldm0:01:01} |
2408 |
\end{figure} |
2409 |
|
2410 |
\begin{figure} |
2411 |
\centering |
2412 |
\includegraphics[width=4.6in]{c_cil0/t370dmp.eps} |
2413 |
\caption{Block Diagram Of Data Flow Of Figure \ref{fig:ex:ccil0:sidv0:sldm0:01:01}} |
2414 |
\label{fig:ex:ccil0:sidv0:sldm0:01:02} |
2415 |
\end{figure} |
2416 |
|
2417 |
|
2418 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2419 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2420 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2421 |
\section{Miscellaneous Integer Mappings} |
2422 |
%Section tag: MIM0 |
2423 |
\label{ccil0:smim0} |
2424 |
|
2425 |
Embedded system work and ROM constraints often inspire a great deal |
2426 |
of cleverness in the selection of instructions to perform mappings or |
2427 |
tests. In this section, we discuss integer mappings (i.e. functions) |
2428 |
for which economical implementations are known; and in the next section |
2429 |
(Section \ref{ccil0:smit0}) |
2430 |
we discuss integer tests for which economical implementations are known. |
2431 |
|
2432 |
To the best of our knowledge, there is no way to derive these mappings |
2433 |
and tests---they have been collected from many software developers and |
2434 |
come from human intuition and experience. |
2435 |
|
2436 |
|
2437 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2438 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2439 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2440 |
\subsection{Lowest-Order Bit} |
2441 |
%Subsection tag: LIB0 |
2442 |
\label{ccil0:smim0:slib0} |
2443 |
|
2444 |
\index{lowest-order bit} |
2445 |
\index{least significant bit} |
2446 |
|
2447 |
The mapping |
2448 |
|
2449 |
\texttt{mask = x \& -x} |
2450 |
|
2451 |
\noindent{}is the most economical way known to extract the |
2452 |
lowest-order bit set in an integer \texttt{x}, or |
2453 |
0 if no bits are set.\footnote{This mapping was contributed by |
2454 |
David Baker (\texttt{bakerda@engin.umich.edu}) |
2455 |
and Raul Selgado (\texttt{rselgado@visteon.com}).} Since most processors have an instruction to form the |
2456 |
two's complement of an integer, this mapping usually requires only |
2457 |
two arithmetic instructions. |
2458 |
|
2459 |
When implementing this mapping in assembly-language on processors without a |
2460 |
two's complement instruction, two other possible implementations are: |
2461 |
|
2462 |
\begin{itemize} |
2463 |
\item \texttt{mask = x \& ($\sim$x + 1)} |
2464 |
\item \texttt{mask = x \& ((x \^{ } -1) + 1)} |
2465 |
\end{itemize} |
2466 |
|
2467 |
|
2468 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2469 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2470 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2471 |
\section{Miscellaneous Integer Tests} |
2472 |
%Section tag: MIT0 |
2473 |
\label{ccil0:smit0} |
2474 |
|
2475 |
\subsection{Power Of 2} |
2476 |
%Subsection tag: PTW0 |
2477 |
\label{ccil0:smit0:sptw0} |
2478 |
|
2479 |
\index{power of two} |
2480 |
\index{2N@$2^N$} |
2481 |
|
2482 |
The test |
2483 |
|
2484 |
\texttt{(x \& (x-1) == 0) \&\& (x != 0)} |
2485 |
|
2486 |
\noindent{}is the most economical way known to |
2487 |
test whether an integer is a positive power of two |
2488 |
(1, 2, 4, 8, 16, etc.).\footnote{The test appeared as part of |
2489 |
a discussion on |
2490 |
the GMP mailing list in 2002.} |
2491 |
|
2492 |
|
2493 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2494 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2495 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2496 |
\section{Exercises} |
2497 |
|
2498 |
\begin{vworkexercisestatement} |
2499 |
\label{exe:ccil0:sexe0:01} |
2500 |
Show that any $m$-bit two's complement integer $u_{[m-1:0]}$ except |
2501 |
$-2^{m-1}$ can be negated by forming the one's complement, then adding one. |
2502 |
\end{vworkexercisestatement} |
2503 |
|
2504 |
|
2505 |
|
2506 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2507 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2508 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2509 |
\vfill |
2510 |
\noindent\begin{figure}[!b] |
2511 |
\noindent\rule[-0.25in]{\textwidth}{1pt} |
2512 |
\begin{tiny} |
2513 |
\begin{verbatim} |
2514 |
$HeadURL$ |
2515 |
$Revision$ |
2516 |
$Date$ |
2517 |
$Author$ |
2518 |
\end{verbatim} |
2519 |
\end{tiny} |
2520 |
\noindent\rule[0.25in]{\textwidth}{1pt} |
2521 |
\end{figure} |
2522 |
|
2523 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2524 |
% |
2525 |
%End of file C_CIL0.TEX |