1 |
dashley |
4 |
%$Header: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/dtaipubs/esrgubka/c_cil0/c_cil0.tex,v 1.24 2003/11/03 02:14:24 dtashley Exp $
|
2 |
|
|
|
3 |
|
|
\chapter[\ccilzeroshorttitle{}]{\ccilzerolongtitle{}}
|
4 |
|
|
|
5 |
|
|
\label{ccil0}
|
6 |
|
|
|
7 |
|
|
\beginchapterquote{``If our ancestors had invented arithmetic by counting with
|
8 |
|
|
their two fists or their eight fingers, instead of their
|
9 |
|
|
ten `digits', we would never have to worry about
|
10 |
|
|
writing binary-decimal conversion routines.
|
11 |
|
|
(And we would perhaps never have learned as much about
|
12 |
|
|
number systems.)''}
|
13 |
|
|
{Donald E. Knuth, \cite[p. 319]{bibref:b:knuthclassic2ndedvol2}}
|
14 |
|
|
|
15 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
16 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
17 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
18 |
|
|
\section{Introduction}
|
19 |
|
|
%Section tag: INT0
|
20 |
|
|
\label{ccil0:sint0}
|
21 |
|
|
|
22 |
|
|
Low-cost microcontrollers have no support for floating-point arithmetic,
|
23 |
|
|
and so integer arithmetic and fixed-point arithmetic are used nearly exclusively
|
24 |
|
|
in embedded systems. The ability to implement integer arithmetic
|
25 |
|
|
economically is a critical skill in the development of embedded
|
26 |
|
|
systems.
|
27 |
|
|
|
28 |
|
|
Integer arithmetic algorithms are critically important in embedded
|
29 |
|
|
systems for the following reasons:
|
30 |
|
|
|
31 |
|
|
\begin{itemize}
|
32 |
|
|
\item Mistakes in the implementation of arithmetic are frequently
|
33 |
|
|
responsible for product problems. (Mistakes are not confined
|
34 |
|
|
to obvious errors---errors such as filters which do not converge
|
35 |
|
|
on their input are also responsible for product problems.)
|
36 |
|
|
\item Floating-point arithmetic is not available or ill-advised
|
37 |
|
|
for nearly all small embedded systems for the following reasons:
|
38 |
|
|
\begin{itemize}
|
39 |
|
|
\item Low-cost microcontrollers do not possess hardware support for
|
40 |
|
|
floating-point arithmetic.
|
41 |
|
|
\item Implementation of floating-point arithmetic in software is
|
42 |
|
|
computationally expensive.
|
43 |
|
|
\item Implementation of floating-point arithmetic in software may
|
44 |
|
|
require large floating-point libraries, typically consuming
|
45 |
|
|
1K-4K of ROM.
|
46 |
|
|
\item Safety-critical software standards typically prohibit the
|
47 |
|
|
use of floating-point arithmetic.
|
48 |
|
|
\end{itemize}
|
49 |
|
|
\item Integer arithmetic algorithms (other than addition and subtraction)
|
50 |
|
|
are quite tedious and error-prone for a software developer to design, implement, and
|
51 |
|
|
unit test. The implementation of such algorithms represents
|
52 |
|
|
cost and risk. Cost and risk benefits are achieved if the algorithms in detail are
|
53 |
|
|
available in advance (thus precluding design activities), or
|
54 |
|
|
better yet if ready-to-use integer algorithm libraries are available.
|
55 |
|
|
\end{itemize}
|
56 |
|
|
|
57 |
|
|
This chapter describes the more fundamental principles and algorithms
|
58 |
|
|
(representation, fixed-point arithmetic, treatment of overflow, comparison,
|
59 |
|
|
addition, subtraction, multiplication, and division). A section
|
60 |
|
|
(Section \ref{ccil0:smim0}) is also included
|
61 |
|
|
on miscellaneous mappings involving integers which
|
62 |
|
|
are not numerical in intent.
|
63 |
|
|
Chapter %\cdtazeroxrefhyphen\cdtazerovolarabic{}
|
64 |
|
|
TBD
|
65 |
|
|
describes more complicated
|
66 |
|
|
integer algorithms and techniques (discrete-time operations
|
67 |
|
|
such as filtering, integration, and differentiation as well as more
|
68 |
|
|
complex functions such as square root). The split between these two chapters
|
69 |
|
|
is arbitrary; and in fact the material could have been divided differently
|
70 |
|
|
or combined.
|
71 |
|
|
|
72 |
|
|
Treatment of the topics in this chapter is largely in accordance with
|
73 |
|
|
Knuth \cite{bibref:b:knuthclassic2ndedvol2}. The principal issues in
|
74 |
|
|
the implementation of integer algorithms are:
|
75 |
|
|
|
76 |
|
|
\begin{itemize}
|
77 |
|
|
\item \textbf{How to use the arithmetic [or other] instructions provided by the machine to
|
78 |
|
|
operate on larger operands.} Microcontrollers typically provide arithmetic
|
79 |
|
|
instructions (comparison, shifting, addition, subtraction, and often but not
|
80 |
|
|
always multiplication and/or division) that operate on 8-bit or 16-bit integers.
|
81 |
|
|
A key question
|
82 |
|
|
is how small-operand instructions ``scale up''---that is, if and how they can
|
83 |
|
|
be used to assist in the implementation of integer arithmetic for much larger
|
84 |
|
|
operands.
|
85 |
|
|
\item \textbf{The order of the algorithm involved.} The order of algorithms
|
86 |
|
|
is a complicated issue when applied to microcontroller work. Many sophisticated
|
87 |
|
|
algorithms have a breakpoint below which they are less economical than
|
88 |
|
|
an inferior algorithm. Some applications (such as generating cryptographic keys
|
89 |
|
|
when integers thousands of bits long must be tested for primality) will
|
90 |
|
|
benefit from sophisticated algorithms becuase the operand sizes are large enough
|
91 |
|
|
to pass any such breakpoints. However, in microcontroller work, the need to manipulate
|
92 |
|
|
integers longer than 64 bits is very rare; thus, the breakpoints that indicate the
|
93 |
|
|
use of more sophisticated algorithms may not be reached. In microcontroller work,
|
94 |
|
|
depending on the operand sizes, there are circumstances in which an
|
95 |
|
|
$O(n^2)$ algorithm may be preferable to an $O(\log n)$ algorithm. Generally,
|
96 |
|
|
the order of algorithms must be balanced against operand sizes.
|
97 |
|
|
\end{itemize}
|
98 |
|
|
|
99 |
|
|
We do diverge from Knuth in some areas. The most prominent divergence is
|
100 |
|
|
in the proofs offered for some important theorems and lemmas. Knuth
|
101 |
|
|
employs contrapositive proof formats in many circumstances, whereas we prefer
|
102 |
|
|
to use linear proofs that are more understandable to engineers and microcontroller
|
103 |
|
|
software developers.
|
104 |
|
|
|
105 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
106 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
107 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
108 |
|
|
\section[Paradigms And Principles]
|
109 |
|
|
{Paradigms And Principles Of Microcontroller Arithmetic}
|
110 |
|
|
%Section tag: PPM0
|
111 |
|
|
\label{ccil0:sppm0}
|
112 |
|
|
|
113 |
|
|
How should one think about microcontroller arithmetic? What principles
|
114 |
|
|
guide us in its design and implementation? In this section,
|
115 |
|
|
we provide some general principles and paradigms of thought.
|
116 |
|
|
|
117 |
|
|
|
118 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
119 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
120 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
121 |
|
|
\subsection{Microcontroller Arithmetic As An Accident Of Silicon Design}
|
122 |
|
|
%Subsection tag: MAS0
|
123 |
|
|
\label{ccil0:sppm0:smas0}
|
124 |
|
|
|
125 |
|
|
In chapters
|
126 |
|
|
\cfryzeroxrefhyphen{}\ref{cfry0},
|
127 |
|
|
\ccfrzeroxrefhyphen{}\ref{ccfr0},
|
128 |
|
|
and
|
129 |
|
|
\cratzeroxrefhyphen{}\ref{crat0}
|
130 |
|
|
we consider rational approximation,
|
131 |
|
|
both in the form $h/k$ and $h/2^q$. Both forms of rational approximation
|
132 |
|
|
tend to be effective because we know that all modern processors possess
|
133 |
|
|
shift instructions, most possess integer multiply instructions, and many
|
134 |
|
|
possess integer divide instructions. In other words, the design
|
135 |
|
|
of the machine instruction set drives the strategies for implementation
|
136 |
|
|
of arithmetic, and makes some strategies attractive.
|
137 |
|
|
|
138 |
|
|
Similarly, the observation that all microcontrollers provide instructions
|
139 |
|
|
for integer arithmetic creates the attractiveness of fixed-point arithmetic.
|
140 |
|
|
|
141 |
|
|
Thus, we might view our approaches to microcontroller arithmetic as
|
142 |
|
|
an ``accident'' of silicon design, or as being driven by silicon
|
143 |
|
|
design.
|
144 |
|
|
|
145 |
|
|
Generally, we seek to determine the best way to use the primitive
|
146 |
|
|
operations provided by the machine (the instruction set) to
|
147 |
|
|
accomplish the mappings of interest.
|
148 |
|
|
|
149 |
|
|
The ``classic'' algorithms
|
150 |
|
|
presented by Knuth
|
151 |
|
|
Knuth (\cite[pp. 265-284]{bibref:b:knuthclassic2ndedvol2}) are especially
|
152 |
|
|
designed to use the ``small'' addition, subtraction, multiplication, and
|
153 |
|
|
division provided by the machine to add, subtract, multiply, and divide arbitrarily
|
154 |
|
|
large integers. In
|
155 |
|
|
\cite[pp. 265-266]{bibref:b:knuthclassic2ndedvol2}) Knuth writes:
|
156 |
|
|
|
157 |
|
|
\begin{quote}
|
158 |
|
|
\emph{The most important fact to understand about extended-precision numbers
|
159 |
|
|
is that they may be regarded as numbers written in radix-$w$ notation,
|
160 |
|
|
where $w$ is the computer's word size. For example, an integer that
|
161 |
|
|
fills 10 words on a computer whose word size is $w=10^{10}$ has 100
|
162 |
|
|
decimal digits; but we will consider it to be a 10-place number to
|
163 |
|
|
the base $10^{10}$. This viewpoint is justified for the same reason
|
164 |
|
|
that we may convert, say, from binary to hexadecimal notation,
|
165 |
|
|
simply by grouping the bits together.}
|
166 |
|
|
|
167 |
|
|
\emph{In these terms, we are given the following primitive operations to work with:}
|
168 |
|
|
|
169 |
|
|
\begin{itemize}
|
170 |
|
|
\item \emph{a$_0$) addition or subtraction of one-place integers, giving a one-place
|
171 |
|
|
answer and a carry;}
|
172 |
|
|
\item \emph{b$_0$) multiplication of a one-place integer by another one-place integer,
|
173 |
|
|
giving a two place answer;}
|
174 |
|
|
\item \emph{c$_0$) division of a two-place integer by a one-place integer,
|
175 |
|
|
provided that the quotient is a one-place integer, and yielding
|
176 |
|
|
also a one-place remainder.}
|
177 |
|
|
\end{itemize}
|
178 |
|
|
|
179 |
|
|
\noindent{}\emph{By adjusting the word size, if necessary, nearly all computers
|
180 |
|
|
will have these three operations available; so we will construct algorithms
|
181 |
|
|
(a), (b), and (c) mentioned above in terms of the primitive operations
|
182 |
|
|
(a$_0$), (b$_0$), and (c$_0$).}
|
183 |
|
|
|
184 |
|
|
\emph{Since we are visualizing extended-precision integers as base $b$ numbers, it is
|
185 |
|
|
sometimes helpful to think of the situation when $b = 10$, and to imagine
|
186 |
|
|
that we are doing the arithmetic by hand. Then operation (a$_0$) is analogous
|
187 |
|
|
to memorizing the addition table; (b$_0$) is analogous to memorizing the
|
188 |
|
|
multiplication table, and (c$_0$) is essentially memorizing the multiplication
|
189 |
|
|
table in reverse. The more complicated operations (a), (b), (c) on
|
190 |
|
|
high-precision numbers can now be done using simple addition, subraction,
|
191 |
|
|
multiplication, and long-division procedures that children are taught
|
192 |
|
|
in elementary school.}
|
193 |
|
|
\end{quote}
|
194 |
|
|
|
195 |
|
|
The critical issue for implementation of integer arithmetic with large operands
|
196 |
|
|
is how to use small-operand instructions to operate on larger operands---in other words,
|
197 |
|
|
how to ``scale up'' the capability provided by the instruction set.
|
198 |
|
|
|
199 |
|
|
|
200 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
201 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
202 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
203 |
|
|
\subsection{Microcontroller Arithmetic As A Mapping From Quantized Domain To
|
204 |
|
|
Quantized Range}
|
205 |
|
|
%Subsection tag: MAM0
|
206 |
|
|
\label{ccil0:sppm0:smam0}
|
207 |
|
|
|
208 |
|
|
Microcontroller software accepts inputs which are quantized. In nearly all cases,
|
209 |
|
|
this involves a mapping from $\vworkrealset$ to $\vworkintset$. Often, because
|
210 |
|
|
microcontroller products are optimized for cost, the quantization hardware
|
211 |
|
|
delivers quite poor precision, frequently less than 8 bits.
|
212 |
|
|
|
213 |
|
|
When a quantized input is accepted, it defines an inquality. Knowledge of
|
214 |
|
|
the quantized input (an integer) confines the actual input (a real
|
215 |
|
|
number, before
|
216 |
|
|
quantization) to an interval. With a low-cost hardware design, the
|
217 |
|
|
interval can be fairly large. Usually, by adding cost, the
|
218 |
|
|
interval can be made smaller.
|
219 |
|
|
|
220 |
|
|
Microcontroller outputs tend to be quantized as well, so it is
|
221 |
|
|
accurate to also characterize outputs as integers. For example, a PWM signal
|
222 |
|
|
generated by a microcontroller or the output of a D/A converter is
|
223 |
|
|
controlled by data that is an integer. Like inputs, often the ``granularity''
|
224 |
|
|
with which outputs can be controlled is quite coarse---again, 8 bits or
|
225 |
|
|
less is not uncommon.
|
226 |
|
|
|
227 |
|
|
Thus, we may view microcontroller software as a mapping from poor-quality
|
228 |
|
|
inputs to poor-quality outputs.
|
229 |
|
|
|
230 |
|
|
In such a framework, where the nature of inputs and outputs introduces
|
231 |
|
|
substantial error, it is imperative not to introduce additional error
|
232 |
|
|
in computer arithmetic. In other words, given inputs which are
|
233 |
|
|
integers, the responsibility of the software is to choose the best
|
234 |
|
|
integers as outputs. Usually this means that calculations should be
|
235 |
|
|
devised so as not to lose any information (i.e.
|
236 |
|
|
not to lose remainders, for example). Losing information is usually
|
237 |
|
|
equivalent to not being able to make the most correct mapping from input
|
238 |
|
|
to output. ``Lossy'' arithmetic can degrade the performance of a system,
|
239 |
|
|
since poor arithmetic may compound an inexpensive hardware design.
|
240 |
|
|
|
241 |
|
|
|
242 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
243 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
244 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
245 |
|
|
\subsection{Microcontroller Arithmetic As A Simulation Of Continuous Controllers}
|
246 |
|
|
%Subsection tag: MAE0
|
247 |
|
|
\label{ccil0:sppm0:smae0}
|
248 |
|
|
|
249 |
|
|
Control systems have not always employed digital controllers.
|
250 |
|
|
Many books and web sites (see \cite{bibref:w:historycontrol01}, for example)
|
251 |
|
|
discuss the historical development of feedback control. Controllers
|
252 |
|
|
have not historically been digital, or even electronic.
|
253 |
|
|
Early controllers for governing steam devices or windmills were
|
254 |
|
|
ultimately mechanical, and relied upon inertia or other physical
|
255 |
|
|
properties. It is possible to realize abstract notions
|
256 |
|
|
(integrators, differentiators, gains) using hydraulic systems or other mechanical systems;
|
257 |
|
|
and in fact hydraulic feedback controllers were used in early rockets
|
258 |
|
|
and aircraft. Very naturally, abstract notions (integrators,
|
259 |
|
|
differentiators, gains) can be implemented using analog
|
260 |
|
|
electronic components. The most common implementations involve
|
261 |
|
|
operational amplifiers, and the behavior of such implementations comes
|
262 |
|
|
very close to the ideal mathematical models.
|
263 |
|
|
|
264 |
|
|
Mechanical, hydraulic, and non-digital electronic controllers have
|
265 |
|
|
one very desirable characteristic---\emph{clipping}. If, for example,
|
266 |
|
|
one provides an analog differentiator with a $dV/dt$ which
|
267 |
|
|
is too large, the output that the differentiator can
|
268 |
|
|
provide is limited, usually by the supply voltage available to an operational
|
269 |
|
|
amplifier.
|
270 |
|
|
The differentiator \emph{must} clip.
|
271 |
|
|
|
272 |
|
|
Clipping often leads to behavior which is close to what
|
273 |
|
|
intuition would expect (i.e. we would present
|
274 |
|
|
clipping as an occasional advantage). For example, if an input to
|
275 |
|
|
an analog control system suffers a failure, the behavior
|
276 |
|
|
the of the controller is limited, as is its internal
|
277 |
|
|
state. Similarly, when the
|
278 |
|
|
input is restored, the controller will usually recover
|
279 |
|
|
in a reasonable time because the
|
280 |
|
|
state of the controller (typically maintained in capacitors) is limited
|
281 |
|
|
in the magnitude it can attain.
|
282 |
|
|
|
283 |
|
|
We might view a digital controller as an emulation of
|
284 |
|
|
an analog controller. We may want to cause the
|
285 |
|
|
controller to have limits (i.e. rails) internally, for
|
286 |
|
|
example to prevent excessive integrator ``windup''. We discuss
|
287 |
|
|
this further in Section \ref{ccil0:sode0}.
|
288 |
|
|
|
289 |
|
|
|
290 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
291 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
292 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
293 |
|
|
\section{Practical Design Issues}
|
294 |
|
|
%Section tag: PDI0
|
295 |
|
|
\label{ccil0:spdi0}
|
296 |
|
|
|
297 |
|
|
In this section, we consider practical issues surrounding the design
|
298 |
|
|
and construction of a set of integer arithmetic subroutines.
|
299 |
|
|
In practice, such a collection of subroutines is likely to be
|
300 |
|
|
arranged into a library. The purpose of the library would be to
|
301 |
|
|
free the clients (or callers) of the library from the complexity of
|
302 |
|
|
large integer calculations.
|
303 |
|
|
|
304 |
|
|
The design decisions surrounding the construction of a library vary in
|
305 |
|
|
the objectivity with which they can be approached. Some design decisions
|
306 |
|
|
(such as the best mechanism for passing parameters) can be approached
|
307 |
|
|
rigorously because the measures of goodness are unequivocal (minimal ROM consumption
|
308 |
|
|
or execution time, for example). However, other design decisions, particulary the
|
309 |
|
|
decision of the exact nature of the interface between an arithmetic library and
|
310 |
|
|
its clients, are more subjective. One size does \emph{not} fit all.
|
311 |
|
|
|
312 |
|
|
|
313 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
314 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
315 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
316 |
|
|
\subsection{Parameter Passing And Temporary Storage Mechanisms}
|
317 |
|
|
%Subsection tag: PPM0
|
318 |
|
|
\label{ccil0:spdi0:sppm0}
|
319 |
|
|
|
320 |
|
|
In small microcontroller work, the desire to save ROM and execution time
|
321 |
|
|
may lead to inelegant software construction. Because an arithmetic library
|
322 |
|
|
used in microcontroller work may be called from many different places
|
323 |
|
|
throughout ROM, serious thought should be given to optimizing the
|
324 |
|
|
parameter passing mechanisms, even perhaps at the expense of elegance.
|
325 |
|
|
The way in which the arithmetic library allocates temporary storage is
|
326 |
|
|
also a point of concern, because the most elegant way of allocating temporary
|
327 |
|
|
storage (on the stack) may either not be feasible (because of the possibility
|
328 |
|
|
of stack overflow) or may not be efficient (because the addressing modes of
|
329 |
|
|
the machine make data on the stack inefficient to address). In this section
|
330 |
|
|
we discuss both parameter passing and temporary storage mechanisms.
|
331 |
|
|
|
332 |
|
|
In the remainder of the discussion, we make the following assumptions
|
333 |
|
|
about software architecture.
|
334 |
|
|
|
335 |
|
|
\begin{enumerate}
|
336 |
|
|
\item \textbf{The arithmetic library need not be re-entrant.}
|
337 |
|
|
Most ``small'' microcontroller software loads use a non-preemptive
|
338 |
|
|
scheduling paradigm, so this is a reasonable assumption. We also
|
339 |
|
|
make the reasonable assumption that ISR's may not make calls into
|
340 |
|
|
the arithmetic library.
|
341 |
|
|
\item \textbf{Dynamic memory allocation, other than on the stack,
|
342 |
|
|
is not allowed by the software architecture.} This is also
|
343 |
|
|
a reasonable assumption in ``small'' microcontroller software.
|
344 |
|
|
\end{enumerate}
|
345 |
|
|
|
346 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
347 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
348 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
349 |
|
|
\subsubsection{Parameter Passing Mechanisms}
|
350 |
|
|
%Subsubsection tag: PPM0
|
351 |
|
|
\label{ccil0:spdi0:sppm0:sppm0}
|
352 |
|
|
|
353 |
|
|
If an arithmetic library exists in a microcontroller software load,
|
354 |
|
|
it may be called many times throughout ROM. Thus, the parameter
|
355 |
|
|
passing mechanisms chosen may have a large effect on ROM consumption
|
356 |
|
|
(due to the setup required for each subroutine call multiplied by
|
357 |
|
|
many instances throughout ROM) and execution time
|
358 |
|
|
(because in microcontroller software longer instructions nearly always
|
359 |
|
|
require more time). Because of the criticality of ROM consumption,
|
360 |
|
|
parameter passing mechanisms that lack elegance may be attractive.
|
361 |
|
|
|
362 |
|
|
In the category of parameter passing, we also include the way in which
|
363 |
|
|
return value(s) are passed back to the caller.
|
364 |
|
|
|
365 |
|
|
The following parameter-passing mechanisms may be employed:
|
366 |
|
|
|
367 |
|
|
\begin{enumerate}
|
368 |
|
|
\item \textbf{Pass by value as storage class \emph{automatic}.}
|
369 |
|
|
The most common scenario is that the arithmetic
|
370 |
|
|
library is written in assembly-language to be called from
|
371 |
|
|
`C', and so the assembly-language subroutines must adhere to the
|
372 |
|
|
parameter-passing conventions used by the compiler.
|
373 |
|
|
This usually means that the entire input or output value
|
374 |
|
|
will be passed in CPU registers or on the stack. Somewhat rarely,
|
375 |
|
|
a compiler will pass parameters in static locations.\footnote{The
|
376 |
|
|
usual reason for a `C' compiler to pass parameters in static locations
|
377 |
|
|
is because the instruction set of the machine was not designed for
|
378 |
|
|
higher-level languages, and references to [usually \emph{near}] memory
|
379 |
|
|
are cheaper than stack references. Such compilers also typically
|
380 |
|
|
analyze the calling tree of the program where possible and use this
|
381 |
|
|
information to overlay the parameter-passing memory areas of
|
382 |
|
|
subroutines that cannot be active simultaneously. Without the ability
|
383 |
|
|
to analyze the calling tree and make overlay decisions based on it,
|
384 |
|
|
memory would be exhausted, because each subroutine would need to have its
|
385 |
|
|
own static storage for parameters and local variables.}
|
386 |
|
|
\item \textbf{Pass by reference.}
|
387 |
|
|
Typically, it is convenient to pass pointer(s) to area(s) of memory
|
388 |
|
|
containing input operands, and also a pointer to an area of memory
|
389 |
|
|
owned by the caller which is written with the result by the arithmetic subroutine.
|
390 |
|
|
The efficiency of this approach depends on the compiler and the instruction
|
391 |
|
|
set of the machine. If the instruction set of the machine cannot
|
392 |
|
|
make effective use of pointers or stack frames, an arithmetic subroutine
|
393 |
|
|
might be constructed so that it first copies the operands to a static area of memory
|
394 |
|
|
reserved for the arithmetic library, then performs the necessary arithmetic
|
395 |
|
|
operations on the operands in the static area,
|
396 |
|
|
then copies the result(s) back to the area owned by the caller.
|
397 |
|
|
\item \textbf{Pass by common data block.}
|
398 |
|
|
In some cases, it may be preferable to reserve a block of memory in which to
|
399 |
|
|
pass parameters to arithmetic library functions, and from which to retrieve
|
400 |
|
|
results after an arithmetic library function returns. The allocation of such
|
401 |
|
|
a static memory block may be done manually\footnote{Note to self: need to
|
402 |
|
|
include gentleman's agreements on memory usage in my note on software architecture.}
|
403 |
|
|
or automatically (by development tools which can analyze the function calling tree and
|
404 |
|
|
manage the overlaying).
|
405 |
|
|
\end{enumerate}
|
406 |
|
|
|
407 |
|
|
|
408 |
|
|
|
409 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
410 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
411 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
412 |
|
|
\subsubsection{Temporary Storage Mechanisms}
|
413 |
|
|
%Subsubsection tag: TSM0
|
414 |
|
|
\label{ccil0:spdi0:sppm0:stsm0}
|
415 |
|
|
|
416 |
|
|
Need to indicate clearly on section on software architectures the
|
417 |
|
|
primary temporary storage mechanisms:
|
418 |
|
|
|
419 |
|
|
\begin{itemize}
|
420 |
|
|
\item Stack.
|
421 |
|
|
\item Memory block with overlay functionality.
|
422 |
|
|
\end{itemize}
|
423 |
|
|
|
424 |
|
|
Need to expand software architecture section to cover this, so don't
|
425 |
|
|
discuss here.
|
426 |
|
|
|
427 |
|
|
|
428 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
429 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
430 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
431 |
|
|
\subsection{Reporting Of Overflow, Underflow, And Domain Errors}
|
432 |
|
|
%Subsection tag: OUD0
|
433 |
|
|
\label{ccil0:spdi0:soud0}
|
434 |
|
|
|
435 |
|
|
Long integer data types used in microcontroller work are typically
|
436 |
|
|
of a static size (they cannot grow in size in as operations are
|
437 |
|
|
performed on them). The reason for the typical static sizes is that
|
438 |
|
|
dynamic allocation (except for allocation and
|
439 |
|
|
deallocation on the stack as subroutines
|
440 |
|
|
are called and return) is rarely used in small microcontroller work.
|
441 |
|
|
It will come about in the normal usage of an integer arithmetic
|
442 |
|
|
library that an attempt will be made to operate on integers in
|
443 |
|
|
a way which generates an overflow, generates an
|
444 |
|
|
underflow, or
|
445 |
|
|
represents a domain error (division by zero or
|
446 |
|
|
square root of a negative integer, for example).
|
447 |
|
|
An important design decision is how such normal exceptions should be
|
448 |
|
|
handled.
|
449 |
|
|
|
450 |
|
|
Possible design decisions in this area are:
|
451 |
|
|
|
452 |
|
|
\begin{enumerate}
|
453 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:01}
|
454 |
|
|
\textbf{To design arithmetic subroutines so that exceptions are not
|
455 |
|
|
possible.}
|
456 |
|
|
For example, multiplying an $m$-word integer by an $n$-word integer
|
457 |
|
|
will always generate an integer that will fit within $m+n$ words.
|
458 |
|
|
If a multiplication subroutine is designed so that the caller must
|
459 |
|
|
provide an $m$-word operand and an $n$-word operand and a pointer to
|
460 |
|
|
an $(m+n)$-word area of memory for the result, an overflow cannot occur.
|
461 |
|
|
Such a design decision essentially pushes overflow detection back up to
|
462 |
|
|
the callers of arithmetic subroutines.
|
463 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:02}
|
464 |
|
|
\textbf{To design arithmetic subroutines so that exceptions are possible,
|
465 |
|
|
but not to detect the exceptions, thus providing an implementation that
|
466 |
|
|
will produce incorrect results with some operand data values.}
|
467 |
|
|
For example, if an arithmetic subroutine is designed to add an $m$-word
|
468 |
|
|
operand to another $m$-word operand to produce an $m$-word result, overflow
|
469 |
|
|
is possible. A design decision to fail to detect such exceptions pushes
|
470 |
|
|
the responsibility up to the callers of the arithmetic subroutines.
|
471 |
|
|
Callers must devise a method for not calling arithmetic subroutines
|
472 |
|
|
with data values that will cause an exception, or else to detect an exception
|
473 |
|
|
when it has occurred.
|
474 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:03}
|
475 |
|
|
\textbf{To ``rail'' the result in response to an exception.}
|
476 |
|
|
It was stated earlier that analog control system functional blocks
|
477 |
|
|
built with operational
|
478 |
|
|
amplifiers typically have an output which cannot go beyond the
|
479 |
|
|
supply rails. One may implement similar behavior in arithmetic subroutines.
|
480 |
|
|
In an addition subroutine which adds two $m$-word operands to produce an
|
481 |
|
|
$m$-word result (with each word having $w$ bits), it would be natural to
|
482 |
|
|
return $2^{mw}-1$ in the event of an overflow in a positive direction and
|
483 |
|
|
$-2^{mw}$ in the event of an overlfow in a negative direction. Note that
|
484 |
|
|
the caller will not be able to distinguish a ``rail'' value which represents
|
485 |
|
|
a valid result from a ``rail'' value substituted to indicate an exception.
|
486 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:04}
|
487 |
|
|
\textbf{To reserve special result data values to indicate exceptions.}
|
488 |
|
|
Depending on the arithmetic subroutine being implemented, it may be possible
|
489 |
|
|
to reserve certain result data values to indicate exceptions. This approach
|
490 |
|
|
is often awkward, as most mathematical subroutines are naturally defined so that
|
491 |
|
|
all bit patterns in the memory reserved for the result are valid numbers.
|
492 |
|
|
Additionally, with long result data values, it may not be economical to
|
493 |
|
|
compare the result against the reserved exception values. Thus, this is seldom
|
494 |
|
|
an optimal way to deal with exceptions.
|
495 |
|
|
|
496 |
|
|
Additionally, if this approach is employed, the semantics of how exception
|
497 |
|
|
values combine with other exception values and data values must be decided.
|
498 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:05}
|
499 |
|
|
\textbf{To return exception codes to the caller separate from the result data.}
|
500 |
|
|
In the `C' language, pointers are often used to supply an arithmetic subroutine
|
501 |
|
|
with the input operands and to provide the arithmetic subroutine with a location
|
502 |
|
|
(which belongs to the caller) in which to store the result data. Thus, the return
|
503 |
|
|
value of the arithmetic subroutine (normally assigned through the subroutine name)
|
504 |
|
|
is often available to return exception codes. For example,
|
505 |
|
|
a `C' function may be defined as
|
506 |
|
|
|
507 |
|
|
\begin{verbatim}
|
508 |
|
|
unsigned char int128_add(INT128 *result,
|
509 |
|
|
INT128 *arg1,
|
510 |
|
|
INT128 *arg2),
|
511 |
|
|
\end{verbatim}
|
512 |
|
|
|
513 |
|
|
leaving the returned \texttt{unsigned char} value available to return
|
514 |
|
|
exception information. Note that this arrangement has the following advantages:
|
515 |
|
|
|
516 |
|
|
\begin{enumerate}
|
517 |
|
|
\item All bit patterns in the result data memory area area available
|
518 |
|
|
as data bit patterns.
|
519 |
|
|
\item The exception data is very economical to test, because it is placed
|
520 |
|
|
in a machine-native data type.
|
521 |
|
|
\item The exception data can easily be discarded or by the caller if desired.
|
522 |
|
|
\item All decisions about how to handle exceptions are left to the caller.
|
523 |
|
|
\end{enumerate}
|
524 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:06}
|
525 |
|
|
\textbf{To maintain exception data with each result integer.}
|
526 |
|
|
It is possible to reserve bits for exception information which are part of the
|
527 |
|
|
long integer data type. This exception state essentially conveys
|
528 |
|
|
\emph{NaN}\footnote{\emph{N}ot \emph{a} \emph{n}umber.} information---integers with
|
529 |
|
|
exception information set are not true numbers, but rather they are different from the
|
530 |
|
|
true result in some way. As with (\ref{enum:ccil0:spdi0:soud0:01:04}), the
|
531 |
|
|
semantics of how to combine NaN values and NaN values with ordinary non-NaN numbers
|
532 |
|
|
must be defined.
|
533 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:07}
|
534 |
|
|
\textbf{To maintain a global exception state variable.}
|
535 |
|
|
A variable or set of variables can be reserved which hold the
|
536 |
|
|
exception information, if any, from the most recent call to
|
537 |
|
|
a function in the arithmetic library.
|
538 |
|
|
|
539 |
|
|
To save CPU cycles, the arithmetic library can
|
540 |
|
|
be designed so that it will assign the global exception state variable only if an
|
541 |
|
|
exception occurs---the caller then has the responsibility of clearing the exception
|
542 |
|
|
state variable before making any call into the arithmetic library where the
|
543 |
|
|
exception result is of interest. The interface between the caller and the library
|
544 |
|
|
can be further optimized if the library only OR's data into the variable containing
|
545 |
|
|
the exception state. Using this optimization, the caller can clear the exception state
|
546 |
|
|
variable, make several calls into the arithmetic library, and then retrieve a meaningful
|
547 |
|
|
exception state variable value summarizing several arithmetic operations.
|
548 |
|
|
\item \label{enum:ccil0:spdi0:soud0:01:08}
|
549 |
|
|
\textbf{Hybrid approaches.}
|
550 |
|
|
The approaches (\ref{enum:ccil0:spdi0:soud0:01:01})
|
551 |
|
|
through (\ref{enum:ccil0:spdi0:soud0:01:07}) can be combined.
|
552 |
|
|
For example, approach (\ref{enum:ccil0:spdi0:soud0:01:03})
|
553 |
|
|
might be combined with approach (\ref{enum:ccil0:spdi0:soud0:01:05})
|
554 |
|
|
so that exceptions are ``railed'', but the caller may also be made
|
555 |
|
|
aware that an exception has occured. Many hybrid approaches are possible.
|
556 |
|
|
\end{enumerate}
|
557 |
|
|
|
558 |
|
|
|
559 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
560 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
561 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
562 |
|
|
\subsection{Semantics Of Combining Overflow, Underflow, And Domain Errors}
|
563 |
|
|
%Subsection tag: CMB0
|
564 |
|
|
\label{ccil0:spdi0:scmb0}
|
565 |
|
|
|
566 |
|
|
For control system arithmetic, some form of clipping as suggested
|
567 |
|
|
in Section \ref{ccil0:sppm0:smae0} is probably the
|
568 |
|
|
best approach. Definitely, an overflow should generate a result
|
569 |
|
|
which is the largest representable integer, and an
|
570 |
|
|
underflow should generate a result which is the smallest
|
571 |
|
|
representable integer.
|
572 |
|
|
|
573 |
|
|
In addition to treating an overflow by clipping, it may be
|
574 |
|
|
advantageous to reserve a flag in the representation of a
|
575 |
|
|
multiple-precision integer to record that an overflow has occured and been clipped.
|
576 |
|
|
Some functions which accept the integer as input may be interested
|
577 |
|
|
in the value of such a flag, where othere---perhaps most---may
|
578 |
|
|
not.
|
579 |
|
|
|
580 |
|
|
The correct course of action in the event of a domain error (such
|
581 |
|
|
as division by zero) is less clear. It is noteworthy that in a
|
582 |
|
|
normal control system, domain errors cannot occur (but overflows
|
583 |
|
|
can).
|
584 |
|
|
|
585 |
|
|
The best approach when a domain error is involved probably
|
586 |
|
|
depends on the basis for the underlying calculation. For
|
587 |
|
|
example, if integer division is used as part of a
|
588 |
|
|
strategy for software ratiometric conversion, a value
|
589 |
|
|
of zero in the denominator probably represents extreme electrical
|
590 |
|
|
noise, and the most sane approach may be to replace the
|
591 |
|
|
denominator by one. However, in other contexts it may be appropriate
|
592 |
|
|
to think in terms of
|
593 |
|
|
|
594 |
|
|
\begin{equation}
|
595 |
|
|
\lim_{k \rightarrow 0^+} \frac{kn}{kd}
|
596 |
|
|
\end{equation}
|
597 |
|
|
|
598 |
|
|
\noindent{}or
|
599 |
|
|
|
600 |
|
|
\begin{equation}
|
601 |
|
|
\lim_{k \rightarrow 0^-} \frac{kn}{kd} .
|
602 |
|
|
\end{equation}
|
603 |
|
|
|
604 |
|
|
Put in other terms, there is not a clear ``one solution
|
605 |
|
|
meets all needs'' approach to dealing with domain
|
606 |
|
|
errors.
|
607 |
|
|
|
608 |
|
|
As in the case of overflows, it may be advantageous to reserve a bit
|
609 |
|
|
flag to signal that a domain error has occured and that the result
|
610 |
|
|
is not valid or not reliable. Note that floating point chips
|
611 |
|
|
(such as the 80x87) provide similar indications of domain errors.
|
612 |
|
|
|
613 |
|
|
It may also be advantageous to adopt conventions for how
|
614 |
|
|
overflow or domain error flags propagate through binary or
|
615 |
|
|
unary operators. For example, if two numbers are multiplied, and
|
616 |
|
|
one of the two has the overflow flag set, it may be wise set the
|
617 |
|
|
overflow flag in the result. A scheme for how warning
|
618 |
|
|
flags propagate may be beneficial.
|
619 |
|
|
|
620 |
|
|
|
621 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
622 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
623 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
624 |
|
|
\subsection{Variable Versus Constant Subroutine Execution Time}
|
625 |
|
|
%Subsection tag: VVC0
|
626 |
|
|
\label{ccil0:spdi0:svvc0}
|
627 |
|
|
|
628 |
|
|
As a design goal of an embedded system, we seek to minimize the
|
629 |
|
|
timing variability of software components. An arithmetic subroutine
|
630 |
|
|
that with a high probability takes a short time to execute and with a
|
631 |
|
|
low probability takes a long time to execute, and where the execution
|
632 |
|
|
time is data dependent, is a serious risk. An embedded software product
|
633 |
|
|
may pass all release testing, but then fail in the field because of
|
634 |
|
|
specific data values used in calculations.
|
635 |
|
|
|
636 |
|
|
A very conservative design goal would be to design every arithmetic subroutine
|
637 |
|
|
to require exactly the same execution time, regardless of data values.
|
638 |
|
|
This goal is not practical because machine instructions themselves
|
639 |
|
|
usually have a variable execution time, particularly for multiplication
|
640 |
|
|
and division instructions. A fallback goal would be to avoid
|
641 |
|
|
large differences between minimum and maximum execution time, without
|
642 |
|
|
increasing the maximum execution time. A very practical step to take
|
643 |
|
|
(using division as an example)
|
644 |
|
|
is to insert artifical delays into easily detectable exception cases (such as
|
645 |
|
|
division by zero) so that the exception case takes as long as the
|
646 |
|
|
minimum time for a division with valid operands.
|
647 |
|
|
|
648 |
|
|
|
649 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
650 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
651 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
652 |
|
|
\section{Fixed-Point Arithmetic}
|
653 |
|
|
%Section tag: FPA0
|
654 |
|
|
\label{ccil0:sfpa0}
|
655 |
|
|
|
656 |
|
|
\emph{Fixed-point arithmetic}\index{fixed-point arithmetic}
|
657 |
|
|
is a scheme for the representation
|
658 |
|
|
of engineering quantities (conceptually real numbers with optional
|
659 |
|
|
units) by integers so that calculations can be performed
|
660 |
|
|
on these quantitites using [usually multiple-precision]
|
661 |
|
|
integer arithmetic.
|
662 |
|
|
|
663 |
|
|
In discussing fixed-point arithmetic,
|
664 |
|
|
we must be careful to distinguish between the
|
665 |
|
|
\emph{represented value} (the engineering quantity)
|
666 |
|
|
and the \emph{representation} (the integer which represents
|
667 |
|
|
the engineering quantity). In most cases, we must also
|
668 |
|
|
be careful to devise a system to track the units of the
|
669 |
|
|
represented values, as, especially with control systems,
|
670 |
|
|
the units of represented values (due to integration and
|
671 |
|
|
differentiation) can become very complex
|
672 |
|
|
and mistakes are easy to make.
|
673 |
|
|
|
674 |
|
|
Fixed-point arithmetic is the dominant paradigm of construction
|
675 |
|
|
for calculations in small microcontroller systems. It may not be
|
676 |
|
|
clear why this should be so or what advantages it offers [over
|
677 |
|
|
floating-point arithmetic]. The reasons for
|
678 |
|
|
this predominance are:
|
679 |
|
|
|
680 |
|
|
\begin{itemize}
|
681 |
|
|
\item Fixed-point calculations tend to be very efficient, because
|
682 |
|
|
they make direct use of the integer arithmetic instructions in the
|
683 |
|
|
microcontroller's instruction set. On the other hand, floating-point
|
684 |
|
|
arithmetic operations tend to be much slower.
|
685 |
|
|
|
686 |
|
|
\item Floating-point calculations typically require a floating-point
|
687 |
|
|
library, which may consume at least several hundred bytes
|
688 |
|
|
of ROM.
|
689 |
|
|
|
690 |
|
|
\item Some safety-critical software standards prohibit the use of
|
691 |
|
|
floating-point arithmetic because it can result in nebulous
|
692 |
|
|
behavior. Fixed-point arithmetic avoids these concerns.
|
693 |
|
|
\end{itemize}
|
694 |
|
|
|
695 |
|
|
In order to carry out fixed-point arithmetic---that is, in order to
|
696 |
|
|
operate on engineering quantities as integers---we
|
697 |
|
|
require that the relationship between the
|
698 |
|
|
represented value and the representation be of the form
|
699 |
|
|
|
700 |
|
|
\begin{equation}
|
701 |
|
|
\label{eq:ccil0:sfpa0:00}
|
702 |
|
|
x = r_I u + \Psi,
|
703 |
|
|
\end{equation}
|
704 |
|
|
|
705 |
|
|
\noindent{}where $x \in \vworkrealset$ (possibly with units) is the represented
|
706 |
|
|
value, $u \in \vworkintset$ is the representation,
|
707 |
|
|
$r_I \in \vworkrealset$ is the scaling factor (possibly with
|
708 |
|
|
units), and $\Psi \in \vworkrealset$ (possibly with units) is the offset.
|
709 |
|
|
Note that the units of $r_I$, $\Psi$, and $x$ must match.
|
710 |
|
|
|
711 |
|
|
We further require that $r_I \vworkdivides{} \Psi$\footnote{We \emph{are}
|
712 |
|
|
aware that this is an abuse of nomenclature, as
|
713 |
|
|
`$\vworkdivides$' (``divides'') is traditionally only applied to integers.}
|
714 |
|
|
(i.e. that $\Psi$ be an
|
715 |
|
|
integral multiple of $r_I$)
|
716 |
|
|
so that the offset in the represented value corresponds to an integer
|
717 |
|
|
in the representation. Without this restriction, we could not remove the
|
718 |
|
|
offset from the representation with integer subtraction only. Note that
|
719 |
|
|
we do \emph{not} require that $r_I$ or $\Psi$ be rational, although
|
720 |
|
|
they must both be rational or both be irrational in order to satisfy
|
721 |
|
|
(\ref{eq:ccil0:sfpa0:00}).
|
722 |
|
|
|
723 |
|
|
In (\ref{eq:ccil0:sfpa0:00}), since we've required that $r_I \vworkdivides{} \Psi$,
|
724 |
|
|
we can replace $\Psi$ by
|
725 |
|
|
|
726 |
|
|
\begin{equation}
|
727 |
|
|
\label{eq:ccil0:sfpa0:00b}
|
728 |
|
|
\psi = \frac{\Psi}{r_I}
|
729 |
|
|
\end{equation}
|
730 |
|
|
|
731 |
|
|
\noindent{}to obtain
|
732 |
|
|
|
733 |
|
|
\begin{equation}
|
734 |
|
|
\label{eq:ccil0:sfpa0:01}
|
735 |
|
|
x = r_I (u + \psi),
|
736 |
|
|
\end{equation}
|
737 |
|
|
|
738 |
|
|
\noindent{}where $\psi \in \vworkintset$ is the offset in representational
|
739 |
|
|
counts.
|
740 |
|
|
|
741 |
|
|
Note that we can also easily obtain the following relationships from the
|
742 |
|
|
defining equations (\ref{eq:ccil0:sfpa0:00}),
|
743 |
|
|
(\ref{eq:ccil0:sfpa0:00b}), and (\ref{eq:ccil0:sfpa0:01}).
|
744 |
|
|
|
745 |
|
|
\begin{equation}
|
746 |
|
|
\label{eq:ccil0:sfpa0:02}
|
747 |
|
|
u = \frac{x - \Psi}{r_I} = \frac{x}{r_I} - \psi
|
748 |
|
|
\end{equation}
|
749 |
|
|
|
750 |
|
|
\begin{equation}
|
751 |
|
|
\label{eq:ccil0:sfpa0:03}
|
752 |
|
|
\psi = \frac{\Psi}{r_I}
|
753 |
|
|
\Longleftrightarrow
|
754 |
|
|
\Psi = r_I \psi
|
755 |
|
|
\Longleftrightarrow
|
756 |
|
|
r_I = \frac{\Psi}{\psi}
|
757 |
|
|
\end{equation}
|
758 |
|
|
|
759 |
|
|
|
760 |
|
|
For example, in a 16-bit signed integer (which inherently may range
|
761 |
|
|
from -32768 to 32767 inclusive), one might used a fixed-point
|
762 |
|
|
representation of 100 integer counts per $^\circ$C ($r_I = 0.01 \; ^\circ$C)
|
763 |
|
|
with an offset of 100$^\circ$C ($\Psi$ = 100$^\circ$C or equivalently
|
764 |
|
|
$\psi$ = 10000), giving
|
765 |
|
|
a representational range from -227.68$^\circ$C to 427.67$^\circ$C inclusive.
|
766 |
|
|
|
767 |
|
|
If $r_I = 2^N, \; N \in \vworkintset$, then the radix point of the represented
|
768 |
|
|
value is positioned
|
769 |
|
|
between two bits of the representation---this arrangement may have computational
|
770 |
|
|
advantages
|
771 |
|
|
if the whole and fractional parts of the represented value need to be separated
|
772 |
|
|
easily in the representation. Note also that if $r_I = 2^{WN}$ where $W$ is the
|
773 |
|
|
machine integer size of the computer (in bits), then the radix point of the represented value
|
774 |
|
|
occurs between two addressable machine integers of the representation, which can be convenient.
|
775 |
|
|
However, we do not require for our definition of a fixed-point representation that
|
776 |
|
|
$r_I$ be an integral power of two; and in fact we do not even require by our
|
777 |
|
|
definition that $r_I$ be rational. Note that in general our definition above is the
|
778 |
|
|
weakest set of conditions so that real-world engineering values can be manipulated
|
779 |
|
|
using integer operations performed upon the representation.
|
780 |
|
|
|
781 |
|
|
|
782 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
783 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
784 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
785 |
|
|
\section{Representation Of Integers}
|
786 |
|
|
%Section Tag: ROI0
|
787 |
|
|
\label{ccil0:sroi0}
|
788 |
|
|
|
789 |
|
|
In this section, we discuss common representations of integers, both
|
790 |
|
|
\emph{machine} integers and \emph{synthetic long} integers.
|
791 |
|
|
By \emph{representation} we mean the mapping between the abstract
|
792 |
|
|
mathematical notion of an integer and the way it is stored in the computer
|
793 |
|
|
(voltage levels and the programming model).
|
794 |
|
|
Although
|
795 |
|
|
in Knuth's development of integer arithmetic
|
796 |
|
|
\cite{bibref:b:knuthclassic2ndedvol2}
|
797 |
|
|
it is assumed that
|
798 |
|
|
integers may be represented in any base, we don't require such generality
|
799 |
|
|
in practice and in this work we confine ourselves for the most
|
800 |
|
|
part to $b=2^n$, and often to $n = 8i$, $i \in \vworkintsetpos$. The assumption
|
801 |
|
|
of $b=2^n$ characterizes all modern digital computers, and we feel comfortable
|
802 |
|
|
making this assumption throughout the work. However, the assumption
|
803 |
|
|
$n = 8i$, $i \in \vworkintsetpos$ does not hold universally, and so we
|
804 |
|
|
most often do not make this assumption.
|
805 |
|
|
|
806 |
|
|
By \emph{machine} integer, we mean an integer upon which the computer
|
807 |
|
|
can operate in a single instruction (such as to add, increment,
|
808 |
|
|
load, store, etc.). For most microcontrollers, machine integers are
|
809 |
|
|
either 8 or 16 bits in size. The representation of a machine integer
|
810 |
|
|
is designed and specified by the microcontroller manufacturer. In principle,
|
811 |
|
|
nothing would prevent a microcontroller manufacturer from devising
|
812 |
|
|
and implementing a novel way of representing machine integers and supporting
|
813 |
|
|
this novel representation with an instruction set. However, in
|
814 |
|
|
practice, all machine integers are either simple unsigned integers or two's
|
815 |
|
|
complement signed integers. In addition to the efficiency of these
|
816 |
|
|
representations with respect to the design of digital logic, these representations
|
817 |
|
|
are so standard and so pervasive that they are universally tacitly assumed.
|
818 |
|
|
For example, ``\texttt{if (i \& 0x1)}'' is an accepted C-language
|
819 |
|
|
idiom for ``if $i$ is odd'', and it is expected that such code will
|
820 |
|
|
work on all platforms.
|
821 |
|
|
|
822 |
|
|
By \emph{synthetic long} integer, we mean an integer of
|
823 |
|
|
arbitrary\footnote{By \emph{arbitrary}, we do not necessarily mean that
|
824 |
|
|
the integer can grow to be arbitrarily large in magnitude, or that
|
825 |
|
|
its maximum size is not known at compile time. We mean \emph{arbitrarily}
|
826 |
|
|
longer than a machine integer. Multiple-precision arithmetic libraries
|
827 |
|
|
can be divided into two classes---those that fix the size of the integers
|
828 |
|
|
at compile time, and those that use dynamic allocation and allow integers to
|
829 |
|
|
grow as needed at run time. The former category
|
830 |
|
|
is normally used for small microcontroller work, whereas the latter category
|
831 |
|
|
(such as the GNU MP Library \cite{bibref:s:gnumultipleprecisionarithmeticlibrary})
|
832 |
|
|
is normally used in scientific and number theoretic calculation and on
|
833 |
|
|
more powerful platforms than microcontrollers. The representational concepts
|
834 |
|
|
we present here apply to both categories.}
|
835 |
|
|
length that is formed by concatenating machine integers. There is some
|
836 |
|
|
subjectivity in deciding the representation of multiple-precision integers,
|
837 |
|
|
and we discuss in the subsections
|
838 |
|
|
\ref{ccil0:sroi0:srou0} and
|
839 |
|
|
\ref{ccil0:sroi0:sros0} which immediately follow.
|
840 |
|
|
|
841 |
|
|
|
842 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
843 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
844 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
845 |
|
|
\subsection{Representation Of Unsigned Integers}
|
846 |
|
|
%Subsection Tag: ROU0
|
847 |
|
|
\label{ccil0:sroi0:srou0}
|
848 |
|
|
|
849 |
|
|
Unsigned machine integers are always represented as an ordered
|
850 |
|
|
array of bits (Figure \ref{fig:ccil0:sroi0:srou0:00}). For an
|
851 |
|
|
$m$-bit unsigned integer $u$, we denote these bits $u_{[m-1]}$ through
|
852 |
|
|
$u_{[0]}$, with $u_{[m-1]}$ the most significant bit. The value
|
853 |
|
|
of $u$ is the sum of the values of each bit multiplied
|
854 |
|
|
by the power of 2 it represents:
|
855 |
|
|
|
856 |
|
|
\begin{figure}
|
857 |
|
|
\centering
|
858 |
|
|
\includegraphics[width=4.6in]{c_cil0/uintrep1.eps}
|
859 |
|
|
\caption{Representation Of Unsigned Machine Integers}
|
860 |
|
|
\label{fig:ccil0:sroi0:srou0:00}
|
861 |
|
|
\end{figure}
|
862 |
|
|
|
863 |
|
|
\begin{equation}
|
864 |
|
|
\label{eq:ccil0:sroi0:srou0:00}
|
865 |
|
|
u = \sum_{i=0}^{m-1} u_{[i]} 2^i .
|
866 |
|
|
\end{equation}
|
867 |
|
|
|
868 |
|
|
In general, an $m$-bit unsigned integer can assume the values of
|
869 |
|
|
0 through $2^m - 1$, so that
|
870 |
|
|
|
871 |
|
|
\begin{equation}
|
872 |
|
|
\label{eq:ccil0:sroi0:srou0:01}
|
873 |
|
|
u \in \{0, \ldots{} , 2^m - 1 \} .
|
874 |
|
|
\end{equation}
|
875 |
|
|
|
876 |
|
|
Unsigned synthetic long integers are always represented as an array
|
877 |
|
|
of unsigned machine integers.
|
878 |
|
|
Consistent with the GMP \cite{bibref:s:gnumultipleprecisionarithmeticlibrary},
|
879 |
|
|
we call each element of the array a \emph{limb} and we call the size of
|
880 |
|
|
each limb the \emph{limbsize}. This usage is very close to what Knuth
|
881 |
|
|
calls the \emph{word}size $w$ in the excerpt presented in
|
882 |
|
|
Section \ref{ccil0:sppm0:smas0}.
|
883 |
|
|
|
884 |
|
|
Microcontroller processors are more likely than more powerful processors to have
|
885 |
|
|
a non-orthogonal instruction set, and so the limbsize may not be consistent between
|
886 |
|
|
operations in an arithmetic library.
|
887 |
|
|
With some processors, the appropriate limbsize may vary depending on the operation being
|
888 |
|
|
performed.
|
889 |
|
|
For example, a microcontroller processor may be able to add two 16-bit integers to
|
890 |
|
|
obtain a 16-bit result plus a carry, but only be able to multiply two 8-bit integers to
|
891 |
|
|
obtain a 16-bit result (thus, the appropriate limbsize for addition may be
|
892 |
|
|
16 bits while the appropriate limbsize for multiplication may be 8 bits).
|
893 |
|
|
In such cases, it is usually most efficient to add using 16-bit limbs but
|
894 |
|
|
multiply using 8-bit limbs. Adding using 8-bit limbs on a machine which will
|
895 |
|
|
support 16-bit additions is not normally a good design decision---even if the
|
896 |
|
|
machine supports an 8-bit addition instruction which is faster than the 16-bit addition
|
897 |
|
|
instruction, $m/2$ 16-bit additions will nearly always be faster than
|
898 |
|
|
$m$ 8-bit additions. Using different limbsizes within the same arithmetic library
|
899 |
|
|
may require some consideration of alignment and
|
900 |
|
|
endian issues, but these are implementation details
|
901 |
|
|
which are easily solved.
|
902 |
|
|
|
903 |
|
|
We view a synthetic long unsigned integer as an array of limbs (machine integers)
|
904 |
|
|
of some size, and we agree that we will not address the array in any other way than
|
905 |
|
|
by loading and storing limbs of this size.\footnote{Well \ldots{} not quite.
|
906 |
|
|
In software for large computers (personal computers and workstations) with an
|
907 |
|
|
orthogonal instruction set, we may be able to adhere to this rule. However,
|
908 |
|
|
with microcontrollers, arithmetic libraries which are optimized
|
909 |
|
|
may break this rule.} In particular, because
|
910 |
|
|
computers may be ``big-endian'' or ``little-endian'', loading and storing
|
911 |
|
|
smaller units than limbs may lead in
|
912 |
|
|
a worst case to software defects or in a best case to non-portable code.
|
913 |
|
|
|
914 |
|
|
Assume that $w$ is the number of bits in a limb.
|
915 |
|
|
Notationally, we denote an unsigned
|
916 |
|
|
synthetic long integer as an array of $m$ limbs
|
917 |
|
|
$u_{m-1}$ through $u_0$, each containing $w$ bits,
|
918 |
|
|
with $u_0$ the least significant machine integer.
|
919 |
|
|
We may also define $b=2^w$ (consistent with Knuth's
|
920 |
|
|
notation).
|
921 |
|
|
The value of
|
922 |
|
|
such a synthetic long machine integer is
|
923 |
|
|
|
924 |
|
|
\begin{equation}
|
925 |
|
|
\label{eq:ccil0:sroi0:srou0:02}
|
926 |
|
|
u = \sum_{i=0}^{m-1} u_{i} 2^{wi}
|
927 |
|
|
=
|
928 |
|
|
\sum_{i=0}^{m-1} u_{i} b^i.
|
929 |
|
|
\end{equation}
|
930 |
|
|
|
931 |
|
|
As an alternative, we may write the value as the sum of the bit-values,
|
932 |
|
|
|
933 |
|
|
\begin{equation}
|
934 |
|
|
\label{eq:ccil0:sroi0:srou0:03}
|
935 |
|
|
u = \sum_{i=0}^{wm-1} u_{[i]} 2^{i} .
|
936 |
|
|
\end{equation}
|
937 |
|
|
|
938 |
|
|
Naturally, the range of such a synthetic long integer is
|
939 |
|
|
|
940 |
|
|
\begin{equation}
|
941 |
|
|
\label{eq:ccil0:sroi0:srou0:04}
|
942 |
|
|
u \in \{0, \ldots{} , 2^{wm} - 1 \} .
|
943 |
|
|
\end{equation}
|
944 |
|
|
|
945 |
|
|
In storing an unsigned synthetic long machine integer, the most natural way
|
946 |
|
|
to order the array of limbs depends on whether dynamic memory allocation is
|
947 |
|
|
used by the arithmetic library. In microcontroller work, where arithmetic
|
948 |
|
|
library subroutines typically operate on fixed-size operands and produce
|
949 |
|
|
fixed-size results, storing
|
950 |
|
|
limbs most significant limb first (i.e. in `C', so that element \texttt{[0]}
|
951 |
|
|
of the array of limbs contains the most significant limb) may be natural
|
952 |
|
|
and convenient. However, this ordering would lead to computational waste in a library such
|
953 |
|
|
as the GMP \cite{bibref:s:gnumultipleprecisionarithmeticlibrary} where integers
|
954 |
|
|
may grow arbitrarily large and the library may need to reallocate long synthetic
|
955 |
|
|
integers to contain more limbs, as each reallocation would need to be followed
|
956 |
|
|
by a memory copy to align the integer's existing limbs to the end of the array.
|
957 |
|
|
For libraries such as the GMP, it is more practical to store limbs
|
958 |
|
|
least-significant limb first, as it eliminates the need to copy memory
|
959 |
|
|
when reallocations are done.
|
960 |
|
|
|
961 |
|
|
|
962 |
|
|
|
963 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
964 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
965 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
966 |
|
|
\subsection{Representation Of Signed Integers}
|
967 |
|
|
%Subsection Tag: ROS0
|
968 |
|
|
\label{ccil0:sroi0:sros0}
|
969 |
|
|
|
970 |
|
|
Signed machine integers are always represented in two's complement form on modern
|
971 |
|
|
processors. This representation is universal because of the digital logic
|
972 |
|
|
conveniences---the same addition and subtraction mappings which are correct
|
973 |
|
|
for unsigned machine integers are also correct for signed machine integers,
|
974 |
|
|
although the criteria for overflow and comparison are
|
975 |
|
|
different.
|
976 |
|
|
|
977 |
|
|
Most readers are familiar with two's complement representation, so we will not
|
978 |
|
|
belabor it. However, we will present essential properties.
|
979 |
|
|
When two's complement representation is used in an $m$-bit machine integer $u$:
|
980 |
|
|
|
981 |
|
|
\begin{enumerate}
|
982 |
|
|
\item All bit patterns with $u_{[m-1]} = 0$ represent non-negative integers, and
|
983 |
|
|
represent the same integer as if the representation were unsigned.
|
984 |
|
|
\item All bit patterns with $u_{[m-1]} = 1$ represent negative numbers; specifically
|
985 |
|
|
$u_{[m-2:0]} - 2^{m-1}$; i.e.
|
986 |
|
|
|
987 |
|
|
\begin{equation}
|
988 |
|
|
\label{eq:ccil0:sroi0:sros0:00}
|
989 |
|
|
u = - u_{[m-1]} 2^{m-1} + \sum_{i=0}^{m-2} u_{[i]} 2^i .
|
990 |
|
|
\end{equation}
|
991 |
|
|
|
992 |
|
|
\item $u \in \{-2^{m-1}, \ldots{}, 2^{m-1}-1 \}$.
|
993 |
|
|
\item All bit patterns represent a unique integer.
|
994 |
|
|
\item For any integer except $-2^{m-1}$,
|
995 |
|
|
negation can be performed by forming the one's complement (complementing
|
996 |
|
|
every bit), then adding one. To see why this is true algebraically, note that
|
997 |
|
|
|
998 |
|
|
\end{enumerate}
|
999 |
|
|
|
1000 |
|
|
|
1001 |
|
|
However, let us observe that the value of an
|
1002 |
|
|
$m$-bit two's complement
|
1003 |
|
|
machine integer is
|
1004 |
|
|
|
1005 |
|
|
|
1006 |
|
|
In general, an $m$-bit signed machine integer can assume the values of
|
1007 |
|
|
$-2^{m-1}$ through $2^{m-1} - 1$, so that
|
1008 |
|
|
|
1009 |
|
|
\begin{equation}
|
1010 |
|
|
\label{eq:ccil0:sroi0:sros0:01}
|
1011 |
|
|
u \in \{-2^{m-1}, \ldots{} , 2^{m-1} - 1 \} .
|
1012 |
|
|
\end{equation}
|
1013 |
|
|
|
1014 |
|
|
There are [at least] two different representations of signed
|
1015 |
|
|
multiple-precision integers:
|
1016 |
|
|
|
1017 |
|
|
\begin{itemize}
|
1018 |
|
|
\item Two's complement representation.
|
1019 |
|
|
\item Sign-magnitude representation.
|
1020 |
|
|
\end{itemize}
|
1021 |
|
|
|
1022 |
|
|
There are two different representations commonly used
|
1023 |
|
|
for signed multiple-precision integers because two's complement
|
1024 |
|
|
representation is not ideal for multiplication and division, although
|
1025 |
|
|
it is ideal for addition and subtraction. For multiple-precision
|
1026 |
|
|
integer arithmetic, sign-magnitude representation is more common.
|
1027 |
|
|
|
1028 |
|
|
In two's complement representation of multiple-precision integers,
|
1029 |
|
|
the representation is the same as suggested by
|
1030 |
|
|
(\ref{eq:ccil0:sroi0:sros0:00}), except
|
1031 |
|
|
more bits are involved. For a two's complement representation
|
1032 |
|
|
of a number consisting of $n$ machine integers with $W$ bits per
|
1033 |
|
|
machine integer,
|
1034 |
|
|
|
1035 |
|
|
\begin{equation}
|
1036 |
|
|
\label{eq:ccil0:sroi0:sros0:02}
|
1037 |
|
|
u = - u_{B(Wn-1)} 2^{Wn-1} + \sum_{i=0}^{Wn-2} u_{B(i)} 2^i .
|
1038 |
|
|
\end{equation}
|
1039 |
|
|
|
1040 |
|
|
Because we would like to know how to compare signed multiple-precision
|
1041 |
|
|
integers in two's complement representation, we can gain some
|
1042 |
|
|
insight into the representation by rewriting
|
1043 |
|
|
(\ref{eq:ccil0:sroi0:sros0:02}) in terms of machine integers:
|
1044 |
|
|
|
1045 |
|
|
\begin{equation}
|
1046 |
|
|
\label{eq:ccil0:sroi0:sros0:03}
|
1047 |
|
|
u = - u_{B(Wn-1)} 2^{Wn-1}
|
1048 |
|
|
+
|
1049 |
|
|
\sum_{i=W(m-1)}^{Wn-2} u_{B(i)} 2^i
|
1050 |
|
|
+
|
1051 |
|
|
\sum_{i=0}^{m-2} u_{i} 2^{Wi} .
|
1052 |
|
|
\end{equation}
|
1053 |
|
|
|
1054 |
|
|
(\ref{eq:ccil0:sroi0:sros0:03}) gives some insight into the
|
1055 |
|
|
relative values of multiple-precision signed two's complement
|
1056 |
|
|
integers with respect to the values of the machine integers
|
1057 |
|
|
that comprise them. We discuss this further in
|
1058 |
|
|
Section \ref{ccil0:scsi0}.
|
1059 |
|
|
|
1060 |
|
|
In sign-magnitude representation of multiple-precision signed two's
|
1061 |
|
|
complement integers, an integer $u$ is represented as a sign
|
1062 |
|
|
bit (usually a value of one indicates negativity), and a magnitude,
|
1063 |
|
|
which is $|u|$. Unlike two's complement representation,
|
1064 |
|
|
sign-magnitude representation, has two representations of zero---a positive
|
1065 |
|
|
one and a negative one. Either a canonical form for zero should be
|
1066 |
|
|
adopted, or both values should be treated identically.
|
1067 |
|
|
|
1068 |
|
|
Assuming that the sign bit is stored in the most significant bit position,
|
1069 |
|
|
it is easy to deduce that the value of a multiple-precision
|
1070 |
|
|
signed two's complement integer in sign-magnitude representation is
|
1071 |
|
|
|
1072 |
|
|
\begin{equation}
|
1073 |
|
|
\label{eq:ccil0:sros0:srou0:04}
|
1074 |
|
|
u = (-1)^{u_{B(m-1)}} \sum_{i=0}^{m-2} u_{B(i)} 2^i .
|
1075 |
|
|
\end{equation}
|
1076 |
|
|
|
1077 |
|
|
Sign-magnitude representation is especially convenient because
|
1078 |
|
|
it allows machine instructions which accept unsigned operands to be used
|
1079 |
|
|
to make calculations involving signed integers.
|
1080 |
|
|
|
1081 |
|
|
|
1082 |
|
|
|
1083 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1084 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1085 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1086 |
|
|
\section{Characteristics Of Practical Processors}
|
1087 |
|
|
%Section tag: CPP
|
1088 |
|
|
\label{ccil0:scpp0}
|
1089 |
|
|
|
1090 |
|
|
Before discussing specific algorithms, it is necessary to
|
1091 |
|
|
discuss the construction of practical processors---how such a processor
|
1092 |
|
|
manipulates machine integers. We accept as a typical processor the
|
1093 |
|
|
TMS-370C8, an 8-bit microcontroller manufactured by
|
1094 |
|
|
Texas Instruments.
|
1095 |
|
|
|
1096 |
|
|
\begin{figure}
|
1097 |
|
|
\centering
|
1098 |
|
|
\includegraphics[width=4.6in]{c_cil0/t370flag.eps}
|
1099 |
|
|
\caption{Texas Instruments TMS-370C8 Flags}
|
1100 |
|
|
\label{fig:ccil0:scpp0:00}
|
1101 |
|
|
\end{figure}
|
1102 |
|
|
|
1103 |
|
|
\begin{figure}
|
1104 |
|
|
\centering
|
1105 |
|
|
\includegraphics[width=4.6in]{c_cil0/t370cjmp.eps}
|
1106 |
|
|
\caption{Texas Instruments TMS-370C8 Conditional Jump Instructions}
|
1107 |
|
|
\label{fig:ccil0:scpp0:01}
|
1108 |
|
|
\end{figure}
|
1109 |
|
|
|
1110 |
|
|
A typical microcontroller allows operations on machine integers
|
1111 |
|
|
in the following steps:
|
1112 |
|
|
|
1113 |
|
|
\begin{itemize}
|
1114 |
|
|
\item A machine instruction is performed on one or two machine
|
1115 |
|
|
integer operands (for example: addition, subtraction,
|
1116 |
|
|
multiplication, division, increment, decrement, complement,
|
1117 |
|
|
negation, or comparison). This machine instruction may
|
1118 |
|
|
produce a result, and usually sets a number of condition flags that
|
1119 |
|
|
reflect the nature and validity of the result (Is it zero?
|
1120 |
|
|
Is it negative? Did the result overflow?). As an
|
1121 |
|
|
example, the condition
|
1122 |
|
|
flags of the TMS-370C8 are shown in Figure \ref{fig:ccil0:scpp0:00}.
|
1123 |
|
|
.
|
1124 |
|
|
\item A conditional branch instruction is used to branch conditionally
|
1125 |
|
|
based on the state of the condition flags. The definition of
|
1126 |
|
|
the condition flags and the way in which the conditional
|
1127 |
|
|
branch instruction utilizes them is designed to provide a
|
1128 |
|
|
way to treat both unsigned and signed machine integers.
|
1129 |
|
|
As an example, the way in which the conditional
|
1130 |
|
|
jump instructions of the TMS-370C8 use the flags
|
1131 |
|
|
is shown in Figure \ref{fig:ccil0:scpp0:01}.
|
1132 |
|
|
\end{itemize}
|
1133 |
|
|
|
1134 |
|
|
It is not too often necessary to understand in detail
|
1135 |
|
|
the Boolean relationships that govern how machine integers are
|
1136 |
|
|
added, subtracted, and compared; and how signed comparisons differ
|
1137 |
|
|
from unsigned comparisons. In most cases, it is adequate to
|
1138 |
|
|
rely on the design of the microcontroller. However, we do present
|
1139 |
|
|
rudimentary observations in this section.
|
1140 |
|
|
|
1141 |
|
|
|
1142 |
|
|
|
1143 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1144 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1145 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1146 |
|
|
\section{Comparison Of Integers}
|
1147 |
|
|
|
1148 |
|
|
|
1149 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1150 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1151 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1152 |
|
|
\subsection{Comparison Of Unsigned Integers}
|
1153 |
|
|
|
1154 |
|
|
|
1155 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1156 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1157 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1158 |
|
|
\subsection{Comparison Of Signed Integers}
|
1159 |
|
|
%Section Tag: CSI0
|
1160 |
|
|
\label{ccil0:scsi0}
|
1161 |
|
|
|
1162 |
|
|
|
1163 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1164 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1165 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1166 |
|
|
\section{Integer Addition}
|
1167 |
|
|
%Section tag: IAD0
|
1168 |
|
|
\label{ccil0:siad0}
|
1169 |
|
|
|
1170 |
|
|
Addition of two $m$-bit integers is a combinational function---that is,
|
1171 |
|
|
the inputs uniquely determine the output. Addition of binary
|
1172 |
|
|
numbers is performed
|
1173 |
|
|
|
1174 |
|
|
|
1175 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1176 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1177 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1178 |
|
|
\subsection{Hardware Implementation Of Addition}
|
1179 |
|
|
|
1180 |
|
|
|
1181 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1182 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1183 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1184 |
|
|
\subsection{Addition Of Unsigned Operands}
|
1185 |
|
|
|
1186 |
|
|
|
1187 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1188 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1189 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1190 |
|
|
\subsection{Addition Of Signed Operands}
|
1191 |
|
|
|
1192 |
|
|
|
1193 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1194 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1195 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1196 |
|
|
\section {Integer Subtraction}
|
1197 |
|
|
|
1198 |
|
|
|
1199 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1200 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1201 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1202 |
|
|
\subsection{Hardware Implementation Of Subtraction}
|
1203 |
|
|
|
1204 |
|
|
|
1205 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1206 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1207 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1208 |
|
|
\subsection{Subtraction Of Unsigned Operands}
|
1209 |
|
|
|
1210 |
|
|
|
1211 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1212 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1213 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1214 |
|
|
\subsection{Subtraction Of Signed Operands}
|
1215 |
|
|
|
1216 |
|
|
|
1217 |
|
|
|
1218 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1219 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1220 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1221 |
|
|
\section{Integer Multiplication}
|
1222 |
|
|
|
1223 |
|
|
|
1224 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1225 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1226 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1227 |
|
|
\subsection{Hardware Implementation Of Multiplication}
|
1228 |
|
|
|
1229 |
|
|
|
1230 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1231 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1232 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1233 |
|
|
\subsection{Multiplication Of Unsigned Operands}
|
1234 |
|
|
|
1235 |
|
|
|
1236 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1237 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1238 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1239 |
|
|
\subsection{Multiplication Of Signed Operands}
|
1240 |
|
|
|
1241 |
|
|
|
1242 |
|
|
|
1243 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1244 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1245 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1246 |
|
|
\section{Integer Division}
|
1247 |
|
|
\label{ccil0:sidv0}
|
1248 |
|
|
|
1249 |
|
|
\index{division}\index{integer division}In this section,
|
1250 |
|
|
we discuss the best known methods of dividing integers using
|
1251 |
|
|
typical microcontroller instruction sets. In general, given
|
1252 |
|
|
two arbitrary integers $p$ and $q$, we are interested in determining
|
1253 |
|
|
their quotient $q=\lfloor{}p/q\rfloor$ and remainder
|
1254 |
|
|
$r=p\bmod{}q$ as economically as possible.
|
1255 |
|
|
|
1256 |
|
|
|
1257 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1258 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1259 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1260 |
|
|
\subsection{Hardware Implementation Of Division}
|
1261 |
|
|
|
1262 |
|
|
|
1263 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1264 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1265 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1266 |
|
|
\subsection{General Unsigned Division Without A Machine Division Instruction}
|
1267 |
|
|
\label{ccil0:sidv0:sgdn0}
|
1268 |
|
|
|
1269 |
|
|
|
1270 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1271 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1272 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
1273 |
|
|
\subsection{General Unsigned Division With A Machine Division Instruction}
|
1274 |
|
|
\label{ccil0:sidv0:sgdu0}
|
1275 |
|
|
|
1276 |
|
|
As mentioned many places in this work, efficiency in microcontroller software
|
1277 |
|
|
involves phrasing computational problems in a way which makes good use of
|
1278 |
|
|
the machine instruction set. In Section \ref{ccil0:sidv0:sgdn0} we discussed
|
1279 |
|
|
the classic shift-compare-subtract algorithm for division. This algorithm
|
1280 |
|
|
is far from efficient. A reasonable question to ask is whether we can
|
1281 |
|
|
leverage ``small'' division capability (provided by the machine instruction set)
|
1282 |
|
|
to accomplish ``large'' divisions (those which we require in practice).
|
1283 |
|
|
It ends up that this is possible: the technique involved is effectively
|
1284 |
|
|
to use machine division instructions to estimate the highest-order bits of
|
1285 |
|
|
the quotient based on the highest-order bits of the dividend and divisor.
|
1286 |
|
|
|
1287 |
|
|
Knuth's discussion of division
|
1288 |
|
|
algorithms \cite[pp. 270-275]{bibref:b:knuthclassic2ndedvol2} is the
|
1289 |
|
|
basis for most of the material in this subsection. However, Knuth has
|
1290 |
|
|
a gift for terseness that is sometimes a curse for the reader, and so
|
1291 |
|
|
we take more time than Knuth to explain certain results.
|
1292 |
|
|
|
1293 |
|
|
First, as a starting point, we present \emph{Algorithm D} from
|
1294 |
|
|
Knuth \cite[pp. 272-273]{bibref:b:knuthclassic2ndedvol2}. Then,
|
1295 |
|
|
we justify the algorithm and explain why it is valid. Finally,
|
1296 |
|
|
we supply implementation advice for microcontroller instruction sets.
|
1297 |
|
|
|
1298 |
|
|
\begin{vworkalgorithmstatementpar}{Arbitrary Unsigned Division Using
|
1299 |
|
|
Machine Unsigned Division Instructions}
|
1300 |
|
|
\label{alg:ccil0:sidv0:sgdu0:01}
|
1301 |
|
|
(From Knuth \cite[pp. 272-273]{bibref:b:knuthclassic2ndedvol2})
|
1302 |
|
|
Given nonnegative integers $u=(u_{m+n-1} \ldots{} u_1 u_0)_b$
|
1303 |
|
|
and $v=(v_{n-1} \ldots{} v_1 v_0)_b$, where
|
1304 |
|
|
$v_{n-1} \neq 0$ and $n > 1$, we form the radix-$b$ quotient
|
1305 |
|
|
$\lfloor{}u/v\rfloor{} = (q_m q_{m-1} \ldots{} q_0)_b$ and
|
1306 |
|
|
the remainder $u \bmod v = (r_{n-1} \ldots{} r_1 r_0)_b$. When
|
1307 |
|
|
$n=1$, the simpler algorithm of
|
1308 |
|
|
Subsection \ref{ccil0:sidv0:sldm0}
|
1309 |
|
|
should be used.
|
1310 |
|
|
|
1311 |
|
|
\begin{algblvl0}
|
1312 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:01}
|
1313 |
|
|
[Normalize.] Set $d \gets \lfloor{}b/(v_{n-1} + 1)\rfloor$.
|
1314 |
|
|
Then set $(u_{m+n} u_{m+n-1} \ldots{} u_1 u_0)_b$ equal to
|
1315 |
|
|
$(u_{m+n-1} \ldots{} u_1 u_0)_b$ times $d$; similarly,
|
1316 |
|
|
set $(v_{n-1} \ldots{} v_1 v_0)_b$ equal to
|
1317 |
|
|
$(v_{n-1} \ldots{} v_1 v_0)_b$ times $d$. (Notice the introduction
|
1318 |
|
|
of a new digit position $u_{m+n}$ at the left of
|
1319 |
|
|
$u_{m+n-1}$; if $d=1$, all we need to do in this step is set
|
1320 |
|
|
$u_{m+n} \gets 0$. On a binary computer it may be preferable
|
1321 |
|
|
to choose $d$ to be a power of 2 instead of using the value
|
1322 |
|
|
suggested here; any value of $d$ that results in
|
1323 |
|
|
$v_{n-1} \geq \lfloor{}b/2\rfloor$ will suffice. See also
|
1324 |
|
|
exercise 37.)
|
1325 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:02}
|
1326 |
|
|
[Initialize $j$.] Set $j \gets m$. (The loop on $j$,
|
1327 |
|
|
steps
|
1328 |
|
|
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}
|
1329 |
|
|
through
|
1330 |
|
|
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:07},
|
1331 |
|
|
will be essentially a division of
|
1332 |
|
|
$(u_{j+n} \ldots{} u_{j+1} u_j)_b$ by $(v_{n-1} \ldots{} v_1 v_0)_b$ to
|
1333 |
|
|
get a single quotient digit $q_j$; see Figure \ref{fig:alg:ccil0:sidv0:sgdu0:01:01}.)
|
1334 |
|
|
|
1335 |
|
|
\begin{figure}
|
1336 |
|
|
\centering
|
1337 |
|
|
\includegraphics[width=4.6in]{c_cil0/kdfc01.eps}
|
1338 |
|
|
\caption{Flowchart For Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} (From \cite[p. 273]{bibref:b:knuthclassic2ndedvol2})}
|
1339 |
|
|
\label{fig:alg:ccil0:sidv0:sgdu0:01:01}
|
1340 |
|
|
\end{figure}
|
1341 |
|
|
|
1342 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:03}
|
1343 |
|
|
[Calculate $\hat{q}$.] Set
|
1344 |
|
|
$\hat{q} \gets \lfloor{}u_{j+n}b + u_{j+n-1})/v_{n-1}\rfloor$ and
|
1345 |
|
|
let $\hat{r}$ be the remainder, $(u_{j+n}b + u_{j+n-1}) \bmod v_{n-1}$.
|
1346 |
|
|
Now test if $\hat{q} = b$ or $\hat{q} v_{n-2} > b\hat{r} + u_{j+n-2}$;
|
1347 |
|
|
if so, decrease $\hat{q}$ by 1, increase $\hat{r}$ by $v_{n-1}$, and repeat
|
1348 |
|
|
this test if $\hat{r} < b$. (The test of $v_{n-2}$ determines at high
|
1349 |
|
|
speed most of the cases in which the trial value $\hat{q}$ is one too large,
|
1350 |
|
|
and it eliminates \emph{all} cases where $\hat{q}$ is two too large;
|
1351 |
|
|
see exercises 19, 20, 21.)
|
1352 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:04}
|
1353 |
|
|
[Multiply and subtract.] Replace $(u_{j+n} u_{j+n-1} \ldots{} u_j)_b$ by
|
1354 |
|
|
|
1355 |
|
|
\begin{equation}
|
1356 |
|
|
\nonumber
|
1357 |
|
|
(u_{j+n} u_{j+n-1} \ldots{} u_j)_b - \hat{q} (0 v_{n-1} \ldots{} v_1 v_0)_b.
|
1358 |
|
|
\end{equation}
|
1359 |
|
|
|
1360 |
|
|
This computation consists of a simple multiplication by a one-place number,
|
1361 |
|
|
combined with a subtraction. The digits $(u_{j+n}, u_{j+n-1}, \ldots{}, u_j)$
|
1362 |
|
|
should be kept positive; if the result of this step is actually negative,
|
1363 |
|
|
$(u_{j+n} u_{j+n-1} \ldots{} u_j)_b$ should be left as the true
|
1364 |
|
|
value plus $b^{n+1}$, namely as the $b$'s complement of the true value, and
|
1365 |
|
|
a ``borrow'' to the left should be remembered.
|
1366 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:05}
|
1367 |
|
|
[Test remainder.] Set $q_j \gets \hat{q}$. If the result of step
|
1368 |
|
|
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:04} was negative, go to
|
1369 |
|
|
step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}; otherwise go on to
|
1370 |
|
|
step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:07}.
|
1371 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:06}
|
1372 |
|
|
[Add back.] (The probability that this step is necessary is very small, on the
|
1373 |
|
|
order of only $2/b$, as shown in exercise 21; test data to activate this step
|
1374 |
|
|
should therefore be specifically contrived when debugging.) Decrease
|
1375 |
|
|
$q_j$ by 1, and add $(0 v_{n-1} \ldots v_1 v_0)_b$ to
|
1376 |
|
|
$(u_{j+n} u_{j+n-1} \ldots{} u_{j+1} u_j)_b$. (A carry will occur to the left of
|
1377 |
|
|
$u_{j+n}$, and it should be ignored since it cancels with the borrow that
|
1378 |
|
|
occured in step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:04}.)
|
1379 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:07}
|
1380 |
|
|
[Loop on $j$.] Decrease $j$ by one. Now if $j \geq 0$, go back to step
|
1381 |
|
|
\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}.
|
1382 |
|
|
\item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:08}
|
1383 |
|
|
[Unnormalize.] Now $(q_m \ldots{} q_1 q_0)_b$ is the desired quotient, and
|
1384 |
|
|
the desired remainder may be obtained by dividing
|
1385 |
|
|
$(u_{n-1} \ldots{} u_1 u_0)_b$ by $d$.
|
1386 |
|
|
\end{algblvl0}
|
1387 |
|
|
\end{vworkalgorithmstatementpar}
|
1388 |
|
|
\vworkalgorithmfooter{}
|
1389 |
|
|
|
1390 |
|
|
The general idea of Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} is that
|
1391 |
|
|
digits (machine words) of the quotient $q$ can be successively estimated
|
1392 |
|
|
based on the first digits of the dividend and divisor. Knuth
|
1393 |
|
|
\cite[p. 271]{bibref:b:knuthclassic2ndedvol2} explores the properties of
|
1394 |
|
|
the digit estimate
|
1395 |
|
|
|
1396 |
|
|
\begin{equation}
|
1397 |
|
|
\label{eq:ccil0:sidv0:sgdu0:01}
|
1398 |
|
|
\hat{q} = \min \left( {\left\lfloor{\frac{u_n b + u_{n-1}}{v_{n-1}}}\right\rfloor, b-1} \right).
|
1399 |
|
|
\end{equation}
|
1400 |
|
|
|
1401 |
|
|
The first point to make about an estimate in the form of
|
1402 |
|
|
(\ref{eq:ccil0:sidv0:sgdu0:01}) is that it can only be accomplished
|
1403 |
|
|
efficiently if the machine-native division instruction supports
|
1404 |
|
|
overflow detection, since it is possible that
|
1405 |
|
|
$(u_n b + u_{n-1})/v_{n-1} \geq b$, even if
|
1406 |
|
|
$u/v < b$, as is shown by the following example.
|
1407 |
|
|
|
1408 |
|
|
\begin{vworkexamplestatement}
|
1409 |
|
|
\label{ex:ccil0:sidv0:sgdu0:01:01}
|
1410 |
|
|
Assume that we wish to apply the estimate of $\hat{q}$ provided by
|
1411 |
|
|
(\ref{eq:ccil0:sidv0:sgdu0:01})
|
1412 |
|
|
to $u=16,776,704$ and $v=65,535$. Demonstrate that a machine division
|
1413 |
|
|
overflow will occur when estimating the first digit, assuming a processor
|
1414 |
|
|
that can divide a 16-bit dividend by an 8-bit divisor to produce an 8-bit
|
1415 |
|
|
quotient.
|
1416 |
|
|
\end{vworkexamplestatement}
|
1417 |
|
|
\begin{vworkexampleparsection}{Solution}
|
1418 |
|
|
Note that according to Knuth's intention, the word size on such a machine
|
1419 |
|
|
is 8 bits. Thus, $b=256$. Note that $u/v = 255 + 255/256 < b = 256$, as
|
1420 |
|
|
required by Knuth's precondition. However, although $u/v < b$,
|
1421 |
|
|
$u = [255 \; 254 \; 0] [256^2 \; 256 \; 1]^T = [u_2 u_1 u_0] [b^2 b^1 b^0]^T$ and
|
1422 |
|
|
$v = [255 \; 255] [256 \; 1]^T = [v_1 v_0] [b^1 b^0]^T$, so that calculating
|
1423 |
|
|
an estimate $\hat{q}$ as required by (\ref{eq:ccil0:sidv0:sgdu0:01}),
|
1424 |
|
|
$(u_n b + u_{n-1})/v_{n-1} = 65,534/255 = 256 + 254/255 \geq b$, is a division
|
1425 |
|
|
overflow for a single machine division instruction. Thus, it follows that
|
1426 |
|
|
a machine with division overflow detection can quickly determine that $b-1$ from
|
1427 |
|
|
(\ref{eq:ccil0:sidv0:sgdu0:01}) is the minimum, whereas a machine without
|
1428 |
|
|
division overflow
|
1429 |
|
|
detection would have to use several additional machine instructions to make
|
1430 |
|
|
this determination.
|
1431 |
|
|
\end{vworkexampleparsection}
|
1432 |
|
|
\vworkexamplefooter{}
|
1433 |
|
|
|
1434 |
|
|
The second thing to establish about $\hat{q}$ as defined by
|
1435 |
|
|
(\ref{eq:ccil0:sidv0:sgdu0:01}) is how ``good'' of an estimate
|
1436 |
|
|
$\hat{q}$ is---how much information, exactly, about $q$ can we
|
1437 |
|
|
obtain by examining the first two words of $u$ and the first
|
1438 |
|
|
word of $v$?
|
1439 |
|
|
|
1440 |
|
|
We first establish in the following lemma that our estimate of
|
1441 |
|
|
$q$, $\hat{q}$, can be no less than $q$.
|
1442 |
|
|
|
1443 |
|
|
\begin{vworklemmastatementpar}{\mbox{\boldmath$\hat{q} \geq q$}}
|
1444 |
|
|
\label{lem:ccil0:sidv0:sgdu0:01}
|
1445 |
|
|
The estimate of $q$ provided by (\ref{eq:ccil0:sidv0:sgdu0:01}),
|
1446 |
|
|
$\hat{q}$ is always at least as great as the actual value of
|
1447 |
|
|
$q$, i.e. $\hat{q} \geq q$.
|
1448 |
|
|
\end{vworklemmastatementpar}
|
1449 |
|
|
\begin{vworklemmaproof}
|
1450 |
|
|
Knowledge of $u_{n}$, $u_{n-1}$, and $v_{n-1}$ necessarily confine
|
1451 |
|
|
the intervals in which the actual values of $u$ and $v$ may be;
|
1452 |
|
|
specifically:\footnote{In
|
1453 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:04})
|
1454 |
|
|
and
|
1455 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:07}),
|
1456 |
|
|
we use statements of the form ``$x = x$'' as an idiom for
|
1457 |
|
|
``$x$ is known''.}
|
1458 |
|
|
|
1459 |
|
|
\begin{eqnarray}
|
1460 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:01}
|
1461 |
|
|
u & = & \sum_{i=0}^{n} u_i b^i \\
|
1462 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:02}
|
1463 |
|
|
& = & u_n b^n + u_{n-1} b^{n-1} + \ldots{} + u_2 b^2 + u_1 b + u_0 \\
|
1464 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:03}
|
1465 |
|
|
& = & (u_n b + u_{n-1}) b^{n-1} + \ldots{} + u_2 b^2 + u_1 b + u_0
|
1466 |
|
|
\end{eqnarray}
|
1467 |
|
|
|
1468 |
|
|
\begin{eqnarray}
|
1469 |
|
|
\nonumber & (u_n = u_n \wedge u_{n-1} = u_{n-1}) & \\
|
1470 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:04}
|
1471 |
|
|
& \vworkvimp & \\
|
1472 |
|
|
\nonumber & (u_n b + u_{n-1}) b^{n-1} \leq u \leq (u_n b + u_{n-1}) b^{n-1} + b^{n-1} - 1 &
|
1473 |
|
|
\end{eqnarray}
|
1474 |
|
|
|
1475 |
|
|
\begin{eqnarray}
|
1476 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:05}
|
1477 |
|
|
v & = & \sum_{i=0}^{n-1} v_i b^i \\
|
1478 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:06}
|
1479 |
|
|
& = & v_{n-1} b^{n-1} + v_{n-2} b^{n-2} + \ldots{} + v_2 b^2 + v_1 b + v_0
|
1480 |
|
|
\end{eqnarray}
|
1481 |
|
|
|
1482 |
|
|
\begin{equation}
|
1483 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:07}
|
1484 |
|
|
(v_{n-1} = v_{n-1})
|
1485 |
|
|
\vworkhimp
|
1486 |
|
|
v_{n-1} b^{n-1} \leq v \leq v_{n-1} b^{n-1} + b^{n-1} - 1
|
1487 |
|
|
\end{equation}
|
1488 |
|
|
|
1489 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:04}) and
|
1490 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:07}) reflect the uncertainties in the
|
1491 |
|
|
values of $u$ and $v$ respectively because only the first digit(s) of
|
1492 |
|
|
$u$ and $v$ are being considered in forming the estimate $\hat{q}$.
|
1493 |
|
|
|
1494 |
|
|
By definition, the actual value of $q$ is $\lfloor{}u/v\rfloor$. For a
|
1495 |
|
|
rational function $f(u,v) = u/v$ where $u \in [u_{min}, u_{max}]$ and
|
1496 |
|
|
$v \in [v_{min}, v_{max}]$, the minimum value of $u/v$ occurs at
|
1497 |
|
|
$u_{min}/v_{max}$, and the maximum value of $u/v$ occurs at
|
1498 |
|
|
$u_{max}/v_{min}$. We can therefore write that
|
1499 |
|
|
|
1500 |
|
|
\begin{equation}
|
1501 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:08}
|
1502 |
|
|
\left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1}}{v_{n-1} b^{n-1} + b^{n-1} - 1}}\right\rfloor
|
1503 |
|
|
\leq
|
1504 |
|
|
q
|
1505 |
|
|
\leq
|
1506 |
|
|
\left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1} + b^{n-1} - 1}{v_{n-1} b^{n-1}}}\right\rfloor .
|
1507 |
|
|
\end{equation}
|
1508 |
|
|
|
1509 |
|
|
In other words, knowledge of $u_{n}$, $u_{n-1}$, and $v_{n-1}$ confines $q$ to the
|
1510 |
|
|
interval indicated in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}). We must prove that,
|
1511 |
|
|
given a specific $u_{n}$, $u_{n-1}$, and $v_{n-1}$, $\hat{q}$ is at least as large as
|
1512 |
|
|
the upper bound in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}); otherwise we could find a
|
1513 |
|
|
$q$ such that $q > \hat{q}$. We can algebraically manipulate the upper bound in
|
1514 |
|
|
in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}) to yield
|
1515 |
|
|
|
1516 |
|
|
\begin{equation}
|
1517 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:09}
|
1518 |
|
|
\left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1}}{v_{n-1} b^{n-1} + b^{n-1} - 1}}\right\rfloor
|
1519 |
|
|
\leq
|
1520 |
|
|
q
|
1521 |
|
|
\leq
|
1522 |
|
|
\left\lfloor{\frac{u_n b + u_{n-1} + \frac{b^{n-1}-1}{b^{n-1}}}{v_{n-1}}}\right\rfloor .
|
1523 |
|
|
\end{equation}
|
1524 |
|
|
|
1525 |
|
|
In (\ref{eq:lem:ccil0:sidv0:sgdu0:01:09}), since $(b^{n-1}-1)/b^{n-1} < 1$ and since
|
1526 |
|
|
$u_n b + u_{n-1}$ is an integer, we can conclude that
|
1527 |
|
|
$\lfloor u_n b + u_{n-1} + (b^{n-1}-1)/b^{n-1} \rfloor = \lfloor u_n b + u_{n-1} \rfloor$
|
1528 |
|
|
and hence that
|
1529 |
|
|
|
1530 |
|
|
\begin{equation}
|
1531 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:01:10}
|
1532 |
|
|
q
|
1533 |
|
|
\leq
|
1534 |
|
|
\left\lfloor{\frac{u_n b + u_{n-1} + \frac{b^{n-1}-1}{b^{n-1}}}{v_{n-1}}}\right\rfloor
|
1535 |
|
|
=
|
1536 |
|
|
\left\lfloor{\frac{u_n b + u_{n-1}}{v_{n-1}}}\right\rfloor
|
1537 |
|
|
=
|
1538 |
|
|
\hat{q} .
|
1539 |
|
|
\end{equation}
|
1540 |
|
|
|
1541 |
|
|
Therefore, $\hat{q} \geq q$.
|
1542 |
|
|
\end{vworklemmaproof}
|
1543 |
|
|
\vworklemmafooter{}
|
1544 |
|
|
|
1545 |
|
|
Lemma \ref{lem:ccil0:sidv0:sgdu0:01} establishes that
|
1546 |
|
|
a digit estimate $\hat{q}$ based on the first digit of the
|
1547 |
|
|
divisor $v$ can be no less than the actual digit $q$, i.e.
|
1548 |
|
|
$\hat{q}-q \geq 0$. However, we must also establish an upper bound
|
1549 |
|
|
on $\hat{q}-q$.
|
1550 |
|
|
|
1551 |
|
|
Intuitively, based on
|
1552 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:06}), we might guess that
|
1553 |
|
|
if $v_{n-1}$ is small, the estimate $\hat{q}$ may be quite
|
1554 |
|
|
poor, as the interval to which the actual value of $v$ is confined
|
1555 |
|
|
may be quite large. This fact is the basis for the normalization
|
1556 |
|
|
step [\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:01}] in Algorithm
|
1557 |
|
|
\ref{alg:ccil0:sidv0:sgdu0:01}. We now prove a useful result
|
1558 |
|
|
for how much $u, v$ must be normalized so that $\hat{q}-q \leq 2$.
|
1559 |
|
|
|
1560 |
|
|
\begin{vworklemmastatementpar}{Normalization Requirement So That
|
1561 |
|
|
\mbox{\boldmath$\hat{q} - q \leq 2$}}
|
1562 |
|
|
\label{lem:ccil0:sidv0:sgdu0:02}
|
1563 |
|
|
If $v_{n-1} \geq \lfloor b/2 \rfloor$ and $\hat{q}$ is chosen as
|
1564 |
|
|
indicated in (\ref{eq:ccil0:sidv0:sgdu0:01}), then
|
1565 |
|
|
$0 \leq \hat{q} - q \leq 2$.
|
1566 |
|
|
\end{vworklemmastatementpar}
|
1567 |
|
|
\begin{vworklemmaproof}
|
1568 |
|
|
The lower limit on $\hat{q} - q$ is proved in Lemma \ref{lem:ccil0:sidv0:sgdu0:01}.
|
1569 |
|
|
We now seek only to prove that $\hat{q} - q \leq 2$.
|
1570 |
|
|
By definition of $\hat{q}$ and $q$,
|
1571 |
|
|
|
1572 |
|
|
\begin{equation}
|
1573 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:01}
|
1574 |
|
|
\hat{q} - q = \left\lfloor {\frac{u_n b + u_{n-1}}{v_{n-1}}} \right\rfloor
|
1575 |
|
|
- \left\lfloor {\frac{u}{v}} \right\rfloor
|
1576 |
|
|
\end{equation}
|
1577 |
|
|
|
1578 |
|
|
When only nonnegative integers are involved,
|
1579 |
|
|
(\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02})
|
1580 |
|
|
supplies an exact expression for the floor of a
|
1581 |
|
|
ratio of integers. Using (\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}),
|
1582 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:01}) can be decomposed into
|
1583 |
|
|
|
1584 |
|
|
\begin{equation}
|
1585 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:02}
|
1586 |
|
|
\hat{q} - q = \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1587 |
|
|
- \frac{(u_n b + u_{n-1}) \bmod v_{n-1}}{v_{n-1}}
|
1588 |
|
|
- \frac{u}{v}
|
1589 |
|
|
+ \frac{u \bmod v}{v} .
|
1590 |
|
|
\end{equation}
|
1591 |
|
|
|
1592 |
|
|
\noindent{}Note that (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) is an exact
|
1593 |
|
|
expression (rather than an
|
1594 |
|
|
inequality).
|
1595 |
|
|
|
1596 |
|
|
Note in (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) that
|
1597 |
|
|
$(u_n b + u_{n-1}) \bmod v_{n-1} \in [0, v_{n-1}-1]$, and that in general
|
1598 |
|
|
there is no reason to expect it cannot be zero. Thus, we can assume that
|
1599 |
|
|
it \emph{is} zero, which will maximize $\hat{q}-q$. We can thus convert
|
1600 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) into the inequality
|
1601 |
|
|
|
1602 |
|
|
\begin{equation}
|
1603 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:03}
|
1604 |
|
|
\hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1605 |
|
|
- \frac{u}{v}
|
1606 |
|
|
+ \frac{u \bmod v}{v} .
|
1607 |
|
|
\end{equation}
|
1608 |
|
|
|
1609 |
|
|
In (\ref{eq:lem:ccil0:sidv0:sgdu0:02:03}) we can also observe that
|
1610 |
|
|
$(u \bmod v)/v \in [0, (v-1)/v]$. If we replace this expression with
|
1611 |
|
|
``1'' (which is unattainable, but barely), this will change the relational
|
1612 |
|
|
operator from ``$\leq$'' to ``$<$'':
|
1613 |
|
|
|
1614 |
|
|
\begin{equation}
|
1615 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:04}
|
1616 |
|
|
\hat{q} - q < \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1617 |
|
|
- \frac{u}{v}
|
1618 |
|
|
+ 1 .
|
1619 |
|
|
\end{equation}
|
1620 |
|
|
|
1621 |
|
|
The result we wish to show is that with $v_{n-1} \geq \lfloor b/2 \rfloor$,
|
1622 |
|
|
$\hat{q}-q \leq 2$. To simplify the subsequent algebraic manipulations, note in
|
1623 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:04}) that
|
1624 |
|
|
|
1625 |
|
|
\begin{eqnarray}
|
1626 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:05}
|
1627 |
|
|
& \hat{q} - q \leq 2 & \\
|
1628 |
|
|
\nonumber & \vworkvertequiv & \\
|
1629 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:06}
|
1630 |
|
|
& \displaystyle \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1631 |
|
|
- \frac{u}{v}
|
1632 |
|
|
+ 1 \leq 3 & \\
|
1633 |
|
|
\nonumber & \vworkvertequiv & \\
|
1634 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:07}
|
1635 |
|
|
& \displaystyle \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1636 |
|
|
- \frac{u}{v} \leq 2 &
|
1637 |
|
|
\end{eqnarray}
|
1638 |
|
|
|
1639 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:06}) may be counterintuitive, so
|
1640 |
|
|
further explanation is offered here. Since $\hat{q} \in \vworkintset$ and $q \in \vworkintset$,
|
1641 |
|
|
$\hat{q}-q \in \vworkintset$. Thus, proving
|
1642 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:06}) or
|
1643 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}) proves that
|
1644 |
|
|
$\hat{q}-q \in \{ \ldots, -1, 0, 1, 2 \}$ (however, by
|
1645 |
|
|
Lemma \ref{lem:ccil0:sidv0:sgdu0:01}, $\hat{q}-q \geq 0$, so in fact
|
1646 |
|
|
what would be proved is that $\hat{q}-q \in \{ 0, 1, 2 \}$). For
|
1647 |
|
|
algebraic simplicity,
|
1648 |
|
|
we choose to prove (\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}).
|
1649 |
|
|
|
1650 |
|
|
First, adjust numerator and denominator of the first term in
|
1651 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}) by $b^{n-1}$ so that the terms more closely
|
1652 |
|
|
resemble $u$ and $v$ in
|
1653 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:02})
|
1654 |
|
|
and
|
1655 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:01:06}):
|
1656 |
|
|
|
1657 |
|
|
\begin{equation}
|
1658 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:08}
|
1659 |
|
|
\frac{u_n b^n + u_{n-1} b^{n-1}}{v_{n-1} b^{n-1}}
|
1660 |
|
|
- \frac{u}{v} \leq 2 .
|
1661 |
|
|
\end{equation}
|
1662 |
|
|
|
1663 |
|
|
For logical implication to be maintained,
|
1664 |
|
|
we must make the most pessimistic choices and assumptions possible in
|
1665 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:08}) in order to maximize the value
|
1666 |
|
|
of the left side of the inequality.
|
1667 |
|
|
The first assumption to be made is the error in estimating
|
1668 |
|
|
$u$ and $v$ based on their most significant digits. It can be
|
1669 |
|
|
seen that (\ref{eq:lem:ccil0:sidv0:sgdu0:02:08}) will be maximized if:
|
1670 |
|
|
|
1671 |
|
|
\begin{itemize}
|
1672 |
|
|
\item We assume that $u = u_{n} b^n + u_{n-1} b^{n-1}$ (i.e. that we estimate
|
1673 |
|
|
$u$ precisely).
|
1674 |
|
|
\item We assume that $v = v_{n-1} b^{n-1} + b^{n-1} - 1$ (i.e. that
|
1675 |
|
|
we underestimate $v$ by the maximum amount possible).
|
1676 |
|
|
\item We minimize the value of $v_{n-1} b^{n-1}$.
|
1677 |
|
|
\end{itemize}
|
1678 |
|
|
|
1679 |
|
|
Assuming that $u$ is estimated precisely yields
|
1680 |
|
|
|
1681 |
|
|
\begin{equation}
|
1682 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:09}
|
1683 |
|
|
\frac{u}{v_{n-1} b^{n-1}}
|
1684 |
|
|
- \frac{u}{v} \leq 2 .
|
1685 |
|
|
\end{equation}
|
1686 |
|
|
|
1687 |
|
|
Assuming that $v$ is underestimated by the maximum amount possible
|
1688 |
|
|
yields
|
1689 |
|
|
|
1690 |
|
|
\begin{equation}
|
1691 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:10}
|
1692 |
|
|
\frac{u}{v - b^{n-1} + 1}
|
1693 |
|
|
- \frac{u}{v} \leq 2 .
|
1694 |
|
|
\end{equation}
|
1695 |
|
|
|
1696 |
|
|
Finally, with $b$ and $v$ fixed, $u$ can be maximized by noting that
|
1697 |
|
|
$u \leq bv - 1$ (by the problem assumption that the quotient is a single digit).
|
1698 |
|
|
However, for algebraic simplicity, we make the substitution $u=bv$ (rather than
|
1699 |
|
|
$u=bv-1$), since the weaker upper bound is strong enough to prove the
|
1700 |
|
|
first result we seek.
|
1701 |
|
|
|
1702 |
|
|
\begin{equation}
|
1703 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:11}
|
1704 |
|
|
\frac{bv}{v - b^{n-1} + 1}
|
1705 |
|
|
- \frac{bv}{v} \leq 2
|
1706 |
|
|
\end{equation}
|
1707 |
|
|
|
1708 |
|
|
\begin{equation}
|
1709 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:12}
|
1710 |
|
|
\frac{bv}{v - b^{n-1} + 1}
|
1711 |
|
|
- b \leq 2
|
1712 |
|
|
\end{equation}
|
1713 |
|
|
|
1714 |
|
|
Solving (\ref{eq:lem:ccil0:sidv0:sgdu0:02:12})
|
1715 |
|
|
for $v$ yields
|
1716 |
|
|
|
1717 |
|
|
\begin{equation}
|
1718 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:13}
|
1719 |
|
|
v \geq \frac{b^n}{2} + b^{n-1} - \frac{1}{b} - 1
|
1720 |
|
|
\end{equation}
|
1721 |
|
|
|
1722 |
|
|
Again using the assumption that $v$ is underestimated by the maximum amount
|
1723 |
|
|
possible, we may make the substitution that $v = v_{n-1} b^{n-1} + b^{n-1} -1$,
|
1724 |
|
|
leading to
|
1725 |
|
|
|
1726 |
|
|
\begin{equation}
|
1727 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:13b}
|
1728 |
|
|
v_{n-1} \geq \frac{b}{2} - \frac{1}{2 b^{n-2}} .
|
1729 |
|
|
\end{equation}
|
1730 |
|
|
|
1731 |
|
|
There are two cases to consider: $b$ even and $b$ odd. If $b$ is even, the
|
1732 |
|
|
proof is complete, as $\lfloor b/2 \rfloor = b/2$ and the choice of
|
1733 |
|
|
$v_{n-1} = \lfloor b/2 \rfloor$ will automatically satisfy
|
1734 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}). However, if $b$ is odd,
|
1735 |
|
|
$\lfloor b/2 \rfloor = b/2 - 1/2 < b/2 - 1/2b^{n-2}$, violating
|
1736 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}),
|
1737 |
|
|
and so we need to
|
1738 |
|
|
further examine this case.
|
1739 |
|
|
|
1740 |
|
|
If $b$ is odd and $v_{n-1} = \lfloor b/2 \rfloor$, then
|
1741 |
|
|
$v_{n-1} = b/2 - 1/2$, violating (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}).
|
1742 |
|
|
However, any larger choice of $v_{n-1}$ (such as
|
1743 |
|
|
$\lfloor b/2 \rfloor + 1$, $\lfloor b/2 \rfloor + 2$, etc.) satisfies
|
1744 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}); so that it remains only to prove
|
1745 |
|
|
the $v_{n-1} = \lfloor b/2 \rfloor = b/2 - 1/2$
|
1746 |
|
|
case.
|
1747 |
|
|
|
1748 |
|
|
If $v_{n-1} = \lfloor b/2 \rfloor = b/2 - 1/2$, then
|
1749 |
|
|
|
1750 |
|
|
\begin{eqnarray}
|
1751 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:14}
|
1752 |
|
|
v & \in & \left[
|
1753 |
|
|
\left( \frac{b}{2} - \frac{1}{2}\right) b^{n-1},
|
1754 |
|
|
\left( \frac{b}{2} - \frac{1}{2}\right) b^{n-1} + b^{n-1} - 1
|
1755 |
|
|
\right] \\
|
1756 |
|
|
\nonumber & = &
|
1757 |
|
|
\left[
|
1758 |
|
|
\frac{b^n}{2} - \frac{b^{n-1}}{2},
|
1759 |
|
|
\frac{b^n}{2} + \frac{b^{n-1}}{2} -1
|
1760 |
|
|
\right] .
|
1761 |
|
|
\end{eqnarray}
|
1762 |
|
|
|
1763 |
|
|
Note in this case that the estimation error ($v - v_{n-1}b^{n-1}$) and the
|
1764 |
|
|
value of $v$ are not independent; and in fact it is this aspect
|
1765 |
|
|
of the problem that has led to the violation of
|
1766 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}) with $b$ odd and
|
1767 |
|
|
$v_{n-1} = \lfloor b/2 \rfloor$.
|
1768 |
|
|
|
1769 |
|
|
In order to prove the case of $b$ odd and $v = \lfloor b/2 \rfloor = b/2 - 1/2$,
|
1770 |
|
|
we must reexamine some simplifying assumptions made earlier in order to obtain
|
1771 |
|
|
tighter inequalities. In (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}), we can no
|
1772 |
|
|
longer accept the maximum of $(u \bmod v)/v$ as one; instead we construct the
|
1773 |
|
|
tighter inequality
|
1774 |
|
|
|
1775 |
|
|
\begin{eqnarray}
|
1776 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:15}
|
1777 |
|
|
& \displaystyle \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1778 |
|
|
- \frac{u}{v}
|
1779 |
|
|
+ \frac{u \bmod v}{v} & \\
|
1780 |
|
|
\nonumber & \vworkvimp & \\
|
1781 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:16}
|
1782 |
|
|
& \displaystyle \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}}
|
1783 |
|
|
- \frac{u}{v}
|
1784 |
|
|
+ \frac{v-1}{v} , &
|
1785 |
|
|
\end{eqnarray}
|
1786 |
|
|
|
1787 |
|
|
which leads to
|
1788 |
|
|
|
1789 |
|
|
\begin{equation}
|
1790 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:17}
|
1791 |
|
|
\frac{u_n b + u_{n-1}}{v_{n-1}}
|
1792 |
|
|
- \frac{u+1}{v}
|
1793 |
|
|
< 2 .
|
1794 |
|
|
\end{equation}
|
1795 |
|
|
|
1796 |
|
|
In order to maximize the left-hand side of
|
1797 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:17}), we assume that we
|
1798 |
|
|
estimate $u$ exactly so that $u = u_n b^n + u_{n-1} b^{n-1}$,
|
1799 |
|
|
yielding
|
1800 |
|
|
|
1801 |
|
|
\begin{equation}
|
1802 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:18}
|
1803 |
|
|
\frac{u}{v_{n-1} b^{n-1}}
|
1804 |
|
|
- \frac{u+1}{v}
|
1805 |
|
|
< 2 .
|
1806 |
|
|
\end{equation}
|
1807 |
|
|
|
1808 |
|
|
We also assume that $u$ is the maximum value
|
1809 |
|
|
possible, $u=bv-1$, leading to
|
1810 |
|
|
|
1811 |
|
|
\begin{equation}
|
1812 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:19}
|
1813 |
|
|
\frac{bv-1}{v_{n-1} b^{n-1}}
|
1814 |
|
|
- b
|
1815 |
|
|
< 2 .
|
1816 |
|
|
\end{equation}
|
1817 |
|
|
|
1818 |
|
|
Finally, we assume that $v$ is the upper limit in
|
1819 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:02:14}),
|
1820 |
|
|
$v=b^n/2 + b^{n-1}/2 - 1$, and substitute the known value of
|
1821 |
|
|
$v_{n-1}$ for the case being proved, $v_{n-1} = b/2-1/2$, yielding
|
1822 |
|
|
|
1823 |
|
|
\begin{equation}
|
1824 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:02:20}
|
1825 |
|
|
\frac{b \left( \frac{b^n}{2} + \frac{b^{n-1}}{2} - 1 \right) - 1}
|
1826 |
|
|
{\left( \frac{b}{2} - \frac{1}{2} \right) b^{n-1}}
|
1827 |
|
|
- b
|
1828 |
|
|
< 2 .
|
1829 |
|
|
\end{equation}
|
1830 |
|
|
|
1831 |
|
|
Simplification of (\ref{eq:lem:ccil0:sidv0:sgdu0:02:20})
|
1832 |
|
|
will establish that it is always true. This completes the proof.
|
1833 |
|
|
\end{vworklemmaproof}
|
1834 |
|
|
\vworklemmafooter{}
|
1835 |
|
|
|
1836 |
|
|
Lemmas \ref{lem:ccil0:sidv0:sgdu0:01} and
|
1837 |
|
|
\ref{lem:ccil0:sidv0:sgdu0:02},
|
1838 |
|
|
standing alone, lead to a good implementation of
|
1839 |
|
|
division without any further results. If it is known that
|
1840 |
|
|
$0 \leq \hat{q} - q \leq 2$, Algorithm \ref{alg:ccil0:sidv0:sgdu0:01}
|
1841 |
|
|
can be trivially modified to only calculate
|
1842 |
|
|
$\hat{q}$ (omitting the additional tests in Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}),
|
1843 |
|
|
and then
|
1844 |
|
|
to include up to two add-back steps
|
1845 |
|
|
(duplication of Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}).
|
1846 |
|
|
Although such an algorithm would be
|
1847 |
|
|
satisfactory, it has the disadvantage that the add-back steps would
|
1848 |
|
|
be executed very frequently, slowing the algorithm substantially, especially
|
1849 |
|
|
for long operands.
|
1850 |
|
|
We now show that the additional tests in Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03} of
|
1851 |
|
|
Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} can eliminate
|
1852 |
|
|
altogether the case of $\hat{q}-q = 2$ (Lemma \ref{lem:ccil0:sidv0:sgdu0:03}), and
|
1853 |
|
|
can with a probability close to unity eliminate the
|
1854 |
|
|
case of $\hat{q}-q = 1$ (Lemmas \ref{lem:ccil0:sidv0:sgdu0:04}
|
1855 |
|
|
and \ref{lem:ccil0:sidv0:sgdu0:05}). Together these tests, which are present in
|
1856 |
|
|
the statement of Algorithm \ref{alg:ccil0:sidv0:sgdu0:01},
|
1857 |
|
|
reduce add-back (Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}) to rare occurrence, and create a more
|
1858 |
|
|
efficient algorithm than would be possible with the
|
1859 |
|
|
results of Lemmas \ref{lem:ccil0:sidv0:sgdu0:01} and \ref{lem:ccil0:sidv0:sgdu0:02} alone.
|
1860 |
|
|
|
1861 |
|
|
\begin{vworklemmastatementpar}
|
1862 |
|
|
{\mbox{\boldmath$\hat{q} v_{n-2} \leq b \hat{r} + u_{j+n-2} \vworkhimp 0 \leq \hat{q} - q \leq 1$}}
|
1863 |
|
|
\label{lem:ccil0:sidv0:sgdu0:03}
|
1864 |
|
|
If the divisor normalization requirement ($v_{n-1} \geq \lfloor b/2 \rfloor$) as specified in
|
1865 |
|
|
Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:01} of
|
1866 |
|
|
Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} is met, then
|
1867 |
|
|
|
1868 |
|
|
\begin{equation}
|
1869 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:01}
|
1870 |
|
|
\hat{q} v_{n-2} \leq b \hat{r} + u_{j+n-2} \vworkhimp 0 \leq \hat{q} - q \leq 1 .
|
1871 |
|
|
\end{equation}
|
1872 |
|
|
\end{vworklemmastatementpar}
|
1873 |
|
|
\begin{vworklemmaproof}
|
1874 |
|
|
For reference, note that:
|
1875 |
|
|
|
1876 |
|
|
\begin{eqnarray}
|
1877 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:02}
|
1878 |
|
|
u & = & u_{j+n} b^{j+n} + u_{j+n-1} b^{j+n-1} + \ldots{} + u_1 b + u_0 \\
|
1879 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:03}
|
1880 |
|
|
v & = & v_{n-1} b^{n-1} + v_{n-2} b^{n-2} + \ldots{} + v_1 b + v_0
|
1881 |
|
|
\end{eqnarray}
|
1882 |
|
|
|
1883 |
|
|
By definition, we the remainder
|
1884 |
|
|
$\hat{r}$ has the value
|
1885 |
|
|
$\hat{r} = u_{j+n} b + u_{j+n-1} - \hat{q} v_{n-1}$. Substituting this value
|
1886 |
|
|
into (\ref{eq:lem:ccil0:sidv0:sgdu0:03:01}) produces
|
1887 |
|
|
|
1888 |
|
|
\begin{equation}
|
1889 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:04}
|
1890 |
|
|
\hat{q} v_{n-2} \leq b (u_{j+n} b + u_{j+n-1} - \hat{q} v_{n-1}) + u_{j+n-2} ,
|
1891 |
|
|
\end{equation}
|
1892 |
|
|
|
1893 |
|
|
and solving for $\hat{q}$ yields
|
1894 |
|
|
|
1895 |
|
|
\begin{equation}
|
1896 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:05}
|
1897 |
|
|
\hat{q} \leq \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}}.
|
1898 |
|
|
\end{equation}
|
1899 |
|
|
|
1900 |
|
|
It is known from Lemma \ref{lem:ccil0:sidv0:sgdu0:01} that the type of estimate
|
1901 |
|
|
represented by the floor of the right-hand size of (\ref{eq:lem:ccil0:sidv0:sgdu0:03:05})
|
1902 |
|
|
can be no less than $q$, leading to
|
1903 |
|
|
|
1904 |
|
|
\begin{eqnarray}
|
1905 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:06}
|
1906 |
|
|
& \displaystyle q \leq \hat{q}
|
1907 |
|
|
\leq
|
1908 |
|
|
\left\lfloor { \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} } \right\rfloor & \\
|
1909 |
|
|
\nonumber & \displaystyle \leq
|
1910 |
|
|
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}}. &
|
1911 |
|
|
\end{eqnarray}
|
1912 |
|
|
|
1913 |
|
|
Because $q, \hat{q} \in \vworkintset$, it is only necessary to prove that
|
1914 |
|
|
|
1915 |
|
|
\begin{equation}
|
1916 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:07}
|
1917 |
|
|
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} -
|
1918 |
|
|
\left\lfloor \frac{u}{v} \right\rfloor
|
1919 |
|
|
< 2
|
1920 |
|
|
\end{equation}
|
1921 |
|
|
|
1922 |
|
|
in order to prove that $\hat{q}-q \leq 1$. Using
|
1923 |
|
|
(\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}),
|
1924 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:03:07}) can be rewritten as
|
1925 |
|
|
|
1926 |
|
|
\begin{equation}
|
1927 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:08}
|
1928 |
|
|
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} -
|
1929 |
|
|
\frac{u}{v} -
|
1930 |
|
|
\frac{u \bmod v}{v}
|
1931 |
|
|
< 2 .
|
1932 |
|
|
\end{equation}
|
1933 |
|
|
|
1934 |
|
|
In order for implication to hold, we must make the most pessimistic
|
1935 |
|
|
assumptions about $u \bmod v$ (those which maximize it). The maximum value
|
1936 |
|
|
of $u \bmod v$ is $v-1$, leading to
|
1937 |
|
|
|
1938 |
|
|
\begin{equation}
|
1939 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:03:09}
|
1940 |
|
|
\frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} -
|
1941 |
|
|
\frac{u + 1}{v}
|
1942 |
|
|
< 1 .
|
1943 |
|
|
\end{equation}
|
1944 |
|
|
|
1945 |
|
|
In order to maximize the left side of (\ref{}),
|
1946 |
|
|
we must assume that $u$ is maximized, $v$ is minimized,
|
1947 |
|
|
|
1948 |
|
|
|
1949 |
|
|
\end{vworklemmaproof}
|
1950 |
|
|
\vworklemmafooter{}
|
1951 |
|
|
|
1952 |
|
|
\begin{vworklemmastatementpar}
|
1953 |
|
|
{\mbox{\boldmath$\hat{q} v_{n-2} > b \hat{r} + u_{n-2} \vworkhimp q < \hat{q}$}}
|
1954 |
|
|
\label{lem:ccil0:sidv0:sgdu0:04}
|
1955 |
|
|
Given $\hat{q} > 0$, an estimate of $q$, and the remainder
|
1956 |
|
|
based on the estimate, $\hat{r} = u_n b + u_{n-1} - \hat{q} v_{n-1}$,
|
1957 |
|
|
|
1958 |
|
|
\begin{equation}
|
1959 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:04:01}
|
1960 |
|
|
\hat{q} v_{n-2} > b \hat{r} + u_{n-2} \vworkhimp \hat{q} > q .
|
1961 |
|
|
\end{equation}
|
1962 |
|
|
\end{vworklemmastatementpar}
|
1963 |
|
|
\begin{vworklemmaproof}
|
1964 |
|
|
We make the assumption that $v_{n-1}$ and $v_{n-2}$ are not both 0.
|
1965 |
|
|
$v_{n-1} > 0$ is guaranteed by
|
1966 |
|
|
the normalization of
|
1967 |
|
|
$v$ in Algorithm \ref{alg:ccil0:sidv0:sgdu0:01}.
|
1968 |
|
|
|
1969 |
|
|
\begin{equation}
|
1970 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:04:02}
|
1971 |
|
|
\hat{q} v_{n-2} > b \hat{r} + u_{n-2}
|
1972 |
|
|
\end{equation}
|
1973 |
|
|
|
1974 |
|
|
\begin{equation}
|
1975 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:04:03}
|
1976 |
|
|
\hat{q} v_{n-2} > b (u_n b + u_{n-1} - \hat{q} v_{n-1}) + u_{n-2}
|
1977 |
|
|
\end{equation}
|
1978 |
|
|
|
1979 |
|
|
Solving (\ref{eq:lem:ccil0:sidv0:sgdu0:04:03}) for $\hat{q}$ yields
|
1980 |
|
|
|
1981 |
|
|
\begin{equation}
|
1982 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:04:04}
|
1983 |
|
|
\hat{q}
|
1984 |
|
|
>
|
1985 |
|
|
\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}
|
1986 |
|
|
\geq
|
1987 |
|
|
\left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor
|
1988 |
|
|
.
|
1989 |
|
|
\end{equation}
|
1990 |
|
|
|
1991 |
|
|
Note that the right-hand term of
|
1992 |
|
|
(\ref{eq:lem:ccil0:sidv0:sgdu0:04:04})
|
1993 |
|
|
is similar in form to the estimate $\hat{q}$ in
|
1994 |
|
|
Lemma \ref{lem:ccil0:sidv0:sgdu0:01}, where it is proved that
|
1995 |
|
|
$\hat{q} \geq q$. It is possible to use identical algebraic technique
|
1996 |
|
|
as is used in Lemma \ref{lem:ccil0:sidv0:sgdu0:01} in order to prove that
|
1997 |
|
|
|
1998 |
|
|
\begin{equation}
|
1999 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:04:05}
|
2000 |
|
|
\left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor
|
2001 |
|
|
\geq q,
|
2002 |
|
|
\end{equation}
|
2003 |
|
|
|
2004 |
|
|
and it follows that
|
2005 |
|
|
|
2006 |
|
|
\begin{equation}
|
2007 |
|
|
\label{eq:lem:ccil0:sidv0:sgdu0:04:06}
|
2008 |
|
|
\hat{q}
|
2009 |
|
|
>
|
2010 |
|
|
\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}
|
2011 |
|
|
\geq
|
2012 |
|
|
\left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor
|
2013 |
|
|
\geq q,
|
2014 |
|
|
\end{equation}
|
2015 |
|
|
|
2016 |
|
|
and thus $\hat{q} > q$.
|
2017 |
|
|
\end{vworklemmaproof}
|
2018 |
|
|
\vworklemmafooter{}
|
2019 |
|
|
|
2020 |
|
|
\begin{vworklemmastatementpar}
|
2021 |
|
|
{Add-back occurs with approximate probability \mbox{\boldmath$2/b$}}
|
2022 |
|
|
\label{lem:ccil0:sidv0:sgdu0:05}
|
2023 |
|
|
The estimate of $q$ provided by (\ref{eq:ccil0:sidv0:sgdu0:01}),
|
2024 |
|
|
\end{vworklemmastatementpar}
|
2025 |
|
|
\begin{vworklemmaproof}
|
2026 |
|
|
\end{vworklemmaproof}
|
2027 |
|
|
\vworklemmafooter{}
|
2028 |
|
|
|
2029 |
|
|
|
2030 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2031 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2032 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2033 |
|
|
\subsection{Division Of Signed Operands}
|
2034 |
|
|
|
2035 |
|
|
|
2036 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2037 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2038 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2039 |
|
|
\subsection{Large Dividends With Machine-Native Divisors}
|
2040 |
|
|
\label{ccil0:sidv0:sldm0}
|
2041 |
|
|
|
2042 |
|
|
Division of arbitrary-sized operands (Section \ref{ccil0:sidv0:sgdu0}) is
|
2043 |
|
|
a costly operation. In many practical applications, we are able to exploit
|
2044 |
|
|
data sizes of operands or special relationships between the values of
|
2045 |
|
|
operands to use the instruction set of the machine more effectively.
|
2046 |
|
|
In this subsection, we investigate what optimizations we may achieve when:
|
2047 |
|
|
|
2048 |
|
|
\begin{itemize}
|
2049 |
|
|
\item We wish to calculate the quotient and remainder of
|
2050 |
|
|
unsigned integers $p$ and $q$: $p/q$ and
|
2051 |
|
|
$p \bmod{} q$; \emph{and}
|
2052 |
|
|
\item The machine possesses unsigned division instructions
|
2053 |
|
|
which provide both a quotient and a remainder from
|
2054 |
|
|
a division; \emph{and}
|
2055 |
|
|
\item The bitsize of the divisor $q$ is not larger than can
|
2056 |
|
|
be accomodated (as a divisor) by machine division instructions.
|
2057 |
|
|
\end{itemize}
|
2058 |
|
|
|
2059 |
|
|
Processors which possess integer division instructions usually
|
2060 |
|
|
possess one of two types of instructions:
|
2061 |
|
|
|
2062 |
|
|
\begin{itemize}
|
2063 |
|
|
\item Instructions where the the divisor, quotient and remainder are
|
2064 |
|
|
$Q$ bits, but the dividend is $2Q$ bits (we call these
|
2065 |
|
|
``large dividend'' instructions). For example, an
|
2066 |
|
|
instruction which accepts a 16-bit dividend and an
|
2067 |
|
|
8-bit divisor to produce an 8-bit quotient and an 8-bit remainder is
|
2068 |
|
|
typical.
|
2069 |
|
|
With such instructions, overflow is possible, and is always detectable.
|
2070 |
|
|
However, in this subsection we never describe algorithms which detect
|
2071 |
|
|
overflow---instead, we arrange for data values which cannot generate an
|
2072 |
|
|
overflow.
|
2073 |
|
|
\item Instructions where the dividend, divisor, quotient, and remainder
|
2074 |
|
|
are all $Q$ bits (we call these ``small dividend'' instructions).
|
2075 |
|
|
With such instructions, overflow is not possible.
|
2076 |
|
|
\end{itemize}
|
2077 |
|
|
|
2078 |
|
|
We call the bitsize $Q$ a \emph{chunk}. We use \emph{chunk} rather than
|
2079 |
|
|
\emph{word} because the chunksize and wordsize in general are not
|
2080 |
|
|
required to be the same.
|
2081 |
|
|
|
2082 |
|
|
For the remainder of this discussion, we assume large dividend
|
2083 |
|
|
instructions (the first category above).
|
2084 |
|
|
The algorithms developed can be implemented on small dividend machines
|
2085 |
|
|
by halving data sizes so that the divisor fills no more than half
|
2086 |
|
|
of the available bits.
|
2087 |
|
|
|
2088 |
|
|
We assume that we are interested in calculating the integer quotient
|
2089 |
|
|
$\lfloor{}p/q\rfloor$ and remainder $p \bmod{} q$ of two unsigned
|
2090 |
|
|
integers $p$ and $q$, where $p$ is of size $P$ bits and $q$ is
|
2091 |
|
|
of size $Q$ bits. For simplicity and without detracting from the
|
2092 |
|
|
generality of the solution, we assume that $Q \vworkdivides{} P$.
|
2093 |
|
|
|
2094 |
|
|
We then seek to calculate
|
2095 |
|
|
|
2096 |
|
|
\begin{eqnarray}
|
2097 |
|
|
\label{eq:ccil0:sidv0:sldm0:001}
|
2098 |
|
|
\frac{p}{q} & = & \frac{2^{P-Q} p_{[P/Q-1]} + 2^{P-2Q} p_{[P/Q-2]} + \ldots{} + 2^{Q} p_{[1]} + p_{[0]}}{q_{[0]}} \\
|
2099 |
|
|
\nonumber & = & \frac{\sum_{i=0}^{P/Q-1} 2^{iQ} p_{[i]}}{q_{[0]}} .
|
2100 |
|
|
\end{eqnarray}
|
2101 |
|
|
|
2102 |
|
|
\noindent{}Using the integer identity
|
2103 |
|
|
|
2104 |
|
|
\begin{equation}
|
2105 |
|
|
\label{eq:ccil0:sidv0:sldm0:002}
|
2106 |
|
|
\frac{a}{b} = \left\lfloor{\frac{a}{b}}\right\rfloor + \frac{a \bmod b}{b} ,
|
2107 |
|
|
\end{equation}
|
2108 |
|
|
|
2109 |
|
|
\noindent{}we can reform (\ref{eq:ccil0:sidv0:sldm0:001}) into
|
2110 |
|
|
|
2111 |
|
|
\begin{eqnarray}
|
2112 |
|
|
\nonumber
|
2113 |
|
|
\frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\
|
2114 |
|
|
\label{eq:ccil0:sidv0:sldm0:003}
|
2115 |
|
|
& + & 2^{P-Q} \frac{p_{[P/Q-1]}\bmod q_{[0]}}{q_{[0]}}
|
2116 |
|
|
+ 2^{P-2Q} \frac{p_{[P/Q-2]}}{q_{[0]}} \\
|
2117 |
|
|
\nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} ,
|
2118 |
|
|
\end{eqnarray}
|
2119 |
|
|
|
2120 |
|
|
\noindent{}which can be polished slightly to yield
|
2121 |
|
|
|
2122 |
|
|
\begin{eqnarray}
|
2123 |
|
|
\nonumber
|
2124 |
|
|
\frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\
|
2125 |
|
|
\label{eq:ccil0:sidv0:sldm0:004}
|
2126 |
|
|
& + & 2^{P-2Q} \frac{2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}}{q_{[0]}} \\
|
2127 |
|
|
\nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} .
|
2128 |
|
|
\end{eqnarray}
|
2129 |
|
|
|
2130 |
|
|
Note in (\ref{eq:ccil0:sidv0:sldm0:004}) that the first term,
|
2131 |
|
|
$\lfloor{}p_{[P/Q-1]} / q_{[0]}\rfloor$, as well as
|
2132 |
|
|
a portion of the second term, $p_{[P/Q-1]}\bmod q_{[0]}$, can be
|
2133 |
|
|
calculated using a single machine division instruction with
|
2134 |
|
|
$p_{[P/Q-1]}$ as the dividend and $q_{[0]}$ as the divisor.
|
2135 |
|
|
Note also that multiplication of an integer by a power of 2
|
2136 |
|
|
can be achieved by placing the integer correctly within the
|
2137 |
|
|
result. In this regard note that $Q=8$ or $Q=16$ are
|
2138 |
|
|
the most typical cases, and so the placement can be achieved simply
|
2139 |
|
|
by selecting the correct memory address.
|
2140 |
|
|
|
2141 |
|
|
It is initially unclear whether we can evaluate or reduce the fraction in the
|
2142 |
|
|
second term of (\ref{eq:ccil0:sidv0:sldm0:004}),
|
2143 |
|
|
$[2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}] / q_{[0]}$,
|
2144 |
|
|
using a single large dividend machine instruction, because
|
2145 |
|
|
the upper chunk of the dividend is populated with non-zero bits
|
2146 |
|
|
(specifically, $p_{[P/Q-1]}\bmod q_{[0]}$), and it seems that
|
2147 |
|
|
a division overflow may be possible. However, with some thought,
|
2148 |
|
|
it is clear that $p_{[P/Q-1]}\bmod q_{[0]} \leq q_{[0]} - 1$ and
|
2149 |
|
|
$p_{[P/Q-2]} \leq 2^Q - 1$, thus the largest numerator possible
|
2150 |
|
|
is $2^Q q_{[0]} - 1$, which, when divided by $q_{[0]}$, will result
|
2151 |
|
|
in a quotient and remainder of $2^Q - 1$. Thus, no division overflow
|
2152 |
|
|
can occur, and the fraction in the
|
2153 |
|
|
second term of (\ref{eq:ccil0:sidv0:sldm0:004}) can be evaluated
|
2154 |
|
|
using a large divisor integer machine instruction.
|
2155 |
|
|
|
2156 |
|
|
The fraction in the
|
2157 |
|
|
second term of (\ref{eq:ccil0:sidv0:sldm0:004}) can be simplified
|
2158 |
|
|
using (\ref{eq:ccil0:sidv0:sldm0:002}) to yield:
|
2159 |
|
|
|
2160 |
|
|
\begin{eqnarray}
|
2161 |
|
|
\nonumber
|
2162 |
|
|
\frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\
|
2163 |
|
|
\label{eq:ccil0:sidv0:sldm0:005}
|
2164 |
|
|
& + & 2^{P-2Q} \left\lfloor{\frac{2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}}{q_{[0]}}}\right\rfloor \\
|
2165 |
|
|
\nonumber & + & 2^{P-2Q} \frac{(2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}) \bmod q_{[0]}}{q_{[0]}} \\
|
2166 |
|
|
\nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} .
|
2167 |
|
|
\end{eqnarray}
|
2168 |
|
|
|
2169 |
|
|
The process of combining adjacent terms can be continued until all
|
2170 |
|
|
divisions and modulo operations necessary can be carried out using
|
2171 |
|
|
long dividend division instructions. If we envision a
|
2172 |
|
|
long-dividend division instruction as a functional block that
|
2173 |
|
|
accepts a $2Q$-bit dividend and a $Q$-bit divisor to produce a
|
2174 |
|
|
$Q$-bit quotient and a $Q$-bit remainder
|
2175 |
|
|
(Figure \ref{fig:ccil0:sidv0:sldm0:00}), then we can draw the
|
2176 |
|
|
entire division as outlined by (\ref{eq:ccil0:sidv0:sldm0:005})
|
2177 |
|
|
as shown in Figure \ref{fig:ccil0:sidv0:sldm0:01}.
|
2178 |
|
|
|
2179 |
|
|
\begin{figure}
|
2180 |
|
|
\centering
|
2181 |
|
|
\includegraphics[width=4.6in]{c_cil0/lddvblk.eps}
|
2182 |
|
|
\caption{Long Dividend Division Machine Instruction As A Functional Block}
|
2183 |
|
|
\label{fig:ccil0:sidv0:sldm0:00}
|
2184 |
|
|
\end{figure}
|
2185 |
|
|
|
2186 |
|
|
\begin{figure}
|
2187 |
|
|
\centering
|
2188 |
|
|
\includegraphics[width=4.6in]{c_cil0/ldmnblk.eps}
|
2189 |
|
|
\caption{Long Dividend/Machine-Native Divisor Division In Functional Block Form}
|
2190 |
|
|
\label{fig:ccil0:sidv0:sldm0:01}
|
2191 |
|
|
\end{figure}
|
2192 |
|
|
|
2193 |
|
|
The following example illustrates how to apply the technique.
|
2194 |
|
|
|
2195 |
|
|
\begin{vworkexamplestatement}
|
2196 |
|
|
\label{ex:ccil0:sidv0:sldm0:01}
|
2197 |
|
|
Implement 32/8 unsigned division on the TMS370C8 processor, which is
|
2198 |
|
|
characterized by a 16/8 division instruction.
|
2199 |
|
|
\end{vworkexamplestatement}
|
2200 |
|
|
\begin{vworkexampleparsection}{Solution}
|
2201 |
|
|
It would be possible to prepare an implementation directly from
|
2202 |
|
|
Figure \ref{fig:ccil0:sidv0:sldm0:01}: however, it may be
|
2203 |
|
|
more instructive to work through a solution without the
|
2204 |
|
|
aid of this figure.
|
2205 |
|
|
|
2206 |
|
|
In the case of the TMS370C8, the chunk size $Q$ is 8 bits; therefore
|
2207 |
|
|
$Q=8$. The problem statement indicates that we must accept 32-bit dividends;
|
2208 |
|
|
therefore $P=32$. Thus
|
2209 |
|
|
|
2210 |
|
|
\begin{equation}
|
2211 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:001}
|
2212 |
|
|
p = 2^{24} p_{[3]} + 2^{16} p_{[2]} + 2^{8} p_{[1]} + p_{[0]}
|
2213 |
|
|
\end{equation}
|
2214 |
|
|
|
2215 |
|
|
\noindent{}and
|
2216 |
|
|
|
2217 |
|
|
\begin{equation}
|
2218 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:002}
|
2219 |
|
|
q = q_{[0]} .
|
2220 |
|
|
\end{equation}
|
2221 |
|
|
|
2222 |
|
|
\noindent{}Thus the quotient and remainder we would like to determine,
|
2223 |
|
|
$\lfloor p/q \rfloor$ and $p \bmod q$, can be obtained by repeated
|
2224 |
|
|
application of (\ref{eq:ccil0:sidv0:sldm0:002}) as shown
|
2225 |
|
|
in Equations (\ref{eq:ex:ccil0:sidv0:sldm0:01:003})
|
2226 |
|
|
through (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}).
|
2227 |
|
|
|
2228 |
|
|
\begin{equation}
|
2229 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:003}
|
2230 |
|
|
\frac{p}{q}
|
2231 |
|
|
=
|
2232 |
|
|
\frac{2^{24} p_{[3]} + 2^{16} p_{[2]} + 2^{8} p_{[1]} + p_{[0]}}{q_{[0]}}
|
2233 |
|
|
\end{equation}
|
2234 |
|
|
|
2235 |
|
|
\begin{equation}
|
2236 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:004}
|
2237 |
|
|
\frac{p}{q}
|
2238 |
|
|
=
|
2239 |
|
|
2^{24} \frac{p_{[3]}}{q_{[0]}}
|
2240 |
|
|
+
|
2241 |
|
|
2^{16} \frac{p_{[2]}}{q_{[0]}}
|
2242 |
|
|
+
|
2243 |
|
|
2^{8} \frac{p_{[1]}}{q_{[0]}}
|
2244 |
|
|
+
|
2245 |
|
|
\frac{p_{[0]}}{q_{[0]}}
|
2246 |
|
|
\end{equation}
|
2247 |
|
|
|
2248 |
|
|
\begin{equation}
|
2249 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:005}
|
2250 |
|
|
\frac{p}{q}
|
2251 |
|
|
=
|
2252 |
|
|
2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2253 |
|
|
+
|
2254 |
|
|
2^{24} \frac{(p_{[3]} \bmod q_{[0]})}{q_{[0]}}
|
2255 |
|
|
+
|
2256 |
|
|
2^{16} \frac{p_{[2]}}{q_{[0]}}
|
2257 |
|
|
+
|
2258 |
|
|
2^{8} \frac{p_{[1]}}{q_{[0]}}
|
2259 |
|
|
+
|
2260 |
|
|
\frac{p_{[0]}}{q_{[0]}}
|
2261 |
|
|
\end{equation}
|
2262 |
|
|
|
2263 |
|
|
\begin{equation}
|
2264 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:006}
|
2265 |
|
|
\frac{p}{q}
|
2266 |
|
|
=
|
2267 |
|
|
2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2268 |
|
|
+
|
2269 |
|
|
2^{16} \frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}
|
2270 |
|
|
+
|
2271 |
|
|
2^{8} \frac{p_{[1]}}{q_{[0]}}
|
2272 |
|
|
+
|
2273 |
|
|
\frac{p_{[0]}}{q_{[0]}}
|
2274 |
|
|
\end{equation}
|
2275 |
|
|
|
2276 |
|
|
\begin{eqnarray}
|
2277 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:007}
|
2278 |
|
|
\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2279 |
|
|
+
|
2280 |
|
|
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\
|
2281 |
|
|
\nonumber & + &
|
2282 |
|
|
2^{16} \frac{(2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}}{q_{[0]}}
|
2283 |
|
|
+
|
2284 |
|
|
2^{8} \frac{p_{[1]}}{q_{[0]}}
|
2285 |
|
|
+
|
2286 |
|
|
\frac{p_{[0]}}{q_{[0]}}
|
2287 |
|
|
\end{eqnarray}
|
2288 |
|
|
|
2289 |
|
|
\begin{eqnarray}
|
2290 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:008}
|
2291 |
|
|
\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2292 |
|
|
+
|
2293 |
|
|
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\
|
2294 |
|
|
\nonumber & + &
|
2295 |
|
|
2^{8} \frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}
|
2296 |
|
|
+
|
2297 |
|
|
\frac{p_{[0]}}{q_{[0]}}
|
2298 |
|
|
\end{eqnarray}
|
2299 |
|
|
|
2300 |
|
|
\begin{eqnarray}
|
2301 |
|
|
\nonumber\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2302 |
|
|
+
|
2303 |
|
|
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\
|
2304 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:009}
|
2305 |
|
|
& + &
|
2306 |
|
|
2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor \\
|
2307 |
|
|
\nonumber & + &
|
2308 |
|
|
2^{8} \frac{(2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}}{q_{[0]}} \\
|
2309 |
|
|
\nonumber & + &
|
2310 |
|
|
\frac{p_{[0]}}{q_{[0]}}
|
2311 |
|
|
\end{eqnarray}
|
2312 |
|
|
|
2313 |
|
|
\begin{eqnarray}
|
2314 |
|
|
\nonumber\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2315 |
|
|
+
|
2316 |
|
|
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\
|
2317 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:010}
|
2318 |
|
|
& + &
|
2319 |
|
|
2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor \\
|
2320 |
|
|
\nonumber & + &
|
2321 |
|
|
\frac{2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}}{q_{[0]}}
|
2322 |
|
|
\end{eqnarray}
|
2323 |
|
|
|
2324 |
|
|
\begin{eqnarray}
|
2325 |
|
|
\nonumber & \displaystyle \frac{p}{q} = 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor
|
2326 |
|
|
+
|
2327 |
|
|
2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor & \\
|
2328 |
|
|
\label{eq:ex:ccil0:sidv0:sldm0:01:011}
|
2329 |
|
|
& \displaystyle +
|
2330 |
|
|
2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor & \\
|
2331 |
|
|
\nonumber & \displaystyle +
|
2332 |
|
|
\left\lfloor{\frac{2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}}{q_{[0]}}}\right\rfloor + & \\
|
2333 |
|
|
\nonumber
|
2334 |
|
|
& \displaystyle \frac{(2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}) \bmod q_{[0]}}{q_{[0]}}
|
2335 |
|
|
&
|
2336 |
|
|
\end{eqnarray}
|
2337 |
|
|
|
2338 |
|
|
Note several things about the implementation suggested by
|
2339 |
|
|
(\ref{eq:ex:ccil0:sidv0:sldm0:01:011}):
|
2340 |
|
|
|
2341 |
|
|
\begin{itemize}
|
2342 |
|
|
\item No addition or multiplication is required to calculate terms such as
|
2343 |
|
|
$2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}$. The high-order byte of the
|
2344 |
|
|
large dividend can be stuffed with $p_{[3]} \bmod q_{[0]}$ and
|
2345 |
|
|
the low-order byte with $p_{[2]}$.
|
2346 |
|
|
\item No addition or multiplication is required to calculate the
|
2347 |
|
|
result $\lfloor p/q \rfloor$.
|
2348 |
|
|
Note in (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}) that the results are
|
2349 |
|
|
conveniently grouped as bytes with multipliers of $2^{24}$,
|
2350 |
|
|
$2^{16}$, $2^8$, and $2^0=1$. The terms can simply be placed into
|
2351 |
|
|
the appropriate byte, as a way of multplication by the appropriate
|
2352 |
|
|
power of 2. Note also that each term is guaranteed to be
|
2353 |
|
|
$\in \{0, 1, 2, \ldots{} , 255\}$, by the argument presented
|
2354 |
|
|
earlier for (\ref{eq:ccil0:sidv0:sldm0:004}). Thus, the
|
2355 |
|
|
addition will result in no carries, and each result byte can simply
|
2356 |
|
|
be placed directly in the correct memory location---addition is
|
2357 |
|
|
not necessary.
|
2358 |
|
|
\item Four machine division instructions are required, and the remainder
|
2359 |
|
|
is produced automatically by the fourth instruction.
|
2360 |
|
|
\end{itemize}
|
2361 |
|
|
|
2362 |
|
|
An implemenation for the TMS370C8 is supplied as Figure
|
2363 |
|
|
\ref{fig:ex:ccil0:sidv0:sldm0:01:01}. A block diagram of the data
|
2364 |
|
|
flow for this implementation is supplied as
|
2365 |
|
|
Figure \ref{fig:ex:ccil0:sidv0:sldm0:01:02}.
|
2366 |
|
|
\end{vworkexampleparsection}
|
2367 |
|
|
\vworkexamplefooter{}
|
2368 |
|
|
|
2369 |
|
|
\begin{figure}
|
2370 |
|
|
\begin{verbatim}
|
2371 |
|
|
;Assume that byte memory locations p3, p2, p1, and p0 contain the
|
2372 |
|
|
;32-bit unsigned dividend, and byte q0 contains the 8-bit unsigned
|
2373 |
|
|
;divisor. Assume also that the result quotient will be placed
|
2374 |
|
|
;in byte memory locations d3, d2, d1, and d0; and that the
|
2375 |
|
|
;remainder will be placed in the byte memory location r0. Further
|
2376 |
|
|
;assume that all memory locations are in the register file (near).
|
2377 |
|
|
CLR A ;High-order chunk of large divisor
|
2378 |
|
|
;must be 0.
|
2379 |
|
|
MOV p3, B ;Load the low-order chunk of divisor.
|
2380 |
|
|
DIV q0, A ;Perform the first division.
|
2381 |
|
|
MOV A, d3 ;Quotient becomes this part of the
|
2382 |
|
|
;result.
|
2383 |
|
|
MOV B, A ;Remainder becomes high-order chunk of
|
2384 |
|
|
;next division.
|
2385 |
|
|
MOV p2, B ;Next byte becomes low-order chunk.
|
2386 |
|
|
DIV q0, A ;Do the second division.
|
2387 |
|
|
MOV A, d2 ;Quotient becomes this part of the
|
2388 |
|
|
;result.
|
2389 |
|
|
MOV B, A ;Remainder becomes high-order chunk of
|
2390 |
|
|
;next division.
|
2391 |
|
|
MOV p1, B ;Next byte becomes low-order chunk.
|
2392 |
|
|
DIV q0, A ;Do the third division.
|
2393 |
|
|
MOV A, d1 ;Quotient becomes this part of the
|
2394 |
|
|
;result.
|
2395 |
|
|
MOV B, A ;Remainder becomes high-order chunk of
|
2396 |
|
|
;next division.
|
2397 |
|
|
MOV p0, B ;Next byte becomes low-order chunk.
|
2398 |
|
|
DIV q0, A ;Do the fourth division.
|
2399 |
|
|
MOV A, d0 ;Quotient becomes this part of the
|
2400 |
|
|
;result.
|
2401 |
|
|
MOV B, r0 ;This is the remainder, which could be used
|
2402 |
|
|
;for rounding.
|
2403 |
|
|
\end{verbatim}
|
2404 |
|
|
\caption{TMS370C8 Code Snippet Illustrating Unsigned 32/8
|
2405 |
|
|
Division Using Unsigned 16/8
|
2406 |
|
|
Machine Instructions (Example \ref{ex:ccil0:sidv0:sldm0:01})}
|
2407 |
|
|
\label{fig:ex:ccil0:sidv0:sldm0:01:01}
|
2408 |
|
|
\end{figure}
|
2409 |
|
|
|
2410 |
|
|
\begin{figure}
|
2411 |
|
|
\centering
|
2412 |
|
|
\includegraphics[width=4.6in]{c_cil0/t370dmp.eps}
|
2413 |
|
|
\caption{Block Diagram Of Data Flow Of Figure \ref{fig:ex:ccil0:sidv0:sldm0:01:01}}
|
2414 |
|
|
\label{fig:ex:ccil0:sidv0:sldm0:01:02}
|
2415 |
|
|
\end{figure}
|
2416 |
|
|
|
2417 |
|
|
|
2418 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2419 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2420 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2421 |
|
|
\section{Miscellaneous Integer Mappings}
|
2422 |
|
|
%Section tag: MIM0
|
2423 |
|
|
\label{ccil0:smim0}
|
2424 |
|
|
|
2425 |
|
|
Embedded system work and ROM constraints often inspire a great deal
|
2426 |
|
|
of cleverness in the selection of instructions to perform mappings or
|
2427 |
|
|
tests. In this section, we discuss integer mappings (i.e. functions)
|
2428 |
|
|
for which economical implementations are known; and in the next section
|
2429 |
|
|
(Section \ref{ccil0:smit0})
|
2430 |
|
|
we discuss integer tests for which economical implementations are known.
|
2431 |
|
|
|
2432 |
|
|
To the best of our knowledge, there is no way to derive these mappings
|
2433 |
|
|
and tests---they have been collected from many software developers and
|
2434 |
|
|
come from human intuition and experience.
|
2435 |
|
|
|
2436 |
|
|
|
2437 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2438 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2439 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2440 |
|
|
\subsection{Lowest-Order Bit}
|
2441 |
|
|
%Subsection tag: LIB0
|
2442 |
|
|
\label{ccil0:smim0:slib0}
|
2443 |
|
|
|
2444 |
|
|
\index{lowest-order bit}
|
2445 |
|
|
\index{least significant bit}
|
2446 |
|
|
|
2447 |
|
|
The mapping
|
2448 |
|
|
|
2449 |
|
|
\texttt{mask = x \& -x}
|
2450 |
|
|
|
2451 |
|
|
\noindent{}is the most economical way known to extract the
|
2452 |
|
|
lowest-order bit set in an integer \texttt{x}, or
|
2453 |
|
|
0 if no bits are set.\footnote{This mapping was contributed by
|
2454 |
|
|
David Baker (\texttt{bakerda@engin.umich.edu})
|
2455 |
|
|
and Raul Selgado (\texttt{rselgado@visteon.com}).} Since most processors have an instruction to form the
|
2456 |
|
|
two's complement of an integer, this mapping usually requires only
|
2457 |
|
|
two arithmetic instructions.
|
2458 |
|
|
|
2459 |
|
|
When implementing this mapping in assembly-language on processors without a
|
2460 |
|
|
two's complement instruction, two other possible implementations are:
|
2461 |
|
|
|
2462 |
|
|
\begin{itemize}
|
2463 |
|
|
\item \texttt{mask = x \& ($\sim$x + 1)}
|
2464 |
|
|
\item \texttt{mask = x \& ((x \^{ } -1) + 1)}
|
2465 |
|
|
\end{itemize}
|
2466 |
|
|
|
2467 |
|
|
|
2468 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2469 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2470 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2471 |
|
|
\section{Miscellaneous Integer Tests}
|
2472 |
|
|
%Section tag: MIT0
|
2473 |
|
|
\label{ccil0:smit0}
|
2474 |
|
|
|
2475 |
|
|
\subsection{Power Of 2}
|
2476 |
|
|
%Subsection tag: PTW0
|
2477 |
|
|
\label{ccil0:smit0:sptw0}
|
2478 |
|
|
|
2479 |
|
|
\index{power of two}
|
2480 |
|
|
\index{2N@$2^N$}
|
2481 |
|
|
|
2482 |
|
|
The test
|
2483 |
|
|
|
2484 |
|
|
\texttt{(x \& (x-1) == 0) \&\& (x != 0)}
|
2485 |
|
|
|
2486 |
|
|
\noindent{}is the most economical way known to
|
2487 |
|
|
test whether an integer is a positive power of two
|
2488 |
|
|
(1, 2, 4, 8, 16, etc.).\footnote{The test appeared as part of
|
2489 |
|
|
a discussion on
|
2490 |
|
|
the GMP mailing list in 2002.}
|
2491 |
|
|
|
2492 |
|
|
|
2493 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2494 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2495 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2496 |
|
|
\section{Exercises}
|
2497 |
|
|
|
2498 |
|
|
\begin{vworkexercisestatement}
|
2499 |
|
|
\label{exe:ccil0:sexe0:01}
|
2500 |
|
|
Show that any $m$-bit two's complement integer $u_{[m-1:0]}$ except
|
2501 |
|
|
$-2^{m-1}$ can be negated by forming the one's complement, then adding one.
|
2502 |
|
|
\end{vworkexercisestatement}
|
2503 |
|
|
|
2504 |
|
|
|
2505 |
|
|
|
2506 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2507 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2508 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2509 |
|
|
\vfill
|
2510 |
|
|
\noindent\begin{figure}[!b]
|
2511 |
|
|
\noindent\rule[-0.25in]{\textwidth}{1pt}
|
2512 |
|
|
\begin{tiny}
|
2513 |
|
|
\begin{verbatim}
|
2514 |
|
|
$RCSfile: c_cil0.tex,v $
|
2515 |
|
|
$Source: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/dtaipubs/esrgubka/c_cil0/c_cil0.tex,v $
|
2516 |
|
|
$Revision: 1.24 $
|
2517 |
|
|
$Author: dtashley $
|
2518 |
|
|
$Date: 2003/11/03 02:14:24 $
|
2519 |
|
|
\end{verbatim}
|
2520 |
|
|
\end{tiny}
|
2521 |
|
|
\noindent\rule[0.25in]{\textwidth}{1pt}
|
2522 |
|
|
\end{figure}
|
2523 |
|
|
|
2524 |
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2525 |
|
|
% $Log: c_cil0.tex,v $
|
2526 |
|
|
% Revision 1.24 2003/11/03 02:14:24 dtashley
|
2527 |
|
|
% All duplicate labels as flagged by LaTeX removed. Additional files added
|
2528 |
|
|
% to Microsoft Visual Studio edit context.
|
2529 |
|
|
%
|
2530 |
|
|
% Revision 1.23 2002/08/26 17:57:03 dtashley
|
2531 |
|
|
% Additional solutions chapter added. Precautionary checkin to be sure
|
2532 |
|
|
% that I've captured all changes.
|
2533 |
|
|
%
|
2534 |
|
|
% Revision 1.22 2002/08/25 13:18:21 dtashley
|
2535 |
|
|
% Edits.
|
2536 |
|
|
%
|
2537 |
|
|
% Revision 1.21 2002/08/22 00:33:33 dtashley
|
2538 |
|
|
% Have made aesthetic changes in CFRY0 and CCFR0. Checking in all before
|
2539 |
|
|
% rebuild of book.
|
2540 |
|
|
%
|
2541 |
|
|
% Revision 1.20 2002/08/21 09:12:11 dtashley
|
2542 |
|
|
% Safety checkin.
|
2543 |
|
|
%
|
2544 |
|
|
% Revision 1.19 2002/08/19 05:06:45 dtashley
|
2545 |
|
|
% Substantial progress in integer numerics chapter. Preparing to
|
2546 |
|
|
% work at home.
|
2547 |
|
|
%
|
2548 |
|
|
% Revision 1.18 2002/08/10 07:34:48 dtashley
|
2549 |
|
|
% Checkin before resuming work on a lemma at home.
|
2550 |
|
|
%
|
2551 |
|
|
% Revision 1.17 2002/08/09 03:13:42 dtashley
|
2552 |
|
|
% Significant lemma proved, and other progress.
|
2553 |
|
|
%
|
2554 |
|
|
% Revision 1.16 2002/08/06 22:15:24 dtashley
|
2555 |
|
|
% Substantial lemma proof progress.
|
2556 |
|
|
%
|
2557 |
|
|
% Revision 1.15 2002/08/05 01:28:33 dtashley
|
2558 |
|
|
% Lemma demonstrating that 0 \leq \hat{q}-q \leq 2 completed.
|
2559 |
|
|
%
|
2560 |
|
|
% Revision 1.14 2002/08/03 23:04:57 dtashley
|
2561 |
|
|
% Safety checkin. Having difficulty: may have solved the wrong
|
2562 |
|
|
% problem.
|
2563 |
|
|
%
|
2564 |
|
|
% Revision 1.13 2002/08/01 01:31:24 dtashley
|
2565 |
|
|
% Safety checkin. Having difficulty completing an alternate version of
|
2566 |
|
|
% Knuth's proof about the accuracy of quotient estimates based on first
|
2567 |
|
|
% digits of dividend and divisor.
|
2568 |
|
|
%
|
2569 |
|
|
% Revision 1.12 2002/07/31 04:38:31 dtashley
|
2570 |
|
|
% Edits to basic integer algorithms chapter, safety checkin.
|
2571 |
|
|
%
|
2572 |
|
|
% Revision 1.11 2002/07/29 16:30:08 dtashley
|
2573 |
|
|
% Safety checkin before moving work back to WSU server Kalman.
|
2574 |
|
|
%
|
2575 |
|
|
% Revision 1.10 2002/07/26 20:39:43 dtashley
|
2576 |
|
|
% Safety checkin.
|
2577 |
|
|
%
|
2578 |
|
|
% Revision 1.9 2002/07/26 14:58:01 dtashley
|
2579 |
|
|
% Safety checkin.
|
2580 |
|
|
%
|
2581 |
|
|
% Revision 1.8 2002/07/21 16:19:28 dtashley
|
2582 |
|
|
% Finalization of material about integer division when dividend is
|
2583 |
|
|
% long but divisor is same as data size that machine can use as
|
2584 |
|
|
% divisor natively.
|
2585 |
|
|
%
|
2586 |
|
|
% Revision 1.7 2002/07/20 23:01:45 dtashley
|
2587 |
|
|
% Adding material about division with large dividend and machine-native
|
2588 |
|
|
% divisor.
|
2589 |
|
|
%
|
2590 |
|
|
% Revision 1.6 2002/06/11 04:34:19 dtashley
|
2591 |
|
|
% Information for David Baker added.
|
2592 |
|
|
%
|
2593 |
|
|
% Revision 1.5 2002/06/08 05:50:25 dtashley
|
2594 |
|
|
% Added missing tilde. Can't get it to be same size as other
|
2595 |
|
|
% characters--need to research that.
|
2596 |
|
|
%
|
2597 |
|
|
% Revision 1.4 2002/06/07 02:08:32 dtashley
|
2598 |
|
|
% Section added for integer mappings and tests. Mapping for lowest-order
|
2599 |
|
|
% bit set from Raul Selgado and Dave (last name presently unknown)
|
2600 |
|
|
% at Visteon added. Test for power of 2 added.
|
2601 |
|
|
%
|
2602 |
|
|
% Revision 1.3 2001/08/31 23:11:17 dtashley
|
2603 |
|
|
% End of August 2001 safety check-in.
|
2604 |
|
|
%
|
2605 |
|
|
% Revision 1.2 2001/08/27 20:06:08 dtashley
|
2606 |
|
|
% Edits and substantial re-organization.
|
2607 |
|
|
%
|
2608 |
|
|
% Revision 1.1 2001/08/25 22:51:25 dtashley
|
2609 |
|
|
% Complex re-organization of book.
|
2610 |
|
|
%
|
2611 |
|
|
%End of file C_CIL0.TEX
|