Parent Directory | Revision Log

Revision **4** -
(**show annotations**)
(**download**)
(**as text**)

*Thu Oct 6 03:15:02 2016 UTC*
(7 years, 8 months ago)
by *dashley*

File MIME type: application/x-tex

File size: 115608 byte(s)

File MIME type: application/x-tex

File size: 115608 byte(s)

Initial commit after migrating from CVS.

1 | %$Header: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/dtaipubs/esrgubka/c_cil0/c_cil0.tex,v 1.24 2003/11/03 02:14:24 dtashley Exp $ |

2 | |

3 | \chapter[\ccilzeroshorttitle{}]{\ccilzerolongtitle{}} |

4 | |

5 | \label{ccil0} |

6 | |

7 | \beginchapterquote{``If our ancestors had invented arithmetic by counting with |

8 | their two fists or their eight fingers, instead of their |

9 | ten `digits', we would never have to worry about |

10 | writing binary-decimal conversion routines. |

11 | (And we would perhaps never have learned as much about |

12 | number systems.)''} |

13 | {Donald E. Knuth, \cite[p. 319]{bibref:b:knuthclassic2ndedvol2}} |

14 | |

15 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

16 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

17 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

18 | \section{Introduction} |

19 | %Section tag: INT0 |

20 | \label{ccil0:sint0} |

21 | |

22 | Low-cost microcontrollers have no support for floating-point arithmetic, |

23 | and so integer arithmetic and fixed-point arithmetic are used nearly exclusively |

24 | in embedded systems. The ability to implement integer arithmetic |

25 | economically is a critical skill in the development of embedded |

26 | systems. |

27 | |

28 | Integer arithmetic algorithms are critically important in embedded |

29 | systems for the following reasons: |

30 | |

31 | \begin{itemize} |

32 | \item Mistakes in the implementation of arithmetic are frequently |

33 | responsible for product problems. (Mistakes are not confined |

34 | to obvious errors---errors such as filters which do not converge |

35 | on their input are also responsible for product problems.) |

36 | \item Floating-point arithmetic is not available or ill-advised |

37 | for nearly all small embedded systems for the following reasons: |

38 | \begin{itemize} |

39 | \item Low-cost microcontrollers do not possess hardware support for |

40 | floating-point arithmetic. |

41 | \item Implementation of floating-point arithmetic in software is |

42 | computationally expensive. |

43 | \item Implementation of floating-point arithmetic in software may |

44 | require large floating-point libraries, typically consuming |

45 | 1K-4K of ROM. |

46 | \item Safety-critical software standards typically prohibit the |

47 | use of floating-point arithmetic. |

48 | \end{itemize} |

49 | \item Integer arithmetic algorithms (other than addition and subtraction) |

50 | are quite tedious and error-prone for a software developer to design, implement, and |

51 | unit test. The implementation of such algorithms represents |

52 | cost and risk. Cost and risk benefits are achieved if the algorithms in detail are |

53 | available in advance (thus precluding design activities), or |

54 | better yet if ready-to-use integer algorithm libraries are available. |

55 | \end{itemize} |

56 | |

57 | This chapter describes the more fundamental principles and algorithms |

58 | (representation, fixed-point arithmetic, treatment of overflow, comparison, |

59 | addition, subtraction, multiplication, and division). A section |

60 | (Section \ref{ccil0:smim0}) is also included |

61 | on miscellaneous mappings involving integers which |

62 | are not numerical in intent. |

63 | Chapter %\cdtazeroxrefhyphen\cdtazerovolarabic{} |

64 | TBD |

65 | describes more complicated |

66 | integer algorithms and techniques (discrete-time operations |

67 | such as filtering, integration, and differentiation as well as more |

68 | complex functions such as square root). The split between these two chapters |

69 | is arbitrary; and in fact the material could have been divided differently |

70 | or combined. |

71 | |

72 | Treatment of the topics in this chapter is largely in accordance with |

73 | Knuth \cite{bibref:b:knuthclassic2ndedvol2}. The principal issues in |

74 | the implementation of integer algorithms are: |

75 | |

76 | \begin{itemize} |

77 | \item \textbf{How to use the arithmetic [or other] instructions provided by the machine to |

78 | operate on larger operands.} Microcontrollers typically provide arithmetic |

79 | instructions (comparison, shifting, addition, subtraction, and often but not |

80 | always multiplication and/or division) that operate on 8-bit or 16-bit integers. |

81 | A key question |

82 | is how small-operand instructions ``scale up''---that is, if and how they can |

83 | be used to assist in the implementation of integer arithmetic for much larger |

84 | operands. |

85 | \item \textbf{The order of the algorithm involved.} The order of algorithms |

86 | is a complicated issue when applied to microcontroller work. Many sophisticated |

87 | algorithms have a breakpoint below which they are less economical than |

88 | an inferior algorithm. Some applications (such as generating cryptographic keys |

89 | when integers thousands of bits long must be tested for primality) will |

90 | benefit from sophisticated algorithms becuase the operand sizes are large enough |

91 | to pass any such breakpoints. However, in microcontroller work, the need to manipulate |

92 | integers longer than 64 bits is very rare; thus, the breakpoints that indicate the |

93 | use of more sophisticated algorithms may not be reached. In microcontroller work, |

94 | depending on the operand sizes, there are circumstances in which an |

95 | $O(n^2)$ algorithm may be preferable to an $O(\log n)$ algorithm. Generally, |

96 | the order of algorithms must be balanced against operand sizes. |

97 | \end{itemize} |

98 | |

99 | We do diverge from Knuth in some areas. The most prominent divergence is |

100 | in the proofs offered for some important theorems and lemmas. Knuth |

101 | employs contrapositive proof formats in many circumstances, whereas we prefer |

102 | to use linear proofs that are more understandable to engineers and microcontroller |

103 | software developers. |

104 | |

105 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

106 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

107 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

108 | \section[Paradigms And Principles] |

109 | {Paradigms And Principles Of Microcontroller Arithmetic} |

110 | %Section tag: PPM0 |

111 | \label{ccil0:sppm0} |

112 | |

113 | How should one think about microcontroller arithmetic? What principles |

114 | guide us in its design and implementation? In this section, |

115 | we provide some general principles and paradigms of thought. |

116 | |

117 | |

118 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

119 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

120 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

121 | \subsection{Microcontroller Arithmetic As An Accident Of Silicon Design} |

122 | %Subsection tag: MAS0 |

123 | \label{ccil0:sppm0:smas0} |

124 | |

125 | In chapters |

126 | \cfryzeroxrefhyphen{}\ref{cfry0}, |

127 | \ccfrzeroxrefhyphen{}\ref{ccfr0}, |

128 | and |

129 | \cratzeroxrefhyphen{}\ref{crat0} |

130 | we consider rational approximation, |

131 | both in the form $h/k$ and $h/2^q$. Both forms of rational approximation |

132 | tend to be effective because we know that all modern processors possess |

133 | shift instructions, most possess integer multiply instructions, and many |

134 | possess integer divide instructions. In other words, the design |

135 | of the machine instruction set drives the strategies for implementation |

136 | of arithmetic, and makes some strategies attractive. |

137 | |

138 | Similarly, the observation that all microcontrollers provide instructions |

139 | for integer arithmetic creates the attractiveness of fixed-point arithmetic. |

140 | |

141 | Thus, we might view our approaches to microcontroller arithmetic as |

142 | an ``accident'' of silicon design, or as being driven by silicon |

143 | design. |

144 | |

145 | Generally, we seek to determine the best way to use the primitive |

146 | operations provided by the machine (the instruction set) to |

147 | accomplish the mappings of interest. |

148 | |

149 | The ``classic'' algorithms |

150 | presented by Knuth |

151 | Knuth (\cite[pp. 265-284]{bibref:b:knuthclassic2ndedvol2}) are especially |

152 | designed to use the ``small'' addition, subtraction, multiplication, and |

153 | division provided by the machine to add, subtract, multiply, and divide arbitrarily |

154 | large integers. In |

155 | \cite[pp. 265-266]{bibref:b:knuthclassic2ndedvol2}) Knuth writes: |

156 | |

157 | \begin{quote} |

158 | \emph{The most important fact to understand about extended-precision numbers |

159 | is that they may be regarded as numbers written in radix-$w$ notation, |

160 | where $w$ is the computer's word size. For example, an integer that |

161 | fills 10 words on a computer whose word size is $w=10^{10}$ has 100 |

162 | decimal digits; but we will consider it to be a 10-place number to |

163 | the base $10^{10}$. This viewpoint is justified for the same reason |

164 | that we may convert, say, from binary to hexadecimal notation, |

165 | simply by grouping the bits together.} |

166 | |

167 | \emph{In these terms, we are given the following primitive operations to work with:} |

168 | |

169 | \begin{itemize} |

170 | \item \emph{a$_0$) addition or subtraction of one-place integers, giving a one-place |

171 | answer and a carry;} |

172 | \item \emph{b$_0$) multiplication of a one-place integer by another one-place integer, |

173 | giving a two place answer;} |

174 | \item \emph{c$_0$) division of a two-place integer by a one-place integer, |

175 | provided that the quotient is a one-place integer, and yielding |

176 | also a one-place remainder.} |

177 | \end{itemize} |

178 | |

179 | \noindent{}\emph{By adjusting the word size, if necessary, nearly all computers |

180 | will have these three operations available; so we will construct algorithms |

181 | (a), (b), and (c) mentioned above in terms of the primitive operations |

182 | (a$_0$), (b$_0$), and (c$_0$).} |

183 | |

184 | \emph{Since we are visualizing extended-precision integers as base $b$ numbers, it is |

185 | sometimes helpful to think of the situation when $b = 10$, and to imagine |

186 | that we are doing the arithmetic by hand. Then operation (a$_0$) is analogous |

187 | to memorizing the addition table; (b$_0$) is analogous to memorizing the |

188 | multiplication table, and (c$_0$) is essentially memorizing the multiplication |

189 | table in reverse. The more complicated operations (a), (b), (c) on |

190 | high-precision numbers can now be done using simple addition, subraction, |

191 | multiplication, and long-division procedures that children are taught |

192 | in elementary school.} |

193 | \end{quote} |

194 | |

195 | The critical issue for implementation of integer arithmetic with large operands |

196 | is how to use small-operand instructions to operate on larger operands---in other words, |

197 | how to ``scale up'' the capability provided by the instruction set. |

198 | |

199 | |

200 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

201 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

202 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

203 | \subsection{Microcontroller Arithmetic As A Mapping From Quantized Domain To |

204 | Quantized Range} |

205 | %Subsection tag: MAM0 |

206 | \label{ccil0:sppm0:smam0} |

207 | |

208 | Microcontroller software accepts inputs which are quantized. In nearly all cases, |

209 | this involves a mapping from $\vworkrealset$ to $\vworkintset$. Often, because |

210 | microcontroller products are optimized for cost, the quantization hardware |

211 | delivers quite poor precision, frequently less than 8 bits. |

212 | |

213 | When a quantized input is accepted, it defines an inquality. Knowledge of |

214 | the quantized input (an integer) confines the actual input (a real |

215 | number, before |

216 | quantization) to an interval. With a low-cost hardware design, the |

217 | interval can be fairly large. Usually, by adding cost, the |

218 | interval can be made smaller. |

219 | |

220 | Microcontroller outputs tend to be quantized as well, so it is |

221 | accurate to also characterize outputs as integers. For example, a PWM signal |

222 | generated by a microcontroller or the output of a D/A converter is |

223 | controlled by data that is an integer. Like inputs, often the ``granularity'' |

224 | with which outputs can be controlled is quite coarse---again, 8 bits or |

225 | less is not uncommon. |

226 | |

227 | Thus, we may view microcontroller software as a mapping from poor-quality |

228 | inputs to poor-quality outputs. |

229 | |

230 | In such a framework, where the nature of inputs and outputs introduces |

231 | substantial error, it is imperative not to introduce additional error |

232 | in computer arithmetic. In other words, given inputs which are |

233 | integers, the responsibility of the software is to choose the best |

234 | integers as outputs. Usually this means that calculations should be |

235 | devised so as not to lose any information (i.e. |

236 | not to lose remainders, for example). Losing information is usually |

237 | equivalent to not being able to make the most correct mapping from input |

238 | to output. ``Lossy'' arithmetic can degrade the performance of a system, |

239 | since poor arithmetic may compound an inexpensive hardware design. |

240 | |

241 | |

242 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

243 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

244 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

245 | \subsection{Microcontroller Arithmetic As A Simulation Of Continuous Controllers} |

246 | %Subsection tag: MAE0 |

247 | \label{ccil0:sppm0:smae0} |

248 | |

249 | Control systems have not always employed digital controllers. |

250 | Many books and web sites (see \cite{bibref:w:historycontrol01}, for example) |

251 | discuss the historical development of feedback control. Controllers |

252 | have not historically been digital, or even electronic. |

253 | Early controllers for governing steam devices or windmills were |

254 | ultimately mechanical, and relied upon inertia or other physical |

255 | properties. It is possible to realize abstract notions |

256 | (integrators, differentiators, gains) using hydraulic systems or other mechanical systems; |

257 | and in fact hydraulic feedback controllers were used in early rockets |

258 | and aircraft. Very naturally, abstract notions (integrators, |

259 | differentiators, gains) can be implemented using analog |

260 | electronic components. The most common implementations involve |

261 | operational amplifiers, and the behavior of such implementations comes |

262 | very close to the ideal mathematical models. |

263 | |

264 | Mechanical, hydraulic, and non-digital electronic controllers have |

265 | one very desirable characteristic---\emph{clipping}. If, for example, |

266 | one provides an analog differentiator with a $dV/dt$ which |

267 | is too large, the output that the differentiator can |

268 | provide is limited, usually by the supply voltage available to an operational |

269 | amplifier. |

270 | The differentiator \emph{must} clip. |

271 | |

272 | Clipping often leads to behavior which is close to what |

273 | intuition would expect (i.e. we would present |

274 | clipping as an occasional advantage). For example, if an input to |

275 | an analog control system suffers a failure, the behavior |

276 | the of the controller is limited, as is its internal |

277 | state. Similarly, when the |

278 | input is restored, the controller will usually recover |

279 | in a reasonable time because the |

280 | state of the controller (typically maintained in capacitors) is limited |

281 | in the magnitude it can attain. |

282 | |

283 | We might view a digital controller as an emulation of |

284 | an analog controller. We may want to cause the |

285 | controller to have limits (i.e. rails) internally, for |

286 | example to prevent excessive integrator ``windup''. We discuss |

287 | this further in Section \ref{ccil0:sode0}. |

288 | |

289 | |

290 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

291 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

292 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

293 | \section{Practical Design Issues} |

294 | %Section tag: PDI0 |

295 | \label{ccil0:spdi0} |

296 | |

297 | In this section, we consider practical issues surrounding the design |

298 | and construction of a set of integer arithmetic subroutines. |

299 | In practice, such a collection of subroutines is likely to be |

300 | arranged into a library. The purpose of the library would be to |

301 | free the clients (or callers) of the library from the complexity of |

302 | large integer calculations. |

303 | |

304 | The design decisions surrounding the construction of a library vary in |

305 | the objectivity with which they can be approached. Some design decisions |

306 | (such as the best mechanism for passing parameters) can be approached |

307 | rigorously because the measures of goodness are unequivocal (minimal ROM consumption |

308 | or execution time, for example). However, other design decisions, particulary the |

309 | decision of the exact nature of the interface between an arithmetic library and |

310 | its clients, are more subjective. One size does \emph{not} fit all. |

311 | |

312 | |

313 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

314 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

315 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

316 | \subsection{Parameter Passing And Temporary Storage Mechanisms} |

317 | %Subsection tag: PPM0 |

318 | \label{ccil0:spdi0:sppm0} |

319 | |

320 | In small microcontroller work, the desire to save ROM and execution time |

321 | may lead to inelegant software construction. Because an arithmetic library |

322 | used in microcontroller work may be called from many different places |

323 | throughout ROM, serious thought should be given to optimizing the |

324 | parameter passing mechanisms, even perhaps at the expense of elegance. |

325 | The way in which the arithmetic library allocates temporary storage is |

326 | also a point of concern, because the most elegant way of allocating temporary |

327 | storage (on the stack) may either not be feasible (because of the possibility |

328 | of stack overflow) or may not be efficient (because the addressing modes of |

329 | the machine make data on the stack inefficient to address). In this section |

330 | we discuss both parameter passing and temporary storage mechanisms. |

331 | |

332 | In the remainder of the discussion, we make the following assumptions |

333 | about software architecture. |

334 | |

335 | \begin{enumerate} |

336 | \item \textbf{The arithmetic library need not be re-entrant.} |

337 | Most ``small'' microcontroller software loads use a non-preemptive |

338 | scheduling paradigm, so this is a reasonable assumption. We also |

339 | make the reasonable assumption that ISR's may not make calls into |

340 | the arithmetic library. |

341 | \item \textbf{Dynamic memory allocation, other than on the stack, |

342 | is not allowed by the software architecture.} This is also |

343 | a reasonable assumption in ``small'' microcontroller software. |

344 | \end{enumerate} |

345 | |

346 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

347 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

348 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

349 | \subsubsection{Parameter Passing Mechanisms} |

350 | %Subsubsection tag: PPM0 |

351 | \label{ccil0:spdi0:sppm0:sppm0} |

352 | |

353 | If an arithmetic library exists in a microcontroller software load, |

354 | it may be called many times throughout ROM. Thus, the parameter |

355 | passing mechanisms chosen may have a large effect on ROM consumption |

356 | (due to the setup required for each subroutine call multiplied by |

357 | many instances throughout ROM) and execution time |

358 | (because in microcontroller software longer instructions nearly always |

359 | require more time). Because of the criticality of ROM consumption, |

360 | parameter passing mechanisms that lack elegance may be attractive. |

361 | |

362 | In the category of parameter passing, we also include the way in which |

363 | return value(s) are passed back to the caller. |

364 | |

365 | The following parameter-passing mechanisms may be employed: |

366 | |

367 | \begin{enumerate} |

368 | \item \textbf{Pass by value as storage class \emph{automatic}.} |

369 | The most common scenario is that the arithmetic |

370 | library is written in assembly-language to be called from |

371 | `C', and so the assembly-language subroutines must adhere to the |

372 | parameter-passing conventions used by the compiler. |

373 | This usually means that the entire input or output value |

374 | will be passed in CPU registers or on the stack. Somewhat rarely, |

375 | a compiler will pass parameters in static locations.\footnote{The |

376 | usual reason for a `C' compiler to pass parameters in static locations |

377 | is because the instruction set of the machine was not designed for |

378 | higher-level languages, and references to [usually \emph{near}] memory |

379 | are cheaper than stack references. Such compilers also typically |

380 | analyze the calling tree of the program where possible and use this |

381 | information to overlay the parameter-passing memory areas of |

382 | subroutines that cannot be active simultaneously. Without the ability |

383 | to analyze the calling tree and make overlay decisions based on it, |

384 | memory would be exhausted, because each subroutine would need to have its |

385 | own static storage for parameters and local variables.} |

386 | \item \textbf{Pass by reference.} |

387 | Typically, it is convenient to pass pointer(s) to area(s) of memory |

388 | containing input operands, and also a pointer to an area of memory |

389 | owned by the caller which is written with the result by the arithmetic subroutine. |

390 | The efficiency of this approach depends on the compiler and the instruction |

391 | set of the machine. If the instruction set of the machine cannot |

392 | make effective use of pointers or stack frames, an arithmetic subroutine |

393 | might be constructed so that it first copies the operands to a static area of memory |

394 | reserved for the arithmetic library, then performs the necessary arithmetic |

395 | operations on the operands in the static area, |

396 | then copies the result(s) back to the area owned by the caller. |

397 | \item \textbf{Pass by common data block.} |

398 | In some cases, it may be preferable to reserve a block of memory in which to |

399 | pass parameters to arithmetic library functions, and from which to retrieve |

400 | results after an arithmetic library function returns. The allocation of such |

401 | a static memory block may be done manually\footnote{Note to self: need to |

402 | include gentleman's agreements on memory usage in my note on software architecture.} |

403 | or automatically (by development tools which can analyze the function calling tree and |

404 | manage the overlaying). |

405 | \end{enumerate} |

406 | |

407 | |

408 | |

409 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

410 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

411 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

412 | \subsubsection{Temporary Storage Mechanisms} |

413 | %Subsubsection tag: TSM0 |

414 | \label{ccil0:spdi0:sppm0:stsm0} |

415 | |

416 | Need to indicate clearly on section on software architectures the |

417 | primary temporary storage mechanisms: |

418 | |

419 | \begin{itemize} |

420 | \item Stack. |

421 | \item Memory block with overlay functionality. |

422 | \end{itemize} |

423 | |

424 | Need to expand software architecture section to cover this, so don't |

425 | discuss here. |

426 | |

427 | |

428 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

429 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

430 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

431 | \subsection{Reporting Of Overflow, Underflow, And Domain Errors} |

432 | %Subsection tag: OUD0 |

433 | \label{ccil0:spdi0:soud0} |

434 | |

435 | Long integer data types used in microcontroller work are typically |

436 | of a static size (they cannot grow in size in as operations are |

437 | performed on them). The reason for the typical static sizes is that |

438 | dynamic allocation (except for allocation and |

439 | deallocation on the stack as subroutines |

440 | are called and return) is rarely used in small microcontroller work. |

441 | It will come about in the normal usage of an integer arithmetic |

442 | library that an attempt will be made to operate on integers in |

443 | a way which generates an overflow, generates an |

444 | underflow, or |

445 | represents a domain error (division by zero or |

446 | square root of a negative integer, for example). |

447 | An important design decision is how such normal exceptions should be |

448 | handled. |

449 | |

450 | Possible design decisions in this area are: |

451 | |

452 | \begin{enumerate} |

453 | \item \label{enum:ccil0:spdi0:soud0:01:01} |

454 | \textbf{To design arithmetic subroutines so that exceptions are not |

455 | possible.} |

456 | For example, multiplying an $m$-word integer by an $n$-word integer |

457 | will always generate an integer that will fit within $m+n$ words. |

458 | If a multiplication subroutine is designed so that the caller must |

459 | provide an $m$-word operand and an $n$-word operand and a pointer to |

460 | an $(m+n)$-word area of memory for the result, an overflow cannot occur. |

461 | Such a design decision essentially pushes overflow detection back up to |

462 | the callers of arithmetic subroutines. |

463 | \item \label{enum:ccil0:spdi0:soud0:01:02} |

464 | \textbf{To design arithmetic subroutines so that exceptions are possible, |

465 | but not to detect the exceptions, thus providing an implementation that |

466 | will produce incorrect results with some operand data values.} |

467 | For example, if an arithmetic subroutine is designed to add an $m$-word |

468 | operand to another $m$-word operand to produce an $m$-word result, overflow |

469 | is possible. A design decision to fail to detect such exceptions pushes |

470 | the responsibility up to the callers of the arithmetic subroutines. |

471 | Callers must devise a method for not calling arithmetic subroutines |

472 | with data values that will cause an exception, or else to detect an exception |

473 | when it has occurred. |

474 | \item \label{enum:ccil0:spdi0:soud0:01:03} |

475 | \textbf{To ``rail'' the result in response to an exception.} |

476 | It was stated earlier that analog control system functional blocks |

477 | built with operational |

478 | amplifiers typically have an output which cannot go beyond the |

479 | supply rails. One may implement similar behavior in arithmetic subroutines. |

480 | In an addition subroutine which adds two $m$-word operands to produce an |

481 | $m$-word result (with each word having $w$ bits), it would be natural to |

482 | return $2^{mw}-1$ in the event of an overflow in a positive direction and |

483 | $-2^{mw}$ in the event of an overlfow in a negative direction. Note that |

484 | the caller will not be able to distinguish a ``rail'' value which represents |

485 | a valid result from a ``rail'' value substituted to indicate an exception. |

486 | \item \label{enum:ccil0:spdi0:soud0:01:04} |

487 | \textbf{To reserve special result data values to indicate exceptions.} |

488 | Depending on the arithmetic subroutine being implemented, it may be possible |

489 | to reserve certain result data values to indicate exceptions. This approach |

490 | is often awkward, as most mathematical subroutines are naturally defined so that |

491 | all bit patterns in the memory reserved for the result are valid numbers. |

492 | Additionally, with long result data values, it may not be economical to |

493 | compare the result against the reserved exception values. Thus, this is seldom |

494 | an optimal way to deal with exceptions. |

495 | |

496 | Additionally, if this approach is employed, the semantics of how exception |

497 | values combine with other exception values and data values must be decided. |

498 | \item \label{enum:ccil0:spdi0:soud0:01:05} |

499 | \textbf{To return exception codes to the caller separate from the result data.} |

500 | In the `C' language, pointers are often used to supply an arithmetic subroutine |

501 | with the input operands and to provide the arithmetic subroutine with a location |

502 | (which belongs to the caller) in which to store the result data. Thus, the return |

503 | value of the arithmetic subroutine (normally assigned through the subroutine name) |

504 | is often available to return exception codes. For example, |

505 | a `C' function may be defined as |

506 | |

507 | \begin{verbatim} |

508 | unsigned char int128_add(INT128 *result, |

509 | INT128 *arg1, |

510 | INT128 *arg2), |

511 | \end{verbatim} |

512 | |

513 | leaving the returned \texttt{unsigned char} value available to return |

514 | exception information. Note that this arrangement has the following advantages: |

515 | |

516 | \begin{enumerate} |

517 | \item All bit patterns in the result data memory area area available |

518 | as data bit patterns. |

519 | \item The exception data is very economical to test, because it is placed |

520 | in a machine-native data type. |

521 | \item The exception data can easily be discarded or by the caller if desired. |

522 | \item All decisions about how to handle exceptions are left to the caller. |

523 | \end{enumerate} |

524 | \item \label{enum:ccil0:spdi0:soud0:01:06} |

525 | \textbf{To maintain exception data with each result integer.} |

526 | It is possible to reserve bits for exception information which are part of the |

527 | long integer data type. This exception state essentially conveys |

528 | \emph{NaN}\footnote{\emph{N}ot \emph{a} \emph{n}umber.} information---integers with |

529 | exception information set are not true numbers, but rather they are different from the |

530 | true result in some way. As with (\ref{enum:ccil0:spdi0:soud0:01:04}), the |

531 | semantics of how to combine NaN values and NaN values with ordinary non-NaN numbers |

532 | must be defined. |

533 | \item \label{enum:ccil0:spdi0:soud0:01:07} |

534 | \textbf{To maintain a global exception state variable.} |

535 | A variable or set of variables can be reserved which hold the |

536 | exception information, if any, from the most recent call to |

537 | a function in the arithmetic library. |

538 | |

539 | To save CPU cycles, the arithmetic library can |

540 | be designed so that it will assign the global exception state variable only if an |

541 | exception occurs---the caller then has the responsibility of clearing the exception |

542 | state variable before making any call into the arithmetic library where the |

543 | exception result is of interest. The interface between the caller and the library |

544 | can be further optimized if the library only OR's data into the variable containing |

545 | the exception state. Using this optimization, the caller can clear the exception state |

546 | variable, make several calls into the arithmetic library, and then retrieve a meaningful |

547 | exception state variable value summarizing several arithmetic operations. |

548 | \item \label{enum:ccil0:spdi0:soud0:01:08} |

549 | \textbf{Hybrid approaches.} |

550 | The approaches (\ref{enum:ccil0:spdi0:soud0:01:01}) |

551 | through (\ref{enum:ccil0:spdi0:soud0:01:07}) can be combined. |

552 | For example, approach (\ref{enum:ccil0:spdi0:soud0:01:03}) |

553 | might be combined with approach (\ref{enum:ccil0:spdi0:soud0:01:05}) |

554 | so that exceptions are ``railed'', but the caller may also be made |

555 | aware that an exception has occured. Many hybrid approaches are possible. |

556 | \end{enumerate} |

557 | |

558 | |

559 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

560 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

561 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

562 | \subsection{Semantics Of Combining Overflow, Underflow, And Domain Errors} |

563 | %Subsection tag: CMB0 |

564 | \label{ccil0:spdi0:scmb0} |

565 | |

566 | For control system arithmetic, some form of clipping as suggested |

567 | in Section \ref{ccil0:sppm0:smae0} is probably the |

568 | best approach. Definitely, an overflow should generate a result |

569 | which is the largest representable integer, and an |

570 | underflow should generate a result which is the smallest |

571 | representable integer. |

572 | |

573 | In addition to treating an overflow by clipping, it may be |

574 | advantageous to reserve a flag in the representation of a |

575 | multiple-precision integer to record that an overflow has occured and been clipped. |

576 | Some functions which accept the integer as input may be interested |

577 | in the value of such a flag, where othere---perhaps most---may |

578 | not. |

579 | |

580 | The correct course of action in the event of a domain error (such |

581 | as division by zero) is less clear. It is noteworthy that in a |

582 | normal control system, domain errors cannot occur (but overflows |

583 | can). |

584 | |

585 | The best approach when a domain error is involved probably |

586 | depends on the basis for the underlying calculation. For |

587 | example, if integer division is used as part of a |

588 | strategy for software ratiometric conversion, a value |

589 | of zero in the denominator probably represents extreme electrical |

590 | noise, and the most sane approach may be to replace the |

591 | denominator by one. However, in other contexts it may be appropriate |

592 | to think in terms of |

593 | |

594 | \begin{equation} |

595 | \lim_{k \rightarrow 0^+} \frac{kn}{kd} |

596 | \end{equation} |

597 | |

598 | \noindent{}or |

599 | |

600 | \begin{equation} |

601 | \lim_{k \rightarrow 0^-} \frac{kn}{kd} . |

602 | \end{equation} |

603 | |

604 | Put in other terms, there is not a clear ``one solution |

605 | meets all needs'' approach to dealing with domain |

606 | errors. |

607 | |

608 | As in the case of overflows, it may be advantageous to reserve a bit |

609 | flag to signal that a domain error has occured and that the result |

610 | is not valid or not reliable. Note that floating point chips |

611 | (such as the 80x87) provide similar indications of domain errors. |

612 | |

613 | It may also be advantageous to adopt conventions for how |

614 | overflow or domain error flags propagate through binary or |

615 | unary operators. For example, if two numbers are multiplied, and |

616 | one of the two has the overflow flag set, it may be wise set the |

617 | overflow flag in the result. A scheme for how warning |

618 | flags propagate may be beneficial. |

619 | |

620 | |

621 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

622 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

623 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

624 | \subsection{Variable Versus Constant Subroutine Execution Time} |

625 | %Subsection tag: VVC0 |

626 | \label{ccil0:spdi0:svvc0} |

627 | |

628 | As a design goal of an embedded system, we seek to minimize the |

629 | timing variability of software components. An arithmetic subroutine |

630 | that with a high probability takes a short time to execute and with a |

631 | low probability takes a long time to execute, and where the execution |

632 | time is data dependent, is a serious risk. An embedded software product |

633 | may pass all release testing, but then fail in the field because of |

634 | specific data values used in calculations. |

635 | |

636 | A very conservative design goal would be to design every arithmetic subroutine |

637 | to require exactly the same execution time, regardless of data values. |

638 | This goal is not practical because machine instructions themselves |

639 | usually have a variable execution time, particularly for multiplication |

640 | and division instructions. A fallback goal would be to avoid |

641 | large differences between minimum and maximum execution time, without |

642 | increasing the maximum execution time. A very practical step to take |

643 | (using division as an example) |

644 | is to insert artifical delays into easily detectable exception cases (such as |

645 | division by zero) so that the exception case takes as long as the |

646 | minimum time for a division with valid operands. |

647 | |

648 | |

649 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

650 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

651 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

652 | \section{Fixed-Point Arithmetic} |

653 | %Section tag: FPA0 |

654 | \label{ccil0:sfpa0} |

655 | |

656 | \emph{Fixed-point arithmetic}\index{fixed-point arithmetic} |

657 | is a scheme for the representation |

658 | of engineering quantities (conceptually real numbers with optional |

659 | units) by integers so that calculations can be performed |

660 | on these quantitites using [usually multiple-precision] |

661 | integer arithmetic. |

662 | |

663 | In discussing fixed-point arithmetic, |

664 | we must be careful to distinguish between the |

665 | \emph{represented value} (the engineering quantity) |

666 | and the \emph{representation} (the integer which represents |

667 | the engineering quantity). In most cases, we must also |

668 | be careful to devise a system to track the units of the |

669 | represented values, as, especially with control systems, |

670 | the units of represented values (due to integration and |

671 | differentiation) can become very complex |

672 | and mistakes are easy to make. |

673 | |

674 | Fixed-point arithmetic is the dominant paradigm of construction |

675 | for calculations in small microcontroller systems. It may not be |

676 | clear why this should be so or what advantages it offers [over |

677 | floating-point arithmetic]. The reasons for |

678 | this predominance are: |

679 | |

680 | \begin{itemize} |

681 | \item Fixed-point calculations tend to be very efficient, because |

682 | they make direct use of the integer arithmetic instructions in the |

683 | microcontroller's instruction set. On the other hand, floating-point |

684 | arithmetic operations tend to be much slower. |

685 | |

686 | \item Floating-point calculations typically require a floating-point |

687 | library, which may consume at least several hundred bytes |

688 | of ROM. |

689 | |

690 | \item Some safety-critical software standards prohibit the use of |

691 | floating-point arithmetic because it can result in nebulous |

692 | behavior. Fixed-point arithmetic avoids these concerns. |

693 | \end{itemize} |

694 | |

695 | In order to carry out fixed-point arithmetic---that is, in order to |

696 | operate on engineering quantities as integers---we |

697 | require that the relationship between the |

698 | represented value and the representation be of the form |

699 | |

700 | \begin{equation} |

701 | \label{eq:ccil0:sfpa0:00} |

702 | x = r_I u + \Psi, |

703 | \end{equation} |

704 | |

705 | \noindent{}where $x \in \vworkrealset$ (possibly with units) is the represented |

706 | value, $u \in \vworkintset$ is the representation, |

707 | $r_I \in \vworkrealset$ is the scaling factor (possibly with |

708 | units), and $\Psi \in \vworkrealset$ (possibly with units) is the offset. |

709 | Note that the units of $r_I$, $\Psi$, and $x$ must match. |

710 | |

711 | We further require that $r_I \vworkdivides{} \Psi$\footnote{We \emph{are} |

712 | aware that this is an abuse of nomenclature, as |

713 | `$\vworkdivides$' (``divides'') is traditionally only applied to integers.} |

714 | (i.e. that $\Psi$ be an |

715 | integral multiple of $r_I$) |

716 | so that the offset in the represented value corresponds to an integer |

717 | in the representation. Without this restriction, we could not remove the |

718 | offset from the representation with integer subtraction only. Note that |

719 | we do \emph{not} require that $r_I$ or $\Psi$ be rational, although |

720 | they must both be rational or both be irrational in order to satisfy |

721 | (\ref{eq:ccil0:sfpa0:00}). |

722 | |

723 | In (\ref{eq:ccil0:sfpa0:00}), since we've required that $r_I \vworkdivides{} \Psi$, |

724 | we can replace $\Psi$ by |

725 | |

726 | \begin{equation} |

727 | \label{eq:ccil0:sfpa0:00b} |

728 | \psi = \frac{\Psi}{r_I} |

729 | \end{equation} |

730 | |

731 | \noindent{}to obtain |

732 | |

733 | \begin{equation} |

734 | \label{eq:ccil0:sfpa0:01} |

735 | x = r_I (u + \psi), |

736 | \end{equation} |

737 | |

738 | \noindent{}where $\psi \in \vworkintset$ is the offset in representational |

739 | counts. |

740 | |

741 | Note that we can also easily obtain the following relationships from the |

742 | defining equations (\ref{eq:ccil0:sfpa0:00}), |

743 | (\ref{eq:ccil0:sfpa0:00b}), and (\ref{eq:ccil0:sfpa0:01}). |

744 | |

745 | \begin{equation} |

746 | \label{eq:ccil0:sfpa0:02} |

747 | u = \frac{x - \Psi}{r_I} = \frac{x}{r_I} - \psi |

748 | \end{equation} |

749 | |

750 | \begin{equation} |

751 | \label{eq:ccil0:sfpa0:03} |

752 | \psi = \frac{\Psi}{r_I} |

753 | \Longleftrightarrow |

754 | \Psi = r_I \psi |

755 | \Longleftrightarrow |

756 | r_I = \frac{\Psi}{\psi} |

757 | \end{equation} |

758 | |

759 | |

760 | For example, in a 16-bit signed integer (which inherently may range |

761 | from -32768 to 32767 inclusive), one might used a fixed-point |

762 | representation of 100 integer counts per $^\circ$C ($r_I = 0.01 \; ^\circ$C) |

763 | with an offset of 100$^\circ$C ($\Psi$ = 100$^\circ$C or equivalently |

764 | $\psi$ = 10000), giving |

765 | a representational range from -227.68$^\circ$C to 427.67$^\circ$C inclusive. |

766 | |

767 | If $r_I = 2^N, \; N \in \vworkintset$, then the radix point of the represented |

768 | value is positioned |

769 | between two bits of the representation---this arrangement may have computational |

770 | advantages |

771 | if the whole and fractional parts of the represented value need to be separated |

772 | easily in the representation. Note also that if $r_I = 2^{WN}$ where $W$ is the |

773 | machine integer size of the computer (in bits), then the radix point of the represented value |

774 | occurs between two addressable machine integers of the representation, which can be convenient. |

775 | However, we do not require for our definition of a fixed-point representation that |

776 | $r_I$ be an integral power of two; and in fact we do not even require by our |

777 | definition that $r_I$ be rational. Note that in general our definition above is the |

778 | weakest set of conditions so that real-world engineering values can be manipulated |

779 | using integer operations performed upon the representation. |

780 | |

781 | |

782 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

783 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

784 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

785 | \section{Representation Of Integers} |

786 | %Section Tag: ROI0 |

787 | \label{ccil0:sroi0} |

788 | |

789 | In this section, we discuss common representations of integers, both |

790 | \emph{machine} integers and \emph{synthetic long} integers. |

791 | By \emph{representation} we mean the mapping between the abstract |

792 | mathematical notion of an integer and the way it is stored in the computer |

793 | (voltage levels and the programming model). |

794 | Although |

795 | in Knuth's development of integer arithmetic |

796 | \cite{bibref:b:knuthclassic2ndedvol2} |

797 | it is assumed that |

798 | integers may be represented in any base, we don't require such generality |

799 | in practice and in this work we confine ourselves for the most |

800 | part to $b=2^n$, and often to $n = 8i$, $i \in \vworkintsetpos$. The assumption |

801 | of $b=2^n$ characterizes all modern digital computers, and we feel comfortable |

802 | making this assumption throughout the work. However, the assumption |

803 | $n = 8i$, $i \in \vworkintsetpos$ does not hold universally, and so we |

804 | most often do not make this assumption. |

805 | |

806 | By \emph{machine} integer, we mean an integer upon which the computer |

807 | can operate in a single instruction (such as to add, increment, |

808 | load, store, etc.). For most microcontrollers, machine integers are |

809 | either 8 or 16 bits in size. The representation of a machine integer |

810 | is designed and specified by the microcontroller manufacturer. In principle, |

811 | nothing would prevent a microcontroller manufacturer from devising |

812 | and implementing a novel way of representing machine integers and supporting |

813 | this novel representation with an instruction set. However, in |

814 | practice, all machine integers are either simple unsigned integers or two's |

815 | complement signed integers. In addition to the efficiency of these |

816 | representations with respect to the design of digital logic, these representations |

817 | are so standard and so pervasive that they are universally tacitly assumed. |

818 | For example, ``\texttt{if (i \& 0x1)}'' is an accepted C-language |

819 | idiom for ``if $i$ is odd'', and it is expected that such code will |

820 | work on all platforms. |

821 | |

822 | By \emph{synthetic long} integer, we mean an integer of |

823 | arbitrary\footnote{By \emph{arbitrary}, we do not necessarily mean that |

824 | the integer can grow to be arbitrarily large in magnitude, or that |

825 | its maximum size is not known at compile time. We mean \emph{arbitrarily} |

826 | longer than a machine integer. Multiple-precision arithmetic libraries |

827 | can be divided into two classes---those that fix the size of the integers |

828 | at compile time, and those that use dynamic allocation and allow integers to |

829 | grow as needed at run time. The former category |

830 | is normally used for small microcontroller work, whereas the latter category |

831 | (such as the GNU MP Library \cite{bibref:s:gnumultipleprecisionarithmeticlibrary}) |

832 | is normally used in scientific and number theoretic calculation and on |

833 | more powerful platforms than microcontrollers. The representational concepts |

834 | we present here apply to both categories.} |

835 | length that is formed by concatenating machine integers. There is some |

836 | subjectivity in deciding the representation of multiple-precision integers, |

837 | and we discuss in the subsections |

838 | \ref{ccil0:sroi0:srou0} and |

839 | \ref{ccil0:sroi0:sros0} which immediately follow. |

840 | |

841 | |

842 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

843 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

844 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

845 | \subsection{Representation Of Unsigned Integers} |

846 | %Subsection Tag: ROU0 |

847 | \label{ccil0:sroi0:srou0} |

848 | |

849 | Unsigned machine integers are always represented as an ordered |

850 | array of bits (Figure \ref{fig:ccil0:sroi0:srou0:00}). For an |

851 | $m$-bit unsigned integer $u$, we denote these bits $u_{[m-1]}$ through |

852 | $u_{[0]}$, with $u_{[m-1]}$ the most significant bit. The value |

853 | of $u$ is the sum of the values of each bit multiplied |

854 | by the power of 2 it represents: |

855 | |

856 | \begin{figure} |

857 | \centering |

858 | \includegraphics[width=4.6in]{c_cil0/uintrep1.eps} |

859 | \caption{Representation Of Unsigned Machine Integers} |

860 | \label{fig:ccil0:sroi0:srou0:00} |

861 | \end{figure} |

862 | |

863 | \begin{equation} |

864 | \label{eq:ccil0:sroi0:srou0:00} |

865 | u = \sum_{i=0}^{m-1} u_{[i]} 2^i . |

866 | \end{equation} |

867 | |

868 | In general, an $m$-bit unsigned integer can assume the values of |

869 | 0 through $2^m - 1$, so that |

870 | |

871 | \begin{equation} |

872 | \label{eq:ccil0:sroi0:srou0:01} |

873 | u \in \{0, \ldots{} , 2^m - 1 \} . |

874 | \end{equation} |

875 | |

876 | Unsigned synthetic long integers are always represented as an array |

877 | of unsigned machine integers. |

878 | Consistent with the GMP \cite{bibref:s:gnumultipleprecisionarithmeticlibrary}, |

879 | we call each element of the array a \emph{limb} and we call the size of |

880 | each limb the \emph{limbsize}. This usage is very close to what Knuth |

881 | calls the \emph{word}size $w$ in the excerpt presented in |

882 | Section \ref{ccil0:sppm0:smas0}. |

883 | |

884 | Microcontroller processors are more likely than more powerful processors to have |

885 | a non-orthogonal instruction set, and so the limbsize may not be consistent between |

886 | operations in an arithmetic library. |

887 | With some processors, the appropriate limbsize may vary depending on the operation being |

888 | performed. |

889 | For example, a microcontroller processor may be able to add two 16-bit integers to |

890 | obtain a 16-bit result plus a carry, but only be able to multiply two 8-bit integers to |

891 | obtain a 16-bit result (thus, the appropriate limbsize for addition may be |

892 | 16 bits while the appropriate limbsize for multiplication may be 8 bits). |

893 | In such cases, it is usually most efficient to add using 16-bit limbs but |

894 | multiply using 8-bit limbs. Adding using 8-bit limbs on a machine which will |

895 | support 16-bit additions is not normally a good design decision---even if the |

896 | machine supports an 8-bit addition instruction which is faster than the 16-bit addition |

897 | instruction, $m/2$ 16-bit additions will nearly always be faster than |

898 | $m$ 8-bit additions. Using different limbsizes within the same arithmetic library |

899 | may require some consideration of alignment and |

900 | endian issues, but these are implementation details |

901 | which are easily solved. |

902 | |

903 | We view a synthetic long unsigned integer as an array of limbs (machine integers) |

904 | of some size, and we agree that we will not address the array in any other way than |

905 | by loading and storing limbs of this size.\footnote{Well \ldots{} not quite. |

906 | In software for large computers (personal computers and workstations) with an |

907 | orthogonal instruction set, we may be able to adhere to this rule. However, |

908 | with microcontrollers, arithmetic libraries which are optimized |

909 | may break this rule.} In particular, because |

910 | computers may be ``big-endian'' or ``little-endian'', loading and storing |

911 | smaller units than limbs may lead in |

912 | a worst case to software defects or in a best case to non-portable code. |

913 | |

914 | Assume that $w$ is the number of bits in a limb. |

915 | Notationally, we denote an unsigned |

916 | synthetic long integer as an array of $m$ limbs |

917 | $u_{m-1}$ through $u_0$, each containing $w$ bits, |

918 | with $u_0$ the least significant machine integer. |

919 | We may also define $b=2^w$ (consistent with Knuth's |

920 | notation). |

921 | The value of |

922 | such a synthetic long machine integer is |

923 | |

924 | \begin{equation} |

925 | \label{eq:ccil0:sroi0:srou0:02} |

926 | u = \sum_{i=0}^{m-1} u_{i} 2^{wi} |

927 | = |

928 | \sum_{i=0}^{m-1} u_{i} b^i. |

929 | \end{equation} |

930 | |

931 | As an alternative, we may write the value as the sum of the bit-values, |

932 | |

933 | \begin{equation} |

934 | \label{eq:ccil0:sroi0:srou0:03} |

935 | u = \sum_{i=0}^{wm-1} u_{[i]} 2^{i} . |

936 | \end{equation} |

937 | |

938 | Naturally, the range of such a synthetic long integer is |

939 | |

940 | \begin{equation} |

941 | \label{eq:ccil0:sroi0:srou0:04} |

942 | u \in \{0, \ldots{} , 2^{wm} - 1 \} . |

943 | \end{equation} |

944 | |

945 | In storing an unsigned synthetic long machine integer, the most natural way |

946 | to order the array of limbs depends on whether dynamic memory allocation is |

947 | used by the arithmetic library. In microcontroller work, where arithmetic |

948 | library subroutines typically operate on fixed-size operands and produce |

949 | fixed-size results, storing |

950 | limbs most significant limb first (i.e. in `C', so that element \texttt{[0]} |

951 | of the array of limbs contains the most significant limb) may be natural |

952 | and convenient. However, this ordering would lead to computational waste in a library such |

953 | as the GMP \cite{bibref:s:gnumultipleprecisionarithmeticlibrary} where integers |

954 | may grow arbitrarily large and the library may need to reallocate long synthetic |

955 | integers to contain more limbs, as each reallocation would need to be followed |

956 | by a memory copy to align the integer's existing limbs to the end of the array. |

957 | For libraries such as the GMP, it is more practical to store limbs |

958 | least-significant limb first, as it eliminates the need to copy memory |

959 | when reallocations are done. |

960 | |

961 | |

962 | |

963 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

964 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

965 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

966 | \subsection{Representation Of Signed Integers} |

967 | %Subsection Tag: ROS0 |

968 | \label{ccil0:sroi0:sros0} |

969 | |

970 | Signed machine integers are always represented in two's complement form on modern |

971 | processors. This representation is universal because of the digital logic |

972 | conveniences---the same addition and subtraction mappings which are correct |

973 | for unsigned machine integers are also correct for signed machine integers, |

974 | although the criteria for overflow and comparison are |

975 | different. |

976 | |

977 | Most readers are familiar with two's complement representation, so we will not |

978 | belabor it. However, we will present essential properties. |

979 | When two's complement representation is used in an $m$-bit machine integer $u$: |

980 | |

981 | \begin{enumerate} |

982 | \item All bit patterns with $u_{[m-1]} = 0$ represent non-negative integers, and |

983 | represent the same integer as if the representation were unsigned. |

984 | \item All bit patterns with $u_{[m-1]} = 1$ represent negative numbers; specifically |

985 | $u_{[m-2:0]} - 2^{m-1}$; i.e. |

986 | |

987 | \begin{equation} |

988 | \label{eq:ccil0:sroi0:sros0:00} |

989 | u = - u_{[m-1]} 2^{m-1} + \sum_{i=0}^{m-2} u_{[i]} 2^i . |

990 | \end{equation} |

991 | |

992 | \item $u \in \{-2^{m-1}, \ldots{}, 2^{m-1}-1 \}$. |

993 | \item All bit patterns represent a unique integer. |

994 | \item For any integer except $-2^{m-1}$, |

995 | negation can be performed by forming the one's complement (complementing |

996 | every bit), then adding one. To see why this is true algebraically, note that |

997 | |

998 | \end{enumerate} |

999 | |

1000 | |

1001 | However, let us observe that the value of an |

1002 | $m$-bit two's complement |

1003 | machine integer is |

1004 | |

1005 | |

1006 | In general, an $m$-bit signed machine integer can assume the values of |

1007 | $-2^{m-1}$ through $2^{m-1} - 1$, so that |

1008 | |

1009 | \begin{equation} |

1010 | \label{eq:ccil0:sroi0:sros0:01} |

1011 | u \in \{-2^{m-1}, \ldots{} , 2^{m-1} - 1 \} . |

1012 | \end{equation} |

1013 | |

1014 | There are [at least] two different representations of signed |

1015 | multiple-precision integers: |

1016 | |

1017 | \begin{itemize} |

1018 | \item Two's complement representation. |

1019 | \item Sign-magnitude representation. |

1020 | \end{itemize} |

1021 | |

1022 | There are two different representations commonly used |

1023 | for signed multiple-precision integers because two's complement |

1024 | representation is not ideal for multiplication and division, although |

1025 | it is ideal for addition and subtraction. For multiple-precision |

1026 | integer arithmetic, sign-magnitude representation is more common. |

1027 | |

1028 | In two's complement representation of multiple-precision integers, |

1029 | the representation is the same as suggested by |

1030 | (\ref{eq:ccil0:sroi0:sros0:00}), except |

1031 | more bits are involved. For a two's complement representation |

1032 | of a number consisting of $n$ machine integers with $W$ bits per |

1033 | machine integer, |

1034 | |

1035 | \begin{equation} |

1036 | \label{eq:ccil0:sroi0:sros0:02} |

1037 | u = - u_{B(Wn-1)} 2^{Wn-1} + \sum_{i=0}^{Wn-2} u_{B(i)} 2^i . |

1038 | \end{equation} |

1039 | |

1040 | Because we would like to know how to compare signed multiple-precision |

1041 | integers in two's complement representation, we can gain some |

1042 | insight into the representation by rewriting |

1043 | (\ref{eq:ccil0:sroi0:sros0:02}) in terms of machine integers: |

1044 | |

1045 | \begin{equation} |

1046 | \label{eq:ccil0:sroi0:sros0:03} |

1047 | u = - u_{B(Wn-1)} 2^{Wn-1} |

1048 | + |

1049 | \sum_{i=W(m-1)}^{Wn-2} u_{B(i)} 2^i |

1050 | + |

1051 | \sum_{i=0}^{m-2} u_{i} 2^{Wi} . |

1052 | \end{equation} |

1053 | |

1054 | (\ref{eq:ccil0:sroi0:sros0:03}) gives some insight into the |

1055 | relative values of multiple-precision signed two's complement |

1056 | integers with respect to the values of the machine integers |

1057 | that comprise them. We discuss this further in |

1058 | Section \ref{ccil0:scsi0}. |

1059 | |

1060 | In sign-magnitude representation of multiple-precision signed two's |

1061 | complement integers, an integer $u$ is represented as a sign |

1062 | bit (usually a value of one indicates negativity), and a magnitude, |

1063 | which is $|u|$. Unlike two's complement representation, |

1064 | sign-magnitude representation, has two representations of zero---a positive |

1065 | one and a negative one. Either a canonical form for zero should be |

1066 | adopted, or both values should be treated identically. |

1067 | |

1068 | Assuming that the sign bit is stored in the most significant bit position, |

1069 | it is easy to deduce that the value of a multiple-precision |

1070 | signed two's complement integer in sign-magnitude representation is |

1071 | |

1072 | \begin{equation} |

1073 | \label{eq:ccil0:sros0:srou0:04} |

1074 | u = (-1)^{u_{B(m-1)}} \sum_{i=0}^{m-2} u_{B(i)} 2^i . |

1075 | \end{equation} |

1076 | |

1077 | Sign-magnitude representation is especially convenient because |

1078 | it allows machine instructions which accept unsigned operands to be used |

1079 | to make calculations involving signed integers. |

1080 | |

1081 | |

1082 | |

1083 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1084 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1085 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1086 | \section{Characteristics Of Practical Processors} |

1087 | %Section tag: CPP |

1088 | \label{ccil0:scpp0} |

1089 | |

1090 | Before discussing specific algorithms, it is necessary to |

1091 | discuss the construction of practical processors---how such a processor |

1092 | manipulates machine integers. We accept as a typical processor the |

1093 | TMS-370C8, an 8-bit microcontroller manufactured by |

1094 | Texas Instruments. |

1095 | |

1096 | \begin{figure} |

1097 | \centering |

1098 | \includegraphics[width=4.6in]{c_cil0/t370flag.eps} |

1099 | \caption{Texas Instruments TMS-370C8 Flags} |

1100 | \label{fig:ccil0:scpp0:00} |

1101 | \end{figure} |

1102 | |

1103 | \begin{figure} |

1104 | \centering |

1105 | \includegraphics[width=4.6in]{c_cil0/t370cjmp.eps} |

1106 | \caption{Texas Instruments TMS-370C8 Conditional Jump Instructions} |

1107 | \label{fig:ccil0:scpp0:01} |

1108 | \end{figure} |

1109 | |

1110 | A typical microcontroller allows operations on machine integers |

1111 | in the following steps: |

1112 | |

1113 | \begin{itemize} |

1114 | \item A machine instruction is performed on one or two machine |

1115 | integer operands (for example: addition, subtraction, |

1116 | multiplication, division, increment, decrement, complement, |

1117 | negation, or comparison). This machine instruction may |

1118 | produce a result, and usually sets a number of condition flags that |

1119 | reflect the nature and validity of the result (Is it zero? |

1120 | Is it negative? Did the result overflow?). As an |

1121 | example, the condition |

1122 | flags of the TMS-370C8 are shown in Figure \ref{fig:ccil0:scpp0:00}. |

1123 | . |

1124 | \item A conditional branch instruction is used to branch conditionally |

1125 | based on the state of the condition flags. The definition of |

1126 | the condition flags and the way in which the conditional |

1127 | branch instruction utilizes them is designed to provide a |

1128 | way to treat both unsigned and signed machine integers. |

1129 | As an example, the way in which the conditional |

1130 | jump instructions of the TMS-370C8 use the flags |

1131 | is shown in Figure \ref{fig:ccil0:scpp0:01}. |

1132 | \end{itemize} |

1133 | |

1134 | It is not too often necessary to understand in detail |

1135 | the Boolean relationships that govern how machine integers are |

1136 | added, subtracted, and compared; and how signed comparisons differ |

1137 | from unsigned comparisons. In most cases, it is adequate to |

1138 | rely on the design of the microcontroller. However, we do present |

1139 | rudimentary observations in this section. |

1140 | |

1141 | |

1142 | |

1143 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1144 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1145 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1146 | \section{Comparison Of Integers} |

1147 | |

1148 | |

1149 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1150 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1151 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1152 | \subsection{Comparison Of Unsigned Integers} |

1153 | |

1154 | |

1155 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1156 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1157 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1158 | \subsection{Comparison Of Signed Integers} |

1159 | %Section Tag: CSI0 |

1160 | \label{ccil0:scsi0} |

1161 | |

1162 | |

1163 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1164 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1165 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1166 | \section{Integer Addition} |

1167 | %Section tag: IAD0 |

1168 | \label{ccil0:siad0} |

1169 | |

1170 | Addition of two $m$-bit integers is a combinational function---that is, |

1171 | the inputs uniquely determine the output. Addition of binary |

1172 | numbers is performed |

1173 | |

1174 | |

1175 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1176 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1177 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1178 | \subsection{Hardware Implementation Of Addition} |

1179 | |

1180 | |

1181 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1182 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1183 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1184 | \subsection{Addition Of Unsigned Operands} |

1185 | |

1186 | |

1187 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1188 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1189 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1190 | \subsection{Addition Of Signed Operands} |

1191 | |

1192 | |

1193 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1194 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1195 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1196 | \section {Integer Subtraction} |

1197 | |

1198 | |

1199 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1200 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1201 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1202 | \subsection{Hardware Implementation Of Subtraction} |

1203 | |

1204 | |

1205 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1206 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1207 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1208 | \subsection{Subtraction Of Unsigned Operands} |

1209 | |

1210 | |

1211 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1212 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1213 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1214 | \subsection{Subtraction Of Signed Operands} |

1215 | |

1216 | |

1217 | |

1218 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1219 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1220 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1221 | \section{Integer Multiplication} |

1222 | |

1223 | |

1224 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1225 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1226 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1227 | \subsection{Hardware Implementation Of Multiplication} |

1228 | |

1229 | |

1230 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1231 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1232 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1233 | \subsection{Multiplication Of Unsigned Operands} |

1234 | |

1235 | |

1236 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1237 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1238 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1239 | \subsection{Multiplication Of Signed Operands} |

1240 | |

1241 | |

1242 | |

1243 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1244 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1245 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1246 | \section{Integer Division} |

1247 | \label{ccil0:sidv0} |

1248 | |

1249 | \index{division}\index{integer division}In this section, |

1250 | we discuss the best known methods of dividing integers using |

1251 | typical microcontroller instruction sets. In general, given |

1252 | two arbitrary integers $p$ and $q$, we are interested in determining |

1253 | their quotient $q=\lfloor{}p/q\rfloor$ and remainder |

1254 | $r=p\bmod{}q$ as economically as possible. |

1255 | |

1256 | |

1257 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1258 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1259 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1260 | \subsection{Hardware Implementation Of Division} |

1261 | |

1262 | |

1263 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1264 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1265 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1266 | \subsection{General Unsigned Division Without A Machine Division Instruction} |

1267 | \label{ccil0:sidv0:sgdn0} |

1268 | |

1269 | |

1270 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1271 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1272 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

1273 | \subsection{General Unsigned Division With A Machine Division Instruction} |

1274 | \label{ccil0:sidv0:sgdu0} |

1275 | |

1276 | As mentioned many places in this work, efficiency in microcontroller software |

1277 | involves phrasing computational problems in a way which makes good use of |

1278 | the machine instruction set. In Section \ref{ccil0:sidv0:sgdn0} we discussed |

1279 | the classic shift-compare-subtract algorithm for division. This algorithm |

1280 | is far from efficient. A reasonable question to ask is whether we can |

1281 | leverage ``small'' division capability (provided by the machine instruction set) |

1282 | to accomplish ``large'' divisions (those which we require in practice). |

1283 | It ends up that this is possible: the technique involved is effectively |

1284 | to use machine division instructions to estimate the highest-order bits of |

1285 | the quotient based on the highest-order bits of the dividend and divisor. |

1286 | |

1287 | Knuth's discussion of division |

1288 | algorithms \cite[pp. 270-275]{bibref:b:knuthclassic2ndedvol2} is the |

1289 | basis for most of the material in this subsection. However, Knuth has |

1290 | a gift for terseness that is sometimes a curse for the reader, and so |

1291 | we take more time than Knuth to explain certain results. |

1292 | |

1293 | First, as a starting point, we present \emph{Algorithm D} from |

1294 | Knuth \cite[pp. 272-273]{bibref:b:knuthclassic2ndedvol2}. Then, |

1295 | we justify the algorithm and explain why it is valid. Finally, |

1296 | we supply implementation advice for microcontroller instruction sets. |

1297 | |

1298 | \begin{vworkalgorithmstatementpar}{Arbitrary Unsigned Division Using |

1299 | Machine Unsigned Division Instructions} |

1300 | \label{alg:ccil0:sidv0:sgdu0:01} |

1301 | (From Knuth \cite[pp. 272-273]{bibref:b:knuthclassic2ndedvol2}) |

1302 | Given nonnegative integers $u=(u_{m+n-1} \ldots{} u_1 u_0)_b$ |

1303 | and $v=(v_{n-1} \ldots{} v_1 v_0)_b$, where |

1304 | $v_{n-1} \neq 0$ and $n > 1$, we form the radix-$b$ quotient |

1305 | $\lfloor{}u/v\rfloor{} = (q_m q_{m-1} \ldots{} q_0)_b$ and |

1306 | the remainder $u \bmod v = (r_{n-1} \ldots{} r_1 r_0)_b$. When |

1307 | $n=1$, the simpler algorithm of |

1308 | Subsection \ref{ccil0:sidv0:sldm0} |

1309 | should be used. |

1310 | |

1311 | \begin{algblvl0} |

1312 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:01} |

1313 | [Normalize.] Set $d \gets \lfloor{}b/(v_{n-1} + 1)\rfloor$. |

1314 | Then set $(u_{m+n} u_{m+n-1} \ldots{} u_1 u_0)_b$ equal to |

1315 | $(u_{m+n-1} \ldots{} u_1 u_0)_b$ times $d$; similarly, |

1316 | set $(v_{n-1} \ldots{} v_1 v_0)_b$ equal to |

1317 | $(v_{n-1} \ldots{} v_1 v_0)_b$ times $d$. (Notice the introduction |

1318 | of a new digit position $u_{m+n}$ at the left of |

1319 | $u_{m+n-1}$; if $d=1$, all we need to do in this step is set |

1320 | $u_{m+n} \gets 0$. On a binary computer it may be preferable |

1321 | to choose $d$ to be a power of 2 instead of using the value |

1322 | suggested here; any value of $d$ that results in |

1323 | $v_{n-1} \geq \lfloor{}b/2\rfloor$ will suffice. See also |

1324 | exercise 37.) |

1325 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:02} |

1326 | [Initialize $j$.] Set $j \gets m$. (The loop on $j$, |

1327 | steps |

1328 | \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03} |

1329 | through |

1330 | \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:07}, |

1331 | will be essentially a division of |

1332 | $(u_{j+n} \ldots{} u_{j+1} u_j)_b$ by $(v_{n-1} \ldots{} v_1 v_0)_b$ to |

1333 | get a single quotient digit $q_j$; see Figure \ref{fig:alg:ccil0:sidv0:sgdu0:01:01}.) |

1334 | |

1335 | \begin{figure} |

1336 | \centering |

1337 | \includegraphics[width=4.6in]{c_cil0/kdfc01.eps} |

1338 | \caption{Flowchart For Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} (From \cite[p. 273]{bibref:b:knuthclassic2ndedvol2})} |

1339 | \label{fig:alg:ccil0:sidv0:sgdu0:01:01} |

1340 | \end{figure} |

1341 | |

1342 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:03} |

1343 | [Calculate $\hat{q}$.] Set |

1344 | $\hat{q} \gets \lfloor{}u_{j+n}b + u_{j+n-1})/v_{n-1}\rfloor$ and |

1345 | let $\hat{r}$ be the remainder, $(u_{j+n}b + u_{j+n-1}) \bmod v_{n-1}$. |

1346 | Now test if $\hat{q} = b$ or $\hat{q} v_{n-2} > b\hat{r} + u_{j+n-2}$; |

1347 | if so, decrease $\hat{q}$ by 1, increase $\hat{r}$ by $v_{n-1}$, and repeat |

1348 | this test if $\hat{r} < b$. (The test of $v_{n-2}$ determines at high |

1349 | speed most of the cases in which the trial value $\hat{q}$ is one too large, |

1350 | and it eliminates \emph{all} cases where $\hat{q}$ is two too large; |

1351 | see exercises 19, 20, 21.) |

1352 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:04} |

1353 | [Multiply and subtract.] Replace $(u_{j+n} u_{j+n-1} \ldots{} u_j)_b$ by |

1354 | |

1355 | \begin{equation} |

1356 | \nonumber |

1357 | (u_{j+n} u_{j+n-1} \ldots{} u_j)_b - \hat{q} (0 v_{n-1} \ldots{} v_1 v_0)_b. |

1358 | \end{equation} |

1359 | |

1360 | This computation consists of a simple multiplication by a one-place number, |

1361 | combined with a subtraction. The digits $(u_{j+n}, u_{j+n-1}, \ldots{}, u_j)$ |

1362 | should be kept positive; if the result of this step is actually negative, |

1363 | $(u_{j+n} u_{j+n-1} \ldots{} u_j)_b$ should be left as the true |

1364 | value plus $b^{n+1}$, namely as the $b$'s complement of the true value, and |

1365 | a ``borrow'' to the left should be remembered. |

1366 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:05} |

1367 | [Test remainder.] Set $q_j \gets \hat{q}$. If the result of step |

1368 | \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:04} was negative, go to |

1369 | step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}; otherwise go on to |

1370 | step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:07}. |

1371 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:06} |

1372 | [Add back.] (The probability that this step is necessary is very small, on the |

1373 | order of only $2/b$, as shown in exercise 21; test data to activate this step |

1374 | should therefore be specifically contrived when debugging.) Decrease |

1375 | $q_j$ by 1, and add $(0 v_{n-1} \ldots v_1 v_0)_b$ to |

1376 | $(u_{j+n} u_{j+n-1} \ldots{} u_{j+1} u_j)_b$. (A carry will occur to the left of |

1377 | $u_{j+n}$, and it should be ignored since it cancels with the borrow that |

1378 | occured in step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:04}.) |

1379 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:07} |

1380 | [Loop on $j$.] Decrease $j$ by one. Now if $j \geq 0$, go back to step |

1381 | \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}. |

1382 | \item \label{enumstep:alg:ccil0:sidv0:sgdu0:01:08} |

1383 | [Unnormalize.] Now $(q_m \ldots{} q_1 q_0)_b$ is the desired quotient, and |

1384 | the desired remainder may be obtained by dividing |

1385 | $(u_{n-1} \ldots{} u_1 u_0)_b$ by $d$. |

1386 | \end{algblvl0} |

1387 | \end{vworkalgorithmstatementpar} |

1388 | \vworkalgorithmfooter{} |

1389 | |

1390 | The general idea of Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} is that |

1391 | digits (machine words) of the quotient $q$ can be successively estimated |

1392 | based on the first digits of the dividend and divisor. Knuth |

1393 | \cite[p. 271]{bibref:b:knuthclassic2ndedvol2} explores the properties of |

1394 | the digit estimate |

1395 | |

1396 | \begin{equation} |

1397 | \label{eq:ccil0:sidv0:sgdu0:01} |

1398 | \hat{q} = \min \left( {\left\lfloor{\frac{u_n b + u_{n-1}}{v_{n-1}}}\right\rfloor, b-1} \right). |

1399 | \end{equation} |

1400 | |

1401 | The first point to make about an estimate in the form of |

1402 | (\ref{eq:ccil0:sidv0:sgdu0:01}) is that it can only be accomplished |

1403 | efficiently if the machine-native division instruction supports |

1404 | overflow detection, since it is possible that |

1405 | $(u_n b + u_{n-1})/v_{n-1} \geq b$, even if |

1406 | $u/v < b$, as is shown by the following example. |

1407 | |

1408 | \begin{vworkexamplestatement} |

1409 | \label{ex:ccil0:sidv0:sgdu0:01:01} |

1410 | Assume that we wish to apply the estimate of $\hat{q}$ provided by |

1411 | (\ref{eq:ccil0:sidv0:sgdu0:01}) |

1412 | to $u=16,776,704$ and $v=65,535$. Demonstrate that a machine division |

1413 | overflow will occur when estimating the first digit, assuming a processor |

1414 | that can divide a 16-bit dividend by an 8-bit divisor to produce an 8-bit |

1415 | quotient. |

1416 | \end{vworkexamplestatement} |

1417 | \begin{vworkexampleparsection}{Solution} |

1418 | Note that according to Knuth's intention, the word size on such a machine |

1419 | is 8 bits. Thus, $b=256$. Note that $u/v = 255 + 255/256 < b = 256$, as |

1420 | required by Knuth's precondition. However, although $u/v < b$, |

1421 | $u = [255 \; 254 \; 0] [256^2 \; 256 \; 1]^T = [u_2 u_1 u_0] [b^2 b^1 b^0]^T$ and |

1422 | $v = [255 \; 255] [256 \; 1]^T = [v_1 v_0] [b^1 b^0]^T$, so that calculating |

1423 | an estimate $\hat{q}$ as required by (\ref{eq:ccil0:sidv0:sgdu0:01}), |

1424 | $(u_n b + u_{n-1})/v_{n-1} = 65,534/255 = 256 + 254/255 \geq b$, is a division |

1425 | overflow for a single machine division instruction. Thus, it follows that |

1426 | a machine with division overflow detection can quickly determine that $b-1$ from |

1427 | (\ref{eq:ccil0:sidv0:sgdu0:01}) is the minimum, whereas a machine without |

1428 | division overflow |

1429 | detection would have to use several additional machine instructions to make |

1430 | this determination. |

1431 | \end{vworkexampleparsection} |

1432 | \vworkexamplefooter{} |

1433 | |

1434 | The second thing to establish about $\hat{q}$ as defined by |

1435 | (\ref{eq:ccil0:sidv0:sgdu0:01}) is how ``good'' of an estimate |

1436 | $\hat{q}$ is---how much information, exactly, about $q$ can we |

1437 | obtain by examining the first two words of $u$ and the first |

1438 | word of $v$? |

1439 | |

1440 | We first establish in the following lemma that our estimate of |

1441 | $q$, $\hat{q}$, can be no less than $q$. |

1442 | |

1443 | \begin{vworklemmastatementpar}{\mbox{\boldmath$\hat{q} \geq q$}} |

1444 | \label{lem:ccil0:sidv0:sgdu0:01} |

1445 | The estimate of $q$ provided by (\ref{eq:ccil0:sidv0:sgdu0:01}), |

1446 | $\hat{q}$ is always at least as great as the actual value of |

1447 | $q$, i.e. $\hat{q} \geq q$. |

1448 | \end{vworklemmastatementpar} |

1449 | \begin{vworklemmaproof} |

1450 | Knowledge of $u_{n}$, $u_{n-1}$, and $v_{n-1}$ necessarily confine |

1451 | the intervals in which the actual values of $u$ and $v$ may be; |

1452 | specifically:\footnote{In |

1453 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:04}) |

1454 | and |

1455 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:07}), |

1456 | we use statements of the form ``$x = x$'' as an idiom for |

1457 | ``$x$ is known''.} |

1458 | |

1459 | \begin{eqnarray} |

1460 | \label{eq:lem:ccil0:sidv0:sgdu0:01:01} |

1461 | u & = & \sum_{i=0}^{n} u_i b^i \\ |

1462 | \label{eq:lem:ccil0:sidv0:sgdu0:01:02} |

1463 | & = & u_n b^n + u_{n-1} b^{n-1} + \ldots{} + u_2 b^2 + u_1 b + u_0 \\ |

1464 | \label{eq:lem:ccil0:sidv0:sgdu0:01:03} |

1465 | & = & (u_n b + u_{n-1}) b^{n-1} + \ldots{} + u_2 b^2 + u_1 b + u_0 |

1466 | \end{eqnarray} |

1467 | |

1468 | \begin{eqnarray} |

1469 | \nonumber & (u_n = u_n \wedge u_{n-1} = u_{n-1}) & \\ |

1470 | \label{eq:lem:ccil0:sidv0:sgdu0:01:04} |

1471 | & \vworkvimp & \\ |

1472 | \nonumber & (u_n b + u_{n-1}) b^{n-1} \leq u \leq (u_n b + u_{n-1}) b^{n-1} + b^{n-1} - 1 & |

1473 | \end{eqnarray} |

1474 | |

1475 | \begin{eqnarray} |

1476 | \label{eq:lem:ccil0:sidv0:sgdu0:01:05} |

1477 | v & = & \sum_{i=0}^{n-1} v_i b^i \\ |

1478 | \label{eq:lem:ccil0:sidv0:sgdu0:01:06} |

1479 | & = & v_{n-1} b^{n-1} + v_{n-2} b^{n-2} + \ldots{} + v_2 b^2 + v_1 b + v_0 |

1480 | \end{eqnarray} |

1481 | |

1482 | \begin{equation} |

1483 | \label{eq:lem:ccil0:sidv0:sgdu0:01:07} |

1484 | (v_{n-1} = v_{n-1}) |

1485 | \vworkhimp |

1486 | v_{n-1} b^{n-1} \leq v \leq v_{n-1} b^{n-1} + b^{n-1} - 1 |

1487 | \end{equation} |

1488 | |

1489 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:04}) and |

1490 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:07}) reflect the uncertainties in the |

1491 | values of $u$ and $v$ respectively because only the first digit(s) of |

1492 | $u$ and $v$ are being considered in forming the estimate $\hat{q}$. |

1493 | |

1494 | By definition, the actual value of $q$ is $\lfloor{}u/v\rfloor$. For a |

1495 | rational function $f(u,v) = u/v$ where $u \in [u_{min}, u_{max}]$ and |

1496 | $v \in [v_{min}, v_{max}]$, the minimum value of $u/v$ occurs at |

1497 | $u_{min}/v_{max}$, and the maximum value of $u/v$ occurs at |

1498 | $u_{max}/v_{min}$. We can therefore write that |

1499 | |

1500 | \begin{equation} |

1501 | \label{eq:lem:ccil0:sidv0:sgdu0:01:08} |

1502 | \left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1}}{v_{n-1} b^{n-1} + b^{n-1} - 1}}\right\rfloor |

1503 | \leq |

1504 | q |

1505 | \leq |

1506 | \left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1} + b^{n-1} - 1}{v_{n-1} b^{n-1}}}\right\rfloor . |

1507 | \end{equation} |

1508 | |

1509 | In other words, knowledge of $u_{n}$, $u_{n-1}$, and $v_{n-1}$ confines $q$ to the |

1510 | interval indicated in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}). We must prove that, |

1511 | given a specific $u_{n}$, $u_{n-1}$, and $v_{n-1}$, $\hat{q}$ is at least as large as |

1512 | the upper bound in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}); otherwise we could find a |

1513 | $q$ such that $q > \hat{q}$. We can algebraically manipulate the upper bound in |

1514 | in (\ref{eq:lem:ccil0:sidv0:sgdu0:01:08}) to yield |

1515 | |

1516 | \begin{equation} |

1517 | \label{eq:lem:ccil0:sidv0:sgdu0:01:09} |

1518 | \left\lfloor{\frac{(u_n b + u_{n-1}) b^{n-1}}{v_{n-1} b^{n-1} + b^{n-1} - 1}}\right\rfloor |

1519 | \leq |

1520 | q |

1521 | \leq |

1522 | \left\lfloor{\frac{u_n b + u_{n-1} + \frac{b^{n-1}-1}{b^{n-1}}}{v_{n-1}}}\right\rfloor . |

1523 | \end{equation} |

1524 | |

1525 | In (\ref{eq:lem:ccil0:sidv0:sgdu0:01:09}), since $(b^{n-1}-1)/b^{n-1} < 1$ and since |

1526 | $u_n b + u_{n-1}$ is an integer, we can conclude that |

1527 | $\lfloor u_n b + u_{n-1} + (b^{n-1}-1)/b^{n-1} \rfloor = \lfloor u_n b + u_{n-1} \rfloor$ |

1528 | and hence that |

1529 | |

1530 | \begin{equation} |

1531 | \label{eq:lem:ccil0:sidv0:sgdu0:01:10} |

1532 | q |

1533 | \leq |

1534 | \left\lfloor{\frac{u_n b + u_{n-1} + \frac{b^{n-1}-1}{b^{n-1}}}{v_{n-1}}}\right\rfloor |

1535 | = |

1536 | \left\lfloor{\frac{u_n b + u_{n-1}}{v_{n-1}}}\right\rfloor |

1537 | = |

1538 | \hat{q} . |

1539 | \end{equation} |

1540 | |

1541 | Therefore, $\hat{q} \geq q$. |

1542 | \end{vworklemmaproof} |

1543 | \vworklemmafooter{} |

1544 | |

1545 | Lemma \ref{lem:ccil0:sidv0:sgdu0:01} establishes that |

1546 | a digit estimate $\hat{q}$ based on the first digit of the |

1547 | divisor $v$ can be no less than the actual digit $q$, i.e. |

1548 | $\hat{q}-q \geq 0$. However, we must also establish an upper bound |

1549 | on $\hat{q}-q$. |

1550 | |

1551 | Intuitively, based on |

1552 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:06}), we might guess that |

1553 | if $v_{n-1}$ is small, the estimate $\hat{q}$ may be quite |

1554 | poor, as the interval to which the actual value of $v$ is confined |

1555 | may be quite large. This fact is the basis for the normalization |

1556 | step [\ref{enumstep:alg:ccil0:sidv0:sgdu0:01:01}] in Algorithm |

1557 | \ref{alg:ccil0:sidv0:sgdu0:01}. We now prove a useful result |

1558 | for how much $u, v$ must be normalized so that $\hat{q}-q \leq 2$. |

1559 | |

1560 | \begin{vworklemmastatementpar}{Normalization Requirement So That |

1561 | \mbox{\boldmath$\hat{q} - q \leq 2$}} |

1562 | \label{lem:ccil0:sidv0:sgdu0:02} |

1563 | If $v_{n-1} \geq \lfloor b/2 \rfloor$ and $\hat{q}$ is chosen as |

1564 | indicated in (\ref{eq:ccil0:sidv0:sgdu0:01}), then |

1565 | $0 \leq \hat{q} - q \leq 2$. |

1566 | \end{vworklemmastatementpar} |

1567 | \begin{vworklemmaproof} |

1568 | The lower limit on $\hat{q} - q$ is proved in Lemma \ref{lem:ccil0:sidv0:sgdu0:01}. |

1569 | We now seek only to prove that $\hat{q} - q \leq 2$. |

1570 | By definition of $\hat{q}$ and $q$, |

1571 | |

1572 | \begin{equation} |

1573 | \label{eq:lem:ccil0:sidv0:sgdu0:02:01} |

1574 | \hat{q} - q = \left\lfloor {\frac{u_n b + u_{n-1}}{v_{n-1}}} \right\rfloor |

1575 | - \left\lfloor {\frac{u}{v}} \right\rfloor |

1576 | \end{equation} |

1577 | |

1578 | When only nonnegative integers are involved, |

1579 | (\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}) |

1580 | supplies an exact expression for the floor of a |

1581 | ratio of integers. Using (\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}), |

1582 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:01}) can be decomposed into |

1583 | |

1584 | \begin{equation} |

1585 | \label{eq:lem:ccil0:sidv0:sgdu0:02:02} |

1586 | \hat{q} - q = \frac{u_n b + u_{n-1}}{v_{n-1}} |

1587 | - \frac{(u_n b + u_{n-1}) \bmod v_{n-1}}{v_{n-1}} |

1588 | - \frac{u}{v} |

1589 | + \frac{u \bmod v}{v} . |

1590 | \end{equation} |

1591 | |

1592 | \noindent{}Note that (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) is an exact |

1593 | expression (rather than an |

1594 | inequality). |

1595 | |

1596 | Note in (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) that |

1597 | $(u_n b + u_{n-1}) \bmod v_{n-1} \in [0, v_{n-1}-1]$, and that in general |

1598 | there is no reason to expect it cannot be zero. Thus, we can assume that |

1599 | it \emph{is} zero, which will maximize $\hat{q}-q$. We can thus convert |

1600 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}) into the inequality |

1601 | |

1602 | \begin{equation} |

1603 | \label{eq:lem:ccil0:sidv0:sgdu0:02:03} |

1604 | \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}} |

1605 | - \frac{u}{v} |

1606 | + \frac{u \bmod v}{v} . |

1607 | \end{equation} |

1608 | |

1609 | In (\ref{eq:lem:ccil0:sidv0:sgdu0:02:03}) we can also observe that |

1610 | $(u \bmod v)/v \in [0, (v-1)/v]$. If we replace this expression with |

1611 | ``1'' (which is unattainable, but barely), this will change the relational |

1612 | operator from ``$\leq$'' to ``$<$'': |

1613 | |

1614 | \begin{equation} |

1615 | \label{eq:lem:ccil0:sidv0:sgdu0:02:04} |

1616 | \hat{q} - q < \frac{u_n b + u_{n-1}}{v_{n-1}} |

1617 | - \frac{u}{v} |

1618 | + 1 . |

1619 | \end{equation} |

1620 | |

1621 | The result we wish to show is that with $v_{n-1} \geq \lfloor b/2 \rfloor$, |

1622 | $\hat{q}-q \leq 2$. To simplify the subsequent algebraic manipulations, note in |

1623 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:04}) that |

1624 | |

1625 | \begin{eqnarray} |

1626 | \label{eq:lem:ccil0:sidv0:sgdu0:02:05} |

1627 | & \hat{q} - q \leq 2 & \\ |

1628 | \nonumber & \vworkvertequiv & \\ |

1629 | \label{eq:lem:ccil0:sidv0:sgdu0:02:06} |

1630 | & \displaystyle \frac{u_n b + u_{n-1}}{v_{n-1}} |

1631 | - \frac{u}{v} |

1632 | + 1 \leq 3 & \\ |

1633 | \nonumber & \vworkvertequiv & \\ |

1634 | \label{eq:lem:ccil0:sidv0:sgdu0:02:07} |

1635 | & \displaystyle \frac{u_n b + u_{n-1}}{v_{n-1}} |

1636 | - \frac{u}{v} \leq 2 & |

1637 | \end{eqnarray} |

1638 | |

1639 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:06}) may be counterintuitive, so |

1640 | further explanation is offered here. Since $\hat{q} \in \vworkintset$ and $q \in \vworkintset$, |

1641 | $\hat{q}-q \in \vworkintset$. Thus, proving |

1642 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:06}) or |

1643 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}) proves that |

1644 | $\hat{q}-q \in \{ \ldots, -1, 0, 1, 2 \}$ (however, by |

1645 | Lemma \ref{lem:ccil0:sidv0:sgdu0:01}, $\hat{q}-q \geq 0$, so in fact |

1646 | what would be proved is that $\hat{q}-q \in \{ 0, 1, 2 \}$). For |

1647 | algebraic simplicity, |

1648 | we choose to prove (\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}). |

1649 | |

1650 | First, adjust numerator and denominator of the first term in |

1651 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:07}) by $b^{n-1}$ so that the terms more closely |

1652 | resemble $u$ and $v$ in |

1653 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:02}) |

1654 | and |

1655 | (\ref{eq:lem:ccil0:sidv0:sgdu0:01:06}): |

1656 | |

1657 | \begin{equation} |

1658 | \label{eq:lem:ccil0:sidv0:sgdu0:02:08} |

1659 | \frac{u_n b^n + u_{n-1} b^{n-1}}{v_{n-1} b^{n-1}} |

1660 | - \frac{u}{v} \leq 2 . |

1661 | \end{equation} |

1662 | |

1663 | For logical implication to be maintained, |

1664 | we must make the most pessimistic choices and assumptions possible in |

1665 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:08}) in order to maximize the value |

1666 | of the left side of the inequality. |

1667 | The first assumption to be made is the error in estimating |

1668 | $u$ and $v$ based on their most significant digits. It can be |

1669 | seen that (\ref{eq:lem:ccil0:sidv0:sgdu0:02:08}) will be maximized if: |

1670 | |

1671 | \begin{itemize} |

1672 | \item We assume that $u = u_{n} b^n + u_{n-1} b^{n-1}$ (i.e. that we estimate |

1673 | $u$ precisely). |

1674 | \item We assume that $v = v_{n-1} b^{n-1} + b^{n-1} - 1$ (i.e. that |

1675 | we underestimate $v$ by the maximum amount possible). |

1676 | \item We minimize the value of $v_{n-1} b^{n-1}$. |

1677 | \end{itemize} |

1678 | |

1679 | Assuming that $u$ is estimated precisely yields |

1680 | |

1681 | \begin{equation} |

1682 | \label{eq:lem:ccil0:sidv0:sgdu0:02:09} |

1683 | \frac{u}{v_{n-1} b^{n-1}} |

1684 | - \frac{u}{v} \leq 2 . |

1685 | \end{equation} |

1686 | |

1687 | Assuming that $v$ is underestimated by the maximum amount possible |

1688 | yields |

1689 | |

1690 | \begin{equation} |

1691 | \label{eq:lem:ccil0:sidv0:sgdu0:02:10} |

1692 | \frac{u}{v - b^{n-1} + 1} |

1693 | - \frac{u}{v} \leq 2 . |

1694 | \end{equation} |

1695 | |

1696 | Finally, with $b$ and $v$ fixed, $u$ can be maximized by noting that |

1697 | $u \leq bv - 1$ (by the problem assumption that the quotient is a single digit). |

1698 | However, for algebraic simplicity, we make the substitution $u=bv$ (rather than |

1699 | $u=bv-1$), since the weaker upper bound is strong enough to prove the |

1700 | first result we seek. |

1701 | |

1702 | \begin{equation} |

1703 | \label{eq:lem:ccil0:sidv0:sgdu0:02:11} |

1704 | \frac{bv}{v - b^{n-1} + 1} |

1705 | - \frac{bv}{v} \leq 2 |

1706 | \end{equation} |

1707 | |

1708 | \begin{equation} |

1709 | \label{eq:lem:ccil0:sidv0:sgdu0:02:12} |

1710 | \frac{bv}{v - b^{n-1} + 1} |

1711 | - b \leq 2 |

1712 | \end{equation} |

1713 | |

1714 | Solving (\ref{eq:lem:ccil0:sidv0:sgdu0:02:12}) |

1715 | for $v$ yields |

1716 | |

1717 | \begin{equation} |

1718 | \label{eq:lem:ccil0:sidv0:sgdu0:02:13} |

1719 | v \geq \frac{b^n}{2} + b^{n-1} - \frac{1}{b} - 1 |

1720 | \end{equation} |

1721 | |

1722 | Again using the assumption that $v$ is underestimated by the maximum amount |

1723 | possible, we may make the substitution that $v = v_{n-1} b^{n-1} + b^{n-1} -1$, |

1724 | leading to |

1725 | |

1726 | \begin{equation} |

1727 | \label{eq:lem:ccil0:sidv0:sgdu0:02:13b} |

1728 | v_{n-1} \geq \frac{b}{2} - \frac{1}{2 b^{n-2}} . |

1729 | \end{equation} |

1730 | |

1731 | There are two cases to consider: $b$ even and $b$ odd. If $b$ is even, the |

1732 | proof is complete, as $\lfloor b/2 \rfloor = b/2$ and the choice of |

1733 | $v_{n-1} = \lfloor b/2 \rfloor$ will automatically satisfy |

1734 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}). However, if $b$ is odd, |

1735 | $\lfloor b/2 \rfloor = b/2 - 1/2 < b/2 - 1/2b^{n-2}$, violating |

1736 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}), |

1737 | and so we need to |

1738 | further examine this case. |

1739 | |

1740 | If $b$ is odd and $v_{n-1} = \lfloor b/2 \rfloor$, then |

1741 | $v_{n-1} = b/2 - 1/2$, violating (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}). |

1742 | However, any larger choice of $v_{n-1}$ (such as |

1743 | $\lfloor b/2 \rfloor + 1$, $\lfloor b/2 \rfloor + 2$, etc.) satisfies |

1744 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}); so that it remains only to prove |

1745 | the $v_{n-1} = \lfloor b/2 \rfloor = b/2 - 1/2$ |

1746 | case. |

1747 | |

1748 | If $v_{n-1} = \lfloor b/2 \rfloor = b/2 - 1/2$, then |

1749 | |

1750 | \begin{eqnarray} |

1751 | \label{eq:lem:ccil0:sidv0:sgdu0:02:14} |

1752 | v & \in & \left[ |

1753 | \left( \frac{b}{2} - \frac{1}{2}\right) b^{n-1}, |

1754 | \left( \frac{b}{2} - \frac{1}{2}\right) b^{n-1} + b^{n-1} - 1 |

1755 | \right] \\ |

1756 | \nonumber & = & |

1757 | \left[ |

1758 | \frac{b^n}{2} - \frac{b^{n-1}}{2}, |

1759 | \frac{b^n}{2} + \frac{b^{n-1}}{2} -1 |

1760 | \right] . |

1761 | \end{eqnarray} |

1762 | |

1763 | Note in this case that the estimation error ($v - v_{n-1}b^{n-1}$) and the |

1764 | value of $v$ are not independent; and in fact it is this aspect |

1765 | of the problem that has led to the violation of |

1766 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:13b}) with $b$ odd and |

1767 | $v_{n-1} = \lfloor b/2 \rfloor$. |

1768 | |

1769 | In order to prove the case of $b$ odd and $v = \lfloor b/2 \rfloor = b/2 - 1/2$, |

1770 | we must reexamine some simplifying assumptions made earlier in order to obtain |

1771 | tighter inequalities. In (\ref{eq:lem:ccil0:sidv0:sgdu0:02:02}), we can no |

1772 | longer accept the maximum of $(u \bmod v)/v$ as one; instead we construct the |

1773 | tighter inequality |

1774 | |

1775 | \begin{eqnarray} |

1776 | \label{eq:lem:ccil0:sidv0:sgdu0:02:15} |

1777 | & \displaystyle \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}} |

1778 | - \frac{u}{v} |

1779 | + \frac{u \bmod v}{v} & \\ |

1780 | \nonumber & \vworkvimp & \\ |

1781 | \label{eq:lem:ccil0:sidv0:sgdu0:02:16} |

1782 | & \displaystyle \hat{q} - q \leq \frac{u_n b + u_{n-1}}{v_{n-1}} |

1783 | - \frac{u}{v} |

1784 | + \frac{v-1}{v} , & |

1785 | \end{eqnarray} |

1786 | |

1787 | which leads to |

1788 | |

1789 | \begin{equation} |

1790 | \label{eq:lem:ccil0:sidv0:sgdu0:02:17} |

1791 | \frac{u_n b + u_{n-1}}{v_{n-1}} |

1792 | - \frac{u+1}{v} |

1793 | < 2 . |

1794 | \end{equation} |

1795 | |

1796 | In order to maximize the left-hand side of |

1797 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:17}), we assume that we |

1798 | estimate $u$ exactly so that $u = u_n b^n + u_{n-1} b^{n-1}$, |

1799 | yielding |

1800 | |

1801 | \begin{equation} |

1802 | \label{eq:lem:ccil0:sidv0:sgdu0:02:18} |

1803 | \frac{u}{v_{n-1} b^{n-1}} |

1804 | - \frac{u+1}{v} |

1805 | < 2 . |

1806 | \end{equation} |

1807 | |

1808 | We also assume that $u$ is the maximum value |

1809 | possible, $u=bv-1$, leading to |

1810 | |

1811 | \begin{equation} |

1812 | \label{eq:lem:ccil0:sidv0:sgdu0:02:19} |

1813 | \frac{bv-1}{v_{n-1} b^{n-1}} |

1814 | - b |

1815 | < 2 . |

1816 | \end{equation} |

1817 | |

1818 | Finally, we assume that $v$ is the upper limit in |

1819 | (\ref{eq:lem:ccil0:sidv0:sgdu0:02:14}), |

1820 | $v=b^n/2 + b^{n-1}/2 - 1$, and substitute the known value of |

1821 | $v_{n-1}$ for the case being proved, $v_{n-1} = b/2-1/2$, yielding |

1822 | |

1823 | \begin{equation} |

1824 | \label{eq:lem:ccil0:sidv0:sgdu0:02:20} |

1825 | \frac{b \left( \frac{b^n}{2} + \frac{b^{n-1}}{2} - 1 \right) - 1} |

1826 | {\left( \frac{b}{2} - \frac{1}{2} \right) b^{n-1}} |

1827 | - b |

1828 | < 2 . |

1829 | \end{equation} |

1830 | |

1831 | Simplification of (\ref{eq:lem:ccil0:sidv0:sgdu0:02:20}) |

1832 | will establish that it is always true. This completes the proof. |

1833 | \end{vworklemmaproof} |

1834 | \vworklemmafooter{} |

1835 | |

1836 | Lemmas \ref{lem:ccil0:sidv0:sgdu0:01} and |

1837 | \ref{lem:ccil0:sidv0:sgdu0:02}, |

1838 | standing alone, lead to a good implementation of |

1839 | division without any further results. If it is known that |

1840 | $0 \leq \hat{q} - q \leq 2$, Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} |

1841 | can be trivially modified to only calculate |

1842 | $\hat{q}$ (omitting the additional tests in Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03}), |

1843 | and then |

1844 | to include up to two add-back steps |

1845 | (duplication of Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}). |

1846 | Although such an algorithm would be |

1847 | satisfactory, it has the disadvantage that the add-back steps would |

1848 | be executed very frequently, slowing the algorithm substantially, especially |

1849 | for long operands. |

1850 | We now show that the additional tests in Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:03} of |

1851 | Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} can eliminate |

1852 | altogether the case of $\hat{q}-q = 2$ (Lemma \ref{lem:ccil0:sidv0:sgdu0:03}), and |

1853 | can with a probability close to unity eliminate the |

1854 | case of $\hat{q}-q = 1$ (Lemmas \ref{lem:ccil0:sidv0:sgdu0:04} |

1855 | and \ref{lem:ccil0:sidv0:sgdu0:05}). Together these tests, which are present in |

1856 | the statement of Algorithm \ref{alg:ccil0:sidv0:sgdu0:01}, |

1857 | reduce add-back (Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:06}) to rare occurrence, and create a more |

1858 | efficient algorithm than would be possible with the |

1859 | results of Lemmas \ref{lem:ccil0:sidv0:sgdu0:01} and \ref{lem:ccil0:sidv0:sgdu0:02} alone. |

1860 | |

1861 | \begin{vworklemmastatementpar} |

1862 | {\mbox{\boldmath$\hat{q} v_{n-2} \leq b \hat{r} + u_{j+n-2} \vworkhimp 0 \leq \hat{q} - q \leq 1$}} |

1863 | \label{lem:ccil0:sidv0:sgdu0:03} |

1864 | If the divisor normalization requirement ($v_{n-1} \geq \lfloor b/2 \rfloor$) as specified in |

1865 | Step \ref{enumstep:alg:ccil0:sidv0:sgdu0:01:01} of |

1866 | Algorithm \ref{alg:ccil0:sidv0:sgdu0:01} is met, then |

1867 | |

1868 | \begin{equation} |

1869 | \label{eq:lem:ccil0:sidv0:sgdu0:03:01} |

1870 | \hat{q} v_{n-2} \leq b \hat{r} + u_{j+n-2} \vworkhimp 0 \leq \hat{q} - q \leq 1 . |

1871 | \end{equation} |

1872 | \end{vworklemmastatementpar} |

1873 | \begin{vworklemmaproof} |

1874 | For reference, note that: |

1875 | |

1876 | \begin{eqnarray} |

1877 | \label{eq:lem:ccil0:sidv0:sgdu0:03:02} |

1878 | u & = & u_{j+n} b^{j+n} + u_{j+n-1} b^{j+n-1} + \ldots{} + u_1 b + u_0 \\ |

1879 | \label{eq:lem:ccil0:sidv0:sgdu0:03:03} |

1880 | v & = & v_{n-1} b^{n-1} + v_{n-2} b^{n-2} + \ldots{} + v_1 b + v_0 |

1881 | \end{eqnarray} |

1882 | |

1883 | By definition, we the remainder |

1884 | $\hat{r}$ has the value |

1885 | $\hat{r} = u_{j+n} b + u_{j+n-1} - \hat{q} v_{n-1}$. Substituting this value |

1886 | into (\ref{eq:lem:ccil0:sidv0:sgdu0:03:01}) produces |

1887 | |

1888 | \begin{equation} |

1889 | \label{eq:lem:ccil0:sidv0:sgdu0:03:04} |

1890 | \hat{q} v_{n-2} \leq b (u_{j+n} b + u_{j+n-1} - \hat{q} v_{n-1}) + u_{j+n-2} , |

1891 | \end{equation} |

1892 | |

1893 | and solving for $\hat{q}$ yields |

1894 | |

1895 | \begin{equation} |

1896 | \label{eq:lem:ccil0:sidv0:sgdu0:03:05} |

1897 | \hat{q} \leq \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}}. |

1898 | \end{equation} |

1899 | |

1900 | It is known from Lemma \ref{lem:ccil0:sidv0:sgdu0:01} that the type of estimate |

1901 | represented by the floor of the right-hand size of (\ref{eq:lem:ccil0:sidv0:sgdu0:03:05}) |

1902 | can be no less than $q$, leading to |

1903 | |

1904 | \begin{eqnarray} |

1905 | \label{eq:lem:ccil0:sidv0:sgdu0:03:06} |

1906 | & \displaystyle q \leq \hat{q} |

1907 | \leq |

1908 | \left\lfloor { \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} } \right\rfloor & \\ |

1909 | \nonumber & \displaystyle \leq |

1910 | \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}}. & |

1911 | \end{eqnarray} |

1912 | |

1913 | Because $q, \hat{q} \in \vworkintset$, it is only necessary to prove that |

1914 | |

1915 | \begin{equation} |

1916 | \label{eq:lem:ccil0:sidv0:sgdu0:03:07} |

1917 | \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} - |

1918 | \left\lfloor \frac{u}{v} \right\rfloor |

1919 | < 2 |

1920 | \end{equation} |

1921 | |

1922 | in order to prove that $\hat{q}-q \leq 1$. Using |

1923 | (\cmtnzeroxrefhyphen\ref{eq:cmtn0:sfcf0:02}), |

1924 | (\ref{eq:lem:ccil0:sidv0:sgdu0:03:07}) can be rewritten as |

1925 | |

1926 | \begin{equation} |

1927 | \label{eq:lem:ccil0:sidv0:sgdu0:03:08} |

1928 | \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} - |

1929 | \frac{u}{v} - |

1930 | \frac{u \bmod v}{v} |

1931 | < 2 . |

1932 | \end{equation} |

1933 | |

1934 | In order for implication to hold, we must make the most pessimistic |

1935 | assumptions about $u \bmod v$ (those which maximize it). The maximum value |

1936 | of $u \bmod v$ is $v-1$, leading to |

1937 | |

1938 | \begin{equation} |

1939 | \label{eq:lem:ccil0:sidv0:sgdu0:03:09} |

1940 | \frac{u_{j+n} b^2 + u_{j+n-1}b + u_{j+n-2}}{v_{n-1}b + v_{n-2}} - |

1941 | \frac{u + 1}{v} |

1942 | < 1 . |

1943 | \end{equation} |

1944 | |

1945 | In order to maximize the left side of (\ref{}), |

1946 | we must assume that $u$ is maximized, $v$ is minimized, |

1947 | |

1948 | |

1949 | \end{vworklemmaproof} |

1950 | \vworklemmafooter{} |

1951 | |

1952 | \begin{vworklemmastatementpar} |

1953 | {\mbox{\boldmath$\hat{q} v_{n-2} > b \hat{r} + u_{n-2} \vworkhimp q < \hat{q}$}} |

1954 | \label{lem:ccil0:sidv0:sgdu0:04} |

1955 | Given $\hat{q} > 0$, an estimate of $q$, and the remainder |

1956 | based on the estimate, $\hat{r} = u_n b + u_{n-1} - \hat{q} v_{n-1}$, |

1957 | |

1958 | \begin{equation} |

1959 | \label{eq:lem:ccil0:sidv0:sgdu0:04:01} |

1960 | \hat{q} v_{n-2} > b \hat{r} + u_{n-2} \vworkhimp \hat{q} > q . |

1961 | \end{equation} |

1962 | \end{vworklemmastatementpar} |

1963 | \begin{vworklemmaproof} |

1964 | We make the assumption that $v_{n-1}$ and $v_{n-2}$ are not both 0. |

1965 | $v_{n-1} > 0$ is guaranteed by |

1966 | the normalization of |

1967 | $v$ in Algorithm \ref{alg:ccil0:sidv0:sgdu0:01}. |

1968 | |

1969 | \begin{equation} |

1970 | \label{eq:lem:ccil0:sidv0:sgdu0:04:02} |

1971 | \hat{q} v_{n-2} > b \hat{r} + u_{n-2} |

1972 | \end{equation} |

1973 | |

1974 | \begin{equation} |

1975 | \label{eq:lem:ccil0:sidv0:sgdu0:04:03} |

1976 | \hat{q} v_{n-2} > b (u_n b + u_{n-1} - \hat{q} v_{n-1}) + u_{n-2} |

1977 | \end{equation} |

1978 | |

1979 | Solving (\ref{eq:lem:ccil0:sidv0:sgdu0:04:03}) for $\hat{q}$ yields |

1980 | |

1981 | \begin{equation} |

1982 | \label{eq:lem:ccil0:sidv0:sgdu0:04:04} |

1983 | \hat{q} |

1984 | > |

1985 | \frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}} |

1986 | \geq |

1987 | \left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor |

1988 | . |

1989 | \end{equation} |

1990 | |

1991 | Note that the right-hand term of |

1992 | (\ref{eq:lem:ccil0:sidv0:sgdu0:04:04}) |

1993 | is similar in form to the estimate $\hat{q}$ in |

1994 | Lemma \ref{lem:ccil0:sidv0:sgdu0:01}, where it is proved that |

1995 | $\hat{q} \geq q$. It is possible to use identical algebraic technique |

1996 | as is used in Lemma \ref{lem:ccil0:sidv0:sgdu0:01} in order to prove that |

1997 | |

1998 | \begin{equation} |

1999 | \label{eq:lem:ccil0:sidv0:sgdu0:04:05} |

2000 | \left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor |

2001 | \geq q, |

2002 | \end{equation} |

2003 | |

2004 | and it follows that |

2005 | |

2006 | \begin{equation} |

2007 | \label{eq:lem:ccil0:sidv0:sgdu0:04:06} |

2008 | \hat{q} |

2009 | > |

2010 | \frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}} |

2011 | \geq |

2012 | \left\lfloor {\frac{u_n b^2 + u_{n-1} b + u_{n-2}}{v_{n-1} b + v_{n-2}}} \right\rfloor |

2013 | \geq q, |

2014 | \end{equation} |

2015 | |

2016 | and thus $\hat{q} > q$. |

2017 | \end{vworklemmaproof} |

2018 | \vworklemmafooter{} |

2019 | |

2020 | \begin{vworklemmastatementpar} |

2021 | {Add-back occurs with approximate probability \mbox{\boldmath$2/b$}} |

2022 | \label{lem:ccil0:sidv0:sgdu0:05} |

2023 | The estimate of $q$ provided by (\ref{eq:ccil0:sidv0:sgdu0:01}), |

2024 | \end{vworklemmastatementpar} |

2025 | \begin{vworklemmaproof} |

2026 | \end{vworklemmaproof} |

2027 | \vworklemmafooter{} |

2028 | |

2029 | |

2030 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2031 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2032 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2033 | \subsection{Division Of Signed Operands} |

2034 | |

2035 | |

2036 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2037 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2038 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2039 | \subsection{Large Dividends With Machine-Native Divisors} |

2040 | \label{ccil0:sidv0:sldm0} |

2041 | |

2042 | Division of arbitrary-sized operands (Section \ref{ccil0:sidv0:sgdu0}) is |

2043 | a costly operation. In many practical applications, we are able to exploit |

2044 | data sizes of operands or special relationships between the values of |

2045 | operands to use the instruction set of the machine more effectively. |

2046 | In this subsection, we investigate what optimizations we may achieve when: |

2047 | |

2048 | \begin{itemize} |

2049 | \item We wish to calculate the quotient and remainder of |

2050 | unsigned integers $p$ and $q$: $p/q$ and |

2051 | $p \bmod{} q$; \emph{and} |

2052 | \item The machine possesses unsigned division instructions |

2053 | which provide both a quotient and a remainder from |

2054 | a division; \emph{and} |

2055 | \item The bitsize of the divisor $q$ is not larger than can |

2056 | be accomodated (as a divisor) by machine division instructions. |

2057 | \end{itemize} |

2058 | |

2059 | Processors which possess integer division instructions usually |

2060 | possess one of two types of instructions: |

2061 | |

2062 | \begin{itemize} |

2063 | \item Instructions where the the divisor, quotient and remainder are |

2064 | $Q$ bits, but the dividend is $2Q$ bits (we call these |

2065 | ``large dividend'' instructions). For example, an |

2066 | instruction which accepts a 16-bit dividend and an |

2067 | 8-bit divisor to produce an 8-bit quotient and an 8-bit remainder is |

2068 | typical. |

2069 | With such instructions, overflow is possible, and is always detectable. |

2070 | However, in this subsection we never describe algorithms which detect |

2071 | overflow---instead, we arrange for data values which cannot generate an |

2072 | overflow. |

2073 | \item Instructions where the dividend, divisor, quotient, and remainder |

2074 | are all $Q$ bits (we call these ``small dividend'' instructions). |

2075 | With such instructions, overflow is not possible. |

2076 | \end{itemize} |

2077 | |

2078 | We call the bitsize $Q$ a \emph{chunk}. We use \emph{chunk} rather than |

2079 | \emph{word} because the chunksize and wordsize in general are not |

2080 | required to be the same. |

2081 | |

2082 | For the remainder of this discussion, we assume large dividend |

2083 | instructions (the first category above). |

2084 | The algorithms developed can be implemented on small dividend machines |

2085 | by halving data sizes so that the divisor fills no more than half |

2086 | of the available bits. |

2087 | |

2088 | We assume that we are interested in calculating the integer quotient |

2089 | $\lfloor{}p/q\rfloor$ and remainder $p \bmod{} q$ of two unsigned |

2090 | integers $p$ and $q$, where $p$ is of size $P$ bits and $q$ is |

2091 | of size $Q$ bits. For simplicity and without detracting from the |

2092 | generality of the solution, we assume that $Q \vworkdivides{} P$. |

2093 | |

2094 | We then seek to calculate |

2095 | |

2096 | \begin{eqnarray} |

2097 | \label{eq:ccil0:sidv0:sldm0:001} |

2098 | \frac{p}{q} & = & \frac{2^{P-Q} p_{[P/Q-1]} + 2^{P-2Q} p_{[P/Q-2]} + \ldots{} + 2^{Q} p_{[1]} + p_{[0]}}{q_{[0]}} \\ |

2099 | \nonumber & = & \frac{\sum_{i=0}^{P/Q-1} 2^{iQ} p_{[i]}}{q_{[0]}} . |

2100 | \end{eqnarray} |

2101 | |

2102 | \noindent{}Using the integer identity |

2103 | |

2104 | \begin{equation} |

2105 | \label{eq:ccil0:sidv0:sldm0:002} |

2106 | \frac{a}{b} = \left\lfloor{\frac{a}{b}}\right\rfloor + \frac{a \bmod b}{b} , |

2107 | \end{equation} |

2108 | |

2109 | \noindent{}we can reform (\ref{eq:ccil0:sidv0:sldm0:001}) into |

2110 | |

2111 | \begin{eqnarray} |

2112 | \nonumber |

2113 | \frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\ |

2114 | \label{eq:ccil0:sidv0:sldm0:003} |

2115 | & + & 2^{P-Q} \frac{p_{[P/Q-1]}\bmod q_{[0]}}{q_{[0]}} |

2116 | + 2^{P-2Q} \frac{p_{[P/Q-2]}}{q_{[0]}} \\ |

2117 | \nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} , |

2118 | \end{eqnarray} |

2119 | |

2120 | \noindent{}which can be polished slightly to yield |

2121 | |

2122 | \begin{eqnarray} |

2123 | \nonumber |

2124 | \frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\ |

2125 | \label{eq:ccil0:sidv0:sldm0:004} |

2126 | & + & 2^{P-2Q} \frac{2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}}{q_{[0]}} \\ |

2127 | \nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} . |

2128 | \end{eqnarray} |

2129 | |

2130 | Note in (\ref{eq:ccil0:sidv0:sldm0:004}) that the first term, |

2131 | $\lfloor{}p_{[P/Q-1]} / q_{[0]}\rfloor$, as well as |

2132 | a portion of the second term, $p_{[P/Q-1]}\bmod q_{[0]}$, can be |

2133 | calculated using a single machine division instruction with |

2134 | $p_{[P/Q-1]}$ as the dividend and $q_{[0]}$ as the divisor. |

2135 | Note also that multiplication of an integer by a power of 2 |

2136 | can be achieved by placing the integer correctly within the |

2137 | result. In this regard note that $Q=8$ or $Q=16$ are |

2138 | the most typical cases, and so the placement can be achieved simply |

2139 | by selecting the correct memory address. |

2140 | |

2141 | It is initially unclear whether we can evaluate or reduce the fraction in the |

2142 | second term of (\ref{eq:ccil0:sidv0:sldm0:004}), |

2143 | $[2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}] / q_{[0]}$, |

2144 | using a single large dividend machine instruction, because |

2145 | the upper chunk of the dividend is populated with non-zero bits |

2146 | (specifically, $p_{[P/Q-1]}\bmod q_{[0]}$), and it seems that |

2147 | a division overflow may be possible. However, with some thought, |

2148 | it is clear that $p_{[P/Q-1]}\bmod q_{[0]} \leq q_{[0]} - 1$ and |

2149 | $p_{[P/Q-2]} \leq 2^Q - 1$, thus the largest numerator possible |

2150 | is $2^Q q_{[0]} - 1$, which, when divided by $q_{[0]}$, will result |

2151 | in a quotient and remainder of $2^Q - 1$. Thus, no division overflow |

2152 | can occur, and the fraction in the |

2153 | second term of (\ref{eq:ccil0:sidv0:sldm0:004}) can be evaluated |

2154 | using a large divisor integer machine instruction. |

2155 | |

2156 | The fraction in the |

2157 | second term of (\ref{eq:ccil0:sidv0:sldm0:004}) can be simplified |

2158 | using (\ref{eq:ccil0:sidv0:sldm0:002}) to yield: |

2159 | |

2160 | \begin{eqnarray} |

2161 | \nonumber |

2162 | \frac{p}{q} & = & 2^{P-Q} \left\lfloor{\frac{p_{[P/Q-1]}}{q_{[0]}}}\right\rfloor \\ |

2163 | \label{eq:ccil0:sidv0:sldm0:005} |

2164 | & + & 2^{P-2Q} \left\lfloor{\frac{2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}}{q_{[0]}}}\right\rfloor \\ |

2165 | \nonumber & + & 2^{P-2Q} \frac{(2^Q (p_{[P/Q-1]}\bmod q_{[0]}) + p_{[P/Q-2]}) \bmod q_{[0]}}{q_{[0]}} \\ |

2166 | \nonumber & + & \frac{\sum_{i=0}^{P/Q-3} 2^{iQ} p_{[i]}}{q_{[0]}} . |

2167 | \end{eqnarray} |

2168 | |

2169 | The process of combining adjacent terms can be continued until all |

2170 | divisions and modulo operations necessary can be carried out using |

2171 | long dividend division instructions. If we envision a |

2172 | long-dividend division instruction as a functional block that |

2173 | accepts a $2Q$-bit dividend and a $Q$-bit divisor to produce a |

2174 | $Q$-bit quotient and a $Q$-bit remainder |

2175 | (Figure \ref{fig:ccil0:sidv0:sldm0:00}), then we can draw the |

2176 | entire division as outlined by (\ref{eq:ccil0:sidv0:sldm0:005}) |

2177 | as shown in Figure \ref{fig:ccil0:sidv0:sldm0:01}. |

2178 | |

2179 | \begin{figure} |

2180 | \centering |

2181 | \includegraphics[width=4.6in]{c_cil0/lddvblk.eps} |

2182 | \caption{Long Dividend Division Machine Instruction As A Functional Block} |

2183 | \label{fig:ccil0:sidv0:sldm0:00} |

2184 | \end{figure} |

2185 | |

2186 | \begin{figure} |

2187 | \centering |

2188 | \includegraphics[width=4.6in]{c_cil0/ldmnblk.eps} |

2189 | \caption{Long Dividend/Machine-Native Divisor Division In Functional Block Form} |

2190 | \label{fig:ccil0:sidv0:sldm0:01} |

2191 | \end{figure} |

2192 | |

2193 | The following example illustrates how to apply the technique. |

2194 | |

2195 | \begin{vworkexamplestatement} |

2196 | \label{ex:ccil0:sidv0:sldm0:01} |

2197 | Implement 32/8 unsigned division on the TMS370C8 processor, which is |

2198 | characterized by a 16/8 division instruction. |

2199 | \end{vworkexamplestatement} |

2200 | \begin{vworkexampleparsection}{Solution} |

2201 | It would be possible to prepare an implementation directly from |

2202 | Figure \ref{fig:ccil0:sidv0:sldm0:01}: however, it may be |

2203 | more instructive to work through a solution without the |

2204 | aid of this figure. |

2205 | |

2206 | In the case of the TMS370C8, the chunk size $Q$ is 8 bits; therefore |

2207 | $Q=8$. The problem statement indicates that we must accept 32-bit dividends; |

2208 | therefore $P=32$. Thus |

2209 | |

2210 | \begin{equation} |

2211 | \label{eq:ex:ccil0:sidv0:sldm0:01:001} |

2212 | p = 2^{24} p_{[3]} + 2^{16} p_{[2]} + 2^{8} p_{[1]} + p_{[0]} |

2213 | \end{equation} |

2214 | |

2215 | \noindent{}and |

2216 | |

2217 | \begin{equation} |

2218 | \label{eq:ex:ccil0:sidv0:sldm0:01:002} |

2219 | q = q_{[0]} . |

2220 | \end{equation} |

2221 | |

2222 | \noindent{}Thus the quotient and remainder we would like to determine, |

2223 | $\lfloor p/q \rfloor$ and $p \bmod q$, can be obtained by repeated |

2224 | application of (\ref{eq:ccil0:sidv0:sldm0:002}) as shown |

2225 | in Equations (\ref{eq:ex:ccil0:sidv0:sldm0:01:003}) |

2226 | through (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}). |

2227 | |

2228 | \begin{equation} |

2229 | \label{eq:ex:ccil0:sidv0:sldm0:01:003} |

2230 | \frac{p}{q} |

2231 | = |

2232 | \frac{2^{24} p_{[3]} + 2^{16} p_{[2]} + 2^{8} p_{[1]} + p_{[0]}}{q_{[0]}} |

2233 | \end{equation} |

2234 | |

2235 | \begin{equation} |

2236 | \label{eq:ex:ccil0:sidv0:sldm0:01:004} |

2237 | \frac{p}{q} |

2238 | = |

2239 | 2^{24} \frac{p_{[3]}}{q_{[0]}} |

2240 | + |

2241 | 2^{16} \frac{p_{[2]}}{q_{[0]}} |

2242 | + |

2243 | 2^{8} \frac{p_{[1]}}{q_{[0]}} |

2244 | + |

2245 | \frac{p_{[0]}}{q_{[0]}} |

2246 | \end{equation} |

2247 | |

2248 | \begin{equation} |

2249 | \label{eq:ex:ccil0:sidv0:sldm0:01:005} |

2250 | \frac{p}{q} |

2251 | = |

2252 | 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2253 | + |

2254 | 2^{24} \frac{(p_{[3]} \bmod q_{[0]})}{q_{[0]}} |

2255 | + |

2256 | 2^{16} \frac{p_{[2]}}{q_{[0]}} |

2257 | + |

2258 | 2^{8} \frac{p_{[1]}}{q_{[0]}} |

2259 | + |

2260 | \frac{p_{[0]}}{q_{[0]}} |

2261 | \end{equation} |

2262 | |

2263 | \begin{equation} |

2264 | \label{eq:ex:ccil0:sidv0:sldm0:01:006} |

2265 | \frac{p}{q} |

2266 | = |

2267 | 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2268 | + |

2269 | 2^{16} \frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}} |

2270 | + |

2271 | 2^{8} \frac{p_{[1]}}{q_{[0]}} |

2272 | + |

2273 | \frac{p_{[0]}}{q_{[0]}} |

2274 | \end{equation} |

2275 | |

2276 | \begin{eqnarray} |

2277 | \label{eq:ex:ccil0:sidv0:sldm0:01:007} |

2278 | \frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2279 | + |

2280 | 2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |

2281 | \nonumber & + & |

2282 | 2^{16} \frac{(2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}}{q_{[0]}} |

2283 | + |

2284 | 2^{8} \frac{p_{[1]}}{q_{[0]}} |

2285 | + |

2286 | \frac{p_{[0]}}{q_{[0]}} |

2287 | \end{eqnarray} |

2288 | |

2289 | \begin{eqnarray} |

2290 | \label{eq:ex:ccil0:sidv0:sldm0:01:008} |

2291 | \frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2292 | + |

2293 | 2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |

2294 | \nonumber & + & |

2295 | 2^{8} \frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}} |

2296 | + |

2297 | \frac{p_{[0]}}{q_{[0]}} |

2298 | \end{eqnarray} |

2299 | |

2300 | \begin{eqnarray} |

2301 | \nonumber\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2302 | + |

2303 | 2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |

2304 | \label{eq:ex:ccil0:sidv0:sldm0:01:009} |

2305 | & + & |

2306 | 2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor \\ |

2307 | \nonumber & + & |

2308 | 2^{8} \frac{(2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}}{q_{[0]}} \\ |

2309 | \nonumber & + & |

2310 | \frac{p_{[0]}}{q_{[0]}} |

2311 | \end{eqnarray} |

2312 | |

2313 | \begin{eqnarray} |

2314 | \nonumber\frac{p}{q} & = & 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2315 | + |

2316 | 2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor \\ |

2317 | \label{eq:ex:ccil0:sidv0:sldm0:01:010} |

2318 | & + & |

2319 | 2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor \\ |

2320 | \nonumber & + & |

2321 | \frac{2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}}{q_{[0]}} |

2322 | \end{eqnarray} |

2323 | |

2324 | \begin{eqnarray} |

2325 | \nonumber & \displaystyle \frac{p}{q} = 2^{24} \left\lfloor{\frac{p_{[3]}}{q_{[0]}}}\right\rfloor |

2326 | + |

2327 | 2^{16} \left\lfloor{\frac{2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}}{q_{[0]}}}\right\rfloor & \\ |

2328 | \label{eq:ex:ccil0:sidv0:sldm0:01:011} |

2329 | & \displaystyle + |

2330 | 2^{8} \left\lfloor{\frac{2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}}{q_{[0]}}}\right\rfloor & \\ |

2331 | \nonumber & \displaystyle + |

2332 | \left\lfloor{\frac{2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}}{q_{[0]}}}\right\rfloor + & \\ |

2333 | \nonumber |

2334 | & \displaystyle \frac{(2^8((2^8((2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}) \bmod q_{[0]}) + p_{[1]}) \bmod q_{[0]}) + p_{[0]}) \bmod q_{[0]}}{q_{[0]}} |

2335 | & |

2336 | \end{eqnarray} |

2337 | |

2338 | Note several things about the implementation suggested by |

2339 | (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}): |

2340 | |

2341 | \begin{itemize} |

2342 | \item No addition or multiplication is required to calculate terms such as |

2343 | $2^8 (p_{[3]} \bmod q_{[0]}) + p_{[2]}$. The high-order byte of the |

2344 | large dividend can be stuffed with $p_{[3]} \bmod q_{[0]}$ and |

2345 | the low-order byte with $p_{[2]}$. |

2346 | \item No addition or multiplication is required to calculate the |

2347 | result $\lfloor p/q \rfloor$. |

2348 | Note in (\ref{eq:ex:ccil0:sidv0:sldm0:01:011}) that the results are |

2349 | conveniently grouped as bytes with multipliers of $2^{24}$, |

2350 | $2^{16}$, $2^8$, and $2^0=1$. The terms can simply be placed into |

2351 | the appropriate byte, as a way of multplication by the appropriate |

2352 | power of 2. Note also that each term is guaranteed to be |

2353 | $\in \{0, 1, 2, \ldots{} , 255\}$, by the argument presented |

2354 | earlier for (\ref{eq:ccil0:sidv0:sldm0:004}). Thus, the |

2355 | addition will result in no carries, and each result byte can simply |

2356 | be placed directly in the correct memory location---addition is |

2357 | not necessary. |

2358 | \item Four machine division instructions are required, and the remainder |

2359 | is produced automatically by the fourth instruction. |

2360 | \end{itemize} |

2361 | |

2362 | An implemenation for the TMS370C8 is supplied as Figure |

2363 | \ref{fig:ex:ccil0:sidv0:sldm0:01:01}. A block diagram of the data |

2364 | flow for this implementation is supplied as |

2365 | Figure \ref{fig:ex:ccil0:sidv0:sldm0:01:02}. |

2366 | \end{vworkexampleparsection} |

2367 | \vworkexamplefooter{} |

2368 | |

2369 | \begin{figure} |

2370 | \begin{verbatim} |

2371 | ;Assume that byte memory locations p3, p2, p1, and p0 contain the |

2372 | ;32-bit unsigned dividend, and byte q0 contains the 8-bit unsigned |

2373 | ;divisor. Assume also that the result quotient will be placed |

2374 | ;in byte memory locations d3, d2, d1, and d0; and that the |

2375 | ;remainder will be placed in the byte memory location r0. Further |

2376 | ;assume that all memory locations are in the register file (near). |

2377 | CLR A ;High-order chunk of large divisor |

2378 | ;must be 0. |

2379 | MOV p3, B ;Load the low-order chunk of divisor. |

2380 | DIV q0, A ;Perform the first division. |

2381 | MOV A, d3 ;Quotient becomes this part of the |

2382 | ;result. |

2383 | MOV B, A ;Remainder becomes high-order chunk of |

2384 | ;next division. |

2385 | MOV p2, B ;Next byte becomes low-order chunk. |

2386 | DIV q0, A ;Do the second division. |

2387 | MOV A, d2 ;Quotient becomes this part of the |

2388 | ;result. |

2389 | MOV B, A ;Remainder becomes high-order chunk of |

2390 | ;next division. |

2391 | MOV p1, B ;Next byte becomes low-order chunk. |

2392 | DIV q0, A ;Do the third division. |

2393 | MOV A, d1 ;Quotient becomes this part of the |

2394 | ;result. |

2395 | MOV B, A ;Remainder becomes high-order chunk of |

2396 | ;next division. |

2397 | MOV p0, B ;Next byte becomes low-order chunk. |

2398 | DIV q0, A ;Do the fourth division. |

2399 | MOV A, d0 ;Quotient becomes this part of the |

2400 | ;result. |

2401 | MOV B, r0 ;This is the remainder, which could be used |

2402 | ;for rounding. |

2403 | \end{verbatim} |

2404 | \caption{TMS370C8 Code Snippet Illustrating Unsigned 32/8 |

2405 | Division Using Unsigned 16/8 |

2406 | Machine Instructions (Example \ref{ex:ccil0:sidv0:sldm0:01})} |

2407 | \label{fig:ex:ccil0:sidv0:sldm0:01:01} |

2408 | \end{figure} |

2409 | |

2410 | \begin{figure} |

2411 | \centering |

2412 | \includegraphics[width=4.6in]{c_cil0/t370dmp.eps} |

2413 | \caption{Block Diagram Of Data Flow Of Figure \ref{fig:ex:ccil0:sidv0:sldm0:01:01}} |

2414 | \label{fig:ex:ccil0:sidv0:sldm0:01:02} |

2415 | \end{figure} |

2416 | |

2417 | |

2418 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2419 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2420 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2421 | \section{Miscellaneous Integer Mappings} |

2422 | %Section tag: MIM0 |

2423 | \label{ccil0:smim0} |

2424 | |

2425 | Embedded system work and ROM constraints often inspire a great deal |

2426 | of cleverness in the selection of instructions to perform mappings or |

2427 | tests. In this section, we discuss integer mappings (i.e. functions) |

2428 | for which economical implementations are known; and in the next section |

2429 | (Section \ref{ccil0:smit0}) |

2430 | we discuss integer tests for which economical implementations are known. |

2431 | |

2432 | To the best of our knowledge, there is no way to derive these mappings |

2433 | and tests---they have been collected from many software developers and |

2434 | come from human intuition and experience. |

2435 | |

2436 | |

2437 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2438 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2439 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2440 | \subsection{Lowest-Order Bit} |

2441 | %Subsection tag: LIB0 |

2442 | \label{ccil0:smim0:slib0} |

2443 | |

2444 | \index{lowest-order bit} |

2445 | \index{least significant bit} |

2446 | |

2447 | The mapping |

2448 | |

2449 | \texttt{mask = x \& -x} |

2450 | |

2451 | \noindent{}is the most economical way known to extract the |

2452 | lowest-order bit set in an integer \texttt{x}, or |

2453 | 0 if no bits are set.\footnote{This mapping was contributed by |

2454 | David Baker (\texttt{bakerda@engin.umich.edu}) |

2455 | and Raul Selgado (\texttt{rselgado@visteon.com}).} Since most processors have an instruction to form the |

2456 | two's complement of an integer, this mapping usually requires only |

2457 | two arithmetic instructions. |

2458 | |

2459 | When implementing this mapping in assembly-language on processors without a |

2460 | two's complement instruction, two other possible implementations are: |

2461 | |

2462 | \begin{itemize} |

2463 | \item \texttt{mask = x \& ($\sim$x + 1)} |

2464 | \item \texttt{mask = x \& ((x \^{ } -1) + 1)} |

2465 | \end{itemize} |

2466 | |

2467 | |

2468 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2469 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2470 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2471 | \section{Miscellaneous Integer Tests} |

2472 | %Section tag: MIT0 |

2473 | \label{ccil0:smit0} |

2474 | |

2475 | \subsection{Power Of 2} |

2476 | %Subsection tag: PTW0 |

2477 | \label{ccil0:smit0:sptw0} |

2478 | |

2479 | \index{power of two} |

2480 | \index{2N@$2^N$} |

2481 | |

2482 | The test |

2483 | |

2484 | \texttt{(x \& (x-1) == 0) \&\& (x != 0)} |

2485 | |

2486 | \noindent{}is the most economical way known to |

2487 | test whether an integer is a positive power of two |

2488 | (1, 2, 4, 8, 16, etc.).\footnote{The test appeared as part of |

2489 | a discussion on |

2490 | the GMP mailing list in 2002.} |

2491 | |

2492 | |

2493 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2494 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2495 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2496 | \section{Exercises} |

2497 | |

2498 | \begin{vworkexercisestatement} |

2499 | \label{exe:ccil0:sexe0:01} |

2500 | Show that any $m$-bit two's complement integer $u_{[m-1:0]}$ except |

2501 | $-2^{m-1}$ can be negated by forming the one's complement, then adding one. |

2502 | \end{vworkexercisestatement} |

2503 | |

2504 | |

2505 | |

2506 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2507 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2508 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2509 | \vfill |

2510 | \noindent\begin{figure}[!b] |

2511 | \noindent\rule[-0.25in]{\textwidth}{1pt} |

2512 | \begin{tiny} |

2513 | \begin{verbatim} |

2514 | $RCSfile: c_cil0.tex,v $ |

2515 | $Source: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/dtaipubs/esrgubka/c_cil0/c_cil0.tex,v $ |

2516 | $Revision: 1.24 $ |

2517 | $Author: dtashley $ |

2518 | $Date: 2003/11/03 02:14:24 $ |

2519 | \end{verbatim} |

2520 | \end{tiny} |

2521 | \noindent\rule[0.25in]{\textwidth}{1pt} |

2522 | \end{figure} |

2523 | |

2524 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |

2525 | % $Log: c_cil0.tex,v $ |

2526 | % Revision 1.24 2003/11/03 02:14:24 dtashley |

2527 | % All duplicate labels as flagged by LaTeX removed. Additional files added |

2528 | % to Microsoft Visual Studio edit context. |

2529 | % |

2530 | % Revision 1.23 2002/08/26 17:57:03 dtashley |

2531 | % Additional solutions chapter added. Precautionary checkin to be sure |

2532 | % that I've captured all changes. |

2533 | % |

2534 | % Revision 1.22 2002/08/25 13:18:21 dtashley |

2535 | % Edits. |

2536 | % |

2537 | % Revision 1.21 2002/08/22 00:33:33 dtashley |

2538 | % Have made aesthetic changes in CFRY0 and CCFR0. Checking in all before |

2539 | % rebuild of book. |

2540 | % |

2541 | % Revision 1.20 2002/08/21 09:12:11 dtashley |

2542 | % Safety checkin. |

2543 | % |

2544 | % Revision 1.19 2002/08/19 05:06:45 dtashley |

2545 | % Substantial progress in integer numerics chapter. Preparing to |

2546 | % work at home. |

2547 | % |

2548 | % Revision 1.18 2002/08/10 07:34:48 dtashley |

2549 | % Checkin before resuming work on a lemma at home. |

2550 | % |

2551 | % Revision 1.17 2002/08/09 03:13:42 dtashley |

2552 | % Significant lemma proved, and other progress. |

2553 | % |

2554 | % Revision 1.16 2002/08/06 22:15:24 dtashley |

2555 | % Substantial lemma proof progress. |

2556 | % |

2557 | % Revision 1.15 2002/08/05 01:28:33 dtashley |

2558 | % Lemma demonstrating that 0 \leq \hat{q}-q \leq 2 completed. |

2559 | % |

2560 | % Revision 1.14 2002/08/03 23:04:57 dtashley |

2561 | % Safety checkin. Having difficulty: may have solved the wrong |

2562 | % problem. |

2563 | % |

2564 | % Revision 1.13 2002/08/01 01:31:24 dtashley |

2565 | % Safety checkin. Having difficulty completing an alternate version of |

2566 | % Knuth's proof about the accuracy of quotient estimates based on first |

2567 | % digits of dividend and divisor. |

2568 | % |

2569 | % Revision 1.12 2002/07/31 04:38:31 dtashley |

2570 | % Edits to basic integer algorithms chapter, safety checkin. |

2571 | % |

2572 | % Revision 1.11 2002/07/29 16:30:08 dtashley |

2573 | % Safety checkin before moving work back to WSU server Kalman. |

2574 | % |

2575 | % Revision 1.10 2002/07/26 20:39:43 dtashley |

2576 | % Safety checkin. |

2577 | % |

2578 | % Revision 1.9 2002/07/26 14:58:01 dtashley |

2579 | % Safety checkin. |

2580 | % |

2581 | % Revision 1.8 2002/07/21 16:19:28 dtashley |

2582 | % Finalization of material about integer division when dividend is |

2583 | % long but divisor is same as data size that machine can use as |

2584 | % divisor natively. |

2585 | % |

2586 | % Revision 1.7 2002/07/20 23:01:45 dtashley |

2587 | % Adding material about division with large dividend and machine-native |

2588 | % divisor. |

2589 | % |

2590 | % Revision 1.6 2002/06/11 04:34:19 dtashley |

2591 | % Information for David Baker added. |

2592 | % |

2593 | % Revision 1.5 2002/06/08 05:50:25 dtashley |

2594 | % Added missing tilde. Can't get it to be same size as other |

2595 | % characters--need to research that. |

2596 | % |

2597 | % Revision 1.4 2002/06/07 02:08:32 dtashley |

2598 | % Section added for integer mappings and tests. Mapping for lowest-order |

2599 | % bit set from Raul Selgado and Dave (last name presently unknown) |

2600 | % at Visteon added. Test for power of 2 added. |

2601 | % |

2602 | % Revision 1.3 2001/08/31 23:11:17 dtashley |

2603 | % End of August 2001 safety check-in. |

2604 | % |

2605 | % Revision 1.2 2001/08/27 20:06:08 dtashley |

2606 | % Edits and substantial re-organization. |

2607 | % |

2608 | % Revision 1.1 2001/08/25 22:51:25 dtashley |

2609 | % Complex re-organization of book. |

2610 | % |

2611 | %End of file C_CIL0.TEX |

dashley@gmail.com | ViewVC Help |

Powered by ViewVC 1.1.25 |