/[dtapublic]/to_be_filed/uculib01/doc/manual/c_tbg0/c_tbg0.tex
ViewVC logotype

Contents of /to_be_filed/uculib01/doc/manual/c_tbg0/c_tbg0.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 30 - (show annotations) (download) (as text)
Sat Oct 8 07:22:17 2016 UTC (7 years, 5 months ago) by dashley
File MIME type: application/x-tex
File size: 32767 byte(s)
Initial commit.
1 %$Header: /home/dashley/cvsrep/uculib01/uculib01/doc/manual/c_tbg0/c_tbg0.tex,v 1.12 2010/04/05 15:13:50 dashley Exp $
2
3 \chapter{Technical Background}
4 \label{ctbg0}
5
6 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
8 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
9 \section{Introduction}
10 %Section tag: int0
11 \label{ctbg0:sint0}
12
13
14 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
15 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
16 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
17 \section{Interrupt Compatibility Issues}
18 %Section tag: ici0
19 \label{ctbg0:sici0}
20
21
22 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
23 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
24 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
25 \section{Linear Software Timers}
26 %Section tag: lst0
27 \label{ctbg0:slst0}
28
29 Several of the block memory functions described in
30 Chapter \ref{cbmf0} are designed primarily to
31 implement what we call here a
32 \index{software timer}\index{linear software timer}\emph{software timer} (although these functions
33 may have other uses).
34
35 We define a \index{software timer}\index{linear software timer}software timer as an unsigned integer
36 of any length convenient to the machine that:
37
38 \begin{itemize}
39 \item Is decremented or incremented at a constant rate,\footnote{It is also
40 possible to decrement (or increment) software timers at a non-constant rate to
41 approximate equations of the form $x(k+1) = \overline{\alpha} x(k)$,
42 $\overline{\alpha} < 1$ (or related
43 equations), but we don't discuss schemes of this type here. The advantage
44 of such a scheme would be to extend the maximum time period of the timer
45 while still keeping good precision at low values of time. We would call
46 such software timers \index{non-linear software timer}\index{software timer}%
47 \emph{non-linear} software timers to differentiate them
48 from the timers described here.}
49 but not
50 below or above the minimum or maximum value of an unsigned
51 integer.
52 \item Is set and tested by one process (the application) but
53 is decremented or incremented by another process (typically
54 system software).
55 \end{itemize}
56
57 Typically, linear software timers are grouped together as arrays of
58 unsigned integers that are decremented or incremented at the same rate (hence the
59 efficiency of the block manipulation described in Chapter \ref{cbmf0}).
60 Most commonly, the rates at which groups of timers may be decremented or
61 incremented are provided in binary decades (but 1-2-5 decades are also
62 not uncommon).
63
64 Linear software timers are the most efficient known way to manipulate time
65 in stateful logic where the logic is implemented as code (rather than data).
66 The efficiency comes about because:
67
68 \begin{itemize}
69 \item The timers are decremented and incremented centrally using
70 block memory operations (saves FLASH bulk
71 and CPU cycles).
72 \item The arrangement of timers in binary decades or 1-2-5 decades
73 allows the usage of smaller data types (at the expense of precision)
74 than would be possible if time were manipulated only in terms of one
75 base quantum.
76 \item The test for an expired timer ($t=0$) often compiles well because
77 tests against zero are less expensive than tests against other constants
78 in CPU instruction sets.
79 \end{itemize}
80
81 In most of the discussion which follows, we confine the discussion to
82 the case of software timers that are decremented, but not below zero.
83 (Software timers that are incremented are less common and have different
84 applications.)
85
86 The arrangement of software timers in most applications is in terms of
87 a base quantum, which we denote $\sigma$\@. $\sigma$=1 millisecond is
88 a common choice.
89
90 In a binary decade arrangement, the quantum used to decrement the
91 different arrays is $\sigma 2^q$, where $q$ is the ``tier'' of the array of
92 software timers. If $\sigma$=1 millisecond, ``Tier 0'' software timers
93 are decremented every millisecond, ``Tier 1'' software timers are decremented
94 every two milliseconds, ``Tier 2'' software timers are decremented
95 every four milliseconds, and so on.
96
97 Note that if a software timer (at tier $q$) is set to a value $n$
98 and then tested periodically to determine if it has reached zero, the
99 actual amount of time $t$ that may have elapsed is given by\footnote{In
100 order to obtain (\ref{eq:ctbg0:slst0:01}), we make the assumption that
101 the process that tests the software timer runs at least as often as the
102 software timer is decremented. There are scheduling scenarios that add
103 yet more to the error, and these are not developed here.}
104
105 \begin{equation}
106 \label{eq:ctbg0:slst0:01}
107 (n - 1) \sigma 2^q
108 <
109 t
110 \leq
111 n \sigma 2^q .
112 \end{equation}
113
114 \noindent{}(\ref{eq:ctbg0:slst0:01}) comes about because it is possible
115 that the timer was set to $n$ just before it was decremented.
116
117 We propose to choose
118
119 \begin{equation}
120 \label{eq:ctbg0:slst0:02}
121 n = \left\lfloor {\frac{\alpha}{\sigma 2^q}} \right\rfloor .
122 \end{equation}
123
124 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
125 \noindent\rule{\textwidth}{2pt}
126
127 TODO:
128
129 \begin{enumerate}
130 \item Break into subsections.
131 \item Develop scheduling assumptions that lead to (\ref{eq:ctbg0:slst0:01}) and friends.
132 \item Show relationship with timed automata.
133 \item Show state-space and complexity reduction advantages.
134 \end{enumerate}
135
136
137 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
138 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
139 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140 \section{Square Root Extraction (Integer)}
141 %Section tag: sre0
142 \label{ctbg0:ssre0}
143
144 This section presents information about the extraction of
145 square roots over an integer domain and integer range. Usually, the function
146 of interest is
147
148 \begin{equation}
149 \label{eq:ctbg0:ssre0:01}
150 f(x) = \left\lfloor \sqrt{x} \right\rfloor .
151 \end{equation}
152
153 Other functions of interest can generally be implemented in terms
154 of (\ref{eq:ctbg0:ssre0:01}).
155
156 If a square root with information after the radix point is desired, the argument $x$ can be premultiplied
157 by the square of the reciprocal of the precision after the radix point. For example,
158 a good approximation to $10 \sqrt{x}$ is
159
160 \begin{equation}
161 \label{eq:ctbg0:ssre0:02}
162 10 \sqrt{x} \approx g(x) = \left\lfloor 10 \sqrt{x} \right\rfloor
163 = \left\lfloor \sqrt{100 x} \right\rfloor ,
164 \end{equation}
165
166 \noindent{}which can be implemented in terms of (\ref{eq:ctbg0:ssre0:01})\@.
167 (\ref{eq:ctbg0:ssre0:02}) provides a result that is human-friendly (i.e. 453 is
168 interpreted as 45.3), but the radix point of the result is not positioned between
169 any two bits. A more typical approach might be to premultiply by an even
170 power of two so as to place the radix point of the result between two bits.
171 For example,
172
173 \begin{equation}
174 \label{eq:ctbg0:ssre0:02b}
175 2^8 \sqrt{x} \approx \left\lfloor 2^8 \sqrt{x} \right\rfloor
176 = \left\lfloor \sqrt{2^{16} x} \right\rfloor
177 \end{equation}
178
179 \noindent{}will calculate the square root of an integer
180 with eight bits after the radix point.
181
182 Similarly, if rounding to the nearest integer is desired, note that\footnote{To
183 prove (\ref{eq:ctbg0:ssre0:03}), consider four cases. \emph{Case 1:} If the fractional part of
184 $\sqrt{x}$ is 0, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
185 will be an integer, the result will be exact, and (\ref{eq:ctbg0:ssre0:03}) holds.
186 \emph{Case 2:} If the fractional part of
187 $\sqrt{x}$ is less than 0.5, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
188 will be an even integer, the square root will be rounded down, and (\ref{eq:ctbg0:ssre0:03}) holds.
189 \emph{Case 3:} If the fractional part of
190 $\sqrt{x}$ is equal to 0.5, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
191 will be an odd integer, the square root will be rounded up, and (\ref{eq:ctbg0:ssre0:03}) holds.
192 \emph{Case 4:} If the fractional part of
193 $\sqrt{x}$ is greater than 0.5, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
194 will be an odd integer, the square root will be rounded up, and (\ref{eq:ctbg0:ssre0:03}) holds.}
195
196 \begin{equation}
197 \label{eq:ctbg0:ssre0:03}
198 h(x) = \left\lfloor \sqrt{x} + 0.5 \right\rfloor
199 = \left\lfloor \frac{\left\lfloor \sqrt{4 x} \right\rfloor + 1}{2} \right\rfloor ,
200 \end{equation}
201
202 \noindent{}which can also be easily implemented in terms of (\ref{eq:ctbg0:ssre0:01}).
203
204
205 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
206 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
207 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
208 \subsection{Summation of Odd Integers}
209 \label{ctbg0:ssre0:ssoi0}
210
211 Consider the square of a non-negative integer $n$, $n^2$. The distance from
212 $n^2$ to the square of the next larger integer is
213
214 \begin{equation}
215 \label{eq:ctbg0:ssre0:ssoi0:01}
216 (n+1)^2 - n^2 = 2n + 1 .
217 \end{equation}
218
219 $2n+1, n \in \mathbb{Z}^+$ is the set of odd non-negative integers.
220 It follows directly that the set of squared integers can be formed by
221 successively adding odd integers, i.e.
222
223 \begin{eqnarray}
224 \nonumber 1^2 = 1 & = & 1 \\
225 2^2 = 4 & = & 1 + 3 \\
226 \nonumber 3^2 = 9 & = & 1 + 3 + 5 \\
227 \nonumber 4^2 = 16 & = & 1 + 3 + 5 + 7 \\
228 \nonumber & \ldots{} &
229 \end{eqnarray}
230
231 One possible integer square root algorithm would involve summing consecutive
232 odd integers until the argument is exceeded.
233 This algorithm is not developed further because it would be impractical to
234 calculate the square root of any integers except small integers using this
235 method.
236
237
238 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
239 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
240 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
241 \subsection{Bit-By-Bit Bisection}
242 \label{ctbg0:ssre0:sbis0}
243
244 Square roots of an integer can be found using bit-by-bit bisection.
245 In this method, starting with the most significant bit of the result, each bit
246 is assumed `1' and a trial square is calculated. If the trial square is no
247 larger than the argument, that bit of the result is `1'. However, if the trial
248 square is larger than the argument, that bit of the result is `0'.
249
250 \begin{tiny}
251 \begin{verbatim}
252 Essentially, it is bit-by-bit trial squaring. "Bisection" because
253 each bit of the square root cuts the interval (in which the square root
254 might lie) in half.
255
256 Two of the posters in comp.arch.embedded indicated that "bisection" can be
257 economized further by using the identity:
258
259 (n + 2^k)^2 = n^2 + n*2^(k+1) + 2^(2*k)
260
261 The "n^2" term is significant because it means that if you have the
262 previous trial square and you are considering setting bit "k", you can
263 compute the next trial square from the previous square using only shifting
264 and adding.
265
266 If you carry this to the extreme, because you initially assume no bits
267 are set, you start with n = n**2 = 0, and the first trial square does
268 not involve multiplication. So you can implement the entire square
269 algorithm without multiplying, even once.
270
271 Anyway, applying all possible optimizations, one gets this:
272
273 http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/
274 cosmic/modx0/atu24sqrtfrxn/devel/alg_eval01.c?revision=1.7&view=markup
275
276 or this in the 32-bit case:
277
278 unsigned int sqrt32(unsigned long arg)
279 {
280 unsigned long two_2k_mask;
281 unsigned long trial_square;
282 unsigned long trial_square_prev;
283 unsigned long square_root_ls_ip1;
284
285 two_2k_mask = 0x40000000UL;
286 trial_square = 0;
287 trial_square_prev = 0;
288 square_root_ls_ip1 = 0;
289
290 while(1)
291 {
292 trial_square = (trial_square_prev)
293 + (square_root_ls_ip1)
294 + (two_2k_mask);
295
296 square_root_ls_ip1 >>= 1;
297
298 if (arg >= trial_square)
299 {
300 square_root_ls_ip1 |= two_2k_mask;
301 trial_square_prev = trial_square;
302 }
303
304 if (two_2k_mask == 1)
305 break;
306
307 two_2k_mask >>= 2;
308 }
309
310 return(square_root_ls_ip1);
311 }
312
313 If you look at the loop, there really isn't much there. There will be
314 16 iterations. The test against "1" to exit the loop can be implemented
315 as a single byte test (because a single "1" is being right-shifted, "1"
316 as the value of the least significant byte means that the other three
317 bytes are zero).
318
319 Temporary storage requirements are about 4 times the size of the input
320 argument. So, for a 32-bit argument, that would be 16 bytes.
321 \end{verbatim}
322 \end{tiny}
323
324
325 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
326 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
327 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
328 \subsection{The Babylonian Method}
329 \label{ctbg0:ssre0:sbab0}
330
331 TBD.
332
333
334 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
335 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
336 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
337 \subsection{Unprocessed Newsgroup Posts}
338 \label{ctbg0:ssre0:sunp0}
339
340 Have to process these newsgroup posts:
341
342 \begin{tiny}
343 \begin{verbatim}
344 I have the need in a microcontroller application to calculate floor(sqrt(x))
345 where x is approximately in the range of 2^32 - 2^64.
346
347 What are the algorithms I should look at?
348
349 These small microcontrollers are characterized by having a 8-bit times 8-bit
350 to give 16-bit result integer unsigned multiply instruction, and a similar
351 16/8 division instruction to give a quotient and a remainder.
352 Mulitplications of larger integers have to be done by multiplying the bytes
353 and adding.
354
355 I'm aware of these algorithms:
356
357 a)Adding consecutive odd integers and counting how many you have to add,
358 i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
359
360 b)Trial squaring, testing setting each bit to "1", i.e.
361
362 http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
363
364 c)Another method that seems to work (don't know the name of it), here is
365 source code:
366
367 http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
368
369 What other options should I investigate?
370
371 ================================================================================
372 Here is some C code for integer square roots using many different
373 algorithms:
374 http://cap.connx.com/chess-engines/new-approach/jsqrt0.ZIP
375
376 Worth a look:
377 http://www.azillionmonkeys.com/qed/sqroot.html
378
379 Consider also:
380 http://www.google.com/#hl=en&source=hp&q=fast+integer+square+root&rlz=
381 1W1GGLD_en&aq=0&aqi=g2&oq=fast+integer+sq&fp=a048890d3c90c6fc
382 ================================================================================
383 > I have the need in a microcontroller application to calculate
384 > floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.
385
386 x = y^2 - (2^16)^2
387
388 > What are the algorithms I should look at?
389
390 Google is your friend :> There are many sqrt algorithms with
391 different tradeoffs (speed/size).
392
393 I would, instead, ask that you might want to examine what *else*
394 you know about "x". I.e., you've already constrained the range
395 (why? does that give you any other information that can be
396 exploited?); are there any other characteristics about how
397 "x" is synthesized that you can exploit to divide and conquer?
398
399 E.g. if x = y * z, then knowing sqrt(y) and/or sqrt(z) simplifies
400 your problem.
401
402 > These small microcontrollers are characterized by having a 8-bit times
403 > 8-bit to give 16-bit result integer unsigned multiply instruction, and a
404 > similar 16/8 division instruction to give a quotient and a remainder.
405 > Mulitplications of larger integers have to be done by multiplying the
406 > bytes and adding.
407 >
408 > I'm aware of these algorithms:
409 >
410 > a)Adding consecutive odd integers and counting how many you have to add,
411 > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
412 >
413 > b)Trial squaring, testing setting each bit to "1", i.e.
414 >
415 > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
416 >
417 > c)Another method that seems to work (don't know the name of it), here is
418 > source code:
419 >
420 > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
421 >
422 > What other options should I investigate?
423 ================================================================================
424 > Hi,
425 >
426 > I have the need in a microcontroller application to calculate floor(sqrt(x))
427 > where x is approximately in the range of 2^32 - 2^64.
428 >
429 > What are the algorithms I should look at?
430 >
431 > These small microcontrollers are characterized by having a 8-bit times 8-bit
432 > to give 16-bit result integer unsigned multiply instruction, and a similar
433 > 16/8 division instruction to give a quotient and a remainder.
434 > Mulitplications of larger integers have to be done by multiplying the bytes
435 > and adding.
436 >
437 > I'm aware of these algorithms:
438 >
439 > a)Adding consecutive odd integers and counting how many you have to add,
440 > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
441 >
442 > b)Trial squaring, testing setting each bit to "1", i.e.
443 >
444 > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
445 >
446 > c)Another method that seems to work (don't know the name of it), here is
447 > source code:
448 >
449 > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
450 >
451 > What other options should I investigate?
452 >
453 > Thanks, Datesfat
454
455 The one I have been using last 15-20 years is probably your B, at
456 least sounds
457 like what I made up back then - successive approximation for each bit
458 of the result.
459 If multiplication is fast on the core you will be using it it is hard
460 to beat.
461
462 Dimiter
463 ================================================================================
464 > Hi,
465 >
466 > I have the need in a microcontroller application to calculate
467 > floor(sqrt(x))
468 > where x is approximately in the range of 2^32 - 2^64.
469 >
470 > What are the algorithms I should look at?
471 >
472 > These small microcontrollers are characterized by having a 8-bit times
473 > 8-bit
474 > to give 16-bit result integer unsigned multiply instruction, and a similar
475 > 16/8 division instruction to give a quotient and a remainder.
476 > Mulitplications of larger integers have to be done by multiplying the
477 > bytes
478 > and adding.
479 >
480 > I'm aware of these algorithms:
481 >
482 > a)Adding consecutive odd integers and counting how many you have to add,
483 > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
484 >
485 > b)Trial squaring, testing setting each bit to "1", i.e.
486 >
487 > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
488 >
489 > c)Another method that seems to work (don't know the name of it), here is
490 > source code:
491 >
492 > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
493 >
494 > What other options should I investigate?
495 >
496 > Thanks, Datesfat
497 >
498 >The one I have been using last 15-20 years is probably your B, at
499 >least sounds
500 >like what I made up back then - successive approximation for each bit
501 >of the result.
502 >If multiplication is fast on the core you will be using it it is hard
503 >to beat.
504
505 Thanks for the insight.
506
507 For small operands, I agree with your analysis.
508
509 However, for larger operands, the Babylonian method:
510
511 http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method
512
513 looks very promising.
514
515 The issue is that for the cost of one division (the /2 and addition is
516 virtually free) you get far more than one bit of additional information
517 about the square root.
518
519 For larger operands, I don't believe that (b) will be the right answer any
520 longer.
521
522 But I'm not sure yet ...
523 ================================================================================
524 > a)Adding consecutive odd integers and counting how many you have to add,
525 > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
526
527 That quite certainly can't work for the input range you're targeting.
528 You would be looking at up to four billion iterations if x was 2^64
529
530 > b)Trial squaring, testing setting each bit to "1", i.e.
531
532 A whole lot better already. It becomes even better if you update the
533 square of your current result differentially, applying the binomial
534 formula. As you update the result-to-be ('res' in the following), by
535 adding a 1-bit at position 'k':
536
537 x --> x + 2^k
538
539 it's square, which eventually should be equal to the input number,
540 updates too:
541
542 x^2 --> x^2 + 2*(2^k)*x + (2^k)^2
543 = x^2 + x<<(k+1) + 1<<(2*k)
544
545 That's two additions and a bit of shifting. Even an 8-bit CPU should be
546 able to manage that in reasonable time. Basically you go looking for a
547 bit position 'k' such that the current difference between x^2 and the
548 target value is still larger than or equal to (x<<(k+1)+1<<(2*k)).
549
550 This is actually a special case of an algorithm people used to be taught
551 in higher schools. Just like we all (hopefully) learned to do long
552 multiplication and long division in elementary school, there used to be
553 an algorithm for long square-roots. I've seen it in a textbook used to
554 teach metal workers' apprentices from the 1940s. It's a little daunting
555 in decimal numbers on paper, but boils down to the above when performed
556 in binary: you figure out one digit per iteration by looking at the
557 difference between the target figure and square of the result-to-be.
558 It's also somewhat similar to long division in that way.
559
560 > What other options should I investigate?
561
562 One traditional trick for sqrt(), and some other inverse functions, too,
563 is to treat it as a root-finding problem for the function
564
565 f(x) := x^2 - y
566
567 The x that solves f(x)=0 is sqrt(y). You can solve that using the
568 Newton-Raphson iteration algorithm, a.k.a. the tangent method:
569
570 x' = x - f(x)/f'(x)
571 = x - (x^2 - y) / (2 * x)
572 = x - x/2 - y/(2*x)
573 = x/2 - (y/2)/x
574
575 This requires long division, but pays off by converging on the solution
576 real fast, once you've come anywhere near it.
577
578 This algorithm and the bit-by-bit one above are somewhat related. The
579 search for the right bit position, 'k', is effectively a simplified
580 version of the division (y/2)/x.
581 ================================================================================
582 > I have the need in a microcontroller application to calculate floor(sqrt(x))
583 > where x is approximately in the range of 2^32 - 2^64.
584 >
585 > What are the algorithms I should look at?
586
587 Newton-Raphson or bisection, depending upon how slow division is. N-R is
588 asymptotically faster (the number of correct digits doubles at each step),
589 but each step requires a division.
590
591 Bisection is essentially your option b), but it can be computed much more
592 efficiently than the algorithm given. You don't need arbitrary multiplies,
593 only shifts, as:
594
595 (a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2
596 = a^2 + a.2^(k+1) + 2^2k
597
598 IOW:
599
600 uint32_t isqrt(uint64_t x)
601 {
602 uint64_t a = 0;
603 uint64_t a2 = 0;
604 int k;
605
606 for (k = 31; k >= 0; k--) {
607
608 uint64_t t = a2 + (a << (k+1)) + ((uint64_t)1 << (k + k));
609 if (x > t) {
610 a += 1 << k;
611 a2 = t;
612 }
613 }
614
615 return a;
616 }
617
618 Depending upon the CPU, it may be faster to calculate the a.2^(k+1) and
619 2^2k terms incrementally.
620 ================================================================================
621 Wiki
622
623 http://en.wikipedia.org/wiki/Methods_of_computing_square_roots\\end{verbatim}
624 ================================================================================
625 Noticed your post on sci.math. I am surprised no one mentioned this one
626 from Joe Leherbauer that only uses shifts and adds. I cannot find the
627 original usenet message but a copy is here:
628
629 http://groups.google.com/group/comp.lang.c/browse\_thread/thread/14ed11f99b6ac8ab/d4297a8de3ead321?hl=en\&ie=UTF-8\&q=\%22joe+Leherbauer\%22
630
631 I clipped the message:
632
633 <----------------------clipped from message---------------------------->
634 The function below is probably the fastest you can get with integer
635 operations only. It executes in constant time of 16 loop iterations
636 assuming a long is 32 bits, i.e. it's O(log(n)).
637 As a by-product it also produces the square root remainder.
638 More formally, it computes s and r, such that:
639
640 s = floor(sqrt(n))
641 r = n - s * s
642
643 I have to admit that I did not invent this.
644 It is from an anonymous Usenet source.
645
646 unsigned long
647 i\_sqrt (unsigned long n, unsigned long * rem)
648 {
649 unsigned long r, s, t;
650
651 r = 0;
652 t = (~0UL >> 2) + 1; /* largest power of 4 supported by data type */
653 do
654 {
655 s = r + t;
656 if ( n >= s )
657 {
658 n -= s;
659 r = s + t;
660 }
661 r >>= 1;
662 }
663 while ( (t >>= 2) != 0 );
664 *rem = n;
665
666 return r;
667
668 }
669
670 ---
671 Joe Leherbauer Leherbauer at telering dot at
672 Registered Linux User \# 215803
673 ================================================================================
674 \end{tiny}
675
676
677 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
678 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
679 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
680 \section[\emph{UcuAtS32S16v2CpDiva2FRxx(\protect\mbox{\protect$\cdot$})} Function Calculation Method]
681 {\emph{UcuAtS32S16v2CpDiva2FRxx(\protect\mbox{\protect\boldmath $\cdot$})} Function Calculation Method}
682 \label{ctbg0:svec2}
683
684 \index{UcuAtS32S16v2CpDiva2FRxx()@\emph{UcuAtS32S16v2CpDiva2FRxx($\cdot$)}}%
685 The nominal calculation performed by the
686 \emph{UcuAtS32S16v2CpDiva2FRxx($\cdot$)} function is
687
688 \begin{equation}
689 \label{eq:ctbg0:svec2:01}
690 g(a_x, a_y, b_x, b_y)
691 =
692 \left\lfloor \frac{\vec{a} \times \vec{b}}{| \vec{b} | \hat{k}} \right\rfloor
693 =
694 \left\lfloor \frac{a_x b_y - a_y b_x}{\sqrt{b_x^2 + b_y^2}} \right\rfloor ,
695 \end{equation}
696
697 \noindent{}where the floor function of a negative argument rounds towards zero.
698
699 Clearly, (\ref{eq:ctbg0:svec2:01}) isn't a convenient method of calcuation for an
700 inexpensive microcontroller, as the result
701 of the square root calculation might need to include a fractional part.
702
703 (\ref{eq:ctbg0:svec2:01}) can be modified by squaring then taking the square root,
704 making adjustments for the loss of sign information caused by squaring:
705
706 \begin{equation}
707 \label{eq:ctbg0:svec2:02}
708 \left\lfloor \frac{a_x b_y - a_y b_x}{\sqrt{b_x^2 + b_y^2}} \right\rfloor
709 =
710 \left\lfloor \sqrt{\frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}} \right\rfloor sgn (a_x b_y - a_y b_x).
711 \end{equation}
712
713 Note in (\ref{eq:ctbg0:svec2:02})
714 the the remainder of the division can be discarded, i.e.
715
716 \begin{equation}
717 \label{eq:ctbg0:svec2:03}
718 \left\lfloor \sqrt{\frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}} \right\rfloor
719 =
720 \left\lfloor \sqrt{\left\lfloor\frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}\right\rfloor} \right\rfloor .
721 \end{equation}
722
723 \noindent{}(\ref{eq:ctbg0:svec2:03}) is valid because only the square root of an integer can be
724 an integer, so flooring the input to a square root calculation whose output is floored
725 will not have an effect.
726
727 The size of the integer to be squared in (\ref{eq:ctbg0:svec2:03}) must be established.
728 $a_x b_y - a_y b_x$ may have a magnitude as large as $2^{31}-2^{15}$, so 31 bits are required (for an unsigned representation).
729 In practice, 32 bits (4 bytes) will be used for storage.
730
731 The term $(a_x b_y - a_y b_x)^2$ may be as large as $(2^{31}-2^{15})^2 = 2^{62}-2^{47}+2^{30}$, so 62 bits are required.
732 In practice, 64 bits (8 bytes) will be used for storage.
733
734 The most difficult question to answer is the upper bound on the result of the
735 division. One would suspect from the form of (\ref{eq:ctbg0:svec2:03})
736 that there is an upper bound less than $2^{62}-2^{47}+2^{30}$, as the
737 numerator and denominator both increase with increasing $b_x$ or $b_y$.
738 If an upper bound smaller than $2^{62}-2^{47}+2^{30}$ exists, it would make
739 both the division and the square root extraction more efficient.
740
741 An upper bound on the result of the division in (\ref{eq:ctbg0:svec2:03})
742 need not be overly tight. The upper bound is only to assist in devising
743 efficient calculation.
744
745 The maximum magnitude of $a_x b_y - a_y b_x$ can only occur when a negative
746 number is subtracted from a positive, or a positive number is subtracted from
747 a negative. The most positive number that can be formed as the product
748 of two UCU\_SINT16's is $2^{15}2^{15} = 2^{30}$. The most negative number that can be formed
749 is $-2^{15}(2^{15}-1) = -2^{30}+2^{15}$. The maximum magnitude of
750 $a_x b_y - a_y b_x$ is thus $2^{30} - (-2^{30}+2^{15}) = 2^{31} - 2^{15}$.
751
752 To simplify the algebra (in exchange for a looser bound),
753 one can assume that both positive and negative
754 UCU\_SINT16's may reach a magnitude of $2^{15}$. Assuming
755 that $a_x = a_y = 2^{15}$, the maximum value of
756 (\ref{eq:ctbg0:svec2:03}) is
757
758 \begin{eqnarray}
759 \nonumber
760 &
761 \displaystyle{
762 \sqrt{\frac{(2^{15} b_x + 2^{15} b_y)^2}{b_x^2 + b_y^2}}
763 =
764 \sqrt{\frac{2^{30}(b_x^2 + 2 b_x b_y + b_y^2)}{b_x^2 + b_y^2}}
765 }
766 & \\
767 \label{eq:ctbg0:svec2:04}
768 &
769 \displaystyle{
770 = \sqrt{2^{30} \frac{b_x^2 + b_y^2}{b_x^2 + b_y^2} + 2^{31} \frac{b_x b_y}{b_x^2 + b_y^2}}
771 }
772 & \\
773 \nonumber
774 &
775 \displaystyle{
776 = \sqrt{2^{30} + 2^{31} \frac{b_x b_y}{b_x^2 + b_y^2}}
777 }
778 &
779 \end{eqnarray}
780
781 (\ref{eq:ctbg0:svec2:04}) implies that the maximum value of (\ref{eq:ctbg0:svec2:03})
782 depends on the maximum of the function
783
784 \begin{equation}
785 \label{eq:ctbg0:svec2:05}
786 f(b_x, b_y) = \frac{b_x b_y}{b_x^2 + b_y^2} , \;\; b_x, b_y \in \mathbb{Z}^+ .
787 \end{equation}
788
789 Several helpful posters\footnote{Thanks to Ray Vickson, Gerry Myerson, Rob Johnson, and ``Scattered''.}
790 on \texttt{sci.math} provided proofs that
791 (\ref{eq:ctbg0:svec2:05}) could not exceed 1/2. The proof that I found easiest to understand
792 was provided by Gerry Myerson. Consider the function
793
794 \begin{equation}
795 \label{eq:ctbg0:svec2:06}
796 h(b_x, b_y)
797 =
798 \frac{1}{2} - f(b_x, b_y)
799 =
800 \frac{1}{2} \left( \frac{b_x^2 - 2 b_x b_y + b_y^2}{b_x^2 + b_y^2} \right)
801 =
802 \frac{1}{2} \left( \frac{(b_x - b_y)^2}{b_x^2 + b_y^2} \right).
803 \end{equation}
804
805 $h(b_x, b_y)$ is non-negative over the domain of interest, proving that $f(b_x, b_y)$ cannot exceed $1/2$.
806
807 Substituting the maximum value of 1/2 into (\ref{eq:ctbg0:svec2:04})
808 leads to
809
810 \begin{equation}
811 \label{eq:ctbg0:svec2:07}
812 \frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}
813 \leq
814 2^{30} + 2^{31} \left( \frac{1}{2} \right)
815 =
816 2^{30} + 2^{30}
817 =
818 2^{31} .
819 \end{equation}
820
821 \noindent{}(\ref{eq:ctbg0:svec2:07})
822 establishes that the result of the division may not exceed 32 bits. This is
823 an important result, as the dividend of the division is 64 bits, requiring
824 64 rounds in the general case to obtain a quotient. Knowledge that the
825 quotient may not exceed 32 bits will approximately halve the division time, as
826 only 32 iterations rather than 64 need to be performed.
827
828 The square root extraction will involve a 32-bit input (the quotient from the
829 division) and a 16-bit result.
830
831
832 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
833 \noindent\begin{figure}[!b]
834 \noindent\rule[-0.25in]{\textwidth}{1pt}
835 \begin{tiny}
836 \begin{verbatim}
837 $RCSfile: c_tbg0.tex,v $
838 $Source: /home/dashley/cvsrep/uculib01/uculib01/doc/manual/c_tbg0/c_tbg0.tex,v $
839 $Revision: 1.12 $
840 $Author: dashley $
841 $Date: 2010/04/05 15:13:50 $
842 \end{verbatim}
843 \end{tiny}
844 \noindent\rule[0.25in]{\textwidth}{1pt}
845 \end{figure}
846
847 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
848 %$Log: c_tbg0.tex,v $
849 %Revision 1.12 2010/04/05 15:13:50 dashley
850 %Edits.
851 %
852 %Revision 1.11 2010/04/05 12:17:12 dashley
853 %Edits.
854 %
855 %Revision 1.10 2010/04/01 21:07:34 dashley
856 %Edits.
857 %
858 %Revision 1.9 2010/04/01 20:21:50 dashley
859 %Edits.
860 %
861 %Revision 1.8 2010/03/03 16:30:35 dashley
862 %Edits.
863 %
864 %Revision 1.7 2010/03/03 00:19:21 dashley
865 %Edits.
866 %
867 %Revision 1.6 2010/02/05 03:04:01 dashley
868 %Additional square root information added.
869 %
870 %Revision 1.5 2010/02/05 02:58:12 dashley
871 %Newsgroup post text added.
872 %
873 %Revision 1.4 2010/01/28 21:18:33 dashley
874 %a)Chapter start quotes removed.
875 %b)Aesthetic comment line added at the bottom of most files.
876 %
877 %Revision 1.3 2007/10/08 18:16:34 dtashley
878 %Edits.
879 %
880 %Revision 1.2 2007/10/07 18:11:22 dtashley
881 %Edits.
882 %
883 %Revision 1.1 2007/10/07 01:22:01 dtashley
884 %Initial checkin.
885 %
886 %End of $RCSfile: c_tbg0.tex,v $.
887 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
888

dashley@gmail.com
ViewVC Help
Powered by ViewVC 1.1.25