/[dtapublic]/to_be_filed/uculib01/doc/manual/c_tbg0/c_tbg0.tex
ViewVC logotype

Annotation of /to_be_filed/uculib01/doc/manual/c_tbg0/c_tbg0.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 30 - (hide annotations) (download) (as text)
Sat Oct 8 07:22:17 2016 UTC (7 years, 8 months ago) by dashley
File MIME type: application/x-tex
File size: 32767 byte(s)
Initial commit.
1 dashley 30 %$Header: /home/dashley/cvsrep/uculib01/uculib01/doc/manual/c_tbg0/c_tbg0.tex,v 1.12 2010/04/05 15:13:50 dashley Exp $
2    
3     \chapter{Technical Background}
4     \label{ctbg0}
5    
6     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
8     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
9     \section{Introduction}
10     %Section tag: int0
11     \label{ctbg0:sint0}
12    
13    
14     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
15     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
16     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
17     \section{Interrupt Compatibility Issues}
18     %Section tag: ici0
19     \label{ctbg0:sici0}
20    
21    
22     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
23     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
24     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
25     \section{Linear Software Timers}
26     %Section tag: lst0
27     \label{ctbg0:slst0}
28    
29     Several of the block memory functions described in
30     Chapter \ref{cbmf0} are designed primarily to
31     implement what we call here a
32     \index{software timer}\index{linear software timer}\emph{software timer} (although these functions
33     may have other uses).
34    
35     We define a \index{software timer}\index{linear software timer}software timer as an unsigned integer
36     of any length convenient to the machine that:
37    
38     \begin{itemize}
39     \item Is decremented or incremented at a constant rate,\footnote{It is also
40     possible to decrement (or increment) software timers at a non-constant rate to
41     approximate equations of the form $x(k+1) = \overline{\alpha} x(k)$,
42     $\overline{\alpha} < 1$ (or related
43     equations), but we don't discuss schemes of this type here. The advantage
44     of such a scheme would be to extend the maximum time period of the timer
45     while still keeping good precision at low values of time. We would call
46     such software timers \index{non-linear software timer}\index{software timer}%
47     \emph{non-linear} software timers to differentiate them
48     from the timers described here.}
49     but not
50     below or above the minimum or maximum value of an unsigned
51     integer.
52     \item Is set and tested by one process (the application) but
53     is decremented or incremented by another process (typically
54     system software).
55     \end{itemize}
56    
57     Typically, linear software timers are grouped together as arrays of
58     unsigned integers that are decremented or incremented at the same rate (hence the
59     efficiency of the block manipulation described in Chapter \ref{cbmf0}).
60     Most commonly, the rates at which groups of timers may be decremented or
61     incremented are provided in binary decades (but 1-2-5 decades are also
62     not uncommon).
63    
64     Linear software timers are the most efficient known way to manipulate time
65     in stateful logic where the logic is implemented as code (rather than data).
66     The efficiency comes about because:
67    
68     \begin{itemize}
69     \item The timers are decremented and incremented centrally using
70     block memory operations (saves FLASH bulk
71     and CPU cycles).
72     \item The arrangement of timers in binary decades or 1-2-5 decades
73     allows the usage of smaller data types (at the expense of precision)
74     than would be possible if time were manipulated only in terms of one
75     base quantum.
76     \item The test for an expired timer ($t=0$) often compiles well because
77     tests against zero are less expensive than tests against other constants
78     in CPU instruction sets.
79     \end{itemize}
80    
81     In most of the discussion which follows, we confine the discussion to
82     the case of software timers that are decremented, but not below zero.
83     (Software timers that are incremented are less common and have different
84     applications.)
85    
86     The arrangement of software timers in most applications is in terms of
87     a base quantum, which we denote $\sigma$\@. $\sigma$=1 millisecond is
88     a common choice.
89    
90     In a binary decade arrangement, the quantum used to decrement the
91     different arrays is $\sigma 2^q$, where $q$ is the ``tier'' of the array of
92     software timers. If $\sigma$=1 millisecond, ``Tier 0'' software timers
93     are decremented every millisecond, ``Tier 1'' software timers are decremented
94     every two milliseconds, ``Tier 2'' software timers are decremented
95     every four milliseconds, and so on.
96    
97     Note that if a software timer (at tier $q$) is set to a value $n$
98     and then tested periodically to determine if it has reached zero, the
99     actual amount of time $t$ that may have elapsed is given by\footnote{In
100     order to obtain (\ref{eq:ctbg0:slst0:01}), we make the assumption that
101     the process that tests the software timer runs at least as often as the
102     software timer is decremented. There are scheduling scenarios that add
103     yet more to the error, and these are not developed here.}
104    
105     \begin{equation}
106     \label{eq:ctbg0:slst0:01}
107     (n - 1) \sigma 2^q
108     <
109     t
110     \leq
111     n \sigma 2^q .
112     \end{equation}
113    
114     \noindent{}(\ref{eq:ctbg0:slst0:01}) comes about because it is possible
115     that the timer was set to $n$ just before it was decremented.
116    
117     We propose to choose
118    
119     \begin{equation}
120     \label{eq:ctbg0:slst0:02}
121     n = \left\lfloor {\frac{\alpha}{\sigma 2^q}} \right\rfloor .
122     \end{equation}
123    
124     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
125     \noindent\rule{\textwidth}{2pt}
126    
127     TODO:
128    
129     \begin{enumerate}
130     \item Break into subsections.
131     \item Develop scheduling assumptions that lead to (\ref{eq:ctbg0:slst0:01}) and friends.
132     \item Show relationship with timed automata.
133     \item Show state-space and complexity reduction advantages.
134     \end{enumerate}
135    
136    
137     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
138     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
139     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
140     \section{Square Root Extraction (Integer)}
141     %Section tag: sre0
142     \label{ctbg0:ssre0}
143    
144     This section presents information about the extraction of
145     square roots over an integer domain and integer range. Usually, the function
146     of interest is
147    
148     \begin{equation}
149     \label{eq:ctbg0:ssre0:01}
150     f(x) = \left\lfloor \sqrt{x} \right\rfloor .
151     \end{equation}
152    
153     Other functions of interest can generally be implemented in terms
154     of (\ref{eq:ctbg0:ssre0:01}).
155    
156     If a square root with information after the radix point is desired, the argument $x$ can be premultiplied
157     by the square of the reciprocal of the precision after the radix point. For example,
158     a good approximation to $10 \sqrt{x}$ is
159    
160     \begin{equation}
161     \label{eq:ctbg0:ssre0:02}
162     10 \sqrt{x} \approx g(x) = \left\lfloor 10 \sqrt{x} \right\rfloor
163     = \left\lfloor \sqrt{100 x} \right\rfloor ,
164     \end{equation}
165    
166     \noindent{}which can be implemented in terms of (\ref{eq:ctbg0:ssre0:01})\@.
167     (\ref{eq:ctbg0:ssre0:02}) provides a result that is human-friendly (i.e. 453 is
168     interpreted as 45.3), but the radix point of the result is not positioned between
169     any two bits. A more typical approach might be to premultiply by an even
170     power of two so as to place the radix point of the result between two bits.
171     For example,
172    
173     \begin{equation}
174     \label{eq:ctbg0:ssre0:02b}
175     2^8 \sqrt{x} \approx \left\lfloor 2^8 \sqrt{x} \right\rfloor
176     = \left\lfloor \sqrt{2^{16} x} \right\rfloor
177     \end{equation}
178    
179     \noindent{}will calculate the square root of an integer
180     with eight bits after the radix point.
181    
182     Similarly, if rounding to the nearest integer is desired, note that\footnote{To
183     prove (\ref{eq:ctbg0:ssre0:03}), consider four cases. \emph{Case 1:} If the fractional part of
184     $\sqrt{x}$ is 0, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
185     will be an integer, the result will be exact, and (\ref{eq:ctbg0:ssre0:03}) holds.
186     \emph{Case 2:} If the fractional part of
187     $\sqrt{x}$ is less than 0.5, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
188     will be an even integer, the square root will be rounded down, and (\ref{eq:ctbg0:ssre0:03}) holds.
189     \emph{Case 3:} If the fractional part of
190     $\sqrt{x}$ is equal to 0.5, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
191     will be an odd integer, the square root will be rounded up, and (\ref{eq:ctbg0:ssre0:03}) holds.
192     \emph{Case 4:} If the fractional part of
193     $\sqrt{x}$ is greater than 0.5, then $\lfloor \sqrt{4x} \rfloor = \lfloor 2 \sqrt{x} \rfloor$
194     will be an odd integer, the square root will be rounded up, and (\ref{eq:ctbg0:ssre0:03}) holds.}
195    
196     \begin{equation}
197     \label{eq:ctbg0:ssre0:03}
198     h(x) = \left\lfloor \sqrt{x} + 0.5 \right\rfloor
199     = \left\lfloor \frac{\left\lfloor \sqrt{4 x} \right\rfloor + 1}{2} \right\rfloor ,
200     \end{equation}
201    
202     \noindent{}which can also be easily implemented in terms of (\ref{eq:ctbg0:ssre0:01}).
203    
204    
205     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
206     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
207     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
208     \subsection{Summation of Odd Integers}
209     \label{ctbg0:ssre0:ssoi0}
210    
211     Consider the square of a non-negative integer $n$, $n^2$. The distance from
212     $n^2$ to the square of the next larger integer is
213    
214     \begin{equation}
215     \label{eq:ctbg0:ssre0:ssoi0:01}
216     (n+1)^2 - n^2 = 2n + 1 .
217     \end{equation}
218    
219     $2n+1, n \in \mathbb{Z}^+$ is the set of odd non-negative integers.
220     It follows directly that the set of squared integers can be formed by
221     successively adding odd integers, i.e.
222    
223     \begin{eqnarray}
224     \nonumber 1^2 = 1 & = & 1 \\
225     2^2 = 4 & = & 1 + 3 \\
226     \nonumber 3^2 = 9 & = & 1 + 3 + 5 \\
227     \nonumber 4^2 = 16 & = & 1 + 3 + 5 + 7 \\
228     \nonumber & \ldots{} &
229     \end{eqnarray}
230    
231     One possible integer square root algorithm would involve summing consecutive
232     odd integers until the argument is exceeded.
233     This algorithm is not developed further because it would be impractical to
234     calculate the square root of any integers except small integers using this
235     method.
236    
237    
238     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
239     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
240     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
241     \subsection{Bit-By-Bit Bisection}
242     \label{ctbg0:ssre0:sbis0}
243    
244     Square roots of an integer can be found using bit-by-bit bisection.
245     In this method, starting with the most significant bit of the result, each bit
246     is assumed `1' and a trial square is calculated. If the trial square is no
247     larger than the argument, that bit of the result is `1'. However, if the trial
248     square is larger than the argument, that bit of the result is `0'.
249    
250     \begin{tiny}
251     \begin{verbatim}
252     Essentially, it is bit-by-bit trial squaring. "Bisection" because
253     each bit of the square root cuts the interval (in which the square root
254     might lie) in half.
255    
256     Two of the posters in comp.arch.embedded indicated that "bisection" can be
257     economized further by using the identity:
258    
259     (n + 2^k)^2 = n^2 + n*2^(k+1) + 2^(2*k)
260    
261     The "n^2" term is significant because it means that if you have the
262     previous trial square and you are considering setting bit "k", you can
263     compute the next trial square from the previous square using only shifting
264     and adding.
265    
266     If you carry this to the extreme, because you initially assume no bits
267     are set, you start with n = n**2 = 0, and the first trial square does
268     not involve multiplication. So you can implement the entire square
269     algorithm without multiplying, even once.
270    
271     Anyway, applying all possible optimizations, one gets this:
272    
273     http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/
274     cosmic/modx0/atu24sqrtfrxn/devel/alg_eval01.c?revision=1.7&view=markup
275    
276     or this in the 32-bit case:
277    
278     unsigned int sqrt32(unsigned long arg)
279     {
280     unsigned long two_2k_mask;
281     unsigned long trial_square;
282     unsigned long trial_square_prev;
283     unsigned long square_root_ls_ip1;
284    
285     two_2k_mask = 0x40000000UL;
286     trial_square = 0;
287     trial_square_prev = 0;
288     square_root_ls_ip1 = 0;
289    
290     while(1)
291     {
292     trial_square = (trial_square_prev)
293     + (square_root_ls_ip1)
294     + (two_2k_mask);
295    
296     square_root_ls_ip1 >>= 1;
297    
298     if (arg >= trial_square)
299     {
300     square_root_ls_ip1 |= two_2k_mask;
301     trial_square_prev = trial_square;
302     }
303    
304     if (two_2k_mask == 1)
305     break;
306    
307     two_2k_mask >>= 2;
308     }
309    
310     return(square_root_ls_ip1);
311     }
312    
313     If you look at the loop, there really isn't much there. There will be
314     16 iterations. The test against "1" to exit the loop can be implemented
315     as a single byte test (because a single "1" is being right-shifted, "1"
316     as the value of the least significant byte means that the other three
317     bytes are zero).
318    
319     Temporary storage requirements are about 4 times the size of the input
320     argument. So, for a 32-bit argument, that would be 16 bytes.
321     \end{verbatim}
322     \end{tiny}
323    
324    
325     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
326     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
327     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
328     \subsection{The Babylonian Method}
329     \label{ctbg0:ssre0:sbab0}
330    
331     TBD.
332    
333    
334     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
335     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
336     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
337     \subsection{Unprocessed Newsgroup Posts}
338     \label{ctbg0:ssre0:sunp0}
339    
340     Have to process these newsgroup posts:
341    
342     \begin{tiny}
343     \begin{verbatim}
344     I have the need in a microcontroller application to calculate floor(sqrt(x))
345     where x is approximately in the range of 2^32 - 2^64.
346    
347     What are the algorithms I should look at?
348    
349     These small microcontrollers are characterized by having a 8-bit times 8-bit
350     to give 16-bit result integer unsigned multiply instruction, and a similar
351     16/8 division instruction to give a quotient and a remainder.
352     Mulitplications of larger integers have to be done by multiplying the bytes
353     and adding.
354    
355     I'm aware of these algorithms:
356    
357     a)Adding consecutive odd integers and counting how many you have to add,
358     i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
359    
360     b)Trial squaring, testing setting each bit to "1", i.e.
361    
362     http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
363    
364     c)Another method that seems to work (don't know the name of it), here is
365     source code:
366    
367     http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
368    
369     What other options should I investigate?
370    
371     ================================================================================
372     Here is some C code for integer square roots using many different
373     algorithms:
374     http://cap.connx.com/chess-engines/new-approach/jsqrt0.ZIP
375    
376     Worth a look:
377     http://www.azillionmonkeys.com/qed/sqroot.html
378    
379     Consider also:
380     http://www.google.com/#hl=en&source=hp&q=fast+integer+square+root&rlz=
381     1W1GGLD_en&aq=0&aqi=g2&oq=fast+integer+sq&fp=a048890d3c90c6fc
382     ================================================================================
383     > I have the need in a microcontroller application to calculate
384     > floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.
385    
386     x = y^2 - (2^16)^2
387    
388     > What are the algorithms I should look at?
389    
390     Google is your friend :> There are many sqrt algorithms with
391     different tradeoffs (speed/size).
392    
393     I would, instead, ask that you might want to examine what *else*
394     you know about "x". I.e., you've already constrained the range
395     (why? does that give you any other information that can be
396     exploited?); are there any other characteristics about how
397     "x" is synthesized that you can exploit to divide and conquer?
398    
399     E.g. if x = y * z, then knowing sqrt(y) and/or sqrt(z) simplifies
400     your problem.
401    
402     > These small microcontrollers are characterized by having a 8-bit times
403     > 8-bit to give 16-bit result integer unsigned multiply instruction, and a
404     > similar 16/8 division instruction to give a quotient and a remainder.
405     > Mulitplications of larger integers have to be done by multiplying the
406     > bytes and adding.
407     >
408     > I'm aware of these algorithms:
409     >
410     > a)Adding consecutive odd integers and counting how many you have to add,
411     > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
412     >
413     > b)Trial squaring, testing setting each bit to "1", i.e.
414     >
415     > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
416     >
417     > c)Another method that seems to work (don't know the name of it), here is
418     > source code:
419     >
420     > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
421     >
422     > What other options should I investigate?
423     ================================================================================
424     > Hi,
425     >
426     > I have the need in a microcontroller application to calculate floor(sqrt(x))
427     > where x is approximately in the range of 2^32 - 2^64.
428     >
429     > What are the algorithms I should look at?
430     >
431     > These small microcontrollers are characterized by having a 8-bit times 8-bit
432     > to give 16-bit result integer unsigned multiply instruction, and a similar
433     > 16/8 division instruction to give a quotient and a remainder.
434     > Mulitplications of larger integers have to be done by multiplying the bytes
435     > and adding.
436     >
437     > I'm aware of these algorithms:
438     >
439     > a)Adding consecutive odd integers and counting how many you have to add,
440     > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
441     >
442     > b)Trial squaring, testing setting each bit to "1", i.e.
443     >
444     > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
445     >
446     > c)Another method that seems to work (don't know the name of it), here is
447     > source code:
448     >
449     > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
450     >
451     > What other options should I investigate?
452     >
453     > Thanks, Datesfat
454    
455     The one I have been using last 15-20 years is probably your B, at
456     least sounds
457     like what I made up back then - successive approximation for each bit
458     of the result.
459     If multiplication is fast on the core you will be using it it is hard
460     to beat.
461    
462     Dimiter
463     ================================================================================
464     > Hi,
465     >
466     > I have the need in a microcontroller application to calculate
467     > floor(sqrt(x))
468     > where x is approximately in the range of 2^32 - 2^64.
469     >
470     > What are the algorithms I should look at?
471     >
472     > These small microcontrollers are characterized by having a 8-bit times
473     > 8-bit
474     > to give 16-bit result integer unsigned multiply instruction, and a similar
475     > 16/8 division instruction to give a quotient and a remainder.
476     > Mulitplications of larger integers have to be done by multiplying the
477     > bytes
478     > and adding.
479     >
480     > I'm aware of these algorithms:
481     >
482     > a)Adding consecutive odd integers and counting how many you have to add,
483     > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
484     >
485     > b)Trial squaring, testing setting each bit to "1", i.e.
486     >
487     > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
488     >
489     > c)Another method that seems to work (don't know the name of it), here is
490     > source code:
491     >
492     > http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
493     >
494     > What other options should I investigate?
495     >
496     > Thanks, Datesfat
497     >
498     >The one I have been using last 15-20 years is probably your B, at
499     >least sounds
500     >like what I made up back then - successive approximation for each bit
501     >of the result.
502     >If multiplication is fast on the core you will be using it it is hard
503     >to beat.
504    
505     Thanks for the insight.
506    
507     For small operands, I agree with your analysis.
508    
509     However, for larger operands, the Babylonian method:
510    
511     http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method
512    
513     looks very promising.
514    
515     The issue is that for the cost of one division (the /2 and addition is
516     virtually free) you get far more than one bit of additional information
517     about the square root.
518    
519     For larger operands, I don't believe that (b) will be the right answer any
520     longer.
521    
522     But I'm not sure yet ...
523     ================================================================================
524     > a)Adding consecutive odd integers and counting how many you have to add,
525     > i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
526    
527     That quite certainly can't work for the input range you're targeting.
528     You would be looking at up to four billion iterations if x was 2^64
529    
530     > b)Trial squaring, testing setting each bit to "1", i.e.
531    
532     A whole lot better already. It becomes even better if you update the
533     square of your current result differentially, applying the binomial
534     formula. As you update the result-to-be ('res' in the following), by
535     adding a 1-bit at position 'k':
536    
537     x --> x + 2^k
538    
539     it's square, which eventually should be equal to the input number,
540     updates too:
541    
542     x^2 --> x^2 + 2*(2^k)*x + (2^k)^2
543     = x^2 + x<<(k+1) + 1<<(2*k)
544    
545     That's two additions and a bit of shifting. Even an 8-bit CPU should be
546     able to manage that in reasonable time. Basically you go looking for a
547     bit position 'k' such that the current difference between x^2 and the
548     target value is still larger than or equal to (x<<(k+1)+1<<(2*k)).
549    
550     This is actually a special case of an algorithm people used to be taught
551     in higher schools. Just like we all (hopefully) learned to do long
552     multiplication and long division in elementary school, there used to be
553     an algorithm for long square-roots. I've seen it in a textbook used to
554     teach metal workers' apprentices from the 1940s. It's a little daunting
555     in decimal numbers on paper, but boils down to the above when performed
556     in binary: you figure out one digit per iteration by looking at the
557     difference between the target figure and square of the result-to-be.
558     It's also somewhat similar to long division in that way.
559    
560     > What other options should I investigate?
561    
562     One traditional trick for sqrt(), and some other inverse functions, too,
563     is to treat it as a root-finding problem for the function
564    
565     f(x) := x^2 - y
566    
567     The x that solves f(x)=0 is sqrt(y). You can solve that using the
568     Newton-Raphson iteration algorithm, a.k.a. the tangent method:
569    
570     x' = x - f(x)/f'(x)
571     = x - (x^2 - y) / (2 * x)
572     = x - x/2 - y/(2*x)
573     = x/2 - (y/2)/x
574    
575     This requires long division, but pays off by converging on the solution
576     real fast, once you've come anywhere near it.
577    
578     This algorithm and the bit-by-bit one above are somewhat related. The
579     search for the right bit position, 'k', is effectively a simplified
580     version of the division (y/2)/x.
581     ================================================================================
582     > I have the need in a microcontroller application to calculate floor(sqrt(x))
583     > where x is approximately in the range of 2^32 - 2^64.
584     >
585     > What are the algorithms I should look at?
586    
587     Newton-Raphson or bisection, depending upon how slow division is. N-R is
588     asymptotically faster (the number of correct digits doubles at each step),
589     but each step requires a division.
590    
591     Bisection is essentially your option b), but it can be computed much more
592     efficiently than the algorithm given. You don't need arbitrary multiplies,
593     only shifts, as:
594    
595     (a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2
596     = a^2 + a.2^(k+1) + 2^2k
597    
598     IOW:
599    
600     uint32_t isqrt(uint64_t x)
601     {
602     uint64_t a = 0;
603     uint64_t a2 = 0;
604     int k;
605    
606     for (k = 31; k >= 0; k--) {
607    
608     uint64_t t = a2 + (a << (k+1)) + ((uint64_t)1 << (k + k));
609     if (x > t) {
610     a += 1 << k;
611     a2 = t;
612     }
613     }
614    
615     return a;
616     }
617    
618     Depending upon the CPU, it may be faster to calculate the a.2^(k+1) and
619     2^2k terms incrementally.
620     ================================================================================
621     Wiki
622    
623     http://en.wikipedia.org/wiki/Methods_of_computing_square_roots\\end{verbatim}
624     ================================================================================
625     Noticed your post on sci.math. I am surprised no one mentioned this one
626     from Joe Leherbauer that only uses shifts and adds. I cannot find the
627     original usenet message but a copy is here:
628    
629     http://groups.google.com/group/comp.lang.c/browse\_thread/thread/14ed11f99b6ac8ab/d4297a8de3ead321?hl=en\&ie=UTF-8\&q=\%22joe+Leherbauer\%22
630    
631     I clipped the message:
632    
633     <----------------------clipped from message---------------------------->
634     The function below is probably the fastest you can get with integer
635     operations only. It executes in constant time of 16 loop iterations
636     assuming a long is 32 bits, i.e. it's O(log(n)).
637     As a by-product it also produces the square root remainder.
638     More formally, it computes s and r, such that:
639    
640     s = floor(sqrt(n))
641     r = n - s * s
642    
643     I have to admit that I did not invent this.
644     It is from an anonymous Usenet source.
645    
646     unsigned long
647     i\_sqrt (unsigned long n, unsigned long * rem)
648     {
649     unsigned long r, s, t;
650    
651     r = 0;
652     t = (~0UL >> 2) + 1; /* largest power of 4 supported by data type */
653     do
654     {
655     s = r + t;
656     if ( n >= s )
657     {
658     n -= s;
659     r = s + t;
660     }
661     r >>= 1;
662     }
663     while ( (t >>= 2) != 0 );
664     *rem = n;
665    
666     return r;
667    
668     }
669    
670     ---
671     Joe Leherbauer Leherbauer at telering dot at
672     Registered Linux User \# 215803
673     ================================================================================
674     \end{tiny}
675    
676    
677     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
678     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
679     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
680     \section[\emph{UcuAtS32S16v2CpDiva2FRxx(\protect\mbox{\protect$\cdot$})} Function Calculation Method]
681     {\emph{UcuAtS32S16v2CpDiva2FRxx(\protect\mbox{\protect\boldmath $\cdot$})} Function Calculation Method}
682     \label{ctbg0:svec2}
683    
684     \index{UcuAtS32S16v2CpDiva2FRxx()@\emph{UcuAtS32S16v2CpDiva2FRxx($\cdot$)}}%
685     The nominal calculation performed by the
686     \emph{UcuAtS32S16v2CpDiva2FRxx($\cdot$)} function is
687    
688     \begin{equation}
689     \label{eq:ctbg0:svec2:01}
690     g(a_x, a_y, b_x, b_y)
691     =
692     \left\lfloor \frac{\vec{a} \times \vec{b}}{| \vec{b} | \hat{k}} \right\rfloor
693     =
694     \left\lfloor \frac{a_x b_y - a_y b_x}{\sqrt{b_x^2 + b_y^2}} \right\rfloor ,
695     \end{equation}
696    
697     \noindent{}where the floor function of a negative argument rounds towards zero.
698    
699     Clearly, (\ref{eq:ctbg0:svec2:01}) isn't a convenient method of calcuation for an
700     inexpensive microcontroller, as the result
701     of the square root calculation might need to include a fractional part.
702    
703     (\ref{eq:ctbg0:svec2:01}) can be modified by squaring then taking the square root,
704     making adjustments for the loss of sign information caused by squaring:
705    
706     \begin{equation}
707     \label{eq:ctbg0:svec2:02}
708     \left\lfloor \frac{a_x b_y - a_y b_x}{\sqrt{b_x^2 + b_y^2}} \right\rfloor
709     =
710     \left\lfloor \sqrt{\frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}} \right\rfloor sgn (a_x b_y - a_y b_x).
711     \end{equation}
712    
713     Note in (\ref{eq:ctbg0:svec2:02})
714     the the remainder of the division can be discarded, i.e.
715    
716     \begin{equation}
717     \label{eq:ctbg0:svec2:03}
718     \left\lfloor \sqrt{\frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}} \right\rfloor
719     =
720     \left\lfloor \sqrt{\left\lfloor\frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}\right\rfloor} \right\rfloor .
721     \end{equation}
722    
723     \noindent{}(\ref{eq:ctbg0:svec2:03}) is valid because only the square root of an integer can be
724     an integer, so flooring the input to a square root calculation whose output is floored
725     will not have an effect.
726    
727     The size of the integer to be squared in (\ref{eq:ctbg0:svec2:03}) must be established.
728     $a_x b_y - a_y b_x$ may have a magnitude as large as $2^{31}-2^{15}$, so 31 bits are required (for an unsigned representation).
729     In practice, 32 bits (4 bytes) will be used for storage.
730    
731     The term $(a_x b_y - a_y b_x)^2$ may be as large as $(2^{31}-2^{15})^2 = 2^{62}-2^{47}+2^{30}$, so 62 bits are required.
732     In practice, 64 bits (8 bytes) will be used for storage.
733    
734     The most difficult question to answer is the upper bound on the result of the
735     division. One would suspect from the form of (\ref{eq:ctbg0:svec2:03})
736     that there is an upper bound less than $2^{62}-2^{47}+2^{30}$, as the
737     numerator and denominator both increase with increasing $b_x$ or $b_y$.
738     If an upper bound smaller than $2^{62}-2^{47}+2^{30}$ exists, it would make
739     both the division and the square root extraction more efficient.
740    
741     An upper bound on the result of the division in (\ref{eq:ctbg0:svec2:03})
742     need not be overly tight. The upper bound is only to assist in devising
743     efficient calculation.
744    
745     The maximum magnitude of $a_x b_y - a_y b_x$ can only occur when a negative
746     number is subtracted from a positive, or a positive number is subtracted from
747     a negative. The most positive number that can be formed as the product
748     of two UCU\_SINT16's is $2^{15}2^{15} = 2^{30}$. The most negative number that can be formed
749     is $-2^{15}(2^{15}-1) = -2^{30}+2^{15}$. The maximum magnitude of
750     $a_x b_y - a_y b_x$ is thus $2^{30} - (-2^{30}+2^{15}) = 2^{31} - 2^{15}$.
751    
752     To simplify the algebra (in exchange for a looser bound),
753     one can assume that both positive and negative
754     UCU\_SINT16's may reach a magnitude of $2^{15}$. Assuming
755     that $a_x = a_y = 2^{15}$, the maximum value of
756     (\ref{eq:ctbg0:svec2:03}) is
757    
758     \begin{eqnarray}
759     \nonumber
760     &
761     \displaystyle{
762     \sqrt{\frac{(2^{15} b_x + 2^{15} b_y)^2}{b_x^2 + b_y^2}}
763     =
764     \sqrt{\frac{2^{30}(b_x^2 + 2 b_x b_y + b_y^2)}{b_x^2 + b_y^2}}
765     }
766     & \\
767     \label{eq:ctbg0:svec2:04}
768     &
769     \displaystyle{
770     = \sqrt{2^{30} \frac{b_x^2 + b_y^2}{b_x^2 + b_y^2} + 2^{31} \frac{b_x b_y}{b_x^2 + b_y^2}}
771     }
772     & \\
773     \nonumber
774     &
775     \displaystyle{
776     = \sqrt{2^{30} + 2^{31} \frac{b_x b_y}{b_x^2 + b_y^2}}
777     }
778     &
779     \end{eqnarray}
780    
781     (\ref{eq:ctbg0:svec2:04}) implies that the maximum value of (\ref{eq:ctbg0:svec2:03})
782     depends on the maximum of the function
783    
784     \begin{equation}
785     \label{eq:ctbg0:svec2:05}
786     f(b_x, b_y) = \frac{b_x b_y}{b_x^2 + b_y^2} , \;\; b_x, b_y \in \mathbb{Z}^+ .
787     \end{equation}
788    
789     Several helpful posters\footnote{Thanks to Ray Vickson, Gerry Myerson, Rob Johnson, and ``Scattered''.}
790     on \texttt{sci.math} provided proofs that
791     (\ref{eq:ctbg0:svec2:05}) could not exceed 1/2. The proof that I found easiest to understand
792     was provided by Gerry Myerson. Consider the function
793    
794     \begin{equation}
795     \label{eq:ctbg0:svec2:06}
796     h(b_x, b_y)
797     =
798     \frac{1}{2} - f(b_x, b_y)
799     =
800     \frac{1}{2} \left( \frac{b_x^2 - 2 b_x b_y + b_y^2}{b_x^2 + b_y^2} \right)
801     =
802     \frac{1}{2} \left( \frac{(b_x - b_y)^2}{b_x^2 + b_y^2} \right).
803     \end{equation}
804    
805     $h(b_x, b_y)$ is non-negative over the domain of interest, proving that $f(b_x, b_y)$ cannot exceed $1/2$.
806    
807     Substituting the maximum value of 1/2 into (\ref{eq:ctbg0:svec2:04})
808     leads to
809    
810     \begin{equation}
811     \label{eq:ctbg0:svec2:07}
812     \frac{(a_x b_y - a_y b_x)^2}{b_x^2 + b_y^2}
813     \leq
814     2^{30} + 2^{31} \left( \frac{1}{2} \right)
815     =
816     2^{30} + 2^{30}
817     =
818     2^{31} .
819     \end{equation}
820    
821     \noindent{}(\ref{eq:ctbg0:svec2:07})
822     establishes that the result of the division may not exceed 32 bits. This is
823     an important result, as the dividend of the division is 64 bits, requiring
824     64 rounds in the general case to obtain a quotient. Knowledge that the
825     quotient may not exceed 32 bits will approximately halve the division time, as
826     only 32 iterations rather than 64 need to be performed.
827    
828     The square root extraction will involve a 32-bit input (the quotient from the
829     division) and a 16-bit result.
830    
831    
832     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
833     \noindent\begin{figure}[!b]
834     \noindent\rule[-0.25in]{\textwidth}{1pt}
835     \begin{tiny}
836     \begin{verbatim}
837     $RCSfile: c_tbg0.tex,v $
838     $Source: /home/dashley/cvsrep/uculib01/uculib01/doc/manual/c_tbg0/c_tbg0.tex,v $
839     $Revision: 1.12 $
840     $Author: dashley $
841     $Date: 2010/04/05 15:13:50 $
842     \end{verbatim}
843     \end{tiny}
844     \noindent\rule[0.25in]{\textwidth}{1pt}
845     \end{figure}
846    
847     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
848     %$Log: c_tbg0.tex,v $
849     %Revision 1.12 2010/04/05 15:13:50 dashley
850     %Edits.
851     %
852     %Revision 1.11 2010/04/05 12:17:12 dashley
853     %Edits.
854     %
855     %Revision 1.10 2010/04/01 21:07:34 dashley
856     %Edits.
857     %
858     %Revision 1.9 2010/04/01 20:21:50 dashley
859     %Edits.
860     %
861     %Revision 1.8 2010/03/03 16:30:35 dashley
862     %Edits.
863     %
864     %Revision 1.7 2010/03/03 00:19:21 dashley
865     %Edits.
866     %
867     %Revision 1.6 2010/02/05 03:04:01 dashley
868     %Additional square root information added.
869     %
870     %Revision 1.5 2010/02/05 02:58:12 dashley
871     %Newsgroup post text added.
872     %
873     %Revision 1.4 2010/01/28 21:18:33 dashley
874     %a)Chapter start quotes removed.
875     %b)Aesthetic comment line added at the bottom of most files.
876     %
877     %Revision 1.3 2007/10/08 18:16:34 dtashley
878     %Edits.
879     %
880     %Revision 1.2 2007/10/07 18:11:22 dtashley
881     %Edits.
882     %
883     %Revision 1.1 2007/10/07 01:22:01 dtashley
884     %Initial checkin.
885     %
886     %End of $RCSfile: c_tbg0.tex,v $.
887     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
888    

dashley@gmail.com
ViewVC Help
Powered by ViewVC 1.1.25