%$Header$ \chapter{Rational Linear Approximation} \label{crat0} \beginchapterquote{``Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.''\footnote{German language: God made the integers; everything else was made by man.}} {Leopold Kronecker} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} %Section tag: INT0 \label{crat0:sint0} In this chapter, we consider practical applications of rational approximation. Chapters \cfryzeroxrefhyphen\ref{cfry0} and \ccfrzeroxrefhyphen\ref{ccfr0} have presented algorithms for finding the closest rational numbers to an arbitrary real number, subject to constraints on the numerator and denominator. The basis of these algorithms is complex and comes from number theory, and so these algorithms and their basis have been presented in separate chapters. In Section \ref{crat0:srla0}, rational linear approximation itself and associated error bounds are presented. By \emph{rational linear approximation} we mean simply the approximation of a line $y = r_I x$ ($y, r_I, x \in \vworkrealset$) by a line \begin{equation} \label{eq:crat0:sint0:01} y = \left\lfloor \frac{h \lfloor x \rfloor + z}{k} \right\rfloor , \end{equation} \noindent{}where we choose $h/k \approx r_I$ and optionally choose $z$ to shift the error introduced. Note that (\ref{eq:crat0:sint0:01}) is very economical for microcontroller instruction sets, since only integer arithmetic is required. We may choose $h/k$ from a Farey series (see Chapters \cfryzeroxrefhyphen\ref{cfry0} and \ccfrzeroxrefhyphen\ref{ccfr0}), or we may choose a ratio $h/2^q$ so that the division in (\ref{eq:crat0:sint0:01}) can be implemented by a bitwise right shift. Section \ref{crat0:srla0} discusses linear rational approximation in general, with a special eye on error analysis. Section \ref{crat0:spwi0} discusses piecewise linear rational approximation, which is the approximation of a curve or complex mapping by a number of joined line segments. Section \ref{crat0:sfdv0} discusses frequency division and rational counting. Such techniques share the same mathematical framework as rational linear approximation, and as with rational linear approximation the ratio involved may be chosen from a Farey series or with a denominator of $2^q$, depending on the algorithm employed. Section \ref{crat0:sbla0} discusses Bresenham's classic line algorithm, which is a practical application of rational linear approximation. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Rational Linear Approximation} %Section tag: RLA0 \label{crat0:srla0} It occurs frequently in embedded software design that one wishes to implement a linear scaling from a domain to a range of the form \begin{equation} \label{eq:crat0:srla0:01} f(x) = r_I x , \end{equation} \noindent{}where $r_I$ is the \emph{ideal} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Model Functions} %Section tag: mfu0 \label{crat0:srla0:smfu0} In general, we seek to approximate the ideal function \noindent{}by some less ideal function where \begin{itemize} \item $r_A \neq r_I$, although we seek to choose $r_A \approx r_I$. \item The input to the function, $x$, may already contain quantization error. \item Although $r_I x \in \vworkrealsetnonneg$, we must choose an integer as the function output. \end{itemize} In modeling quantization error, we use the floor function\index{floor function} ($\lfloor\cdot\rfloor$) for algebraic simplicity. The floor function precisely describes the behavior of integer division instructions (where remainders are discarded), but may not describe other sources of quantization, such as quantization that occurs in A/D conversion. However, techniques identical to those presented in this section may be used when quantization is not best described by the floor function, and these results are left to the reader. Traditionally, because addition of integers is an inexpensive machine operation, a parameter $z \in \vworkintset$ may optionally be added to the product $hx$ in order to round or otherwise shift the result. If $x$ is assumed to be without error, the ideal function is given by (\ref{eq:crat0:srla0:smfu0:01}), whereas the function that can be economically implemented is \begin{equation} \label{eq:crat0:srla0:smfu0:02} g(x) = \left\lfloor \frac{hx + z}{k} \right\rfloor = \left\lfloor r_A x + \frac{z}{k} \right\rfloor . \end{equation} If, on the other hand, $x$ may be already quantized, the function that can actually be implemented is \begin{equation} \label{eq:crat0:srla0:smfu0:03} h(x) = \left\lfloor \frac{h \lfloor x \rfloor + z}{k} \right\rfloor = \left\lfloor r_A \lfloor x \rfloor + \frac{z}{k} \right\rfloor . \end{equation} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section[\protect\mbox{\protect$h/2^q$} and \protect\mbox{\protect$2^q/k$} Rational Linear Approximation] {\protect\mbox{\protect\boldmath$h/2^q$} and \protect\mbox{\protect\boldmath$2^q/k$} Rational Linear Approximation} %Section tag: HQQ0 \label{crat0:shqq0} \index{h/2q@$h/2^q$ rational linear approximation} \index{rational linear approximation!h/2q@$h/2^q$} The algorithms presented in Chapters \cfryzeroxrefhyphen\ref{cfry0} and \ccfrzeroxrefhyphen\ref{ccfr0} will always provide the rational number $h/k$ closest to an arbitrary real number $r_I$ subject to the constraints $h \leq h_{MAX}$ and $k \leq k_{MAX}$. However, because shifting in order to implement multiplication or division by a power of 2 is at least as fast (and often \emph{much} faster) on all processors as arbitrary multiplication or division, and because not all processors have multiplication and division instructions, it is worthwhile to examine choosing $h/k$ so that either $h$ or $k$ are powers of 2. There are thus three rational linear approximation techniques to be examined: \begin{enumerate} \item \emph{$h/k$ rational linear approximation}, in which an arbitrary $h \leq h_{MAX}$ and an arbitrary $k \leq k_{MAX}$ are used, with $r_A = h/k$. $h$ and $k$ can be chosen using the algorithms presented in Chapters \cfryzeroxrefhyphen\ref{cfry0} and \ccfrzeroxrefhyphen\ref{ccfr0}. Implementation of this technique would most often involve a single integer multiplication instruction to form the product $hx$, followed by an optional single addition instruction to form the sum $hx+z$, and then followed by by a single division instruction to form the quotient $\lfloor (hx+z)/k \rfloor$. Implementation may also less commonly involve multiplication, addition, and division of operands too large to be processed with single machine instructions. \item \emph{$h/2^q$ rational linear approximation}, in which an arbitrary $h \leq h_{MAX}$ and an integral power of two $k=2^q$ are used, with $r_A = h/2^q$. Implementation of this technique would most often involve a single integer multiplication instruction to form the product $hx$, followed by an optional single addition instruction to form the sum $hx+z$, and then followed by right shift instruction(s) to form the quotient $\lfloor (hx+z)/2^q \rfloor$. Implementation may also less commonly involve multiplication, addition, and right shift of operands too large to be processed with single machine instructions. \item \emph{$2^q/k$ rational linear approximation}, in which an integral power of two $h=2^q$ and an arbitrary $k \leq k_{MAX}$ are used, with $r_A = 2^q/k$. Implementation of this technique would most often involve left shift instruction(s) to form the product $2^qx$, followed by an optional single addition instruction to form the sum $2^qx+z$, and then followed by a single division instruction to form the quotient $\lfloor (2^qx+z)/k \rfloor$. Implementation may also less commonly involve left shift, addition, and division of operands too large to be processed with single machine instructions. \end{enumerate} We use the nomenclature ``\emph{$h/k$ rational linear approximation}'', ``\emph{$h/2^q$ rational linear approximation}'', and ``\emph{$2^q/k$ rational linear approximation}'' to identify the three techniques enumerated above. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Integer Arithmetic and Processor Instruction Set Characteristics} %Subsection tag: pis0 \label{crat0:shqq0:pis0} The following observations about integer arithmetic and about processors used in embedded control can be made: \begin{enumerate} \item \label{enum:crat0:shqq0:pis0:01:01a} \emph{Shifting is the fastest method of integer multiplication or division (by $2^q$ only), followed by utilization of the processor multiplication or division instructions (for arbitrary operands), followed by software implementation of multiplication or division (for arbitrary operands).} Relative costs vary depending on the processor, but the monotonic ordering always holds. $h/2^q$ and $2^q/k$ rational linear approximation are thus worthy of investigation. (Note also that in many practical applications of $h/2^q$ and $2^q/k$ rational linear approximation, the required shift is performed by addressing the operand with an offset, and so has no cost.) \item \label{enum:crat0:shqq0:pis0:01:01b} \emph{Shifting is $O(N)$ (where $N$ is the number of bits in the argument), but both multiplication and division are $O(N^2)$ for practical\footnote{\index{Karatsuba multiplication}Karatsuba multiplication, for example, is $O(N^{\log_2 3}) \approx O(N^{1.58}) \ll O(N^2)$. However, Karatsuba multiplication cannot be applied economically to the small operands that typically occur in embedded control work. It would be rare in embedded control applications for the length of a multiplication operand to exceed four times the length that is accommodated by a machine instruction; and this is far below the threshold at which Karatsuba multiplication is economical. Thus, for all intents and purposes in embedded control work, multiplication is $O(N^2)$.} operands (where $N$ is the number of bits in each operand).} It follows that $2^q/k$ and $h/2^q$ rational linear approximation will scale to large operands better than $h/k$ rational linear approximation. \item \label{enum:crat0:shqq0:pis0:01:02a} \emph{Integer division instructions take as long or longer than integer multiplication instructions.} In designing digital logic to implement basic integer arithmetic, division is the operation most difficult to perform economically.\footnote{For some processors, the penalty is extreme. For example, on the NEC V850 (a RISC processor), a division requires 36 clock cycles, whereas multiplication, addition, and subtraction each effectively require 1 clock cycle.} It follows that multiplication using operands that exceed the machine's word size is often far less expensive than division using operands that exceed the machine's word size. \item \label{enum:crat0:shqq0:pis0:01:03a} \emph{All processors that have an integer division instruction also have an integer multiplication instruction.} Phrased differently, no processor has an integer division instruction but no integer multiplication instruction. \end{enumerate} Enumerated items (\ref{enum:crat0:shqq0:pis0:01:01a}) through (\ref{enum:crat0:shqq0:pis0:01:03a}) above lead to the following conclusions. \begin{enumerate} \item $h/2^q$ rational linear approximation is likely to be implementable more efficiently on most processors than $h/k$ rational linear approximation. (\emph{Rationale:} shift instruction(s) or accessing a memory address with an offset is likely to be more economical than division, particularly if $k$ would exceed the native operand size of the processor.) \item $h/2^q$ rational linear approximation is likely to be a more useful technique than $2^q/k$ rational linear approximation. (\emph{Rationale:} the generally high cost of division compared to multiplication, and the existence of processors that possess a multiplication instruction but no division instruction.) \end{enumerate} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Design Procedure For \protect\mbox{\protect$h/2^q$} Rational Linear Approximations] {Design Procedure For \protect\mbox{\protect\boldmath$h/2^q$} Rational Linear Approximation} %Subsection tag: dph0 \label{crat0:shqq0:dph0} An $h/2^q$ rational linear approximation is parameterized by: \begin{itemize} \item The unsigned or signed nature of $h$ and $x$. (Rational linear approximations may involve either signed or unsigned domains and ranges. Furthermore, signed integers may be maintained using either 2's-complement or sign-magnitude representation, and the processor instruction set may or may not directly support signed multiplication.) \item $r_I$, the real number we wish to approximate by $r_A = h/2^q$. \item $x_{MAX}$, the maximum possible value of the input argument $x$. (Typically, software contains a test to clip the output if $x > x_{MAX}$.) \item $w_h$, the width in bits allowed for $h$. (Typically, $w_h$ is the maximum operand size of a machine multiplication instruction.) \item $w_r$, the width in bits allowed for the result $hx$. (Typically, $w_r$ is the maximum result size of a machine multiplication instruction.) \item The rounding mode when choosing $h$ (and thus effectively $r_A$) based on $r_I$. It is common to choose the closest value, $r_A=\lfloor r_I 2^q + 1/2 \rfloor/2^q$ or $r_A=\lceil r_I 2^q - 1/2 \rceil/2^q$, but other choices are possible. \item The rounding mode for the result (i.e. the choice of $z$ in Eq. \ref{eq:crat0:sint0:01}). \end{itemize} This section develops a design procedure for $h/2^q$ rational linear approximations with the most typical set of assumptions: unsigned arithmetic, $r_A=\lfloor r_I 2^q + 1/2 \rfloor/2^q$, and $z=0$. Design procedures for other scenarios are presented as exercises. By definition, $h$ is constrained in two ways: \begin{equation} \label{eq:crat0:shqq0:dph0:00} h \leq 2^{w_h} - 1 \end{equation} \noindent{}and \begin{equation} \label{eq:crat0:shqq0:dph0:01} h \leq \frac{2^{w_r} - 1}{x_{MAX}} . \end{equation} \noindent{}(\ref{eq:crat0:shqq0:dph0:00}) comes directly from the requirement that $h$ fit in $w_h$ bits. (\ref{eq:crat0:shqq0:dph0:01}) comes directly from the requirement that $hx$ fit in $w_r$ bits. (\ref{eq:crat0:shqq0:dph0:00}) and (\ref{eq:crat0:shqq0:dph0:01}) may be combined to form one inequality: \begin{equation} \label{eq:crat0:shqq0:dph0:02} h \leq \min \left( { 2^{w_h} - 1, \frac{2^{w_r} - 1}{x_{MAX}} } \right ) . \end{equation} If $q$ is known, the choice of $h$ that will be made so as to minimize $|r_A-r_I| = |h/2^q - r_I|$ is \begin{equation} \label{eq:crat0:shqq0:dph0:03} h=\left\lfloor r_I 2^q + \frac{1}{2} \right\rfloor . \end{equation} \noindent{}It is required that the choice of $h$ specified by (\ref{eq:crat0:shqq0:dph0:03}) meet (\ref{eq:crat0:shqq0:dph0:02}). Making the most pessimistic assumption about the rounding of $h$ and substituting into (\ref{eq:crat0:shqq0:dph0:02}) leads to \begin{equation} \label{eq:crat0:shqq0:dph0:04} r_I 2^q + \frac{1}{2} \leq \min \left( { 2^{w_h} - 1, \frac{2^{w_r} - 1}{x_{MAX}} } \right ) . \end{equation} \noindent{}Isolating $q$ in (\ref{eq:crat0:shqq0:dph0:04}) yields \begin{equation} \label{eq:crat0:shqq0:dph0:05} 2^q \leq \frac{\min \left( { 2^{w_h} - 1, \frac{2^{w_r} - 1}{x_{MAX}} } \right ) - \frac{1}{2}} {r_I}. \end{equation} \noindent{}Solving (\ref{eq:crat0:shqq0:dph0:05}) for maximum value of $q$ that meets the constraint yields \begin{equation} \label{eq:crat0:shqq0:dph0:06} q= \left\lfloor { \log_2 \left( { \frac{\min \left( { 2^{w_h} - 1, \frac{2^{w_r} - 1}{x_{MAX}} } \right ) - \frac{1}{2}}{r_I} } \right) } \right\rfloor . \end{equation} \noindent{}(\ref{eq:crat0:shqq0:dph0:06}) can be rewritten for easier calculation using most calculators (which do not allow the direct evaluation of base-2 logarithms): \begin{equation} \label{eq:crat0:shqq0:dph0:07} q= \left\lfloor \frac { { \ln \left( { \frac{\min \left( { 2^{w_h} - 1, \frac{2^{w_r} - 1}{x_{MAX}} } \right ) - \frac{1}{2}}{r_I} } \right) } } {\ln 2} \right\rfloor . \end{equation} \noindent{}Once $q$ is established using (\ref{eq:crat0:shqq0:dph0:07}), $h$ can be calculated using (\ref{eq:crat0:shqq0:dph0:03}). In embedded control work (as well as in operating system internals), $h/2^q$ rational linear approximations are often used in conjunction with tabulated constants or calibratable parameters where each constant or calibratable parameter may vary over a range of $[0, r_I]$, and where $r_I$ is the value used in the design procedure presented above. In these applications, the values of $h$ are tabulated, but $q$ is invariant (usually hard-coded) and is chosen at design time based on the upper bound $r_I$ of the interval $[0, r_I]$ in which each tabulated constant or calibratable parameter will fall. With $q$ fixed, $r_A$ can be adjusted in steps of $1/2^q$. If $r_I$ is invariant, a final design step may be to reduce the rational number $h/2^q$ by dividing some or all occurrences of 2 as a factor from both the numerator and denominator. With some processors and in some applications, this may save execution time by reducing the number of shift instructions that must be executed, reducing the execution time of the shift instructions that are executed, or allowing shifting via offset addressing. For example, on a byte-addressible machine, if the design procedure yields $h=608$ and $q=10$, it may be desirable to divide both $h$ and $2^q$ by 4 to yield $h=152$ and $q=8$, as this allows the shift by 8 to be done by fetching alternate bytes (rather than by actual shifting). In other applications, it may be desirable to remove \emph{all} occurrences of 2 as a prime factor from $h$. For an invariant $r_I$, a suitable design procedure is: \begin{enumerate} \item Choose $q$ using (\ref{eq:crat0:shqq0:dph0:07}). \item With $q$ fixed, choose $h$ using (\ref{eq:crat0:shqq0:dph0:03}). \item If economies can be achieved on the target processor, examine the possibility of removing some or all occurrences of 2 as a prime factor from $h$ and decreasing $q$. \end{enumerate} For tabulated or calibratable constants in the interval $[0,r_I]$, a suitable design procedure is to use the procedure presented immediately above but without the third step. Each tabulated value of $h$ is chosen using (\ref{eq:crat0:shqq0:dph0:03}). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Design Procedure For \protect\mbox{\protect$2^q/k$} Rational Linear Approximations] {Design Procedure For \protect\mbox{\protect\boldmath$2^q/k$} Rational Linear Approximation} %Subsection tag: dpk0 \label{crat0:shqq0:dpk0} TBD. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Piecewise Rational Linear Approximation} %Section tag: PWI0 \label{crat0:spwi0} TBD. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section[Frequency Division And Rational Counting] {Frequency Division And Rational Counting Techniques} %Section tag: FDV0 \label{crat0:sfdv0} \index{frequency division}\index{rational counting}\index{counting}% Often, software must ``divide down'' an execution rate. For example, an interrupt service routine may be scheduled by hardware every 10ms, but may perform useful processing only every 50ms. This requires that the ISR maintain a counter and only perform useful processing every fifth invocation. This section deals with counting strategies used to achieve invocation frequency division and other similar results. Frequency division and rational counting techniques presented in this section find application primarily in the following scenarios: \begin{itemize} \item ISRs and other software components which must divide down their invocation rate. \item Pulse counting and scaling from encoders and other similar systems. \item The correction of inaccuracies in timebases (such as crystals which oscillate at a frequency different than the nominal rate). \end{itemize} Because the techniques presented must be usable with inexpensive microcontrollers, such techniques must meet these constraints: \begin{enumerate} \item \label{enum:01:crat0:sfdv0:econex} The counting techniques must be economical to execute on an inexpensive microcontroller. \item \label{enum:01:crat0:sfdv0:econcccalc} An inexpensive microcontroller must be capable of calculating any constants used as limits in counting (i.e. it cannot necessarily be assumed that a more powerful computer calculates these constants, and it cannot be assumed that these limits do not change on the fly). \end{enumerate} In this section, we analyze the behavior of several types of rational counting algorithms, supplied as Algorithms \ref{alg:crat0:sfdv0:01a} through \ref{alg:crat0:sfdv0:02a}. \begin{algorithm} \begin{verbatim} /* The constants K1 through K4, which parameterize the */ /* counting behavior, are assumed assigned elsewhere in */ /* the code. The solution is analyzed in terms of the */ /* parameters K1 through K4. */ /* */ /* We also place the following restrictions on K1 through */ /* K4: */ /* K1 : K1 <= K3 - K2. */ /* K2 : K4 > K2 > 0. */ /* K3 : No restrictions. */ /* K4 : K4 > K2 > 0. */ void base_rate_sub(void) { static int state = K1; state += K2; if (state >= K3) { state -= K4; A(); } } \end{verbatim} \caption{Rational Counting Algorithm For $K_2/K_4 < 1$} \label{alg:crat0:sfdv0:01a} \end{algorithm} \begin{algorithm} \begin{verbatim} /* The constants K1 through K4, which parameterize the */ /* counting behavior, are assumed assigned elsewhere in */ /* the code. The solution is analyzed in terms of the */ /* parameters K1 through K4. */ /* */ /* We also place the following restrictions on K1 through */ /* K4: */ /* K1 : K1 <= K3 - K2. */ /* K2 : K2 > 0. */ /* K3 : No restrictions. */ /* K4 : K4 > 0. */ void base_rate_sub(void) { static int state = K1; state += K2; while (state >= K3) { state -= K4; A(); } } \end{verbatim} \caption{Rational Counting Algorithm For $K_2/K_4 \geq 1$} \label{alg:crat0:sfdv0:01b} \end{algorithm} \begin{algorithm} \begin{verbatim} /* The constants K1, K2, and K4, which parameterize the */ /* counting behavior, are assumed assigned elsewhere in */ /* the code. The solution is analyzed in terms of the */ /* parameters K1 through K4. */ /* */ /* We also place the following restrictions on K1, K2, */ /* and K4: */ /* K1 : K1 >= 0. */ /* K2 : K4 > K2 > 0. */ /* K4 : K4 > K2 > 0. */ /* */ /* Special thanks to Chuck B. Falconer (of the */ /* comp.arch.embedded newsgroup) for this rational */ /* counting algorithm. */ /* */ /* Note below that the test against K3 does not exist, */ /* instead a test against zero is used, which many */ /* machine instruction sets will do as part of the */ /* subtraction (but perhaps this needs to be coded in */ /* A/L). This saves machine code and also eliminates */ /* one unnecessary degree of freedom (K3). */ void base_rate_sub(void) { static int state = K1; if ((state -= K2) < 0) { state += K4; A(); } } \end{verbatim} \caption{Zero-Test Rational Counting Algorithm For $K_2/K_4 < 1$} \label{alg:crat0:sfdv0:01c} \end{algorithm} \begin{algorithm} \begin{verbatim} ;Special thanks to John Larkin (of the comp.arch.embedded ;newsgroup) for this rational counting algorithm. ; ;This is the TMS-370C8 assembly-language version of the ;algorithm. The algorithm is parameterized solely by ;K1 and K2, with no restrictions on their values, because ;the values are naturally constrained by the data types. ;K1, which is the initial value of "state", is assumed ;assigned elsewhere. The snippet shown here uses only ;K2. MOV state, A ;Get "state". ADD #K2, A ;Increase by K2. Carry flag ;will be set if rollover to or ;past zero. PUSH ST ;Save carry flag. MOV A, state ;Move new value back. POP ST ;Restore carry flag. JNC done ;If didn't roll, don't run sub. CALL A_SUBROUTINE ;Run sub. done: /* This is the 'C' version of the algorithm. It is not */ /* as easy or efficient in 'C' to detect rollover. */ void base_rate_sub(void) { static unsigned int state = K1; unsigned int old_state; old_state = state; state += K2; if (state < old_state) { A(); } } \end{verbatim} \caption{$2^q$ Rollover Rational Counting Algorithm} \label{alg:crat0:sfdv0:01d} \end{algorithm} \begin{algorithm} \begin{verbatim} /* The constants K1 through K4, which parameterize the */ /* counting behavior, are assumed assigned elsewhere in */ /* the code. The solution is analyzed in terms of the */ /* parameters K1 through K4. */ /* */ /* We also place the following restrictions on K1 through */ /* K4: */ /* K1 : K1 <= K3. */ /* K2 : K2 > 0. */ /* K3 : No restrictions. */ /* K4 : K4 > 0. */ void base_rate_sub(void) { static unsigned int state = K1; if (state >= K3) { state -= K4; A(); } else { state += K2; B(); } } \end{verbatim} \caption{Rational Counting Algorithm With \texttt{else} Clause} \label{alg:crat0:sfdv0:02a} \end{algorithm} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Properties Of Algorithm \ref{alg:crat0:sfdv0:01a}] {Properties Of Rational Counting Algorithm \ref{alg:crat0:sfdv0:01a}} %Section tag: PRC0 \label{crat0:sfdv0:sprc0} Algorithm \ref{alg:crat0:sfdv0:01a} is used frequently in microcontroller software. A base rate subroutine\footnote{For brevity, we usually call this just the \emph{base subroutine}.} (named ``\texttt{base\_rate\_sub()}'' in the algorithm) is called at a periodic rate, and subroutine ``\texttt{A()}'' is called at a lesser rate. We are interested in determining the relationships between the rates as a function of $K_1$, $K_2$, $K_3$, and $K_4$; and we are interested in developing other properties. Notationally when analyzing rational counting algorithms, we agree that $state_n$ denotes the value of the \texttt{state} variable after the $n$th invocation and before the $n+1$'th invocation of the base rate subroutine. Using this convention with Algorithm \ref{alg:crat0:sfdv0:01a}, $state_0 = K_1$.\footnote{Algorithm \ref{alg:crat0:sfdv0:01a} requires a knowledge of `C' to fully understand. The \texttt{static} keyword ensures that the variable \texttt{state} is initialized only once, at the time the program is loaded. \texttt{state} is \emph{not} initialized each time the base subroutine runs.} We can first easily derive the number of initial invocations of the base subroutine before ``\texttt{A()}'' is called for the first time. \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc0:01} $N_{STARTUP}$, the number of invocations of the base subroutine in Algorithm \ref{alg:crat0:sfdv0:01a} before ``\texttt{A()}'' is called for the first time, is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:01:01} N_{STARTUP} = \left\lceil { \frac{-K_1 - K_2 + K_3}{K_2} } \right\rceil . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} The value of \texttt{state} after the $n$th invocation is $state_n = K_1 + n K_2$. In order for the test in the \texttt{if()} statement not to be met, we require that \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:01:02} K_1 + n K_2 < K_3 \end{equation} \noindent{}or equivalently that \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:01:03} n < \frac{K_3 - K_1}{K_2} . \end{equation} Solving (\ref{eq:lem:crat0:sfdv0:sprc0:01:03}) for the largest value of $n \in \vworkintset$ which still meets the criterion yields (\ref{eq:lem:crat0:sfdv0:sprc0:01:01}). Note that the derivation of (\ref{eq:lem:crat0:sfdv0:sprc0:01:01}) requires that the restrictions on $K_1$ through $K_4$ documented in Algorithm \ref{alg:crat0:sfdv0:01a} be met. \end{vworklemmaproof} \begin{vworklemmaparsection}{Remarks} Note that if one chooses $K_1 > K_3 - K_2$ (in contradiction to the restrictions in Algorithm \ref{alg:crat0:sfdv0:01a}), it is possible to devise a counting scheme (and results analogous to this lemma) where ``\texttt{A()}'' is run a number of times before it is \emph{not} run for the first time. The construction of an analogous lemma is the topic of Exercise \ref{exe:crat0:sexe0:01}. \end{vworklemmaparsection} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc0:02} Let $N_I$ be the number of times the Algorithm \ref{alg:crat0:sfdv0:01a} base subroutine is called, let $N_O$ be the number of times the ``\texttt{A()}'' subroutine is called, let $f_I$ be the frequency of invocation of the Algorithm \ref{alg:crat0:sfdv0:01a} base subroutine, and let $f_O$ be the frequency of invocation of ``\texttt{A()}''. Provided the constraints on $K_1$ through $K_4$ documented in Algorithm \ref{alg:crat0:sfdv0:01a} are met, \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:01} \lim_{N_I\rightarrow\infty}\frac{N_O}{N_I} = \frac{f_O}{f_I} = \frac{K_2}{K_4} . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} (\ref{eq:lem:crat0:sfdv0:sprc0:02:01}) indicates that once the initial delay (determined by $K_1$ and $K_3$) has finished, $N_O/N_I$ will converge on a steady-state value of $K_2/K_4$. Assume that $K_1=0$ and $K_3=K_4$. The conditional subtraction then calculates $state \bmod K_4$. After the $n$th invocation of the base subroutine, the value of \texttt{state} will be \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:02} state_n|_{K_1=0, K_3=K_4} = n K_2 \bmod K_4 . \end{equation} Assume that for two distinct values of $n \in \vworkintsetnonneg$, $n_1$ and $n_2$, the value of the \texttt{state} variable is the same: \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:03} n_1 K_2 \bmod K_4 = n_2 K_2 \bmod K_4. \end{equation} Then \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:04} (n_2 - n_1) K_2 = i K_4, \; \exists i \in \vworkintsetpos . \end{equation} However, we have no knowledge of whether $K_2$ and $K_4$ are coprime (they are not required to be). We may rewrite (\ref{eq:lem:crat0:sfdv0:sprc0:02:04}) equivalently as \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:05} (n_2 - n_1) \frac{K_2}{\gcd(K_2, K_4)} = i \frac{K_4}{\gcd(K_2, K_4)}, \; \exists i \in \vworkintsetpos \end{equation} where of course by definition \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:06} \gcd \left( { \frac{K_2}{\gcd(K_2, K_4)}, \frac{K_4}{\gcd(K_2, K_4)} } \right) = 1. \end{equation} In order to satisfy (\ref{eq:lem:crat0:sfdv0:sprc0:02:05}), $n_2 - n_1$ must contain all of the prime factors of $K_4/\gcd(K_2,K_4)$ in at least the same multiplicities, and it follows that the set of values of $n_2-n_1$ that satisfies (\ref{eq:lem:crat0:sfdv0:sprc0:02:03}) is precisely the set of multiples of $K_4/\gcd(K_2,K_4)$: \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:07} n_2 - n_1 = j \frac{K_4}{\gcd(K_2, K_4)}, \; \exists j \in \vworkintsetpos . \end{equation} Examining (\ref{eq:lem:crat0:sfdv0:sprc0:02:02}), it can also be seen that \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:02:08} \gcd(K_2, K_4) \vworkdivides (n K_2 \bmod K_4), \end{equation} and so \begin{eqnarray} \label{eq:lem:crat0:sfdv0:sprc0:02:09} & n K_2 \bmod K_4 \in & \\ \nonumber & \{ 0, \gcd(K_2, K_4), 2 \gcd(K_2, K_4), \ldots , K_4 - \gcd(K_2, K_4) \} , & \end{eqnarray} a set which contains exactly $K_4/\gcd(K_2, K_4)$ elements. Thus we've established by the pigeonhole principle that the sequence of the values of the variable \texttt{state} specified by (\ref{eq:lem:crat0:sfdv0:sprc0:02:02}) repeats perfectly with periodicity $K_4/\gcd(K_2, K_4)$, and we've established that in one period, every element of the set specified in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}) appears exactly once. (However, we have not specified the order in which the elements appear, but this is not important for this lemma. In general the elements appear out of the order shown in Eq. \ref{eq:lem:crat0:sfdv0:sprc0:02:09}.) To establish the frequency with which the test against $K_4$ is met, note that if $state_n + K_2 \geq K_4$, then \begin{eqnarray} \label{eq:lem:crat0:sfdv0:sprc0:02:10} & \displaystyle{state_n \in \left\{ \frac{K_4-K_2}{\gcd(K_2,K_4)} \gcd(K_2, K_4), \right.} & \\ \nonumber & \displaystyle{\left. \left(\frac{K_4-K_2}{\gcd(K_2,K_4)} + 1 \right) \gcd(K_2, K_4), \ldots , K_4 - \gcd(K_2, K_4)\right\} ,} & \end{eqnarray} which has a cardinality $K_2/K_4$ that of the set in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}). Since the \texttt{state} variable cycles through the set in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}) with perfect periodicity and since $K_2/K_4$ of the set elements lead to the \texttt{if()} statement test being met, (\ref{eq:lem:crat0:sfdv0:sprc0:02:01}) is also met as $N_I\rightarrow\infty$. Note that if $K_1 \neq 0$, it simply changes the startup behavior of the rational counting. So long as $K_2 < K_4$, Algorithm \ref{alg:crat0:sfdv0:01a} will reach a steady state where (\ref{eq:lem:crat0:sfdv0:sprc0:02:01}) holds. Note that if $K_3 \neq K_4$, it simply ``shifts'' the sets specified in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}) and (\ref{eq:lem:crat0:sfdv0:sprc0:02:10}), but (\ref{eq:lem:crat0:sfdv0:sprc0:02:01}) still holds. The lemma has thus been proved for every case. (We have neglected to give the formal proof as required by the definition of a limit that for any arbitrarily small error $\epsilon$, a finite $N_I$ can be found so that the error is at or below $\epsilon$; however the skeptical reader is encouraged to complete Exercise \ref{exe:crat0:sexe0:02}.) \end{vworklemmaproof} \begin{vworklemmaparsection}{Remarks} It is possible to view the long-term accuracy of Algorithm \ref{alg:crat0:sfdv0:01a} in terms of a limit, as is done in (\ref{eq:lem:crat0:sfdv0:sprc0:02:01}). However, it is also possible to observe that $K_1$ and $K_3$ set a delay until the counting algorithm reaches steady state. With $K_3=K_4$, the attainment of steady state is characterized by the \texttt{state} variable being assigned for the first time to one of the values in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}). Once in steady state, the algorithm cycles with perfect periodic behavior through all of the $K_4/\gcd(K_2,K_4)$ elements in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}), but not necessarily in the order shown in the equation. During this period of length $K_4/\gcd(K_2,K_4)$, exactly $K_2/\gcd(K_2,K_4)$ invocations of the base subroutine result in subroutine ``\texttt{A()}'' being run, and exactly $(K_4-K_2)/\gcd(K_2,K_4)$ do not. Thus, after reaching steady-state the algorithm has \emph{perfect} accuracy if one considers periods of length $K_4/\gcd(K_2,K_4)$. \end{vworklemmaparsection} %\vworklemmafooter{} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc0:04} If $K_3=K_4$, $K_1=0$, and $\gcd(K_2, K_4)=1$\footnote{\label{footnote:lem:crat0:sfdv0:sprc0:04:01}If $\gcd(K_2, K_4) > 1$, then by Theorem \cprizeroxrefhyphen\ref{thm:cpri0:ppn0:00a} the largest value that $n K_2 \bmod K_4$ can attain is $K_4-\gcd(K_2, K_4)$ and the interval in (\ref{eq:lem:crat0:sfdv0:sprc0:04:01}) is correspondingly smaller. (\ref{eq:lem:crat0:sfdv0:sprc0:04:01}) is technically correct but not as conservative as possible. This is a minor point and we do not dwell on it.}, the error between the approximation to $N_O$ implemented by Algorithm \ref{alg:crat0:sfdv0:01a} and the ``ideal'' mapping is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:04:01} \left[ - \frac{K_4 - 1}{K_4} , 0 \right] , \end{equation} and no algorithm can be constructed to confine the error to a smaller interval. \end{vworklemmastatement} \begin{vworklemmaproof} With $K_1=0$ and $K_3 = K_4$, it can be verified analytically that the total number of times the function ``\texttt{A()}'' has been invoked up to and including the $n$th invocation of the base subroutine is \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:04:02} N_O = \left\lfloor \frac{n K_2}{K_4} \right\rfloor . \end{equation} On the other hand, the ``ideal'' number of invocations, which we denote $\overline{N_O}$, is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:04:03} \overline{N_O} = \frac{n K_2}{K_4} . \end{equation} Quantization of the rational number in (\ref{eq:lem:crat0:sfdv0:sprc0:04:02}) can introduce an error of up to $-(K_4-1)/K_4$, therefore \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:04:04} N_O - \overline{N_O} = \left\lfloor \frac{n K_2}{K_4} \right\rfloor - \frac{n K_2}{K_4} \in \left[ - \frac{K_4 - 1}{K_4} , 0 \right] . \end{equation} This proves the error bound for Algorithm \ref{alg:crat0:sfdv0:01a}. The proof that there can be no better algorithm is the topic of Exercise \ref{exe:crat0:sexe0:06}. \end{vworklemmaproof} \begin{vworklemmaparsection}{Remarks} Algorithm \ref{alg:crat0:sfdv0:01a} is \emph{optimal} in the sense that no algorithm can achieve a tighter error bound than (\ref{eq:lem:crat0:sfdv0:sprc0:04:01}). As demonstrated in Exercises \ref{exe:crat0:sexe0:04} and \ref{exe:crat0:sexe0:05}, $K_1 \neq 0$ can be chosen to shift the interval in (\ref{eq:lem:crat0:sfdv0:sprc0:04:01}), but the span of the interval cannot be reduced. \end{vworklemmaparsection} \vworklemmafooter{} Lemmas \ref{lem:crat0:sfdv0:sprc0:02} and \ref{lem:crat0:sfdv0:sprc0:04} have demonstrated that the ratio of counts $N_O/N_I$ will asymptotically approach $K_2/K_4$ (i.e. the long-term accuracy of Algorithm \ref{alg:crat0:sfdv0:01a} is \emph{perfect}). However, for many applications it is also desirable to have a lack of ``bursty'' behavior. We demonstrate the lack of bursty behavior in the following lemma. \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc0:03} For Algorithm \ref{alg:crat0:sfdv0:01a}, once steady state has been achieved, the number of consecutive base subroutine invocations during which subroutine ``\texttt{A()}'' is executed is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:01} \left\{ \left\lfloor \frac{K_2}{K_4 - K_2} \right\rfloor , \left\lceil \frac{K_2}{K_4 - K_2} \right\rceil \right\} \cap \vworkintsetpos, \end{equation} which contains one integer if $K_2/K_4 \leq 1/2$ or $(K_4-K_2) \vworkdivides K_2$, or two integers otherwise. Once steady state has been achieved, the number of consecutive base function invocations during which subroutine ``\texttt{A()}'' is not executed is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:02} \left\{ \left\lfloor \frac{K_4-K_2}{K_2} \right\rfloor , \left\lceil \frac{K_4-K_2}{K_2} \right\rceil \right\} \cap \vworkintsetpos, \end{equation} which contains one integer if $K_2/K_4 \geq 1/2$ or $K_2 \vworkdivides K_4$, or two integers otherwise. \end{vworklemmastatement} \begin{vworklemmaproof} As before in Lemma \ref{lem:crat0:sfdv0:sprc0:02} for convenience and without loss of generality, assume $K_3=K_4$ and $K_1=0$. Then after a transient period determined by $K_1$ and $K_3$, the \texttt{state} variable will be assigned one of the values in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}) and cycle through those values in an unestablished order but with perfect periodicity. To accomplish this proof, we must establish something about the order in which the \texttt{state} variable attains the values in the set in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}). We can partition the set in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}) into two sets; the set of values of \texttt{state} for which if the base subroutine is invoked with \texttt{state} in this set, subroutine ``\texttt{A()}'' will not be invoked (we call this set $\phi_1$), and the set of values of \texttt{state} for which if the base subroutine is invoked with \texttt{state} in this set, subroutine ``\texttt{A()}'' will be invoked (we call this set $\phi_2$). $\phi_1$ and $\phi_2$ are identified below. \begin{eqnarray} \label{eq:lem:crat0:sfdv0:sprc0:03:03} & \phi_1 = & \\ \nonumber & \displaystyle{\left\{ 0, \gcd(K_2, K_4), 2 \gcd(K_2, K_4), \ldots , \left(\frac{K_4-K_2}{\gcd(K_2,K_4)} - 1 \right) \gcd(K_2, K_4) \right\}} & \end{eqnarray} \begin{eqnarray} \label{eq:lem:crat0:sfdv0:sprc0:03:04} & \displaystyle{ \phi_2 = \left\{\left(\frac{K_4-K_2}{\gcd(K_2,K_4)}\right) \gcd(K_2, K_4),\right.} & \\ \nonumber & \displaystyle{\left. \left(\frac{K_4-K_2}{\gcd(K_2,K_4)} + 1 \right) \gcd(K_2, K_4) , \ldots , K_4 - \gcd(K_2, K_4) \right\}} & \end{eqnarray} We can also make the following four additional useful observations about $\phi_1$ and $\phi_2$. Note that (\ref{eq:lem:crat0:sfdv0:sprc0:03:07}) and (\ref{eq:lem:crat0:sfdv0:sprc0:03:08}) become equality if $\gcd(K_2, K_4) = 1$. \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:05} n(\phi_1) = \frac{K_4 - K_2}{\gcd(K_2, K_4)} \end{equation} \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:06} n(\phi_2) = \frac{K_2}{\gcd(K_2, K_4)} \end{equation} \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:07} \phi_1 \subseteq \{ 0, 1, \ldots , K_4 - K_2 - 1 \} \end{equation} \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:08} \phi_2 \subseteq \{K_4 - K_2, \ldots , K_4 - 1 \} \end{equation} We first prove (\ref{eq:lem:crat0:sfdv0:sprc0:03:01}). If $state_n \in \phi_2$ at the time the base function is invoked, then ``\texttt{A()}'' will be invoked. We also know that since $state_n \in \phi_2$, $state_n + K_2 \geq K_4$, so \begin{equation} \label{eq:lem:crat0:sfdv0:sprc0:03:09} state_{n+1} \;\; =|_{state_n \in \phi_2} \;\; state_n - (K_4 - K_2) . \end{equation} Thus so long as $state_n \in \phi_2$, $state_{n+1} < state_n$ as specified above in (\ref{eq:lem:crat0:sfdv0:sprc0:03:09}). With each invocation of the base subroutine, \texttt{state} will ``walk downward'' through $\phi_2$. It can also be observed that when \texttt{state} drops below the smallest element of $\phi_2$, the next value of \texttt{state} will be in $\phi_1$. Note also that although the downward walk specified in (\ref{eq:lem:crat0:sfdv0:sprc0:03:09}) walks downward in absolute steps of $K_4-K_2$, this corresponds to $(K_4-K_2) / \gcd(K_2, K_4)$ \emph{elements} of $\phi_2$, since the elements of $\phi_2$ are separated by $\gcd(K_2, K_4)$. Given the ``downward walk'' specified in (\ref{eq:lem:crat0:sfdv0:sprc0:03:09}), the only question to be answered is how many consecutive values of \texttt{state}, separated by $K_4-K_2$ (or $(K_4-K_2)/\gcd(K_2, K_4)$ elements), can ``fit'' into $\phi_2$. Considering that $n(\phi_2) = K_2/\gcd(K_2, K_4)$ (Eq. \ref{eq:lem:crat0:sfdv0:sprc0:03:06}) and that the downward step represents $(K_4-K_2)/\gcd(K_2, K_4)$ set elements, (\ref{eq:lem:crat0:sfdv0:sprc0:03:01}) comes immediately by a graphical argument. We now prove (\ref{eq:lem:crat0:sfdv0:sprc0:03:02}). This can be proved using exactly the same arguments as for (\ref{eq:lem:crat0:sfdv0:sprc0:03:01}), but considering the upward walk through $\phi_1$ rather than the downward walk through $\phi_2$. As with Lemma \ref{lem:crat0:sfdv0:sprc0:02}, note that the choices of $K_1$ and $K_3$ do not materially affect the proof above. $K_1$ and $K_3$ only set a delay until the rational counting algorithm reaches steady state. $K_3$ only shifts the sets $\phi_1$ and $\phi_2$. \end{vworklemmaproof} \begin{vworklemmaparsection}{Remark \#1} This lemma proves an \emph{extremely} important property for the usability of Algorithm \ref{alg:crat0:sfdv0:01a}. It says that once steady state has been reached, the variability in the number of consecutive times ``\texttt{A()}'' is run or not run is at most one count. \end{vworklemmaparsection} \begin{vworklemmaparsection}{Remark \#2} It is probably also possible to construct a rational counting algorithm so that the number of consecutive times ``\texttt{A()}'' is run is constant, but the algorithm achieves long-term accuracy by varying only the number of consecutive times ``\texttt{A()}'' is not run (or vice-versa), but this is not done here. \end{vworklemmaparsection} \begin{vworklemmaparsection}{Remark \#3} There is no requirement that $K_2$ and $K_4$ be coprime. In fact, as demonstrated later, it may be advantageous to choose a large $K_2$ and $K_4$ to approximate a simple ratio so that very fine adjustments can be made. For example, if the ideal ratio is 1/2, it may be desirable in some applications to choose $K_2$=1,000 and $K_4$=2,000 so that fine adjustments can be made by slightly perturbing $K_2$ or $K_4$. One might adjust 1,000/2,000 downward to 999/2,000 or upward to 1,001/2,000 by modifying $K_2$ (both very fine adjustments). \end{vworklemmaparsection} \begin{vworklemmaparsection}{Remark \#4} The most common choice of $K_1$ in practice is 0. If $K_1=0$ is chosen, it can be shown that the number of initial invocations of the base subroutine is in the set identified in (\ref{eq:lem:crat0:sfdv0:sprc0:03:01}). (See Exercise \ref{exe:crat0:sexe0:07}.) \end{vworklemmaparsection} \vworklemmafooter{} For microcontroller work, it is considered a desirable property that software components be resilient to state upset (see Section \chgrzeroxrefhyphen\ref{chgr0:sdda0:srob0}). It can be observed that Algorithm \ref{alg:crat0:sfdv0:01a} will exhibit very anomalous behavior if \texttt{state} is upset to a very negative value. One possible correction to this shortcoming is illustrated in Figure \ref{fig:crat0:sfdv0:sprc0:01}. Other possible corrections are the topic of Exercise \ref{exe:crat0:sexe0:08}. \begin{figure} \begin{verbatim} /* The constants K1 through K4, which parameterize the */ /* counting behavior, are assumed assigned elsewhere in */ /* the code. The solution is analyzed in terms of the */ /* parameters K1 through K4. */ /* */ /* We also place the following restrictions on K1 through */ /* K4: */ /* K1 : K1 <= K3 - K2. */ /* K2 : K4 > K2 > 0. */ /* K3 : No restrictions. */ /* K4 : K4 > K2 > 0. */ void base_rate_func(void) { static int state = K1; state += K2; if ((state < K1) || (state >= K3)) { state -= K4; A(); } } \end{verbatim} \caption{Algorithm \ref{alg:crat0:sfdv0:01a} With State Upset Shortcoming Corrected} \label{fig:crat0:sfdv0:sprc0:01} \end{figure} \begin{vworkexamplestatement} \label{ex:crat0:sfdv0:sprc0:01} Determine the behavior of Algorithm \ref{alg:crat0:sfdv0:01a} with $K_1=0$, $K_2=30$, and $K_3=K_4=50$. \end{vworkexamplestatement} \begin{vworkexampleparsection}{Solution} We first predict the behavior, and then trace the algorithm to verify whether the predictions are accurate. We make the following predictions: \begin{itemize} \item The steady state sequence of invocations of ``\texttt{A()}'' will be periodic with period $K_4/\gcd(K_2, K_4) = 50/10 = 5$, as described in Lemma \ref{lem:crat0:sfdv0:sprc0:02}. \item The number of initial invocations of the base subroutine in which ``\texttt{A()}'' is not run will be $\lceil (K_4 - K_2) / K_2 \rceil = \lceil 2/3 \rceil = 1$, as described in Remark \#4 of Lemma \ref{lem:crat0:sfdv0:sprc0:03} and in the solution to Exercise \ref{exe:crat0:sexe0:07}. \item In steady state, the number of consecutive invocations of the base subroutine during which ``\texttt{A()}'' is not executed will always be 1, as described in Equation \ref{eq:lem:crat0:sfdv0:sprc0:03:02} of Lemma \ref{lem:crat0:sfdv0:sprc0:03}. (Applying Eq. \ref{eq:lem:crat0:sfdv0:sprc0:03:02} yields \ $\{ \lfloor 20/30 \rfloor , \lceil 20/30 \rceil \} \cap \vworkintsetpos % = \{ 0,1 \} \cap \{1, 2, \ldots \} = \{ 1 \}$.) \item In steady state, the number of consecutive invocations of the base subroutine during which ``\texttt{A()}'' is executed will always be 1 or 2, as described in Equation \ref{eq:lem:crat0:sfdv0:sprc0:03:01} of Lemma \ref{lem:crat0:sfdv0:sprc0:03}. (Applying Eq. \ref{eq:lem:crat0:sfdv0:sprc0:03:01} yields \ $\{ \lfloor 30/20 \rfloor , \lceil 30/20 \rceil \} \cap \vworkintsetpos % = \{ 1,2 \} \cap \{1, 2, \ldots \} = \{ 1,2 \}$.) \item The rational counting algorithm will have perfect long-term accuracy. \end{itemize} We can verify the predictions above by tracing the behavior of Algorithm \ref{alg:crat0:sfdv0:01a}. We adopt the convention that $A_n = 1$ if subroutine ``\texttt{A()}'' is invoked during the $n$th invocation of the base subroutine. Table \ref{tbl:crat0:sfdv0:sprc0:01} contains the results of tracing Algorithm \ref{alg:crat0:sfdv0:01a} with $K_1=0$, $K_2=30$, and $K_3=K_4=50$. \begin{table} \caption{Trace Of Algorithm \ref{alg:crat0:sfdv0:01a} With $K_1=0$, $K_2=30$, And $K_3=K_4=50$ (Example \ref{ex:crat0:sfdv0:sprc0:01})} \label{tbl:crat0:sfdv0:sprc0:01} \begin{center} \begin{tabular}{|c|c|c|} \hline Index ($n$) & $state_n$ & $A_n$ \\ \hline \hline 0 & 0 & N/A \\ \hline 1 & 30 & 0 \\ \hline 2 & 10 & 1 \\ \hline 3 & 40 & 0 \\ \hline 4 & 20 & 1 \\ \hline 5 & 0 & 1 \\ \hline 6 & 30 & 0 \\ \hline 7 & 10 & 1 \\ \hline 8 & 40 & 0 \\ \hline 9 & 20 & 1 \\ \hline 10 & 0 & 1 \\ \hline \end{tabular} \end{center} \end{table} It can be verfied from the table that all of the predicted properties are exhibited by the algorithm. \end{vworkexampleparsection} \vworkexamplefooter{} A second characteristic of Algorithm \ref{alg:crat0:sfdv0:01a} that should be analyzed carefully is the behavior of the algorithm if parameters $K_2$ and $K_4$ are adjusted ``on the fly''. ``On-the-fly'' adjustment raises the following concerns. We assume for convenience that $K_1=0$ and $K_3=K_4$. \begin{enumerate} \item \label{enum:crat0:sfdv0:sprc0:01:01} \textbf{Critical section protocol:} if the rational counting algorithm is implemented in a process which is asynchronous to the process which desires to change $K_2$ and $K_4$, what precautions must be taken? \item \label{enum:crat0:sfdv0:sprc0:01:02} \textbf{Anomalous behavior:} will the rational counting algorithm behave in a \emph{very} unexpected way if $K_2$ and $K_4$ are changed on the fly? \item \label{enum:crat0:sfdv0:sprc0:01:03} \textbf{Preservation of accuracy:} even if the behavior exhibited is not \emph{extremely} anomalous, how should $K_2$ and $K_4$ be modified on the fly so as to preserve the maximum accuracy? \end{enumerate} \textbf{(Concern \#\ref{enum:crat0:sfdv0:sprc0:01:02}):} It can be observed with Algorithm \ref{alg:crat0:sfdv0:01a} that neither increasing nor decreasing $K_2$ nor $K_4$ on the fly will lead to \emph{highly} anomalous behavior. Each invocation of the algorithm will map \texttt{state} back into the set identified in (\ref{eq:lem:crat0:sfdv0:sprc0:02:09}). Thus on-the-fly changes to $K_2$ and $K_4$ will establish the rational counting algorithm immediately into steady-state behavior, and the result will not be \emph{highly} anomalous if such on-the-fly changes are not made very often. \textbf{(Concern \#\ref{enum:crat0:sfdv0:sprc0:01:03}):} It can be deduced from (\ref{eq:lem:crat0:sfdv0:sprc0:04:02}), (\ref{eq:lem:crat0:sfdv0:sprc0:04:03}), and (\ref{eq:lem:crat0:sfdv0:sprc0:04:04}) that the value of the \texttt{state} variable in Algorithm \ref{alg:crat0:sfdv0:01a} satisfies the relationship \begin{equation} \label{eq:crat0:sfdv0:sprc0:01} \overline{N_O} - N_O = \frac{state}{K_4} ; \end{equation} \noindent{}in other words, the \texttt{state} variable contains the remainder of an effective division by $K_4$ and thus maintains the fractional part of $\overline{N_O}$. Altering $K_4$ on the fly to a new value (say, $\overline{K_4}$) may be problematic, because to preserve the current fractional part of $\overline{N_O}$, one must adjust it for the new denominator $\overline{K_4}$. This requires solving the equation \begin{equation} \label{eq:crat0:sfdv0:sprc0:02} \frac{state}{K_4} = \frac{n}{\;\;\overline{K_4}\;\;} \end{equation} \noindent{}for $n$ which must be an integer to avoid loss of information. In general, this would require that $K_4 \vworkdivides \overline{K_4}$, a constraint which would be rarely met. Thus, for high-precision applications where a new rational counting rate should become effective seamlessly, the best strategy would seem to be to modify $K_2$ only. It can be verified that modifying $K_2$ on the fly accomplishes a perfect rate transition. \textbf{(Concern \#\ref{enum:crat0:sfdv0:sprc0:01:01}):} In microcontroller work, ordinal data types often represent machine-native data types. In such cases, it may be possible for one process to set $K_2$ or $K_4$ for another process that is asynchronous with respect to it by relying on the atomicity of machine instructions (i.e. without formal mutual exclusion protocol). However, in other cases where the ordinal data types of $K_2$ or $K_4$ are larger than can be accomodated by a single machine instruction or where $K_2$ and $K_4$ must be modified together atomically, mutual exclusion protocol should be used to prevent anomalous behavior due to race conditions (see Exercise \ref{exe:crat0:sexe0:14}). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Properties Of Algorithm \ref{alg:crat0:sfdv0:01b}] {Properties Of Rational Counting Algorithm \ref{alg:crat0:sfdv0:01b}} %Section tag: PRC1 \label{crat0:sfdv0:sprc1} Algorithm \ref{alg:crat0:sfdv0:01a} has the disadvantage that it requires $K_2/K_4 < 1$ (i.e. it can only decrease frequency, but never increase frequency). This deficiency can be corrected by using Algorithm \ref{alg:crat0:sfdv0:01b}. Note that Algorithm \ref{alg:crat0:sfdv0:01b} will properly deal with $K_2$ and $K_4$ chosen such that $0 < K_2/K_4 < \infty$. The most common reason that one may want a counting algorithm that will correctly handle $K_2/K_4 \geq 1$ is to conveniently handle $K_2/K_4 \approx 1$. In practice, $K_2/K_4$ may represent a quantity that is normally very close to 1 but may also be slightly less than or slightly greater than 1. For example, one may use $K_2/K_4 \approx 1$ to correct for a crystal or a resonator which deviates slightly from its nominal frequency. We illustrate this with the following example. \begin{vworkexamplestatement} \label{ex:crat0:sfdv0:sprc1:01} A microcontroller software load keeps time via an interrupt service routine that runs every 1ms, but this frequency may be off by as much as 1 part in 10,000 due to variations in crystal or resonator manufacture. The interrupt service routine updates a counter which represents the number of milliseconds elapsed since the software load was reset. Devise a rational counting strategy based on Algorithm \ref{alg:crat0:sfdv0:01b} which will allow the time accuracy to be trimmed to within one second per year or less by adjusting only $K_4$, and implement the counting strategy in software. \end{vworkexamplestatement} \begin{vworkexampleparsection}{Solution} $K_2/K_4$ will be nominally very close to 1 ($K_2 \approx K_4$). If we assume that each year has 365.2422\footnote{The period of the earth's rotation about the sun is not an integral number of days, which is why the rules for leap years exist. Ironically, the assignment of leap years is itself a problem very similar to the rational counting problems discussed in this chapter.} days ($\approx$ 31,556,926 seconds), then choosing $K_2 \approx K_4 = 31,556,926$ will yield satisfactory results. If we may need to compensate for up to 1 part in 10,000 of crystal or resonator inaccuracy, we may need to adjust $K_2$ as low as 0.9999 $\times$ 31,556,926 $\approx$ 31,553,770 (to compensate for a fast crystal or resonator) or as high as 1.0001 $\times$ 31,556,926 $\approx$ 31,560,082 (to compensate for a slow crystal or resonator). Choosing $K_4 = 31,556,926$ yields the convenient relationship that each count in $K_2$ corresponds to one second per year. \begin{figure} \begin{verbatim} /* The constants K1 through K4, which parameterize the */ /* counting behavior, are assumed assigned elsewhere in */ /* the code. */ /* */ /* The variable time_count below is the number of milli- */ /* seconds since the software was reset. */ int time_count = 0; /* It is assumed that the base rate subroutine below is */ /* called every millisecond (or, at least what should be */ /* every millisecond of the crystal or resonator were */ /* perfect). */ void base_rate_sub(void) { static int state = K1; state += K2; while (state >= K3) { state -= K4; time_count++; } } \end{verbatim} \caption{Algorithm \ref{alg:crat0:sfdv0:01b} Applied To Timekeeping (Example \ref{ex:crat0:sfdv0:sprc1:01})} \label{fig:ex:crat0:sfdv0:sprc1:01:01} \end{figure} Figure \ref{fig:ex:crat0:sfdv0:sprc1:01:01} provides an illustration of Algorithm \ref{alg:crat0:sfdv0:01b} applied in this scenario. We assume that $K_4$ contains the constant value 31,556,926 and that $K_2$ is modified about this value either downwards or upwards to trim the timekeeping. Note that Algorithm \ref{alg:crat0:sfdv0:01b} will correctly handle $K_2 \geq K_4$. Also note in the implementation illustrated in Figure \ref{fig:ex:crat0:sfdv0:sprc1:01:01} that large integers (27 bits or more) are required. (See also Exercise \ref{exe:crat0:sexe0:09}). \end{vworkexampleparsection} \vworkexamplefooter{} It may not be obvious whether Algorithm \ref{alg:crat0:sfdv0:01b} has the same or similar desirable properties as Algorithm \ref{alg:crat0:sfdv0:01a} presented in Lemmas \ref{lem:crat0:sfdv0:sprc0:01}, \ref{lem:crat0:sfdv0:sprc0:02}, \ref{lem:crat0:sfdv0:sprc0:04}, and \ref{lem:crat0:sfdv0:sprc0:03}. Algorithm \ref{alg:crat0:sfdv0:01b} does have these desirable properties, and these properties are presented as Lemmas \ref{lem:crat0:sfdv0:sprc1:01}, \ref{lem:crat0:sfdv0:sprc1:02}, \ref{lem:crat0:sfdv0:sprc1:03}, and \ref{lem:crat0:sfdv0:sprc1:04}. The proofs of these lemmas are identical or very similar to the proofs of Lemmas \ref{lem:crat0:sfdv0:sprc0:01}, \ref{lem:crat0:sfdv0:sprc0:02}, \ref{lem:crat0:sfdv0:sprc0:04}, and \ref{lem:crat0:sfdv0:sprc0:03}; and so these proofs when not identical are presented as exercises. Note that Algorithm \ref{alg:crat0:sfdv0:01b} behaves identically to Algorithm \ref{alg:crat0:sfdv0:01a} when $K_2 < K_4$, and the case of $K_2=K_4$ is trivial, so in general only the behavior when $K_2 > K_4$ remains to be proved. \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc1:01} $N_{STARTUP}$, the number of invocations of the base subroutine in Algorithm \ref{alg:crat0:sfdv0:01b} before ``\texttt{A()}'' is called for the first time, is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprc1:01:01} N_{STARTUP} = \left\lceil { \frac{-K_1 - K_2 + K_3}{K_2} } \right\rceil . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} The proof is identical to the proof of Lemma \ref{lem:crat0:sfdv0:sprc0:01}. \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc1:02} Let $N_I$ be the number of times the Algorithm \ref{alg:crat0:sfdv0:01b} base subroutine is called, let $N_O$ be the number of times the ``\texttt{A()}'' subroutine is called, let $f_I$ be the frequency of invocation of the Algorithm \ref{alg:crat0:sfdv0:01a} base subroutine, and let $f_O$ be the frequency of invocation of ``\texttt{A()}''. \begin{equation} \label{eq:lem:crat0:sfdv0:sprc1:02:01} \lim_{N_I\rightarrow\infty}\frac{N_O}{N_I} = \frac{f_O}{f_I} = \frac{K_2}{K_4} . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} See Exercise \ref{exe:crat0:sexe0:10}. \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc1:03} If $K_3=K_4$, $K_1=0$, and $\gcd(K_2, K_4)=1$\footnote{See also footnote \ref{footnote:lem:crat0:sfdv0:sprc0:04:01} in this chapter.}, the error between the approximation to $N_O$ implemented by Algorithm \ref{alg:crat0:sfdv0:01b} and the ``ideal'' mapping is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc1:03:01} \left[ - \frac{K_4 - 1}{K_4} , 0 \right] , \end{equation} and no algorithm can be constructed to confine the error to a smaller interval. \end{vworklemmastatement} \begin{vworklemmaproof} The proof is identical to the proof of Lemma \ref{lem:crat0:sfdv0:sprc0:04}. \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc1:04} For Algorithm \ref{alg:crat0:sfdv0:01b} with $K_2 \geq K_4$, once steady state has been achieved (see Exercise \ref{exe:crat0:sexe0:13}), each invocation of the base subroutine will result in a number of invocations of ``\texttt{A()}'' which is in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc1:04:01} \left\{ \left\lfloor \frac{K_2}{K_4} \right\rfloor , \left\lceil \frac{K_2}{K_4} \right\rceil \right\}, \end{equation} which contains one integer if $K_4 \vworkdivides K_2$, or two integers otherwise. With $K_2 < K_4$, the behavior will be as specified in Lemma \ref{lem:crat0:sfdv0:sprc0:03}. \end{vworklemmastatement} \begin{vworklemmaproof} See Exercise \ref{exe:crat0:sexe0:12}. \end{vworklemmaproof} \begin{vworklemmaparsection}{Remark} Note that Lemma \ref{lem:crat0:sfdv0:sprc0:03} and this lemma specify different aspects of behavior, which is why (\ref{eq:lem:crat0:sfdv0:sprc0:03:01}) and (\ref{eq:lem:crat0:sfdv0:sprc0:03:02}) take different forms than (\ref{eq:lem:crat0:sfdv0:sprc1:04:01}). Lemma \ref{lem:crat0:sfdv0:sprc0:03} specifies the number of consecutive invocations of the base subroutine for which ``\texttt{A()}'' will be run, but with $K_2 \geq K_4$ it does not make sense to specify behavior in this way since ``\texttt{A()}'' will be run on \emph{every} invocation of the base subroutine. This lemma specifies the number of times ``\texttt{A()}'' will be run on a \emph{single} invocation of the base subroutine (which is not meaningful if $K_2 < K_4$ since the result will always be 0 or 1). \end{vworklemmaparsection} %\vworklemmafooter{} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Properties Of Algorithm \ref{alg:crat0:sfdv0:01c}] {Properties Of Rational Counting Algorithm \ref{alg:crat0:sfdv0:01c}} %Section tag: PRX0 \label{crat0:sfdv0:sprx0} Algorithm \ref{alg:crat0:sfdv0:01c}\footnote{Algorithm \ref{alg:crat0:sfdv0:01c} was contributed in March, 2003 by Chuck B. Falconer \cite{bibref:i:chuckbfalconer} via the \texttt{comp.arch.embedded} \cite{bibref:n:comparchembedded} newsgroup.} is a variant of Algorithm \ref{alg:crat0:sfdv0:01a} which has one fewer degrees of freedom than Algorithms \ref{alg:crat0:sfdv0:01a} and \ref{alg:crat0:sfdv0:01b} and can be implemented more efficiently under most instruction sets. Algorithm \ref{alg:crat0:sfdv0:01c} is superior to Algorithms \ref{alg:crat0:sfdv0:01a} and \ref{alg:crat0:sfdv0:01b} from a computational efficiency point of view, but is less intuitive. The superiority in computational efficiency of Algorithm \ref{alg:crat0:sfdv0:01c} comes from the possibility of using an implicit test against zero (rather than an explicit test against $K_3$, as is found in Algorithms \ref{alg:crat0:sfdv0:01a} and \ref{alg:crat0:sfdv0:01b}). Many machine instruction sets automatically set flags to indicate a negative result when the subtraction of $K_2$ is performed, thus often allowing a conditional branch without an additional instruction. Whether an instruction will be saved in the code of Figure \ref{fig:crat0:sfdv0:01c} depends on the sophistication of the `C' compiler, but of course if the algorithm were coded in assembly-language an instruction could be saved on most processors. The properties of rational counting Algorithm \ref{alg:crat0:sfdv0:01c} are nearly identical to those of Algorithm \ref{alg:crat0:sfdv0:01a}, and we prove the important properties now. \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx0:01} $N_{STARTUP}$, the number of invocations of the base subroutine in Algorithm \ref{alg:crat0:sfdv0:01c} before ``\texttt{A()}'' is called for the first time, is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:01:01} N_{STARTUP} = \left\lfloor { \frac{K_1}{K_2} } \right\rfloor . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} The value of \texttt{state} when tested against zero in the \texttt{if()} statement during the $n$th invocation of the base subroutine is $K_1 - n K_2$. In order for the test not to be met on the $n$th invocation of the base subroutine, we require that \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:01:02} K_1 - n K_2 \geq 0 \end{equation} \noindent{}or equivalently that \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:01:03} n \leq \frac{K_1}{K_2} . \end{equation} Solving (\ref{eq:lem:crat0:sfdv0:sprx0:01:03}) for the largest value of $n \in \vworkintset$ which still meets the criterion yields (\ref{eq:lem:crat0:sfdv0:sprx0:01:01}). Note that the derivation of (\ref{eq:lem:crat0:sfdv0:sprx0:01:01}) requires that the restrictions on $K_1$, $K_2$, and $K_3$ documented in Figure \ref{fig:crat0:sfdv0:01c} be met. \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx0:02} Let $N_I$ be the number of times the Algorithm \ref{alg:crat0:sfdv0:01c} base subroutine is called, let $N_O$ be the number of times the ``\texttt{A()}'' subroutine is called, let $f_I$ be the frequency of invocation of the Algorithm \ref{alg:crat0:sfdv0:01a} base subroutine, and let $f_O$ be the frequency of invocation of ``\texttt{A()}''. Provided the constraints on $K_1$, $K_2$, and $K_3$ documented in Figure \ref{fig:crat0:sfdv0:01c} are met, \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:02:01} \lim_{N_I\rightarrow\infty}\frac{N_O}{N_I} = \frac{f_O}{f_I} = \frac{K_2}{K_4} . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} (\ref{eq:lem:crat0:sfdv0:sprx0:02:01}) indicates that once an initial delay (determined by $K_1$) has finished, $N_O/N_I$ will converge on a steady-state value of $K_2/K_4$. The most straightforward way to analyze Algorithm \ref{alg:crat0:sfdv0:01c} is to show how an algorithm already understood (Algorithm \ref{alg:crat0:sfdv0:01a}) can be transformed to Algorithm \ref{alg:crat0:sfdv0:01c} in a way where the analysis of Algorithm \ref{alg:crat0:sfdv0:01a} also applies to Algorithm \ref{alg:crat0:sfdv0:01c}. Figure \ref{fig:lem:crat0:sfdv0:sprx0:02:01} shows how such a transformation can be performed in four steps. \begin{figure} (a) Algorithm \ref{alg:crat0:sfdv0:01a} unchanged. $state_{a,n} \in \{0, 1, \ldots, K_4 - 1 \}$. \begin{verbatim} state += K2; if (state >= K4) { state -= K4; A(); } \end{verbatim} (b) ``\texttt{>=}'' changed to ``\texttt{>}''. $state_{b,n} \in \{1, 2, \ldots, K_4 \}$, $state_{b,n} = state_{a,n} + 1$. \begin{verbatim} state += K2; if (state > K4) { state -= K4; A(); } \end{verbatim} (c) Test against $K_4$ changed to test against zero. $state_{c,n} \in \{-K_4 + 1, -K_4 + 2, \ldots, 0 \}$, $state_{c,n} = state_{b,n} - K_4$. \begin{verbatim} state += K2; if (state > 0) { state -= K4; A(); } \end{verbatim} (d) Sign inversion. $state_{d,n} \in \{ 0, 1, \ldots, K_4 - 1 \}$, $state_{d,n} = - state_{c,n}$. \begin{verbatim} state -= K2; if (state < 0) { state += K4; A(); } \end{verbatim} (e) `C' expression rearrangement. $state_{e,n} \in \{ 0, 1, \ldots, K_4 - 1 \}$, $state_{e,n} = state_{d,n}$. \begin{verbatim} if ((state -= K2) < 0) { state += K4; A(); } \end{verbatim} \caption{4-Step Transformation Of Algorithm \ref{alg:crat0:sfdv0:01a} To Algorithm \ref{alg:crat0:sfdv0:01c}} \label{fig:lem:crat0:sfdv0:sprx0:02:01} \end{figure} In Figure \ref{fig:lem:crat0:sfdv0:sprx0:02:01}, each of the four steps required to transform from Algorithm \ref{alg:crat0:sfdv0:01a} to Algorithm \ref{alg:crat0:sfdv0:01c} includes an equation to transform the \texttt{state} variable. Combining all of these transformations yields \begin{eqnarray} \label{eq:lem:crat0:sfdv0:sprx0:02:02} state_{e,n} & = & K_4 - 1 - state_{a,n} \\ \label{eq:lem:crat0:sfdv0:sprx0:02:03} state_{a,n} & = & K_4 - 1 - state_{e,n} \end{eqnarray} We thus see that Figure \ref{fig:lem:crat0:sfdv0:sprx0:02:01}(a) (corresponding to Algorithm \ref{alg:crat0:sfdv0:01a}) and Figure \ref{fig:lem:crat0:sfdv0:sprx0:02:01}(e) (corresponding to Algorithm \ref{alg:crat0:sfdv0:01c}) have \texttt{state} semantics which involve the same range but a reversed order. (\ref{eq:lem:crat0:sfdv0:sprx0:02:01}) follows directly from this observation and from Lemma \ref{lem:crat0:sfdv0:sprc0:02}. \end{vworklemmaproof} %\vworklemmafooter{} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx0:03} If $K_1=0$ and $\gcd(K_2, K_4)=1$\footnote{See also footnote \ref{footnote:lem:crat0:sfdv0:sprc0:04:01} in this chapter.}, the error between the approximation to $N_O$ implemented by Algorithm \ref{alg:crat0:sfdv0:01c} and the ``ideal'' mapping is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:03:01} \left[ 0, \frac{K_4 - 1}{K_4} \right] , \end{equation} and no algorithm can be constructed to confine the error to a smaller interval. \end{vworklemmastatement} \begin{vworklemmaproof} Using the duality illustrated by (\ref{eq:lem:crat0:sfdv0:sprx0:02:02}) and (\ref{eq:lem:crat0:sfdv0:sprx0:02:03}), starting Algorithm \ref{alg:crat0:sfdv0:01c} with $state_0=0$ will yield a dual state vector with respect to starting Algorithm \ref{alg:crat0:sfdv0:01a} with $state_0=K_4-1$. Thus, \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:03:02} N_O = \left\lfloor \frac{n K_2 + K_4 - 1}{K_4} \right\rfloor . \end{equation} Using this altered value of $N_O$ in (\ref{eq:lem:crat0:sfdv0:sprc0:04:04}) leads directly to (\ref{eq:lem:crat0:sfdv0:sprx0:03:01}). The proof that there can be no better algorithm is identical to the same proof for Lemma \ref{lem:crat0:sfdv0:sprc0:04} (Exercise \ref{exe:crat0:sexe0:06}). \end{vworklemmaproof} %\vworklemmafooter{} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx0:04} For Algorithm \ref{alg:crat0:sfdv0:01c}, once steady state has been achieved, the number of consecutive base subroutine invocations during which subroutine ``\texttt{A()}'' is executed is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:04:01} \left\{ \left\lfloor \frac{K_2}{K_4 - K_2} \right\rfloor , \left\lceil \frac{K_2}{K_4 - K_2} \right\rceil \right\} \cap \vworkintsetpos, \end{equation} which contains one integer if $K_2/K_4 \leq 1/2$ or $(K_4-K_2) \vworkdivides K_2$, or two integers otherwise. Once steady state has been achieved, the number of consecutive base function invocations during which subroutine ``\texttt{A()}'' is not executed is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprx0:04:02} \left\{ \left\lfloor \frac{K_4-K_2}{K_2} \right\rfloor , \left\lceil \frac{K_4-K_2}{K_2} \right\rceil \right\} \cap \vworkintsetpos, \end{equation} which contains one integer if $K_2/K_4 \geq 1/2$ or $K_2 \vworkdivides K_4$, or two integers otherwise. \end{vworklemmastatement} \begin{vworklemmaproof} The proof comes directly from the duality between algorithm Algorithms \ref{alg:crat0:sfdv0:01a} and \ref{alg:crat0:sfdv0:01c} established in the proof of Lemma \ref{lem:crat0:sfdv0:sprx0:01}, so that the results from Lemma \ref{lem:crat0:sfdv0:sprc0:03} apply without modification. \end{vworklemmaproof} \vworklemmafooter{} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Properties Of Algorithm \ref{alg:crat0:sfdv0:01d}] {Properties Of Rational Counting Algorithm \ref{alg:crat0:sfdv0:01d}} %Section tag: PRX1 \label{crat0:sfdv0:sprx1} Algorithm \ref{alg:crat0:sfdv0:01d}\footnote{Algorithm \ref{alg:crat0:sfdv0:01d} was contributed in March, 2003 by John Larkin \cite{bibref:i:johnlarkin} via the \texttt{comp.arch.embedded} \cite{bibref:n:comparchembedded} newsgroup.} (Figure \ref{fig:crat0:sfdv0:01d}) is a further economization of Algorithms \ref{alg:crat0:sfdv0:01a} through \ref{alg:crat0:sfdv0:01c} that can be made by eliminating the addition or subtraction of $K_4$ and test against $K_3$ and instead using the inherent machine integer size of $W$ bits to perform arithmetic modulo $2^W$. Thus, effectively, Algorithm \ref{alg:crat0:sfdv0:01d} is equivalent to Algorithm \ref{alg:crat0:sfdv0:01a} with $K_4 = K_3 = 2^W$. Figure \ref{fig:crat0:sfdv0:01d} shows both assembly-language (Texas Instruments TMS-370C8) and `C' implementations of the algorithm. The assembly-language version uses the carry flag of the processor and thus is \emph{very} efficient. Because `C' does not have access to the processor flags, the 'C' version is less efficient. The ``less than'' comparison when using unsigned integers is equivalent to a rollover test. It is easy to see from the figure that Algorithm \ref{alg:crat0:sfdv0:01d} is equivalent in all respects to Algorithm \ref{alg:crat0:sfdv0:01a} with $K_3 = K_4$ fixed at $2^W$. It is not necessary to enforce any constraints on $K_2$ because $K_2 < K_3 = K_4 = 2^W$ due to the inherent size of a machine integer. Note that unlike Algorithms \ref{alg:crat0:sfdv0:01a} through \ref{alg:crat0:sfdv0:01c} which allow $K_2$ and $K_4$ to be chosen independently and from the Farey series of appropriate order, Algorithm \ref{alg:crat0:sfdv0:01c} only allows $K_2/K_4$ of the form $K_2/2^W$. The properties below follow immediately from the properties of Algorithm \ref{alg:crat0:sfdv0:01a}. \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx1:01} $N_{STARTUP}$, the number of invocations of the base subroutine in Algorithm \ref{alg:crat0:sfdv0:01d} before ``\texttt{A()}'' is called for the first time, is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:01:01} N_{STARTUP} = \left\lfloor { \frac{2^W - K_1 - 1}{K_2} } \right\rfloor . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} The value of \texttt{state} after the $n$th invocation is $state_n = K_1 + n K_2$. In order for the test in the \texttt{if()} statement not to be met, we require that \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:01:02} K_1 + n K_2 \leq 2^W - 1 \end{equation} \noindent{}or equivalently that \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:01:03} n \leq \frac{2^W - K_1 - 1}{K_2} . \end{equation} Solving (\ref{eq:lem:crat0:sfdv0:sprx1:01:03}) for the largest value of $n \in \vworkintset$ which still meets the criterion yields (\ref{eq:lem:crat0:sfdv0:sprx1:01:01}). \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx1:02} Let $N_I$ be the number of times the Algorithm \ref{alg:crat0:sfdv0:01d} base subroutine is called, let $N_O$ be the number of times the ``\texttt{A()}'' subroutine is called, let $f_I$ be the frequency of invocation of the Algorithm \ref{alg:crat0:sfdv0:01d} base subroutine, and let $f_O$ be the frequency of invocation of ``\texttt{A()}''. Then \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:02:01} \lim_{N_I\rightarrow\infty}\frac{N_O}{N_I} = \frac{f_O}{f_I} = \frac{K_2}{2^W} , \end{equation} where $W$ is the number of bits in a machine unsigned integer. Note that $K_2 < 2^W$ since $K_2 \in \{ 0, 1, \ldots , 2^W-1 \}$. \end{vworklemmastatement} \begin{vworklemmaproof} The proof is identical to the proof of Lemma \ref{lem:crat0:sfdv0:sprc0:02} with $K_3=K_4=2^W$. Note that Algorithm \ref{alg:crat0:sfdv0:01a} calculates $n K_2 \bmod K_4$ by subtraction, whereas Algorithm \ref{alg:crat0:sfdv0:01d} calculates $n K_2 \bmod 2^W$ by the properties of a $W$-bit counter which is allowed to roll over. \end{vworklemmaproof} %\vworklemmafooter{} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx1:03} If $\gcd(K_2, 2^W)=1$\footnote{See also footnote \ref{footnote:lem:crat0:sfdv0:sprc0:04:01} in this chapter. Note also that in this context the condition $\gcd(K_2, 2^W)=1$ is equivalent to the condition that $K_2$ be odd.}, the error between the approximation to $N_O$ implemented by Algorithm \ref{alg:crat0:sfdv0:01d} and the ``ideal'' mapping is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:03:01} \left[ - \frac{2^W - 1}{2^W} , 0 \right] , \end{equation} and no algorithm can be constructed to confine the error to a smaller interval. \end{vworklemmastatement} \begin{vworklemmaproof} The proof is identical to the proof of Lemma \ref{lem:crat0:sfdv0:sprc0:04} with $K_4 = 2^W$. \end{vworklemmaproof} %\vworklemmafooter{} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprx1:04} For Algorithm \ref{alg:crat0:sfdv0:01d} (Figure \ref{fig:crat0:sfdv0:01d}), once steady state has been achieved, the number of consecutive base subroutine invocations during which subroutine ``\texttt{A()}'' is executed is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:04:01} \left\{ \left\lfloor \frac{K_2}{2^W - K_2} \right\rfloor , \left\lceil \frac{K_2}{2^W - K_2} \right\rceil \right\} \cap \vworkintsetpos, \end{equation} which contains one integer if $K_2/2^W \leq 1/2$ or $(2^W-K_2) \vworkdivides K_2$, or two integers otherwise. Once steady state has been achieved, the number of consecutive base function invocations during which subroutine ``\texttt{A()}'' is not executed is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprx1:04:02} \left\{ \left\lfloor \frac{2^W-K_2}{K_2} \right\rfloor , \left\lceil \frac{2^W-K_2}{K_2} \right\rceil \right\} \cap \vworkintsetpos, \end{equation} which contains one integer if $K_2/2^W \geq 1/2$ or $K_2 \vworkdivides 2^W$, or two integers otherwise. \end{vworklemmastatement} \begin{vworklemmaproof} The proof is identical to the proof of Lemma \ref{lem:crat0:sfdv0:sprc0:03} with $K_4 = 2^W$. \end{vworklemmaproof} \vworklemmafooter{} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[Properties Of Algorithm \ref{alg:crat0:sfdv0:02a}] {Properties Of Rational Counting Algorithm \ref{alg:crat0:sfdv0:02a}} %Section tag: PRC2 \label{crat0:sfdv0:sprc2} Another useful rational counting algorithm is Algorithm \ref{alg:crat0:sfdv0:02a}. At first glance, it may appear that Algorithm \ref{alg:crat0:sfdv0:02a} is qualitatively different than Algorithms \ref{alg:crat0:sfdv0:01a} and \ref{alg:crat0:sfdv0:01b}. However, as the following lemmas demonstrate, Algorithm \ref{alg:crat0:sfdv0:02a} can be easily rearranged to be in the form of Algorithm \ref{alg:crat0:sfdv0:01a}. \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc2:01} $N_{STARTUP}$, the number of invocations of the base subroutine in Algorithm \ref{alg:crat0:sfdv0:02a} before ``\texttt{A()}'' is called for the first time, is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:01:01} N_{STARTUP} = \left\lceil { \frac{K_3 - K_1}{K_2} } \right\rceil . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} The value of \texttt{state} after the $n$th invocation is $K_1 + n K_2$. In order for the test in the \texttt{if()} statement to be met on the $n+1$'th invocation of the base subroutine, we require that \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:01:02} K_1 + n K_2 \geq K_3 \end{equation} \noindent{}or equivalently that \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:01:03} n \geq \frac{K_3 - K_1}{K_2} . \end{equation} Solving (\ref{eq:lem:crat0:sfdv0:sprc2:01:03}) for the smallest value of $n \in \vworkintset$ which still meets the criterion yields (\ref{eq:lem:crat0:sfdv0:sprc2:01:01}). Note that the derivation of (\ref{eq:lem:crat0:sfdv0:sprc2:01:01}) requires that the restrictions on $K_1$ through $K_4$ documented in Figure \ref{fig:crat0:sfdv0:02a} be met. \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc2:02} Let $N_I$ be the number of times the Algorithm \ref{alg:crat0:sfdv0:02a} base subroutine is called, let $N_{OA}$ be the number of times the ``\texttt{A()}'' subroutine is called, let $f_I$ be the frequency of invocation of the Algorithm \ref{alg:crat0:sfdv0:02a} base subroutine, and let $f_{OA}$ be the frequency of invocation of ``\texttt{A()}''. Then, the proportion of times the ``\texttt{A()}'' subroutine is called is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:02:01} \lim_{N_I\rightarrow\infty}\frac{N_{OA}}{N_I} = \frac{f_{OA}}{f_I} = \frac{K_2}{K_4 + K_2} , \end{equation} and the proportion of times the ``\texttt{B()}'' subroutine is called is given by \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:02:02} \lim_{N_I\rightarrow\infty}\frac{N_{OB}}{N_I} = \frac{f_{OB}}{f_I} = 1 - \frac{f_{OA}}{f_I} = \frac{K_4}{K_4 + K_2} . \end{equation} \end{vworklemmastatement} \begin{vworklemmaproof} As in Lemma \ref{} and without loss of generality, we assume for analytic convenience that $K_1=0$ and $K_3=K_4$. Note that $K_1$ and $K_3$ influence only the transient startup behavior of the algorithm. It can be observed from the algorithm that once steady state is achieved, \texttt{state} will be confined to the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:02:10} state \in \{ 0, 1, \ldots , K_4 + K_2 - 1 \} . \end{equation} It is certainly possible to use results from number theory and analyze which values in the set (\ref{eq:lem:crat0:sfdv0:sprc2:02:10}) can be attained and the order in which they can be attained. However, an easier approach is to observe that Algorithm \ref{alg:crat0:sfdv0:02a} can be rearranged to take the form of rational counting Algorithm \ref{alg:crat0:sfdv0:01a}. This rearranged algorithm is presented as Figure \ref{fig:lem:crat0:sfdv0:sprc2:02:01}. Note that the algorithm is rearranged only for easier analysis. \begin{figure} \begin{verbatim} void base_rate_sub(void) { static unsigned int state = K1; state += K2; if (state >= (K4 + K2)) { state -= (K4 + K2); A(); } else { B(); } } \end{verbatim} \caption{Algorithm \ref{alg:crat0:sfdv0:02a} Modified To Resemble Algorithm \ref{alg:crat0:sfdv0:01a} (Proof Of Lemma \ref{lem:crat0:sfdv0:sprc2:02})} \label{fig:lem:crat0:sfdv0:sprc2:02:01} \end{figure} In Figure \ref{fig:lem:crat0:sfdv0:sprc2:02:01}, the statement ``\texttt{state += K2}'' has been removed from the \texttt{else} clause and placed above the \texttt{if()} statement, and other constants have been adjusted accordingly. It can be observed that the figure is structurally identical to rational counting algorithm, except for the \texttt{else} clause (which does not affect the counting behavior) and the specific constants for testing and incrementation. In Figure \ref{fig:lem:crat0:sfdv0:sprc2:02:01}, as contrasted with Algorithm \ref{alg:crat0:sfdv0:01a}, ``$K_4 + K_2$'' takes the place of $K_4$. $\gcd(K_2, K_4 + K_2) = \gcd(K_2, K_4)$ (see Lemma \cprizeroxrefhyphen\ref{lem:cpri0:gcd0:01}), so the results from \end{vworklemmaproof} \begin{vworklemmastatement} \label{lem:crat0:sfdv0:sprc2:03} If $K_3=K_4$, $K_1=0$, and $\gcd(K_2, K_4)=1$, the error between the approximation to $N_O$ implemented by Algorithm \ref{alg:crat0:sfdv0:01b} and the ``ideal'' mapping is always in the set \begin{equation} \label{eq:lem:crat0:sfdv0:sprc2:03:01} \left[ - \frac{K_4 - 1}{K_4} , 0 \right] , \end{equation} and no algorithm can be constructed to confine the error to a smaller interval. \end{vworklemmastatement} \begin{vworklemmaproof} The proof is identical to Lemma \ref{lem:crat0:sfdv0:sprc0:04}. \end{vworklemmaproof} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Bresenham's Line Algorithm} %Section tag: BLA0 \label{crat0:sbla0} \index{Bresenham's line algorithm}\emph{Bresenham's line algorithm} is a very efficient algorithm for drawing lines on devices that have a rectangular array of pixels which can be individually illuminated. Bresenham's line algorithm is efficient for small microcontrollers because it relies only on integer addition, subtraction, shifting, and comparison. Bresenham's line algorithm is presented for two reasons: \begin{itemize} \item The algorithm is useful for drawing lines on LCD displays and other devices typically controlled by microcontrollers. \item The algorithm is an [extremely optimized] application of the rational counting algorithms presented in this chapter. \end{itemize} \begin{figure} \begin{center} \begin{huge} Figure Space Reserved \end{huge} \end{center} \caption{Raster Grid For Development Of Bresenham's Line Algorithm} \label{fig:crat0:sbla0:01} \end{figure} Assume that we wish to draw a line from $(0,0)$ to $(x_f, y_f)$ on a raster device (Figure \ref{fig:crat0:sbla0:01}). For simplicity of development, assume that $y_f \leq x_f$ (i.e. that the slope $m \leq 1$). For each value of $x \in \vworkintset$, the ideal value of $y$ is given by \begin{equation} \label{eq:crat0:sbla0:01} y = mx = \frac{y_f}{x_f} x = \frac{y_f x}{x_f} . \end{equation} \noindent{}However, on a raster device, we must usually choose an inexact pixel to illuminate, since it is typically rare that $x_f \vworkdivides y_f x$. If $x_f \vworkdivides y_f x$, then the ideal value of $y$ is an integer, and we choose to illuminate $(x, (y_f x)/x_f)$. However, if $x_f \vworknotdivides y_f x$, then we must choose either a pixel with the same y-coordinate as the previous pixel (we call this choice `D') or the pixel with a y-coordinate one greater than the previous pixel (we call this choice `U'). The fractional part of the quotient $(y_f x) / x_f$ indicates whether D or U is closer to the ideal line. If $y_f x \bmod x_f \geq x_f/2$, we choose U, otherwise we choose D (note that the decision to choose U in the equality case is arbitrary). Using the rational approximation techniques presented in Section \ref{crat0:sfdv0}, it is straightforward to develop an algorithm, which is presented as the code in Figure \ref{fig:crat0:sbla0:02}. Note that this code will only work if $m = y_f/x_f \leq 1$. \begin{figure} \begin{verbatim} /* Draws a line from (0,0) to (x_f,y_f) on a raster */ /* device. */ void bresenham_line(int x_f, int y_f) { int d=0; /* The modulo counter. */ int x=0, y=0; /* x- and y-coordinates currently being */ /* evaluated. */ int d_old; /* Remembers previous value of d. */ plotpoint(0,0); /* Plot initial point. */ while (x <= x_f) { d_old = d; d += y_f; if (d >= x_f) d -= x_f; x++; if ( ( (d == 0) && (d_old < x_f/2) ) || ( (d >= x_f/2) && ((d_old < x_f/2) || (d_old >= d)) ) ) y++; plotpoint(x,y); } } \end{verbatim} \caption{First Attempt At A Raster Device Line Algorithm Using Rational Counting Techniques} \label{fig:crat0:sbla0:02} \end{figure} There are a few efficiency refinements that can be made to the code in Figure \ref{fig:crat0:sbla0:02}, but overall it is a very efficient algorithm. Note that nearly all compilers will handle the integer division by two using a shift operation rather than a division. We can however substantially simplify and economize the code of Figure \ref{fig:crat0:sbla0:02} by using the technique presented in Figures \ref{fig:crat0:sfdv0:fab0:03} and \ref{fig:crat0:sfdv0:fab0:04}, and this improved code is presented as Figure \ref{fig:crat0:sbla0:03}. \begin{figure} \begin{verbatim} /* Draws a line from (0,0) to (x_f,y_f) on a raster */ /* device. */ void bresenham_line(int x_f, int y_f) { int d=y_f; /* Position of the ideal line minus */ /* the position of the line we are */ /* drawing, in units of 1/x_f. The */ /* initialization value is y_f because */ /* the algorithm is looking one pixel */ /* ahead in the x direction, so we */ /* begin at x=1. */ int x=0, y=0; /* x- and y-coordinates currently being */ /* evaluated. */ plotpoint(0,0); /* Plot initial point. */ while (x <= x_f) { x++; /* We move to the right regardless. */ if (d >= x_f/2) { /* The "U" choice. We must jump up a pixel */ /* to keep up with the ideal line. */ d += (y_f - x_f); y++; /* Jump up a pixel. */ } else /* d < x_f/2 */ { /* The "D" choice. Distance is not large */ /* enough to jump up a pixel. */ d += y_f; } plotpoint(x,y); } } \end{verbatim} \caption{Second Attempt At A Raster Device Line Algorithm Using Rational Counting Techniques} \label{fig:crat0:sbla0:03} \end{figure} In order to understand the code of Figure \ref{fig:crat0:sbla0:03}, it is helpful to view the problem in an alternate way. For any $x \in \vworkintset$, let $d$ be the distance between the position of the ideal line (characterized by $y = y_f x / x_f$) and the actual pixel which will be illuminated. It is easy to observe that: \begin{itemize} \item When drawing a raster line, if one proceeds from $(x, y)$ to $(x+1, y)$ (i.e. makes the ``D'' choice), $d$ will increase by $y_f/x_f$. \item When drawing a raster line, if one proceeds from $(x,y)$ to $(x+1, y+1)$ (i.e. makes the ``U'' choice), $d$ will increase by $(y_f - x_f)/x_f$. (The increase of $y_f/x_f$ comes about because the ideal line proceeds upward from $x$ to $x+1$, while the decrease of $x_f/x_f = 1$ comes about because the line being drawn jumps upward by one unit, thus tending to ``catch'' the ideal line.) \end{itemize} The code of Figure \ref{fig:crat0:sbla0:03} implements the two observations above in a straightforward way. $d$ is maintained in units of $1/x_f$, and when ``U'' is chosen over ``D'' whenever the gap between the ideal line and the current row of pixels being drawn becomes too large. The code in Figure \ref{fig:crat0:sbla0:03} does however contain logical and performance problems which should be corrected: \begin{itemize} \item The test of $d$ against $x_f/2$ will perform as intended. For example, if $d=2$ and $x_f=5$, the test ``\texttt{d >= x\_f/2}'' in the code will evaluate true although the actual condition is false. To correct this defect, the units of $d$ should be changed from $1/x_f$ to $1/(2 x_f)$. \item The quantity $y_f - x_f$ is calculated repeatedly. This calculation should be moved out of the \emph{while()} loop. \item The test against $x_f$ may be more economical if changed to a test against 0 (but this requires a different initialization assignment for $d$). \end{itemize} Figure \ref{fig:crat0:sbla0:04} corrects these defects from Figure \ref{fig:crat0:sbla0:03}. Figure \ref{fig:crat0:sbla0:04} is essentially the Bresenham line algorithm, except that it only draws starting from the origin and will only draw a line with a slope $m = y_f/x_f \leq 1$. \begin{figure} \begin{verbatim} /* Draws a line from (0,0) to (x_f,y_f) on a raster */ /* device. */ void bresenham_line(int x_f, int y_f) { int d = 2 * y_f - x_f; /* Position of the ideal line minus */ /* the position of the line we are */ /* drawing, in units of 1/(2 * x_f). */ /* Initialization value of 2 * y_f is */ /* because algorithm is looking one */ /* pixel ahead. Value of -x_f is from */ /* shifting the midpoint test (the */ /* "if" statement below) downward to a */ /* test against zero. */ int dD = 2 * y_f; int dU = dD - x_f; /* Amounts to add to d if "D" and "U" */ /* pixels are chosen, respectively. */ /* Calculated here outside of loop. */ int x=0, y=0; /* x- and y-coordinates currently being */ /* evaluated. */ plotpoint(0,0); /* Plot initial point. */ while (x <= x_f) { x++; /* We move to the right regardless. */ if (d >= 0) { /* The "U" choice. We must jump up a pixel */ /* to keep up with the ideal line. */ d += dU; y++; /* Jump up a pixel. */ } else /* d < 0 */ { /* The "D" choice. Distance is not large */ /* enough to jump up a pixel. */ d += dD; } plotpoint(x,y); } } \end{verbatim} \caption{Third Attempt At A Raster Device Line Algorithm Using Rational Counting Techniques} \label{fig:crat0:sbla0:04} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Authors And Acknowledgements} %Section tag: ACK0 This chapter was primarily written by \index{Ashley, David T.} David T. Ashley \cite{bibref:i:daveashley}. We would like to gratefully acknowledge the assistance of \index{Falconer, Chuck B.} Chuck B. Falconer \cite{bibref:i:chuckbfalconer}, \index{Hoffmann, Klaus} Klaus Hoffmann \cite{bibref:i:klaushoffmann}, \index{Larkin, John} John Larkin \cite{bibref:i:johnlarkin}, \index{Smith, Thad} Thad Smith \cite{bibref:i:thadsmith}, and \index{Voipio, Tauno} Tauno Voipio \cite{bibref:i:taunovoipio} for insight into rational counting approaches, contributed via the \texttt{sci.math} \cite{bibref:n:scimathnewsgroup} and \texttt{comp.arch.embedded} \cite{bibref:n:comparchembedded} newsgroups. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Exercises} %Section tag: EXE0 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection[\protect\mbox{\protect$h/2^q$} and \protect\mbox{\protect$2^q/k$} Rational Linear Approximation] {\protect\mbox{\protect\boldmath$h/2^q$} and \protect\mbox{\protect\boldmath$2^q/k$} Rational Linear Approximation} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:a01} Derive equations analogous to (\ref{eq:crat0:shqq0:dph0:03}) and (\ref{eq:crat0:shqq0:dph0:07}) if $r_A$ is chosen without rounding, i.e. $h=\lfloor r_I 2^q \rfloor$ and therefore $r_A=\lfloor r_I 2^q \rfloor/2^q$. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:a02} Derive equations analogous to (\ref{eq:crat0:shqq0:dph0:03}) and (\ref{eq:crat0:shqq0:dph0:07}) if $z$ is chosen for rounding with the midpoint case rounded down, i.e. $z=2^{q-1}-1$, and applied as in (\ref{eq:crat0:sint0:01}). \end{vworkexercisestatement} \vworkexercisefooter{} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Rational Counting} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:01} For Algorithm \ref{alg:crat0:sfdv0:01a}, assume that one chooses $K_1 > K_3 - K_2$ (in contradiction to the restrictions in Figure \ref{fig:crat0:sfdv0:01a}). Derive a result similar to Lemma \ref{lem:crat0:sfdv0:sprc0:01} for the number of base subroutine invocations in which ``\texttt{A()}'' is run before it is \emph{not} run for the first time. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:02} This will be the $\epsilon$ lemma proof. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:03} Rederive appropriate results similar to Lemma \ref{lem:crat0:sfdv0:sprc0:04} in the case where $\gcd(K_2, K_4) > 1$. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:04} Rederive appropriate results similar to Lemma \ref{lem:crat0:sfdv0:sprc0:04} in the case where $K_1 \neq 0$. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:05} Rederive appropriate results similar to Lemma \ref{lem:crat0:sfdv0:sprc0:04} in the case where $\gcd(K_2, K_4) > 1$ and $K_1 \neq 0$. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:06} For Lemma \ref{lem:crat0:sfdv0:sprc0:04}, complete the missing proof: show that if $\gcd(K_2, K_4) = 1$, no algorithm can lead to a tighter bound than (\ref{eq:lem:crat0:sfdv0:sprc0:04:01}). \textbf{Hint:} start with the observation that with $\gcd(K_2, K_4) = 1$, $n K_2 \bmod K_4$ will attain every value in the set $\{ 0, \ldots , K_4-1 \}$. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:07} For Lemma \ref{lem:crat0:sfdv0:sprc0:03}, show that if $K_1=0$, the number of initial invocations of the base subroutine before ``\texttt{A()}'' is first called is in the set specified in (\ref{eq:lem:crat0:sfdv0:sprc0:03:01}). \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:08} Develop other techniques to correct the state upset vulnerability of Algorithm \ref{alg:crat0:sfdv0:01a} besides the technique illustrated in Figure \ref{fig:crat0:sfdv0:sprc0:01}. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:09} Show for Example \ref{ex:crat0:sfdv0:sprc1:01} that integers of at least 27 bits are required. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:10} Prove Lemma \ref{lem:crat0:sfdv0:sprc1:02}. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:12} Prove Lemma \ref{lem:crat0:sfdv0:sprc1:04}. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:13} Define the term \emph{steady state} as used in Lemma \ref{lem:crat0:sfdv0:sprc1:04} in terms of set membership of the \texttt{state} variable. \end{vworkexercisestatement} \vworkexercisefooter{} \begin{vworkexercisestatement} \label{exe:crat0:sexe0:14} For Algorithm \ref{alg:crat0:sfdv0:01a}, devise examples of anomalous behavior due to race conditions that may occur if $K_2$ and/or $K_4$ are set in a process which is asynchronous with respect to the process which implements the rational counting algorithm if mutual exclusion protocol is not implemented. \end{vworkexercisestatement} \vworkexercisefooter{} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \vfill \noindent\begin{figure}[!b] \noindent\rule[-0.25in]{\textwidth}{1pt} \begin{tiny} \begin{verbatim} $HeadURL$ $Revision$ $Date$ $Author$ \end{verbatim} \end{tiny} \noindent\rule[0.25in]{\textwidth}{1pt} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % %End of file C_RAT0.TEX