%$Header: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/webprojs/pamc/gen_a/docs/manual/man_a/c_tbg0/c_tbg0.tex,v 1.35 2009/11/01 02:42:55 dashley Exp $

\chapter{Technical Background and Decisions}

\label{ctbg0}

\beginchapterquote{``The purpose of computing is insight, not numbers.''}
                  {Richard W. Hamming, 1962}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%Section tag:  INT0
\label{ctbg0:sint0}

This chapter provides technical background and describes key 
\emph{\productbasename{}}
technical
and design decisions.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Technical Background}
%Section tag:  tbg0
\label{ctbg0:stbg0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The SHA1 Secure Hash Algorithm}
%Subsection tag:  sha0
\label{ctbg0:stbg0:ssha0}

The SHA1 secure hash algorithm is described by 
\index{RFC 3174}RFC 3174
(available in many places on the Internet).  The algorithm maps
a block of data of any practical length to a 160-bit hash.  The important features of
the algorithm are described in the overview section of RFC 3174:

\begin{quote}
\emph{This document specifies a Secure Hash Algorithm, SHA-1, for computing
a condensed representation of a message or a data file.  When a
message of any length $<$ $2^{64}$ bits is input, the SHA-1 produces a
160-bit output called a message digest.  The message digest can then,
for example, be input to a signature algorithm which generates or
verifies the signature for the message.  Signing the message digest
rather than the message often improves the efficiency of the process
because the message digest is usually much smaller in size than the
message.  The same hash algorithm must be used by the verifier of a
digital signature as was used by the creator of the digital
signature.  Any change to the message in transit will, with very high
probability, result in a different message digest, and the signature
will fail to verify.}

\emph{The SHA-1 is called secure because it is computationally infeasible
to find a message which corresponds to a given message digest, or to
find two different messages which produce the same message digest.
Any change to a message in transit will, with very high probability,
result in a different message digest, and the signature will fail to
verify.}
\end{quote}

The SHA1 algorithm is used for several purposes within
\emph{\productbasename{}-\productversion{}}:

\begin{itemize}
\item Rather than storing user passwords in plain text, 
      the standard hash\index{standard hash function} (\S{}\ref{ctbg0:sddc0:sshf0}) of each
      user password is stored.  The standard hash is
      based on the SHA1 algorithm.  (A hash
      is used because it is not practically reversible: it isn't
      feasible to go backwards from the hash to the password.)
\item Session identifiers (SIDs, \S{}\ref{ctbg0:sdty0:ssid0})
      are based on the SHA1 function.  (It isn't possible for
      an attacker to guess the exact form of a SID because of
      the construction of the hash.)
\item The SHA1 hash is calculated for files stored in the
      file repository, and this information is retained in the
      database record corresponding to the file.  This serves
      three purposes:
      
      \begin{itemize}
      \item It allows the file to be periodically checked for corruption
            (due to hard disk failure or software defects).
      \item It allows users who upload a file to be sure that it
            was uploaded without corruption.
      \item It allows users who download a file to be sure that
            it was downloaded without corruption.
      \end{itemize} 
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Data Types}
%Section tag:  dty0
\label{ctbg0:sdty0}

\emph{PHP} has several native data types, including
integers and strings.  \emph{PHP} handles
strings well, and so ``custom'' data types are
usually most conveniently represented as strings (although
arrays of integers, arrays of strings, or some combination
are sometimes more convenient).  This section
describes the ``custom'' data types 
used in the \emph{\productbasename{}-\productversion{}} software.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Hash Key}
%Subsection tag:  thk0
\label{ctbg0:sdty0:sthk0}

\index{hash key}Some of the 
security of \emph{\productbasename{}-\productversion{}} is based
on an attacker's inability to determine in advance what hash value will
be paired with data by \emph{\productbasename{}-\productversion{}}.
Because the source code of \emph{\productbasename{}-\productversion{}} is
public, it is necessary to have an element of hash calculation that is not
known to an attacker.

The unknown random element is called the 
\index{hash key}\emph{hash key}.  The hash key is a string consisting
(quite arbitrarily) of printable characters.

A reasonable guideline for the length $n$ of the hash key is that it
should have at least as many possible values as the hash output
that it influences.  Assuming that lower- and upper-case letters
and digits are used (62 possibilities per character):

\begin{eqnarray}
\label{eq:ctbg0:sdty0:sthk0:01} 62^n & \geq & 2^{160} \\
\label{eq:ctbg0:sdty0:sthk0:02} n & \geq & \frac{160 \log 2}{\log 62} \approx 26.9
\end{eqnarray}

The smallest adequate length is therefore $n = 27$ characters.
A hash key of 100 characters, each chosen from letters and digits,
is thus far longer than this guideline requires.
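
The guideline can also be viewed in terms of bits.  As a rough check
(the figures below are rounded, and assume that each character is chosen
uniformly and independently from the 62 possibilities), each character
contributes $\log_2 62$ bits:

\begin{eqnarray*}
\log_2 62 & \approx & 5.954 \\
27 \times \log_2 62 & \approx & 160.8 \; \geq \; 160 \\
100 \times \log_2 62 & \approx & 595.4
\end{eqnarray*}

\noindent{}Under these assumptions a 27-character key carries slightly more than
160 bits, and a 100-character key carries roughly 595 bits.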

The hash key is most readily generated by the 
\index{hashkeygen@\emph{hashkeygen}}\emph{hashkeygen} program,
described in \S{}\ref{csco0:ssph0:shkg0} (p. \pageref{csco0:ssph0:shkg0})
and in \S{}\ref{cist0:scsh0} (p. \pageref{cist0:scsh0}).
The \emph{hashkeygen} program generates a hash key substantially
longer than the threshold suggested
by Eq. \ref{eq:ctbg0:sdty0:sthk0:02} (see 
Fig. \ref{fig:cist0:scsh0:01}, p. \pageref{fig:cist0:scsh0:01}).


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{AUTIME (Augmented Unix Timestamp)}
%Subsection tag:  ati0
\label{ctbg0:sdty0:sati0}

\index{AUTIME}%
An \emph{augmented Unix timestamp} is a decimal representation of the number of
seconds since Jan 1, 1900 GMT, and includes a fractional part.  Note that for times
after Jan 1, 1970 GMT, the AUTIME can be formed in a straightforward way from
the standard Unix time by adding a constant.

\begin{figure}
\centering
\includegraphics[width=4.6in]{c_tbg0/autimeformat01.eps}
\caption{Format of AUTIME}
\label{fig:ctbg0:sdty0:sati0:00}
\end{figure}

Figure \ref{fig:ctbg0:sdty0:sati0:00} illustrates the format of
an AUTIME\@.  An AUTIME is a string consisting
of exactly 19 characters, with the following
components.

\begin{itemize}
\item \textbf{Integer seconds since midnight, January 1, 1900 GMT (10 characters):}
      These 10 characters are a decimal integer, zero-padded on the left as
      necessary, that represents the integer seconds since midnight, January 1,
      1900 GMT.
\item \textbf{Nanoseconds associated with the integer seconds (9 characters):}
      These 9 characters are a decimal integer, zero-padded on the left as
      necessary, that represents the nanoseconds associated with the
      integer seconds since midnight, January 1, 1900 GMT.  
\end{itemize}
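
As a worked illustration of the conversion from standard Unix time mentioned
above (a sketch that assumes the constant is simply the number of seconds
between the two epochs, with every day taken as 86,400 seconds), the 70 years
from 1900 through 1969 contain 17 leap days (1900 itself was not a leap year):

\begin{eqnarray*}
(70 \times 365 + 17) \times 86{,}400 & = & 25{,}567 \times 86{,}400 \\
                                     & = & 2{,}208{,}988{,}800
\end{eqnarray*}

\noindent{}For example, a Unix time of 1,000,000,000 seconds with no
fractional part would correspond to
$1{,}000{,}000{,}000 + 2{,}208{,}988{,}800 = 3{,}208{,}988{,}800$ seconds,
giving the 19-character AUTIME \emph{3208988800000000000}.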

\index{leap second}Leap seconds are handled by ignoring them.
In essence, a ``virtual timespace'' is created where every day is exactly
86,400 seconds long and leap seconds don't exist.  This strategy is
very similar to the notion of Unix time except that leap seconds are
avoided.  The strategy has these consequences:

\begin{itemize}
\item Past and future times may be translated into an AUTIME that is
      either ambiguous or non-existent in the presence of leap seconds (this could
      occur only for a time within one second of midnight).
\item When an AUTIME is obtained from the PHP library functions included with
      \emph{\productbasename{}-\productversion{}}, a time within one
      second of midnight Unix time\footnote{Midnight UTC.} won't be supplied (instead, the
      library function will \index{sleep}sleep or spin-lock 
      until the two-second window of vulnerability has passed).
      This behavior is designed to avoid supplying ambiguous times or
      reverse-ordered times.
\item Time differences calculated using two AUTIMEs may differ from the
      actual time difference by up to the number of leap seconds that
      have occurred between the two AUTIMEs.\footnote{In practice, this
      is a very small error, bounded at several seconds per year.}      
\end{itemize}

Note that AUTIMEs as described have the property that the lexical
string sort order corresponds to the time sort order.

The calendaring range of an AUTIME spans:

\begin{itemize}
\item Midnight on January 1, 1900, GMT, \emph{through}
\item One nanosecond before midnight on January 1, 2200, UTC.
\end{itemize}

\noindent{}The calendaring range was chosen to allow the representation of
past events (such as birthdays), but also to allow dates substantially
in the future. 

In addition, the following values are reserved.
In each of the descriptions below, each ``X'' character
signifies a ``don't care'' (the ``X'' characters
are ignored in comparisons).

\begin{itemize}
\item \emph{9999999996XXXXXXXXX} is reserved to
      indicate an underflow result, i.e. a time
      before January 1, 1900, GMT.
\item \emph{9999999997XXXXXXXXX} is reserved to
      indicate an overflow result, i.e. a time
      later than one nanosecond before midnight on
      January 1, 2200, UTC.
\item \emph{9999999998XXXXXXXXX} is reserved to
      indicate an indeterminate time, i.e. the
      time can't be reliably determined.
\item \emph{9999999999XXXXXXXXX} is reserved to
      indicate an otherwise unspecified error.
\item Values corresponding to a time of at least
      midnight, January 1, 2200, UTC but less than 
      \emph{9999999996XXXXXXXXX} are treated
      as \emph{9999999999XXXXXXXXX}.  These
      values should never occur in practice as the
      date arithmetic functions should not allow these
      values to be calculated as output.
\end{itemize}

Note that the AUTIME format can be used for values that:

\begin{itemize}
\item Are used as part of generating a unique or random data value
      (see, for example, \S{}\ref{ctbg0:sdty0:ssgu0}).
\item Are used to determine elapsed time.
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{AUTIMEIP (Augmented Unix Timestamp Integer Pair)}
%Subsection tag:  ati2
\label{ctbg0:sdty0:sati2}

\index{AUTIMEIP}The string representation of an AUTIME is most convenient for
string manipulation in \emph{PHP}.  However, for manipulation 
in \emph{MySQL} and in CGI-BIN programs it may be more efficient to
represent the AUTIME as a pair of integers: one for the whole seconds
and one for the fractional nanoseconds.  This representation as a pair
of integers is the AUTIMEIP data type.

As of January 1, 2200, there will have been approximately
9,467,280,000 seconds since January 1, 1900.  The whole number
of seconds requires 34 bits to represent, and so this is compatible
with the \emph{MySQL} \emph{bigint} data type (but incompatible with
native \emph{PHP} integers with \emph{PHP} version 4.X on 32-bit
platforms).

The number of fractional nanoseconds $n$ always meets the
constraint $0 \leq n \leq 999{,}999{,}999$ and thus requires
30 bits to represent.  This integer is compatible with
the \emph{MySQL} \emph{int} data type and with \emph{PHP}
integers.
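
The bit counts quoted above can be checked directly.  As a sketch (counting
every day as 86,400 seconds, with 73 leap days in the years 1900 through
2199, since 1900 and 2100 are not leap years but 2000 is):

\begin{eqnarray*}
(300 \times 365 + 73) \times 86{,}400 & = & 9{,}467{,}107{,}200 \\
2^{33} = 8{,}589{,}934{,}592 & < & 9{,}467{,}107{,}200 \; < \; 2^{34} = 17{,}179{,}869{,}184 \\
2^{29} = 536{,}870{,}912 & < & 999{,}999{,}999 \; < \; 2^{30} = 1{,}073{,}741{,}824
\end{eqnarray*}

\noindent{}The exact day count gives a figure slightly below the rounded
value quoted above; either way, 34 bits suffice for the whole seconds and
30 bits for the nanoseconds.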

An AUTIMEIP may be represented in the following ways:

\begin{itemize}
\item A \emph{bigint}/\emph{int} pair in \emph{MySQL}.
\item A 64-bit/32-bit integer pair in a compiled C program.
\item A string/integer pair in \emph{PHP}, where the string
      is manipulated using \emph{bcmath} and the integer
      is manipulated as provided for in the language.
\end{itemize}

The exception values as described in \S{}\ref{ctbg0:sdty0:sati0}
also apply to AUTIMEIP values.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{AUTIMEI (Augmented Unix Timestamp Integer)}
%Subsection tag:  ati3
\label{ctbg0:sdty0:sati3}

\index{AUTIMEI}On January 1, 2200 UTC, the number of nanoseconds since 
midnight January 1, 1900 GMT will be approximately 
$9.4673 \times 10^{18}$, requiring 
64 bits to represent.  The integer representation of
the number of nanoseconds since midnight January 1, 1900
is the AUTIMEI data type.

Note that the AUTIMEI data type cannot be represented in a 
signed \emph{MySQL} \emph{bigint}, as a signed \emph{MySQL} \emph{bigint}
can represent values only from $-2^{63}$ through $2^{63}-1$.
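
A rough check of the magnitudes involved (a sketch using rounded powers of
two and an exact count of 109,573 days between January 1, 1900 and
January 1, 2200):

\begin{eqnarray*}
2^{63} & \approx & 9.2234 \times 10^{18} \\
109{,}573 \times 86{,}400 \times 10^{9} & \approx & 9.4671 \times 10^{18} \\
2^{64} & \approx & 1.8447 \times 10^{19}
\end{eqnarray*}

\noindent{}Since $2^{63}$ nanoseconds is about 292.3 years, only the last
several years of the calendaring range exceed $2^{63}-1$.  If the reserved
exception values of \S{}\ref{ctbg0:sdty0:sati0} are carried over to this
representation, they lie just below $10^{19}$ and so also require an
unsigned 64-bit quantity.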

The AUTIMEI data type can often be represented in an integer
in the C programming language, as implementations often provide
for unsigned 64-bit integers.

The exception values as described in \S{}\ref{ctbg0:sdty0:sati0}
also apply to AUTIMEI values.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{AUTIMELP (Augmented Unix Timestamp, Low Precision)}
%Subsection tag:  ati1
\label{ctbg0:sdty0:sati1}

\index{AUTIMELP}The augmented Unix time is also manipulated in the
software without the fractional part.  Such values are used where
a precision of one second is adequate.

An AUTIMELP is a string of exactly 10 characters.
The ranges and exception values specified in 
\S{}\ref{ctbg0:sdty0:sati0} apply.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{UBDT (Unbound Date)}
%Subsection tag:  ubd0
\label{ctbg0:sdty0:subd0}

It is occasionally desirable to specify a date in a context where a precise
instant or range in time isn't important or can't be determined.
An example would be
a birthday: a birthday alone gives no indication of the time zone
where the person was born (and hence the range of times
potentially represented by the date
would be interpreted differently at different locations).

These vague dates are called \index{unbound date}\emph{unbound dates}. 

Unbound dates are in general unsuitable for most applications within 
\emph{\productbasename{}-\productversion{}}\@.  Even software release dates
are best maintained as AUTIMEs---for example, a software release on a
certain date in the U.S.
might be best thought of as occurring on the following day in Europe.

Unbound dates are always represented within the
\emph{\productbasename{}-\productversion{}} software in the form
\emph{YYYYMMDD}, and may range from 19000101 to 
21991231, inclusive.

The value \emph{00000000} is used to represent an underflow (date before January
1, 1900), the value \emph{99999999} is used to represent an overflow (date
after December 31, 2199), and the value \verb|--------| (eight hyphens) is used to represent
any other type of error.

The \emph{PHP} library files in \emph{\productbasename{}-\productversion{}}
include functions to perform calculations with unbound dates.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{SGUID (Server Globally-Unique Identifier)}
%Subsection tag:  sgu0
\label{ctbg0:sdty0:ssgu0}

It is necessary or helpful in some contexts to have a way to create an
identifier that is guaranteed to occur no more often than once in the lifetime
of the server.  \emph{MySQL} can be used to create such identifiers, and there
are also methods based on file and IPC semantics that can be used.

The method used in the software is a \index{spin lock}spin lock on a precision
timestamp, with the resulting timestamp concatenated with the PID.  The method works
because:

\begin{itemize}
\item A single process (by virtue of the spin lock) can't generate the same
      precision timestamp twice.
\item No two processes can have the same PID at the same time.
\end{itemize}

\begin{figure}
\centering
\includegraphics[width=4.6in]{c_tbg0/sguidformat01.eps}
\caption{Format of SGUID}
\label{fig:ctbg0:sdty0:ssgu0:00}
\end{figure}

Figure \ref{fig:ctbg0:sdty0:ssgu0:00} illustrates the format of
an SGUID.  An SGUID consists of 29 characters, with the following
components.

\begin{itemize}
\item \textbf{Integer seconds since January 1, 1900 GMT (10 characters):}
      These 10 characters are an integer, zero-padded on the left as
      necessary, that represents the integer seconds since January 1, 1900
      GMT.\footnote{Note that 10 digits comfortably solves the Unix
      2038 A.D. issue, as this will guarantee SGUIDs 
      beyond 2200 A.D.}
\item \textbf{Nanoseconds associated with the integer seconds (9 characters):}
      These 9 characters are an integer, zero-padded on the left as
      necessary, that represents the nanoseconds associated with the
      integer seconds since January 1, 1900,
      GMT.\footnote{As of this writing, Linux provides time to a resolution
      of microseconds.  It is anticipated that a resolution of nanoseconds will
      accommodate any hardware speed advances in the foreseeable future, as typical
      hardware gate propagation delays are on the order of several nanoseconds.}  
\item \textbf{PID (10 characters):}
      These 10 characters are an integer, zero-padded on the left as
      necessary, that represents the Unix PID expressed 
      as a decimal number.\footnote{As of this writing, PIDs are 16 bits only.
      However, it seems inevitable that PIDs will be expanded to 24 or 32 bits in the 
      future.}  
\end{itemize}
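
As an illustration of this layout (all three field values below are
hypothetical, chosen only to show the zero padding), a process with PID 4242
that obtains a timestamp of 3,208,988,800 seconds and 123,456,789 nanoseconds
would form:

\begin{center}
\begin{tabular}{ll}
Seconds (10 characters)    & \texttt{3208988800} \\
Nanoseconds (9 characters) & \texttt{123456789}  \\
PID (10 characters)        & \texttt{0000004242} \\
SGUID (29 characters)      & \texttt{32089888001234567890000004242}
\end{tabular}
\end{center}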

Note that SGUIDs as described have a very important property in addition to
guaranteed uniqueness---the lexical
string sort order corresponds to the time sort order.

Two sample applications of SGUIDs are:

\begin{itemize}
\item The basis for a session identifier (guaranteed unique).
\item A field in a database record to detect browser editing collisions---when a record
      is modified and a new SGUID is assigned to the record, it is \emph{guaranteed}
      not to be the same as the previous SGUID, and thus detection of the editing collision
      is guaranteed.
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{SID (Session Identifier)}
%Subsection tag:  sid0
\label{ctbg0:sdty0:ssid0}

A \index{session identifier}session identifier (\index{SID}SID) is a
string consisting of exactly 69 characters used to uniquely
identify a session (Figure \ref{fig:ctbg0:sdty0:ssid0:00}).

\begin{figure}
\centering
\includegraphics[width=4.6in]{c_tbg0/sidformat01.eps}
\caption{Format of SID}
\label{fig:ctbg0:sdty0:ssid0:00}
\end{figure}

The first part of the SID is an SGUID 
(\S{}\ref{ctbg0:sdty0:ssgu0}).  The second part of a SID is
the system hash function of the SGUID.  The SGUID portion is guaranteed to
be unique (this is a defining property of an SGUID),
and so each SID is unique within the first 29 characters.
The SHA1 hash appended to the SGUID is designed to eliminate an attacker's
ability to construct a valid SID.  It may be possible to guess an SGUID based
on server characteristics; but it should not be possible to guess the corresponding
hash (this is a defining property of the system hash function).
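
The stated lengths are consistent with the hash portion being written as
hexadecimal digits (an inference from the character counts, not something
stated explicitly here), since a 160-bit SHA1 value occupies 40 hexadecimal
digits:

\begin{eqnarray*}
29 + \frac{160}{4} & = & 69
\end{eqnarray*}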

At the time a user logs in (either as a user or a guest), the SID is created
and provided to the browser as a cookie.

The session state changes that occur as \emph{\productbasename{}-\productversion{}}
is used are all stored on the server side.  The advantages of this approach
are:

\begin{itemize}
\item Only one cookie of a very short length is stored in the client's browser.
      This approach meets the RFC 2109 constraints and is also suitable
      for mobile devices.
\item An attacker's ability to gather information about the internal workings
      of \emph{\productbasename{}-\productversion{}} by
      observing cookie reassignments is severely limited
      (nothing except the SID is exposed, and exposed only once per session).
\item An attacker's ability to tamper with cookies is eliminated (only one
      cookie is provided, and it is tamper-proof).       
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Database Design Decisions and Discussion}
%Section tag:  ddd0
\label{ctbg0:sddd0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Global Scope Naming Conventions}
%Subsection tag:  gsn0
\label{ctbg0:sddd0:sgsn0}

In certain of the database tables and in certain other
contexts, a global naming convention is used rather than
further indexing information by application and page.  (For
example, the \emph{upns} and \emph{stns} tables in 
\S{}\ref{ctbg0:sddd0:scfn0}, Figure
\ref{fig:ctbg0:sddd0:scfn0:spga0:00}, p. 
\pageref{fig:ctbg0:sddd0:scfn0:spga0:00}.)

The conventions are quite loose, and simply consist of assigning names
that suggest, in order, the application, then the page, then the 
variable within the page.
If this convention is followed, applications and pages can easily
query for all the database records that affect them by using the
SQL \emph{LIKE} operator.\footnote{It was verified that \emph{MySQL} will
perform such a query nearly instantly even with a million records, so long
as the column involved in the query is indexed and so long as the wildcard
part of the name is at the end.}

For example, a \emph{MySQL} query of the form\\\\
\texttt{SELECT * FROM tablename\\WHERE columnname LIKE 'APP\_\%' ORDER BY columnname;}\\\\
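will very efficiently extract all records whose value in a column
begins with ``APP\_'', so long as the column is indexed.  (Strictly,
an unescaped underscore in a \emph{LIKE} pattern matches any single
character, so this pattern also matches names such as ``APPX\ldots'';
writing the underscore as \texttt{\textbackslash{}\_} restricts the
match to a literal underscore.)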

Using these naming conventions, it is also possible to simulate arrays.  For
example, a variable named\\\\
\texttt{APP\_EMPCOST\_PG\_RECORDDISPLAY\_EMP\_HRS\_000029\_000042}\\\\
might be used to represent row 29 and column 42 of the underlying variable.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{\emph{MySQL} Database Locking}
%Subsection tag:  mdl0
\label{ctbg0:sddd0:smdl0}

The design of \emph{\productbasename{}-\productversion{}} includes
many tables and complex relations.  In order to ensure database
consistency, it is necessary to ensure atomicity of operations.

The approach taken to serialize database access is:

\begin{itemize}
\item Serialization is accomplished via the SQL \emph{LOCK TABLES}
      and \emph{UNLOCK TABLES} statements.

      \begin{itemize}
      \item At the start of the critical section, \emph{all} tables existing
            in the database are locked via a single \emph{LOCK TABLES}
            statement.\footnote{It was determined via newsgroup posts
            that the practical limit on the number of tables that can 
            be locked in a single SQL statement is much higher---probably
            thousands of tables---than will be encountered in practice in
            this product.  As a fallback position if locking all tables
            proves impossible, the \emph{GET\_LOCK()} 
            function and its companion function can be used.  \emph{GET\_LOCK()}
            isn't the first choice because the name is server-global rather
            than database-global, and malicious code (in another database
            application) could cause denial of service.  Additionally, it has
            been verified that this method handles all the important scenarios
            such as ordinary contention, dying processes, a process that 
            terminates without releasing the lock, etc.}
      \item At the end of the critical section, a single \emph{UNLOCK TABLES}
            statement is executed.
      \end{itemize}
\item Maintenance scripts that run hot will be written to lock and unlock so that
      if there is collision with a web page, the web page will be delayed for only
      a small amount of time.
\item A recursive locking protocol is employed in the software to simplify the case
      of a critical section occurring within a critical section.  (This ensures
      in an orderly way
      that large-scope critical sections are not compromised by small-scope critical
      sections.)
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{File Repository Organization}
%Subsection tag:  fro0
\label{ctbg0:sddd0:sfro0}

The \emph{\productbasename{}} software needs to maintain user-uploaded
files in conjunction with a 
database.  All such files are aggregated into one structure
called the \emph{file repository}.\index{file repository}

In the type of file repository described here, each file in the repository
has a corresponding database record containing a unique integer index.  These indices
are assigned automatically by \emph{MySQL} and are assigned sequentially
as files are added (i.e. ``1'', ``2'', ``3'', etc.).

Although
\emph{MySQL} has the capability to store files directly as part of
the database, the files are stored as distinct files directly
under the operating system for the following
reasons:

\begin{itemize}
\item The upper limit for the size of the individual and collective
      files stored directly in 
      \emph{MySQL} is not known or trusted.  (The behavior of 
      the *nix filesystem, on the other hand, is well-understood
      and trusted.)
\item The ability to split the file repository across multiple
      volumes (for disk capacity reasons) via symlinks must
      be preserved.
\item In the event of \emph{MySQL} database corruption, the files must still
      be easily recoverable.  This constraint would not be met if the
      files were stored directly in \emph{MySQL}.
\item If incremental backups are used, it is known that \emph{MySQL} tends
      to cause an entire table of files to be backed up (this can be tens
      or hundreds of megabytes or more), whereas an approach relying on discrete
      files will cause only the new or modified files to be incrementally
      backed up.
\end{itemize}

The following constraints and design goals for the file repository exist:

\begin{itemize}
\item The differential growth of disk consumption due to overhead (such as
      directory creation) as files are added
      should not be excessive under the assumption that indices grow linearly. 
\item It is known that *nix systems begin to experience performance issues
      when directories contain more than about 200 files or subdirectories (due to
      the linear search).  Any solution should not place more than about 200 files
      or subdirectories in a directory.
\item The solution should facilitate symlinking to split the file
      repository across multiple volumes.
\item The solution must accommodate logical indices as large as $2^{64}-1$.\footnote{This design
      goal applies because \emph{MySQL} allows 64-bit integers to be used as the primary
      key for database tables.}
\item The solution must allow files to be aged out or deleted
      randomly with respect to the integer indices (to comply with records retention
      policies, because they are removed from the repository, or for other reasons).
\end{itemize}

In order to accommodate the constraints, files are stored in a directory structure
based on prime moduli of the database integer index $n$.  
The lowest level of the directory structure contains a single
file per directory, with the lowest-level directory name being the 
decimal representation of the
integer index $n$.

The defining equations for the directory path components are supplied below.

\begin{eqnarray}
\label{eq:ctbg0:sddd0:sfro0:00}   d    & = & 160                               \\
\label{eq:ctbg0:sddd0:sfro0:01} c_0    & = & \lfloor n / d \rfloor \bmod 7   \\
\label{eq:ctbg0:sddd0:sfro0:02} c_1    & = & \lfloor n / d \rfloor \bmod 11  \\ 
\label{eq:ctbg0:sddd0:sfro0:03} c_2    & = & \lfloor n / d \rfloor \bmod 13  \\
\label{eq:ctbg0:sddd0:sfro0:04} c_3    & = & \lfloor n / d \rfloor \bmod 89  \\
\label{eq:ctbg0:sddd0:sfro0:05} c_4    & = & \lfloor n / d \rfloor \bmod 97  \\
\label{eq:ctbg0:sddd0:sfro0:06} c_5    & = & \lfloor n / d \rfloor \bmod 101 \\
\label{eq:ctbg0:sddd0:sfro0:07} c_6    & = & \lfloor n / d \rfloor \bmod 103 \\
\label{eq:ctbg0:sddd0:sfro0:08} c_7    & = & \lfloor n / d \rfloor \bmod 107 \\
\label{eq:ctbg0:sddd0:sfro0:09} c_8    & = & \lfloor n / d \rfloor \bmod 109 \\
\label{eq:ctbg0:sddd0:sfro0:10} c_9    & = & \lfloor n / d \rfloor \bmod 113 
\end{eqnarray}

With reference to (\ref{eq:ctbg0:sddd0:sfro0:01})
through (\ref{eq:ctbg0:sddd0:sfro0:10}), the relative path within a file repository
to the directory containing a file with database index $n$ is\\
``$c_0$/$c_1$/$c_2$/$c_3$/$c_4$/$c_5$/$c_6$/$c_7$/$c_8$/$c_9$/$n$''\@.
Each path component ($c_0$ through $c_9$, and $n$) is written in the traditional
human-friendly variable-length decimal representation:
for example, ``3'', ``25'', ``111'', or ``36237456726''.  
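
As an illustration only, the path rule defined by (\ref{eq:ctbg0:sddd0:sfro0:00})
through (\ref{eq:ctbg0:sddd0:sfro0:10}) can be sketched in Python.  The function name
\texttt{repository\_path} and the range check are assumptions made for this sketch;
they are not taken from the product code.

\begin{verbatim}
```python
# Illustrative sketch only; names are hypothetical, not taken from the product.
BLOCK_SIZE = 160  # d: number of consecutive indices sharing one directory chain
MODULI = (7, 11, 13, 89, 97, 101, 103, 107, 109, 113)  # moduli for c_0 .. c_9


def repository_path(n: int) -> str:
    """Return the relative directory path for database index n."""
    if not 0 <= n < 2**64:
        raise ValueError("index must fit in an unsigned 64-bit integer")
    block = n // BLOCK_SIZE  # floor(n / d)
    components = [str(block % p) for p in MODULI]  # c_0 .. c_9
    components.append(str(n))  # the lowest-level directory is n itself
    return "/".join(components)
```
\end{verbatim}

For example, \texttt{repository\_path(1600)} evaluates to
``3/10/10/10/10/10/10/10/10/10/1600'', because $\lfloor 1600/160 \rfloor = 10$
and $10 \bmod 7 = 3$, while 10 is smaller than every other modulus.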

Note that the storage scheme proposed will handle indices 
as large as $2^{64}-1$, because the product of $d$ and the ten moduli
(approximately $1.90 \times 10^{19}$) exceeds $2^{64}-1$
(approximately $1.84 \times 10^{19}$), i.e.

\begin{equation}
\label{eq:ctbg0:sddd0:sfro0:20}
160 \times 7 \times 11 \times 13 \times 89 \times 97
\times 101 \times 103 \times 107 \times 109 \times 113 > 2^{64} - 1 .
\end{equation}

Note also that the first three
components ($c_0$, $c_1$, and $c_2$) are small and chosen for 
convenient symlinking, whereas the last seven components ($c_3$ through $c_9$)
are chosen to increase the product rapidly while staying clear of the 
*nix performance limit of approximately 200 entries per directory.

Two properties of the storage scheme proposed by (\ref{eq:ctbg0:sddd0:sfro0:01})
through (\ref{eq:ctbg0:sddd0:sfro0:10}) may require further explanation.

\begin{itemize}
\item The differential cost of storing a file in the repository is
      approximately the file size (with operating system overhead)
      plus approximately 1.0625 directories.  This comes about because
      every file requires its own directory, and because
      each block of 160 consecutive indices requires up to 10 additional directories.
\item If the index $n$ increases linearly, the number of files stored in
      each directory (especially at the top levels) will be approximately
      equal.  (The reason for this is the coprimality of the divisors
      used.  The necessary proof comes from number theory, and isn't 
      included here.)  This means that symlinking at the top levels will be
      effective in evenly dividing storage requirements.
\end{itemize}
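
The figure of 1.0625 directories per file quoted in the first item follows from a
short calculation, under the worst-case assumption that none of the 10 intermediate
directories for a block of $d = 160$ indices already exists:

\begin{displaymath}
1 + \frac{10}{d} = 1 + \frac{10}{160} = 1.0625 .
\end{displaymath}

Because the upper-level directories are shared among many blocks once the
repository is populated (there are only 7 possible values of $c_0$, for example),
the actual average should be somewhat lower.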


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Database Design to Support Core Functionality}
%Subsection tag:  cfn0
\label{ctbg0:sddd0:scfn0}

The core functionality as described in Chapter \ref{ccfn0} consists of:

\begin{itemize}
\item Site navigation (\S{}\ref{ccfn0:ssng0}).
\item Authentication and login (\S{}\ref{ccfn0:salg0}).
\item Permission groups and attributes (\S{}\ref{ccfn0:sgra0}).
\item Notification (\S{}\ref{ccfn0:snot0}).
\item Action list (\S{}\ref{ccfn0:sacl0}).
\item Logging (\S{}\ref{ccfn0:slog0}).
\end{itemize}

Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00} provides an overview of the
tables involved in implementing the core functionality.  Note that,
for efficiency, three of the most frequently
used read-only tables (\emph{applications}, \emph{pages},
and \emph{navlines}) are implemented as lookup tables in 
\emph{PHP} rather than
as \emph{MySQL} tables.\footnote{Presumably, having the \emph{PHP}
interpreter parse these tables is far more efficient than executing
an SQL query and processing the results.}

\begin{figure}
\centering
\includegraphics[width=4.6in]{c_tbg0/dbdesign01.eps}
\caption{User, Permission Group, Application, and Page Database Design}
\label{fig:ctbg0:sddd0:scfn0:spga0:00}
\end{figure}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Users and Tokens}
%Subsubsection tag:  utk0
\label{ctbg0:sddd0:scfn0:sutk0}

The database design includes a table of users (the \emph{users} table
in Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}).  Each user who can log
in to the system has one record in this table.  As mentioned
in \S{}????, each user always logs in using a userid that 
represents the user---there
are no userids used solely for administration.

The \emph{users} table 
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}) is related 1:1 with the
\emph{tokens} table.
Each user is optionally assigned a cryptographic token to use in
authenticating.  Token assignment is on a per-user
basis, so that \emph{\productbasename{}-\productversion{}} will support:

\begin{itemize}
\item No users authenticating using cryptographic tokens.
\item Some users authenticating with cryptographic tokens, and some authenticating
      without cryptographic tokens.
\item All users authenticating with cryptographic tokens.
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{User Permission Attributes}
%Subsubsection tag:  pga0
\label{ctbg0:sddd0:scfn0:spga0}

Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00} shows that the \emph{users} and 
\emph{upermattrs} tables are related by the \emph{userupermattrs} table, providing 
an $\infty{}:\infty{}$ mapping.

Each record in the \emph{userupermattrs} table also carries with it an
optional string.  Such a string is interpreted as a user-specific value of the
related record in the \emph{upermattrs} table.  As a 
contrived example, one user may have
a value of MAX\_LOGINS of 2, and another user may have a value of 5.

If the optional string in the \emph{userupermattrs} table is the empty string,
then the $\infty{}:\infty{}$ relation effectively specifies a named set
of users.  Such named sets are typically used to control user permissions
in a fine-grained way---a given group may contain the users who can add
products to the \emph{products} table, for example.
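
The mapping described above can be sketched as follows.  This is a minimal,
self-contained Python illustration that uses SQLite in place of \emph{MySQL};
the column names (\texttt{user\_id}, \texttt{upermattr\_id}, \texttt{value}), the
attribute \texttt{CAN\_ADD\_PRODUCTS}, and the function \texttt{attribute\_value}
are hypothetical and are not taken from the actual schema.

\begin{verbatim}
```python
# Illustrative sketch only; column names are hypothetical and SQLite stands in
# for MySQL so that the example is self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users          (id INTEGER PRIMARY KEY, userid TEXT UNIQUE);
CREATE TABLE upermattrs     (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE userupermattrs (user_id INTEGER REFERENCES users(id),
                             upermattr_id INTEGER REFERENCES upermattrs(id),
                             value TEXT NOT NULL DEFAULT '',
                             PRIMARY KEY (user_id, upermattr_id));
INSERT INTO users VALUES (1, 'jsmith'), (2, 'mjones');
INSERT INTO upermattrs VALUES (1, 'MAX_LOGINS'), (2, 'CAN_ADD_PRODUCTS');
INSERT INTO userupermattrs VALUES (1, 1, '2'), (2, 1, '5'), (1, 2, '');
""")


def attribute_value(userid, attr_name):
    """Return the user's value for the attribute, or None if it is not assigned."""
    row = conn.execute(
        "SELECT m.value FROM userupermattrs m "
        "JOIN users u ON u.id = m.user_id "
        "JOIN upermattrs a ON a.id = m.upermattr_id "
        "WHERE u.userid = ? AND a.name = ?",
        (userid, attr_name),
    ).fetchone()
    return None if row is None else row[0]
```
\end{verbatim}

In this sketch an empty string marks plain membership in a named set, while a
result of \texttt{None} means that the user does not have the attribute at all.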


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{User Persistent Named State}
%Subsubsection tag:  upn0
\label{ctbg0:sddd0:scfn0:supn0}

User permission attributes do not change as the result of ordinary usage of
the \emph{\productbasename{}-\productversion{}} system.  Although 
user permission attributes can be modified, this is an administrative action
and does not happen often.

For state bound to a user which may change often, the 
\emph{upns} table (Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}) is
provided.  This state is very similar in spirit to what may be contained
in the \emph{Windows} registry.

Each record in the \emph{upns} table consists of a name and a value.

For each user, the names in the \emph{upns} table must be unique (no
duplicates are allowed).

A naming convention is used so that each entry in the \emph{upns} table
is bound to a specific application and page.  The \emph{upns} table is indexed
by name for fast retrieval.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Sessions}
%Subsubsection tag:  ses0
\label{ctbg0:sddd0:scfn0:sses0}

A \emph{session} is the relationship between the server and a client during
a single login (whether as a named user or a guest).

Sessions are stored in the \emph{sessions} table 
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}).  Each record in the 
\emph{sessions} table contains the session identifier, information about the
user, information about the time of the last actions associated with the
session, etc.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Session Temporary Named State}
%Subsubsection tag:  stn0
\label{ctbg0:sddd0:scfn0:sstn0}

State that should be associated with a session rather than with a user
(i.e., that should be deleted when the session ends) is stored in the
\emph{stns} table (Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}).

Each entry in the \emph{stns} table consists of a name and a value.
Within each session, each name in the \emph{stns} table must be unique.

A naming convention is used so that each entry in the \emph{stns} table
is bound to a specific application and page.  The \emph{stns} table is indexed
by name for fast retrieval.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Chain of Command}
%Subsubsection tag:  coc0
\label{ctbg0:sddd0:scfn0:scoc0}

In some contexts, \emph{\productbasename{}-\productversion{}} components
need to make decisions based on reporting relationships between
employees.  For example, some information about an employee should only
be viewable or editable by those above the employee in the chain of command.

The database contains an \emph{enterprises} table
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}) and a
\emph{usersenterprises} table that provide an 
$\infty{}:\infty{}$ mapping between \emph{users}
and \emph{enterprises}.  A given user may belong to more than one 
enterprise and may exist in reporting relationships in more than
one enterprise.

The database also contains a \emph{cocrels} table
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}) that can capture
reporting relationships between employees for each enterprise.  
Both normal reporting relationships
and matrix reporting relationships can be captured.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Login Attempts}
%Subsubsection tag:  lat0
\label{ctbg0:sddd0:scfn0:slat0}

The database contains a \emph{loginattempts} table
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}).
The \emph{loginattempts} table records unsuccessful login attempts that meet certain criteria,
so that security policies limiting the number of
certain types of login attempts in a period of time can be enforced.\footnote{In principle,
the \emph{logentries} table also contains enough information to implement
security policies, but the design is more versatile if a separate table
is used for this purpose.}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{User Templates}
%Subsubsection tag:  utp0
\label{ctbg0:sddd0:scfn0:sutp0}

In a mature deployment of \emph{\productbasename{}-\productversion{}},
a typical user may have dozens of associated permission attributes.
Creating a new user would be quite tedious if a complex set of permission 
attributes needed to be created at the same time.

To make this process easier, user templates are provided via the 
\emph{utemplates} and \emph{utemplatespermgroups} tables depicted in
Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}.  A user template gives
a one-click way to create a user with a complex set of permission attributes.

A user template contains the same type of $\infty{}:\infty{}$
relation to the \emph{permattrs} table as a user.  Creating a new user
from a user template is a relatively straightforward process of copying.

In general, a user may create another user if and only if:

\begin{itemize}
\item The user has the permission attribute set to be able to create
      other users.
\item The security level integer of the created user is greater than that of
      the creating user.
\item All ranked permission attributes of the created user are inferior to those of
      the creating user.
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Site Navigation}
%Subsubsection tag:  snv0
\label{ctbg0:sddd0:scfn0:ssnv0}

The left navigation pane of \emph{\productbasename{}-\productversion{}}
displays a hierarchical set of links to pages.  The left navigation pane
is hierarchical in the sense that links to pages may be displayed under headings
or sub-headings.  The logic used to
display links is:

\begin{itemize}
\item Each page in the \emph{pages} table has an associated set of \emph{permgroups}.
      A user must be a member of at least one of these \emph{permgroups}
      in order for the link to the page to be displayed (and in order to use
      the services of the page at all).
\item A heading or sub-heading is displayed if and only if there is at least
      one page link under the heading or sub-heading that, based on 
      the required \emph{permgroups}, should be displayed.  (Headings and
      sub-headings automatically drop out when there are no page links under
      them.)
\end{itemize}
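
The two display rules can be sketched as a recursive filter over a tree of
headings and page links.  This is a minimal Python sketch under stated
assumptions: the \texttt{NavNode} structure and the function
\texttt{visible\_tree} are hypothetical and do not reflect the actual
\emph{PHP} lookup tables.

\begin{verbatim}
```python
# Illustrative sketch only; the data layout below is hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class NavNode:
    title: str
    # For a page link: permgroups allowed to use the page.  None marks a heading.
    permgroups: Optional[Set[str]] = None
    children: List["NavNode"] = field(default_factory=list)


def visible_tree(node: NavNode, user_groups: Set[str]) -> Optional[NavNode]:
    """Return the part of the tree the user may see, or None if nothing is visible."""
    if node.permgroups is not None:
        # Page link: shown only if the user is in at least one required permgroup.
        return node if node.permgroups & user_groups else None
    kept = []
    for child in node.children:
        visible = visible_tree(child, user_groups)
        if visible is not None:
            kept.append(visible)
    if not kept:
        return None  # heading with no visible page links drops out
    return NavNode(node.title, None, kept)
```
\end{verbatim}

A heading is returned only when at least one page link beneath it survives the
filter, which mirrors the second rule in the list.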

The relationship between 
\emph{applications}, \emph{pages}, and \emph{navlines} shown
in Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00} has these
elements:

\begin{itemize}
\item \emph{applications} have a $1:\infty$ relationship with the
      \emph{pages}.  Every page is a member of exactly one application.
\item The \emph{navlines} specify the headings and page links that may appear
      (depending on \emph{permgroups}) in the left navigation pane.  A given page
      may appear in more than one location in the left navigation pane (hence
      the $\infty{}:1$ relationship\footnote{With the additional caveat that
      a page is not required to appear in the left navigation pane---some pages
      can only be reached indirectly.}).
\end{itemize}

Note that the \emph{pages} and \emph{applications} tables are linked to
from other tables (these relationships are omitted from
Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00} to limit its complexity).  For
example, the \emph{globalvars} table is linked by page and application
so that the variables relevant to a page can be extracted more economically.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Products and Product Components}
%Subsubsection tag:  pcn0
\label{ctbg0:sddd0:scfn0:spcn0}

The \emph{products}, \emph{pcomponents}, \emph{psubcomponents}, 
and \emph{pversions} tables in Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}
define the products that are produced.  Note that a version can only
be associated with a product (rather than a component or subcomponent
of a product).

The exact semantics of what constitutes a product, product component,
and product subcomponent is naturally dependent on the organization.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{File Repository Organization}
%Subsubsection tag:  fro0
\label{ctbg0:sddd0:scfn0:sfro0}

Information about the files in the file repository is maintained in the
\emph{MySQL} database in the \emph{frfiles},
\emph{frfilesfrfusages}, and \emph{frfusages} tables
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}).  These three
tables create an $\infty{}:\infty{}$ mapping between
the \emph{frfiles} and \emph{frfusages} tables.

The \emph{frfiles} table contains one record per file in the
file repository.  Each of these records contains all direct information
about the stored file (location, SHA1 hash value, etc.).

The \emph{frfilesfrfusages} and \emph{frfusages} tables are necessary because
there are PHP scripts that:

\begin{itemize}
\item Display information about a file in the file repository.
\item Serve the contents of a file in the file repository.
\end{itemize}

These scripts must be able to determine whether a given user is authorized to
view information about a file repository file or download the file.
The difficulty in making this determination is that the file repository
is shared among many applications.  Without some sort of a hint, the process
of determining whether a user may view information about a repository file
or download it would be an iterative process of checking every
\emph{\productbasename{}-\productversion{}} application to determine if it
has involvement with the repository file and if the user has permissions for the
file.  The \emph{frfusages} table is a table of possible usages for a file 
repository file.  Using this table, the applications to check for file permissions
should be reduced in a typical case to one or two.

At the present time, each usage of a file repository file is simply
an integer, and so the \emph{frfusages} table is superfluous (the integer could
simply be stored directly in the \emph{frfilesfrfusages} table).  However,
the more complex schema is used in case future expansion or enhancement 
occurs.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Global Variables}
%Subsubsection tag:  gvr0
\label{ctbg0:sddd0:scfn0:sgvr0}

The database includes a table of global variables 
(the \emph{globalvars} table in Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00},
p. \pageref{fig:ctbg0:sddd0:scfn0:spga0:00}).

Global variables have these uses:

\begin{itemize}
\item Influencing the behavior of applications (global variables often override
      default behavior---similar to the typical behavior of \emph{Unix} environment
      variables).  For example, the web interface to the database can be taken
      offline by setting a global variable.
\item Mutual exclusion:  for example, a global variable is used to ensure that two instances
      of the maintenance script don't run concurrently.
\item Application persistent state:  state that is not bound to a user or a session
      (see \S{}\ref{ctbg0:sddd0:scfn0:supn0} and \S{}\ref{ctbg0:sddd0:scfn0:sstn0})
      can be stored in global variables (similar to the behavior of
      the \emph{Windows} registry).
\end{itemize}

Global variables have these essential characteristics:

\begin{itemize}
\item The name of a global variable must be unique (duplicate names in the 
      \emph{globalvars} table are not allowed).  In order to facilitate and 
      guarantee uniqueness, a naming convention
      (\S{}\ref{ctbg0:sddd0:sgsn0}, p. \pageref{ctbg0:sddd0:sgsn0}) is used that
      names the global variables by application then page.
\item The \emph{globalvars} table is indexed by variable name.
\end{itemize}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Logging}
%Subsubsection tag:  log0
\label{ctbg0:sddd0:scfn0:slog0}

\emph{\productbasename{}-\productversion{}} maintains a log in the form of
a database table (the \emph{logentries} table in Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00},
p. \pageref{fig:ctbg0:sddd0:scfn0:spga0:00}).  This database table
contains indexed columns so that the log can be viewed based on
chronological order, application, log entry type, and security threat level.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Inbox}
%Subsubsection tag:  inb0
\label{ctbg0:sddd0:scfn0:sinb0}

When certain events/actions occur, one or more users should be notified.
The notices are inserted into the \emph{inbox} table 
(Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00}).

Presently, only the \emph{inbox} is implemented, and it is used only for
automatic notifications from applications.  In the future, a more general messaging
system (with the ability to send and receive e-mail or to exchange
messages with other users) may be implemented.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Host E-mail Preferences}
%Subsubsection tag:  hep0
\label{ctbg0:sddd0:scfn0:shep0}

Notifications are always sent to the \emph{inbox} and can be viewed by the user.
For each user, it can be configured whether notifications are also sent to
the user's e-mail address(es).

There are two concerns with injecting e-mail bound for remote hosts:

\begin{itemize}
\item Some remote hosts may filter for spam by limiting the number of
      e-mails that can be delivered from a certain host.  With these remote hosts,
      e-mail injection must be rate-limited.
\item A software bug could result in a large number of notifications being
      created erroneously.  Rate-limiting all e-mail injections for remote hosts is prudent.
\end{itemize}

The \emph{hostemailprefs} table (Figure \ref{fig:ctbg0:sddd0:scfn0:spga0:00})
is a table of regular expressions designed to match e-mail addresses.
During each run of the high-frequency maintenance script, the \emph{hostemailprefs}
records are scanned in a specific order in an attempt to match
the outgoing e-mail address.  The data in the first match is used to control
the generation of outgoing e-mail.
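
The first-match lookup can be sketched as follows.  This is a minimal Python
sketch under stated assumptions: the patterns, the \texttt{max\_per\_hour}
field, and the function \texttt{prefs\_for} are hypothetical, and the actual
contents of the \emph{hostemailprefs} records may differ.

\begin{verbatim}
```python
# Illustrative sketch only; patterns and preference fields are hypothetical.
import re
from typing import Dict, List, Optional, Tuple

# Records in scan order: (regular expression, preferences for matching addresses).
HOST_EMAIL_PREFS: List[Tuple[str, Dict[str, int]]] = [
    (r"@example\.com$", {"max_per_hour": 10}),
    (r"@.*\.example\.org$", {"max_per_hour": 50}),
    (r".*", {"max_per_hour": 100}),  # catch-all record scanned last
]


def prefs_for(address: str) -> Optional[Dict[str, int]]:
    """Return the preferences of the first record whose pattern matches."""
    for pattern, prefs in HOST_EMAIL_PREFS:
        if re.search(pattern, address, re.IGNORECASE):
            return prefs
    return None
```
\end{verbatim}

Because the scan stops at the first match, more specific patterns must be
placed ahead of more general ones.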

Note that generation of outgoing notification e-mail is simply a process of
sending each message in the \emph{inbox} table subject to user preferences
and rules in the \emph{hostemailprefs} table. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Product Design Decisions and Discussion}
%Section tag:  ddc0
\label{ctbg0:sddc0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Two-Factor Authentication}\index{two-factor authentication}
%Subection tag:  tfa0
\label{ctbg0:sddc0:stfa0}

\emph{Two-factor authentication}\index{two-factor authentication} is the practice of 
authenticating users based on both something the user \emph{knows} (typically a password)
and something the user \emph{has} (typically a cryptographic token).

It is anticipated that \emph{\productbasename{}} will be used from airport
kiosks and other locations where password capture is a very tangible possibility.

The best known countermeasure against password capture and password guessing is
one-time passwords\index{one-time password} (OTPs) generated by a
cryptographic token.  

OTPs generated by a cryptographic token are effective against password capture because
each one-time password generated can be used only once and is useless in the future.
OTP capture is not helpful to an attacker.

OTPs generated by a cryptographic token are effective against password guessing
because the OTPs generated have an approximately uniform
distribution across the space of all
OTPs that can be generated.  This is unlike passwords generated by humans,
which tend to involve language words and may facilitate guessing.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Overview of the Solution}
%Subsubsection tag:  ovs0
\label{ctbg0:sddc0:stfa0:sovs0}

%Note to self:  .JPG was converted to .EPS on *nix box using
%"convert cckt1_01.jpg cckt1_01.eps", with no additonal options
%or qualifiers.  .EPS ended up to be quite large.
%
\begin{figure}
\centering
\includegraphics[width=4.6in]{c_tbg0/cckt1_01.eps}
\caption{CryptoCard KT-1 Keychain Token}
\label{fig:ctbg0:sddc0:stfa0:sovs0:00}
\end{figure}

The model of token supported by \emph{\productbasename{}}
is the CryptoCard\index{CryptoCard Corporation} 
KT-1 keychain token\index{KT-1 keychain cryptographic token} 
(Fig. \ref{fig:ctbg0:sddc0:stfa0:sovs0:00}).


On a per-user basis, logins to \emph{\productbasename{}} may use either:

\begin{itemize}
\item Userid and password.
\item Userid, password, and OTP. 
\end{itemize}

The cryptographic basis of the CryptoCard KT-1 token and similar products
is that:

\begin{itemize}
\item The token is programmed with a key that cannot be extracted\footnote{The
      ability to extract the key from the token is equivalent to being able
      to predict all future OTPs that will be generated by the token, and hence
      would render the token useless as a security device.}, and this key is known
      to the \emph{\productbasename{}} software.
\item There is no computationally feasible way to recover the key by observing
      OTPs generated by the token; hence there is no practical way to predict future
      OTPs generated by the token based on observing past OTPs.
\end{itemize}

\emph{\productbasename{}} utilizes the CryptoCard KT-1 token configured
so that each of the 8 displayed OTP characters can be one of 32 different
possibilities.\footnote{The KT-1 token can also be configured so that 
each of the 8 OTP characters can be one of 64 possibilities, leading to
$64^8 \approx 2.8 \times 10^{14}$ OTPs.  The base-32
OTPs were chosen because they are case-insensitive and lead to fewer user 
data entry errors.}  
The number of possible OTPs is thus $32^8 \approx 1.1 \times 10^{12}$.
The large number of OTPs makes a brute-force attack unattractive: even at 10 guesses
per second, the expected time to guess an OTP would be approximately 1,700 years.
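
The 1,700-year estimate can be reproduced as follows, assuming that a single
OTP value is valid at any time and that, on average, half of the OTP space must
be searched before that value is found (the symbol $t$ for the expected search
time is introduced here only for illustration):

\begin{eqnarray*}
t & \approx & \frac{32^8 / 2}{10\ \mathrm{s}^{-1}}
              \approx 5.5 \times 10^{10}\ \mathrm{s} \\
  & \approx & \frac{5.5 \times 10^{10}\ \mathrm{s}}{3.16 \times 10^{7}\ \mathrm{s/year}}
              \approx 1{,}700\ \mathrm{years} .
\end{eqnarray*}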

The CryptoCard KT-1 keychain token is an event-driven device---it generates
sequential OTPs according to a mathematical sequence, with one OTP generated
at each activation of the token.  The \productbasename{} software, because
it has access to the token key, is able to predict the OTPs that should be
generated by the token.  The \productbasename{} software also allows approximately
three\footnote{Configurable:  three is the default value.}
of the predicted sequential OTPs to be used, in case the KT-1 token was activated
and the OTP never used.\footnote{This might happen, for example, if the token button
is accidentally pressed by objects in the user's pockets, or if children are
allowed to play with the token.}
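
The acceptance of several sequential predicted OTPs can be sketched as follows.
This is a minimal Python sketch under stated assumptions: the function
\texttt{predicted\_otp} is a stand-in built on HMAC-SHA1 and is \emph{not} the
KT-1 algorithm, and the alphabet and all names are hypothetical.

\begin{verbatim}
```python
# Illustrative sketch only.  The OTP function below is a stand-in built on
# HMAC-SHA1; it is NOT the CryptoCard KT-1 algorithm.
import hashlib
import hmac
from typing import Optional

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # 32 symbols (hypothetical choice)
WINDOW = 3  # number of sequential predicted OTPs accepted (default per the text)


def predicted_otp(key: bytes, counter: int) -> str:
    """Stand-in generator: derive an 8-character base-32 OTP from key and counter."""
    digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha1).digest()
    value = int.from_bytes(digest[:5], "big")  # 40 bits = 8 symbols of 5 bits
    return "".join(ALPHABET[(value >> (5 * i)) & 31] for i in range(8))


def check_otp(key: bytes, counter: int, entered: str) -> Optional[int]:
    """Return the new server counter if the OTP is accepted, else None."""
    candidate = entered.upper()  # base-32 OTPs are treated as case-insensitive
    for offset in range(WINDOW):
        if predicted_otp(key, counter + offset) == candidate:
            return counter + offset + 1  # earlier OTPs in the window are now spent
    return None
```
\end{verbatim}

On success the server counter moves past the accepted OTP, so that an OTP that
has been used, or skipped, can never be accepted again.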

It can occur, however, that the KT-1 token falls out of synchronization with
the \productbasename{} software.  In this case, the token allows a resynchronization
procedure where the user enters a resynchronization string (8 digits,
allowing for $10^8$ different values).  When the token is supplied
with a resynchronization string, it resets its internal state and provides
an OTP.  This procedure allows the token and the \productbasename{} software
to be brought back into synchronization.

Note to self:  in conversation with Bill LaHam in late October 2009, Bill indicated
that the traditional approach with tokens is to have an inner window and an outer
window.  A typical inner window might be of size 3, and a typical outer window might
be of size 1000.

\begin{itemize}
\item If the token value falls within the inner window, it is authenticated without
      any other verification steps.
\item If the token value is not within the inner window but is within the outer
      window, two (additional?) consecutive values are required for authentication.
\item If the token value is not within the inner window and not within the
      outer window, some sort of resynchronization is required.
\end{itemize}

The size of the inner window is a major security risk (it raises the probability
of a successful guess), so it should be small.  The outer window is not a security
concern at all.

Need to incorporate the information from Bill into this document and the
strategy in this document.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Additional Prudent User Authentication Practices}
%Subsubsection tag:  aap0
\label{ctbg0:sddc0:stfa0:saap0}

Even when OTPs are employed, it is not prudent to allow an attacker
a large number of authentication attempts.  The following 
prudent practices are also used by the \productbasename{} software.

\begin{enumerate}
\item Resynchronization strings are provided by the server, are generated
      as sequential values (``00000000'', ``00000001'', ``00000002'', etc.),
      and are only reused with a period of $10^8$.  (This is designed to
      eliminate an attacker's ability to observe how a token responds to
      a specific resynchronization string and to reuse this information as
      part of an attack.)
\item Unsuccessful authentication attempts that involve an invalid userid
      are logged but otherwise simply ignored.
\item Unsuccessful authentication attempts that involve a valid userid but 
      both invalid password and invalid OTP are logged but otherwise
      simply ignored.
\item Unsuccessful authentication attempts involving a valid userid and
      either a valid password and invalid OTP or invalid password and valid
      OTP\footnote{In the case of a user for whom two-factor authentication
      is not enabled, valid userid and invalid password are treated as described
      here---the key element is that the attacker appears to be ``one piece''
      away from a successful attack.} 
      are treated more aggressively.  This case hints at an attacker
      who has obtained a user's password but has no token, or who has obtained a token
      but does not have the user's password, and is ``fishing'' for the missing
      piece.  If a sufficient number of these attempts directed at the userid have
      occurred from the same IP address in a short period of time, login ability for
      this userid from the affected IP address is silently disabled for a period of 
      time.\footnote{\emph{Silently} means that the web interface will only indicate
      unsuccessful authentication, and will give no indication that a probable attack
      has been detected or that authentication for the affected userid from the 
      affected IP is temporarily impossible.  \emph{Disabled} means that even if
      correct authentication credentials are provided, they will be rejected.  The
      purpose of this policy is to eliminate the ability of an attacker to try large
      numbers of attacks in a short period of time, and to deny an attacker information
      about how to better mount an attack.} 
\end{enumerate}
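
The silent lockout described in the last item can be sketched as a sliding-window
counter keyed by userid and IP address.  This is a minimal Python sketch under
stated assumptions: the thresholds and all names are hypothetical, and the
product stores this information in the \emph{loginattempts} table rather than
in memory.

\begin{verbatim}
```python
# Illustrative sketch only; thresholds and names are hypothetical.
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

MAX_NEAR_MISSES = 5       # near-miss attempts tolerated within WINDOW_SECONDS
WINDOW_SECONDS = 600.0    # length of the observation window
LOCKOUT_SECONDS = 1800.0  # how long logins are silently refused

_near_misses: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)
_locked_until: Dict[Tuple[str, str], float] = {}


def record_near_miss(userid: str, ip: str, now: Optional[float] = None) -> None:
    """Record an attempt that was 'one piece' short of a successful login."""
    now = time.time() if now is None else now
    key = (userid, ip)
    attempts = _near_misses[key]
    attempts.append(now)
    while attempts and attempts[0] <= now - WINDOW_SECONDS:
        attempts.popleft()  # forget attempts older than the window
    if len(attempts) >= MAX_NEAR_MISSES:
        _locked_until[key] = now + LOCKOUT_SECONDS
        attempts.clear()


def is_locked(userid: str, ip: str, now: Optional[float] = None) -> bool:
    """Return True while logins for this userid from this IP are disabled."""
    now = time.time() if now is None else now
    return now < _locked_until.get((userid, ip), 0.0)
```
\end{verbatim}

A login handler built on this sketch would consult \texttt{is\_locked} before
checking credentials and would return the same generic failure message in
either case, so that the lockout remains silent.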


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Per-User Authentication and Administrative Rights}
%Subsection tag:  pua0
\label{ctbg0:sddc0:spua0}

\emph{\productbasename{}-\productversion{}} has no notion of an
``administrative'' or ``master'' password.  Each individual user
logs in using a user-ID identifying themselves, even to perform
administrative tasks.

Tasks are divided into two groups:

\begin{itemize}
\item Ordinary tasks (that typically view data or move work products
      through a process).
\item Administrative tasks that:
      \begin{itemize}
      \item Represent administration of the \emph{\productbasename{}-\productversion{}}
            software, rather than usage of the software; AND/OR
      \item Are sensitive to mistakes, and could destroy or corrupt 
            data.
      \end{itemize}
\end{itemize}

\emph{\productbasename{}-\productversion{}} allows two types of logins:

\begin{itemize}
\item \emph{Normal login}:  the user has privileges only to perform
      non-administrative tasks.
\item \emph{Administrative login}:  the user has privileges to perform
      non-administrative and administrative tasks.
\end{itemize}

In order to perform a normal login, the user simply enters their
user-ID (``\emph{jsmith}'', for example) when logging in.

In order to perform an administrative login, the user enters their
user-ID postfixed with an asterisk (``\emph{jsmith*}'', for
example) when logging in.  When a user has performed an
administrative login, the color scheme used for the web pages
is based on the color red rather than on blues and grays.

The only way to switch between normal and administrative logins 
is by logging out and logging in again.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{\emph{su} Logins}
%Subsection tag:  sul0
\label{ctbg0:sddc0:ssul0}

\index{su login@\emph{su} login}It is helpful in some
situations to be able to log in as a different user.  Such logins are 
called \emph{su} logins (after the *nix \emph{su} command).  Situations
in which an \emph{su} login is useful include:

\begin{itemize}
\item Testing.
\item Performing actions on behalf of other users.
\end{itemize}

In order to \emph{su} as another user, the user logging in
must enter a user-ID of the form ``\emph{actualuser as suuser}''.
The form ``\emph{actualuser as suuser*}'' can
also be used for an \emph{su} administrative login.

In order to perform an \emph{su} login as another user, a user
must have strictly superior privileges.  Specifically:

\begin{itemize}
\item The \emph{seclvl} of the actual user must be 
      a smaller integer than the \emph{seclvl} of the \emph{su} user.
\item The actual user must have at least every permission attribute
      that the \emph{su} user has.
\item For those permission attributes that are ranked, the actual user's
      ranking must be at least as great as that of the \emph{su} user.
\end{itemize}
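
The three conditions above can be restated compactly.  The notation is
introduced here only for illustration and is not part of the software:
let $a$ denote the actual user and $s$ the \emph{su} user, let $P(u)$ be the
set of permission attributes held by user $u$, and let $r_u(p)$ be the ranking
held by user $u$ for a ranked permission attribute $p$.  Under these assumptions,
an \emph{su} login is permitted only if

\begin{equation}
\nonumber
\mathit{seclvl}(a) < \mathit{seclvl}(s)
\;\; \wedge \;\;
P(s) \subseteq P(a)
\;\; \wedge \;\;
r_a(p) \geq r_s(p) \mbox{ for every ranked } p \in P(s) .
\end{equation}

\noindent{}Only the \emph{seclvl} comparison is strict.  The actual user may hold
exactly the same permission attributes and rankings as the \emph{su} user,
provided that the actual user's \emph{seclvl} is the smaller integer.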

Once the login is complete, the \emph{su} login is indistinguishable from an
actual login except for the log entries.  The session maintained does
not contain any state to indicate that it represents an \emph{su} login.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Multiple Logins}
%Subsection tag:  mlo0
\label{ctbg0:sddc0:smlo0}

The maximum number of logins per user is a global configuration
constant, defined in the \texttt{config.inc} \emph{PHP} file.
The default value is 3.

If the maximum number of logins is reached and another user 
successfully authenticates with the
same user-ID, the session with the oldest creation time is destroyed (forcibly
logging out this instance of the user).
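
As a worked example, assume the default maximum of 3 and suppose that three
sessions for the user-ID \emph{jsmith} already exist, with creation times
$t_1 < t_2 < t_3$ (the symbols are introduced here only for illustration).
A fourth successful authentication as \emph{jsmith} at time $t_4 > t_3$
destroys the session created at $t_1$, so that the set of creation times of
the live sessions changes as follows:

\begin{equation}
\nonumber
\{ t_1, t_2, t_3 \} \;\; \longrightarrow \;\; \{ t_2, t_3, t_4 \}
\end{equation}

\noindent{}The number of live sessions for the user-ID therefore never exceeds the
configured maximum.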


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The Standard Hash Function}
%Subsection tag:  shf0
\label{ctbg0:sddc0:sshf0}

The \index{standard hash function}\emph{standard hash function} is the
standard way that the \emph{\productbasename{}-\productversion{}} software
maps between a number of arbitrary input arguments and a 160-bit
SHA1 hash.  The standard hash function involves mixing
the input arguments with a secret hash key
(described in \S{}\ref{ctbg0:sdty0:sthk0}).

Let `+' denote the string concatenation operation.  Any
arguments $a_i$ that are not strings are first converted to string format; the
SHA1 hash is then applied to the concatenation of the secret hash key $k_h$ and the
arguments $a_i$, interleaved as shown by the first few patterns below.

\begin{eqnarray}
\nonumber
SHF_1(a_1)              & = & SHA1(k_h + a_1 + k_h) \\
\label{eq:ctbg0:sddc0:sshf0:01}
SHF_2(a_1, a_2)         & = & SHA1(k_h + a_1 + k_h + a_2 + k_h) \\
\nonumber
SHF_3(a_1, a_2, a_3)    & = & SHA1(k_h + a_1 + k_h + a_2 + k_h + a_3 + k_h)
\end{eqnarray}

The hash functions above are designed so that an attacker who does not
know the hash key $k_h$ cannot feasibly predict the hash that will be
generated for a given set of input arguments $a_i$.
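
The patterns above extend to $n$ arguments in the natural way:  the key $k_h$
appears before the first argument, between each pair of adjacent arguments, and
after the last argument, for a total of $n+1$ occurrences.

\begin{equation}
\nonumber
SHF_n(a_1, \ldots, a_n) = SHA1(k_h + a_1 + k_h + a_2 + k_h + \cdots + k_h + a_n + k_h)
\end{equation}

\noindent{}As an illustration only, take the hypothetical key
$k_h = \mbox{``\texttt{KEY}''}$ (a real key would be long and unpredictable)
and the arguments $a_1 = \mbox{``\texttt{jsmith}''}$ and $a_2 = 42$.
Assuming that the integer 42 is converted to the decimal string
``\texttt{42}'', the second pattern gives

\begin{equation}
\nonumber
SHF_2(a_1, a_2) = SHA1(\mbox{``\texttt{KEYjsmithKEY42KEY}''}) .
\end{equation}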


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Storage of User Passwords}
%Subsection tag:  sup0
\label{ctbg0:sddc0:ssup0}

It is generally known that individuals tend to choose identical or similar
passwords across many different computing applications.  For that reason, it
is important to safeguard the passwords used with
\emph{\productbasename{}-\productversion{}}.  Although 
\emph{\productbasename{}-\productversion{}} may be a relatively unimportant
product to a user, the other software products with which the user is
unwisely using the same password may be important.

Passwords are never stored in \emph{\productbasename{}-\productversion{}}.
Instead, the value $SHF_1(\cdot{})$ of the password (as described in
Eq.~\ref{eq:ctbg0:sddc0:sshf0:01}) is stored.

It is possible, although extremely unlikely, that two non-identical 
passwords would have the same stored hash.  Assuming that 
a typical user is comfortable using 62 characters (26 lower-case letters, 26 upper-case
letters, and 10 digits) as part of a password, an approximation of how many
password characters $n$ a 160-bit hash is equivalent to can be obtained by
solving:

\begin{equation}
62^n = 2^{160}
\end{equation}

\begin{equation}
n = \frac{160 \log 2}{\log 62} \approx 27
\end{equation}

\noindent{}Thus, a hash collision involving two different typical passwords is
\emph{very} unlikely.
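
The arithmetic can be checked by taking logarithms to base 2, so that each
password character drawn from the 62-character alphabet contributes
$\log_2 62$ bits:

\begin{eqnarray}
\nonumber \log_2 62 & \approx & 5.954 \\
\nonumber n         & =       & \frac{160}{\log_2 62} \approx \frac{160}{5.954} \approx 26.87
\end{eqnarray}

\noindent{}Equivalently, $62^{26} < 2^{160} < 62^{27}$:  there are fewer distinct
26-character passwords over this alphabet than there are possible hash values,
and more distinct 27-character passwords.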

If the stored password hash is compromised but the hash key is not, no attack
is possible except brute-force password guessing (with no advantage gained
due to compromise of the stored password hash).

If both the stored password hash and the hash key are compromised, the best
attack possible is a dictionary attack.  This may or may not be fruitful,
depending on the strength of password chosen by the user.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Internal Representation of Time}
%Subsection tag:  rti0
\label{ctbg0:sddc0:srti0}

It is very common for a project to involve individuals from several
countries.  In this context, it is important to have no ambiguity
about the values of time recorded in a database.

The following design decisions have been made:

\begin{itemize}
\item All stored time values corresponding to events will
      be in \index{UTC}UTC.
\item In many contexts, the stored time values will also
      be presented in local time.  (However, 
      presentation in local time alone is strongly discouraged---UTC should be the norm
      for collaboration.)
\item The calendaring range for date functionality will be
      from 1900 through 2999.\footnote{This represents a
      span of approximately 34,713,000,000 seconds.  Although this
      exceeds the range of a 32-bit representation, it comfortably
      fits in a 64-bit representation.}
\end{itemize}
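
The figure quoted in the footnote above can be reproduced as follows, assuming
that the range covers every day from the start of 1900 to the end of 2999 in the
Gregorian calendar and ignoring leap seconds.  The range spans 1100
calendar years.  Of these, 275 are divisible by 4 (1900, 1904, \ldots, 2996).
Of the 11 century years (1900, 2000, \ldots, 2900), only 2000, 2400, and 2800
are divisible by 400, so the other 8 are not leap years, leaving
$275 - 8 = 267$ leap years.

\begin{eqnarray}
\nonumber \mbox{days}    & = & 1100 \times 365 + 267 = 401,767 \\
\nonumber \mbox{seconds} & = & 401,767 \times 86,400 = 34,712,668,800
\end{eqnarray}

\noindent{}Since $2^{35} = 34,359,738,368 < 34,712,668,800 < 2^{36} = 68,719,476,736$,
an unsigned count of seconds over this range needs 36 bits, which exceeds
32 bits but is far below the capacity of a 64-bit representation.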

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\emph{PHP} Design Decisions and Discussion}
%Section tag:  php0
\label{ctbg0:sphp0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Native Integer Size}
%Subsection tag:  nis0
\label{ctbg0:sphp0:snis0}

\emph{PHP} integers are the same size as the native
integer of the underlying platform.  On some platforms this is 32 bits, and on
other platforms it is 64 bits.

In the case of a 32-bit integer, the range of values that can be represented
by an integer $x$ is

\begin{eqnarray}
-(2^{31})      & \leq \; x \; \leq & 2^{31} - 1\\
\nonumber -2,147,483,648 & \leq \; x \; \leq & 2,147,483,647 .
\end{eqnarray}

It is necessary to ensure that 64-bit integer arithmetic is available from 
\emph{PHP} for the following reasons:

\begin{itemize}
\item Certain of the database tables may exceed $2^{31}-1$ records or,
      due to addition and deletion of records, have key values that
      exceed $2^{31}-1$.
\item The number of seconds since the Unix epoch will exceed $2^{31}-1$ seconds
      in January 2038 A.D\@.  \emph{\productbasename{}} must be able to perform
      date calculations further into the future than 2038 A.D., and so
      integers exceeding 32 bits in size are needed.
\end{itemize}
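
The date at which a signed 32-bit count of seconds since the Unix epoch
(00:00:00 UTC on 1 January 1970) reaches its maximum can be worked out directly.
Leap seconds are ignored, as they are in Unix time.

\begin{eqnarray}
\nonumber 2^{31} - 1 & = & 2,147,483,647 \mbox{ s} \\
\nonumber            & = & 24,855 \times 86,400 \mbox{ s} + 11,647 \mbox{ s} \\
\nonumber            & = & 24,855 \mbox{ days} + 3 \mbox{ h} + 14 \mbox{ min} + 7 \mbox{ s}
\end{eqnarray}

\noindent{}The 68 years from 1970 through 2037 contain 17 leap years
(1972, 1976, \ldots, 2036), giving $68 \times 365 + 17 = 24,837$ days up to
1 January 2038.  The remaining $24,855 - 24,837 = 18$ days place the limit at
03:14:07 UTC on 19 January 2038.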

To work around the limitations of 32-bit integers, the following strategy
is used:

\begin{itemize}
\item Integers that may exceed 32 bits are represented as strings rather than
      as integers, and \emph{PHP}'s \emph{bcmath} library is used to manipulate
      these strings.
\item In issuing \emph{MySQL} statements that specify integers larger than 32 bits,
      no special action is required, as an SQL statement is ultimately only a string.
      However, when obtaining result sets from \emph{MySQL} that may involve
      integers larger than 32 bits, the SQL statements cast the affected
      columns of the result set to strings.
\end{itemize}

The strategy described above will work equally well with \emph{PHP} when 
64-bit integers are directly supported, but with slight inefficiency.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Web Interface Design Decisions and Discussion}
%Section tag:  wid0
\label{ctbg0:swid0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Client IP Address Modified During Session}
%Subsection tag:  ipa0
\label{ctbg0:swid0:sipa0}

Newsgroup posters have reported that a client's IP address may change during a
session (due to DHCP lease expiration and so on).  For this reason, an
IP address that changes during a session will result in a warning in the
logs rather than termination of the session.\footnote{Termination of the
session is a forced immediate logout.}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\emph{PHP}-Spawned Program Design Decisions and Discussion}
%Section tag:  phs0
\label{ctbg0:sphs0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\emph{cron}-Job Design Decisions and Discussion}
%Section tag:  cjd0
\label{ctbg0:scjd0}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\noindent\begin{figure}[!b]
\noindent\rule[-0.25in]{\textwidth}{1pt}
\begin{tiny}
\begin{verbatim}
$RCSfile: c_tbg0.tex,v $
$Source: /home/dashley/cvsrep/e3ft_gpl01/e3ft_gpl01/webprojs/pamc/gen_a/docs/manual/man_a/c_tbg0/c_tbg0.tex,v $
$Revision: 1.35 $
$Author: dashley $
$Date: 2009/11/01 02:42:55 $
\end{verbatim}
\end{tiny}
\noindent\rule[0.25in]{\textwidth}{1pt}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%$Log: c_tbg0.tex,v $
%Revision 1.35  2009/11/01 02:42:55  dashley
%Edits.
%
%Revision 1.34  2007/07/13 02:38:04  dashley
%Edits.
%
%Revision 1.33  2007/07/13 01:19:57  dashley
%Edits.
%
%Revision 1.32  2007/07/11 14:31:40  dashley
%Edits.
%
%Revision 1.31  2007/07/07 23:12:54  dashley
%Edits.
%
%Revision 1.30  2007/07/07 03:49:58  dashley
%Edits.
%
%Revision 1.29  2007/07/07 03:31:53  dashley
%Edits.
%
%Revision 1.28  2007/07/04 22:41:27  dashley
%Edits.
%
%Revision 1.27  2007/07/04 05:09:24  dashley
%Edits.
%
%Revision 1.26  2007/07/02 01:32:50  dashley
%Edits.
%
%Revision 1.25  2007/07/01 03:07:14  dashley
%Edits.
%
%Revision 1.24  2007/06/24 20:21:55  dashley
%Edits.
%
%Revision 1.23  2007/06/24 16:35:48  dashley
%Edits.
%
%Revision 1.22  2007/06/24 02:16:50  dashley
%Edits.
%
%Revision 1.21  2007/06/24 00:42:20  dashley
%Edits.
%
%Revision 1.20  2007/06/23 22:16:10  dashley
%Safety checkin before graphics file renaming.
%
%Revision 1.19  2007/06/21 03:26:10  dashley
%Edits.
%
%Revision 1.18  2007/06/21 03:21:32  dashley
%Edits.
%
%Revision 1.17  2007/06/21 03:09:39  dashley
%Edits.
%
%Revision 1.16  2007/06/14 01:59:36  dashley
%Edits.
%
%Revision 1.15  2007/06/10 18:03:20  dashley
%Edits.
%
%Revision 1.14  2007/06/10 16:04:12  dashley
%Edits.
%
%Revision 1.13  2007/06/10 05:01:40  dashley
%Edits.
%
%Revision 1.12  2007/06/10 03:42:30  dashley
%Edits.
%
%Revision 1.11  2007/06/10 02:30:17  dashley
%Initial checkin.phpstatic01.dsf
%
%Revision 1.10  2007/06/07 20:05:20  dashley
%Edits.
%
%Revision 1.9  2007/06/07 05:03:20  dashley
%Edits.
%
%Revision 1.8  2007/06/06 01:55:08  dashley
%Multiply-defined label corrected.
%
%Revision 1.7  2007/06/06 00:32:07  dashley
%Edits.
%
%Revision 1.6  2007/06/05 15:47:12  dashley
%Structural edits.
%
%Revision 1.5  2007/06/05 00:39:55  dashley
%Edits.
%
%Revision 1.4  2007/06/04 03:26:55  dashley
%Edits.
%
%Revision 1.3  2007/06/03 23:15:27  dashley
%Edits.
%
%Revision 1.2  2007/06/03 07:57:10  dashley
%Edits.
%
%Revision 1.1  2007/06/03 07:16:08  dashley
%Initial checkin.
%
%End of $RCSfile: c_tbg0.tex,v $.