+===========================================================+
|   Introduction to the lossless compression schemes        |
|   Description of the codec source codes                   |
+-----------------------------------------------------------+
| From David Bourgin (E-mail: david.bourgin@ufrima.imag.fr) |
| Date: 22/9/94                                             |
+===========================================================+

 ------ BE CAREFUL ------
This file (compress.txt) is copyrighted. (c) David Bourgin - 1994
Permission to use this documentation for any purpose other than
its incorporation into a commercial product is hereby granted without fee.
Permission to copy and distribute this documentation only for non-commercial
use is also granted without fee, provided, however, that the above copyright
notice appears in all copies and that both that copyright notice and this
permission notice appear in supporting documentation. The author makes no
representations about the suitability of this documentation for any purpose.
It is provided "as is" without express or implied warranty.

The source codes you obtain with this file are *NOT* covered by the same
copyright, because you may include them in both commercial and non-commercial
applications. See below for more information.

The source code files (codrle1.c, dcodrle1.c, codrle2.c, dcodrle2.c, codrle3.c,
dcodrle3.c, codrle4.c, dcodrle4.c, codhuff.c, dcodhuff.c) are copyrighted.
They were uploaded by ftp to turing.imag.fr (129.88.31.7):/pub/compression
on 22/5/94 and were modified on 22/9/94.
(c) David Bourgin - 1994
The source codes I provide have no bugs (!), but since I make them available
for free I have a few notes to make. They can change at any time without
notice. I assume no responsibility or liability for any errors or
inaccuracies, make no warranty of any kind (express, implied or statutory)
with respect to this publication and expressly disclaim any and all warranties
of merchantability and fitness for particular purposes. Of course, if you have
problems using the information presented here, I will try to help you if I
can.

If you include the source codes in your application, here are the conditions:
- You have to put my name in the header of your source file (not necessarily
in the executable program) (this item is a must)
- I would like to see your resulting application, if possible (this item is
not a must, because some applications must remain secret)
- Whenever you earn money with your application, I would like to receive a
very small part in order to be encouraged to update my source codes and to
develop new schemes (this item is not a must)
 ---------------------

There are several ways to compress data. Here, we are only going to deal with
the lossless schemes. These schemes are also called non-destructive because
you always recover the initial data you had, as soon as you need them.
With lossless schemes, you will never lose any information (except perhaps
when you store or transmit your data, but that is another problem...).

In this introduction, we are going to see:
- The RLE scheme (with different possible algorithms)
- The Huffman scheme (dynamic scheme)
- And the LZW scheme

For the novice, a compressor is a program able to read several data (e.g.
bytes) as input and to write several data as output. The data you obtain from
the output (also called compressed data) will - of course - take less space
than the input data. This is true in most cases, if the compressor works and
if the type of the data is suitable for compression with the given scheme.
The codec (coder-decoder) enables you to save space on your hard disk and/or
to save communication costs because you always store/transmit the compressed
data. You'll use the decompressor as soon as you need to recover your initial
useful data. Note that the compressed data are useless if you don't have the
decoder...

You are doubtless asking "How can I reduce the data size without losing some
information?". It's easy to answer this question. I'll just take an example.
I'm sure you have heard about Morse code. This system, established in the
19th century, uses a scheme very close to the Huffman one. In Morse you encode
the letters to transmit with two kinds of signs. If you encode these two sign
possibilities in one bit, the symbol 'e' is transmitted in a single bit while
the symbols 'y' and 'z' need four bits. Look at the symbols in the text you
are reading, you'll quickly understand the compression ratio...

Important: The source codes associated with the algorithms I present are
completely adaptable to what you need to compress. They all use basic macros
at the top of the file. Usually the macros to change are:

- beginning_of_data
- end_of_data
- read_byte
- read_block
- write_byte
- write_block

These allow the programmer to modify only a small part of the header of the
source codes in order to compress memory as well as files.

beginning_of_data(): Macro used to set the program up so that the next
read_byte() call will read the first byte to compress.
end_of_data(): Returns a boolean telling whether there are any more bytes to
read from the input stream. It returns 0 (false) while there are still bytes
to compress, and a non-zero value once the input is exhausted.
read_byte(): Returns a byte read from the input stream if available.
write_byte(x): Writes the byte 'x' to the output stream.
read_block(...) and write_block(...): Same use as read_byte and write_byte(x),
but these macros work on blocks of bytes and not only on a single byte.

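For a plain file-to-file codec, these macros can be mapped directly onto
stdio. Here is a minimal sketch, assuming the FILE pointers are called
source_file and dest_file as in the main() functions mentioned below; the
exact definitions used in the distributed source files may differ:

#include <stdio.h>

FILE *source_file,      /* File to compress, opened in "rb" mode  */
     *dest_file;        /* Compressed output, opened in "wb" mode */

/* Rewind the input so that the next read_byte() returns the first byte */
#define beginning_of_data()  (fseek(source_file,0L,SEEK_SET))
/* Non-zero once a read has hit the end of the input file */
#define end_of_data()        (feof(source_file))
/* Read/write a single byte */
#define read_byte()          ((unsigned char)getc(source_file))
#define write_byte(x)        (putc((x),dest_file))
/* Read/write a block of 'n' bytes at address 'ptr' */
#define read_block(ptr,n)    (fread((ptr),1,(n),source_file))
#define write_block(ptr,n)   (fwrite((ptr),1,(n),dest_file))
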
If you want to compress *from* memory, before entering an xxxcoding procedure
('xxx' is the actual extension to replace with a given codec), you have to add
a pointer set up to the beginning of the zone to compress. Note that the
following pointer 'source_memory_base' is not to be added, it is just given
here to name the address of the memory zone you are going to encode or decode.
The same goes for source_memory_end, which can be either a pointer to create
or an existing pointer.

unsigned char *source_memory_base, /* Base of the source memory */
              *source_memory_end,  /* Last address to read.
                 source_memory_end=source_memory_base+source_zone_length-1 */
              *source_ptr;         /* Used in the xxxcoding procedure */
void pre_start()
{ source_ptr=source_memory_base;
  xxxcoding();
}

end_of_data() and read_byte() must also be modified to compress *from* memory:

#define end_of_data() (source_ptr>source_memory_end)
#define read_byte()   (*(source_ptr++))

If you want to compress *to* memory, before entering an xxxcoding procedure
('xxx' is the actual extension to replace with a given codec), you have to add
a pointer. Note that the pointer 'dest_memory_base' is not to be added, it is
just given there to name the address of the destination memory zone you are
going to encode or decode.

unsigned char *dest_memory_base, /* Base of the destination memory */
              *dest_ptr;         /* Used in the xxxcoding procedure */
void pre_start()
{ dest_ptr=dest_memory_base;
  xxxcoding();
}

Of course, you can combine both from and to memory in the pre_start()
procedure. The files dest_file and source_file handled in the main() function
are then to be removed...

void pre_start()
{ source_ptr=source_memory_base;
  dest_ptr=dest_memory_base;
  xxxcoding();
}

In fact, to write to memory, the problem is in the write_byte(x) procedure.
This problem exists because your destination zone can be either a static zone
or a dynamically allocated zone. In both cases, you have to check that there
is no overflow, especially if the coder is not efficient and produces more
bytes than you reserved in memory.

In the first case, with a *static* zone, the write_byte(x) macro should look
like this:

unsigned long int dest_zone_length,
                  current_size;

#define write_byte(x) { if (current_size==dest_zone_length) \
                           exit(1); \
                        dest_ptr[current_size++]=(unsigned char)(x); \
                      }

In the static version, the pre_start() procedure is to be modified as follows:

void pre_start()
{ source_ptr=source_memory_base;
  dest_ptr=dest_memory_base;
  dest_zone_length=...; /* Set up to the actual destination zone length */
  current_size=0;       /* Number of written bytes */
  xxxcoding();
}

Otherwise, dest_ptr is a zone created by the malloc instruction and you can
try to resize the allocated zone with the realloc instruction. Note that I
grow the zone one kilobyte at a time. You have to add two other variables:

unsigned long int dest_zone_length,
                  current_size;

#define write_byte(x) { if (current_size==dest_zone_length) \
                           { dest_zone_length += 1024; \
                             if ((dest_ptr=(unsigned char *)realloc(dest_ptr,dest_zone_length*sizeof(unsigned char)))==NULL) \
                                exit(1); /* You can't compress in memory \
                                   => I exit, but *you* can write a routine to swap to disk */ \
                           } \
                        dest_ptr[current_size++]=(unsigned char)(x); \
                      }

With the dynamically allocated version, change the pre_start() routine as
follows:

void pre_start()
{ source_ptr=source_memory_base;
  dest_zone_length=1024;
  if ((dest_ptr=(unsigned char *)malloc(dest_zone_length*sizeof(unsigned char)))==NULL)
     exit(1); /* You need at least 1 kb of dynamic memory! */
  current_size=0; /* Number of written bytes */
  xxxcoding();
  /* Handle the bytes in dest_ptr but don't forget to free them with:
     free(dest_ptr);
  */
}

The previously given macros are used like this:

void demo() /* The file opening, closing and variables
               must be set up by the calling procedure */
{ unsigned char byte;
  /* And not 'char byte' (!) */
  while (!end_of_data())
        { byte=read_byte();
          printf("Byte read=%c\n",byte);
        }
}

You must not change the rest of the program unless you're really sure and
really need to do it!

+==========================================================+
|                     The RLE encoding                     |
+==========================================================+

RLE is an acronym that stands for Run Length Encoding. You may encounter it
under another acronym: RLC, Run Length Coding.

The idea in this scheme is to recode your data with regard to the repetition
of frames. A frame is one or more bytes that occur one or several times.

There are several ways to encode occurrences, so you'll have several codecs.
For example, you may have a sequence such as:
0,0,0,0,0,0,255,255,255,2,3,4,2,3,4,5,8,11

Some codecs will only deal with the repetitions of '0' and '255', but some
others will also deal with the repetitions of '0', '255', and '2,3,4'.

You have to keep in mind something important based on this example. A codec
won't work on all the data you will try to compress. So, when there are no
repeated sequences, the codecs based on RLE schemes must not display a message
saying "Bye bye". Actually, they will encode these non-repeated data with a
value that says "Sorry, I only make a copy of the initial input". Of course, a
copy of the input data with a header in front of this copy makes the output
bigger, but if you consider the whole data to compress, the encoding of
repeated frames will take less space than the encoding of non-repeated frames.

All of the algorithms with the name of RLE have the following look, with three
or four values:
- A value saying whether there is a repetition
- A value saying how many repetitions (or non-repetitions) there are
- A value giving the length of the frame (useless if you only encode frames
with one byte as maximum length)
- The value of the frame to repeat (or not)

I give four algorithms to illustrate this.

*** First RLE scheme ***

The first scheme is the simplest I know, and looks like the one used in the
Mac system (MacPackBit) and in some image file formats such as Targa, PCX,
TIFF, ...

Here, all compressed blocks begin with a byte, named the header, whose layout
is:

Bits   7 6 5 4 3 2 1 0
Header X X X X X X X X

Bit  7:      Compression status (1=Compression applied)
Bits 0 to 6: Number of bytes to handle

So, if bit 7 is set to 0, bits 0 to 6 give the number of bytes that follow
(minus 1, to gain a little more compression) and that were not compressed
(native bytes). If bit 7 is set to 1, the same bits 0 to 6 give the number of
repetitions (minus 2) of the following byte.

As you see, this method only handles frames of one byte.

Additional note: You have 'minus 1' for non-repeated frames because you must
have at least one byte to compress, and 'minus 2' for repeated frames because
the repetition must be at least 2.

Compression scheme:

                 First byte=Next byte?
                          /\
                         /  \
                        /    \
  Count the byte              Count the occurrence of NON identical
  occurrences                 bytes (maximum 128 times)
  (maximum 129 times)         and store them in an array
        |                            |
        |                            |
    1 bit '1'                    1 bit '0'
  + 7 bits giving              + 7 bits giving
    the number (-2)              the number (-1)
    of repetitions               of non-repetitions
  + repeated byte              + n non-repeated bytes
        |                            |
  1xxxxxxx,yyyyyyyy             0xxxxxxx,n bytes
 [-----------------]           [----------------]

Example:

Sequence of bytes to encode |  Coded values  | Difference with compression
                            |                |        (unit: byte)
-------------------------------------------------------------------------
 255,15,                    | 1,255,15,      | +1
 255,255,                   | 128,255,       |  0
 15,15,                     | 128,15,        |  0
 255,255,255,               | 129,255,       | -1
 15,15,15,                  | 129,15,        | -1
 255,255,255,255,           | 130,255,       | -2
 15,15,15,15                | 130,15         | -2

See codec source codes: codrle1.c and dcodrle1.c

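To make the header rule concrete, here is a minimal sketch of a decoder for
this first scheme, written with the end_of_data()/read_byte()/write_byte()
macros described at the beginning of this file. It only illustrates the rule
given above and is not a copy of dcodrle1.c:

/* Sketch of a decoder for the first RLE scheme:
   header bit 7 = 1 -> (count-2) stored, one repeated byte follows,
   header bit 7 = 0 -> (count-1) stored, that many literal bytes follow. */
void rle1_decode(void)
{ unsigned char header,byte;
  unsigned int count;

  while (!end_of_data())
        { header=read_byte();
          if (header & 0x80)                 /* Repetition block */
             { count=(header & 0x7F)+2;      /* 2 to 129 repetitions */
               byte=read_byte();             /* Byte to repeat */
               while (count--)
                     write_byte(byte);
             }
          else                               /* Native (uncompressed) block */
             { count=(header & 0x7F)+1;      /* 1 to 128 literal bytes */
               while (count--)
                     write_byte(read_byte());
             }
        }
}
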
*** Second RLE scheme ***

In the second scheme of RLE compression you look for the least frequent byte
in the source to compress and use it as a header for all compressed blocks.

In the best case, the occurrence of this byte is zero in the data to compress.

There are two possible schemes: one handling frames of only one byte, the
other handling frames of one byte *and* more. The first case is the subject of
this current compression scheme, the second is the subject of the next
compression scheme.

For frames of one byte, the header byte is written in front of every
repetition of at least 4 bytes. It is then followed by the repetition number
minus 1 and the repeated byte:
Header byte, Occurrence number-1, repeated byte

If a byte doesn't repeat more than three times, the bytes are written without
changes to the destination stream (no header, nor length, nor repetition in
front of or after these bytes).

An exception: if the header byte itself appears in the source once, twice,
three times or more, it'll be respectively encoded as follows:
- Header byte, 0
- Header byte, 1
- Header byte, 2
- Header byte, Occurrence number-1, Header byte

Example, let's take the previous example. A non-frequent byte is the zero
ASCII code because it never appears.

Sequence of bytes to encode |  Coded values  | Difference with compression
                            |                |        (unit: byte)
-------------------------------------------------------------------------
 255,15,                    | 255,15,        |  0
 255,255,                   | 255,255,       |  0
 15,15,                     | 15,15,         |  0
 255,255,255,               | 255,255,255,   |  0
 15,15,15,                  | 15,15,15,      |  0
 255,255,255,255,           | 0,3,255,       | -1
 15,15,15,15                | 0,3,15         | -1

If the header byte were to appear, we would see:

Sequence of bytes to encode |  Coded values  | Difference with compression
                            |                |        (unit: byte)
-------------------------------------------------------------------------
 0,                         | 0,0,           | +1
 255,                       | 255,           |  0
 0,0,                       | 0,1,           |  0
 15,                        | 15,            |  0
 0,0,0,                     | 0,2,           | -1
 255,                       | 255,           |  0
 0,0,0,0                    | 0,3,0          | -1

See codec source codes: codrle2.c and dcodrle2.c

*** Third RLE scheme ***

It's the same idea as the second scheme, but we can encode frames of more than
one byte. So we have three cases:

- If it is the header byte, whatever its occurrence count, you encode it with:
Header byte, 0, Number of occurrences-1
- For frames where (repetitions-1)*length>3, you encode:
Header byte, Number of frame repetitions-1, Frame length-1, bytes of the frame
- If no previous case was detected, you write the bytes as they are (no
header, nor length, nor repetition in front of or after these bytes).

Example based on the previous examples:

Sequence of bytes to encode |   Coded values    | Difference with compression
                            |                   |        (unit: byte)
-----------------------------------------------------------------------------
 255,15,                    | 255,15,           |  0
 255,255,                   | 255,255,          |  0
 15,15,                     | 15,15,            |  0
 255,255,255,               | 255,255,255,      |  0
 15,15,15,                  | 15,15,15,         |  0
 255,255,255,255,           | 255,255,255,255,  |  0
 15,15,15,15,               | 15,15,15,15,      |  0
 16,17,18,16,17,18,         | 16,17,18,16,17,18,|  0
 255,255,255,255,255,       | 0,4,0,255,        | -1
 15,15,15,15,15,            | 0,4,0,15,         | -1
 16,17,18,16,17,18,16,17,18,| 0,2,2,16,17,18,   | -3
 16,17,18,19,16,17,18,19    | 0,1,3,16,17,18,19 | -1

If the header byte (value 0) were to be met, we would see:

Sequence of bytes to encode |  Coded values  | Difference with compression
                            |                |        (unit: byte)
--------------------------------------------------------------------------
 0,                         | 0,0,0,         | +2
 255,                       | 255,           |  0
 0,0,                       | 0,0,1,         | +1
 15,                        | 15,            |  0
 0,0,0,                     | 0,0,2,         |  0
 255,                       | 255,           |  0
 0,0,0,0                    | 0,0,3          | -1

See codec source codes: codrle3.c and dcodrle3.c

*** Fourth RLE scheme ***

This last RLE algorithm handles repetitions of any kind (one byte and more)
and non-repetitions better, including short runs of non-repeated bytes, and
does not read the source twice as RLE type 3 does.

The compression scheme is:

First byte = Next byte?

- Yes: emit 1 bit '0', then count the occurrences of the repeated byte
  (maximum 16449 times):
  - Fewer than 66 repetitions: emit 1 bit '0' + 6 bits giving the length (-2)
    of the run + the repeated byte
    => 00xxxxxx,1 byte
  - 66 repetitions or more: emit 1 bit '1' + 14 bits giving the length (-66)
    of the run + the repeated byte
    => 01xxxxxx,xxxxxxxx,1 byte

- No: emit 1 bit '1', then look for a motif (a frame of up to 65 bytes,
  repeated up to 257 times):
  - A motif repeats: emit 1 bit '0' + 6 bits of the length (-2) of the motif
    + 8 bits of the number (-2) of repetitions + the bytes of the motif
    => 10xxxxxx,yyyyyyyy,n bytes
  - No motif, i.e. non-repeated bytes (maximum 8224): emit 1 bit '1', then:
    - Fewer than 33 non-repeated bytes: emit 1 bit '0' + 5 bits of the
      number (-1) of non-repetitions + the non-repeated bytes
      => 110xxxxx,n bytes
    - 33 non-repeated bytes or more: emit 1 bit '1' + 13 bits of the
      number (-33) of non-repetitions + the non-repeated bytes
      => 111xxxxx,xxxxxxxx,n bytes

Example, same as previously:

Sequence of bytes to encode |      Coded values       | Difference with
                            |                         | compression (byte)
--------------------------------------------------------------------------
 255,15                     | 11000001b,255,15,       | +1
 255,255                    | 00000000b,255,          |  0
 15,15                      | 00000000b,15,           |  0
 255,255,255                | 00000001b,255,          | -1
 15,15,15                   | 00000001b,15,           | -1
 255,255,255,255            | 00000010b,255,          | -2
 15,15,15,15                | 00000010b,15,           | -2
 16,17,18,16,17,18          | 10000001b,0,16,17,18,   | -1
 255,255,255,255,255        | 00000011b,255,          | -3
 15,15,15,15,15             | 00000011b,15,           | -3
 16,17,18,16,17,18,16,17,18 | 10000001b,1,16,17,18,   | -4
 16,17,18,19,16,17,18,19    | 10000010b,0,16,17,18,19 | -2

See codec source codes: codrle4.c and dcodrle4.c

+==========================================================+
|                   The Huffman encoding                   |
+==========================================================+

This method is named after the researcher (D. A. Huffman) who established the
algorithm in 1952. This method allows both dynamic and static statistical
schemes. A statistical scheme works on the occurrences of the data. It is not
as with RLE, where you consider the current occurrence of a frame, but rather
a consideration of the global occurrences of each datum in the input stream.
In this last case, frames can be any kind of sequence you want. On the other
hand, static Huffman encoding appears in some compressors such as ARJ on PCs.
It forces the encoder to consider every statistic as the same for all the data
you have. Of course, the results are not as good as with dynamic encoding.
Static encoding is faster than dynamic encoding, but dynamic encoding adapts
to the statistics of the bytes of the input stream and of course becomes more
efficient by producing shorter output.

The main idea in Huffman encoding is to re-code every byte with regard to its
occurrence count. The more frequent bytes in the data to compress will be
encoded with fewer than 8 bits, and the others may need 8 bits or even more to
be encoded. You immediately see that the codes associated with the different
bytes won't have identical sizes. The Huffman method actually requires that
the binary codes do not have a fixed size. We then speak about variable length
codes.

The dynamic Huffman scheme needs binary trees for the encoding. This enables
you to obtain the best codes, adapted to the source data. The demonstration
won't be given here. To help the neophyte, I will just explain what a binary
tree is.

A binary tree is a special fashion to represent data. A binary tree is a
structure with an associated value and two pointers. The term binary has been
given because of the presence of the two pointers. By convention, one of the
pointers is called the left pointer and the second pointer is called the right
pointer. Here is a visual representation of a binary tree.

          Value
         /     \
        /       \
     Value     Value
     /   \     /   \
   ...   ... ...   ...

One problem with a binary encoding is the prefix problem. A prefix is the
first part of the representation of a value, e.g. "h" and "he" are prefixes of
"hello" but "el" is not. To understand the problem, let's code the letters
"A", "B", "C", "D", and "E" respectively as 00b, 01b, 10b, 11b, and 100b. When
you read the binary sequence 00100100b, you are unable to say whether it comes
from "ACBA" or "AEE". To avoid such situations, the codes must have the prefix
property: no code may begin with the sequence of another code. With "A", "B",
"C", "D", and "E" respectively assigned 1b, 01b, 001b, 0001b, and 0000b, the
sequence 1001011b can only be decoded as "ACBA".

      1    0
   <- /\ ->
     /  \
   "A"  /\
       /  \
     "B"  /\
         /  \
       "C"  /\
           /  \
         "D"  "E"

As you see, with this tree, an encoding will have the prefix property if the
bytes are at the end of each "branch" and you have no byte at a "node". You
also see that if you reach a character through the right pointer you add a bit
set to 0, and through the left pointer you add a bit set to 1 to the current
code. The previous *bad* encoding provides the following bad tree:

           /\
          /  \
         /    \
       /\      /\
      /  \   "B" "A"
     /    \
   "D"   "C"
           /\
             \
             "E"

You see here that the coder shouldn't put the "C" at a node...

As you see, the longest binary codes are those with the longest distance from
the top of the tree. Finally, the more frequent bytes will be placed highest
in the tree so that they get the shortest encoding, and the less frequent
bytes will be lowest in the tree.

From an algorithmic point of view, you make a list of every byte you
encountered in the stream to compress. This list is always kept sorted. The
zero-occurrence bytes are removed from this list. You take the two bytes with
the smallest occurrence counts in the list. Whenever two bytes have the same
"weight", you take any two of them regardless of their ASCII value. You join
them in a node. This node will have a fictive byte value (256 is a good one!)
and its weight will be the sum of the weights of the two joined bytes. You
then replace the two joined bytes with the fictive byte, and you continue in
this way until you have only one byte (fictive or not) in the list. Of course,
this process will produce the shortest codes only if the list remains sorted.
I will not explain with arcane hard maths why the result is a set of the
shortest codes...

Important: I use as a convention that the right sub-tree has a weight greater
than or equal to the weight of the left sub-tree.

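To make the joining process concrete, here is a small sketch that builds such
a tree and prints the code length of every byte for the occurrence table of
the example that follows. The structure and function names are mine, not those
of codhuff.c; it simply repeats the rule "join the two lightest entries,
lighter one on the left":

#include <stdio.h>

struct node { unsigned long weight;
              int           byte;       /* 0..255 for real bytes, 256 for fictive nodes */
              int           left,right; /* Indexes of the sub-trees, -1 for leaves      */
            };

/* Builds the Huffman tree for 'n' leaves stored in nodes[0..n-1] and returns
   the index of the root. nodes[] must have room for 2*n-1 entries. */
int build_tree(struct node *nodes,int n)
{ int alive[512],count=n,i,best1,best2,lo,hi;

  for (i=0;i<n;i++) alive[i]=i;
  while (count>1)
        { /* Find the two live entries with the smallest weights */
          best1=(nodes[alive[0]].weight<=nodes[alive[1]].weight)?0:1;
          best2=1-best1;
          for (i=2;i<count;i++)
              if (nodes[alive[i]].weight<nodes[alive[best1]].weight)
                 { best2=best1; best1=i; }
              else if (nodes[alive[i]].weight<nodes[alive[best2]].weight)
                 best2=i;
          /* Join them under a new fictive node */
          nodes[n].weight=nodes[alive[best1]].weight+nodes[alive[best2]].weight;
          nodes[n].byte=256;
          nodes[n].left=alive[best1];   /* lightest on the left  */
          nodes[n].right=alive[best2];  /* heaviest on the right */
          /* Replace the two joined entries with the fictive one */
          lo=(best1<best2)?best1:best2;
          hi=(best1<best2)?best2:best1;
          alive[lo]=n;
          alive[hi]=alive[count-1];
          count--; n++;
        }
  return alive[0];
}

void print_code_lengths(struct node *nodes,int root,int depth)
{ if (nodes[root].left==-1)
     printf("byte %3d -> %d bits\n",nodes[root].byte,depth);
  else
     { print_code_lengths(nodes,nodes[root].left,depth+1);
       print_code_lengths(nodes,nodes[root].right,depth+1);
     }
}

int main(void)
{ static struct node nodes[2*7-1]=
    { {338,0,-1,-1},{300,255,-1,-1},{280,31,-1,-1},{24,77,-1,-1},
      {21,115,-1,-1},{20,83,-1,-1},{5,222,-1,-1} };
  print_code_lengths(nodes,build_tree(nodes,7),0);
  return 0;
}

Run on the seven occurrence counts below, it reports 2 bits for the bytes 0,
255 and 31, and 4 bits for 77, 115, 83 and 222, as in the worked example.
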
Example: Let's take a file to compress where we notice the following
occurrences:

Listed bytes | Frequencies (Weight)
-----------------------------------
      0      |   338
     255     |   300
      31     |   280
      77     |    24
     115     |    21
      83     |    20
     222     |     5

We will begin by joining the bytes 83 and 222. This will produce a fictive
node 1 with a weight of 20+5=25.

(Fictive 1,25)
      /\
     /  \
(222,5) (83,20)

Listed bytes | Frequencies (Weight)
-----------------------------------
      0      |   338
     255     |   300
      31     |   280
  Fictive 1  |    25
      77     |    24
     115     |    21

Note that the list is sorted... The smallest values among the frequencies are
21 and 24. That is why we take the bytes 77 and 115 to build the fictive
node 2.

(Fictive 2,45)
      /\
     /  \
(115,21) (77,24)

Listed bytes | Frequencies (Weight)
-----------------------------------
      0      |   338
     255     |   300
      31     |   280
  Fictive 2  |    45
  Fictive 1  |    25

The nodes with the smallest weights are the fictive 1 and 2 nodes. These are
joined to build the fictive node 3, whose weight is 45+25=70.

         (Fictive 3,70)
           /        \
          /          \
 (Fictive 1,25)  (Fictive 2,45)
    /     \         /     \
(222,5) (83,20) (115,21) (77,24)

Listed bytes | Frequencies (Weight)
-----------------------------------
      0      |   338
     255     |   300
      31     |   280
  Fictive 3  |    70

The fictive node 3 is linked to the byte 31. Total weight: 280+70=350.

            (Fictive 4,350)
              /         \
             /           \
     (Fictive 3,70)   (31,280)
        /        \
       /          \
(Fictive 1,25)  (Fictive 2,45)
   /     \         /     \
(222,5) (83,20) (115,21) (77,24)

Listed bytes | Frequencies (Weight)
-----------------------------------
  Fictive 4  |   350
      0      |   338
     255     |   300

As you see, since we keep the list sorted, the fictive node 4 has become the
first entry of the list. We join the bytes 0 and 255 in a new fictive node,
the number 5, whose weight is 338+300=638.

(Fictive 5,638)
      /\
     /  \
(255,300) (0,338)

Listed bytes | Frequencies (Weight)
-----------------------------------
  Fictive 5  |   638
  Fictive 4  |   350

The fictive nodes 4 and 5 are finally joined. Final weight: 638+350=988 bytes.
It is actually the total number of bytes in the initial file:
338+300+280+24+21+20+5=988.

                        (Tree,988)
                      1 /        \ 0
                     <-/          \->
                      /            \
            (Fictive 4,350)   (Fictive 5,638)
              /         \         /      \
             /           \       /        \
     (Fictive 3,70)   (31,280) (255,300) (0,338)
        /        \
       /          \
(Fictive 1,25)  (Fictive 2,45)
   /     \         /     \
(222,5) (83,20) (115,21) (77,24)

Bytes | Huffman codes | Frequencies | Binary length*Frequency
--------------------------------------------------------------
   0  |      00b      |     338     |   676
 255  |      01b      |     300     |   600
  31  |      10b      |     280     |   560
  77  |     1100b     |      24     |    96
 115  |     1101b     |      21     |    84
  83  |     1110b     |      20     |    80
 222  |     1111b     |       5     |    20

Results: Original file size: (338+300+280+24+21+20+5)*8=7904 bits (=988 bytes)
versus 676+600+560+96+84+80+20=2116 bits, i.e. 265 bytes (2116/8 rounded up).

Now you know how to code an input stream. The last problem is to decode all
this stuff. Actually, when you meet a binary sequence, you can't say which
byte list it comes from. Furthermore, if you change the occurrence counts of
one or two bytes, you won't obtain the same resulting binary tree. Try for
example to encode the previous list but with the following occurrences:

Listed bytes | Frequencies (Weight)
-----------------------------------
     255     |   418
      0      |   300
      31     |   100
      77     |    24
     115     |    21
      83     |    20
     222     |     5

As you can observe, the resulting binary tree is quite different, even though
we had the same initial bytes. To avoid being in such a situation, we put a
header in front of all the data. I can't comment on this header at length, but
I can say I minimized it as much as I could. The header is divided into two
parts. The first part of this header looks closely like a boolean table (coded
more or less in binary to save space) and the second part provides to the
decoder the binary code associated with each byte encountered in the original
input stream.

Here is a summary of the header:

First part
----------
The first bit tells how the set of bytes present in the source is described:
- First bit = 1: 256 bits follow, each set to 0 or 1 depending on whether the
  corresponding byte was present in the file to compress. This form is used
  when the number n of distinct bytes is large (n bits are set to 1, n>32).
- First bit = 0: 5 bits follow, giving the number n (minus 1) of distinct
  bytes encountered in the file to compress, followed by the n byte values,
  8 bits each (n<=32).

Second part
-----------
Repeated (n+1) times: once for each of the n byte values encountered in the
source file, plus once for the code of a fictive byte used to stop the
decoding (this fictive byte is numbered 256 in the Huffman tree used for
encoding). For each of these (n+1) codes:
- First bit = 1: 8 bits give the length (-1) of the following binary code
  (used when the length is greater than 32), then the binary code itself.
- First bit = 0: 5 bits give the length (-1) of the following binary code
  (length less than or equal to 32), then the binary code itself.

Then comes the binary encoding of the source file, terminated by the code of
the fictive end-of-encoding byte.

With my codecs I can handle binary codes with a length of up to 256 bits.
This corresponds to encoding input streams from one byte up to a practically
infinite length. In fact, if a byte had a range from 0 to 257 instead of 0 to
255, I would only get a bug in my codecs with an input stream of at least
370,959,230,771,131,880,927,453,318,055,001,997,489,772,178,180,790,105
bytes !!!

Where does this explosive number come from? In fact, to have a severe bug, I
must have a completely unbalanced tree:

 Tree
  /\
    \
    /\
      \
      /\
        \
       ...
        /\
          \
          /\

Let's take the following example:

Listed bytes | Frequencies (Weight)
-----------------------------------
      32     |     5
     101     |     3
      97     |     2
     100     |     1
     115     |     1

This produces the following unbalanced tree:

   Tree
    /\
(32,5) \
       /\
 (101,3) \
         /\
   (97,2) \
          /\
    (115,1) (100,1)

Let's speak about a mathematical series: the Fibonacci series. It is defined
as follows:

{ Fib(0)=0
{ Fib(1)=1
{ Fib(n)=Fib(n-2)+Fib(n-1)

Fib(0)=0, Fib(1)=1, Fib(2)=1, Fib(3)=2, Fib(4)=3, Fib(5)=5, Fib(6)=8,
Fib(7)=13, etc.

But 1, 1, 2, 3, 5 are exactly the occurrences of our list! We can actually
demonstrate that to obtain an unbalanced tree, we have to take a list whose
occurrences are based on the Fibonacci series (these values are minimal).
If the data to compress contain m different bytes, then when the tree is
unbalanced the longest code needs m-1 bits. In our little previous example
where m=5, the longest codes are associated with the bytes 100 and 115,
respectively coded 0001b and 0000b. We can also say that to have an unbalanced
tree we must have at least 5+3+2+1+1=12=Fib(7)-1 bytes of input. To conclude,
with a coder that uses at most m-1 bits per code, you must never have an input
stream size over Fib(m+2)-1, otherwise there could be a bug in the output
stream. Of course, with my codecs there will never be a bug because I can deal
with binary code sizes of 1 to 256 bits. Some encoders could use this bound:
with m=31, Fib(31+2)-1=3,524,577 and with m=32, Fib(32+2)-1=5,702,886. An
encoder that uses unsigned integers of 32 bits shouldn't have a bug until
about 4 Gb...

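As a quick check of these two figures, here is a tiny sketch (not part of the
codecs) that computes Fib(m+2)-1 for a given number m of distinct bytes:

#include <stdio.h>

/* Largest input size (in bytes) that cannot force a code longer than
   m-1 bits when the data contain m distinct bytes: Fib(m+2)-1. */
unsigned long long safe_stream_size(int m)
{ unsigned long long a=0,b=1,t;   /* a=Fib(0), b=Fib(1) */
  int i;
  for (i=2;i<=m+2;i++)
      { t=a+b; a=b; b=t; }        /* b=Fib(i) after each step */
  return b-1;                     /* Fib(m+2)-1 */
}

int main(void)
{ printf("m=31 -> %llu\n",safe_stream_size(31)); /* 3524577 */
  printf("m=32 -> %llu\n",safe_stream_size(32)); /* 5702886 */
  return 0;
}
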
+==========================================================+
|                     The LZW encoding                     |
+==========================================================+

The LZW scheme is due to three researchers: Abraham Lempel and Jacob Ziv
worked on it in 1977, and Terry Welch completed the scheme in 1984.

LZW is patented in the USA. This patent, number 4,558,302, is held by Unisys
Corporation. You can usually write (without fees) software codecs which use
the LZW scheme, but hardware companies can't do so. You may get a limited
licence by writing to:
Welch Licencing Department
Office of the General Counsel
M/S C1SW19
Unisys corporation
Blue Bell
Pennsylvania, 19424 (USA)

If you come from the western world, you are surely using an LZW-like encoding
every time you speak, especially when you use a dictionary. Let's consider,
for example, the word "Cirrus". As you read a dictionary, you begin with "A",
"Aa", and so on. But a computer has no experience and it must suppose that
some words already exist. That is why with "Cirrus", it supposes that "C",
"Ci", "Cir", "Cirr", "Cirru", and "Cirrus" exist. Of course, since this is a
computer, all these words are encoded as index numbers. Every time you go
forward, you add a new number associated with the new word. Since a computer
is byte-based and not alphabet-based, you have an initial dictionary of 256
letters instead of our 26 ('A' to 'Z') letters.

Example: Let's code "XYXYZ". First step, "X" is recognized in the initial
dictionary of 256 letters as the 89th. Second step, "Y" is read. Does "XY"
exist? No, then "XY" is stored as the word 256. You write to the output stream
the ASCII code of "X", i.e. 88. Now "YX" is tested as not referenced in the
current dictionary. It is stored as the word 257. You now write 89 (ASCII of
"Y") to the output stream. "XY" is met again. But "XY" is now known, as the
reference 256. Since "XY" exists, you test the sequence with one more letter,
i.e. "XYZ". This last word is not referenced in the current dictionary. You
then write the value 256 and add "XYZ" as the reference 258. Finally, you
reach the last letter ("Z"); since it is the last letter, you just write the
value 90 (ASCII of "Z").

Another encoding sample with the string "ABADABCCCABCEABCECCA".

+----+-----+------------------+------+----------+-------------------------+------+
|Step|Input|Dictionary test   |Prefix|New symbol|Dictionary               |Output|
|    |     |                  |      |          |D0=ASCII with 256 letters|      |
+----+-----+------------------+------+----------+-------------------------+------+
| 1  | "A" |"A" in D0         | "A"  |   "B"    | D1=D0                   |  65  |
|    | "B" |"AB" not in D0    |      |          | and "AB"=256            |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 2  | "A" |"B" in D1         | "B"  |   "A"    | D2=D1                   |  66  |
|    |     |"BA" not in D1    |      |          | and "BA"=257            |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 3  | "D" |"A" in D2         | "A"  |   "D"    | D3=D2                   |  65  |
|    |     |"AD" not in D2    |      |          | and "AD"=258            |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 4  | "A" |"D" in D3         | "D"  |   "A"    | D4=D3                   |  68  |
|    |     |"DA" not in D3    |      |          | and "DA"=259            |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 5  | "B" |"A" in D4         | "AB" |   "C"    | D5=D4                   | 256  |
|    | "C" |"AB" in D4        |      |          | and "ABC"=260           |      |
|    |     |"ABC" not in D4   |      |          |                         |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 6  | "C" |"C" in D5         | "C"  |   "C"    | D6=D5                   |  67  |
|    |     |"CC" not in D5    |      |          | and "CC"=261            |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 7  | "C" |"C" in D6         | "CC" |   "A"    | D7=D6                   | 261  |
|    | "A" |"CC" in D6        |      |          | and "CCA"=262           |      |
|    |     |"CCA" not in D6   |      |          |                         |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 8  | "B" |"A" in D7         | "ABC"|   "E"    | D8=D7                   | 260  |
|    | "C" |"AB" in D7        |      |          | and "ABCE"=263          |      |
|    | "E" |"ABC" in D7       |      |          |                         |      |
|    |     |"ABCE" not in D7  |      |          |                         |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 9  | "A" |"E" in D8         | "E"  |   "A"    | D9=D8                   |  69  |
|    |     |"EA" not in D8    |      |          | and "EA"=264            |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 10 | "B" |"A" in D9         |"ABCE"|   "C"    | D10=D9                  | 263  |
|    | "C" |"AB" in D9        |      |          | and "ABCEC"=265         |      |
|    | "E" |"ABC" in D9       |      |          |                         |      |
|    | "C" |"ABCE" in D9      |      |          |                         |      |
|    |     |"ABCEC" not in D9 |      |          |                         |      |
+----+-----+------------------+------+----------+-------------------------+------+
| 11 | "C" |"C" in D10        | "CCA"|          |                         | 262  |
|    | "A" |"CC" in D10       |      |          |                         |      |
|    |     |"CCA" in D10      |      |          |                         |      |
|    |     |(end of data)     |      |          |                         |      |
+----+-----+------------------+------+----------+-------------------------+------+
You will notice a problem with the above output: how do you write a code of
256 (for example) on 8 bits? It's simple to solve this problem. You just say
that the encoding starts with 9 bits and, as soon as you reach the 512th word,
you use a 10-bit encoding. With 1024 words, you use 11 bits; with 2048 words,
12 bits; and so on with all the powers 2^n (n positive). To better synchronize
the coder and the decoder on all that, most implementations use two additional
references. The word 256 is a reinitialization code (the codec must completely
reinitialize the current dictionary to its 256 initial letters) and the word
257 is an end-of-information code (no more data to read). Of course, you then
start your first new word as code number 258.

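To illustrate the growing code width, here is a minimal sketch of a bit writer
built on the write_byte(x) macro from the beginning of this file. It packs
each code most significant bit first and widens the codes as the dictionary
grows; the actual bit order and the exact moment the width changes are
implementation choices (GIF, for instance, packs least significant bit first),
so treat this only as an illustration:

static unsigned long bit_buffer=0;  /* Bits waiting to be written          */
static int           bit_count=0;   /* Number of valid bits in bit_buffer  */
static int           code_width=9;  /* Current code width, starts at 9 bits */

/* Appends one code of 'code_width' bits to the output, MSB first */
void write_code(unsigned int code)
{ bit_buffer=(bit_buffer<<code_width)|code;
  bit_count+=code_width;
  while (bit_count>=8)
        { write_byte((unsigned char)(bit_buffer>>(bit_count-8)));
          bit_count-=8;
        }
}

/* To be called each time a word is added to the dictionary:
   512 words -> 10 bits, 1024 words -> 11 bits, and so on */
void update_code_width(int dictionary_size)
{ if (dictionary_size==(1<<code_width))
     code_width++;
}

/* At the end of the stream, flush the last partial byte (padded with zeros) */
void flush_bits(void)
{ if (bit_count>0)
     write_byte((unsigned char)(bit_buffer<<(8-bit_count)));
  bit_count=0;
}
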
You can also do as in the GIF file format and start with an initial dictionary
of 18 words to code an input stream whose letters are coded on only 4 bits
(you then start with codes of 5 bits in the output stream!). The 18 initial
words are: 0 to 15 (initial letters), 16 (reinitialize the dictionary), and 17
(end of information). The first new word has code 18, the second new word
code 19, ...

Important: You can consider that your dictionary is limited to 4096 different
words (as in the GIF and TIFF file formats). But if your dictionary is full,
you can decide to send old codes *without* reinitializing the dictionary. All
decoders must be compliant with this. This lets you decide that it is not
efficient to reinitialize the full dictionary: instead of doing so, you keep
the dictionary unchanged and you send/receive (depending on whether it's a
coder or a decoder) existing codes from the full dictionary.

My codecs are able to deal both with most initial data sizes in the initial
dictionary and with a full dictionary.

Let's see how to decode an LZW encoding. We saw that the true dynamic Huffman
scheme needed a header in front of the encoded codes. No header is needed in
the LZW scheme. When two successive codes are read, the first must exist in
the dictionary. That code can be immediately decoded and its word written to
the output stream. If the second code is less than or equal to the number of
words in the current dictionary, this code is decoded in the same way as the
first one. On the other hand, if the second code is equal to the number of
words in the dictionary plus one, this means you have to write the word (the
sentence, not the code number) of the last code, followed by the first
character of that same word. In both cases you then make a new word appear:
it is composed of all the letters of the word associated with the first code
plus the first letter of the word of the second code. You continue the
processing with the second and third codes read in the input stream (of
codes)...

Example: Let's decode the previous encoding given a bit further above.

+------+-------+----------------+----------+------------------+--------+
| Step | Input | Code to decode | New code | Dictionary       | Output |
+------+-------+----------------+----------+------------------+--------+
|  1   |  65   |       65       |    66    | 65,66=256        |  "A"   |
|      |  66   |                |          |                  |        |
+------+-------+----------------+----------+------------------+--------+
|  2   |  65   |       66       |    65    | 66,65=257        |  "B"   |
+------+-------+----------------+----------+------------------+--------+
|  3   |  68   |       65       |    68    | 65,68=258        |  "A"   |
+------+-------+----------------+----------+------------------+--------+
|  4   |  256  |       68       |   256    | 68,65=259        |  "D"   |
+------+-------+----------------+----------+------------------+--------+
|  5   |  67   |      256       |    67    | 65,66,67=260     |  "AB"  |
+------+-------+----------------+----------+------------------+--------+
|  6   |  261  |       67       |   261    | 67,67=261        |  "C"   |
+------+-------+----------------+----------+------------------+--------+
|  7   |  260  |      261       |   260    | 67,67,65=262     |  "CC"  |
+------+-------+----------------+----------+------------------+--------+
|  8   |  69   |      260       |    69    | 65,66,67,69=263  | "ABC"  |
+------+-------+----------------+----------+------------------+--------+
|  9   |  263  |       69       |   263    | 69,65=264        |  "E"   |
+------+-------+----------------+----------+------------------+--------+
|  10  |  262  |      263       |   262    |65,66,67,69,67=265| "ABCE" |
+------+-------+----------------+----------+------------------+--------+
|  11  |       |      262       |          |                  | "CCA"  |
+------+-------+----------------+----------+------------------+--------+

Summary: Step 4 is an explicit example. The code to decode is 68 ("D" in
ASCII) and the new code is 256. The new word to add to the dictionary is made
of the letters of the word of the first code plus the first letter of the word
of the second code (code 256), i.e. 68 ("D" in ASCII) plus 65 ("A"). So the
new word has the letters 68 and 65 ("DA").

Step 6 is quite special. The first code to decode is referenced, but the
second (new) code is not yet referenced, since the dictionary is limited to
260 referenced words at that point. We have to handle it as the second case
given previously: you must take the word to decode plus its first letter, i.e.
"C"+"C"="CC".
Be careful: if any encountered code is *greater* than the dictionary size plus
one, it means you have a problem in your data and/or your codecs are... bad!

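Following the rules just described, here is a sketch of a matching decoder. It
uses the same flat (prefix, suffix) table as the encoder sketch given earlier
and prints the decoded bytes with putchar() rather than write_byte(). Feeding
it the eleven codes of the table above rebuilds "ABADABCCCABCEABCECCA":

#include <stdio.h>

#define MAX_WORDS 4096

static int dict_prefix[MAX_WORDS];  /* Code of the word without its last byte */
static int dict_suffix[MAX_WORDS];  /* Last byte of the word */
static int dict_size;

/* Writes the word of 'code' to stdout and returns its first byte */
static int output_word(int code)
{ int first;
  if (code<256)
     { putchar(code); return code; }
  first=output_word(dict_prefix[code]);
  putchar(dict_suffix[code]);
  return first;
}

/* Decodes an array of 'n' codes produced by the encoder sketch above */
void lzw_decode(const int *codes,int n)
{ int i,previous,current,first;

  dict_size=256;                       /* Codes 0..255 = single bytes */
  if (n==0) return;
  previous=codes[0];
  output_word(previous);
  for (i=1;i<n;i++)
      { current=codes[i];
        if (current<dict_size)                 /* Known code: decode it        */
           first=output_word(current);
        else                                   /* Code not yet in dictionary:  */
           { first=output_word(previous);      /* word(previous) + its first   */
             putchar(first);                   /* letter                       */
           }
        if (dict_size<MAX_WORDS)               /* New word = word(previous) +  */
           { dict_prefix[dict_size]=previous;  /* first letter of the word     */
             dict_suffix[dict_size]=first;     /* just written                 */
             dict_size++;
           }
        previous=current;
      }
}

int main(void)
{ static const int codes[]={65,66,65,68,256,67,261,260,69,263,262};
  lzw_decode(codes,11);
  putchar('\n');
  return 0;
}
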
Tricks to improve LZW encoding (but it then becomes a non-standard encoding):
- Limit the dictionary to a high number of words (a 4096-word maximum enables
you to encode a stream of a maximum of 7,370,880 letters with the same
dictionary)
- Use a dictionary of fewer than 258 initial words if possible (for example,
with 16-color pictures, you start with a dictionary of 18 words)
- Do not reinitialize the dictionary when it is full
- Reinitialize the dictionary with the most frequent words of the previous
dictionary
- Use the codes from (current dictionary size+1) to (maximum dictionary size),
because these codes are not used in the standard LZW scheme.
Such a compression scheme has been used (successfully) by Robin Watts
<ct93008@ox.ac.uk>.

+==========================================================+
|                          Summary                         |
+==========================================================+

-------------------------------------------------
RLE type 1:
Fastest compression. Good ratio for general purposes.
Doesn't need to read the data twice.
Fast decoding.
-------------------------------------------------
RLE type 2:
Fast compression. Very good ratio in general (even for general purposes).
Needs to read the data twice.
Fast decoding.
-------------------------------------------------
RLE type 3:
Slowest compression. Good ratio on image files, quite middling for general
purposes.
Needs to read the data twice.
Change the line:
#define MAX_RASTER_SIZE 256
into:
#define MAX_RASTER_SIZE 16
to speed up the encoding (but the resulting ratio decreases). If you compress
with memory buffers, do not modify this line...
Fast decoding.
-------------------------------------------------
RLE type 4:
Slow compression. Good ratio on image files, middling for general purposes.
Change the line:
#define MAX_RASTER_SIZE 66
into:
#define MAX_RASTER_SIZE 16
to speed up the encoding (but the resulting ratio decreases). If you compress
with memory buffers, do not modify this line...
Fast decoding.
-------------------------------------------------
Huffman:
Fast compression. Good ratio on text files and similar, middling for general
purposes. An interesting method to use to compress a buffer already compressed
by the RLE type 1 or 2 methods...
Fast decoding.
-------------------------------------------------
LZW:
Quite fast compression. Good, or even very good, ratio for general purposes.
The bigger the data, the better the compression ratio.
Quite fast decoding.
-------------------------------------------------

The source codes work on all kinds of computers with a C compiler.
With the compiler, optimize for run speed rather than for space.
On a UNIX system, it's better to compile them with the -O option.
If you don't use a GNU compiler, the source file MUST NOT have a size
over 4 Gb for RLE 2, 3, and Huffman, because I count the number of
occurrences of the bytes. With GNU compilers, 'unsigned long int' is 8 bytes
instead of 4 bytes (as with normal C UNIX compilers and PC compilers, such as
Microsoft C++ and Borland C++).
Actually:
* Normal UNIX compilers,                => 4 Gb (unsigned long int = 4 bytes)
  Microsoft C++ and Borland C++ for PCs
* GNU UNIX compilers                    => 17179869184 Gb (unsigned long int = 8 bytes)

+==========================================================+
|                            END                           |
+==========================================================+