## Two interesting ideas

If you’re ever out of ideas for new projects or things to research, you might find these two topics interesting:

**Arithmetic coding/compression**

Many compression algorithms, such as Huffman coding, work by assigning codes to chunks of data and then writing those codes to the output file. Chunks that are encountered often get short codes, rare chunks get longer codes, and thus compression is achieved. However, this process is imperfect, because these algorithms can only assign codes that are a whole number of bits long: you can get a code 2 bits or 3 bits long, but not 2.3 bits.
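To see where fractional sizes come from, here is a minimal sketch that computes the ideal code length of each symbol as `-log2(p)`, where `p` is the symbol's relative frequency. The function name `ideal_code_lengths` and the sample string are made up for illustration.

```python
import math
from collections import Counter

def ideal_code_lengths(data):
    """Return {symbol: -log2(p)}, with p taken from the symbol counts in data."""
    counts = Counter(data)
    total = len(data)
    return {sym: -math.log2(n / total) for sym, n in counts.items()}

# Hypothetical input: "a" appears 7 times out of 8, "b" once.
lengths = ideal_code_lengths("aaaaaaab")
for sym, bits in sorted(lengths.items()):
    print(sym, round(bits, 3))
```

Here "b" ideally costs exactly 3 bits, but "a" ideally costs well under 1 bit, and a whole-bit code can't spend less than 1 bit on any symbol.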

Arithmetic coding effectively allows you to output codes of arbitrary, even fractional, length. The idea is that, when doing compression, you have two floating point numbers (*low* and *high*) defining a range between 0 and 1, and every new symbol encoded narrows this range, based on its own range (determined by the probability of that symbol appearing in the input data). Decompression is done by looking at where the encoded value lies within the current range, outputting the symbol whose range contains it, and then modifying *low* and *high* the same way the compressor did.
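The range narrowing described above can be sketched in a few lines. This is a toy version under some assumptions of mine: it uses exact `Fraction` values instead of floating point (so rounding never gets in the way), the probabilities are fixed and hand-picked, the decoder is told the message length, and the names `build_ranges`, `encode` and `decode` are purely illustrative.

```python
from fractions import Fraction

def build_ranges(probs):
    """Give each symbol its own [start, end) slice of [0, 1)."""
    ranges, start = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    return ranges

def encode(message, ranges):
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        start, end = ranges[sym]
        # Shrink [low, high) to the part that belongs to this symbol.
        low, high = low + width * start, low + width * end
    return low  # any number in [low, high) identifies the message

def decode(value, length, ranges):
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        pos = (value - low) / width  # where the value sits inside the range
        for sym, (start, end) in ranges.items():
            if start <= pos < end:
                out.append(sym)
                low, high = low + width * start, low + width * end
                break
    return "".join(out)

# Hypothetical model: "a" is twice as likely as "b" or "c".
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
ranges = build_ranges(probs)
code = encode("abac", ranges)
print(code, decode(code, 4, ranges))
```

A real coder would not keep one ever-growing fraction around; it would emit bits as soon as *low* and *high* agree on them.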

In actual applications the ranges are represented by whole numbers (obviously you can scale [0;1) to [0;256) or any other interval), and some precision is lost due to rounding and such. Still, as far as I know, arithmetic coding gets closer to the theoretical limit set by the symbol probabilities than any scheme restricted to whole-bit codes.

You can read a very good introduction here (well, at least it’s better than my convoluted summary ;)).

Another idea: you could train a neural network to predict the probability of each symbol appearing in a given context (or just overall) and use that to create an adaptive compression algorithm.
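I haven't built the neural network version, so as a stand-in here is a much simpler adaptive model: plain running counts, starting at 1 for every symbol. The sketch only adds up the ideal cost `-log2(p)` of each symbol under the model's current prediction, which is roughly what an arithmetic coder driven by that model would spend. The name `adaptive_cost_bits` is hypothetical, and a trained predictor would replace the counting logic.

```python
import math
from collections import Counter

def adaptive_cost_bits(data, alphabet):
    """Total of -log2(p), where p comes from the counts seen so far."""
    counts = Counter({sym: 1 for sym in alphabet})  # start with one of each
    total = len(alphabet)
    bits = 0.0
    for sym in data:
        bits += -math.log2(counts[sym] / total)  # cost under current prediction
        counts[sym] += 1                         # then learn from the symbol
        total += 1
    return bits

# Hypothetical input: eight identical symbols from a two-symbol alphabet.
cost = adaptive_cost_bits("aaaaaaaa", "ab")
print(round(cost, 2))
```

A fixed 1-bit-per-symbol code would need 8 bits for this input; the adaptive model needs fewer because it quickly learns that "a" is likely.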

**Steganography and digital watermarking**

Okay, most semi-experienced programmers will have heard that you can hide data in digital images by overwriting the least-significant bits of the color bytes. That’s fine if you intend to keep the image or transmit it unchanged, but the technique becomes mostly useless if you want to insert an invisible copyright message in your image. The embedded data probably won’t survive even the simplest image manipulation, such as scaling, rotation or cropping.
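For reference, the least-significant-bit trick looks roughly like this. It is a sketch that treats the image as a flat run of color bytes and ignores file formats entirely; `embed_lsb`, `extract_lsb` and the stand-in `cover` data are my own illustrative names.

```python
def embed_lsb(pixels, message):
    """Hide the bits of message in the lowest bit of each byte in pixels."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return out

def extract_lsb(pixels, n_bytes):
    """Rebuild n_bytes of hidden data from the lowest bits of pixels."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for value in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)

cover = bytearray(range(100, 140))  # 40 stand-in color bytes
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
```

No byte changes by more than 1, which is why the result is invisible. It is also why it is fragile: halving every byte (a crude brightness change) already turns the extracted message into garbage.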

You need something else. Some image properties are more likely to remain unchanged, or to change very little. For example, the frequency distribution (histogram) is mostly unaffected by the aforementioned manipulations. A histogram is basically a graph that shows how much of each color there is in an image.
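A quick way to convince yourself, at least for rotation: count the colors before and after. The sketch below uses a tiny made-up grayscale image stored as a list of rows; `histogram` and `rotate90` are illustrative helpers, not part of any imaging library.

```python
from collections import Counter

def histogram(pixels):
    """Count how many times each color value occurs in the image."""
    return Counter(value for row in pixels for value in row)

def rotate90(pixels):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]

# Hypothetical 2x3 grayscale image.
image = [
    [0, 0, 255],
    [0, 128, 255],
]
rotated = rotate90(image)
print(histogram(image) == histogram(rotated))
```

The pixels all move to new positions, so anything stored at a fixed position is scrambled, yet the histogram is exactly the same. Scaling and cropping are not that clean, since they change pixel counts, which is why the histogram is only *mostly* unaffected.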

It is possible to hide information in this data; you just need to find out how. Perhaps you could modify some bits of scaled histogram data and then change the image to match the new histogram. Another fine idea is to use *ranges* instead of single bits to store the information, so you could spread a single bit (0-1) over a three-bit value (0-7) by saying that [0;3] means “0” and [4;7] means “1”. Or you could…
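The range idea is easy to try on a single byte. In this sketch the low three bits of a value carry one hidden bit, and the writer aims for the middle of the chosen range (2 or 5, my own choice) so that a small disturbance does not push the value across the boundary. `embed_bit` and `read_bit` are hypothetical names.

```python
def embed_bit(value, bit):
    """Store one bit in the low 3 bits of value: [0;3] means 0, [4;7] means 1."""
    target = 5 if bit else 2  # near the middle of each range (an assumption)
    return (value & ~0b111) | target

def read_bit(value):
    """Read the hidden bit back from the low 3 bits of value."""
    return 1 if (value & 0b111) >= 4 else 0

marked = embed_bit(200, 1)
# The bit still reads correctly after the value drifts by one in either direction.
print(marked, [read_bit(marked + noise) for noise in (-1, 0, 1)])
```

The price is a bigger change to the carrier (up to 5 per value instead of 1), and larger disturbances will still flip the bit.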

I’ve seen lots of examples that hide messages by overwriting insignificant bits in various image formats, but I couldn’t find any (Delphi) application that stores a short message by modifying the frequency distribution. Maybe you’ll be the first to create it 😉

*Note*: if you want to do this properly, there’s a whole lot of calculus-heavy reading on Fourier transforms, spread spectrum coding and other robust watermarking techniques.

Good luck.
