Posted March 29 2006
I needed some valid SWF files with constrained character sets for an injection PoC. Putting them here in case someone else needs some.
PHP has finally taken the plunge into the 1960s by implementing GOTO. Here's an implementation for Java.
A patch for a serious Java bug. No longer needed as of June 16.
ZIP File Quine
This is the ZIP file that never ends.
I did a proper writeup in 2005 or so, but the server died before archive.org found it, so it's gone. I might do a proper one at some point.
[updated 2010: I won't have to; this blog post explains it all much better than I ever could]
The LZ77 quine
LZ77 basically defines a simple language with just two operations: "print the following N bytes verbatim" and "repeat M bytes you have already written, starting N bytes from the end"
e.g. "BANANA" could be compressed as
Print the next 3 line(s) B A N Repeat 3 lines from the output, starting 2 lines from the end (i.e. from the A)Note in particular that the copy overlaps with itself:
B B A A N \ N (A) \ A --> A \ N N A AHere's a proper quine, using the "banana" trick twice. The italic parts are "data" the first time around.
Print the next 0 lines (i.e. NOP) Print the next 2 lines Print the next 0 lines (i.e. NOP) Print the next 2 lines Print the next 0 lines (i.e. NOP) Repeat 3 lines from the output, starting 2 lines from the end Print the next 2 lines Repeat 3 lines from the output, starting 2 lines from the end Print the next 2 lines Repeat 3 lines from the output, starting 2 lines from the end(this is one of very few programming problems I know where you can get shorter code by allowing NOPs)
This quine only prints itself. To get a valid ZIP file, it has to print a valid zip header, itself, and a valid footer.
Surviving the Huffman step
The "copy" and "print" commands are encoded as a bit stream. This makes it a bit harder - you can only quote entire bytes at a time, so you need to synchornize with the bit stream every now and then. The "quote" command ("block type 00") forces byte alignment, so it's not a big problem.
Getting the CRC right
One last hurdle is the fact that the ZIP file format uses a CRC-32(W) check to guard against accidental damage to the file. The CRC applies to the uncompressed data, which means the file needs to contain its own CRC.
Each bit of the CRC is a linear function of the bits of the file, so you can reduce the problem to a set of 32 linear equations in GF2 (hence, easily solvable).