|
Posted December 21 2008 PHP has finally taken the plunge into the 1960s by implementing GOTO. Here's an implementation for Java. A patch for a serious Java bug. No longer needed as of June 16. Max & Michael have written a Max/MSP driver based on the multitouch code. This is a little test applet to shows the touch points as reported by the MacBook multitouch pad. The drivers seems to be able to track an impressive 11 fingers at once.
My CSS-fu is weak; please use a recent browser. Random, semi-related image by Eva the Weaver. |
Bit Tricks, Part II: Enumerating BitsHow to enumerate bits quickly using DeBruijn sequences. This is part 2 of a series of posts on bit-twiddling tricks and micro-optimizations. This one looks at ways to try to optimize the following function:
It loops across a word, and performs some action for each 1-bit found. Updated 20090128: Fixed a typo in two of the code snippets. Thanks, Martin! Finding the next bitThe original code tests one bit at a time. This is, of course, a bit slow. Luckily, just about every CPU ever made has specialized hardware for scanning bits: It is used for calculating the carry when adding numbers. This is the classic x &- x operator. This illustrates why it works. Depending on your fonts, ~ (bitwise not) and - (negation) may look very similar. 00100001010001000 x 11011110101110111 ~x 11011110101111000 ~x + 1 a.k.a. -x note the carry! 00000000000001000 x & (~x + 1) a.k.a. x & -x Note that this returns the bit as 2i, not i, so the next step is to find i. Finding the indexOnce we have extracted a single one bit, the next operation is to find its position. Java comes with a very clever library function for doing this, taken from Hacker's Delight. It's basically a binary search, but with branches merged. public static int numberOfTrailingZeros(long i) {
// HD, Figure 5-14
int x, y;
if (i == 0) return 64;
int n = 63;
y = (int)i; if (y != 0) { n = n -32; x = y; } else x = (int)(i>>>32);
y = x <<16; if (y != 0) { n = n -16; x = y; }
y = x << 8; if (y != 0) { n = n - 8; x = y; }
y = x << 4; if (y != 0) { n = n - 4; x = y; }
y = x << 2; if (y != 0) { n = n - 2; x = y; }
return n - ((x << 1) >>> 31);
}
This is a little bit faster for most inputs, but still not very good. Finding the index, slightly fasterThe numberOfTrailingZeros function has 6 conditional branches; it's basically doing a binary search. It's possible to write it in branch-free style, but there's a simpler way to get most of the benefit: We can exploit the fact that 2n - 1 = 20 + 21 + ... + 2n-1 and thus bitCount(2n-1) = n Also, the bitCount() function (also from Hacker's Delight) is branch free:
public static int bitCount(long i) {
// HD, Figure 5-14
i = i - ((i >>> 1) & 0x5555555555555555L);
i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);
i = (i + (i >>> 4)) & 0x0f0f0f0f0f0f0f0fL;
i = i + (i >>> 8);
i = i + (i >>> 16);
i = i + (i >>> 32);
return (int)i & 0x7f;
}
Finding the index, fast
By multiplying the single bit (which will have a value of 2i) with a constant, you are essentially shifting the constant left by i bits. Consider the three middle bits here: 001101 * 1 = 000001101 001101 * 10 = 000011010 001101 * 100 = 000110100 001101 * 1000 = 001101000 001101 * 10000 = 011010000 001101 * 100000 = 110100000 Notice how the values marked in red are unique. Using this trick, the indexing operation can be performed using a simple multiplication, bit-shift and table lookup, given that we can find a De Bruijn sequence(W).
If you can control the input data (which you probably do), and you don't care about the order of the calls to process(), you can pre-shuffle the input to avoid the table lookup altogether.
BenchmarksThis graph shows the runtime with various densities (the X axis measures the percentage of 1-bits (randomly distributed) in the input array). Note the runtime for the original version: In the worst case, the "is this bit on" branch will be taken 50% of the time, independently at random, rendering the branch prediction hardware completely useless. Let's go nutsLater, the case where the function process simply increments a counter: void process(int i) {
table[i]++;
}
Also, how to avoid all of the above if inline assembly is available. CommentsHave you considered the ffs() function? :-) (On x86, it maps down to bsf, which is fast -- much faster than a multiply.)
— Steinar H. Gunderson 2009-03-01 [Indeed. If you can run arbitrary instructions, that's definitely the way to go (this was for a project which had to be Pure Java®). Adding ffs() to the benchmark is now on my List®.] |