Audience
This post is for any SIMD CPU programmer curious about weird AVX instructions. But also for any veteran Amiga programmer who never quite figured out how to calculate that damned “minterm” blitter value!
AVX-512 bitwise ternary logic instruction
The idea for this post came while watching a wonderful talk by Tom Forsyth [1] about the design of the AVX-512 ISA (Instruction Set Architecture). As I was quickly skimming through the slides, I paused at slide 41, where an unfamiliar instruction caught my eye: vpternlogd (yeah, Intel engineers aren’t known for their naming skills!).
This instruction is a bitwise ternary logic operation, meaning it can compute any bitwise Boolean function of three input sources. For example, if we label the inputs A, B, and C, you can compute any complex logic operation like:
(NOT A) OR ((NOT B) XOR (C AND A))
Or any other Boolean combination you need, all in a single instruction!
What’s even cooler is that the inputs can be 512-bit registers, so this complex logic is applied to 512 bits at once.
The speaker explained that adding dedicated instructions for every possible user need (like “foo_and_a_or_not_b”) would have been overwhelming: it would have required tons of new mnemonics, documentation, and testing. Instead, they opted for a smarter (and lazier?) solution: a single, flexible instruction:
VPTERNLOGD r0, r1, r3, #imm8
The instruction takes 3 registers as input, plus an 8-bit immediate value that defines the exact bitwise operation to perform. Unfortunately, most documentation just says, “The immediate value selects the specific binary function,” which isn’t very helpful.
As soon as I saw this 8-bit immediate value, it reminded me of an old friend from 1985: the Amiga blitter chip!
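To make “the immediate value selects the function” a bit more concrete, here is a small Python model of what the instruction computes per bit, following the operation pseudocode in the Intel manual [2]: the three input bits at each position form a 3-bit index (first operand as the high bit), and the result bit is the imm8 bit at that index. This is a sketch, not Intel’s code: `ternlog` and its `width` parameter are hypothetical names, and it ignores the write-masking and memory-broadcast forms of the real instruction.

```python
def ternlog(a: int, b: int, c: int, imm8: int, width: int = 512) -> int:
    """Hypothetical per-bit model of a ternary logic op on `width`-bit ints.

    For every bit position i, the bits a_i, b_i, c_i form the index
    a_i*4 + b_i*2 + c_i, and result bit i is bit `index` of imm8.
    """
    result = 0
    for i in range(width):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result


# imm8 = 0x80 sets only bit 7 (index A=B=C=1), i.e. a three-way AND.
print(bin(ternlog(0b1110, 0b1101, 0b0111, 0x80, width=4)))  # 0b100
```

Python integers are arbitrary-precision, so the same loop covers a full 512-bit register without any special vector type.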
Amiga blitter custom chip
In the 1980s, it was common for computers to have custom chips for handling graphics. However, these chips didn’t handle complex tasks like triangle rasterization or programmable shaders. At that time, “graphics” mainly meant working with bitmaps. For example, the Commodore Amiga 500 had a blitter chip. Its main function was to copy bitmap graphics from one location to another while applying logical operations. The Amiga’s blitter could handle up to three bitmap sources at once and perform logical operations between them. To specify which operation to use, you had to set an 8-bit value in the chip, known as the “minterm.”
Three bitmap sources and an 8-bit value to select the logical combination! Doesn’t that sound like a primitive version of the modern AVX-512 vpternlogd instruction?
Interestingly, even many skilled Amiga programmers didn’t know how to calculate the minterm value. Most just reused common values from other demos. For instance, to clear a buffer, they would use 0x00. To configure the blitter to draw masked sprites, they’d use 0xE2. But for any other weird custom function, most of us were lost back then.
The official Amiga documentation didn’t help much either. The “Amiga Hardware Reference Manual” from 1989 tried to explain minterm calculation using confusing symbols, which frustrated many young demo makers at the time.
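Why “minterm”? In Boolean algebra, a minterm is an AND of all the inputs, each taken either plain or negated, so three inputs have exactly 8 minterms, one per bit of the 8-bit value. The sketch below models the blitter’s logic function as a sum of products: each set bit enables one minterm, and the enabled minterms are ORed together. It works on 16-bit words (the width the blitter processes) and is only an illustration of the idea, not of the actual hardware; `blit_word` and `MASK16` are hypothetical names.

```python
MASK16 = 0xFFFF  # the Amiga blitter processes 16-bit words


def blit_word(a: int, b: int, c: int, minterm: int) -> int:
    """Hypothetical sum-of-products model of the blitter logic function.

    Bit n of `minterm` enables the product term whose A, B, C polarities
    match the bits of n (A is the most significant index bit).
    """
    na, nb, nc = ~a & MASK16, ~b & MASK16, ~c & MASK16
    result = 0
    for n in range(8):
        if (minterm >> n) & 1:
            term_a = a if n & 4 else na
            term_b = b if n & 2 else nb
            term_c = c if n & 1 else nc
            result |= term_a & term_b & term_c
    return result


print(hex(blit_word(0x1234, 0xFF00, 0xABCD, 0xE2)))  # 0x12cd
print(hex(blit_word(0x1234, 0xFF00, 0xABCD, 0x00)))  # 0x0, clearing a buffer
```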
Here’s what my teenage self would have done with a red marker if I had had the official documentation in hand 🙂
Easy way to calculate the minterm value
Now, let me show you an easy way to calculate the minterm value. Even if you’re not planning to program the Amiga blitter anytime soon, you might find this useful for working with modern AVX-512 ternary logic instructions: they work exactly the same way!
A few years ago, I realized that this 8-bit value doesn’t have to be understood as a combination of logical operators. Instead, it’s fundamentally just a lookup table, indexed by the three input bits.
Let’s take an example: suppose you want the result to be 1 when exactly two of the three sources are 1.
First, list the 8 possible values of the three input bits (A, B, and C), and add a fourth, empty column for the result.
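Read as a table, bit n of the value holds the result for the row where A, B and C, read as the 3-bit binary number ABC, equal n. Here is a minimal Python sketch of that decoding, assuming A is the most significant index bit (as in the Amiga minterm and vpternlogd); `truth_table` is a hypothetical helper name.

```python
def truth_table(value: int) -> list[tuple[int, int, int, int]]:
    """Decode an 8-bit minterm / imm8 value into its lookup table.

    Row n (n = A*4 + B*2 + C) holds bit n of the value.
    """
    rows = []
    for n in range(8):
        a, b, c = (n >> 2) & 1, (n >> 1) & 1, n & 1
        rows.append((a, b, c, (value >> n) & 1))
    return rows


# Print the table behind the classic masked-sprite minterm.
for a, b, c, out in truth_table(0xE2):
    print(f"{a} | {b} | {c} | {out}")
```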
A | B | C | what I want |
---|---|---|---|
0 | 0 | 0 | ? |
0 | 0 | 1 | ? |
0 | 1 | 0 | ? |
0 | 1 | 1 | ? |
1 | 0 | 0 | ? |
1 | 0 | 1 | ? |
1 | 1 | 0 | ? |
1 | 1 | 1 | ? |
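Now just fill the fourth column with the exact result you want your function to produce. For our specific example, we want 1 when exactly two sources are 1. Let’s fill the fourth column:
A | B | C | what I want |
---|---|---|---|
0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 |
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 1 |
1 | 1 | 1 | 0 |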
And now the magic: read the 8 bits of the fourth column from bottom to top: 01101000, or 0x68. Reading bottom to top puts the row for A=B=C=1 in the most significant bit; in general, the row whose inputs ABC form the 3-bit binary number n lands in bit n. Function 0x68 returns 1 exactly when 2 of the inputs are 1.
And you have also found the mysterious #imm8 value of the modern vpternlogd instruction, with A as the first operand (which is also the destination), B as the second and C as the third!
You can use the exact same method to get the #imm8 value for any exotic or complex logical function of 3 sources you need.
I wish my younger self had known this method back when I was scratching my head over blitter minterms!
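The same procedure is easy to automate. This Python sketch fills the truth table from a predicate and packs row n into bit n, which is exactly the bottom-to-top reading. The helper name `imm8_from_function` is hypothetical.

```python
def imm8_from_function(f) -> int:
    """Build the 8-bit value from a predicate f(a, b, c) returning 0/1 or bool.

    Row n of the truth table (n = A*4 + B*2 + C) is packed into bit n.
    """
    value = 0
    for n in range(8):
        a, b, c = (n >> 2) & 1, (n >> 1) & 1, n & 1
        if f(a, b, c):
            value |= 1 << n
    return value


exactly_two = imm8_from_function(lambda a, b, c: a + b + c == 2)
print(hex(exactly_two))  # 0x68
```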
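There is also a well-known shortcut that skips the table entirely: evaluate your Boolean expression once with A = 0xF0, B = 0xCC and C = 0xAA. Each constant is the bottom-to-top column of one input in the truth table, so the bitwise operators compute all 8 rows in parallel and the low byte of the result is the value. A Python sketch applied to the expression from the beginning of this post (the helper name `imm8_from_expression` is hypothetical):

```python
# Truth-table columns of the three inputs, read bottom to top.
A, B, C = 0xF0, 0xCC, 0xAA


def imm8_from_expression(expr) -> int:
    """Evaluate a bitwise expression on the column constants.

    Masks to 8 bits because Python's ~ yields negative integers.
    """
    return expr(A, B, C) & 0xFF


# (NOT A) OR ((NOT B) XOR (C AND A))
value = imm8_from_expression(lambda a, b, c: ~a | (~b ^ (c & a)))
print(hex(value))  # 0x9f
```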
A fun coincidence
One of the most common Amiga minterm values is 0xE2. This value is often used to render masked 2D sprites, with A containing the sprite bitmap, B containing the sprite “mask”, and C being the background.
An easy way to calculate the minterm for a masked sprite is to think of it in simple programming terms: when the mask pixel (B) is set, the result is the sprite (A). If the mask pixel is not set, the result is the background (C).
A | B | C | what I want |
---|---|---|---|
0 | 0 | 0 | 0 (B is not set, use C) |
0 | 0 | 1 | 1 (B is not set, use C) |
0 | 1 | 0 | 0 (B is set, use A) |
0 | 1 | 1 | 0 (B is set, use A) |
1 | 0 | 0 | 0 (B is not set, use C) |
1 | 0 | 1 | 1 (B is not set, use C) |
1 | 1 | 0 | 1 (B is set, use A) |
1 | 1 | 1 | 1 (B is set, use A) |
Read the 8 bits from bottom to top: 11100010, or 0xE2.
So 0xE2 is a very common minterm value in Amiga demoscene culture. And now the fun part of this post:
The official Intel documentation [2] for vpternlogd includes an example #imm8 value. Out of the 256 possible functions they could have chosen, I’ll let you discover which one they picked 🙂
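To double-check that 0xE2 really behaves as “mask ? sprite : background”, here is a Python sketch that applies a minterm bit by bit to one 16-pixel row of a single bitplane and compares it with the plain bitwise select. The pixel patterns and the `apply_minterm` helper are made up for illustration.

```python
def apply_minterm(a: int, b: int, c: int, minterm: int, width: int = 16) -> int:
    """Hypothetical helper: bit i of the result is bit (a_i*4 + b_i*2 + c_i) of minterm."""
    out = 0
    for i in range(width):
        n = ((a >> i) & 1) << 2 | ((b >> i) & 1) << 1 | ((c >> i) & 1)
        out |= ((minterm >> n) & 1) << i
    return out


sprite = 0b0011110000111100      # A: sprite pixels
mask = 0b0011111111111100        # B: 1 where the sprite is opaque
background = 0b1010101010101010  # C: background pattern

blitted = apply_minterm(sprite, mask, background, 0xE2)
# Same result as "mask ? sprite : background" with plain bitwise ops.
assert blitted == (mask & sprite) | (~mask & background & 0xFFFF)
print(f"{blitted:016b}")  # prints 1011110000111110
```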
Conclusion
Is there an Amiga fanboy on the team that wrote the Intel documentation examples? A bit of retro influence never hurts! 🙂
Links
[1] Tom Forsyth, presentation on the design of the AVX-512 ISA
[2] Intel 64 and IA-32 Architectures Software Developer’s Manual