My friend and coworker Myles Maxfield recently wrote up this explanation (posted here with his kind permission) of how JSCore encodes multiple types to a single JSValue primitive. I found it very interesting and I think it’s good under-the-hood information for anyone who works with JSCore or JavaScript running thereon.
This is a really neat piece of code, I think. I wrote up an explanation
of how values are encoded:
The JavaScriptCore interpreter has a primitive type (called JSValue,
inside WebKit/Source/JavaScriptCore/runtime/JSCJSValue.h) which can
represent the following types in JavaScript:
- Int32
- Double
- Cell (meaning: a pointer to another data structure)
- Boolean
- Null, undefined, empty, recently deleted (I’m grouping all these
together since they’re conceptually similar and have only one valid value
each, similar to the “Unit Type”)
The size of a JSValue, however, is 64-bits, no matter what type it is
holding. It is not a polymorphic type. Instead, it is implemented as a
union.
JavaScriptCore runs on 32-bit machines and 64-bit machines. The layout of
the type is slightly different for the two architectures. I’ll start with
32-bit machines.
The overall bit-layout of a JSValue on a 32 bit machine is as follows:
|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|
|----------------tag----------------|--------------payload--------------|
The format for everything that isn’t a double is obvious – assign each
type a unique tag, and put it’s 32-bit payload in bytes 5-8 (since ints
and pointers are both 32 bits). But then, how do we encode a double,
since a double alone takes up the whole 64-bits? JavaScriptCore is tricky
about this – it assigns the tags starting at 0xFFFFFFFF and counts down.
The reason for this is that a double encodes “QNaN” if bits 1-13 are set
(counting from the left). Therefore, if any of the tags are set, the bits
in the JSValue (when interpreted as a double) would encode a NaN. There
are also more NaN values available, so it’s possible to encode an
::actual:: double NaN. So, to tell what type a JSValue is, just see if
the tag matches, and if it doesn’t, the value is a double.
—————-
Now, for a 64-bit machine. The types are all the same size, except that
now pointers are 64 bits (ints are still 32 bits even on a 64-bit
machine). So, how can we tag the pointer, even though it takes up the
whole 64-bits? Well, JavaScriptCore makes an observation that in most
pointers, the top two bytes are usually 0 (Yes, it is true that addresses
are arbitrary due to virtual memory, but usually machines don’t get that
high when allocating addresses. To get up that high in the first place
would require this process claiming 281 terabytes of memory). We can,
then, recognize a pointer as having its top two bytes == 0. So then, how
do we distinguish a pointer whose top two bytes are 0 from a double whose
top two bytes are 0? Well, we can artificially increment the top two
bytes of the double. So, to encode a double, we’d first reinterpret the
bits as if they were an int64, then add 0×0001000000000000. To get back
the original double, just subtract that value and reinterpret the bits as
a double. This won’t overflow because the only values that would overflow
have the leftmost 16 bits set, but that’s a NaN. The same argument can be
used to show that the leftmost two bytes won’t ever end up being 0xFFFF
either.
|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|
|00000000000000000|--------------------pointer value--------------------|
| 0x0001 - 0xFFFE ------------double value------------------------------|
Well, what about the other types that a JSValue can represent? Our old
tagging mechanism (used on 32-bit processors) won’t work because our tag
has now been shortened to only only 16 bits, but in order to encode a
NaN, the first 13 have to be set. This means that we only have 3 bits to
play with, which isn’t enough (we might want to add more than 8 extra
types). But, never fear! Just like there are invalid double values, there
are also invalid pointer values! Namely, pointer values that would map to
the first page of memory (which is usually mapped with no permissions, so
these addresses are invalid). Therefore, we can use the rightmost few
bits as a tag. All the types except for Int don’t actually encode much
(don’t have much entropy), so JavaScriptCore uses combinations of 0×2,
0×4, and 0×8 to create the necessary tags. However, for ints, it actually
does use one of the NaN values (using a leftmost 0xFFFF tag) which
encodes int32s, so the rightmost 4 bytes can be the payload. This would
be faster than putting the int32 in the middle of the 64-bit value
(something like making bytes 3-6 specify the payload) because you
wouldn’t need extra shifting operations to recreate the int.
|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|
|FFFFFFFFFFFFFFFFF|00000000000000000|-----------integer value-----------|
|00000000000000000|-------------------------------------------------TAG-|
<- other type