This article shows how to use NaN Boxing to improve the performance of a JSON parser for C# named DynaJson.
DynaJson acts the same as DynamicJson, as shown below. It returns a dynamic type value representing a JSON object resulting from parsing a JSON string. We can access each member of the JSON object with the member access operator like a C# object. This manner is useful to process a large JSON with duck typing.
A benchmark result of DynaJson is shown below. The benchmark measured the time to process a large JSON file named citm_catalog.json1, whose size is about 1.7MB. The benchmark compared DynaJson with Utf8Json, Jil, Newtonsoft.Json, DynamicJson. The measurement ran on .NET Core 3.1 on Ubuntu 18.04 on Azure D2Ds_v4.
The result shows DynaJson is quite faster than others. DynaJson is optimized for this use case, so the result is natural.
An optimizati0n by NaN Boxing
JSON parsers generate a large number of objects representing JSON values on parsing a large JSON file. So DynaJson represents a JSON value as a 16 bytes structure type. Copying a 16 bytes structure is much faster than larger ones because it needs only one load and store instruction for 128bit SIMD operations.
The definition of the structure is shown below. It folds members into 16 bytes with an Explicit layout. The parser uses each of
Dictionary alternately based on the value of
Type, so the members can be laid out at the same offset. How about
Type of an enum type overlaps the upper 4 bytes of
Number. It can seemingly break
NaN Boxing can realize the overlap safely. NaN, standing for Not a Number, represents zero divided by zero and the square root of a negative number in floating-point arithmetic See the Wikipedia entry for more details. One of the binary representations of NaN defined in IEEE 754 is 0xfff8000000000000. The lower 51 bits can have an arbitrary number, so NaN Boxing boxes data in the bits.
The definition of an enum type
JsonType is shown below.
JsonType has not only types of JSON values but also the values themselves for null, true, and false. The definition assigns a legitimate value as the upper 4 bytes of NaN to each member. The values never collide with the upper part of ordinary doubles. Any conjunctions with any lower 4 bytes become a legitimate double.
A usage of
JsonType is shown below.
ToValue is a method to extract a value
JsonValue represents. If the value of
obj.Type is not anything in the defined members, it must be an upper 4 bytes of an ordinary double, so
obj.Number. This manner realizes the clear separation of
The following benchmark result shows the effect of NaN Boxing. The benchmark measured parsing time in each definition of
InternalObject as a class, 20 bytes (8 + 8 + 4) structure without NaN Boxing, and 16 bytes one with NaN Boxing. The 20 bytes structure made the parser 20% faster than the class, and NaN Boxing improved 6.8% more.
Other usages of NaN Boxing
InternalObject into 8 bytes.
- This is one of the files used in Native JSON Benchmark. Many benchmarks for JSON parsers use this file.