Thursday, October 26, 2017

glTF 2.0: I Like It!

Although I'm not a 3d graphics person, I have worked with several 3d file formats. In general, I've been very disappointed in the design of these file formats. But I've finally found a 3d file format that I like: glTF 2.0 is actually pretty nice.

It's a mostly straightforward, easy-to-understand file format that's pretty unambiguous. It doesn't try to implement fancy features the way U3D does. It doesn't carry weird legacy baggage like X3D or COLLADA. Unlike COLLADA or TIFF, its design isn't so configurable or flexible that it's impossible to know whether what you store in it can be read by other programs. It just holds a bunch of triangles and associated data structures. It seems to have been built from the ground up as a proper interchange format. It didn't grow out of some existing system and inherit all sorts of strange behavior from how that system's codebase evolved. It also has good extension points, making it easy to store additional application-specific data in a file.
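As an illustration of those extension points, here is a minimal Python sketch of a glTF 2.0 JSON document that attaches application data to a node in two ways. The first is the free-form `extras` property. The second is a named extension listed in the top-level `extensionsUsed` array. The extension name `MYAPP_metadata`, the helper `node_part_number`, and all the data values are hypothetical. A real private extension would use its own vendor prefix.

```python
import json

gltf = {
    "asset": {"version": "2.0"},  # required in every glTF 2.0 file
    # Every extension used anywhere in the file is listed here.
    # "MYAPP_metadata" is a made-up extension name for illustration.
    "extensionsUsed": ["MYAPP_metadata"],
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [
        {
            "name": "Chair",
            # "extras": free-form application data, no registration needed.
            "extras": {"partNumber": "C-1234", "inStock": True},
            # A structured extension object keyed by its (hypothetical) name.
            "extensions": {"MYAPP_metadata": {"designer": "A. Person"}},
        }
    ],
}

text = json.dumps(gltf, indent=2)


def node_part_number(node: dict):
    """Read the hypothetical partNumber from extras; None if it is absent."""
    return node.get("extras", {}).get("partNumber")
```

A reader that doesn't know about the extra data can simply ignore it, which is what makes this kind of extension point safe for interchange.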

I think part of the reason it came out so well is that it was originally designed for one purpose only: sending 3d models to be displayed by WebGL. With a well-defined, basic use case, the designers had the focus to make something straightforward and easy to work with. glTF 2.0 extends the format to support more general use cases, but that hasn't diluted the core use case of holding 3d models. Storing 3d models in glTF 2.0 is still clear and concise without a lot of confusion.

I still have a few niggles with it, though. Right now, the file format doesn't have widespread support, but support is starting to grow. Still, given that this is a file format specification, I feel there should have been at least one proper reference importer/exporter before it was finalized. There are multiple implementations of the spec, which is good. But none of them is complete and comprehensive enough to interface bidirectionally with a proper 3d application. So it's hard to know whether the files I've created are correct, or whether all the corners of the file format have been fully tested.

Some parts of the specification don't give proper explanations or context for why they are needed. For example, I still don't understand why accessor.min and accessor.max exist. I'm sure there's a good reason, but they just seem like an unnecessary hassle to me. A 32-bit floating-point number is easy to corrupt when it's written out as a decimal string and parsed back in. Given that, I can't see the use of a possibly inaccurate record of the min and max x, y, z values of some points. Having more context there would be useful for implementors.

Another example is the separate buffer, bufferView, and accessor objects needed to refer to memory. It took me a long time to figure out the difference between them. At first, I thought you could put the data for everything in a single bufferView and just use different accessors to refer to different chunks of it. Only later, when I read that bufferViews were intended to correspond to OpenGL memory buffers, did I finally understand what each level of memory reference is for.

The buffers refer to different data stored on disk. Usually you'll only have one buffer. But if different models share data, you can put the shared data in a separate file/buffer that both models reference. A bufferView refers to a single chunk of a model's data once it has been loaded into memory. So having a single bufferView for an entire scene would be wrong; you would normally have one or more bufferViews for each 3d object in a scene. In general, when accessing data from a bufferView, you would always read from its start. If you find yourself reading from an offset into a bufferView, you should probably just use a separate bufferView instead.

The accessors describe how to read individual data fields of a bufferView. Notably, a bufferView has a byteStride property that breaks it up into records or entries. An accessor describes how one field is stored or interleaved inside each record of a bufferView. An accessor's byteOffset is meant for offsets into a record, not for starting at an offset into a bufferView.
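To make the buffer/bufferView/accessor layering concrete, here is a small Python sketch. It assumes a hypothetical triangle whose vertex records each hold a POSITION followed by a NORMAL. Both attributes share one interleaved bufferView with a 24-byte byteStride. The two accessors differ only in their byteOffset within each record (0 and 12). The sketch also computes the POSITION accessor's min/max from the float32-rounded values actually stored, which is one way to keep them consistent with the binary data. The helpers `f32` and `read_vec3` are illustrative names, not part of any glTF library. 5126 and 34962 are the GL enum values for FLOAT and ARRAY_BUFFER.

```python
import struct

# Hypothetical triangle; each vertex record is POSITION (3 floats) + NORMAL (3 floats).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.1)]
normals = [(0.0, 0.0, 1.0)] * 3

STRIDE = 24  # bytes per interleaved record: 6 floats * 4 bytes

# One interleaved blob; it becomes the contents of the single bufferView.
blob = b"".join(struct.pack("<3f3f", *p, *n) for p, n in zip(positions, normals))


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


gltf_part = {
    # No "uri": assumes the data lives in a GLB file's BIN chunk.
    "buffers": [{"byteLength": len(blob)}],
    "bufferViews": [{
        "buffer": 0, "byteOffset": 0, "byteLength": len(blob),
        "byteStride": STRIDE,  # splits the view into 24-byte records
        "target": 34962,       # ARRAY_BUFFER: one GPU vertex buffer
    }],
    "accessors": [
        {   # POSITION: starts at byte 0 of each record
            "bufferView": 0, "byteOffset": 0, "componentType": 5126,  # FLOAT
            "count": len(positions), "type": "VEC3",
            # min/max computed from the float32 values that are actually stored
            "min": [min(f32(p[i]) for p in positions) for i in range(3)],
            "max": [max(f32(p[i]) for p in positions) for i in range(3)],
        },
        {   # NORMAL: byteOffset 12 skips the position inside each record
            "bufferView": 0, "byteOffset": 12, "componentType": 5126,
            "count": len(normals), "type": "VEC3",
        },
    ],
}


def read_vec3(gltf: dict, bin_data: bytes, accessor_index: int):
    """Read a FLOAT VEC3 accessor, honoring the bufferView's byteStride."""
    acc = gltf["accessors"][accessor_index]
    view = gltf["bufferViews"][acc["bufferView"]]
    stride = view.get("byteStride", 12)  # no stride means tightly packed vec3s
    start = view.get("byteOffset", 0) + acc.get("byteOffset", 0)
    return [struct.unpack_from("<3f", bin_data, start + i * stride)
            for i in range(acc["count"])]
```

`read_vec3` adds the accessor's byteOffset once and then steps by the bufferView's byteStride. Because of that, the same function reads either attribute out of the shared records.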

glTF 2.0 also offers a convenient format, called GLB, for storing all the 3d data in a single file. The GLB specification is nice in that it's really basic and straightforward, but its design is a little sloppy. A GLB file has its total file size encoded in its header, which is unnecessary and prevents the data from being streamed out. Even if that were fixed, the design of the chunks inside the file also prevents writing the data out in a single pass. All the parts of the file have to be generated separately first and their sizes determined. Only then can they be assembled and written out as a GLB file. This is because there can only be a single buffer chunk. The JSON chunk, which contains references into the buffer chunk, has to be written out before it.
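The ordering problem is easiest to see in a writer. Below is a minimal Python sketch of assembling a GLB file. It assumes the JSON and the single binary buffer are already fully built in memory. The 12-byte header (magic, version 2, total length) can only be emitted after both chunks have been produced and padded to 4-byte boundaries. `write_glb` and `pad4` are hypothetical helper names.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN" followed by a zero byte


def pad4(data: bytes, fill: bytes) -> bytes:
    """Pad chunk data to a 4-byte boundary, as GLB requires."""
    return data + fill * (-len(data) % 4)


def write_glb(gltf: dict, bin_data: bytes) -> bytes:
    # The JSON must be complete first: its byteLength/byteOffset values
    # point into the BIN chunk. JSON padding is spaces, BIN padding is zeros.
    json_bytes = pad4(json.dumps(gltf, separators=(",", ":")).encode("utf-8"), b" ")
    bin_bytes = pad4(bin_data, b"\x00")
    # Only now is the total length known: header + two chunk headers + data.
    total = 12 + 8 + len(json_bytes) + 8 + len(bin_bytes)
    out = bytearray()
    out += struct.pack("<III", GLB_MAGIC, 2, total)
    out += struct.pack("<II", len(json_bytes), CHUNK_JSON) + json_bytes
    out += struct.pack("<II", len(bin_bytes), CHUNK_BIN) + bin_bytes
    return bytes(out)
```

A streaming writer would want to emit the header and JSON first and then append vertex data as it is generated. That isn't possible here because two things depend on the final size of the BIN chunk: the total length in the header and the byteLength/byteOffset values inside the JSON.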

Overall though, I really like the glTF 2.0 format. I really hope it gets widespread adoption. I definitely see it displacing the .OBJ format in the long term.