Wednesday, February 17, 2010

Background Music in JavaScript without Flash

So for the last half a year, I've been playing a bit with making larger and more complicated games in JavaScript. I've found a few neat new tricks, but I haven't gotten around to writing about them until now. One thing I figured out is how to play background music in modern browsers without using Flash, which is useful for JavaScript games where you need some music to help set the mood.

So for non-IE browsers, you can use HTML5. Although the APIs seem to work reasonably well, I'm not sure how stable the specification is. There are occasional weird things in the specification that suggest it is still immature. For example, the audio specification makes heavy use of "boolean attributes," which are HTML attributes that can be present or absent instead of simply being true or false (so setting such an attribute to "false" is equivalent to defining it as true). This sort of bizarre, unnecessarily complex, probably XML-incompatible design suggests that the specification still contains idiosyncrasies and pet projects of individual developers instead of having been polished into a consistent spec. I'd be concerned about a problem like with ECMAScript 4, where once Microsoft and Yahoo! applied some common sense to the specification, a lot of things ended up changing. So I suggest using HTML5 with caution--you should always check that every method and field you want to use is actually supported by your browser before relying on its existence.
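To make the boolean-attribute quirk concrete, here's a small illustration (the file name is a placeholder): what matters is whether the attribute is present at all, not what value you give it.

```html
<!-- Boolean attributes: presence alone switches the feature on. -->
<audio src="music.ogg" controls></audio>           <!-- shows controls -->
<audio src="music.ogg" controls="false"></audio>   <!-- ALSO shows controls: the attribute is present -->
<audio src="music.ogg"></audio>                    <!-- no controls: the attribute is absent -->
```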

Well, anyways, for HTML5 audio, you first need to check if it's supported:

var hasAudio = true;
try {
    var audiotest = new Audio();
    if (!audiotest.canPlayType) hasAudio = false;
} catch (e) {
    hasAudio = false;
}


Every browser supports different audio formats, so you need to make an MP3 version of your files (you can use iTunes for that) for Safari/Chrome and an OGG version (you can use Audacity for that) for Firefox/Chrome. For some reason, the Chrome browser doesn't support WAV files because they claim the files are "too large," and I think this is a clear example of the immaturity surrounding the HTML5 specification in general. There are many good reasons for supporting WAV files in browsers, and for short sound effects, WAV files can even be smaller than OGG or MP3 files. The reason for not including WAV support in Chrome is simply the result of bias and ignorant opinions as opposed to a mature approach to the problem. So anyways, you need to test which background music file to load depending on which formats are supported.

var clip = new Audio();
clip.autoplay = false;
clip.autobuffer = true;
var canOgg = clip.canPlayType("audio/ogg");
var canMp3 = clip.canPlayType("audio/mpeg");
var canMp3Alt = clip.canPlayType("audio/mp3");
var canWav = clip.canPlayType("audio/wav");
if (canMp3 != "" && canMp3 != "no")
    clip.src = mp3File;
else if (canMp3Alt != "" && canMp3Alt != "no")
    clip.src = mp3File;
else if (canOgg != "" && canOgg != "no")
    clip.src = oggFile;
else if (canWav != "" && canWav != "no")
    clip.src = wavFile;
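The chain of if/else checks can also be factored into a small helper. This is just a sketch (pickSource is my own name, not part of any API): it walks a list of candidate sources and returns the first one whose MIME type the browser reports as playable.

```javascript
// Sketch: pick the first source whose type canPlayType() doesn't reject.
// canPlayType() returns "probably", "maybe", or "" -- some very early
// builds returned "no", so treat both "" and "no" as unplayable.
function pickSource(clip, candidates) {
    for (var i = 0; i < candidates.length; i++) {
        var answer = clip.canPlayType(candidates[i].type);
        if (answer != "" && answer != "no")
            return candidates[i].src;
    }
    return null;  // nothing playable
}

// Usage (file names are placeholders):
// clip.src = pickSource(clip, [
//     {type: "audio/mpeg", src: "music.mp3"},
//     {type: "audio/mp3",  src: "music.mp3"},
//     {type: "audio/ogg",  src: "music.ogg"},
//     {type: "audio/wav",  src: "music.wav"}
// ]);
```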


And from there, playing the file is easy (though Firefox 3.5 doesn't support looping music, so you need to do that manually).

clip.addEventListener("ended", function() {
    // I'm not sure if this setTimeout() thing is necessary
    window.setTimeout(function() { clip.play(); }, 1);
}, false);
clip.play();
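If you'd rather not hard-code the workaround, you can feature-test for the loop property and only install the "ended" handler as a fallback. A sketch (enableLooping is my own name):

```javascript
// Sketch: use the native loop property where it exists, and fall back to
// restarting the clip from an "ended" handler where it doesn't (e.g.
// Firefox 3.5).  Returns which strategy was chosen.
function enableLooping(clip) {
    if ("loop" in clip) {
        clip.loop = true;
        return "native";
    }
    clip.addEventListener("ended", function() {
        window.setTimeout(function() { clip.play(); }, 1);
    }, false);
    return "fallback";
}
```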


For Internet Explorer, you can use Windows Media Player to play music. First, you have to check to see if Windows Media Player is available using ActiveX:

var hasWmp = true;
try {
    var player = new ActiveXObject("WMPlayer.OCX.7");
} catch (e) {
    hasWmp = false;
}


For some reason, when you create the ActiveX object, it isn't bound to your web page properly, so you can't use relative URLs to refer to music files. If you do want to use relative URLs, you actually have to create the player object as an HTML object in your document.

var tmp = document.createElement('div');
tmp.innerHTML = '<OBJECT CLASSID="CLSID:6BF52A52-394A-11d3-B153-00C04F79FAA6"></OBJECT>';
player = tmp.getElementsByTagName('OBJECT')[0];
tmp = null;
if (!player || !player.versionInfo) hasWmp = false;


From there, playing an mp3 file is easy:

player.settings.setMode("loop", true);
player.URL = mp3File;
player.controls.play();


Update 2010-5-3: Apparently, using MP3 files on a web page or in a game requires an MP3 license. In the future, I guess I will only support Ogg Vorbis and let users of Safari and IE endure the silence.

Friday, January 22, 2010

We're Currently in a Golden Age of Literacy, and It's All Downhill from Here

During the past few months, I've slowly started to notice that for a lot of the things I used to enjoy reading about, I now prefer watching videos instead. For example, I used to enjoy reading reviews of video games on the web, but recently I'll search for video reviews first, and I'll only fall back to text reviews if I can't find any videos. I'm also starting to do the same for educated political commentary and satire. In fact, if there were enough video content on my favourite topics, I might prefer watching videos to web browsing entirely.

The logical conclusion of all this is that this signals the end of literacy as we know it. Sure, we might now complain about the poor literacy skills of our kids, we decry their unintelligible text messages, and we bemoan their incoherent essays, but ten years from now, we're going to look back on 2010 as a golden age of literacy. It will be considered the pinnacle of literacy in the world--people all over the world actually had to type words in order to find information! People were required to read information and instructions! Children preferred sending messages to each other in the form of text! In between the television age of the 1980s and the Internet video age of the 2020s, the period around 2010 will be considered a renaissance of literacy where people actually had to know how to read and write text to survive in the world!

In ten to twenty years, there will no longer be a need to read and write information; everyone will use video. Instead of web pages, there will be videos. Instead of SMS and tweets, kids will send videos to each other. Instead of textbooks, there will be slides and accompanying video. Society will complain about how kids can't form well-thought-out text any more, preferring stream-of-consciousness videos. Kids might not even write text any more but simply make videos, edit them together somehow, and then dump the result out. Similar to how people ignore people talking to themselves in the street (they assume that the person is using a hands-free cellphone) and ignore people checking e-mail during conversations, I think in the future, people will simply get used to the idea of people quietly making short video messages to others in the middle of a meeting (of course, it will seem extremely disruptive to us, but I imagine it's something that future generations will simply get used to).

There are a few weaknesses of video--because it is a linear medium, it's hard to scan through it quickly, so there will still be some text that people will read. This text will mostly be in the form of "headlines," "search results," and "transcripts." When we want to find videos to watch, we'll tell our computer what videos we want to watch (through a webcam, of course!), and the computer will show various video clips with meaningful text headings that we can scan through quickly to find what we want. And every video will have an accompanying transcript (in both text and screencap form) that people can use to quickly jump around to the parts of videos that they're interested in.

I see two impediments to this video future:
  1. patents
  2. speech recognition (specifically video transcription) technology

The main barrier to this future right now is patents on video technology. These patents inhibit standardization (so no two companies can agree on a video e-mail standard or a webcam standard, for example), which prevents a network effect from starting around video technology. The cost and licensing difficulty of patents also inhibit innovation and entrepreneurship in this area. New companies and researchers don't want to develop new ideas in this space because licensing the necessary technology is too complicated and too expensive, so the potential rewards (with Internet technology, most consumer technology of this sort earns zero revenue) aren't worth it. I believe this patent issue will work itself out once patents expire or once storage and bandwidth become so cheap that we'll just use raw video for everything. The day after a useful video patent expires, a standard for video e-mail will be made (or perhaps there will be some sort of proprietary Facebook thing that comes out before then that everyone will use--unfortunately, it will be a bit of a kludge getting it to work with cellphones and other devices).

The other barrier is the lack of accurate video transcription technology. As I mentioned before, video is a linear medium, so we need a way to search it and to scan through it. The minimum technology we need for this to be practical is automatic video transcription. Currently, speech recognition isn't quite at the point where this is possible, but I imagine this technology will be ready by 2020, perhaps even by 2015 (there might be patent issues on this technology that will delay its widespread use for a few more years).

It's sort of weird knowing that despite all the complaints that people have about literacy in 2010, this is actually the best it's going to get. I'm not sure about the ultimate implications and ramifications of the future age of video on society, but I think it's inevitable, so we might as well start preparing for it now. I suppose other people have also been evangelizing that video is the future, so my musings are perhaps not the most original, but they never tell you that the trade-off is the end of literacy.

Sunday, August 30, 2009

Stitching Together Scanned Images

Recently, I was scanning a lot of legal sized documents, but my scanner could only scan A4 sized paper. So I had to scan each document in parts and then stitch them together afterwards.

There is lots of software available for stitching photos together into a panorama, but surprisingly, I had trouble finding software for stitching together flat images (or for creating a "linear panorama," as some people call it), even though this task should be a little easier than making panoramas. Of course, it's possible to load the images into a paint program and manually align and blend them, but I had to stitch together a lot of documents, so this was too time-consuming for me.

In the past, I've used software like Hugin, Panotools, or Autostitch for stitching together photos, but I could never get the various tutorials on handling linear panoramas to work quite right.

I own a copy of Photoshop Elements 6.0, which does contain algorithms for this sort of stitching (under File...New...Photomerge Panorama). Initially, I was quite pleased with the results, but after inspecting the final stitched images more carefully, I began to realize that the algorithm had lots of problems rotating images in order to get good alignment between them. If you only superficially inspect the final result, everything seems fine because Adobe seems to use some sort of fancy blending algorithm (something like Enblend) that hides a lot of these imperfections. Unfortunately, the blending algorithm can't hide the fact that because of poor alignment, lines that should be straight aren't. Even when you manually rotate the images to help the alignment algorithm, Photoshop Elements doesn't help you try to fine-tune the rotation or positioning afterwards, so it's hard to get things aligned quite right (I was hoping that if I could get things close to alignment, the algorithm would "snap" things to the best position). Adobe is always tweaking their algorithms, so it's possible that later versions of Photoshop Elements give better results though.

In the end, I actually found a research tool from Microsoft that provided excellent automated results. Even better, Microsoft's Image Composition Editor was very fast and very easy to use, so I highly recommend this program. Eventually, I hope to figure out what I was doing wrong with Hugin because it's always useful to have an alternative that supports a lot of manual tweaking, but so far the results of the Image Composition Editor have been so good that I've only needed to resort to an alternate tool (i.e. Adobe Photoshop Elements) once, when I was stitching four images together that had only a small amount of overlap.

Saturday, June 06, 2009

Vista Service Pack Installation Problems

I recently installed Windows Vista SP2, and like Vista SP1, I seemed to have problems getting it installed properly. It would spend a huge amount of time installing itself, reboot, and then fail during its third stage, and then spend a lot of time reverting the install. Windows Update would show Code 80004005 in the error details field for why the update failed.

Fortunately, after browsing the web, I found a forum post (I no longer remember where it is), where people mentioned that the problem comes up with dual-boot machines. My machine has both Linux and Windows installed, and Vista apparently fails its TPM security checks or something like that if the GRUB boot manager starts up in-between the computer starting and Vista starting. Fortunately, I installed GRUB on my Linux partition instead of my MBR, so I could simply change my active/boot partition to boot directly into Windows. Afterwards, the service pack could apply itself, and then I could boot on a Linux live CD to reset the active/boot partition back to the Linux partition.

Personally, I find this sort of annoying. It would be nice if Windows wouldn't die on dual-boot machines or would give users a prompt to allow them to ignore TPM problems during the boot process or, at the very least, give more useful error messages.

Saturday, January 24, 2009

MTU Problems When Using Vista Internet Connection Sharing (ICS) with an XBox 360

So like other people, I don't really like the idea of paying $100 for an XBox 360 wifi adapter, so I'm using Internet Connection Sharing through my laptop to get my XBox connected to a wireless access point (yes, there are easier ways of doing this, but like usual, I'm doing something strange).

But when I did this, the XBox kept complaining that my MTU was set too low since it needed an MTU of 1364. This is a little silly since a proper IP stack should automatically fragment its packets to accommodate the MTU, but whatever.
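To see how cheap that fragmentation would be, here's the arithmetic the IP stack would do anyway. This is a simplified sketch (it assumes a 20-byte IPv4 header and ignores the rule that non-final fragment payloads must be multiples of 8 bytes):

```javascript
// Sketch: how many fragments does a packet need at a given MTU?
// Simplified: assumes a fixed 20-byte IPv4 header per fragment.
function fragmentCount(packetSize, mtu) {
    var header = 20;
    var data = packetSize - header;   // payload of the original packet
    var perFragment = mtu - header;   // payload that fits in each fragment
    return Math.ceil(data / perFragment);
}

// A full 1500-byte packet squeezed through a 1364-byte MTU link
// merely needs to be split in two.
```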

I ran the command netsh interface ipv4 show interfaces to find out what the MTUs on my Vista laptop were, and they were all set to 1300. This seemed a little strange because 1300 is oddly low and also a nice round number that a human must have decided on. But I certainly hadn't set my MTU to 1300, so I'm not sure why it was set to that.

I spent a while browsing around, but finally I figured it out. It was that annoying Cisco VPN client on my laptop. I have no idea why they set the MTU so low (what sort of intermediate router would use 200 bytes of overhead per packet? It's silly to reserve so much), but the VPN Client comes with a Set MTU program that I was able to use to set everything back to normal, and then my XBox stopped complaining.

Friday, January 16, 2009

Running TPC-H Queries on MySQL

Getting the TPC-H queries to work on MySQL isn't too hard, but it isn't always clear what to do. I took the instructions from here and converted them to work with MySQL.

Once MySQL is running, you can create a database and a tpch account:

mysql -u root -p
mysql> CREATE USER 'tpch'@'%' IDENTIFIED BY 'password';
mysql> CREATE DATABASE tpch;
mysql> GRANT ALL ON tpch.* to 'tpch'@'%';
mysql> USE tpch;
mysql> \. tpch/gen/dss.ddl

Then, in the gen directories, you can modify the query generator to generate queries that are as close as possible to what you need:

cp makefile.suite makefile
#Modify makefile to use
# CC = gcc, DATABASE=SQLSERVER, MACHINE=LINUX, WORKLOAD=TPCH
#In tpcd.h, SQLSERVER section:
# change #define SET_DBASE "use %s;\n"
# change #define SET_ROWCOUNT "limit %d;\n\n"
# change #define START_TRAN "BEGIN WORK;"
# change #define END_TRAN "COMMIT WORK;"
make

Then you can start generating database data at the right scale factor

./dbgen -s 1

And also modify the constraints/indices file so that it's compatible with mysql:

# Modify dss.ri
# use a search and replace in order to remove "CONNECT TO TPCD", remove references to "TPCD." and remove the lines "COMMIT WORK;"

Then you can load all the data into the databases with

mysql -u tpch -p
mysql> use tpch;
mysql> LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|';
# Same with orders, lineitem, nation, partsupp, part, region, supplier
mysql> \. dss.ri


You then need to change the case of the table names because the generated queries use lower-case table names, whereas dss.ddl creates the tables with upper-case names:

mysql> alter table NATION rename nation;
# Ditto for supplier, region, partsupp, part, orders, lineitem, customer

Finally, you can test some of the queries

cp dists.dss queries
cd queries
../qgen -c tpch -s 1 1


I think the queries still need to be modified a bit to be compatible: queries with a LIMIT may need the semi-colon moved around, the precision indicator in query 1's date arithmetic may need to be removed, and the method for naming columns in query 13 might need changing.
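For example, in query 1 the date arithmetic that qgen emits needs its precision indicator stripped for MySQL. The exact generated text depends on your qgen version and substitution values, so treat this as an illustration:

```sql
-- As generated (DB2/SQL Server style), rejected by MySQL:
--   where l_shipdate <= date '1998-12-01' - interval '90' day (3)
-- MySQL-compatible rewrite:
where l_shipdate <= date '1998-12-01' - interval 90 day
```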

Thursday, August 14, 2008

Chinese Handwriting Recognition in Windows XP

If you want to write Chinese characters in Windows XP, there are lots of options available like Google's IME, but a lot of these options aren't practical if you don't know any Mandarin. There are websites for inputting Chinese using Cantonese, but that still isn't ideal.

Apparently, the IME in Windows XP can be upgraded to support limited handwriting recognition for Chinese, but Microsoft conveniently hides the IME upgrades on its websites. Fortunately, I found a forum post that describes how to download these upgrades from the MS Office website. Once I installed the upgrades, I could use the IMEPad to write Chinese characters with a mouse or tablet. In fact, the IMEPad option was available in my plain version of Windows XP all along, but clicking on the IMEPad button did not result in anything happening.

Wednesday, March 26, 2008

IE7 Erases .XML Files

So I had spent a few days writing a long .xml file with a corresponding .xsl stylesheet. I was using IE7 to look at the generated html. At one point, I chose Page, then Save As. Then I chose Cancel.

Poof.

IE7 erased my .xml document. Well, I guess I didn't really need the document that badly after all.

Monday, December 03, 2007

Tiles Don't Generalize

For a couple months, I've been experimenting a bit with SVG during my free time to get a feel for how to use it cleanly on the web. In particular, I've been playing around with trying to create a little tile-based game using SVG and JavaScript.

The nice properties of grid-based tiles using an isometric or 3/4 perspective are well-known. As long as every object in the game fits exactly in a 1x1 grid cell, then everything works great. It's easy to figure out an order to draw the objects in order to ensure that nearer objects properly occlude further objects. You can stack objects within a grid and do other nice stuff too.
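For the 1x1 case, the draw order is easy to compute. A minimal sketch for the isometric perspective (drawOrder is my own name): sorting cells by x + y walks rows of constant depth back to front, so nearer objects are drawn last and correctly paint over farther ones.

```javascript
// Sketch: back-to-front draw order for objects that each occupy
// exactly one 1x1 grid cell in an isometric view.  Cells with a
// smaller x + y are farther away, so they get drawn first.
function drawOrder(objects) {
    return objects.slice().sort(function(a, b) {
        return (a.x + a.y) - (b.x + b.y);
    });
}
```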

After seeing all those isometric-perspective computer games with characters running in-between grid tiles and with all manner of odd-shaped objects though, I assumed that the algorithms for choosing the order to draw objects in the isometric perspective generalized in some nice way. Of course, I realized that things couldn't generalize arbitrarily. If you allowed for arbitrary objects in arbitrary poses, then you'd end up with an arbitrary 3d scene drawn from an orthographic perspective. Then you'd need the full painter's algorithm with polygon chopping etc.

But I assumed that if all objects were in non-overlapping axis-aligned bounding boxes, then I could figure out some way to order the objects without needing to chop any of them (since you can't really do this in SVG). In particular, I wanted the floor tiles and the objects to be all mixed up in the rendering code. But for the isometric perspective, it is possible to create degenerate examples where objects mutually occlude each other. So you can't generalize algorithms for the isometric perspective this way. In particular, my plan of having floor tiles and wall tiles, plus arbitrarily sized objects sitting on the floor tiles, just wouldn't work.




In fact, the same thing happens with the three-quarter perspective as well (I've shifted the perspective a bit so that it's easier to see the depth of the objects).



So having arbitrarily sized objects that can stack on top of each other arbitrarily won't work. Having objects in non-overlapping axis-aligned bounding boxes that all sit on the same plane does seem to work ok. With the isometric perspective, it seems like it might be a little hard to get the right ordering of objects during drawing, but it seems doable at least. With the 3/4 perspective, everything works out well in that particular situation.

So basically, one way to get around this problem is to have a flat floor and use tiles to cover the floor as one layer; then on a second layer, you can have objects and walls and stuff that you can order separately. I imagine there are other cases that lead to an easy ordering of objects (perhaps if everything has a fixed height? or maybe in the 3/4 perspective, if you allow for only two levels: one for arbitrary-height terrain and another for things on top?), but I haven't really figured it out yet. Perhaps in reality, degenerate cases never really end up occurring, so you can just sort of hack lots of special cases into the rendering engine to make sure all the objects are rendered in the right order.
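The two-layer workaround can be sketched in a few lines (renderOrder is my own name, and this assumes each object sits in its own 1x1 cell): draw every floor tile first, in any order, since flat tiles never occlude each other, then draw the objects back to front by grid depth.

```javascript
// Sketch: flat floor tiles form the first layer (any order works),
// objects form the second layer, sorted back to front by x + y.
function renderOrder(floorTiles, objects) {
    var sorted = objects.slice().sort(function(a, b) {
        return (a.x + a.y) - (b.x + b.y);
    });
    return floorTiles.concat(sorted);
}
```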

Well, in any case, the point of this post is to show that tiles are trickier than I thought. They really don't generalize nicely, so you have to put a lot of thought into how to handle things once you allow for objects that don't fall exactly within grid boundaries.

Friday, September 14, 2007

Resolution Independence in Vista

I recently got a Thinkpad notebook that crams a 1400x1050 resolution onto a tiny screen. It's a great screen, but unfortunately everything ends up being a bit too small. If you have resolution independent applications, the increased resolution can be a real joy though because the text becomes so clear and easy on the eyes. Although previous versions of Windows had very limited support for resolution independence, things supposedly had improved quite a bit in Vista.

Unfortunately, things don't quite fit together that well. Just like earlier versions of Windows, you can increase the default font size in Vista. Unfortunately, this causes various glitches in things. For example, text no longer fits in the dialog boxes of Windows Media Player (I suspect that the Windows Media Player people must have written some custom dialog box code because I'm pretty sure the default dialog box code scaled things correctly even back during the Windows 95 days). And lots of fonts on web pages are specified in terms of pixel sizes, and their sizes don't get adjusted, so the text ends up being too small to see. Internet Explorer does have a zoom capability that redefines pixel sizes, but it also increases the size of the default font, which has already been increased, making things look too big.

Vista also has the option of simply scaling up all pixels to a larger size. Basically, Windows will fool legacy applications into thinking that the screen has a lower resolution, so that the applications render at a smaller size. This render is then scaled up to fill the larger resolution. Resolution independent applications, however, can render at the full resolution. Unfortunately, Adobe (Acrobat) Reader is considered a legacy application. Being able to read .pdf documents with crisper text is one of the main reasons I chose the high resolution screen, and you lose all those benefits if you render everything to a lower resolution and then scale it up. I imagine the same problem crops up in other applications as well. For example, Internet Explorer doesn't have strong support for vector drawing formats like SVG, so it's not possible to get high resolution graphics on web pages, so you have to live with scaled up versions of bitmaps instead.

Based on this experience, I imagine it will take another few years before it becomes practical to use Windows comfortably on a high resolution display. The new version of MacOS will supposedly have strong support for resolution independence, but I doubt that they've actually managed to solve the problem in a comprehensive manner. I think that doing resolution independence right probably requires a larger rethinking of UI widgets and layout managers (basically, it will be the same technology needed to lay out a single UI design on both tiny mobile device screens and large LCD monitors without lots of manual tweaking), but I haven't really heard much about research into that area by Apple, so I doubt they've hit on the final solution yet.

Saturday, February 17, 2007

Serializing Signals without Atomic Compare and Swap

So basically, the problem with event handlers in IE6 is that they might become interrupted by other event handlers, making synchronization difficult. What you want is a more traditional event-driven model where only one event handler can execute at a time. Essentially, you want a scheme that serializes signals so that the event handlers execute sequentially. Unfortunately, JavaScript comes with no synchronization primitives like atomic compare-and-swap, so doing this is messy.

So I think I've come up with a scheme that lets you wrap event handlers in a function that will test whether other event handlers are executing at the same time. If they are, it will add itself to a queue to be executed later. There are various annoying synchronization issues involved, though, so this is not as simple as it sounds.

Here's the pseudocode for the event wrapper function:


globals root, queue
if root == null
    root := true
    execute own handler
    root := null
    while !queue.empty()
        root := true
        queue.dispatch()
        root := null
else
    queue.add(self)
And here's the pseudocode for the queue methods:


queue.add(fn)
    globals start, end, isClaimed[], handlers[], queueLength
    myend := end
    while (true)
        if isClaimed[myend] != null
            myend++
            if myend >= queueLength
                myend := 0
            if myend == start
                error!
            continue
        isClaimed[myend] := true
        if handlers[myend] != null
            continue
        handlers[myend] := fn
        break
    end := myend

queue.empty()
    globals isClaimed[], start
    return isClaimed[start] == null

queue.dispatch()
    globals isClaimed[], handlers[], start, end, queueLength
    if isClaimed[start] != null
        myend := end
        if myend == start
            myend++
            if myend >= queueLength
                myend := 0
            end := myend
        execute handlers[start]
        handlers[start] := null
        isClaimed[start] := null
        start++
        if start >= queueLength
            start := 0
And here's how you could code it up in JavaScript. Basically, I define a handler function. You can then use the handler function to wrap your own event handlers before setting them to listen for events:


<script>
// Initialization script that creates the "handler" function for wrapping
// handlers
(function () {
    var SIZE = 100;                  // max number of simultaneous events
    var root = false;                // boolean, indicating whether the current
                                     // event handler is the first one in a
                                     // chain or not
    var queue = new Array(SIZE);     // Event handlers to be executed later
    var thisQueue = new Array(SIZE); // Corresponding this objects
    var isClaimed = new Array(SIZE); // reserves a spot in queue
    var start = 0;                   // position in queue circular buffer
    var end = 0;

    // Define the global handler wrapper function
    handler = function(eventHandler)
    {
        return function() {
            if (root == false) {
                root = true;
                eventHandler.call(this);
                root = false;
                while (isClaimed[start] != null)
                {
                    root = true;
                    if (isClaimed[start] != null) {
                        var myend = end;
                        if (myend == start) {
                            myend++;
                            if (myend >= SIZE) myend = 0;
                            end = myend;
                        }
                        queue[start].call(thisQueue[start]);
                        queue[start] = null;
                        thisQueue[start] = null;
                        isClaimed[start] = null;
                        start++;
                        if (start >= SIZE) start = 0;
                    }
                    root = false;
                }
            } else {
                var myend = end;
                while (true) {
                    if (isClaimed[myend] != null) {
                        myend++;
                        if (myend >= SIZE) myend = 0;
                        continue;
                    }
                    isClaimed[myend] = true;
                    if (queue[myend] != null) continue;
                    queue[myend] = eventHandler;
                    thisQueue[myend] = this;
                    break;
                }
                end = myend;
            }
        };
    };
})(); // Execute the initialization code

function loaded()
{
    alert('loaded ' + this.src);
}

var img1 = new Image();
var img2 = new Image();
img1.onload = handler(loaded);
img2.onload = handler(loaded);
img1.src = "test1.png";
img2.src = "test2.png";
alert('done');
</script>

Asynchronous Callbacks in IE6 JavaScript

Well, back in October, I noticed that Internet Explorer 6 had a few quirks in its usage of event handlers. JavaScript is single-threaded and does not contain any synchronization primitives. As a result, when JavaScript is embedded in a web browser, it is supposed to use an event-driven model: as events occur, they are put into a queue, and then a single thread removes events from the queue and executes handlers for them. Unfortunately, IE6 does not rigorously follow this model. Some of its callbacks follow a model used for interrupts or UNIX signals. Even if another piece of JavaScript code is being executed, the code can be interrupted by an asynchronous event, and the callback for that event will be run. When the callback ends, the original code will resume. Unlike signals or interrupts though, IE6 allows these callbacks to interrupt other callbacks.

So, if you put the following code in a web page and run it in IE6, you will get three different dialog boxes showing up simultaneously:

<script>
function loaded()
{
    alert('loaded ' + this.src);
}

var img1 = new Image();
var img2 = new Image();
img1.onload = loaded;
img2.onload = loaded;
img1.src = "test1.png";
img2.src = "test2.png";
alert('done');
</script>
I believe this problem was corrected in Internet Explorer 7 because if you run the same code in IE7, you only get one dialog box showing up at a time.

In any case, I think I came up with a solution for this problem back in November, but I never got around to testing it until now. I'll put the code up in my next post.

Wednesday, October 25, 2006

JavaScript in IE is not Multithreaded

After coding up my mutex and experimenting with it, I've discovered that Internet Explorer isn't multi-threaded. Instead, it has something worse. Much, much worse.

In any case, here's my mutex code:


function Mutex(maxThreads)
{
// This mutex is based on the paper "Tight Bounds for Shared Memory Symmetric
// Mutual Exclusion Problems" by Eugene Styer and Gary L. Peterson.
if (maxThreads < 2)
alert('Mutex is configured incorrectly');
this.waiting = new Array(maxThreads);
this.locks = new Array(maxThreads);
for (var n = 0; n < maxThreads; n++)
{
this.waiting[n] = null;
this.locks[n] = null;
}
this.maxThreads = maxThreads;
this.whoInCriticalSection = null;
this.turn = null;
this.get_visible = function(level, retry, me) { // This is private
if (level == -1)
{
this.waiting[0] = me;
level = 0;
retry = true;
}
else if (this.waiting[level] != me)
{
// Don't write if we can make progress
if (this.turn != me)
{
if (retry)
{
// Assume mistake, once
retry = false;
}
else
{
level = level + 1;
retry = true;
}
// Make visible again
this.waiting[level] = me;
}
}
return [level, retry];
}
this.enter = function() {
var me = new Object(); // generate a unique id for this thread
var level = -1;
var retry = false;
while (true)
{
// See if lock is available, or if it assigned to me
while (this.turn != null && this.turn != me)
{
var ret = this.get_visible(level, retry, me);
level = ret[0];
retry = ret[1];
}
this.turn = me;
var locked;
while (true)
{
// Try to get all locks
for (var pos = 0; pos < this.maxThreads; pos++)
{
if (this.locks[pos] == null)
this.locks[pos] = me;
}
// Do we have all the locks?
locked = true;
for (var pos = 0; pos < this.maxThreads; pos++)
{
if (this.locks[pos] != me)
locked = false;
}
if (this.turn != me || locked) break;
}
if (this.turn != me)
{
// Lost, release locks
for (var pos = 0; pos < this.maxThreads; pos++)
{
if (this.locks[pos] == me)
this.locks[pos] = null;
}
var ret = this.get_visible(level, retry, me);
level = ret[0];
retry = ret[1];
}
else
{
break;
}
}
// Acquired lock
if (level > -1 && this.waiting[level] == me)
this.waiting[level] = null;
this.whoInCriticalSection = me; // Save our id for later
}
this.leave = function() {
var me = this.whoInCriticalSection; // Get our saved id
this.whoInCriticalSection = null;
// Find the id of the top waiter, or null if nobody is waiting
var next = null;
for (var pos = this.maxThreads - 1; pos >= 0; pos--)
{
if (this.waiting[pos] != null) {
alert('handoff');
next = this.waiting[pos];
break;
}
}
// Let them continue, release locks
this.turn = next;
for (var pos = 0; pos < this.maxThreads; pos++)
{
if (this.locks[pos] == me)
this.locks[pos] = null;
}
}
}
So back to the original point. In the end, JavaScript in Internet Explorer isn't multi-threaded. There's only one thread, but it can be interrupted by callbacks or signals, and the thread won't regain control until after the signal/callback handler completes.

So if you look at this chunk of JavaScript code
var image = new Image();
image.onload = function() {
alert('a0');
alert('a1');
};
image.src = 'blank.gif';
alert('0');
alert('1');

you'll notice that the alerts fire in the order a0, a1, 0, and then 1. The onload callback does not give control back to the main thread until after it finishes showing both of its alerts. This means that mutexes and critical sections can't really be used in JavaScript: if code inside a critical section is interrupted, the interrupting callback has no way to yield control back so that the interrupted code can finish and release the lock. So basically, you can't use locks or critical sections or other standard programming techniques to protect important parts of code.

I don't know what the best way to fix this is. It's just messy...very, very messy.

Monday, October 23, 2006

JavaScript Mutexes 2

Actually, I started thinking some more about symmetric mutual exclusion, and I realized that I was mistaken. You don't actually need any randomization. Although in a real system the only way to generate unique thread ids would be to use randomization, JavaScript does have one operation that is guaranteed to generate unique keys--just use new to create a new Object, since new is atomic. You can then use this object as your thread id. And since it's an object, JavaScript will automatically garbage collect it once you're finished with it.
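Concretely, the trick is just this (nothing here beyond plain JavaScript object-identity semantics):

```javascript
// new Object() always yields a distinct object, and === compares
// object identity exactly, so objects work as collision-free ids.
var threadId1 = new Object();
var threadId2 = new Object();

// Two separately created ids can never be equal:
var distinct = (threadId1 !== threadId2);   // always true
var stable = (threadId1 === threadId1);     // an id always matches itself
```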

So, in conclusion, JavaScript mutexes are possible if you use a shared memory symmetric mutual exclusion algorithm and if you are able to put a bound on the number of threads that may attempt to enter a mutex at any one time. This is the best approach to the problem. There are no unacceptable trade-offs that need to be taken into account (beyond the whole messiness of needing to implement your own mutexes in JavaScript in the first place).

Mutual Exclusion in JavaScript

I started reading up on some papers, and I've found that the type of mutual exclusion being attempted by Wallace in JavaScript is actually impossible. Totally anonymous mutual exclusion using atomic RW registers cannot be done.

That leaves two choices:

1. One can hand out thread ids in advance (i.e. assign one to each possible callback) during a stage when one knows the code will have only one thread running

2. Use randomized mutual exclusion algorithms

Personally, I think option #1 is very hard to implement correctly, so option #2 is probably the most feasible. Basically, you randomly generate a thread id for yourself at any time (in fact, you can randomly generate a thread id every time you want to enter a mutex). Then, you can run a symmetric mutual exclusion protocol using atomic RW registers.
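For comparison, option #1 in its simplest form reduces to a classic pre-assigned-id algorithm like Peterson's two-thread lock. Here's a sketch for exactly two threads with fixed ids 0 and 1, which illustrates exactly the pre-assignment burden that makes option #1 unattractive:

```javascript
// Peterson's algorithm for two threads with pre-assigned ids 0 and 1,
// using only atomic reads and writes of shared variables.
var flag = [false, false];   // flag[i]: thread i wants the lock
var turn = 0;                // whose turn it is to wait

function lock(i) {
  var other = 1 - i;
  flag[i] = true;            // declare interest
  turn = other;              // politely let the other thread go first
  while (flag[other] && turn === other) {
    // busy-wait until the other thread is done or yields
  }
}

function unlock(i) {
  flag[i] = false;           // withdraw interest, releasing the lock
}
```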

I found a paper that seems to give one such algorithm:

"Tight Bounds for Shared Memory Symmetric Mutual Exclusion Problems" by Eugene Styer and Gary L. Peterson.

Thursday, October 19, 2006

JavaScript is Multi-Threaded

I was doing some JavaScript programming where I hooked a bunch of image onload() callbacks with functions that would generate alerts, when I noticed that various alerts were firing before I had finished hooking up all my callbacks. This caused me a lot of consternation. JavaScript does not come with any threading primitives. As such, it's very hard to do any synchronization. Yet clearly, my version of Internet Explorer was triggering these callbacks asynchronously, in threads other than my main thread, resulting in various race conditions in my code.

Firefox seems to use a more sensible model. It appears to have only one JavaScript thread (much like the single UI thread in most GUI systems), and asynchronous events queue up and only fire when your JavaScript thread isn't doing anything (of course, that brings up the separate problem that Firefox seems to discard events if too many get queued up--but at least that's easier to handle than race conditions).

I went looking around for various mutex code to solve this problem, but nothing seems quite right. Some Bruce Wallace person created a JavaScript version of a mutex, but I don't have much confidence in it. First of all, he replaced the fixed-size array from Lamport's Bakery algorithm with a Map. The whole point of the fixed-size array was that you could perform atomic reads and writes on it--a reasonable assumption. Once you use a Map, you're assuming atomic adds, atomic removes, and atomic iteration, which are much larger assumptions. In any case, if you had an atomic add operation, then you wouldn't need to implement the Bakery algorithm at all because you could use the atomic add to build a mutex directly. His assignment of thread ids also has a race condition in it. I suppose I could just assign a separate thread id to each callback in advance, thereby avoiding the race condition, but if I did that, then I wouldn't need this Map nonsense and could use a fixed-size array because I would know how many threads there are in advance.
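The point about fixed-size arrays is easier to see against the original algorithm. Here's a sketch of Lamport's Bakery algorithm with ids 0..N-1 assigned in advance (N is an assumed bound on the number of threads):

```javascript
var N = 4;                            // assumed maximum thread count
var choosing = [false, false, false, false];
var number = [0, 0, 0, 0];            // ticket numbers; 0 = not waiting

function bakeryLock(id) {
  // Take a ticket one higher than any outstanding ticket
  choosing[id] = true;
  var max = 0;
  for (var j = 0; j < N; j++)
    if (number[j] > max) max = number[j];
  number[id] = max + 1;
  choosing[id] = false;
  // Wait until every thread with a smaller ticket (ties broken by id)
  // has passed through the critical section
  for (var j = 0; j < N; j++) {
    while (choosing[j]) { /* j is still picking a ticket */ }
    while (number[j] !== 0 &&
           (number[j] < number[id] ||
            (number[j] === number[id] && j < id))) { /* busy-wait */ }
  }
}

function bakeryUnlock(id) {
  number[id] = 0;                     // discard our ticket
}
```

Every read and write here touches one array slot at a time, which is why the fixed-size array assumption matters: atomic reads and writes of single slots are all the algorithm needs.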

Instead, I simply reshaped my code so as to reduce the race condition down to a single atomic increment. It seems unsatisfactory though. Personally, I think JavaScript should not be multi-threaded, but if it is multi-threaded, it should at least supply something like atomic_add() and atomic_remove() methods on Array so that users have the proper primitives available for building synchronization primitives. And I'm not sure whether JavaScript engines are themselves thread-safe. I know that Mozilla's Rhino claims that it is thread-safe (I vaguely remember them stating that get and put operations were atomic), but I don't know about the engines in Firefox and IE.

Update 2011-3-31: I was going through my blog stats for the first time, and I noticed that people seem to read this post occasionally. This is an old blog post, and I explore the situation in more detail in later blog posts. Actually, JavaScript is NOT multi-threaded. In IE6, you do get some weird non-sequential behaviour, but that's due to nested asynchronous callbacks.

Thursday, May 18, 2006

Getting an SVG Bounding Box out of Batik

I was using CorelDRAW to create some GIFs using a script, but CorelDRAW 11 has a lousy GIF exporter, so I couldn't really get the output that I wanted. So, I thought I would export my CorelDRAW data to SVG, and use the Batik SVG libraries to export the data I wanted to GIF (using the GIF exporter I described in the previous post).

I did encounter a problem though in that CorelDRAW defined a page size on the SVG document, and the Batik transcoders always exported the entire page (instead of just the parts of the page that contained my image). If I removed the page information from the SVG document, then Batik would just export an image with a default size. What I needed was a way to get a bounding box on my image, which I could then pass to Batik as the export size.

When I searched on Google, all the information that I could find about creating bounding boxes involved creating UI objects to render into, which triggers Batik to calculate bounding box information. This seemed a little bit overly heavy to me, but by looking at how the Batik transcoders render their images, I was able to just reuse their rendering code to trigger a bounding box calculation without actually rendering to a UI.

The first step is to remove the page sizing information from the SVG document (otherwise, Batik will just use the page size as the bounding box). Just run this XSLT filter over it:


<xsl:stylesheet version="1.0"
    xmlns:svgns="http://www.w3.org/2000/svg"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

<!-- Remove width, height, and viewBox attributes (substitute different things in later on when we've calculated a bounding box) -->
<xsl:template match="svgns:svg/@height"></xsl:template>
<xsl:template match="svgns:svg/@width"></xsl:template>
<xsl:template match="svgns:svg/@viewBox"></xsl:template>

</xsl:stylesheet>

Then, you can take the resulting SVG code and feed it into Batik to calculate a bounding box like so:

// Parse the SVG file into an SVG DOM document
String parser = XMLResourceDescriptor.getXMLParserClassName();
SAXSVGDocumentFactory f = new SAXSVGDocumentFactory(parser);
SVGDocument doc = (SVGDocument)f.createDocument(svg);
// Building the GVT tree (the same thing the transcoders do before
// rendering) is enough to trigger the bounding box calculation
GVTBuilder builder = new GVTBuilder();
BridgeContext ctx = new BridgeContext(new UserAgentAdapter());
GraphicsNode gvtRoot = builder.build(ctx, doc);
return gvtRoot.getSensitiveBounds();

You can then use the returned information to set the "viewBox," "height," and "width" attributes on the svg tag of the SVG file. When you then run the transformed SVG file through Batik, Batik will export the image with the bounding box set tightly around the contents.

-Ming

Exporting Transparent GIFs from Java

After spending a while trying to create paletted PNG files with transparency that work in Internet Explorer (after much trying, I couldn't coax ImageMagick to create such a file, though I was able to use GIMP to do it), I decided that GIFs were the only practical file format to be used. But it's hard to get a Java program to output GIF files.

Because of the whole patent licensing issue, there aren't really any good, full-featured, unencumbered GIF exporters for Java. But there are various good bits and pieces on the 'net, which can be pieced together to make something reasonable.

First, you need some code that can actually encode the GIF file. The US government has some code that can do this (offered as part of NIH's ImageJ). All you have to do is download ImageJ and grab their ij.io.GifEncoder class.

The ImageJ code only accepts paletted data as input though. Since most images are in 24-bit colour, you need to perform a colour reduction somehow. Fortunately, someone went and took the colour quantizer used in ImageMagick and rewrote it for use in Java. It uses an octree for its colour quantization, which is a technique described in much detail elsewhere on the Internet. The quantization code needs to be reworked a bit to properly handle transparent pixels, but fixing that isn't too much work. A more important fix though is that the code performs an unnecessary optimization to save memory, which results in poor quality quantization. In the constructor for Cube, it sets the tree depth to log_4 of max_colors. Instead, the tree depth should just be set to the maximum depth of 8. Although the original ImageMagick did limit the tree depth to log_4 of max_colors, they used another optimization elsewhere in the code to reduce the effect of that limitation.

Once you combine these two pieces of code together (you have to change some of the APIs to handle transparency and other things), then you end up with a very reasonable GIF exporter.

Monday, March 13, 2006

Firefox Should Stop Being So Smug About Its CSS Support

Because, honestly, Firefox is missing some fairly major CSS features. Notably, it's missing support for "inline-block." This problem has been known since 1999 (7 years ago!) and is listed as bug 9458, but the Mozilla people haven't been particularly inclined to fix it.

I think the problem is fairly important because it's the one CSS feature that allows for arbitrary nesting of inlines and blocks. Usually, you can put inlines in blocks and blocks in other blocks. Inline-block is the feature that lets you put blocks in inlines. This opens up a certain class of layouts that aren't otherwise possible.

Reading the bug description in Bugzilla, it seems like Mozilla's Gecko rendering engine is fundamentally flawed and can't really handle this feature without a major rewrite. Internet Explorer handles inline-block well, so, in my opinion, IE has better CSS compliance than Firefox here.

Tuesday, December 06, 2005

Jakarta Commons CLI 1.0 Is Not Very Good

I wanted to do some quick command-line argument parsing in Java, so I grabbed the CLI component from Jakarta Commons and gave it a whirl. Unfortunately, this component is not very good. It comes in a jar of about 28KB, which is not large but not that small either. The resulting command-line parser is able to identify various command-line arguments, but it isn't able to do more advanced parsing such as automatically converting arguments to integers (like jargs). It also isn't able to make use of Java annotation coolness (like args4j). It has a nice feature for automatically generating help messages, but the generated option list doesn't really match what the parsers will accept. For example, the help message will say that "-gui" is a valid option, but when using the PosixParser, the parser kept misreading "-gui" as "-g" and rejecting it for some reason. I had to switch to the BasicParser to get it to do something reasonable.

Overall, it's nothing that I can't live with, but I was hoping for something a little better from Jakarta Commons. I think they should junk the multiple parsers, and just focus on a single more full-featured one.

Thursday, November 17, 2005

Recording with Java Sound on the Macintosh

I was trying out my little Java VoIP application with a friend of mine who owns a Macintosh, and it didn't quite work right. Besides the whole dubious Java 5.0 support for the Macintosh, I couldn't open a TargetDataLine at 8000 Hz.

On PCs, you can record and playback audio at arbitrary sample rates. That's not the case on the Macintosh. After getting another friend to do some probing, I found that the Java sound system on Macintoshes can playback at arbitrary sample rates, but can only record at 44100 Hz. You can record at 8/16-bit depths, with (un)signed types, and at different endianness though, I think.

Saturday, November 12, 2005

VoIP in Computer Games

About a year and a half ago, I started to become curious as to why there were so few computer games with VoIP support. It should be relatively easy, shouldn't it? You just grab an open source sound codec like Speex, throw in some sound code, add some network packet buffering code, mix, and voila--VoIP support. Unfortunately, I became distracted by something else and never got around to looking at the issue more deeply.

Just recently, I became curious again, so I decided to code up a small VoIP application. And, as expected, it did end up being fairly easy. I have to admit that I did waste a week going down the wrong path--I configured my sound card wrongly, resulting in a lot of feedback in the signal, so I spent a week examining algorithms for echo and feedback cancellation. But once I figured out what I was doing wrong, I was able to bang out a working little VoIP application in about a week.
The only part that wasn't really obvious (to someone who's been thinking about the problem for the past year) was the packet buffering code. Conceptually, it should be easy: you just store incoming packets somewhere and wait a few milliseconds before using them, to compensate for packets coming in out of order and with jittered delay. But that leaves a lot of unresolved issues, such as what to do if you use up all the packets waiting for you. You can skip a packet (i.e. assume that it was lost) or skip a "heartbeat" (i.e. assume that you're consuming packets too quickly and wait for the packet to arrive). And what happens if you receive too many packets?

The scheme that I ended up using was to trigger everything based on the difference between the highest packet sequence number in the packet buffer and the next packet sequence number that the program is expecting. So whenever my program tries to grab the next packet, if the difference in sequence numbers is below a certain threshold, then I skip a heartbeat; otherwise, I grab the next packet, skipping it if it hasn't arrived yet. If the difference in sequence numbers ever gets above a certain threshold, then I advance the expected sequence number to get below the threshold. I also had different thresholds for deciding when to adjust for clock skew. Oddly enough, my sound card seemed to play back audio at a different sample rate than what I specified for some reason (it could quite well be my fault), so I put in some code for stretching incoming sound samples to take up more or less time. All of these thresholds should probably be adaptive somehow, but I couldn't be bothered, so I just set them to some default values.
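A sketch of that policy in code (the class name, the buffer representation, and the threshold values are all made up for illustration):

```javascript
// Jitter buffer driven by the gap between the highest sequence number
// received and the next sequence number we expect to play.
function JitterBuffer() {
  this.packets = {};        // sequence number -> packet payload
  this.expected = 0;        // next sequence number we want to play
  this.highest = -1;        // highest sequence number seen so far
  this.minLead = 2;         // below this gap: skip a heartbeat
  this.maxLead = 10;        // above this gap: jump forward
}

JitterBuffer.prototype.put = function (seq, payload) {
  this.packets[seq] = payload;
  if (seq > this.highest) this.highest = seq;
};

// Returns the next payload, null for a skipped (lost) packet, or
// undefined to signal "skip a heartbeat and wait for more packets".
JitterBuffer.prototype.next = function () {
  var lead = this.highest - this.expected;
  if (lead < this.minLead) return undefined;      // buffer too empty: wait
  if (lead > this.maxLead)                        // buffer too full: jump
    this.expected = this.highest - this.maxLead;
  var payload = this.packets[this.expected];
  delete this.packets[this.expected];
  this.expected++;
  return (payload === undefined) ? null : payload;
};
```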

Other than that, everything was pretty straightforward. In Windows Java, the System.currentTimeMillis() method is very low-resolution, which caused my server to alternate between thinking it was very fast or very slow. Fortunately, newer versions of Java have a nanoTime() method, which provides a better high-resolution timer. I also had a bit of a problem with silence detection because my sound card didn't use 0 as a baseline. Sound samples seemed to center around -100 or so. I was able to get around this by taking the max and min sound samples of every chunk of sound, averaging them, and then exponentially averaging that with my previous estimate of what the baseline was. This worked as an adaptive baseline. I'm not sure how to build an adaptive silence detection algorithm on top of that though, so I put a static algorithm there instead.
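The adaptive baseline part looks roughly like this (the smoothing weight is an assumed value, not the one I actually used):

```javascript
// Estimate the DC baseline of a signal: average the max and min of
// each chunk of samples, then fold that into a running exponential
// average so the estimate adapts over time.
var baseline = 0;
var smoothing = 0.9;   // assumed exponential-averaging weight

function updateBaseline(chunk) {
  var max = chunk[0], min = chunk[0];
  for (var i = 1; i < chunk.length; i++) {
    if (chunk[i] > max) max = chunk[i];
    if (chunk[i] < min) min = chunk[i];
  }
  var center = (max + min) / 2;      // midpoint of this chunk
  baseline = smoothing * baseline + (1 - smoothing) * center;
  return baseline;
}
```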

So the conclusion of all this is that VoIP support isn't included in more computer games due to laziness on the part of developers. It really is very easy to do. There may be some latency issues where computer game code is stealing so many CPU cycles that the VoIP code can't get CPU resources at regular enough intervals that it causes latency, but I'm sure it's something that game developers could work out with thread priorities and stuff like that.

Wednesday, September 14, 2005

Adding a Little Colour to Sine Waves

I've never really been much of an audio creation buff, but when I heard that Propellerhead was releasing its ReBirth audio program for free, I quickly downloaded it because I've always been curious as to what all the fuss was about. ReBirth is apparently an emulator of the TB-303, a piece of techno hardware from the 80s that could be used to generate weird instrument sounds and sequence those sounds into some simple tracks. It also emulates some other pieces of hardware (namely some rhythm boxes). I actually found ReBirth to be really difficult to use and completely confusing, so I uninstalled it.

Fortunately, I also downloaded the free Rubberduck program from d-lusion, and it was quite fun and easy to use (not that I have sufficient talent to get the program to do anything that sounds good, but at least I can make those crappy sounds easily as opposed to with ReBirth). After playing with it for a while, I noticed that if I turned off all the filtering effects, I could still make some nice instrument sounds just by combining sine waves and square waves.

This was significant to me because I occasionally create little Java games, but it's hard to get any music into those games. Java includes support for MIDI output, and the full Java VM even comes with Roland sound patches for its instruments, but the existence of these patches isn't guaranteed, so you can't rely on them for game music. Instead, you have to supply game music via sound files, but then you need to generate these sound files somehow. I purchased some cheap music composition software, but it creates sound files by playing the music out through the computer sound card and recording it. Since my sound card is really cheap, the resulting sound files end up containing much too much noise to be useful. There are open-source programs that can take MIDI songs and output sound files directly, meaning you don't get any noise, but these programs all use sound patch sets of questionable legality, so I didn't want to use them. But if I could just make my own instrument sounds by combining some sine waves and square waves, then I could just create a program for converting MIDI songs into sound files myself without the problems of noise or legal encumbrances.

Unfortunately, once I wrote a program for combining sine waves and square waves in various ways, I realized that the sounds that my program was generating were puny and weak. Even the simple sine wave sound generated by d-lusion's Rubberduck was much fuller and colourful than the sine wave generated by my program.

The documentation for d-lusion's Rubberduck said that they pass their sound through a low-pass filter, but I thought I had turned all the filtering parameters off. Besides, a low-pass filter shouldn't really affect a sine wave. In any case, I'm really weak with signal processing, so I couldn't really design such a filter anyways.

Well, anyways, I was looking over the "Cookbook formulae for audio EQ biquad filter coefficients" document by Robert Bristow-Johnson, which is, essentially, the dummies' guide to creating various filters using digital feedback mechanisms. When I looked at what Rubberduck did to square waves (it rounded out the corners), I suspected that maybe I could just feed the signal back into itself and get a similar effect.

So that's what I did:

y[n] = 0.8 * y[n-1] + 0.2 * x[n]

where x[n] is a sine or square wave and y[n] is the final sound output, and the result is exactly what I wanted. Whereas a simple sine wave is teeny and annoying, when you pass it through this filtering, you get a richer, rounder, more bloated sound. Of course, when you look at the raw wave form, it still looks like a sine wave, but I suspect that it's slightly reshaped so that you get some interesting side frequencies going on. I tried doing a spectral analysis, but I couldn't really interpret it (it looked consistent with a sine wave to me), but it sounds a lot different, so there must be some sort of subtle colouring there.
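To make that concrete, here is the filter applied to a generated sine wave (the sample rate and pitch are arbitrary choices for illustration):

```javascript
// Generate a 440 Hz sine wave and run it through the one-pole
// feedback filter y[n] = 0.8*y[n-1] + 0.2*x[n].
var sampleRate = 44100;
var freq = 440;
var x = [];
for (var n = 0; n < 1000; n++)
  x.push(Math.sin(2 * Math.PI * freq * n / sampleRate));

var y = [0.2 * x[0]];            // no previous output for the first sample
for (var n = 1; n < x.length; n++)
  y.push(0.8 * y[n - 1] + 0.2 * x[n]);
```

One caveat worth noting: on a perfectly steady sine wave, a linear filter like this can only scale the amplitude and shift the phase, which would match the sine-like spectral analysis I saw. My guess is that the audible difference comes from how the filter softens note attacks (and, on square waves, rounds off the harmonics).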

Of course, to get really interesting sounds, you need adjustable filters that allow you to get various cool effects, but I think I now have the pieces of code I need for making blip and bop noises for my Java games.

Monday, July 25, 2005

Computer Generation of Road Networks

For the past couple of days, I've been thinking of how to get a computer to automatically generate a city road network. I really have no idea what the best way to approach this is.

Doing a quick search through the Internet, there are computer techniques for automatically generating terrain, for simulating the growth and sprawl of cities, for simulating the growth of small neighbourhoods, and for simulating traffic on a road network, but only a few simulations for generating accurate road networks themselves. In fact, the simulations that do exist seem to focus on simulating lot use with the expectation that the road network just sort of "falls out."

I think the difficulty is that there isn't that much scientific use for simulating the development of road networks, so not a lot of research has gone into the area. Of course, the people who do research on urban sprawl really should do more research into road networks too, because urban sprawl tends to be dependent on the location of existing highways, etc. The problem is also hard because road networks tend to be highly planned and need to take a lot of factors such as geography and cost into account during their evolution, so it's hard to get a computer to randomly generate this type of data.

Still, I wonder what it takes to build a road network generator. I imagine that an urban growth simulation model might actually be the wrong approach because it'll end up being too complex. I think a top-down approach that lays out some highways, throws down a coast-hugging road network near water, builds a grid patterned downtown, and a squiggly suburb area might actually be the easiest approach. Hmm.

Friday, June 03, 2005

Theoretical Underpinnings for Kids' Programming Tools 2

Y'know, yesterday I was fretting about the fact that without strong theoretical underpinnings, my programming website for kids would never become a useful tool. But there's no reason for me to have such a crisis of confidence because, honestly, which of these people is most likely to make the best programming tool for kids:

A PhD from MIT who has been professionaly developing his patented educational programs for more than 10 years.

A Turing award winner who has spent decades studying user interactions with programming languages.

Me.

Obviously, my approach will be the best. :-)

Teaching about Loops

So I finally finished most of the artwork for the "if" section of my programming website for kids. It seems that the Java VM has improved a bit since I last used it. I'm finally getting consistent synchronization behaviour between my applet running in a web browser and my applet running stand-alone. And Swing now seems to support automatic text cut&paste with Windows applications (sigh, all of this could have been avoided if they used native widgets!).

And I'm now starting the section on loops. I've spent a few months thinking about this topic, but I'm still not sure about what the best approach is.

In traditional programming books about BASIC, the loop section would talk about FOR...NEXT loops:

FOR n=1 to 5
REM Stuff that happens 5 times
NEXT n

Unfortunately, the JavaScript for loop is a little bit cryptic (especially since I haven't covered increment operators, the less-than operator, or the greater-than operator). Additionally, the convention is that loops start counting at 0, but people are more comfortable with loops that start at 1 (like in BASIC). However, if I use 1-based loops throughout my examples, people might get confused when they later start working with normal code.

I suppose I could treat the syntax of the "for" loop as gobbledygook that should be treated as a black-box. This then reduces the number of concepts that need to be understood to work with loops.

Another alternative is that I could start with while loops. The advantage is that the word "while" actually means something, so it might be easier to understand. Unfortunately, there isn't all that much that one can do with a while loop without understanding the concept of sentry variables, which involves teaching 2 or 3 abstract concepts all at the same time.

I could teach the concept of the infinite loops first via "while(true)," but assuming that I don't teach boolean algebra beforehand (which I don't), then the whole command would need to be treated as a black-box, defeating the whole purpose of starting with the "while" loop in the first place. Then, I could follow that by teaching the concept of using "break" to break out of loops. This seems a little odd to me because no one else teaches loops in this way. Perhaps the "breaking out of loops" concept is simply too abstract a concept. But it does seem like a lot less work than discussing sentry variables and such.
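For concreteness, the two candidate shapes look like this side by side (a sketch, not actual lesson code from the site):

```javascript
// A BASIC-style counting loop written as a 1-based "for" loop:
var forCount = 0;
for (var n = 1; n <= 5; n++) {
  forCount = forCount + 1;      // stuff that happens 5 times
}

// The same five repetitions as an "infinite loop plus break":
var whileCount = 0;
while (true) {
  whileCount = whileCount + 1;  // stuff that happens 5 times
  if (whileCount == 5) break;   // escape the "infinite" loop
}
```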

Well, I don't know. For the past few months, I've been keen on the idea of starting with the while loop, but now I'm starting to come around to the idea of starting with infinite for loops. I'll have to ponder this some more.

Thursday, June 02, 2005

Theoretical Underpinnings for Kids' Programming Tools

During the weekend, I happened to come across a cheap scanner for sale, and I grabbed one so I would have the tools to start working on my programmingbasics.org website again. This then inspired me to take another look at the tools available for teaching kids to program. It looks like the same stuff as always--Squeak, ToonTalk, and Logo. Of course, all of these tools have strong theoretical underpinnings describing interaction models, learning processes, etc., whereas my website simply focuses on recreating the programming environment available to me when I was a kid. I'm no longer confident that I'm taking the right approach. Will my website (assuming that I ever finish it) have the correct foundation for teaching kids to program? Should I take a more holistic approach to the site and try to provide a complete environment for "experimentation" as opposed to creating something that focuses solely on programming? Hmm. Perhaps I'm being naive. Kids today associate computers with great graphics and sound. Creating an environment that focuses on programming and doesn't allow kids to immediately generate flashy animations might lose their interest too quickly.

When I learned programming, all computers were textual. In fact, I didn't even own a computer. I was fascinated by flowcharts and wrote my first computer program out on paper by hand. The books available for teaching kids to program focused on real languages and on real syntax issues. And that's what I wanted to learn when I read those books. That's why my website is based on Javascript and focuses on syntax. But maybe those old programming books for kids died out for a reason, and maybe that reason was that kids aren't interested in learning programming that way any more. I guess I'll have to think about it.

FBX File Format

Previously, I've complained about the U3D file format for having such an obscure encoding scheme that I felt it was impossible to read a U3D file without using the unreleased Intel U3D libraries that everyone else apparently uses to read and write this file format.

But after looking at some other 3d file formats, perhaps this isn't so unusual. For example, Kaydara's FBX file interchange format (now owned by Alias) doesn't even have a documented encoding format. As far as I can tell, the only way to read an FBX file is to use the provided SDK libraries. Personally, I can't imagine how you could call something an interchange file format if you don't actually describe what the file format is. But maybe I'm just not used to how things work with 3d graphics.

Oddly enough, the only light-weight 3d file format that seems to be well-documented is Microsoft's DirectX file format. Unfortunately, it's apparently no longer under active development.

Wednesday, April 27, 2005

Fedora Core 3 and User-Mode Linux

Previously, I made a Fedora Core 3 disk image for User-Mode Linux by simply copying the disk image of a running system into a loopback disk image and starting User-Mode Linux using that disk image. That works, but I later found that some of my services didn't start up correctly in UML.

It seems that MySQL doesn't work with current versions of User-Mode Linux based on 2.6 kernels. Apparently, UML doesn't implement some of the new threading features of 2.6 kernels, which causes various problems. To avoid that problem, you have to move aside the libraries that use these new threading features. I found a forum post that listed this fix

mv /lib/tls /lib/tls.backup

that prevents the libraries that make use of the new threading features from being found. While you're at it, it's probably also a good idea to disable the SELinux features because they haven't been designed to be easily configurable in Fedora Core 3 anyways. Just modify /etc/selinux/config so that SELINUX=disabled (instead of "enforcing").

I also had problems with postgresql, but that was because I forgot to make /tmp world-writable. Unfortunately, my postgresql didn't seem to log anything, so I couldn't really diagnose the problem. I tried to play with the boot commands a bit to get it to log something, but in the end, I found it was easiest to simply run the postgres postmaster from the command-line with

su postgres
postmaster -D /var/lib/pgsql/data

Currently, my Fedora Core 3 image for User-Mode Linux seems to be running fine. httpd does seem to crash the UML kernel when it exits, but I can live with that for now.

Sunday, April 17, 2005

U3D is Half-Baked

So during the weekend, I decided to try writing a U3D file viewer. I had high hopes for U3D because U3D seemed to avoid many of the weaknesses of the competing X3D specification. After working with U3D for a bit though, I've concluded that the U3D specification is half-baked. It's as if the 33 members of the 3D Industry Forum performed almost no due diligence in reviewing the specification. Personally, I think the spec is unworkable and unimplementable. This is a real shame because Adobe went and added U3D support to Acrobat, meaning that a lot of marketing muscle and mindshare has just been put behind a losing proposition. Even if a better spec were designed in the future, since it wouldn't be supported in Acrobat like U3D, it would likely fail.

I can ignore minor annoyances, like the fact that the spec itself isn't well-written: it treats things like arrays in several different ways throughout, and it goes out of its way to avoid simplifying the descriptions of the fields.

But the killer of the spec is its ridiculously overbearing bit encoding scheme. I just cut & paste the code from the specification and translated it to Java, and it's just awful. The code needs over 30 lines just to read an uncompressed 8-bit integer. And the encoding/compression scheme has some characteristics that simply ruin the rest of the specification.

For example, the compression scheme sometimes requires you to feed in parameters of live data structures in order to read subsequent data. This means that you can't simply read the data into data structures and then process it later. You literally have to create a live mesh data structure, manipulate the mesh as data is being read in, and feed the resulting properties of the mesh back into the compression scheme in order to read the next bit of data.

And the encoding scheme has this weird property whereby previous data values read can influence future data values read. So even though the file format is supposedly tag/block-based (meaning that you should be able to process only the types of data that you are interested in and skip the rest), you still need to read, process, and understand every single bit of data in a file in order to read later bits of the file because otherwise the decoding state won't be properly configured when you get there. This even holds true for uncompressed data, which is just wonky. I wouldn't have believed it if I couldn't see it right there in the code that I cut & paste from the spec.
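To illustrate the problem, here's a toy sketch of my own devising (this is NOT the actual U3D compression scheme, just the same kind of statefulness): a decoder whose internal context is updated by every value it reads. Skipping even one value leaves the context wrong for everything that follows, which is exactly why a supposedly block-based file can't actually be skimmed block by block:

```python
class ToyContextDecoder:
    """Toy decoder (not the real U3D scheme): every value read updates the
    decoder's context, so later values can't be decoded without reading and
    processing all earlier ones -- even ones you don't care about."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.context = 0

    def read_u8(self):
        raw = self.data[self.pos]
        self.pos += 1
        value = raw ^ (self.context & 0xFF)    # decoding depends on history
        self.context = (self.context * 31 + value) & 0xFFFF
        return value


def toy_encode(values):
    # Matching encoder (values must be 0-255), so the round trip
    # can be demonstrated.
    context = 0
    out = []
    for value in values:
        out.append(value ^ (context & 0xFF))
        context = (context * 31 + value) & 0xFFFF
    return bytes(out)
```

A reader that skipped the first value would misread every subsequent one, since its context would never match the encoder's.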

After spending 5+ hours trying to read in a CLOD Progressive Mesh Continuation Block from a file and failing miserably, I can only conclude that it's simply not worth the trouble to implement. I am now very suspicious about why the 3DIF still hasn't released a reference encoder/decoder three months after the release of the specification. In any case, as far as I'm concerned, U3D is a write-off. It might gain a little bit of mindshare because of its Acrobat support, but I think it will eventually die off because the only way to create U3D files is to use the rumoured Intel SDK (since it's essentially impossible to write one's own encoder), and most 3D tools developers will decide to ignore it.

Friday, April 15, 2005

Some First Impressions on the U3D Format

I haven't actually examined any U3D files yet, but I did a quick skim through the specification, and I have a couple of first impressions.

The first thing I noticed was that they put a "file size" field in the file format. Personally, I think no self-respecting file format includes a "file size" field. The inclusion of such a field makes it impossible to stream data to stdout. Instead, you have to write out all the data, then rewind all the way back to the beginning of the file, just to fill in that field. It would be different if the field served some sort of useful purpose, but I imagine all U3D file readers will just ignore it.
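As a sketch of the annoyance (the header layout here is made up for illustration; it is not the real U3D block structure), a writer has to emit a placeholder and then seek backwards to patch it--precisely the thing you can't do when writing to a pipe or stdout:

```python
import struct

def write_file_with_size_field(path, payload):
    # Hypothetical layout: 4-byte magic, 4-byte little-endian file size,
    # then the payload.  The size isn't known until everything is written,
    # so we write a placeholder and patch it afterwards.
    f = open(path, 'wb')
    f.write(b'U3D\0')
    size_offset = f.tell()
    f.write(struct.pack('<I', 0))        # placeholder for the size field
    f.write(payload)
    total_size = f.tell()
    f.seek(size_offset)                  # impossible on a non-seekable stream
    f.write(struct.pack('<I', total_size))
    f.close()
    return total_size
```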

Then, the basic geometry type of the file format is the Continuous-Level-of-Detail (CLOD) triangle mesh. This is a little unusual because I thought the consensus in the 3d graphics community was that no one actually uses this stuff (of course, not being in the 3d graphics community myself, I can't be 100% sure, but that's the impression I get from reading papers). The main uses of CLOD meshes are for progressively refining 3d models during data streaming and for animation. I think the Internet experience has been that progressively refining anything during data streaming just isn't worth the trouble. For example, who uses the "interlace" option on their PNGs and GIFs? And properly getting the synchronization right on browsers so that they can handle the rendering of partially downloaded files is just a big mess. Just look at the old Java image APIs that supported these features: they were almost unbearable to use. It's easiest to simply use less-detailed thumbnails/models. And I think CLOD meshes aren't used for animation either. If refined versions of CLOD meshes are computed by the microprocessor, then they have to be transferred to the video card for rendering, which is a waste of video bandwidth. It may be possible to do the refining of CLOD meshes on the video card, but I think no one does this because a) it's hard, b) CLOD meshes take up a lot of memory, c) refining CLOD meshes isn't easily parallelizable and exhibits poor memory access behaviour, and d) you can get nicer looking low-polygon models by editing them by hand and using bump-maps and stuff. I suspect that the only reason CLOD meshes were included in the specification is that Intel wanted them there, but I would have thought that the rest of the 3DIF members would have convinced Intel to leave the feature out for the first version and only include it in version 2.0 (and then decide never to release a version 2.0 of the specification). I think it's possible not to use the CLOD mesh features though.

The U3D file format also uses a large variety of data types. I guess it isn't a real problem, but in this age where bandwidth is cheap (and things can be compressed fairly efficiently anyways), I think it makes more sense to have a couple of floating point types, an integer type, a byte type, and that's it. I haven't really examined what all the different types are used for though--there might be a good reason.

And the "U3D" way of doing things means that most resources are given string names. This seems a little inefficient (especially if you use something like UTF-16 encoding). It would seem to me that named resources would be more the exception than the rule, so it would make sense to have most resources be unnamed, and provide an optional naming interface for those resources that do, in fact, need names so that they can be accessed externally.

Also, the file format uses some weird lossless compression scheme. If the compression scheme were decent (like, say, the lossless compression that comes with jpg, mpg, and png), then I guess it might be worthwhile, but it looks like it's possible to get even better compression just by feeding the data into a zip compressor, so I'm surprised they even bothered.

I haven't really dug all that deep into the U3D file format yet though, so until then, I can't really say anything intelligent about the standard. But hopefully, I'll get around to doing that in the future.

Finding U3D Files

For the past few weeks, I've been wanting to play with the U3D file format a bit. Unfortunately, no actual U3D files are available on the web. There are U3D files embedded in pdf documents that can be viewed with Acrobat Reader, but if you want to look at an actual U3D file, you're out of luck. Intel apparently has an SDK for creating and reading U3D files, but they haven't released it to the public yet, so the most widespread way for people to get an actual working U3D file right now is to get a trial version of commercial 3d file format conversion software (which has Intel's U3D SDK embedded in it) and to convert some existing files over to U3D.

I tried e-mailing the 3DIF, asking them to put up a plain U3D file somewhere so that I could take a look at it, but they never bothered replying after a week. Fortunately, after a little bit of research, I found a utility called Multivalent. This utility comes with a tool called Uncompress, which is able to take a pdf document and uncompress all the files attached inside. Of course, afterwards, you still have a pdf document with U3D files embedded inside, but since the U3D data is now uncompressed, it's possible to examine the raw U3D data.

Since all U3D files start with the string "U3D\0", you can just discard all of the pdf data that comes before that string, and everything after that is a valid U3D file (I presume). I used a little Python script like this to discard the initial useless pdf data:

f = open('file.pdf', 'rb')    # binary mode: pdf data isn't text
data = f.read()
f.close()
# The first "U3D" is part of the surrounding pdf structure;
# the actual U3D data starts at the second occurrence
idx = data.index(b'U3D', data.index(b'U3D') + 1)
data = data[idx:]
f = open('file.u3d', 'wb')
f.write(data)
f.close()

Saturday, April 02, 2005

Maybe X3D Isn't Such a Bad Spec After All

After previously denouncing the X3D specification, I decided to take a look at the competing U3D specification. I skimmed through the specification trying to think about what would be required to implement a U3D reader (the spec mandates a binary encoding with special compression) when I realized that there aren't any U3D files available for testing U3D implementations. Nowhere. Except when they're embedded in pdf documents. Perhaps one has to be a member of the 3DIF to get access to them, but the problems I have with the X3D specification pale in comparison to the no-no of developing a specification that uses a complicated encoding without providing an example file anywhere. Well, I e-mailed the 3DIF. Maybe I was just looking in the wrong places.

3D Tools for the Casual 3D Artist

Every few months, I get the urge to learn how to do rudimentary 3d art, so I do a quick tour of 3d tools available for casual users, and I always come back disappointed. For casual users, it's important that tools be cheap, not require users to have artistic talent in order to create simple objects, and be easy to use.

Immediately, the big, mainstream 3d modelling packages can be ruled out because they're too expensive. Of course, it's possible to use illegal copies of these programs, but I prefer not to do that. Then there are the mid-tier tools, but I'm always hesitant to shell out several hundred dollars for little-known tools from little-known companies that may or may not be any good.

This pretty much leaves low-cost and open source tools. The tool with the most mindshare currently is Blender. Unfortunately, Blender is a good example of a software application where little or no attention has been paid to the user interface. If you read the development docs, this was an intentional decision on the part of the developers. For example, up until a couple of months ago, Blender proudly touted the fact that it had no undo functionality (telling you to save often instead). Personally, I'm tempted to use terms like "amateur-hour" here, but honestly, if I were a professional 3d artist who had just shelled out thousands of dollars for this tool, I would probably be ok with learning all of the weird UI quirks and hotkeys needed to be productive with the tool. Since I'm not a professional 3d artist and the tool is free, it's probably unfair of me to apply my standards to the tool. I have made a few honest attempts at trying to learn Blender, but all the hotkeys made my head spin, and it seemed to crash quite often for me. Plus, based on the tutorials, it seems like users need powerful 3d visualisation skills to make effective use of Blender--skills that I don't possess. There's no way I can think in terms of "take a sphere, extrude this section while rotating it like so, perform a lathe on that object, and then edit the mesh." At best, I can look at a 3d object and play with the vertices a bit to get the shape that I'm looking for.

Milkshape3D is another popular 3d tool, and I've been tempted on many occasions to shell out the cash to go buy it. Milkshape3D is nice because it does only a few small things, but it does those things reasonably well. Unfortunately, every time I'm about to mail away my money to Switzerland, I start to think about the tedium of clicking in multiple windows every time I want to add a single point to the model, and how I always get confused when I have to keep looking at different windows to know where I am, and I decide to wait for something better. I'm probably being hopelessly optimistic because manipulating 3d geometry on a 2d screen with a 2d input system (the mouse) is, by definition, hopelessly complex. But still, when I've used scaled-back versions of professional tools (e.g. GMAX), they always provide nice facilities for manipulating polygon vertices without much hassle, so maybe that's what I'm looking for.

Art of Illusion seems to be reasonably powerful, but I think its 3d rendering isn't hardware accelerated, so it always seems a little sluggish to me. Plus, I can never figure out what the icons mean, and the tooltips take too long to show up, so I spend much of my time holding my mouse pointer over icons to figure out which one does what I want.

Recently, I tried Wings 3D, and I really liked it. It's possible to perform lots of complex operations with complete ease. Pretty much all of the UI is self-explanatory and mouseable. The main advantage of the UI is that since everything is explained on-screen, you only have to memorize five things to be productive--namely what G, L, Q, space bar, and the middle mouse button do. With everything else, all you need is a general idea of what you want to do, and you'll probably be able to find it in the menus somewhere. The only problem with Wings 3d is that it doesn't work with many graphics cards. I tried it on an ATI Rage card, and an Intel integrated Extreme Graphics, and the UIs ended up being too horribly mangled to be usable. Finally, on an ATI 7500, the UI still didn't quite render correctly, but it was enough to be usable, and it was a joy to use. I was tempted to dig into the code and see if I could find any workarounds for my problems, but Wings 3D is coded in Erlang, so...yeah.

So in conclusion, if you want to dip your toe into 3d graphics on the cheap, try Wings 3D, and if it doesn't work, give up. Of course, I may be the only person who finds xfig to be an easy-to-learn program that one can quickly become productive in, so perhaps my experiences may differ from that of others.

Wednesday, March 30, 2005

Creating Fedora Core 3 Disk Images for User Mode Linux

So, for the past couple of weeks I've been working on creating a Fedora Core 3 disk image for UML. In the end, the process ended up being quite straightforward, but in a case of "a little bit of knowledge can be a dangerous thing," I kept screwing it up.

Pretty much, all you have to do is to create an ext2 loopback disk image as suggested in the UML docs, and then copy a running Fedora Core system (using cp -r -d --parents --preserve=all or whatever) into it (obviously, you should avoid copying the disk image into itself). That's pretty much it. Labeling the disk image as /1 doesn't seem to be enough to get UML to mount the disk image properly as its root file system, so you do have to modify the /etc/fstab file so that the root file system is mapped to /dev/ubda (for some reason, the virtual drive appears as /dev/ubda instead of /dev/ubd/0 like in other images) using an ext2 format (unless you compiled ext3 support into the kernel). It's probably a good idea to enable networking in the kernel as well by turning on these kernel options:

UML Network Devices/Virtual network device
UML Network Devices/TUN/TAP transport
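Going back to the /etc/fstab change mentioned above: the root entry in the image ends up looking something like this (the mount options shown are just the usual defaults; adjust to taste):

```
/dev/ubda   /   ext2   defaults   1 1
```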

Then, you can just boot up UML into the image you created. You don't have to change the Kudzu configuration, change the networking to use something other than eth0, or anything. It just works. Below are the things I tried to do, which ended up being completely unnecessary:

Although Fedora Core 3 initially boots with a small initrd RAM disk image as its root file system before switching to using the hard disk as its root, this initrd is only for loading additional kernel modules. Since User Mode Linux has most options compiled directly into the kernel, it is not necessary to use an initrd image with UML. Also, it is not necessary to create ubd devices in /dev since Fedora seems to do that automatically. And hostfs mounting /sbin or /bin doesn't work out too well. You can mostly get away with hostfs mounting /usr but you have to prevent xfs from starting up (simply remove /etc/init.d/xfs) because I think it tries writing stuff into /usr, which won't work because on the host-side, UML won't have sufficient permissions to write there. I also didn't bother properly installing uml-tools, meaning that UML couldn't use the port-mapper to bring up the virtual console xterms. Since Fedora Core 3 requires you to log in through a virtual console, you need to get the port-mapper working properly (I had to modify UML to find the port-mapper in my home directory instead of its default location). Also, Virtual Console #7 and #8 won't appear or won't show anything, so don't think the boot-up has failed if you don't see anything there. And that pretty much summarizes many, many days of mistakes.

Monday, March 21, 2005

Off-Screen Buffers in Java

I recently discovered that off-screen buffers in Java can be really slow. Off-screen buffers are commonly used in Java games so that a frame of animation can be constructed and then displayed on the screen without resulting in flicker. Since the Swing toolkit is double-buffered, most applications don't normally need to create their own off-screen buffers, but I don't like using the UI thread for game logic and rendering in my games (plus my games use AWT), so I have to manage the off-screen buffers myself.

Anyway, many graphics operations in Java are not hardware accelerated yet, and some of the buffer code that I'd been using for a long time relied on some really slow operations; I'd simply never noticed until I started working on a computer with a really old PCI Rage 128 graphics card. With such low bandwidth across the PCI bus, blitting the off-screen buffer images to the screen ended up being really, really slow.

The problem is that I'd created the BufferedImages backing the off-screen buffers as TYPE_INT_ARGB. I think the inclusion of an alpha channel, combined with an assumed lack of hardware acceleration, resulted in a lot of data needing to flow back and forth across the PCI bus unnecessarily. Once I switched the buffers to be TYPE_INT_RGB, everything was back to being snappy and fast.

Monday, February 21, 2005

Running User Mode Linux from 2.6.10 Kernel

I don't really know anything about mucking around with the Linux kernel (I'm the type of guy whose favourite text editor is pico and who can't figure out how to use Debian, meaning he has to use Red Hat instead), but somehow I signed up to do a term project involving a bit of kernel hacking, so I guess I have to learn. Since kernel stuff seems a bit messy, I opted to work with User Mode Linux (UML).

Now, when one browses around through the main website, there's a lot of stuff about ancient versions of UML based on 2.4 kernels, there's a couple of mentions of 2.6 kernels, and tucked away somewhere, there's a brief aside that UML has been integrated into Linux kernels 2.6.9 and beyond. Seeing as everything else on the main website seemed a little dated, I thought that it would be best to use the UML that's integrated into the main Linux branch because it would be more likely to be maintained and tested there. In actuality, the UML code that's been integrated into the kernel doesn't work all that well, and it took me a few days of searching through various websites, wikis, forum discussions, etc. to figure that out. So here are the goods on how I got UML working on my Fedora Core 3 system.

Apparently, the UML in the current 2.6.10 vanilla kernel doesn't work. Instead, grab the 2.6.9 vanilla kernel from kernel.org or somewhere. Then, go to Paolo Giarrusso's UML site and grab the latest 2.6.9 patch (it's in his archives in the guest patches section). After you've unzipped and patched everything, simply taking the default configuration from

make menuconfig ARCH=um

isn't sufficient. One has to enable these options in the kernel

Character Devices/File Description Channel Support
Character Devices/Port Channel Support
Character Devices/tty Channel Support
Character Devices/xterm Channel Support
Block Devices/Virtual Block Device
File Systems/Pseudo File Systems/dev File System Support

Then, you can follow the normal instructions about compiling, stashing a root_fs file system somewhere, and then using the devfs=(no)mount option appropriately.

This is enough to get UML to bring up a prompt, but that's as deep as I've dug so far. I'm still weighing whether it would have been easier simply to have started with a Debian package of UML.

Tuesday, February 08, 2005

CorelDraw 11 Import/Export Woes

Being from Ottawa, I'm a CorelDraw user (Corel is an Ottawa-based company). I'm not a very good CorelDraw user (I just use it to draw simple diagrams), but I'm still a CorelDraw user. Just like other CorelDraw users, I have to live with both the good and the bad features of CorelDraw. It comes with a lot of features and applications for a good price, but you have to live with the ever expanding resource usage of each version, buggy features, an under-designed user interface, and a general lack of polish. Perhaps things will get better now that they're under new management.

I bought a discount version of CorelDraw 11 a few months ago, and I use it for mostly computer science type things: presentations and LaTeX diagrams. In order to use CorelDraw in this sort of configuration, you need to do a lot of bulk importing and exporting. I just thought I would blog my approach to making CorelDraw 11 do what I want in case the one other person who uses CorelDraw in this sort of configuration is interested :-).

For example, I usually draw all my LaTeX figures in one CorelDraw document and then use a bulk export macro to export each page as a different eps file that I can reference in LaTeX. CorelDraw 11 doesn't let you configure eps export options from a macro, so you have to export one eps file by hand first and configure the export options there, and then when you run the macro, it will just reuse the export options that you set before.

Also, pdflatex prefers to import vector graphics in the pdf format. Unfortunately, CorelDraw 11's pdf exporter doesn't support bounding boxes, so the exported pdf files do not import correctly. I found this program called eps2pdf (I use the one that comes with a nice Windows GUI because I can't figure out how to invoke the preferred eps to pdf conversion utility "epstopdf") that can take eps files exported from CorelDraw and convert them into pdf documents that can be imported into pdflatex. The program chokes if the Corel-exported eps files have text converted to curves or if the Corel-exported eps files are too large. It's possible to invoke eps2pdf from the command-line to convert all eps files in a single directory, which is useful.

Just recently, I've tried to import a lot of xfig-created eps files into CorelDraw 11. This sounds a little odd, because it doesn't make sense to create diagrams in xfig when you own a copy of CorelDraw. The reason I needed to do this was that I needed to generate a lot of complicated machine-generated Postscript diagrams. I don't understand Postscript, but xfig generates very clean, easy to understand Postscript output. So when I'm in this situation, I draw a couple of simple primitives in xfig, export them as eps, then write a program that follows the template generated by xfig in creating its own Postscript diagrams. Unfortunately, Corel's eps importer imports the text in xfig eps files as curves instead of as text. I managed to side-step this problem by using ps2ai.bat to convert the eps Postscript files into ai Postscript files. CorelDraw 11 then imports the text in these ai files without problem. Unfortunately, somewhere along this conversion process, extra outlines get added to objects. It's possible to simply delete these outlines by hand (especially easy if you use the ObjectManager view), or you can import the xfig eps files directly, copy all of the non-text objects, and then use these objects to replace the non-text objects in the imported ai files. There's probably an easier way to do this, but I'm still trying to figure it out.

Friday, February 04, 2005

Extensibility in X3D

During my rant against X3D yesterday, I wanted to complain about the extensibility model of X3D as well, but I couldn't think of a better approach to supporting extensibility in a 3d file format, so I decided that it would be inappropriate to make a big deal about it.

Of course, I thought up a better scheme last night, so now I feel justified in complaining about the approach used by X3D to support language extensions. Although I complained about how the X3D event model was too abstract to usefully express certain concepts, I find the X3D scene graph hierarchy to be too concrete and specific to support extensibility in a graceful manner.

In X3D, scene graph nodes are well-defined types. Each node has specific fields and no others. To add support for different objects, one has to create a new scene graph node with the fields one needs (or insert a lot of metadata, but I'll discuss that later). If an X3D browser encounters a node type that it doesn't understand in an X3D file, probably the best it can do is to simply ignore that node. Unfortunately, this doesn't degrade gracefully. So, for example, consider the mesh objects that are currently in X3D. If I want to make an articulated mesh, I need a way to add weights to each of the mesh vertices describing the influence of various bones on the position of the mesh point. To do this, I need to make a new scene graph node for articulated mesh objects that has a field for holding the bone weights. So now, if an X3D browser encounters my new articulated mesh node type in a file, it won't know what to do and will simply not render it. Ideally, the browser should simply degrade gracefully and render the data as a normal mesh, but it's not possible because there's no way for the browser to know that there's a link between an articulated mesh and a normal X3D mesh. Over time, as more and more node types are added to the standard, the standard becomes so big that no browser is able to implement the whole specification, resulting in many X3D files that are simply incompatible with many X3D viewers. In fact, this can already be seen in the current standard, which has different node types for IndexedFaceSets, IndexedTriangleSets, IndexedTriangleFanSets, and IndexedTriangleStripSets. All of these node types are variations on the same theme, but an X3D browser must be explicitly coded to handle all four types separately.

One way around this problem is to use the metadata facilities of X3D. So an articulated mesh could be stored as a normal mesh with all the bone weights stored as metadata in the mesh. A browser that doesn't understand the metadata can just render the mesh as a normal mesh, and a more advanced browser can interpret the metadata and extract the bone weight information. Similarly, all the different triangle sets could be encoded as IndexedFaceSets with metadata suggesting the optimisation of rendering the node as triangles. And therein lies the way to gracefully supporting extensibility in X3D. There shouldn't be set node fields in X3D. Instead, all fields should be metadata. Most 3d objects can be described as a control surface with various extra descriptive data thrown in. As such, X3D should simply abstract all 3d objects to being a base control surface type, and all the extra descriptive data about normals, colours, and shape, etc. should just be metadata. So a cone is a box node that is tagged as a "cone" in its metadata. A height map is just a mesh node that is tagged as being a "height map" with some metadata describing the orientation of the heights. It may not be the most efficient way of storing data, but the benefits in terms of gracefully supporting extensibility are worth it. A minimal X3D browser then only needs to be aware of a small number of abstract node types. Even on very complex 3d models, an X3D browser will always be able to extract something that it can render.
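As a rough sketch of the idea (this is hypothetical markup in the spirit of X3D, not valid X3D syntax; the node and attribute names are made up), an articulated mesh would just be a plain mesh carrying extra metadata that a basic browser is free to ignore:

```xml
<!-- Hypothetical, for illustration only -->
<Mesh coordIndex="0 1 2 -1 0 2 3 -1">
  <Coordinate point="0 0 0  1 0 0  1 1 0  0 1 0"/>
  <!-- A basic browser skips this and still renders the mesh;
       an advanced browser interprets it as per-vertex bone weights. -->
  <MetadataFloat name="boneWeights"
                 value="0.2 0.8  1.0 0.0  0.5 0.5  0.0 1.0"/>
</Mesh>
```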

Instead, X3D simply took the flawed VRML model and recoded it in XML. I guess we can always wait until X3D 2.0.

Thursday, February 03, 2005

U3D vs. X3D

I was looking at the website of the purveyor of the X3D format yesterday, and I noticed that they had a newspost slamming the rival 3D format U3D there. I haven't read the U3D spec yet, but based on the newspost, it sounds pretty good. In fact, I think that if the U3D spec had been available when I started my X3D project, I would have used U3D instead.

The newspost complains about how X3D only supports triangle meshes and the like. Honestly though, having just finished implementing a horrible n^4 concave polygon triangulation algorithm for my X3D viewer, I'm liking the idea of a format only supporting triangle meshes more and more.
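For what it's worth, a basic ear-clipping pass gets concave polygon triangulation down to O(n^2). The sketch below is the generic textbook approach (not the algorithm I actually wrote), and it assumes a simple, non-self-intersecting 2d polygon given in counter-clockwise order:

```python
def triangulate(poly):
    # Ear-clipping triangulation of a simple polygon, given as a list of
    # (x, y) tuples in counter-clockwise order.  Returns a list of triangles.
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def strictly_inside(p, a, b, c):
        # True if p lies strictly inside the counter-clockwise triangle abc
        return (cross(a, b, p) > 0 and cross(b, c, p) > 0
                and cross(c, a, p) > 0)

    verts = list(poly)
    tris = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            a, b, c = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(a, b, c) <= 0:
                continue              # reflex corner: not an ear
            if any(strictly_inside(p, a, b, c)
                   for p in verts if p not in (a, b, c)):
                continue              # another vertex intrudes: not an ear
            tris.append((a, b, c))    # clip the ear off
            del verts[i]
            break
        else:
            break                     # degenerate input; give up
    if len(verts) == 3:
        tris.append(tuple(verts))
    return tris
```

A polygon with n vertices comes out as n - 2 triangles, which is all a triangles-only format would need.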

I've only implemented a small amount of the X3D spec, and I can't help but feel that it lacks a certain elegance and simplicity in its design. I think many of the problems evolved out of the fact that the designers wanted people to be able to code X3D by hand. As such, the spec supports lots of shortcuts and features to aid people coding up X3D manually, such as the ability to leave out certain tags or to let the browser automatically calculate normals, etc. These sorts of things simply make the implementation more complicated and result in lots of "special cases" in the specification. In reality, no one designs a 3d model using text (believe me, I tried this once. It's totally hopeless), so it would have been better to leave out stuff like that.

I'm not implementing the X3D event model, so I'm not too familiar with it, but my gut feeling is that it is likely too expressive. The Postscript language has a similar problem in that it is a full programming language, meaning that extracting meaning from the language is extremely difficult without actually executing it. For example, let's say I wanted to export an animation to another program. The exported animation will have a clock object whose timing is fed into some sort of coordinate generator whose events would be fed into the actual object being animated or something like that. Would an X3D importer be able to interpret the combination of event objects and event routing as being an animation and to import it as such? Or would it simply have to import these event nodes as is and leave it to the user to figure out that it represents an animation? Just as human language is too expressive for computers to understand (which is why we have programming languages), the X3D event model might be too expressive for an import tool to recognize the patterns in it; more restrictive languages are often more useful precisely because their meaning can be recovered mechanically. If a language is too expressive for an import tool to understand its meaning, resulting in it being imported as a black box, then the language might as well be some standard language like ECMAScript that people are already familiar with.

And of course, there's my bias against "platforms." I feel that the best specifications are for little self-contained toolkits that can be bolted on to existing applications. X3D was designed as a complete platform for 3D browsing, so it has a tendency to want to take over your application as opposed to being a simple bolt-in. For example, to implement a module for importing animation, you need to add the event model to your application, some sort of reflection mechanism for X3D objects to parse the event code, the actual X3D objects themselves, etc. etc. Afterwards, it's no longer a simple little import tool; you've essentially just written an X3D browser. At that point, you might as well just write your whole application to be X3D-based.

X3D simply tries to do too much and as a result is too complex. U3D focuses on one small aspect of 3D file exchange and hopefully ends up being small, graceful, and easy to use. X3D is somewhat like the TIFF file format, which is so complicated and supports so many features that no one really uses it for anything. It's a coin toss as to whether an arbitrary TIFF file will be importable by a given TIFF import tool: there might be multiple pages in there, encoded in some weird colour space, compressed using some unknown scheme, etc. With PNG, you're pretty much guaranteed success.