Why Bubblemark is a poor UI benchmark

By Sean Christmann | Posted April 11th, 2008 | AIR, Flash, Html, Silverlight

A few months ago someone on the Adobe boards asked why the Flex test case in Bubblemark seemed to behave so differently in AIR versus in the browser. Yesterday I saw the same question come up again, and I figured I’d finally weigh in on the topic. The simple answer is that the test was created improperly; the complex answer has to do with the inherent limitations of the test itself.

First off, for those who don’t know what the Bubblemark test is, it’s a simple animation test case implemented in different GUI frameworks, kind of like an Acid2 test for rendering speed. The charts should ideally give you a baseline number for how well one technology compares against another for rendering. As a GUI developer I’ve been a bit underwhelmed with the whole thing, and here’s why:

  1. The author doesn’t understand Flash’s rendering engine. The easiest way to illustrate how incorrectly the Flash test was designed is to download the source and change the compiled framerate to 1 fps. Recompile and run the test and you’ll notice the benchmark reporting ~50 fps. You can clearly see the balls only moving once per second, yet the test thinks it’s flying along. This is because the test case makes the incorrect assumption that changing the properties of a DisplayObject causes it to render right away. The reality is that Flash holds on to all display updates until the next render pass and applies the latest changes all at once. Changing the position of an object every 5 milliseconds is meaningless when Flash is bound by a 33 millisecond render pass (or whatever 1000 divided by your framerate happens to be). A correct test case would rely on an ENTER_FRAME handler to change x and y values and get rid of any Timer calls.
  2. Framerate tests above 60 fps are meaningless. Seriously, any GUI benchmark designed to test above 60 fps is bogus. In fact, a pretty simple optimization technique for Adobe or Sun would be to cap the paint requests that get forwarded to OS X or Windows, simply because the majority of computer users these days are on LCD panels, which natively refresh at 60 Hz. Some operating systems even go a step further and limit the effective framerate of the paint requests they send to the video card (see Beam Sync on Mac). So when you see the Java test case fly up to 120 fps on Bubblemark, you can realistically only see 60 of those frames, and there’s a chance the other 60 are never even calculated by Java’s layout engine.
  3. The test just moves balls around! This is my biggest beef with the benchmark because it only tests one simple aspect of the rendering engine in these technologies, which is bitmap translation. How do bitmaps moving around the screen tell you anything about the capabilities of the respective technologies? Do the JavaFX guys really think optimizing this use case will make their technology relevant? The only thing Bubblemark will tell you is which runtimes might best handle bitmap particle emitters... that’s about it. There’s a lot more that goes into both the layout engine and the rendering pipeline of these different technologies, and it’s a shame that only the most basic aspect is being tested. The funny thing is, if you open up your task manager while running the tests, you’ll notice that several of them don’t even try to run at full speed; my CPU is sitting as low as 20% in some cases. This means the runtimes don’t even consider the test difficult enough to give it full attention and have opted for using less power over faster motion.
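
To make the first two points concrete, here is a minimal sketch in Python (not ActionScript, and not taken from the Bubblemark source). It models a runtime that queues property changes until the next render pass, then compares what a timer-driven counter would report against what actually reaches the screen. The names simulate, frame_interval_ms and visible_fps are hypothetical, and the 20 ms timer interval is an assumption chosen only so the counter reads roughly 50 per second, as in the 1 fps experiment described above.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between render passes: 1000 ms divided by the frame rate."""
    return 1000.0 / fps


def simulate(duration_ms: int, timer_interval_ms: int, compiled_fps: int) -> dict:
    """Count timer callbacks versus render passes over simulated time.

    Assumed model: property changes made in timer callbacks are queued and
    only become visible at the next render pass.
    """
    render_interval = frame_interval_ms(compiled_fps)
    timer_ticks = 0     # what a timer-driven benchmark counts as "frames"
    render_passes = 0   # how many times the screen is actually updated
    next_render = render_interval
    for now in range(timer_interval_ms, duration_ms + 1, timer_interval_ms):
        timer_ticks += 1
        # Flush every render pass that is due by this point in time.
        while next_render <= now:
            render_passes += 1
            next_render += render_interval
    return {"timer_ticks": timer_ticks, "render_passes": render_passes}


def visible_fps(reported_fps: float, refresh_hz: float = 60.0) -> float:
    """Frames per second a display can show, capped by its refresh rate."""
    return min(reported_fps, refresh_hz)


# One simulated second, 20 ms timer (assumed), movie compiled at 1 fps.
result = simulate(duration_ms=1000, timer_interval_ms=20, compiled_fps=1)
print(result)
print(visible_fps(120))
```

Under these assumptions the timer counter and the render count diverge sharply, which is the gap between the reported number and what you see on screen. Driving the position updates from the per-frame callback instead would make the two counts equal by construction.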

I don’t mean to cut down the developers responsible for Bubblemark, because at least they came up with a simple way to help us all compare these different technologies; I just think it’s a bit misguided to put any meaning behind these numbers. When evaluating your options for a GUI framework in our flashy web 2.0 world, you need to consider how well a technology can handle object scaling, alpha transparency, rotation, and text reflow, along with basic x and y translation and dynamic redraws. Even more realistically, developers need to be aware of the limits in the 25-45 fps region, since this is where you can efficiently balance render complexity with smooth animation. I’ve uploaded a couple of quick test cases in Flash, HTML, and Silverlight that I think provide a good foundation for stressing a rendering engine, and hopefully I’ll get a chance to expand them into a full test suite.
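
As a rough illustration of why the 25-45 fps region matters, the sketch below converts a target framerate into a per-frame time budget and estimates the framerate a given per-frame render cost could sustain. The function names and the 60 Hz default are assumptions for illustration, not measurements from any of the runtimes discussed here.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available to lay out and draw one frame."""
    return 1000.0 / target_fps


def achievable_fps(render_cost_ms: float, refresh_hz: float = 60.0) -> float:
    """Framerate a given per-frame cost can sustain, capped at the display rate."""
    return min(1000.0 / render_cost_ms, refresh_hz)


# Per-frame budgets across the range discussed above.
for fps in (25, 30, 45, 60):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

A scene that costs 40 ms per frame lands at the bottom of that range, while one that costs only 5 ms is limited by the display rather than the runtime, which is why a test needs a heavy per-frame cost before it can separate one renderer from another.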

Reader Comments (9)

  1. Chet Haase | April 11, 2008 at 2:10 pm | permalink

    Sean,

    Good points, similar to the beefs that James Ward and I had with the test. Note, however, in your first point that just changing the timing mechanism to use an ENTER_FRAME handler instead of Timer won’t fix the timing resolution issue I mentioned in my blog post; they’re both capped in the browser to a much lower rate than they could handle (basically, the refresh rate of the monitor). So updating the values in ENTER_FRAME might be more realistic from a Flash rendering perspective, but it doesn’t get at one of the serious constraints of the original benchmark.

    Chet.

  2. Sean Christmann | April 11, 2008 at 2:46 pm | permalink

    That must be why my CPU stays so low during the test. I really think it’s necessary that tests get developed that are so taxing on these runtimes that they can’t get above 25-30 fps; then we’ll really see which vendor can crunch out those few extra fps over the competition.

  3. Alexey Gavrilov | April 11, 2008 at 4:06 pm | permalink

    Good points, and a nice test case (although for Silverlight it shows fps anywhere between 6 and 100; you should probably use a longer time window for fps calculations). I’m really happy to see that other people care, and that we are getting more ways to compare the performance of RIA frameworks. As for the relevance of Bubblemark results, it depends on what you are trying to do, of course. I’m not a Flash expert; that’s why all the source code is up there, for you guys to tell me if I’m wrong.

  4. Sean Christmann | April 11, 2008 at 4:43 pm | permalink

    Thanks for leaving a comment, Alexey. I hope I didn’t ruffle your feathers too much with the post, as I really do appreciate the effort you’ve put into Bubblemark. In terms of relevance, I’m mostly concerned with testing from an RIA framework point of view. Most of those use cases deal with animation involving data manipulation, box constraints, charts and maybe even gaming.

  5. Mark Finkle’s Weblog » Buublemark | April 14, 2008 at 7:52 pm | permalink

    [...] balls (or non-balls too, I guess), this test has you covered. Thankfully, I’m not the only one. The Flash guys have more than a few gripes with the test. Sean Christmann sums up my biggest [...]

  6. Metalink » Bubblemark update | April 16, 2008 at 11:43 am | permalink

    [...] made the one, which is geared towards measuring complex UI performance. You can check it out here along with an explanation on why bubblemark is not particularly relevant for the [...]

  7. Jacob | May 7, 2008 at 8:11 pm | permalink

    I just found the Bubblemark page this evening and I have to agree. In fact, I quickly went through the code and made a few changes that completely changed the results.

    Run this in Firefox for best performance and you’ll see how far off the Bubblemark results are: http://kong.arcanecoder.com/misc/bubblemark.html (IE limits to under 70 fps).

    More of my comments are here: http://www.kongregate.com/forums/1/topics/7411?page=1#posts-144487

  8. [...] shares alot in common with another RIA benchmark Bubblemark. I’ve written a bit about Bubblemark and why I think an alternative is necessary, but I do believe Bubblemark and [...]
