Posted by Sean Christmann
Introduction
Two years ago I had an itch that needed scratching. “RIA” was the future of the web and every major company seemed to have a solution to get us there. I developed the first version of GUIMark not only to get a good understanding of the respective technologies, but also to give my clients through EffectiveUI, and everyone else, something to actively gauge the rendering performance of the different runtimes. After releasing it I got a good response from the tech community as well as from several platform engineers interested in resolving problems. There were, however, two serious flaws in the test that immediately stood out. First, the test relied too heavily on text layout performance; I was barely engaging the vector and bitmap side of the rendering engines. Second, the test was too artificial, and developers tend to resist optimizing APIs against unrecognizable test cases.
Evolution
Fast forward to today and the web is a different beast. Attempt to shine a positive light on a plugin technology and you will be booed off the stage. Create something fun and silly in HTML5 and you’ll have hundreds of thousands of visitors pounding down the front door of your blog to speculate on the death of Flash. It’s undeniable that a new anchor technology is taking root in the web space, and needless to say I’ve got a new itch to scratch.
GUIMark 2
Like the first GUIMark, this new benchmark is designed for one sole purpose: to burn a hole in your CPU. I still believe that by completely saturating the rendering pipeline, we can get a better idea of which technologies are best suited for running interactive content on the web. Developers tend to focus primarily on the speed of the programming language itself, when in reality most of your CPU time is spent inside internal rendering APIs. I also firmly believe that any benchmark testing rendering performance should stick to sub-60 fps numbers. Almost all users on the web today are browsing with 60Hz LCD monitors and there’s no reason to design a test that has to throw away frame data.
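To make the frame rate numbers concrete, here is a minimal sketch of how an average fps figure could be derived from per-frame timestamps. The function name `averageFps` and the sample timestamps are assumptions for illustration, not the code GUIMark itself uses.

```javascript
// Hypothetical helper: average frames per second over a run, given the
// timestamp (in milliseconds) at which each frame finished rendering.
function averageFps(frameTimestampsMs) {
  if (frameTimestampsMs.length < 2) {
    return 0; // at least two frames are needed to measure an interval
  }
  const first = frameTimestampsMs[0];
  const last = frameTimestampsMs[frameTimestampsMs.length - 1];
  const elapsedSeconds = (last - first) / 1000;
  if (elapsedSeconds <= 0) {
    return 0;
  }
  // N timestamps bound N - 1 frame intervals.
  return (frameTimestampsMs.length - 1) / elapsedSeconds;
}

// Example: 31 frames spaced 50 ms apart is a steady 20 fps.
const stamps = Array.from({ length: 31 }, (_, i) => i * 50);
console.log(averageFps(stamps).toFixed(2));
```

In a browser the timestamps would come from whatever clock the render loop has access to; the arithmetic stays the same.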
While the new benchmark sticks to the original in theory, this version introduces some much-needed changes. First, I’ve split GUIMark into three separate tests (Vector, Bitmap, and Text rendering) and I’ve attempted to make the test cases as real world as possible. Second, I’ve only implemented these tests in HTML5 and Flash. I’m not opposed to adding Silverlight and JavaFX to the benchmark; I just didn’t have the time to build them right now, and something tells me a much smaller percentage of the internet crowd is interested in those results anyway. (Feel free to flame me in the comments section for that one.) Lastly, I’ve added mobile versions of some of the tests. We’ve all heard inflammatory statements from certain CEOs about mobile web performance; let’s see if the numbers back that up.
Enough already, on to the results.
Test environment
All of the tests below were performed on a 15″ unibody MacBook Pro with a 2.53 GHz Intel Core 2 Duo and an NVIDIA GeForce 9400M. On the Mac side I’m running Snow Leopard with Flash Player 10.0.45.2 installed. On the PC side I have Windows 7 32-bit with Aero turned on, again with Flash Player 10.0.45.2. For Linux I ran a Linux Mint 8 Live CD with Firefox and Flash Player 10.0.45.2. Unfortunately, running off the Live CD meant no access to NVIDIA drivers.
Vector Charting Test
This benchmark is designed to stress the vector APIs by simulating a streaming stock chart. The test makes heavy use of strokes with complex alpha fills. Originally I had added gradient fills into the mix to make sure that a good majority of vector APIs were being flexed, but there was no significant difference in the results so I pulled them out to make the visuals cleaner. While the source may appear to be heavy on the JavaScript side, the actual execution time of the code, excluding canvas draw calls, is less than 1 millisecond.
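As a rough idea of the kind of drawing this test exercises, the sketch below strokes a polyline and fills the area beneath it with a translucent color. It uses only standard Canvas 2D context calls; the function name `drawChartSeries`, its parameters, and the colors are assumptions, and this is not the benchmark's actual source.

```javascript
// Hypothetical sketch: draw one chart series as a stroked line with an
// alpha-blended fill underneath. `ctx` is a CanvasRenderingContext2D
// (or any object exposing the same methods and properties).
function drawChartSeries(ctx, points, baselineY, strokeWidth) {
  if (points.length < 2) return;

  // Filled area under the line, using a translucent fill color.
  ctx.beginPath();
  ctx.moveTo(points[0].x, baselineY);
  for (const p of points) ctx.lineTo(p.x, p.y);
  ctx.lineTo(points[points.length - 1].x, baselineY);
  ctx.closePath();
  ctx.fillStyle = "rgba(0, 120, 255, 0.3)";
  ctx.fill();

  // The line itself, stroked on top of the fill.
  ctx.beginPath();
  ctx.moveTo(points[0].x, points[0].y);
  for (let i = 1; i < points.length; i++) {
    ctx.lineTo(points[i].x, points[i].y);
  }
  ctx.lineWidth = strokeWidth;
  ctx.strokeStyle = "rgb(0, 120, 255)";
  ctx.stroke();
}
```

In a browser this would be called once per series per frame, for example `drawChartSeries(canvas.getContext("2d"), points, canvas.height, 2)`. Almost all of the time goes into `fill()` and `stroke()`, not the loops around them.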
| | HTML5 | Flash 10 |
|---|---|---|
| Windows 7 | | |
| Internet Explorer 8.0.7600 | N/A | 30.7 |
| Firefox 3.6.3 | 15.73 | 29.65 |
| Chrome 4.1.249 | 6.41 | 26 |
| Opera 10.53 | 24.77 | 29.9 |
| Safari 4.0.5 | Safari* | 29.5 |
| Avg | 15.64 fps | 29.15 fps |
| Snow Leopard | | |
| Safari 4.0.5 | 4.04 | 20.55 |
| Firefox 3.6.3 | 3 | 23.92 |
| Chrome 5.0.342 | 2.86 | 25.48 |
| Opera 10.10 | 12.22 | 15.24 |
| Avg | 5.53 fps | 21.29 fps |
| Linux Mint | | |
| Firefox 3.5.9 | 14.61 fps | 22.88 fps |
*Safari on Windows 7 will not animate the chart; it only renders one frame each time I press down on my mouse button.
Results are all over the place for this test. On the HTML5 side Opera delivers the best performance on both platforms. Flash on the PC is consistently high, but on OS X, Chrome takes the top spot. Linux pulls off great numbers despite running off a Live CD. HTML5 on the Mac side requires closer inspection though. When I first made the test and showed it to my co-worker John Blanco, he started ripping apart the code to find any mistakes I might have made. What he discovered was that by changing the stroke size on my lines from 2 pixels to 1 pixel, performance in Safari, Firefox and Chrome shot up to rates closer to Flash, while Opera stayed at the exact same FPS.
| 1 Pixel Stroke Results | Safari | Firefox | Chrome | Opera |
|---|---|---|---|---|
| HTML5 on Snow Leopard | 23.59 fps | 17.43 fps | 17.12 fps | 12.12 fps |
Flash on the Mac, as well as HTML5 and Flash on PC, were largely unaffected by this change though, gaining maybe a single frame per second by changing to 1 pixel strokes. I’m not sure what to make of these findings. What kind of bug causes this, and what side effects might be introduced by fixing it? Will a change allow both 1 and 2 pixel strokes to run at higher speeds, or will they both settle somewhere near Opera’s numbers?
Bitmap Gaming Test
The bitmap test was designed to simulate a tower defense type game. The test stresses pushing around lots of bitmap assets that animate each frame. The entire rectangle view needs to be cleared each frame to account for all the changes happening in the scene. The test supports a minimal amount of z depth ordering, but not so much as to cause user scripts to take more than 1 millisecond to execute. Both environments are using anti-aliasing to scale the bitmap images.
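The per-frame work described above could look roughly like the sketch below: clear the whole view, apply a cheap depth sort, then draw each sprite's current animation frame scaled to its display size. The function name, the sprite fields, and the sort-by-y depth rule are assumptions for illustration; only `clearRect` and `drawImage` are real Canvas 2D calls.

```javascript
// Hypothetical sketch of one frame of a sprite-based scene. `ctx` is a
// CanvasRenderingContext2D (or an object with the same methods); each sprite
// has a position, a draw size, and an array of animation frame images.
function renderSpriteFrame(ctx, viewWidth, viewHeight, sprites) {
  // Everything moves, so wipe the whole view rather than tracking dirty areas.
  ctx.clearRect(0, 0, viewWidth, viewHeight);

  // Minimal depth ordering (an assumption): sprites lower on screen draw last.
  const ordered = sprites.slice().sort((a, b) => a.y - b.y);

  for (const sprite of ordered) {
    const image = sprite.frames[sprite.frameIndex];
    // The 5-argument drawImage form scales the image to width x height.
    ctx.drawImage(image, sprite.x, sprite.y, sprite.width, sprite.height);
    // Advance the animation, looping back to the first frame.
    sprite.frameIndex = (sprite.frameIndex + 1) % sprite.frames.length;
  }
}
```

The sort and the loop are trivial in script time; the cost sits in the scaled `drawImage` calls, which is the part the benchmark is trying to saturate.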
| | HTML5 | Flash 10 |
|---|---|---|
| Windows 7 | | |
| Internet Explorer 8.0.7600 | N/A | 17.34 |
| Firefox 3.6.3 | 5.78 | 17.7 |
| Chrome 4.1.249 | 10.1 | 15.98 |
| Opera 10.53 | 13.59 | 17.23 |
| Safari 4.0.5 | Error* | 17.29 |
| Avg | 9.82 fps | 17.1 fps |
| Snow Leopard | | |
| Safari 4.0.5 | 11.76 | 13.21 |
| Firefox 3.6.3 | 7.5 | 14.09 |
| Chrome 5.0.342 | 7.4 | 19.96 |
| Opera 10.10 | 5.86 | 14.53 |
| Avg | 8.13 fps | 15.44 fps |
| Linux Mint | | |
| Firefox 3.5.9 | 4.84 fps | 10.91 fps |
*Safari on PC again only renders one frame per mousedown event, so the results are impossible to verify.
These results are really surprising. Chrome on OS X manages to push Flash higher than even Windows-based browsers. I was so surprised I ended up rebooting and running the test again just to make sure something wasn’t wrong. We’re starting to see a trend where HTML5 on average runs slower for Canvas-based animations, and I’ll explain why a bit further below. Linux takes a huge performance hit in this test, but the percentage difference roughly mirrors the other platforms. With Nvidia drivers I’d imagine the real numbers would be closer to Mac performance.
Text Column Test
This test is designed to push the text layout and rendering engine in HTML and Flash. The test utilizes custom fonts introduced with CSS3 as well as multibyte character strings. This is my least favorite test in the group because it doesn’t simulate any real world test cases; however, it should provide a good estimate of how quickly a page full of text can be calculated. I call it the “iceberg” test since 80% of the hit on the CPU happens outside the renderable view. It works because although text that overflows outside the textblock doesn’t get rendered, it does have to get calculated in order to know how many lines of text can be scrolled. HTML pages do this all the time when you load a site with text below the fold.
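The “iceberg” effect can be shown with a toy layout routine. The sketch below is an illustration only, assuming a fixed-width font and greedy word wrapping, which is far simpler than what a browser or the Flash Player really does; the function names are hypothetical. The point is that every line has to be laid out to know the scroll range, even though only the first few are ever painted.

```javascript
// Hypothetical greedy word wrap, assuming every character has the same width.
function wrapLines(text, maxCharsPerLine) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= maxCharsPerLine || current === "") {
      current = candidate; // fits, or is an over-long word on its own line
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Lays out the whole text, but reports only the visible slice for painting.
function layoutTextBlock(text, maxCharsPerLine, visibleLineCount) {
  const allLines = wrapLines(text, maxCharsPerLine); // full layout cost paid here
  return {
    totalLines: allLines.length, // needed to size the scrollable range
    visibleLines: allLines.slice(0, visibleLineCount), // the only lines painted
  };
}

const block = layoutTextBlock("one two three four five six", 9, 2);
console.log(block.totalLines, block.visibleLines);
```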
| | HTML5 | Flash 10 |
|---|---|---|
| Windows 7 | | |
| Internet Explorer 8.0.7600 | 21.79* | 1.51 |
| Firefox 3.6.3 | 24.7 | 1.5 |
| Chrome 4.1.249 | 23.58* | 1.44 |
| Opera 10.53 | 21.16 | 1.49 |
| Safari 4.0.5 | 30* | 1.46 |
| Avg | 24.24 fps | 1.48 fps |
| Snow Leopard | | |
| Safari 4.0.5 | 27.26 | 16.24 |
| Firefox 3.6.3 | 23.61 | 18.71 |
| Chrome 5.0.342 | 26.07* | 22.85 |
| Opera 10.10 | 22.72 | 15.22 |
| Avg | 24.91 fps | 18.25 fps |
| Linux Mint | | |
| Firefox 3.5.9 | 25.89 | 11.67 |
*Safari continues to show problems on PC. Safari reports 30 fps but it looks like it’s running at 10 fps. I’ve included the results but they’re really wrong.
*Internet Explorer renders the view, but is unable to display the custom fonts.
*Chrome on both platforms is unable to render the Jedi custom font.
I didn’t have time to investigate whether the super slow PC performance in Flash is my fault or Adobe’s, but I expect that will be uncovered soon enough. As for the general differences between HTML and Flash in the text test, this is exactly what I was expecting. HTML was built for text rendering and this is further proof that browsers do this best.
GUIMark Mobile
The Vector and Bitmap tests have been ported into miniature forms to test on mobile devices with a minimum resolution of 320×480. This is the area I imagine will see a lot of updates over the next 6 months. I’ve ordered the results by the release date of each phone tested.
| | HTML5 Vector | HTML5 Bitmap | Flash Vector | Flash Bitmap |
|---|---|---|---|---|
| Palm Pre c/o Kevin O’Shea | 21.46 | 32.89 | | |
| iPhone 3GS | 10.79 | 12.86 | | |
| Motorola Droid | 8.95 | 12.59 | | |
| Nokia N900 Flash 9 | 9.51 | 9.65 | 16.69 | 19.78 |
| Nexus One | 15.86 | 18.83 | | |
| HTC HD2 c/o Matt Emory | 10.43 | 17.59 | 29.91 | 37.62 |
Two phones running the Flash player isn’t conclusive evidence about Flash’s performance in general in the mobile space, but it does cast immediate doubts on claims that Flash is slow on ARM based smart phones. Meanwhile, if you want the best performance in HTML5 based web content, Palm Pre and Nexus One are sitting at the top of the pile. If you have results you’d like to see added to the chart, you can email results to mech {at} craftymind dot com.
What about video comparison?
I had really hoped to add a video test to this benchmark, but I quickly found out there’s no reliable way to record rendering performance for video objects. As far as I can tell, HTML5 video doesn’t provide an API to catch frame dropping events, or a way to determine the playback fps. Blindly running a Timer object on the main thread didn’t seem to help either. At that point I didn’t even bother seeing what hurdles Flash had to testing playback.
Parsing the Results
I imagine half of the people reading this page will have one of two thoughts at this point: “Who cares if HTML5 is slower, I just want Flash to die” and “HTML5 is still brand new, it’s going to get a lot faster”. While I’m not interested in addressing the first point, developers should have context around the second point. There is a fundamental difference between the rendering models used in HTML5 Canvas and Flash that heavily influences the performance divide. The difference is, Canvas uses an immediate mode renderer while Flash uses a retained mode renderer.
When you write a line of JavaScript that draws a vector or bitmap to a Canvas, the browser will immediately render that change before moving on to the next line of JavaScript. Since the browser has to block that line of JavaScript while rendering, it means the environment is most efficient when running on a single thread. Text rendering, on the other hand, occurs at the end of the event loop, behaving more like Flash.
In contrast, Flash commits all renderable changes to an internal store, and after the main event loop finishes processing user code it hands out rendering tasks to all available cores. As a result, Flash scales with both the speed of your processor cores and the number of cores available. Here’s an illustration to better understand.
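In code terms, the difference between the two models could be sketched like this. It’s a deliberately simplified toy: the class names, the `paint` callback, and the round-robin batching are all assumptions for illustration, not a description of how any browser or the Flash Player is actually implemented.

```javascript
// Immediate mode: every draw call paints before returning to user code.
class ImmediateRenderer {
  constructor(paint) {
    this.paint = paint;
  }
  draw(shape) {
    this.paint(shape); // the caller is blocked until this shape is painted
  }
}

// Retained mode: draw calls only record what changed; painting is deferred.
class RetainedRenderer {
  constructor(paint) {
    this.paint = paint;
    this.displayList = [];
  }
  draw(shape) {
    this.displayList.push(shape); // cheap: nothing is painted yet
  }
  // Run once, after user code for the frame has finished.
  flush(coreCount) {
    // Deal shapes out round-robin into one batch per core.
    const batches = Array.from({ length: coreCount }, () => []);
    this.displayList.forEach((shape, i) => batches[i % coreCount].push(shape));
    this.displayList = [];
    // A real runtime would paint batches on separate threads; this toy
    // version paints them one after another on the calling thread.
    for (const batch of batches) {
      for (const shape of batch) this.paint(shape);
    }
    return batches.map((batch) => batch.length);
  }
}

const painted = [];
const retained = new RetainedRenderer((shape) => painted.push(shape));
["a", "b", "c", "d", "e"].forEach((s) => retained.draw(s));
console.log(painted.length); // 0: nothing is painted while user code runs
console.log(retained.flush(2)); // batch sizes for two cores: 3 and 2
```

Because the retained renderer sees the whole frame before painting anything, it is free to decide how to split the work; the immediate renderer never gets that chance.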
Theoretically you might achieve twice the performance in Flash on a dual core system, but in practice there is overhead that you need to take into account, like z ordering, bounds checking and re-compositing, and dividing tasks between cores is never perfect. All of this might not seem like a big deal to HTML5 developers, but the truth is the next 10 years are going to be dominated by increases to CPU core count, not single threaded execution speed. You can already see the results of this on a quad core i7 2.67 GHz processor.
| Windows 7, Firefox 3.6.3 | HTML5 Bitmap Game | Flash 10 Bitmap Game |
|---|---|---|
| Quad core i7 2.67 GHz | 6.07 fps | 30.1 fps |
HTML5 Canvas performance saw virtually no increase jumping to 4 cores, while Flash performance nearly doubled. Without a major shift in execution processing, JavaScript-based animations and interactions are going to remain stagnant over the coming years. Unfortunately I don’t see that change coming. All the talk of multithreading coming from browser vendors right now is about splitting work between the browser interface and the HTML view, not within the HTML rendering model.
HTML5 video is largely exempt from this problem, however. While the video API exposes hooks to the main thread for playback control, all rendering and sound is processed under the hood on secondary threads. As a result, media performance increases with GPU and CPU cores.
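Returning to the core-count argument above, one way to put rough numbers on “twice the performance in theory, less in practice” is Amdahl’s law. This is a back-of-the-envelope sketch, not something the benchmark measures: the function name and the parallel fractions below are assumptions, chosen only to show the shape of the curve.

```javascript
// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
// the work that can run in parallel and n is the number of cores.
function amdahlSpeedup(parallelFraction, coreCount) {
  const serialFraction = 1 - parallelFraction;
  return 1 / (serialFraction + parallelFraction / coreCount);
}

// Illustrative (assumed) parallel fractions, not measured values.
for (const p of [0, 0.5, 0.8, 1]) {
  console.log(
    `p=${p}: 2 cores -> ${amdahlSpeedup(p, 2).toFixed(2)}x, ` +
      `4 cores -> ${amdahlSpeedup(p, 4).toFixed(2)}x`
  );
}
```

With p = 0, where everything stays on one thread, extra cores buy nothing, and only p = 1 gives the theoretical doubling on two cores. Serial overhead such as z ordering, bounds checking and re-compositing pushes a real renderer somewhere in between.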
TL;DR
There is no doubt that HTML5 adoption will grow significantly in the next 2 years, and that more and more content will be targeted to SVG and Canvas implementations. But developers need to be cautious about adopting one technology or another wholesale. HTML5 may not be fast, but it is proficient at a good number of tasks. If you need static or limited interactive content on your website, HTML5 will soon be your best option. If you need complex interactive content, you’re probably better off targeting Flash. As for me, you’ll find me abusing the hell out of both technologies and posting the results to this blog.
In the meantime, if you want the best HTML5 performance on Windows, you should be using Opera right now. On the Mac side, it’s a tossup depending on the type of content you’re interacting with. If you’re looking for good Flash performance on Windows any browser will do, whereas on the Mac side Chrome is clearly outperforming everyone else.
The sources for each test should be linked within the test itself if you want to peruse the code. I tried to make sure everything could be contained to one file whenever possible and not rely on external dependencies. You can download all the fonts and tower defense assets here. If you find any errors with the results, or feel like taking a stab at testing another technology, feel free to email me at mech {at} craftymind dot com.
Flame on!
Updates
10/05/07 – As I should have assumed, quite a few people have started sending in updated tests with their own levels of optimizations. I want to be very clear on certain improvements: the purpose of these benchmarks is to stress the graphics APIs available to developers, not cancel them out. An optimization that affects both platforms equally (like caching the grid lines behind the charting test) doesn’t further the goal of exposing how efficient the two platforms are internally. If you have a unique optimization that can only be applied to one platform and not the other, please let me know and I’ll try to incorporate the change and retest.
10/05/09 – Changed some language on the rendering model for HTML5. Canvas paints immediately; standard text paints at the end of the event loop.
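For anyone wondering what the grid line caching mentioned in the 10/05/07 update looks like, here is a minimal sketch of the idea. The function names and the `createCanvas` factory parameter are assumptions so the snippet doesn’t depend on a DOM; in a browser the factory would create a `canvas` element and set its width and height. This is the sort of optimization the benchmark deliberately leaves out, since it removes draw calls on both platforms instead of stressing them.

```javascript
// Hypothetical sketch: render the static grid once into an offscreen canvas.
function createGridCache(createCanvas, width, height, spacing) {
  const cache = createCanvas(width, height);
  const ctx = cache.getContext("2d");
  ctx.beginPath();
  for (let x = 0; x <= width; x += spacing) {
    ctx.moveTo(x, 0);
    ctx.lineTo(x, height); // vertical grid line
  }
  for (let y = 0; y <= height; y += spacing) {
    ctx.moveTo(0, y);
    ctx.lineTo(width, y); // horizontal grid line
  }
  ctx.stroke();
  return cache;
}

// Per frame: one blit of the cached grid replaces all the line drawing.
function drawCachedFrame(ctx, gridCache, drawDynamicContent) {
  ctx.clearRect(0, 0, gridCache.width, gridCache.height);
  ctx.drawImage(gridCache, 0, 0);
  drawDynamicContent(ctx);
}
```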