How can I make my ARM device fast?

In a previous article, I described common performance pitfalls that ARM devices typically succumb to. Here, I will lay out how to avoid most of those problems.

Tip 1: Give your designers representative hardware to test on.

The latest ARM hardware has roughly the same performance as a good netbook. So give your UI designers netbooks or nettops (based on Atom, ION or AMD Fusion), if not as their main workstation, then as a performance-appropriate test platform. These machines aren’t very expensive, won’t take up much desk space, and can usually share the designer’s existing keyboard, mouse and monitor through a KVM switch.

This will encourage them to write efficient software in the first place, so do this as soon as possible when setting up the project.

Even better is to give them the real hardware to try out, but a netbook has the advantage of being usable in a desktop-like way, so it can be used for debugging.

Tip 2: Understand, before specifying hardware, what you need it to do.

Do you want simple 2D graphics? Video? Alpha blending (and is it premultiplied or not)? Alpha blending on top of video (which probably requires “textured video”)? 3D? Two-way video? High definition? Dual displays? Touch gestures – with or without stylus? ClearType? More than one of these simultaneously?

What about sound? Security? Updates? Recovery? Power consumption? Battery recharging? Ethernet? Wifi? Bluetooth? USB (as host and/or slave)? GPS (does it require Assistance)? Mobile data?

What happens if you need to use a lot of textures, a lot of pixmaps, or a very, very large pixmap or texture? A very prominent and popular website uses several extremely tall images, over 20,000 pixels on one side, to store its customisable skins, so this is a relevant question even if you “only” want a Web browser.
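
As a rough illustration of why such an image matters, here is a minimal sketch of the arithmetic. The 20,000-pixel height comes from the example above; the 64-pixel width, the 4 bytes per pixel and the 2048-pixel texture limit are assumptions chosen for illustration only. On a real device the limit is whatever glGetIntegerv(GL_MAX_TEXTURE_SIZE) reports.

```python
import math


def texture_bytes(width, height, bytes_per_pixel=4):
    """Uncompressed size of one image in bytes (no mipmaps, no row padding)."""
    return width * height * bytes_per_pixel


def tiles_needed(width, height, max_texture_size):
    """Number of tiles when the image is cut into pieces no larger than the GPU limit."""
    return math.ceil(width / max_texture_size) * math.ceil(height / max_texture_size)


# Hypothetical skin image: the height is from the text, the width is assumed.
WIDTH, HEIGHT = 64, 20000
# Assumed limit; query the real value with glGetIntegerv(GL_MAX_TEXTURE_SIZE).
MAX_TEXTURE_SIZE = 2048

if __name__ == "__main__":
    print("bytes:", texture_bytes(WIDTH, HEIGHT))
    print("tiles:", tiles_needed(WIDTH, HEIGHT, MAX_TEXTURE_SIZE))
```

Under these assumptions the image cannot be uploaded as a single texture and has to be split into ten tiles, which is work that somebody's software has to do.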

A lot of hardware fails to effectively support at least one of the above, but will seem attractive because it is cheap. But trying to add even one of these features after committing to the hardware spec will be *expensive*. Beware of false economies.

Tip 3: Pick hardware that works out of the box and can reliably support everything you will ask of it.

Get assurances from the vendor, in writing and tied into penalties for non-compliance, that the features you require will actually work at the performance you need. Remember also that it doesn’t matter if the vendor’s own tech demos show something working, if the drivers are so unreliable or non-standards-compliant that you can’t integrate it into your product.

Get the vendor to set the hardware up with your favoured operating system, with your engineers present (and your subcontractor, if they’re doing the work) so that they can later replicate it easily. At this stage, the features on your checklist should all be demonstrated working – individually is okay. If more than one hardware vendor is involved, get them all in the same room for this purpose.

Do this *before* starting billable engineering effort on software integration. Meanwhile, get your software working on those netbooks.

It should not take two weeks to figure out how to flash an OS image in and boot the device, followed by six man-months to make the 3D engine work. That’s what you’re paying the hardware vendor for.

Tip 4: Pick middleware carefully.

Most software toolkits, such as Adobe Flash, Qt, Gstreamer, and X11, offer a very rich array of capabilities to applications. They practically guarantee that if you ask them to do something, they will do it. You might think this is a good thing, and on the desktop it is a good thing.

But what they do not offer is any indication of whether your command will be done quickly or smoothly. What’s worse, most toolkits don’t provide any way for you to determine what can and can’t be done efficiently, a facility known as introspection. What a toolkit does efficiently doesn’t even always match up with the hardware’s capabilities.

There is one prominent graphics API which does not share this problem: OpenGL ES. The base API is designed specifically around the capabilities of common GPUs, and new GPUs are expected to accelerate these features as a minimum. Extra capabilities are explicitly advertised at runtime via queryable constants and extensions, and you can write simple test programs to see them yourself.
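
A test program of that kind mostly boils down to reading the space-separated string returned by glGetString(GL_EXTENSIONS) and checking it for the names you care about. The sketch below shows only the checking step, in Python, on a made-up sample string; the helper names are hypothetical, and a real string has to come from a live GLES context on the target device.

```python
def parse_extensions(extension_string):
    """Split a space-separated GL extension string into a set of names."""
    return set(extension_string.split())


def has_extension(extension_string, name):
    """True only if `name` appears as a whole token, not as part of a longer name."""
    return name in parse_extensions(extension_string)


# Illustrative string only; a real one comes from glGetString(GL_EXTENSIONS)
# and will differ between GPUs and driver versions.
SAMPLE = (
    "GL_OES_EGL_image_external "
    "GL_OES_texture_npot "
    "GL_OES_compressed_ETC1_RGB8_texture"
)

if __name__ == "__main__":
    for wanted in ("GL_OES_texture_npot", "GL_OES_EGL_image"):
        print(wanted, has_extension(SAMPLE, wanted))
```

Matching whole tokens matters: a naive substring search would report GL_OES_EGL_image as present in the sample above, because it is a prefix of GL_OES_EGL_image_external.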

GLES hardware vendors generally don’t advertise features which they haven’t managed to get running acceptably fast, for one simple reason: it risks games running slowly on their hardware. There is no such built-in restraint for most other APIs.

You can still make GLES run slowly by simply giving it too much to do, or by using a feature which is not expected to be fast (like reading back the contents of a texture or the framebuffer). But the hardware-centric design does make it far less likely that you will be surprised by it. At the very least, if you use GLES directly, you get to choose whether you need to read back the framebuffer.

GLES can be used for 2D UIs as well. The iPhone uses it to provide its famously slick UI, despite (in the older versions) having a slightly feeble ARM11 CPU. There’s absolutely no reason, in principle, why you can’t do the same.

Of course, you don’t have to use GLES if you don’t want to – after all, it can’t do absolutely everything. But if you are choosing another API because it supports more features, you should ask yourself exactly how it implements them, and whether you’ll get the performance you require.

Some APIs are designed to run well on top of GLES, explicitly using its strengths and avoiding its weaknesses. Others run into trouble when they stumble across something that they promise to do but the hardware can’t accelerate, and don’t think ahead to avoid a major penalty. A select few are actually performance-tested regularly, using real application traces; Cairo is among these.

Tip 5: Insist on usable video acceleration support (if you need video).

Many vendors provide some kind of video decode accelerator, which can often cope with typical H.264 video at 720p30, and some are now appearing with claims of 1080p30 support. ARM CPUs, even the latest multicore NEON-enabled versions, should not be expected to decode high-definition video unassisted.

You will need one of the following features to use your decoder:

1) Video-to-GLES-texture support. This is usually done via OpenMAX and various EGL extensions, and is essentially required for accelerated Adobe Flash support. Often called “textured video”.

2) Direct scaled output from the video decoder to the framebuffer or a hardware overlay. This is not sufficient for Adobe Flash support, but it is useful for many relatively simple applications, including two-way video calls. Note that you may need to scale small videos up and large videos down, so check that both work properly and look good.

3) Cached (or otherwise fast) CPU access to the video decoder’s output buffers. This is the only truly acceptable alternative to explicit video-texture support, as you can copy (or convert) the data into any point in the graphics pipeline.

We have not yet seen an implementation in this last category: the video decoder (along with the rest of the GPU) always seems to hang directly off the main bus rather than the CPU cache, and flushing the cache is not fast enough to make that a viable method of maintaining coherency. Note that standard uncached access is too slow to be useful.
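
To put a rough number on how much data the CPU would have to pull out of those buffers, here is a back-of-envelope sketch. It assumes the decoder emits 8-bit YUV 4:2:0 frames (12 bits per pixel) at 1280x720 and 1920x1080; real decoders may use other layouts, strides or padding, so treat the figures as an order-of-magnitude estimate only.

```python
def frame_bytes(width, height):
    """Bytes in one decoded 8-bit YUV 4:2:0 frame (12 bits per pixel, no padding)."""
    return width * height * 3 // 2


def stream_bytes_per_second(width, height, fps):
    """Bytes the CPU must read per second to touch every decoded frame once."""
    return frame_bytes(width, height) * fps


if __name__ == "__main__":
    for label, (w, h) in {"720p30": (1280, 720), "1080p30": (1920, 1080)}.items():
        print(label, stream_bytes_per_second(w, h, 30) / 1e6, "MB/s")
```

That comes to roughly 41 MB/s for 720p30 and 93 MB/s for 1080p30, before any conversion or copy into the graphics pipeline, and every byte of it has to be read through whatever access path the hardware offers.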

CPU and SoC vendors take note: including the GPU in the cache hierarchy makes sense, and that’s how Sandy Bridge does it, i.e. cache the DRAM, not the CPU. Or at least include a fast address-range cache flush and expose it via a kernel API or an unprivileged cp15 instruction. The problem we need to avoid is a full column-address (or even row-address) latency on the memory bus for every single load instruction in a performance-critical graphics routine.

Tip 6: When in doubt, ask an expert with a track record.

That’s us. :-)

In particular, if you have any doubt as to whether a particular toolkit or API uses the hardware (or the underlying APIs) efficiently and effectively – which is often not clear from the marketing claims or the desktop performance – we can probably investigate it for you.

Tip 7: Resist the temptation to add features once the project is underway.

Seriously, this has been the most basic rule of project management since The Mythical Man-Month was published decades ago. Yet we still see features being added mid-project, and these projects always end up adding months to their schedule.

Once you’ve specified your platform for a specific job, expanding that job runs a very high risk that the platform won’t live up to it. You can’t “just bolt on” a video player or a Flash plugin. You might not even be able to run the video decoder and the 3D engine at the same time. You might run into VRAM limitations if you add something as “simple” as permitting custom theming of the UI, or a marginally acceptable fillrate might be destroyed if you add a tiny translucent corner or shadow to a window. So think very carefully when considering any change to the spec.
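
The fillrate point is easy to check with simple arithmetic. The sketch below uses a deliberately crude model in which every layer is redrawn in full on every frame; the screen, window and shadow sizes and the 60 fps target are all hypothetical, and the extra cost of blending (which has to read the destination as well as write it) is not modelled at all.

```python
def pixels_per_second(layers, fps):
    """Total pixels written per second if every (width, height) layer is redrawn each frame."""
    return sum(width * height for width, height in layers) * fps


# Hypothetical sizes, for illustration only.
SCREEN = (800, 480)   # full-screen background
WINDOW = (600, 400)   # one opaque window on top
SHADOW = (620, 420)   # translucent shadow layer drawn behind the window
FPS = 60

if __name__ == "__main__":
    before = pixels_per_second([SCREEN, WINDOW], FPS)
    after = pixels_per_second([SCREEN, WINDOW, SHADOW], FPS)
    print("before:", before, "after:", after, "ratio:", round(after / before, 2))
```

With these numbers the shadow alone raises the pixel load from about 37 million to about 53 million pixels per second, which is exactly the kind of jump that turns a marginal fillrate into an unacceptable one.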

