Processing power is moving from the CPU to the GPU, writes Don Sambandaraksa
AMD has fired the first shots around a massively parallel computing architecture in the form of the ATi Radeon HD 4800 series GPGPU featuring a mind-blowing 800 cores (or shader units). A GPGPU, or General Purpose Graphics Processing Unit, blurs the distinction between CPU and GPU and promises to usher in an entirely new paradigm for programmers to learn. The future has arrived in a chip that delivers more than one teraflop of computing power, and, best of all, it has arrived in the form of a $200 (6,700 baht) mid-range graphics card.
Paul Ayscough, director for technical marketing at AMD, explained how the 4800 marks the beginning of a new design philosophy. Rather than aiming at the top-end market with a huge, monolithic and power-hungry chip, AMD is aiming at the mid-range market with its 4800 series.
Large chips mean fewer chips per wafer and lower yields (the percentage of chips that work), and they draw more power, which has led to today's 300-watt designs requiring two-slot cooling solutions.
On the software side, the traditional approach - where last year's chips are reduced in price and continue to be sold - means fragmentation in software and a lag of six to 12 months before the latest features of the new chip are used.
The 4800 series avoids this by being designed as a mid-range chip. A future "R700" high-end solution will involve taking two, three or more 4800-series chips and running them in parallel to increase performance. This means that the R700, the 4870, the 4850 and a future lower-end chip can all run the same software code base, including DirectX 10.1, which drastically simplifies game development and enriches the gaming experience.
DirectX 10.1 features a new technique called tessellation. This means that the number of polygons that make up an object is greater for objects near the viewpoint and smaller for those far away. This not only makes frames more realistic but also speeds up processing.
Ayscough also noted that rather than focusing on raw performance, as in the past, the new design philosophy focuses on performance per watt and performance per square millimetre of silicon.
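The idea of spending polygons where the viewer can see them can be sketched in a few lines. This is only an illustration of distance-based level of detail, not how DirectX or the Radeon hardware implements it; the function names, the linear falloff and the default distances and levels are all assumptions made for the example.

```python
def tessellation_level(distance, near=1.0, far=100.0, max_level=16, min_level=1):
    """Return a subdivision level that falls as the object moves away.

    All parameters are hypothetical; a linear falloff is assumed.
    """
    # Clamp the distance into the [near, far] range.
    d = min(max(distance, near), far)
    # Blend factor: 0.0 at the near distance, 1.0 at the far distance.
    t = (d - near) / (far - near)
    level = max_level + t * (min_level - max_level)
    return max(min_level, round(level))


def triangle_count(base_triangles, level):
    """Triangles after splitting every edge of each triangle into `level` segments."""
    # Uniform subdivision turns one triangle into level * level smaller ones.
    return base_triangles * level * level


if __name__ == "__main__":
    for distance in (1.0, 25.0, 50.0, 100.0):
        level = tessellation_level(distance)
        print(distance, level, triangle_count(100, level))
```

Under these assumptions a 100-triangle object is drawn with far more triangles up close than at the far distance, which is where the saving in processing comes from.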
AMD showed a short movie of a robot rampaging through a virtual New York, with cinema-grade graphics rendered in real time (24 frames per second). Hollywood would take hours for each frame and weeks for such a project on its current render-farm technology. A mini documentary about scorpions in a cage was also shown, almost indistinguishable from a real-life shot, animals, dust, dirt on the mirror and all. PICTURES COURTESY OF AMD
AMD showed off a mini documentary about scorpions in a cage, all rendered in real time (24 frames per second) and indistinguishable from a real-life shot, complete with dust, flare and dirt on the glass.
The massive multicore future is here today. This unassuming ATi Radeon 4850 packs 800 cores and over one teraflop of computing power for around $200. Future versions will use multiple chips on the same card so programmers can write a single code base for low- to high-end graphics cards. AMD is working with Intel, nVidia, Apple and others in defining the programming languages and paradigms of this massively multi-core future.
"Even though we have made an efficient chip, it is also probably the most powerful chip in the world at one teraflop. That is one trillion floating point operations per second. In 1996, just 12 years ago, ASCI Red was the world's first teraflop supercomputer, filling a room and taking 500 kilowatts of power to run and another 500kW to cool. Today, this teraflop computer takes just 110 watts of power and is the size of my fingernail. It also costs $199, which is a bit cheaper than the original teraflop computer," he said.
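The performance-per-watt comparison in that quote can be worked through directly. The sketch below uses only the figures Ayscough gave (one teraflop, 500 kilowatts to run plus 500 kilowatts to cool, and 110 watts); treating both machines as delivering exactly one teraflop is a simplifying assumption, and the variable names are made up for the example.

```python
TERAFLOP = 1e12  # one trillion floating point operations per second


def flops_per_watt(flops, watts):
    """Floating point operations per second delivered for each watt drawn."""
    return flops / watts


# ASCI Red, 1996: 500 kW to run plus another 500 kW to cool (figures from the quote).
asci_red = flops_per_watt(TERAFLOP, 500_000 + 500_000)

# Radeon HD 4800 series chip: 110 W (figure from the quote).
radeon = flops_per_watt(TERAFLOP, 110)

improvement = radeon / asci_red

if __name__ == "__main__":
    print(f"ASCI Red: {asci_red:,.0f} flops per watt")
    print(f"Radeon:   {radeon:,.0f} flops per watt")
    print(f"Ratio:    {improvement:,.0f}x")
```

On these numbers the graphics chip does roughly 9,000 times more work per watt than the 1996 supercomputer did.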
The other key difference is that it allows general purpose computing on a graphics card.
Terry Makedon, manager for product development, explained that the chip, code-named "Makedon", has 956 million transistors and is made on a 55nm process.
The main differences between the Radeon 4850 and the 4870 are clock speed (625MHz vs 750MHz) and memory type (GDDR3 vs GDDR5).
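The teraflop figure follows from the core count and the clock speed. The sketch below assumes that each of the 800 shader units completes one multiply-add, counted as two floating point operations, on every clock cycle; that assumption is not stated in the article, and the function name is invented for the example. It gives a theoretical peak, not measured performance.

```python
def peak_gflops(shader_units, clock_mhz, flops_per_unit_per_clock=2):
    """Theoretical peak in gigaflops.

    Assumes one multiply-add (two floating point operations) per
    shader unit per clock cycle.
    """
    return shader_units * clock_mhz * flops_per_unit_per_clock / 1000.0


if __name__ == "__main__":
    print("Radeon 4850:", peak_gflops(800, 625), "gigaflops")
    print("Radeon 4870:", peak_gflops(800, 750), "gigaflops")
```

With those inputs the 4850 comes out at 1,000 gigaflops, or one teraflop, and the 4870 at 1,200 gigaflops.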
Cinema 2.0
Makedon demonstrated a glimpse of the future of cinema with two short movies: a mini documentary about scorpions, and one of the ATi mascot, Ruby, running away from a robot rampaging through New York. The quality was very good and highly lifelike, especially in the scorpion documentary. More significantly, this movie-grade rendering was done in real time, which means 24 frames per second.
Today, Hollywood relies on huge rendering farms. Each frame can take hours to render, and even a short film can take weeks or months.
"We're trying to work with Hollywood to deliver video that looks more cinematic and more like a movie than anything you've ever seen before," he said.
More than graphics
Makedon said that this GPGPU could do much more than play games and show movies.
Another demonstration was transcoding, using PowerDVD to turn four 1080p WMV files into 1080i MPEG-2 files at the same time, faster than real time. All the work was done on the GPGPU.
Folding@home, a distributed medical research project run by Stanford University, uses spare time on home PCs to simulate how proteins fold, with the aim of finding new drugs and identifying which genes cause cancer. These calculations are perfect for the Radeon, and even the previous-generation Radeon HD 3800 series GPU was already faster than a PlayStation 3 with its much-vaunted IBM Cell supercomputer-on-a-chip processor.
But the most impressive demonstration was a simple virtual world with "Froglins" (frog/goblins) who go about their daily work of mining virtual gold and piling it in the virtual town centre. The unique point is that the Froglins' artificial intelligence runs not on the CPU, but on the GPGPU. When a ghost appeared in the world, the Froglins all ran away in fear, all based on the AI running on the graphics card's 800 cores.
A new future for programming
AMD is working to develop a new standard called OpenCL, along with Apple, Intel and nVidia. AMD has provided its CAL software development kit and Brook+ compiler to the project. Software vendors Adobe and Cyberlink (makers of PowerDVD) are also on board, as is Stanford University. In addition to GPGPUs, OpenCL will have an enormous impact on cloud computing and high-end financial programming in the future.
On a parallel track, AMD and Intel are working on the Havok physics engine to enable GPGPUs to run physics calculations (such as flying debris and exploding shot bodies) and free up the CPU.
Ayscough said that the engagement with Intel will start at the end of June and Intel is also working on its own massively multicore chip.
"The specific nature of what is better to put on a CPU and what is better to put on a GPU is a new science that has not yet been determined. If you have something that can be split up to work on a massive set of cores, that is when the GPU will have an advantage. Where it is a single thread of software that requires a calculation on top of a calculation, it is something that is more linear and will stay on the CPU."
Today, software and hardware companies alike are ploughing huge amounts of money into research on massively multi-core programming, and Stanford and other big universities in California are focusing on a multi-core future.
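The distinction Ayscough draws can be shown with two small functions. This is only a sketch of the shape of the two kinds of work: the function names and the sample tasks are invented, and a pool of CPU threads stands in for the GPU's cores, so it shows how the work divides and makes no claim about speed.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    """Brighten one pixel value; the result depends on that pixel alone."""
    return min(255, pixel + 40)


def brighten_all(pixels, workers=4):
    """Work that can be split up: every element is independent of the others."""
    # Each worker handles pixels on its own; a GPU would spread the
    # same work across hundreds of cores instead of a few threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))


def compound(balance, rates):
    """A calculation on top of a calculation: each step needs the last result."""
    for rate in rates:
        # This step cannot start until the previous one has finished,
        # so extra cores would not help.
        balance = balance * (1 + rate)
    return balance


if __name__ == "__main__":
    print(brighten_all([0, 100, 250]))
    print(compound(100.0, [0.5, 0.5]))
```

The first function is the kind of job that suits a massively multi-core chip; the second is the linear kind that stays on the CPU.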