Friday, October 28, 2011

ATI Radeon HD 6970 2GB Review

The previous Radeon HD 6800-series disappointed us in two ways – firstly it sounded like it’d be more than a mid-range GPU that went toe-to-toe with Nvidia’s GeForce GTX 460 cards, and secondly it wasn’t all that new. But while the Barts GPU of the HD 6800 looked more like an overclocked HD 5830 with a tweaked front-end unit, the Cayman GPU of the Radeon HD 6900 is a completely new design throughout.

Before we plunge into the details of the new GPU, let’s clear up the naming. We’re sticking with calling these cards ATI Radeons because that’s what most people know them as, and AMD has said that it’s fine with its partners gradually transitioning from ATI to AMD Graphics until 2011. We still plan to switch when AMD’s Fusion APU is launched in early 2011, as it’ll be silly to talk about a single piece of silicon that has both AMD and ATI technology.



Previously, ATI has reserved the HD x900 family branding for its dual-GPU cards to give a bolder indication to the customer that a dual-GPU card should be much faster than a single-GPU HD x800-series card. However, because ATI needed the HD 5700-series to co-exist with the HD 6000-series, ‘Radeon HD 6800’ was already taken. The Radeon HD 6950 2GB and Radeon HD 6970 2GB are therefore both single-GPU cards. There is a plan to release a dual-GPU card based on two Cayman GPUs, which we assume will be called the Radeon HD 6990 4GB. With the names out of the way, let’s get stuck into what’s inside the silicon that makes a Cayman worthy of its HD 6900 name.

A Dual Front-End Design

While Nvidia went nuts with its Fermi design, breaking apart the elements of a typical GPU front-end and scattering them throughout the chip, ATI has been much more reserved. However, we’ve been expecting ATI to get a bit more radical for a while – after all, a GPU with only one tessellator and one setup engine is starting to look a bit anachronistic these days. While the Barts GPU merely had a tessellator upgrade (to what ATI calls its 8th Gen Tessellator, which allows off-chip buffering), the Cayman design of the Radeon HD 6900-series has two entire front-end units.

There are some obvious advantages of having two front-end engines: you get two setup engines, two 8th Gen Tessellators, two geometry engines and the ability to send twice as much work per clock to the stream processors as before. ATI claims that the HD 6970 2GB has up to three times the tessellation rate of the HD 5870 1GB. These two front-end units can also load-balance the work flowing to them, and ATI has implemented ‘asynchronous dispatch’.

The new Cayman GPU of the Radeon HD 6970 2GB has two Front-End Engines.

Asynchronous dispatch is similar to the ability of Nvidia’s Fermi GPUs to work on two distinct kernels simultaneously. However, ATI says that its technology is ‘completely new in the marketplace’ as it allows multiple, different programs to execute on the GPU at the same time. ‘It’s not like other solutions where you [only] have one program that can spawn multiple kernels to run on the graphics card, you can genuinely have multiple, different applications running on the GPU at the same time,’ Dave Bauman, Senior Product Manager, told us.


This should make the GPU more flexible for general-purpose work – the GPU will act more like a modern CPU. The GPU’s two bidirectional DMA (Direct Memory Access) engines are also pitched as enhancing its compute capabilities, as each one allows two simultaneous reads, two simultaneous writes, or a simultaneous read and write.

However, we’re unsure of the need for such advanced capabilities when it comes to gaming. A game runs as a single application, meaning that Nvidia’s technology is perfectly adequate – a game can invoke a DirectCompute kernel and throw it, as well as DirectX shader code, at a Fermi GPU without concern.

Where the ATI technology will be useful is if many different applications try to use your graphics card, especially if they don’t have the courtesy to wait until you’ve finished gaming before doing so. It’s not impossible that anti-virus applications could be written in OpenCL, for example, as virus scanning







Upgraded Stream Processors

The four stream processors are the red units,  with the Branch Unit to the left and the General Purpose register below.
The doubled-up capability of Cayman’s front-end has been twinned with a new stream processor layout. Even the Barts GPU of the HD 6800-series used the familiar VLIW5 (Very Long Instruction Word 5-way) design, which arranged the stream processors in five-wide groups.

Each of these groups contained one ‘T-Unit’ super-stream processor to handle double-precision calculations and similarly long and high-precision work. ATI arranged 16 of these groups into each SIMD (Single Instruction, Multiple Data) Engine, of which the Radeon HD 5870 1GB had 20 to give it its 1,600 stream processor count.

The Cayman GPU uses a new VLIW4 design, meaning that it groups its stream processors in fours, for 4-way co-issue. ATI has also ditched the idea of the T-Unit, as all four of the stream processors in a VLIW4 group have the same abilities. This means that the double-precision rate of the HD 6900 is higher than that of previous ATI GPUs – ATI claims it’s one quarter of the single-precision rate.

Bizarrely, this new arrangement has meant that the new top-end HD 6970 2GB, despite its dual Front-End Engines, actually has fewer stream processors than the Radeon HD 5870 1GB – only 1,536 rather than 1,600. However, ATI says that the VLIW4 layout delivers 10 per cent more speed per mm², and that the simpler ‘all smart’ stream processor capabilities make for simpler register and scheduling management. The top-end HD 6970 has 24 SIMD Engines for its total of 1,536 stream processors, while the lesser HD 6950 has 22 SIMD Engines for 1,408 stream processors.
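To see where those totals come from, here’s a quick sketch of the arithmetic in Python. It assumes 16 groups per SIMD Engine for both layouts (which is consistent with the totals ATI quotes), five stream processors per VLIW5 group and four per VLIW4 group. The function and constant names are ours, purely for illustration.

```python
# Stream processor count = SIMD Engines x groups per engine x units per group.
# Assumption: 16 groups per SIMD Engine for both VLIW5 and VLIW4 designs.
GROUPS_PER_SIMD_ENGINE = 16


def stream_processors(simd_engines: int, units_per_group: int) -> int:
    """Return the total stream processor count for a given layout."""
    return simd_engines * GROUPS_PER_SIMD_ENGINE * units_per_group


hd5870 = stream_processors(simd_engines=20, units_per_group=5)  # VLIW5
hd6970 = stream_processors(simd_engines=24, units_per_group=4)  # VLIW4
hd6950 = stream_processors(simd_engines=22, units_per_group=4)  # VLIW4

print(hd5870, hd6970, hd6950)  # 1600 1536 1408
```

The HD 6970 has more SIMD Engines than the HD 5870, but each group is one stream processor narrower, which is why the total ends up slightly lower.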



New ROPs and Anti-Aliasing

As if doubling the front-end capabilities of the Cayman GPU while radically overhauling the stream processor layout wasn’t enough, ATI has also upgraded the ROPs at the back-end of the GPU. It’s not entirely clear whether the upgrades are just to enable yet another new form of AA or to also boost the performance of current, standard AA techniques. However, ATI claims that write operations can be coalesced, that 16-bit integer operations are two times faster on a HD 6900 than previous GPUs, and that 32-bit floating point operations are 2-4 times faster.

The memory interface is still 256-bit wide, though the GDDR5 memory of the two HD 6900-series cards runs comparatively fast to deliver plenty of memory bandwidth. The 5.5GHz (effective) memory of the HD 6970 2GB gives it 176GB/sec of memory bandwidth, while the HD 6950 2GB has 160GB/sec of memory bandwidth thanks to its 5GHz (effective) memory. Both cards have 2GB of memory, which is great news for memory-intensive games, high-resolution gaming and using shed-loads of AA.
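Those bandwidth figures follow directly from the effective memory frequency and the width of the memory interface. As a minimal sketch in Python (the function name is ours, and it assumes one transfer per effective clock across the whole bus):

```python
def memory_bandwidth_gb_per_sec(effective_ghz: float, bus_width_bits: int) -> float:
    """Peak bandwidth: transfers per second x bytes moved per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_ghz * bytes_per_transfer


# 5.5GHz effective on a 256-bit interface (HD 6970 2GB)
print(memory_bandwidth_gb_per_sec(5.5, 256))  # 176.0
# 5GHz effective on a 256-bit interface (the slower HD 6900-series card)
print(memory_bandwidth_gb_per_sec(5.0, 256))  # 160.0
```

These are theoretical peaks; the bandwidth a game actually sees will be lower.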


The new AA technique is grandly titled Enhanced Quality Anti-Aliasing, and it promises improved AA quality with very little penalty. EQAA is said to have such a small impact on performance because, although it takes twice as many coverage samples as normal, it deletes most of these values once it’s worked out which is the most accurate and saves only that.

This cuts down on the amount of data being processed and on the stress placed on the memory interface, but delivers better data to work with.

EQAA is only available on the HD 6900-series, but these cards also support the Morphological AA technique that was introduced with the HD 6800-series. Oddly, ATI is calling Morphological AA ‘MLAA’, and it’s recommending using MLAA in conjunction with other AA techniques.

For example, 4x MSAA is very good at picking up fine lines that aren’t quite one pixel wide (as it’s sampling four points within each pixel). ATI says that even 4x MSAA still produces some jagged edges, though, which MLAA can smooth out.





PowerTune

Judging by the amount of time we and fellow journalists spent asking questions about this new feature, we’ll try to take our time with this one and go for clarity. PowerTune is being pitched by ATI as a way to deliver more performance in typical applications by managing the maximum power draw of the GPU, and therefore the card. That’s true… from a certain point of view.

Just as Nvidia did with the GeForce GTX 570 1.3GB and the GeForce GTX 580 1.5GB, ATI has set an upper power draw limit for the two cards of the HD 6900-series. If at any time the GPU starts to consume any more than this, the GPU will automatically clock itself down to reduce the power draw. ATI says this allows it to set higher frequencies for its GPUs, as it no longer has to consider worst-case, thermal-virus-like applications.

For example, the HD 6970 2GB has a default frequency of 880MHz and it should run at this frequency whether it’s churning through World of Warcraft, Arma II or Bad Company 2. However, if the GPU should encounter something a little more taxing (and no, Crysis doesn’t count) the GPU might drop to a frequency of 800MHz. Previously, ATI would have had to set the frequency to 790MHz, but now it can deliver 90MHz more of gaming performance, hence the claims that PowerTune helps to deliver more performance.
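ATI hasn’t detailed the algorithm, so the following Python sketch is only a toy model of the idea rather than how PowerTune actually works. It assumes that power draw scales linearly with clock speed (real silicon is more complicated), uses the 250W maximum power draw from the specifications table as the limit, and invents a 275W ‘nasty’ workload purely for illustration. The function name is ours.

```python
def clock_under_power_limit(default_mhz: float,
                            draw_at_default_w: float,
                            limit_w: float) -> float:
    """Toy model: return the clock that keeps power draw within the limit.

    Assumes power scales linearly with frequency, which is a simplification.
    """
    if draw_at_default_w <= limit_w:
        # Typical games: under the limit, so the default clock is kept.
        return default_mhz
    # Over the limit: scale the clock down until the draw meets the limit.
    return default_mhz * limit_w / draw_at_default_w


# A typical game drawing a hypothetical 200W stays at the default 880MHz.
print(clock_under_power_limit(880, 200, 250))
# A hypothetical 275W workload is throttled to 800MHz.
print(clock_under_power_limit(880, 275, 250))
```

The point of the design is visible even in this crude model: only the rare workload that exceeds the limit pays a clock penalty, so the default frequency no longer has to be set for the worst case.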



However, there are a few differences between Nvidia’s and ATI’s technologies. For a start, ATI’s technology is entirely hardware-based. There are activity monitors in every part of the GPU and, ‘based on an algorithm, we calculate how much power the GPU is actually drawing at any one time… By having [PowerTune] as something completely calculated on the GPU, this also means we can accommodate any future application that may have a higher power draw as well,’ Bauman told us.

Even better, ATI Catalyst Control Center gives you control over the maximum power draw of your graphics card. You can crank the upper limit up by 20 per cent to give the HD 6970 2GB the maximum 300W of the PCI Express 2.0 standard and allow maximum performance even in the nastiest of applications (no, Crysis still doesn’t count). Alternatively, you could drop the power draw limit by up to 20 per cent to save power when playing undemanding games.
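As a rough illustration of that adjustment range, the Python sketch below applies a percentage offset to a default limit. It assumes the HD 6970 2GB’s default limit is the 250W maximum power draw listed in the specifications table; the function name and the error handling are ours and don’t reflect how Catalyst Control Center is implemented.

```python
MAX_ADJUSTMENT_PERCENT = 20  # the range described in the article, up or down


def adjusted_power_limit(default_limit_w: float, adjustment_percent: float) -> float:
    """Return the power limit after applying a percentage adjustment."""
    if abs(adjustment_percent) > MAX_ADJUSTMENT_PERCENT:
        raise ValueError("adjustment must be between -20 and +20 per cent")
    return default_limit_w * (100 + adjustment_percent) / 100


print(adjusted_power_limit(250, 20))   # 300.0, the PCI Express 2.0 maximum
print(adjusted_power_limit(250, -20))  # 200.0
```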

Twin BIOSes, Display Outputs and a Vapour Chamber Cooler

All HD 6900-series cards should have two BIOSes, as ATI thinks that a significant number of owners of high-end graphics cards like to flash their vBIOS with updated clock speeds and fan management profiles. Sometimes these vBIOS flashes go wrong, though, and you’re left with a particularly expensive desk ornament. A switch on a HD 6900-series card allows you to switch to a safe back-up vBIOS.

HIS and XFX managed to drop us some cards, and Sapphire has a
special bundle for its HD 6970 2GB card that we'll cover separately.

At this point it seems that the two vBIOSes are distinct, and the safe vBIOS won’t overwrite a bricked second chip as some motherboards do. We were going to test this, but we thought we’d better benchmark the cards before breaking them.

HD 6900-series cards should ship with the same output configuration as the HD 6800-series – two DVI outputs (annoyingly, only one of which is dual-link) plus two mini-DisplayPort 1.2 outputs and an HDMI 1.4a output. The latter allows 3D, while DisplayPort 1.2 allows daisy-chaining via screens that support this or a Multi Stream Transport (MST) hub. There’s still no news on when we’ll see an MST hub, or how much it’ll cost; we’ve also not heard anything about a HD 6000-series Eyefinity 6 card.

ATI has used what it describes as a ‘massive’ vapour chamber cooler for the HD 6900-series, which should help keep down the noise while providing top-notch cooling. Skip to the HD 6970 Thermals page to see how effective this cooler is.






Radeon HD 6970 2GB Specifications


While the Barts GPU of the HD 6800-series was a bit same-old, same-old, the new Cayman GPU of the HD 6900 is a big change. For a start, there are two entire Front-End Engines, so the Cayman GPU has two triangle setup units, two tessellators, two rasterisers and can send twice as much work per clock to the stream processors as previous Radeon GPUs.

Those stream processors have been upgraded, so they’re all capable of the high-precision work that only a fifth of the stream processors of previous generations were able to calculate. They’re organised in groups of four, with 16 of these groups per SIMD Engine (or stream processor cluster).

While this means that the HD 6970 2GB has fewer stream processors than the HD 5870 1GB, they’re more capable. ATI says the new layout, called VLIW4, is 10 per cent faster per mm².

The ROPs have seen upgrades too, and both HD 6900-series cards will have 2GB of GDDR5 memory running at comparatively high frequencies to extract a large amount of memory bandwidth from the 256-bit memory interface.

| | Nvidia GeForce GTX 580 1.5GB | ATI Radeon HD 6970 2GB | Nvidia GeForce GTX 480 1.5GB | ATI Radeon HD 6950 2GB | ATI Radeon HD 5870 1GB | ATI Radeon HD 6870 1GB |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | | | | | | |
| Codename | GF110 | Cayman XT | GF100 | Cayman Pro | Cypress XT | Barts XT |
| Frequency | 772MHz | 880MHz | 700MHz | 800MHz | 850MHz | 900MHz |
| Stream Processors | 512 (1,544MHz) | 1,536 (880MHz) | 480 (1.4GHz) | 1,408 (800MHz) | 1,600 (850MHz) | 1,120 (900MHz) |
| Layout | 16 SMs, 4 GPCs | 24 SIMD Engines | 15 SMs, 4 GPCs | 22 SIMD Engines | 20 SIMD Engines | 14 SIMD Engines |
| Rasterisers | 4 | 2 | 4 | 2 | 2 | 2 |
| Tessellation Units | 16 | 2 | 15 | 2 | 1 | 1 |
| Texture Units | 64 | 96 | 60 | 88 | 80 | 56 |
| ROPs | 48 | 32 | 48 | 32 | 32 | 32 |
| Transistors | 3bn | 2.6bn | 3bn | 2.6bn | 2.15bn | 1.7bn |
| Size | XXXmm² | Unknown | 530mm² | Unknown | 334mm² | 255mm² |
| Process | 40nm | 40nm | 40nm | 40nm | 40nm | 40nm |
| Memory | | | | | | |
| Amount | 1.5GB GDDR5 | 2GB GDDR5 | 1.5GB GDDR5 | 2GB GDDR5 | 1GB GDDR5 | 1GB GDDR5 |
| Frequency | 1.02GHz (4.08GHz effective) | 1.375GHz (5.5GHz effective) | 924MHz (3.7GHz effective) | 1.25GHz (5GHz effective) | 1,050MHz (4.2GHz effective) | 1,050MHz (4.2GHz effective) |
| Interface | 384-bit | 256-bit | 384-bit | 256-bit | 256-bit | 256-bit |
| Bandwidth | 192.4GB/sec | 176GB/sec | 177GB/sec | 160GB/sec | 134.4GB/sec | 134.4GB/sec |
| Card Specifications | | | | | | |
| Power Connectors | 1 x 6-pin, 1 x 8-pin | 1 x 6-pin, 1 x 8-pin | 1 x 6-pin, 1 x 8-pin | 2 x 6-pin PCI-E | 2 x 6-pin PCI-E | 2 x 6-pin PCI-E |
| Maximum Power Draw | 244W | 250W | 250W | 225W | 188W | 151W |
| Idle Power Draw | Unknown | Unknown | Unknown | Unknown | 27W | 19W |
| Recommended PSU | 600W | Unknown | 600W | Unknown | 500W | Unknown |
| Typical Street Price | £399 | £310 | £330 | £235 | £320 | £220 |
