Video cards. Test configuration, tools and testing methodology

AMD has opened a new battle for dominance in the GPU market. This time, though, both manufacturers, AMD and NVIDIA, face new challenges and new conditions. AMD in particular has to move to 28 nm production and, as it turned out, to a completely new GPU architecture at the same time. NVIDIA also plans to move to 28 nm with a new architecture, but only in a few months. AMD got there first, and in this article we look at the new generation of GPUs in the form of the AMD Radeon HD 7970.

AMD believes that PC gaming is heading for a boom in the near term, especially since consoles are refreshed only rarely. And because modern graphics engines benefit from the capabilities of advanced graphics cards, this trend will only intensify. The PC games market was worth $15 billion last year and is expected to grow to $20 billion by 2013. Gamers also prefer to play at ever higher resolutions: 1080p has already become the de facto standard, helped along by large displays that are rapidly getting cheaper. In addition, AMD is focusing on higher GPU efficiency and GPU computing capabilities. The latter area is very important for AMD today, as the company wants to get around the limitations of the Cayman architecture.


At the moment, AMD has only introduced the Radeon HD 7970, as you can see on the slide, but new graphics cards should appear in the Radeon HD 7900 line soon.

Specification | NVIDIA GeForce GTX 570 | NVIDIA GeForce GTX 580 | AMD Radeon HD 6950 | AMD Radeon HD 6970 | AMD Radeon HD 7970
GPU | GF110 | GF110 | Cayman PRO | Cayman XT | Tahiti XT
Process technology | 40 nm | 40 nm | 40 nm | 40 nm | 28 nm
Number of transistors | 3 billion | 3 billion | 2.6 billion | 2.6 billion | 4.3 billion
Die area | 530 mm² | 530 mm² | 389 mm² | 389 mm² | 365 mm²
GPU clock speed | 732 MHz | 772 MHz | 800 MHz | 880 MHz | 925 MHz
Memory clock | 950 MHz | 1000 MHz | 1250 MHz | 1375 MHz | 1375 MHz
Memory type | GDDR5 | GDDR5 | GDDR5 | GDDR5 | GDDR5
Memory | 1280 MB | 1536 MB | 2048 MB | 2048 MB | 3072 MB
Memory bus width | 320 bit | 384 bit | 256 bit | 256 bit | 384 bit
Memory bandwidth | 152 GB/s | 192 GB/s | 160 GB/s | 176 GB/s | 264 GB/s
Shader Model | 5.0 | 5.0 | 5.0 | 5.0 | 5.0
DirectX | 11 | 11 | 11 | 11 | 11.1
Number of stream processors | 480 (1D) | 512 (1D) | 1408 (352 4D) | 1536 (384 4D) | 2048 (1D)
Clock speed of stream processors | 1464 MHz | 1544 MHz | 800 MHz | 880 MHz | 925 MHz
Number of texture units | 60 | 64 | 88 | 96 | 128
Number of ROPs | 40 | 48 | 32 | 32 | 32
Maximum power consumption | 219 W | 244 W | 200 W | 250 W | 250 W
Minimum power consumption | - | 30-32 W | 20 W | 20 W | 2.6 W
CrossFire/SLI | SLI | SLI | CrossFireX | CrossFireX | CrossFireX

The Radeon HD 7970 graphics card is based on the "Tahiti XT" GPU, which is manufactured on a 28 nm process. In total, the GPU has 4.3 billion transistors. For comparison, Intel's Sandy Bridge-E processors (excluding quad-core models) have 2.27 billion transistors, and the predecessor from the Cayman-based Radeon HD 6900 family made do with 2.6 billion. The die area is 365 mm², slightly less than the 389 mm² of the "Cayman" GPUs, which are manufactured on the 40 nm process. NVIDIA's GF110 GPU contains 3 billion transistors in an area of 530 mm².

Most of the GPU transistor budget was spent on the 2048 stream processors. The GPU and stream processors operate at a clock speed of 925 MHz. AMD decided to keep the same memory as on the Radeon HD 6970, i.e. GDDR5 at 1375 MHz. But the memory interface has been widened from 256 bits to 384 bits, increasing the memory bandwidth to 264 GB/s. In addition, the capacity has grown from 2048 MB to 3072 MB. The Radeon HD 7970 has 128 texture units and 32 raster operations pipelines (ROPs): the number of texture units has grown compared to the Radeon HD 6970, but the number of ROPs remains the same.

AMD lists the maximum power draw of the Radeon HD 7970 at 250 W, which is also the PowerTune limit. Typical power consumption of the card is 210 W. Recall that the Radeon HD 6970 had a maximum power consumption of 250 W and a typical consumption under load of 190 W. Thanks to ZeroCore Power technology (more on that below), power consumption in idle mode does not exceed three watts.
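The bandwidth figures quoted above follow directly from the memory clock and the bus width. Here is a minimal sketch of that arithmetic, assuming GDDR5 transfers four bits per pin per memory clock; the function name is ours and purely illustrative:

```python
def gddr5_bandwidth_gb_s(memory_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s.

    Assumption: GDDR5 moves 4 bits per pin per memory clock, so 1375 MHz
    corresponds to an effective 5500 MT/s.
    """
    transfers_per_second = memory_clock_mhz * 1e6 * 4
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


hd6970 = gddr5_bandwidth_gb_s(1375, 256)  # same memory clock, 256-bit bus
hd7970 = gddr5_bandwidth_gb_s(1375, 384)  # same memory clock, 384-bit bus
print(f"HD 6970: {hd6970:.0f} GB/s, HD 7970: {hd7970:.0f} GB/s")
```

With the memory clock unchanged, the entire gain comes from the 1.5x wider bus.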

GPU-Z 0.5.7, as you can see in the screenshot, does not display all AMD Radeon HD 7970 data correctly. On our Socket 1366 test system, the interface was listed as PCI Express 3.0 x16 and the clock speed as 500 MHz. The pixel and texture fill rates shown are also incorrect. The correct values are 925 MHz for the GPU, 29.6 Gpixel/s and 118.4 Gtexel/s.
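The corrected fill rates can be checked by multiplying the core clock by the number of ROPs and texture units. This is a rough sketch, assuming one pixel per ROP and one texel per texture unit per clock; the helper name is hypothetical:

```python
def fill_rate_giga(units: int, core_clock_mhz: float) -> float:
    """Peak fill rate in giga-operations per second.

    Assumption: every unit (ROP or texture unit) completes one operation
    per core clock.
    """
    return units * core_clock_mhz / 1000


pixel_rate = fill_rate_giga(32, 925)   # 32 ROPs
texel_rate = fill_rate_giga(128, 925)  # 128 texture units
print(f"{pixel_rate:.1f} Gpixel/s, {texel_rate:.1f} Gtexel/s")
```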

Rumors about an updated Radeon HD 7970 have been circulating for a while, and at Computex 2012 it seemed that everyone was talking about it. We mean, of course, the Radeon HD 7970 GHz Edition. AMD has been producing "Southern Islands" GPUs on the 28 nm process at TSMC for several months now, which is enough time to optimize the manufacturing process and increase the yield of chips. Besides, the high performance of NVIDIA's GeForce GTX 680 forced AMD to look for a new, faster version of the Radeon HD 7970 to compete with it. In our review, we will look at how worthy an opponent the Radeon HD 7970 GHz Edition is for the GeForce GTX 680 and what improvements it brings over the standard HD 7970.

Manufacturers who have already made a name for themselves with factory-overclocked graphics cards are planning to do the same with the new Radeon HD 7970 GHz Edition. AMD has clearly set its sights on pushing GPU frequencies above the 1 GHz bar while maintaining the same voltage levels as the original model. This applies both to manual overclocking by enthusiasts and to factory overclocking by video card manufacturers. The "old" Radeon HD 7970 will remain on sale for now, but AMD is positioning the GHz Edition one step higher in performance and, accordingly, in price.

The technical specifications are shown in the following table:


Specification | NVIDIA GeForce GTX 680 | AMD Radeon HD 7970 | AMD Radeon HD 7970 GHz Edition
Retail price | about 460 euros in Europe, about 18.5 thousand rubles in Russia | about 380 euros in Europe, about 17 thousand rubles in Russia | $499
Product webpage | NVIDIA | AMD | AMD
Technical specifications
GPU | GK104 (GK104-400-A2) | Tahiti XT | Tahiti XT2
Process technology | 28 nm | 28 nm | 28 nm
Number of transistors | 3.54 billion | 4.3 billion | 4.3 billion
GPU clock speed | 1006 MHz (Boost: 1058 MHz) | 925 MHz | 1000 MHz (Boost: 1050 MHz)
Memory clock | 1502 MHz | 1375 MHz | 1500 MHz
Memory type | GDDR5 | GDDR5 | GDDR5
Memory | 2048 MB | 3072 MB | 3072 MB
Memory bus width | 256 bit | 384 bit | 384 bit
Memory bandwidth | 192.3 GB/s | 264 GB/s | 288 GB/s
DirectX version | 11.1 | 11.1 | 11.1
Stream processors | 1536 (1D) | 2048 (1D) | 2048 (1D)
Texture units | 128 | 128 | 128
ROPs | 32 | 32 | 32
Pixel fill rate | 32.2 Gpixel/s | 29.6 Gpixel/s | 33.6 Gpixel/s
Minimum power consumption | 15 W | 2.6 W | 2.6 W
Maximum power consumption | 195 W | 250 W | 250 W
SLI/CrossFire | SLI | CrossFire | CrossFire

Architecturally, the GHz Edition does not differ from the Radeon HD 7970. AMD relied only on process optimization, which lets the GPU run at a lower voltage and made it possible to raise the nominal GPU clock from 925 MHz to 1000 MHz. Interestingly, 1000 MHz is the base frequency, since AMD has implemented a Boost mode that raises the clock of the Radeon HD 7970 GHz Edition to 1050 MHz. Compared with the original 925 MHz, that is an overclock of 13.5 percent.

It's also nice that the "Tahiti XT2" GPU runs at only 0.807 V in idle mode. On the Radeon HD 7970, recall, the idle voltage was 0.85 V. Under load, the clock speed rises to the 1050 MHz promised by AMD, while the GPU voltage is 1.201 - 1.221 V. The "old" Radeon HD 7970 GPU ran at 1.139 V.

The PowerTune mechanism is well known from previous generations of GPUs. In the case of the Radeon HD 7970 GHz Edition, however, PowerTune also drives the Boost clock. In addition to the previously known "High P-State", AMD is adding a further "Boost P-State". It allows even higher clock speeds, which are made possible by dynamic voltage changes.

Unlike NVIDIA, AMD does not specify a minimum Boost clock: the Boost is fixed at 1050 MHz. In addition, the feature relies on a technology known from Trinity processors, namely "Digital Temperature Estimation", which evaluates the load in advance and sets the clock frequencies accordingly. At the architectural level, the Tahiti chips in the two Radeon HD 7970 video cards do not differ from each other. The PowerTune Boost is therefore implemented through the VBIOS and the driver, so in theory the technology could also work on older video cards.

The memory was also overclocked. As you can see from the specs above, the VRAM is clocked at 1500 MHz, increasing throughput from 264 GB/s to 288 GB/s. Thanks to the wider memory interface, AMD was able to pull even further ahead of NVIDIA in this regard.

The theoretical performance of the new card is 4.3 teraflops in single precision and 1.08 teraflops in double precision. NVIDIA recently announced the Tesla K10 compute accelerator based on two GK104 GPUs, which delivers 4.58 teraflops of single precision performance. But the double precision performance of the GK104 is only 1/24 of its single precision performance. This situation will only change with the GK110 chip and the Tesla K20, when we can expect a threefold increase in double precision performance. Thus, if the Fermi-based Tesla M2090 gives 665 gigaflops, then the GK110 can be expected to deliver 1.5 teraflops or more.
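The teraflops figures can be reproduced from the shader count and clock speed. The sketch below assumes that each stream processor performs one fused multiply-add (counted as two operations) per clock, that the 4.3 teraflops figure refers to the 1050 MHz Boost clock, and that Tahiti runs double precision at a quarter of the single precision rate, which is what the 4.3 and 1.08 figures imply:

```python
def peak_tflops(stream_processors: int, clock_mhz: float,
                flops_per_clock: int = 2) -> float:
    """Peak single precision TFLOPS.

    Assumption: one fused multiply-add per stream processor per clock,
    counted as 2 floating-point operations.
    """
    return stream_processors * flops_per_clock * clock_mhz * 1e6 / 1e12


sp_tflops = peak_tflops(2048, 1050)  # assumption: Boost clock of 1050 MHz
dp_tflops = sp_tflops / 4            # assumption: 1/4-rate double precision
k10_dp_tflops = 4.58 / 24            # Tesla K10: DP is 1/24 of SP (from the text)

print(f"HD 7970 GHz Edition: {sp_tflops:.2f} SP, {dp_tflops:.2f} DP TFLOPS")
print(f"Tesla K10: {k10_dp_tflops:.2f} DP TFLOPS")
```

At the 1000 MHz base clock the same formula gives about 4.1 teraflops, which is why we assume the quoted figure is based on the Boost clock.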

For more details on the "Graphics Core Next" architecture and the "Southern Islands" generation, we recommend that you refer to our dedicated article.

So the time has come to replace the Palit GeForce GTX 460, which has served with honor for three years. As a replacement, I chose the Radeon flagship, the HD 7970, from ASUS. It turned out to be very difficult to find a card on this chip: there was a serious shortage in stores, especially here in the Far East. I managed to buy the ASUS Radeon HD 7970 DirectCU II for no less than 18,000 rubles, which today, unfortunately, is quite a lot.

My main hope is that the video card will justify its price and please me with its performance.

Video card specification:

Packaging and equipment

A large box with a branded knight immediately attracts the attention of a potential buyer. The manufacturer boasts of its proprietary DirectCU II cooling system and a unique VGA HotWire function that lets you connect the card to a ROG series motherboard. The box also carries important information that you need to consider when buying: a power supply of at least 600 W, with a current of 42 A on the +12 V line.

Inside the box, the video card is securely packed, so even careless transportation will not harm such valuable contents.

The kit includes a disk with drivers and utilities, among them GPU Tweak, which I later used.

There are also detailed instructions with color pictures, a flexible CrossFireX bridge, and a DVI to HDMI adapter, since the board itself has no HDMI output. An adapter for the 8-pin PCI-E power connector is included too, because not all PSUs have two such connectors. Finally, there is a heatsink that can be glued with double-sided tape to the power circuitry if you install liquid cooling.

There were no bonuses in the form of games or keys to games.

Appearance

The card looks large and solid; it occupies three expansion slots. But it fit into my new case without any problems, since the case is quite spacious.

You can connect up to six monitors to the video card: there are four DisplayPort outputs and two DVI outputs. However, one of the DisplayPort outputs works only if one DVI port is switched to Single-Link mode with a special switch.

The proprietary cooling system makes this manufacturer's video cards look like twins: red stripes in the center, two fans, and a backplate that keeps the PCB from bending and takes on the weight of the massive cooler.



Now it is clear why there is a knight on the box: the video card is all clad in thick, powerful armor.

Power is supplied to the board through two eight-pin connectors, which should provide some power reserve.
The card is equipped with 3 GB of GDDR5 video memory, running at an effective 5500 MHz by default. A 384-bit bus connects the memory to the Tahiti XT chip. The chip is manufactured on the 28 nm process and includes 2048 stream processors as well as 32 ROPs.

The cooling system consists of a top block with two 90mm fans.



The fans blow across two aluminum heatsinks, to which six heatpipes carry the heat. This kind of system has long since proven itself, and I will check it in operation using MSI Afterburner.

Testing

Test stand:

I tried the card in this case. Here, with the fans running at 100%, the noise is clearly audible as a powerful hum. With the core overclocked to 1100 MHz and the memory to 1500 MHz, the card delivered 615 kHash/s when mining LTC. At the current difficulty and exchange rate, this is $100 per month, which is clearly not cost-effective.

Conclusions

My impressions of the card are very positive; I think I have found a worthy replacement for my old video card. Even under load, the fans do not spin up to 100%, so in a good case they are almost inaudible. The temperature does not rise above 70 degrees, and the heat from the card does not affect other components. In games at high settings, the card delivers a very playable frame rate. When this is not enough, the video card can be overclocked, which increases its performance by up to twenty percent.
The disadvantages listed below are relative. In a spacious case, the size of the video card does not matter, and three slots leave room for a more effective cooling system. The price is also relative: when I went to the grocery store with my wife today, I realized that everything was fine and there was no overpayment.

Advantages:
  • Quiet
  • Fast
  • Good overclocking potential, able to increase performance by up to 20%
  • Efficient cooling

Disadvantages:
  • Large size, won't fit in every case
  • High price

At the very end of last year, AMD unveiled its new GPU architecture, called Southern Islands. One of the first incarnations of this innovation was the SAPPHIRE HD 7970 3GB GDDR5 graphics card.

This architecture is the product of a shrink to the 28 nm process; AMD representatives called it nothing less than revolutionary and said it was designed for a 1.4x speedup over the previous generation. In addition, with the SAPPHIRE HD 7970 we get PCIe 3.0 support, 3 GB of high-speed GDDR5 memory, DX 11.1 compatibility, and support for PowerTune, ZeroCore and Eyefinity 2.0, which has gained new functions and features. The new core from AMD, called Graphics Core Next Tahiti, moves from a VLIW design to a non-VLIW SIMD engine, which means higher compute performance.



This new core features a significantly increased transistor count (4.31 billion), 2048 stream processors with 32 ROPs, 128 texture units, and a 384-bit high-bandwidth memory bus, which together deliver a marked increase in compute power and memory bandwidth. All of these features look more than impressive on paper and should take the gaming experience to the next level.

Characteristics of SAPPHIRE HD 7970

Outputs:
  • 1 x Dual-Link DVI
  • 1 x HDMI 1.4a
  • 2 x Mini DisplayPort
  • DisplayPort 1.2

GPU:
  • Core clock: 925 MHz
  • Process technology: 28 nm
  • Number of stream processors: 2048

Memory:
  • Size: 3072 MB
  • Type: 384-bit GDDR5
  • Effective clock: 5500 MHz

Dimensions: 275 (L) x 115 (W) x 36 (H) mm

Software:
  • CD with drivers
  • SAPPHIRE TriXX utility

Accessories:
  • CrossFire™ bridge interconnect cable
  • 8-pin to 4-pin power cable
  • Mini DisplayPort to HDMI adapter
  • Mini DP to SL-DVI passive adapter
  • 6-pin to 4-pin power cable
  • HDMI to SL-DVI adapter
  • HDMI 1.4a high-speed cable (1.8 meters)
  • Mini DP to SL-DVI active adapter

SAPPHIRE HD 7970: Tests

The SAPPHIRE HD 7970 was compared with other devices of the same class in a suite of game tests and a synthetic benchmark. The cards selected for comparison are nominally either equal or superior in performance to the HD 7970, so the test results should fully reflect its real performance.

The configuration and settings of the system did not change between tests. Video cards were tested first at stock speed and then in an overclocked configuration (the HD 7970 overclocking process and its results are described below) in order to evaluate how much overclocking helps. The Catalyst 11.12 driver was used for AMD cards, and version 290.53 for NVIDIA-based cards.

System under test configuration:

  • CPU: Core i7 2600K @ 4.4GHz 100x44
  • CPU cooling: Corsair Hydro Series H100
  • Motherboard: Gigabyte Z68AP-D3
  • Memory: Mushkin 991996 Redline PC3-17000 9-11-10-28 8 GB
  • Video card: Sapphire Radeon HD 7970
  • Power supply: Corsair AX1200
  • HDD: 1 x Seagate 1TB SATA
  • Optical drive: Lite-On Blu-Ray
  • Operating system: Windows 7 Professional 64-bit

Comparable video cards:

  • XFX HD 6970
  • ASUS HD 6950
  • ASUS GTX 580 Direct CU II
  • ASUS GTX 570 Direct CU II
  • Sapphire HD 6990
  • ASUS GTX 590

Gaming test: Metro 2033

Part FPS, part horror, Metro 2033 is powered by the 4A Engine with support for DirectX 11, NVIDIA PhysX and NVIDIA 3D Vision.

Settings:

  • DirectX 11
  • 16xAF
  • Global settings = high
  • PhysX = on




In Metro 2033, the SAPPHIRE HD 7970 showed very strong results at both resolutions, both stock and overclocked.

Gaming test: Battlefield 3

Battlefield 3 is a first-person shooter developed by EA Digital Illusions CE and powered by the Frostbite 2 engine. The game was released on October 25, 2011. It supports DirectX 10 and 11.

Settings:

  • 4x AA in CP
  • 16x AF in CP
  • Game Settings = High


Compared to the previous-generation Cayman-based HD 6970, the Tahiti-based HD 7970 showed a significant performance boost in this game.

Gaming test: Dirt 3

Dirt 3 is the third game in the legendary racing series developed by Codemasters. It is built on the EGO 2.0 engine. The release took place in May 2011.

Settings:

  • 4xAA
  • 16x AF in CP
  • Settings = Ultra


In this game, which incidentally ships with an "AMD" badge on the box, the HD 7970 was on par with the GTX 580. Overclocking helped the GTX 580 more than the HD 7970.

Testing with the synthetic benchmark 3DMark 11

3DMark 11 is the latest Futuremark release in the 3DMark series, designed for testing Microsoft DirectX 11 systems. The program consists of six tests: four graphics tests, one physics test and one combined test. The physics test uses the Bullet Physics library. Two demos are supplied with the benchmark; both are based on the tests but, unlike the tests, they include a basic soundtrack.

Settings:

  • Default test settings
  • Entry test 1024 x 600
  • Performance test 1280 x 720
  • Extreme test 1920 x 1080

In the 3DMark11 benchmark, the SAPPHIRE HD 7970 scored higher than the GTX 580 in both stock and overclocked configurations.

Temperature tests showed that the SAPPHIRE HD 7970, both at stock frequencies and overclocked, ran 8 degrees cooler than the previous-generation HD 6970 cards, which is an excellent result for such a powerful device.

At both stock and overclocked frequencies, ZeroCore technology does an excellent job of cutting power consumption in idle mode. Under load, as long as the GPU voltage is not raised, overclocking does not noticeably increase the card's overall power consumption.

Overclocking

Judging by AMD's official announcements of cards with nominal core clocks above 1000 MHz, the new Southern Islands Tahiti GPU has excellent overclocking prospects. In fact, 1000 MHz is just a starting point, and it looks like the card will be able to go beyond the limits set in Catalyst Control Center (CCC). Reaching 1125 MHz on the core took nothing more than moving the core setting to the limit available in CCC. Pushing the memory setting to the CCC limit in the same way brought it to 1575 MHz. These frequencies show that there is at least 200 MHz of headroom in both the GPU core and the GDDR5 memory, which is a very good result.

Without additional voltage, the GPU temperature did not rise significantly. With the fan speed set manually to 100%, the temperature of the overclocked card did not exceed 57 degrees. To go further, you will have to look for utilities (BIOS or software) that lift the CCC limits and show what the video card is really capable of. It's worth noting that speeding up the fan on AMD cards always helps keep temperatures down, but only at the cost of a serious increase in noise. In the case of the SAPPHIRE RADEON HD 7970, AMD improved both cooling and noise levels with a new cooler design.

Let's summarize our overclocking: 200 MHz is a 21% increase in the core clock and about 15% in the memory clock. For a first attempt at overclocking, this suggests a bright future for the video card.
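The percentages in the summary follow from the stock and overclocked clocks given above. A small illustrative calculation (the helper name is ours):

```python
def overclock_percent(stock_mhz: float, overclocked_mhz: float) -> float:
    """Relative clock increase over stock, in percent."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100


core_gain = overclock_percent(925, 1125)     # GPU core: 925 -> 1125 MHz
memory_gain = overclock_percent(1375, 1575)  # memory: 1375 -> 1575 MHz
print(f"core +{core_gain:.1f}%, memory +{memory_gain:.1f}%")
```

Both clocks gain the same 200 MHz, but the relative gain is smaller for the memory because its stock clock is higher.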

Review: pros and cons

When we ask whether the new release gives us everything we wanted and expected from it, the answer is that the new video card not only surpasses previous generations of devices but also leaves most of its direct current competitors behind. The SAPPHIRE HD 7970 is an extremely convincing video card. It easily outperforms the Northern Islands Cayman-based HD 6970 and the NVIDIA GTX 580 in just about every test. Gaming performance is already impressive at stock clock speeds, and the headroom the device offers for overclocking opens up really exciting prospects. We were able to easily push the GPU core and memory speeds to the AMD Catalyst Control Center limits of 1125 MHz for the core and 1575 MHz for the memory; both achieved 200 MHz gains effortlessly. This extra power allows a single card to drive Eyefinity setups at resolutions up to 5760 x 1080. The new architecture of the SAPPHIRE HD 7970 supports the new Eyefinity 2.0 technology, which offers a number of enhancements, including individual media channels for each output, a new 5x1 monitor configuration, and more.

It is worth noting the improved performance of the AMD cooling system. Both at stock and overclocked, the HD 7970 ran about 4 degrees Celsius cooler than the stock HD 6970 in idle mode and 8 degrees cooler in other modes.

Although the HD 7970's power consumption was higher than the HD 6970's under load, AMD's ZeroCore technology helped cut power consumption by about half when idle.

The price for all of the HD 7970's goodies is around $550, which may come as a surprise to some buyers. But for this money, you get a really powerful card that far outperforms its competitors, including the HD 6970. If you search, you can buy two HD 6970s for about $50 less than this price and get performance at the level of the HD 6990 or higher, but on top of the money you pay with high noise levels and power consumption. By buying the SAPPHIRE HD 7970 3GB GDDR5, you get the fastest single-GPU video card available today, which will run any modern game easily and without slowdowns! AMD and its partners have made a great product again!

Pros:

  • Fastest single GPU graphics card
  • Excellent overclocking capabilities
  • High performance
  • Eyefinity gaming
  • New architecture
  • ZeroCore technology
  • Noise reduction

Cons:

  • Fan is still loud at 100% speed


Introduction

The architecture of AMD (formerly ATI) graphics processors has not changed much since the Radeon HD 2000 series: the VLIW design survived right up to the HD 6000. What is VLIW? First, let's recall how the central processor in a personal computer works. Modern CPUs are superscalar, that is, their execution units can run several instructions from one thread at the same time. But those instructions must be independent of each other, so the processor constantly checks when operations can run in parallel and when it has to wait for a dependency to be resolved. In addition, the CPU performs branch prediction and can do some of the work in advance, out of order. Optimizing these functions is a complex engineering task, and the circuitry behind them takes up a good part of the CPU die.

But there is another way: fix the order of instruction execution at compile time. The compiler itself finds instructions that can be executed simultaneously and combines them into long compound instructions. Hence the term VLIW, very long instruction word. VLIW is generally very efficient when the code contains few dependencies and the program flow is predictable. The compiler "knows" the code from beginning to end and can schedule the execution of particular fragments well in advance. But the schedule is rigid, and when the course of the program depends on external data, even ingenious compilation does not help much: execution units sit idle and performance drops.
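To make the idea of compile-time packing concrete, here is a toy scheduler that greedily groups independent operations into bundles of up to four slots. It is only a sketch of the principle, not AMD's shader compiler; the function names, the operation format and the four-slot width are assumptions made for illustration:

```python
def pack_vliw(ops, width=4):
    """Greedily pack operations into VLIW bundles.

    ops: list of (name, set_of_names_it_depends_on) in program order.
    An operation may join a bundle only if all of its dependencies were
    completed by EARLIER bundles, so operations inside one bundle are
    always independent of each other.
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle = []
        for name, deps in remaining:
            if len(bundle) == width:
                break
            if deps <= done:  # all dependencies already executed
                bundle.append(name)
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependencies")
        done.update(bundle)
        remaining = [(n, d) for n, d in remaining if n not in done]
        bundles.append(bundle)
    return bundles


def utilization(ops, width=4):
    """Fraction of bundle slots that hold useful work."""
    bundles = pack_vliw(ops, width)
    return len(ops) / (len(bundles) * width)


# Four independent operations fill one bundle completely.
independent = [("a", set()), ("b", set()), ("c", set()), ("d", set())]
# A dependency chain forces one operation per bundle.
chain = [("a", set()), ("b", {"a"}), ("c", {"b"}), ("d", {"c"})]

print(pack_vliw(independent), utilization(independent))
print(pack_vliw(chain), utilization(chain))
```

The dependent chain leaves three of every four slots empty, which is exactly the situation in which execution units sit idle.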

But rendering 3D graphics is a predictable task that parallelizes well. So the bet on VLIW, made by what was then an independent Canadian company, fully paid off. By shifting the scheduler's functions to the compiler, ATI could build relatively compact chips with many hundreds of execution units inside, and as a result its video cards turned out to be relatively inexpensive. The high point for AMD's VLIW came with the Radeon HD 5000 series, when the debut of NVIDIA's Fermi architecture (GeForce 400) stalled a bit. And no wonder, because the "greens" had to make huge chips of up to three billion transistors. And even now, when the Fermi architecture is working at full capacity in the GeForce 500 adapters and the top NVIDIA accelerators beat AMD products in benchmarks, the Radeon HD 6000 cards still provide excellent gaming performance.

In that case, why did AMD decide to take such a sharp turn? It would seem enough to polish the GPU design a little, add computing units here and there, and move to a finer manufacturing process, and VLIW would live happily ever after. Why waste time and money developing a completely new architecture? Because it's not just about games. GPUs are slowly evolving from pure 3D rendering devices into general-purpose processors (GPGPU) that can be used for any massively parallel computing. Today, however, when we say GPGPU, we mean CUDA. Neither the native "red" API called ATI Stream nor OpenCL is as popular as NVIDIA's CUDA. AMD really wants to take a bite out of this market, but for that to be possible, the good old VLIW architecture has to be abandoned. It is poorly suited to non-graphics calculations, because they are less predictable than 3D rendering and the GPU simply cannot work to its full potential.

Graphics Core Next architecture

Let's take the latest representative of AMD's VLIW architecture, the Cayman processor, which underlies the Radeon HD 6950/6970/6990 adapters. The main component of the shader domain is the SIMD Engine, a block of sixteen stream processors. All of them simultaneously execute one VLIW instruction, but on different data (hence SIMD: single instruction, multiple data). In turn, up to four scalar operations can be packed into one VLIW instruction, which corresponds to the four ALUs inside one stream processor.

The building block of Graphics Core Next (GCN) is called the Compute Unit, and it works quite differently. It also has 64 ALUs, but they are divided into four separate vector SIMD modules of 16 ALUs each, plus a scheduler block. Simply put, parallelism used to be achieved through several operations in a single instruction, and now it is achieved through several separate SIMD blocks. And while the performance of the old architecture depends on how many scalar operations the compiler can pack into one VLIW instruction, the Compute Unit in the GCN core can dynamically distribute the load between its SIMD blocks.

Work for parallel execution arrives at a SIMD block in the form of an array (wavefront) of 64 work items, which is executed in four cycles. And although only four wavefronts can be in flight at the same time, another 28 are directly accessible to the Compute Unit, which gives the scheduler room for maneuver. In a situation where a dependency in the code would prevent the combined SIMD block of a VLIW processor from working at full capacity, the individual SIMD blocks of the GCN chip simply switch to other wavefronts from the same task or from completely different tasks.
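The "64 elements in four cycles" figure follows from the width of a SIMD block: 16 lanes handle 16 elements of the wavefront per cycle. The toy model below illustrates this under the assumption that each lane finishes one operation per cycle; the constant and function names are ours:

```python
WAVEFRONT_SIZE = 64  # elements in one wavefront
SIMD_LANES = 16      # ALUs in one SIMD block
SIMDS_PER_CU = 4     # SIMD blocks in one Compute Unit


def run_vector_instruction(wavefront, op):
    """Apply one instruction to a whole wavefront on a single SIMD block.

    Each cycle the same operation runs on the next 16 elements, so a
    64-element wavefront needs 64 / 16 = 4 cycles.
    """
    results = []
    cycles = 0
    for start in range(0, len(wavefront), SIMD_LANES):
        lanes = wavefront[start:start + SIMD_LANES]
        results.extend(op(value) for value in lanes)
        cycles += 1
    return results, cycles


wavefront = list(range(WAVEFRONT_SIZE))
doubled, cycles = run_vector_instruction(wavefront, lambda v: v * 2)
ops_per_cu_cycle = SIMDS_PER_CU * SIMD_LANES  # with four wavefronts in flight

print(cycles, ops_per_cu_cycle)
```

With four SIMD blocks each working on its own wavefront, a Compute Unit still retires 64 operations per cycle, the same peak as a Cayman SIMD Engine, but without relying on the compiler to fill VLIW slots.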

The highlight of GCN is a separate scalar unit in each Compute Unit. It is intended for one-off operations that do not fit into a wavefront (which saves the SIMD modules from being used inefficiently) and for program flow control: conditional branches, jumps and other events that Cayman had difficulty digesting. The scalar unit performs one operation per cycle.

Cache memory

The new design of the execution units requires faster and larger cache memory than the VLIW design. Each CU has its own 16 KB L1 cache, plus 16 KB and 32 KB caches for instructions and data that are shared by four CUs, and a buffer for exchanging data between wavefronts. There is also a fully coherent L2 cache, divided into 64 KB partitions among the dual-channel memory controllers. It stores copies of the buffers listed above.

The L1 and L2 cache buses are 64 bytes wide. AMD reports that L1 throughput reaches almost 2 TB/s and L2 throughput 700 GB/s; apparently, these are the total values for a processor with 32 CUs.
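The quoted cache throughput can be roughly reconstructed from the 64-byte bus width. The sketch below assumes that each of the 32 CUs reads 64 bytes from its L1 cache per clock at 925 MHz. For L2 it additionally assumes 12 partitions (two per dual-channel controller on a 384-bit bus), each delivering 64 bytes per clock; the partition count is our assumption and is not stated in the text:

```python
def cache_bandwidth_gb_s(units: int, bytes_per_clock: int,
                         clock_mhz: float) -> float:
    """Aggregate cache bandwidth in GB/s for `units` identical caches."""
    return units * bytes_per_clock * clock_mhz * 1e6 / 1e9


# 32 CUs, 64-byte L1 bus, 925 MHz core clock (figures from the text).
l1_total = cache_bandwidth_gb_s(32, 64, 925)

# ASSUMPTION: 12 L2 partitions, 64 bytes per clock each.
l2_total = cache_bandwidth_gb_s(12, 64, 925)

print(f"L1: {l1_total:.1f} GB/s, L2: {l2_total:.1f} GB/s")
```

Under these assumptions the totals come out at roughly 1.9 TB/s and 0.7 TB/s, which is consistent with AMD's figures being chip-wide sums.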

For comparison, in Cayman each SIMD module has an 8 KB L1 cache with a 16-byte bus.

Geometry processing, rasterization

AMD's presentations accompanying the release say little about the actual graphics components of the chip. Judging by the block diagram, their internal structure has not changed; only the tessellator has been upgraded to its ninth generation and provides a huge increase in performance in the corresponding tasks.

However, if you believe information from third-party sources and AMD's own slides from the June Fusion Developer Summit, the Geometry Engine and the tessellator look completely different on the inside. Like Cayman, the GCN core contains two Graphics Engines, but whereas they previously consisted of separate blocks for rasterization, tessellation, and so on, each GE can now have an arbitrary number of pipelines for processing pixels and geometric primitives.

Such a design will probably make it easy for the manufacturer to scale up graphics power or to release budget GPUs that are cut down in this area. Fast geometry processing will come in handy in modern games.

PCI-E 3.0

The heading speaks for itself: AMD has implemented the new-generation PCI-E bus with twice the bandwidth. It is not clear whether it is needed for 3D rendering today, but it will certainly come in handy for non-graphics calculations. AMD made many of the innovations in the GCN architecture with an eye on exactly such applications, and there is also one specifically graphical feature that fits the new interface perfectly.
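The "twice the bandwidth" claim can be checked against the published link parameters: PCI-E 2.0 runs at 5 GT/s per lane with 8b/10b encoding, while PCI-E 3.0 runs at 8 GT/s with the more efficient 128b/130b encoding. A short calculation for an x16 slot, per direction (the function name is ours):

```python
def pcie_bandwidth_gb_s(gigatransfers_per_s: float, payload_bits: int,
                        encoded_bits: int, lanes: int = 16) -> float:
    """Usable bandwidth in GB/s per direction for a PCI-E link.

    Each transfer carries one bit per lane; the line encoding spends
    (encoded_bits - payload_bits) out of every encoded_bits on overhead.
    """
    usable_gbit_per_lane = gigatransfers_per_s * payload_bits / encoded_bits
    return usable_gbit_per_lane * lanes / 8


gen2_x16 = pcie_bandwidth_gb_s(5.0, 8, 10)     # PCI-E 2.0, 8b/10b
gen3_x16 = pcie_bandwidth_gb_s(8.0, 128, 130)  # PCI-E 3.0, 128b/130b

print(f"PCI-E 2.0 x16: {gen2_x16:.2f} GB/s")
print(f"PCI-E 3.0 x16: {gen3_x16:.2f} GB/s")
print(f"ratio: {gen3_x16 / gen2_x16:.2f}")
```

The raw signalling rate rises only 1.6 times; the rest of the gain comes from the leaner encoding, which brings the usable bandwidth to almost exactly double.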

New FeaturesGCN

GCN has two additional command distribution units called the Asynchronous Compute Engine, which operate completely independently of each other and the GPU. AMD plans to open access to ACE through Open CL, and then programmers will have three individual devices, each with its own command queue. In addition, according to third-hand information, ACE provides out-of-order execution at the level of individual tasks. The CUs themselves, though smarter than the SIMD modules of the VLIW architecture, can process their wavefronts strictly in direct order.

The GCN core and the computer's CPU may share a common address space. In this case, all instructions that get executed by the GPU point to addresses in the x86-64 space, and it will independently recode them into local video memory addresses using a special module. As a result, the GPU gets direct access to the system memory. In addition, the GCN core was endowed with a number of functions to support high-level languages: virtual functions, pointers, recursion, and so on. This will allow programmers to write generic code suitable for execution on the CPU or GPU.

The new GPUs are fully compatible with the OpenCL 1.2 API, DirectCompute 11.1 (and DirectX 11.1 as a whole) and C++ AMP. Special instructions useful for producing multimedia content have been added. In addition, chips based on the GCN architecture are the first GPUs with an integrated H.264 video encoder, which can be used as soon as AMD releases the required software library.

The decoder, in turn, has gained support for several additional formats: MVC, MPEG-4/DivX and Dual Stream HD + HD. Radeon video cards have been strong at video playback ever since the ATI days. The 7000 series offers many picture enhancers, for example the Steady Video algorithm, which eliminates camera shake.

Partially Resident Textures is another virtual memory trick, this time intended for 3D rendering: an application or shader works with an address space larger than the adapter's on-board memory, and the on-board memory acts only as a fast cache. This makes it possible to use textures of up to 32 TB, portions of which the GPU dynamically pulls into local memory. No OS support is required.

AMD partly compensates for the slowdowns that inevitably occur when textures are loaded from system memory by using MIP mapping. The giant texture will probably be stored in several versions with different resolutions (mipmaps), each divided into 64 KB fragments. If the adapter needs a certain fragment and it is already in local video memory, there is no problem. If the fragment is missing, the program can either pull it from system memory immediately, or postpone the read and use the corresponding low-resolution copy of the fragment for the current frame (if that copy is already in video memory).
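The fallback logic described above can be sketched as a small residency table. This is only an illustration and not AMD's implementation: the class and method names are hypothetical, and we assume that mip level 0 is the full-resolution version and that each coarser level halves the fragment coordinates.

```python
TILE_BYTES = 64 * 1024  # fragment size given in the text


class PartiallyResidentTexture:
    """Hypothetical model of a texture whose fragments live partly in video memory."""

    def __init__(self, mip_levels: int):
        self.mip_levels = mip_levels
        self.resident = set()  # (mip, tile_x, tile_y) fragments held in video memory

    def upload(self, mip: int, tile_x: int, tile_y: int) -> None:
        """Mark a fragment as copied from system memory into video memory."""
        self.resident.add((mip, tile_x, tile_y))

    def lookup(self, mip: int, tile_x: int, tile_y: int, wait_for_upload: bool = False):
        """Return the (mip, tile_x, tile_y) fragment to sample in the current frame."""
        if (mip, tile_x, tile_y) in self.resident:
            return (mip, tile_x, tile_y)  # already local, no problem
        if wait_for_upload:
            self.upload(mip, tile_x, tile_y)  # pull it from system memory right away
            return (mip, tile_x, tile_y)
        # Postpone the read: walk towards coarser mip levels until one is resident.
        m, x, y = mip, tile_x, tile_y
        while m + 1 < self.mip_levels:
            m, x, y = m + 1, x // 2, y // 2
            if (m, x, y) in self.resident:
                return (m, x, y)
        # No coarser copy is resident either, so the fragment has to be loaded.
        self.upload(mip, tile_x, tile_y)
        return (mip, tile_x, tile_y)


tex = PartiallyResidentTexture(mip_levels=4)
tex.upload(2, 1, 0)          # only a low-resolution fragment is in video memory
hit = tex.lookup(0, 5, 3)    # falls back to the resident coarse fragment
print(hit)                   # (2, 1, 0)
```

A real renderer would also queue the missing high-resolution fragment for a later frame; the sketch leaves that out to keep the two choices from the text (load now or fall back) visible.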

A small addition on the subject of tessellation. GCN implements the Ptex (per-face texture mapping) algorithm. Normally in 3D modeling a texture is applied to the entire model, and the vertices must be carefully aligned with the right areas of the 2D canvas. It is easy to imagine how hardware tessellation, which generates additional vertices, complicates the designer's task. With Ptex, a separate texture is applied to each polygon, so there are no visible seams. In addition, Ptex allows textures of different resolutions to be packed into one file.

Finally, AMD reworked anisotropic filtering to eliminate subtle flickering on high-resolution textures. The change in the algorithm should not affect performance.

Energy management

AMD notes that GPU and video card manufacturers always play it safe on power consumption and set clock speeds for the peak load, which occurs only in the most demanding applications or even only in stress tests (FurMark, OCCT). In ordinary games the GPU could run at a higher frequency. PowerTune technology is designed to always squeeze the maximum out of the GPU: it is a calculator that estimates the card's power consumption in real time, at millisecond intervals, by analyzing the task being performed (without any analog sensors). Whenever possible, the GPU clock speed is raised. Note that this is not a drop in frequency below the nominal value when the power threshold is reached, but the opposite: a finely tuned dynamic boost.
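The idea of a power calculator driving the clock can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: the linear power formula, the constants, the budget and the clock steps are invented and are neither AMD's algorithm nor measurements of the Radeon HD 7970.

```python
def estimate_power_w(activity: float, clock_mhz: float,
                     idle_w: float = 30.0, w_per_mhz: float = 0.25) -> float:
    """Hypothetical model: power grows with the clock and with block activity (0..1)."""
    return idle_w + activity * w_per_mhz * clock_mhz


def pick_clock_mhz(activity: float, budget_w: float, clock_steps) -> int:
    """Highest clock step whose estimated power fits the budget, else the lowest step."""
    fitting = [c for c in sorted(clock_steps)
               if estimate_power_w(activity, c) <= budget_w]
    return fitting[-1] if fitting else min(clock_steps)


CLOCK_STEPS = [800, 850, 900, 925, 950, 1000]  # illustrative values only
BUDGET_W = 250.0                               # illustrative power budget

stress_clock = pick_clock_mhz(1.0, BUDGET_W, CLOCK_STEPS)  # stress-test-like load
game_clock = pick_clock_mhz(0.8, BUDGET_W, CLOCK_STEPS)    # lighter, game-like load
print(stress_clock, game_clock)
```

With these made-up numbers the fully loaded case settles on a lower clock than the game-like case, which is the behaviour the text describes: the estimate, recomputed every few milliseconds, decides how much headroom is left for a higher frequency.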

The GCN core can also switch off completely when nothing has been displayed on the screen for a long time, stopping the cooler fan as well (ZeroCore technology). In a CrossFire configuration, the GPUs on the additional cards (and the second GPU on a dual-GPU card) do not run at all without a 3D load.

Eyefinity 2.0

The Radeon HD 7000 series introduces the second version of Eyefinity technology, which brings many innovations. Most of the new features need no comment, so we list them briefly:

  • Configurations with five displays in a row in landscape or portrait orientation are officially supported.
  • The center monitor in a row can now be vertically larger than the others.
  • Simultaneous operation of Eyefinity, AMD HD3D and CrossFire.
  • The maximum resolution of the combined screen is 15x15 thousand pixels.
  • Custom resolutions.
  • The Windows taskbar can be moved to any screen.
  • Output individual audio streams to multiple displays.

The new Radeons support DisplayPort 1.2 and therefore Multi-Stream technology. With it, three displays can be connected to a single output, either daisy-chained or through a special hub. Moreover, the hub's outputs can be not only DisplayPort but also HDMI, DVI and VGA. AMD promises that the hubs will be available in the summer of 2012.

The HDMI output complies with the 1.4a standard, so it can send a stereoscopic signal to a 3D TV at 24 frames per second per eye. Specifically for games, there is also support for 3 GHz HDMI at 60 Hz per eye.

In addition, DisplayPort 1.2 HBR2 and 3 GHz HDMI will be useful for connecting upcoming displays with a resolution of 4096x2160.
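A rough calculation shows why these faster links matter for 4096x2160. The sketch assumes 24 bits per pixel, ignores blanking intervals (so the real requirement is somewhat higher), and reads "3 GHz HDMI" as 3 Gbit/s on each of the three TMDS channels; the HBR2 figure is the 5.4 Gbit/s per lane from the DisplayPort 1.2 standard. The names are ours.

```python
def pixel_rate_gbit_s(width: int, height: int, refresh_hz: int,
                      bits_per_pixel: int = 24) -> float:
    """Active-pixel data rate in Gbit/s; blanking intervals are ignored."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# Payload capacity after 8b/10b line coding.
DP12_HBR2_GBIT_S = 4 * 5.4 * 8 / 10  # four lanes at 5.4 Gbit/s each
HDMI_3GHZ_GBIT_S = 3 * 3.0 * 8 / 10  # three channels at 3 Gbit/s each (our reading)

for hz in (24, 30, 60):
    need = pixel_rate_gbit_s(4096, 2160, hz)
    print(f"{hz} Hz: {need:.2f} Gbit/s, "
          f"fits HBR2: {need <= DP12_HBR2_GBIT_S}, "
          f"fits 3 GHz HDMI: {need <= HDMI_3GHZ_GBIT_S}")
```

Under these assumptions a 4096x2160 picture at 60 Hz needs about 12.7 Gbit/s, which fits into the 17.28 Gbit/s of HBR2 but not into the 7.2 Gbit/s of 3 GHz HDMI, while 24 Hz and 30 Hz fit into both.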

Radeon HD 7970

Specifications

The HD 7970 is the single-chip flagship of the line, representing the GCN architecture in all its might. Its GPU is called Tahiti and contains 32 CUs (Compute Units), which are described in detail above. Counting individual ALUs, as AMD has done until now, gives 2048 of them, a third more than the 1536 in the Cayman core. Tahiti likewise has 128 TMUs (texture mapping units) versus 96. The memory bus is 384-bit instead of 256-bit. Considering how much additional logic has been added to the architecture, it is not at all surprising that Tahiti consists of 4.31 billion transistors. For comparison, Cayman has 2.64 billion and NVIDIA's GF110 has 3 billion. The whole assembly runs at 925 MHz.

Appearance, construction

In the design of the 7000 series, AMD stepped away from the brutal shapes of the Radeon HD 6000 and chose an eye-catching design with smooth lines and a glossy shroud. The recognizable red PCB has returned, this time with a raspberry hue. In terms of dimensions, the Radeon HD 7970 does not differ from previous single-chip AMD/ATI flagships.

Products of the AMD brick factory

The card is heavy. You pick it up and you feel the power. The weight comes from the cooling system, with a large vapor chamber attached to a thick frame. The design has not changed much since the Radeon HD 6970, except that the blower fan has become wider.

For better cooling, one DVI port was removed from the bracket so that the exhaust grille occupies an entire slot.

On the back side, as before, there is a cross-shaped retention bracket. AMD decided to do without a solid backplate.

As on the HD 6970, the printed circuit board carries a switch between the main and backup BIOS. Scattered across the back surface are several small two-position switches of unknown purpose, which we decided not to touch, to be on the safe side. It is possible that we have only an engineering sample of the HD 7970 in front of us, and that these strange elements will be absent from production boards.

At the tail of the board are seven inductors and an eight-phase CHiL CHL8228G voltage controller, which will no doubt please overclockers, since it has already been used on the Radeon HD 6970. Most likely, the card's power delivery is organized as before: six phases feed the GPU and one powers the internal circuits of the GDDR5 chips. In the opposite corner of the board is a two-phase uP1509P controller from uP Semiconductor with its own inductor, which, by analogy with the HD 6970, should control the voltage of the video memory I/O buffers.