
NVIDIA GPU Boost 2.0 Technology. Video Cards. Ansel Game Screenshot Platform

GPU Boost 2.0

With the NVIDIA GeForce GTX 680, an important new feature arrived: GPU Boost. The new NVIDIA GeForce GTX Titan goes a step further, extending it to GPU Boost 2.0. The first version, GPU Boost 1.0, was governed by the maximum power consumption reached in the most demanding modern games; GPU temperature played no particular role, except when it approached the critical threshold. The maximum clock frequency was determined from the relative voltage. The drawback was fairly obvious: GPU Boost 1.0 could not prevent situations in which, even at non-critical voltages, the temperature rose excessively.

NVIDIA GeForce GTX Titan - GPU Boost 2.0

The GeForce GTX Titan evaluates two parameters: voltage and temperature. The relative voltage (Vref) is determined on the basis of both. Of course, dependence on the individual GPU remains, since chip production varies, so every video card differs from every other. But NVIDIA points out that, technically, adding temperature allowed an average of 3-7 percent higher Boost clocks. GPU Boost 2.0 could theoretically be ported to older graphics cards, but this is unlikely to happen.

NVIDIA GeForce GTX Titan - GPU Boost 2.0

Let's take a closer look at GPU Boost 2.0. Utilities such as EVGA Precision Tool and MSI Afterburner already support it; we used EVGA Precision Tool version 4.0.

NVIDIA GeForce GTX Titan - GPU Boost 2.0

GPU Boost 2.0 is temperature aware, and at low temperatures, the technology can increase performance more significantly. The target temperature (Ttarget) is set to 80 °C by default.

NVIDIA GeForce GTX Titan - GPU Boost 2.0

GPU Boost 2.0 retains all the features familiar from the first generation of the technology, but additionally makes it possible to set a higher voltage, and therefore higher clock frequencies. Overclockers can change these settings: you can enable GPU overvoltage, but be aware of the potential reduction in graphics card lifespan.
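
As a rough illustration of the idea (a toy model only: the real boost algorithm is proprietary, and the 13 MHz bin size, base clock and limits below are assumptions), the decision logic can be sketched like this:

```python
# Toy model: GPU Boost 1.0 limited clocks by power alone; GPU Boost 2.0 also
# respects a temperature target (Ttarget, 80 C by default).

def boost_clock(base_mhz, max_bins, power_w, power_limit_w,
                temp_c, temp_target_c=80.0, bin_mhz=13):
    """Return a boosted clock that backs off as either limit is approached."""
    power_headroom = max(0.0, 1.0 - power_w / power_limit_w)
    thermal_headroom = max(0.0, 1.0 - temp_c / temp_target_c)
    headroom = min(power_headroom, thermal_headroom)  # Boost 1.0 used only the first term
    return base_mhz + int(max_bins * headroom) * bin_mhz

print(boost_clock(836, 20, 200, 250, temp_c=60))  # cool GPU: more boost bins
print(boost_clock(836, 20, 200, 250, temp_c=79))  # near Ttarget: boost collapses
```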

NVIDIA GeForce GTX Titan - GPU Boost 2.0

Overclockers can raise Vref and Vmax (overvoltaging). Many users wanted this on the GK104, but NVIDIA entrusted such an option to neither users nor manufacturers. The EVGA GTX 680 Classified card we tested (see our review) is a good example: on that card a special EVGA EVBot module gave users control over voltages, but NVIDIA urgently demanded that EVGA remove the additional hardware from its graphics cards. With GPU Boost 2.0 and overvoltaging, NVIDIA itself has taken a step in this direction, so graphics card manufacturers can release several GeForce GTX Titan models, such as standard and factory-overclocked versions. Overvoltaging is activated through a VBIOS switch, that is, explicitly, so that the user is aware of the possible consequences.

GeForce GTX 1080 Graphics Processor (GP104) Specifications

Chip code name: GP104
Production technology: 16 nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: unified (Pascal)
DirectX hardware support: DirectX 12
Memory bus: 256-bit
Core frequency: 1607 (1733) MHz
Computing blocks: 20 streaming multiprocessors comprising 2560 IEEE 754-2008 floating-point scalar ALUs
Texturing blocks: 160 texture addressing and filtering units with FP16 and FP32 component support and trilinear and anisotropic filtering for all texture formats
Monitor support: Dual-Link DVI, HDMI 2.0b, DisplayPort

GeForce GTX 1080 Reference Graphics Specifications

Core frequency: 1607 (1733) MHz
Number of CUDA cores (stream processors): 2560
Number of texture units: 160
Number of blending units (ROPs): 64
Effective memory frequency: 10000 (4×2500) MHz
Memory type: GDDR5X
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 320 GB/s
Computing performance (FP32): about 9 teraflops
Theoretical peak fill rate: 103 gigapixels/s
Theoretical texture sampling rate: 257 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual-Link DVI, one HDMI, three DisplayPort
Power consumption: up to 180 W
Supplementary power: one 8-pin connector
Number of slots occupied in the system case: 2
Recommended price: $599-699 (USA), 54990 RUB (Russia)

The new GeForce GTX 1080 received a logical name for the first solution of the new GeForce series: it differs from its direct predecessor only in the generation digit. The newcomer not only replaces the top solutions in the company's current line-up, but also served for some time as the flagship of the new series, until the Titan X was released on an even more powerful GPU. Below it in the hierarchy sits the already announced GeForce GTX 1070, based on a cut-down version of the GP104 chip, which we will consider below.

The suggested prices for Nvidia's new graphics card are $599 and $699 for the regular version and the Founders Edition (see below), respectively, which is a pretty good deal considering that the GTX 1080 is ahead of not only the GTX 980 Ti but also the Titan X. Today the new product is, without question, the fastest single-chip video card on the market, and at the same time it costs less than the most powerful video cards of the previous generation. So far the GeForce GTX 1080 has essentially no competitor from AMD, so Nvidia was able to set a price that suits it.

The video card in question is based on the GP104 chip with a 256-bit memory bus, but the new GDDR5X memory type operates at a very high effective frequency of 10 GHz, which gives a peak bandwidth of 320 GB/s, almost on par with the GTX 980 Ti and its 384-bit bus. The amount of memory on a card with such a bus could be 4 or 8 GB, and it would have been foolish to fit the smaller amount to such a powerful solution, so the GTX 1080 received 8 GB, enough to run any 3D application at any quality settings for several years to come.

The GeForce GTX 1080 PCB is, understandably, quite different from the company's previous boards. The typical power consumption of the new product is 180 W, slightly higher than that of the GTX 980 but noticeably lower than that of the Titan X and GTX 980 Ti. The reference board has the usual set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPort.

Founders Edition reference design

Along with the announcement of the GeForce GTX 1080 in early May, a special edition of the card called Founders Edition was announced, priced higher than the regular cards from the company's partners. In effect, this edition is the reference design of the card and its cooling system, and it is produced by Nvidia itself. Opinions on such versions differ, but the reference design developed by the company's engineers and built from quality components has its fans.

Whether they will pay several thousand rubles more for a video card from Nvidia itself is a question only practice can answer. In any case, at first it will be the reference cards from Nvidia that go on sale, at an increased price, and there is not much to choose from; this happens with every launch. The reference GeForce GTX 1080 is different, however, in that it is planned to be sold in this form throughout its entire life, until next-generation solutions are released.

Nvidia believes this edition has its merits even over the best partner designs. For example, the two-slot cooler makes it easy to build both gaming PCs in relatively small form factors and multi-card video systems based on this powerful card (even though the company does not recommend three- and four-card configurations). The GeForce GTX 1080 Founders Edition has the advantage of an efficient cooler with a vapor chamber and a fan that exhausts heated air out of the case - Nvidia's first such solution on a card that consumes less than 250 W.

Compared with the company's previous reference designs, the power circuitry has been upgraded from four phases to five. Nvidia also cites improved components and reduced electrical noise, which improves voltage stability and overclocking potential. As a result of all the improvements, the power efficiency of the reference board is 6% higher than that of the GeForce GTX 980.

To set it apart visually from the "ordinary" GeForce GTX 1080 models, an unusual "chopped" shroud design was developed for the Founders Edition. It probably also complicated the shape of the vapor chamber and heatsink (see photo), which may be one of the reasons for the $100 premium on this special edition. We repeat: at the start of sales buyers will not have much choice, but later it will be possible to pick either a custom design from one of the company's partners or the version made by Nvidia itself.

New generation of Pascal graphics architecture

The GeForce GTX 1080 is the company's first solution based on the GP104 chip, which belongs to the new generation of Nvidia's Pascal graphics architecture. Although the new architecture builds on solutions worked out in Maxwell, it also has important functional differences, which we will cover later. The main change, from a global point of view, is the new manufacturing process on which the new GPU is made.

The use of 16 nm FinFET process technology in the production of GP104 GPUs at the factories of the Taiwanese company TSMC made it possible to significantly increase the complexity of the chip while keeping die area and cost relatively low. Compare the transistor counts and die areas of GP104 and GM204: they are close in area (the new chip is even physically smaller), but the Pascal chip has a noticeably larger number of transistors and, accordingly, execution units, including those providing the new functionality.

From an architectural point of view, the first gaming Pascal is very similar to comparable Maxwell solutions, although there are some differences. As with Maxwell, Pascal processors will come in different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs) and memory controllers. The SM is a highly parallel multiprocessor that schedules and runs warps (groups of 32 threads) on the CUDA cores and other execution units of the multiprocessor. Detailed information on the design of all these blocks can be found in our reviews of previous Nvidia solutions.

Each SM multiprocessor is paired with a PolyMorph Engine, which handles texture sampling, tessellation, transformation, vertex attribute setup and perspective correction. Unlike in the company's previous solutions, the PolyMorph Engine in GP104 also contains a new Simultaneous Multi-Projection block, which we will discuss below. The combination of an SM multiprocessor with one PolyMorph Engine is traditionally called a TPC (Texture Processor Cluster) in Nvidia's terminology.

In total, the GP104 chip in the GeForce GTX 1080 contains four GPC clusters and 20 SM multiprocessors, as well as eight memory controllers combined with 64 ROPs. Each GPC cluster has a dedicated rasterization engine and includes five SMs. Each multiprocessor, in turn, consists of 128 CUDA cores, 256 KB register file, 96 KB shared memory, 48 KB L1 cache, and eight TMU texture units. That is, in total, GP104 contains 2560 CUDA cores and 160 TMU units.

Also, the graphics processor on which the GeForce GTX 1080 is based contains eight 32-bit (as opposed to the 64-bit previously used) memory controllers, which gives us a final 256-bit memory bus. Eight ROPs and 256 KB of L2 cache are tied to each of the memory controllers. That is, in total, the GP104 chip contains 64 ROPs and 2048 KB of L2 cache.
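
The totals quoted above follow directly from the per-block counts; a quick sanity check:

```python
# Sanity check of the GP104 configuration figures given in the text.
gpc = 4
sm_per_gpc = 5
cuda_per_sm = 128
tmu_per_sm = 8
mem_controllers = 8              # each 32-bit wide
rops_per_controller = 8
l2_kb_per_controller = 256

sm_total = gpc * sm_per_gpc
print("SMs:       ", sm_total)                                      # 20
print("CUDA cores:", sm_total * cuda_per_sm)                        # 2560
print("TMUs:      ", sm_total * tmu_per_sm)                         # 160
print("Memory bus:", mem_controllers * 32, "bit")                   # 256-bit
print("ROPs:      ", mem_controllers * rops_per_controller)         # 64
print("L2 cache:  ", mem_controllers * l2_kb_per_controller, "KB")  # 2048 KB
```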

Thanks to architectural optimizations and the new process technology, the first gaming Pascal is the most energy-efficient GPU ever. Both the advanced 16 nm FinFET process and the architectural optimizations made in Pascal relative to Maxwell contribute to this. Nvidia was able to raise clock speeds even more than it expected when moving to the new process: GP104 runs at a higher frequency than a hypothetical GM204 manufactured on the 16 nm process would. To achieve this, Nvidia's engineers had to carefully check and optimize all the bottlenecks of previous solutions that prevented overclocking above a certain threshold. As a result, the new GeForce GTX 1080 runs at over 40% higher clock speeds than the GeForce GTX 980. But that is not all there is to the GPU clock changes.

GPU Boost 3.0 Technology

As we know from previous Nvidia graphics cards, their GPUs use GPU Boost hardware technology, designed to raise the operating clock speed in modes where the GPU has not yet reached its power and thermal limits. Over the years this algorithm has undergone many changes, and the Pascal video chip already uses the third generation of the technology, GPU Boost 3.0, whose main innovation is finer control of turbo frequencies as a function of voltage.

If you recall how previous versions of the technology worked, the difference between the base frequency (a guaranteed minimum below which the GPU does not fall, at least in games) and the turbo frequency was fixed. That is, the turbo frequency was always a set number of megahertz above the base. GPU Boost 3.0 introduces the ability to set turbo frequency offsets for each voltage point separately. The easiest way to understand this is with an illustration:

On the left is GPU Boost of the second version, on the right the third version, which appeared in Pascal. The fixed difference between the base and turbo frequencies did not let the GPU reach its full potential: in some cases GPUs of previous generations could run faster at a given voltage, but the fixed turbo offset did not allow it. In GPU Boost 3.0 the turbo frequency can be set for each individual voltage value, squeezing everything the GPU has to give out of it.
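
To make the difference concrete, here is a small sketch with made-up voltage and frequency numbers:

```python
# GPU Boost 2.0: one fixed offset over the base clock for every voltage point.
# GPU Boost 3.0: an individual offset can be stored for each voltage point.
voltages_mv = [800, 850, 900, 950, 1000, 1050]
base_curve  = [1500, 1550, 1600, 1650, 1700, 1733]   # MHz, hypothetical V/F curve

fixed_offset = 120                                   # Boost 2.0 style
boost2 = [f + fixed_offset for f in base_curve]

per_point_offsets = [180, 170, 150, 130, 110, 90]    # Boost 3.0 style
boost3 = [f + o for f, o in zip(base_curve, per_point_offsets)]

for v, b2, b3 in zip(voltages_mv, boost2, boost3):
    print(f"{v} mV: Boost 2.0 -> {b2} MHz, Boost 3.0 -> {b3} MHz")
```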

Handy utilities are required to manage overclocking and set the turbo frequency curve. Nvidia does not make them itself, but helps its partners create such utilities to facilitate overclocking (within reasonable limits, of course). For example, the new GPU Boost 3.0 functionality is already exposed in EVGA Precision XOC, which includes a dedicated overclocking scanner that automatically finds and sets the non-linear difference between base and turbo frequencies at different voltages by running a built-in performance and stability test. As a result, the user gets a turbo frequency curve that matches the capabilities of the particular chip, and it can also be freely modified in manual mode.

As you can see in the screenshot of the utility, in addition to information about the GPU and the system there are overclocking settings: Power Target (typical power consumption during overclocking, as a percentage of the standard value), GPU Temp Target (maximum allowed core temperature), GPU Clock Offset (offset over the base frequency for all voltage values), Memory Offset (offset of the video memory frequency over the default value) and Overvoltage (an additional opportunity to raise the voltage).

The Precision XOC utility includes three overclocking modes: Basic, Linear and Manual. In Basic mode you can set a single overclock value (a fixed turbo frequency) over the base one, as was the case with previous GPUs. Linear mode lets you set a linear change in frequency from the minimum to the maximum voltage value for the GPU. In Manual mode you can set unique GPU frequency values for each voltage point on the graph.

The utility also includes a special scanner for automatic overclocking. You can either set your own frequency levels or let Precision XOC scan the GPU at all voltages and find the most stable frequencies for each point on the voltage-frequency curve fully automatically. During scanning, Precision XOC incrementally raises the GPU frequency and checks for stability or artifacts, building an ideal frequency-voltage curve that is unique to each specific chip.

The scanner can be tuned to your own requirements by setting the time interval for testing each voltage value, the minimum and maximum frequency to test, and the step. It is clear that to achieve stable results it is better to set a small step and a decent test duration. During testing, unstable operation of the video driver and the system may be observed, but if the scanner does not freeze it will recover and continue looking for the optimal frequencies.
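
Conceptually, such a scan boils down to a loop like the sketch below (the stability test is a stand-in; no real Precision XOC API is used, and the numbers are arbitrary):

```python
import random

def test_stable(voltage_mv, freq_mhz, duration_s):
    """Stand-in for the built-in stress test: returns False on artifacts or a crash."""
    return random.random() > (freq_mhz - 1600) / 500.0   # toy failure model only

def scan_curve(voltages_mv, start_mhz=1600, max_mhz=2100, step_mhz=25, duration_s=20):
    curve = {}
    for v in voltages_mv:
        stable = start_mhz
        freq = start_mhz + step_mhz
        while freq <= max_mhz and test_stable(v, freq, duration_s):
            stable = freq          # highest frequency that passed so far at this voltage
            freq += step_mhz       # a smaller step and longer test give a better curve
        curve[v] = stable
    return curve

print(scan_curve([800, 850, 900, 950, 1000]))   # per-voltage turbo frequencies
```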

New type of video memory GDDR5X and improved compression

So, the power of the GPU has grown significantly while the memory bus has remained only 256-bit: will memory bandwidth limit overall performance, and what can be done about it? The promising second-generation HBM is apparently still too expensive to manufacture, so other options had to be found. Ever since the introduction of GDDR5 memory in 2009, Nvidia engineers have been exploring possible new memory types. That work has culminated in the new GDDR5X memory standard - the most complex and advanced standard to date, giving a transfer rate of 10 Gbps.

Nvidia gives an interesting example of just how fast this is. Only 100 picoseconds elapse between transmitted bits; in that time a beam of light travels only a little over an inch (about 3 cm). And with GDDR5X memory, the receiving circuits have to sample the value of the transmitted bit in less than half of that time, before the next bit is sent - just so you understand what modern technology has come to.
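
The arithmetic behind these figures is easy to verify:

```python
# The numbers behind the GDDR5X figures quoted above.
bit_rate_gbps = 10                         # per data pin
bus_width_bits = 256

bit_time_s = 1 / (bit_rate_gbps * 1e9)
print(f"Time between bits: {bit_time_s * 1e12:.0f} ps")              # 100 ps

c = 299_792_458                            # speed of light, m/s
print(f"Light travels {c * bit_time_s * 100:.1f} cm in that time")   # ~3 cm

bandwidth_gb_s = bit_rate_gbps * bus_width_bits / 8
print(f"Peak memory bandwidth: {bandwidth_gb_s:.0f} GB/s")           # 320 GB/s
```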

Achieving this speed required the development of a new I/O architecture, which took several years of joint work with the memory chip manufacturers. In addition to the increased data transfer rate, power efficiency has also improved: GDDR5X chips use a lower 1.35 V supply voltage and are manufactured with new technologies, which gives the same power consumption at a 43% higher frequency.

The company's engineers had to rework the data transmission lines between the GPU core and memory chips, paying more attention to preventing signal loss and signal degradation all the way from memory to GPU and back. So, in the illustration above, the captured signal is shown as a large symmetrical "eye", which indicates good optimization of the entire circuit and the relative ease of capturing data from the signal. Moreover, the changes described above have led not only to the possibility of using GDDR5X at 10 GHz, but also should help to get a high memory bandwidth on future products using the more familiar GDDR5 memory.

So the new memory alone gives more than a 40% increase in memory bandwidth. But is that enough? To raise effective bandwidth further, Nvidia continued to improve the advanced data compression introduced in previous architectures. The memory subsystem of the GeForce GTX 1080 uses improved and several new lossless compression techniques designed to reduce bandwidth requirements - already the fourth generation of on-chip compression.

Compressing data in memory brings several benefits at once. Compression reduces the amount of data written to memory; the same applies to data transferred from video memory to the L2 cache, which improves cache efficiency, since a compressed tile (a block of several framebuffer pixels) is smaller than an uncompressed one. It also reduces the amount of data sent between different units, such as the texture units and the framebuffer.

The data compression pipeline in the GPU uses several algorithms, chosen according to how compressible the data is: the best available algorithm is selected for it. One of the most important is delta color compression. This method encodes data as the difference between consecutive values rather than the data itself. The GPU calculates the differences in color values between the pixels of a block (tile) and stores the block as some average color for the entire block plus the per-pixel differences. For graphics data this method is usually a good fit, since the color within small tiles often does not vary much between pixels.

The GP104 GPU in the GeForce GTX 1080 supports more compression algorithms than previous Maxwell chips. Thus, the 2:1 compression algorithm has become more efficient, and in addition to it, two new algorithms have appeared: a 4:1 compression mode, suitable for cases where the difference in the color value of the pixels of a block is very small, and an 8:1 mode, which combines a constant 4:1 compression of 2×2 pixel blocks with 2x delta compression between blocks. When compression is not possible at all, it is not used.
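
A heavily simplified sketch of the delta-compression idea described above (the real hardware formats and the 2:1/4:1/8:1 mode-selection logic are not public, and the anchor-plus-deltas scheme here is only an approximation):

```python
def compress_tile(tile):
    """tile: flat list of 8-bit channel values from one framebuffer tile."""
    anchor = tile[0]
    deltas = [p - anchor for p in tile]
    bits_per_delta = max(abs(d) for d in deltas).bit_length() + 1   # plus a sign bit
    raw_bits = len(tile) * 8
    packed_bits = 8 + len(tile) * bits_per_delta                    # anchor + deltas
    if packed_bits >= raw_bits:
        return None                       # incompressible tile: stored uncompressed
    return anchor, deltas, raw_bits / packed_bits                   # compression ratio

# A "flat" tile compresses well; a noisy one may not compress at all.
print(compress_tile([120, 121, 119, 122, 120, 121, 120, 119]))   # ratio 2.0
print(compress_tile([0, 255, 13, 240, 7, 199, 64, 128]))         # None
```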

However, in reality, the latter happens very infrequently. This can be seen from the example screenshots from the game Project CARS, which Nvidia cited to illustrate the increased compression ratio in Pascal. In the illustrations, those frame buffer tiles that the GPU could compress were shaded in magenta, and those that could not be compressed without loss remained with the original color (top - Maxwell, bottom - Pascal).

As you can see, the new compression algorithms in GP104 really work much better than in Maxwell. Although the old architecture was also able to compress most of the tiles in the scene, a lot of grass and trees around the edges, as well as car parts, are not subject to legacy compression algorithms. But with the inclusion of new techniques in Pascal, a very small number of image areas remained uncompressed - improved efficiency is evident.

As a result of improvements in data compression, the GeForce GTX 1080 is able to significantly reduce the amount of data sent per frame. In numbers, improved compression saves an additional 20% of effective memory bandwidth. In addition to the more than 40% increase in memory bandwidth of the GeForce GTX 1080 relative to the GTX 980 from using GDDR5X memory, all together this gives about a 70% increase in effective memory bandwidth compared to the previous generation model.
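
The way these two gains combine can be checked quickly (the 224 GB/s figure for the GTX 980 follows from its 7 GHz GDDR5 on a 256-bit bus):

```python
gtx_980_bw = 7 * 256 / 8      # 224 GB/s raw bandwidth of the GTX 980
gtx_1080_bw = 10 * 256 / 8    # 320 GB/s raw bandwidth of the GTX 1080

raw_gain = gtx_1080_bw / gtx_980_bw     # ~1.43x from GDDR5X alone
compression_gain = 1.20                 # ~20% extra from 4th-generation compression

print(f"Raw bandwidth gain:       +{(raw_gain - 1) * 100:.0f}%")                      # ~43%
print(f"Effective bandwidth gain: +{(raw_gain * compression_gain - 1) * 100:.0f}%")   # ~71%
```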

Support for Async Compute

Most modern games use complex computations in addition to graphics. Physics calculations, for example, can be carried out not before or after the graphics work but simultaneously with it, since the two are not related and do not depend on each other within a single frame. Another example is post-processing of already rendered frames and audio processing, which can also run in parallel with rendering.

Another prime example of this functionality is the Asynchronous Time Warp technique used in VR systems to change the rendered frame according to the movement of the player's head just before it is rendered, interrupting the rendering of the next one. Such asynchronous loading of GPU capacities allows increasing the efficiency of using its execution units.

These workloads create two new GPU usage scenarios. The first of these includes overlapping loads, since many types of tasks do not use the full capabilities of GPUs, and some resources are idle. In such cases, you can simply run two different tasks on the same GPU, separating its execution units to get more efficient use - for example, PhysX effects that run in conjunction with the 3D rendering of the frame.

To improve the performance of this scenario, the Pascal architecture introduced dynamic load balancing. In the previous Maxwell architecture, overlapping workloads were implemented as a static distribution of GPU resources between graphics and compute. This approach is effective provided that the balance between the two workloads roughly corresponds to the division of resources and the tasks are performed equally in time. If non-graphical calculations take longer than graphical ones, and both are waiting for the completion of the common work, then part of the GPU will be idle for the remaining time, which will cause a decrease in overall performance and nullify all the benefits. Hardware dynamic load balancing, on the other hand, allows you to use the freed up GPU resources as soon as they become available - for understanding, we will give an illustration.
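
To make the idle-time argument concrete, here is a toy timing model (illustrative numbers only, not measurements):

```python
def frame_time_static(gfx_ms, compute_ms, gfx_share=0.5):
    # Maxwell-style static split: each workload is confined to its fixed slice
    # of the GPU for the whole frame, even after the other workload finishes.
    return max(gfx_ms / gfx_share, compute_ms / (1.0 - gfx_share))

def frame_time_dynamic(gfx_ms, compute_ms):
    # Idealized dynamic balancing: freed-up resources are reused immediately,
    # so the frame takes as long as the total amount of GPU work.
    return gfx_ms + compute_ms

# gfx_ms / compute_ms = time each task would need with the whole GPU to itself.
print(frame_time_static(10, 4))    # 20 ms: half the GPU idles once compute is done
print(frame_time_dynamic(10, 4))   # 14 ms
```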

There are also time-critical tasks, and this is the second scenario for asynchronous computing. For example, the asynchronous time warp algorithm in VR must complete before the scan-out begins or the frame will be discarded. In such cases the GPU must support very fast task interruption and switching, pulling a less critical task off the GPU to free its resources for the critical one - this is called preemption.

A single render command from a game engine can contain hundreds of draw calls, each draw call in turn contains hundreds of rendered triangles, each containing hundreds of pixels to be calculated and drawn. The traditional GPU approach uses only high-level task interruption, and the graphics pipeline has to wait for all that work to complete before switching tasks, resulting in very high latency.

To fix this, the Pascal architecture first introduced the ability to interrupt a task at the pixel level - Pixel Level Preemption. Pascal GPU execution units can constantly monitor the progress of rendering tasks, and when an interrupt is requested, they can stop execution, saving the context for later completion, quickly switching to another task.

Thread-level interruption and switching for compute operations works similarly to pixel-level interruption for graphics. Compute workloads consist of multiple grids, each containing multiple threads. When an interrupt request is received, the threads running on the multiprocessor finish executing, other units save their state so they can continue from the same point later, and the GPU switches to another task. The entire switching process takes less than 100 microseconds after the running threads exit.

For gaming workloads, the combination of pixel-level interrupts for graphics, and thread-level interrupts for compute tasks gives Pascal architecture GPUs the ability to quickly switch between tasks with minimal time loss. And for computing tasks on CUDA, it is also possible to interrupt with minimal granularity - at the instruction level. In this mode, all threads stop execution at once, immediately switching to another task. This approach requires saving more information about the state of all registers of each thread, but in some cases of non-graphical calculations it is quite justified.

Fast interruption and task switching for graphics and compute was added to the Pascal architecture so that tasks can be preempted at the level of individual pixels and instructions, rather than only at coarse boundaries such as whole draw calls, as was the case with Maxwell and Kepler. These technologies can improve the asynchronous execution of various GPU workloads and improve responsiveness when several tasks run at once. At its event, Nvidia showed a demonstration of asynchronous computing using physics calculations as an example: without asynchronous computing, performance was around 77-79 FPS; with these features enabled, the frame rate rose to 93-94 FPS.

We have already given an example of one use of this functionality in games: asynchronous time warp in VR. The illustration shows this technique with traditional preemption and with fast preemption. In the first case, asynchronous time warp is scheduled as late as possible, but still before the display update begins; the work has to be handed to the GPU a few milliseconds earlier, because without fast interruption there is no way to start it at exactly the right moment, and the GPU sits idle for a while.

With precise pixel- and thread-level interruption (shown on the right), the moment of interruption can be determined much more accurately, and asynchronous time warp can be started much later with confidence that it will finish before the display update begins. The GPU that would sit idle in the first case can instead be loaded with additional graphics work.

Simultaneous Multi-Projection Technology

The new GP104 GPU supports a new multi-projection technology (Simultaneous Multi-Projection, SMP) that lets the GPU render data on modern display systems more efficiently. SMP allows the video chip to output data in several projections simultaneously; this required a new hardware block in the GPU, part of the PolyMorph engine at the end of the geometry pipeline, before the rasterization unit. This block handles multiple projections for a single geometry stream.

The multi-projection engine processes geometry simultaneously for 16 pre-configured projections sharing a projection point (camera); the projections can be independently rotated or tilted. Since each geometric primitive can appear in several projections at once, the SMP engine lets an application instruct the video chip to replicate geometry up to 32 times (16 projections at each of two projection centers) without additional processing.

The whole process is hardware-accelerated, and since multi-projection works after the geometry engine, the earlier geometry stages do not have to be repeated. The saved resources matter when rendering is limited by geometry processing performance, such as tessellation, where the same geometric work would otherwise be performed several times for each projection. Accordingly, in the peak case, multi-projection can reduce the geometry workload by up to 32 times.
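
The replication arithmetic behind that peak figure is simple:

```python
# Geometry replication limits of the SMP engine, from the figures above.
projections_per_center = 16
projection_centers = 2                 # e.g. one per eye in a VR headset

max_replications = projections_per_center * projection_centers
print(max_replications)                # 32: geometry is processed once, emitted up to 32 times

# Peak saving versus re-running the geometry pipeline once per projection:
print(f"up to {max_replications}x less geometry work")
```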

But why is all this necessary? There are several good examples where multi-projection technology can be useful. For example, a multi-monitor system of three displays mounted at an angle to each other close enough to the user (surround configuration). In a typical situation, the scene is rendered in one projection, which leads to geometric distortions and incorrect geometry rendering. The correct way is three different projections for each of the monitors, according to the angle at which they are located.

With a Pascal-based video card this can be done in a single geometry pass by specifying three different projections, one for each monitor. The user can thus change the angle between the monitors not only physically but also virtually, rotating the projections for the side monitors to get the correct perspective in the 3D scene with a noticeably wider field of view (FOV). There is a limitation, though: the application must be able to render the scene with a wide FOV and use special SMP API calls to set it up. So this cannot be done in every game; special support is required.

In any case, the days of a single projection on a single flat monitor are over, there are now many multi-monitor configurations and curved displays that can also use this technology. Not to mention virtual reality systems that use special lenses between the screens and the user's eyes, which require new techniques for projecting a 3D image into a 2D image. Many of these technologies and techniques are still in early development, the main thing is that older GPUs cannot effectively use more than one planar projection. They require multiple rendering passes, multiple processing of the same geometry, and so on.

Maxwell chips had limited Multi-Resolution support to help increase efficiency, but Pascal's SMP can do much more. Maxwell could rotate the projection by 90 degrees for cube mapping or different projection resolutions, but this was only useful in a limited range of applications like VXGI.

Other possibilities for using SMP include rendering at different resolutions and single-pass stereo rendering. For example, rendering at different resolutions (Multi-Res Shading) can be used in games to optimize performance. When applied, a higher resolution is used in the center of the frame, and at the periphery it is reduced to obtain a faster rendering speed.

Single-pass stereo rendering is used in VR, it has already been added to the VRWorks package and uses the multi-projection feature to reduce the amount of geometric work required in VR rendering. If this feature is used, the GeForce GTX 1080 GPU processes the scene geometry only once, generating two projections for each eye at once, which reduces the geometric load on the GPU by half, and also reduces the losses from the driver and OS.

An even more advanced technique for improving the efficiency of VR rendering is Lens Matched Shading, which uses multiple projections to simulate the geometric distortions required in VR rendering. This method uses multi-projection to render a 3D scene onto a surface that approximates the lens-adjusted surface when rendered for VR headset output, avoiding many extra pixels on the periphery that would be discarded. The easiest way to understand the essence of the method is by illustration - four slightly expanded projections are used in front of each eye (in Pascal, you can use 16 projections for each eye - to more accurately simulate a curved lens) instead of one:

This approach can bring significant performance savings. For example, a typical Oculus Rift image is 1.1 megapixels per eye, but because of the lens distortion in the projection, the source image rendered for it is 2.1 megapixels, 86% more than necessary. Multi-projection, as implemented in the Pascal architecture, makes it possible to render only 1.4 megapixels instead, a 1.5x saving in pixel processing rate, and it also saves memory bandwidth.
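
The pixel arithmetic quoted above works out as follows:

```python
displayed_mp = 1.1      # megapixels actually shown per eye on the Rift
naive_render_mp = 2.1   # rendered per eye with a single planar projection
lms_render_mp = 1.4     # rendered per eye with Lens Matched Shading

# ~1.9x more pixels rendered than displayed (the ~86% overhead quoted above,
# which is based on unrounded megapixel counts).
print(f"Rendered vs displayed: {naive_render_mp / displayed_mp:.2f}x")
print(f"LMS pixel-rate saving: {naive_render_mp / lms_render_mp:.2f}x")   # 1.5x
```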

And along with a twofold saving in geometry processing speed due to single-pass stereo rendering, the GPU of the GeForce GTX 1080 graphics card is able to provide a significant increase in VR rendering performance, which is very demanding on geometry processing speed, and even more so on pixel processing.

Improvements in video output and processing blocks

In addition to performance and new functionality related to 3D rendering, it is necessary to maintain a good level of image output, as well as video decoding and encoding. And the first Pascal architecture graphics processor did not disappoint - it supports all modern standards in this sense, including the hardware decoding of the HEVC format, which is necessary for viewing 4K videos on a PC. Also, future owners of GeForce GTX 1080 graphics cards will soon be able to enjoy streaming 4K video from Netflix and other providers on their systems.

In terms of display output, the GeForce GTX 1080 has support for HDMI 2.0b with HDCP 2.2 as well as DisplayPort. So far, the DP 1.2 version has been certified, but the GPU is ready for certification for newer versions of the standard: DP 1.3 Ready and DP 1.4 Ready. The latter allows 4K screens to be displayed at 120Hz, and 5K and 8K displays at 60Hz using a pair of DisplayPort 1.3 cables. If for the GTX 980 the maximum supported resolution was 5120x3200 at 60Hz, then for the new GTX 1080 model it has grown to 7680x4320 at the same 60Hz. The reference GeForce GTX 1080 has three DisplayPort outputs, one HDMI 2.0b and one digital Dual-Link DVI.

The new model of the Nvidia video card also received an improved block for decoding and encoding video data. Thus, the GP104 chip complies with the high standards of PlayReady 3.0 (SL3000) for streaming video playback, which allows you to be sure that playing high-quality content from well-known providers such as Netflix will be of the highest quality and energy efficient. Details about the support of various video formats during encoding and decoding are given in the table, the new product is clearly different from previous solutions for the better:

But an even more interesting novelty is support for the so-called High Dynamic Range (HDR) displays, which are about to become widespread in the market. TVs are on sale as early as 2016 (with four million HDR TVs expected to be sold in just one year), and monitors next year. HDR is the biggest breakthrough in display technology in years, delivering double the color tones (75% visible spectrum versus 33% for RGB), brighter displays (1000 nits) with higher contrast (10000:1) and rich colors.

The emergence of the ability to play content with a greater difference in brightness and richer and more saturated colors will bring the image on the screen closer to reality, the black color will become deeper, the bright light will dazzle, just like in the real world. Accordingly, users will see more detail in bright and dark areas of images compared to standard monitors and TVs.

To support HDR displays, the GeForce GTX 1080 has everything needed: 12-bit color output, support for the BT.2020 and SMPTE 2084 standards, and 10/12-bit HDR output at 4K resolution over HDMI 2.0b, which was already possible with Maxwell. In addition, Pascal adds decoding of the HEVC format at 4K resolution, 60 Hz and 10- or 12-bit color, which is used for HDR video, as well as encoding of the same format with the same parameters, but only at 10-bit, for HDR video recording or streaming. The new product is also ready for DisplayPort 1.4 standardization for transmitting HDR data over that connector.

By the way, HDR video encoding may be needed in the future in order to transfer such data from a home PC to a SHIELD game console that can play 10-bit HEVC. That is, the user will be able to broadcast the game from a PC in HDR format. Wait, where can I get games with such support? Nvidia is constantly working with game developers to implement this support, passing them everything they need (driver support, code samples, etc.) to correctly render HDR images that are compatible with existing displays.

At the time of the GeForce GTX 1080's release, games such as Obduction, The Witness, Lawbreakers, Rise of the Tomb Raider, Paragon, The Talos Principle and Shadow Warrior 2 have HDR output support, and the list is expected to grow in the near future.

Changes to multi-chip SLI rendering

There were also some changes related to the proprietary SLI multi-chip rendering technology, although nobody expected this. SLI is used by PC gaming enthusiasts either to push performance to the extreme by running the most powerful single-chip graphics cards in tandem, or to get very high frame rates from a couple of mid-range solutions that are sometimes cheaper than one top-end card (a controversial decision, but people do it). With 4K monitors, players have almost no other option than to install a pair of video cards, since even top models often cannot deliver comfortable gameplay at maximum settings in such conditions.

One of the important components of Nvidia SLI is the bridge, which connects the video cards into a common video subsystem and provides a digital channel for transferring data between them. GeForce graphics cards have traditionally carried dual SLI connectors, used to connect two to four graphics cards in 3-Way and 4-Way SLI configurations. Each video card had to be connected to every other one, since all GPUs sent their rendered frames to the main GPU, which is why two interfaces were needed on each board.

Starting with the GeForce GTX 1080, all Nvidia graphics cards based on the Pascal architecture have two SLI interfaces linked together to increase the performance of data transfer between graphics cards, and this new dual-channel SLI mode improves performance and comfort when displaying visual information on very high-resolution displays or multi-monitor systems.

For this mode, new bridges were also needed, called SLI HB. They combine a pair of GeForce GTX 1080 video cards via two SLI channels at once, although the new video cards are also compatible with older bridges. For resolutions of 1920×1080 and 2560×1440 pixels at a refresh rate of 60 Hz, standard bridges can be used, but in more demanding modes (4K, 5K and multi-monitor systems), only new bridges will provide better results in terms of smooth frame change, although the old ones will work, but somewhat worse.

Also, when SLI HB bridges are used, the GeForce GTX 1080 data interface runs at 650 MHz, compared with 400 MHz for conventional SLI bridges on older GPUs. Moreover, the higher transfer rate is also available with some of the rigid old bridges when they are paired with Pascal video chips. With the data transfer rate between GPUs raised via the doubled SLI interface running at a higher frequency, smoother frame delivery to the screen is also provided compared with previous solutions:

It should also be noted that multi-chip rendering support in DirectX 12 differs somewhat from what was customary before. In the latest version of the graphics API, Microsoft has made many changes related to such video systems. DX12 offers developers two multi-GPU options: Multi Display Adapter (MDA) and Linked Display Adapter (LDA) modes.

Moreover, LDA mode has two forms: Implicit LDA (which Nvidia uses for SLI) and Explicit LDA (where the game developer takes on the task of managing multi-chip rendering). The MDA and Explicit LDA modes were implemented in DirectX 12 precisely to give game developers more freedom and options when using multi-chip video systems. The difference between the modes is clearly visible in the following table:

In LDA mode, the memory of each GPU can be connected to the memory of another and displayed as a large total volume, of course, with all the performance limitations when the data is taken from "foreign" memory. In MDA mode, each GPU's memory works separately, and different GPUs cannot directly access data from another GPU's memory. LDA mode is designed for multi-chip systems of similar performance, while MDA mode is less restrictive and can work together with discrete and integrated GPUs or discrete solutions with chips from different manufacturers. But this mode also requires more attention and work from developers when programming collaboration so that GPUs can communicate with each other.

By default, a GeForce GTX 1080 based SLI system supports only two GPUs, and three- and four-GPU configurations are officially deprecated, as it is becoming increasingly difficult for modern games to gain performance from adding a third and fourth GPU. For example, many games rely on the system's central processor when driving multi-chip video systems, and new games increasingly use temporal techniques that reuse data from previous frames, with which the efficient operation of several GPUs at once is simply impossible.

However, operation in other (non-SLI) multi-chip configurations remains possible, such as the MDA or LDA Explicit modes in DirectX 12, or a two-chip SLI system with a dedicated third GPU for PhysX effects. But what about benchmark records - is Nvidia really abandoning them altogether? No, of course not, but since such systems are in demand among only a handful of users worldwide, a special Enthusiast Key was introduced for these ultra-enthusiasts; it can be downloaded from the Nvidia website and unlocks the feature. To do this, you first obtain a unique GPU ID by running a special application, then request the Enthusiast Key on the website and, after downloading it, install the key into the system, thereby unlocking 3-Way and 4-Way SLI configurations.

Fast Sync Technology

Some changes have also taken place in the synchronization technologies used when outputting to the display. Looking ahead, there is nothing new in G-Sync, and Adaptive Sync is still not supported. But Nvidia decided to improve output smoothness and synchronization for games that run very fast, with frame rates noticeably exceeding the monitor's refresh rate. This matters especially for games that demand minimal latency and fast response, such as multiplayer battles and competitive titles.

Fast Sync is a new alternative to vertical sync that has no visual artifacts such as tearing but is not tied to a fixed refresh rate, which would increase latency. What is the problem with vertical sync in games like Counter-Strike: Global Offensive? On powerful modern GPUs this game runs at several hundred frames per second, and the player can choose whether to enable v-sync or not.

In multiplayer games, users most often chase for minimal delays and turn off VSync, getting clearly visible tearing in the image, which is extremely unpleasant even at high frame rates. However, if you turn on v-sync, then the player will experience a significant increase in delays between his actions and the image on the screen, when the graphics pipeline slows down to the monitor's refresh rate.

This is how a traditional pipeline works. But Nvidia decided to separate the rendering process from the display process with Fast Sync technology. This lets the part of the GPU that renders frames keep working at full speed as efficiently as possible, storing those frames in a special temporary Last Rendered Buffer.

This method allows you to change the display method and take the best from the VSync On and VSync Off modes, getting low latency, but without image artifacts. With Fast Sync, there is no frame flow control, the game engine runs in sync-off mode and is not told to wait to draw the next one, so latencies are almost as low as VSync Off mode. But since Fast Sync independently selects a buffer for displaying on the screen and displays the entire frame, there are no picture breaks either.

Fast Sync uses three buffers, the first two of which work like double buffering in a classic pipeline. The primary buffer (Front Buffer, FB) is the buffer whose contents are shown on the display, a fully rendered frame. The back buffer (Back Buffer, BB) is the buffer currently being rendered into.

When vertical sync is used at a high frame rate, the game waits until the refresh interval is reached in order to swap the primary buffer with the back buffer and display a single frame on the screen. This slows the process down, and adding extra buffers, as in traditional triple buffering, only adds delay.

With Fast Sync, a third Last Rendered Buffer (LRB) is added, which is used to store all the frames that have just been rendered in the secondary buffer. The name of the buffer speaks for itself, it contains a copy of the last fully rendered frame. And when the time comes to update the primary buffer, this LRB buffer is copied to the primary in its entirety, and not in parts, as from the secondary with disabled vertical synchronization. Since copying information from buffers is inefficient, they are simply swapped (or renamed, as it will be more convenient to understand), and the new logic of swapping buffers, introduced in GP104, manages this process.
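
A minimal model of this buffer rotation, with frame numbers standing in for buffer contents, could look like the sketch below (an illustration of the idea, not driver code):

```python
class FastSyncBuffers:
    def __init__(self):
        self.front = None            # FB: what the display scans out
        self.back = None             # BB: frame currently being rendered
        self.last_rendered = None    # LRB: newest fully completed frame

    def frame_completed(self, frame_id):
        # Rendering finished: the finished frame takes the LRB slot and the old
        # LRB storage is recycled as the new back buffer (a swap, not a copy).
        self.back, self.last_rendered = self.last_rendered, frame_id

    def display_refresh(self):
        # On each refresh the whole LRB is swapped into the front buffer,
        # so the display always receives a complete frame and never tears.
        if self.last_rendered is not None:
            self.front, self.last_rendered = self.last_rendered, self.front
        return self.front

buffers = FastSyncBuffers()
for frame in range(1, 5):            # the GPU renders 4 frames between two refreshes
    buffers.frame_completed(frame)
print(buffers.display_refresh())     # 4: only the newest complete frame is shown
```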

In practice, enabling the new Fast Sync method still adds slightly more latency than running with vertical sync completely disabled - about 8 ms more on average - but it delivers whole frames to the monitor, without the unpleasant tearing artifacts on screen. The new method can be enabled from the vertical sync section of the Nvidia control panel's graphics settings. The default, however, remains application-controlled, and there is no need to enable Fast Sync in all 3D applications; it is better to choose it specifically for games with high FPS.

Virtual reality technology Nvidia VRWorks

We've touched on the hot topic of VR more than once in this article, but it's mostly been about boosting framerates and ensuring low latency, which is very important for VR. All this is very important and there is indeed progress, but so far VR games look nowhere near as impressive as the best of the "regular" modern 3D games. This happens not only because the leading game developers are not yet particularly involved in VR applications, but also because VR is more demanding on the frame rate, which prevents the use of many of the usual techniques in such games due to the high demands.

To reduce the quality gap between VR games and regular games, Nvidia decided to release a whole package of related technologies called VRWorks, which includes a large number of APIs, libraries, engines and technologies that can significantly improve both the quality and the performance of VR applications. How does this relate to the announcement of the first gaming Pascal solution? It is very simple: some of these technologies have been built into it to boost performance and improve quality, and we have already written about them.

Although VRWorks covers more than graphics, we will start with the graphics part. The VRWorks Graphics set includes the previously mentioned technologies, such as Lens Matched Shading, which uses the multi-projection feature that appeared in the GeForce GTX 1080. The new product delivers a 1.5-2x performance increase relative to solutions without such support. We have also mentioned other technologies, such as MultiRes Shading, designed to render at different resolutions in the center of the frame and at its periphery.

But much more unexpected was the announcement of VRWorks Audio, designed for high-quality computation of sound in 3D scenes, which is especially important for virtual reality systems. Conventional engines position sound sources in a virtual environment quite correctly: if the enemy shoots from the right, the sound is louder from that side of the audio system, and such a calculation is not too demanding on computing power.

But in reality, sounds go not only towards the player, but in all directions and bounce off various materials, similar to how light rays bounce. And in reality, we hear these reflections, although not as clearly as direct sound waves. These indirect sound reflections are usually simulated by special reverb effects, but this is a very primitive approach to the task.

VRWorks Audio uses sound wave rendering similar to ray tracing in rendering, where the path of light rays is traced to multiple reflections from objects in a virtual scene. VRWorks Audio also simulates the propagation of sound waves in the environment when direct and reflected waves are tracked, depending on their angle of incidence and the properties of reflective materials. In its work, VRWorks Audio uses the high-performance Nvidia OptiX ray tracing engine known for graphics tasks. OptiX can be used for a variety of tasks, such as indirect lighting calculation and lightmapping, and now also for sound wave tracing in VRWorks Audio.

Nvidia has built accurate sound wave calculation into its VR Funhouse demo, which uses several thousand rays and calculates up to 12 reflections from objects. And in order to learn the advantages of the technology using a clear example, we suggest you watch a video about the operation of the technology in Russian:

It is important that Nvidia's approach differs from traditional sound engines, including the hardware-accelerated method from the main competitor using a special block in the GPU. All of these methods provide only accurate positioning of sound sources, but do not calculate the reflections of sound waves from objects in a 3D scene, although they can simulate this using the reverb effect. However, the use of ray tracing technology can be much more realistic, since only such an approach will provide an accurate imitation of various sounds, taking into account the size, shape and materials of objects in the scene. It is difficult to say whether such computational accuracy is required for a typical player, but we can say for sure: in VR, it can add to users the very realism that is still lacking in conventional games.

Well, it remains for us to tell only about the VR SLI technology, which works in both OpenGL and DirectX. Its principle is extremely simple: a two-GPU video system in a VR application will work in such a way that each eye is allocated a separate GPU, as opposed to the AFR rendering familiar to SLI configurations. This greatly improves the overall performance, which is so important for virtual reality systems. Theoretically, more GPUs can be used, but their number must be even.

This approach was required because AFR is not well suited for VR, since with its help the first GPU will draw an even frame for both eyes, and the second one will render an odd one, which does not reduce the delays that are critical for virtual reality systems. Although the frame rate will be quite high. So with the help of VR SLI, work on each frame is divided into two GPUs - one works on part of the frame for the left eye, the second for the right, and then these halves of the frame are combined into a whole.

Splitting work like this between a pair of GPUs brings about a 2x increase in performance, resulting in higher frame rates and lower latency compared to systems based on a single graphics card. True, the use of VR SLI requires special support from the application in order to use this scaling method. But VR SLI technology is already built into VR demo apps such as Valve's The Lab and ILMxLAB's Trials on Tatooine, and this is just the beginning - Nvidia promises other applications to come soon, as well as implementation of the technology in game engines Unreal Engine 4, Unity and Max Play.

Ansel Game Screenshot Platform

One of the most interesting software announcements was the release of a technology for capturing high-quality screenshots in games, named after the famous photographer Ansel Adams. Games have long since become not just games but also a playground for all kinds of creative people: some mod game scripts, some release high-quality texture packs, and some make beautiful screenshots.

Nvidia decided to help the latter by presenting a new platform for creating (and "creating" is the right word, because it is not such a simple process) high-quality shots from games. Nvidia believes Ansel can help create a new kind of contemporary art: there are already quite a few artists who spend much of their lives at the PC making beautiful screenshots from games, and until now they had no convenient tool for it.

Ansel lets you not only capture an image in the game but also change it as the creator needs. Using this technology you can move the camera around the scene, rotating and tilting it in any direction to get the desired composition. In first-person shooters, for example, you can normally only move the player and cannot really change anything else, so all the screenshots look rather monotonous. With Ansel's free camera you can go far beyond the game camera, choosing the angle needed for a good picture, or even capture a full 360-degree stereo image from the desired point, in high resolution, for later viewing in a VR headset.

Ansel works quite simply: with the help of a special Nvidia library, the platform is embedded in the game code. The developer only needs to add a small piece of code to the project to let the Nvidia video driver intercept buffer and shader data. There is very little work involved; integrating Ansel into a game takes less than a day. For instance, enabling the feature in The Witness took about 40 lines of code, and in The Witcher 3 about 150.

Ansel will come with an open software development kit (SDK). The user, in turn, gets a standard set of controls that let them change the camera position and angle, add effects, and so on. The Ansel platform works like this: it pauses the game, switches on the free camera, lets you compose the frame as desired, and records the result as a regular screenshot, a 360-degree image, a stereo pair, or simply a high-resolution panorama.

The only caveat is that not all games will support every feature of the Ansel screenshot platform. Some developers, for one reason or another, do not want a completely free camera in their games, for instance because cheaters could exploit it. Others want to limit changes in viewing angle for the same reason, so that no one gains an unfair advantage, or simply so that users do not see low-quality sprites in the background. All of these are perfectly understandable wishes on the part of game developers.

One of the most interesting features of Ansel is the creation of screenshots of simply enormous resolution. It does not matter that the game only supports resolutions up to 4K, say, and the user's monitor is Full HD: using the screenshot platform you can capture a much higher-resolution image, limited mostly by the size and speed of the drive. The platform easily captures screenshots of up to 4.5 gigapixels, stitched together from 3600 pieces.

In such pictures you can make out every detail, down to the text on newspapers lying in the distance, provided the game offers that level of detail at all: Ansel can also control the level of detail, forcing the maximum level for the best picture quality, and supersampling can be enabled on top of that. The result is images from games that can safely be printed on large banners without worrying about their quality.

Interestingly, special hardware-accelerated CUDA code is used to stitch the large images together: no video card can render a multi-gigapixel image in one piece, but it can render it in tiles, which then only need to be combined, taking into account possible differences in lighting, color, and so on.
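
A quick back-of-the-envelope sketch shows how a gigapixel-scale image breaks down into tiles rendered at the game's normal resolution. The target size and overlap below are illustrative numbers chosen to land near the figures quoted above, not Ansel's actual parameters.

```cpp
// Super-resolution capture in outline: the final image is assembled from many
// tiles rendered at the game's normal resolution. Numbers are illustrative.
#include <cstdio>

int main() {
    const long long targetW = 81920, targetH = 55296;  // ~4.5 gigapixels
    const int tileW = 1920, tileH = 1080;               // one tile = one normal render
    const double overlap = 0.2;                         // 20% overlap for seamless blending

    const int stepX = static_cast<int>(tileW * (1.0 - overlap));
    const int stepY = static_cast<int>(tileH * (1.0 - overlap));
    const long long tilesX = (targetW + stepX - 1) / stepX;
    const long long tilesY = (targetH + stepY - 1) / stepY;

    std::printf("%lld x %lld pixels = %.2f gigapixels\n",
                targetW, targetH, targetW * targetH / 1e9);
    std::printf("%lld x %lld = %lld tiles to render and stitch\n",
                tilesX, tilesY, tilesX * tilesY);
    // In practice the per-tile copy/blend into the huge destination buffer is
    // what runs as GPU (CUDA) code; the destination offset of tile (i, j) is
    // simply (i * stepX, j * stepY).
}
```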

After such panoramas are stitched, special post-processing is applied to the whole frame, also accelerated on the GPU. And to capture images with a higher dynamic range, you can use a special image format, EXR, an open standard from Industrial Light & Magic, in which the color value of each channel is recorded as a 16-bit floating-point number (FP16).

This format allows the brightness and dynamic range of the image to be adjusted in post-processing and tailored to each specific display, much as is done with RAW files from cameras. It is also very useful for subsequent post-processing filters in image-editing programs, since it contains far more data than conventional image formats.
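
For illustration, this is roughly how a frame can be written to EXR with FP16 channels using the open-source OpenEXR library from Industrial Light & Magic. This is generic library usage, not code from Ansel itself; the function name and the output path are our own.

```cpp
// Writing an HDR frame to an EXR file with 16-bit float (half) channels,
// using the open-source OpenEXR library. Generic usage for illustration only.
#include <ImfRgbaFile.h>   // OpenEXR simple RGBA interface
#include <vector>

void writeHdrFrame(const float* rgba, int width, int height) {
    // Convert the renderer's FP32 output to FP16 (OpenEXR's 'half' type).
    std::vector<Imf::Rgba> pixels(static_cast<size_t>(width) * height);
    for (size_t i = 0; i < pixels.size(); ++i) {
        pixels[i].r = rgba[4 * i + 0];
        pixels[i].g = rgba[4 * i + 1];
        pixels[i].b = rgba[4 * i + 2];
        pixels[i].a = rgba[4 * i + 3];
    }
    // One Rgba struct per pixel: x-stride 1, y-stride = image width.
    Imf::RgbaOutputFile file("shot.exr", width, height, Imf::WRITE_RGBA);
    file.setFrameBuffer(pixels.data(), 1, width);
    file.writePixels(height);
}
```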

The Ansel platform itself also contains many post-processing filters, which is especially important because it has access not only to the final image but also to all the buffers the game uses during rendering, which enables very interesting effects such as depth of field. Ansel has a dedicated post-processing API for this, and any of the effects can be included in a game that supports the platform.

Ansel's post-filters include: color curves, color space, transform, desaturation, brightness/contrast, film grain, bloom, lens flare, anamorphic glare, distortion, heat haze, fisheye, chromatic aberration, tone mapping, lens dirt, light shafts, vignette, gamma correction, convolution, sharpening, edge detection, blur, sepia, denoise, FXAA and others.

As for Ansel support appearing in games, we will have to wait a little until developers implement and test it. But Nvidia promises that such support will soon appear in well-known titles including The Division, The Witness, Lawbreakers, The Witcher 3, Paragon, Fortnite, Obduction, No Man's Sky, Unreal Tournament and others.

The new 16nm FinFET process technology and architectural optimizations have allowed the GeForce GTX 1080, based on the GP104 GPU, to reach a high clock speed of 1.6-1.7 GHz even in reference form, while the new generation of GPU Boost technology keeps it running at the highest possible frequencies in games. Together with the increased number of execution units, these improvements make it not only the highest-performing single-chip graphics card of all time, but also the most energy-efficient solution on the market.

The GeForce GTX 1080 is the first graphics card to use the new GDDR5X memory, a new generation of high-speed chips that achieve very high data rates. On the GeForce GTX 1080, this memory operates at an effective frequency of 10 GHz. Combined with improved framebuffer compression algorithms, this gives the GPU 1.7 times the effective memory bandwidth of its direct predecessor, the GeForce GTX 980.

Nvidia prudently decided not to release a radically new architecture on a brand-new process node, so as to avoid unnecessary problems in development and production. Instead, it seriously improved the already good and very efficient Maxwell architecture, adding new features. As a result, production of the new GPUs is going well, and in the case of the GeForce GTX 1080 the engineers have achieved very high frequency potential: partner-overclocked versions are expected to reach GPU frequencies of up to 2 GHz. Such an impressive frequency became possible thanks to the mature process technology and the painstaking work of Nvidia's engineers in developing the Pascal GPU.

Although Pascal is a direct successor to Maxwell and the two graphics architectures are fundamentally not very different, Nvidia has introduced many changes and improvements, including to the display capabilities, the video encoding and decoding engine, and asynchronous execution of various types of computation on the GPU; it has also changed multi-chip rendering and introduced a new synchronization method, Fast Sync.

Worth highlighting separately is Simultaneous Multi-Projection technology, which improves performance in virtual reality systems, gives a more correct display of scenes on multi-monitor setups, and enables new performance optimization techniques. VR applications will see the biggest speed-up when they support multi-projection, which roughly halves the GPU work spent on geometry and cuts per-pixel work by about one and a half times.

Among the purely software changes, the Ansel platform for creating in-game screenshots stands out; it will be interesting to try not only for those who play a lot but also for anyone interested in high-quality 3D graphics, as it takes the art of creating and retouching screenshots to a new level. As for developer packages such as GameWorks and VRWorks, Nvidia simply keeps improving them step by step: the latter, for example, has gained an interesting capability for high-quality sound simulation that accounts for numerous reflections of sound waves using hardware ray tracing.

All in all, with the GeForce GTX 1080 Nvidia has brought a true leader to market, with all the necessary qualities: high performance, broad functionality, and support for new features and algorithms. The first buyers of this card will be able to appreciate many of these advantages immediately, while other capabilities of the solution will reveal themselves a little later, once broad software support arrives. The main thing is that the GeForce GTX 1080 turned out very fast and efficient, and, as we sincerely hope, Nvidia's engineers have managed to fix some of the earlier problem areas (asynchronous compute, for one).

Graphics accelerator GeForce GTX 1070

Parameter: Meaning
Chip code name: GP104
Production technology: 16nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: Unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 256-bit: eight independent 32-bit memory controllers supporting GDDR5 and GDDR5X memory
GPU frequency: 1506 (1683) MHz
Computing blocks: 15 active (out of 20 in the chip) streaming multiprocessors, including 1920 (out of 2560) scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 120 active (out of 160 in the chip) texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Raster Operations Units (ROPs): 8 wide ROPs (64 pixels) with support for various anti-aliasing modes, including programmable and with FP16 or FP32 framebuffer formats. The units consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: Integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1070 Reference Graphics Specifications
Parameter: Meaning
Core frequency: 1506 (1683) MHz
Number of universal processors: 1920
Number of texture units: 120
Number of blending units: 64
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 256 GB/s
Computing performance (FP32): about 6.5 teraflops
Theoretical maximum fill rate: 96 gigapixels/s
Theoretical texture sampling rate: 181 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Power consumption: up to 150 W
Supplementary power: one 8-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $379-449 (US), 34,990 rubles (Russia)

The GeForce GTX 1070 received a logical name, echoing the corresponding solution from the previous GeForce series and differing from its direct predecessor, the GeForce GTX 970, only in the generation digit. In the company's current lineup it sits one step below the GeForce GTX 1080, which remains the temporary flagship of the new series until even more powerful GPU-based solutions arrive.

The recommended prices for the new Nvidia video card are $379 and $449 for regular partner cards and the Founders Edition, respectively. Compared to the top model, this is a very good price, given that the GTX 1070 trails it by about 25% in the worst case. At the time of announcement and release, the GTX 1070 is the best-performing solution in its class. Like the GeForce GTX 1080, the GTX 1070 has no direct competitors from AMD and can only be compared with the Radeon R9 390X and Fury.

For the GeForce GTX 1070 modification of GP104, Nvidia decided to keep the full 256-bit memory bus, although it uses not the new GDDR5X but very fast conventional GDDR5 running at a high effective frequency of 8 GHz. With such a bus, the amount of memory installed on a video card can be 4 or 8 GB, and to ensure maximum performance at high settings and rendering resolutions the GeForce GTX 1070 was equipped with 8 GB of video memory, like its older sister. This volume is enough to run any 3D application at maximum quality settings for several years.

GeForce GTX 1070 Founders Edition

With the announcement of the GeForce GTX 1080 in early May, a special edition of the card called the Founders Edition was also announced, priced higher than the regular cards from the company's partners. The same applies to the new model. In this article we will again talk about the special edition of the GeForce GTX 1070, the Founders Edition. As with the older model, Nvidia decided to release this version of the manufacturer's reference card at a higher price, arguing that many gamers and enthusiasts who buy expensive top-end graphics cards want a product with an appropriately "premium" look and feel.

Accordingly, it is for such users that the GeForce GTX 1070 Founders Edition is being brought to market. It is designed and built by Nvidia's engineers from premium materials and components, including an aluminum shroud and a low-profile backplate that covers the back of the PCB and is quite popular among enthusiasts.

As the photos of the board show, the GeForce GTX 1070 Founders Edition inherits exactly the same industrial design as the reference GeForce GTX 1080 Founders Edition. Both models use a radial (blower-style) fan that exhausts heated air out of the case, which is very useful both in small enclosures and in multi-card SLI configurations with limited physical space. Exhausting the heated air instead of circulating it inside the case reduces thermal stress, improves overclocking results, and extends the life of system components.

Under the shroud of the GeForce GTX 1070 reference cooler is a specially shaped aluminum heatsink with three embedded copper heat pipes that draw heat away from the GPU; the heat they carry is then dissipated by the aluminum fins. The low-profile metal plate on the back of the board is also meant to improve thermal behavior, and it has a removable section for better airflow between multiple graphics cards in SLI configurations.

As for the board's power delivery, the GeForce GTX 1070 Founders Edition has a four-phase power system optimized for stable power supply. Nvidia claims that the use of special components in the Founders Edition improves power efficiency, stability and reliability compared to the GeForce GTX 970, enabling better overclocking results. In the company's own tests, GeForce GTX 1070 GPUs easily exceeded 1.9 GHz, close to the results of the older GTX 1080.

The Nvidia GeForce GTX 1070 will be available in retail stores starting June 10. The recommended prices for the GeForce GTX 1070 Founders Edition and for partner solutions differ, and this is the main question surrounding this special edition. If Nvidia's partners sell their GeForce GTX 1070 cards starting at $379 (in the US market), Nvidia's reference-design Founders Edition will cost $449. Are there many enthusiasts willing to pay extra for the, let's be honest, dubious advantages of the reference version? Time will tell, but we believe the reference board is most interesting as an option available at the very start of sales; later on, the point of buying it (and at a premium, no less) drops to zero.

It remains to add that the printed circuit board of the reference GeForce GTX 1070 is similar to that of the older card, and both differ from the design of the company's previous boards. The typical power consumption of the new product is 150 W, almost 20% less than that of the GTX 1080 and close to the power consumption of the previous-generation GeForce GTX 970. The Nvidia reference board carries the familiar set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPort, with support for the new versions of HDMI and DisplayPort that we covered above in the GTX 1080 review.

Architectural changes

The GeForce GTX 1070 is based on the GP104 chip, the first of a new generation of Nvidia's Pascal graphics architecture. This architecture was based on the solutions developed back in Maxwell, but it also has some functional differences, which we wrote about in detail above - in the part devoted to the top GeForce GTX 1080 video card.

The main change in the new architecture is the process technology on which all the new GPUs are made. The use of 16 nm FinFET manufacturing for GP104 made it possible to significantly increase the complexity of the chip while keeping area and cost relatively low, so the very first Pascal chip has noticeably more execution units, including ones providing new functionality, than similarly positioned Maxwell chips.

The GP104 chip is similar in design to comparable Maxwell-architecture solutions, and detailed information about the construction of modern GPUs can be found in our reviews of previous Nvidia products. Like earlier GPUs, chips of the new architecture come with different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs) and memory controllers, and the GeForce GTX 1070 already uses a modified configuration: part of the chip is locked and inactive (highlighted in gray in the block diagram):

Although the GP104 GPU includes four GPC clusters and 20 SM multiprocessors, in the GeForce GTX 1070 it comes in a cut-down modification with one GPC cluster disabled in hardware. Since each GPC cluster has a dedicated rasterization engine and includes five SMs, and each multiprocessor consists of 128 CUDA cores and eight TMUs, this version of GP104 has 1920 of the 2560 CUDA cores and 120 of the 160 physical texture units active.

The GPU on which the GeForce GTX 1070 is based contains eight 32-bit memory controllers, giving a 256-bit memory bus in total, exactly as in the older GTX 1080. The memory subsystem was not cut down, in order to provide sufficiently high bandwidth given that the GeForce GTX 1070 uses GDDR5 memory. Each memory controller has eight ROPs and 256 KB of L2 cache associated with it, so in this modification the GP104 chip also contains 64 ROPs and 2048 KB of L2 cache.

Thanks to architectural optimizations and the new process technology, GP104 has become the most energy-efficient GPU to date. Nvidia's engineers were able to raise the clock speed more than they expected when moving to the new process, for which they had to work hard, carefully examining and removing all the bottlenecks that prevented earlier solutions from running at higher frequencies. Accordingly, the GeForce GTX 1070 also runs at a very high frequency, more than 40% above the reference value for the GeForce GTX 970.

Since the GeForce GTX 1070 is, in essence, just a slightly less powerful GTX 1080 with GDDR5 memory, it supports absolutely all the technologies described in the previous section. For more details on the Pascal architecture and the technologies it supports, such as the improved display output and video processing units, Async Compute support, Simultaneous Multi-Projection, the changes to SLI multi-chip rendering and the new Fast Sync synchronization mode, see the GTX 1080 section above.

High-performance GDDR5 memory and its efficient use

We wrote above about the changes in the memory subsystem of the GP104 GPU, on which the GeForce GTX 1080 and GTX 1070 are based: its memory controllers support both the new GDDR5X video memory, described in detail in the GTX 1080 review, and the good old GDDR5 that has been with us for several years.

To avoid losing too much memory bandwidth in the GTX 1070 relative to the older GTX 1080, all eight of its 32-bit memory controllers were left active, giving the full 256-bit video memory interface. In addition, the card was fitted with the fastest GDDR5 memory available on the market, running at an effective frequency of 8 GHz. This gives a memory bandwidth of 256 GB/s, versus 320 GB/s for the older solution; the computing resources were cut by roughly the same proportion, so the balance is maintained.
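
The quoted bandwidth figures follow directly from the bus width and the effective data rate; a quick check:

```cpp
// Quick check of the quoted bandwidth figures: bus width (bits) times the
// effective data rate (GT/s), divided by 8 bits per byte.
#include <cstdio>

double bandwidthGBs(int busBits, double effectiveGTps) {
    return busBits * effectiveGTps / 8.0;
}

int main() {
    std::printf("GTX 1070: %.0f GB/s\n", bandwidthGBs(256, 8.0));   // 256 GB/s (GDDR5, 8 GHz eff.)
    std::printf("GTX 1080: %.0f GB/s\n", bandwidthGBs(256, 10.0));  // 320 GB/s (GDDR5X, 10 GHz eff.)
    std::printf("GTX 1060: %.0f GB/s\n", bandwidthGBs(192, 8.0));   // 192 GB/s (GDDR5, 8 GHz eff.)
}
```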

Keep in mind that while peak theoretical bandwidth is important for GPU performance, you need to pay attention to its efficiency as well. During the rendering process, many different bottlenecks can limit the overall performance, preventing the use of all available memory bandwidth. To minimize these bottlenecks, GPUs use special lossless data compression to improve the efficiency of data reads and writes.

Pascal introduces the fourth generation of delta compression of framebuffer data, which lets the GPU use the available memory bus more efficiently. The memory subsystem in the GeForce GTX 1070 and GTX 1080 uses improved existing techniques and several new lossless compression methods designed to reduce bandwidth requirements. This reduces the amount of data written to memory, improves L2 cache efficiency, and reduces the amount of data sent between different blocks of the GPU, such as the TMUs and the framebuffer.
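
The general idea behind delta compression, in a deliberately simplified form: store one anchor value per tile plus small per-pixel differences, which fit into fewer bits whenever neighboring pixels are similar. This toy sketch only illustrates that principle; it is not Nvidia's actual hardware scheme.

```cpp
// Toy illustration of the delta-compression idea: one anchor value per tile
// plus small per-pixel deltas. Real GPU color compression is far more
// sophisticated; this only shows why smooth tiles need less bandwidth.
#include <cstdint>
#include <cstdlib>
#include <vector>

struct CompressedTile {
    uint32_t anchor;              // first pixel stored in full
    std::vector<int16_t> deltas;  // remaining pixels as differences
    bool fits;                    // true if every delta fits in 8 bits (a compression win)
};

CompressedTile compressTile(const std::vector<uint32_t>& pixels) {
    CompressedTile t{pixels.at(0), {}, true};
    for (size_t i = 1; i < pixels.size(); ++i) {
        int32_t d = static_cast<int32_t>(pixels[i]) - static_cast<int32_t>(pixels[i - 1]);
        if (std::abs(d) > 127) t.fits = false;  // too much variation: fall back to raw storage
        t.deltas.push_back(static_cast<int16_t>(d));
    }
    return t;
}
```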

GPU Boost 3.0 and overclocking features

Most Nvidia partners have already announced factory-overclocked solutions based on the GeForce GTX 1080 and GTX 1070, along with special overclocking utilities that expose the new functionality of GPU Boost 3.0. One example is EVGA Precision XOC, which includes an automatic scanner for building the voltage-to-frequency curve: for each voltage point it runs a stability test and finds a stable frequency at which the GPU provides a performance gain. The curve can also be edited manually.

We know GPU Boost well from previous Nvidia graphics cards. Their GPUs use this hardware feature to raise the operating clock speed in modes where the limits of power consumption and heat dissipation have not yet been reached. In Pascal GPUs the algorithm has undergone several changes, the main one being finer control of the turbo frequencies depending on voltage.

If previously the offset between the base frequency and the turbo frequency was fixed, in GPU Boost 3.0 it became possible to set turbo-frequency offsets for each voltage point separately. Now the turbo frequency can be tuned for each individual voltage value, which lets you squeeze all the overclocking potential out of the GPU. We wrote about this feature in detail in the GeForce GTX 1080 review; the EVGA Precision XOC and MSI Afterburner utilities can be used for this.
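
The difference between the two schemes can be pictured as follows. The curve points and offsets are made-up values for illustration, not measurements from a real card.

```cpp
// Illustration of a single fixed offset (GPU Boost 2.0 style) versus
// per-voltage-point offsets (GPU Boost 3.0 style). Values are invented.
#include <cstdio>
#include <vector>

struct VfPoint { int millivolts; int megahertz; };

int main() {
    std::vector<VfPoint> curve = {
        {800, 1506}, {900, 1650}, {1000, 1780}, {1062, 1873}};

    // Boost 2.0 style: one offset applied to every point of the curve.
    const int flatOffset = 100;
    // Boost 3.0 style: each voltage point gets its own offset, found e.g. by
    // a per-point stability scan (what Precision XOC's scanner automates).
    const std::vector<int> perPointOffset = {140, 120, 90, 60};

    for (size_t i = 0; i < curve.size(); ++i)
        std::printf("%4d mV: flat %d MHz, per-point %d MHz\n",
                    curve[i].millivolts,
                    curve[i].megahertz + flatOffset,
                    curve[i].megahertz + perPointOffset[i]);
}
```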

Since some details have changed in the overclocking methodology with the release of video cards with support for GPU Boost 3.0, Nvidia had to make additional explanations in the instructions for overclocking new products. There are different overclocking techniques with different variable characteristics that affect the final result. For each specific system, a particular method may be better suited, but the basics are always about the same.

Many overclockers use the Unigine Heaven 4.0 benchmark to test system stability: it loads the GPU well, has flexible settings, and can be run in windowed mode alongside an overclocking and monitoring utility such as EVGA Precision or MSI Afterburner. However, such a check is only good for initial estimates; to firmly confirm the stability of an overclock, it has to be tested in several games, because different games stress different functional units of the GPU: math, texturing, geometry. The Heaven 4.0 benchmark is also convenient for overclocking because it has a looped mode in which it is easy to change overclocking settings, and it includes a benchmark for measuring the speed gain.

Nvidia advises running the Heaven 4.0 and EVGA Precision XOC windows side by side when overclocking the new GeForce GTX 1080 and GTX 1070 cards. It is advisable to raise the fan speed right away; for serious overclocking you can set it straight to 100%, which makes the card very loud but cools the GPU and the other components as much as possible, keeping the temperature as low as it can go and preventing throttling (a reduction in frequencies when the GPU temperature rises above a certain value).

Next, set the Power Target to its maximum as well. This gives the GPU as much power as possible by raising the power consumption limit and the GPU Temp Target. For some purposes the second value can be decoupled from the Power Target and adjusted separately, for example to keep the chip cooler.

The next step is to increase the GPU Clock Offset, which determines how much higher the turbo frequency will be during operation. This value raises the frequency for all voltage points and results in better performance. As usual when overclocking, increase the GPU frequency in small steps of 10 to 50 MHz and check stability at each step, until you notice a hang, a driver or application error, or visual artifacts. When that limit is reached, step the frequency back down and verify stability and performance once more.
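
In pseudocode form, the stepwise search looks roughly like this. The two helper functions are hypothetical stand-ins for the overclocking utility and for a Heaven/game stress run; the limit value is invented.

```cpp
// Sketch of the stepwise search described above: raise the clock offset in
// small increments, test stability, and back off one step after the first
// failure. applyGpuClockOffset() and runStressTest() are hypothetical stubs.
#include <cstdio>

void applyGpuClockOffset(int offsetMHz) { std::printf("offset = +%d MHz\n", offsetMHz); }
bool runStressTest(int offsetMHz)       { return offsetMHz <= 175; }  // pretend 175 MHz is the limit

int main() {
    const int step = 25;              // 10-50 MHz per step is typical
    int offset = 0;
    while (true) {
        applyGpuClockOffset(offset + step);
        if (!runStressTest(offset + step)) break;  // hang, driver error or artifacts
        offset += step;
    }
    applyGpuClockOffset(offset);      // last known-good value; re-test before keeping it
    std::printf("stable offset: +%d MHz\n", offset);
}
```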

Besides the GPU frequency, you can also raise the video memory frequency (Memory Clock Offset), which is especially worthwhile on the GeForce GTX 1070 with its GDDR5 memory, which usually overclocks well. The process exactly mirrors the search for a stable GPU frequency; the only difference is that the steps can be larger, adding 50-100 MHz to the base frequency at a time.

In addition to the steps above, you can also raise the overvoltage limit, because a higher GPU frequency is often achievable only at increased voltage, when unstable parts of the GPU get extra power. The potential downside of raising this value is possible damage to the video chip and accelerated failure, so increase the voltage with extreme caution.

Overclocking enthusiasts use slightly different techniques and change the parameters in different orders. For example, some first find stable GPU and memory frequencies separately, so that they do not interfere with each other, and only then test the combined overclock of both the chip and the memory, but these are minor details of individual approaches.

Judging by opinions in forums and article comments, some users did not like the new GPU Boost 3.0 behavior, in which the GPU frequency is initially driven very high, often above the advertised turbo frequency, but then, as the GPU temperature rises or power consumption exceeds the set limit, can drop to much lower values. These are simply the specifics of the updated algorithm; you need to get used to the new behavior of the dynamically changing GPU frequency, and it has no negative consequences.

The GeForce GTX 1070 is the second model, after the GTX 1080, in Nvidia's new line of Pascal-based graphics processors. The new 16nm FinFET manufacturing process and architectural optimizations allow this graphics card to reach high clock speeds, supported by the new generation of GPU Boost technology. Even though the number of functional units in the form of stream processors and texture modules has been reduced, it remains sufficient for the GTX 1070 to be the best-value and most energy-efficient solution in its class.

The use of GDDR5 memory on the younger of the two released GP104-based cards, as opposed to the new GDDR5X that distinguishes the GTX 1080, does not prevent it from achieving high performance. First, Nvidia decided not to cut down the memory bus of the GeForce GTX 1070, and second, it fitted the card with the fastest GDDR5 available, with an effective frequency of 8 GHz, only slightly below the 10 GHz of the GDDR5X used in the older model. Taking the improved delta compression algorithms into account, the effective memory bandwidth of the GPU is higher than that of the comparable previous-generation model, the GeForce GTX 970.

The GeForce GTX 1070 is attractive because it offers very high performance and support for new features and algorithms at a much lower price than the older model announced a little earlier. If only a few enthusiasts can afford to buy a GTX 1080 for 55,000 rubles, a much larger circle of potential buyers can pay 35,000 rubles for a solution that is only about a quarter less powerful yet has exactly the same capabilities. It is this combination of a relatively low price and high performance that made the GeForce GTX 1070 perhaps the best-value purchase at the time of its release.

Graphics accelerator GeForce GTX 1060

Parameter: Meaning
Chip code name: GP106
Production technology: 16nm FinFET
Number of transistors: 4.4 billion
Core area: 200 mm²
Architecture: Unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 192-bit: six independent 32-bit memory controllers supporting GDDR5 memory
GPU frequency: 1506 (1708) MHz
Computing blocks: 10 streaming multiprocessors, including 1280 scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 80 texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Raster Operations Units (ROPs): 6 wide ROPs (48 pixels) with support for various anti-aliasing modes, including programmable and with FP16 or FP32 framebuffer formats. The units consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: Integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1060 Reference Graphics Specifications
Parameter: Meaning
Core frequency: 1506 (1708) MHz
Number of universal processors: 1280
Number of texture units: 80
Number of blending units: 48
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 192-bit
Memory size: 6 GB
Memory bandwidth: 192 GB/s
Computing performance (FP32): about 4 teraflops
Theoretical maximum fill rate: 72 gigapixels/s
Theoretical texture sampling rate: 121 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Typical power consumption: 120 W
Supplementary power: one 6-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $249 ($299) in the US, 18,990 rubles in Russia

The GeForce GTX 1060 also received a name similar to the corresponding solution from the previous GeForce series, differing from its direct predecessor, the GeForce GTX 960, only in the changed generation digit. In the company's current lineup the new model sits one step below the previously released GeForce GTX 1070, which is the middle solution of the new series in terms of speed.

The recommended prices for the new Nvidia video card are $249 and $299 for regular partner versions and the special Founders Edition, respectively. Compared to the two older models this is a very favorable price: although the new GTX 1060 is slower than the top-end boards, the gap is nowhere near as large as the difference in price. At the time of the announcement, the new model was definitely the best-performing solution in its class and one of the best-value offers in this price range.

This Pascal-family model came out to counter the fresh offering from rival AMD, which released the Radeon RX 480 a little earlier. The comparison is not entirely direct, since the two cards still differ noticeably in price: the GeForce GTX 1060 is more expensive ($249-299 versus $199-229), but it is also clearly faster than its competitor.

The GP106 graphics processor has a 192-bit memory bus, so the amount of memory installed on a card with such a bus can be 3 or 6 GB. The smaller value is frankly not enough by modern standards: many game projects, even at Full HD resolution, run into a shortage of video memory, which seriously affects the smoothness of rendering. To ensure maximum performance at high settings, the GeForce GTX 1060 was equipped with 6 GB of video memory, enough to run any 3D application at any quality settings. Moreover, today there is practically no difference between 6 and 8 GB, and such a solution saves some money.

The typical power consumption of the new product is 120 W, which is 20% less than that of the GTX 1070 and equal to the power consumption of the previous-generation GeForce GTX 960, a card with much lower performance and capabilities. The reference board carries the usual set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPort, with support for the new versions of HDMI and DisplayPort that we covered in the GTX 1080 review.

The length of the reference GeForce GTX 1060 board is 9.8 inches (25 cm); among the differences from the older options we note separately that the GeForce GTX 1060 does not support SLI multi-chip rendering and has no dedicated connector for it. Since the board consumes less power than the older models, a single 6-pin PCI-E connector was installed for supplementary power.

GeForce GTX 1060 cards have been on sale since the day of the announcement in the form of products from the company's partners: Asus, EVGA, Gainward, Gigabyte, Inno3D, MSI, Palit and Zotac. The GeForce GTX 1060 Founders Edition, produced by Nvidia itself, will be released in limited quantities, sold at $299 exclusively on the Nvidia website, and will not be officially offered in Russia. The Founders Edition is distinguished by high-quality materials and components, including an aluminum shroud, an efficient cooling system, low-resistance power circuits and specially designed voltage regulators.

Architectural changes

The GeForce GTX 1060 is based on a brand-new graphics processor, GP106, which is functionally no different from the first Pascal chip, GP104, on which the GeForce GTX 1080 and GTX 1070 described above are based. The architecture builds on solutions developed back in Maxwell, but it also has some functional differences, which we covered in detail earlier.

The GP106 video chip is similar in its design to the top-end Pascal chip and similar solutions of the Maxwell architecture, and you can find detailed information about the design of modern GPUs in our reviews of previous Nvidia solutions. Like previous GPUs, the chips of the new architecture have a different configuration of Graphics Processing Cluster (GPC), Streaming Multiprocessor (SM) and memory controllers:

The GP106 graphics processor incorporates two GPC clusters containing a total of 10 streaming multiprocessors (SMs), exactly half of GP104. As in the older GPU, each multiprocessor contains 128 cores, 8 TMU texture units, a 256 KB register file, 96 KB of shared memory and 48 KB of L1 cache. As a result, the GeForce GTX 1060 contains a total of 1280 compute cores and 80 texture units, half as many as the GTX 1080.

But the memory subsystem of the GeForce GTX 1060 was not halved relative to the top solution: it contains six 32-bit memory controllers, giving a 192-bit memory bus. With the effective GDDR5 frequency of 8 GHz on the GeForce GTX 1060, bandwidth reaches 192 GB/s, which is quite good for a solution in this price segment, especially considering how efficiently it is used in Pascal. Each memory controller has eight ROPs and 256 KB of L2 cache associated with it, so in total the full version of the GP106 GPU contains 48 ROPs and 1536 KB of L2 cache.

To reduce bandwidth requirements and make more efficient use of the available bus, the Pascal architecture further improves lossless on-chip data compression, which compresses data in buffers for gains in efficiency and performance. In particular, new 4:1 and 8:1 delta compression modes have been added to the new family of chips, adding roughly 20% to effective bandwidth compared to the previous Maxwell-family solutions.

The base frequency of the new GPU is 1506 MHz, and the frequency should never fall below this mark. The typical Boost clock is much higher, 1708 MHz: this is the average of the actual frequencies at which the GeForce GTX 1060 graphics chip runs across a wide range of games and 3D applications. The actual Boost frequency depends on the game and the test conditions.

Like the rest of the Pascal family, the GeForce GTX 1060 not only operates at a high clock speed, providing high performance, but also has a decent margin for overclocking. The first experiments indicate the possibility of reaching frequencies of the order of 2 GHz. It is not surprising that the company's partners are also preparing factory overclocked versions of the GTX 1060 video card.

So, the main change of the new architecture is the 16 nm FinFET process: its use in the production of GP106 made it possible to significantly increase the complexity of the chip while keeping the area relatively low at 200 mm², so this Pascal chip has noticeably more execution units than a similarly positioned Maxwell chip manufactured on the 28 nm process.

If GM206 (GTX 960), with an area of 227 mm², had 3 billion transistors, 1024 ALUs, 64 TMUs, 32 ROPs and a 128-bit bus, then the new GPU fits 4.4 billion transistors, 1280 ALUs, 80 TMUs and 48 ROPs with a 192-bit bus into 200 mm². And it does so at almost one and a half times the frequency, 1506 (1708) MHz versus 1126 (1178) MHz, with the same power consumption of 120 W. As a result, the GP106 GPU has become one of the most energy-efficient GPUs, along with GP104.

New Nvidia Technologies

One of the most interesting of the company's technologies supported by the GeForce GTX 1060 and the rest of the Pascal family is Nvidia Simultaneous Multi-Projection. We already wrote about it in the GeForce GTX 1080 review; it enables several new techniques for optimizing rendering, in particular projecting a VR image for both eyes at once, significantly increasing GPU efficiency in virtual reality.

To support SMP, all Pascal GPUs have a dedicated engine located in the PolyMorph Engine at the end of the geometry pipeline, before the rasterizer. With it, the GPU can simultaneously project a geometric primitive onto several projections from a single viewpoint, and these projections can be stereo (i.e., up to 16 projections, or 32 with stereo, are supported simultaneously). This capability lets Pascal GPUs correctly reproduce a curved surface for VR rendering and display correctly on multi-monitor systems.
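
Conceptually, the effect is that the same post-geometry primitive is projected by several view-projection matrices in one pass, instead of re-running the whole geometry pipeline once per view. The sketch below is a heavily simplified software model of that idea; it is not how the PolyMorph Engine is actually programmed.

```cpp
// Simplified model of simultaneous multi-projection: one triangle in,
// one projected copy out per viewport (e.g. lens-matched quadrants,
// surround-monitor views), without re-running geometry work per view.
#include <array>
#include <vector>

struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;  // row-major 4x4 matrix

Vec4 transform(const Mat4& m, const Vec4& v) {
    return { m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
             m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
             m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
             m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w };
}

// Up to 16 projections x 2 eyes = 32 projected copies of the same primitive.
std::vector<std::array<Vec4, 3>>
projectToAllViews(const std::array<Vec4, 3>& tri,
                  const std::vector<Mat4>& viewProjPerViewport) {
    std::vector<std::array<Vec4, 3>> out;
    for (const Mat4& vp : viewProjPerViewport) {
        std::array<Vec4, 3> projected = {
            transform(vp, tri[0]), transform(vp, tri[1]), transform(vp, tri[2]) };
        out.push_back(projected);
    }
    return out;
}
```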

Importantly, Simultaneous Multi-Projection is already being integrated into popular game engines (Unreal Engine and Unity) and games; to date, support for the technology has been announced for more than 30 titles in development, including such well-known projects as Unreal Tournament, Poolnation VR, Everest VR, Obduction, Adr1ft and Raw Data. Interestingly, although Unreal Tournament is not a VR game, it uses SMP to achieve better visuals and performance.

Another long-awaited addition is Nvidia Ansel, a powerful tool for creating in-game screenshots. It allows you to create unusual and very high-quality screenshots with previously inaccessible features, save them in very high resolution, supplement them with various effects, and share your creations. Ansel lets you literally compose a screenshot the way an artist wants: you can place a camera with any parameters anywhere in the scene, apply powerful post-filters to the image, or even take a 360-degree shot for viewing in a virtual reality headset.

Nvidia has standardized the integration of Ansel into games, and doing so is as easy as adding a few lines of code. You no longer need to wait for the feature to appear in games: you can try Ansel right now in Mirror's Edge: Catalyst, and a little later it will become available in The Witcher 3: Wild Hunt. In addition, many Ansel-enabled projects are in development, including Fortnite, Paragon, Unreal Tournament, Obduction, The Witness, Lawbreakers, Tom Clancy's The Division, No Man's Sky and more.

The new GeForce GTX 1060 also supports the Nvidia VRWorks toolkit, which helps developers create impressive virtual reality projects. The package includes many utilities and tools, among them VRWorks Audio, which performs very accurate calculation of sound-wave reflections from scene objects using ray tracing on the GPU. The package also includes integration of VR and PhysX physics effects to ensure physically correct behavior of objects in the scene.

One of the most striking VR titles to take advantage of VRWorks is VR Funhouse, Nvidia's own VR game, available for free on Valve's Steam service. It is built on Unreal Engine 4 (Epic Games) and runs on GeForce GTX 1080, 1070 and 1060 graphics cards paired with HTC Vive headsets. Moreover, the game's source code will be made publicly available, allowing other developers to reuse its ideas and code in their own VR attractions. Take our word for it: this is one of the most impressive demonstrations of what virtual reality can do.

Thanks in part to SMP and VRWorks, the GeForce GTX 1060 provides enough performance for entry-level virtual reality: the GPU meets the minimum required hardware level, including for SteamVR, making it one of the most sensible purchases for systems with official VR support.

Since the GeForce GTX 1060 is based on the GP106 chip, which is in no way functionally inferior to the GP104 that underpins the older modifications, it supports absolutely all the technologies described above.

The GeForce GTX 1060 is the third model in Nvidia's new line of Pascal-based graphics processors. The new 16nm FinFET process and architectural optimizations have allowed all the new graphics cards to reach high clock speeds and to fit more functional units into the GPU, stream processors, texture modules and others, compared to previous-generation chips. That is why the GTX 1060 has become the best-value and most energy-efficient solution in its class, and one of the best overall.

It is especially important that the GeForce GTX 1060 offers sufficiently high performance and support for new features and algorithms at a much lower price than the older GP104-based solutions. The GP106 chip used in the new model delivers best-in-class performance and power efficiency. The GeForce GTX 1060 is designed for, and perfectly suited to, all modern games at high and maximum graphics settings at 1920x1080, even with full-screen anti-aliasing enabled by one of the various methods (FXAA, MFAA or MSAA).

And for those who want even more performance or ultra-high-resolution displays, Nvidia has the top-end GeForce GTX 1070 and GTX 1080, which are also very good in terms of performance and power efficiency. Still, the combination of low price and sufficient performance sets the GeForce GTX 1060 apart from the older solutions. Compared with the competing Radeon RX 480, Nvidia's solution is somewhat faster while using a less complex and smaller GPU, and it has significantly better power efficiency. True, it sells for a bit more, so each video card occupies its own niche.

NVIDIA GeForce GTX 780 video card review | GeForce Experience and ShadowPlay

GeForce Experience

As computer enthusiasts, we appreciate the combination of different settings that affect the performance and quality of games. The easiest way is to spend a lot of money on a new video card and set all the graphics settings to the maximum. But when a parameter turns out to be too heavy for the card and it has to be reduced or turned off, there is an unpleasant feeling and the realization that the game could work much better.

However, setting the optimal settings is not so easy. Some settings produce better visuals than others, and the impact on performance can vary greatly. The GeForce Experience program is NVIDIA's attempt to make it easy to choose game settings by comparing your CPU, GPU, and resolution against a configuration database. The second part of the utility helps you determine if drivers need updates.

It is likely that enthusiasts will keep choosing settings themselves and view an extra program negatively. However, most gamers who want to install a game and start playing right away, without checking drivers and wading through settings, will certainly welcome this option. Either way, NVIDIA's GeForce Experience helps people get the most out of their gaming experience and is therefore a useful utility for PC gaming.

GeForce Experience identified all nine games installed on our test system. Naturally, they were not at their default settings, since we had applied specific settings for testing. But it is still interesting how GeForce Experience would have changed the options we chose.

For Tomb Raider, GeForce Experience wanted to disable TressFX, even though the NVIDIA GeForce GTX 780 averaged 40 frames per second with the feature enabled. For some reason the program could not detect the Far Cry 3 configuration, although the settings it suggested were quite high. For unknown reasons, it wanted to disable FXAA for Skyrim.

It's nice to get a set of screenshots for each game showing the effect of a given setting on image quality. Of the nine examples we reviewed, GeForce Experience came close to the optimal settings in our opinion. However, the utility is also biased, favoring NVIDIA-specific features such as PhysX (which the program set to High in Borderlands 2) and avoiding features associated with AMD (such as TressFX in Tomb Raider). Disabling FXAA in Skyrim makes no sense at all, since the game averages 100 FPS. Enthusiasts may well want to install GeForce Experience once the NVIDIA Shield system ships, since the Game Streaming feature appears to be available through the NVIDIA app.

ShadowPlay: Always-On Video Recorder for Gaming

WoW fans often record their raids, but this requires a fairly powerful system, Fraps, and a lot of disk space.

NVIDIA recently announced a new feature, ShadowPlay, which can greatly simplify the recording process.

When activated, ShadowPlay uses the fixed-function NVEnc encoder built into the Kepler GPU and automatically records the last 20 minutes of gameplay; you can also start and stop recording manually. The technology thus replaces software solutions like Fraps, which put a heavier load on the CPU.

For reference: NVEnc only works with H.264 encoding at resolutions up to 4096x4096 pixels. ShadowPlay is not yet available on the market, but NVIDIA says it will be able to record 1080p video at up to 30 FPS by the time it launches this summer. We would like to see a higher resolution as it was previously stated that the encoder has the potential to support it in hardware.

NVIDIA GeForce GTX 780 video card review | GPU Boost 2.0 and possible overclocking issues

GPU Boost 2.0

In the GeForce GTX Titan review we did not get to test second-generation NVIDIA GPU Boost technology extensively, but now it is here in the NVIDIA GeForce GTX 780. Here is a short description of this technology:

GPU Boost is NVIDIA's mechanism for changing a graphics card's performance depending on the type of workload being processed. As you probably know, games have different GPU resource requirements. Historically, the clock frequency had to be tuned for the worst case, which meant that with "light" workloads part of the GPU's potential went unused. GPU Boost monitors various parameters and raises or lowers frequencies depending on the needs of the application and the current situation.

The first implementation of GPU Boost worked under a certain power threshold (170 W in the case of GeForce GTX 680). However, the company's engineers have found that they can safely exceed this level if the GPU temperature is low enough. Thus, performance can be further optimized.

In practice, GPU Boost 2.0 differs only in that NVIDIA now raises the frequency based not on the power limit but on a temperature target of 80 degrees Celsius. This means that higher frequency and voltage values will be used until the chip temperature reaches 80 degrees. Don't forget that the temperature depends mainly on the fan profile and settings: the higher the fan speed, the lower the temperature and, therefore, the higher the GPU Boost values (and, unfortunately, the noise level too). The technology still evaluates the situation once every 100 ms, so NVIDIA has work left for future versions.
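
The control logic can be pictured as a simple loop over the evaluation interval: while both the temperature target and the power limit leave headroom, the clock moves up one bin; otherwise it moves back down. The sketch below is only a conceptual model with stubbed-out sensor reads and assumed bin/limit values, not NVIDIA's actual algorithm.

```cpp
// Simplified model of a temperature-target boost loop. Sensor reads are
// stubbed; the bin size and limits are assumptions for illustration.
#include <chrono>
#include <thread>

struct BoostState { int clockMHz = 863; };   // GTX 780 base clock

int readGpuTemperatureC();                   // stubs standing in for real telemetry
int readBoardPowerW();

void boostStep(BoostState& s, int tempTargetC = 80, int powerLimitW = 250) {
    const int binMHz = 13;                   // boost moves in small frequency bins
    if (readGpuTemperatureC() < tempTargetC && readBoardPowerW() < powerLimitW)
        s.clockMHz += binMHz;                // headroom left: clock up
    else
        s.clockMHz -= binMHz;                // at a limit: clock down
}

int readGpuTemperatureC() { return 74; }     // pretend values
int readBoardPowerW()     { return 210; }

int main() {
    BoostState state;
    for (int i = 0; i < 10; ++i) {           // the real loop runs continuously
        boostStep(state);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));  // ~100 ms evaluation
    }
}
```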

The temperature-dependent settings make the testing process even more difficult compared to the first version of GPU Boost. Anything that raises or lowers the temperature of the GK110 changes the clock of the chip. Therefore, achieving consistent results between runs is quite difficult. In laboratory conditions, one can only hope for a stable ambient temperature.

In addition to the above, it is worth noting that the temperature limit can be raised. For example, if you want the NVIDIA GeForce GTX 780 to lower its frequency and voltage only at 85 or 90 degrees Celsius, this can be configured in the settings.

Want to keep the GK110 as far from your chosen temperature limit as possible? The fan curve of the NVIDIA GeForce GTX 780 is fully adjustable, letting you set the duty cycle according to temperature.

Possible overclocking problems

During our first acquaintance with the GeForce GTX Titan, company representatives showed us an internal utility capable of reading the status of various sensors, which simplifies diagnosing non-standard behavior of the card. If the GK110's temperature rises too high during overclocking, this information is recorded in the log, even during throttling.

Now the company implements this function through the Precision X application, which triggers a "reasons" warning algorithm if, during overclocking, something occurred that prevents it from continuing effectively. This is a great feature, because you no longer have to guess at potential bottlenecks. There is also an OV max limit indicator that lets you know when you have reached the GPU's absolute peak voltage; at that point there is a risk of burning out the card, so treat it as a suggestion to back off the overclocking parameters.

NVIDIA GeForce GTX 780 video card review | Test stand and benchmarks


Test bench configuration
CPU: Intel Core i7-3770K (Ivy Bridge) 3.5 GHz @ 4.0 GHz (40 × 100 MHz), LGA 1155, 8 MB shared L3 cache, Hyper-Threading enabled, power savings enabled
Motherboard: Gigabyte Z77X-UD5H (LGA 1155), Z77 Express chipset, BIOS F15q
RAM: G.Skill 16 GB (4 × 4 GB) DDR3-1600, F3-12800CL9Q2-32GBZL @ 9-9-9-24 at 1.5 V
Storage device: Crucial m4 SSD 256 GB, SATA 6Gb/s
Video cards:
Nvidia GeForce GTX 780 3 GB
AMD Radeon HD 7990 6 GB
AMD Radeon HD 7970 GHz Edition 3 GB
Nvidia GeForce GTX 580 1.5 GB
Nvidia GeForce GTX 680 2 GB
Nvidia GeForce GTX Titan 6 GB
Nvidia GeForce GTX 690 4 GB
Power supply: Cooler Master UCP-1000W
System software and drivers
OS: Windows 8 Professional 64-bit
DirectX: DirectX 11
Graphics drivers: AMD Catalyst 13.5 (Beta 2); Nvidia GeForce Release 320.00; Nvidia GeForce Release 320.18 (for GeForce GTX 780)

Getting the correct frame rate value

Observant readers will notice that the figures on the following pages are more modest than in the AMD Radeon HD 7990 review, and there is a reason for that. Previously we presented synthetic and actual frame rates, and then showed frame-time variation along with dropped and runt frames. The problem is that this approach does not reflect how the video card actually feels in use, and it would be unfair of us to judge AMD on a synthetic measure of frame-time latency alone.

That is why, along with frame-time variation, we now provide more practical metrics of the actually delivered, dynamic frame rate. The results are not as high, but they are very telling in the games where AMD struggles.

Tests and settings
Battlefield 3: Ultra graphics quality, v-sync off, 2560x1440, DirectX 11, "Going Hunting", 90-second run, FCAT
Far Cry 3: Ultra graphics quality, DirectX 11, v-sync off, 2560x1440, 50-second custom run, FCAT
Borderlands 2: Highest graphics quality, PhysX Low, 16x anisotropic filtering, 2560x1440, custom run, FCAT
Hitman: Absolution: Ultra graphics quality, MSAA off, 2560x1440, built-in benchmark, FCAT
The Elder Scrolls V: Skyrim: Ultra graphics quality, FXAA enabled, 2560x1440, 25-second custom run, FCAT
3DMark: Fire Strike benchmark
BioShock Infinite: Ultra graphics quality, DirectX 11, diffusion depth of field, 2560x1440, built-in benchmark, FCAT
Crysis 3: Very High graphics quality, MSAA Low (2x), high-resolution textures, 2560x1440, 60-second custom run, FCAT
Tomb Raider: Ultimate graphics quality, FXAA enabled, 16x anisotropic filtering, TressFX Hair, 2560x1440, 45-second custom run, FCAT
LuxMark 2.0: 64-bit binary, version 2.0, Sala scene
SiSoftware Sandra 2013 Professional: Sandra Tech Support (Engineer) 2013.SP1, Cryptography, Financial Analysis performance

