
Hyper-Threading Technology from Intel: support, history, and performance

Users who have configured the BIOS at least once have probably noticed an Intel Hyper-Threading parameter whose purpose is unclear to many. Let's figure out what Hyper-Threading is, how to enable it, and what advantages this setting gives a computer. In principle, there is nothing difficult to understand here.

Intel Hyper Threading: what is it?
Without diving into the jungle of computer terminology, in simple terms this technology was designed to increase the number of command streams the CPU processes simultaneously. Modern processor chips, as a rule, use only about 70% of their available computing resources; the rest stays, so to speak, in reserve. And when it comes to processing a data stream, in most cases only one thread is used, even though the system has a multi-core processor.

Basic principles of work
Hyper-Threading technology was developed to make fuller use of the central processor. It lets one command stream be split into two, or a second stream be added alongside an existing one; that second stream is virtual and does not exist as separate physical hardware. This approach can significantly increase processor performance, and the whole system accordingly works faster. The size of the gain varies widely, which will be discussed separately; the developers of Hyper-Threading themselves admit that a virtual thread falls short of a full-fledged core. Still, in many cases using the technology is fully justified, and if you understand the essence of Hyper-Threading processors, the results will not be long in coming.
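Hyper-Threading itself works at the hardware level, but the underlying idea of splitting one stream of work into two can be illustrated at the software level. A minimal Python sketch (purely illustrative; this shows ordinary OS threads, not Intel's hardware mechanism):

```python
import threading

def partial_sum(data, start, end, results, idx):
    # Each thread computes the sum of its own half of the data.
    results[idx] = sum(data[start:end])

data = list(range(1_000_000))
results = [0, 0]
mid = len(data) // 2

# Split one stream of work into two threads, analogous to how
# Hyper-Threading splits one core's resources between two threads.
t1 = threading.Thread(target=partial_sum, args=(data, 0, mid, results, 0))
t2 = threading.Thread(target=partial_sum, args=(data, mid, len(data), results, 1))
t1.start(); t2.start()
t1.join(); t2.join()

total = results[0] + results[1]
print(total)  # same result as a single-threaded sum
```

The split changes how the work is scheduled, not what is computed: the combined result is identical to the single-threaded one.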

History reference
Let's dive a little into the history of this development. Hyper-Threading support first appeared in Intel Pentium 4 processors. There, the performance increase was rather weak, somewhere around 15-20%: the processor did not have the necessary processing power, and the technology was in effect ahead of its time. For some reason it is absent from the Core 2 line, and its implementation was continued in the Intel Core iX series (X denotes the processor series here). Today, support for Hyper-Threading is available in almost all modern chips. The technology itself takes up only about 5% of the die area, leaving the rest for processing commands and data.

A question of conflicts and performance
All this is certainly good, but in some cases data processing may actually slow down. This is mostly due to the branch prediction module and to a cache that is too small and must be constantly reloaded. With the prediction module, situations arise where the first thread requires data from the second that have not yet been processed or are still queued. Also common are situations where the core is already under very serious load, yet the module keeps sending it more data. Some programs and applications, such as resource-intensive online games, can slow down badly simply because they are not optimized for Hyper-Threading: the game does not know how to distribute data streams itself and dumps everything in one heap; by and large, it may simply not be designed for this. Interestingly, dual-core processors often gain noticeably more from the technology than quad-core ones: they have less raw processing power, so the extra threads matter more.

How to enable Hyper Threading in BIOS?
We have figured out what Hyper-Threading technology is and gotten acquainted with its history; now let's see how to activate it for use in the processor. Everything here is quite simple. You must use the BIOS setup subsystem, entered with the Del, F1, F2, F3, F8, or F12 keys, etc. (Sony Vaio laptops have a specific entry method via the dedicated ASSIST key). If the processor you are using supports Hyper-Threading, the BIOS settings will contain a corresponding line, in most cases called Hyper Threading Technology, sometimes Hyper Threading Function. Depending on the firmware vendor and BIOS version, the parameter may be located either in the main menu or in the advanced settings. To enable the technology, open that menu item and set the value to Enabled, then save the changes and reboot the system.
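After enabling the setting, you can verify from the running OS whether more logical processors are exposed than physical cores. A sketch that parses Linux /proc/cpuinfo-style text (the field names follow the Linux format; the embedded sample describes a hypothetical two-core chip, so the snippet stays self-contained):

```python
def ht_active(cpuinfo_text):
    """Infer from /proc/cpuinfo-style text whether more logical CPUs
    than physical cores are exposed (i.e. SMT/Hyper-Threading is on)."""
    logical = 0
    phys_cores = set()
    phys_id = core_id = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1          # one "processor" stanza per logical CPU
        elif key == "physical id":
            phys_id = value
        elif key == "core id":
            core_id = value
            phys_cores.add((phys_id, core_id))  # unique physical core
    return logical > len(phys_cores), logical, len(phys_cores)

# A two-core chip exposing four logical processors (fields abbreviated):
sample = """\
processor : 0
physical id : 0
core id : 0
processor : 1
physical id : 0
core id : 0
processor : 2
physical id : 0
core id : 1
processor : 3
physical id : 0
core id : 1
"""
print(ht_active(sample))
```

On a real Linux machine the same function can be fed the contents of /proc/cpuinfo; on Windows, Task Manager shows the logical-processor count directly.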

Why is Hyper Threading useful?
In conclusion, a few words about the benefits of Hyper-Threading. What is all this for? Why increase processing power? Users who work with resource-intensive applications need no explanation: graphics, mathematical, and design software packages demand a lot of system resources, and the whole system can become so loaded that it slows to a crawl. To prevent this, it is recommended to enable Hyper-Threading support.

Many Intel processors include Hyper-Threading Technology modules which, according to the developers' idea, should increase the chip's performance and speed up the PC as a whole. What are the specifics of this solution from the American corporation, and how can you take advantage of Hyper-Threading?

Technology Basics

Let's take a look at the key details of Hyper-Threading. What is this technology? It was developed by Intel and first introduced to the public in 2001. Its purpose was to increase server performance. The main principle of Hyper-Threading is distributing processor calculations across several threads. This works even when only one core is present on the chip (and if there are two or more, so that threads are already spread across cores, the technology complements that mechanism).

Running the main PC chip across several threads is achieved by creating copies of the architectural state during calculations, while the same set of execution resources on the chip is shared between the threads. If an application uses this capability, practically significant operations complete much faster. It is also important that the technology be supported by the computer's input/output system, the BIOS.

Enabling Hyper-Threading

If the processor installed in the PC supports the appropriate standard, it is usually activated automatically. But in some cases you have to perform the necessary actions manually for Hyper-Threading to work. How do you enable it? Very simply.

You need to enter the main BIOS interface. To do this, press DEL at the very beginning of the computer's boot (sometimes F2 or F10, less often other keys; the right one is always shown in one of the lines of text displayed on screen immediately after turning on the PC). In the BIOS interface, find the Hyper-Threading item; in firmware versions that support it, it is usually in a prominent place. Select the option, press Enter, and mark it as Enabled. If this mode is already set, Hyper-Threading Technology is working and you can use all its advantages. After activating the technology, save the settings by selecting Save and Exit Setup; the computer will then restart with the processor running with Hyper-Threading support. Hyper-Threading is disabled similarly: select Disabled in the same item and save the settings.

Having studied how to enable Hyper-Threading and deactivate this technology, let's take a closer look at its features.

CPUs with Hyper Threading Support

The first processor on which the company's concept was implemented was, according to some sources, the Intel Xeon MP, also known as Foster MP. This chip shares a number of architectural components with the Pentium 4, which also implemented the technology later. Subsequently, multi-threaded computing was implemented in Xeon server processors with the Prestonia core.

If we talk about the current prevalence of Hyper-Threading, which processors support it? Among the most popular chips of this type are those of the Core and Xeon families. Similar algorithms are also reported to be implemented in processors such as Itanium and Atom.

Having studied the basics of Hyper-Threading and the processors that support it, let's look at the most remarkable facts in the history of the technology's development.

Development history

As we noted above, Intel showed the concept to the public in 2001, but the first steps in creating the technology were made in the early 1990s, when the company's engineers noticed that PC processor resources are not fully utilized during a number of operations.

As Intel experts calculated, during a typical user's work the chip is actively used only about 30% of the time. Opinions about this figure vary widely: some consider it a clear underestimate, others fully agree with the American developers' thesis.

However, most IT specialists agreed that even if it is not 70% of the processor's capacity sitting idle, a very significant share of it is.

The main task of developers

Intel decided to correct this state of affairs with a qualitatively new approach to the efficiency of the main PC chip: it was proposed to create a technology that would put the processor's capabilities to more active use. In 1996, Intel specialists began its practical development.

According to the corporation's concept, a processor working on data from one program could direct idle resources to another application (or to a component of the current one with a different structure that requires additional resources). The corresponding algorithm also assumed effective interaction with the other PC hardware components: RAM, the chipset, and software.

Intel managed to solve the problem. Initially the technology was known as Willamette; in 1999 it was introduced into the architecture of some processors, and testing began. Soon the technology received its modern name, Hyper-Threading. It is hard to say whether that was a simple rebranding or reflected fundamental adjustments to the platform. The later facts, its public debut and its implementation in various Intel processor models, we already know. Among the names in common use today is Hyper-Threading Technology.

Aspects of compatibility with technology

How well do operating systems support Hyper-Threading? With modern versions of Windows, the user will have no problem making full use of the benefits of Intel Hyper-Threading Technology. Of course, it is also very important that the input/output system supports the technology, as discussed above.

Software and hardware factors

As for older operating systems (Windows 98, NT, and the relatively outdated XP), a necessary condition for compatibility with Hyper-Threading is ACPI support. If it is not implemented in the OS, not all the computation threads formed by the corresponding modules will be recognized by the computer. Windows XP on the whole does take advantage of the technology in question. It is also highly desirable that multithreading algorithms be implemented in the applications the PC owner uses.

Sometimes a PC upgrade may be needed, if processors with Hyper-Threading support are installed in place of ones that were not compatible with the technology. However, as with operating systems, there will be no particular problems if the user has a modern PC, or at least hardware components contemporary with the first Hyper-Threading processors: as we noted above, chipsets on motherboards adapted to the Core line fully support the corresponding functions of the chip.

Acceleration Criteria

If the computer's hardware and software components are not compatible with Hyper-Threading, the technology can in theory even slow it down. This state of affairs led some IT professionals to doubt the prospects of Intel's solution: they decided that Hyper-Threading was not a technological leap but a marketing move, incapable by its very architecture of significantly speeding up a PC. But the critics' doubts were quickly dispelled by Intel's engineers.

So, the basic conditions for the technology to be successfully used:

Support for Hyper-Threading by the I/O system (BIOS);

Compatibility of the motherboard with the processor of the corresponding type;

Support for the technology by the operating system and by the specific application being run.

While there should not be any particular problems with the first two points, program compatibility with Hyper-Threading may still have rough edges. But it can be noted that if an application supports, for example, dual-core processors, it is almost guaranteed to be compatible with Intel's technology.

At least, there are studies showing that programs adapted to dual-core chips gain roughly 15-18% in performance when the processor's Intel Hyper-Threading modules are working. We already know how to disable them, in case the user doubts the advisability of the technology, but there are probably very few tangible reasons to do so.

Practical Usefulness of Hyper-Threading

Has the technology in question had a tangible impact for Intel? Opinions differ, but many note that Hyper-Threading became so popular that it turned indispensable for many manufacturers of server systems, and it was positively received by ordinary PC users as well.

Hardware data processing

The main advantage of the technology is that it is implemented in hardware: the main part of the work is performed inside the processor by special modules, rather than by software algorithms handed down to the level of the main core, which would reduce overall PC performance. In general, according to IT experts, Intel's engineers solved the problem they identified at the start of development: making the processor function more efficiently. Indeed, tests have shown that in many practically significant tasks the use of Hyper-Threading made it possible to speed up the work considerably.

It can be noted that among Pentium 4 chips, those equipped with modules supporting the technology in question worked much more efficiently than the first modifications. This was largely expressed in the PC's ability to function in a real multitasking mode, when several Windows applications of different types are open at once and it is highly undesirable for the increased resource consumption of one of them to slow the others down.

Simultaneous solution of different tasks

Thus, processors with Hyper-Threading support are better adapted than incompatible chips to, say, running a browser, playing music, and working with documents simultaneously. Of course, the user feels these advantages in practice only if the PC's software and hardware components are sufficiently compatible with this mode of operation.

Similar developments

Hyper-Threading is not the only technology designed to improve PC performance through multi-threaded computing; it has analogues.

For example, the POWER5 processors released by IBM also support multithreading: each of the two cores installed on the chip can execute two threads, so the chip processes four computation streams simultaneously.

AMD has also done notable work on multithreading concepts. It is known that the Bulldozer architecture uses algorithms similar to Hyper-Threading; a feature of AMD's solution is that each thread has its own separate processor blocks, while the second-level cache remains shared. Similar concepts are implemented in AMD's Bobcat architecture, adapted for laptops and small PCs.

Of course, the concepts from AMD, IBM, and Intel can be considered direct analogues only very conditionally, as can their approaches to processor architecture in general. But the principles implemented in the respective technologies are quite similar, and the developers' goals of improving chip efficiency are very close in essence, if not identical.

These are the key facts about this most interesting Intel technology. What it is, and how to enable or deactivate Hyper-Threading, we have determined. The point, of course, is the practical use of its advantages, which requires making sure the PC's hardware and software components support the technology.

Hyper-Threading is a technology developed by Intel that allows a processor core to execute more than one data stream (usually two). Since it was found that a conventional processor in most tasks uses no more than 70% of its computing power, it was decided to use a technology that, when certain computing units are idle, loads them with work from another thread. Depending on the task, this increases core performance by 10 to 80%.
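The utilization argument can be made concrete with a toy issue-slot model (the 70% figure comes from the paragraph above; capping the second thread at the first thread's idle slots is an assumption of this model, not an Intel specification):

```python
def smt_throughput(util_a, util_b):
    """Toy issue-slot model: thread A uses util_a of the core's slots;
    thread B can fill only the slots A leaves idle. Returns combined
    utilization and the relative gain over running A alone."""
    idle = 1.0 - util_a
    combined = util_a + min(util_b, idle)   # B is capped by A's idle slots
    gain = combined / util_a - 1.0
    return combined, gain

# The article's figure: a single thread keeps ~70% of the core busy.
combined, gain = smt_throughput(0.70, 0.70)
print(round(combined, 2), round(gain, 2))
```

With the core's idle 30% filled, the slots are fully busy and the model predicts a gain of roughly 43%, squarely inside the 10-80% range the text cites.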

Let's look at how Hyper-Threading works.

Suppose the processor is performing simple calculations while the instruction block and the SIMD extensions sit idle.

The addressing module detects this and sends data there for further calculation. If the data are of a specific kind, these blocks will process them more slowly, but the data will not sit idle; or the blocks will pre-process them for faster handling by the appropriate unit. Either way this gives an additional performance gain.

Naturally, the virtual thread does not match a full-fledged core, but it allows nearly 100% of the computing power to be used, loading almost the entire processor with work and not letting it idle. Implementing HT requires only about 5% of additional die space, while performance can sometimes increase by up to 50%. This additional area holds extra register blocks and branch prediction logic, which calculate on the fly where computing capacity is free at a given moment and send data there from the additional address block.

The technology first appeared in Pentium 4 processors, but there was no big performance increase, since the processor itself lacked computing power. The growth was at best 15-20%, and in many tasks the processor ran noticeably slower than without HT.

The processor slows down due to Hyper-Threading if:

  • The cache is too small for all the data and reloads cyclically, slowing the processor down.
  • The data cannot be handled correctly by the branch predictor, mainly because of missing optimization in specific software or missing support from the operating system.
  • Data dependencies arise: for example, the first thread needs immediate data from the second, but they are not ready yet or are waiting in line behind another thread; or cyclic data needs certain blocks for fast processing while those blocks are busy with other data. There can be many variations of data dependence.
  • The core is already heavily loaded, yet the "not smart enough" branch prediction module still sends it data, slowing the processor down (relevant for the Pentium 4).
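The first failure mode, two sibling threads evicting each other's cache lines, can be sketched with a toy direct-mapped cache model (the sizes, the modulo mapping, and the perfectly aliasing buffers are deliberate simplifications; real L1 caches are set-associative):

```python
def hit_rate(accesses, cache_lines=512, line_size=64):
    """Direct-mapped cache model (32 KB total): address -> line by modulo."""
    cache = [None] * cache_lines
    hits = 0
    for addr in accesses:
        idx = (addr // line_size) % cache_lines
        tag = addr // (line_size * cache_lines)
        if cache[idx] == tag:
            hits += 1
        else:
            cache[idx] = tag       # miss: the line is (re)loaded
    return hits / len(accesses)

KB = 1024
# Each thread repeatedly streams over its own 24 KB buffer, 8 bytes at a time.
a = [i for _ in range(4) for i in range(0, 24 * KB, 8)]
b = [0x100000 + i for _ in range(4) for i in range(0, 24 * KB, 8)]

solo = hit_rate(a)                        # one thread: 24 KB fits in 32 KB
interleaved = [x for pair in zip(a, b) for x in pair]
shared = hit_rate(interleaved)            # two threads: 48 KB does not fit
print(solo > shared)
```

Run alone, the thread enjoys a high hit rate; interleaved with a sibling whose buffer maps to the same lines, the two threads evict each other on every access, which is exactly the cyclic-reload slowdown described above.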

After the Pentium 4, Intel brought the technology back only with the first-generation Core i7, skipping the Core 2 series.

By then, processor power had become sufficient for a full implementation of hyperthreading without much harm, even for non-optimized applications. Later, Hyper-Threading appeared in middle-class, budget, and portable processors. It is used across the Core i series (i3, i5, i7) and in Atom mobile processors (though not all of them). Interestingly, dual-core processors with HT get a greater performance boost from Hyper-Threading than quad-cores do, reaching about 75% of a full-fledged quad-core.

Where is HyperThreading useful?

It is useful in professional graphics, analytical, mathematical, and scientific programs, video and audio editors, and archivers (Photoshop, CorelDRAW, Maya, 3ds Max, WinRAR, Sony Vegas, etc.). HT will definitely help any program that does a lot of computation. Thankfully, in 90% of cases such programs are well optimized for it.

Hyperthreading is indispensable for server systems; in fact, it was partly developed for this niche. Thanks to HT, you can significantly increase the processor's throughput when there are a large number of tasks. Each thread carries about half the load, which has a beneficial effect on data addressing and branch prediction.

Many computer games react negatively to the presence of Hyper-Threading, with a reduced frame rate. This is due to the lack of Hyper-Threading optimization on the game's side; optimization by the operating system alone is not always enough, especially when working with unusual, heterogeneous, and complex data.

On motherboards that support HT, you can always disable hyperthreading technology.

January 20, 2015 at 07:43 pm

More about Hyper-Threading

  • IT systems testing,
  • Programming

Some time ago we needed to evaluate memory performance in the context of Hyper-Threading technology, and we concluded that its influence is not always positive. When a quantum of free time appeared, we wanted to continue the research and examine the processes involved with the precision of machine cycles and bits, using software of our own development.

Researched Platform

The object of the experiments is an ASUS N750JK laptop with an Intel Core i7-4700HQ processor. The clock speed is 2.4 GHz, boosted by Intel Turbo Boost up to 3.4 GHz. Installed are 16 gigabytes of DDR3-1600 (PC3-12800) RAM operating in dual-channel mode. The operating system is Microsoft Windows 8.1, 64-bit.

Fig.1 Configuration of the studied platform.

The processor of the platform under study contains 4 cores, which with Hyper-Threading technology enabled provides hardware support for 8 threads, or logical processors. The platform firmware passes this information to the operating system via the MADT (Multiple APIC Description Table) ACPI table. Since the platform contains only one RAM controller, there is no SRAT (System Resource Affinity Table) declaring the proximity of processor cores to memory controllers. Obviously, this laptop is not a NUMA platform, but the operating system, for purposes of unification, treats it as a NUMA system with one domain, as the line NUMA Nodes = 1 indicates. A fact fundamental for our experiments: the first-level data cache is 32 kilobytes for each of the four cores, and the two logical processors sharing a core share its L1 and L2 caches.

Investigated operation

We will investigate how the data-block read speed depends on block size. To do this, we choose the most productive method: reading 256-bit operands with the AVX instruction VMOVAPD. On the charts, the X axis shows the block size and the Y axis shows the read speed. Near the point X corresponding to the L1 cache size, we expect to see an inflection, since performance should drop once the processed block no longer fits in the cache. In the multithreaded runs, each of the 16 initiated threads works with a separate address range. To control Hyper-Threading within the application, each thread uses the SetThreadAffinityMask API function, which sets a mask in which one bit corresponds to each logical processor. A bit value of 1 allows the given thread to use that processor; 0 prohibits it. For the 8 logical processors of the studied platform, mask 11111111b allows all processors (Hyper-Threading enabled), while mask 01010101b allows one logical processor in each core (Hyper-Threading disabled).
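The mask construction described above can be sketched as follows (the SetThreadAffinityMask call itself is Windows-specific; here we only build the masks, assuming the platform's numbering in which the two sibling logical processors of a core have adjacent indices):

```python
def affinity_masks(cores, smt=2):
    """Build the two affinity masks used in the experiment, one bit per
    logical processor. Assumes logical CPUs are numbered so that each
    core's sibling threads are adjacent (0,1 -> core 0; 2,3 -> core 1...)."""
    all_lps = (1 << (cores * smt)) - 1           # every logical processor
    one_per_core = 0
    for c in range(cores):
        one_per_core |= 1 << (c * smt)           # first sibling of each core
    return all_lps, one_per_core

ht_on, ht_off = affinity_masks(4)
print(bin(ht_on), bin(ht_off))  # 0b11111111 0b1010101
```

For the 4-core, SMT-2 platform this reproduces exactly the masks quoted in the text: 11111111b with Hyper-Threading in play and 01010101b with one logical processor per core.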

The following abbreviations are used on the graphs:

MBPS (Megabytes per Second) - block read speed in megabytes per second;

CPI (Clocks per Instruction) - number of clock cycles per instruction;

TSC (Time Stamp Counter) - processor cycle counter.

Note: The clock speed of the TSC register may not match the clock speed of the processor when running in Turbo Boost mode. This must be taken into account when interpreting the results.

On the right side of the graphs, a hexadecimal dump of the instructions making up the body of the target operation's loop (or the first 128 bytes of this code), executed in each program thread, is shown.

Experiment No. 1. One thread



Fig.2 Reading in one thread

The maximum speed is 213563 megabytes per second. The inflection point occurs at a block size of about 32 kilobytes.

Experiment No. 2. 16 threads on 4 processors, Hyper-Threading disabled



Fig.3 Reading in sixteen threads. The number of logical processors used is four

Hyper-Threading is disabled. The maximum speed is 797598 megabytes per second; the inflection point again occurs at a block size of about 32 kilobytes. As expected, compared with single-threaded reading the speed grew about 4 times, in line with the number of working cores.

Experiment No. 3. 16 threads on 8 processors, Hyper-Threading enabled



Fig.4 Reading in sixteen threads. The number of logical processors used is eight

Hyper-Threading is enabled. The maximum speed, 800722 megabytes per second, barely increased as a result of enabling Hyper-Threading. The big minus is that the drop in speed now occurs at half the block size, about 16 kilobytes, so the average speed has fallen significantly. This is not surprising: each core has its own L1 cache, while the two logical processors of a core share it.

Conclusions

The investigated operation scales quite well on a multi-core processor: each core has its own first- and second-level cache, the target block size is comparable to the cache size, and each thread works with its own address range. For academic purposes we created such conditions in a synthetic test, realizing that real applications are usually far from ideally optimized. But even under these conditions, enabling Hyper-Threading had a negative effect: for a slight increase in peak speed, there is a significant loss in the processing speed of blocks whose size lies between 16 and 32 kilobytes.
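As a sanity check, the scaling factors implied by the reported peak speeds can be recomputed (the numbers are taken directly from the three experiments above):

```python
# Peak read speeds reported in the experiments (MB/s):
speed_1t = 213563       # one thread
speed_ht_off = 797598   # 16 threads on 4 physical cores, HT disabled
speed_ht_on = 800722    # 16 threads on 8 logical processors, HT enabled

scaling = speed_ht_off / speed_1t                    # vs. the 4 cores
ht_gain = (speed_ht_on - speed_ht_off) / speed_ht_off  # extra peak from HT

print(round(scaling, 2), round(ht_gain * 100, 2))  # 3.73 0.39
```

The arithmetic confirms the text: nearly fourfold scaling across the physical cores, but well under one percent of extra peak bandwidth from Hyper-Threading, while the useful cache per thread is halved.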

We once wrote that using single-processor Xeon systems makes no sense, since at a higher price their performance is the same as that of a Pentium 4 of the same frequency. Now, after closer examination, that statement will probably have to be slightly amended. The Hyper-Threading technology implemented in the Intel Xeon with the Prestonia core really works and gives quite a tangible effect, though many questions remain about its use...

Bring on the performance

"Faster, even faster ...". The race for performance has been going on for years, and sometimes it's even hard to say which of the computer components is accelerating faster. For this, more and more new ways are being invented, and the further, the more skilled labor and high-quality brains are invested in this avalanche-like process.

A constant increase in performance is, of course, needed. At the very least it is a profitable business, and there is always a nice way to encourage users to upgrade yesterday's "super-performing CPU" to tomorrow's "even more super" one. For example, simultaneous speech recognition with on-the-fly translation into another language: isn't that everyone's dream? Or unusually realistic games of almost cinematic quality (completely absorbing attention and sometimes leading to serious changes in the psyche): isn't that the desire of many gamers, young and old?

But let's set the marketing aspects aside and focus on the technical ones. Besides, not everything is so gloomy: there are pressing tasks (server applications, scientific computing, modeling, and so on) where ever-higher performance, in particular of central processors, is really needed.

So, what are the ways to increase their performance?

Clock boost. The fabrication process can be further "thinned" and the frequency raised. But, as you know, this is not easy, and it is fraught with all sorts of side effects, such as problems with heat dissipation.

Increasing processor resources: for example, growing the cache and adding new execution units. All this entails more transistors, a more complex processor, a larger die, and consequently higher cost.

Moreover, the previous two methods give, as a rule, far from linear performance gains. The Pentium 4 illustrates this well: branch-prediction errors and interrupts force a flush of its long pipeline, which hurts overall performance badly.

Multiprocessing. Installing multiple CPUs and distributing work between them is often quite efficient. But this approach is not cheap: each additional processor raises the cost of the system, and a dual-processor motherboard is much more expensive than a regular one (to say nothing of boards supporting four or more CPUs). In addition, not all applications benefit enough from multiprocessing to justify the cost.

In addition to "pure" multiprocessing, there are several "intermediate" options that allow you to speed up application execution:

Chip Multiprocessing (CMP): two processor cores physically located on one chip, with a shared or separate cache. Naturally, the die becomes quite large, which cannot but affect cost. Note that several of these "dual" CPUs can also work in a multiprocessor system.

Time-Slice Multithreading. The processor switches between program threads at fixed intervals. The switching overhead can be quite hefty, and time is wasted whenever a thread is merely waiting.

Switch-on-Event Multithreading. The processor switches tasks when long pauses occur, such as cache misses, which are numerous in server applications. A thread waiting for data to be loaded from relatively slow memory into the cache is suspended, freeing CPU resources for other threads. However, Switch-on-Event Multithreading, like Time-Slice Multithreading, does not always use processor resources optimally, owing in particular to branch-prediction errors, instruction dependencies, and so on.

Simultaneous Multithreading. Here program threads execute on the same processor truly "simultaneously", i.e. without switching between them. CPU resources are distributed dynamically on the principle "if you don't use it, give it to someone else." It is this approach that underlies Intel's Hyper-Threading technology, to which we now turn.

How Hyper-Threading Works

As you know, the current "computing paradigm" involves multithreaded computing. This applies not only to servers, where such a concept exists initially, but also to workstations and desktop systems. Threads can belong to the same or different applications, but almost always there are more than one active thread (to see this, it is enough to open the Task Manager in Windows 2000/XP and turn on the display of the number of threads). At the same time, a conventional processor can only execute one of the threads at a time and is forced to constantly switch between them.

Hyper-Threading was first implemented in the Intel Xeon MP (Foster MP) processor, where it had its shakedown. Recall that Xeon MP, officially presented at IDF Spring 2002, uses a Pentium 4 Willamette core, contains 256 KB of L2 cache and 512 KB/1 MB of L3 cache, and supports 4-processor configurations. Hyper-Threading support is also present in the Intel Xeon processor for workstations (Prestonia core, 512 KB L2 cache), which reached the market a little earlier than the Xeon MP. Our readers are already familiar with dual-processor configurations on the Intel Xeon, so we will examine the possibilities of Hyper-Threading using these CPUs as an example, both theoretically and practically; the "plain" Xeon is, after all, more mundane and digestible than the Xeon MP in 4-processor systems...

The principle of operation of Hyper-Threading is based on the fact that at any given time only a part of the processor's resources is used when executing program code. Unused resources can also be loaded with work - for example, one more application (or another thread of the same application) can be used for parallel execution. In one physical Intel Xeon processor, two logical processors (LP - Logical Processor) are formed, which share the computing resources of the CPU. The operating system and applications "see" exactly two CPUs and can distribute work between them, as in the case of a full-fledged two-processor system.

One of the goals of the Hyper-Threading implementation is that a single active thread should run just as fast as on a regular CPU. To this end, the processor has two main operating modes: Single-Task (ST) and Multi-Task (MT). In ST mode only one logical processor is active and uses the available resources undivided (ST0 and ST1 modes); the other LP is stopped by the HALT instruction. When a second program thread appears, the idle logical processor is activated (via an interrupt) and the physical CPU switches to MT mode. Stopping unused LPs with HALT is the responsibility of the operating system, which is thus ultimately responsible for a single thread running as fast as it would without Hyper-Threading.

For each of the two LPs, a so-called Architecture State (AS) is stored, comprising the state of registers of various types: general-purpose, control, APIC, and service registers. Each LP has its own APIC (interrupt controller) and its own register set; to manage the latter correctly, the Register Alias Table (RAT) tracks the correspondence between the eight IA-32 general-purpose registers and the 128 physical CPU registers (one RAT per LP).

When two threads are running, two corresponding sets of Next Instruction Pointers are maintained. Most instructions are fetched from the Trace Cache (TC), where they are stored in decoded form, and the two active LPs access the TC alternately, every clock cycle. When only one LP is active, it gets exclusive access to the TC without per-clock interleaving; access to the Microcode ROM works the same way. The ITLBs (Instruction Translation Look-aside Buffers), which come into play when the needed instructions are absent from the instruction cache, are duplicated, each delivering instructions for its own thread. The IA-32 Instruction Decode unit is shared and, when decoding is required for both threads, serves them in turn (again, every clock cycle). The Uop Queue and Allocator units are split in two, with half of the entries allocated to each LP. The five schedulers process the queues of decoded micro-ops (uops) regardless of whether they belong to LP0 or LP1 and dispatch them to the required execution units, depending on the readiness of the former and the availability of the latter. The caches of all levels (L1/L2 for Xeon, plus L3 for Xeon MP) are fully shared between the two LPs; to ensure data integrity, however, entries in the DTLB (Data Translation Look-aside Buffer) are tagged with logical-processor IDs.

Thus, the instructions of both logical CPUs can be executed simultaneously on the resources of one physical processor, which are divided into four classes:

  • duplicated (Duplicated);
  • fully shared (Fully Shared);
  • with descriptors of elements (Entry Tagged);
  • dynamically partitioned (Partitioned), depending on the operating mode (ST0/ST1 or MT).

Conveniently, most applications that gain speed on multiprocessor systems are also accelerated on a CPU with Hyper-Threading enabled, without any modification. But there are problems too: for example, if one thread sits in a wait loop, it can tie up the resources of the physical CPU, preventing the second LP from doing work. Performance with Hyper-Threading can therefore sometimes drop (by up to 20%). To prevent this, Intel recommends using the PAUSE instruction (introduced into IA-32 with the Pentium 4) instead of empty wait loops. Serious work is also under way on automatic and semi-automatic code optimization at compile time; the Intel C++/Fortran compilers with OpenMP support, for example, have made significant progress in this regard.

Another goal of the first Hyper-Threading implementation, according to Intel, was to minimize the growth in transistor count, die area, and power consumption while appreciably increasing performance. The first part of this commitment has already been fulfilled: adding Hyper-Threading support to the Xeon/Xeon MP increased die area and power consumption by less than 5%. Whether the second part (performance) holds, we have yet to check.

Practical part

For obvious reasons, we did not test 4-processor server systems on the Xeon MP with Hyper-Threading enabled. First, it is quite labor-intensive. Second, even if we resolved on such a feat, it would be absolutely unrealistic to obtain this expensive equipment now, less than a month after the official announcement. We therefore decided to confine ourselves to the same system with two Intel Xeon 2.2 GHz processors on which the first tests of these CPUs were carried out (see the link at the beginning of the article). The system was based on a Supermicro P4DC6+ motherboard (Intel i860 chipset) and contained 512 MB of RDRAM, a GeForce3-based video card (64 MB DDR, Detonator 21.85 drivers), a Western Digital WD300BB hard drive, and a 6X DVD-ROM; Windows 2000 Professional SP2 served as the OS.

First, a few general impressions. With a single Xeon (Prestonia core) installed, the system BIOS reports the presence of two CPUs at startup; with two processors installed, the user sees a message about four CPUs. The operating system duly recognizes "both processors", but only if two conditions are met.

First, recent BIOS versions for Supermicro P4DCxx boards add an Enable Hyper-Threading item to CMOS Setup, without which the OS recognizes only the physical processor(s). Second, ACPI is used to tell the OS about the additional logical processors. So to enable Hyper-Threading, the ACPI option must be turned on in CMOS Setup, and a HAL (Hardware Abstraction Layer) with ACPI support must be installed in the OS itself. Luckily, in Windows 2000 changing the HAL from Standard PC (or MPS Uni-/Multiprocessor PC) to ACPI Uni-/Multiprocessor PC is easy: just change the "computer driver" in the Device Manager. For Windows XP, however, the only legal way to migrate to the ACPI HAL is to reinstall the system on top of the existing installation.

Now all the preparations are made, and our Windows 2000 Pro firmly believes it is running on a two-processor system (even though only one processor is actually installed). It is time, as usual, to decide on the goals of testing. So we want to:

  • Assess the impact of Hyper-Threading on the performance of applications of various classes.
  • Compare this effect with the effect of installing a second processor.
  • Check how "fairly" resources are given to the active logical processor when the second LP is idle.

To evaluate the performance, we took a set of applications already familiar to readers, which was used in testing workstation systems. Let's start, perhaps, from the end and check the "equality" of logical CPUs. Everything is extremely simple: we first run tests on one processor with Hyper-Threading disabled, and then repeat the process with Hyper-Threading enabled and using only one of the two logical CPUs (using Task Manager). Since in this case we are only interested in relative values, the results of all tests are reduced to the "bigger is better" form and normalized (indicators of a single-processor system without Hyper-Threading are taken as a unit).

Well, as you can see, Intel's promises are fulfilled here: with only one active thread, the performance of each of the two LPs is exactly equal to the performance of a physical CPU without Hyper-Threading. The idle LP (both LP0 and LP1) is actually suspended, and the shared resources, as far as can be judged from the results obtained, are completely transferred to the use of the active LP.

Therefore, we draw the first conclusion: two logical processors are actually equal in rights, and enabling Hyper-Threading "does not interfere" with the operation of one thread (which is not bad in itself). Now let's see if this inclusion "helps", and if so, where and how?

Rendering. The results of four tests in the 3D-modeling packages 3D Studio MAX 4.26, Lightwave 7b, and A|W Maya 4.0.1 are combined into one diagram owing to their similarity.

In all four cases (for Lightwave, two different scenes), CPU load with one processor and Hyper-Threading disabled stays at an almost constant 100%. Yet when Hyper-Threading is enabled, scene calculation speeds up (which even prompted a joke among us about CPU usage above 100%). In three tests we see a performance increase of 14-18% from Hyper-Threading: not much compared to a second CPU, on the one hand, but quite good considering that it comes "for free", on the other. In one of the two Lightwave tests the gain is almost zero (apparently a quirk of this application, which is full of oddities). But there is no negative result anywhere, and the noticeable gain in the other three cases is encouraging, even though parallel rendering processes do similar work and surely cannot share the physical CPU's resources in the best possible way.

Photoshop and MP3 encoding. The GOGO-no-coda 2.39c codec is one of the few that supports SMP, and it shows a 34% performance increase from the second processor. The effect of Hyper-Threading here, however, is zero (we do not consider a 3% difference significant). But in the Photoshop 6.0.1 test (a script consisting of a large set of commands and filters), a slowdown is visible with Hyper-Threading enabled, even though the second physical CPU adds 12% of performance in this case. This, in fact, is the first case where Hyper-Threading causes a drop in performance...

Professional OpenGL. It has long been known that SPEC ViewPerf and many other OpenGL applications often slow down on SMP systems.

OpenGL and dual processor: why they are not friends

Many times in the articles we drew the readers' attention to the fact that dual-processor platforms rarely show any significant advantage over single-processor ones when performing professional OpenGL tests. And what's more, there are cases when installing a second processor, on the contrary, degrades the system's performance when rendering dynamic 3D scenes.

Naturally, we were not the only ones to notice this oddity. Some testers simply sidestepped the fact in silence, for example by providing SPEC ViewPerf comparison results only for two-processor configurations, thus avoiding having to explain why a two-processor system is slower. Others offered all sorts of fanciful conjectures about cache coherence, the need to maintain it, the resulting overhead, and so on. And for some reason nobody wondered why the processors would be so impatient to maintain coherence precisely in windowed OpenGL rendering, which in its computational essence differs little from any other computational task.

In fact, the explanation, in our opinion, is much simpler. As you know, an application can run faster on two processors than on one if:

  • there are two or more simultaneously executing program threads;
  • these threads do not interfere with each other's execution - for example, they do not compete for a shared resource such as an external drive or network interface.

Now, let's take a simplified look at what OpenGL rendering looks like, performed by two threads. If an application, "seeing" two processors, creates two OpenGL-rendering threads, then for each of them, according to the rules of OpenGL, its own gl-context is created. Accordingly, each thread renders to its own gl context. But the problem is that for the window in which the image is displayed, only one gl-context can be current at any time. Accordingly, the threads in this case simply "in turn" output the generated image to the window, making their context alternately current. Needless to say, such "context interleaving" can be very costly in terms of overhead?

Also, as an example, we will give graphs of the use of two CPUs in several applications that display OpenGL scenes. All measurements were carried out on a platform with the following configuration:

  • one or two Intel Xeon 2.2 GHz (Hyper-Threading disabled);
  • 512 MB RDRAM;
  • Supermicro P4DC6+ motherboard;
  • ASUS V8200 Deluxe video card (NVidia GeForce3, 64 MB DDR SDRAM, Detonator 21.85 drivers);
  • Windows 2000 Professional SP2;
  • video mode 1280x1024x32 bpp, 85 Hz, Vsync disabled.

Blue and red are CPU 0 and CPU 1 load graphs, respectively. The line in the middle is the final CPU Usage graph. The three graphs correspond to two scenes from 3D Studio MAX 4.26 and part of the SPEC ViewPerf test (AWadvs-04).


CPU Usage: Animation 3D Studio MAX 4.26 - Anibal (with manipulators).max


CPU Usage: Animation 3D Studio MAX 4.26 - Rabbit.max


CPU Usage: SPEC ViewPerf 6.1.2 - AWadvs-04

The same pattern repeats in a host of other OpenGL applications. The two processors are hardly troubled with work at all, and total CPU Usage sits at 50-60%. On a single-processor system, by contrast, CPU Usage in all these cases stays confidently at 100%.

So it's not surprising that a lot of OpenGL applications don't get much faster on dual systems. Well, the fact that they sometimes even slow down has, in our opinion, a completely logical explanation.

With two logical CPUs the performance drop is even more significant, which is quite understandable: two logical processors get in each other's way just as two physical ones do, but their combined performance is of course lower, so with Hyper-Threading enabled performance falls even further than with two physical CPUs. The result is predictable and the conclusion simple: like "real" SMP, Hyper-Threading is sometimes contraindicated for OpenGL.

CAD applications. The previous conclusion is confirmed by the results of two CAD tests: SPECapc for SolidEdge V10 and SPECapc for SolidWorks. The graphics scores of these tests behave similarly under Hyper-Threading (though the SMP result is slightly higher for SolidEdge V10). But the CPU_Score results, which load the processor, give pause: a 5-10% gain from SMP and a 14-19% slowdown from Hyper-Threading.

But at the end of the day, Intel honestly acknowledges the potential for performance degradation with Hyper-Threading in some cases - for example, when using empty wait loops. We can only assume that this is the reason (a detailed examination of the SolidEdge and SolidWorks code is beyond the scope of this article). After all, everyone knows the conservatism of CAD application developers who prefer proven reliability and are not in a hurry to rewrite the code taking into account new trends in programming.

Summing up, or "Attention, the right question"

Hyper-Threading works, there is no doubt about it. Of course, the technology is not universal: there are applications for which Hyper-Threading "becomes bad", and in the case of the spread of this technology, it would be desirable to modify them. But didn't the same thing happen to MMX and SSE and continues to happen to SSE2?..

However, this raises the question of the applicability of the technology to our realities. We will immediately discard the option of a single-processor Xeon system with Hyper-Threading (or regard it only as temporary, pending the purchase of a second processor): even a 30% performance gain in no way justifies the price; a regular Pentium 4 would be the better buy. That leaves systems with two or more CPUs.

Now imagine that we buy a two-processor Xeon system (say, with Windows 2000/XP Professional). Two CPUs are installed, Hyper-Threading is enabled, the BIOS finds no fewer than four logical processors, and off we go... Stop. How many processors will our operating system see? Right, two. Only two, because it is simply not designed for more. These will be the two physical processors, i.e. everything will work exactly as with Hyper-Threading disabled: no slower (the two "extra" logical CPUs will simply stop), but no faster either (verified by additional tests; the results are omitted as obvious). Hmm, not much fun...

What remains? Surely not installing Advanced Server or .NET Server on our workstation? Actually, the system will install, recognize all four logical processors, and function. But a server OS on a workstation looks, to put it mildly, a little strange (not to mention the financial aspects). The only reasonable case is when our two-processor Xeon system serves as a server (some builders, at least, have no qualms about shipping servers built on workstation Xeon processors). For dual workstations with their corresponding operating systems, though, the applicability of Hyper-Threading remains in question. Intel is now actively advocating OS licensing by the number of physical rather than logical CPUs. Discussions are still ongoing, and much depends on whether we see a workstation OS with support for four processors.

Well, with servers, everything comes out quite simply. For example, a Windows 2000 Advanced Server installed on a two-socket Xeon system with Hyper-Threading enabled will see four logical processors and run smoothly on it. To evaluate what Hyper-Threading brings to server systems, we present the results of Intel Microprocessor Software Labs for two-processor systems on Xeon MP and several Microsoft server applications.

A performance boost of 20-30% for a two-processor server "for free" is more than tempting (especially compared to buying a "real" 4-processor system).

So it turns out that at the moment the practical applicability of Hyper-Threading is possible only in servers. The issue with workstations depends on the solution with OS licensing. Although one more application of Hyper-Threading is quite real - if desktop processors get support for this technology. For example (let's imagine) what is wrong with a Pentium 4 system with Hyper-Threading support, on which Windows 2000/XP Professional with SMP support is installed? .. However, there is nothing incredible in this: enthusiastic Intel developers promise the widespread introduction of Hyper-Threading - from servers to desktop and mobile systems.