Why a new architecture?
After some fifteen years of service in every processor from the 386 to the Pentium 4, the IA-32 architecture is ripe for replacement. IA-32 is a CISC (Complex Instruction Set Computing) architecture, meaning the processor offers a large number of different instructions, many of them complex. Because it is impractical to build fast hardware that executes all of these directly, modern CISC processors often have a RISC-like (Reduced Instruction Set Computing) design inside: they break the complex instructions down into a small set of simple internal operations. Because the internal design of the processor is so different from the way it presents itself to software, a lot of translation has to be performed on the fly. That translation is far from easy, and it shows that IA-32 is past its prime.
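To make the translation step a little more concrete, here is a minimal sketch of how a complex instruction might be split into simpler internal operations. The instruction tuples, the temporary names (`tmp0`, `tmp1`) and the `decode` function are all invented for illustration; they are not real IA-32 encodings, nor the internal format of any actual processor.

```python
# Toy illustration only: instruction and micro-op names are hypothetical.

def decode(instruction):
    """Split one 'complex' instruction into simple RISC-like micro-ops."""
    op, dest, src = instruction
    micro_ops = []
    # A memory source operand first needs a separate load into a temporary.
    if src.startswith("["):
        micro_ops.append(("load", "tmp0", src))
        src = "tmp0"
    if dest.startswith("["):
        # Read-modify-write on memory becomes load, compute, store.
        micro_ops.append(("load", "tmp1", dest))
        micro_ops.append((op, "tmp1", src))
        micro_ops.append(("store", dest, "tmp1"))
    else:
        micro_ops.append((op, dest, src))
    return micro_ops


print(decode(("add", "eax", "ebx")))        # register-only: one micro-op
print(decode(("add", "[counter]", "eax")))  # memory destination: three micro-ops
```

In this sketch a register-to-register add passes through unchanged, while an add to a memory location turns into three simple steps. A real decoder has to do this kind of work for every instruction, every cycle.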
Some current problems:
A processor has a number of execution units that are responsible for the raw calculations. Other parts of the chip make sure that instructions and data are brought in and results are taken out. To get the most out of a processor, it is imperative to keep the execution units continuously supplied with data and instructions. This doesn't sound difficult, but it takes quite a bit of care: although the instructions are neatly arranged in memory, a program hardly ever executes them in exactly that order. Some instructions should only be executed if certain conditions have been met, and sometimes a choice has to be made between two pieces of code depending on the result of a previous instruction.
To complicate matters further, the processor works with a pipeline. Instructions go in at the front of the pipeline and, after traveling through the internals of the processor, the result is put in the right place. Because the pipeline is long, multiple instructions can travel through it at the same time. As long as everything is executed in order this poses no problems, but when the program reaches a branch in the code, the processor has to wait for the result of instruction A before it knows whether instruction B or C should be entered into the pipeline. This is a problem because, as stated before, the execution units have to be fed, and as long as nothing enters the pipeline they are just sitting idle.
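The cost of waiting at every branch can be estimated with a very simple model. The sketch below assumes an idealized pipeline that accepts one instruction per cycle and only learns the outcome of a branch when that branch leaves the last stage; the function name and the example numbers (100 instructions, 20 stages, 15 branches) are made up for illustration.

```python
def cycles_needed(n_instructions, depth, n_branches=0, wait_for_branch=True):
    """Cycles to push n_instructions through a pipeline with `depth` stages."""
    # The first instruction needs `depth` cycles to come out the other end;
    # every following instruction finishes one cycle after its predecessor.
    cycles = depth + (n_instructions - 1)
    if wait_for_branch:
        # After a branch nothing may enter until the branch has left the
        # pipeline, which leaves depth - 1 empty slots behind each branch.
        cycles += n_branches * (depth - 1)
    return cycles


print(cycles_needed(100, depth=20))                 # 119, no branches
print(cycles_needed(100, depth=20, n_branches=15))  # 404, waiting at each branch
```

Under these assumptions, 15 branches in 100 instructions more than triple the running time, and the penalty grows with the length of the pipeline.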
One solution is to look at the other instructions that have already been fetched when such a wait occurs. If any of them are independent of the instructions that are waiting for the result, and can therefore run early without affecting the outcome of the program, they can be executed in the meantime. This method is referred to as out-of-order (OOO) execution. The advantage is that the execution units can get some work done while waiting for the result of instruction A. Of course, finding suitable instructions takes time too, in some cases more time than simply waiting. That is why something else has been thought of: gambling.
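The idea of filling the wait with independent work can be shown with a toy scheduler. This is a sketch under strong assumptions, not how real hardware is organized: at most one instruction is issued per cycle, every instruction writes a different register, and the program, register names and latencies are invented for the example.

```python
def run(program, latency, out_of_order):
    """Issue at most one instruction per cycle; return (issue order, finish cycle)."""
    # Results of not-yet-issued instructions are unavailable (infinity);
    # any other register is assumed to be ready from cycle 0.
    ready_at = {dest: float("inf") for _, dest, _ in program}
    pending = list(program)
    order = []
    cycle = 0
    while pending:
        # In-order may only look at the oldest instruction,
        # out-of-order may scan everything that is still waiting.
        candidates = list(pending) if out_of_order else pending[:1]
        for instr in candidates:
            name, dest, sources = instr
            if all(ready_at.get(src, 0) <= cycle for src in sources):
                ready_at[dest] = cycle + latency[name]
                order.append(name)
                pending.remove(instr)
                break
        cycle += 1
    return order, max(ready_at.values())


# A is slow, B needs A's result, C and D are independent of A.
PROGRAM = [
    ("A", "r1", ["r0"]),
    ("B", "r2", ["r1"]),
    ("C", "r3", ["r0"]),
    ("D", "r4", ["r3"]),
]
LATENCY = {"A": 5, "B": 1, "C": 1, "D": 1}

print(run(PROGRAM, LATENCY, out_of_order=False))  # (['A', 'B', 'C', 'D'], 8)
print(run(PROGRAM, LATENCY, out_of_order=True))   # (['A', 'C', 'D', 'B'], 6)
```

In the in-order run, C and D are stuck behind B even though they do not need A's result. The out-of-order run slips them into the gap and finishes two cycles earlier.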
For this, very effective pieces of hardware based on statistical algorithms have been invented: branch predictors, which can guess the outcome of a branch with around 95% accuracy. In the cases where they are wrong, however, it is a small disaster for the processor. Suppose the result of instruction A determines whether instruction B or E should be executed. The branch predictor says B and enters A, B, C and D into the pipeline. As soon as A comes out at the other end, it becomes clear that it should have been E. At this point the effects of B, C and D have to be undone, something that takes quite a few clock cycles. Only when the pipeline is empty again (flushed) and the values have been restored can E, F and G be entered into the pipeline. Mispredictions like this unfortunately cannot be avoided, and with pipelines increasing in length they are becoming a bigger problem.
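As an illustration of how such a guess can be made, the sketch below implements a two-bit saturating counter, a classic textbook predictor. It is an assumption chosen for the example, not necessarily the scheme used in any particular IA-32 processor, and the loop pattern and the 20-cycle flush penalty are invented numbers.

```python
def count_correct(outcomes):
    """Predict each outcome with a 2-bit saturating counter; return the hits."""
    counter = 2  # 0 and 1 predict "not taken", 2 and 3 predict "taken"
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Nudge the counter toward what really happened, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct


# A loop branch: taken 19 times, then not taken once, repeated 50 times.
loop_branch = ([True] * 19 + [False]) * 50
hits = count_correct(loop_branch)
misses = len(loop_branch) - hits

FLUSH_PENALTY = 20  # hypothetical cycles lost per misprediction
print(hits / len(loop_branch))   # 0.95
print(misses * FLUSH_PENALTY)    # 1000 cycles lost to flushes
```

Even at 95% accuracy, the 50 wrong guesses in this example cost 1000 cycles for 1000 branches, which shows why a longer pipeline, with its higher flush penalty, makes each misprediction hurt more.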
Another big disadvantage of IA-32 is that it has too few registers (only eight general-purpose ones), the small pieces of memory on the processor that are even faster than L1 cache. This limits the scalability of the processor; for example, it is not possible to efficiently execute more than three instructions per clock cycle on an IA-32 processor. The 32-bit limit is starting to become a problem too: if one wants to work with extremely large files or address very large amounts of memory, 32 bits don't cut it anymore. While this is not a big problem at the moment, considering the growth of information technology it won't take long before it becomes a serious bottleneck.
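The size of the 32-bit limit is easy to work out. The snippet below assumes a flat, byte-addressed memory space in which every address fits in a single pointer; the helper name is made up for the example.

```python
GIB = 2 ** 30  # one gibibyte in bytes


def addressable_bytes(address_bits):
    """Number of distinct byte addresses that fit in a pointer of this width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB)  # 4 GiB
print(addressable_bytes(64) // GIB)  # 17179869184 GiB
```

With 32-bit addresses a program can reach 4 GiB directly, and a 32-bit file offset runs into the same wall. Going to 64 bits multiplies that limit by a factor of 2^32.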

