William Cohen has been a developer of performance tools at Red Hat for over a decade and has worked on a number of the performance tools in Red Hat Enterprise Linux and Fedora such as OProfile, PAPI, SystemTap, and Dyninst.
Many developers would like to run their existing applications in a container with restricted capabilities to improve security. However, it may not be clear which capabilities the application uses because the code relies on libraries or other code developed elsewhere. The developer could run the application in an unrestricted container that allows all syscalls and capabilities, avoiding hard-to-diagnose failures caused by the application's use of forbidden capabilities or syscalls. Of course, this eliminates the...
Maybe you have so much memory in your computer that you never have to worry about it; then again, maybe you find that some C or C++ application is using more memory than expected. This could be preventing you from running as many containers on a single system as you expected, it could be causing performance bottlenecks, and it could even be forcing you to pay for more memory in your servers. You do some quick "back of the...
No one wants the hardware in their computer sitting idle; we all want to get as much useful work out of our hardware as possible. Mechanisms such as caches and branch prediction have been incorporated into processors to minimize the amount of processor idle time caused by memory accesses and changes in program flow; however, these mechanisms are not perfect. There are still times when the processor could be idle waiting for data or computational results to become available...
The classic 1984 movie Ghostbusters offered an important safety tip for all of us: "Don't cross the streams." - "Why not?" - "It would be bad." - "I’m fuzzy on the whole good/bad thing. What do you mean, 'bad'?" - "Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light." - "Right. That’s bad. Okay. All right. Important safety tip. Thanks..." Similarly, in computing...
In the traditional processor pipeline model, under ideal circumstances one new instruction enters the processor's pipeline and one instruction completes execution each cycle. Thus, in the best case the processor can have an average execution rate of one clock per instruction. A superscalar processor allows multiple unrelated instructions to start on the same clock cycle on separate hardware units or pipelines. Under ideal conditions a superscalar processor could have an average clocks per instruction (CPI) of less than one, meaning your 2GHz...
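The relationship between clock rate, CPI, and instruction throughput can be sketched with a few lines of arithmetic. This is only an illustration: the function name `instructions_per_second` and the CPI value of 0.5 for the superscalar case are assumptions chosen for the example, not measurements of any real processor.

```python
def instructions_per_second(clock_hz: float, cpi: float) -> float:
    """Average instruction throughput for a given clock rate and CPI."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return clock_hz / cpi


CLOCK_HZ = 2e9  # 2GHz clock, matching the example in the text

# CPI values are hypothetical, picked only to show the effect of CPI < 1.
examples = [
    ("ideal scalar pipeline", 1.0),
    ("hypothetical superscalar", 0.5),
]

for label, cpi in examples:
    rate = instructions_per_second(CLOCK_HZ, cpi)
    print(f"{label}: CPI={cpi} -> {rate:.1e} instructions/second")
```

A CPI below one means the average instruction rate exceeds the clock rate; real workloads rarely sustain the ideal figure.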
A pipelined processor requires a steady stream of instructions to be fed into the pipeline. Any delay in feeding instructions into the pipeline will hurt performance. For a sequence of instructions without branches it is relatively easy to determine the next instruction to feed into the pipeline, particularly for processors with fixed-size instructions. Variable-size instructions might complicate finding the start of each instruction, but it is still a contiguous, linear stream of bytes from memory. Keeping the processor pipeline...
The simple programmer's model of a processor executing machine language instructions is a loop of the following steps, with each step finished before moving on to the next step:
1. Fetch instruction
2. Decode instruction and fetch register operands
3. Execute arithmetic computation
4. Possible memory access (read or write)
5. Writeback results to register
As mentioned in the introductory blog article, even if the processor can get each step down to a single cycle, that would be 2.5ns (5*0.5ns) for a 2GHz (2x10^9...
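The 2.5ns figure follows directly from the step count and the clock period. The sketch below reproduces that arithmetic under the simple model's assumption of one cycle per step; the function name `unpipelined_instruction_time_ns` is hypothetical and exists only for this example.

```python
def unpipelined_instruction_time_ns(num_steps: int, clock_hz: float,
                                    cycles_per_step: int = 1) -> float:
    """Time in nanoseconds to run one instruction through every step in turn."""
    total_cycles = num_steps * cycles_per_step
    return total_cycles * 1e9 / clock_hz


STEPS = [
    "fetch instruction",
    "decode instruction and fetch register operands",
    "execute arithmetic computation",
    "possible memory access",
    "writeback results to register",
]
CLOCK_HZ = 2e9  # 2GHz, so each cycle takes 0.5ns

time_ns = unpipelined_instruction_time_ns(len(STEPS), CLOCK_HZ)
rate = 1e9 / time_ns  # instructions per second with no overlap between steps

print(f"{len(STEPS)} steps at {CLOCK_HZ:.0e} Hz: {time_ns} ns per instruction")
print(f"about {rate:.1e} instructions/second without pipelining")
```

Without any overlap between instructions, a 2GHz clock yields only about 4x10^8 instructions per second in this model.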
The simple programmer's model of a processor executing machine language instructions is a loop of the following steps, with each step finished before moving on to the next step:
1. Fetch instruction
2. Decode instruction and fetch register operands
3. Execute arithmetic computation
4. Possible memory access (read or write)
5. Writeback results to register
At a minimum it takes one processor clock cycle to do each step. However, for steps 1 and 4, accessing main memory may take much longer than one cycle. Modern processors typically...