Enhyper: 2012

Monday, May 14, 2012

PAPI - Performance API

The Performance API (PAPI) project specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count Events, occurrences of specific signals related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis including hand tuning, compiler optimization, debugging, benchmarking, monitoring and performance modeling. In addition, it is hoped that this information will prove useful in the development of new compilation technology as well as in steering architectural development towards alleviating commonly occurring bottlenecks in high performance computing.

Thursday, February 23, 2012

netmap - user space NIC ring buffer

netmap looks promising, but it's about to be blown away by the next iteration of Intel chips, which have DCA (Direct Cache Access): the ability to inject packets straight into L3 cache.

Friday, January 20, 2012

John Nolan on FPGA and GPU

This presentation gives a good overview of FPGA and GPU technology: http://www.infoq.com/interviews/nolan-hardware-acceleration

Monday, January 09, 2012

LEON3

The LEON3 is a synthesisable VHDL model of a 32-bit processor compliant with the SPARC V8 architecture. The model is highly configurable, and particularly suitable for system-on-a-chip (SOC) designs. The full source code is available under the GNU GPL license, allowing free and unlimited use for research and education. LEON3 is also available under a low-cost commercial license, allowing it to be used in any commercial application at a fraction of the cost of comparable IP cores. The LEON3 processor has the following features:

SPARC V8 instruction set with V8e extensions
Advanced 7-stage pipeline
Hardware multiply, divide and MAC units
High-performance, fully pipelined IEEE-754 FPU
Separate instruction and data cache (Harvard architecture) with snooping
Configurable caches: 1 - 4 ways, 1 - 256 Kbytes/way. Random, LRR or LRU replacement
Local instruction and data scratch pad RAM, 1 - 512 Kbytes
SPARC Reference MMU (SRMMU) with configurable TLB
AMBA-2.0 AHB bus interface
Advanced on-chip debug support with instruction and data trace buffer
Symmetric Multi-processor support (SMP)
Power-down mode and clock gating
Robust and fully synchronous single-edge clock design
Up to 125 MHz in FPGA and 400 MHz on 0.13 µm ASIC technologies
Fault-tolerant and SEU-proof version available for space applications
Extensively configurable
Large range of software tools: compilers, kernels, simulators and debug monitors
High Performance: 1.4 DMIPS/MHz, 1.8 CoreMark/MHz (gcc 4.1.2)
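
As a rough illustration of what the per-MHz figures in the list imply, the sketch below multiplies them by the quoted maximum clock rates. It assumes the per-MHz scores hold unchanged up to the maximum clock of a single core, which the feature list does not state, so treat the results as upper bounds. The names `peak_scores` and `MAX_CLOCK_MHZ` are made up for this example.

```python
# Figures quoted in the LEON3 feature list.
DMIPS_PER_MHZ = 1.4
COREMARK_PER_MHZ = 1.8
MAX_CLOCK_MHZ = {"FPGA": 125, "0.13 um ASIC": 400}


def peak_scores(clock_mhz):
    """Return (DMIPS, CoreMark) for one core at the given clock.

    Assumption: the per-MHz scores scale linearly with clock frequency.
    """
    return DMIPS_PER_MHZ * clock_mhz, COREMARK_PER_MHZ * clock_mhz


if __name__ == "__main__":
    for target, mhz in MAX_CLOCK_MHZ.items():
        dmips, coremark = peak_scores(mhz)
        print(f"{target}: about {dmips:.0f} DMIPS and "
              f"{coremark:.0f} CoreMark at {mhz} MHz")
```

Under that assumption a single core tops out at roughly 175 DMIPS on FPGA and 560 DMIPS on the 0.13 um ASIC process.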

The LEON3 processor is distributed as part of the GRLIB IP library, allowing simple integration into complex SOC designs. GRLIB also includes a configurable LEON3 multi-processor design, with up to 4 CPUs and a large range of on-chip peripheral blocks.
