Thursday, August 23, 2007

From Hot Interconnects

I found some time to attend the second day of Hot Interconnects at Stanford today. Here are some notes. Although some papers may not be directly related to the discussions in this blog, I decided to include them anyway.

Paper 1:

A Memory-Balanced Linear Pipeline Architecture for Trie-based IP Lookup on FPGA, USC

The trie data structure is commonly used for IP lookup, and trie traversal can be pipelined efficiently. However, current approaches suffer from unbalanced memory distribution across pipeline stages. Their solution balances the memory required at each stage, which improves pipelined trie lookup on FPGA.
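
For readers unfamiliar with trie-based lookup, here is a minimal software-only sketch of longest-prefix matching with a binary trie. The paper's actual contribution, balancing memory across FPGA pipeline stages, is not modeled here; the code is just a plain illustration of the data structure.

/* Minimal binary-trie longest-prefix-match sketch for IPv4 lookups. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct node {
    struct node *child[2];
    int next_hop;                 /* -1 means no prefix ends at this node */
};

static struct node *new_node(void) {
    struct node *n = calloc(1, sizeof *n);
    n->next_hop = -1;
    return n;
}

/* Insert a prefix (address + length) with its next hop. */
static void insert(struct node *root, uint32_t prefix, int len, int hop) {
    struct node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = new_node();
        n = n->child[bit];
    }
    n->next_hop = hop;
}

/* Walk the trie bit by bit, remembering the last next hop seen
 * (that is the longest matching prefix). */
static int lookup(struct node *root, uint32_t addr) {
    struct node *n = root;
    int best = -1;
    for (int i = 0; i < 32 && n; i++) {
        if (n->next_hop >= 0) best = n->next_hop;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    if (n && n->next_hop >= 0) best = n->next_hop;
    return best;
}

int main(void) {
    struct node *root = new_node();
    insert(root, 0x0A000000, 8, 1);   /* 10.0.0.0/8  -> hop 1 */
    insert(root, 0x0A010000, 16, 2);  /* 10.1.0.0/16 -> hop 2 */
    printf("10.1.2.3 -> hop %d\n", lookup(root, 0x0A010203)); /* hop 2 */
    printf("10.9.9.9 -> hop %d\n", lookup(root, 0x0A090909)); /* hop 1 */
    return 0;
}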

Paper 2:
Building a RCP (Rate Control Protocol) Test Network, Stanford
They built an experimental network supporting RCP, using a NetFPGA implementation and modifying Linux by adding a shim layer in the network stack to support RCP.

Paper 3:
ElephantTrap: A low cost device for identifying large flows, Stanford
ElephantTrap identifies and measures large flows in a network, using random sampling of packets and a selection algorithm to single out the large flows.
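
As a rough illustration of the sampling idea (the actual ElephantTrap design, including its replacement policy and hardware constraints, may well differ), here is a toy sketch that keeps a small table of candidate large flows fed by randomly sampled packets. The table size and sampling probability below are made-up values.

/* Toy sketch of sampling-based elephant-flow detection. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define TABLE_SIZE  8        /* small cache of candidate large flows   */
#define SAMPLE_PROB 0.05     /* per-packet sampling probability        */

struct entry {
    uint32_t flow_id;        /* e.g. a hash of the 5-tuple             */
    uint64_t bytes;
    int      valid;
};

static struct entry table[TABLE_SIZE];

/* Called for each sampled packet: bump the flow's counter, or evict
 * the smallest candidate if the flow is not tracked yet. */
static void sample_packet(uint32_t flow_id, uint32_t len) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (table[i].valid && table[i].flow_id == flow_id) {
            table[i].bytes += len;
            return;
        }
    }
    int victim = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!table[i].valid) { victim = i; break; }
        if (table[i].bytes < table[victim].bytes) victim = i;
    }
    table[victim] = (struct entry){ .flow_id = flow_id,
                                    .bytes = len, .valid = 1 };
}

int main(void) {
    srand(42);
    for (int pkt = 0; pkt < 100000; pkt++) {
        /* Flow 1 carries roughly half the packets; flows 2..1000 the rest. */
        uint32_t flow = (rand() % 2) ? 1 : (uint32_t)(2 + rand() % 999);
        if ((double)rand() / RAND_MAX < SAMPLE_PROB)
            sample_packet(flow, 1500);
    }
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].valid)
            printf("flow %u: ~%llu sampled bytes\n", table[i].flow_id,
                   (unsigned long long)table[i].bytes);
    return 0;
}

With these numbers, flow 1 ends up with a counter far larger than any of the mice that happen to occupy the remaining slots, which is the behavior a trap for elephant flows relies on.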

Intel invited talk titled On-Die Interconnect and Other Challenges for Chip-Level Multi-Processing

Chips will become increasingly multicore because performance tapers off as more transistors are added to a single core. However, the gains from parallelism are lost to communication overhead. The speaker advocated an on-chip high-speed interconnect that ties cache, memory, network I/O, etc. to the processors, and favored a ring-based topology for that interconnect. He also talked briefly about Intel's 80-core terascale research processor.

Paper 4:
An Analysis of 10-Gigabit Ethernet Protocol Stacks in Multicore Environments, Virginia Tech

They study how protocols interact with multicore architectures, looking at the TCP/IP and iWARP protocols and the interplay between the application, the TCP/IP stack, and network I/O. They show that scheduling the application on a different core from the one doing interrupt processing (but on the same socket) gives the best performance, since the CPU load is distributed across cores and cache misses are reduced. They did not consider MSI/MSI-X or multithreaded applications.
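
As a rough illustration of the scheduling point, here is a small Linux sketch that pins a process to a particular core with sched_setaffinity; the core servicing the NIC interrupt is set separately on Linux via /proc/irq/<n>/smp_affinity. The core numbers are illustrative, not the paper's configuration.

/* Pin this process to core 1, leaving (say) core 0 for NIC interrupts. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);                      /* run the application on core 1 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core 1; run the network workload from here\n");
    return 0;
}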

Paper 5:
Assessing the Ability of Computation/Communication Overlap and Communication Progress in Modern Interconnects, Queen's University

They study the interplay between computation and I/O with the goal of minimizing stall cycles, evaluating various MPI-based applications over different interconnects: InfiniBand, Myricom, and 10 Gigabit Ethernet.
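
For context, overlap is typically attempted with non-blocking MPI calls along these lines; whether the interconnect actually makes progress on the transfer during the compute phase is exactly the kind of question the paper measures. This is a generic sketch, not code from the paper.

/* Computation/communication overlap with non-blocking MPI: post the
 * transfers, compute, then wait. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    static double sendbuf[N], recvbuf[N], work[N];
    int rank, size;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;          /* ring: send right, receive left */
    int prev = (rank + size - 1) % size;

    MPI_Irecv(recvbuf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Independent computation that could, in principle, overlap with the
     * transfer; how much it actually overlaps depends on the interconnect
     * and the MPI library's progress engine. */
    for (int i = 0; i < N; i++)
        work[i] = i * 0.5;

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0)
        printf("exchange complete, work[N-1] = %f\n", work[N - 1]);
    MPI_Finalize();
    return 0;
}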

Paper 6:
Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms, Ohio State Univ

They demonstrate how ConnectX (the latest generation of Mellanox InfiniBand technology) performs on multicore architectures.

Paper 7:
Memory Management Strategies for Data Serving with RDMA, Ohio Supercomputer Center

They show that virtual-to-physical memory translation is a considerable overhead in high-speed networks, and they demonstrate how to optimize memory registration to achieve better RDMA performance.
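
To show what "memory registration" refers to, here is a minimal verbs sketch that registers a buffer once with ibv_reg_mr and reuses it for subsequent transfers; registering and deregistering per request instead is the kind of cost their strategies try to avoid. This is a generic illustration that assumes an InfiniBand HCA is present, with minimal error handling, not the paper's code. Build with: gcc rdma_reg.c -libverbs

#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (4 * 1024 * 1024)

int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) { fprintf(stderr, "device setup failed\n"); return 1; }
    void *buf = malloc(BUF_SIZE);

    /* Register once: the verbs library pins the pages and sets up the
     * HCA's virtual-to-physical translation for this region. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    printf("registered %d bytes, lkey=0x%x rkey=0x%x\n",
           BUF_SIZE, mr->lkey, mr->rkey);

    /* ... post RDMA operations that reuse mr->lkey / mr->rkey here ... */

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}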

Paper 8:
Reducing the Impact of the Memory Wall for I/O Using Cache Injection, University of New Mexico

Cache injection is a technique by which data is transferred directly from the device to the L2/L3 cache. They show the tradeoff between cache injection and prefetching and propose an algorithm for efficient cache injection for network I/O.
