• What are the different set ups and combinations for HFT architecture?
I have experience of four different architectures:
- traditional monolithic event queue and broadcast
- reflective memory, distributed processing
- DMA, shared memory, multi-process and multicast
- trading engine on a card
I'd be interested in other approaches
• Is massive multicore or specialist silicon (FPGA, GPU etc.) the next frontier?
Multicore is attractive for a multi-strategy play, but it requires careful design to avoid data races and performance issues like TLB misses and memory barriers. FPGA has always been attractive for dealing with FIX and conversion of ASCII to binary (i.e. parsing). GPU has promise in the equities world where dynamic pricing and portfolio analysis are required. What's been widely overlooked is DSP. There are some very interesting things you can do with DSP.
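To make the parsing point concrete, here is a minimal software sketch of the ASCII-to-binary transform an FPGA performs per byte at line rate: converting an ASCII decimal price field into a scaled integer. The function name and the tick_scale of 10,000 are illustrative assumptions, not any particular feed's convention, and it assumes non-negative prices.

```python
def parse_price(field: bytes, tick_scale: int = 10_000) -> int:
    """Convert an ASCII decimal price field (e.g. b"101.25") into an
    integer scaled by tick_scale. An FPGA does this conversion in
    hardware as bytes arrive; this is only a software illustration
    of the same transform. Assumes a non-negative price."""
    whole, _, frac = field.partition(b".")
    value = int(whole) * tick_scale
    if frac:
        # scale the fractional digits up to tick_scale precision
        value += int(frac) * tick_scale // (10 ** len(frac))
    return value

print(parse_price(b"101.25"))  # 1012500
```

Working in scaled integers rather than floats also sidesteps rounding issues downstream in the pricing path.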
• Which solutions provide maximum scalability, configuration, and customisation to ensure continual upgrade and development of your systems to survive in a tough technology race?
It has to be a message-orientated, pure layer-2 multicast architecture with software routing.
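A minimal sketch of what "software routing" over multicast can mean: publishers deterministically map a topic to one of a fixed set of multicast channels, so there is no broker and receivers simply join the groups they need, leaving the switch to filter the rest in hardware. The group addresses, port numbers, and choice of CRC32 here are all illustrative assumptions.

```python
import zlib

# Hypothetical channel plan: eight multicast groups on a flat layer-2
# segment. Addresses and ports are illustrative only.
CHANNELS = [("239.1.1.%d" % i, 30001 + i) for i in range(8)]

def route(topic: str) -> tuple:
    """Deterministically map a topic (e.g. an instrument symbol) to a
    multicast (group, port) channel. Every publisher computes the same
    mapping, so routing needs no central broker: receivers join only
    the groups for the topics they care about."""
    idx = zlib.crc32(topic.encode()) % len(CHANNELS)
    return CHANNELS[idx]
```

Because the mapping is a pure function of the topic, adding a subscriber never touches the publishers, which is what makes continual upgrade of individual components practical.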
• Managing microbursts: the art and engineering of data capacity management
Again, a high-performance messaging system combined with high-resolution time and accurate traffic analysis is a must. Combine this with knowledge of the underlying network hardware: utilising multiple hardware queues on the switches, applying QoS judiciously, and correctly configuring the messaging system to use multiple channels all yield good results.
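The point about high-resolution time matters because average rates hide microbursts entirely. A minimal sketch of the analysis: bucket packet arrival timestamps into fine windows and find the peak, which is the load a switch buffer or NIC queue actually has to absorb. The 1 ms window is an illustrative choice.

```python
from collections import Counter

def peak_rate(timestamps_ns, window_ns=1_000_000):
    """Bucket packet arrival timestamps (nanoseconds) into fixed-size
    windows and return (messages in the busiest window, that window's
    start time). With coarse timestamps this peak is invisible: a
    one-second average smears the burst away."""
    buckets = Counter(ts // window_ns for ts in timestamps_ns)
    window, count = max(buckets.items(), key=lambda kv: kv[1])
    return count, window * window_ns
```

For example, 500 messages spread over a second and 500 more packed into half a millisecond look identical as a per-second average, but the bucketed view shows a 500-message burst in one window.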
• Taming the data torrent: conflating direct and aggregate market data feeds
Here's where an FPGA-enabled network card helps greatly: coalescing multiple keystations or A and B feeds, hashing messages to drop duplicates, and translating ASCII to binary. Combine this with multicast for efficient data transport.
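The A/B coalescing step reduces to line arbitration: take whichever copy of each sequence number arrives first and drop the duplicate from the slower feed. A minimal software sketch of the logic an FPGA NIC applies before the packet ever reaches the host (the sequence-numbered message format is an assumption; real feeds also need gap and retransmit handling):

```python
def arbitrate(merged):
    """Line-arbitrate redundant A and B feeds. `merged` is the
    interleaved (seq, payload) stream as received on both lines;
    emit each sequence number once, from whichever feed won."""
    last = -1
    for seq, payload in merged:
        if seq <= last:
            continue  # already emitted from the other feed: drop duplicate
        last = seq
        yield seq, payload
```

Conflation of aggregate feeds works the same way one level up, keying on instrument instead of sequence number.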
• Is asynchronous processing inevitable? What are the implications?
This has been used in HFT for many years and is necessary for effective parallelisation. One pattern I use is an "asynchronous n-slot put-take connector", a way of joining different processes that allows each process to utilise its full timeslice. The implication of not using it is latency...
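The author's actual connector isn't specified, but a minimal reading of the pattern is a bounded put/take buffer joining two stages: the n slots let the producer keep running while the consumer is descheduled, so each side burns its full timeslice instead of handing off on every message. A plain stdlib sketch under that assumption:

```python
import queue
import threading

def run_pipeline(items, n_slots=64):
    """Join a producer and a consumer through an n-slot connector.
    put() blocks only when all n slots are full; get() blocks only
    when the connector is empty, so neither side yields its
    timeslice on every message."""
    connector = queue.Queue(maxsize=n_slots)
    results = []

    def consumer():
        while True:
            item = connector.get()      # take
            if item is None:
                break                   # sentinel: end of stream
            results.append(item * 2)    # stand-in for real processing

    t = threading.Thread(target=consumer)
    t.start()
    for item in items:
        connector.put(item)             # put
    connector.put(None)
    t.join()
    return results
```

Between processes rather than threads, the same shape is typically built over a shared-memory ring buffer, but the put/take semantics are identical.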
• What is next for complex event processing and tick databases? Will they be able to keep up?
CEP is a necessary evil; however, any strategy that uses it is almost impossible to debug.
As for tick databases, I fail to see why people store stuff in databases at all: all our market data is captured on an electrically connected secondary machine using HDF5 in date-ordered directories. It's then shipped up to a disk array overnight. Backtesting generally uses only three months' worth of data.
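A sketch of what a date-ordered capture layout can look like (the path scheme and function are illustrative assumptions; the HDF5 file internals are omitted). The point is that overnight shipping to the disk array and a rolling three-month backtest window become plain directory operations, with no database in the hot path.

```python
from datetime import datetime, timezone
from pathlib import Path

def capture_path(root: str, symbol: str, ts_ns: int) -> Path:
    """Place a capture file in a date-ordered tree, e.g.
    root/2024-01-15/VOD.L.h5. Shipping a day's data or pruning
    anything older than the backtest window is then just moving
    or deleting one directory per day."""
    day = datetime.fromtimestamp(ts_ns / 1e9, tz=timezone.utc).date()
    return Path(root) / day.isoformat() / ("%s.h5" % symbol)
```

Appending ticks in arrival order keeps writes sequential, which is exactly what both the capture machine and the overnight copy want.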
• Surviving the technical glitch at high speed: designing robust architectures
The ability to run multiple strategies and services on a multicast message bus means recovery
from failure is straightforward.