Infrastructure Management: Reducing Costs, Improving Performance, Professor Roger Woods, Queen's University Belfast
Prof Woods gave an impassioned talk about a tool he has developed that takes C++ code, lets you navigate it, and identifies subsystems that can be targeted to run on hardware (or an emulation of hardware).
- He worked on the JP Morgan collaboration with Maxeler and was bullish about the technology.
- Two years from pilot to production.
- Developed a tool that identifies sections of code suitable for FPGA acceleration.
- Key issue: programming FPGA bitstreams (http://en.wikipedia.org/wiki/Bitstream) - this took six months.
- C++ is translated into C (manually) before being cross-compiled into Java, which is what the Maxeler compiler requires.
- This is to remove the C++ abstraction, which "kills parallelisation" (see slides); a sketch of what de-abstraction means follows this list.
- The focus was a hardware FFT; all other logic stayed in software, with comms via the FPGA bitstream.
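To make the de-abstraction point concrete, here is a minimal sketch (my own illustration, not the JPM/Maxeler code) contrasting a virtual-dispatch portfolio loop with a flattened, kernel-style loop of the kind a hardware compiler can pipeline:

```cpp
// Illustrative sketch only (not the JPM/Maxeler code): why C++ abstraction
// "kills parallelisation", and what a de-abstracted, kernel-style loop
// looks like once the indirection is stripped away.
#include <cstddef>
#include <vector>

// Abstracted style: each instrument hides its maths behind a virtual call.
// One indirect call per element plus heap-scattered objects defeats
// vectorisation and pipelining.
struct Instrument {
    virtual double value(double spot) const = 0;
    virtual ~Instrument() = default;
};

double portfolio_value(const std::vector<const Instrument*>& book, double spot) {
    double total = 0.0;
    for (const Instrument* inst : book)
        total += inst->value(spot);   // indirect call per instrument
    return total;
}

// De-abstracted "kernel" style: one flat loop over contiguous arrays with a
// fixed dataflow -- the shape a hardware compiler can map onto an FPGA pipeline.
void value_kernel(const double* strike, const double* spot,
                  double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = spot[i] - strike[i];   // placeholder payoff maths
}
```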
In summary:
- Ideal for risk calculation and Monte Carlo where the algorithm does not change (see the sketch after this list).
- C++ legacy code does not parallelise easily and is not a candidate for FPGA
- Three year dev cycle.
- Complex, manual process
- JPM owns 20% of Maxeler.
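As an illustration of why fixed-algorithm Monte Carlo suits this approach, here is a minimal, assumed example (a plain European call pricer, not the bank's model): the inner loop is data-parallel and its structure never changes, only the inputs do.

```cpp
// Minimal Monte Carlo sketch (assumed example, not the production model).
// The loop body is a fixed dataflow repeated over many independent paths,
// which is what makes this style of calculation a good FPGA candidate.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>

double mc_european_call(double spot, double strike, double rate,
                        double vol, double maturity, unsigned long paths) {
    std::mt19937_64 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double payoff_sum = 0.0;
    for (unsigned long i = 0; i < paths; ++i) {
        double z = gauss(rng);
        double st = spot * std::exp((rate - 0.5 * vol * vol) * maturity
                                    + vol * std::sqrt(maturity) * z);
        payoff_sum += std::max(st - strike, 0.0);
    }
    return std::exp(-rate * maturity) * payoff_sum / paths;
}

int main() {
    std::cout << mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 1000000) << "\n";
}
```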
The session continued with a panel hosted by Chris Skinner.
Panel: The Parallel Paradigm Shift: are we about to enter a new chapter in the algorithmic arms race?
Moderator: Chris Skinner. Panel: Prof Woods; Steven Weston, Global Head of Analytics, JPM; Andre Nedceloux, Sales, Excelian.
- The FPGA plant needs to be kept hot to achieve the best latency; to keep the FPGAs busy you need a few regular host cores loading work onto them.
- Programming/debugging directly in VHDL is ‘worse than a nightmare’, don’t try.
- Isolate the worst-performing pieces (Amdahl's law), de-abstract them and place them on the FPGA; they call each of the isolated units a 'kernel' (see the Amdahl's-law sketch after this list).
- Compile times are high when the Maxeler compiler outputs VHDL: around 4 hours for a model on a 4-core box.
- Iterative model for optimisation and implementation. They improved both the mathematics in the models and the implementation on the FPGA – i.e. consider it not just a programming problem but also a maths-modelling one.
- They use Python to manage the interaction with the models (e.g. pulling reports).
- Initially run a model on the FPGA hosts, then incrementally update it through the day as market data or announcements arrive.
- There is no separate report-running phase – it is included in the model run and the report is kept in memory. Data is only written out to a database at night; if it is destroyed it can be re-created.
- Low latency is no longer a competitive advantage but a status-quo service for investment banking.
- Requires specialist (not general/outsourced) programmers who understand both hardware and algorithms and work alongside the business.
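For the Amdahl's-law point above, here is a small worked sketch (illustrative numbers, not figures from the panel) showing why only the worst-performing, isolatable pieces are worth moving onto the FPGA:

```cpp
// Amdahl's-law sketch (illustrative numbers only): overall speedup from
// offloading a fraction p of the runtime to an FPGA "kernel" that runs
// s times faster. The serial remainder (1 - p) caps the gain, which is
// why the isolated kernels have to be the genuinely dominant hotspots.
#include <iostream>

double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main() {
    // e.g. accelerating 80% of the runtime 50x gives only ~4.6x overall
    std::cout << amdahl_speedup(0.80, 50.0) << "\n";
}
```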
Panel
How low can you go? Ultra-low-latency trading
Moderator: David Berry
Members: Jogi Narain, CTO, FGS Capital LLP; Benjamin Stopford, Architect, RBS; Chris Donan, Head of Electronic Trading, Barcap.
This was a well-run panel with some good insights, from Chris Donan in particular:
- Run-of-the-mill programmers don't understand the path from the network to the NIC to the OS stack to the application, or the underlying hardware operations.
- Small teams of experienced engineers produce the best results
- Don't develop VHDL skills in house - use external resources.
- Latency gains correlate with profitability.
- FPGA is good for market data (i.e. a fixed problem) and for risk.
- Software parallelism is the future.