Thursday, February 23, 2012
netmap - user space NIC ring buffer
netmap looks promising, but it's about to be blown away by the ability to inject packets directly into the L3 cache in the next iteration of Intel chips, which support DCA (Direct Cache Access).
Friday, January 20, 2012
John Nolan on FPGA and GPU
A good overview of FPGA and GPU technology in this presentation http://www.infoq.com/interviews/nolan-hardware-acceleration
Monday, January 09, 2012
LEON3
The LEON3 is a synthesisable VHDL model of a 32-bit processor compliant with the
SPARC V8 architecture. The
model is highly configurable, and particularly suitable for system-on-a-chip
(SOC) designs. The full source code is available under the GNU GPL license, allowing free and
unlimited use for research and education. LEON3 is also available under a
low-cost commercial license, allowing it to be used in any commercial
application at a fraction of the cost of comparable IP cores. The LEON3
processor has the following features:
- SPARC V8 instruction set with V8e extensions
- Advanced 7-stage pipeline
- Hardware multiply, divide and MAC units
- High-performance, fully pipelined IEEE-754 FPU
- Separate instruction and data cache (Harvard architecture) with snooping
- Configurable caches: 1 - 4 ways, 1 - 256 kbytes/way. Random, LRR or LRU replacement
- Local instruction and data scratch pad RAM, 1 - 512 Kbytes
- SPARC Reference MMU (SRMMU) with configurable TLB
- AMBA-2.0 AHB bus interface
- Advanced on-chip debug support with instruction and data trace buffer
- Symmetric Multi-processor support (SMP)
- Power-down mode and clock gating
- Robust and fully synchronous single-edge clock design
- Up to 125 MHz in FPGA and 400 MHz on 0.13 um ASIC technologies
- Fault-tolerant and SEU-proof version available for space applications
- Extensively configurable
- Large range of software tools: compilers, kernels, simulators and debug monitors
- High Performance: 1.4 DMIPS/MHz, 1.8 CoreMark/MHz (gcc 4.1.2)
The LEON3 processor is distributed as part of the GRLIB IP library, allowing simple integration into complex SOC designs. GRLIB also includes a configurable LEON3 multi-processor design, with up to 4 CPUs and a large range of on-chip peripheral blocks.
Tuesday, November 22, 2011
Waters European Trading Architecture Summit 2011
Some feedback from this event which I attended today.
Event http://events.waterstechnology.com/etas
Infrastructure Management: Reducing costs, Improving performance, Professor Roger Woods, Queen's University Belfast
Prof Woods gave an impassioned talk about a tool he has developed which takes C++ code, lets you navigate it, and identifies subsystems you can target to run on hardware or an emulation of hardware.
- He worked on the JP Morgan collaboration with Maxeler and was bullish about the technology.
- Two years from pilot to production.
- Developed a tool that allows identification of code sections that are suitable for FPGA
- Key issue: programming FPGA bitstreams (http://en.wikipedia.org/wiki/Bitstream) - took six months
- C++ is translated into C (manually) before being cross-compiled into Java, which is what the Maxeler compiler requires.
- This is to remove C++ abstraction, which "kills parallelisation" (see slides)
- Focus was hardware FFT - all other logic in software - comms via FPGA bitstream
In summary:
- ideal for risk calculation and Monte Carlo where the algorithm does not change.
- C++ legacy code does not parallelise easily and is not a candidate for FPGA
- Three year dev cycle.
- Complex, manual process
- JPM own 20% of Maxeler
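The "algorithm does not change" point is worth illustrating: a Monte Carlo pricer is a tight, data-parallel loop whose body is fixed, which is exactly the shape of kernel that maps well onto an FPGA pipeline. A minimal software sketch — the instrument, parameters and path count are illustrative, not from the talk:

```python
import math
import random

def mc_european_call(spot, strike, rate, vol, expiry, paths, seed=42):
    # Monte Carlo price of a European call under geometric Brownian
    # motion. The loop body never changes and is independent across
    # paths - the kind of fixed kernel suited to hardware offload.
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * expiry
    diffusion = vol * math.sqrt(expiry)
    total = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)
        terminal = spot * math.exp(drift + diffusion * z)
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * expiry) * total / paths

price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 50_000)
```

Each path is a straight-line computation on one random draw, so the whole loop unrolls into a pipeline with no data-dependent control flow — which is why the fixed-algorithm caveat matters.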
Resources
- http://www.ecit.qub.ac.uk/Card/?name=r.woods
- eFutures Portal - http://efutures.ac.uk/
This led into a panel hosted by Chris Skinner.
Panel: The Parallel Paradigm Shift: are we about to enter a new chapter in the algorithmic arms race
Moderator: Chris Skinner. Panel: Prof Woods; Steven Weston, Global Head of Analytics, JPM; Andre Nedceloux, Sales guy, Excelian
- The FPGA plant needs to be kept hot to achieve the best latency; to keep the FPGAs busy you need a few regular host cores loading work onto them.
- Programming/debugging directly in VHDL is ‘worse than a nightmare’, don’t try.
- Isolate the worst performing pieces (Amdahl's law), de-abstract and place on FPGA; they call each of the isolated units a ‘kernel’.
- Compile times are high for the Maxeler compiler to output VHDL: 4 hours for a model on a 4-core box.
- Iterative model for optimisation and implementation. They improved both the mathematics in the models and the implementation onto FPGA – ie, consider it not just a programming problem, but also a maths modeling one.
- They use Python to manage the interaction with the models (e.g. pulling reports)
- Initially run a model on the FPGA hosts and then incrementally update it through the day - when market data or announcements occur.
- No separate report running phase – it is included in the model run and report is kept in memory. Data only written out to a database at night time, if it is destroyed then it can be re-created.
- Low-latency is no longer a competitive advantage but now a status quo service for investment banking.
- Requires specialist (not general or outsourced) programmers who understand hardware and algorithms, working alongside the business.
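The Amdahl's law point above is easy to quantify: overall speedup is capped by the fraction of runtime you actually offload, which is why isolating the worst performing pieces comes first. A quick sketch (the fractions and speedups are illustrative):

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    # Overall speedup when only `parallel_fraction` of runtime is
    # offloaded to a kernel running `kernel_speedup` times faster;
    # the remaining serial fraction runs at original speed.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)

# Even a 100x FPGA kernel covering 90% of the runtime yields under
# 10x overall - the residual 10% of serial code dominates.
overall = amdahl_speedup(0.9, 100.0)
```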
Panel
How low can you go? Ultra-low-latency trading
Moderator: David Berry
Members: Jogi Narain, CTO, FGS Capital LLP, Benjamin Stopford, Architect, RBS. Chris Donan, Head of Electronic Trading - Barcap.
This was a well run panel with some good insights from Chris Donan in particular:
- Stock programmers don't understand the full path from network to NIC to stack to application, or the underlying hardware operations
- Small teams of experienced engineers produce the best results
- Don't develop VHDL skills in house - use external resources.
- Latency gains correlate to profitability
- FPGA is good for market data (ie fixed problem) and risk
- Software parallelism is the future.
Friday, June 10, 2011
Thomson Reuters Expert Session
I was kindly invited to give an expert session by Thomson Reuters when I was between assignments. I gave an hour-long presentation which has been edited into four sessions:
On the FX Business Model http://thomsonreuters.na4.
On FX and the OTC Market http://thomsonreuters.na4.
On High Frequency Trading http://thomsonreuters.na4.
On FX Strategies http://thomsonreuters.na4.
I'm now gainfully employed so back to silent running for me.
Tuesday, March 15, 2011
Global Connectivity Vendor Selection
TeleGeography produce a topological map of submarine cables which is of interest to HFT firms. There are many aspects to consider when procuring global connectivity.
Before you choose a supplier, perhaps some important questions to ask are:
- What are the fibre miles versus physical miles of long haul links
- How do you measure your quoted latency
- What is your underlying network technology and how is it provisioned
- What network equipment do you use for your core network
- How many hardware queues on the above equipment do you allocate to our service
- When using third parties, what is their underlying technology and distribution methodology
- What hand-off equipment do you use
- What can the above deliver in terms of serialisation capability
- What traffic shaping do you perform
- How often do you sample CDR violation
- Do you report CDR violation
- Do you offer QoS
- Do you offer IGMP snooping
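On the fibre-miles versus quoted-latency questions: light in single-mode fibre travels at roughly c/1.47, so route length alone sets a hard floor on latency, and a vendor's fibre miles (not the great-circle miles) are what bound it. A back-of-envelope sketch — the refractive index and route length here are illustrative assumptions:

```python
C_KM_PER_MS = 299_792.458 / 1000.0   # speed of light in vacuum, km per ms
FIBRE_INDEX = 1.4675                 # typical single-mode fibre (assumed)

def one_way_floor_ms(fibre_route_km):
    # Lower bound on one-way latency from fibre route length alone;
    # excludes serialisation, regeneration and equipment hops, which
    # is exactly why "how do you measure your quoted latency" matters.
    return fibre_route_km / (C_KM_PER_MS / FIBRE_INDEX)

lat = one_way_floor_ms(5600.0)  # illustrative transatlantic route length
```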
Monday, March 07, 2011
Cavium Octeon II
Met with Barry, CTO of Tervela, on Friday. He recommended taking a look at the Cavium Octeon II NPU card, which has 32 cores, a C-like interface and a shiny new architecture.
Wednesday, March 02, 2011
FTQ for platform jitter analysis
FTQ (Fixed Time Quantum) is a useful tool dug up by Bruce which we've started using for jitter analysis, and it's showing up some surprising results. The idea is simple: count how many increments of a variable can be performed in a fixed time quantum.
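For illustration, the idea fits in a few lines. This is a sketch of the principle only, not the real t_ftq benchmark (which is C, and far more careful about timer resolution and overhead); the quantum and sample count are illustrative:

```python
import time

def ftq_sample(quantum_s=0.001, samples=50):
    # For each fixed time quantum, count how many increments complete.
    # On a quiet core the counts are nearly constant; variation between
    # quanta is a crude proxy for platform jitter.
    counts = []
    for _ in range(samples):
        end = time.perf_counter() + quantum_s
        n = 0
        while time.perf_counter() < end:
            n += 1
        counts.append(n)
    return counts

counts = ftq_sample()
spread = max(counts) - min(counts)  # max-min range across quanta
```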
I started by running the threaded version on our 8 core, dual cpu server for approximately 3 minutes using the following command:
t_ftq -t 8 -n 450000
Using Octave, I calculated the variance (42133) and standard deviation (2485.1). Plotting this gave this overpopulated graph:
Next I thought I'd run it over seven cores and got a smoother profile. Graphs are fine and dandy, but you need to look at the data and the percentiles. So as a first pass, I wrote this nifty script:
#!/bin/bash
FACTOR=2
CORES="`grep -c processor /proc/cpuinfo`"
THREADS=`echo "$CORES * $FACTOR" | bc`
while [ "$THREADS" -gt 1 ]
do
./t_ftq -t $THREADS
for FILE in ftq*counts.dat
do
awk 'BEGIN {
minimum = 4500000
maximum = 0
average = 0
}
{
if($1 < minimum)
{
minimum = $1
}
if($1 > maximum)
{
maximum = $1
}
average += $1
}
END {
printf("THREADS=%d min=%d:max=%d:avg=%d:var=%d\n", '"$THREADS"', minimum, maximum, average/NR, maximum-minimum)
}' $FILE
done
THREADS="`expr $THREADS - 1`"
rm -f *.dat
echo
done
exit 0
Which produced this output when run with a loading factor of 1:
THREADS=8 min=19080:max=43090:avg=41247:var=24010
THREADS=8 min=8401:max=43090:avg=41971:var=34689
THREADS=8 min=8401:max=43090:avg=42596:var=34689
THREADS=8 min=8956:max=43090:avg=42453:var=34134
THREADS=8 min=21515:max=43090:avg=42326:var=21575
THREADS=8 min=11157:max=43090:avg=42548:var=31933
THREADS=8 min=6351:max=43090:avg=42619:var=36739
THREADS=8 min=6351:max=43090:avg=42381:var=36739
THREADS=7 min=20666:max=43090:avg=42217:var=22424
THREADS=7 min=7591:max=43090:avg=42264:var=35499
THREADS=7 min=7591:max=43090:avg=42487:var=35499
THREADS=7 min=25263:max=43090:avg=42566:var=17827
THREADS=7 min=20513:max=43090:avg=42603:var=22577
THREADS=7 min=15328:max=43090:avg=42528:var=27762
THREADS=7 min=9555:max=43090:avg=41859:var=33535
THREADS=6 min=9324:max=43090:avg=40872:var=33766
THREADS=6 min=10144:max=43090:avg=41454:var=32946
THREADS=6 min=29223:max=43090:avg=42749:var=13867
THREADS=6 min=25239:max=43090:avg=42590:var=17851
THREADS=6 min=20013:max=43090:avg=42357:var=23077
THREADS=6 min=4612:max=43090:avg=42114:var=38478
THREADS=5 min=457:max=43090:avg=42351:var=42633
THREADS=5 min=457:max=43090:avg=41645:var=42633
THREADS=5 min=15064:max=43090:avg=41190:var=28026
THREADS=5 min=16821:max=43090:avg=41614:var=26269
THREADS=5 min=15204:max=43090:avg=41272:var=27886
THREADS=4 min=21561:max=43090:avg=42436:var=21529
THREADS=4 min=23847:max=43090:avg=42158:var=19243
THREADS=4 min=5588:max=43090:avg=41406:var=37502
THREADS=4 min=5588:max=43090:avg=41282:var=37502
THREADS=3 min=26739:max=43090:avg=42303:var=16351
THREADS=3 min=19834:max=43090:avg=42021:var=23256
THREADS=3 min=12879:max=43090:avg=41332:var=30211
THREADS=2 min=10438:max=43090:avg=41910:var=32652
THREADS=2 min=10438:max=43090:avg=41816:var=32652
This is quite surprising: even on the two-thread run there are surprisingly low minimums. In the 5-, 7- and 8-thread runs, two adjacent threads report the same minima/maxima, which is weird. So with FACTOR set to 2, this is what we get:
THREADS=16 min=23:max=43090:avg=39844:var=43067
THREADS=16 min=22:max=43090:avg=41978:var=43068
THREADS=16 min=9:max=43090:avg=39131:var=43081
THREADS=16 min=9:max=43090:avg=37050:var=43081
THREADS=16 min=17:max=43090:avg=39012:var=43073
THREADS=16 min=17:max=43090:avg=40153:var=43073
THREADS=16 min=4:max=43090:avg=41036:var=43086
THREADS=16 min=23:max=43090:avg=40206:var=43067
THREADS=16 min=32:max=43090:avg=40174:var=43058
THREADS=16 min=68:max=43090:avg=40551:var=43022
THREADS=16 min=23:max=43090:avg=40927:var=43067
THREADS=16 min=23:max=43090:avg=40747:var=43067
THREADS=16 min=28:max=43090:avg=40886:var=43062
THREADS=16 min=8:max=43090:avg=39380:var=43082
THREADS=16 min=8:max=43090:avg=36551:var=43082
THREADS=16 min=22:max=43090:avg=38743:var=43068
THREADS=15 min=139:max=43090:avg=39622:var=42951
THREADS=15 min=12:max=43090:avg=40690:var=43078
THREADS=15 min=64:max=43090:avg=39721:var=43026
THREADS=15 min=3:max=43090:avg=39207:var=43087
THREADS=15 min=3:max=43090:avg=40143:var=43087
THREADS=15 min=3213:max=43090:avg=41611:var=39877
THREADS=15 min=18:max=43090:avg=39399:var=43072
THREADS=15 min=18:max=43090:avg=39894:var=43072
THREADS=15 min=3:max=43090:avg=39579:var=43087
THREADS=15 min=3:max=43090:avg=39027:var=43087
THREADS=15 min=9:max=43090:avg=39910:var=43081
THREADS=15 min=77:max=43090:avg=40085:var=43013
THREADS=15 min=16:max=43090:avg=40392:var=43074
THREADS=15 min=13:max=43090:avg=41455:var=43077
THREADS=15 min=12:max=43090:avg=41152:var=43078
THREADS=14 min=63:max=43090:avg=41229:var=43027
THREADS=14 min=64:max=43090:avg=40931:var=43026
THREADS=14 min=12:max=43090:avg=39935:var=43078
THREADS=14 min=12:max=43090:avg=39307:var=43078
THREADS=14 min=37:max=43090:avg=39408:var=43053
THREADS=14 min=202:max=43090:avg=41830:var=42888
THREADS=14 min=18517:max=43090:avg=42397:var=24573
THREADS=14 min=87:max=43090:avg=41449:var=43003
THREADS=14 min=87:max=43090:avg=41352:var=43003
THREADS=14 min=17:max=43090:avg=41919:var=43073
THREADS=14 min=17:max=43090:avg=41896:var=43073
THREADS=14 min=5902:max=43090:avg=42156:var=37188
THREADS=14 min=3620:max=43090:avg=41960:var=39470
THREADS=14 min=64:max=43090:avg=41448:var=43026
THREADS=13 min=20:max=43090:avg=39998:var=43070
THREADS=13 min=124:max=43090:avg=40715:var=42966
THREADS=13 min=1:max=43090:avg=38856:var=43089
THREADS=13 min=1:max=43090:avg=39265:var=43089
THREADS=13 min=18:max=43090:avg=40026:var=43072
THREADS=13 min=18:max=43090:avg=40526:var=43072
THREADS=13 min=1:max=43090:avg=38695:var=43089
THREADS=13 min=1:max=43090:avg=38107:var=43089
THREADS=13 min=76:max=43090:avg=40457:var=43014
THREADS=13 min=76:max=43090:avg=39891:var=43014
THREADS=13 min=283:max=43090:avg=40472:var=42807
THREADS=13 min=119:max=43090:avg=40724:var=42971
THREADS=13 min=119:max=43090:avg=40402:var=42971
THREADS=12 min=130:max=43090:avg=42537:var=42960
THREADS=12 min=10:max=43090:avg=40826:var=43080
THREADS=12 min=54:max=43090:avg=39270:var=43036
THREADS=12 min=151:max=43090:avg=41114:var=42939
THREADS=12 min=151:max=43090:avg=40087:var=42939
THREADS=12 min=466:max=43090:avg=41241:var=42624
THREADS=12 min=164:max=43090:avg=42035:var=42926
THREADS=12 min=164:max=43090:avg=41621:var=42926
THREADS=12 min=3398:max=43090:avg=41298:var=39692
THREADS=12 min=3398:max=43090:avg=41979:var=39692
THREADS=12 min=758:max=43090:avg=42505:var=42332
THREADS=12 min=10:max=43090:avg=41605:var=43080
THREADS=11 min=1416:max=43090:avg=41151:var=41674
THREADS=11 min=9554:max=43090:avg=42649:var=33536
THREADS=11 min=1416:max=43090:avg=41709:var=41674
THREADS=11 min=21903:max=43090:avg=42534:var=21187
THREADS=11 min=93:max=43090:avg=41279:var=42997
THREADS=11 min=93:max=43090:avg=40962:var=42997
THREADS=11 min=239:max=43090:avg=41907:var=42851
THREADS=11 min=53:max=43090:avg=42096:var=43037
THREADS=11 min=53:max=43090:avg=41543:var=43037
THREADS=11 min=408:max=43090:avg=40986:var=42682
THREADS=11 min=1971:max=43090:avg=42006:var=41119
THREADS=10 min=27331:max=43090:avg=42582:var=15759
THREADS=10 min=5713:max=43090:avg=42033:var=37377
THREADS=10 min=3765:max=43090:avg=41529:var=39325
THREADS=10 min=3765:max=43090:avg=42201:var=39325
THREADS=10 min=207:max=43090:avg=42670:var=42883
THREADS=10 min=207:max=43090:avg=41863:var=42883
THREADS=10 min=4105:max=43090:avg=40956:var=38985
THREADS=10 min=140:max=43090:avg=41083:var=42950
THREADS=10 min=140:max=43090:avg=42134:var=42950
THREADS=10 min=176:max=43090:avg=41888:var=42914
THREADS=9 min=629:max=43090:avg=41771:var=42461
THREADS=9 min=1938:max=43090:avg=41748:var=41152
THREADS=9 min=435:max=43090:avg=41567:var=42655
THREADS=9 min=435:max=43090:avg=41126:var=42655
THREADS=9 min=7019:max=43090:avg=40533:var=36071
THREADS=9 min=133:max=43090:avg=41031:var=42957
THREADS=9 min=133:max=43090:avg=41695:var=42957
THREADS=9 min=118:max=43090:avg=41558:var=42972
THREADS=9 min=65:max=43090:avg=41412:var=43025
THREADS=8 min=3028:max=43090:avg=41970:var=40062
THREADS=8 min=4713:max=43090:avg=41803:var=38377
THREADS=8 min=4713:max=43090:avg=41633:var=38377
THREADS=8 min=1184:max=43090:avg=41842:var=41906
THREADS=8 min=1184:max=43090:avg=41401:var=41906
THREADS=8 min=12598:max=43090:avg=41587:var=30492
THREADS=8 min=19076:max=43090:avg=42217:var=24014
THREADS=8 min=9136:max=43090:avg=42355:var=33954
THREADS=7 min=12260:max=43090:avg=41692:var=30830
THREADS=7 min=12489:max=43090:avg=42036:var=30601
THREADS=7 min=272:max=43090:avg=42520:var=42818
THREADS=7 min=272:max=43090:avg=42526:var=42818
THREADS=7 min=18847:max=43090:avg=42556:var=24243
THREADS=7 min=12026:max=43090:avg=42078:var=31064
THREADS=7 min=12026:max=43090:avg=41752:var=31064
THREADS=6 min=14357:max=43090:avg=42024:var=28733
THREADS=6 min=14357:max=43090:avg=42175:var=28733
THREADS=6 min=22221:max=43090:avg=42552:var=20869
THREADS=6 min=23168:max=43090:avg=42747:var=19922
THREADS=6 min=26899:max=43090:avg=42721:var=16191
THREADS=6 min=6890:max=43090:avg=42610:var=36200
THREADS=5 min=22566:max=43090:avg=42447:var=20524
THREADS=5 min=16706:max=43090:avg=42329:var=26384
THREADS=5 min=16706:max=43090:avg=42252:var=26384
THREADS=5 min=15030:max=43090:avg=42335:var=28060
THREADS=5 min=15030:max=43090:avg=42263:var=28060
THREADS=4 min=7988:max=43090:avg=42158:var=35102
THREADS=4 min=8031:max=43090:avg=42410:var=35059
THREADS=4 min=10691:max=43090:avg=42238:var=32399
THREADS=4 min=10691:max=43090:avg=41725:var=32399
THREADS=3 min=15163:max=43090:avg=42264:var=27927
THREADS=3 min=17850:max=43090:avg=42188:var=25240
THREADS=3 min=6638:max=43090:avg=41799:var=36452
THREADS=2 min=6497:max=43090:avg=41353:var=36593
THREADS=2 min=6497:max=43090:avg=41521:var=36593
So a very rough heuristic visual analysis tells me that I'd be best having 6 cores at most running my trading engine. Time to play with Octave...
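As a stop-gap before Octave, nearest-rank percentiles can be pulled straight from the per-quantum counts. A sketch — the sample data below is illustrative only, the real input being the count column from the ftq*counts.dat files:

```python
def percentiles(values, points=(0.01, 0.05, 0.50, 0.95, 0.99)):
    # Nearest-rank percentiles of the per-quantum counts; for jitter
    # analysis the tails matter far more than the mean.
    ordered = sorted(values)
    n = len(ordered)
    return {p: ordered[min(n - 1, int(p * n))] for p in points}

# Illustrative: 10 badly-disturbed quanta among 90 clean ones - the
# median looks fine while the 1st percentile exposes the disturbance.
pcts = percentiles([43090] * 90 + [457] * 10)
```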
Friday, February 11, 2011
HIFREQ 2011 Panel Discussion Input
Here's my input to the panel discussion for HIFREQ 2011.
• What are the different set ups and combinations for HFT architecture?
I have experience of four different architectures:
- traditional monolithic event queue and broadcast
- reflective memory, distributed processing
- dma, shared memory, multi-process and multicast
- trading engine on a card
I'd be interested in other approaches
• Is massive multicore or specialist silicon (FPGA, GPU etc.) the next frontier?
Multicore is attractive for a multi-strategy play, but it requires careful design to avoid data races and performance issues like TLB cache misses and memory barriers. FPGA has always been attractive for dealing with FIX and converting ASCII to binary (i.e. parsing). GPU has promise in the equities world where dynamic pricing and portfolio analysis are required. What's been widely overlooked is DSP. There are some very interesting things you can do with DSP.
• Which solutions provide maximum scalability, configuration, and customisation to ensure continual upgrade and development of your
systems to survive in tough technology race?
It has to be a message-oriented, multicast, pure layer 2 architecture with software routing.
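A minimal sketch of the publisher side of such a bus at the socket level — the group, port and TTL are illustrative, and a production bus adds framing, channels and reliability on top:

```python
import socket
import struct

GROUP, PORT = "239.1.1.1", 5000  # illustrative multicast group/port

def make_publisher(ttl=1):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    # TTL 1 keeps traffic on the local segment: delivery is handled
    # by layer 2 switching, with no IP routing between segments.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    return sock

pub = make_publisher()
# pub.sendto(b"tick|EURUSD|1.3412", (GROUP, PORT))
```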
• Managing microbursts: the art and engineering of data capacity management
Again, a high-performance messaging system combined with high-resolution time and accurate traffic analysis is a must. Combining this with knowledge of the underlying network hardware (in order to utilise multiple hardware queues on the switches), judicious use of QoS, and correct configuration of the messaging system to utilise multiple channels yields good results.
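Once you have high-resolution timestamps, the detection step itself is conceptually simple: bucket arrivals into fixed windows and flag outliers. A sketch — window size, threshold and timestamps are illustrative, and a real system works from hardware timestamps at far finer resolution:

```python
from collections import Counter

def microbursts(timestamps_ns, window_ns=1_000_000, threshold=3):
    # Bucket packet arrival timestamps (integer nanoseconds) into
    # fixed windows; any window busier than `threshold` is flagged
    # as a burst. Integer arithmetic avoids float bucketing errors.
    buckets = Counter(t // window_ns for t in timestamps_ns)
    return {b: n for b, n in buckets.items() if n > threshold}

# Illustrative: one packet per ms background, plus a 4-packet burst
# landing inside the 10th millisecond window.
ts = [i * 1_000_000 for i in range(20)]
ts += [10_200_000, 10_400_000, 10_600_000, 10_800_000]
bursts = microbursts(ts)
```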
• Taming the data torrent: conflating direct and aggregate market data feeds
Here's where an FPGA-enabled network card helps greatly: coalescing multiple keystations or A and B feeds, hashing messages and dropping duplicates, and translating ASCII to binary. Combine this with multicast for efficient data transport.
• Is asynchronous processing inevitable? What are the implications?
Asynchronous processing has been used in HFT for many years and is necessary for effective parallelisation. One pattern I use is an "asynchronous n-slot put-take connector", which joins different processes in a way that allows each process to utilise its full timeslice. The implication of not using it is latency...
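The connector is essentially a bounded, blocking hand-off. A minimal sketch of the idea using Python's standard library (the class name is mine; a production version would use shared memory and lock-free slots rather than a locking queue):

```python
import queue
import threading

class PutTakeConnector:
    """An asynchronous n-slot put-take connector: a bounded, blocking
    hand-off so producer and consumer each run their full timeslice
    instead of spinning on one another."""
    def __init__(self, slots):
        self._q = queue.Queue(maxsize=slots)
    def put(self, item):    # blocks only when all n slots are full
        self._q.put(item)
    def take(self):         # blocks only when all n slots are empty
        return self._q.get()

conn = PutTakeConnector(slots=4)
results = []

def consumer():
    for _ in range(8):
        results.append(conn.take())

t = threading.Thread(target=consumer)
t.start()
for i in range(8):          # producer fills slots asynchronously
    conn.put(i)
t.join()
print(results)              # [0, 1, 2, 3, 4, 5, 6, 7]
```

The n slots are what decouple the two sides: the producer only stalls when it is a full n items ahead.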
• What is next for complex event processing and tick databases? Will they be able to keep up?
CEP is a necessary evil; however, any strategy that uses it is almost impossible to debug.
With regards to tick databases, I fail to see why people store stuff in databases at all - all our market data is captured on an electrically connected secondary machine using HDF5 in date-ordered directories. It's then shipped up to a disk array overnight. Backtesting generally uses only three months' worth of data.
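The date-ordered directory layout is the important part. A minimal sketch of the idea (using flat binary files via struct rather than HDF5, purely to keep the example dependency-free - the real capture uses HDF5, and the record layout here is hypothetical):

```python
import struct
import tempfile
from pathlib import Path

RECORD = struct.Struct("<dqd")   # timestamp, size, price per tick

def capture_path(root, day, symbol):
    """Date-ordered layout: one directory per day, one file per symbol."""
    d = root / day
    d.mkdir(parents=True, exist_ok=True)
    return d / f"{symbol}.bin"

def append_ticks(path, ticks):
    with path.open("ab") as f:
        for t in ticks:
            f.write(RECORD.pack(*t))

def read_ticks(path):
    data = path.read_bytes()
    return [RECORD.unpack_from(data, i)
            for i in range(0, len(data), RECORD.size)]

root = Path(tempfile.mkdtemp())
p = capture_path(root, "2011-01-26", "EURUSD")
append_ticks(p, [(1.0, 1000000, 1.3042), (2.0, 500000, 1.3043)])
print(read_ticks(p))
```

Because the directories sort by date, shipping a day to the disk array and selecting three months for backtest are both plain filesystem operations - no database needed.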
• Surviving the technical glitch at high speed: designing robust architectures
The ability to run multiple strategies and services on a multicast message bus means recovery
from failure is straightforward.
Tuesday, February 08, 2011
High Frequency Trading World, Chicago
I've kindly been invited to talk at High Frequency Trading World, Chicago on June 27-29th 2011.
I suggested the following three talks. The real-time risk management one was chosen, as the host thought this would be of great interest to traders. I've implemented this for real, and gave a talk on Infosec Data Analytics and Visualisation which was well received, so I thought I'd extend it to transactional logging. The basic idea came from work I did with Ian on triple entry accounting [sic] and "Notelets".
Real-time risk management and regulatory compliance
- persisting transactions to the cloud
- non-repudiation, risk management and distributed regulation using "triple entry" transactional logging
- Market analytics using Hadoop
Next Generation Exchange Technology
- The transition to non-computational infra
- Neat tricks with FPGA, DSP and memristors
- Making the real virtual - multicast in software
- Affordable networks: VPLS, QoS and IGMP snooping
Trading Engine Technology
- Lockless design: avoiding data races
- Shared memory techniques, superpages and reflective memory
- RSS, recvmsg() and kernel bypass for fast data acquisition
- Real-time Linux, thread prioritisation techniques
- Tuning systems for HPC
- Cheap, high accuracy time using PTP
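The "triple entry" transactional logging above can be sketched as a hash-chained, append-only log: any third party holding a copy can detect after-the-fact tampering, which is the non-repudiation property the risk talk relies on. A toy illustration (the record layout is hypothetical; a real system would add digital signatures from each party):

```python
import hashlib
import json

def log_entry(prev_hash, record):
    """One append-only log entry, chained to its predecessor by hash
    so that tampering is evident to any third party holding a copy."""
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    return {"prev": prev_hash, "record": record, "hash": digest}

def verify_chain(entries, genesis="genesis"):
    prev = genesis
    for e in entries:
        body = json.dumps(e["record"], sort_keys=True)
        ok = (e["prev"] == prev and
              hashlib.sha256((prev + body).encode()).hexdigest() == e["hash"])
        if not ok:
            return False
        prev = e["hash"]
    return True

e1 = log_entry("genesis", {"trade": 1, "qty": 100})
e2 = log_entry(e1["hash"], {"trade": 2, "qty": 50})
print(verify_chain([e1, e2]))   # True
```

Persist such a chain to the cloud and a regulator can audit transactions without ever being able to rewrite them.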
Thursday, February 03, 2011
What to do with your FPGA Enabled Network Card
Here's what you can do with your new shiny 10GE network card with onboard FPGA:
- Port Forwarding: this allows you to copy an incoming data stream to another port so that you can have one server for execution only and one for data persistence.
- Data Filtering: transform and redirect data at 10GE speed, e.g. reformat XML to binary
- Port Forwarding to Multicast: forward a filtered or unfiltered data stream as a multicast stream
- BGP/IGMP Routing: save a fortune on hardware and a network stack traversal too!
- Port Forwarding to Multicast by topic: forward a filtered or unfiltered data stream as a multicast stream by topic
- Port Failover: If your server fails, the feed data can automatically be transferred to another port electrically.
- Timestamping of packets: at +-5 nanosecond resolution.
- Object Serialisation: data formatted to binary - no parsing.
- Data hashing: listening to the A and the B feeds? Implement a "group feed" on the card
- QOS marking: Packets can be marked with an appropriate quality of service to be expedited by the network.
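The "group feed" item deserves a software illustration: arbitrate the redundant A and B feeds by keeping whichever copy of each message is seen first and dropping the duplicate. A minimal sketch (identifying duplicates by a payload hash; real feeds would more likely use sequence numbers):

```python
import hashlib

def group_feed(*feeds):
    """Merge the redundant A and B feeds into one 'group feed': keep
    whichever copy of each message is seen first, drop the duplicate.
    Duplicates are identified here by a payload hash."""
    seen = set()
    for feed in feeds:   # in practice the feeds interleave by arrival
        for msg in feed:
            key = hashlib.sha256(msg).digest()
            if key not in seen:
                seen.add(key)
                yield msg

feed_a = [b"tick1", b"tick2", b"tick3"]
feed_b = [b"tick1", b"tick3", b"tick4"]   # B carried tick4; A dropped it
print(list(group_feed(feed_a, feed_b)))   # each tick exactly once
```

On the card this runs at line rate, so the host only ever sees the merged, de-duplicated stream.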
Wednesday, February 02, 2011
Buy-side Technology European Summit 2011
I'm moderating the panel at the above conference. Here are some notes I sent in:
Market structure: Confronting the issues around market fragmentation, the impact of HFT and globalization
The future is multi-venue, multi-asset fusion trading using global VPLS networks with QoS and high precision time.
It's a war - you have to engage in deep research and new techniques in order to win it. And you must know every layer of your technology - down to the wires. If you're not processing on the network card, you're too slow. Microtrading engines are effectively here.
Economic situation: New risks resulting from the financial landscape and the implications for technology
I predict a move to a dynamic cost of market data and trading, leading to a growth in liquidity, and the emergence of financial cryptography techniques to mitigate market risk - ideas such as Triple Entry Accounting, allowing third-party risk analytics.
Technological developments: What are the opportunities created by new technologies such as cloud computing and SaaS?
SaaS can be used for autonomic computing - dealing with failure by dynamically sourcing services from a reputation-based marketplace and paying for them with microcash.
It's also a great place to do regulatory transaction logging, audit and risk management in near-real time, and where enterprise-wide service level monitoring and security can take place, a la Loggly. I described how this works here.
New regulation: How are buy-side firms coping with the regulatory flood?
Largely ignoring it.
With regards to the 15:50 panel:
What I think the buy-side would like to see is transparency from liquidity providers. In particular, older technology is prone to "queue allocation" due to fan-in/fan-out techniques, which is opaque, can be seen as unfair, and is often a cause of friction. What would help is detailed performance stats (internal timings, distribution latencies) and new technology: multicast data distribution by topic in binary, and high speed dedicated trading connections straight to the matching engine infra rather than fan-in via multiple tiers.
Mitigating the cost of accessing multiple trading venues
Standard pricing and transparency would be a good start. How about dynamic pricing for market data and cost of trade? Combine this with a "reputation index" - i.e. the likelihood of being filled - and this would lead to a dynamic, fair market system.
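The "reputation index" could be as simple as a size-weighted fill ratio per venue. A toy sketch (the field names are hypothetical):

```python
def reputation_index(orders):
    """Size-weighted fill ratio: the observed likelihood of an order
    being filled at this venue."""
    sent = sum(o["size"] for o in orders)
    filled = sum(o["filled"] for o in orders)
    return filled / sent if sent else 0.0

venue = [{"size": 100, "filled": 100},   # fully filled
         {"size": 200, "filled": 50}]    # mostly unfilled
print(reputation_index(venue))           # 0.5
```

Publish this per venue alongside dynamic pricing and routing decisions become a straightforward cost/likelihood trade-off.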
Ensuring effective liquidity access
See above
Consolidated tape: What the buy-side really want
Profitability
How are the buy-side planning to prepare for new regulation
We need new thinking on how we achieve transparency whilst giving the regulators access to transactional data, otherwise we'll fall victim to centralisation, which would kill the markets. One good example would be logging transaction information to the cloud in a secure way for post-transactional analytics.
How equipped are the buy-side from a technology standpoint in the front, middle and back office? Which areas will see less or more investment?
This is an area which is widely neglected, and that neglect opens the door to risk.
Sunday, January 30, 2011
Mankoff Company 2nd Annual Ultra Low Latency
I've been kindly invited to speak at another HFT conference in London. The 2nd Annual Ultra Low Latency: Trading Opportunities and Development in FX and other asset classes run by the Mankoff Company to be held at the Grange Holborn, London on the 23rd March 2011.
Wednesday, January 26, 2011
Wireshark Remoting
This is a technique whereby feed handler data is captured on a remote engine and the capture is pulled back to your local workstation for easier analysis and inspection. For more information, see: http://wiki.wireshark.org/CaptureSetup/Pipes
Prerequisites.
1) You are using SSH keys to log in to engines;
2) Your ssh-agent is running (Ubuntu desktops) with your SSH keys added to the keyring.
Most Ubuntu desktops will run ssh-agent for you on login.
3) wireshark is installed on your workstation;
sudo aptitude install wireshark
4) tshark is installed on the engine;
sudo aptitude install tshark
5) You are in the 'wireshark' group on the engine,
and the relevant Linux capabilities are set up as per
http://wiki.wireshark.org/CaptureSetup/CapturePrivileges
A typical Bash command line looks like this (on your local workstation):
$ wireshark -k -i <(ssh hostname tshark -w- -p -i eth1 -f \'tcp portrange 4000-4010\')
Let's review exactly what these arguments mean.
For the local wireshark invocation:
The '-k' flag means 'start capturing immediately'.
The '-i' flag tells wireshark to get its input from a pipe.
The <(...) bash construct is process substitution, which runs 'tshark' remotely on matrab (in this example) to capture from interface eth1.
'ssh hostname tshark' runs tshark, the command-line version of wireshark, remotely on hostname.
The '-w-' flag means: write the raw PCAP format output captured by tshark to the standard output of the ssh remoted command, so that the local wireshark GUI on your workstation will pick up the feed.
The '-i' flag specifies the physical interface upon which to tap the traffic. This can only be a single logical Linux network interface.
If you need to capture traffic on more than one interface at once, you will need to configure a bridge interface. This is out of scope for this example. Typically this is only used for non-invasive captures using passive network taps.
Typically, the argument passed to the '-i' flag to tshark is chosen by using the Linux-specific command 'ip route get x.x.x.x' to find the physical Ethernet interface where a feed handler is running.
In this example, we used 'ip route get 10.69.14.16' to find the physical interface which matrab uses to reach an EBS Ai on the A-feed at Equinix LD4; eth1.
The '-p' flag tells tshark NOT to put the interface into 'promiscuous mode' -- a special hardware mode where a network adapter will pass traffic up the network stack, even if it isn't addressed to any of the *hardware addresses* the adapter is configured for.
Typically this is only needed for closer inspection, or if it's suspected that network addressing is incorrectly configured at either end of a feed. Promiscuous mode carries a penalty in that the system must then process every single packet physically received.
Finally, the '-f' flag specifies a PCAP style filter expression. The syntax for these expressions is NOT the same as the wireshark filter language; it can be found in the manual page for pcap-filter (man 7 pcap-filter).
In this example, we are asking only for all TCP traffic with port numbers between 4000-4010 in *either* the destination *or* source port fields.
PCAP filters are implemented inside the Linux kernel using a virtual CPU; just-in-time compilation converts the filters to x86 machine code for fast capture. The virtual machine, LPF, is based closely on the original Berkeley Packet Filter (BPF) design from BSD, with fixed-size instructions and 32-bit operands.
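To give a flavour of such a filter machine, here is a toy accumulator-based VM in the BPF spirit. The opcodes and encoding are invented for illustration - this is not the real BPF instruction set:

```python
# Toy opcodes: load a packet byte, compare-and-branch, return verdict.
LD, JEQ, RET = range(3)

def run_filter(prog, pkt):
    """Interpret a filter program against packet bytes.
    A nonzero return value accepts the packet; zero drops it."""
    acc, pc = 0, 0
    while pc < len(prog):
        op, k, jt, jf = prog[pc]
        if op == LD:       # acc <- pkt[k]
            acc = pkt[k]
            pc += 1
        elif op == JEQ:    # skip jt instructions if acc == k, else jf
            pc += 1 + (jt if acc == k else jf)
        elif op == RET:
            return k
    return 0

# "Accept packets whose first byte is 0x45" (IPv4, header length 5).
prog = [(LD, 0, 0, 0),
        (JEQ, 0x45, 0, 1),
        (RET, 1, 0, 0),
        (RET, 0, 0, 0)]
print(run_filter(prog, b"\x45\x00\x00"))   # 1 -> accepted
print(run_filter(prog, b"\x60\x00\x00"))   # 0 -> dropped
```

The real kernel compiles programs like this to native code, which is why an in-kernel filter such as 'tcp portrange 4000-4010' costs so little per packet.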
Friday, January 14, 2011
High Frequency Trading Conferences
I've been kindly asked to speak at two conferences this year. The first is HIFREQ 2011 on February the 24th, where I'm on the panel with Prof Dave Cliff talking about next generation tech for HFT.
I'm also speaking at The High Frequency Trading World Conference in Amsterdam on the 7-9th June 2011.
Things I'll be talking about:
- The latest technology for market data acquisition and parsing, both in hardware and techniques using parallelised software.
- The role of FPGA, DSP and Memristors
- The future of XML
- Shared and reflective memory for market fusion
- CPU instructions for vectorisation and IO parallelism
- CPU analysis tools
- Kernel techniques for high performance
- Global layer 2 networks and VPLS
- Multicast routing in software
- Differential trading
- High precision global time
- The importance of platform analytics
Wednesday, January 05, 2011
Google's Strategy
I listened to a series of predictions by Mark Anderson of the Strategic News Service several nights ago on Global Business during a bout of insomnia, and was intrigued by his prediction that Google "has lost its way", "doesn't know what business they are in" and is "a rudderless organisation".
With regard to what business Google is in, my analysis is that they are in the data business: collection, analysis, enrichment and dissemination. The fact that they make money out of ads is coincidental - and indeed predicated on this, for without it, targeting of ads would be impossible. The fact that they do it so successfully gives this strategy provenance.
With regards to strategic direction, I can see where Mark Anderson is coming from, having done an MBA at the OU and the Certificate in Company Direction at the IoD. He's talking about the classic model of top-down corporate strategy development (which is rudimentary and heuristic).
Strategic direction is usually driven by external marketing campaigns and salesmanship rather than the day-to-day expertise of the workforce, as it's much easier dealing with external consultants, vendors and pundits. You do pay for their opinion, after all, and you're not accountable to them; but most of all, they'll tell you what you want to hear, or sell you what their paymasters tell them to - and buy you a nice lunch too.
The fact that, in general, the workforce is ignored and their tacit knowledge is never made explicit, analysed, persisted and incorporated into strategic direction is the sad state of knowledge management. It is mostly due to fear of the workforce and their expertise, together with a fear of failure.
Perhaps Google is different (I don't know, as I've neither worked there nor talked to any of their employees) in that they are driven by the expertise of their developers and develop products which best fit the data they have. Strategy may be bottom-up, derived from the expertise of the many clever people who work there. Does this make them rudderless? Perhaps, but the "rudder" developed by most organisations is often generic at best and irrelevant in practice. I think I'd rather go with my perception of how Google develops strategy.
Thursday, November 25, 2010
Visualisation Tools
Elie has sent me links to two tools - Tulip and Circos. There's a video of Tulip in action here.
This is a useful tool for visualisation of trades and market data and has proved invaluable.
A variant of the Circos approach was used quite effectively to replay a time series of equities trades by sector to end users showing their P and L and risk. Today it's used for multi-venue market visualisation.
Wednesday, November 24, 2010
Layers are for Cakes - Not Software
This erudite quote comes from Performance Anti-patterns by Bart Smaalders:
SOFTWARE LAYERING
Many software developers become fond of using layering to provide various levels of abstraction in their software. While layering is useful to some extent, its incautious use significantly increases the stack data cache footprint, TLB (translation look-aside buffer) misses, and function call overhead. Furthermore, the data hiding often forces either the addition of too many arguments to function calls or the creation of new structures to hold sets of arguments. Once there are multiple users of a particular layer, modifications become more difficult and the performance trade-offs accumulate over time. A classic example of this problem is a portable application such as Mozilla using various window system toolkits; the various abstraction layers in both the application and the toolkits lead to rather spectacularly deep call stacks with even minor exercising of functionality. While this does produce a portable application, the performance implications are significant; this tension between abstraction and implementation efficiencies forces us to reevaluate our implementations periodically. In general, layers are for cakes, not for software.
Friday, November 05, 2010
Smart Almanac
My friend Elie has written this clever online application. Check it out: Smart Almanac is the first astrological application which computes favourable time frames for actions you may have to perform in your daily life - the perfect astrological assistant to organise your schedule hourly.
Quickly get a 400+ page analysis report that will tell you, hour by hour, when to act in your favour and when the planets support your action, given your natal chart and your current location on earth.
A large choice of questions among 9 themes is available.
Bring the results with you throughout the day: with the iCalendar option, synchronise with your mobile phone!
Monday, August 02, 2010
Infosec Data Analytics and Visualisation
I gave this talk at the IISYG last Friday. It was well received, thanks to my reading of Raffael Marty's excellent work and blogging on Loggly.
It's a nice idea, but the key to success is getting application developers to adopt the logging mechanism, and getting team leaders to understand transactions and ensure their teams use them. There were many questions, mainly about the security of sending your logs to a third party.