Thursday, February 23, 2012
netmap - user space NIC ring buffer
netmap looks promising, but it's about to be blown away by the ability to inject packets directly into the L3 cache in the next iteration of Intel chips, which support DCA (Direct Cache Access).
Friday, January 20, 2012
John Nolan on FPGA and GPU
A good overview of FPGA and GPU technology in this presentation http://www.infoq.com/interviews/nolan-hardware-acceleration
Monday, January 09, 2012
LEON3
The LEON3 is a synthesisable VHDL model of a 32-bit processor compliant with the
SPARC V8 architecture. The
model is highly configurable, and particularly suitable for system-on-a-chip
(SOC) designs. The full source code is available under the GNU GPL license, allowing free and
unlimited use for research and education. LEON3 is also available under a
low-cost commercial license, allowing it to be used in any commercial
application at a fraction of the cost of comparable IP cores. The LEON3
processor has the following features:
- SPARC V8 instruction set with V8e extensions
- Advanced 7-stage pipeline
- Hardware multiply, divide and MAC units
- High-performance, fully pipelined IEEE-754 FPU
- Separate instruction and data cache (Harvard architecture) with snooping
- Configurable caches: 1 - 4 ways, 1 - 256 kbytes/way. Random, LRR or LRU replacement
- Local instruction and data scratch pad RAM, 1 - 512 Kbytes
- SPARC Reference MMU (SRMMU) with configurable TLB
- AMBA-2.0 AHB bus interface
- Advanced on-chip debug support with instruction and data trace buffer
- Symmetric Multi-processor support (SMP)
- Power-down mode and clock gating
- Robust and fully synchronous single-edge clock design
- Up to 125 MHz in FPGA and 400 MHz on 0.13 um ASIC technologies
- Fault-tolerant and SEU-proof version available for space applications
- Extensively configurable
- Large range of software tools: compilers, kernels, simulators and debug monitors
- High Performance: 1.4 DMIPS/MHz, 1.8 CoreMark/MHz (gcc 4.1.2)
The LEON3 processor is distributed as part of the GRLIB IP library, allowing simple integration into complex SOC designs. GRLIB also includes a configurable LEON3 multi-processor design, with up to 4 CPUs and a large range of on-chip peripheral blocks.
Tuesday, November 22, 2011
Waters European Trading Architecture Summit 2011
Some feedback from this event which I attended today.
Event http://events.waterstechnology.com/etas
Infrastructure Management: Reducing costs, Improving performance, Professor Roger Woods, Queen's University Belfast
Prof Woods gave an impassioned talk about a tool he has developed which takes C++ code, lets you navigate it, and identifies subsystems you can target to run on hardware or an emulation of hardware.
- He worked on the JP Morgan collaboration with Maxeler and was bullish about the technology.
- Two years from pilot to production.
- Developed a tool that allows identification of code sections that are suitable for FPGA
- Key issue: programming FPGA bitstreams (http://en.wikipedia.org/wiki/Bitstream) - took six months
- C++ is translated into C (manually) before being cross-compiled into Java, which is what the Maxeler compiler requires.
- This is to remove C++ abstraction, which "kills parallelisation" (see slides)
- Focus was hardware FFT - all other logic in software - comms via FPGA bitstream
In summary:
- ideal for risk calculation and Monte Carlo where the algorithm does not change.
- C++ legacy code does not parallelise easily and is not a candidate for FPGA
- Three year dev cycle.
- Complex, manual process
- JPM own 20% of Maxeler
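The "algorithm does not change" point is worth illustrating: a Monte Carlo pricer is a tight, data-parallel loop whose body is fixed, which is exactly the shape of kernel that maps well onto an FPGA pipeline. A minimal software sketch — the instrument, parameters and path count are illustrative, not from the talk:

```python
import math
import random

def mc_european_call(spot, strike, rate, vol, expiry, paths, seed=42):
    # Monte Carlo price of a European call under geometric Brownian
    # motion. The loop body never changes and is independent across
    # paths - the kind of fixed kernel suited to hardware offload.
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * expiry
    diffusion = vol * math.sqrt(expiry)
    total = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)
        terminal = spot * math.exp(drift + diffusion * z)
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * expiry) * total / paths

price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 50_000)
```

Each path is a straight-line computation on one random draw, so the whole loop unrolls into a pipeline with no data-dependent control flow — which is why the fixed-algorithm caveat matters.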
Resources
- http://www.ecit.qub.ac.uk/Card/?name=r.woods
- eFutures Portal - http://efutures.ac.uk/
This led into a panel hosted by Chris Skinner.
Panel: The Parallel Paradigm Shift: are we about to enter a new chapter in the algorithmic arms race
Moderator: Chris Skinner. Panel: Prof Woods; Steven Weston, Global Head of Analytics, JPM; Andre Nedceloux, Sales guy, Excelian
- The FPGA plant needs to be kept hot to achieve the best latency; to keep the FPGAs busy you need a few regular host cores loading work onto them.
- Programming/debugging directly in VHDL is ‘worse than a nightmare’, don’t try.
- Isolate the worst performing pieces (Amdahl's law), de-abstract and place on FPGA; they call each of the isolated units a ‘kernel’.
- Compile times are high for the Maxeler compiler to output VHDL: 4 hours for a model on a 4-core box.
- Iterative model for optimisation and implementation. They improved both the mathematics in the models and the implementation onto FPGA – ie, consider it not just a programming problem, but also a maths modeling one.
- They use Python to manage the interaction with the models (e.g. pulling reports)
- Initially run a model on the FPGA hosts and then incrementally update it through the day - when market data or announcements occur.
- No separate report running phase – it is included in the model run and report is kept in memory. Data only written out to a database at night time, if it is destroyed then it can be re-created.
- Low-latency is no longer a competitive advantage but now a status quo service for investment banking.
- Requires specialist (not general or outsourced) programmers who understand hardware and algorithms, working alongside the business.
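The Amdahl's law point above is easy to quantify: overall speedup is capped by the fraction of runtime you actually offload, which is why isolating the worst performing pieces comes first. A quick sketch (the fractions and speedups are illustrative):

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    # Overall speedup when only `parallel_fraction` of runtime is
    # offloaded to a kernel running `kernel_speedup` times faster;
    # the remaining serial fraction runs at original speed.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)

# Even a 100x FPGA kernel covering 90% of the runtime yields under
# 10x overall - the residual 10% of serial code dominates.
overall = amdahl_speedup(0.9, 100.0)
```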
Panel
How low can you go? Ultra-low-latency trading
Moderator: David Berry
Members: Jogi Narain, CTO, FGS Capital LLP, Benjamin Stopford, Architect, RBS. Chris Donan, Head of Electronic Trading - Barcap.
This was a well run panel with some good insights from Chris Donan in particular:
- Stock programmers don't understand the full path from network to NIC to stack to application, or the underlying hardware operations
- Small teams of experienced engineers produce the best results
- Don't develop VHDL skills in house - use external resources.
- Latency gains correlate to profitability
- FPGA is good for market data (ie fixed problem) and risk
- Software parallelism is the future.
Friday, June 10, 2011
Thomson Reuters Expert Session
I was kindly invited to give an expert session by Thomson Reuters when I was between assignments. I gave an hour-long presentation which has been edited into four sessions:
On the FX Business Model http://thomsonreuters.na4.
On FX and the OTC Market http://thomsonreuters.na4.
On High Frequency Trading http://thomsonreuters.na4.
On FX Strategies http://thomsonreuters.na4.
I'm now gainfully employed so back to silent running for me.
Tuesday, March 15, 2011
Global Connectivity Vendor Selection
TeleGeography produce a topological map of submarine cables which is of interest to HFT firms. There are many aspects to consider when procuring global connectivity.
Before you choose a supplier, perhaps some important questions to ask are:
- What are the fibre miles versus physical miles of long haul links
- How do you measure your quoted latency
- What is your underlying network technology and how is it provisioned
- What network equipment do you use for your core network
- How many hardware queues on the above equipment do you allocate to our service
- When using third parties, what is their underlying technology and distribution methodology
- What hand-off equipment do you use
- What can the above deliver in terms of serialisation capability
- What traffic shaping do you perform
- How often do you sample CDR violation
- Do you report CDR violation
- Do you offer QoS
- Do you offer IGMP snooping
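On the fibre-miles versus quoted-latency questions: light in single-mode fibre travels at roughly c/1.47, so route length alone sets a hard floor on latency, and a vendor's fibre miles (not the great-circle miles) are what bound it. A back-of-envelope sketch — the refractive index and route length here are illustrative assumptions:

```python
C_KM_PER_MS = 299_792.458 / 1000.0   # speed of light in vacuum, km per ms
FIBRE_INDEX = 1.4675                 # typical single-mode fibre (assumed)

def one_way_floor_ms(fibre_route_km):
    # Lower bound on one-way latency from fibre route length alone;
    # excludes serialisation, regeneration and equipment hops, which
    # is exactly why "how do you measure your quoted latency" matters.
    return fibre_route_km / (C_KM_PER_MS / FIBRE_INDEX)

lat = one_way_floor_ms(5600.0)  # illustrative transatlantic route length
```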
Monday, March 07, 2011
Cavium Octeon II
Met with Barry, CTO of Tervela, on Friday. He recommended taking a look at the Cavium Octeon II NPU card, which has 32 cores, a C-like interface and a shiny new architecture.
Wednesday, March 02, 2011
FTQ for platform jitter analysis
FTQ (Fixed Time Quantum) is a useful tool dug up by Bruce which we've started using for jitter analysis, and it's showing up some surprising results. The idea is simple: count how many increments of a variable can be performed in a fixed time quantum.
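For illustration, the idea fits in a few lines. This is a sketch of the principle only, not the real t_ftq benchmark (which is C, and far more careful about timer resolution and overhead); the quantum and sample count are illustrative:

```python
import time

def ftq_sample(quantum_s=0.001, samples=50):
    # For each fixed time quantum, count how many increments complete.
    # On a quiet core the counts are nearly constant; variation between
    # quanta is a crude proxy for platform jitter.
    counts = []
    for _ in range(samples):
        end = time.perf_counter() + quantum_s
        n = 0
        while time.perf_counter() < end:
            n += 1
        counts.append(n)
    return counts

counts = ftq_sample()
spread = max(counts) - min(counts)  # max-min range across quanta
```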
I started by running the threaded version on our 8 core, dual cpu server for approximately 3 minutes using the following command:
t_ftq -t 8 -n 450000
Using Octave, I calculated the variance (42133) and standard deviation (2485.1). Plotting this gave this overpopulated graph:
Next I thought I'd run it over seven cores and got a smoother profile. Graphs are fine and dandy, but you need to look at the data and the percentiles. So as a first pass, I wrote this nifty script:
#!/bin/bash
FACTOR=2
CORES="`grep -c processor /proc/cpuinfo`"
THREADS=`echo "$CORES * $FACTOR" | bc`
while [ "$THREADS" -gt 1 ]
do
./t_ftq -t $THREADS
for FILE in ftq*counts.dat
do
awk 'BEGIN {
minimum = 4500000
maximum = 0
average = 0
}
{
if($1 < minimum)
{
minimum = $1
}
if($1 > maximum)
{
maximum = $1
}
average += $1
}
END {
printf("THREADS=%d min=%d:max=%d:avg=%d:var=%d\n", '"$THREADS"', minimum, maximum, average/NR, maximum-minimum)
}' $FILE
done
THREADS="`expr $THREADS - 1`"
rm -f *.dat
echo
done
exit 0
Which produced this output when run with a loading factor of 1:
THREADS=8 min=19080:max=43090:avg=41247:var=24010
THREADS=8 min=8401:max=43090:avg=41971:var=34689
THREADS=8 min=8401:max=43090:avg=42596:var=34689
THREADS=8 min=8956:max=43090:avg=42453:var=34134
THREADS=8 min=21515:max=43090:avg=42326:var=21575
THREADS=8 min=11157:max=43090:avg=42548:var=31933
THREADS=8 min=6351:max=43090:avg=42619:var=36739
THREADS=8 min=6351:max=43090:avg=42381:var=36739
THREADS=7 min=20666:max=43090:avg=42217:var=22424
THREADS=7 min=7591:max=43090:avg=42264:var=35499
THREADS=7 min=7591:max=43090:avg=42487:var=35499
THREADS=7 min=25263:max=43090:avg=42566:var=17827
THREADS=7 min=20513:max=43090:avg=42603:var=22577
THREADS=7 min=15328:max=43090:avg=42528:var=27762
THREADS=7 min=9555:max=43090:avg=41859:var=33535
THREADS=6 min=9324:max=43090:avg=40872:var=33766
THREADS=6 min=10144:max=43090:avg=41454:var=32946
THREADS=6 min=29223:max=43090:avg=42749:var=13867
THREADS=6 min=25239:max=43090:avg=42590:var=17851
THREADS=6 min=20013:max=43090:avg=42357:var=23077
THREADS=6 min=4612:max=43090:avg=42114:var=38478
THREADS=5 min=457:max=43090:avg=42351:var=42633
THREADS=5 min=457:max=43090:avg=41645:var=42633
THREADS=5 min=15064:max=43090:avg=41190:var=28026
THREADS=5 min=16821:max=43090:avg=41614:var=26269
THREADS=5 min=15204:max=43090:avg=41272:var=27886
THREADS=4 min=21561:max=43090:avg=42436:var=21529
THREADS=4 min=23847:max=43090:avg=42158:var=19243
THREADS=4 min=5588:max=43090:avg=41406:var=37502
THREADS=4 min=5588:max=43090:avg=41282:var=37502
THREADS=3 min=26739:max=43090:avg=42303:var=16351
THREADS=3 min=19834:max=43090:avg=42021:var=23256
THREADS=3 min=12879:max=43090:avg=41332:var=30211
THREADS=2 min=10438:max=43090:avg=41910:var=32652
THREADS=2 min=10438:max=43090:avg=41816:var=32652
This is quite surprising: even on the two-thread run there are surprisingly low minimums. In the 5-, 7- and 8-thread runs, two adjacent threads report the same minima/maxima, which is weird. So with FACTOR set to 2, this is what we get:
THREADS=16 min=23:max=43090:avg=39844:var=43067
THREADS=16 min=22:max=43090:avg=41978:var=43068
THREADS=16 min=9:max=43090:avg=39131:var=43081
THREADS=16 min=9:max=43090:avg=37050:var=43081
THREADS=16 min=17:max=43090:avg=39012:var=43073
THREADS=16 min=17:max=43090:avg=40153:var=43073
THREADS=16 min=4:max=43090:avg=41036:var=43086
THREADS=16 min=23:max=43090:avg=40206:var=43067
THREADS=16 min=32:max=43090:avg=40174:var=43058
THREADS=16 min=68:max=43090:avg=40551:var=43022
THREADS=16 min=23:max=43090:avg=40927:var=43067
THREADS=16 min=23:max=43090:avg=40747:var=43067
THREADS=16 min=28:max=43090:avg=40886:var=43062
THREADS=16 min=8:max=43090:avg=39380:var=43082
THREADS=16 min=8:max=43090:avg=36551:var=43082
THREADS=16 min=22:max=43090:avg=38743:var=43068
THREADS=15 min=139:max=43090:avg=39622:var=42951
THREADS=15 min=12:max=43090:avg=40690:var=43078
THREADS=15 min=64:max=43090:avg=39721:var=43026
THREADS=15 min=3:max=43090:avg=39207:var=43087
THREADS=15 min=3:max=43090:avg=40143:var=43087
THREADS=15 min=3213:max=43090:avg=41611:var=39877
THREADS=15 min=18:max=43090:avg=39399:var=43072
THREADS=15 min=18:max=43090:avg=39894:var=43072
THREADS=15 min=3:max=43090:avg=39579:var=43087
THREADS=15 min=3:max=43090:avg=39027:var=43087
THREADS=15 min=9:max=43090:avg=39910:var=43081
THREADS=15 min=77:max=43090:avg=40085:var=43013
THREADS=15 min=16:max=43090:avg=40392:var=43074
THREADS=15 min=13:max=43090:avg=41455:var=43077
THREADS=15 min=12:max=43090:avg=41152:var=43078
THREADS=14 min=63:max=43090:avg=41229:var=43027
THREADS=14 min=64:max=43090:avg=40931:var=43026
THREADS=14 min=12:max=43090:avg=39935:var=43078
THREADS=14 min=12:max=43090:avg=39307:var=43078
THREADS=14 min=37:max=43090:avg=39408:var=43053
THREADS=14 min=202:max=43090:avg=41830:var=42888
THREADS=14 min=18517:max=43090:avg=42397:var=24573
THREADS=14 min=87:max=43090:avg=41449:var=43003
THREADS=14 min=87:max=43090:avg=41352:var=43003
THREADS=14 min=17:max=43090:avg=41919:var=43073
THREADS=14 min=17:max=43090:avg=41896:var=43073
THREADS=14 min=5902:max=43090:avg=42156:var=37188
THREADS=14 min=3620:max=43090:avg=41960:var=39470
THREADS=14 min=64:max=43090:avg=41448:var=43026
THREADS=13 min=20:max=43090:avg=39998:var=43070
THREADS=13 min=124:max=43090:avg=40715:var=42966
THREADS=13 min=1:max=43090:avg=38856:var=43089
THREADS=13 min=1:max=43090:avg=39265:var=43089
THREADS=13 min=18:max=43090:avg=40026:var=43072
THREADS=13 min=18:max=43090:avg=40526:var=43072
THREADS=13 min=1:max=43090:avg=38695:var=43089
THREADS=13 min=1:max=43090:avg=38107:var=43089
THREADS=13 min=76:max=43090:avg=40457:var=43014
THREADS=13 min=76:max=43090:avg=39891:var=43014
THREADS=13 min=283:max=43090:avg=40472:var=42807
THREADS=13 min=119:max=43090:avg=40724:var=42971
THREADS=13 min=119:max=43090:avg=40402:var=42971
THREADS=12 min=130:max=43090:avg=42537:var=42960
THREADS=12 min=10:max=43090:avg=40826:var=43080
THREADS=12 min=54:max=43090:avg=39270:var=43036
THREADS=12 min=151:max=43090:avg=41114:var=42939
THREADS=12 min=151:max=43090:avg=40087:var=42939
THREADS=12 min=466:max=43090:avg=41241:var=42624
THREADS=12 min=164:max=43090:avg=42035:var=42926
THREADS=12 min=164:max=43090:avg=41621:var=42926
THREADS=12 min=3398:max=43090:avg=41298:var=39692
THREADS=12 min=3398:max=43090:avg=41979:var=39692
THREADS=12 min=758:max=43090:avg=42505:var=42332
THREADS=12 min=10:max=43090:avg=41605:var=43080
THREADS=11 min=1416:max=43090:avg=41151:var=41674
THREADS=11 min=9554:max=43090:avg=42649:var=33536
THREADS=11 min=1416:max=43090:avg=41709:var=41674
THREADS=11 min=21903:max=43090:avg=42534:var=21187
THREADS=11 min=93:max=43090:avg=41279:var=42997
THREADS=11 min=93:max=43090:avg=40962:var=42997
THREADS=11 min=239:max=43090:avg=41907:var=42851
THREADS=11 min=53:max=43090:avg=42096:var=43037
THREADS=11 min=53:max=43090:avg=41543:var=43037
THREADS=11 min=408:max=43090:avg=40986:var=42682
THREADS=11 min=1971:max=43090:avg=42006:var=41119
THREADS=10 min=27331:max=43090:avg=42582:var=15759
THREADS=10 min=5713:max=43090:avg=42033:var=37377
THREADS=10 min=3765:max=43090:avg=41529:var=39325
THREADS=10 min=3765:max=43090:avg=42201:var=39325
THREADS=10 min=207:max=43090:avg=42670:var=42883
THREADS=10 min=207:max=43090:avg=41863:var=42883
THREADS=10 min=4105:max=43090:avg=40956:var=38985
THREADS=10 min=140:max=43090:avg=41083:var=42950
THREADS=10 min=140:max=43090:avg=42134:var=42950
THREADS=10 min=176:max=43090:avg=41888:var=42914
THREADS=9 min=629:max=43090:avg=41771:var=42461
THREADS=9 min=1938:max=43090:avg=41748:var=41152
THREADS=9 min=435:max=43090:avg=41567:var=42655
THREADS=9 min=435:max=43090:avg=41126:var=42655
THREADS=9 min=7019:max=43090:avg=40533:var=36071
THREADS=9 min=133:max=43090:avg=41031:var=42957
THREADS=9 min=133:max=43090:avg=41695:var=42957
THREADS=9 min=118:max=43090:avg=41558:var=42972
THREADS=9 min=65:max=43090:avg=41412:var=43025
THREADS=8 min=3028:max=43090:avg=41970:var=40062
THREADS=8 min=4713:max=43090:avg=41803:var=38377
THREADS=8 min=4713:max=43090:avg=41633:var=38377
THREADS=8 min=1184:max=43090:avg=41842:var=41906
THREADS=8 min=1184:max=43090:avg=41401:var=41906
THREADS=8 min=12598:max=43090:avg=41587:var=30492
THREADS=8 min=19076:max=43090:avg=42217:var=24014
THREADS=8 min=9136:max=43090:avg=42355:var=33954
THREADS=7 min=12260:max=43090:avg=41692:var=30830
THREADS=7 min=12489:max=43090:avg=42036:var=30601
THREADS=7 min=272:max=43090:avg=42520:var=42818
THREADS=7 min=272:max=43090:avg=42526:var=42818
THREADS=7 min=18847:max=43090:avg=42556:var=24243
THREADS=7 min=12026:max=43090:avg=42078:var=31064
THREADS=7 min=12026:max=43090:avg=41752:var=31064
THREADS=6 min=14357:max=43090:avg=42024:var=28733
THREADS=6 min=14357:max=43090:avg=42175:var=28733
THREADS=6 min=22221:max=43090:avg=42552:var=20869
THREADS=6 min=23168:max=43090:avg=42747:var=19922
THREADS=6 min=26899:max=43090:avg=42721:var=16191
THREADS=6 min=6890:max=43090:avg=42610:var=36200
THREADS=5 min=22566:max=43090:avg=42447:var=20524
THREADS=5 min=16706:max=43090:avg=42329:var=26384
THREADS=5 min=16706:max=43090:avg=42252:var=26384
THREADS=5 min=15030:max=43090:avg=42335:var=28060
THREADS=5 min=15030:max=43090:avg=42263:var=28060
THREADS=4 min=7988:max=43090:avg=42158:var=35102
THREADS=4 min=8031:max=43090:avg=42410:var=35059
THREADS=4 min=10691:max=43090:avg=42238:var=32399
THREADS=4 min=10691:max=43090:avg=41725:var=32399
THREADS=3 min=15163:max=43090:avg=42264:var=27927
THREADS=3 min=17850:max=43090:avg=42188:var=25240
THREADS=3 min=6638:max=43090:avg=41799:var=36452
THREADS=2 min=6497:max=43090:avg=41353:var=36593
THREADS=2 min=6497:max=43090:avg=41521:var=36593
So a very rough heuristic visual analysis tells me that I'd be best having 6 cores at most running my trading engine. Time to play with Octave...
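As a stop-gap before Octave, nearest-rank percentiles can be pulled straight from the per-quantum counts. A sketch — the sample data below is illustrative only, the real input being the count column from the ftq*counts.dat files:

```python
def percentiles(values, points=(0.01, 0.05, 0.50, 0.95, 0.99)):
    # Nearest-rank percentiles of the per-quantum counts; for jitter
    # analysis the tails matter far more than the mean.
    ordered = sorted(values)
    n = len(ordered)
    return {p: ordered[min(n - 1, int(p * n))] for p in points}

# Illustrative: 10 badly-disturbed quanta among 90 clean ones - the
# median looks fine while the 1st percentile exposes the disturbance.
pcts = percentiles([43090] * 90 + [457] * 10)
```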
Friday, February 11, 2011
HIFREQ 2011 Panel Discussion Input
Here's my input to the panel discussion for HIFREQ 2011.
• What are the different set ups and combinations for HFT architecture?
I have experience of four different architectures:
- traditional monolithic event queue and broadcast
- reflective memory, distributed processing
- dma, shared memory, multi-process and multicast
- trading engine on a card
I'd be interested in other approaches
• Is massive multicore or specialist silicon (FPGA, GPU etc.) the next frontier?
Multicore is attractive for a multi-strategy play, but it requires careful design to avoid data races and performance issues like TLB cache misses and memory barriers. FPGA has always been attractive for dealing with FIX and converting ASCII to binary (i.e. parsing). GPU has promise in the equities world where dynamic pricing and portfolio analysis are required. What's been widely overlooked is DSP. There are some very interesting things you can do with DSP.
• Which solutions provide maximum scalability, configuration, and customisation to ensure continual upgrade and development of your
systems to survive in tough technology race?
It has to be a message-oriented, multicast, pure layer 2 architecture with software routing.
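A minimal sketch of the publisher side of such a bus at the socket level — the group, port and TTL are illustrative, and a production bus adds framing, channels and reliability on top:

```python
import socket
import struct

GROUP, PORT = "239.1.1.1", 5000  # illustrative multicast group/port

def make_publisher(ttl=1):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    # TTL 1 keeps traffic on the local segment: delivery is handled
    # by layer 2 switching, with no IP routing between segments.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    return sock

pub = make_publisher()
# pub.sendto(b"tick|EURUSD|1.3412", (GROUP, PORT))
```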
• Managing microbursts: the art and engineering of data capacity management
Again, a high-performance messaging system combined with high-resolution time and accurate traffic analysis is a must. Combining this with knowledge of the underlying network hardware (in order to utilise multiple hardware queues on the switches), judicious use of QoS, and correct configuration of the messaging system to utilise multiple channels yields good results.
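Once you have high-resolution timestamps, the detection step itself is conceptually simple: bucket arrivals into fixed windows and flag outliers. A sketch — window size, threshold and timestamps are illustrative, and a real system works from hardware timestamps at far finer resolution:

```python
from collections import Counter

def microbursts(timestamps_ns, window_ns=1_000_000, threshold=3):
    # Bucket packet arrival timestamps (integer nanoseconds) into
    # fixed windows; any window busier than `threshold` is flagged
    # as a burst. Integer arithmetic avoids float bucketing errors.
    buckets = Counter(t // window_ns for t in timestamps_ns)
    return {b: n for b, n in buckets.items() if n > threshold}

# Illustrative: one packet per ms background, plus a 4-packet burst
# landing inside the 10th millisecond window.
ts = [i * 1_000_000 for i in range(20)]
ts += [10_200_000, 10_400_000, 10_600_000, 10_800_000]
bursts = microbursts(ts)
```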
• Taming the data torrent: conflating direct and aggregate market data feeds
Here's where an FPGA-enabled network card helps greatly: coalescing multiple keystations or A and B feeds, hashing messages and dropping duplicates, and translating ASCII to binary. Combine this with multicast for efficient data transport.
• Is asynchronous processing inevitable? What are the implications?
Asynchronous processing has been used in HFT for many years and is necessary for effective parallelisation. One pattern I use is an "asynchronous n-slot put-take connector", which joins different processes in a way that allows each process to utilise its full timeslice. The implication of not using it is latency...
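The connector is essentially a bounded, blocking hand-off. A minimal sketch of the idea using Python's standard library (the class name is mine; a production version would use shared memory and lock-free slots rather than a locking queue):

```python
import queue
import threading

class PutTakeConnector:
    """An asynchronous n-slot put-take connector: a bounded, blocking
    hand-off so producer and consumer each run their full timeslice
    instead of spinning on one another."""
    def __init__(self, slots):
        self._q = queue.Queue(maxsize=slots)
    def put(self, item):    # blocks only when all n slots are full
        self._q.put(item)
    def take(self):         # blocks only when all n slots are empty
        return self._q.get()

conn = PutTakeConnector(slots=4)
results = []

def consumer():
    for _ in range(8):
        results.append(conn.take())

t = threading.Thread(target=consumer)
t.start()
for i in range(8):          # producer fills slots asynchronously
    conn.put(i)
t.join()
print(results)              # [0, 1, 2, 3, 4, 5, 6, 7]
```

The n slots are what decouple the two sides: the producer only stalls when it is a full n items ahead.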
• What is next for complex event processing and tick databases? Will they be able to keep up?
CEP is a necessary evil; however, any strategy that uses it is almost impossible to debug.
With regards to tick databases, I fail to see why people store stuff in databases at all - all our market data is captured on an electrically connected secondary machine using HDF5 in date-ordered directories. It's then shipped up to a disk array overnight. Backtesting generally uses only three months' worth of data.
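The date-ordered directory layout is the important part. A minimal sketch of the idea (using flat binary files via struct rather than HDF5, purely to keep the example dependency-free - the real capture uses HDF5, and the record layout here is hypothetical):

```python
import struct
import tempfile
from pathlib import Path

RECORD = struct.Struct("<dqd")   # timestamp, size, price per tick

def capture_path(root, day, symbol):
    """Date-ordered layout: one directory per day, one file per symbol."""
    d = root / day
    d.mkdir(parents=True, exist_ok=True)
    return d / f"{symbol}.bin"

def append_ticks(path, ticks):
    with path.open("ab") as f:
        for t in ticks:
            f.write(RECORD.pack(*t))

def read_ticks(path):
    data = path.read_bytes()
    return [RECORD.unpack_from(data, i)
            for i in range(0, len(data), RECORD.size)]

root = Path(tempfile.mkdtemp())
p = capture_path(root, "2011-01-26", "EURUSD")
append_ticks(p, [(1.0, 1000000, 1.3042), (2.0, 500000, 1.3043)])
print(read_ticks(p))
```

Because the directories sort by date, shipping a day to the disk array and selecting three months for backtest are both plain filesystem operations - no database needed.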
• Surviving the technical glitch at high speed: designing robust architectures
The ability to run multiple strategies and services on a multicast message bus means recovery
from failure is straightforward.
Tuesday, February 08, 2011
High Frequency Trading World, Chicago
I've kindly been invited to talk at High Frequency Trading World, Chicago on June 27-29th 2011.
I suggested the following three talks. The real-time risk management one was chosen, as the host thought this would be of great interest to traders. I've implemented this for real, and gave a talk on Infosec Data Analytics and Visualisation which was well received, so I thought I'd extend it to transactional logging. The basic idea came from work I did with Ian on triple entry accounting [sic] and "Notelets".
Real-time risk management and regulatory compliance
- persisting transactions to the cloud
- non-repudiation, risk management and distributed regulation using "triple entry" transactional logging
- Market analytics using Hadoop
Next Generation Exchange Technology
- The transition to non-computational infra
- Neat tricks with FPGA, DSP and memristors
- Making the real virtual - multicast in software
- Affordable networks: VPLS, QoS and IGMP snooping
Trading Engine Technology
- Lockless design: avoiding data races
- Shared memory techniques, superpages and reflective memory
- RSS, recvmsg() and kernel bypass for fast data acquisition
- Real-time Linux, thread prioritisation techniques
- Tuning systems for HPC
- Cheap, high accuracy time using PTP
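The "triple entry" transactional logging above can be sketched as a hash-chained, append-only log: any third party holding a copy can detect after-the-fact tampering, which is the non-repudiation property the risk talk relies on. A toy illustration (the record layout is hypothetical; a real system would add digital signatures from each party):

```python
import hashlib
import json

def log_entry(prev_hash, record):
    """One append-only log entry, chained to its predecessor by hash
    so that tampering is evident to any third party holding a copy."""
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    return {"prev": prev_hash, "record": record, "hash": digest}

def verify_chain(entries, genesis="genesis"):
    prev = genesis
    for e in entries:
        body = json.dumps(e["record"], sort_keys=True)
        ok = (e["prev"] == prev and
              hashlib.sha256((prev + body).encode()).hexdigest() == e["hash"])
        if not ok:
            return False
        prev = e["hash"]
    return True

e1 = log_entry("genesis", {"trade": 1, "qty": 100})
e2 = log_entry(e1["hash"], {"trade": 2, "qty": 50})
print(verify_chain([e1, e2]))   # True
```

Persist such a chain to the cloud and a regulator can audit transactions without ever being able to rewrite them.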
Thursday, February 03, 2011
What to do with your FPGA Enabled Network Card
Here's what you can do with your new shiny 10GE network card with onboard FPGA:
- Port Forwarding: this allows you to copy an incoming data stream to another port so that you can have one server for execution only and one for data persistence.
- Data Filtering: transform and redirect data at 10GE speed, e.g. reformat XML to binary
- Port Forwarding to Multicast: forward a filtered or unfiltered data stream as a multicast stream
- BGP/IGMP Routing: save a fortune on hardware and a network stack traversal too!
- Port Forwarding to Multicast by topic: forward a filtered or unfiltered data stream as a multicast stream by topic
- Port Failover: If your server fails, the feed data can automatically be transferred to another port electrically.
- Timestamping of packets: at +-5 nanosecond resolution.
- Object Serialisation: data formatted to binary - no parsing.
- Data hashing: listening to the A and the B feeds? Implement a "group feed" on the card
- QOS marking: Packets can be marked with an appropriate quality of service to be expedited by the network.
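The "group feed" item deserves a software illustration: arbitrate the redundant A and B feeds by keeping whichever copy of each message is seen first and dropping the duplicate. A minimal sketch (identifying duplicates by a payload hash; real feeds would more likely use sequence numbers):

```python
import hashlib

def group_feed(*feeds):
    """Merge the redundant A and B feeds into one 'group feed': keep
    whichever copy of each message is seen first, drop the duplicate.
    Duplicates are identified here by a payload hash."""
    seen = set()
    for feed in feeds:   # in practice the feeds interleave by arrival
        for msg in feed:
            key = hashlib.sha256(msg).digest()
            if key not in seen:
                seen.add(key)
                yield msg

feed_a = [b"tick1", b"tick2", b"tick3"]
feed_b = [b"tick1", b"tick3", b"tick4"]   # B carried tick4; A dropped it
print(list(group_feed(feed_a, feed_b)))   # each tick exactly once
```

On the card this runs at line rate, so the host only ever sees the merged, de-duplicated stream.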
Wednesday, February 02, 2011
Buy-side Technology European Summit 2011
I'm moderating the panel at the above conference. Here are some notes I sent in:
Market structure: Confronting the issues around market fragmentation, the impact of HFT and globalization
The future is multi-venue, multi-asset fusion trading using global VPLS networks with QoS and high precision time.
It's a war - you have to engage in deep research and new techniques in order to win it. And you must know every layer of your technology - down to the wires. If you're not processing on the network card, you're too slow. Microtrading engines are effectively here.
Economic situation: New risks resulting from the financial landscape and the implications for technology
I predict a move to a dynamic cost of market data and trading, leading to a growth in liquidity, and the emergence of financial cryptography techniques to mitigate market risk - ideas such as Triple Entry Accounting, allowing third-party risk analytics.
Technological developments: What are the opportunities created by new technologies such as cloud computing and SaaS?
SaaS can be used for autonomic computing - dealing with failure by dynamically sourcing services from a reputation-based marketplace and paying for them with microcash.
It's also a great place to do regulatory transaction logging, audit and risk management in near-real time, and where enterprise-wide service level monitoring and security can take place, a la Loggly. I described how this works here.
New regulation: How are buy-side firms coping with the regulatory flood?
Largely ignoring it.
With regards to the 15:50 panel:
What I think the buy-side would like to see is transparency from liquidity providers. In particular, older technology is prone to "queue allocation" due to fan-in/fan-out techniques, which is opaque, can be seen as unfair, and is often a cause of friction. What would help is detailed performance stats (internal timings, distribution latencies) and new technology: multicast data distribution by topic in binary, and high speed dedicated trading connections straight to the matching engine infra rather than fan-in via multiple tiers.
Mitigating the cost of accessing multiple trading venues
Standard pricing and transparency would be a good start. How about dynamic pricing for market data and cost of trade? Combine this with a "reputation index" - i.e. the likelihood of being filled - and this would lead to a dynamic, fair market system.
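The "reputation index" could be as simple as a size-weighted fill ratio per venue. A toy sketch (the field names are hypothetical):

```python
def reputation_index(orders):
    """Size-weighted fill ratio: the observed likelihood of an order
    being filled at this venue."""
    sent = sum(o["size"] for o in orders)
    filled = sum(o["filled"] for o in orders)
    return filled / sent if sent else 0.0

venue = [{"size": 100, "filled": 100},   # fully filled
         {"size": 200, "filled": 50}]    # mostly unfilled
print(reputation_index(venue))           # 0.5
```

Publish this per venue alongside dynamic pricing and routing decisions become a straightforward cost/likelihood trade-off.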
Ensuring effective liquidity access
See above
Consolidated tape: What the buy-side really want
Profitability
How are the buy-side planning to prepare for new regulation
We need new thinking on how we achieve transparency whilst giving the regulators access to transactional data, otherwise we'll fall victim to centralisation, which would kill the markets. One good example would be logging transaction information to the cloud in a secure way for post-transactional analytics.
How equipped are the buy-side from a technology standpoint in the front, middle and back office? Which areas will see less or more investment?
This is an area which is widely neglected, and that neglect opens the door to risk.
Sunday, January 30, 2011
Mankoff Company 2nd Annual Ultra Low Latency
I've been kindly invited to speak at another HFT conference in London. The 2nd Annual Ultra Low Latency: Trading Opportunities and Development in FX and other asset classes run by the Mankoff Company to be held at the Grange Holborn, London on the 23rd March 2011.
Wednesday, January 26, 2011
Wireshark Remoting
This is a technique whereby feed handler data is captured on a remote engine and the capture is pulled back to your local workstation for easier analysis and inspection. For more information, see: http://wiki.wireshark.org/CaptureSetup/Pipes
Prerequisites.
1) You are using SSH keys to log in to engines;
2) Your ssh-agent is running (Ubuntu desktops) with your SSH keys added to the keyring.
Most Ubuntu desktops will run ssh-agent for you on login.
3) wireshark is installed on your workstation;
sudo aptitude install wireshark
4) tshark is installed on the engine;
sudo aptitude install tshark
5) You are in the 'wireshark' group on the engine,
and the relevant Linux capabilities are set up as per
http://wiki.wireshark.org/CaptureSetup/CapturePrivileges
A typical Bash command line looks like this (on your local workstation):
$ wireshark -k -i <(ssh hostname tshark -w- -p -i eth1 -f \'tcp portrange 4000-4010\')
Let's review exactly what these arguments mean.
For the local wireshark invocation:
The '-k' flag means 'start capturing immediately'.
The '-i' flag tells wireshark to get its input from a pipe.
The <(...) bash construct is process substitution, which runs 'tshark' remotely on matrab (in this example) to capture from interface eth1.
'ssh hostname tshark' runs tshark, the command-line version of wireshark, remotely on hostname.
The '-w-' flag means: write the raw PCAP format output captured by tshark to the standard output of the ssh remoted command, so that the local wireshark GUI on your workstation will pick up the feed.
The '-i' flag specifies the physical interface upon which to tap the traffic. This can only be a single logical Linux network interface.
If you need to capture traffic on more than one interface at once, you will need to configure a bridge interface. This is out of scope for this example. Typically this is only used for non-invasive captures using passive network taps.
Typically, the argument passed to the '-i' flag to tshark is chosen by using the Linux-specific command 'ip route get x.x.x.x' to find the physical Ethernet interface where a feed handler is running.
In this example, we used 'ip route get 10.69.14.16' to find the physical interface which matrab uses to reach an EBS Ai on the A-feed at Equinix LD4; eth1.
The '-p' flag tells tshark NOT to put the interface into 'promiscuous mode' -- a special hardware mode where a network adapter will pass traffic up the network stack, even if it isn't addressed to any of the *hardware addresses* the adapter is configured for.
Typically this is only needed for closer inspection, or if it's suspected that network addressing is incorrectly configured at either end of a feed. Promiscuous mode carries a penalty in that the system must then process every single packet physically received.
Finally, the '-f' flag specifies a PCAP style filter expression. The syntax for these expressions is NOT the same as the wireshark filter language; it can be found in the manual page for pcap-filter (man 7 pcap-filter).
In this example, we are asking only for all TCP traffic with port numbers between 4000-4010 in *either* the destination *or* source port fields.
PCAP filters are implemented inside the Linux kernel using a virtual CPU; just-in-time compilation converts the filters to x86 machine code for fast capture. The virtual machine, LPF, is based closely on the original Berkeley Packet Filter (BPF) design from BSD, with fixed-size instructions and 32-bit operands.
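To give a flavour of such a filter machine, here is a toy accumulator-based VM in the BPF spirit. The opcodes and encoding are invented for illustration - this is not the real BPF instruction set:

```python
# Toy opcodes: load a packet byte, compare-and-branch, return verdict.
LD, JEQ, RET = range(3)

def run_filter(prog, pkt):
    """Interpret a filter program against packet bytes.
    A nonzero return value accepts the packet; zero drops it."""
    acc, pc = 0, 0
    while pc < len(prog):
        op, k, jt, jf = prog[pc]
        if op == LD:       # acc <- pkt[k]
            acc = pkt[k]
            pc += 1
        elif op == JEQ:    # skip jt instructions if acc == k, else jf
            pc += 1 + (jt if acc == k else jf)
        elif op == RET:
            return k
    return 0

# "Accept packets whose first byte is 0x45" (IPv4, header length 5).
prog = [(LD, 0, 0, 0),
        (JEQ, 0x45, 0, 1),
        (RET, 1, 0, 0),
        (RET, 0, 0, 0)]
print(run_filter(prog, b"\x45\x00\x00"))   # 1 -> accepted
print(run_filter(prog, b"\x60\x00\x00"))   # 0 -> dropped
```

The real kernel compiles programs like this to native code, which is why an in-kernel filter such as 'tcp portrange 4000-4010' costs so little per packet.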
Friday, January 14, 2011
High Frequency Trading Conferences
I've been kindly asked to speak at two conferences this year. The first is HIFREQ 2011 on February the 24th, where I'm on the panel with Prof Dave Cliff talking about next generation tech for HFT.
I'm also speaking at The High Frequency Trading World Conference in Amsterdam on the 7-9th June 2011.
Things I'll be talking about:
- The latest technology for market data acquisition and parsing, both in hardware and techniques using parallelised software.
- The role of FPGA, DSP and Memristors
- The future of XML
- Shared and reflective memory for market fusion
- CPU instructions for vectorisation and IO parallelism
- CPU analysis tools
- Kernel techniques for high performance
- Global layer 2 networks and VPLS
- Multicast routing in software
- Differential trading
- High precision global time
- The importance of platform analytics
Wednesday, January 05, 2011
Google's Strategy
I listened to a series of predictions by Mark Anderson of the Strategic News Service several nights ago on Global Business during a bout of insomnia, and was intrigued by his prediction that Google "has lost its way", "doesn't know what business they are in" and is "a rudderless organisation".
With regard to what business Google is in, my analysis is that they are in the data business: collection, analysis, enrichment and dissemination. The fact that they make money out of ads is coincidental - and indeed predicated on this, for without it, targeting of ads would be impossible. The fact that they do it so successfully gives this strategy provenance.
With regards to strategic direction, I can see where Mark Anderson is coming from, having done an MBA at the OU and the Certificate in Company Direction at the IoD. He's talking about the classic model of top-down corporate strategy development (which is rudimentary and heuristic).
Strategic direction is usually driven by external marketing campaigns and salesmanship rather than the day-to-day expertise of the workforce, as it's much easier dealing with external consultants, vendors and pundits. You do pay for their opinion, after all, and you're not accountable to them; but most of all, they'll tell you what you want to hear, or sell you what their paymasters tell them to - and buy you a nice lunch too.
The fact that, in general, the workforce is ignored and their tacit knowledge is never made explicit, analysed, persisted and incorporated into strategic direction is the sad state of knowledge management. It is mostly due to fear of the workforce and their expertise, together with a fear of failure.
Perhaps Google is different (I don't know, as I've neither worked there nor talked to any of their employees) in that they are driven by the expertise of their developers and develop products which best fit the data they have. Strategy may be bottom-up, derived from the expertise of the many clever people who work there. Does this make them rudderless? Perhaps, but the "rudder" developed by most organisations is often generic at best and irrelevant in practice. I think I'd rather go with my perception of how Google develops strategy.
Thursday, November 25, 2010
Visualisation Tools
Elie has sent me links to two tools - Tulip and Circos. There's a video of Tulip in action here.
This is a useful tool for visualisation of trades and market data and has proved invaluable.
A variant of the Circos approach was used quite effectively to replay a time series of equities trades by sector to end users showing their P and L and risk. Today it's used for multi-venue market visualisation.
Wednesday, November 24, 2010
Layers are for Cakes - Not Software
This erudite quote comes from Performance Anti-patterns by Bart Smaalders:
SOFTWARE LAYERING
Many software developers become fond of using layering to provide various levels of abstraction in their software. While layering is useful to some extent, its incautious use significantly increases the stack data cache footprint, TLB (translation look-aside buffer) misses, and function call overhead. Furthermore, the data hiding often forces either the addition of too many arguments to function calls or the creation of new structures to hold sets of arguments. Once there are multiple users of a particular layer, modifications become more difficult and the performance trade-offs accumulate over time. A classic example of this problem is a portable application such as Mozilla using various window system toolkits; the various abstraction layers in both the application and the toolkits lead to rather spectacularly deep call stacks with even minor exercising of functionality. While this does produce a portable application, the performance implications are significant; this tension between abstraction and implementation efficiencies forces us to reevaluate our implementations periodically. In general, layers are for cakes, not for software.
Friday, November 05, 2010
Smart Almanac
My friend Elie has written this clever online application. Check it out: Smart Almanac is the first astrological application which computes favourable time frames for actions you may have to perform in your daily life - the perfect astrological assistant to organise your schedule hourly.
Quickly get a 400+ page analysis report that will tell you, hour by hour, when to act in your favour and when the planets support your action, given your natal chart and your current location on earth.
A large choice of questions among 9 themes is available.
Bring the results with you throughout the day: with the iCalendar option, synchronise with your mobile phone!
Monday, August 02, 2010
Infosec Data Analytics and Visualisation
I gave this talk at the IISYG last Friday. It was well received, thanks to my reading of Raffael Marty's excellent work and blogging on Loggly.
It's a nice idea, but the key to success is getting application developers to adopt the logging mechanism, and getting team leaders to understand transactions and ensure their teams use them. There were many questions, mainly about the security of sending your logs to a third party.