Can Backtesting and Execution Use the Same Program?
The intricacy of connecting to a brokerage or exchange, obtaining real-time market data, placing orders and receiving confirmations, updating portfolio positions, and so on is usually hidden by special-purpose execution platforms. Meanwhile, special-purpose backtesting systems usually come integrated with historical data. For many special-purpose trading platforms, the backtest program can be made identical to the live execution program by factoring the pure trading logic into a function, free of the specifics of where to place orders or how to retrieve data. Switching between backtesting and live execution mode is then just a matter of pressing a button that toggles between feeding in historical data and live market data.
This ease of switching between backtesting and live execution is more than just a convenience: It eliminates any possibility of discrepancies or errors in transcribing a backtest strategy into a live strategy, discrepancies that often plague strategies written in a general-purpose programming language, whether it is C++ or MATLAB. Just as importantly, it eliminates the possibility of look-ahead bias. As explained before, look-ahead bias means mistakenly incorporating future, unknowable information as part of the historical data input to the backtest engine. Special-purpose platforms feed historical market data into the trade-generating engine one tick or one bar at a time, just as they would feed in live market data. So there is no possibility that future information can be used as input. This is one major advantage of using a special-purpose trading platform.
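A minimal sketch of this separation might look as follows. The names `Bar`, `make_strategy`, and `run` are hypothetical and not taken from any particular platform, and the moving-average rule is a toy chosen only to show that the trading logic sees one bar at a time and never the bars that follow it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Bar:
    timestamp: str
    close: float


def make_strategy(window: int) -> Callable[[Bar], int]:
    """Return a pure trading-logic function: one bar in, target position out."""
    history = deque(maxlen=window)  # only bars already seen are stored

    def on_bar(bar: Bar) -> int:
        history.append(bar.close)
        if len(history) < window:
            return 0  # not enough data yet: stay flat
        average = sum(history) / window
        return 1 if bar.close > average else -1

    return on_bar


def run(feed: Iterable[Bar], on_bar: Callable[[Bar], int]) -> List[int]:
    """Drive the strategy one bar at a time; the feed may be historical or live."""
    positions = []
    for bar in feed:
        positions.append(on_bar(bar))
    return positions


# Backtest mode: the feed is simply a list of historical bars (made-up prices).
historical_feed = [
    Bar("2013-01-02", 100.0),
    Bar("2013-01-03", 101.0),
    Bar("2013-01-04", 99.0),
    Bar("2013-01-07", 102.0),
]
positions = run(historical_feed, make_strategy(window=2))
```

In live mode, the same `on_bar` function would be passed to `run` together with a generator that yields bars as they arrive from the broker, so nothing in the trading logic has to change.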
There is one more advantage in using a platform where the backtesting and live execution programs are one and the same: It enables true tick-based backtesting of high-frequency trading strategies. This is because most industrial-strength live execution programs are “event-driven”; that is, a trade is triggered by the arrival of a new tick, not the end of an arbitrary time bar. So if the input historical data is also tick-based, we can in theory backtest a high-frequency strategy that depends on the change of every tick or even every change in the order book. (I said “in theory” assuming that your hardware is powerful enough. Otherwise, see the discussion later in this chapter in the section “What Type of Asset Classes or Strategies Does the Platform Support?”) Of course, we can backtest tick-based strategies in MATLAB by feeding every tick into the program as well, though that is quite a cumbersome procedure.
If you are a competent programmer who prefers the flexibility of a general-purpose programming language, yet you want to use the same program for both backtesting and live trading because of the preceding considerations, you can still use the institutional-grade special-purpose platforms as IDEs, or you can use one of the many open-source IDEs available: Marketcetera, TradeLink, Algo-Trader, ActiveQuant. I call them IDEs, but they are more than just a trading strategy development environment: They come with libraries that deal with the nuts and bolts of connecting to and exchanging data with your broker, much as a special-purpose platform does. Many of them are also integrated with historical data, which is an important time saver. As an added bonus, these open-source IDEs are either free or quite low-cost compared to special-purpose platforms. I display in Table 1.2 the languages, markets, and brokers that they support. (FIX as a broker means that the system can directly access any execution venue via the FIX protocol, regardless of clearing broker.) I also indicate whether the IDE is tick-based (sometimes called event-driven or stream-based).
One should note that Table 1.2 compares features of open-source IDEs only. The institutional-grade special-purpose platforms typically have all of these features.
Asset Classes and Strategies:
While using a special-purpose platform for trading strategies has several important advantages described earlier, few but the most high-end of these platforms support all possible asset classes, including stocks, futures, currencies, and options. For example, the popular MetaTrader is for currency trading only. It is especially difficult for these platforms to trade strategies that involve arbitrage between different asset classes, such as between futures and stocks or currencies and futures. The open-source IDEs are better able to handle these situations. As Table 1.2 indicates, most IDEs can trade a variety of asset classes. But, as usual, the most flexible solution in this respect is a stand-alone program written outside of any IDE.
Beyond asset classes, many special-purpose platforms also place restrictions on the type of strategies that they support even within one asset class. Often, simple pairs trading strategies require special modules to handle. Most lower end platforms cannot handle common statistical arbitrage or portfolio trading strategies that involve many symbols. Open-source IDEs do not have such restrictions, and, of course, neither do stand-alone programs.
What about high(er)-frequency trading? What kind of platforms can support this demanding trading strategy? The surprising answer is that most platforms can handle the execution part of high-frequency trading without too much latency (as long as your strategy can tolerate latencies in the 1- to 10-millisecond range), and since special-purpose platforms as well as IDEs typically use the same program for both backtesting and execution, backtesting shouldn’t in theory be a problem either.
To understand why most platforms have no trouble handling high-frequency executions, we have to realize that most of the latency that needs to be overcome in high-frequency trading is due to live market data latency, or brokerage order confirmation latency.
1.) Live market data latency:
For your program to receive a new quote or trade price within 1 to 10 milliseconds (ms), you have to co-locate your program at the exchange or in your broker’s data center (see the box on colocation of trading programs); furthermore, you have to receive a direct data feed from the exchanges involved, not a consolidated data feed such as SIAC’s Consolidated Tape System (CTS). (For example, Interactive Brokers’ data feed only offers snapshots of market data every 250 ms.)
2.) Brokerage order confirmation latency:
If a strategy submits limit orders, it will depend on a timely order status confirmation before it can decide what to do next. For some retail brokerages, it can take up to six seconds between the execution of an order and your program receiving the execution confirmation.
Colocation of Trading Programs
The general term colocation can mean several ways of physically locating your trading program outside of your desktop computer. Stretching the definition a bit, it can mean installing your trading program in a cloud server or VPS (virtual private server) such as Amazon’s EC2, slicehost.com, or gogrid.com. The advantage of doing so is to avoid the power or Internet outages that are more likely to strike a private home or office than a commercial data center, with its backup power supply and redundant network connectivity. Co-locating in a cloud server does not necessarily shorten the time data take to travel from your brokerage or an exchange to your trading program, since many homes or offices are now equipped with a fiber-optic connection to their Internet service provider (e.g., Verizon’s FiOS in the United States, and Bell’s Fibe Internet in Canada). To verify whether co-locating in a VPS actually reduces this latency, you would need to conduct a test yourself by “pinging” your broker’s server to see what the average round trip time is. Certainly, if your VPS happens to be located physically close to your broker or exchange, and if they are directly connected to an Internet backbone, this latency will be smaller. (For example, pinging the Interactive Brokers’ quote server from my home desktop computer produces an average round trip time of about 55 ms, pinging the same server from Amazon’s EC2 takes about 25 ms, and pinging it from various VPSs located near Interactive Brokers takes about 16 to 34 ms.)
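If the `ping` utility is unavailable or ICMP is blocked, a rough substitute is to time a TCP handshake instead. The sketch below does that; note that it measures connection setup time rather than a true ICMP ping, and the function name `tcp_round_trip_ms` as well as the host and port in the comment are hypothetical placeholders, not any broker's actual address.

```python
import socket
import statistics
import time


def tcp_round_trip_ms(host: str, port: int, attempts: int = 5,
                      timeout: float = 2.0) -> float:
    """Average time in milliseconds to complete a TCP handshake with host:port."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection returns once the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Example call (placeholder host and port, to be replaced with a server
# you are entitled to connect to):
#     print(tcp_round_trip_ms("broker.example.com", 443))
```

Running the same measurement from your desktop and from a candidate VPS gives a like-for-like comparison of the two locations.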
I mention VPS only because many trading programs are not so computationally intensive as to require their own dedicated servers. But if they are, you can certainly upgrade to such services at many of the hosting companies familiar with the requirements of the financial trading industry such as Equinix and Telx, both of whom operate data centers in close proximity to the various exchanges.
If your server is already in a secure location (whether that is your office or a data center) and is immune to power outage, then all you need is a fast connection to your broker or the exchange. You can consider using an “extranet,” which is like the Internet but operated by a private company, which will guarantee a minimum communication speed. BT Radianz, Savvis, and TNS are examples of such companies. If you have a large budget, you can also ask these companies to build a dedicated communication line from your server to your broker or exchange as well.
The next step up in the colocation hierarchy is co-locating inside your brokerage’s data center, so that quotes or order confirmations generated by your broker are transmitted to your program via an internal network, unmolested by the noise and vagaries of the public Internet. Various brokers that cater to professional traders offer colocation services: examples are Lime Brokerage and FXCM. (Because of colocation, clients of Lime Brokerage can even receive direct data feeds from the NYSE at a relatively low rate, which, as I mentioned before, is faster than the consolidated SIAC CTS data feed.)
The ultimate colocation is, of course, situating your trading server at the exchange or ECN itself. This is likely to be an expensive proposition (except for forex ECNs), and useful only if you have a prime broker relationship, which allows you to have “sponsored access” to connect to the exchange without going through the broker’s infrastructure (Johnson, 2010). Such prime broker relationships can typically be established only if you can generate institutional-level commissions or have a multimillion-dollar account. The requirements as well as the expenses of establishing colocation are lower for forex prime brokers and ECNs. Most forex ECNs, including Currenex, EBS, FXall, and Hotspot, operate within large commercial data centers such as Equinix’s NY4 facility, and it is not too expensive to co-locate at that facility or sign up with a VPS that does.
Some traders have expressed concern that co-locating their trading programs on a remote server exposes them to possible theft of their intellectual property. The simplest way to eliminate this risk is to store only “executables” (binary computer code that looks like gibberish to humans) on these remote servers, and not the source code of your trading algorithm. (Even with a MATLAB program, you can convert all the .m files to .p files before loading them to the remote server.) Without the source code, no one can know the operating instructions for running the trading program, and no one will be foolish enough to risk capital on trading a black-box strategy about which they know little. For the truly paranoid, you can also require an ever-changing password that depends on the current time to start a program.
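One way such a time-dependent password could work is sketched below: a shared secret is combined with the current 30-second time window through an HMAC, so the valid password changes every window. This is only an illustration in the spirit of time-based one-time passwords, not a standards-compliant implementation, and the function names, the 30-second step, and the 8-character length are all assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def time_password(secret: bytes, now: Optional[float] = None,
                  step_seconds: int = 30) -> str:
    """Derive a short password that changes every `step_seconds` seconds."""
    current = time.time() if now is None else now
    counter = int(current // step_seconds)  # index of the current time window
    digest = hmac.new(secret, counter.to_bytes(8, "big"), hashlib.sha256)
    return digest.hexdigest()[:8]


def may_start(secret: bytes, supplied: str, now: Optional[float] = None) -> bool:
    """Return True only if `supplied` matches the password for the current window."""
    expected = time_password(secret, now)
    return hmac.compare_digest(expected, supplied)
```

The trading program would call `may_start` at launch and exit if it returns `False`, while the owner computes the current password on a separate trusted machine that holds the same secret.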
An order confirmation latency of several seconds virtually ensures that no high-frequency trading can be done. Even if your brokerage has order confirmation latency below 10 ms, or allows you direct market access to the exchanges so that you get your order status confirmation directly from them, you would still need to co-locate your program with your broker in the former case, or with the exchange in the latter case.
Practically any software program (other than Excel running with a VB macro) takes less than 10 ms to submit a new order after receiving the latest market data and order status updates, so software or hardware latency is usually not the bottleneck for high-frequency trading, unless you are using one program to monitor thousands of symbols. (Concerning this last point, see Box 1.3 for issues related to multithreading.) But backtesting a high-frequency strategy is entirely a different matter. To do this, you will be required to input many months of tick data (trades and quotes), maybe on many symbols, into the backtesting platform. Worse, sometimes you have to input level 2 quotes, too.
Multithreading and High-Frequency Trading of Multiple Symbols
Multithreading for a trading platform means that it can respond to multiple events (usually the arrival of a new tick) simultaneously. This is particularly important if the program trades multiple symbols simultaneously, which is often the case for a stock-trading program. You certainly don’t want your buy order for AAPL to be delayed just because the program is deciding whether to sell BBRY! If you write your own stand-alone trading program using a modern programming language such as Java or Python, you won’t have any problem with multithreading because this ability is native to such languages. However, if you use MATLAB, you will need to purchase the Parallel Computing Toolbox as well; otherwise, there is no multithreading. (Even if you purchase that Toolbox, you are limited to 12 independent threads, hardly enough to trade 500 stocks simultaneously!) But do not confuse the lack of multithreading in MATLAB with the “loss of ticks.” If you write two “listeners,” A and B, in MATLAB to receive tick data from two separate symbols, the fact that listener A is busy processing a tick-triggered event doesn’t mean that listener B is “deaf.” Once listener A has finished processing, listener B will start to process those tick events that it has received while A was busy, with no lost ticks (Kuznetsov, 2010).
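The distinction between "no multithreading" and "lost ticks" can be illustrated with a queue. The sketch below is a Python analogy, not a model of MATLAB's actual listener mechanism: ticks for both symbols arrive faster than a single worker can handle them, yet each tick waits in the queue and is eventually processed. The prices and the 10 ms processing delay are made up.

```python
import queue
import threading
import time

events = queue.Queue()  # ticks wait here while the handler is busy
processed = []


def handle_tick(symbol: str, price: float) -> None:
    time.sleep(0.01)  # stand-in for slow strategy logic
    processed.append((symbol, price))


def worker() -> None:
    """Single-threaded event loop: ticks are handled strictly one at a time."""
    while True:
        item = events.get()
        if item is None:  # sentinel: no more ticks will arrive
            break
        handle_tick(*item)


thread = threading.Thread(target=worker)
thread.start()

# Ticks for both symbols arrive much faster than they can be processed.
for i in range(5):
    events.put(("AAPL", 100.0 + i))
    events.put(("BBRY", 10.0 + i))
events.put(None)
thread.join()

print(len(processed), "ticks processed")
```

Every tick is handled in arrival order, only later than it would be with one thread per symbol, which is exactly the delay the box warns about.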
The sheer quantity of tick data will overwhelm the memory of most machines if it is not handled in special ways (such as using parallel computing algorithms). Most special-purpose backtesting platforms are not designed to be especially intelligent when handling this quantity of data, and most of them are not equipped at all to backtest data with all of the bid/ask/last tick prices (and sizes), let alone level 2 quotes. So backtesting a high-frequency strategy usually requires that you write your own stand-alone program with special customization. Actually, backtesting a high-frequency strategy may not tell you much about its real-life profitability anyway because of the Heisenberg uncertainty principle that I mentioned before.
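One simple customization in a stand-alone program is to stream ticks from disk one row at a time instead of loading the whole history into memory. This is a different technique from the parallel computing approach mentioned above, and the sketch below assumes a hypothetical CSV layout with `timestamp`, `symbol`, `price`, and `size` columns and made-up sample values.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple

Tick = Tuple[str, str, float, int]


def stream_ticks(source: TextIO) -> Iterator[Tick]:
    """Yield (timestamp, symbol, price, size) one row at a time."""
    for row in csv.DictReader(source):
        yield row["timestamp"], row["symbol"], float(row["price"]), int(row["size"])


def vwap(ticks: Iterable[Tick]) -> float:
    """Volume-weighted average price in a single pass with constant memory."""
    notional = 0.0
    volume = 0
    for _, _, price, size in ticks:
        notional += price * size
        volume += size
    return notional / volume if volume else float("nan")


# A tiny in-memory stand-in for a tick file that would be far too large to load.
sample = io.StringIO(
    "timestamp,symbol,price,size\n"
    "09:30:00.001,AAPL,100.0,200\n"
    "09:30:00.004,AAPL,101.0,100\n"
    "09:30:00.009,AAPL,102.0,100\n"
)
result = vwap(stream_ticks(sample))
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` object, and memory use stays flat regardless of how many months of ticks the file holds.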
Besides high-frequency trading, news-driven trading also trips up all but the most costly special-purpose platforms. By definition, news-driven trading requires a machine-readable news feed as input. Most special-purpose platforms and most open-source IDEs lack this capability. Two noteworthy exceptions are Progress Apama, which incorporates both the Reuters and Dow Jones machine-readable news feeds, and Deltix, which integrates RavenPack’s News Sentiment data feed. Among the IDEs, Marketcetera offers a newsfeed from benzinga.com, though it is unlikely to be as fast as those of Bloomberg, Dow Jones, and Reuters. If you are writing your own stand-alone trading program, you can either connect to these news feeds through the news provider’s API or read a news XML file that is periodically ftp’d to your hard drive. For example, both Thomson Reuters and Dow Jones have made their machine-readable news available through an API. If you trade on news at high frequency, the more costly API option is a necessity. Otherwise, vendors such as Newsware offer much more affordable choices.
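For the file-based route, reading such an XML file takes only a few lines in a stand-alone program. The sketch below uses an entirely hypothetical schema (a `news` root with `item` elements carrying `symbol` and `time` attributes and a `headline` child); real providers define their own formats, so the element and attribute names would have to be adapted to the vendor's documentation.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical news file content; not any vendor's real format.
SAMPLE_NEWS = """<news>
  <item symbol="AAPL" time="2013-01-02T09:30:01">
    <headline>Example headline one</headline>
  </item>
  <item symbol="BBRY" time="2013-01-02T09:30:05">
    <headline>Example headline two</headline>
  </item>
</news>"""


def parse_news(xml_text: str) -> List[Tuple[Optional[str], Optional[str], str]]:
    """Return (time, symbol, headline) tuples from the hypothetical schema above."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("item"):
        headline = item.findtext("headline", default="")
        items.append((item.get("time"), item.get("symbol"), headline))
    return items


news = parse_news(SAMPLE_NEWS)
```

A trading program would re-read the file whenever it is refreshed and pass each new item to the strategy as one more event alongside the price ticks.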