Keywords : Market microstructure, machine learning, high-frequency trading, electronic exchanges, alternative exchanges, dark pools, limit order book.
Technological advances have strongly influenced modern financial markets, most visibly in the volume of information generated over very short time spans. Although the application of machine learning to financial markets and their prediction has been studied extensively, its application to market microstructure data remains rare. The large quantity of high-frequency data has driven researchers to explore data at finer granularity and to examine the dynamics of price formation.
Market microstructure is the study of how financial markets operate, in particular how trading decisions are made and how prices are discovered. Features commonly derived from microstructure data include:
The time between trades, which is generally an indicator of trading intensity;
Volatility, which can signal good or bad trading conditions, since high volatility may make the market state unsuitable for trading;
Volume, which can correlate directly with trade duration, since it may reflect informed trading rather than ordinary high-volume trading activity;
Trade duration: high trading activity is claimed to be associated with a higher price impact of trades and faster price adjustment to trade-related events, while slower trading may suggest informed single entities.
Whilst several other features are available, most are instrument-specific and require substantial domain knowledge. It is usually important to tailor and evaluate one's features to suit the specific scenario at hand.
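As a minimal sketch, three of the features listed above can be computed from a list of trades as follows. The trade representation (timestamp in seconds, price, volume), the use of the square root of summed squared log returns as the volatility measure, and the function name are assumptions made for illustration, not definitions taken from the cited work.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical trade record: (timestamp in seconds, price, volume)
Trade = Tuple[float, float, int]


def basic_features(trades: List[Trade]) -> Dict[str, float]:
    """Compute simple microstructure features; needs at least two trades."""
    times = [t for t, _, _ in trades]
    prices = [p for _, p, _ in trades]

    # Time elapsed between consecutive trades (trading intensity proxy).
    durations = [b - a for a, b in zip(times, times[1:])]
    # Log returns between consecutive trade prices.
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

    return {
        "mean_time_between_trades": sum(durations) / len(durations),
        # Assumed volatility measure: sqrt of the sum of squared log returns.
        "realized_volatility": math.sqrt(sum(r * r for r in log_returns)),
        "total_volume": float(sum(v for _, _, v in trades)),
    }


trades = [(0.0, 100.0, 10), (1.0, 100.2, 20), (3.0, 100.1, 30)]
print(basic_features(trades))
```

In practice these quantities would be computed over rolling windows and tuned to the instrument, in line with the point about tailoring features to the scenario.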
One important scenario to consider when working with prices is the aggressiveness of buyers and sellers. In an order book, a trade occurs whenever a bid and an ask order are matched; however, each trade is initiated by only one party. The tick rule is used to determine which side initiated the trade.
A buy-initiated trade is labeled 1 and a sell-initiated trade -1. The logic is as follows: an initial label l is assigned an arbitrary value of 1. If a trade occurs and the price change is positive, then l = 1; if the price change is negative, then l = -1; and if there is no price change, l is inherited from the previous trade. This rule has been shown to identify the aggressor with a high degree of accuracy.
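The tick rule can be sketched in a few lines. This version follows the common convention that an uptick is labeled 1, a downtick -1, and a trade with no price change keeps the previous label; the function name and the sample prices are illustrative assumptions.

```python
from typing import List, Sequence


def tick_rule(prices: Sequence[float], initial_label: int = 1) -> List[int]:
    """Label each trade after the first as buy- (1) or sell-initiated (-1)."""
    labels: List[int] = []
    label = initial_label  # arbitrary starting value
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            label = 1    # uptick: buyer was the aggressor
        elif current < previous:
            label = -1   # downtick: seller was the aggressor
        # No price change: keep the label of the previous trade.
        labels.append(label)
    return labels


prices = [100.0, 100.5, 100.5, 100.2, 100.2, 100.3]
print(tick_rule(prices))  # [1, 1, -1, -1, 1]
```

The output has one label fewer than there are prices, because the first trade has no preceding price to compare against.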
Among the other challenges posed by microstructure data are scalability and interpretation. As an illustration, a single day of microstructure data for a very liquid stock such as Amazon or Apple is measured in gigabytes. Storing this kind of data over a long period requires compression and substantial disk space; even then, processing it efficiently usually means streaming through the data and uncompressing only small amounts at a time.
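The streaming approach described above can be sketched with Python's standard gzip and csv modules. The file layout (a gzip-compressed CSV with a header row and a `volume` column) and the function names are assumptions made for illustration; real market data feeds typically use other formats.

```python
import csv
import gzip
from typing import Dict, Iterator


def stream_trades(path: str) -> Iterator[Dict[str, str]]:
    """Yield one row at a time from a gzip-compressed CSV file.

    The file is decompressed incrementally as it is read, so memory
    use stays small regardless of the size of the file on disk.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            yield row


def total_volume(path: str, volume_column: str = "volume") -> int:
    """Aggregate traded volume without loading the whole file."""
    return sum(int(row[volume_column]) for row in stream_trades(path))
```

Because `stream_trades` is a generator, any aggregate or feature that can be updated row by row can be computed in a single pass over the compressed file.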
The challenge lies more in the interpretation of microstructure data than in its processing. In the language of machine learning, what “features” or variables can we extract from this extremely granular, lower-level data that might be useful in building predictive models for the trading problem at hand?
Reinforcement Learning for Optimized Trade Execution
Reinforcement learning (RL) is designed for learning dynamic, state-based policies from data, and it has been used to enhance existing analytical solutions for optimal trade execution with elements of market microstructure. Applying RL to optimal execution has been proposed in several papers [12, 13, 14]. Given a volume to trade, a fixed time horizon and discrete trading periods, the aim is to adapt a given volume trajectory so that it responds dynamically to favourable or unfavourable conditions during real-time execution, thereby reducing the overall cost of trading.
These results demonstrate the potential of machine learning for problems of pure execution. Rather than only seeking to reduce the cost of executing a given trade, one can also consider models that themselves decide when and how to trade profitably, for alpha generation purposes.
Predicting Price Movement from Order Book State
In the stock market, trading activity is managed through the limit order book, the set of buy and sell orders placed by traders at a variety of price points. Its evolution also explains the price fluctuations seen during the trading day.
For instance, an influx of market sell orders would quickly exhaust the quantity available at the best bid price, thereby exposing the next bid layer, which becomes the new best bid. This lowers the stock price. The information contained in the deeper layers of the limit order book is thus continuously exposed during the trading session. In recent years, the development of electronic trading systems has made the study of deeper layers more practical and has provided useful information for predicting market price movements.
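The mechanism in this example, a market sell order consuming the best bid and exposing the next layer, can be illustrated with a toy order book. The representation of the bid side as a list of (price, quantity) levels and the function name are assumptions for this sketch; real matching engines also handle time priority, order types and the ask side.

```python
from typing import List, Tuple

Level = Tuple[float, int]  # (price, quantity available at that price)


def execute_market_sell(
    bids: List[Level], quantity: int
) -> Tuple[List[Level], List[Level]]:
    """Match a market sell against bids sorted best (highest) price first.

    Returns (fills, remaining_bids).
    """
    fills: List[Level] = []
    remaining: List[Level] = []
    left = quantity
    for price, available in bids:
        traded = min(left, available)
        if traded > 0:
            fills.append((price, traded))
            left -= traded
        if available > traded:
            # Part of this level survives and stays in the book.
            remaining.append((price, available - traded))
    return fills, remaining


bids = [(100.0, 300), (99.9, 500), (99.8, 400)]
fills, remaining = execute_market_sell(bids, 450)
print(fills)      # [(100.0, 300), (99.9, 150)]
print(remaining)  # [(99.9, 350), (99.8, 400)]
```

After the order, the best bid has dropped from 100.0 to 99.9: the first layer was exhausted and the second layer, previously hidden behind it, now sets the price.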
J. Doering et al. developed and evaluated a convolutional neural network (CNN) for financial forecasting. Their model was trained on a full limit order book dataset covering 2007-2008, collected from the London Stock Exchange (LSE). Initial results indicated that convolutional networks perform reasonably well on this task and extract interesting microstructure patterns that are in line with previous theoretical findings. The work also demonstrated a new approach to applying modern deep learning methods to the analysis of market microstructure behaviour.
Machine Learning & Smart Order Routing in Dark Pools
The studies mentioned so far applied machine learning to trading problems arising in relatively long-standing exchanges, where microstructure data have been available for some time. Machine learning can also be applied to emerging venues, although the data available there are less voluminous. To illustrate this, we describe the use of a machine learning approach to the problem of Smart Order Routing (SOR) in dark pools.
Following the Reg NMS (US) and MiFID (Europe) regulations, the trading landscape has undergone many changes. New trading venues emerged to complement the trading capacity of primary markets such as the NASDAQ and the NYSE in the US, or EURONEXT, the London Stock Exchange and Xetra in Europe. Such alternative venues are called Electronic Communication Networks (ECNs) in the US and Multilateral Trading Facilities (MTFs) in Europe.
Fees or rebates and liquidity are the main differences between these trading venues. Because of liquidity constraints, trading firms use SOR to split large orders across several trading destinations and optimize their execution. An SOR is a system dedicated to splitting orders between trading destinations.
Figure 1: Machine Learning Smart Order Router
M. Kearns et al. studied the application of machine learning to the problem of SOR across a variety of dark pools, in an attempt to maximize execution rates. The number of shares to execute is given exogenously and must be split across venues. The basic challenge is that, for a given security at a given time, different dark pools may have different available liquidity, which calls for an adaptive algorithm that can split a large order across numerous pools to maximize its execution. They developed a model that allows a different liquidity distribution for each venue, and a learning algorithm that estimates this model in the service of maximizing the fraction of filled volume per step.
A key limitation of dark pool microstructure data is the presence of censoring: if we place an order to buy (say) 100 shares of Amazon and only 50 are executed, we know that exactly 50 were available; but if all 100 shares are filled, more shares may have been available for trading. Their machine learning approach to this problem adapted a classical method from statistics known as the Kaplan-Meier estimator, in combination with a greedy optimization algorithm.
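To make the censoring idea concrete, the sketch below implements the textbook Kaplan-Meier product-limit estimator on fill data from a single venue. It is not the authors' adapted version, and the greedy allocation step is not shown. The observation format (shares filled, plus a flag that is true when the whole order was filled and the observation is therefore censored) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical observation: (shares filled, censored?)
# censored=True means the full order was filled, so the true
# available liquidity was at least that many shares.
Observation = Tuple[int, bool]


def kaplan_meier(observations: List[Observation]) -> Dict[int, float]:
    """Estimate P(available liquidity > x) at each uncensored fill size x."""
    survival = 1.0
    curve: Dict[int, float] = {}
    # Only uncensored observations reveal the exact liquidity.
    event_sizes = sorted({f for f, censored in observations if not censored})
    for size in event_sizes:
        # Observations whose liquidity is known to be at least `size`.
        at_risk = sum(1 for f, _ in observations if f >= size)
        # Observations where liquidity was exactly `size`.
        events = sum(1 for f, c in observations if f == size and not c)
        survival *= 1.0 - events / at_risk
        curve[size] = survival
    return curve


# Orders of 100 shares each: three partial fills, two complete fills.
fills = [(50, False), (100, True), (80, False), (100, True), (30, False)]
for size, prob in kaplan_meier(fills).items():
    print(f"P(liquidity > {size}) is approximately {prob:.2f}")
```

The fully filled orders never count as events, but they do stay in the at-risk set at every smaller size. This is how the estimator uses the information that at least 100 shares were available, rather than discarding those observations or treating 100 as the exact liquidity.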
Advanced mathematical tools and extensive amounts of data are required to understand the modern aspects of market microstructure. In this article, we discussed the application of machine learning to microstructure research, described different trading venues, and explained how orders are executed, routed and divided across different pools.