import React from 'react';

function article_1() {
    return {
        date: "1 Aug 2024",
        title: "Pair Trading in KDB+",
        description:
            "Building a market-neutral strategy that simultaneously takes long and short positions in two cointegrated securities when their price relationship temporarily diverges, using a state space model to track that relationship.",
        keywords: [
            "ETF Arbitrage",
            "Pair Trading KDB+",
            "Cointegration KDB+",
            "Kalman Filter KDB+",
        ],
        style: `
                .article-content {
                    display: flex;
                    flex-direction: column;
                }

                .randImage {
                    align-self: center;
                    outline: 2px solid red;
                    max-width: 80%;
                }
                
                ul,li { 
                    list-style-type: none;
                    list-style-position:inside;
                    margin:0;
                    padding:0; 
                }

                .code-block {
                    display: inline-block;
                    min-width: 100%;
                    background-color: #F8F6F0;
                    border: 1px solid #e0e0e0;
                    border-radius: 4px;
                    padding: 15px;
                    font-family: monospace;
                    white-space: pre;
                    overflow-x: scroll;
                }
                `,
        body: (
            <React.Fragment>
                <div className="article-content">
                    <div className="paragraph">
                        In this article I implement a pair trading strategy: a well-known market-neutral strategy that exploits the relationship between two assets whose prices typically move together. I build it as a generalizable real-time implementation in Kdb+/q, and I also provide R code that mimics the core components of the strategy. The main goal is to keep the code as simple and efficient as possible, for the sake of both performance and maintainability.
                    </div>
                    <h3>Main Steps of Pair Trading Strategy</h3>
                    <ul>
                        <li>1. Identifying cointegrated securities</li>
                        <li>2. Implementing a (dynamic) model to calculate spreads</li>
                        <li>3. Visualizing the approach in real time, executing trades when a signal arises, and halting trading if there is a structural break in the cointegration relationship.</li>
                    </ul>

                    <h3>Kdb+/q Implementation: Tick Architecture</h3>
                    <p>For the Kdb+/q implementation, we will rely on the tick architecture, a series of interconnected q processes for handling high-frequency trading data efficiently. Key components include:</p>
                    <ul>
                        <li><strong>- Tickerplant (TP):</strong> Responsible for receiving and timestamping incoming data, and then broadcasting it to other components such as the RDB in a timely manner.</li>
                        <li><strong>- Real-time database (RDB):</strong> Stores data for quick access and analysis.</li>
                        <li><strong>- Historical database (HDB):</strong> Archives older data for long-term storage and analysis.</li>
                        <li><strong>- Feed handler:</strong> Interfaces with external data sources, enabling TP to receive accurate up-to-date information.</li>
                    </ul>

                    <p>
                        This simple architecture provides fast access to data, which makes it well suited to high-frequency trading applications such as the one we will build. We will also introduce the additional components shown in the diagram below, first developed by Alexander Unterrainer and extended by the Habla Blog authors and contributors: the Model Server (MS), Real-Time Pairs Trading (RPT), and KX Dashboard. Each is introduced as needed.
                    </p>
                    <img
                        src="https://www.habla.dev/blog/assets/2024/07/02/general-architecture-rpt.png"
                        alt="Tick Architecture"
                        className="randImage"
                    />

                    <h3>Implementation Process</h3>
                    <p>
                        We now move on to the implementation. The first step is to identify cointegrated assets, which means querying the HDB to find the asset pairs that have historically satisfied this statistical relationship over the last x days. We use interprocess communication (IPC) to send the query to the HDB as follows:
                    </p>

                    <div className="code-block">
                    <div className="paragraph">//Get 252 days worth of intraday data from HDB</div>
                    <div className="paragraph">prices: &#123;select sym, time, date, price, volume from trade.table where date within(.z.d-x;.z.d),</div>
                    <div className="paragraph">         time within(09:30:00.000;16:00:00.000), listing_mkt in `TSE`AQL&#125;;</div>
                    <div className="paragraph">//Open a handle to the HDB on localhost, using the port passed as the first command-line argument</div>
                    <div className="paragraph">hdb:hopen `$"::",.z.x 0;</div>
                    <div className="paragraph">t:hdb(prices;252);</div>
                    </div>

                    <p>
                        In the first statement we use q-SQL to assign to the variable prices a function that retrieves specific columns from the HDB with some filters: only regular-session intraday data, only equities listed on specific venues, and only the last x days. Next, we read the first command-line argument (.z.x 0), which specifies the port on which the HDB is expected to be listening, and build a connection handle from it.
                        Finally, we send the prices function together with its argument, which defines the date range; here that is 252. Because .z.d-x subtracts calendar days, 252 covers somewhat less than a full year of trading sessions, so widen the window if a full trading year is needed.
                        Executing these statements requests a table from the HDB over IPC and saves it in the variable t. It contains the intraday pricing data for the universe of equities listed on our pre-specified venues over that window.
                    </p>
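                    <p>
                        As a hedged variation on the request above, the sketch below opens the connection inside a trap and closes it once the data has arrived. It assumes the HDB runs on localhost on the port given as the first command-line argument; the handle name h and the error message are illustrative and not part of the original code.
                    </p>

                    <div className="code-block">
                    <div className="paragraph">//Sketch: trap connection failures so a missing HDB produces a readable error</div>
                    <div className="paragraph">h:@[hopen;`$"::",.z.x 0;&#123;'"cannot connect to HDB: ",x&#125;];</div>
                    <div className="paragraph">//Synchronous request: blocks until the HDB returns the table</div>
                    <div className="paragraph">t:h(prices;252);</div>
                    <div className="paragraph">//Release the handle once the data has arrived</div>
                    <div className="paragraph">hclose h;</div>
                    </div>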

                    <p>Now we will proceed to identify all the cointegrated pairs in our universe. To ensure a thorough and robust analysis, we will take the following approach. First, we will gather intraday price data for each ticker, ensuring we have data points for every second of the trading day. For this analysis, we will use the Volume Weighted Average Price (VWAP) as our primary metric, as it provides a more accurate representation of the true trading price throughout the day.
                    Then, we will construct a correlation matrix encompassing all potential pairs in our universe. This matrix will allow us to identify pairs with correlation coefficients above a predetermined threshold. This initial screening based on correlation is just the first step in our pair selection process. </p>
                    
                    <div className="code-block">
                    <div className="paragraph">// Defining General Pivot Function from KX </div>
                    <div className="paragraph">piv:&#123;[t;k;p;v;f;g]</div>
                    <div className="paragraph">    v:(),v;</div>
                    <div className="paragraph">    G:group flip k!(t:.Q.v t)k;</div>
                    <div className="paragraph">    F:group flip p!t p;</div>
                    <div className="paragraph">    count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze</div>
                    <div className="paragraph">        &#123;[i;j;k;x;y]</div>
                    <div className="paragraph">            a:count[x]#x 0N;</div>
                    <div className="paragraph">            a[y]:x y;</div>
                    <div className="paragraph">            b:count[x]#0b;</div>
                    <div className="paragraph">            b[y]:1b;</div>
                    <div className="paragraph">            c:a i;</div>
                    <div className="paragraph">            c[k]:first'[a[j]@'where'[b j]];</div>
                    <div className="paragraph">            c&#125;[I[;0];I J;J:where 1&lt;&gt;count'[I:value G]]/:\:[t v;value F]&#125;;</div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">//pivoting wider table and filling null price values with preceding non-nulls</div>
                    <div className="paragraph">data: fills piv[`date`time xasc 0!select price:volume wavg price by sym, 1 xbar `second$time, </div>
                    <div className="paragraph">     date from t;(), `date`time;(), `sym; `price;&#123;y[;0]&#125;;&#123;x,z&#125;];</div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">//correlation matrix between tickers using closing prices</div>
                    <div className="paragraph">show corr_matrix: x cor/:\:x:flip delete date,time from 0!select from data where time=(last;time) fby date;</div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">//generate table with pairwise correlations</div>
                    <div className="paragraph">pairwise_corr: flip (`pair1`pair2`correlation)!(raze &#123;(count[value corr_matrix])#x&#125; each (cols value corr_matrix);</div>
                    <div className="paragraph">     (`int$(count[value corr_matrix] xexp 2))#cols value corr_matrix;</div>
                    <div className="paragraph">     raze &#123;raze x&#125; each (value corr_matrix));</div>
                    </div>

                    <p>We will now subject the resulting subset of highly correlated pairs to a number of statistical tests, namely:</p>
                    <ul>
                        <li><strong>- Augmented Dickey-Fuller (ADF) tests for stationarity.</strong></li>
                        <li><strong>- Hurst Exponent for strong mean-reversion tendencies.</strong></li>
                        <li><strong>- Jarque-Bera test to ensure residuals are normally distributed.</strong></li>
                    </ul>
                    <p>There are many more tests we could apply, but I believe this set is enough to ensure that the pairs we ultimately select are not only highly correlated but also exhibit the stable, mean-reverting behavior that is essential for successful pair trading.</p>

                    <div className="code-block">
                    <div className="paragraph">&#47;&#47;Stationarity test based on residulas of OLS Regression (Augmented Dickey-Fuller)</div>
                    <div className="paragraph">adf_test:&#123;[t;pairs]</div>
                    <div className="paragraph">     pair1: first pairs;pair2: last pairs;</div>
                    <div className="paragraph">     X: ?[t[pair1] &gt; t[pair2];t[pair2];t[pair1]]; </div>
                    <div className="paragraph">     y: ?[X=t[pair1];t[pair2];t[pair1]];</div>
                    <div className="paragraph">     mdl: .ml.stats.OLS.fit[log X; log y; 0b]; </div>
                    <div className="paragraph">     spread: (log y) - (first mdl[`modelInfo]`coef)*log X;</div>
                    <div className="paragraph">     :(value .ml.ts.stationarity spread)[`stationary]=1b&#125;;</div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">&#47;&#47;Apply test to pairs that have a correlation &gt;= 0.7</div>
                    <div className="paragraph">results_adf: adf_test[delete date,time from 0!select from data;] peach flip (select from  pairwise_corr where pair1&lt;&gt;pair2, 0.7 &gt;=  abs correlation)[`pair1`pair2];</div>
                    <div className="paragraph">&#47;&#47;Get unique combinations of cointegrated pairs according to the adf_test</div>
                    <div className="paragraph">stationary_pairs: distinct &#123;asc x&#125; each (flip pairwise_corr[`pair1`pair2]) where raze adf;</div>
                    </div>
                    <p>The subset of pairs that passed our first test will now be subject to the Hurst Exponent test. Here we look for pairs that show mean-reverting behaviour &#40;i.e., a Hurst Exponent strictly less than 0.5&#41;. Pairs that don't satisfy this criterion are discarded.</p>

                    <div className="code-block">
                    <div className="paragraph">// Hurst Exponent estimation using RS analysis with log price series in ascending order of time </div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">// ts: list of log prices in asc order of time </div>
                    <div className="paragraph">// ms: int for minimum size of a range </div>
                    <div className="paragraph">hurst:&#123;[ts;ms]</div>
                    <div className="paragraph">     // calculate number of ranges, and lags for each range </div>
                    <div className="paragraph">     nrange: floor (count ts) % ms;</div>
                    <div className="paragraph">     ranges: 1 + til nrange;</div>
                    <div className="paragraph">     lags: floor (count ts) % ranges;</div>
                    <div className="paragraph">     // calculate sublists for each range </div>
                    <div className="paragraph">     // (q lambdas do not capture locals, so ts is passed in by projection) </div>
                    <div className="paragraph">     subts: nsublist[ts]'[lags;ranges];</div>
                    <div className="paragraph">     // calculate all RS for all sub ranges </div>
                    <div className="paragraph">     rss: &#123;rs each x&#125; each subts;</div>
                    <div className="paragraph">     // calculate RS(averaged over all sub ranges) for each range</div>
                    <div className="paragraph">     RS: avg each rss;</div>
                    <div className="paragraph">     // calculate the best-fit line slope </div>
                    <div className="paragraph">     :slope [2 xlog lags;2 xlog RS] &#125;;</div>

                    <div className="paragraph"> </div>
                    <div className="paragraph">// Helper Functions </div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">// Rescaled range (R/S) of a single chunk</div>
                    <div className="paragraph">rs: &#123; [ts];</div>
                    <div className="paragraph">     // calculate mean</div>
                    <div className="paragraph">     mean: sum ts % count ts;</div>
                    <div className="paragraph">     //calculate mean adjusted ts</div>
                    <div className="paragraph">     mdts: ts - mean;</div>
                    <div className="paragraph">     // calculate cumulative deviate series based on mdts</div>
                    <div className="paragraph">     z: sums mdts;</div>
                    <div className="paragraph">     // calculate range series R</div>
                    <div className="paragraph">     R: max z - min z;</div>
                    <div className="paragraph">     //calculate std series S</div>
                    <div className="paragraph">     S: std ts;</div>
                    <div className="paragraph">     //calculate rescaled range series R/S </div>
                    <div className="paragraph">     :R%S &#125;;</div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">// Split list to (most) equal chunks </div>
                    <div className="paragraph">nsublist: &#123; [ts;lag;n]; </div>
                    <div className="paragraph">     // calculate cut indices, and partition ts to n chunks </div>
                    <div className="paragraph">     cuts: lag*(til n);</div>
                    <div className="paragraph">     :cuts _ts &#125;;</div>
                    <div className="paragraph"> </div>
                    <div className="paragraph">// Expanding-window standard deviations (prefixes of length 1, 2, ... of ts)</div>
                    <div className="paragraph">stds: &#123; [ts]; :std each (1+til count ts)#\:ts &#125;;</div>
                    <div className="paragraph">// Population standard deviation (same result as the built-in dev)</div>
                    <div className="paragraph">std: &#123; [ts]; mean: sum ts % count ts; sqrt (sum ((ts - mean) xexp 2) % count ts) &#125;;</div>
                    <div className="paragraph">// Least-squares slope of y on x</div>
                    <div className="paragraph">slope: &#123; [x;y];  </div>
                    <div className="paragraph">     xMean: sum x % count x;  yMean: sum y % count y;  </div>
                    <div className="paragraph">     sumX: sum x;  sumXY: sum (x*y);  sumXX: sum (x xexp 2);  </div>
                    <div className="paragraph">     :(sumXY - sumX * yMean) % (sumXX - sumX * xMean) &#125;;</div>
                    </div>

                    <p>Finally, on the remaining eligible pairs, we perform the Jarque-Bera test to check that the sample skewness and kurtosis are consistent with a normal distribution.</p>

                    <div className="code-block">
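                    <div className="paragraph">// ts: list of residuals (spread values) to test for normality</div>
                    <div className="paragraph">JB: &#123; [ts]; </div>
                    <div className="paragraph">     // Deviations from the sample mean</div>
                    <div className="paragraph">     mu: avg ts;</div>
                    <div className="paragraph">     tmp: (ts-mu);</div>
                    <div className="paragraph">     N: count ts;</div>
                    <div className="paragraph">     // Sample skewness</div>
                    <div className="paragraph">     S: (xexp[N*N-1;0.5]%N-2)*((1.0%"f"$N)*sum tmp*tmp*tmp)%xexp[((1.0%"f"$N)*sum tmp*tmp);1.5];</div>
                    <div className="paragraph">     // Sample excess kurtosis (3 already subtracted)</div>
                    <div className="paragraph">     K: (((1.0%"f"$N)*sum tmp*tmp*tmp*tmp)%xexp[((1.0%"f"$N)*sum tmp*tmp);2])-3.0;</div>
                    <div className="paragraph">     // JB statistic: compare with a chi-square critical value with 2 degrees of freedom (5.99 at the 5% level)</div>
                    <div className="paragraph">     :(N%6)*(S*S)+0.25*K*K &#125;;</div>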
                    </div>

                    <p>Alternatively, instead of writing our own statistical tests as we did above, we could have used PyKX to call the relevant libraries from the Python ecosystem within kdb+. Since the native q implementation avoids moving data between q and Python, we will skip that option and go directly to the next task in our application.</p>

                    <h3>Model Development</h3>

                    <p>Now that we have our pairs of cointegrated equities, we need to build a model to detect trading opportunities. For this we will rely on a Kalman filter, which dynamically estimates the state (intercept and slope) of the relationship between the pair over time. We then normalize the resulting spread to a z-score using the last n observations. When the z-score exceeds a certain threshold (e.g., 2 standard deviations), it generates a signal:</p>
                    
                    <ul>
                    <li><strong>- If z-score &gt; threshold:</strong> Generate a short signal (-1) for the spread (short asset 1, long asset 2)</li>
                    <li><strong>- If z-score &lt; -threshold:</strong> Generate a long signal (1) for the spread (long asset 1, short asset 2)</li>
                    <li><strong>- If -threshold ≤ z-score ≤ threshold:</strong> No signal (0)</li>
                    </ul>
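
                    <p>The signal rules above can be sketched in q as follows. This is a minimal illustration rather than the Model Server's actual code: the names zscore and signal, the 20-observation window, and the variable spread (a list of spread values, for example the prediction errors returned by the filter) are assumptions.</p>

                    <div className="code-block">
                    <div className="paragraph">//Rolling z-score of the spread over the last n observations</div>
                    <div className="paragraph">zscore:&#123;[n;s] (s - n mavg s) % n mdev s&#125;;</div>
                    <div className="paragraph">//Map z-scores to signals: -1 short the spread, 1 long the spread, 0 no signal</div>
                    <div className="paragraph">//Nulls (a zero deviation in the first window) are filled with 0 so they produce no signal</div>
                    <div className="paragraph">signal:&#123;[thr;z] z:0f^z; (z &lt; neg thr) - z &gt; thr&#125;;</div>
                    <div className="paragraph">//Usage: 20-observation window, threshold of 2 standard deviations</div>
                    <div className="paragraph">signals: signal[2f] zscore[20;spread];</div>
                    </div>

                    <p>Filling nulls before comparing matters because q treats a null as smaller than any number, so an unfilled null z-score would otherwise be read as a long signal.</p>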

                    <p>For the spread calculation, we will use log prices to compute intraday spreads with our model. For the sake of performance, we will run this model as a separate program within the Model Server (MS) component, which communicates with our other components over IPC.</p>

                    <img
                        src="https://www.habla.dev/blog/assets/2024/07/02/general-architecture-ms.png"
                        alt="Model Server"
                        className="randImage"
                    />

                    <p>For now, the code below, which is part of our MS, is optimized for readability. Performance can be greatly improved by removing the if statements and the while loop.</p>
                    
                    <div className="code-block">
                    <div className="paragraph">//Kalman Filter</div>
                    <div className="paragraph">kalmanFilter:&#123;[x;y]</div>
                    <div className="paragraph">    // Initialize parameters</div>
                    <div className="paragraph">    i:0;</div>
                    <div className="paragraph">    while[i&lt;count[y];</div>
                    <div className="paragraph">            if[i=0; </div>
                    <div className="paragraph">                [</div>
                    <div className="paragraph">                delta:0.0001;</div>
                    <div className="paragraph">                Vw:(delta%1-delta)*2 2#1 0 0 1;  // 2x2 diagonal matrix</div>
                    <div className="paragraph">                Ve:0.001; </div>
                    <div className="paragraph">                R:2 2#0f;  // initialize state covariance </div>
                    <div className="paragraph">                P:2 2#0f;  // 2x2 zero matrix</div>
                    <div className="paragraph">                Kx: (2;count y)#0f; //capture Kalman gain</div>
                    <div className="paragraph">                beta:(2;count y)#0f;  // Initialize beta with zeros</div>
                    <div className="paragraph">                y_est:();  // Initialize measurement prediction</div>
                    <div className="paragraph">                e:();  // Initialize measurement prediction error </div>
                    <div className="paragraph">                Q:();  // Initialize measurement prediction error variance</div>
                    <div className="paragraph">                ]</div>
                    <div className="paragraph">            ];</div>
                    <div className="paragraph">       </div>
                    <div className="paragraph">            if[i&gt;0;</div>
                    <div className="paragraph">                [</div>
                    <div className="paragraph">                beta[;i]:beta[;i-1];</div>
                    <div className="paragraph">                R:P+Vw;</div>
                    <div className="paragraph">                ]</div>
                    <div className="paragraph">            ];</div>
                    <div className="paragraph">        </div>
                    <div className="paragraph">            // Measurement prediction</div>
                    <div className="paragraph">            y_est,: sum x[i;]*beta[;i]; </div>
                    <div className="paragraph">    </div>
                    <div className="paragraph">            //Measurement Variance</div>
                    <div className="paragraph">            Q,: &#123;[M;v]:sum v*t1:M mmu v;&#125;[R;x[i;]] + Ve;</div>
                    <div className="paragraph">       </div>
                    <div className="paragraph">            // Calculate error</div>
                    <div className="paragraph">            e,:y[i]-y_est[i];</div>
                    <div className="paragraph">       </div>
                    <div className="paragraph">            // Kalman gain</div>
                    <div className="paragraph">            K:mmu[R;&#123;x*/:y&#125;[x[i;];(1%Q[i])]];</div>
                    <div className="paragraph">            Kx[;i]:K; / list of Kalman gain</div>
                    <div className="paragraph">    </div>
                    <div className="paragraph">            // State update</div>
                    <div className="paragraph">            beta[;i]+: K*e[i];</div>
                    <div className="paragraph">            P:R-(x[i;] mmu R) */: Kx[;i];</div>
                    <div className="paragraph">            i+: 1;</div>
                    <div className="paragraph">        ];</div>
                    <div className="paragraph">        // Return updated values as a table (one row per observation)</div>
                    <div className="paragraph">        : flip `beta`intercept`y_est`e`Q!(beta[0];beta[1];y_est;e;Q);</div>
                    <div className="paragraph">    &#125;;</div>
                    <div className="paragraph">    </div>
                    </div>

                    <p>Now that we can calculate the spread for any cointegrated pairs, we need to ensure we can do this in real-time for our application to work effectively.
                    To achieve this, we must make sure our model can receive data from TP to calculate the new spread in real-time. We can accomplish this using IPC to communicate with the MS component.
                    Moreover, we can utilize multi-threading to calculate the spreads of multiple cointegrated pairs simultaneously, enhancing efficiency.
                    </p>
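
                    <p>As a rough sketch of the multi-threaded idea, the filter can be applied to several pairs with peach. The names logp (a dictionary mapping each symbol to its list of log prices), pairs (a list of symbol pairs) and fit are hypothetical, and the process is assumed to have been started with secondary threads (for example with the -s 4 command-line option); without them peach simply runs sequentially.</p>

                    <div className="code-block">
                    <div className="paragraph">//Sketch: fit one Kalman filter per pair in parallel</div>
                    <div className="paragraph">//Each row of the observation matrix is (log price of first symbol; 1f), giving slope and intercept states</div>
                    <div className="paragraph">fit:&#123;[logp;p] kalmanFilter[flip (logp p 0;count[logp p 0]#1f);logp p 1]&#125;;</div>
                    <div className="paragraph">//One result table per pair, in the same order as pairs</div>
                    <div className="paragraph">models: fit[logp] peach pairs;</div>
                    </div>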

                    <img
                    src="https://www.habla.dev/blog/assets/2024/07/02/general-architecture-rpt.png"
                    alt="Real-Time Pair Trading"
                    className="randImage"
                    />
                    <p>The diagram above shows how IPC communication with the MS component would work. Once we receive the results of the spread calculation, we can use a tool such as KX Dashboard to see in real time whether we have a signal.
                        The code below starts this process.
                    </p>

                    <div className="code-block">
                    <div className="paragraph">(pair;spread_model):model_server(&#123;enlist[pair],spread_func pair:pair_symbols p_values?min p_values&#125;;::)</div>
                    </div>

                    <p>Here, model_server is a handle to the Model Server component, analogous to the hdb handle used earlier. We send it a query to process: it finds the most cointegrated pair by locating the minimum p-value and using its position to look up the corresponding symbols, then applies the spread function (spread_func) to build the spread model for that pair. Because the function runs on the Model Server, the names pair_symbols, p_values and spread_func are resolved there. The call returns both the pair of symbols (pair) and the corresponding spread model (spread_model).</p>
                    <p>Components operating in real time can register their interest in a specific table and a subset of symbols. Naturally, we only want quotes for the symbols in our pair:</p>
                    <div className="code-block">
                    <div className="paragraph">//Acquiring Real-time Quote Data from Tickerplant</div>
                    <div className="paragraph">tickerplant(".u.subscribe";`quote;pair)</div>
                    </div>
                    <p>Here, tickerplant is a handle to the Tickerplant process. Essentially, .u.subscribe registers the handle of the Real-Time Pairs Trading (RPT) process in the Tickerplant, so that the Tickerplant can notify the newly subscribed component about incoming events. This mechanism assumes that the subscriber has defined an update function to serve as a callback:</p>
                    <div className="code-block">
                    <div className="paragraph">price_cache:pair!2#0f</div>
                    <div className="paragraph">update_func:&#123;price_cache^:exec sym!log(ask+bid)%2 from select by sym from y&#125;</div>
                    </div>
                    <p>Our update_func takes tick data as input (y), selects the most recent entry for each symbol, computes their mid-price, and refreshes the values (if any) in a straightforward dictionary called price_cache. This dictionary functions as a buffer, which proves crucial in the subsequent and final phase of our process.</p>
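
                    <p>One way the buffer might be consumed is sketched below. It assumes a hypothetical two-item list state holding the latest slope and intercept from the filter, with the first symbol of pair as the regressor and the second as the response; neither state nor live_spread appears in the code above.</p>

                    <div className="code-block">
                    <div className="paragraph">//Sketch: latest spread = response log price minus its Kalman prediction</div>
                    <div className="paragraph">//state: (slope;intercept), cache: symbol to log mid-price dictionary, pair: (regressor;response)</div>
                    <div className="paragraph">live_spread:&#123;[state;cache;pair] cache[pair 1] - (state[0]*cache pair 0) + state 1&#125;;</div>
                    <div className="paragraph">//Usage, once both symbols have ticked at least once</div>
                    <div className="paragraph">live_spread[state;price_cache;pair]</div>
                    </div>

                    <p>Because price_cache starts at zero for both symbols, the result is only meaningful after each symbol has received at least one quote.</p>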
                    
                    <h3>Conclusion</h3>
                    <p>We have successfully completed the construction of our application that enables the implementation of a Pairs Trading strategy using kdb+/q. The cornerstone of our application is the Tick Architecture, which empowers us to handle both historical and real-time data in an efficient manner with the help of IPC. IPC serves as a crucial bridge between components, facilitating seamless data flow and processing.</p>
                    <p>While this implementation is not yet a "production-quality" solution, it provides us with a solid foundation for a high-frequency trading strategy. If executed well and further refined, this approach has the potential to be highly profitable.</p>
                    <p>Moving forward, there are several areas we could explore to enhance our system such as incorporating risk management techniques, further code refinement to optimize performance and developing a more comprehensive backtesting framework to validate the strategy. These enhancements would bring our application closer to a production-ready state, potentially increasing its efficacy and reliability in real-world trading scenarios.</p>
                    
                </div>
            </React.Fragment>
        ),
    };
}




function article_2() {
	return {
		date: "15 Aug 2024",
		title: "Predicting Realized Volatility of S&P 500 Constituents with ML",
		description:
			"Stock volatility prediction model achieves 85% accuracy, leveraging a decade of data, over 50 unique features, hyperparameter optimization and strategic industry grouping. Models could have applications to long/short vega strategies.",
			
		style: ``,
		keywords: [
			"Machine Learning in Finance",
			"Long/Short Vega Strategies",
			"Predict Volatility",
			"Forecast Volatility",
		],
		body: (
			<React.Fragment>
				<h1>Content of article 2</h1>
			</React.Fragment>
		),
	};
}

const myArticles = [article_1, article_2];

export default myArticles;
