Data Science, Optimization, Programming

How can GA help cut down the problem space and converge towards a better solution?

Double Helix — Photo by Daniil Kuželev on Unsplash

If you have heard of systematic or algorithmic trading, then you must know that strategy optimization is one of the most important factors that dictate whether a strategy will even break even. And the worst part is: optimization is very computationally heavy. Take a simple MACD crossover strategy: there are at least 3 parameters (the fast, slow and signal moving average periods) with hundreds of possible values for each, which adds up to more than a million possible combinations.

In comes the genetic algorithm (GA): a probabilistic, heuristic search algorithm inspired by Darwin’s theory of natural selection, in which the fittest…
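
To make the idea concrete, here is a minimal GA sketch (not the implementation from the article) that searches over the three MACD parameters; backtest_sharpe is a placeholder fitness with a dummy scoring surface, which you would replace with your own backtester.

import random

# Hypothetical search space for the three MACD parameters
FAST_RANGE, SLOW_RANGE, SIGNAL_RANGE = range(2, 50), range(10, 200), range(2, 50)

def backtest_sharpe(fast, slow, signal):
    """Placeholder fitness: plug your own backtest in here."""
    if fast >= slow:
        return float("-inf")  # invalid chromosome: fast period must be shorter
    # dummy scoring surface used purely for illustration
    return -((fast - 12) ** 2 + (slow - 26) ** 2 + (signal - 9) ** 2)

def random_chromosome():
    return (random.choice(FAST_RANGE), random.choice(SLOW_RANGE), random.choice(SIGNAL_RANGE))

def crossover(a, b):
    # single-point crossover on the 3-gene chromosome
    point = random.randint(1, 2)
    return a[:point] + b[point:]

def mutate(chromosome, rate=0.1):
    genes = list(chromosome)
    for i, gene_range in enumerate((FAST_RANGE, SLOW_RANGE, SIGNAL_RANGE)):
        if random.random() < rate:
            genes[i] = random.choice(gene_range)
    return tuple(genes)

population = [random_chromosome() for _ in range(50)]
for generation in range(30):
    # rank by fitness and keep the fittest half as parents
    population.sort(key=lambda c: backtest_sharpe(*c), reverse=True)
    parents = population[:25]
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(25)]
    population = parents + children

print("best parameters found:", population[0])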

Data Science, Programming, Pandas, Efficiency

Time to stop being too dependent on .iterrows() and .apply()

Photo by Chris Ried on Unsplash

Python, arguably the coolest programming language these days (thanks to Machine Learning and Data Science), is not very well known for its efficiency when compared with one of the best-performing programming languages, C. An example of this is conditional logic. When developing machine learning models, it is quite common that we need to programmatically update labels based on hard-coded rules derived from statistical analysis or findings from the previous iteration. No shame in admitting it: I had been coding it with Pandas apply until one day I grew so fed up with the nested blocks that I…
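
To make the comparison concrete, here is a small sketch (the frame and the labelling rules are made up for illustration) of swapping a row-wise .apply for a vectorised numpy.select:

import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [0.2, 0.55, 0.9, 0.41]})  # toy data

# row-wise apply: readable, but a Python-level loop under the hood
def label_row(row):
    if row["score"] >= 0.8:
        return "high"
    elif row["score"] >= 0.5:
        return "medium"
    return "low"

df["label_apply"] = df.apply(label_row, axis=1)

# vectorised alternative: conditions evaluated as whole-column boolean arrays
conditions = [df["score"] >= 0.8, df["score"] >= 0.5]
choices = ["high", "medium"]
df["label_select"] = np.select(conditions, choices, default="low")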

Photo by Shane Aldendorff on Unsplash

What are the impacts? And how can we avoid them?

A common machine learning modelling process consists of (1) weight initialization, (2) forward propagation, (3) loss (or cost) computation, (4) backpropagation, and (5) weight updates using an optimization algorithm. While weight initialization is usually as simple as one line of code, it is easy to overlook how delicate it is and how much it can affect the final model performance.

In this blog, we will look into some of the common problems in neural network initialization, their corresponding impacts, and more importantly, how we can detect and avoid them.

# one-liner weight initialisation for tensorflow
tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None)
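
As a minimal sketch of where that initializer typically goes, a Keras layer can take it as its kernel_initializer (the layer size, activation and seed below are arbitrary choices, not from the original post):

import tensorflow as tf

# the same uniform initializer, passed to a layer instead of created standalone
init = tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=42)
layer = tf.keras.layers.Dense(64, activation="relu", kernel_initializer=init)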

Zero Initialisation — Symmetry Problem

  • Why does it happen…

Image by Free-Photos from Pixabay

THOUGHTS AND THEORY

What is Attention Mechanism & Why is RFA better than Softmax?

Google has recently released a new approach, Random Feature Attention, to replace the softmax attention mechanism in transformers, achieving similar or better performance with significant improvements in time and space complexity.

In this blog, we will look into the background of transformers, what an attention mechanism is, and why RFA is a better alternative to the softmax attention mechanism. We will finish the blog with a couple of takeaways from RFA.

Note: This blog is based on Google DeepMind’s latest paper ‘Random Feature Attention’ by Peng et al. Although we will cover the majority of the core concepts, please…
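
As background for the comparison, here is a rough NumPy sketch of single-query softmax attention next to a random-feature approximation of the exponential kernel. The dimensions are arbitrary and this is a simplified illustration of the underlying random Fourier feature trick, not the exact construction in the paper.

import numpy as np

d, n, D = 64, 128, 512                     # head dim, sequence length, number of random features
rng = np.random.default_rng(0)

# one query, n keys and values; l2-normalise q and K so that exp(q . k)
# differs from the Gaussian kernel exp(-||q - k||^2 / 2) only by a constant
q = rng.normal(size=d); q /= np.linalg.norm(q)
K = rng.normal(size=(n, d)); K /= np.linalg.norm(K, axis=1, keepdims=True)
V = rng.normal(size=(n, d))

# exact softmax attention for this single query (1/sqrt(d) scaling omitted for brevity)
w = np.exp(K @ q)
exact = (w / w.sum()) @ V

# random Fourier features approximating the Gaussian kernel
W = rng.normal(size=(D, d))
def phi(X):
    proj = X @ W.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1) / np.sqrt(D)

# the constant factor cancels between numerator and denominator of the softmax ratio
phi_q, phi_K = phi(q), phi(K)
approx = (phi_q @ (phi_K.T @ V)) / (phi_q @ phi_K.sum(axis=0))

print(np.abs(exact - approx).max())        # approximation error shrinks as D grows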

Photo by Marc-Olivier Jodoin on Unsplash

Data Science, Machine Learning

Accelerate matrix operations with TensorFlow’s experimental API

If you have used Python for any data processing, you have most likely used NumPy (short for Numerical Python). It provides a rich arsenal of complex data types and efficient matrix manipulation functions. Its C-accelerated implementation of vectorisable functions has earned it a reputation for processing n-dimensional arrays at lightning speed. But can we go faster than that?

NumPy’s C-accelerated implementation of vectorisable functions enables us to efficiently process large multi-dimensional arrays

In comes TensorFlow’s take on the NumPy API.

Thanks to TensorFlow’s GPU acceleration, we can now situationally run NumPy code even faster than we already could: faster than lightning…
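
A minimal sketch of what that looks like in practice (the array sizes are arbitrary; the API lives under tf.experimental.numpy and is still marked experimental):

import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# opt in to NumPy-like type promotion and indexing behaviour
tnp.experimental_enable_numpy_behavior()

# familiar NumPy-style calls, executed as TensorFlow ops (on GPU if one is available)
a = tnp.random.randn(2048, 2048).astype(tnp.float32)
b = tnp.random.randn(2048, 2048).astype(tnp.float32)
c = tnp.matmul(a, b)
print(c.shape)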

Database, but we need to fit it in a Raspberry Pi — Photo by Jan Antonin Kolar on Unsplash

Data Science, IoT, Technology

Containerized Postgres Server with PostGIS and TimescaleDB

Network-Attached Storage (NAS) is a centralized data repository that is accessible to other devices on the same computer network. For those who are not familiar with it, think of it as Google Drive or Dropbox, where data is accessible over the network. Because of this accessibility, NAS is one of the best solutions for sharing data across devices in the form of a database, be it relational or NoSQL.

In this blog, we will go through how to assemble and set up your own NAS on a Raspberry Pi. Our database server will be a Postgres database…
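
The setup steps are in the post itself; as a rough illustration of the end state, here is a small psycopg2 snippet that connects to such a server and enables the two extensions. The hostname, port and credentials are placeholders, and it assumes the container image ships with PostGIS and TimescaleDB installed.

import psycopg2

# placeholder connection details for the Raspberry Pi on your local network
conn = psycopg2.connect(
    host="raspberrypi.local", port=5432,
    dbname="nas", user="postgres", password="change-me",
)
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgis;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("SELECT extname, extversion FROM pg_extension;")
    print(cur.fetchall())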

Can WALL-E pick the next tree for you? — Photo by Jason Leung on Unsplash

Machine Learning, Innovation

Repurposing the algorithm behind Spotify for selecting your investment portfolio

While most of the blogs on Medium have been dedicated to generating entry and exit signals to scalp the market for a single instrument, this blog starts with a different assumption: we don’t know which stock we should opt for, nor do we know when the best time to enter or exit is. Instead of putting all our eggs in one basket, can we use machine learning to select and rebalance a portfolio of stocks that outperforms the S&P 500?

By periodically investing in an index fund, the know-nothing investor can actually out-perform most investment professionals — Warren Buffett

Our weapon…

Source: Derivative from qimono and comfreak on Pixabay

DATA SCIENCE, OPTIMIZATION, PROGRAMMING

How can the Coefficient of Variation make Genetic Algorithms more robust?

In a previous blog, we talked about how the Genetic Algorithm can be used to optimize the parameters of a trading strategy, and hence the parameters of any non-linear function, given an appropriate fitness/cost function. The arguably most important part that we have not touched on is how to make sure that the GA is well-fitted but not overfitted.

This time, we will go through how to ensure a robust GA training process when applied to a time-series problem, using the same MACD strategy as an example. If you have not read my blog on how to apply…
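
The exact construction is in the post; as a rough sketch of the idea, one way to fold the Coefficient of Variation into the fitness is to backtest each parameter set over several time windows and penalise dispersion relative to the mean. Here walk_forward_returns is a hypothetical helper standing in for your backtester, and the penalisation scheme is illustrative.

import numpy as np

def robust_fitness(params, walk_forward_returns):
    """Score a parameter set by its mean performance across time windows,
    discounted by its coefficient of variation (std / mean)."""
    window_scores = np.array(walk_forward_returns(params))  # one score per window
    mean, std = window_scores.mean(), window_scores.std()
    if mean <= 0:
        return float("-inf")       # unprofitable on average: reject outright
    cv = std / mean                # lower CV = more consistent across windows
    return mean / (1.0 + cv)       # hypothetical penalisation scheme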

Have you heard of memory over-allocation?

Photo by Fredy Jacob on Unsplash

Almost certainly, the first iterable data structure any Python programmer comes across is the list. It makes a strong first impression of being super flexible and suitable for almost any scenario in our daily coding routine. But if we want to become better Python programmers, we should not opt for a structure just because of flexibility and first impressions, but rather make the decision after understanding what is under the hood.

After diving down the rabbit hole of different Python data structures for quite a while, in this blog we are going to cover some findings…
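
A quick way to see the over-allocation in action is sys.getsizeof: the reported size of a growing list jumps in steps because CPython reserves spare slots ahead of time (the exact numbers vary by Python version and platform).

import sys

lst = []
previous = sys.getsizeof(lst)
print(0, previous)

for i in range(1, 33):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != previous:         # size only changes when CPython re-allocates
        print(len(lst), size)    # capacity grows in chunks, not one slot at a time
        previous = size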

How to Multi-thread with Python to Speed up Your Code

Man Office Businessman — Free image on Pixabay

As Python was first developed 29 years ago, it is unsurprising that it was designed more as a linear programming language at a time when single-core CPUs still dominated the market. In fact, CPython developers may still be feeling the heat when it comes to concurrency. Luckily, we Python noobs can lay back and enjoy the fruits of PEP 371, which officially added multiprocessing to the standard library back in 2008, and PEP 3156, which brought asyncio into the standard library with Python 3.4. …
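
As a minimal sketch of the thread-based route (the URLs and the I/O-bound task are made up; threads help here because the GIL is released while waiting on the network):

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = [                          # placeholder endpoints
    "https://example.com",
    "https://example.org",
    "https://example.net",
]

def fetch(url):
    # I/O-bound work: the GIL is released while the socket waits
    with urlopen(url, timeout=10) as response:
        return url, len(response.read())

with ThreadPoolExecutor(max_workers=8) as pool:
    for url, size in pool.map(fetch, URLS):
        print(f"{url}: {size} bytes")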

Louis Chan

Data Scientist by day, Data Scientist by night as well. Life is nothing without statistics and code.
