Interview: Mohammed Shalaan, Chief Data Officer, Facteus
What do you think are the biggest challenges facing data scientists/AI experts/quantitative investors in 2021 post Covid-19 and why?
I think the extraordinary dynamics we observed in 2020 and into the last couple of months proved very challenging for many of the typical assumptions built into most forecasting models. In general, I believe what we encountered was a simultaneous change of regimes on many fronts: relationships that had held constant for many years were no longer valid and required constant re-calibration, which introduced increased model risk in uncharted territory and reduced confidence in backtest results.
In the alternative data space, we have seen increased volatility of capture rates and forecast errors across geography, age groups, gender and income classes, exposing a host of modelling problems that were typically masked by the historically slow rate of change across these dimensions.
The way in which alt data is being sourced and consumed is changing after the pandemic. Why do you think this is and what do you see happening further within the alt data space?
When it comes to consuming alternative data, we see two themes strongly evolving in the investment space: diversification and story coherence.
As most practitioners are aware, every alternative dataset comes with different built-in biases and, sometimes, blind spots. These typically stem from limitations in the cohorts covered by the underlying data source and from how the data is collected.
For example, many consumer transaction panels are biased in their geographic coverage and in the age groups and income levels of the consumer cohorts they cover. Before the pandemic, there was less focus on identifying and understanding these types of skew.
The pandemic's effect on consumer behaviour has proven to be asymmetrical across all these dimensions, and the resulting over- and under-indexing effects are making every single dataset less predictive. Investors are becoming less confident that what worked in the past will continue to work.
This has put a lot of emphasis on diversifying away from those biases by acquiring multiple datasets to construct the full picture of consumer behaviour, and it has favoured datasets that offer more dimensions rather than simple company-level aggregates. I foresee more effort going into standardization and into providing clean join keys between datasets to simplify integration.
In addition, prior assumptions about the relationships between the different income statement measures no longer reflect the fast pace of change in the real world. This has created demand for datasets that go deeper into the income statement, providing views on multiple revenue segments (consumer, corporate, domestic and international), and that can provide coherent contextual views on more bottom-line measures.
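As a rough illustration of what a clean join key buys, the sketch below maps two vendors' merchant labels onto a shared key and joins their views of the same period. Everything in it is hypothetical: the `KEY_MAP` table, the merchant labels, the figures and the function names are invented for the example and do not describe any particular vendor's schema.

```python
from collections import defaultdict

# Hypothetical mapping from each vendor's merchant label to a shared join key.
KEY_MAP = {
    "ACME STORES #123": "ACME",
    "Acme Inc.": "ACME",
    "BETA MART": "BETA",
    "Beta Mart LLC": "BETA",
}


def normalise(rows):
    """Aggregate (label, period, value) rows onto (join_key, period)."""
    totals = defaultdict(float)
    for label, period, value in rows:
        key = KEY_MAP.get(label)
        if key is None:
            # Unmapped labels are skipped here; a real pipeline would log them.
            continue
        totals[(key, period)] += value
    return dict(totals)


def join_views(left, right):
    """Inner-join two normalised datasets on (join_key, period)."""
    shared = sorted(left.keys() & right.keys())
    return {k: (left[k], right[k]) for k in shared}


# Made-up sample rows from two hypothetical sources.
card_panel = [("ACME STORES #123", "2020Q4", 120.0), ("BETA MART", "2020Q4", 80.0)]
receipts = [
    ("Acme Inc.", "2020Q4", 95.0),
    ("Beta Mart LLC", "2020Q4", 70.0),
    ("Gamma Co", "2020Q4", 10.0),
]

joined = join_views(normalise(card_panel), normalise(receipts))
```

Once both sources share a key, each company and period carries two independent views side by side, which is the starting point for comparing or blending them.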
A portion of the industry still feels that advanced ML techniques such as Reinforcement Learning and Deep Learning cannot be applied to financial data effectively – do you agree? What are the main challenges preventing this from happening? Can you give some concrete examples where you’ve seen this work successfully?
I think adoption in the investment process is still very slow for a few reasons: the low frequency of available fundamental data, the difficulty of incorporating the true effects of regime changes, and the fact that methodically incorporating forward-looking views is still challenging for many investment firms. There is also hesitation to incorporate these models into the investment process if the outcomes are hard to explain with a simple set of fundamentally oriented measures that resonate with most active investors.
Where I have seen this work well is when it is applied to high-frequency data with a high level of dimensionality and complex interactions between the dimensions. Applications in trade cost analysis, routing and optimization are a very good example, with many successful implementations.
In the alternative data space, there are also many successful use cases where these techniques are applied to merchant identification and tagging and the creation of synthetic dimensions such as gender and income levels.
What is your advice to funds hoping to get new systematic strategies into production quickly and more often?
In today's world, my best advice is to build a strategy around continuously seeking diversification and accepting a high level of uncertainty in the process. Incorporating multiple sources of data with wide-ranging views on the same KPI is a good starting point, but I do not think it is enough. Forecast models also need to be diversified and able to deal with a higher number of dimensions and with time-variant processes.
This also creates a need for a methodical way to incorporate and arbitrate between the different views expressed by the data and models, and for a lot of discipline in continuously monitoring and auditing divergence between backtest and out-of-sample performance.
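One simple way to make that monitoring concrete is to compare the live forecast error against the error recorded in the backtest and flag the model when the ratio drifts too far. The sketch below is only an illustration under assumed inputs: the choice of mean absolute error, the 1.5 threshold, the function names and the sample numbers are all hypothetical.

```python
from statistics import mean


def mean_abs_error(forecasts, actuals):
    """Mean absolute error between paired forecasts and realised values."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


def divergence_ratio(backtest_mae, live_forecasts, live_actuals):
    """Live (out-of-sample) error expressed as a multiple of the backtest error."""
    return mean_abs_error(live_forecasts, live_actuals) / backtest_mae


def needs_review(ratio, threshold=1.5):
    """Flag the model when live error exceeds the backtest error by the threshold."""
    return ratio > threshold


# Made-up numbers: the backtest suggested an average error of 2.0 KPI units.
backtest_mae = 2.0
live_forecasts = [100.0, 102.0, 98.0]
live_actuals = [104.0, 97.0, 102.0]

ratio = divergence_ratio(backtest_mae, live_forecasts, live_actuals)
flagged = needs_review(ratio)
```

Running a check like this on every reporting cycle, per model and per data source, gives an audit trail of when each view started to diverge from its backtest.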
Privacy and regulation surrounding the responsibility and ownership of data is an area that is still being explored and understood. What measures are you predicting will be put in place to navigate any foreseeable data privacy challenges while searching for alpha?
I foresee that the push for more privacy protection laws and more regulation around the right to use and monetize data will only get stronger as alternative data becomes more widely used and more consumers are made aware of the value of their own data in the alpha generation process.
In the alternative data space, I see more of a push for using synthetic data as another layer of privacy guarantee where simply anonymized data has failed to serve that purpose. What I mean by “synthetic” is a process in which the data provided to investment firms retains the true statistical properties and the relevant dimensions of the original data, but with all values at the transaction/record level distorted to eliminate the possibility of identifying the individual or entity associated with the original record.
For investment firms, navigating complex and nonstandard DDQs (Due Diligence Questionnaires) is putting a lot of pressure on internal compliance teams to stay up to speed on regulation in multiple jurisdictions, and it creates a constant fear of potential headline risk. I foresee a more coordinated effort across the alternative data industry to standardize the legal/regulatory review of data rights, chain of fiduciary duty and other aspects of the compliance process that typically add weeks to months to a typical data evaluation process.
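A toy version of that idea is sketched below: each record's amount is perturbed and shuffled within its cohort, then rescaled so the cohort total matches the original. This is a hypothetical illustration, not a description of any production method. It assumes positive amounts, preserves only per-cohort totals and counts rather than all statistical properties, and on its own offers no formal privacy guarantee. The function name, cohort labels and noise scale are invented for the example.

```python
import random
from collections import defaultdict


def synthesise(records, noise_scale=0.2, seed=0):
    """Distort record-level amounts while preserving each cohort's total.

    records: list of (cohort, amount) pairs with positive amounts (assumed).
    """
    rng = random.Random(seed)
    by_cohort = defaultdict(list)
    for cohort, amount in records:
        by_cohort[cohort].append(amount)

    synthetic = []
    for cohort, amounts in by_cohort.items():
        # Perturb every amount by up to +/- noise_scale of its value.
        noisy = [a * (1 + rng.uniform(-noise_scale, noise_scale)) for a in amounts]
        # Rescale so the cohort total matches the original total.
        scale = sum(amounts) / sum(noisy)
        # Shuffle so record order no longer lines up with the source.
        rng.shuffle(noisy)
        synthetic.extend((cohort, value * scale) for value in noisy)
    return synthetic


# Made-up transactions tagged with a hypothetical age-group dimension.
records = [
    ("18-34", 20.0),
    ("18-34", 35.0),
    ("35-54", 50.0),
    ("35-54", 10.0),
    ("35-54", 40.0),
]
synthetic = synthesise(records)
```

The aggregate an investor would model (spend per cohort) is unchanged, while no individual output row carries an original transaction value.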
Hear from Mohammed at AI & Data Science in Trading Online this March 15th - 16th, as he joins us on the 4pm panel on Day 2 "Journey of embedding quant methods within your fundamental approach for effective analysis."