Interview: Armando Gonzalez, CEO and Co-Founder, RavenPack
What do you think are the biggest challenges facing data scientists and AI experts in 2021, post COVID-19, and why?
The COVID-19 pandemic has accentuated some existing challenges for the data science community, and created new ones. A large source of frustration, even before the pandemic, was data access. The amount of data around us keeps growing at impressive rates, but the majority of it is inaccessible to researchers because it is not made available to data science teams in a reliable and consistent manner. The pandemic has illustrated that, when pressed for time by a global crisis, companies that had not built the infrastructure necessary to use data from a variety of sources were poorly positioned to make timely decisions. Conversely, the availability of highly structured, deeply curated data, like the news analytics we process at RavenPack, made it possible for us to build and launch a coronavirus media monitoring dashboard from scratch in a matter of weeks, with a reduced team working remotely.
The second lesson of the pandemic is that data is not enough: the crisis has highlighted a painful lack of trust in scientific approaches to handling health data models, which likely prefigures a similar lack of confidence in insights drawn from machine learning models. A lot of the work that companies are doing in building high-quality datasets is to establish a source of truth. COVID-19 has revealed that truth itself proved insufficiently compelling to many. This means that data and AI scientists are facing an educational challenge to bridge the gap between those who believe in a careful quest for unbiased understanding of accurate datasets, and those who don’t.
The way in which alt data is being sourced and consumed is changing after the pandemic. Why do you think this is and what do you see happening further within the alt data space?
At a structural level, we are seeing huge investments going into the build-out of large-scale data lake infrastructures (or so-called data mosaics), leveraging both structured and unstructured data originating from internal and external sources. However, this is a huge task, and many firms are working closely with data vendors and technology and service providers, opting for a best-of-breed approach to unlock the value of data across the organisation.
In this landscape, we have seen the emergence of data aggregators who aim to provide a one-stop solution to a large selection of datasets. We work with some of them. For example, we serve a large part of our academic research community through Wharton’s WRDS system. These data aggregators are valuable and particularly effective for self-describing data and for simplifying some aspects of data distribution logistics. Yet for the most complex datasets, they often lack the contextualization and navigation required to make the most of the data, which can often only be achieved by nurturing a close partnership between in-house data science teams and researchers on the client side. I foresee an accentuation of this dynamic, where the less complex alternative data finds a suitable vehicle in off-the-shelf collections, and the more advanced alt datasets live in a dynamic ecosystem powered by dedicated researchers.
ESG and sustainable investing is a topic that is becoming increasingly relevant in the current climate. Do you believe ESG data can be used as an alpha generation tool rather than just a risk management process? What changes are you seeing in how ESG data is being used?
Initially, ESG data was thought of as a filter enabling portfolio managers and investors to target companies based on their clients’ appetite for sustainability. European fund managers, in particular, have shown their interest in ESG-positive firms, but we have now reached a critical point where it’s no longer just a filter because it impacts valuation. Indeed, when the largest funds in terms of AUM elect to shift capital towards listed companies with positive ratings, it makes capital scarcer for other companies, so it will impact their premium. Even fast-money hedge funds have begun to factor in the discount rate of stocks whose companies rank poorly on ESG.
More can be achieved in alpha generation by going beyond the ratings. For instance, you can capture alpha by comparing what companies do on ESG with what they say they do. This requires a higher frequency of signal than periodic MSCI ratings, for example, but news analytics like RavenPack’s platform can nowcast ESG behavior as published in news, filings, social media, and other public sources. Once you leverage highly granular data like media attention and sentiment, you can go even further and use network graphs of media co-mentions to identify companies associated with other companies with low ESG ratings. This approach gives you further insights that can facilitate alpha capture too.
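As an illustration of the co-mention idea described above, the following is a minimal sketch and not RavenPack's method. It assumes each article has already been reduced to a list of the companies it mentions, and it uses made-up company names, made-up ESG scores, and a hypothetical threshold. For each company, it computes the share of its co-mentions that involve a low-rated company.

```python
from collections import defaultdict
from itertools import combinations


def build_comention_graph(articles):
    """Count how often each pair of companies appears in the same article."""
    graph = defaultdict(lambda: defaultdict(int))
    for mentioned in articles:
        # De-duplicate within an article so a pair is counted once per article.
        for a, b in combinations(sorted(set(mentioned)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def low_esg_exposure(graph, esg_scores, threshold):
    """Share of each company's co-mentions that involve a low-rated company."""
    low = {name for name, score in esg_scores.items() if score < threshold}
    exposure = {}
    for company, neighbours in graph.items():
        total = sum(neighbours.values())
        flagged = sum(n for other, n in neighbours.items() if other in low)
        exposure[company] = flagged / total if total else 0.0
    return exposure


# Hypothetical inputs: companies mentioned per article, and illustrative scores.
articles = [
    ["AlphaCo", "BetaCorp"],
    ["AlphaCo", "BetaCorp", "GammaInc"],
    ["GammaInc", "DeltaLtd"],
    ["AlphaCo", "DeltaLtd"],
]
esg_scores = {"AlphaCo": 70, "BetaCorp": 20, "GammaInc": 65, "DeltaLtd": 80}

graph = build_comention_graph(articles)
exposure = low_esg_exposure(graph, esg_scores, threshold=40)

for company in sorted(exposure, key=exposure.get, reverse=True):
    print(f"{company}: {exposure[company]:.2f}")
```

In this toy data, AlphaCo has a reasonable score of its own but half of its co-mentions involve the low-rated BetaCorp, which is the kind of association a rating alone would not surface. A real pipeline would also need entity resolution, time windows, and some weighting by sentiment or relevance, none of which are shown here.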
Financial services lag in terms of adoption of cloud computing. What's your experience, and how do you see the situation evolving?
Privacy and data safety concerns are still very present in the financial industry, but we are starting to see a real evolution due to an increased specialization of the cloud deployment responsibility. Even some of the largest hedge funds are now acknowledging that, when properly configured, cloud deployments offer state-of-the-art data security. Another factor also facilitates the transition: if you plan on performing BERT-style training of models in your own racks, you will quickly reach the limits of your own infrastructure. Digesting petabytes of data makes the cloud altogether a more practical option. Finally, it’s worth remembering that it’s not a binary choice: many firms may choose to leverage a cloud to train their models, and then delete the data. This modular approach can prove both practical and compelling for larger datasets.
Machine learning in finance is a frequent topic. Where do you see it going in the coming years?
First, machine learning is always about machine training: your ML engine will only ever be as effective as the datasets you trained it with. Those datasets are expensive to produce, should be focused on what matters to you, and often require substantial manual work that limits the scalability of the approach. A lot of work is being put into lowering the barrier to entry so companies can make better use of machine learning. In natural language processing, the stakes are high since firms are currently sitting on large amounts of unstructured data, from research reports to emails and messages, from which they cannot draw insights. It is an area where I foresee a lot of new developments and a lot of interest in the coming years as companies focus on revamping their knowledge workflows.
At the same time, ML is not a silver bullet. It is a new tool that keeps learning and is becoming more attractive. I anticipate the technology roadmap for the financial industry will be a mix of ML and other approaches, with one or the other performing better depending on the task. Understanding the capabilities and limits of the model, together with practical training, will be the differentiators that bring success to ML in finance. Going further, those models may evolve to define different tribes in the financial industry that will approach risk, investment and knowledge in idiosyncratic ways.
Armando Gonzalez is a technology entrepreneur and the CEO and Co-founder of RavenPack, the leading provider of big data analytics for financial institutions. At RavenPack, he oversees all product design and engineering of the company’s data products and analytical tools. Armando is a recognized expert in applied big data and artificial intelligence (AI) technologies. He is widely regarded as one of the most knowledgeable authorities on systematic data analysis in finance. He is a recognized speaker at academic and business conferences across the globe. Armando holds degrees in Economics and International Business Administration from the American University in Paris. As a thought leader, his commentary and research have appeared in leading business publications such as the Wall Street Journal and the Financial Times, among many others.