What do you think are the biggest challenges facing data scientists/AI experts/quantitative investors in 2018/2019? Why are they important?
Despite the prestige associated with being a data scientist or AI engineer, we find that in most organizations these people spend most of their time on basic data engineering tasks: building ingestion pipelines, manually cleaning data, building a master data store, and so on. Until very recently, the tools available to automate these tasks have been very primitive, and therefore haven’t been adopted by the industry. Specifically, these tools try to be a one-stop shop for all use cases, which is an almost impossible problem to solve. Instead, we’re now seeing a more verticalized approach: companies such as Cherre (we focus solely on real estate data) and others are applying more verticalized learning to automate the fusion of data.
Can you share an example of how your system has been used by a new customer? Feel free to include any feedback or practical examples.
One of our clients, a top 3 real estate investor by AUM, recently joined our platform to help them make better investment decisions. Our real estate data fusion platform, CoreConnect, allows them to connect all of their data from all sources (public and private, internal and external, paid and free, structured and unstructured) to a single source of truth. CoreConnect automatically geocodes the data to primary data objects (e.g. a building or a unit), programmatically discovers schemas and matches fields both to the primary data object and to a master data dictionary item, and flags conflicts for resolution (heuristically or manually). We essentially replaced their entire infrastructure of ETL, storage, modelling, compute, and delivery at a fraction of the cost and at a much higher level of accuracy. We cut more than a year off their roadmap. They also started using our AI predictive analytics platform, CorePredict, to programmatically evaluate asset acquisitions, and while still in the early stages of adoption, they have so far acquired over $8B in assets leveraging the platform.
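As a rough illustration of the field-matching and conflict-flagging steps described above, here is a minimal Python sketch. It is not Cherre's implementation: the master dictionary, the source names, the field names, and the helper functions `match_fields` and `fuse` are all hypothetical, and simple string similarity from the standard library stands in for whatever schema-discovery method CoreConnect actually uses. Geocoding is skipped; the records are assumed to have already been resolved to the same building.

```python
from difflib import get_close_matches

# Hypothetical master data dictionary (canonical field names, assumed for illustration).
MASTER_FIELDS = ["address", "year_built", "square_feet", "owner_name"]


def _normalize(name):
    """Lower-case a raw field name and unify its separators."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def match_fields(source_fields, cutoff=0.6):
    """Map each source field to its closest canonical field, or None if nothing is close."""
    mapping = {}
    for field in source_fields:
        hits = get_close_matches(_normalize(field), MASTER_FIELDS, n=1, cutoff=cutoff)
        mapping[field] = hits[0] if hits else None
    return mapping


def fuse(records_by_source, mappings):
    """Merge per-source records for one building; return (fused, conflicts)."""
    values = {}
    for source, record in records_by_source.items():
        for field, value in record.items():
            canonical = mappings[source].get(field)
            if canonical is None:
                continue  # field has no counterpart in the master dictionary
            values.setdefault(canonical, {})[source] = value

    fused, conflicts = {}, {}
    for canonical, by_source in values.items():
        distinct = set(by_source.values())
        if len(distinct) == 1:
            fused[canonical] = distinct.pop()   # all sources agree
        else:
            conflicts[canonical] = by_source    # flag for heuristic or manual resolution
    return fused, conflicts


# Two made-up sources describing the same building.
records = {
    "county_assessor": {"Year Built": 1987, "Sq Feet": 12000},
    "broker_feed": {"yearbuilt": 1987, "Square Feet": 12500, "Listing Agent": "J. Doe"},
}
mappings = {source: match_fields(record.keys()) for source, record in records.items()}
fused, conflicts = fuse(records, mappings)
print("fused:", fused)
print("conflicts:", conflicts)
```

In this sketch, fields on which the sources agree land in `fused`, while disagreeing values are kept per source in `conflicts` so that a rule or a person can resolve them later.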
What can be done about the talent war in AI and machine learning and how do you handle this in your organisation?
This is one of the most challenging aspects of being in this space. We are pretty fortunate to be a well-funded startup, as this allows us both to compensate our employees properly and to offer them equity and meaningful project ownership. Big firms can typically only offer good compensation. For instance, last year we added Dr. Ron Bekkerman to our team to spearhead our CorePredict product. Dr. Bekkerman was one of the founding members of LinkedIn’s data science team, and has worked for companies like Google and Salesforce on some really cool products. Being able to offer great compensation, as well as ownership of our flagship product, is what really allowed us to compete. For junior folks, I would recommend not trying to hire rockstar data scientists, and instead hiring rockstar data engineers and training them to be great data scientists. It’s easier to accomplish, and at the end of the day you’re getting more bang for your buck.
Considering alt data, some news articles have suggested that US based data sets are performing badly for alpha as they are over-mined. Do you agree with this, and if so, where are better sources to be found?
While I’ve seen these reports, I’ve yet to see the data that actually supports this claim. If true, I would think that it’s just a question of saturation – more U.S. funds are consuming alternative data sets and generating signal, and therefore the median generated alpha per fund is lower. In Europe it’s likely a matter of funds being behind the curve, so those early adopters are enjoying an outsized market share of the alpha generated.
The methods hedge funds use to ingest new data sets are long, complex and expensive. Is this sustainable, or will outsourcing play a role?
Absolutely unsustainable. However, as I mentioned earlier, until recently they didn’t have much of a choice. Older tools have minimal source control or data lineage tracking, which forces a manual, in-house process. What we’re seeing today is more verticalization in the ingestion, processing, and resolution of data sets, allowing funds to place more trust in external vendors to replace their expensive and flawed processes. We do this in the real estate space, and companies like Crux are doing this effectively for public equities. This trend will only expand.