
CaaS: The Challenges of Data Sourcing

Posted on Apr 28 2020 by Andrew White

Interested in how compliance is automated? Find out more on how ‘compliance as a service’ (CaaS) has become a reality and the challenges around data sourcing.

The investment management industry is one where competitive advantages are pursued with a level of fanaticism rarely seen elsewhere.

On the technology side, entire server farms are moved metres closer to the trading venue and new tunnels dug to ensure fibre optic cables are straighter, all to shave nanoseconds off trade execution times. On the people side, managers pay top dollar to hire and retain the world’s brightest minds to devise and execute trading strategies.

Whether by design or by accident, this behaviour has spread from the front office to permeate entire organisations. Secret sauces, trading algorithms and proprietary information are not to be shared, and secrecy is the name of the game. 

This secrecy does not mesh with the aims of regulation, which attempts to establish a level playing field by holding all market participants to the same rules. In practice, this has meant that investment managers across the industry are duplicating exactly the same work. When a new regulation is passed, ad hoc teams are assembled with experts in legal matters, compliance, business analysis and technology. They assess whether the regulation affects their company and, if it does, how best to turn it into something that machines can monitor. They then test whether the algorithms devised return the expected results. Processes are introduced, and operations teams are trained in how to work in the new environment.

The time and effort spent on this is enormous. One survey recently calculated that asset managers, brokers, banks and others currently spend 4 per cent of revenue on compliance, with this number predicted to rise to 10 per cent by 2022. The fact that compliance is a cost centre and that no additional revenue or profit can be derived from it makes these numbers truly staggering. If every investment manager is spending similar amounts, the industry as a whole is spending trillions on something that effectively should be a commodity. Compliance is a binary state; either one’s regulatory obligations are being fulfilled or they are not. It is not possible to be ‘over-compliant’.

In this blog series we will be looking at how compliance can be automated and how ‘compliance as a service’ (CaaS) has now become a reality, largely due to cloud computing and the adoption of global data standards. The first in the series, this post focuses on the challenges around data sourcing. 


A key challenge for any institution or provider looking to automate compliance is the provision of standardised, high-quality data. Data has historically been country specific and heavily siloed (e.g. securities data vs. derivatives data). Additionally, the data is often flavoured by the data provider (e.g. Bloomberg Global IDs vs. Reuters Instrument Codes), making it extremely difficult to gain a consistent view across all geographies and asset classes.

Data standards are an area in which the financial services industry lags far behind others. The International Securities Identification Number (ISIN), which uniquely identifies a security, was introduced in 1981. Acceptance was low until the G30 recommended its adoption in 1989, and in 2004 the European Union (EU) mandated its use in regulatory reporting.


What the ISIN has done for financial instruments is mirrored (albeit with a ca. 30-year delay) in the legal entity identifier (LEI). Introduced in the wake of the 2008 financial crisis, the LEI identifies distinct legal entities that engage in financial transactions, and its uptake is absolutely crucial for automated compliance. Without an LEI, one’s counterparty risk could be with ‘Bank’, ‘Bank Limited’, ‘Bank Ltd.’ or ‘Bank Ltd (UK)’; a machine would struggle to tell whether these are the same entity. With an LEI, one’s counterparty can be uniquely identified by a 20-character alphanumeric code.
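To make the point concrete, here is a minimal Python sketch of why a machine prefers the LEI to a free-text name. It assumes the ISO 17442 layout (18 alphanumeric characters followed by two check digits, verified with the ISO 7064 MOD 97-10 scheme). The prefix `5299000EXAMPLEBANK` and the exposure figures are made up for illustration; the sketch derives the check digits itself rather than quoting a real LEI.

```python
import string

# ISO 7064 character values: digits stay as-is, A=10 ... Z=35.
_VALUES = {c: str(i) for i, c in enumerate(string.digits + string.ascii_uppercase)}


def _to_number(text: str) -> int:
    """Expand each character to its numeric value and read the result as one integer."""
    return int("".join(_VALUES[c] for c in text))


def lei_check_digits(prefix: str) -> str:
    """Return the two check digits for an 18-character LEI prefix."""
    return f"{98 - _to_number(prefix + '00') % 97:02d}"


def is_valid_lei(lei: str) -> bool:
    """True if `lei` is 20 upper-case alphanumerics whose MOD 97-10 checksum equals 1."""
    if len(lei) != 20 or any(c not in _VALUES for c in lei):
        return False
    if not lei[18:].isdigit():
        return False
    return _to_number(lei) % 97 == 1


# Hypothetical LEI built from a made-up prefix (not a registered entity).
PREFIX = "5299000EXAMPLEBANK"
lei = PREFIX + lei_check_digits(PREFIX)

# The same counterparty booked under three different names (illustrative amounts).
exposures = [("Bank", lei, 100), ("Bank Ltd.", lei, 250), ("Bank Ltd (UK)", lei, 50)]

# Keying on the LEI collapses the name variants into a single counterparty.
totals = {}
for _name, key, amount in exposures:
    totals[key] = totals.get(key, 0) + amount
```

Grouping on the name would have produced three counterparties here; grouping on the LEI produces one, and the checksum lets a system reject mistyped identifiers before they reach a risk calculation.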

In September 2018, one of the final missing data pieces was announced when the Association of National Numbering Agencies (ANNA) and the Global Legal Entity Identifier Foundation (GLEIF) declared that they would be working together to provide a global database of ISIN-to-LEI mappings. In a nutshell, this allows any financial instrument to be matched to its relevant legal entity, greatly improving counterparty/issuer risk calculations. This is all a step in the right direction, but much work remains to be done.
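As a rough illustration of what such a mapping enables, the Python sketch below rolls instrument-level positions up to issuer level. The mapping table, the identifiers and the function name `exposure_by_issuer` are all hypothetical placeholders; the layout of the real mapping files is not assumed here.

```python
from collections import defaultdict

# Hypothetical ISIN-to-LEI mapping; every identifier is a made-up placeholder.
ISIN_TO_LEI = {
    "XX0000000001": "LEI-ISSUER-A",
    "XX0000000002": "LEI-ISSUER-A",  # a second instrument from the same issuer
    "XX0000000003": "LEI-ISSUER-B",
}


def exposure_by_issuer(positions, isin_to_lei):
    """Sum market value per issuer LEI and collect ISINs that have no mapping."""
    totals = defaultdict(float)
    unmapped = []
    for isin, market_value in positions:
        lei = isin_to_lei.get(isin)
        if lei is None:
            # Surface the gap rather than silently dropping the position.
            unmapped.append(isin)
        else:
            totals[lei] += market_value
    return dict(totals), unmapped


positions = [
    ("XX0000000001", 100.0),
    ("XX0000000002", 50.0),
    ("XX0000000003", 25.0),
    ("XX0000000009", 10.0),  # not present in the mapping
]
totals, unmapped = exposure_by_issuer(positions, ISIN_TO_LEI)
```

Returning the unmapped ISINs separately is a deliberate choice in this sketch: a missing mapping is itself a data-quality finding that a compliance team would want to see.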

Firstly, the ISIN and LEI must be adopted globally, not just by member states of the EU (generally a driving force behind standardisation). Uptake is particularly weak in Asia, for example, mainly due to the lack of a ‘super regulator’ or supranational body to align local regulators. Secondly, while the LEI currently provides level one/‘business card’ information on entities (e.g. official name, headquarters, etc.), level two data on relationships among entities is still sadly lacking.

Machine Readability

Standardised, high-quality data is essential for automated systems to be able to monitor regulation correctly. The data itself, however, must be provided in a machine-consumable format; otherwise it is of little value. If managers have proven to be slow to ‘go electronic’, then competent authorities have proven to be slower still. 

Consuming Data Provided By Competent Authorities

Much data that was originally published on paper has simply been migrated to an electronic version of the same document. Take, for example, the US Securities and Exchange Commission (SEC), which publishes the list of 13F securities electronically but in a tab-delimited format that is difficult for machines to process.

Progress, however, is being made, and many regulators are starting to publish data that is readily consumable by machines. One example is the UK Takeover Panel, which publishes the list of companies involved in a takeover bid on a daily basis. It is published in ‘human readable’ format as HyperText Markup Language (HTML), but more importantly, it is also published in Extensible Markup Language (XML), which is perfect for machine processing.
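To show why XML is so much easier for a machine to consume, here is a minimal Python sketch using the standard library parser. The element and attribute names (`disclosureTable`, `offeree`, `name`, `isin`) and the sample companies are invented for illustration and are not the Takeover Panel's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; structure and values are assumptions made for this sketch.
SAMPLE_FEED = """<disclosureTable>
  <offeree name="Example Target plc" isin="XX0000000001"/>
  <offeree name="Sample Holdings plc" isin="XX0000000002"/>
</disclosureTable>"""


def parse_offerees(xml_text):
    """Return (name, isin) pairs for every <offeree> element in the feed."""
    root = ET.fromstring(xml_text)
    return [(el.get("name"), el.get("isin")) for el in root.iter("offeree")]


offerees = parse_offerees(SAMPLE_FEED)
```

With a structured feed, each field is addressed by name, so the consumer does not have to guess at column positions or scrape text out of a page layout.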

Sending Data to Competent Authorities

For decades the input regulators received was paper based. This transitioned to fax and has briskly moved onwards to be largely electronic. The vast majority of these electronic regulatory filings, however, are still made via proprietary/unstructured document formats such as Microsoft Excel, Microsoft Word or PDF. This is an acceptable medium if the sole purpose of the filing is to be archived and likely never viewed again, but it is relatively useless if the information is to be further processed by machines.

Creating these electronic documents is also difficult to automate, and the bespoke nature of each form means that broadly similar information is presented completely differently in different jurisdictions and even within the same jurisdiction. Take, for example, the United Kingdom: a short ownership disclosure must be made to the Financial Conduct Authority (FCA) by email with a Microsoft Excel file as an attachment, but a long ownership disclosure is to be sent to the FCA by email with a Microsoft Word file as an attachment.

Rather than accept emails, some regulators have chosen to provide ‘portals’ where filings can be uploaded. It is, however, alarmingly common for these portals to require completion of a CAPTCHA before logging in, making it effectively impossible for a machine to upload the filing.

Multiply the above by ca. 100 jurisdictions, where regulators require electronic disclosures, and one can begin to grasp the scale of the problem. Across the industry, participants are spending time and effort generating, emailing and uploading bespoke filings to regulators who then cannot easily process the information they have received.

Just as regulators are beginning to publish data in structured formats or via API, there is also progress on their ability to consume structured electronic data. Progressive regulators such as the Federal Financial Supervisory Authority (BaFin) in Germany now allow disclosure of interests via API, enabling a machine to generate the required information and send it to the regulator in milliseconds. More importantly, the regulator can then process this information easily, the effect of which is not to be underestimated. Rather than storing documents that are only likely to be viewed post-mortem, a regulator can use real-time structured data to predict future events. Trends can be spotted and analysis run in milliseconds to help prevent possible future crises.
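A minimal Python sketch of what a structured disclosure might look like on the sending side is shown below. The `Disclosure` class, its field names and the sample values are assumptions made for illustration; they do not reflect BaFin's or any other regulator's actual API, and the sketch deliberately stops short of making a network call.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Disclosure:
    """Hypothetical shareholding disclosure; the field names are illustrative only."""
    filer_lei: str
    issuer_isin: str
    holding_percent: float
    position_date: str  # ISO 8601 date, e.g. "2020-04-28"


def to_payload(disclosure: Disclosure) -> str:
    """Serialise the disclosure to JSON with stable key ordering."""
    return json.dumps(asdict(disclosure), sort_keys=True)


# Placeholder identifiers, not real entities or instruments.
filing = Disclosure(
    filer_lei="LEI-FILER-PLACEHOLDER",
    issuer_isin="XX0000000001",
    holding_percent=3.2,
    position_date="2020-04-28",
)
payload = to_payload(filing)

# The receiving side can recover every field without any manual re-keying.
received = json.loads(payload)
```

The contrast with a Word or Excel attachment is that both sides agree on named, typed fields, so the same record can be generated, validated and analysed without a person in the loop.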

Here at FundApps we automatically source the latest in regulatory information by scraping multiple websites to provide our clients with takeover panel data, issuer limits and more. If more detail is needed, we reach out to regulators to get a better understanding directly from the source. If you'd like to find out more about how our Shareholding Disclosure service works then get in touch!

Click here to check out part 2 of this series and read about the benefits of knowledge sharing within the regulatory space. In part 3 we discuss how the gap between practice and interpretive decision-making can be closed.