Data Virtualisation within Asset Management
11 May 2021
What is the problem statement?
This varies across asset management firms and by use case within each firm. The principal issues arise where the operating model brings data from different processes and sources together to support business requirements and decision making. Often the following results:
A siloed and unintegrated view of data from source systems predominates;
Data quality is highly variable;
Data can often be misaligned, both temporally and semantically;
‘Flush & fill’ models dominate and data is rarely if ever preserved in its original form;
Systems of record are rarely, if ever, immutable or globally bitemporal.
A centralised and authoritative source of truth for investment and business data
One way to achieve this is to implement a data virtualisation layer. Virtualisation creates the look and feel of a single unified data source. But have the core problem statements been addressed? We offer the following answers:
The value proposition from data virtualisation is often predicated on fast time to value and a relatively quick integration pathway.
The limiting factors tend to be the competencies and design principles of the underlying source systems, how data is processed and managed, and the technologies’ performance. Ultimately, a virtualisation layer is only as sophisticated as the source systems and / or data it references.
We assert that several questions should be asked about the overall data architecture:
Is data overwritten in each source system? Is data deleted from the source system? In summary, are all the source systems immutable?
Does each source system support bitemporal data and contain data which supports querying in a bitemporal fashion (time stamps, versioning etc.)?
Does each source system record the provenance of data and its lineage?
Does each of the source systems contain data which is semantically the same, such that views of data throughout the virtualisation layer are consistent?
Are the source systems relatable and aligned both from a primary ID perspective and temporally?
How performant would a data virtualisation layer be, considering that it likely needs to call multiple legacy systems, with long historical data sets?
Without an immutable, globally bitemporal data paradigm, do the age-old problems of explaining the impact of changes in historical data get solved?
‘Data Virtualisation Plus’ – dual sourcing to reach the target operating model
Data virtualisation can be a first and useful step in presenting a ‘unified view’ to users within the investment firm and selling the vision of improved data. Speed of integration, limited ETL (caveats apply) and early steps towards better data access and centralisation can all build internal support. Despite these benefits, virtualisation alone does not result in better source data.
A system such as Aprexo’s DMS is focussed on ‘getting the data right’, managing the clash of data for different use cases in the operating model. It also addresses data quality, alignment, provenance and lineage. It does this natively from the cloud and is scalable, performant and bitemporal.
A two-pronged strategy of solving a subset of the core underlying data management problems using Aprexo’s DMS, together with implementing a virtualisation layer across many systems, including Aprexo, would support an asset manager’s move to a long-term solution for data consumers and begin strategic steps towards resolving the problems with systems of record, all relatively inexpensively.
The roll-out of Aprexo’s DMS can initially focus on discrete areas and expand to cover more domains over time. A solid foundation for the asset management firm’s very long-term operating requirements can begin to be created now. Rearchitecting with a virtualisation technology without addressing the core underlying data store shortcomings can only be a sticking plaster solution.
Over time, with such a dual approach:
Virtualisation would be used as an insulating wrapper to shield downstream users from changes in the underlying data sources and technologies;
Reporting data warehouses could gradually be removed, reducing support costs;
Downstream processes will become more efficient due to Aprexo’s ability to surface all data with good lineage and provenance;
End users would be able to interrogate and understand historic data changes themselves using Aprexo’s ability to near-instantly explain the past bitemporally, significantly freeing up engineering time from ad hoc support activities.
One further benefit of such a dual sourcing approach is that the semantic field mapping only needs to be done once (for the two systems simultaneously) rather than be repeated for the second system at a later date. Virtualisation does not solve semantic misalignment between systems of record.
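As a rough illustration of doing the semantic field mapping once, the sketch below keeps a single mapping from each source system's field names to shared canonical names, which any consumer could then reuse. The system names, field names and the `to_canonical` helper are all hypothetical and are not taken from Aprexo's DMS or from any virtualisation product.

```python
# Hypothetical source systems and field names, for illustration only.
CANONICAL_FIELD_MAP = {
    "order_management": {
        "sec_id": "security_id",
        "qty": "quantity",
        "trade_dt": "trade_date",
    },
    "fund_accounting": {
        "InstrumentCode": "security_id",
        "Units": "quantity",
        "TradeDate": "trade_date",
    },
}


def to_canonical(source: str, record: dict) -> dict:
    """Rename a source record's fields to the shared canonical names."""
    mapping = CANONICAL_FIELD_MAP[source]
    unmapped = set(record) - set(mapping)
    if unmapped:
        # Fail loudly rather than silently passing through unmapped fields.
        raise KeyError(f"No canonical mapping for {sorted(unmapped)} from {source}")
    return {mapping[field]: value for field, value in record.items()}


oms_record = {"sec_id": "IBM", "qty": 200_000, "trade_dt": "2020-09-07"}
accounting_record = {"InstrumentCode": "IBM", "Units": 200_000, "TradeDate": "2020-09-07"}

# Both records arrive with different field names but share one canonical shape.
print(to_canonical("order_management", oms_record))
print(to_canonical("fund_accounting", accounting_record))
```

Renaming fields is only the simplest part of semantic alignment; differences in units, identifiers or definitions would need their own rules on top of a mapping like this.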
This gradual dual source approach would enable an asset manager to take advantage of Aprexo’s new data paradigm and minimise the commercial risks of selecting a small supplier. The extra costs of this approach would be marginal: Aprexo’s pricing is consumption driven, so the asset manager would only pay in line with its use.
Comparing Data Virtualisation Plus with the problem statement
A siloed and unintegrated view of data from source systems predominates
Virtualisation gives a unified view;
Data quality is highly variable
Raised data quality from improved data management and governance processes;
Data can often be misaligned, both temporally and semantically
Data is single-sourced and the system of record persists everything in near real time;
‘Flush & fill’ models dominate and data is rarely if ever preserved in its original form
Data is well-lineaged with clear provenance;
Systems of record are rarely, if ever, immutable or globally bitemporal
Data is stored bitemporally to a high client and regulatory audit standard.
Building Something That Works
Gall’s Law and Complexity
2 February 2021
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
John Gall, The Systems Bible: The Beginner's Guide to Systems Large and Small: Being the Third Edition of Systemantics
John Gall and Gall’s Law
John Gall was an American paediatrician who, alongside his medical & research career, wrote about what makes systems succeed or not. He sought to draw generally applicable lessons from system failures, using the term systems to cover pretty much anything designed by humans. Gall named his approach Systemantics and published three editions of his book on the subject, in 1975, 1986 and 2002. The quotation above is from the 2002 edition but the title of the first edition perhaps gives a better insight into his general thesis: Systemantics: How Systems Work and Especially How They Fail.
Gall’s central assertion was that failure is an intrinsic quality of any system, increasingly so as systems get more complex, and that one should recognise this and behave accordingly. He also thought jokingly that any sufficiently complex system exhibits antics, hence (though perhaps only in part) the name Systemantics.
Because Aprexo is a software company we restrict our thoughts here to the applicability of Gall’s Law to software engineering.
Gall’s Law says that it is not possible to – or in a weaker form extremely inefficient to – design a successful complex system from scratch. Applied to software engineering it is an argument in favour of an agile or incremental approach rather than a waterfall one. If you want to build a complex system that works, start with the aim of building a simple system that solves a subset of your objectives, ensure that it works, make it more complex then ensure it still works, and repeat until the desired capabilities are reached.
One implication of Gall’s Law is that systems should be underspecified: the spec needs to be just good enough and no more complete than that. Any specification inevitably has underlying assumptions and beliefs about the world with which the planned system has to interact, some of which will not survive contact with reality; expect this and react quickly when it happens.
Gall’s Law thus speaks to the advantages of modular code design, and to clean interfaces between modules.
This has been Aprexo’s development approach, which is broadly an agile one. The first use case we set out to address for our Data Mastering Solution (DMS) was as a post-trade investment book of record (IBOR) for transactions and positions in securities, cash and foreign exchange. In 2019, we proved its accuracy and reliability and since then have incrementally extended it to cover a broader range of instruments and activities such as cash projections, P&L calculations and orders, which for the first time took its capabilities up the chain into pre-trade.
At each stage we ensured that the new book of record functionality worked – without breaking what had been built before – before moving on to the next.
The same has been true of our 2020 extension into broader enterprise data management use cases for our DMS. This evolution has required greater emphasis on data ingestion, data quality assurance and data matching. We are integrating proven third party software into the DMS to help us do this where we believe it’s the best way to create the required complex functionality in a simple way.
But where to start?
You need to have an idea of your destination in order to start a software engineering journey, perhaps something that might grandly be called a vision. So where to start? This isn’t an easy question to answer. To quote architect and author Jorge Arango: “Determining which aspects of the system are central and which are trivial is something of an art, and easier to do in some projects than in others. This is one of the reasons why it’s so important to start with a conceptual model instead of the user interface; UI discussions can be rife with important yet non-critical issues.”
At Aprexo the core concepts underlying our DMS are:
Preserve and expose the lineage of all data items so that all changes are easily explained;
Store all data bitemporally using the patterns we have developed, enabling accurate historical reporting from a single data store;
Update – don’t delete, don’t amend – so that the system is effectively immutable;
Be flexible enough to master data or receive it from another system, and be able to take different approaches for different domains in order to support a broad range of use cases;
Offer a data model suited to fund management, informed by our many years in the industry, to meet clients’ needs;
Give clients self-service tools for extending the out-of-the-box data model, so they can make IT changes to meet new business needs without vendor-imposed delays;
Enable users to access everything easily, at any time – surface, control, use – via our unique approach to snapping and an open API.
In setting up a new system, tread softly. You may be disturbing another system that is actually working.
John Gall, The Systems Bible
In modern enterprise IT architectures this is now well-handled via RESTful APIs, with ETL layers between different systems where needed. Containerisation is a more recent advance which reduces deployment risks.
Aprexo is an API-first business and we deploy our DMS in containers, hitherto in Azure but we are able to install in other clouds or even on premise.
At Aprexo we have – intentionally – a very experienced engineering team who have used many different development approaches during their careers. Perhaps because of this we are pragmatists not dogmatists when it comes to selecting methodologies, tools and technologies. But we are adamant that it is important to understand the data you’re seeking to model, to know what you don’t know, and to build / test / release in manageably short cycles. Gall’s Law is an interesting perspective on why this approach works when so many others fail.
 The first edition’s title in full: General Systemantics, An Essay on how Systems Work and, Especially How They Fail, Together with the Very First Annotated Compendium of Basic Systems Axioms: a Handbook and Ready Reference for Scientists, Engineers, Laboratory Workers, Administrators, Public Officials, Systems Analysts, Etc., Etc., Etc., and the General Public
Technology is Changing Cultures
Long Predicted, Finally Reality
Remote working has tested the resilience of fund managers’ IT infrastructures and the effectiveness of their governance structures. The pandemic has also confirmed how important it is to have sufficient operational expertise to create and curate high quality data across all business domains. If there was ever any doubt, technological and operational capabilities are now acknowledged as business-critical competencies.
Threats can create opportunities
After a long period of significant remote working it is obvious that savings can be made on city centre office space without impairing overall business performance. We will all work in offices again, but in smarter and more flexible ways than before the pandemic. Firms fortunate enough to have some flexibility in their accommodation costs are reallocating budgets from premises to projects – investing to further reduce costs and improve competitiveness, with payback periods of only one or two years.
As for technology infrastructure, the pandemic has demonstrated that cloud computing is now mature and safe, and surely puts into question any further investment in on-premise IT infrastructure by firms for which managing hardware is not a core competence.
Business Services – not “back-offices”
In the main, boards understand the importance of IT & Operations in their roles of servicing, advancing and protecting the wider business and its clients. An efficient Operations team ensures that investment decisions are made based on accurate and complete data. Furthermore, Operations supports the roll out of new products and is often able to prevent or rectify operating errors that occur in other parts of the organisation before they affect clients or the firm’s P&L. Without effective and resilient IT & Operations functions firms cannot manage their risks and reputations. The so called “back-office” has finally been acknowledged as a business enabler.
Leading firms are increasing “build the firm” budgets for projects which at the same time seek to reduce operating costs, reduce operational risks, and advance front-to-back resilience through the use of cloud computing.
Enriching C-Suite pools with technology expertise
Over the years, many fund managers have appointed NEDs with technology backgrounds to their boards. This helps ensure that the technology and data strategy is understood and supported at the highest level of an organisation. At the executive level, a number of C-Suite positions have also been created such as Chief Data Officer, Chief Transformation Officer and Chief Digital Officer, to complement the more traditional COO and CTO roles.
However, representation on a committee is secondary. What is important is that teams collaborate and think critically and creatively on how to progress together. COOs and CTOs have always been champions for change with remits and skill-sets well beyond those of a subject matter expert.
The New Now
Hierarchical structures need to be gradually adjusted to deal with the obvious current challenges to firms’ cultures and modi operandi. Operating in a C-19 environment for a long period of time is likely to redefine the meaning of business culture, and strict governance structures must be liberated to ensure the acquisition, development and retention of talent.
A successful digital transformation of an organisation requires talent, technology, trust and above all, teamwork.
18 November 2020
A Data Mastering Solution for Sustainability
5 October 2020
Surface - Control - Use
ESG data brings critical insights into opportunities and risks and is rapidly evolving to become an integral component of the investment data set and decision framework. Firms are seeking to innovate with new products and also integrate ESG factors into existing funds. Demand for differentiating ways of managing ESG data is high from businesses in many parts of the asset management ecosystem.
Whether the ESG data is used to identify alpha, profile and manage risk, or enhance corporate stewardship and environmental engagement, each requirement creates its own data challenges. How to integrate multiple sources of ESG data into complex multi-asset investment processes presents a major challenge for most firms.
The situation is made even more complex due to the lack of standardised ratings and methodologies. Each security will be rated differently by each data provider. Accordingly, any ESG-ready data store needs to be flexible by design, and readily adaptable once in use in order to support changing requirements from internal and external clients.
Aprexo’s very successful ‘Surface - Control - Use’ paradigm for every aspect of the asset management data universe can incorporate any ESG analytics and data into its extensible database, for each stage of the investment process.
The advent of this new class of investment data presents an opportunity to manage it in a modern, bitemporal way, fully lineaged and with the provenance of all data items recorded at the most granular atomic level. Only then can real business value be created from it. This can be done by storing ESG data in Aprexo’s immutable and bitemporal Data Mastering Solution, which facilitates easy access to everything in it due to its API-led design.
Aprexo’s DMS supports workflows from research, portfolio construction and analytics through to risk and reporting. Our solution captures, validates and maps ESG factors to different business functions across the investible issuer and securities universe. It delivers the widest possible view of this emerging data set and thereby drives informed decision making.
The Nature of Data
21 September 2020
When designing a Data Mastering System (DMS), it’s all about the data. Data comes in many shapes and forms, and within every set of data there are hidden rules and complexities that are important to understand in order to process, store, surface, control and use that data; its nature if you will. This is the third of three articles about the nature of data.
Part 3. The Timeliness of Data
In many financial systems today, timeliness of data is a real issue.
If we look back 20 years, and consider retail bank accounts, knowing your balance and what you had spent meant walking into a branch and asking, or waiting for a monthly statement in the post. Fast forward 10 years, and telephone / online banking eased access, but even then your balance and list of transactions were updated only once a day, and not available until the next. Today retail bank accounts provide near real-time updates of balances and transactions, and challenger banks such as Revolut, Monzo and Starling offer mobile alerts as transactions are processed; so buying a coffee and seeing the money leave your bank account before you’ve had your first sip is now a common occurrence.
In asset management most systems still operate as retail banking did over 10 years ago, unable to show an accurate view of positions (balances) until the next day. Whilst clearly far from ideal, this is still widely accepted as the standard.
So, what is the point? Well, if you have a near real-time view of positions, you are able to make more informed decisions. Knowing how much cash you have to spend minute by minute allows for better investment of that cash; knowing if a mistake occurs in near real-time allows for correction of that mistake sooner, potentially limiting the damage; knowing if the market is moving against you and seeing how it affects your portfolio tick by tick, allows for immediate action; and knowing your costs intraday gives you a chance to optimise those costs. More information is generally considered better, more timely information doubly so.
Some in asset management don’t see the need for a near real-time system, which is understandable as many portfolios are only rebalanced 2 or 3 times a week, sometimes even less. But if you were to buy or design a modern system today, why wouldn’t you choose a near real-time one, for all the benefits above and the new ones which will arise in the future? Today, would you sign up for a retail bank account from 10 years ago or would you go with one that sent you real-time mobile alerts?
In many financial systems much of the data created intraday is not useable until the next day, which leaves opportunities for investment returns, and for cost reductions, on the table. As the saying goes, this is playing with one hand tied behind your back!
The Nature of Data
14 September 2020
When designing a Data Mastering System (DMS), it’s all about the data. Data comes in many shapes and forms, and within every set of data there are hidden rules and complexities that are important to understand in order to process, store, surface, control and use that data; its nature if you will. This is the second of three articles about the nature of data.
Part 2. The Temporality of Data
Understanding how time applies to data is a critical concept when it comes to modelling data correctly.
Most financial systems, in broad terms, have what is referred to as “static” data, which has no associated date, and “time-series” data, which has a single associated “valid for” business date. Examples of static data are country or currency data used for reference purposes, and this data is valid across all dates; whereas examples of time-series data are stock prices and holdings, which are valid for a given business date.
So where is the problem with this approach? Consider the following scenario: a portfolio manager (PM) at 6pm on Monday evening runs an end of day (EOD) holdings report for a portfolio and notes that it holds 1M shares of IBM. At 1am on Tuesday morning a purchase of 200K shares of IBM that was traded on Monday, but delayed, hits the system. At 7am on Tuesday morning the PM runs Monday’s EOD holdings report again, but now finds that it has changed to reflect a holding of 1.2M shares of IBM. Given this, how does the PM confirm that the holdings were read correctly the previous evening? If they were, how does the PM find out what happened overnight? The related cash accounts might now be overdrawn and have incurred fees due to the purchase of the unexpected 200K shares of IBM, so was this a mistake or a cost that couldn’t have been avoided?
To investigate, the PM could just recreate the EOD holdings report from Monday at 6pm that showed 1M shares of IBM and compare the differences, right? Unfortunately, not. The problem is that the holdings data only contains a single business date and does not track the date and time that it was added to or updated in the system. The PM can only ask for the holdings data using this business date, Monday, which can return only the latest data for that business date, which in this case now shows 1.2M shares of IBM. Now to be fair, in most systems it would be possible to look at the transaction history and figure out what occurred overnight and why; but this would take time and effort to diagnose given an incomplete understanding of what had happened in the first place.
There is a better way: “bi-temporality”! As the name suggests, this means two timelines for data, though in practice it means adding to all data the date and time at which the system knew about the data, i.e. when it was loaded, committed, or received into the system. So static data becomes a time-series of when it was known to the system, and stock prices and holdings data become a “bi-temporal” time-series, where both a valid-for business date and a when-known system date-time are tracked independently.
Given this, it is then possible for the PM to run a report asking the system what it knew at 6pm on Monday evening (also known as “as-at” a given system date and time), about the EOD holdings for Monday (also known as “as-of” a given business date), versus what the system knows now about the EOD holdings for Monday. A well designed bi-temporal data system allows, for instance, a PM to run a report for what a given portfolio looked like to the system, 6 months ago, at 2:49pm on a Tuesday.
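The IBM scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any particular system implements bi-temporality: the `HoldingRecord` type, the `holding_as_at` function, the portfolio name and the calendar dates are all hypothetical. Each record carries both a business date ("as-of") and the time the system learned of it ("as-at"), and a query filters on both.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class HoldingRecord:
    portfolio: str
    security: str
    business_date: date   # "as-of": the business date the holding is valid for
    known_at: datetime    # "as-at": when the system learned this version
    quantity: int


def holding_as_at(
    records: List[HoldingRecord],
    portfolio: str,
    security: str,
    as_of: date,
    as_at: datetime,
) -> Optional[HoldingRecord]:
    """Latest version for business date `as_of` that was known at or before `as_at`."""
    candidates = [
        r for r in records
        if r.portfolio == portfolio
        and r.security == security
        and r.business_date == as_of
        and r.known_at <= as_at
    ]
    return max(candidates, key=lambda r: r.known_at, default=None)


# Hypothetical dates: Monday 7 and Tuesday 8 September 2020.
monday = date(2020, 9, 7)
records = [
    # What the system knew before the PM's 6pm report on Monday.
    HoldingRecord("FUND-A", "IBM", monday, datetime(2020, 9, 7, 17, 30), 1_000_000),
    # The delayed purchase arrives at 1am on Tuesday, still for Monday's business date.
    HoldingRecord("FUND-A", "IBM", monday, datetime(2020, 9, 8, 1, 0), 1_200_000),
]

monday_6pm = holding_as_at(records, "FUND-A", "IBM", monday, datetime(2020, 9, 7, 18, 0))
tuesday_7am = holding_as_at(records, "FUND-A", "IBM", monday, datetime(2020, 9, 8, 7, 0))
print(monday_6pm.quantity, tuesday_7am.quantity)
```

Both queries ask about the same business date; only the as-at time differs, which is what lets the PM reproduce Monday evening's report and compare it with Tuesday morning's.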
Bi-temporality is a powerful concept that creates the ability to “time-travel” through data. It enables a fundamental understanding of data and the decisions that were taken using that data. This provides value for so many areas of finance: for compliance, audit and risk in the ability to explain and reproduce exactly a given report or set of data, to research and portfolio management where learning from the past is key to investing for the future.
The Nature of Data
9 September 2020
When designing a Data Mastering System (DMS), it’s all about the data. Data comes in many shapes and forms, and within every set of data there are hidden rules and complexities that are important to understand in order to process, store, surface, control and use that data; its nature if you will. This is the first of three articles about the nature of data.
Part 1. The Mutability of Data
If you were to look it up, the dictionary definition of mutability is “the liability or tendency to change”. This is apt, as nearly all systems used in finance today mutate data as a matter of course, and this can create several “liabilities”.
Data that is mutable can be changed, overwritten and deleted, and when this happens, the previous version of that data is lost. Why is this a problem? Consider this scenario: a portfolio manager (PM) holds 1M shares of IBM, bought for $130 per share and the share price in the system then changes to $150. The PM instructs the trading desk to sell those equities at best market price believing that to be $150, but the price increase was due to a market data error and in fact the market price had decreased to $110. In the system, the $150 intraday price is corrected to $110 and is overwritten with no record of it having been $150. How does the PM vindicate their decision and the potential loss; how do they explain this to their clients without an audit trail of what happened? How is the market data error investigated without any evidence?
But surely no system would do this? It seems so fundamental. You would be surprised at how many systems today overwrite intraday prices and positions!
So, what is the solution? That would be immutability, “something which cannot change after it has been created”. Mutable data can change after a user or system has read it, or even while they are reading it, which can cause data inconsistencies and invalid results; it is not possible to go back to look at what the data looked like before it was changed, and there is no history to understand what has happened in the past. Immutable data however has many benefits; it can be copied around without fear of those copies being out of date; two or more users or systems can access immutable data at the same time since it is unchanging and read only; for audit or look back purposes it can be relied upon to always be the same. It is akin to an author who is writing a book and is half way through; the pages behind are fixed (bar redrafts!), and can always be relied upon to be the same, whereas the pages ahead have yet to be written, but once they are, they will also be fixed, as they have now been added to the pages behind.
Now in reality, very little data can be considered never to change after it has been created, and even in those cases, such as a birth date, data may have been created incorrectly and therefore needs to be able to change in order for it to be corrected. If this is achieved by creating a new version of the data to be changed rather than changing it in place and consequently overwriting it, then each new data version is immutable with the above benefits. If we take the author/book analogy again, an author cannot change the books already out there in the world that have been sold, they are immutable! But an author can produce a 2nd edition of a previous work, a new version. The 1st and 2nd editions can then be compared for differences, and to understand the changes.
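The idea of correcting data by adding a new version rather than overwriting can be sketched as follows. This is an illustrative, in-memory example only; the `PriceVersion` and `VersionedPriceStore` names, the timestamps and the reason strings are hypothetical. It replays the earlier price scenario, in which an erroneous $150 price is later corrected to $110.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)  # frozen: a version cannot be modified once created
class PriceVersion:
    security: str
    price: float
    recorded_at: datetime
    reason: str


class VersionedPriceStore:
    """Append-only store: a correction adds a version; nothing is overwritten or deleted."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[PriceVersion]] = {}

    def record(self, security: str, price: float, recorded_at: datetime, reason: str) -> PriceVersion:
        version = PriceVersion(security, price, recorded_at, reason)
        self._versions.setdefault(security, []).append(version)
        return version

    def latest(self, security: str) -> PriceVersion:
        return self._versions[security][-1]

    def history(self, security: str) -> Tuple[PriceVersion, ...]:
        # Return a tuple so callers cannot alter the stored list.
        return tuple(self._versions[security])


store = VersionedPriceStore()
store.record("IBM", 150.0, datetime(2020, 9, 9, 10, 0), "market data feed")
store.record("IBM", 110.0, datetime(2020, 9, 9, 10, 30), "correction of feed error")

print(store.latest("IBM").price)
for version in store.history("IBM"):
    print(version.recorded_at, version.price, version.reason)
```

Because the $150 version is still in the history, the PM can show what the system displayed when the sell instruction was given, and the market data error can be investigated with evidence.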
In financial systems, auditability and compliance are paramount, and inadequate controls in these areas can potentially lead to large fines and reputational damage. Immutable data, where nothing is ever overwritten or deleted, is a key advantage here.
Climbing the Legacy Mountain
7 July 2020
Addressing legacy systems and processes is a mountain most financial services firms will have to climb eventually. History will show that for many it was the sad events of COVID-19 which forced the start of their journey. In the post-pandemic world it will be early adopters of new technology that benefit not only from new functionality, but also from mobilising their organisations to make step-changes in their operating models.
In a very short period of time COVID-19 has brought the investment industry to an inflection point where digital client engagement (mobile and otherwise), operational resilience, enterprise risk reduction and the need for substantial ongoing cost savings dominate senior executives’ priorities now and for the foreseeable future. Technology-driven change, implemented through modular system enhancements and process simplification, is now being funded by mandatory rather than discretionary budgets. Operational efficiency and good data management provide the critical foundation for investment alpha.
'A journey of a thousand miles begins with a single step'
Addressing data challenges will form the basis of the first step.
'The effective control and management of data has been one of the central issues facing the asset management industry for several years. The challenges around providing high quality data that can be consistently and timely delivered across the whole organisation has been further heightened in recent years by a) the increased demands on that data from not only investment, operations and client reporting functions but by other areas such as risk management, regulatory reporting, product etc; and b) the multiplicity of systems which have grown over time across functions which are all hungry to consume the same reliable and consistent data'
George Efthimiou, former Global COO, HSBC Global Asset Management
In our professional and private lives, we have to cope with less readily digestible and more voluminous data every day. Data is both an asset and a liability for firms. It is an asset if access is easy and we are able to make key decisions based on complete and accurate data, available on demand. However, that asset quickly becomes a liability when the data is fraught with inaccuracy and access and availability are poor.
In volatile markets, having intra-day access to near real-time and accurate investment data, such as portfolios’ investible cash balances and forecasts, has become a necessity for asset managers. New data-focused technologies, such as Aprexo’s Data Mastering Solution, play a vital role in future operating platforms. Transactions and events, in addition to the positions they impact, are becoming the atoms of the new modus operandi. Creating an atomic chain gives data a valuable lineage, benefiting every part of the investment management operating model. Investment decision making, regulatory oversight, client and executive reporting are increasingly requiring DMS technologies.
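One way to picture transactions as the atoms behind a position is to derive the position from its transactions and carry their identifiers along as lineage. The sketch below is a simplified illustration with hypothetical names (`Transaction`, `Position`, `build_position`) and made-up transaction ids; it is not a description of Aprexo's data model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    security: str
    quantity: int  # positive for buys, negative for sells


@dataclass(frozen=True)
class Position:
    security: str
    quantity: int
    lineage: Tuple[str, ...]  # ids of the transactions that produced this position


def build_position(security: str, transactions: Iterable[Transaction]) -> Position:
    """Sum the transactions for one security and record which ones contributed."""
    relevant = [t for t in transactions if t.security == security]
    return Position(
        security=security,
        quantity=sum(t.quantity for t in relevant),
        lineage=tuple(t.txn_id for t in relevant),
    )


txns = [
    Transaction("T1", "IBM", 1_000_000),
    Transaction("T2", "IBM", 200_000),
    Transaction("T3", "MSFT", 50_000),
]
ibm = build_position("IBM", txns)
print(ibm.quantity, ibm.lineage)
```

With the lineage attached, anyone questioning the position can trace it back to the individual transactions that produced it.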
For many asset managers and owners, the ‘fly in the data ointment’ is too often the legacy systems the data is tied to. The consequences are a multiplicity of interfaces to maintain, data richness being restricted to what the lowest common denominator in the chain can handle, and manual oversight and intervention. Scalability is absent, and front-office confidence in outputs is low.
Technology is gradually liberating the asset management industry from this legacy. Big data infrastructures are emerging, helping to capture and analyse vast amounts of structured and unstructured data. Trials of Robotic Process Automation are bearing fruit in many places. Acceptance of cloud computing and the concomitant need for cloud-born applications is now high, and the new Software-as-a-Service paradigm is accepted by all leading firms.
For asset servicers, a DMS constitutes an integral part of a modern Data-as-a-Service offering. The economic impacts of COVID-19 are likely to spark another wave of outsourcing of middle-office functions by Tier 2 and Tier 3 asset managers. In anticipation of this, global securities services providers have already confirmed their renewed interest in a foundational DMS. Some have tried to do this via in-house development; few have yet succeeded.