Mututal Funds Research and Portfolio Analysis Tool

Problem

Our client was a group of investment advisors and portfolio managers who wanted a better alternative to the tools available in the market for Mutual fund research and Portfolio analysis. Their main issue was that currently there is no unbiased third-party service that offers top of-the-line research tools for investors.

Investment platforms like Paytm Money, Groww and Kuvera do offer rudimentary research tools, but those are biased and geared to entice users to invest using their platform, rather than providing an objective research tool.

The clients also tried to use a few while-label utilities in the market that offer rudimentary research and analysis functionality, but they all suffer from poor user experience, very high costs, lack of support and next to no customizability. The prime functional requirements were:

Research into the performance of the mutual funds in the market
Portfolio analysis for their customers

The prime non-functional requirements for the solution were:

Low cost - the entire system should be very cost-effective, with minimal hosting and deployment costs.
Accuracy of data - the price information and performance statistics of the funds should be exactly as reported by the AMCs.

Research

India has a growing mutual fund industry as more and more retail investors switch to mutual funds in favor of more traditional investments like real estate and gold. The mutual fund ecosystem is extremely complex.

It is composed of a mix of government bodies, private firms, and quasi-private companies, including the mutual fund companies (aka Asset Management Companies or AMCs) which manage the investments, Registrar Transfer Agents (RTAs) which maintain records for the AMCs, and regulators including the Securities Exchange Board of India (SEBI). We had to do thorough research to understand this market well and to figure out how to access all the relevant data from all these different government and private entities.

Solution

We opted to build a web application where members of the client’s organization can log in via username and password. Administrators in the client’s organization would be able to create and manage users using the back-office admin panel that would also be accessible via a password-based login. The front-end would be a typical React.js application which would make API calls to the back-end to fetch the relevant data, and it would be hosted on AWS behind a CDN to ensure fast load times. Some of the unique challenges encountered were:

Access to complete, accurate data on mutual fund performance.
Processing, storing and serving updated price data at low cost.
Determining the holdings and transactions of a customer for portfolio analysis

Architecture

To ensure minimal operational costs, we first evaluated the costs and benefits of a traditional three-tier application architecture, and then applied custom, modern cloud engineering practices to evolve our solution into a near zero-cost system. Traditional Architecture

A traditional three-tier web application’s architecture would look something like this:

Presentation layer - a single-page React application.
API layer - serves API requests for the front-end - data-fetching, calculations and other business logic.
Database - Stores all the mutual fund data including fund metadata and historic daily unit price data

Assuming the smallest instance sizes of all resources, the minimum cost of hosting such a system would be:

Database: A single db.t2.small RDS PostgreSQL instance with 1 vCPU and 2GB memory with 10GB storage costs about $80/month.
EC2 instances: Two t2.small instances with 1 vCPU and 2GB memory costs about $40/month.
Load Balancer: A single ALB instance costs about $20/month.

The total minimum cost per month comes to around $140.

Cost-Effective Cloud-Native Architecture

The costliest component of the entire system is the database, at $80/month. We discovered that the nature of the mutual fund data allowed us to make several assumptions that would preclude the need for a traditional relational database:

The price data for the funds is the most dynamic data in the application. The other data like fund metadata is mostly static, and is a good fit for object storage like S3.
There is a limited set of funds (~35 thousand including closed funds), and the number of funds is very unlikely to balloon quickly in the next few years.
Since the price of a mutual fund unit (NAV) is declared only once a day, the total number of data points for each fund is bounded. Even for 20 years worth of data, each fund will have 365 x 20 = 7300 data points.
The access pattern for the price data of the funds is very predictable - the price data will be requested on the basis of funds, i.e., we will never see a query for the unit price of all funds on a particular day, but rather the unit price for a specific fund for a range of dates.

Based on these assumptions, it is safe to say that if we can store the price data for a particular fund in a static file, it should be quite easy to access the full price history of a fund without incurring the compute costs of a full-fledged database.

With this in place, we needed two more things:

A compact and efficient file format to store price history data in.
An efficient and low-cost way to update the price history daily.

For storing the data points, we opted to create a custom binary file format, dubbed Simple Time Series. This file format specializes in compact storage of dense time-series data. It stores just the starting date in the header, and then stores each day’s unit price sequentially as integers. Crucially, the date associated with each value is not stored in the file, but rather inferred from the starting date in the header and the index of the integer in the body. This makes the file very, very compact, and also ensures O(1) random access.

Further, using gzip compression makes the average size of these files around 10KB, a very comfortable size for clients to download directly when fetching the price history for a fund.

As for the next challenge - daily updates to the price history, it was found that we can take advantage of AWS’s pricing structure to incur zero data transfer costs. A simple batch process that is triggered daily was able to scrape the last day’s price information from AMFI’s website and update the binary files for all the ~35,000 schemes in about 5 minutes.

Because S3 doesn’t allow partial writes, simply appending the data to the files was not possible. Instead, the daily batch process downloads all the files to local storage, updates them in place, and then uploads them back to S3, overwriting the existing files with the updated ones.

All this data transfer costs nothing since all of it is within AWS. Even if it did, it wouldn’t be much since all the files together come to around 500MB, so we’d be transferring only about 30GB a month.

With the new data architecture, we could even move all of the business logic to the front end, meaning that we could get rid of the API servers that add to the cost of the system.

In the end, the only components of the system remaining were the React front-end and the STS files, both hosted and served as static assets from S3 and CloudFront.

The total cost for this comes to around $0.25/month, down from $140/month. That’s more than a 400x cost saving.

Portfolio analysis

The only way for mutual fund investors in India to get a consolidated statement listing all their mutual fund transactions and holdings is to request a statement from KFintech or CAMS, the RTAs that maintain transaction and holding records for all the AMCs in India, by mandate of SEBI. The report is provided as an auto-generated PDF document that contains tables describing the transactions and holdings.

Therefore, to get a complete account of the investor’s transactions and holdings, it was required to parse this PDF document, which would be uploaded by the user. A parser for the PDF report was written. This was rather complicated, as we needed access to many real reports from real investors to figure out the issues and bugs in the parser.

These reports were provided by our clients, but since they involved sensitive data, they were first put through an anonymization process where all references to the investor’s identity were removed, and only the transactions were kept intact (with randomized transaction reference numbers). This allowed us to create a comprehensive test suite with hundreds of tests without compromising security and confidentiality.

Once the PDF reports could be parsed, it was a simple matter of simulating the transactions and using the historic price data of the funds to compute the historic market value of the portfolio, and the associated aggregates like CAGR, XIRR, etc.

Outcomes

The client got a custom-made system for mutual funds research and portfolio analysis, allowing them to provide data-driven information to investors in a highly polished and presentable form, backed by the most accurate figures sourced directly from the mutual fund industry’s representative body (AMFI).

Additionally, the system runs at near zero cost, averaging at around $3 per month, compared to more than $100 for a traditional setup not optimized to the specific use case.