Integrating New Database Technological Innovations
By Hesom Parhizkar, Vice President - Technology & Operations, eVestment
The Adoption of Cloud & NoSQL
The trends I have been seeing lately are the NoSQL movement, full cloud adoption, and the hybrid on-site/cloud approach.
“Usually, the NoSQL objects/documents are modeled closely to the application, so no translation or object hydration code needs to be written or tested.”
At eVestment, we have augmented our platform with a couple of NoSQL technologies to help with performance and scale. We are using Elasticsearch for high-speed search and filtering. A few backend services keep the data in sync, and the data is modeled closely to the objects used by the application, which reduces the overhead of translating and pulling data from various locations. The key features that use this database are our Product Search solution and the Omni Portal. Redis is used for session and cache management. We have configured the service to store data in memory only, so inserts and queries are very fast. Any data that can tolerate being volatile (i.e., lost on a restart) can be stored here.
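The session/cache pattern described above boils down to keyed values that live only in memory and expire after a time-to-live, which is what Redis provides with commands like SETEX and GET. A minimal stdlib sketch of that pattern (the class and key names are illustrative, not eVestment's actual code, which would use a Redis client):

```python
import time

class InMemoryTTLCache:
    """Minimal sketch of the SETEX/GET pattern Redis provides:
    values live only in memory and expire after a TTL."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def setex(self, key, ttl_seconds, value):
        # Store the value along with its absolute expiry time.
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = InMemoryTTLCache()
# A 30-minute web session, looked up on every request.
cache.setex("session:42", 1800, {"user": "analyst", "tenant": "acme"})
print(cache.get("session:42"))
```

Because everything stays in process memory, both the insert and the lookup are dictionary operations, which is why this style of store is so fast for session data.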
Full cloud adoption is a subject we discuss internally as the next phase of our platform, but potentially significant architecture changes would need to be made to fully utilize the features of the cloud. Moving our current database(s) to the cloud on comparable hardware would cost us more and provide little to no return in terms of client usability or satisfaction. Refactoring our database(s) into logical units and auditing how we store the data would be needed to fully migrate to a cloud service.
The hybrid on-site/cloud approach has been gaining some steam. This functionality is native in the newest release of Microsoft’s SQL Server, and other tools are being introduced in the marketplace. We have started a proof-of-concept with replicating a portion of our online database to the cloud so we can use on-demand resources for various ETL jobs and processes.
An Eye for Security
Two challenges that almost any organization running database services faces are security and scale/performance.
When speaking about security, I am referring to database/server level security: who has a valid login to the database server, and who has access to the specific data in the database. Whether it is storing consumers’ personal or financial data or running a multi-tenant structure, certain business rules need to be established to determine who can or cannot view or utilize the data. Discussing these business rules at design time tends to lower the risk and complexity of building a secure database.
Scaling databases becomes a challenge when the amount/velocity of data is not known at the time of the database design or when the demand on the database significantly increases. I have seen this happen a few times, usually when the development team doesn’t work closely with the database team. Developers tend to create a data model that mimics the object model in code, but this may not be the best design for scale/performance. Choosing the right level of data normalization, primary/foreign keys, and indexes helps here. It’s always easier to tackle these problems during the design and implementation phase of the project than against a live database.
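The index point above can be made concrete with a tiny example. Using SQLite (chosen only because it ships with Python; the table and index names are hypothetical), an index on the column the application filters by turns a full table scan into an index search, which the query planner reports directly:

```python
import sqlite3

# Hypothetical flat table of product records.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id     INTEGER PRIMARY KEY,  -- surrogate primary key
    vendor TEXT NOT NULL,
    name   TEXT NOT NULL
)""")
# Index the column the application's queries filter on.
conn.execute("CREATE INDEX idx_products_vendor ON products(vendor)")

# Ask the planner how it would execute a lookup by vendor.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM products WHERE vendor = ?",
    ("acme",),
).fetchall()
print(plan)  # the plan shows a SEARCH using idx_products_vendor
```

Deciding on these keys and indexes at design time, as the paragraph above argues, is far cheaper than adding them to a large live table later.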
NoSQL technologies have their place in the enterprise, but I would argue that they should be used in conjunction with a relational database. When this topic comes up on a project, here are a few questions that need to be asked:
- Is this greenfield development? Is there, or will there ever be, a dependency on an existing system?
- Will the data collected need to be used in other downstream processes, e.g. Business Intelligence reporting?
- What are the performance attributes of the data? High or low volume?
An advantage of using a NoSQL technology is the increased development velocity. Usually, the NoSQL objects/documents are modeled closely to the application, so no translation or object hydration code needs to be written or tested. The development team does not need to worry about creating normalized structures, primary keys, foreign keys, or indexes.
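A short sketch of why that speeds development: the application object serializes straight into the stored document, so there is no mapping layer between the code's model and the database's model. The class and store below are illustrative stand-ins for a real document database, not a specific product's API:

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical application object; field names are illustrative.
@dataclass
class Product:
    product_id: str
    name: str
    tags: list = field(default_factory=list)

# Stands in for a NoSQL collection: the object IS the document.
doc_store = {}

def save(product: Product) -> None:
    # One line: serialize the object as-is. No normalized tables,
    # join keys, or ORM mapping code to write and test.
    doc_store[product.product_id] = json.dumps(asdict(product))

def load(product_id: str) -> Product:
    # Deserializing is the "hydration" step, and it is trivial here.
    return Product(**json.loads(doc_store[product_id]))

save(Product("p-1", "Global Equity Fund", ["equity", "global"]))
print(load("p-1"))
```

The trade-off, per the questions above, is that a flat document like this is harder to feed into downstream relational processes such as BI reporting.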
Integration and Availability of Data
The data availability metric should be measured as a function of cost. As the requirement for data availability edges closer to 100 percent, the cost and complexity increase significantly.
While designing your disaster recovery strategy, this topic needs to be the first one addressed. If zero data loss is a requirement, a highly complex (and expensive) DR plan needs to be put in place with frequent multi-region replication and data checks.
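One way to make that cost curve concrete is to translate availability percentages into the downtime they actually permit per year; each added "nine" cuts the allowance by a factor of ten while the engineering cost climbs. This arithmetic is standard and not specific to the article:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

# 99%   -> ~5,256 min/year (over 3.5 days)
# 99.9% -> ~526 min/year   (under 9 hours)
# 99.99%-> ~53 min/year
for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}% -> {allowed_downtime_minutes(nines):.1f} min/year")
```

Framing the DR discussion in these terms helps the business decide which nine it is actually willing to pay for.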
Another approach would be to architect several redundant databases with automatic failover. In this approach, each transaction is committed to several databases, where one is the primary and the others are replicas (non-readable). If at any point the primary fails, one of the replicas is promoted with very little data loss.
An “Active-Active” configuration is also becoming popular. In this configuration, instead of having a single primary database with one or more replicas, all of the databases are essentially treated as primaries. A transaction is committed to every database in the cluster. If a database is lost due to a failure, the other databases handle the load. The key difference between this configuration and a traditional primary/replica configuration is that all of the databases are accessible for queries/reads. This allows an organization to take advantage of all of its hardware investment instead of paying for an expensive insurance policy.
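The key difference can be sketched in a few lines. This is a naive in-memory model of the active-active idea only (a real cluster would use consensus or two-phase commit to keep nodes consistent; all names are hypothetical):

```python
class ActiveActiveCluster:
    """Sketch: every node is writable AND readable; a transaction is
    committed to all nodes, so losing one node loses no read capacity."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}

    def commit(self, key, value):
        # Naive fan-out; real systems coordinate this commit.
        for store in self.nodes.values():
            store[key] = value

    def read(self, node_name, key):
        # Unlike primary/replica, ANY surviving node can serve reads.
        return self.nodes[node_name].get(key)

    def fail(self, node_name):
        # Remaining nodes keep serving both reads and writes.
        del self.nodes[node_name]

cluster = ActiveActiveCluster(["db1", "db2", "db3"])
cluster.commit("txn:1001", "committed")
cluster.fail("db1")
print(cluster.read("db2", "txn:1001"))  # surviving node still answers
```

In a primary/replica setup, by contrast, only one node would have been answering queries before the failure, which is the "expensive insurance policy" the paragraph above describes.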
For the Budding Technologists
Some advice I would give would be the following:
- Try to learn about multiple technologies and tools. Don’t limit yourself to one particular technology at first. Try them all out and then focus your efforts on a few. Doing this will help you understand the similarities and differences between the different database technologies.
- Work on an open-sourced or side project to help you understand the concepts. This has always helped me when learning a new technology. I try to set a goal for myself, i.e. build a blogging engine from scratch, and while trying to achieve that goal, I am forced to learn about several different technologies and understand how they work together.
- Don’t just learn how to perform the tasks; take the time to understand what they do. When you are asked what an index on a database table is, you should be able to explain how to design and apply one, and discuss what it actually does with the data and its advantages (or disadvantages).
- Become a “full stack” developer. This kind of technologist understands and can work in any part of the technology stack and also knows the basics of networking, hardware, operating systems, etc.
One major announcement I think will transform the space is Microsoft’s announcement that the next version of SQL Server will be able to run on Linux. This will continue to blur the lines of which technology stack a team or company should use. Currently, if your team decides to use SQL Server for the database, you are locked into the Microsoft ecosystem, which comes with licensing costs. Soon, you will be able to choose the best tools, on the best environments, for your applications.