In the previous blog post, we wrote about what MongoDB is and why it matters to enterprises. In this post, we explore the considerations for selecting such a database.
Does your organization, like most others, rely on long-established relational databases that run legacy apps and meet the current needs of the business? If so, these systems are probably maintained by a substantial number of personnel and supported by an ecosystem of tools.
However, more and more companies are looking at alternatives to legacy infrastructure based on relational databases. Besides wanting to move away from expensive infrastructure, companies have technical reasons to do so: they want better performance and the ability to scale up operations. Development speed and agility are also among the motivating factors.
These factors affect both transactional and analytical applications. Companies are building operational and online apps using new technologies for data management known as NoSQL, e.g., MongoDB.
NoSQL systems can provide better performance and greater scalability than relational databases for such applications. When assessing NoSQL products, you should look at five dimensions to select the right one for your apps and your business.
1. Community Strength and Commercial Support
It is challenging and costly to migrate an existing application to another database, so you have to choose a database carefully before you invest in it. Generally, companies put their money on a few core technologies so that they can develop expertise that carries over to a number of projects. Although NoSQL systems are relatively new, some of them are likely to be around for a long time.
A strong community built around a technology can be quite advantageous, especially for databases. If the users form a strong community for a technology, it is simpler to employ developers who are well-versed in that technology. Code samples, documentation and information are also found more easily and companies find it easier to retain technical talent. More technology vendors are encouraged to take part in the ecosystem and to build integrations.
When assessing a database, users should look closely at the health of the project or the company behind it. It is important that the product keeps evolving and gaining new features, rather than merely continuing to exist. Another relevant consideration is whether a robust support organization exists to serve clients globally.
2. APIs
No standard exists for interfacing with NoSQL systems, so each system presents application development teams with different designs and capabilities. The cost and time required for development and maintenance depend to a large extent on the maturity of the system's API.
Idiomatic Drivers
Every programming language offers different paradigms for working with services and data. Idiomatic drivers are created by experts who understand how programmers like to work within a language. This approach can take advantage of language-specific features that make accessing and processing data more efficient.
Idiomatic drivers are easier for programmers to learn and use, so teams need less time to start working with the system. For instance, other interfaces may require you to retrieve and parse a whole document and navigate to a specific value in order to get or set a field, whereas idiomatic drivers offer direct interfaces to get and set documents or even individual fields.
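As a rough illustration of that difference, here is a minimal sketch in plain Python. It does not use any real driver: `raw_store`, `set_city_low_level` and the `Collection` class are hypothetical names invented for this example, and the "server" is just an in-memory dictionary of JSON strings. A real idiomatic driver would typically push the field-level work to the server rather than hide the parsing on the client.

```python
import json

# Hypothetical in-memory "server": maps document ids to serialized JSON text.
raw_store = {"user:1": json.dumps({"name": "Ada", "address": {"city": "Pune"}})}


def set_city_low_level(doc_id, city):
    """Low-level path: fetch the whole document, parse it, edit it, write it back."""
    doc = json.loads(raw_store[doc_id])
    doc["address"]["city"] = city
    raw_store[doc_id] = json.dumps(doc)


class Collection:
    """Hypothetical idiomatic wrapper exposing get/set on dotted field paths."""

    def __init__(self, store):
        self._store = store

    def get_field(self, doc_id, path):
        value = json.loads(self._store[doc_id])
        for key in path.split("."):
            value = value[key]
        return value

    def set_field(self, doc_id, path, new_value):
        doc = json.loads(self._store[doc_id])
        *parents, last = path.split(".")
        target = doc
        for key in parents:
            target = target[key]
        target[last] = new_value
        self._store[doc_id] = json.dumps(doc)


users = Collection(raw_store)
set_city_low_level("user:1", "Mumbai")              # caller handles parsing itself
users.set_field("user:1", "address.city", "Delhi")  # caller names only the field
print(users.get_field("user:1", "address.city"))    # Delhi
```

The calling code in the last three lines is the point of the sketch: with the wrapper, the application names a field and a value and never touches serialization.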
RESTful or Thrift APIs
A few systems offer RESTful interfaces. This approach is familiar and simple, but it carries the latency inherent in HTTP. Developers also have to build an interface layer themselves, which may not be consistent with the other programming interfaces that they use. Similarly, some systems offer a Thrift interface, a low-level paradigm on top of which developers have to build more abstract interfaces within their apps.
3. Model based on Data
The data model is the main way in which relational databases differ from NoSQL databases. There are different types of NoSQL databases and the main categories in which these can be classified are as follows:
Document Model
In document databases, every record and the data associated with it are typically stored together in one document, instead of being spread out among various columns and tables. This simplifies access to data and reduces or eliminates the need for joins and complex transactions.
In document databases, each document may have different fields and the concept of a schema is dynamic. This flexibility makes it easier to model polymorphic and unstructured data. It also makes it easier to add new fields or make other such modifications to an application under development. Document databases also generally offer the query robustness that developers expect from relational databases: you can query data on the basis of any field within a document.
Uses: Because of the natural mapping of documents to objects in programming languages, the ability to query on any field and the flexibility of the data model, document databases can be used for various applications.
Examples: CouchDB and MongoDB
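To make the document model concrete, the following is a minimal sketch in plain Python, not the query API of CouchDB or MongoDB. The `products` data and the `find` helper are assumptions made up for illustration. It shows documents in one collection carrying different fields, and a query on an arbitrary field.

```python
# Documents in one collection; each may carry different fields (dynamic schema).
products = [
    {"_id": 1, "type": "book", "title": "Dune", "author": "Frank Herbert", "pages": 412},
    {"_id": 2, "type": "shirt", "title": "Plain tee", "size": "M", "colors": ["red", "blue"]},
    {"_id": 3, "type": "book", "title": "Emma", "author": "Jane Austen"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


books = find(products, type="book")   # query on a field every document has
medium = find(products, size="M")     # query on a field only some documents have
print([d["title"] for d in books])    # ['Dune', 'Emma']
print([d["title"] for d in medium])   # ['Plain tee']
```

Each document is already shaped like an object in the programming language, which is the natural mapping mentioned above.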
Graph Model
Graph databases represent data using graph structures made up of nodes, edges and properties. Data is modeled as a network of relationships among specific elements. The graph model takes time to comprehend and can seem counter-intuitive at first, but it can be used for several applications. Its main appeal is that it simplifies the modeling of relationships among the various entities in an application.
Uses: Graph databases prove most useful where relationships are central to the application, as in social networks.
Examples: HyperGraphDB and Neo4j
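A small sketch may help show why relationship-centric questions suit the graph model. The code below is plain Python rather than the query language of any graph database, and the `edges` data and `suggestions` function are hypothetical. It stores relationships as labeled edges and walks two hops to suggest accounts to follow.

```python
from collections import defaultdict

# Edges as (source node, relationship label, target node) triples.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("bob", "FOLLOWS", "dave"),
    ("alice", "FOLLOWS", "dave"),
]

# Adjacency view of the FOLLOWS relationship: node -> set of followed nodes.
adjacency = defaultdict(set)
for source, label, target in edges:
    if label == "FOLLOWS":
        adjacency[source].add(target)


def suggestions(user):
    """Accounts followed by the user's follows, minus existing follows and the user."""
    direct = set(adjacency[user])
    second_hop = set()
    for friend in direct:
        second_hop |= adjacency[friend]
    return second_hop - direct - {user}


print(sorted(suggestions("alice")))  # ['carol']
```

In a relational schema the same question would typically need a self-join for every hop; here it is a traversal over stored relationships.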
Wide-Column and Key Value Models
From the perspective of data modeling, the most basic kind of NoSQL database is the key-value store. Every item in the database is stored as an attribute name, or key, together with its value. However, the value is opaque to the database, so you can query data only by key. Since the database does not impose a set schema on key-value pairs, the model can be used to represent unstructured and polymorphic data.
Wide column stores, also called column family stores, hold data in a sparse, distributed, multidimensional sorted map. Every record can have a different number of columns, and columns can be nested inside other columns known as super columns. Columns can be grouped into column families so that they are accessed together, or spread over many column families. Data is retrieved by primary key per column family.
Uses: Wide column stores and key value stores are useful only for a narrow set of applications that query data by a single key value. The appeal of these systems lies in their performance and scalability, which can be highly optimized because the data access patterns are so simple.
Examples: Cassandra and HBase (wide column), Redis and Riak (key value).
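The two models can be pictured with ordinary Python dictionaries. This is only a sketch of the data shapes under simplifying assumptions (no distribution, no sorting, no super columns); the `kv` and `table` structures and the `get_family` helper are invented for illustration and do not mirror the API of Cassandra, HBase, Redis or Riak.

```python
# Key-value store: the database sees only opaque bytes behind each key.
kv = {}
kv["session:42"] = b'{"user": "ada", "cart": [1, 2]}'
value = kv["session:42"]  # lookup is possible only by key

# Wide-column sketch: row key -> column family -> {column: value}.
# Rows are sparse, so each row may hold a different set of columns.
table = {
    "user:1": {"profile": {"name": "Ada", "city": "Pune"}, "stats": {"logins": 7}},
    "user:2": {"profile": {"name": "Bo"}},
}


def get_family(row_key, family):
    """Fetch all columns of one column family for a row, by primary key."""
    return table.get(row_key, {}).get(family, {})


print(get_family("user:1", "profile"))  # {'name': 'Ada', 'city': 'Pune'}
print(get_family("user:2", "stats"))    # {}
```

Note that neither structure offers a way to ask "which users live in Pune?" without scanning everything, which is the limitation described above.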
4. Model based on Queries
The query requirements for every application are different. A basic query model where applications use a primary key to access records may be acceptable in some cases. For the majority of applications, however, the ability to query on the basis of many values in every record is important.
For example, an app that stores customer data may have to look up customers not only by company, but also by state, postal code or deal size.
You may also have to update one or more individual fields when updating records in an application. To meet these requirements, the database has to be able to query data on the basis of secondary indexes. A document database may be the best option in these cases.
Document Databases
A document database lets you query any field in a document. Products such as MongoDB offer a rich set of indexing options to optimize a variety of queries, including geospatial indexes, compound indexes, text indexes, sparse indexes, unique indexes and time to live (TTL) indexes.
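The idea behind a secondary index can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how MongoDB or any other product implements its indexes; the `customers` data and the `build_index` helper are hypothetical.

```python
from collections import defaultdict

# Records keyed by primary key.
customers = {
    1: {"company": "Acme", "state": "NY", "deal_size": 50000},
    2: {"company": "Globex", "state": "CA", "deal_size": 12000},
    3: {"company": "Initech", "state": "NY", "deal_size": 8000},
}


def build_index(records, field):
    """Map each value of `field` to the set of primary keys that hold it."""
    index = defaultdict(set)
    for key, record in records.items():
        index[record[field]].add(key)
    return index


state_index = build_index(customers, "state")

# Look up customers by state without scanning every record.
ny_companies = sorted(customers[k]["company"] for k in state_index["NY"])
print(ny_companies)  # ['Acme', 'Initech']
```

When the database maintains such structures itself, the application does not have to keep them in sync on every write.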
MongoDB offers sophisticated analysis through a native MapReduce implementation, besides the aggregation framework for real-time analytics. It also offers find and modify capabilities, so you can update values in documents through a single statement instead of making several round trips.
Graph Databases
Graph databases provide rich query models that let you look at simple and complex relationships in order to make direct or indirect inferences about the data in the system. Other types of analysis may not be optimal, but relation-type analysis is generally quite efficient in these systems.
Wide-Column and Key Value Databases
These systems let you retrieve and update data only on the basis of a primary key. For queries on other values, they encourage users to build and maintain their own indexes. A few products offer limited support for secondary indexes, but with a number of caveats. Updating data in these systems takes two round trips: one to find the record and another to update it. Whether you change the entire record or only a part of it, you have to rewrite the record completely.
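The difference between the two update styles can be sketched with a toy store that counts round trips. Everything here is hypothetical and in-memory: the `Store` class and its `increment` method stand in for a server and are not the API of any product. The `increment` method is written in the spirit of the single-statement, find-and-modify style of update described earlier for document databases.

```python
class Store:
    """Hypothetical in-memory store that counts client round trips."""

    def __init__(self):
        self._records = {}
        self.round_trips = 0

    def put(self, key, record):
        self.round_trips += 1
        self._records[key] = dict(record)  # the whole record is rewritten

    def get(self, key):
        self.round_trips += 1
        return dict(self._records[key])

    def increment(self, key, field, amount):
        """Server-side update of one field, returning the updated record."""
        self.round_trips += 1
        self._records[key][field] += amount
        return dict(self._records[key])


store = Store()
store.put("sku:1", {"name": "widget", "stock": 10})

# Key-value style: read the record, change it locally, rewrite all of it.
store.round_trips = 0
record = store.get("sku:1")
record["stock"] -= 1
store.put("sku:1", record)
kv_trips = store.round_trips

# Single-statement style: one call changes a single field on the server.
store.round_trips = 0
updated = store.increment("sku:1", "stock", -1)
doc_trips = store.round_trips

print(kv_trips, doc_trips, updated["stock"])  # 2 1 8
```

Beyond the extra trip, the read-then-rewrite pattern leaves a window in which another client can change the record between the `get` and the `put`.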
5. Model based on Consistency
For the purposes of scalability and availability, NoSQL systems keep several copies of data. These architectures differ in the guarantees they provide about the consistency of data across those copies. NoSQL systems are either consistent or eventually consistent.
When a system is consistent, whatever the application writes becomes visible immediately to subsequent queries. In eventually consistent systems, however, writes do not become visible immediately.
For example, in a consistent system, every query about inventory sees the correct inventory level, whereas an eventually consistent system may report an incorrect inventory level for a while before it becomes accurate.
Because of that, application code tends to differ somewhat between consistent and eventually consistent systems.
The data consistency requirements for every application are different. Some applications require data to be consistent at all times. This is a more familiar and natural approach for developers, as they have worked with a consistency model for a long time with relational databases. For other applications, eventual consistency is an acceptable trade-off for the flexibility it allows.
Consistent Systems
Graph databases and document databases can be consistent or eventually consistent.
MongoDB offers tunable consistency. By default, data is consistent, because all reads and writes go to the primary copy of the data. As an option, you can issue read queries against secondary copies, where data is eventually consistent; the choice is made at the query level.
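The effect of choosing where to read can be shown with a toy model. The sketch below is plain Python and is not MongoDB's replication or its driver API; the `ReplicaSet` class, its `from_secondary` flag and the explicit `replicate` step are assumptions made for illustration. It reuses the inventory example from above.

```python
class ReplicaSet:
    """Toy model: one primary copy and one secondary that lags until replicate()."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = []

    def write(self, key, value):
        self.primary[key] = value           # writes always go to the primary
        self._pending.append((key, value))  # queued for the secondary

    def read(self, key, from_secondary=False):
        # The consistency choice is made per query.
        source = self.secondary if from_secondary else self.primary
        return source.get(key)

    def replicate(self):
        """Apply queued writes to the secondary, in order."""
        for key, value in self._pending:
            self.secondary[key] = value
        self._pending.clear()


rs = ReplicaSet()
rs.write("stock:widget", 5)
rs.replicate()
rs.write("stock:widget", 4)  # one unit sold, not yet replicated

consistent = rs.read("stock:widget")                  # 4
stale = rs.read("stock:widget", from_secondary=True)  # 5, out of date for now
rs.replicate()
caught_up = rs.read("stock:widget", from_secondary=True)  # 4
print(consistent, stale, caught_up)  # 4 5 4
```

An application that reads from the secondary has to tolerate the stale value in the middle, which is why code written for the two models tends to differ.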
Eventually Consistent Systems
In eventually consistent systems, there is a period of time during which not all copies of the data are synchronized. Eventually consistent systems suit read-only applications and data stores such as historical archives. They may also suit use cases where data such as logs is captured and read only at a later stage. Wide-column and key value stores are generally of the eventually consistent type.
Writes to the same record can conflict with one another, so eventually consistent systems have to accommodate conflicting updates to individual records.
Some systems, like Riak, use vector clocks to determine the order of events and ensure that the most recent operation wins if there is a conflict. Others, like CouchDB, retain all conflicting values and let users decide which one to keep. Cassandra, on the other hand, assumes that the latest value is correct.
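The three strategies can be illustrated with a short sketch. This is generic textbook logic in plain Python, not the actual implementation used by Riak, CouchDB or Cassandra; the version records, node names and timestamps below are made up for the example.

```python
def compare(clock_a, clock_b):
    """Compare two vector clocks (dicts mapping node name to counter).

    Returns 'before', 'after', 'equal' or 'concurrent'.
    """
    nodes = set(clock_a) | set(clock_b)
    a_le_b = all(clock_a.get(n, 0) <= clock_b.get(n, 0) for n in nodes)
    b_le_a = all(clock_b.get(n, 0) <= clock_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither write saw the other: a true conflict


def last_write_wins(versions):
    """Resolve a conflict by keeping the version with the latest timestamp."""
    return max(versions, key=lambda v: v["timestamp"])


v1 = {"value": "red", "clock": {"n1": 2, "n2": 0}, "timestamp": 100}
v2 = {"value": "blue", "clock": {"n1": 2, "n2": 1}, "timestamp": 105}
v3 = {"value": "green", "clock": {"n1": 3, "n2": 0}, "timestamp": 103}

order_12 = compare(v1["clock"], v2["clock"])  # v2 builds on v1
order_23 = compare(v2["clock"], v3["clock"])  # v2 and v3 diverged from v1
winner = last_write_wins([v2, v3])["value"]
siblings = [v2["value"], v3["value"]]         # or keep both for the user to resolve

print(order_12, order_23, winner, siblings)  # before concurrent blue ['blue', 'green']
```

Vector clocks can tell an ordered pair of writes from a genuine conflict, whereas a timestamp rule always picks a winner and silently discards the other write.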
As a result, although writes tend to perform well in eventually consistent systems, updates involve trade-offs that can complicate the application.
Although choosing a NoSQL database may involve trade-offs, it can still deliver better performance and scalability than a relational database.
You should keep the five points provided above in mind while choosing the NoSQL database most suitable for your requirements.
Are you already using a NoSQL database? If so, do share your experiences regarding its selection and usage in the comments section below.