Database variants explained: SQL or NoSQL? Is that really the question?

A first glance beyond the religion

Looking at the database landscape, it is hard to miss the commotion about “SQL vs NoSQL” in recent years. But what is it really about?

SQL, which stands for “Structured Query Language”, has been around since the seventies and is commonly used in relational databases. It consists of a data definition language to define the structure and a data manipulation language to alter the data within the structure. An RDBMS therefore has a defined structure, and since the 1980s it has been a common choice for storing financial records, manufacturing and logistical information, personnel data, and other application data.
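To make the split between defining structure and manipulating data concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `employee` table and its contents are made up for illustration; they are not taken from any real system.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Data definition language (DDL): declare the structure up front.
conn.execute(
    "CREATE TABLE employee ("
    "  id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  department TEXT NOT NULL)"
)

# Data manipulation language (DML): change the data inside that structure.
conn.execute("INSERT INTO employee (name, department) VALUES (?, ?)", ("Alice", "Finance"))
conn.execute("INSERT INTO employee (name, department) VALUES (?, ?)", ("Pat", "Logistics"))
conn.execute("UPDATE employee SET department = ? WHERE name = ?", ("Finance", "Pat"))

# Read the data back.
rows = conn.execute("SELECT name, department FROM employee ORDER BY name").fetchall()

# The declared structure is enforced: a row without a name is rejected.
try:
    conn.execute("INSERT INTO employee (name, department) VALUES (?, ?)", (None, "Finance"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The point of the last step is that the structure is not just documentation: the database itself refuses data that does not fit it.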


NoSQL, which stands for “Not only SQL”, has departed from the standard relational model since its first introduction in the nineties. The primary focus of these databases was performance, or a given niche, with less focus on consistency and transactions. These databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases. The particular suitability of a given NoSQL database depends on the problem it must solve.

So it depends on your need…

Do you want NoSQL, NoSQL, NoSQL or NoSQL?

NoSQL comes in various flavors. The most common types of NoSQL databases (as portrayed by Wikipedia):

There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Because of the variety of approaches and overlaps it is difficult to get and maintain an overview of non-relational databases. Nevertheless, a basic classification is based on data model. A few examples in each category are:

  • Column: Accumulo, Cassandra, Druid, HBase, Vertica
  • Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
  • Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
  • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
  • Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

Column

A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements:

  • Unique name: Used to reference the column
  • Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
  • Timestamp: The system timestamp used to determine the valid content.

Example

{
    street: {name: "street", value: "1234 x street", timestamp: 123456789},
    city: {name: "city", value: "san francisco", timestamp: 123456789},
    zip: {name: "zip", value: "94107", timestamp: 123456789},
}
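The timestamp is what lets a distributed store decide which version of a column is the valid one. A minimal sketch of that idea follows, assuming a simple "last write wins" rule; the `Column` type, the `resolve` helper and the second street value are hypothetical names and data introduced only for illustration.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """The three elements described above: name, value and timestamp."""
    name: str
    value: str
    timestamp: int


def resolve(a: Column, b: Column) -> Column:
    """Return the version with the newer timestamp (assumed last-write-wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.timestamp >= b.timestamp else b


# Two replicas hold different versions of the same column.
old = Column("street", "1234 x street", 123456789)
new = Column("street", "5678 y street", 123456999)  # hypothetical later write

current = resolve(old, new)
```

Whichever order the two versions arrive in, the one with the higher timestamp is kept. Real implementations may apply additional tie-breaking rules.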

Document

A document-oriented database is designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. The central concept of a document-oriented database is that Documents, in largely the usual English sense, contain vast amounts of data which can usefully be made available. Document-oriented database implementations differ widely in detail and functionality. Most accept documents in a variety of forms, and encapsulate them in a standardized internal format, while extracting at least some specific data items that are then associated with the document.

Example

<Article>
   <Author>
       <FirstName>Bob</FirstName>
       <Surname>Smith</Surname>
   </Author>
   <Abstract>This paper concerns....</Abstract>
   <Section n="1"><Title>Introduction</Title>
       <Para>...</Para>
   </Section>
 </Article>
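The idea of storing a document as-is while extracting a few data items that stay associated with it can be sketched in a few lines of Python. This uses a shortened, well-formed variant of the article above and the standard `xml.etree.ElementTree` parser; the `store` dict, the `extract_metadata` helper and the `article-1` identifier are assumptions made for the example, not how any particular product works.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed variant of the example document.
ARTICLE = """
<Article>
   <Author>
       <FirstName>Bob</FirstName>
       <Surname>Smith</Surname>
   </Author>
   <Abstract>This paper concerns....</Abstract>
   <Section n="1"><Title>Introduction</Title></Section>
</Article>
"""


def extract_metadata(xml_text: str) -> dict:
    """Pull out a few searchable items; the rest of the document stays opaque."""
    root = ET.fromstring(xml_text.strip())
    first = root.findtext("Author/FirstName")
    last = root.findtext("Author/Surname")
    return {
        "author": f"{first} {last}",
        "sections": [s.findtext("Title") for s in root.findall("Section")],
    }


# Toy document store: raw document plus the extracted items, keyed by an id.
store = {}
store["article-1"] = {"raw": ARTICLE, "meta": extract_metadata(ARTICLE)}

# Query on the extracted items instead of re-parsing every document.
by_author = [doc_id for doc_id, doc in store.items() if doc["meta"]["author"] == "Bob Smith"]
```

Note that no schema was declared beforehand: a second document with completely different elements could be stored next to this one.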

Key-Value

A key-value store (also called an associative array, map, symbol table, or dictionary) is an abstract data type composed of a collection of key/value pairs, such that each possible key appears at most once in the collection.

Example

{
    "Pride and Prejudice": "Alice",
    "The Brothers Karamazov": "Pat",
    "Wuthering Heights": "Alice"
}
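Python's built-in `dict` is exactly such an associative array, so the basic operations can be tried directly. The sketch below reuses the example data; the extra title in the lookup is made up to show what happens for a missing key.

```python
# The example data as a Python dict (book title -> name).
loans = {
    "Pride and Prejudice": "Alice",
    "The Brothers Karamazov": "Pat",
    "Wuthering Heights": "Alice",
}

# Put: writing to an existing key overwrites it, it never adds a second entry.
loans["Wuthering Heights"] = "Pat"

# Get: lookup by key; a missing key yields None with .get().
borrower = loans.get("Pride and Prejudice")
missing = loans.get("Moby-Dick")  # hypothetical key that was never stored

# Delete: remove a pair by its key.
del loans["The Brothers Karamazov"]
```

Everything goes through the key. Asking "which books does Alice have?" means scanning all values, which is the kind of operation where other models tend to do better.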

Graph

A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.

Example

[Figure: property graph example]
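Index-free adjacency can be illustrated with plain Python objects: each node keeps direct references to its neighbours, so a traversal just follows pointers. The `Node` class, the `knows` label and the three people are hypothetical and only serve the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict
    # Outgoing edges as (label, target node) pairs: direct references, no index.
    edges: list = field(default_factory=list)

    def connect(self, label: str, target: "Node") -> None:
        self.edges.append((label, target))

    def neighbours(self, label: str) -> list:
        return [target for edge_label, target in self.edges if edge_label == label]


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
carol = Node({"name": "Carol"})
alice.connect("knows", bob)
bob.connect("knows", carol)

# Two-hop traversal: follow the references, no lookup table involved.
friends_of_friends = [
    fof.properties["name"]
    for friend in alice.neighbours("knows")
    for fof in friend.neighbours("knows")
]
```

In a relational design the same question would typically be answered by joining a relationship table with itself, once per hop.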

Multi-model

Most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. In contrast, a multi-model database is designed to support multiple data models against a single, integrated backend. Document, graph, relational, and key-value models are examples of data models that may be supported by a multi-model database.

And what flavor do I want?

Each type and implementation has its own advantages. The following chart from Shankar Sahai provides a good overview:

[Figure: NoSQL comparison table]

Any other considerations I should take into account?

Be aware that most implementations were designed around performance rather than consistency and integrity. Transactions and referential integrity are not supported by most implementations. High availability designs (including at a geographic level) are possible with some implementations, though this often implies a performance impact (as one would expect).

Also check out the research done by Altoros:

5. Conclusion
As you can see, there is no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks.
For example, a database can demonstrate excellent performance, but once the amount of records exceeds a certain limit, the speed falls dramatically. It means that this particular solution can be good for moderate data loads and extremely fast computations, but it would not be suitable for jobs that require a lot of reads and writes. In addition, database performance also depends on the capacity of your hardware.

They did a very decent job in performance testing various implementations!

[Figure: screenshot from “A Vendor-independent Comparison of NoSQL Databases: Cassandra, HBase, MongoDB, Riak”]

Global Standards and Publications Edition 2012 / 2013

Our environment is ever changing and we hear a lot of frameworks and standards being tossed around. Van Haren Publishing has published a free ebook with a “one pager” per standard / framework. So if you want to get a quick overview of a given standard, this is the place to look!

Which standards are featured?

  • 1. Agile
  • 2. Amsterdam Information Management Model (AIM)
  • 3. ArchiMate®
  • 4. ASL®
  • 5. Balanced Scorecard
  • 6. BiSL®
  • 7. CATS CM®
  • 8. CMMI®
  • 9. COBIT®
  • 10. EFQM
  • 11. eSCM-CL
  • 12. eSCM-SP
  • 13. Frameworx
  • 14. ICB®
  • 15. ISO 9001
  • 16. ISO 14000
  • 17. ISO/IEC 15504
  • 18. ISO/IEC 27000 series
  • 19. ISO 31000
  • 20. ISO 38500
  • 21. ISO/IEC 20000
  • 22. ITIL® 2011
  • 23. Lean management
  • 24. M_o_R®
  • 25. MoP™
  • 26. MSP®
  • 27. OPBOK
  • 28. P3O®
  • 29. PMBOK® Guide
  • 30. PRINCE2®
  • 31. SABSA®
  • 32. Scrum
  • 33. Six Sigma
  • 34. SqEME®
  • 35. TMap® NEXT
  • 36. TOGAF®

Dear readers,

In this rapidly changing IT and business environment most things
should and could be more easy. It is no wonder that methods like
Agile and Scrum are gaining popularity. New developments offer
great opportunities for those willing to make the most out of it
but it can be difficult not to get overwhelmed.

In the current environment with constant changes and almost
infinite ways accessing information and communicating it is
essential to make communication as clear as possible and ensure
the quality of information. Van Haren Publishing makes general
Best Practices available to provide quality, practically validated
information worldwide. The use of standards and frameworks
gives everyone the same language thus minimalizing the chance
of errors due to unclear communication. Best Practices regarding
these standards and frameworks provides you with information
summarizing years of experience by the best in the industry.

Not only do we publish books on Best Practices, we also actively
and independently promote the standards and frameworks via
our freely accessible eKnowledge. To make communication on
standards everywhere a little easier, we provide you with a basic
summary of 36 relevant standards in our catalog. It is an illusion
those standards will lead to better results. More important is the
people factor, as without people all these things don’t evolve at
all. But that is beyond the service we provide, all we can do is give
a start in sharing best practice and share generic solutions. The
rest should come from you.

Kind regards,

Ivo van Haren, CEO Van Haren Publishing

Source: http://www.vanharen.net/file/PDF/Global_Standard_And_Publications.pdf