Database variants explained : SQL or NoSQL? Is that really the question?

A first glance beyond the religion

When taking a look towards the landscape of databases, one can only accept that there has been a lot of commotion about “SQL vs NoSQL” in the last years. But what is it really about?

SQL, which stands for “Structured Query Language”, has been around since the seventies and is commonly used in relational databases. It consists of a data definition language to define the structure and a data manipulation language to alter the data within the structure. Therefore a RDBMS will have a defined structure and has been a common choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s.

1401269083847

NoSQL, which stands for “Not only SQL”, departs from the standard relational model since it saw its first introduction in the nineties. The primary focus of these database was performance, or a given niche, and focus less consitency/transactions. These databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases. The particular suitability of a given NoSQL database depends on the problem it must solve.

So it depends on your need…

Do you want NoSQL, NoSQL, NoSQL or NoSQL?

NoSQL comes in various flavors. The most common types of NoSQL databases (as portrayed by Wikipedia) ;

There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Because of the variety of approaches and overlaps it is difficult to get and maintain an overview of non-relational databases. Nevertheless, a basic classification is based on data model. A few examples in each category are:

  • Column: Accumulo, Cassandra, Druid, HBase, Vertica
  • Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
  • Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
  • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
  • Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

Column

A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements:

  • Unique name: Used to reference the column
  • Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
  • Timestamp: The system timestamp used to determine the valid content.

Example

{
    street: {name: "street", value: "1234 x street", timestamp: 123456789},
    city: {name: "city", value: "san francisco", timestamp: 123456789},
    zip: {name: "zip", value: "94107", timestamp: 123456789},
}

Document

A document-oriented database is designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. The central concept of a document-oriented database is that Documents, in largely the usual English sense, contain vast amounts of data which can usefully be made available. Document-oriented database implementations differ widely in detail and functionality. Most accept documents in a variety of forms, and encapsulate them in a standardized internal format, while extracting at least some specific data items that are then associated with the document.

Example

<Article>
   <Author>
       <FirstName>Bob</FirstName>
       <Surname>Smith</Surname>
   </Author>
   <Abstract>This paper concerns....</Abstract>
   <Section n="1"><Title>Introduction</Title>
       <Para>...
   </Section>
 </Article>

Key-Value

A key-value (an associative array, map, symbol table,or dictionary) is an abstract data type composed of a collection of key/value pairs, such that each possible key appears just once in the collection.

Example

{
    "Pride and Prejudice": "Alice",
    "The Brothers Karamazov": "Pat",
    "Wuthering Heights": "Alice"
}

Graph

A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.

Example

GraphDatabase_PropertyGraph

MultiModel

Most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. In contrast, a multi-model database is designed to support multiple data models against a single, integrated backend. Document, graph, relational, and key-value models are examples of data models that may be supported by a multi-model database.

And what flavor do I want?

Each type and implementation has its own advantages… The following chart from Shankar Sahai provides a good overview ;

nosql-comparison-table

Any other considerations I should take into account?

Be wary that most implementations were not designed around consistency integrity and more towards performance. Transactions are referential integrity are not supported by most implementations. High availability designs (including on geographic level) are possible with some implementations, though this often implies a performance impact (as one would expect).

Also check out the research made by Altoros ;

5. Conclusion
As you can see, there is no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks.
For example, a database can demonstrate excellent performance, but once the amount of records exceeds a certain limit, the speed falls dramatically. It means that this particular solution can be good for moderate data loads and extremely fast computations, but it would not be suitable for jobs that require a lot of reads and writes. In addition, database performance also depends on the capacity of your hardware.

They did a very decent job in performance testing various implementations!

2015-01-21 09_08_23-A_Vendor_independent_Comparison_of_NoSQL_Databases_Cassandra_HBase_MongoDB_Riak.

Advertisements

Technical insight into NTFS

Check out OSnews’s article “NTFS: A File System with Integrity and Complexity“.

The topics handled in this article:

  • Glossary of NTFS Terms
  • A File System of Files
  • Fill Your Quota
  • Better Backups with Shadow Copy
  • File Compression Made Easy
  • Alternate Data Streams
  • File Screening
  • Volume Mount Points
  • Hard Links, Soft Links, and Junction Points
  • Fun With Logs
  • Encrypting File System – EFS
  • Advanced Format Drives
  • Interview: Dr Gary Kimura

It showed me some interesting insights into the technical possibilities I wasn’t aware off. Nice examples are that “symlinks” is possible since Windows Vista and the concept of “streams!

Hyperthreading explained!

There are a lot of nice technology names that sound cool, but where the actual specifics are a bit unclear. Hyperthreading is one of them for a lot of people. So check out the following arcticle “What Is Hyper-Threading?” @ makeuseof.com!

Excerpt

I’ve even spoken with one poor fellow who thought he’d purchased a eight-core processor because he saw eight graphs in Windows Task Manager. That’s what the sales rep told him, so when he went home and saw eight graphs, he was hooked. Seeing was believing. Two graphs appear in Windows for each core on a Hyper-Threading processor because Windows is detecting two logical processors for each core. The term “logical processor” sounds fancy, but a logical processor is by definition a processor that has no physical existence. Windows can send threads to each logical processor, but there is still just one core doing the actual execution, so a single core with Hyper-Threading is dramatically different from two seperate physical cores.

Outsourcing IT? Some wisdoms to preserve the long term.

Listening to the sales talk
Sales persons in regards to IT software always say it’s easy to install & use. Yet from experience we know that this is correct if it’s done without straying from the path they had in mind. This being a “default default default” installlation without attempting to do anything “custom”. I use the term “custom” here very lightly, as mostly deviating from using a superuser will turn the application useless. Yet sadly enough, such a default installation is something that never occurs, unless you want to accept the given application as an “island” within your IT environment.

Integration & Maintenance are key!
The biggest downfalls of software is situated within Integration and on maintenance. If the integration part isn’t advertised during the sales round, then you can probably forget about that. To be clear; integration meaning when you want to use the data from a given application into another.

Good maintenance is a bulletproof installation which will last for ages! To be honest, I have yet to see an external party who thinks about the long term and therefor set up a system that’s rock solid and won’t budge even if an earthquake occurred. Most vendors think merely in regards to sales, where an unstable application provides them with billable hours (“consultancy” or support services).

Simply Put
As every management book tells you, think about our TCO. What’s that first character stand for? TOTAL… being the costs you’ll accumulate over the years. Looking short term will give your a big increase on maintenance which should not be treaded lightly. F*cked up installations are dreadfull to manage and will (Note: WILL, not might) give you disruptions in your availability. How much is such a downtime worth to you?

OpenSource Storage Management

I came across OpenFiler a while ago and was intriged by it. Now I’ve taken the liberty to testing it in my lab, and I must say that I’m impressed by the features. It’s something every sysadmin should check out to see if it isn’t a viable solution for their overpriced storage solution… 😉

.

Openfiler is a powerful, intuitive browser-based network storage software distribution. Openfiler delivers file-based Network Attached Storage and block-based Storage Area Networking in a single framework. Its uses the rPath Linux metadistribution and is distributed as a stand-alone Linux distribution. The entire software stack interfaces with third-party software that is all open source.

File-based networking protocols supported by Openfiler include: NFS, SMB/CIFS, HTTP/WebDAV and FTP. Network directories supported by Openfiler include NIS, LDAP (with support for SMB/CIFS encrypted passwords), Active Directory (in native and mixed modes) and Hesiod. Authentication protocols include Kerberos 5.

Openfiler includes support for volume-based partitioning, iSCSI (target and initiator), scheduled snapshots, resource quota, and a single unified interface for share management which makes allocating shares for various network file-system protocols a breeze.