Sunday, September 8, 2013

NoSQL at Research Triangle Software Symposium

Notes from a talk by Venkat Subramaniam on NoSQL.

With a key-value store, it is garbage in, garbage out: the store does not interpret the value. A document database, given a key, can store an XML or JSON document or a blob. The benefit is that you can get a specific element using a query.

With columnar, there is a key and an associated column of data, or several keys and columns. It is often clustered, giving high availability and scalable, efficient data access. You run the nodes in a grid with some of the data on each node; when a request comes in, the cluster finds out which node the data is on. The benefit is that it is very scalable and supports a lot of users. If something goes down on a node, the app keeps running and it seems seamless.
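The document-database point above (fetching one element instead of the whole blob) can be sketched roughly as follows. This is a toy in-memory illustration, not any real product's API: the `DocumentStore` class, its `get_field` method, and the dotted-path syntax are all made up for the example.

```python
import json


class DocumentStore:
    """Toy document store: key -> parsed JSON document (hypothetical, in memory)."""

    def __init__(self):
        self.docs = {}

    def put(self, key, json_text):
        # Unlike a plain key-value store, the value is parsed, so it can be queried.
        self.docs[key] = json.loads(json_text)

    def get_field(self, key, path):
        """Follow a dotted path such as 'address.city' into the stored document."""
        value = self.docs[key]
        for part in path.split("."):
            value = value[part]
        return value


doc_store = DocumentStore()
doc_store.put("joe", '{"name": "Joe", "address": {"city": "Raleigh", "zip": "27601"}}')
print(doc_store.get_field("joe", "address.city"))
```

A plain key-value store would only be able to hand back the whole JSON string for "joe" and leave the digging to the application.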

Replication is sharing: every time we update, we make a copy on another node. Which nodes do you make updates to? If a node is available it is updated right away, but if it is down then we check later and update it then. Joe has an address and Kim has an address; both update fine, but both updates then have to be copied across the different nodes.
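A minimal sketch of that "update now if the node is up, otherwise catch up later" idea, assuming a simple in-memory setup. The `Node` and `ReplicatedStore` classes and the `retry_pending` method are hypothetical names for illustration; real systems handle ordering and conflicts far more carefully.

```python
class Node:
    """One replica; 'up' simulates whether the node is reachable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = []  # (node, key, value) writes waiting for a node to come back

    def retry_pending(self):
        # Apply queued writes, in order, to any node that is reachable again.
        still_pending = []
        for node, key, value in self.pending:
            if node.up:
                node.data[key] = value
            else:
                still_pending.append((node, key, value))
        self.pending = still_pending

    def put(self, key, value):
        # Flush older queued writes first so a stale value never overwrites a newer one.
        self.retry_pending()
        for node in self.nodes:
            if node.up:
                node.data[key] = value  # available: update right away
            else:
                self.pending.append((node, key, value))  # down: update later


node_a, node_b = Node("a"), Node("b")
replicated = ReplicatedStore([node_a, node_b])

node_b.up = False
replicated.put("joe", "12 Main St")   # only node_a has Joe's address for now
stale = "joe" not in node_b.data      # node_b is temporarily behind

node_b.up = True
replicated.retry_pending()            # node_b catches up
print(node_a.data == node_b.data)
```

Between the `put` and the `retry_pending` call the two nodes disagree, which is the window the notes later call "eventually consistent".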

Relationships are the domain of graph databases. Examples: marketing, Twitter, Facebook, etc. doing data mining
on the data graph to find patterns.

Key-value store (e.g. what you like): access is based on a key, with a hash-code algorithm used to find the machine that holds the data.
DynamoDB is an example. Here you have app data/session data, shopping data, user profiles, preferences.
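As a rough sketch of "hash the key to find the machine", here is a toy cluster where each node is just a Python dict. The `KeyValueCluster` class is hypothetical, and the modulo placement is a simplification chosen for brevity; production stores commonly use consistent hashing so that adding a node does not reshuffle most keys.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store that spreads keys across several in-memory nodes."""

    def __init__(self, node_count):
        # Each "node" is a dict here; a real cluster would use separate machines.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key and map the hash to a node index (simple modulo placement).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # The same hash leads back to the same node, so no search is needed.
        return self.nodes[self._node_for(key)].get(key)


cluster = KeyValueCluster(node_count=3)
cluster.put("session:42", {"user": "joe", "cart": ["book"]})
print(cluster.get("session:42"))
```

The store never looks inside the value; session data, a shopping cart, or a user profile are all just opaque values behind a key.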

With no schema, how do you access data? Typically in Java with relational you have a properties file, you update your Hibernate file, and you also provide a new setter and getter. But we would prefer to update in only one place.


Consistency: an update on one server won't be shared instantly with another node. Is consistency important? Bank, yes; blog, no! NoSQL is eventually consistent, i.e. consistent at some time in the future. Amazon claims their eventual consistency takes 1 second.

Column family DB: groups of columns, similar to but not the same as relational,
e.g. customer data (name, address, phone number) and music data.
A customer has one or many music interests; some have an age and some don't store age. Relational is very inflexible here; no schema is flexible. Each row (i.e. key) can have a different set of columns. Examples are Cassandra and HBase.
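The "each row can have a different set of columns" idea can be sketched with nested dictionaries. This is only an illustration of the data shape, not how Cassandra or HBase are actually accessed; the `ColumnFamilyTable` class and the family names `profile` and `music` are assumptions for the example.

```python
class ColumnFamilyTable:
    """Toy column-family layout: row key -> column family -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, family, column, value):
        # Rows and families are created on demand; there is no schema to alter.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        # Return a copy of one family's columns, or an empty dict if absent.
        return dict(self.rows.get(row_key, {}).get(family, {}))


customers = ColumnFamilyTable()
customers.put("joe", "profile", "name", "Joe")
customers.put("joe", "profile", "age", 34)        # Joe stores an age
customers.put("joe", "music", "jazz", True)

customers.put("kim", "profile", "name", "Kim")    # Kim stores no age at all
customers.put("kim", "music", "rock", True)
customers.put("kim", "music", "folk", True)       # and has two music interests

print(customers.get_family("joe", "profile"))
print(customers.get_family("kim", "music"))
```

In a relational table the missing age would be a NULL column and the variable number of music interests would need a second table; here each row simply carries the columns it has.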

Graph databases are about relationships: not clustered, inserts are expensive, nodes and edges, two-way relationships. Examples are Neo4j and FlockDB.
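A small sketch of nodes, two-way edges, and traversal, using a plain adjacency map. The `Graph` class and its `relate` and `within` methods are hypothetical names for the example; Neo4j and FlockDB have their own query interfaces, which are not shown here.

```python
from collections import defaultdict, deque


class Graph:
    """Toy graph: each node maps to the set of nodes it is related to."""

    def __init__(self):
        self.edges = defaultdict(set)

    def relate(self, a, b):
        # Two-way relationship: store the edge in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def within(self, start, max_hops):
        """Return nodes reachable from start in at most max_hops edges."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue  # do not expand past the hop limit
            for neighbor in self.edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, hops + 1))
        seen.discard(start)
        return seen


social = Graph()
social.relate("joe", "kim")
social.relate("kim", "ann")
social.relate("ann", "bob")
print(sorted(social.within("joe", 2)))  # friends and friends of friends: ['ann', 'kim']
```

The query walks edges outward from a starting node, which is the "traversing" style of access, as opposed to looking up one aggregate by key.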

NoSQL is the wrong name, but catchy. It doesn't mean no SQL but no rigid schema, which is more flexible to work with. There is no schema to evolve.

For enterprise data, NoSQL is not a solution.

Polyglot persistence: multiple databases used to build applications, just like multiple languages.

NoSQL works for application data, and it also saves programming effort.

There are multiple classifications of NoSQL databases:

Aggregate oriented vs. aggregate ignorant (relationship oriented)

Data hierarchy vs. relationships between data; parent-child vs. traversing. The aggregate-oriented kinds are key-value, document, and column family.

Where are we with databases? Use them when we have to store our data.
Only use a DB when you have a need for it based on requirements.
Use it if we have to support concurrency: multiple users at the same time, or multiple different applications at the same time.

When app data turns into enterprise data, people come for your data. The minute it becomes enterprise data, the rules change.

With relational: rows and columns, relationships, tuples. Robust, standard, familiar, popular.
Handles consistency very well, easy access using SQL, ad hoc queries.

Impedance mismatch. Relational was a good fit 30 years ago with C and Fortran.

OO is the de facto standard now in programming. With objects, we tear them apart and reshape them to fit the database. With an OO database you don't have to dismantle objects, so there is no database work at all. Object databases solved the problem at hand well, but corporate data didn't integrate well. Object-relational databases died.

Along came ORM, but the structure is rigid: it suits stability, where things do not change fast, and it takes effort to evolve the schema. Data is centralized and hard to scale for availability.

So what is different now with these NoSQL databases, given that relational was not affected by OO databases?

What has changed is that modems and baud rates are gone; now everyone carries devices.

Frequency of data is enormous, so applications need distributed access to data.

We are sending more data to applications, and more data exchanges are moving away from corporate data.

Clustering

We still have batch processing overnight. Clustering gives greater availability: distribute data into many locations, e.g. Hadoop.
