By: Andrew C. Oliver
- Great at consistency
- Not great at scaling
- Usually have lots of servers with many connections to few servers
- Usually have a lot of servers with distributed connections
It was shit, hardware made a great
- RDBMS is based on relational algebra based on set theory
- Not every problem is a set problem, e.g. "direct path" or "which thing contains another thing that has this other thing" (FOAF, friend of a friend)
- Sometimes relationships are as important as the data
- Sometimes data is simpler than the relational model but needs higher levels of availability
- One size never really did fit all
- The cheapness of storing data has yielded more demand
- Economics predicted this
- Moore's law ended while you slept
- Massive parallelization is the most feasible way to get at it (counter trended with an explosion in disk speeds)
- If your data is tabular, fits cleanly in a relational model, and you don't have scale issues or a large dataset, then use the RDBMS model
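
As a rough illustration of the "tabular, fits a relational model" case, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are made up for the example; the point is only that a join plus an aggregate is a natural set operation when the data really is tabular.

```python
import sqlite3

# In-memory database; the schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# A set-based query: join the two tables, then aggregate per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, total in rows:
    print(name, total)
```
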
- One db type is unlikely to be well suited for all your problems
- The system doing "short and sweet", lightweight transactions and queries is your operational system
- The system doing long-running reports is your analytical system
- There is also search (per the speaker; take this as given)
- Constant time O(1) lookup by key
- Good enough for "right now" data such as stock quotes
- Usually combined with an index for search
- Generally works well with MapReduce
- scales very well
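
To make the key-value points concrete, here is a toy in-memory sketch, not any particular product's API. The `KeyValueStore` class and the `quote:ACME` key are invented for illustration. The store treats values as opaque bytes, so the only thing it can do is look them up by key, which is what keeps lookups constant time on average.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes to the store."""

    def __init__(self):
        # A dict (hash table) gives O(1) average-case lookup by key.
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Returns None when the key is absent.
        return self._data.get(key)


store = KeyValueStore()

# The client, not the store, decides how the value is serialized.
store.put("quote:ACME", json.dumps({"price": 12.5}).encode("utf-8"))

latest = json.loads(store.get("quote:ACME").decode("utf-8"))
print(latest["price"])
```

Because the store cannot look inside the value, finding "all quotes above 10" would need a separate index, which matches the note about pairing these stores with a search index.
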
- Mang key-value store
- Keys and values become composite
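
One way to picture "keys and values become composite" is a row key that maps to cells addressed by a (column, timestamp) pair. The sketch below is a simplified assumption about the layout, not the storage format of any specific column-family database; `put`, `read_series`, and the sensor data are hypothetical.

```python
from collections import defaultdict

# row key -> {(column name, timestamp): value}
table = defaultdict(dict)


def put(row_key, column, timestamp, value):
    """Write one cell; the composite (column, timestamp) addresses it."""
    table[row_key][(column, timestamp)] = value


def read_series(row_key, column):
    """Return (timestamp, value) pairs for one column, oldest first."""
    cells = table.get(row_key, {})
    return sorted((ts, v) for (col, ts), v in cells.items() if col == column)


put("sensor-1", "temp", 3, 21.5)
put("sensor-1", "temp", 1, 20.0)
put("sensor-1", "humidity", 1, 0.40)

print(read_series("sensor-1", "temp"))
```
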
- JSON documents
- Couchbase, CouchDB, MongoDB
- Key value store that understands the values
- Not as MapReduce friendly; larger data sets require indexes
- operational store
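
A document store can be thought of as a key-value store that parses the value, so it can index fields inside it. The following is a minimal sketch under that assumption. It is not the API of Couchbase, CouchDB, or MongoDB, and the `insert` and `find_by_city` helpers and sample documents are invented.

```python
import json

documents = {}   # document id -> parsed JSON document
city_index = {}  # city -> set of document ids (a secondary index)


def insert(doc_id, raw_json):
    """Store a JSON document and index one nested field."""
    doc = json.loads(raw_json)
    documents[doc_id] = doc
    city = doc.get("address", {}).get("city")
    if city is not None:
        city_index.setdefault(city, set()).add(doc_id)


def find_by_city(city):
    """Answer a query from the index instead of scanning every document."""
    return [documents[i] for i in sorted(city_index.get(city, set()))]


insert("u1", '{"name": "Ann", "address": {"city": "Durham"}}')
insert("u2", '{"name": "Bob", "address": {"city": "Raleigh"}}')
insert("u3", '{"name": "Cy", "address": {"city": "Durham"}}')

print([d["name"] for d in find_by_city("Durham")])
```
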
- Based on graph theory
- Less about volume of data, more about complexity
- Many are transactional
- Often the transactions are "more correct" than those offered by a relational database
- FOAF, direct path operations are easy
- Very complicated / inefficient in RDBMS
- Usually paired with an index for search
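
The FOAF and direct-path operations mentioned above are short when the data is already a graph. Here is a small sketch using a plain adjacency map and breadth-first search. The `friends` data and both function names are made up, and a real graph database would expose this through its own query language.

```python
from collections import deque

# Undirected friendship graph as an adjacency map (hypothetical data).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
}


def friends_of_friends(person):
    """People exactly two hops away: friends of friends, minus direct friends."""
    direct = friends.get(person, set())
    result = set()
    for friend in direct:
        result |= friends.get(friend, set())
    return result - direct - {person}


def shortest_path(start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(friends.get(path[-1], set())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(friends_of_friends("alice"))
print(shortest_path("alice", "erin"))
```

In a relational schema the same path query would typically need a self-join per hop or a recursive query, which is the inefficiency the notes point at.
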
- NoSQL
- Software frameworks
- Pig
- etc.
- Excellent choice for data processing and data analytics
- MapReduce
Convergence of filesystems and databases: Hadoop HDFS
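
For readers who have not seen MapReduce, the classic word count shows the shape of it: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step folds each group. This is a single-process sketch of the idea only, not Hadoop's API, and the function names are arbitrary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Fold the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["big data", "Big deal"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes, which is why the model scales with the number of machines.
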
- Triplestores
- Apache Jena
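
A triplestore holds facts as (subject, predicate, object) triples and answers queries by pattern matching. The sketch below shows that idea in plain Python. It is not Apache Jena's API or SPARQL, and the `match` helper and the sample triples are hypothetical.

```python
# Each fact is a (subject, predicate, object) triple (made-up sample data).
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(p="knows"))
print(match(s="alice", o="acme"))
```
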
- OODBMS/ORDBMS
- Cache
- Persistence
- Async / Sync
- Replication
- Availability
- Transactions / Consistency
- "locality" detecting the best node to go to in a multiple instance setup
- Language
Solr / Lucene
- Merge index, most similar to a document database
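
The notes say "merge index". As a related illustration, the core structure Lucene builds is an inverted index (term to documents), written in segments that are merged over time. The sketch below shows only the inverted-index part, with a naive whitespace tokenizer; the class and sample documents are invented, and this is not Solr's or Lucene's API.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term in the query."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add("d1", "NoSQL scales out")
index.add("d2", "RDBMS scales up")
index.add("d3", "NoSQL is not one thing")

print(index.search("nosql"))
print(index.search("scales nosql"))
```
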
- Obviously, an RDBMS may not scale to your needs
- Your data may not map to tables
- Key-value store: stores data by key; fast and scalable, but can't handle complex data
- Column family: fast, scalable, denormalized, works with MapReduce, good for series data
The really important thing to take away is that there is no silver bullet. You might store data in one kind of system and then load it into another to run analytics on it. For example, when analysing emails you might store the messages in a filesystem or document store and then load their metadata into a graph database to find relationships between senders, recipients, and so on.
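
The email example can be sketched in a few lines: keep the full messages as documents, then load only the sender and recipient metadata into a graph-like structure for relationship queries. Everything here (the `emails` list, `build_graph`, `correspondents`) is hypothetical and stands in for a real document store and graph database.

```python
# Full messages live in the "document store" (here just a list of dicts).
emails = [
    {"id": 1, "from": "ann", "to": ["bob", "cy"], "subject": "Q3 plan"},
    {"id": 2, "from": "bob", "to": ["cy"], "subject": "Re: Q3 plan"},
    {"id": 3, "from": "ann", "to": ["bob"], "subject": "Lunch"},
]


def build_graph(docs):
    """Load only sender/recipient metadata into weighted directed edges."""
    edges = {}
    for doc in docs:
        for recipient in doc["to"]:
            edge = (doc["from"], recipient)
            edges[edge] = edges.get(edge, 0) + 1
    return edges


graph = build_graph(emails)


def correspondents(person):
    """Everyone this person has sent mail to, according to the graph."""
    return sorted({dst for (src, dst) in graph if src == person})


print(graph[("ann", "bob")])
print(correspondents("ann"))
```
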