NoSQL Study Notes

Relational Database

Relational databases have been the focus of intense academic research and industrial improvement for more than forty years. They are an excellent choice when query flexibility matters, and they suit data that is fairly homogeneous and conforms well to a structured schema (defined columns in a table).

Weaknesses of Relational Databases

Partitioning is not a strong suit of relational databases such as PostgreSQL. If you need to scale out rather than up (multiple parallel datastores rather than a single beefy machine or cluster), you may be better served looking elsewhere. Another datastore might also be a better fit if your data is too flexible to fit a rigid relational schema, if you don’t need the overhead of a full database, if you require very high-volume reads and writes of key-value pairs, or if you only need to store large blobs of data.

Searching

Levenshtein distance is a string comparison algorithm that measures how similar two strings are by counting the steps (single-character insertions, deletions, or substitutions) required to change one string into the other.
In Oracle, run the utlmatch.sql package; in MySQL, you must define your own stored function.
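
To make the step counting concrete, here is a minimal Python sketch of Levenshtein distance using the usual dynamic-programming table, kept to two rows. The function name levenshtein is chosen for illustration and is not tied to any database's built-in function.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("bat", "fads"))
```

Because every edit costs one step, a small distance only tolerates minor misspellings; strings of very different length always score far apart.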

Full-text searches can be done with inverted indexes, the data structure used by search engines like Lucene or Sphinx.
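
As a rough illustration of the idea, an inverted index maps each word to the set of documents that contain it, so a query becomes a set intersection rather than a scan of every document. The sketch below is a toy version under simplifying assumptions (lowercasing, splitting on non-alphanumeric characters, no stemming or ranking); it is not how Lucene or Sphinx are implemented, and the function names are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Night of the Living Dead",
        2: "Dead Poets Society",
        3: "A Night at the Opera",
    }
    idx = build_index(docs)
    print(search(idx, "night"))
    print(search(idx, "night dead"))
```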

We’ve inched toward matching less-specific inputs. LIKE and regular expressions require crafting patterns that match strings precisely according to their format. Levenshtein distance finds matches that contain minor misspellings but must ultimately be very close to the same string. Trigrams are a good choice for finding reasonable misspelled matches. Full-text searching allows natural-language flexibility, in that it can ignore minor words like "a" and "the" and can deal with pluralization. Finally, metaphones are algorithms that create a string representation of word sounds, so words can be matched by how they sound rather than how they are spelled.
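
The trigram approach mentioned above can be sketched in a few lines of Python: break each string into overlapping three-character chunks and compare the two sets. The padding convention (two leading spaces, one trailing space) and the similarity formula (shared trigrams divided by all distinct trigrams) are assumptions made for this sketch; a real database extension may differ in the details.

```python
def trigrams(text):
    """Return the set of three-character chunks of the padded, lowercased text."""
    padded = "  " + text.lower() + " "  # padding convention is an assumption
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Score two strings from 0.0 (no shared trigrams) to 1.0 (identical sets)."""
    set_a, set_b = trigrams(a), trigrams(b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(trigram_similarity("Avatar", "Avatre"))
```

A transposed pair of letters only disturbs the few trigrams that overlap it, so the misspelling still shares most of its chunks with the intended word.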

AWS

Free account includes: https://aws.amazon.com/free/

https://aws.amazon.com/registration-confirmation/

Console

http://aws.amazon.com/console/

ssh -i Redhat.pem ec2-user@54.68.61.25

scp -i ~/Redhat.pem f ec2-user@54.68.61.25:

Riak - Distributed key-value

Riak is a distributed key-value database where values can be anything, from plain text, JSON, or XML to images or video clips, all accessible through a simple HTTP interface. If you’ve ever used Amazon Web Services, like SimpleDB or S3, you may notice some similarities in form and function. This is no coincidence: Riak is inspired by Amazon’s Dynamo paper.

With the curl command, we speak directly to the Riak server’s HTTP REST interface without the need for an interactive console or, say, a Ruby driver.
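
For comparison with curl, the same kind of HTTP requests can be prepared from Python's standard library. This is only a sketch: the host and port (localhost:8098), the /buckets/<bucket>/keys/<key> path layout, and the bucket and key names are assumptions that depend on your Riak version and configuration, and the helper names are invented for this example. The functions only build the requests; nothing is sent until you pass one to urllib.request.urlopen against a running node.

```python
import json
import urllib.parse
import urllib.request

RIAK_BASE = "http://localhost:8098"  # assumed address of a local Riak node


def riak_url(bucket, key, base=RIAK_BASE):
    """Build the URL for one key, percent-encoding the bucket and key."""
    quoted_bucket = urllib.parse.quote(bucket, safe="")
    quoted_key = urllib.parse.quote(key, safe="")
    # Path layout is an assumption; adjust it to match your Riak release.
    return f"{base}/buckets/{quoted_bucket}/keys/{quoted_key}"


def build_put(bucket, key, value):
    """Prepare a PUT request that stores value as a JSON document."""
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        riak_url(bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get(bucket, key):
    """Prepare a GET request that fetches the value stored under key."""
    return urllib.request.Request(riak_url(bucket, key), method="GET")


if __name__ == "__main__":
    request = build_put("animals", "ace", {"nickname": "The Wonder Dog"})
    print(request.get_method(), request.full_url)
    # To actually send it (requires a running node):
    # urllib.request.urlopen(request)
```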