h1. Key-Value database info --(Not everything here is related to key-value database storage.)-- Culling posts tagged #database at ToledoTalk.com http://toledotalk.com/cgi-bin/tt.pl/search/%23database http://redis.io *[X]* - https://speakerdeck.com/sunnygleason/practical-tips-for-using-mysql-as-a-key-value-store http://www.quora.com/Quora-Infrastructure/How-does-Quora-structure-data-models-and-caching-at-the-application-level http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use-MySQL-as-the-data-store-instead-of-NoSQLs-such-as-Cassandra-MongoDB-or-CouchDB Adam D'Angelo, MySQL user since 2004 If you partition your data at the application level, MySQL scalability isn't an issue. Facebook reported [1] running 1800 MySQL servers with just two DBAs in 2008. You can't do joins across partitions, but the NoSQL databases don't allow this anyway. Facebook hasn't confirmed using Cassandra as the primary source for any data, and it seems like inbox search might be their only use of it. [2] These distributed databases like Cassandra, MongoDB, and CouchDB[3] aren't actually very scalable or stable. Twitter apparently has been trying to move from MySQL to Cassandra for over a year. When someone reports using one of these systems as their primary data store for over 1000 machines for over a year, I'll reconsider my opinion on this. << Update as of August 2011: after I wrote this, foursquare reported an 11-hour downtime because of MongoDB. [4] Separately, a friend's startup that was going through explosive growth tried to switch to MongoDB and gave up after a month due to instability. Twitter gave up on the Cassandra migration. [5] Facebook is moving away from Cassandra. [6] HBase is getting better but is still risky if you don't have people around with a deep understanding of it. [7] >> The primary online data store for an application is the worst place to take a risk with new technology. If you lose your database or there's corruption, it's a disaster that could be impossible to recover from. If you're not the developer of one of these new databases, and you're one of a very small number of companies using them at scale in production, you're at the mercy of the developer to fix bugs and handle scalability issues as they come up. You can actually get pretty far on a single MySQL database and not even have to worry about partitioning at the application level. You can "scale up" to a machine with lots of cores and tons of ram, plus a replica. If you have a layer of memcached servers in front of the databases (which are easy to scale out) then the database basically only has to worry about writes. You can also use S3 or some other distributed hash table to take the largest objects out of rows in the database. There's no need to burden yourself with making a system scale more than 10x further than it needs to, as long as you're confident that you'll be able to scale it as you grow. Many of the problems created by manually partitioning the data over a large number of MySQL machines can be mitigated by creating a layer below the application and above MySQL that automatically distributes data. FriendFeed described a good example implementation of this [8]. Personally, I believe the relational data model is the "right" way to structure most of the data for an application like Quora (and for most user-generated content sites). Schemas allow the data to persist in a typed manner across lots of new versions of the application as it's developed, they serve as documentation, and prevent a lot of bugs. And SQL lets you move the computation to the data as necessary rather than having to fetch a ton of data and post-process it in the application everywhere. I think the "NoSQL" fad will end when someone finally implements a distributed relational database with relaxed semantics. http://www.quora.com/LiveNode http://kylerush.net/blog/meet-the-obama-campaigns-250-million-fundraising-platform/ https://news.ycombinator.com/item?id=4842510 *[X]* http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html http://kev.inburke.com/kevin/reddits-database-has-two-tables http://news.ycombinator.com/item?id=4468265 http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/ *[X]* http://backchannel.org/blog/friendfeed-schemaless-mysql h2. Reddit Lesson 3: Open Schema The essence of this lesson is: don't worry about the schema. They used to spend a lot of time worrying about the database, keeping everthing nice and normalized. You shouldn’t have to worry about the database. Schema updates are very slow when you get bigger. Adding a column to 10 million rows takes locks and doesn’t work. They used replication for backup and for scaling. Schema updates and maintaining replication is a pain. They would have to restart replication and could go a day without backups. Deployments are a pain because you have to orchestrate how new software and new database upgrades happen together. Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value. There’s a row for every attribute. There’s a row for title, url, author, spam votes, etc. When they add new features they didn’t have to worry about the database anymore. They didn’t have to add new tables for new things or worry about upgrades. Easier for development, deployment, maintenance. The price is you can’t use cool relational features. There are no joins in the database and you must manually enforce consistency. No joins means it’s really easy to distribute data to different machines. You don’t have to worry about foreign keys are doing joins or how to split the data up. Worked out really well. Worries of using a relational database are a thing of the past. The essence of this lesson is: the key to speed is to precompute everything and cache it. The way to make a slow website is have a perfectly normalized database, collect it all on demand, and then render it. It takes forever on every single request. So if you have data that might be displayed in several different formats, like links on front page, in-box, or profile, store all those representations separately. So when somebody comes and gets the data it’s already there. Every listing has 15 different sort orders (hot, new, top, old, this week). When someone submits a link they recalculate all the possible listing that link could effect. It may be a little wasteful upfront, but it’s it better to wasteful upfront than slow. Wasting disk and memory is better than keeping users waiting. hr. http://kev.inburke.com/kevin/reddits-database-has-two-tables/ This fits with a piece I read the other day about how MongoDB has high adoption for small projects because it lets you just start storing things, without worrying about what the schema or indexes need to be. Reddit’s approach lets them easily add more data to existing objects, without the pain of schema updates or database pivots. Of course, your mileage is going to vary, and you should think closely about your data model and what relationships you need. Update, 11:31PM PDT: A former engineer at reddit adds this comment. There isn’t a “table” for a subreddit. There is a thing/data pair that stores metadata about a subreddit, and there is a thing/data pair for storing links. One of the properties of a link is the subreddit that it is in. Same with the comments. There is one thing/data pair for comments and the subreddit it is in is a property. Still today I tell people that even if you want to do key/value, postgres is faster than any NoSQL product currently available for doing key/value. Update, 7:11PM PDT: From Hacker News, it looks like they use two tables for each “thing”, so a thing/data pair for accounts, a thing/data pair for links, etc. https://news.ycombinator.com/item?id=4468265 Note: Steve was exaggerating a bit, or this post misrepresents what he said. I can think of 12 just off the top of my head. That said, many of them fit the same pattern: reddit_thing_link reddit_thing_comment reddit_thing_user ...and each had a complement with "data" in place of "thing". Though there were also ones like: reddit_xref_subscription_user_subreddit reddit_xref_vote_user_link reddit_xref_vote_user_comment ...for the many-to-many relations. That accounts for most of the big ones (though there are also a handful of smaller specialty tables, too.) Also, the above is as of March 2011 -- a lot has moved to Cassandra since then. all "thing" tables have exactly the same schema as each other. Ditto "data" and "xref" tables. It should also be noted that this isn't the only way to do this either. The other three most common ways I've seen to store differently structured data in a consistent way are: * Edge table * Binary table * Universal table The edge table approach maps edges into key:value tables for the type of table. So you might have an integer data table that was ID:Value, and it only stored the data values and a key to identify the value. And then you have a thing table which stored Name:Flag:ID where the flag column said "int" and the ID column then contains the key of some row in your integer table. There are as many tables as there are types of data, + 1 table for the definition of all things stored. The binary approach groups values of the same name into a table of that name. So if you have a property called "username" you'd have a table called "username", and this approach usually still includes the edge approach such that the values are stored in tables with ID:Value structure by type of value. Effectively partitioning the value table by name of the property being stored. There are as many tables as there are types of data, + the count of unique property names. The universal table approach uses a single table to store all data, effectively containing all of the edges in a graph. It's a conceptual full outer join of all binary tables. Imagine 100's of columns, and for each row the vast majority of those are NULLs. The triplestore can be thought of as a condensed version (not type-safe) of the universal table. More space efficient, but usually at the cost of the datatype safety offered by the database. You can combine these methods to store any type of semi-structured data in a traditional RDBMS and these approaches are used a lot if you know where to look for evidence of them. SharePoint uses a version of the universal table. Oracle (for the ORA_XML storage type) uses a combination of the above under the hood to map XML into traditional database tables. They all have pros and cons relating to query speed (for certain types of query), storage efficiency, indexing, types, etc. And as always you need to know why you're choosing something. SharePoint also partitions their universal table dynamically. But they start with saying that each "SiteCollection" has it's own database/tablespace. As SiteCollections can contain a single site, this is basically equivalent to Reddit creating new tables per subreddit. Elsewhere (up or down on this page) I saw that the SQLAlchemy author advises not to do this kind of thing. I simply say don't do if you don't know strongly why you should do it. Don't cargo-cult... go and read up on the problem space and only implement one of these solutions when you know the effects of doing so. This is one of those times when analysis paralysis can be your friend and prevent you from building a mess you didn't need. If anyone starts with one of these designs and you haven't got a decade of experience to know why... you are probably doing it wrong. -----