Monday, June 13, 2011

Database evolution ACID --> BASE --> DIRT

A good introduction to database evolution and why nodeJS is most appropriate for DIRT (data-intensive real-time) applications.



Monday, May 23, 2011

MongoSF - MongoDB conference notes

The 10gen people were organized and efficient: registration was smooth and friendly. The information packet received at registration included an updated agenda and a map of the venues. Swag bags were available in the back, with some pretty cool (and some useless) swag, such as a USB drive in the shape of a person called "USB People". I did get a nifty MongoDB coffee mug and snagged some vendor T-shirts. About a dozen vendors were present, mostly pushing their cloud PaaS services: Red Hat, VMware, dotCloud, and a bunch of smaller startups.

PaaS was a big theme beyond just MongoDB. The real-time, event-driven web with nodeJS was also a hot topic of discussion and the subject of popular sessions.


Monitoring & Queuing MongoDB
9:30am - 10am (30m)


This talk will contain 2 topics: 1) How to monitor your MongoDB cluster and what to look out for to prevent explosions. 2) How to build a redundant, scalable queuing system using MongoDB. At Boxed Ice we throw 3.5TB of data into MongoDB each month, which results in processing billions of documents. Fun times.

David Mytton
(Boxed Ice)

David is an entrepreneurial programmer based in the UK. He is currently working on a server performance monitoring tool, Server Density, through his startup, Boxed Ice.


My Notes:


Boxed Ice is using RabbitMQ for alerts on background processing. But RabbitMQ has no native failover. This was the primary reason they are exploring using MongoDB to provide a persistent data store for recovery.


Basically, they wanted the following:

  • Redundancy
  • Atomicity
  • Speed
  • Garbage collection


They chose MongoDB over RabbitMQ because they had more experience working with MongoDB. Not wanting to add yet another system to learn, they decided to use MongoDB exclusively.
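Those requirements map onto a common way of using MongoDB as a queue: each job is a document, and a worker claims one with an atomic find-and-modify so two workers can never grab the same job. The sketch below is not Boxed Ice's code. It simulates the pattern in memory, and the `InMemoryQueue` class and the job fields `state` and `claimedAt` are hypothetical. The equivalent `findAndModify` shell call is shown in a comment.

```javascript
// In-memory stand-in for a MongoDB "jobs" collection.
// Job shape (hypothetical): { _id, payload, state: "pending" | "claimed" | "done", claimedAt }
class InMemoryQueue {
  constructor() {
    this.jobs = [];
    this.nextId = 1;
  }

  push(payload) {
    const job = { _id: this.nextId++, payload, state: "pending", claimedAt: null };
    this.jobs.push(job);
    return job._id;
  }

  // Claim the oldest pending job. The atomic server-side equivalent is roughly:
  //   db.jobs.findAndModify({ query: { state: "pending" }, sort: { _id: 1 },
  //                           update: { $set: { state: "claimed", claimedAt: new Date() } },
  //                           new: true })
  // Here, Node's single thread is what makes the scan-and-set atomic.
  claim(now = Date.now()) {
    const job = this.jobs.find((j) => j.state === "pending");
    if (!job) return null;
    job.state = "claimed";
    job.claimedAt = now;
    return job;
  }

  complete(id) {
    const job = this.jobs.find((j) => j._id === id);
    if (job) job.state = "done";
  }

  // Redundancy: jobs claimed by a worker that died go back to "pending" after a timeout.
  requeueStale(timeoutMs, now = Date.now()) {
    let count = 0;
    for (const j of this.jobs) {
      if (j.state === "claimed" && now - j.claimedAt > timeoutMs) {
        j.state = "pending";
        j.claimedAt = null;
        count++;
      }
    }
    return count;
  }

  // Garbage collection: remove finished jobs so the collection doesn't grow forever.
  purgeDone() {
    const before = this.jobs.length;
    this.jobs = this.jobs.filter((j) => j.state !== "done");
    return before - this.jobs.length;
  }
}
```

Requeuing stale claims covers the redundancy requirement, because a crashed worker's job returns to the queue. Purging finished jobs stands in for garbage collection.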


Next the presenter talked about monitoring performance in MongoDB.


Working in memory is always faster than going to disk. MongoDB has an explain() method. Use it to see which indexes a query uses, which helps you determine whether operations will have to read from or write to disk.

Regarding storage, MongoDB pre-allocates its data files. Each new file doubles in size, up to 2GB per file.

When sharding, the maxSize setting assumes the same capacity on all nodes. Best to set it to about 70% of memory capacity.

Rotate your logs; don't let them get too big.

Use journaling, and don't go over the 1GB maximum.


To determine the number of connections in use, check db.serverStatus(). Always use connection pooling.

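Other things to watch:

  • connPoolStats (a separate command that reports connection pool statistics)
  • indexCounters, from db.serverStatus()
  • opcounters, the fsync setting, and configuring slaves to handle reads
  • backgroundFlushing
  • dur (journaling statistics)

From rs.status() (replica set status):

  • myState: the state of the member you are connected to
  • optime: when the member was last updated
  • lastHeartbeat: the last communication with each member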


The presenter gave an overview of the mongostat command-line tool.


In mongostat, a high value in the faults column implies there is not enough RAM for indexes and data.


A high 'locked' percentage causes queuing, and index misses (the 'idx miss' column) can make it worse. Excessive queuing will cause performance issues.
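A monitoring script can turn those rules of thumb into checks. This minimal sketch compares two `db.serverStatus()` snapshots. It reads only `extra_info.page_faults` (reported on Linux), `globalLock.currentQueue.total` and `connections`. The `LIMITS` thresholds and sample numbers are made up for illustration and are not recommendations from the talk.

```javascript
// Hypothetical thresholds for illustration; tune them for your own workload.
const LIMITS = { faultsPerSec: 50, queued: 10, connectionUsage: 0.8 };

// prev and curr are two db.serverStatus() results taken intervalSec seconds apart.
// Fields read: extra_info.page_faults (Linux only), globalLock.currentQueue.total,
// and connections.current / connections.available.
function checkHealth(prev, curr, intervalSec) {
  const warnings = [];

  // Faults per second: a high rate suggests indexes and data don't fit in RAM.
  const faults = (curr.extra_info.page_faults - prev.extra_info.page_faults) / intervalSec;
  if (faults > LIMITS.faultsPerSec) {
    warnings.push(`page faults at ${faults.toFixed(1)}/sec: not enough RAM for indexes/data?`);
  }

  // Operations waiting on the global lock.
  const queued = curr.globalLock.currentQueue.total;
  if (queued > LIMITS.queued) {
    warnings.push(`${queued} operations queued on the lock`);
  }

  // Share of the connection limit in use; steady growth often means pooling isn't in use.
  const { current, available } = curr.connections;
  const usage = current / (current + available);
  if (usage > LIMITS.connectionUsage) {
    warnings.push(`${Math.round(usage * 100)}% of connections in use`);
  }
  return warnings;
}

// Made-up sample snapshots (only the fields used above).
const sampleBefore = { extra_info: { page_faults: 1000 }, globalLock: { currentQueue: { total: 0 } }, connections: { current: 10, available: 790 } };
const sampleAfter = { extra_info: { page_faults: 2000 }, globalLock: { currentQueue: { total: 25 } }, connections: { current: 700, available: 100 } };
console.log(checkHealth(sampleBefore, sampleAfter, 10));
```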


Other useful commands in the shell:


db.currentOp()

db.killOp()


Boxed Ice runs a site called mongomonitor.com that offers a DB monitoring service for MongoDB instances:


http://www.serverdensity.com/mongodb-monitoring/


In summary:


  • Keep indexes, and as much data as possible, in RAM.
  • Watch storage usage, both disk and ram.
  • Monitor status


Questions from audience/me:


RabbitMQ vs. MongoDB throughput?

5,000 msg/sec vs. 2,000 msg/sec

About 2.5 times slower, but not bad.


Someone from the audience asked whether the presenter was aware of Celery, a distributed task queue that works with RabbitMQ and can use MongoDB as a data store.


http://celeryproject.org/


OpenShift
10am - 10:45am
(45m)

Tobias Kunze
(Red Hat)

My Notes:

This session is probably going to be the most 'markety', by which I mean the most full of marketing-speak. I hope not; I really want to know as many of the technical details as possible.

OpenShift is currently in 'developer preview', i.e., beta. It's still early days.

How MongoDB is used within the OpenShift PaaS.
Krishna from Red Hat is giving a live demo.

Slides begin, why PaaS?
Dev needs: stop dealing with the stack and focus on application development.
Operations needs: focus on service, not deployments.

Server setup takes a long time, stealing development time.
Operations: 50% of time is spent on deployment. Each organization has its own issues, but almost all have 'known issues' that require manual intervention. This eats time away from things that add value to the organization.

The speaker just said "getting the right abstraction." Yeah, this is a marketing speech.

Dev: OpenShift offers an "open source ecosystem"
Ops: OpenShift offers cloud management

OpenShift offers a "platform kernel" they refer to as 'fabric'.

Someone just handed me a "Shift Happens" sticker. Ha ha. I get it.

Distributed apps slide
Composite apps: many apps communicating through services

Rightscale, template as a service, cloud config

PaaS types:

  • Middleware
  • Frameworks: Heroku, VMforce
  • Open source: middleware + framework models

Makara, the creators of OpenShift (Makara was bought by Red Hat).

OpenShift Express: free, git-based deploys; a 'runtime as a service'.

OpenShift Flex: a premium service offering nodes, middleware, and frameworks.
Uses MongoDB (rather than Redis).

MongoDB is agile and scalable, with failover built in.

"Write now, design later". This refers to the schema-less nature of document storage.

Demo: Krishna Raman

kraman/mongo1 on git

Openshift uses EC2

Standard EC2 cluster setup
create your application
java/mysql app
select the components, tomcat, mysql
downloads, installs components, mongo is available too
supports tomcat, jboss apps

Didn't really learn anything new here; I'm already familiar with this stuff.


At 11am, I couldn't decide which session to go to: "Storing and Querying location data" or "Schema Design at Scale". I wish more folks from my company could have gone; it would have been much easier to cover multiple tracks.

Storing and Querying location data with MongoDB
11am - 11:45am
(45m)

Looking to store and query location data? MongoDB has you covered. Learn how to structure, and even shard your geo data, along with an unlikely use case: an infinitely large board game!

Grant Goodale
(WordSquared)




My Notes
OK, this guy is super hyperactive and extremely excited about his work. Basically, it's an infinite multiplayer Scrabble board played online. It uses nodeJS, HTML5, and MongoDB (a single instance).

It uses geospatial indexing, calculating across 'units' (the playing squares).
MongoDB 1.9: multi-location support is coming.

  • geo2d (the 2d geospatial index) in 1.8
  • geoNear: gets result sets for a geo lookup, ordered by distance; use ordered hashes
  • Query within a region: $box and $center queries
  • MongoDB version 1.9 will support polygon searches
  • $nearSphere and $centerSphere use radians, not native units
  • Positions are stored as [long, lat]
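Because the spherical operators work in radians, a ground distance has to be divided by the Earth's radius before it goes into a query. This is a minimal sketch, assuming a mean radius of 6371 km and a field named `loc` that holds `[long, lat]`. The in-memory filter uses the haversine formula to mimic what `$centerSphere` selects, so you can check the arithmetic without a server.

```javascript
// Positions are [longitude, latitude] in degrees, matching the note above.
const EARTH_RADIUS_KM = 6371; // mean Earth radius (an approximation)

// Spherical queries take distances in radians, so divide a ground distance by the radius.
function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

const toRad = (deg) => (deg * Math.PI) / 180;

// Great-circle (angular) distance in radians between two [lng, lat] points, via haversine.
function angularDistance([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Query document for "everything within 5 km of a point", assuming a field named loc.
const center = [-122.4194, 37.7749]; // San Francisco
const query = { loc: { $within: { $centerSphere: [center, kmToRadians(5)] } } };

// In-memory version of the same filter, to check the arithmetic without a server.
function withinCenterSphere(points, c, radiusRad) {
  return points.filter((p) => angularDistance(p, c) <= radiusRad);
}
```

With a real collection, the `query` object would be passed to a `find()` call on whatever collection holds the locations.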

GeoJSON standard

67 million records on one node
Activity is mostly at the edges; not all records are active
Currently just using master/slave, no sharding

massivelyfun.com

The Scrabble-like game "Word Squared" looked pretty cool:
http://massivelyfun.com/

Schema Design at Scale
11am - 11:45am
(45m)

Eliot Horowitz
(10gen)

Eliot is CTO of 10gen, the company that sponsors the open source MongoDB project. Eliot is one of the core MongoDB kernel committers. Eliot is also the co-founder and chief scientist of ShopWiki. In January 2005, he began developing the crawling and data extraction algorithm that is the core of ShopWiki's innovative technology. Eliot has quickly become one of Silicon Alley's up and coming entrepreneurs, having been selected as one of BusinessWeek's Top 25 Entrepreneurs Under Age 25 in 2006. Prior to ShopWiki, Eliot was a software developer in the R&D group at DoubleClick. Eliot received a B.S. in Computer Science from Brown University.


My Notes:


Missed this presentation because I attended the geospatial talk...


Shell Hacks
11:45am - 12:30pm
(45m)

Scott Hernandez
(10gen)

My Notes:

This talk was mainly demo, so I was mostly watching and trying to absorb the discussion. The scripting shell is used pretty much the way Toad is used in the Oracle world to prove out your SQL. In this case, working with MongoDB, the scripting language is JavaScript. Almost everything that is accessible via the programming API, you can do in the MongoDB JS shell.

MongoDB for Java Devs with Spring Data
1:30pm - 2pm
(30m)

The Spring Data project provides sophisticated support for NoSQL datastores. The MongoDB module consists of a namespace to easily setup MongoDB access, a template class to provide a nice API to persist and query objects as well as sophisticated support to build repositories accessing entities stored in a MongoDB. The talk will introduce the Spring Data MongoDB support and present the features in hands on demos.

Chris Richardson
(VMware)

My Notes:

Spring Data contains support for NoSQL databases.

The presenter demonstrated how Spring Data can hide much of the boilerplate and plumbing code that is required with data access technologies like JDBC and JPA. Spring Data also has rich support for NoSQL databases, and for MongoDB in particular.


Cloud Foundry --> SpringSource --> VMWare

Chris Richardson was the founder of Cloud Foundry, which was acquired by SpringSource. SpringSource was subsequently acquired by VMware. A case of a big fish consuming a little fish, and then being consumed by an even bigger fish.

Spring Data support for MongoDB

Map from Java objects to Mongo documents using annotations.
You can also use relational/JPA and Mongo data sources through one access layer via MongoTemplate. This was referred to as "cross-store persistence".

Implement the MongoConverter interface for your domain objects to read and write Java objects to and from Mongo documents.


The MongoRepository interface defines CRUD methods.

Annotations are used for java to MongoDB document mapping.
@Id, @Indexed, @PersistenceConstructor
@GeoSpatialIndexed

Spring Data has support for QueryDSL, a domain-specific language for database queries. This provides a consistent way to query the underlying databases. Coupled with cross-store persistence, this is a powerful combination that would enable the introduction of MongoDB without disrupting DAL code.

QueryDSL query types are generated from the domain model classes and produce type-safe, composable queries.


So, to sum up, cross-store persistence allows both JPA/relational _and_ document (MongoDB) data stores.

The presenter then demonstrated a Grails/MongoDB sample that highlighted the ease with which the domain and data access models were blended.

2PM is another tough choice, between "Scaling and Sharding" and "Geospatial Indexing..."

Scaling and Sharding
2pm - 2:45pm
(45m)

Eliot Horowitz
(10gen)



Geospatial Indexing with MongoDB
2pm - 2:45pm
(45m)

Greg Studer
(10gen)

Greg works on various aspects of the MongoDB core server. Prior to 10gen, he completed a PhD at the University of Sussex and Masters at Cornell where he studied multi-agent simulation and computational scaling through assembly. He began his career working on various projects at IBM related to enterprise configuration and modeling, and legacy systems integration.


My Notes:


MongoDB offers a rich set of features for storing and querying geospatial coordinates. Spherical coordinates are the most accurate. Version 1.9 supports multiple locations within a document.
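The accuracy point is easy to see with numbers. This sketch is plain JavaScript and needs no MongoDB. It compares a flat 2d distance, which treats a degree of longitude as the same length everywhere, with a great-circle distance computed by the haversine formula. The two agree at the equator. At 60° north, however, one degree of longitude spans only about half a degree of arc.

```javascript
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Flat (2d) model: degrees of longitude and latitude treated as equal-sized grid units.
function planarDistanceDeg([lng1, lat1], [lng2, lat2]) {
  return Math.hypot(lng2 - lng1, lat2 - lat1);
}

// Spherical model: great-circle distance (haversine), returned in degrees of arc for comparison.
function sphericalDistanceDeg([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return toDeg(2 * Math.asin(Math.sqrt(a)));
}

// Two points one degree of longitude apart, at the equator and at 60 degrees north.
for (const lat of [0, 60]) {
  const a = [0, lat];
  const b = [1, lat];
  console.log(lat, planarDistanceDeg(a, b), sphericalDistanceDeg(a, b).toFixed(3));
}
```

The flat model reports 1 degree in both cases, so near the poles it overstates east-west distances. That is why the spherical operators give better answers for real-world coordinates.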

MongoDB as a data integration layer between Apps in Cloud Foundry
3pm - 3:30pm
(30m)

In this talk we will discuss a new design pattern for building applications that consist of many small apps that work together to appear as one website. Cloud Foundry is a PaaS that supports many languages and frameworks as well as many services and data stores, one of which is MongoDB. We will talk about a new pattern enabled by this architecture where you can write different pieces of your application in multiple different languages or frameworks and use a shared MongoDB instance provided by Cloud Foundry's data service layer as an integration point between all the app parts. Not only can you store data in MongoDB for display on the web pages of your app, but you can use MongoDB as a message queue or logging device between apps as well as a shared data structure container. So with this pattern you could have written your main application UI in Ruby on Rails, but you may have a few web services or applets written in Sinatra or node.js as well as perhaps a Spring Java application doing some heavier number crunching or data processing, and use MongoDB via Cloud Foundry's Services layer as an integration point between all the pieces of this application in order to make it appear as one app to the outside world.

Ezra Zygmuntowicz
(VMware)

My Notes:

This talk was about Cloud Foundry (cloudfoundry.com). What was cool was the concept of a "micro cloud", essentially a cloud on a vm that a developer can use to code and test cloud solutions before committing to a pre-production or production cloud.

There was a brief demo using vmc commands to build cloud instances and push apps to them.

SpringSource Tool Suite (STS) has a plugin for cloud foundry.


Cloud Foundry supports all the popular languages and HTTP servers. The presenter was touting the use of MongoDB as the common "glue", or persistent state manager, for application data shared between disparate, specialized concerns. For example: fast, non-blocking HTTP with nodeJS, coupled with Ruby for site metrics and your language of choice for business domain objects, services, and front-end rendering.

I thought there was going to be more 'meat' provided on this concept as it was the title of the session, but most of the content was a pitch for Cloud Foundry and why you should use VMWare for cloud support.

Rapid Realtime App Development with Node.JS & MongoDB
3:30pm - 4:15pm
(45m)

Jump on board to learn about combining two of the most exciting technologies to quickly build realtime apps yourself. This talk will introduce the popular Node.js library, Mongoose, which is a MongoDB "ORM" for Node.js. First, the speaker will deliver a quick primer on Node.js. Then, he'll walk you through Mongoose's schema api, powerful query builder, middleware capabilities, and exciting plugin ecosystem. Finally, he'll demonstrate some realtime capabilities using Node.js and Mongoose.

Brian Noguchi
(Shortrr.com)

Brian Noguchi is a software engineer in San Francisco. He is the founder of Shortrr, which helps you save time reading the flood of content online. Prior to that, he was the founder and lead engineer of Trendessence, which sold advanced Twitter analytics solutions for the enterprise. He is currently focused on designing and building a realtime framework on top of Node.js that he hopes will do for realtime app development what Rails and Django did for web app development. He is a core contributor to Mongoose, the popular MongoDB package for Node.js, and he is also the author of several other popular Node.js packages. You can find his open source work on github.

My Notes


This was the most crowded session by far. It was mainly about Mongoose, a MongoDB library for nodeJS.


http://blog.learnboost.com/blog/mongoose/


There was a brief introduction to nodeJS. As most in attendance were already familiar with nodeJS, not a lot of time was spent here. I suggest you look at the official web sites and blogs for details:

http://nodejs.org/


Who uses nodeJS: Yammer, GitHub, Netflix, LearnBoost


nodeJS = realtime, evented, non-blocking I/O


nodeJS is fast, even though it's JavaScript (server-side JS).

One advantage is that code can be shared between the server and the browser. Not sure I get that, though. Aren't those very different kinds of concerns? Can UI developers use server-based algorithms?
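One common answer to that question is shared validation or formatting logic. The same function gives instant feedback in a browser form and is re-checked on the server, so the two sets of rules can't drift apart. This is a minimal sketch: `isValidUsername` and its rule are hypothetical, and the export check lets the file load both under Node (CommonJS) and through a plain `<script>` tag.

```javascript
// Shared rule: runs in the browser for instant form feedback and on the server
// as the authoritative check. The username rule itself is a made-up example.
function isValidUsername(name) {
  return typeof name === "string" && /^[a-z0-9_]{3,16}$/i.test(name);
}

// Export for Node (CommonJS) when `module` exists; otherwise expose it on the
// browser's global object so page scripts can call it.
if (typeof module !== "undefined" && module.exports) {
  module.exports = { isValidUsername };
} else {
  globalThis.isValidUsername = isValidUsername;
}
```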


nodeJS is an active community

lots of packages supported


recommended packages: express, jade, socket.io, mongoose


express: like Sinatra (Ruby)

jade=template engine http://jade-lang.com/

socket.io: as the name implies, realtime sockets http://socket.io/

mongoose: a package supporting MongoDB, developed by LearnBoost http://blog.learnboost.com/blog/mongoose/


mongoose gives casting, validation, and CRUD methods

mongoose.Schema


CRUD:

C: .save

R: .find, .findall, .findOne

U: .find, then .save

D: .remove


schema types

string

number: increment and decrement, both atomic

objectid

arrays: JS array syntax, push, pop

comments: [CommentSchema] for embedded documents!

defaults

validations: required: true, enums

custom validations by passing closures/functions (e.g., a myTitleValidate function to validate the title)
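To make the schema options concrete, the comment at the top shows roughly how such a schema is declared in Mongoose. Check the Mongoose docs for the exact options in your version. The runnable part is a hypothetical, dependency-free validator: `applySchema`, `postRules` and the length rule in `myTitleValidate` are all made up. It applies defaults, `required`, `enum` and a custom validator in the same spirit, but it does not do Mongoose's type casting.

```javascript
// Roughly what the talk showed in Mongoose (needs the mongoose package; shown for comparison only):
//   const PostSchema = new mongoose.Schema({
//     title:    { type: String, required: true, validate: [myTitleValidate, "title too long"] },
//     status:   { type: String, enum: ["draft", "published"], default: "draft" },
//     comments: [CommentSchema]
//   });

function myTitleValidate(value) {
  return value.length <= 80; // hypothetical rule
}

const postRules = {
  title: { type: "string", required: true, validate: [myTitleValidate, "title too long"] },
  status: { type: "string", enum: ["draft", "published"], default: "draft" },
};

// Apply defaults, then check required, type, enum, and custom validators field by field.
function applySchema(rules, input) {
  const doc = { ...input };
  const errors = [];
  for (const [field, rule] of Object.entries(rules)) {
    if (doc[field] === undefined && "default" in rule) doc[field] = rule.default;
    const value = doc[field];
    if (value === undefined) {
      if (rule.required) errors.push(`${field} is required`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`${field} must be a ${rule.type}`);
      continue;
    }
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${field} must be one of ${rule.enum.join(", ")}`);
    }
    if (rule.validate && !rule.validate[0](value)) errors.push(rule.validate[1]);
  }
  return { doc, errors };
}
```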


indexes

index:true or unique:true


nested documents


virtuals


schema.virtual('name.full')


return first+last
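A virtual is a computed field: it is derived when read and never stored. The comment shows the Mongoose form (`schema.virtual(...).get(...)`). The runnable part is a plain-JavaScript equivalent built with `Object.defineProperty`. It uses a hypothetical `makePerson` helper and an optional setter that splits a full name back into its parts.

```javascript
// In Mongoose (needs the mongoose package):
//   PersonSchema.virtual("name.full").get(function () {
//     return this.name.first + " " + this.name.last;
//   });
//
// Plain-JavaScript equivalent: a computed property derived on read, never stored.
function makePerson(first, last) {
  const name = { first, last };
  Object.defineProperty(name, "full", {
    get() {
      return `${this.first} ${this.last}`;
    },
    // Optional setter: split a full name back into first and last.
    set(value) {
      const [f, ...rest] = value.split(" ");
      this.first = f;
      this.last = rest.join(" ");
    },
    enumerable: false, // not enumerable, so JSON.stringify won't persist it
  });
  return { name };
}
```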


advanced querying


mongoose query builder: chained .where() calls


namedScopes


scope.name.query


dynamic named scopes define functions that accept parameters



middleware


pre/post event methods run before and after a save; pre hooks are passed a next callback

you need to call next() to hand control to the next hook (and eventually to the save itself)
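The `next()` rule is easiest to see in a tiny hook runner. This is a hypothetical sketch of the idea: `MiniModel` is not Mongoose's implementation, and it models only pre-save hooks. Each hook must call `next()` to pass control on. An error passed to `next` aborts the save, and a hook that never calls `next` leaves the save hanging.

```javascript
// Hypothetical mini model with Mongoose-style pre-save hooks.
// In Mongoose itself a hook looks like:
//   schema.pre("save", function (next) { this.updatedAt = new Date(); next(); });
class MiniModel {
  constructor(doc) {
    this.doc = doc;
    this.preSave = [];
    this.saved = []; // stand-in for what reached the database
  }

  pre(hook) {
    this.preSave.push(hook);
  }

  // Run hooks in order. Each receives `next`. Calling next() runs the following hook
  // (and finally the write); calling next(err) stops and reports the error.
  save(callback) {
    let i = 0;
    const next = (err) => {
      if (err) return callback(err);
      const hook = this.preSave[i++];
      if (hook) return hook.call(this.doc, next);
      this.saved.push({ ...this.doc }); // the actual "save"
      callback(null, this.doc);
    };
    next();
  }
}
```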



mongoose.connectSet() for replica sets


.explain is supported


Question from the audience: is geospatial supported? Yes, use the chained query builder.


plugins

mongoose-types (email, for email type validations)

mongoose-auth offers Facebook, Twitter, GitHub, etc. auth login capabilities

mongoose-solr is in the works


schema.plugin(aPlugin, optionsHash)


Mongoose code is available on GitHub (LearnBoost).


See the mongoose blog for more details: http://blog.learnboost.com/blog/mongoose/


How do you unit test? Use the expresso package: http://visionmedia.github.com/expresso/


Application Design with MongoDB: 4 Examples

4:30pm - 5:15pm
(45m)

Kyle Banker
(10gen)

Kyle maintains the MongoDB Ruby Driver and supports the Ruby developer community. Previously, Kyle built e-commerce and social networking applications, and he once worked as teacher of languages and literature. Kyle has presented MongoDB in numerous forums, and he's the author of the forthcoming book MongoDB in Action.


last talk of the day...


Quite honestly, this presenter was overly animated and went through his presentation deck at lightning speed, so I didn't really glean a lot from this session. I fear whatever notes are here are of little value.

The first sample problem was how to represent a category hierarchy in a document database. The presenter explained the problem and how parent-child relationships need to be defined and modified. Then he presented some methods for querying ancestors and descendants and for keeping hierarchies up to date. He showed some 'tricks' like using:


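the positional $ operator, which updates the matched element of an array; with the multi flag set, the update applies to every matching document in the collection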
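The hierarchy problem can be sketched with each category storing its full list of ancestors. That turns breadcrumbs and descendant lookups into single queries, at the cost of updating the copied names when a category is renamed, which is where the positional operator comes in. This is a minimal in-memory sketch with hypothetical category data. The equivalent shell queries are in the comments.

```javascript
// Hypothetical categories, each storing its full ancestor list as { _id, name } entries.
const categories = [
  { _id: 1, name: "Home", parent: null, ancestors: [] },
  { _id: 2, name: "Outdoors", parent: 1, ancestors: [{ _id: 1, name: "Home" }] },
  { _id: 3, name: "Gardening", parent: 2, ancestors: [{ _id: 1, name: "Home" }, { _id: 2, name: "Outdoors" }] },
  { _id: 4, name: "Seeds", parent: 3, ancestors: [{ _id: 1, name: "Home" }, { _id: 2, name: "Outdoors" }, { _id: 3, name: "Gardening" }] },
];

// Ancestors come straight from the document: no recursive queries needed.
function breadcrumb(id) {
  const cat = categories.find((c) => c._id === id);
  return [...cat.ancestors.map((a) => a.name), cat.name].join(" > ");
}

// Descendants. Shell equivalent: db.categories.find({ "ancestors._id": id })
function descendants(id) {
  return categories.filter((c) => c.ancestors.some((a) => a._id === id)).map((c) => c.name);
}

// Renaming must also fix the copies in every descendant's ancestors array.
// Shell equivalent, using the positional $ operator with upsert = false, multi = true:
//   db.categories.update({ _id: id }, { $set: { name: newName } });
//   db.categories.update({ "ancestors._id": id },
//                        { $set: { "ancestors.$.name": newName } }, false, true);
function rename(id, newName) {
  for (const c of categories) {
    if (c._id === id) c.name = newName;
    const match = c.ancestors.find((a) => a._id === id); // the element $ refers to
    if (match) match.name = newName;
  }
}
```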


The next sample concerned analytics, with his suggestions to pre-aggregate your data and to keep ephemeral data separate: age it out, in a separate store.

Collections: days, months
Scope based on day and month durations
Use a composite index, which in this example is uri + date
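Pre-aggregation means updating a summary document on every event instead of storing raw events and counting them later. This is a minimal sketch, assuming one document per URI per day with hypothetical `total` and `hours` counters. The shell upsert with `$inc` is shown in a comment, and a `Map` keyed on uri + date stands in for the collection and its composite index. A months collection would follow the same pattern, with days as the counter keys.

```javascript
// One document per URI per day (hypothetical shape): { uri, date, total, hours: { "13": n, ... } }.
// In the shell, recording a hit at 13:xx is one upsert (third argument true):
//   db.daily.update({ uri: "/home", date: "2011-05-24" },
//                   { $inc: { total: 1, "hours.13": 1 } }, true);
// backed by the composite index: db.daily.ensureIndex({ uri: 1, date: 1 })
const daily = new Map(); // key "uri|date" stands in for the collection and its index

// Build the $inc update document for a hit at the given time (UTC hours).
function incUpdate(when) {
  return { $inc: { total: 1, [`hours.${when.getUTCHours()}`]: 1 } };
}

function recordHit(uri, when) {
  const day = when.toISOString().slice(0, 10);
  const key = `${uri}|${day}`;
  if (!daily.has(key)) daily.set(key, { uri, date: day, total: 0, hours: {} }); // the "upsert"
  const doc = daily.get(key);
  // Apply the $inc in memory, following dotted paths such as "hours.13".
  for (const [path, n] of Object.entries(incUpdate(when).$inc)) {
    const parts = path.split(".");
    let target = doc;
    for (const p of parts.slice(0, -1)) target = target[p];
    const last = parts[parts.length - 1];
    target[last] = (target[last] || 0) + n;
  }
  return doc;
}
```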


Next was how to store binary data.
BSON has a binary (BinData) type.
Also, GridFS is driver-level support for large files; it splits files into 256KB chunks.
These can be sharded as binary data too.
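GridFS splits a file into chunk documents in an `fs.chunks` collection. Each chunk is tagged with its file's id and a sequence number `n`, while the file's metadata goes in `fs.files`. This sketch shows only the chunking arithmetic, in plain Node.js, using the 256KB size mentioned in the talk. A real driver also writes the `fs.files` document and stores the chunks in the database.

```javascript
const CHUNK_SIZE = 256 * 1024; // 256KB, the chunk size mentioned in the talk

// Split a Buffer into GridFS-style chunk documents: { files_id, n, data }.
function toChunks(filesId, buf, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let n = 0, off = 0; off < buf.length; n++, off += chunkSize) {
    chunks.push({ files_id: filesId, n, data: buf.subarray(off, off + chunkSize) });
  }
  return chunks;
}

// Reassemble by sorting on n, the way a reader walks the chunks in order.
function fromChunks(chunks) {
  return Buffer.concat([...chunks].sort((a, b) => a.n - b.n).map((c) => c.data));
}
```

A 600KB file therefore becomes three chunks: two full 256KB chunks and an 88KB remainder.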


Last but not least was a transactional model. The argument was: are transactions actually ever required? Since MongoDB does not support them, how do you handle situations where you might need them? The answer is to use a compensation model.
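A compensation model replaces a rollback with explicit undo steps. Every action is paired with a compensating action, and if a later step fails, the completed steps are undone in reverse order. This is a minimal in-memory sketch; the `runWithCompensation` helper and the account data are hypothetical. Unlike a real transaction, other readers can see the intermediate state, and a production version would also need to persist its progress so compensation survives a crash.

```javascript
// Run steps in order; on failure, undo the completed ones in reverse order.
function runWithCompensation(steps) {
  const done = [];
  try {
    for (const step of steps) {
      step.action();
      done.push(step);
    }
    return { ok: true };
  } catch (err) {
    for (const step of done.reverse()) step.compensate();
    return { ok: false, error: err.message };
  }
}

// Example: move money between two balances kept in separate documents.
const accounts = { alice: 100, bob: 20 };

function transfer(from, to, amount, failCredit = false) {
  return runWithCompensation([
    {
      action: () => {
        if (accounts[from] < amount) throw new Error("insufficient funds");
        accounts[from] -= amount; // in MongoDB: a single-document $inc, which is atomic
      },
      compensate: () => {
        accounts[from] += amount;
      },
    },
    {
      action: () => {
        if (failCredit) throw new Error("credit failed"); // simulated failure
        accounts[to] += amount;
      },
      compensate: () => {
        accounts[to] -= amount;
      },
    },
  ]);
}
```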