May 1, 2012

CQRS and Hook.io

Hook.io is a popular library for node.js that lets you build a distributed cluster of applications (hooks) which communicate by sending events to each other. Command and query responsibility segregation (CQRS) is a way to design your server so that the write and read sides are completely separated. Over the last few weeks I’ve found that the two match each other perfectly, allowing you to build decoupled, scalable and fault-tolerant applications. I expect the reader to be familiar with both hook.io and CQRS. If you’re not, the links above are a great place to start.

A look at the architecture

To build your application, you split it into multiple pieces - hooks. Each hook is a process with a single responsibility (the pattern is known in OOP as the single responsibility principle, although it suits the process level just as well). Each hook does its job and emits an event if something happens. Another hook can listen for that event and act on it. It’s basically the observer pattern, and that’s the way hooks talk to each other (inter-process communication is implemented in hook.io and is based on dnode).
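A minimal sketch of that observer-style exchange. It deliberately uses Node’s built-in EventEmitter as a stand-in for the hook.io message bus, so everything stays in one process; the names `bus`, `register` and `user::registered` are made up for illustration and are not hook.io API.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the hook.io cluster: one shared in-process bus (assumption).
const bus = new EventEmitter();
const sentMails = [];

// "mailer" hook: listens for an event and acts on it.
bus.on('user::registered', (user) => {
  sentMails.push(`Welcome, ${user.name}!`);
});

// "signup" hook: does its job and emits an event when something happens.
function register(name) {
  const user = { name };
  bus.emit('user::registered', user);
  return user;
}

register('Alice');
console.log(sentMails); // [ 'Welcome, Alice!' ]
```

Neither side knows about the other; they only share the event name, which is what lets the real hooks live in separate processes.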

In the case of a CQRS system, the application is split into three main hooks:

Architecture diagram

Fig.1: Green boxes (webserver / domain / denormalizer) represent separate processes (hooks). Each of them works separately and sends events (green arrows) to other components. As you can see, events and read view data are stored separately.

Domain

The domain is responsible for encapsulating the business logic. It’s the write side of the application stack.

Domain architecture

Fig.2: Domain architecture. The domain has a single command bus and multiple handlers (one per command). Each handler can work with multiple aggregates and orchestrates processing of the command.

The domain listens for a specific set of hook events, each of which triggers one command. The command is sent to a command bus, where the appropriate handler is chosen. The handler executes the command by calling the public API of aggregates (the business units in a CQRS system). Each time a command results in a change of the system state, the aggregate emits an event (this pattern is called event sourcing), which is stored in the event storage and emitted from the domain hook for interested parties.
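The flow above could be sketched roughly like this. It is an illustration under assumptions, not the node-cqrs implementation: the event storage is a plain array, `domainHook` is an EventEmitter standing in for the domain hook, and the `Account` aggregate, the `depositMoney` command and the `moneyDeposited` event are hypothetical names.

```javascript
const { EventEmitter } = require('events');

const eventStore = [];                  // append-only event storage (in-memory stand-in)
const domainHook = new EventEmitter();  // stand-in for events emitted by the domain hook

// Aggregate: public API methods record an event for every state change.
class Account {
  constructor(id) {
    this.id = id;
    this.balance = 0;
    this.pendingEvents = [];
  }

  deposit(amount) {
    if (amount <= 0) throw new Error('Amount must be positive');
    this.balance += amount;
    this.pendingEvents.push({ type: 'moneyDeposited', accountId: this.id, amount });
  }

  // Hand over the events recorded since the last call.
  takePendingEvents() {
    const events = this.pendingEvents;
    this.pendingEvents = [];
    return events;
  }
}

const accounts = new Map();

// Command bus: exactly one handler per command.
const commandBus = new Map();
commandBus.set('depositMoney', (command) => {
  if (!accounts.has(command.accountId)) {
    accounts.set(command.accountId, new Account(command.accountId));
  }
  const account = accounts.get(command.accountId);
  account.deposit(command.amount);
  return account.takePendingEvents();
});

function dispatch(name, command) {
  const handler = commandBus.get(name);
  if (!handler) throw new Error(`No handler for command "${name}"`);
  for (const event of handler(command)) {
    eventStore.push(event);              // store the event...
    domainHook.emit(event.type, event);  // ...then emit it for interested parties
  }
}

const received = [];
domainHook.on('moneyDeposited', (event) => received.push(event));
dispatch('depositMoney', { accountId: 'acc-1', amount: 50 });
console.log(eventStore.length, received.length); // 1 1
```

Storing the event before emitting it means a listener never sees an event that is missing from the storage.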

Denormalizer

Data in the event storage would be hard to use directly to build the views. Transforming the events into “read-ready” data (the data displayed in the UI - reports, forms, …) is the job of the denormalizer. Each time a new event arrives, it takes the last snapshot, replays the new event on it and stores the new snapshot (there can even be multiple snapshots of different versions being built in parallel).

Denormalizer architecture

Fig.3: The denormalization process can be done either synchronously (directly on new event arrival) or asynchronously (sitting behind a queue, triggered once per X events or once in a given time) if you are OK with the data being stale for a reasonable amount of time. Both approaches can be used for different views.
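The synchronous variant (take the last snapshot, apply the new event, store the result) might look like the sketch below. The `snapshots` map stands in for the read database, and the event types and the balance view are assumptions made for the example.

```javascript
// Read storage: latest snapshot per account (in-memory stand-in for a database).
const snapshots = new Map();

// Pure projection: last snapshot + one event -> new snapshot.
function applyEvent(snapshot, event) {
  switch (event.type) {
    case 'moneyDeposited':
      return { ...snapshot, balance: snapshot.balance + event.amount };
    case 'moneyWithdrawn':
      return { ...snapshot, balance: snapshot.balance - event.amount };
    default:
      return snapshot; // events this view does not care about
  }
}

// Called each time a new event arrives.
function denormalize(event) {
  const last = snapshots.get(event.accountId) || { accountId: event.accountId, balance: 0 };
  const next = applyEvent(last, event);
  snapshots.set(event.accountId, next);
  return next;
}

denormalize({ type: 'moneyDeposited', accountId: 'acc-1', amount: 100 });
denormalize({ type: 'moneyWithdrawn', accountId: 'acc-1', amount: 30 });
console.log(snapshots.get('acc-1')); // { accountId: 'acc-1', balance: 70 }
```

For the asynchronous variant the same `denormalize` function would simply be called from a queue consumer instead of directly from the event listener.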

Webserver

The webserver’s responsibility is to serve requests. It doesn’t matter which framework, protocol or data type is used. All the webserver does is process the incoming request and either build an event carrying a command for the domain (in general, when some data needs to be saved) or query the read model (when some data needs to be loaded from the server). Notice the data was already denormalized for the read side, so loading it from the webserver is trivial and very fast. It is mostly a simple query by ID to the read data storage.
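A framework-agnostic sketch of that split. The shape of the `request` object, the `command::` event prefix and the in-memory `readStore` are all assumptions for illustration; a real webserver hook would emit through hook.io and query a real read database.

```javascript
const { EventEmitter } = require('events');

const hook = new EventEmitter(); // stand-in for the webserver hook (assumption)
const readStore = new Map([['acc-1', { accountId: 'acc-1', balance: 70 }]]);

// Writes become command events for the domain, reads go to the read model.
function handleRequest(request) {
  if (request.method === 'GET') {
    const view = readStore.get(request.id); // simple lookup by ID
    return view ? { status: 200, body: view } : { status: 404 };
  }
  hook.emit(`command::${request.command}`, request.payload);
  return { status: 202 }; // accepted, the domain processes it on its own
}

const commands = [];
hook.on('command::depositMoney', (payload) => commands.push(payload));

const writeResponse = handleRequest({
  method: 'POST',
  command: 'depositMoney',
  payload: { accountId: 'acc-1', amount: 10 },
});
const readResponse = handleRequest({ method: 'GET', id: 'acc-1' });
console.log(writeResponse.status, readResponse.body.balance); // 202 70
```

The read still returns 70 right after the deposit command, which is the staleness the denormalizer section talks about.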

Why CQRS and hook.io?

Scalability

In the given configuration, you are able to scale each system component separately. You can choose a different strategy for each, or use different databases for your events and for your read data models. With hook.io you can easily distribute the components across your server infrastructure.

Fault tolerance

With the recent changes there is no master hook, so the system keeps running if any hook fails. On top of that, with the autoheal feature the system can try to heal itself by spawning a new process after a failure. Because hooks are very lightweight processes, they can be spawned very quickly, keeping your system up and running with minimal downtime.

Decoupling

You get a well-decoupled architecture where your delivery mechanism (webserver) and data persistence (denormalizer and read model) are decoupled from your business logic (domain). The biggest benefit in my eyes is that you can focus on a single thing in a given system component.

Easy webserver switch

Because the webserver’s only responsibility is to handle requests, you would have no problem swapping it for another one. It also makes sense to have multiple webservers side by side. One can serve the HTML pages while another focuses on the JSON API (if the server for your HTML pages goes down, your API users wouldn’t notice at all).

Ability to change read data

It happens quite often these days that business requirements change significantly over time. The data your users wanted to see might no longer be relevant, so you have to adapt as well. Event sourcing allows you to drop the read data completely and reconfigure your denormalizer to build different read data (as long as the necessary information is contained in your events). This means you can react quickly and apply the changes even to the past.
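As a small illustration of rebuilding read data, the sketch below replays one event history through two different projections. The events, the `channel` field and both views are invented for the example.

```javascript
const history = [
  { type: 'moneyDeposited', accountId: 'acc-1', amount: 100, channel: 'cash' },
  { type: 'moneyWithdrawn', accountId: 'acc-1', amount: 30, channel: 'transfer' },
  { type: 'moneyDeposited', accountId: 'acc-1', amount: 20, channel: 'transfer' },
];

// Rebuild a read model from scratch by replaying the whole event history.
function rebuild(events, initial, project) {
  return events.reduce(project, initial);
}

// Original view: the current balance.
const balanceView = rebuild(history, { balance: 0 }, (view, event) => ({
  balance: view.balance + (event.type === 'moneyDeposited' ? event.amount : -event.amount),
}));

// New requirement: number of operations per channel, past ones included.
const channelView = rebuild(history, {}, (view, event) => ({
  ...view,
  [event.channel]: (view[event.channel] || 0) + 1,
}));

console.log(balanceView); // { balance: 90 }
console.log(channelView); // { cash: 1, transfer: 2 }
```

The second view only works because the events already carried the channel, which is the "as long as the necessary information is contained in your events" condition.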

Conclusion

I hope I’ve listed a reasonable set of arguments for why it might be worth using CQRS with hook.io. I am not saying it’s a silver bullet for everything, but it might suit many systems well.

The node cqrs project (v0.5.1) is still very much under development and would probably be considered naive by the C# or Java enterprise guys, where CQRS is used more often, but it’s lightweight and already supports the described architecture. I am working on another set of features, and if someone is interested in more details about the implementation with hook.io, let me know.

Mar 3, 2012

OSX faster terminal startup

I was wondering whether there is a way to speed up terminal startup on my OSX. It’s a task I do many times a day and each second matters. Especially if the fix is as easy as

sudo rm -f /private/var/log/asl/*

The command deletes the system log files (everything under /private/var/log/asl), which makes my terminal startup almost instant. No delay anymore. yAAy!

Jan 18, 2012

Javascript and CQRS

CQRS is a system architecture coming from the enterprise world, mostly from Java and C#. It builds on event sourcing and domain-driven design. You can find pretty detailed descriptions of the concepts online, but when it comes to CQRS and Javascript, there is not much written.

I started with a CQRS implementation on the server side. I chose node.js and created a simple yet working basic implementation: https://github.com/petrjanda/node-cqrs. It might come in handy if you are thinking about a full-stack Javascript system, although the following architecture is backend-language agnostic.

CQRS with frontend MVC

Let’s go straight to the point and take a look at the picture, which illustrates the system architecture divided into three major pieces - domain, views and frontend client.



In general this is no different from other CQRS systems. There are a few hints which might be useful, especially when you have a rich frontend application in Javascript:

  • Keep your business logic in the domain and the view-specific logic in the frontend app. Actions like filtering, ordering data or switching between different view perspectives belong to the frontend. Call your backend with a command at the moment the user performs an action you wanna keep track of.
  • Build frontend models around the view DTOs. This lets you set up a system where data from the view database gets mapped to DTOs, travels to your frontend and is then automatically parsed into model instances.
  • JSON is a great choice for DTOs and can be used for your view data as well. This might lead to a situation where the read model can be removed completely and view data served directly as DTOs.
  • Execute your business logic on the backend and use the query side to load the updates you need afterwards. Remember, views might be a bit behind (they are built asynchronously). To deal with this, you can build an event handler which sends a notification to the frontend through a websocket when the data is ready.
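The second and third hints could be sketched as follows: a frontend model that is built straight from a JSON DTO and keeps only view-specific logic. `AccountModel`, its fields and the EUR formatting are hypothetical and not tied to any particular MVC framework.

```javascript
// Frontend model built around the view DTO.
class AccountModel {
  constructor({ id, owner, balance }) {
    this.id = id;
    this.owner = owner;
    this.balance = balance;
  }

  // View-specific logic stays on the frontend.
  get formattedBalance() {
    return `${this.balance.toFixed(2)} EUR`;
  }

  // Parse a JSON DTO into a model instance.
  static fromDTO(json) {
    return new AccountModel(JSON.parse(json));
  }
}

// What the query side might send over the wire.
const dto = '{"id":"acc-1","owner":"Alice","balance":70}';
const account = AccountModel.fromDTO(dto);
console.log(account.formattedBalance); // 70.00 EUR
```

Because the DTO already has the shape of the view, the mapping is a one-liner and no business rules leak into the client.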

Conclusion

CQRS systems are fundamentally different from traditional data-driven systems and offer an alternative to them. CQRS works quite well with frontend Javascript MVC, especially when you wanna keep the larger part of your business logic on the server side. It’s not a silver bullet, but it has a lot of potential.

Compared to a standard CRUD-oriented architecture, this has a couple of benefits:

  • clear separation of the read and write parts of the system, lowering coupling and complexity
  • higher scalability, because the read and write sides can scale differently
  • there is a single behavior-oriented business logic component inside your domain
  • an advanced binding mechanism can transfer your data between the view database and user interface views
  • you can easily divide work and outsource components or split work between multiple teams

Besides the advantages there are some cons:

  • your business logic lives mostly on the server, which might not be desired for some applications
  • it’s not a widely used architecture, so information, examples and other resources are scarce

Jan 8, 2012

Behavior oriented systems

Every day, new server applications are born and I think I can guess what most of them look like. They have multiple resources which represent the object structure of the domain, database tables to store information about them, and a lot of SQL manipulation of the data, using SQL directly or wrapped by an ORM. The server system usually exposes an API, which the client system calls to manipulate that data. Such a call usually requests a set of resources to be loaded or modifies some of the resources.

This architecture might be perfectly valid for many situations, although I’m starting to think it’s used in some cases where it doesn’t fit so well. My question would then be: did we make a rational decision to use such an architecture?

We are data (structure) driven

Ever since I started programming server systems I’ve been told, shown and taught that this is a perfectly viable architecture. In the last few weeks I’ve started thinking: why is that so? Why do we consider such an architecture the one we should implement in our systems?

I would look for the answer a bit in history. In the early 90’s.

The most popular database systems were RDBMS (relational database management systems), and they are still very popular. The core idea is to persist data in tables and link them together with (primary and foreign) keys. Common operations are selecting data from one or more tables and inserting, updating or deleting rows in one of them.

I think the design of RDBMS drives the architecture of most web systems. When you take a look at the design of the HTTP protocol, you will see this reflected in the form of the get / post / put / delete methods, which are designed to perform such actions over resources (REST). To go further, in most systems this resource is persisted in a relational database table. When you look around at frameworks, database systems, protocols and other parts of the ecosystem, you will notice the influence of RDBMS is everywhere. The web is built to wrap around RDBMS resources (=tables).

Example: You are asked to build a system for a bank. This system has Accounts. You model an account as a database table storing meta information and the current balance. Each time money is withdrawn from or deposited to an account, you update the balance. Now someone will ask you: “I wanna know what transactions happened on my account and whether each was a cash operation or a bank transfer”. Well, you add a Transaction model which is gonna store that. Let me stop here.

Why do we model the transaction in the same way as the account? Is the transaction a real-world object in the same way the account is? I would answer no, it’s just an operation on the bank account. Then why store it the same way as the Account?

Such a system architecture is data driven or even storage driven. I think it’s all oriented around the problem “how is my data stored” instead of “how do my objects behave”. You define your objects around the data which needs to be stored and you are no longer focused on modeling objects and their behavior.

Behavior driven systems

Let’s think about our system as behavior driven. Consider that your system has objects which are able to take commands. Execution of such a command depends on the object state and results in an exception if it fails or an event if it succeeds. The event carries all the information about the command. We store those events = we store the original intention. The storage then looks like a very long ordered list of events. These events are never deleted (so you never lose any information), and when you wanna correct a mistake, you add a new event. This technique is called event sourcing.

Example: In our bank system, the Account object will take the commands withdraw and deposit. The withdraw command will first check if there is enough money (check state) and throw an error if there is not. The event then stores the details: what action happened, what amount of money and the cash / transfer information.

Going back, in our data driven system we stored a snapshot of the object data in the table. Each time something happened, we updated the snapshot. With our behavior driven system, we have just a list of events. What now, if we wanna know the state of the object? Say we wanna know the current balance of our account.

The answer is simple: create a new instance of your object and replay all the events which happened. Because each event carries the necessary information, if you carefully apply them in the right order, you get exactly the same state as you had in your data driven system.

Benefits

One huge benefit is that you can build any structural view of your data, so you can even ask your system different questions. You can change the way events are applied to your object and get a different kind of output. You can do that because you have captured the user’s intention, not a snapshot of the data as you looked at it in the past. You can ask new questions, and you can even ask them about the past!

What I think is an even bigger benefit is what your domain object looks like. As a bonus and a result of event sourcing, your object consists of commands and event handlers. It’s no longer oriented around how your data is structured; it now cares about behavior. A command executes the behavior but doesn’t mutate the object state, it just emits an event or throws an exception. The event gets applied back to the object separately in a handler. The handler updates the object state and doesn’t do any conditional logic. It just applies what happened in the past, so there is no place for decisions anymore. You now have behavior and state mutation separated. That’s a huge gain.
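Here is a minimal sketch of the bank Account written that way, with commands that only decide, handlers that only mutate, and state rebuilt by replaying events. The method names (`emit`, `apply`, `fromHistory`) and event shapes are my own choices for the illustration, not an excerpt from node-cqrs.

```javascript
class Account {
  constructor() {
    this.balance = 0;
    this.events = []; // new events produced by commands on this instance
  }

  // Commands: decide, based on current state, whether something may happen.
  // They never touch state directly; they emit an event or throw.
  deposit(amount, channel) {
    if (amount <= 0) throw new Error('Amount must be positive');
    this.emit({ type: 'deposited', amount, channel });
  }

  withdraw(amount, channel) {
    if (amount > this.balance) throw new Error('Not enough money');
    this.emit({ type: 'withdrawn', amount, channel });
  }

  // Record the event and apply it through the matching handler.
  emit(event) {
    this.events.push(event);
    this.apply(event);
  }

  // Event handlers: mutate state, no decisions.
  apply(event) {
    Account.handlers[event.type](this, event);
  }

  // Rebuild state by replaying stored events in order.
  static fromHistory(history) {
    const account = new Account();
    history.forEach((event) => account.apply(event));
    return account;
  }
}

Account.handlers = {
  deposited: (account, event) => { account.balance += event.amount; },
  withdrawn: (account, event) => { account.balance -= event.amount; },
};

const account = new Account();
account.deposit(100, 'cash');
account.withdraw(30, 'transfer');

const replayed = Account.fromHistory(account.events);
console.log(account.balance, replayed.balance); // 70 70
```

Note that `fromHistory` goes through `apply` only, so replaying never re-runs the checks in the commands; the past is applied as it happened.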

You might notice that I didn’t talk about the storage in a behavior driven system at all. I did that on purpose, because it doesn’t matter. All your storage has to do is store a list of events. You can use SQL or key-value systems, or you can even build your own. The storage is very simple and doesn’t drive your system at all! It’s just an implementation detail.

Conclusion

I don’t wanna end this by saying that behavior driven systems are all good and structural systems are all evil. It’s not a silver bullet and it’s not black and white. I would look at it as one option for modeling the system you wanna build.

I am currently busy with an event sourcing implementation in node.js and also working on a somewhat wider system, known as CQRS. Take a look at my implementation on github:

https://github.com/petrjanda/node-cqrs

Dec 11, 2011

What is (really) MVC?

If you talk to web application developers these days, you will realize that almost everyone talks about MVC. Everybody uses it and encourages others to use it, but sometimes they also misunderstand what it really is.

MVC is a software architecture pattern which splits an app into three parts: model, view and controller. The initial definition and goal are clear, although the implementation of MVC in various situations might be very different. In the web world there are mostly two main types of applications: server-side and client-side. MVC is very different in each.

Server-side MVC

When I speak about webserver applications in this article, I mean applications running on top of HTTP webservers. Their responsibility is to handle incoming requests and generate a proper response. These systems are generally designed to be stateless (there is of course a way to maintain state through sessions, but that’s an add-on). It’s pretty straightforward to implement server-side MVC.

Server-side MVC

Control flow:

Incoming requests are handled by the webserver and, through a dispatcher / router, land in your controller. The controller calls methods on models to update or get the state of the data (for instance by fetching records from a database). The data from the models is then used to generate the view. The result of view rendering is the content, which is sent back as the response to the request. At the end of request processing the webserver returns to its initial state and waits for another request.

Concepts:

  • Controller is the “driver” of the application
  • Model layer is “fat”; it handles business logic and maintains the state of the application
  • Controller layer should be “thin”: its purpose is only to get model data and render the view
  • View layer defines the template for the response. It shouldn’t be in direct contact with the model layer.
  • View layer is stateless
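The control flow and concepts above can be sketched without any framework. Everything here (`PostModel`, `postView`, `showPost`, the route table and the request shape) is invented for illustration; real frameworks add much more around each piece, and a real view would also escape its output.

```javascript
// Model: "fat", owns data access and business logic.
const PostModel = {
  records: [{ id: 1, title: 'Hello' }, { id: 2, title: 'MVC' }],
  find(id) {
    return this.records.find((post) => post.id === id) || null;
  },
};

// View: stateless template, receives plain data only.
function postView(post) {
  return `<h1>${post.title}</h1>`;
}

// Controller: "thin", fetches model data and renders the view.
function showPost(request) {
  const post = PostModel.find(Number(request.params.id));
  if (!post) return { status: 404, body: 'Not found' };
  return { status: 200, body: postView(post) };
}

// Dispatcher / router: maps an incoming request to a controller action.
const routes = [{ pattern: /^\/posts\/(\d+)$/, action: showPost }];

function dispatch(method, url) {
  for (const route of routes) {
    const match = url.match(route.pattern);
    if (method === 'GET' && match) {
      return route.action({ params: { id: match[1] } });
    }
  }
  return { status: 404, body: 'Not found' };
}

console.log(dispatch('GET', '/posts/2').body); // <h1>MVC</h1>
```

Nothing survives between two `dispatch` calls except the model data, which mirrors the stateless request cycle described above.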

Client-side MVC

Client applications deliver the user interface and are the frontend for server apps. A client application has to maintain state. In different states, the application displays a different set of UI elements to the user and allows them to perform different actions.

Control flow:

The user interacts with the UI and starts actions. Actions are propagated to controllers (in various forms - method calls, events, bindings - depending on the framework you use) and the controller manipulates the model layer. Once the model state is updated, it fires an event and the view layer is updated.

Client-side MVC

In case the client application communicates with a server, the model layer state may be updated by the server (the model layer finished loading data from the server, the server pushed some data through a websocket, etc.). In this situation, the model layer triggers the event to update the view as well.

Concepts:

  • View layer is the “driver”. It takes the user actions and initiates state updates.
  • View layer has state. State is updated based on the user actions (if your app communicates with a backend, then the app model data can change and update the UI state as well).
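A small sketch of the client-side flow, runnable in node.js. The DOM is replaced by a plain `text` property, Node’s EventEmitter plays the role of the model’s event system, and the counter classes are hypothetical rather than taken from any framework.

```javascript
const { EventEmitter } = require('events');

// Model: holds state and fires an event when it changes.
class CounterModel extends EventEmitter {
  constructor() {
    super();
    this.count = 0;
  }
  set(count) {
    this.count = count;
    this.emit('change', this.count);
  }
}

// Controller: translates user actions into model updates.
class CounterController {
  constructor(model) {
    this.model = model;
  }
  increment() {
    this.model.set(this.model.count + 1);
  }
}

// View: the "driver". It takes user actions and re-renders on model events.
class CounterView {
  constructor(model, controller) {
    this.controller = controller;
    this.text = 'Clicked 0 times'; // stands in for the rendered UI
    model.on('change', (count) => this.render(count));
  }
  render(count) {
    this.text = `Clicked ${count} times`;
  }
  click() {
    this.controller.increment(); // simulated user action
  }
}

const model = new CounterModel();
const view = new CounterView(model, new CounterController(model));

view.click();  // user action -> controller -> model -> 'change' -> view
const afterClick = view.text;
model.set(10); // update coming from the server, e.g. a websocket push
console.log(afterClick, '/', view.text); // Clicked 1 times / Clicked 10 times
```

Both paths end in the same `change` event, so the view does not care whether the update came from the user or from the server.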

Conclusion

You can see that server and client MVC have very different concepts. The situation is even more complex, especially on the client side, because there are multiple frameworks which build on MVC but add more layers or adjust MVC itself into something new. Server application frameworks almost exclusively follow the flow described above (Rails, Django, Symfony, Zend, …), so usage is quite clear. In one of the upcoming posts I will cover the architecture behind Backbone.js, Batman.js and Sproutcore, which are client-side frameworks that each adapt MVC in their own, different way.


Blog by Petr Janda. I am a software engineer focused on the web technology stack, able to drive projects and deliver cutting-edge applications. Evangelist and lover of Javascript, Ruby and Agile development.

I am interested in software architecture, distributed systems, machine learning and genetic algorithms.
