jnbds

Last Modified: 2007.08.17

This is the home page for the Java Network Block Device Server, jnbds for short. There is also the SourceForge jnbds project page, which hosts various project-related tools and the source code itself.

Warning: this code is alphaware. Look at it the wrong way and it will corrupt your data and dump core all over your disk. Still, in my limited tests it can read and write data correctly.

Introduction

The following information was taken from the project registration description:

This project will develop a Java Network Block Device server. An NBD server allows a computer to act as a block device to a client (typically Linux) over a network. The client can format the emulated block device as any number of filesystems and use it transparently. The use of Java for an NBD server is appropriate because of Java's cross-platform nature, object-orientedness, threading, network support, and security features.

The goal of this project is to create an open source Java NBD server which:

Anticipated difficulties include understanding the NBD protocol, Java's threading capabilities, and implementing the more exotic backends.

Overview

In essence, jnbds turns a supported Backend into a block device. A "block device" is a device which reads and writes data in "blocks" of 512, 1024, 2048, or more bytes, and is indexed by whole numbers (0, 1, 2, etc). To a Linux client the use of TCP/IP, the NBD protocol, and jnbds is transparent -- it can read and write blocks just like it can with a local hard drive, and therefore it can format the emulated block device using any of its filesystems.

jnbds is composed of three layers:

Server
Handles the NBD protocol. Talks to the RangeHandler.
RangeHandler
Handles the logical naming of blocks and tracks accesses. Talks to the Backend.
Backend
In actuality, a tree of Backends, with each node/Backend having a different capability. The "leaf" Backends turn services (file, ftp, S3) into "block devices". The other Backends add features like transparent compression or caching.

To see how a service can be turned into a block device, consider the FTPBackend. Simplified a great deal, it treats each block in its care as a file named "0", "1", "2", and so on; reads and writes go to these files, and the Linux client is none the wiser. An example assembly of the three layers looks like this:

Server
	RangeHandler
		MigrationBackend
			FileBackend
			S3Backend
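
To make the "blocks as files" idea concrete, here is a minimal sketch of what a leaf Backend could look like, written against a hypothetical, simplified Backend interface (one block per call). The real interface and class names differ; the FTPBackend does essentially the same thing against an FTP server instead of the local filesystem.

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical, simplified Backend interface: one block per call.
    interface Backend {
        byte[] readBlock(long index) throws IOException;
        void writeBlock(long index, byte[] data) throws IOException;
    }

    // A leaf Backend that stores each block as a file named "0", "1", "2", ...
    class FileBlockBackend implements Backend {
        private final File dir;
        private final int blockSize;

        FileBlockBackend(File dir, int blockSize) {
            this.dir = dir;
            this.blockSize = blockSize;
        }

        public byte[] readBlock(long index) throws IOException {
            File f = new File(dir, Long.toString(index));
            byte[] block = new byte[blockSize];
            if (!f.exists()) {
                return block;                 // blocks never written read back as zeros
            }
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            try {
                int off = 0, n;
                while (off < blockSize && (n = raf.read(block, off, blockSize - off)) > 0) {
                    off += n;
                }
            } finally {
                raf.close();
            }
            return block;
        }

        public void writeBlock(long index, byte[] data) throws IOException {
            RandomAccessFile raf = new RandomAccessFile(new File(dir, Long.toString(index)), "rw");
            try {
                raf.write(data);
            } finally {
                raf.close();
            }
        }
    }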

Using jnbds

A quick-and-dirty guide:

  1. Server: create the empty 10 MiB file: dd if=/dev/zero of=nbd_file bs=1024 count=10240
  2. Server: start jnbds: java ReadWriteJNBDS -p 0 -s structure.xml
  3. Linux client: insmod nbd.ko
  4. Linux client: create the nbd node: mknod /dev/nd0 b 43 0
  5. Linux client: get the nbd-client from the SourceForge NBD Project and run configure and make
  6. Linux client: start the nbd-client: nbd-client server-ip port /dev/nd0
  7. Linux client: format the node: mke2fs /dev/nd0
  8. Linux client: mount the node: mount /dev/nd0 /mnt
  9. Enjoy.

License

jnbds is GPL software.

Links

All versions of jnbds can be downloaded from the jnbds Project site, under the "[View ALL Project Files]" link.

The NBD Project (http://sourceforge.net/projects/nbd) has more information about the NBD protocol, plus Unix client software and Unix/Windows server software.

News

2007.08.17

It's been almost two years, but I'm back. The project was never dead, but months would go by without me touching it. Now, however, I'm putting in a little bit of time each day and updating the code. Specifically, I'm trying to adhere to the end-to-end principle and work on improving the quality of the code for one specific use case, rather than just writing lots of "cool code".

The "specific use case" right now is getting jnbds to work with Amazon's Simple Storage Service. S3 acts like a big Map that you can access using HTTP GETs and PUTs. And it is pay-as-you-go. So you could use it as a 64 MB filesystem to store all your writing, or a 64 GB filesystem to store all your media. And if your filesystem supports online reduction/growth then you can change the size whenever you want.

Of course, while S3 support is handled by S3Backend, there is lots of other code too. ZipBackend for transparent compression of blocks. MigrationBackend for transparent local caching of blocks. MemoryBackend for storing blocks in memory. FileBackend for storing blocks on the local disk. RangeHandler for adapting the Backends to the NBD's flat address space. Server for implementing the NBD wire protocol. JUnit tests to make sure all that code works correctly.

The code I mentioned above is gaining features and being simplified at the same time. Spring Framework support has been added, multi-threading is made easier by using the java.util.concurrent package, the State pattern is used to manage the states of the Backends, the Composite pattern is used to turn the Backends into a "tree", the JUnit tests keep uncovering bugs and poor design decisions, etc.

I have lots of things I'd like to do, but I'm really trying to get the code working for the S3 case. There are a lot of people using S3 and I think this project could help make S3 more useful by making it even more transparent. I'll know I'm done when a person's Linux smartphone is able to access a 1 TB filesystem and everyone is like, "Wow..."

2005.10.02

Another two months have passed. I've made a few improvements, like better threading and the ability to read/write the HSQLDB database to the Backend. I've also transcribed my To Do list to the bottom of the page. I have some interesting things in store for jnbds, if I can find the time to work on it after my full-time job. The very first thing done will be a retrofit to support Inversion of Control (probably via Spring). Once that's done, jnbds is halfway to everywhere.

2005.07.30

It's been nearly two months without an update, but I'm still plugging away at jnbds. As usual, I took a month off and let some ideas stew in the back of my mind while I worked on a different project. But now I am back working on jnbds and hope to get the next version out soon.

That "other project", by the way, was my photography website (which is linked off of my resume). I spent quite some time thinking about how to rewrite it using one of the available modern frameworks -- eventually I settled on the Spring Framework. I must say, the dependency injection features of Spring were an eye-opener: I realize now that the structure.xml file I created for jnbds is actually a crude form of dependency injection. The next/next version of jnbds will hopefully be rewritten to use Spring (or perhaps HiveMind) to manage the assembly of the server and its Backends.

However, the version of jnbds I'm currently working on is not so glamorous: making code thread-safe is difficult work, and no one notices when you've done a good job but everyone notices when you've done a bad job. Anyway, I've been using the excellent backport of the java.util.concurrent package I mentioned in my last post and it's worked superbly. I've made the FTPBackend thread-safe and added a "keepalive" thread, and just yesterday made the CachingBackend thread-safe and added a "writeback" thread. Other minor Backends have also been made thread-safe.

Lest I sound like I know what I'm doing, I've realized that the locking I've implemented in the Backends has been too simplistic and, therefore, inefficient. For example, the FTPBackend is basically right (I think), although the keepalive thread is inefficient, but the JBODBackend denies read/write access to all of its Backends whenever one thread is writing to one of them. Obviously that's not going to help scalability. ;) The solution, I believe, is to go read the java.util.concurrent JavaDoc again and pay attention to the parts on Conditions.
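
One way to avoid serializing every request through a single lock (whether this or Conditions ends up in jnbds is the open question above) is to give each child Backend its own ReadWriteLock, so a write to one child only blocks threads touching that same child. A sketch, with hypothetical names, using the simplified Backend interface from the Overview sketch; the Java 5 package names are shown, but the backport's classes mirror them.

    import java.io.IOException;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical JBOD-style Backend: blocks are concatenated across children in
    // fixed-size runs, and each child has its own lock.
    class JBODSketch {
        private final Backend[] children;       // the simplified Backend interface sketched earlier
        private final ReadWriteLock[] locks;
        private final long blocksPerChild;

        JBODSketch(Backend[] children, long blocksPerChild) {
            this.children = children;
            this.blocksPerChild = blocksPerChild;
            this.locks = new ReadWriteLock[children.length];
            for (int i = 0; i < locks.length; i++) {
                locks[i] = new ReentrantReadWriteLock();
            }
        }

        byte[] readBlock(long index) throws IOException {
            int child = (int) (index / blocksPerChild);
            locks[child].readLock().lock();      // many readers of one child may proceed together
            try {
                return children[child].readBlock(index % blocksPerChild);
            } finally {
                locks[child].readLock().unlock();
            }
        }

        void writeBlock(long index, byte[] data) throws IOException {
            int child = (int) (index / blocksPerChild);
            locks[child].writeLock().lock();     // only this child is blocked during the write
            try {
                children[child].writeBlock(index % blocksPerChild, data);
            } finally {
                locks[child].writeLock().unlock();
            }
        }
    }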

I still have no idea how to test thread-safe Backends. Perhaps I can run multiple stress tests simultaneously...

One last thing: I use the State Pattern in the Backends to handle the Closed, Readable, and ReadWritable states the Backends can be in. The problem I have is that the transition to a new state is handled by the current state, which means that if the new state (Readable, for example) isn't initialized properly then the whole Backend is left in an undefined state. The problem is made worse if there are other threads waiting to change the state too. Hmm... I realize I haven't given a very good explanation, but that's mostly because I don't fully understand the problem yet.

2005.06.05

I've been working on four different areas of jnbds since my last update.

First, some actual javadoc. There isn't much, but it's a start. Right now I'm documenting the Backends that I finish making thread-safe.

Second, refactoring the Backends to use "static private" Classes to implement the State Pattern, along with abstract Classes to cut down on the copy-and-pasting of code.

Third, a few junit Tests. Aside from the complexities of writing Tests for Interfaces and abstract Classes, writing good Tests is surprisingly hard. I imagine it would be easier if I had a set of Use Cases. Right now the Tests I've written only check the basic behaviors of the Backends. Still, I found and fixed a handful of bugs just based on those limited Tests. Note: I have no clue how I'm going to test the multi-threaded Backends.

Fourth, I've been spending days and days investigating different ways of making the Backends thread-safe. Before I get into the long explanations, I should mention that I've been using the backport of the java.util.concurrent library that made its way into Java 5. In short, it's brilliant. If you're trying to write multi-threaded code with only the "synchronized", "wait", and "notify" functionality of the pre-Java 5 releases, stop right now and look at the util.concurrent library. The "locks" sub-package alone is worth your attention.

Now that I've made my sales pitch, I can talk about the experiments I tried. First was the attempt to make the Backends implement the Callable Interface -- it's like Runnable except that it returns an Object. Second was the use of queues between Backends so that everything was passed as Command objects (i.e., message passing). Unfortunately, both those experiments failed because they required too much copy-and-pasting of code. What I've finally settled on is to just use the Classes from the util.concurrent.locks package. They're easy to use and understand (well, mostly easy to understand); no need to synchronize on objects and notify() and wait() and crap. I'm pleased.

Of course, actual thread-safe code is hard to write, so I'm taking my time as I convert each Backend. As I mentioned a few paragraphs ago, I have no clue how to test the Backends to make sure they're actually thread-safe. Brute force testing via loops is certainly one option.

2005.05.16

It's been almost two months since my last update -- much has been accomplished.

Most of those months were spent thinking about fundamental issues: how do I design a system which can support advanced capabilities like versioning and caching and yet still be able to use Backends which support very different semantics, like File, FTP, or Email? After much agonizing/thought, I settled on using an embedded DB in the RangeHandler layer: the DB tracks how blocks are assigned to keys, how blocks are aggregated together into blocks of blocks (BoBs, which can be any size, like 64 KB or 4 MB), and in future versions will track versions of blocks and even optimize access to blocks (LRU). It seems like an easy decision (to put the intelligence into the RangeHandler), but the problem is that jnbds is a stack of code, and therefore there isn't really one perfect layer in which to put a given piece of code.
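
To make the "embedded DB in the RangeHandler" idea concrete, here is a sketch of the kind of mapping table it could maintain. The schema, column names, and class name are hypothetical; the real code uses an embedded DB (HSQLDB or Derby) and tracks more than this.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Hypothetical sketch of the RangeHandler's bookkeeping: which Backend key
    // holds which logical block, and which BoB (block of blocks) it lives in.
    class RangeHandlerDbSketch {
        private final Connection db;   // an embedded HSQLDB or Derby connection

        RangeHandlerDbSketch(Connection db) throws SQLException {
            this.db = db;
            Statement s = db.createStatement();
            s.executeUpdate("CREATE TABLE block_map ("
                    + " block_index BIGINT PRIMARY KEY,"   // logical block number seen by the client
                    + " backend_key VARCHAR(255),"          // key/filename used by the Backend tree
                    + " bob_id BIGINT,"                     // which block-of-blocks it is packed into
                    + " bob_offset INT)");                  // position inside that BoB
            s.close();
        }

        void map(long blockIndex, String backendKey, long bobId, int bobOffset) throws SQLException {
            PreparedStatement ps = db.prepareStatement(
                "INSERT INTO block_map (block_index, backend_key, bob_id, bob_offset) VALUES (?, ?, ?, ?)");
            ps.setLong(1, blockIndex);
            ps.setString(2, backendKey);
            ps.setLong(3, bobId);
            ps.setInt(4, bobOffset);
            ps.executeUpdate();
            ps.close();
        }

        String keyFor(long blockIndex) throws SQLException {
            PreparedStatement ps = db.prepareStatement(
                "SELECT backend_key FROM block_map WHERE block_index = ?");
            ps.setLong(1, blockIndex);
            ResultSet rs = ps.executeQuery();
            String key = rs.next() ? rs.getString(1) : null;
            rs.close();
            ps.close();
            return key;
        }
    }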

The rest of my time was spent thinking about the design of jnbds. Beyond simply breaking jnbds into three layers (Server, RangeHandler, and Backend), I wondered what Patterns would be applicable. So I spent quite a bit of time looking through my Design Patterns book. There are many interesting Patterns in that book, and no doubt I could use many of them, but right now I can only positively say the following:

Server
The ReadJNBDS and ReadWriteJNBDS Classes now have their own main() methods. This makes it explicit that the server can be in two separate modes and, moreover, gets rid of a command-line switch (for turning read-write mode on or off). I'm not sure if this is a Design Pattern -- it's probably more of a Refactoring.

RangeHandler
The intelligence of jnbds is now centered in the RangeHandler: it uses a small embedded DB to track how blocks are being mapped to keys, and in the future will track access patterns. The Strategy Pattern is appropriate, I think.

Backend
Once I realized how useful it would be to chain Backends together, the Patterns fell into place: Pipe and Filter, Composite, and perhaps Decorator. Also extremely helpful is the State pattern, which is used internally to handle the Closed, Readable, and ReadWritable states.
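
A minimal sketch of that internal State pattern, with hypothetical names (the real Backends do far more): each operation is delegated to the current state object, held in a private static class, and the open/close calls swap that object.

    import java.io.IOException;

    // Hypothetical sketch of a Backend whose Closed/Readable/ReadWritable
    // behaviour lives in private static state classes.
    class StatefulBackend {
        // Each state implements the same operations; unsupported ones throw.
        private interface State {
            byte[] read(StatefulBackend b, long index) throws IOException;
            void write(StatefulBackend b, long index, byte[] data) throws IOException;
        }

        private static class Closed implements State {
            public byte[] read(StatefulBackend b, long index) throws IOException {
                throw new IOException("backend is closed");
            }
            public void write(StatefulBackend b, long index, byte[] data) throws IOException {
                throw new IOException("backend is closed");
            }
        }

        private static class Readable implements State {
            public byte[] read(StatefulBackend b, long index) throws IOException {
                return b.doRead(index);
            }
            public void write(StatefulBackend b, long index, byte[] data) throws IOException {
                throw new IOException("backend is read-only");
            }
        }

        private static class ReadWritable extends Readable {
            public void write(StatefulBackend b, long index, byte[] data) throws IOException {
                b.doWrite(index, data);
            }
        }

        private State state = new Closed();

        void openForRead()      { state = new Readable(); }
        void openForReadWrite() { state = new ReadWritable(); }
        void close()            { state = new Closed(); }

        byte[] read(long index) throws IOException             { return state.read(this, index); }
        void write(long index, byte[] data) throws IOException { state.write(this, index, data); }

        // Stand-ins for the real storage operations.
        private byte[] doRead(long index)          { return new byte[512]; }
        private void doWrite(long index, byte[] d) { /* store the block somewhere */ }
    }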

While the Design Patterns book is excellent, it doesn't cover server design issues like threading and (a)synchronous modes (although there are the Command and Chain of Command Patterns). Also, its viewpoint is about as far away from the code as one can get; happily, my Refactoring book is the opposite, and I hope it will help me clean up the code, especially in regard to its error handling.

An irony with my design of jnbds is that there isn't a good reason to code any of the "exotic" Backends (like Email) because the Backends are (supposed to be) indistinguishable from each other. (In other words, the FileBackend works perfectly, so it can be used for all testing. Once jnbds is "perfect", the GmailBackend can be written.) So, the Software Engineer in me keeps nagging about proper design and error handling, whereas the Hacker in me wants to set up a stripe of Gmail accounts. It's annoying. :)

About the contents of this release: perhaps the biggest improvement is the use of an XML file to specify the structure of jnbds. It holds the config data for the different layers and describes how the Backends are chained together. Thanks to the magic of Java's ClassLoader, I've been able to create a tree of Backends like this, for example: CachingBackend( ZipBackend( RAID0Backend( FTPBackend, FTPBackend ) ), FileBackend, FileBackend ).
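
The structure.xml file ultimately produces an object graph like the one above. Assembled by hand it would look roughly like the sketch below; the constructor signatures are hypothetical (in practice structure.xml plus the ClassLoader wire the tree up), and the return type is the simplified Backend interface sketched earlier.

    // Hypothetical constructor signatures; shown only to illustrate the tree shape.
    Backend assembleExampleTree() {
        return new CachingBackend(
            new ZipBackend(                        // transparent compression
                new RAID0Backend(new Backend[] {   // stripe across two FTP servers
                    new FTPBackend("ftp://host1/jnbds"),
                    new FTPBackend("ftp://host2/jnbds")
                })),
            new FileBackend("/var/cache/jnbds/0"),  // local caching areas
            new FileBackend("/var/cache/jnbds/1"));
    }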

Unfortunately, given that this software is so immature there are still many things that need to be worked on:

It feels like I've rewritten jnbds every two weeks since I started the project. Now, however, I think I've settled on a design that will work well for my limited use cases. A stable design and more software engineering will allow me to produce more robust and usable versions of jnbds.

2005.03.22

In the last four days I've been hacking up a storm and have ended up with... less functional software. However, my understanding of the semantics of what the software is supposed to do has improved ten-fold. In other words, I ripped the software to pieces and am redesigning it from the ground up. It currently supports the 'file' Backend, but if my new design is correct, 'ftp' and 'sql' should be easy to implement (naively).

As I said, my understanding has improved a great deal. For example:

2005.03.18

The 2005.03.18 release features both Read and ReadWrite FTP Backends. Multiple "Sequence of Ranges" files are stored on the FTP server and hold the data. I've successfully been able to format an 8 MiB reiserfs, manage files on it, and run a filesystem check.

Notes:

Thoughts on the code:

The absolute biggest design challenge I face stems from the "atomic" write code I want to introduce. Once write requests are serialized to disk and fulfilled later, you are able to perform "reordering". Read requests can also be reordered. As long as the requests don't overlap you can perform them in any order you want (subject to the guarantee that the writes will succeed). The problem is simply to figure out where this reordering code lives. In the jnbds server? The Backend? In between, as a transactional queue? Should the write requests be serialized to the local disk, or to the Backend itself? Write performance will probably be a dog if the Backend is used (FTP has something like a one-second latency), unless some clever reordering is done. Is there a way to minimize code duplication and maximize performance?
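
Whatever layer the reordering ends up in, the core test is the same: two queued requests may be swapped only if their byte ranges don't conflict. A minimal sketch of that check, with hypothetical names:

    // Hypothetical queued request: 'count' bytes starting at byte 'offset'.
    class QueuedRequest {
        final long offset;
        final int count;
        final boolean isWrite;

        QueuedRequest(long offset, int count, boolean isWrite) {
            this.offset = offset;
            this.count = count;
            this.isWrite = isWrite;
        }

        // Two requests touch the same bytes if their ranges intersect.
        boolean overlaps(QueuedRequest other) {
            return offset < other.offset + other.count
                && other.offset < offset + count;
        }

        // Reordering is only safe when the ranges are disjoint, or when
        // both requests are reads (reads never conflict with each other).
        boolean canReorderWith(QueuedRequest other) {
            return (!isWrite && !other.isWrite) || !overlaps(other);
        }
    }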

I am beginning to believe that a "ranges-to-files-mapper" Class should be created that will manage the storage of ranges of data -- the Backends will be simple readers and writers of files. Yes...

2005.03.04

The 2005.03.04 release features both Read and ReadWrite DB Backends. Data is stored in small BLOBs in the database and the Server uses some simple SQL to retrieve and update the BLOBs. In my limited tests I have formatted a small 10 MiB ext2 filesystem (using Apache Derby as the DB), copied data to it, and successfully run an e2fsck. Currently this is mostly just a proof of concept -- with a better design the Server will be faster and support more users. With all the advanced features that DBs possess, it's possible that a DB-backed (block) filesystem could be very powerful.

Right now the DB (via its Schema) only supports a single user, BLOBs of a fixed size, and no transactions or locking. I see the next version supporting:

Lots of questions, not many answers.

2005.02.16

The 2005.02.16 release features Read and ReadWrite servers. (See the 2005.02.11 entry for design notes.) The default behavior is read-only -- to get read-write, a -rw has to be appended to the command line.

This release also has a bit of javadoc and a few junit tests. More extensive tests will require me to write half of the Client.

2005.02.15

I've been thinking that the next version of jnbds will contain a read-only Server that allows multiple Clients to read simultaneously. There are some obvious difficulties with this:

  1. Multiple reads on the same file mean some form of locking so that the "file pointer" doesn't encounter a race condition. But then you've got yourself a disk scheduler! So, perhaps there are ways of handing the requests off to the OS.
  2. Denial of Service can be achieved by Clients demanding blocks 0..N. All sorts of security procedures can be implemented to handle this scenario.
  3. The Server uses long-lived tcp/ip connections which might not be appropriate in this case. A connection pool could be created and Clients would line up to get one -- if a Client wanted to connect, but the pool was empty, the Server could drop an idle Client. Hmmm... isn't that just a web server?

2005.02.15

JUnit is interesting because it forces you to think hard about how to effectively test your code.

To properly test jnbds I think I will have to implement half of the Client: the "top" half which implements the protocol, not the "bottom" half which interfaces with the kernel. Some corner cases will be tested, and probably lots of random reads, but what worries me is that I'm not entirely sure how the protocol works. Therefore, writing a Server and Client that work together doesn't tell me much -- only tests with 3rd party Servers and Clients would be truly helpful.

2005.02.13

I've been bothered by the Least Recently Used data structure used for caching for a long time. The idea that access patterns can be efficiently represented by a one-dimensional data structure is laughable. In my mind the relationship between blocks is fractal: there aren't two, three, or even ten variables that can represent the relationships between blocks. Instead, there is a "fractional" number of variables.

Here's how I imagine the algorithm:

  1. Record the sequence of block accesses to a file. So, 10, 3, 900, 600, etc. This would be in a thread, I suppose.
  2. Create a graph data structure which contains "block" nodes, where each node contains edges to the blocks that were accessed afterwards. Multiple edges are necessary, I think, to capture frequent accesses. Note that this graph only captures "pairs" of accesses; i.e., (10, 3) and (3, 900) -- there's no info about (10, 3, 900).
  3. Traverse the graph and look for clusters of block accesses (easier said than done). These clusters of blocks should be considered as one "unit" when making decisions about caching, storage on Gmail, etc. In other words, the blocks' physical layout will no longer match their logical layout.

Would the above algorithm work better than an LRU? It captures how blocks relate to each other, in that if block 10 was accessed, then blocks 3, 900, and 600 will probably be accessed too, regardless of how long ago those blocks were accessed. Perhaps layering an LRU on top would improve performance further, or there might be a way of encoding time information into the graph.
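
A sketch of steps 1 and 2 above (the data structure only, not the cluster detection, and with hypothetical names): the "graph" is just a map from each block to a count of which blocks were seen next, so heavier edges mean more frequent pairs.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: record pairs of consecutive block accesses and count
    // how often each pair occurs. Cluster detection would run over this graph.
    class AccessGraphSketch {
        // edge weights: previousBlock -> (nextBlock -> how many times observed)
        private final Map<Long, Map<Long, Integer>> edges = new HashMap<Long, Map<Long, Integer>>();
        private long previous = -1;

        void recordAccess(long block) {
            if (previous >= 0) {
                Map<Long, Integer> out = edges.get(previous);
                if (out == null) {
                    out = new HashMap<Long, Integer>();
                    edges.put(previous, out);
                }
                Integer count = out.get(block);
                out.put(block, count == null ? 1 : count + 1);   // heavier edge = more frequent pair
            }
            previous = block;
        }

        int weight(long from, long to) {
            Map<Long, Integer> out = edges.get(from);
            Integer count = (out == null) ? null : out.get(to);
            return count == null ? 0 : count;
        }
    }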

2005.02.12

Next up is documentation and JUnit. Actually, creating good JUnit tests is hard. But I really want to do it to see how it improves the code.

After that will be more backends. It's going to be quite the challenge because some backends, like Gmail, won't be able to support 102,400 4 KB messages. Instead, they will have to use Blocks of Blocks (BoBs). Synchronous writes could be very slow if the BoBs are too large, since the BoBs will have to be written to the backend synchronously.

2005.02.11

The next release is going to feature Read and ReadWrite servers. I thought long and hard about it and decided that a ReadWrite server Is Not A Read server. This means you can't use a ReadWrite server in place of a Read server. The reason is security: no matter how many layers of clever security you have, there's always the chance that someone will crack them and turn your "read-only" ReadWrite server into a "read/write" server. But that's impossible with the Read server because it has no ability to write. In other words, the Read server doesn't understand the "write" portions of the NBD protocol.

On the other hand, the ReadWriteBackend Is A ReadBackend. This makes it easier to use from an OO standpoint. The security is handled by the server.

But, what do I know? Perhaps I have it backwards.
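
A sketch of the type relationships described above (all names hypothetical): the servers are separate classes, each with its own entry point, while the read-write Backend interface extends the read-only one.

    import java.io.IOException;

    // Hypothetical sketch of the "ReadWrite server Is Not A Read server,
    // but ReadWriteBackend Is A ReadBackend" design.
    interface ReadBackend {
        byte[] readBlock(long index) throws IOException;
    }

    interface ReadWriteBackend extends ReadBackend {
        void writeBlock(long index, byte[] data) throws IOException;
    }

    // The read-only server only ever holds a ReadBackend reference, so even a
    // compromised server has no write path to call.
    class ReadServerSketch {
        private final ReadBackend backend;
        ReadServerSketch(ReadBackend backend) { this.backend = backend; }
        // ... speaks only the read half of the NBD protocol ...
    }

    // The read-write server is a separate class, not a subclass of the read server.
    class ReadWriteServerSketch {
        private final ReadWriteBackend backend;
        ReadWriteServerSketch(ReadWriteBackend backend) { this.backend = backend; }
        // ... speaks the full NBD protocol, including writes ...
    }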

2005.02.09

First public release. I've been able to mount and format the block device as an ext2 filesystem. However, to shut down the server one still has to control-C it, which doesn't seem too elegant.

To Do

Spring

Use the core of Spring to turn jnbds into a set of loosely-coupled POJOs. It will allow me to get rid of the hacked structure.xml file I created and use Interfaces even more.

Backend Interfaces

How easy is it to make the Backends usable within the regular java.io package? It would require some kind of Facade, after which the tree of Backends would handle the reading/writing of (arbitrary sized) files.

Server

RandomAccessWrite Interface

Not all Backends can support random writes within files. The Backend interface should be made into a hierarchy.

Dirty Key Tracking

Dirty keys in the Caching2Backend are indicated by a "dirty_key" file existing alongside each key which is dirty. This is not a good design for some Backends.

Steganographic Backend

Records data in the low x bits of each channel in an image. Probably the image has to be uncompressed, or else only 1 bit can be saved per channel without the viewer noticing.

ExternalProcessBackend

A Backend which is designed to work with an external process which implements the Backend interface. For example, the ExternalProcessBackend would accept a FileBackend and write additional information in the form of "commands" to the FileBackend. The external process would then pick up those "commands", do the right thing, and write its own "commands" back. So it's like synchronous message passing, using various "channels" and "commands". The benefit is that not all processing takes place in the same JVM; other computers could be monitoring the files and acting on the "commands" which appear there.

Is the only command needed "pageFault"?

Abstract Backend

It should handle locking of close(), openforread(), and openforreadwrite() so that no other threads are in the backend trying to read, write, or change the state. Once the state is changed, the next blocked thread can change the state again. Any internal threads will need to end gracefully.

BoBBackend (B3)

The RangeHandler is currently doing two things: 1) accepting reads/writes by absolute byte positions, and 2) creating, tracking, and optimizing BoBs.

The B3 should handle #2. That will let it be inserted into the Backend Tree and handle FileBackends, FTPBackends, and EmailBackends, which prefer BoBs of roughly 4 KB, 128 KB, and 4 MB, respectively.

The tracking and optimizing may not have to be done for each B3, because if the first B3 works properly then the files should be distributed properly. Right?!

PackerBackend

It understands how blocks have been duplicated and spread among BoBs. It would have to run when space is getting low because blocks are being duplicated and the FS doesn't know that. What algorithm should be used to pack the blocks? (Read and write back according to access patterns. Once an old block contains no unique information, delete it. This could be tricky to do without running out of space.) Wouldn't this have to run within a Backend which creates BoBs (Caching2Backend, JBODBackend?)? Two possibilities: 1) Allow it to understand the contents of the BoBs by looking in the B3 file. 2) Make it a Strategy object which can be plugged into a Backend.

Internal Backend Threads

They should check for a boolean called "end" which is set when another thread wants them to stop. This means that threads will no longer have to be interrupted.
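
A minimal sketch of that idiom (the class name is hypothetical): the flag is volatile so the worker thread sees the change, and the loop exits cleanly instead of relying on interruption.

    // Hypothetical sketch of an internal Backend thread (e.g. keepalive or
    // writeback) that stops when another thread sets the "end" flag.
    class WorkerSketch implements Runnable {
        private volatile boolean end = false;   // volatile so the change is visible across threads

        public void requestEnd() {
            end = true;
        }

        public void run() {
            while (!end) {
                // ... do one unit of keepalive / writeback work ...
                try {
                    Thread.sleep(1000);         // pause between iterations
                } catch (InterruptedException ignored) {
                    // not relying on interruption; the loop condition handles shutdown
                }
            }
            // ... flush or clean up, then return gracefully ...
        }
    }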

Writeback Thread

The feedback isn't proper in a lot of cases.

Capacity

In most cases capacity doesn't need to be enforced because the Client won't write more than the Capacity. However, there might be a few cases when a Backend is doing its own thing and needs to keep track of how much has been written. Caching2Backend?

Read/ReadModify/ReadWrite

Add a new mode to the Backend: "Read/Modify". This would allow the Backend to perform writes, but not the Client. This would allow the Backend to do compression, packing, WORM, etc. "Read" would mean no changes at all.


Copyright 2005 Greg Gabelmann