
XNAT, Heartbleed, and you

The impact of the Heartbleed vulnerability in OpenSSL has been much discussed. How does it affect XNAT users and administrators? Fortunately, our risk exposure is extremely narrow. Lead XNAT developer Rick Herrick wrote a quick note on the Water Cooler discussion page of our developer wiki.

The Heartbleed exploit does not affect XNAT directly! XNAT does not use OpenSSL internally or in the application at all.

This is not to say that it cannot affect your XNAT installation. You are at some risk if you use one of the exploitable versions of OpenSSL to provide an HTTPS connection at your Tomcat server or HTTP proxy (e.g. Apache HTTPD, nginx). It would be possible for an intruder to decrypt the encrypted connection between a user's browser and the server. Once that is possible, the intruder could capture the user's login credentials from the HTTP transaction as the user logs in. They would also be able to monitor the contents of that traffic, potentially exposing PHI or other identifying information from the XNAT installation.

Read the full post here: 
https://wiki.xnat.org/display/WaterCooler/2014/04/11/XNAT%2C+Heartbleed%2C+and+you

Posted in General Announcements

XNAT 1.6.3 Release Details

XNAT 1.6.3 has been released! This is our most heavily tested release to date, with updates to series importing, the prearchive, administrative tasks, and other features.

A complete list of XNAT 1.6.3 updates can be found here: https://wiki.xnat.org/display/XNAT16/XNAT+1.6.3+Release+Notes

You can download XNAT 1.6.3 here: http://xnat.org/download-xnat.html

Posted in XNAT Releases

Job Posting: XNAT Administrator at University College London

The Centre for Medical Image Computing, part of the UCL Department of Medical Physics and Bioengineering, is looking for an Imaging Data Manager to coordinate the deployment of XNAT applications across the various clinical collaborators at University College Hospital (UCH).

For those interested in applying, full details can be found at this link:
http://www.jobs.ac.uk/job/AHX581/research-associate-senior-research-associate-neuro-imaging-data-manager

Posted in General Announcements

RSF-1 High Availability SSD pool for VM storage and build space

SSD storage is all the rage in big data today. It is solving a lot of high-IOPS problems while at the same time introducing new challenges. Its price point is now in the range of replacing 15K SAS pools when large numbers of IOPS are needed, and the dollar cost per IOP has never been better.

XNAT pipelines and VMware datastores continue to push our need for IOPS, so I have been tasked with solving the I/O problems with an SSD-based ZFS server build.

With the addition of SLC SSDs for ZIL and MLC SSDs for L2ARC, our existing ZFS server has done very well, with only 40 NL-SAS spindles handling 160+ VMs. But when memory pressure on the VM cluster starts pushing things to swap, it can no longer keep up, and complaints about performance start rolling in. The BlueArc and the ZFS server have not been keeping up with VM storage and build storage for the large processing jobs of the Human Connectome Project (HCP), so more IOPS are in order and SSD is our solution.

Besides IOPS, we also wanted to increase the availability of our ZFS pools and be able to maintain our servers without taking the pools offline.

I had first planned to use existing heartbeat tools and write my own failover scripts. That got ruled out because of the risk that unknown bugs in the scripts could cause unplanned outages instead of increasing availability. We decided it would be better to license a well-tested and trusted solution instead of risking unplanned outages. Enter RSF-1 for ZFS from High-Availability.com.

The SSD Server

It's almost a drumbeat on the ZFS-related mailing lists: if you want reliability, build your pools only with SAS drives and never use consumer SATA. That is a real problem when trying to purchase multiple terabytes of SSD on a tight budget. When consumer MLC (not TLC) costs about $1 per GB and SAS SSDs cost $4+ per GB, the SAS premium becomes hard to justify. There are lower-cost examples of each from brand X, but anything cheaper seems to come with a bad reputation for failure and wasn't even a consideration.

One of the known problems with consumer SATA is long to extremely long times to return an error. The other problem discussed on the mailing lists is reset storms when drives are connected to a SAS expander. The long error-return time comes from repeated retries to read corrupted data on a disk platter. Enterprise drives simply return an error and let the redundant file system find the data elsewhere. Consumer drives, since they are rarely deployed with redundancy, will try their mightiest to get the data and can take extremely long to return an error from a read operation, stalling the entire file system. That is obviously not a problem for SSDs: with no platter to keep retrying, they return an error immediately if they have a read problem. As to the reset storm problem, I have not personally witnessed one, so that risk may still be out there with SATA SSDs.

The choice of SSDs is a bit complicated. We currently have three defined purposes for this server: hosting our production VMs, hosting our development VMs, and providing build space for the HCP pipelines.

For the production VMs, data loss, complete failure, or downtime must be avoided, so 800GB Intel DC S3700 SSDs were chosen to build a small pool for them. We ordered seven of them, which are back-ordered until sometime in September. Unfortunately these "enterprise" SSDs are still SATA. They were chosen for consistent performance, life expectancy, and price per GB, in spite of the SATA interface.

For development VM storage and build space, 512GB Samsung 840 Pro SSDs were chosen; we purchased 63 of them. Consumer SSDs come with some very serious thorns that will bite you when you use them in a production ZFS server:

  1. No super-capacitor. All modern SSDs use a write cache to help with speed, wear leveling, and garbage collection. The problem with consumer SSDs is that if they lose power in the middle of a write, even a 'sync' write, they will lose data and possibly the drive's entire contents. Enterprise SSDs protect this 'written' data still sitting in cache with a capacitor that holds enough power to flush the writes to flash.
  2. Little to no over-provisioning. ZFS will use the entire SSD as presented to the system and does not support TRIM on Solaris-based distributions. This causes a big write penalty once the entire SSD has been written to; the only remedy at that point is a secure erase to re-zero all sectors.

The lack of over-provisioning is fairly easily overcome by artificially over-provisioning: slice (partition) the drive so that only 70-90% of the available storage is used. The 840 Pro has a good garbage collection routine, and 80% provisioning works very well to maintain write performance.

The lack of a super-capacitor can only be dealt with through aggressive backup policies, a UPS, and luck, so no production data will reside on this pool.

Building for High Availability

Attempt 1: Connect all the SATA SSDs to the SAS expander/backplane and use a SAS switch to hand the pool over during a failover. This plan failed rather quickly when I realized that, more often than not, an Illumos-based system will panic when a SAS expander is hot-disconnected from it. The cut-over time was also on the order of 30+ seconds as the receiving system scanned all the newly attached drives. So this solution was not going to work for high availability.

Attempt 2: Order some interposers and determine whether they play well with the SSDs. Interposers are talked about as a cheap hack on most of the mailing lists I follow, so this approach seemed a bit risky, but I knew LSI had been working on them recently and considered it a worthy trial. So far the gamble seems to be paying off: with interposers installed on 15 SSDs, I have two servers that can talk to them at will without a single blip, and I have been through many performance and failover tests without issue.

RSF-1 for ZFS

I've known about RSF-1 ever since I first experimented with Nexenta, and I knew High-Availability.com sold solutions for many kinds of high-availability problems besides ZFS. So I contacted them through their website and requested a trial and pricing to make sure their offering wasn't outside our budget. It was right on target. Out of respect, since they don't publish their pricing, I will not either.

To test their software I set up OmniOS on our SSD server and on a VM with an HBA attached via hardware passthrough. We arranged a time for them to connect to our servers and do the initial install. They have prebuilt Solaris packages, so the install was rather painless, and they provided a nearly complete config; all I needed to work out was the network configuration I was going to use.

There were a couple of hiccups, because this was a pure-SSD system and I had configured the pool in a non-standard way.

RSF-1 uses several strategies to determine when to fail over and to fail a ZFS pool over safely. It can use any combination of network, serial, and disk-based heartbeats to determine whether node members are alive. It also uses SCSI reservations on the disks to prevent dual-headed ZFS pools.

The first hiccup was the reservations set in the provided config. They were not aware that one of my servers had two HBAs with multipath set up to the SSDs, which caused system panics with the initial version they installed. When I alerted them to the issue, they quickly got me an updated version that fixed the problem.

The second hiccup was caused by the disk heartbeats that were set up. The pool had been configured with an ashift of 12 so the SSDs would avoid the read/modify/write cycles that come with 512-byte sectors. When I told them about the problem, they said I would be better off installing a couple of spinning disks for heartbeats, or using a serial cable for an out-of-band heartbeat, because the low-level heartbeat writes could end up bypassing the wear leveling on the SSDs and cause premature failure.

With those issues behind me, I'm on to testing. I configured a zpool and a ZFS folder and NFS-mounted it to our vSphere cluster, then performed failovers while doing each of the following:

  1. Deploying a VM with Puppet.
  2. Storage vMotioning a VM to and from the pool.
  3. Suspending a running VM.
  4. Powering on a suspended VM.
  5. Accessing the web interface of the VM.

In most cases the failover could not even be noticed; when it was noticeable, it was only about a three-second delay in response.

For some of the tests I had the pool imported on the virtual OmniOS system and powered off that VM. Again, RSF-1 was quick to respond and the failover was not even noticeable.

The configuration is straightforward; once I learned the details, I was actually surprised that High-Availability insists on doing the install. With a little more documentation on their part it could be a no-brainer to install on your own, though I don't know whether that would help or hurt sales of specialized software like RSF-1.

At this point I’m requesting an official quote so we can order the software.  I will follow up with details of installing OmniOS on our existing ZFS server and marrying it to our new SSD server with RSF-1.

Breaking the law!
Fast, Cheap, Reliable: Pick two.

With 70 SSDs this system is unbelievably fast. It is relatively cheap for approximately 20TB of usable SSD storage. Okay, let's keep our fingers crossed on reliable. Several corners of best practice for reliable enterprise ZFS storage were cut, but considering that the majority of this server's use is as a high-speed scratch pool, we should be okay. We will soon add a second SAS switch and eliminate all the hardware single points of failure. The only foreseeable gotcha still out there is a reset storm on a SAS expander/backplane that takes down a pool.

RSF-1 will play a critical role in our ability to update hardware and software, and even to change ZFS operating systems if we deem it necessary, without downtime for our ZFS pools. I don't expect it to be a magic bullet that makes consumer SSDs as reliable as enterprise SAS SSDs. I have had to go without updates on our ZFS box for many months at a time while waiting for a long enough window to service it. That will be a thing of the past once RSF-1 is implemented: I will simply move the pool to one server and perform the maintenance at any time. One less reason to come into work on a Saturday or Sunday!

The Nitty-Gritty Hardware Details

For those of you looking for the parts I used in this build, here’s the raw parts list.

Qty Item
1 Supermicro X9DRD-7LN4F-JBOD
1 Supermicro 417E26-R1400LPB
2 Supermicro 2U heat sink
2 Supermicro system drive mount
1 Xeon E5-2643 (4C 3.3 Ghz)
8 16GB DDR3
2 LSI 9207-8e
1 Intel dual 10GbE NIC
3 2-port External to Internal iPass
2 10GbE SFP+ twinax cables
2 Intel 320 Series SSD – 80 GB
7 SSDs Intel S3700 (800GB)
63 SSDs  Samsung 840 Pro (512GB)
72 LSI Interposer Card
1 LSI 6160 SAS Switch
1 Spare power supply
1 SAS Switch Shelf
10 2.2ft External SAS CBL-0166L
1 SANtools license

Update Feb 2014: Pick two holds true

Looks like the law has caught up with us. We got two out of three: fast and cheap. The interposers are proving to be the breaking point. Out of the more than 80 interposers in use, there have been two failures, and each time the interposer failure caused the entire pool to freeze. Both of the failed interposers were being used for L2ARC on other systems. It appears that the mpt_sas driver tries repeatedly to reset the interposer and eventually resets the entire SAS path, including the driver, which leaves the pool inaccessible. Rebooting the server caused the failed interposer and its SATA SSD to be kicked offline. If Illumos would properly kick the failed device from the pool, the reliability might still be there.

To date, the Samsung SSDs are the only SSDs that have worked for me behind interposers; Intel and Micron drives have both failed to initialize and talk to the system. The Samsungs are also proving to be reliable and fast. Granted, their usage is still under a year, but out of more than 80 of them I haven't seen a single error, and their speed has not degraded.

The SAS switch is also not getting along with the Supermicro JBODs. Making changes on one system often causes a cascading problem where devices in the JBOD go offline and don't return until a power cycle. I suspect this is a problem in the Supermicro JBOD itself, as I've seen lots of odd problems like this with their JBODs. I've come to the conclusion that Supermicro JBODs should be avoided for highly available production systems. My current best choice is DataON; they do cost significantly more, but in the total cost picture they are not a bad investment.


Posted in XNAT Hardware and IT, ZFS Storage

Clojure and XNAT: a REPL inside

When I’m adding a new feature to XNAT, there’s often existing code that does much of what I need — if I can find it, and if I can figure out exactly what it does. In the absence of detailed internal API documentation, and in the presence of deeply nested inheritance trees with multiple layers of automatically generated classes, it’s tremendously useful to be able to run experiments — to create objects and call methods interactively in the running server. Java just isn’t well suited to this sort of experimentation, especially in a system like XNAT where even the smallest code change forces a rebuild, restart (Tomcat), and refill (coffee).

I do a lot of exploring, experimenting, and sometimes prototyping in Clojure, in a REPL embedded inside a running XNAT. Anything I can do in Java — creating new objects, calling member functions or static methods, even defining new classes — I can do in the Clojure REPL without rebuilding or restarting the webapp, and with no discernable loss of performance. I use the embedded REPL mostly for development, but it’s easy to imagine using it for operations tasks, even on a production server.

The critical component for running Clojure inside of XNAT is liverepl, which uses the Java Attach API to inject an interactive Clojure session into a running JVM. I run a slightly customized liverepl inside Emacs, using a wrapper I wrote for the standard inferior-lisp mode. Getting liverepl+Emacs set up is a little fiddly, but I’ll gloss over the details for now so I can move on to the cool stuff. I’m writing these instructions on Mac OS X, but I’ve done all this in Linux (Gentoo) and everything is essentially identical. Windows might be more complicated, but I really haven’t tried; if you get this working with Windows, please let me know.

Running a REPL inside XNAT

First, we need an XNAT to experiment on. I’ll assume that you’ve set up, built, and deployed your XNAT and started Tomcat. You need to run liverepl as the same user that is running Tomcat. You’ll probably also want your favorite Java IDE (Eclipse, IntelliJ, or whatever) set up to view the XNAT source code.

Now, let’s fire up our REPL. I’ll start emacs and do M-x run-liverepl (remember the fiddly bits I’m glossing over? Here’s where you need to have them all in place). Down in the minibuffer, we get prompted Target JVM pid: . Here I enter the Tomcat process ID; if I don’t know it (I usually don’t), I just press return and get shown a listing of all JVMs and their PIDs. The liverepl script shows us two JVMs: the liverepl process itself (38044) and Tomcat (37951).

Now we run M-x run-liverepl again, and for Target JVM pid: enter 37951. We get another prompt, Classloader index: ; this is something else I don’t know, but again we can hit return and liverepl lists the options.

Tomcat has a flock of classloaders, but it’s easy to identify the right one (#5, /xnat). So now:

M-x run-liverepl
Target JVM pid: 37951
Classloader index: 5

and we’re in — we have a Clojure REPL inside a running XNAT.
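
As a quick sanity check that we're attached to the right JVM, we can ask for a Tomcat-specific system property; catalina.base is set by Tomcat's startup scripts, so it should come back as the Tomcat base directory:

;; catalina.base is set by Tomcat's startup scripts; seeing the
;; Tomcat base directory here confirms we're in the right JVM
user=> (System/getProperty "catalina.base")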

Let’s start with a simple operations task: figuring out what users are currently logged in. We can do this by looking at the Spring session registry:

;; get the Spring context service object
user=> (def context-service (org.nrg.xdat.XDAT/getContextService))
#'user/context-service

;; get the session registry from the Spring context service
user=> (def session-registry
         (.getBean context-service "sessionRegistry"
                   org.springframework.security.core.session.SessionRegistryImpl))

#'user/session-registry

;; for each value p returned from sessionRegistry.getAllPrincipals(),
;; call sessionRegistry.getAllSessions(p, false), and collect the
;; results into a sequence named sessions.
user=> (def sessions (mapcat #(.getAllSessions session-registry % false)
                             (.getAllPrincipals session-registry)))

#'user/sessions

Now we can ask questions about the active sessions. For example, how many users are logged on?

user=> (count sessions)
0

Well, that’s unsatisfying. Let’s use a web browser to log into XNAT, and then try again:

user=> (def sessions (mapcat #(.getAllSessions session-registry % false)
                             (.getAllPrincipals session-registry)))

#'user/sessions
user=> (count sessions)
1

What can we learn about that active session?

user=> (map #(.getLogin (.getPrincipal %)) sessions)
("karchie")
user=> (map #(.getSessionId %) sessions)
("83DC26C104501A4D7022FB3B08476E47")

So just by typing a few lines into the REPL, we can get usernames and JSESSIONID values for all active sessions. These few lines were a little obscure, though: I’d hate to have to remember or reconstruct how to do this every time.

Instead of defining complicated statements in the REPL, I usually write some functions in a .clj source code file, then load the file into my REPL. Now I can open the file in Emacs (C-x C-f), then have Emacs send the file contents to the Clojure REPL (C-c C-l), which compiles them into Java bytecode. Any time I want to change the code, I can edit and save the file and do C-c C-l to reload and recompile, without rebuilding the XNAT webapp or restarting Tomcat.

I’ve collected Clojure code for the in-XNAT REPL in a library named cljinxnat, containing several Clojure source files organized roughly by topic (security, prearc, search). I expect this library to grow as I work on other XNAT features. Each function in the library includes documentation:

user=> (find-doc "sessions")
-------------------------
xnat.security/get-sessions
([])
Gets the Spring SessionInformation object for each active session.
These are a little opaque but contain both the jsessionid and the
XDATUser.
nil
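
For a sense of what one of these source files might contain, here is a rough sketch of a security.clj that just wraps the session-registry calls from the REPL session above in named functions. The xnat.security namespace comes from the find-doc output above; the layout of the real cljinxnat sources may differ, and the second helper is only illustrative.

;; sketch of xnat/security.clj: wrap the Spring session registry
;; lookups from the REPL session above in reusable functions
(ns xnat.security
  (:import (org.nrg.xdat XDAT)
           (org.springframework.security.core.session SessionRegistryImpl)))

(defn get-sessions
  "Gets the Spring SessionInformation object for each active session.
  These are a little opaque but contain both the jsessionid and the
  XDATUser."
  []
  (let [registry (.getBean (XDAT/getContextService)
                           "sessionRegistry" SessionRegistryImpl)]
    (mapcat #(.getAllSessions registry % false)
            (.getAllPrincipals registry))))

(defn session-logins
  "Illustrative helper: the login name for each active session."
  []
  (map #(.getLogin (.getPrincipal %)) (get-sessions)))

Reloading a file like this with C-c C-l after an edit makes the updated functions available in the REPL immediately, with no webapp rebuild or Tomcat restart.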

I haven’t really provided enough detail to get a user who’s new to either Clojure or XNAT up and running — this is the sales pitch, not the user manual. If I’ve piqued your interest, send me mail and I’ll be happy to provide detailed instructions and advice as needed.

In a future post, I’ll go into more detail about using liverepl to make sense of XNAT internals, focusing on stored searches.

Posted in Clojure and XNAT