Monday 5 November 2012

Progress on Mirage Data Migration

The Disk Space Issue

The main issue that is preventing us from rolling out Mirage for the majority of CMM's instruments is disk space.  Basically, if we did a "full" roll-out, we would fill up the available disk space in a small number of months, and then we would have to start deleting older files to make space.  That would mean that users would need to manage their own long-term file storage, and we would essentially be back where we started.

Then there is the new 3View system, which is capable of generating 10 GB of data in a single night, or 20 GB after rendering the images as TIFF files.

Managing Lots Of Data

We are addressing the issue of "too much data, not enough disk space" in two ways:
  • For data that needs to be kept online, we are currently implementing a scheme where individual data files are "migrated" to secondary online locations within UQ.  As far as users are concerned, the data files will be accessed as before, except that access will be just a little bit slower.
  • For data that no longer needs to be online, we will be implementing an archiving scheme in which snapshots of entire experiments complete with all relevant metadata are saved to offline storage.

Progress So Far

I am currently developing the code for the datafile migration subsystem for Mirage.  The basic file migration code is working in the Mirage test system, and the code that will decide what files to migrate and when to migrate them is in progress.  The initial migration system will take into account the size of the individual files, their file types, and when they were created and last accessed.  Later on, I intend to allow users to indicate the relative importance of files, datasets and experiments to influence the migration decision making.
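To give a feel for what such a decision rule might look like, here is a minimal Python sketch.  It is not the actual Mirage code: the DataFile class, the thresholds and the list of file suffixes are all hypothetical, chosen only to illustrate a policy based on file size, file type, creation time and last access time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataFile:
    """Hypothetical record of the attributes a migration policy looks at."""
    name: str
    size_bytes: int
    created: datetime
    last_accessed: datetime


# All of these thresholds are illustrative assumptions, not Mirage settings.
MIN_SIZE_BYTES = 50 * 1024 * 1024          # ignore files smaller than 50 MiB
MIN_AGE = timedelta(days=30)               # leave recently created files alone
MIN_IDLE = timedelta(days=14)              # leave recently accessed files alone
MIGRATABLE_SUFFIXES = (".tif", ".tiff", ".raw")


def should_migrate(f: DataFile, now: datetime) -> bool:
    """Return True if the file is the right type, big, old and idle enough."""
    if not f.name.lower().endswith(MIGRATABLE_SUFFIXES):
        return False
    if f.size_bytes < MIN_SIZE_BYTES:
        return False
    if now - f.created < MIN_AGE:
        return False
    return now - f.last_accessed >= MIN_IDLE


def select_for_migration(files, now):
    """Pick the files to migrate, largest first to free the most space early."""
    candidates = [f for f in files if should_migrate(f, now)]
    return sorted(candidates, key=lambda f: f.size_bytes, reverse=True)
```

Ordering the candidates largest-first is just one possible choice; a user-supplied "importance" rating could later be folded into the same sort key.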

(For MyTardis folks, the Mirage migration code is actually a MyTardis "app".  To use it, you will need to set up one or more secondary "destinations", which can simply be private WebDAV servers on some other machine with lots of disk space.  Look in my MyTardis repo on GitHub for the code.)

The other aspect that needs to be sorted out is actual disk space provisioning.  I have negotiated some space on the UQ HPC cluster for interim storage, but "the real thing" will be implemented on the QERN system that QCIF is currently developing.  We are currently "on the list" for transition onto QERN.

Tuesday 23 October 2012

Mirage is in production

Mirage has been in limited production on the following CMM instruments for a few months:
  • The JEOL Neoscope in the Hawken Lab
  • The JEOL 6610 in the Hawken Lab
  • The JEOL 6440 in the Hawken Lab
  • The JEOL 7100 in the AIBN Lab
If you are a CMM user of one of these instruments, this means that you have a new way to access and organize the data from your sessions on these instruments.

How to access your data

Simply do the following:
  1. Open a web browser and visit "http://mirage.cmm.uq.edu.au"
  2. Click the "Login" button or link.
  3. Enter your ACLS account name and password, and click "Login"
  4. Now you can either click the "Data" link to go to your data, or click the "Getting Started" link for an overview of how to use Mirage.

How your data got there

Data files written to the "S:" drive on the instrument are "grabbed" and assembled into Datasets based on their file names and when they were written.  If the grabber can work out who the Datasets belong to, they are sent to Mirage automatically.

If the data grabber can't work out who Datasets belong to, they are kept in the instrument's "hold" queue for a couple of weeks, or until someone claims them.  This will typically happen if you forget to log in to the instrument using your own ACLS account.  Always remember to log in and log out.
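For the technically curious, the grouping and routing described above could be sketched roughly as follows.  This is an illustration only, not the real grabber: the GrabbedFile class, the 30 minute gap, the rule for stripping trailing numbers from file names, and the "mirage" / "hold" labels are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from typing import Optional


@dataclass
class GrabbedFile:
    """Hypothetical record for one file seen on the instrument's S: drive."""
    name: str
    written: datetime
    owner: Optional[str]  # ACLS account logged in at write time, if known


# Assumed rule: files written more than 30 minutes apart start a new Dataset.
MAX_GAP = timedelta(minutes=30)


def base_name(name: str) -> str:
    """Strip the extension and a trailing sequence number: a_001.tif -> a."""
    stem = name.rsplit(".", 1)[0]
    return re.sub(r"[-_ ]?\d+$", "", stem)


def group_into_datasets(files):
    """Group files that share a base name and were written close together."""
    datasets = []
    ordered = sorted(files, key=lambda f: (base_name(f.name), f.written))
    for _, group in groupby(ordered, key=lambda f: base_name(f.name)):
        current = []
        for f in group:
            # A long pause between writes closes the current Dataset.
            if current and f.written - current[-1].written > MAX_GAP:
                datasets.append(current)
                current = []
            current.append(f)
        datasets.append(current)
    return datasets


def route(dataset) -> str:
    """Send a Dataset to Mirage only if it has exactly one known owner."""
    owners = {f.owner for f in dataset}
    if len(owners) == 1 and None not in owners:
        return "mirage"
    return "hold"
```

In this sketch a file with no known owner (nobody logged in) puts its whole Dataset in the "hold" queue, which mirrors the behaviour described above.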

Why is your data missing from Mirage?

  • If your data was saved before the instrument was Mirage enabled, then we don't have it.
  • It might be stuck in the "hold" queue ... because you forgot to log in.
  • It might have been sent to someone else's account ... if you forgot to log in and they forgot to log out!
  • If you are in the habit of renaming files on the "S:" drive, you may have confused the grabber.  The data is probably there under the old name. (Wait until you can access the files in Mirage before you start renaming and reorganizing things.)
And if none of that helps, your data should still be accessible in the old way via the "R:" drive on one of the Lab PCs.  (But of course, you only have a week to retrieve it from there.)

Thursday 18 October 2012

About this blog

Hi Folks,

This blog is about the Data Management Systems that we are building and running at the Centre for Microscopy and Microanalysis (CMM) at the University of Queensland (UQ).

CMM is UQ's main microscopy centre, running a number of Electron Microscopes, X-ray Diffraction instruments and Mass Spectrometers in 5 separate laboratories on the St Lucia campus. There are in the region of 20 major instruments, and we provide a service to over 400 regular clients (UQ staff and students, staff and students of other universities, and commercial clients).  UQ CMM is a foundation node of the Australian Microscopy & Microanalysis Research Facility (AMMRF).

All of CMM's instruments produce results in digital form: 2D and 3D images, and various other kinds of characterization data, often in proprietary formats.  The problems we (the data wranglers) are trying to solve are:
  • Getting the data off the instruments and into a place where the clients can access it.
  • Providing long term storage for the data.
  • Allowing clients to view, organize, search and process their data collections.
  • Allowing clients to share data with other clients, and "publish" it in various ways for other people to find and use.
  • Doing all of the above in a secure and sustainable way.
In this blog, I'm going to talk about where we are, where we are going, and how we implement things.  I'm going to try to cover things in ways that are relevant to both the users of the CMM facilities, and to other people who are building their own data management systems.

Stay tuned ...