Activity :: Just Me

Grid and hydrology

martin | page | 28-Nov-2007 06:00

Comments morning of 28/11/07

Here are comments on V2 in no particular order.

Introduction paragraphs 1 and 4: we talk about "standard escience tools". I am inclined to think that the word "standard" is not helpful here, because generally research councils are not keen on funding things that are standard. On the other hand, hype words such as "novel" may also be counterproductive. I wonder whether "emerging" is a better word.
But the whole sentence in paragraph 4 that uses the word "standard" rather plays down the role of escience to one very specific aspect of the project, whereas I think that we have to make the point that escience is what is going to transform the science. eScience is such a wide-ranging word that it is possible to make such a statement without treading on too many sensibilities. For example, the use of web pages is not escience, but portals are (particularly if they have portlets and stick to standards). Anyways, my point is that this paragraph needs to make two points (if I understand the science argument): first, we are now poised to do some groundbreaking work based on the fact that we believe that there is a way to characterise water runoff; second that to do this properly requires some good escience work.
Last sentence in paragraph 4 of the Introduction: is the primary point that inductive approaches are much less worked out, yet with enough data (observational or model) they could be the most powerful? Can I check, are the terms downwards/upwards correctly applied?
My feeling is that the Introduction needs subheadings because it does too many things. Of course, this is a matter of style and is personal, but my take on this is that to make a proposal readable it needs to have subsections that make one major point only. In our Introduction I might suggest the following subheadings: a) Background, in which we point to the problem, and which might end around the end of paragraph 4 or 5; b) a precis of what we are going to achieve (some of what follows); c) what the partners bring to the project that means that it is timely and that they are the ones to do the work. I think all this stuff is in the Introduction, but it needs ordering as such.
The preceding point would actually resolve three other comments I have about the Introduction: a) the link (or perhaps the textual transition) between science and escience is not very clean at the moment; b) nor is the textual transition between what the two partners bring to the project; c) I don't think we really make a clean statement of the problem we are tackling in specifics, and in particular we do need to make clear why this project is a major project and not something that could be easily achieved. With regard to these three points, I think I understand the points, but the text doesn't make the transitions between ideas. I think that effective use of subheadings will fix this.
I didn't really understand the point in the Introduction about the proposed work not being in conflict with the work of Duan and Gupta.
First aim/objective needs expanding I think.
What is needed I think is a link between the grand vision set out in the Introduction and the specific vision of this project. This should be in the aims and objectives section, but I think that the aims and objectives mix together grand vision and specifics. I would have a grand vision statement at the start of the aims and objectives, and a statement that the grand vision will be achieved by a list of specific task-oriented objectives.
WP1 is unbalanced between sections a and b in my view. The point in WP1 is that there are two sources of data, real observations and the predictions of models. I think that the proposal will read better if both are of more-or-less equal length, but at present the two sections differ greatly in length. The observational part is itself in two parts, namely time-series and spatial. I would make this point in the opening paragraph, and slightly trim the descriptions. The summary paragraph of part a has a lot of detail about specific issues that rather takes away from the main point. Although we have to mention licensing, would it not be better in a footnote?
I would think that WP1 should be the place to describe how simulated data will be generated. Here is the place to say that it will need grid computing to get the required high throughput. I think we have to talk about the simulation codes here, or at least reference them and back-reference to the Introduction.
I am happy to edit b but will want to chat beforehand.
I am not sure about the balance in WP2 – which is my bit. It would be useful to chat.
WP3 probably contains the simulation detail I have said is lacking in WP1. Maybe we need to simply create proper hooks/links as we edit the text. WP3 isn't really linked in to the grid part.
WP4 is where we talk about the user interface. Here I would like to flag the point I made about advice to users being created on-the-fly rather than hard-wired into the interface (see my notes below). The point is that if you hard-wire stuff into interfaces, then whenever the simulation code gains new functionality there will be a lot of knock-on effects. On the other hand, if you can modularise the process such as the way we do it in MaterialsGrid, then it is easy to incorporate new functionality or even use new simulation codes. In short, the approach is that you have XML files that contain all the information that the portal needs to collect, and these files are read by the portal/portlet and turned into requests. The portlet has no code-specific hard wiring, but simply interprets the XML files it reads. XML files are much easier to change than code.
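To make the XML-driven interface idea concrete, here is a minimal sketch in Python. It is not the MaterialsGrid portlet or its file format: the element names, attribute names and the `build_request` helper are all invented for illustration. The point it tries to show is only that the interpreting code contains nothing specific to any one simulation code.

```python
import xml.etree.ElementTree as ET

# Hypothetical task description. The element and attribute names are
# assumptions made for this sketch, not the MaterialsGrid format.
TASK_XML = """
<task code="example-code">
  <input name="temperature" type="float" units="K" label="Temperature"/>
  <input name="steps" type="int" label="Number of steps"/>
  <input name="title" type="string" label="Job title"/>
</task>
"""

# How each declared type is converted from the text a user types in.
CASTS = {"float": float, "int": int, "string": str}


def read_fields(task_xml):
    """Return the fields the interface must collect, in document order."""
    root = ET.fromstring(task_xml)
    return [dict(el.attrib) for el in root.findall("input")]


def build_request(task_xml, answers):
    """Check the user's answers against the task file and build a request."""
    root = ET.fromstring(task_xml)
    request = {"code": root.get("code"), "parameters": {}}
    for field in read_fields(task_xml):
        name = field["name"]
        if name not in answers:
            raise KeyError(f"missing value for {name}")
        cast = CASTS[field["type"]]
        request["parameters"][name] = cast(answers[name])
    return request
```

Adding a new input to the simulation code would then mean adding one `<input>` line to the XML file, with no change to `read_fields` or `build_request`.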

Post from the weekend

A lot of this proposal has a lot of nice links to work we have been doing in Cambridge within all three of our escience activities, eMinerals, NIEeS and MaterialsGrid.

There are some core ideas that link all three efforts, namely

Grid computing can be extremely useful for running ensemble jobs, and generally grid computing increasingly is a route towards availability of computing power.
Data management involves having data within data repositories with access to collaborators in a transparent manner – this is what the SRB gave us, but the key gains of the SRB could be replicated using alternative technologies (our favoured route is via webdav servers).
Data management requires proper metadata capture, and with very rich metadata one can use the metadata as an interface to data.
The key to making grids easy to use is to represent data in XML format. This makes reading data files relatively easy as one gain (we have tools to perform a transformation to XHTML on the fly, with graphs drawn using SVG on demand), but it also makes extraction of information easy. We use this for gathering metadata for output files during their grid runs, for example.
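
As an illustration of why XML output makes metadata harvesting easy, here is a minimal Python sketch. The output format shown is made up for the example (it is not CML and not what our codes actually write), and `collect_metadata` is a hypothetical helper, not one of our tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical output file; element names and values are invented.
OUTPUT_XML = """
<output>
  <metadata name="code_version">1.2</metadata>
  <parameter name="temperature" units="K">300</parameter>
  <parameter name="pressure" units="GPa">0.5</parameter>
  <property name="final_energy" units="eV">-1234.5</property>
</output>
"""


def collect_metadata(xml_text, wanted_properties=()):
    """Gather code metadata, all input parameters, and selected outputs."""
    root = ET.fromstring(xml_text)
    record = {}
    # Everything the program reports about itself.
    for el in root.iter("metadata"):
        record[el.get("name")] = el.text.strip()
    # All input parameters, kept with their units.
    for el in root.iter("parameter"):
        record[el.get("name")] = (float(el.text), el.get("units"))
    # Only the output values the user asked for.
    for el in root.iter("property"):
        if el.get("name") in wanted_properties:
            record[el.get("name")] = (float(el.text), el.get("units"))
    return record
```

Only the list of wanted output properties has to come from the user; the rest is picked up without any knowledge of the code that wrote the file.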

Different projects bring different tools to the table:

eMinerals has developed

a grid job submission that fully integrates data management and metadata capture (RMCS)
a set of XML-writing libraries for Fortran (FoX)
XML transformation tools (ccViz) and plotting tools (pelote)
metadata tools (RCommands)

NIEeS has developed

a KML version of FoX
a grid infrastructure based on standard middleware tools and other Cambridge tools

MaterialsGrid has developed

a portal/portlet interface to job submission and data management
a service-based infrastructure for setting up and running jobs, undertaking workflows, and managing data
a tool for extracting information from XML files based on the use of XML dictionaries (Golem)
a tool for converting XML to SQL

It can be seen in all of the above that XML is important to our work. Our experience is that XML is actually critical, and I would advocate using it within this proposal. The cost overhead is now not high (eg using our FoX tool for Fortran, and there are decent XML tools for other programming languages), and you lose nothing by using it, but the gains are enormous.

WP1

I would advocate using our xml2sql tool for this work. The tasks required here are

Ensure that the simulation codes are writing an appropriate XML. This is a task we are currently thinking about in general. KML is the XML for Google maps, but it is not adequate on its own because it carries information about representation, not the raw data. Different XML namespaces can easily be mixed within the same XML document. A subset of GML might well be useful.
Write a good database schema, which you need for any xml2sql tool.
On the use of SRB, I like SRB a lot, but it is not the only tool we can use and I would advocate that we do some thinking here. The main issue with data grids is over who controls the data. There is clear risk if access to your data is dependent on another institute, because you have no control over continued access. Most data grid products, SRB included, have a model that presupposes that the data grid will last for a long time. Our idea is to use a set of locally-based webdav servers, with a metadata interface that provides access to all the files, enabling sharing of data but without compromising ownership.
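
For the first two tasks above (mixed-namespace XML output and a database schema to load it into), a minimal Python sketch might look as follows. This is not our xml2sql tool: the namespace URIs, element names, table layout and numbers are all placeholders invented for the example, and SQLite is used only because it needs no server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: one for simulation data, one for location.
NS = {"sim": "http://example.org/sim", "geo": "http://example.org/geo"}

# Invented sample document mixing the two namespaces.
DOC = """
<sim:run xmlns:sim="http://example.org/sim" xmlns:geo="http://example.org/geo">
  <sim:station id="A1">
    <geo:position lat="52.2" lon="0.12"/>
    <sim:runoff units="mm/day">3.4</sim:runoff>
  </sim:station>
  <sim:station id="B7">
    <geo:position lat="54.0" lon="-2.8"/>
    <sim:runoff units="mm/day">5.1</sim:runoff>
  </sim:station>
</sim:run>
"""

# The schema has to be designed by hand; the loader only fills it.
SCHEMA = """
CREATE TABLE runoff (
    station TEXT PRIMARY KEY,
    lat REAL,
    lon REAL,
    value REAL,
    units TEXT
)
"""


def load(doc, conn):
    """Create the table and insert one row per station element."""
    conn.execute(SCHEMA)
    root = ET.fromstring(doc)
    for station in root.findall("sim:station", NS):
        pos = station.find("geo:position", NS)
        val = station.find("sim:runoff", NS)
        conn.execute(
            "INSERT INTO runoff VALUES (?, ?, ?, ?, ?)",
            (
                station.get("id"),
                float(pos.get("lat")),
                float(pos.get("lon")),
                float(val.text),
                val.get("units"),
            ),
        )
    conn.commit()
```

The raw data and the location information live in separate namespaces within one document, which is the property that KML on its own does not give us.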

WP2

I am not entirely sure what goes in here, but here are some ideas of things that I think we should do.

We need a grid job submission system that is not tied to any non-standard system. At the present time, the key tool around is Globus. GridSAM is being developed within OMII, and I think we have to mention it, but we have evaluated its current status and found that it is not robust and is lacking in documentation. On the other hand, Globus may well be heavyweight and hard to use for new people, but there is a lot of expertise.

Our approach has been to provide tools that interact with one or more Globus servers. It is completely impractical putting Globus on the users' computer for several reasons (eg need for static IP address and name, issues with installation, doesn't work on Windows). Then you have the issue that writing Globus commands is not easy – well, some are of course, but scripting jobs and workflows on a case-by-case basis is not easy to do and even harder to debug. So eMinerals has developed its RMCS system, and both NIEeS and MaterialsGrid are running an independent RMCS instance.

Let me specify what RMCS does. It is based on a server that will submit Globus jobs. At its heart is a Perl program called MCS (My_Condor_Submit – so called because we use the Condor-G interface to Globus, and thus we use Condor-like scripts). MCS integrates data management with grid computing in one very specific way. It grabs its data files not from the user's computer but from the data grid, and it writes all output files back into the data grid rather than sending them to the user's computer. The user can then access the files from the data grid. One key advantage of this is that the user ends up with a complete and reasonably-protected archive of all files associated with a job, without having to do anything about it. In short, MCS builds data curation into the job submission process.

Moreover, MCS will collect metadata automatically, by extracting metadata from the output XML files. We collect various sorts of metadata, including stuff about the job environment (date, machine etc), all metadata the program throws out (eg code version number), all input parameters, and core output values (these are the only bit that the user needs to specify). MCS requires a relatively lightweight script from the user that allows the user to specify information about the location of directories in the data grid, name of executable etc, in a simple format.

MCS is the tool that does the job submission and data management. RMCS is the way the user interacts with MCS. RMCS basically consists of a server and a database. The server receives instructions via web services from client tools, it submits the MCS job, and it keeps records of how the job is doing in the database. In practice the client tools interact directly with the database (a bit of side information). We have 2 client tools. One is a set of shell commands for tasks such as submitting a job, checking the job status, and deleting a job. The other is a Java GUI which basically does the same thing except allowing users to press buttons rather than type in commands.
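
The stage-in, run, stage-out pattern that MCS follows can be sketched in a few lines of Python. This is an illustration of the pattern only, not of MCS itself (which is written in Perl and talks to Globus via Condor-G): here a local directory stands in for the data grid, and `run_staged_job` is a name invented for the sketch.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_staged_job(repository, input_names, command, output_names):
    """Copy inputs from a repository, run a command, copy outputs back.

    `repository` is a plain directory standing in for the data grid.
    The job runs in a scratch directory that is discarded afterwards,
    so the repository ends up as the only archive of the job's files.
    """
    repository = Path(repository)
    with tempfile.TemporaryDirectory() as scratch_name:
        scratch = Path(scratch_name)
        # Stage in: inputs come from the repository, not the user's machine.
        for name in input_names:
            shutil.copy(repository / name, scratch / name)
        # Run the job inside the scratch directory.
        result = subprocess.run(command, cwd=scratch)
        # Stage out: outputs go back to the repository.
        for name in output_names:
            if (scratch / name).exists():
                shutil.copy(scratch / name, repository / name)
    return result.returncode
```

A call such as `run_staged_job(repo, ["in.txt"], [sys.executable, "job.py"], ["out.txt"])` (with a hypothetical `job.py` also staged in) leaves every file associated with the job in the repository without the user having to do anything about it.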

Now the RMCS system allows any process to send off web services calls, so it can be used in conjunction with any portal. This is what MaterialsGrid uses for its portal-based job submission system. Because everything is done using 'standards', it works well.

RMCS can submit jobs to any system that uses standard middleware such as Globus. Thus it works on things like the National Grid Service (worth a mention), and should work on EGEE (but we haven't tried it). We also have it working on the NW-grid. There is a need to have some things installed on the grid resource besides Globus; external resources will need XML, metadata and data grid tools, but these can be installed on a user basis if we can't get them installed system-wide (we have them system-wide on the NGS).

The point of this discourse is to note that I would advocate using this system for the proposed simulation runs. If we do, then our coding effort is raised to a higher level of interfacing with RMCS rather than with the underlying Globus calls. We get a lot "for free" with RMCS, as MaterialsGrid has realised.

I am not clear as to how much workflow is required. There are various routes to this.

If the workflow is fixed and straightforward, it can be executed within a shell script. This requires little effort really, and we do it when we submit tasks that involve a mix of simulation and analysis.

In other cases, the workflow may be generated (or at least defined) by the code's XML dictionary, and put together by a tool such as the portal. This is exactly what MaterialsGrid does. But you then have a question of which workflow tool to use? We use Pipeline Pilot because it works really well, much better than BPEL (tools are buggy, and only implement parts of the BPEL standards), but BPEL is free and PP is not.

WP4

The idea of having an interface guide the reader through the issues associated with a program is something we have been working on in Cambridge, and I would be keen to include it here.

The idea is that in a general framework, any hard-wiring against the requirements of a specific code immediately throws away the generality. For specific projects where no-one will have significant changes of mind, this need not be a worry, but in the wider picture, it is nice to have no code-specific hard-wiring within any infrastructure.

Within the MaterialsGrid approach, this is tackled using the Golem tool in combination with an XML task list. This is accessed via a special portlet written for the MaterialsGrid portal.

To get this working, one needs to create a sample output file from the code. The Golem tool can then be used to create the dictionary, in terms of defining for each item its data type, units etc. The human-readable part can be added later (and is worth the effort).
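
To show the kind of thing involved in drafting a dictionary from a sample output file, here is a minimal Python sketch. It does not use Golem and does not reproduce Golem's dictionary format: the sample file, the `draft_dictionary` helper and the type-guessing rule are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample output file; not the format of any real code.
SAMPLE = """
<output>
  <value name="temperature" units="K">300.0</value>
  <value name="steps">1000</value>
  <value name="title">test run</value>
</output>
"""


def guess_type(text):
    """Guess a data type by trying the strictest conversion first."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "string"


def draft_dictionary(sample_xml):
    """Build one dictionary entry per named value in the sample file."""
    root = ET.fromstring(sample_xml)
    entries = {}
    for el in root.iter("value"):
        entries[el.get("name")] = {
            "type": guess_type(el.text.strip()),
            "units": el.get("units"),  # None when the file gives no units
            "description": "",         # human-readable part, added later
        }
    return entries
```

The empty `description` field corresponds to the human-readable part that can be filled in afterwards.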

Summary

What I have done above is give my take on the grid stuff. I think that the next stage is to liaise as to how to link this in specifically. I think that the grid roadmap is reasonably clear, just as the science roadmap looks clear to me. What we have to do is ensure that the two match seamlessly, which means (in my view) adapting the grid stuff to the science drivers. I am sure this is not hard.

We will need to make some specific comments from our side, such as the design of the data grid, the design of the portal, and the XML-isation of the simulation codes. The latter is a bit of work only in one sense, namely defining the XML language. The actual mechanics of adapting the codes is now straightforward.

But this is stuff we want to do anyways!

Comment on "Mac OS X 10.5 (Leopard) first impression"

martin | weblog comment | 16-Nov-2007 08:16

Updated to 10.5.1. First impressions are

Back-to-my-mac (including both file sharing and screen sharing) now appears to work more consistently than before.
Internet sharing giving out IP addresses via DHCP seemed not to work.
I note that the firewall now tells more of the truth, but that is cosmetic.

Other point to make is that I am really liking QuickLook, which works well in several different contexts. I would add that since this is linked to CoverFlow, I actually realise that CoverFlow is useful for browsing (particularly when pruning out old useless files) because you can see what you are looking at without having to open it.

Comment on "Mac OS X 10.5 (Leopard) first impression"

martin | weblog comment | 13-Nov-2007 10:03

I agree that it is not as impressive as I had hoped and this has led to a feeling of disappointment.

There are some things about Spaces that are better than VirtueDesktop, in my opinion, and some things that were better IN VirtueDesktop. For example, I think using the exposé-like interface for Spaces is better than the VirtueDesktop interface, but VirtueDesktop did not tie an application to a window if you docked it [I found that was a useful way of moving applications from one window to another - shrink application into dock, move to other window, unshrink from dock] whereas with Spaces it seems the only way to move an application from one window to another is to press F8 to present the set of shrunken windows and drag the application from one space to another. Once you have opened an application in a window it is tied to that window even if you have docked it. I find this a particular irritation with iTunes - I want it to be playing but out of the way most of the time - with VirtueDesktop I could just unshrink it from the dock quickly and then put it back again and get on with whatever I was doing, but with Spaces I get switched to another window (whichever one I opened iTunes up in) and then have to switch back to where I was working - this is not something huge, but we're supposed to be saying "it's the little things that Apple gets right that make the whole experience better".

The new dock, with stacks, is also a disappointment - I couldn't care less about the glass shelf because I always have my dock on the side of the screen and never see it. The look for the dock on the side - a translucent dark grey - is fine if you have a light coloured desktop but awful (in my view) if you have a dark desktop (as in the new default desktop image). Hence my first reaction to the new dock was "yugh!". I'm over that, but stacks is still an issue. I don't like the way the icon for a folder changes to the icon for the first item in whatever sort order you have chosen. It is no longer obvious what is a folder and what is a file. When you open a stack with many files the grid display is unusable. It is "cute" to see the thumbnails of the first pages of some documents and the quite pleasant images for some generic file types, but I am well past the stage of limiting my file names to 12 characters, which means that most of the names of my files are truncated and illegible until I call up a proper finder window. Two clicks instead of one. One step forward two steps back.

The biggest plus, I think, is that Leopard seems to me (subjectively) to be a little quicker.

I think iCal is supposed to be much more usable for sharing calendars than before - I will check that out and hope it is true.

Mail has a few tweaks - the auto-detection of names and dates and the integration with iCal will, I think, be quite useful - but this was not an obvious thing - I stumbled on it by accident when my cursor was floating above the word "tomorrow" in a mail message and a little drop-down menu appeared. I can imagine some people using the new Notes and To Do list features, but I don't think I will be one of them - they are not bad features, it's just that I have a way of working already and they don't fit in.

Similarly, I don't think I need iChat to share files. In the middle of a chat I might use it but it doesn't solve a problem I had.

Time Machine is obviously a good idea, but I have not tried it out and probably won't in the near future - I don't need another backup option on the desktop, and my laptop really does spend most of its waking life on my lap, and not connected to a massive external disk. The initial promise of being able to do this through one of the new Airport base stations was appealing, but was withdrawn just before release of Leopard.

The interface to Spotlight is better than it was before, but still does not give users access to the full power of the underlying engine.

In a sense, I am frustrated by this upgrade because I think it really IS better than Tiger, but as far as the user experience is concerned it is mostly by small increments. And the new features seem promising but do not fulfil the promise yet. I wouldn't willingly downgrade to Tiger, but I find it hard to pinpoint why someone using Tiger should rush to upgrade.

Will we see a drawing package in iWork 09?

martin | weblog | 11-Nov-2007 19:16

I recently had to produce a number of diagrams for some talks and papers. In the end, I found that the best tool to put them together was Keynote.

Which made me think: Apple has all the components for a really good drawing program. They could market it as "Graphics for the rest of us".

I do hope that Apple are thinking this way. It could be the next great app for iWork 09.

Apple iWork and mathematical equations

martin | weblog | 11-Nov-2007 19:13

iWork's Pages and Keynote are less than completely usable for scientists for one simple reason: unlike Microsoft Office, there is no equation editor. My documents and talks always contain equations in one form or other.

I had always assumed that something like an equation editor would be too expensive to be included within iWork. iWork is priced at a remarkably low cost (well done Apple!), and Microsoft license their equation editor from a third party. So an equation editor for iWork seemed out of the question.

But then I discovered that Apple's Grapher application has more than the basis of a decent equation editor. It lacks the h-bar symbol, and one or two other things, but in most respects it is almost suitable for inclusion within the iWork apps.

So will an equation editor feature in iWork 09? Pity about not making it into 08; it could have been in there.

Mac OS X 10.5 (Leopard) first impression

martin | weblog | 11-Nov-2007 19:05

I have been with OS X 10.x since x was zero (I am afraid that I didn't ever try the public beta because it seemed to be impossible to actually work with it). It was fun being an early adopter. Things were rough around the edges, but we were getting an idea of a new way of working. I still remember when I first saw the dock with its magnification, and it was striking just how different it was. 10.1 came as a free upgrade (what else could Apple do – 10.0 really wasn't ready for production use except for the adventurous few). 10.2 saw what I thought was a proper professionalisation of OS X, and 10.3 saw what I thought was more-or-less the completion of the transition (with new features such as Exposé and fast user switching). During these versions we saw the introduction of tools such as iChat, and public beta versions of iCal, iSync, Safari and X11. It was actually quite exciting.

Then we got to 10.4, which seemed to me to be somewhat underwhelming. I fear I have hardly made any use of widgets at all, mostly because they didn't seem to give me anything of value that a bookmarked browser wouldn't give. Spotlight seemed like a decent idea, but over time it seemed to me to be not quite as useful as I had been hoping. 10.4 struck me, in conclusion, as being a set of small upgrades that certainly improved things in many ways (including new tools such as Dictionary), but without any obvious vision.

So we come to 10.5, which seemed to me to be prefaced by quite a lot of hype. Having now had it running on my three systems for a few days, I am no less underwhelmed than I was with 10.4. In fact it is worse, because some of the things I was hoping for seem not to work. In fact, I wonder whether in the rush to get 10.5 released (remember it was slowed down by the iPhone – it is almost as if 10.5 took too low a priority within Apple) there are too many loose ends still remaining to be tied. Take for example the translucent menu bar. Whilst this appears to be reviled by many, there appears to be no way to change how transparent it is. But there are a number of systems for which there is no transparency at all, and a strange note to this effect in Apple's knowledge base. I don't care whether I have a transparent menu bar or not, but the fact that I am supposed to have it and don't suggests to me that we will be seeing some bug fixes in the coming months.

The feature I was really looking forward to was "Back to my mac". I have a .mac account and I do need to access my home computer from other places. But it just doesn't work, and again, there is a page on this in Apple's knowledge base.

Spaces is a nice feature and appears to be better implemented than VirtueDesktop which I had before (I had to give up on that because palettes became separated from main windows in applications such as Pages – Spaces appears to do a better job in this regard). However, it is not near to being perfect. I don't always end up in the right place when going to a different application, and too often I watch it going around various windows before it finds what I want (making one almost sea sick!). It doesn't always go to an open Finder window correctly.

I will never use Bootcamp nor Webclip. Dictionary having access to wikipedia is neat, but hardly essential. When I buy a new disk Time Machine looks worth a look (I don't want to overwrite my current backup disks). I hope that Spotlight proves more useful than its previous incarnation. I am hoping that Stacks will work, but I suspect it will need a bit more work than I was hoping. Finder coverflow seems to me to be too slow to be useful. On a negative, something in the security model (probably with TLS) has killed my use of my department's email system.

In short, 10.5 looks different round the edges, and it seems to have some small useful tweaks (like Dictionary), but I see no big vision. In fact I would say that the number of small tweaks is actually quite impressive, and I can imagine it being fun coming across them one by one. For example, I have just come across the link between Address Book and Google Maps. It is nice. Screen sharing could be interesting, but when I tried it out my daughter was using my desktop and it came down to a tool for spying! 'Creepy' was the reaction. Icon preview could be useful, but there will be times when it might be a nuisance. Quicklook might be better.

What seems to me to be a pity is that some nice tweaks could be given away for free in the incremental updates. For example, adding tabs to terminal, or obtaining address information from emails, are small tweaks that would be nice to suddenly find in an incremental update. The fact that Apple stores these up for one of the 300 new features in a large paid-for upgrade suggests to me that Apple is running out of a big vision for OS X. Of course, it would be churlish to complain, because effectively we now have a very robust and usable operating system that is apparently very hard to improve upon.

iFort vs G95 compiler performance

martin | weblog | 21-Aug-2007 20:18

I had been told that g95 is not as efficient as other Fortran95 compilers, but I have been amazed at today's experience.

Job times for my simulation with identical parameters on the National Grid Service:

g95 compiler: 2762 s

ifort compiler: 468.3 s

This is a factor of 6 difference!

I wasn't doing anything clever with either compiler. Presumably both compilers can be tweaked in their performance, and the gap in performance narrowed.

File publish: ossia2007.f90

martin | file | 21-Aug-2007 19:31

ossia2007 Fortran90 source file

Visualisation of atomic configurations - tool possibilities

martin | page | 17-Aug-2007 08:25

Here are some ideas for visualisation, mostly as per earlier discussions

A visualisation tool as a workbench, with incorporation of the following codes (mostly Fortran)

CRUSH for rigid unit modes
Group theory analysis (GROUP code)
pair (radial) distribution function

Analysis of many configurations or structures

Read many files at once (Apple's "open *.tag" command does this at the shell level)
Read many configurations from one file (eg molecular dynamics history or trajectory file)
Constant scale and range for all files
One action affects all (eg change view, scale, orientation)
Change size of window for each (ideally perhaps have an automatic window size that matches the view of the configuration)
Compute some averages across all configurations (eg accumulated pair distribution function)
Easy creation of animations from molecular dynamics trajectory files
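
As a sketch of what "averages across all configurations" could mean for the accumulated pair distribution function, here is a minimal Python version. It assumes a cubic periodic box, the same number of atoms in every configuration, and `r_max` no larger than half the box length; these assumptions and the function names are mine for the sake of the example, and it is not taken from any existing code.

```python
import math


def pair_histogram(positions, box, r_max, n_bins):
    """Histogram of pair distances for one configuration (cubic box)."""
    counts = [0] * n_bins
    width = r_max / n_bins
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                delta = a - b
                # Minimum-image convention for periodic boundaries.
                delta -= box * round(delta / box)
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                # Clamp guards against rounding at the upper edge.
                counts[min(int(r / width), n_bins - 1)] += 1
    return counts


def accumulated_pdf(configurations, box, r_max, n_bins):
    """Pair distribution function g(r) accumulated over configurations."""
    total = [0] * n_bins
    for positions in configurations:
        hist = pair_histogram(positions, box, r_max, n_bins)
        for k, c in enumerate(hist):
            total[k] += c
    n_atoms = len(configurations[0])
    density = n_atoms / box ** 3
    width = r_max / n_bins
    g = []
    for k, c in enumerate(total):
        # Volume of the spherical shell covered by bin k.
        shell = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * width ** 3
        # Pair count expected for a uniform system of the same density.
        ideal = 0.5 * n_atoms * density * shell * len(configurations)
        g.append(c / ideal)
    return g
```

The double loop over pairs is fine for a sketch but scales as the square of the number of atoms, so a real workbench tool would want something faster for large configurations.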

Input file formats

XML (particularly CML) files
Input from some standard codes (eg DLPOLY) that don't have outputs in standard formats

Tools for configurations

Move origin on a mouse (only for P1) as per atomeye
(From before) Analysis include pair distribution function, average bond length, average coordination number

File publish: testfiles.zip

martin | file | 15-Aug-2007 17:21

ossia2007 test files


SciSpace

The social networking site for scientists

Martin Dove
