Friday, October 16, 2009

CrossRef Labs - worth a look!

If you work in the "information space", then it's nice to know that CrossRef have some developmental services, but there's one, (okay, two), in particular that I really like the look of:

  1. Metadata Search complete with an OpenSearch plugin. This is not, of course, a complete "search": "Instead, CrossRef Metadata Search is focused on allowing researchers to lookup citations using only terms that might appear in the bibliographic metadata of the item they are searching for."
  2. an undocumented feature which I discovered: a more RESTful service providing the metadata for a given DOI. This is of the form: {DOI}.xml

Sunday, September 27, 2009

txtckr stage 1

Well, I've just committed a batch of updates to txtckr, which has finally moved beyond a mix of php & pseudo-php to a stage where I can run it on a laptop without any errors (with the included very simple test).

It's almost at the stage of adding the finishing touches to the name handling, where the various openurl name parts get humpty-dumptied again. It's just skeleton coding at this stage, but hopefully a pretty sound basis for what's to come!

Thursday, September 17, 2009

Google Co-op: EBSCO Connect search

There's been a bit of talk about EBSCO Connect content appearing on Google, and while it's not necessarily the sort of thing that thrills me, I found it was easy enough to create yet another Google Custom search, which would allow the retrieval of EBSCO Connect material only.

There appears to be quite a bit of content there - give the search a go, and leave a response if you feel that way inclined...

Thursday, August 6, 2009

Dealing with humans' (names)

Recently I hit a slight snag on a fairly common problem... dealing with names. This is a problematic area, given that everyone has one, and trying to build in what we know about names into software is actually a bit of a slog!

What I'm doing is trying to parse names, (mainly author names), for txtckr, so that one of the output display formats could be a reference, (APA, for example). To do this, I also need to untangle the "" information which is delivered through OpenURL, and I'm trying to build in some "forgiveness" to allow for people/companies that don't follow specs properly!

Things to consider:
  • with a full name, is it supplied first-name(s) last-name/surname, and if so, where does the surname begin? This is fine for a fair number of relatively simple names, but what about surnames which aren't, such as "van der Weerden"?
  • if you're going to receive name fragments, how do you build these sensibly into software, so you can give permutations of the name, e.g. Pasley, Tom == Pasley, T. == Tom Pasley == T. Pasley?
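Both points above can be roughed out in a few lines of Python. This is only a sketch under loud assumptions: the function names are made up for illustration, the particle list is a tiny sample rather than anything authoritative, and real names will break it in plenty of ways.

```python
# Hypothetical helpers; PARTICLES is a small illustrative sample, not a complete list.
PARTICLES = {"van", "von", "der", "den", "de", "du", "la", "le", "ter", "ten"}


def split_full_name(full_name):
    """Split 'Given ... Surname' into (given_names, surname).

    The surname is taken to be the last word plus any particles
    (van, der, de, ...) that sit directly in front of it. The first
    word is always kept as a given name.
    """
    words = full_name.split()
    start = len(words) - 1
    while start > 1 and words[start - 1].lower() in PARTICLES:
        start -= 1
    return " ".join(words[:start]), " ".join(words[start:])


def name_variants(given, surname):
    """Return the four permutations mentioned above for one person."""
    initials = " ".join(part[0].upper() + "." for part in given.split())
    return [
        f"{surname}, {given}",
        f"{surname}, {initials}",
        f"{given} {surname}",
        f"{initials} {surname}",
    ]


given, surname = split_full_name("Tom Pasley")
print(name_variants(given, surname))
```

Treating all four strings as equivalent then comes down to comparing the sets of variants, although initials alone will obviously produce false matches between different people.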
No doubt I'm not the first person to tackle this problem, and I'm probably over-thinking things slightly, but I'm open to tips about projects that deal with this, or from anyone else who's tackled this problem...

Friday, July 17, 2009

What if OpenURL resolvers could blog?

I thought about this when I was thinking about having support for unAPI, etc. I found Mike Giarlo's plugin for WordPress, which added this, and I could make it work for txtckr, but why should I?

I may have overlooked something, (since I don't use WordPress), but:
  • what if WordPress was the OpenURL resolver, (well, actually it wasn't but just looked like it)?
  • what if txtckr could redirect to the WordPress post which had the request response once it had made a post which contained all of that info?
The advantage of automatically posting the details and output of an OpenURL resolver, (and subsequent redirection), to a blog post is that it increases the discoverability of the item being requested.

There are also the tools available through something like the WordPress platform, which further promote the re-distribution of information about articles, books, etc., including COinS, RSS feeds, OAI-PMH, unAPI, etc.

Of course there's also the ability for others to comment and refer, (trackback), on the item being represented in the blog post.

Wednesday, July 15, 2009

txtckr under development

While I was working at Crop & Food Research, I developed an OpenURL resolver called textseeka.

txtckr is a GPL, (open-source), OOP focussed, replacement for textseeka, based on what I know now, in terms of programming, OpenURL, webservices and metadata sources.

This is not a simple overnight project, complicated by the fact that I need access to the original code-base, which is still at my old work. Over time, (snatched here and there), txtckr will be fleshed out, starting from the bones that are there now, (currently just a class for the "context object").

Saturday, July 4, 2009

If you can't beat them, join 'em...

I've been at UCOL Library for a while now, which means I'm firmly part of the Ex Libris picture, as we have Voyager, which is quite a part of my job, as the Information Systems Librarian.

As such, I've spent quite a bit of time putting lipstick on the pig... and one of the features I've implemented recently is an APA citation for each of the books in our catalogue, which is bottom-rightish. This picture might be a little fuzzy, (but you can't see it unless you're on-campus anyway...):

Initially, I thought this would be quite easy, since I thought we must have at least some OCLC-derived records in our catalogue. The APA citation service requires an OCLC number, and I learnt that there are not that many records which have OCLC numbers... but there are, of course, lots which have ISBNs.

The solution required writing some web scripts which:
  • do a lookup on the ISBN, and get the OCLC number
  • use the OCLC number to get the APA citation
  • generate JSON output so this is accessible to the browser
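The three steps above chain together along these lines. This is a sketch only: the two lookup functions are passed in as parameters because the actual web services aren't named here, so `isbn_to_oclc` and `oclc_to_apa` are hypothetical stand-ins, and the sample values are placeholders rather than real identifiers.

```python
import json


def citation_json(isbn, isbn_to_oclc, oclc_to_apa):
    """Chain the two lookups and return a JSON string for the browser.

    isbn_to_oclc and oclc_to_apa are caller-supplied callables
    (hypothetical stand-ins for whatever services do the real work).
    """
    normalised = isbn.replace("-", "").strip()
    oclc = isbn_to_oclc(normalised)
    if oclc is None:
        # Not every record has an OCLC number, so report that cleanly.
        return json.dumps({"isbn": normalised, "error": "no OCLC number found"})
    return json.dumps({"isbn": normalised, "oclc": oclc, "apa": oclc_to_apa(oclc)})


# Placeholder data, purely for illustration.
fake_isbn_table = {"0000000000": "12345"}
print(citation_json("0-00-000000-0", fake_isbn_table.get,
                    lambda oclc: "Author, A. (2009). Title."))
```

Keeping the lookups as parameters means the same function works whether the data comes from a remote service or a local cache.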
Actually, thinking about it, I should be able to do this with javascript through and through... shouldn't I?

Isn't this what LibX does? Maybe this is part of what is currently possible with their LibApps?

Saturday, June 13, 2009

NTS:: When Sansa's go bad...

[Legal Warning: By following these instructions, you agree not to hold me liable for any bad sh*t that happens to your MP3 player.]

In case this happens again, here's how to un-brick a Sansa Express... in Windows XP, with access to the Administrative Tools (I'm not sure how to do the same in Linux - any ideas?).

I'm unsure how my son's Sansa Express was bricked, but a common cause for "bricking" is the inability of the Sansa Updater to finish the complete process of updating the firmware. A complicating factor can be the lack of a newer version of the firmware... so you need to have an older version of the firmware on hand [thankfully chrisjs has a copy].

Okay, here's what to do:
  1. Unzip the files from the 7zip file, (see link above), into its own directory.
  2. Hold the Volume Down - button on the Sansa Express while inserting it into a USB port on your computer, and hold for about 20 seconds.
  3. Release it, and hopefully in Windows Explorer you should see 2 Flash Drive symbols, but when you try to use them, they're empty, and you can't format them either (they're 0Mb in size :])
  5. In my case, access to the Admin Tools is off the Control Panel - the one that I've used is Computer Management.
  6. Under Storage, click on Disk Management
  7. In the top right panel, locate your dud "Flash Drive"... it shouldn't indicate a file system, because that's what a bricked Sansa looks like...
  8. Right-mouse click on the Disk Drive icon in the left column of that same panel, and you should see the option to format your Sansa - (at this point I can't remember if I used FAT as the filesystem, or FAT32).
  9. Format the Sansa
  10. Then go back to the directory with the unzipped files, and use SansaExpressUpdater.exe
  11. You should find that the Sansa Updater will complete the update, and when the process is complete, the Sansa Express should be un-bricked.

Thursday, May 21, 2009

Farewell, Hi UCOL

Well, I'm in my "break period" between jobs.

I start as the Information Systems Librarian at UCOL on Monday, so I might not be posting much for a little while, while I get my head around Voyager, etc.

I really enjoyed my time at Crop & Food Research, (which merged with HortResearch and became Plant and Food Research 1 December 2008), and then at Plant and Food Research. It was a really interesting place for me, as a librarian, to work: I felt appreciated by other staff, (particularly "the scientists"), I had great people to work with, and there was plenty to get my teeth into! There was also Magic for a while at lunchtimes, and though I seldom won, it was good fun.

So, thanks to all of the ex-CFR/PFR staff - those who were kind enough to put comments in my card, and for their farewells, and also for the lovely farewell gifts. I really enjoyed my time with you guys, and I hope to keep in touch!

The laptop bag I'm especially chuffed with, since it's also a backpack - great for those times when you're tired, and your laptop seems to weigh a ton! (- no, I don't have a 17" laptop, but I'm optimistic...).

Saturday, May 16, 2009

Librarian-developed/enhanced resources

Okay, I've hinted at this in my last post... in this category would be anything that (any group)/Library/Librarian develops for/with its clients/users - some examples:

  • OpenURL resolver logged data (and no, I'm not meaning like Ex Libris bX). This is meshed-up data from a variety of web-services which your OpenURL resolver consumes to deliver the data and services which are displayed to Library clients/users as a result of their request.

  • Librarian-developed interfaces - an example being something like a Z39.50 interface to an existing catalogue/dataset which presents the data in a different way... like a more user-friendly interface which shows individual results in more detail, with an OpenURL link to request them (as I demo'd at work to Ann Barrie from NLNZ - I'll post a screenshot if I get time).

  • LinkedData - there's plenty of this stuff being done including on the dataincubator site (no, I'm not employed by Talis, and I'm not sure how many of the people in this group are actually "Librarians")
So, as you can see, it's not just formal/normal "vendors" who provide "information resources". Why does Ex Libris seem to miss this point?

Wednesday, May 6, 2009

Picking fights with giants...

A while ago I picked a fight with Ex Libris, mainly about their EL Commons, mainly on two points (although I'd hoped they'd be moot points now):
1) "I don't think it's appropriate for Ex Libris to promote "El Commons" as part of an "Open-Platform Strategy" though - I think you should promote it as a service to the "Ex Libris community". There's a big difference between the two, and I believe Ex Libris are mis-using the word "open" in this context."

2) I'd "expect that there would be a system where developers, would be able to at least view other developer contributions and either:
- submit details about resources for which they would like Ex Libris to provide adaptors
- provide plugin documentation to create suitable adaptors themselves (if they have "Documentation Center log in details")"
This was following on from an October 2008 post on a Talis blog [ ], which ended on the note that:
Well El Commons is up running and accessible at Unfortunately as Oren hinted, you can only enter the commons with your Ex Libris Documentation Center or SupportWeb user name and password – a bit of a misuse of the generally understood idea behind a commons methinks.

What I didn't manage to convey in the last email I sent, (partly because I got bored with them not biting back, and never finished my very last email), was the fact that I thought Ex Libris was overlooking another important information resource - Librarian-developed/enhanced resources.

[more about this later...]

Sunday, May 3, 2009

NTS:: CrossRef alternatives

CrossRef provide a great service with their DOI/OpenURL resolver. There are times, however, (esp. since their service was not designed to be a real-time service), when it goes down, or is unreachable from our network (at work).

Thankfully, you are able to cache the data, as Chuck Koscher recently commented - but what about alternative services?

PubMed is good for recent articles, and other services (such as Web of Science, etc.) are probably no better, unless they also happen to be able to access the CrossRef OAI-PMH service. You can probably do this search with other NLM databases too.

Example PubMed search:

10.1002/jmv.21494 [aid]

The [aid] is, predictably, the "article identifiers submitted by journal publishers such as doi (digital object identifier). These data are typically used for generating LinkOut links." These include PIIs (Publisher's Item Identifiers), used by Elsevier (and some other publishers). This search can be used as part of a PHP script, such as Alf Eaton's.
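As a rough Python equivalent of that kind of script (not Alf Eaton's code, just a sketch), the search can be sent to the NCBI E-utilities `esearch` endpoint. The function name here is made up, and this only builds the URL; fetching it and parsing the XML that comes back is left out.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_aid_search_url(identifier):
    """Build an esearch URL that looks an article identifier up in PubMed.

    The [aid] field tag restricts the search to publisher-supplied
    article identifiers such as DOIs and PIIs.
    """
    term = f"{identifier} [aid]"
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})


print(pubmed_aid_search_url("10.1002/jmv.21494"))
```

The response lists any matching PMIDs, which makes this a workable fallback when the CrossRef resolver can't be reached.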

On a related note, PMC ids are also appearing in PubMed XML - to link to PubMedCentral with:
PubMedCentral ID (PMCID):
PubMed ID (PMID):

Tuesday, April 28, 2009

Cowtrails and standards

Comparing these two, the first thing you'd notice is that cowtrails are easier to follow... while standards give a sound base.

If you have to implement a service which conforms to a standard though, you need to have something readable, and standards, while normally explicit, don't necessarily go into much detail about the philosophy or thought processes behind them. I suppose there's the expectation that, if you're reading this standard, you're already "there".

I'm not sure about anyone else, but I found, despite comments like "you don't want to know WTF we were thinking" that getting some idea of these thought processes was useful. Admittedly, my experience is limited, and I've heard various comments about the Z39.88-2004 being over-engineered... but once I saw some of the stuff by Jeff Young on his Q6 blog, OpenURL 1.0 made sense.

Sunday, April 19, 2009

Cow trails and standards (prelude)

Just when I'm getting my head around something, I hear that what I've learnt could quite likely be irrelevant... such as OAI-PMH (OAI-ORE) and this:

"I have little doubt that Atom, a more widely supported specification than OAI-PMH, could supplant OAI-PMH, and I think that would be a good thing. Why? Because we could begin to see a dismantling of the silos that hold “library” or “archive” material on the web and keep it distinct from all of the other great stuff on the web."

Thankfully, @pkeane has promised an update on this... it's going to be interesting to see how our perceptions change over relatively short periods of time (there's always disruptive technology running amok somewhere).

Also via twitter, (love the way you can talk to anyone - easier than on the street! It's a pity you only get to overhear some of the conversation though - it'd be nice to have more of the threading made obvious ala NNTP/newsgroups):

Roderic Page made the comment that "Trick is to have people use it, so that the links get out in the wild" - I'm not sure exactly what this was referring to, but it made me think about standards and cow trails... (now I just need to untangle my thoughts about this...).

Friday, April 10, 2009

twitter (bye, facebook)

You can now reach me at @tompasley on twitter.

I've ditched facebook, so although I decided to leave the option for people to still tag me on photos, etc., my facebook account has gone. Why? I wasn't checking/visiting it often enough, let alone making any postings... I was pretty much cyber-squatting. I've since discovered friendfeed, which was a bit thick (duh), but somehow, I don't think that would've made enough of a difference.

Ultimately, I feel the concept of friends on facebook is artificial and flawed. I prefer the default of twitter, (though I reserve the right to change my mind!), which lets you communicate with anyone, and block those you don't want to hear from... it's up to those twittering to say something worthwhile.

This way, I can approach people, who do some cool stuff, and if they don't want to hear from me, then they can just ignore me or block me, rather than have to jump through some hoops first, and without having to worry about their email addresses, etc.


Saturday, March 28, 2009

NTS:: Metadata sources

More sources of article metadata:

This data is useful because it groups related ids for each "view" of a resource (e.g. PMID & DOI for the same article). "
Article linkout data Mapping CiteULike article_ids to resources on the web can be done with the linkout table.
"The current snapshot is available at Data is available from 2008-02-02 onwards."
"To understand the data in this file, you should refer to "The linkout formatter" section of the plugin developer's guide. This file contains a number of spam links. Although CiteULike filters spam postings, traces of the spam still remain in this table. In time this spam content will eventually be removed. The file is a simple unix ("\n" line endings) text file with pipe ("|") delimiters. Literal pipes within the fields are represented escaped ("\|"). The columns are: 1. Article Id 2. Linkout type 3. ikey_1 4. ckey_1 5. ikey_2 6. ckey_2
NB If an article has n linkouts, then this will result in n rows in the file."
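Going by the description quoted above, each line of the file can be read with something like the following. It's a sketch with assumptions: the function name is made up, the sample rows are invented rather than taken from the real file, and it only handles the escaped pipe ("\|") case that the description mentions.

```python
import re
from collections import namedtuple

# Column names follow the order given in the quoted description.
Linkout = namedtuple("Linkout", "article_id linkout_type ikey_1 ckey_1 ikey_2 ckey_2")

# A pipe that is NOT preceded by a backslash is a field delimiter.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def parse_linkout_line(line):
    """Split one pipe-delimited row into its six columns.

    Escaped pipes inside a field are turned back into literal pipes.
    A row without exactly six columns raises TypeError.
    """
    fields = _UNESCAPED_PIPE.split(line.rstrip("\n"))
    return Linkout(*(field.replace("\\|", "|") for field in fields))


# Invented sample rows, for illustration only.
print(parse_linkout_line("42|DOI||10.1000/abc||\n"))
print(parse_linkout_line("7|URL||a\\|b||\n"))
```

Since the file still contains traces of spam, it's probably worth wrapping the call in a try/except and skipping rows that don't parse.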

150mb download, but most likely worth it

NZETC (New Zealand Electronic Text Center)
Some solid work, indexed by Index New Zealand, so can use Z39.50 to grab the data from INNZ. I'd like to see what can be done with the TEI-XML too.

Worth listing for the version 2 of their wonderful web services.


Hi, I'm Tom, and I'm not on twitter.. yet.

Twitter might be a great medium, but I find delicious is good enough for my purposes at the moment for my form of microblogging...

I suppose I'm still in 'broadcast mode', though I'll probably get there eventually.

As John C. Dvorak points out, there are a number of applications for twitter, and look how he turned 360!

tivo coming (to New Zealand) finally

Well, we've all seen the ads on our regular tv channels in New Zealand, but it's still going to be a while before we see the tivo here in New Zealand.

In Australia, there were problems with the pricing for the networking add-on, something TVNZ hopefully has learned, although if you're looking at wireless, it sounds a definite maybe: "Availability of TiVo complementary products will be confirmed closer to launch."

It sounds like a nice option to buy the unit outright, but in the meantime, there are some tivo alternatives, which might be worth considering... although it's hard to make comparisons when the final price for the tivo unit has not been established.

Friday, March 13, 2009

NTS:: Some ideas for a Windows-friendly approach for basic indexing of OAI-PMH content (amongst other sources).

Thinking about my brief flirtation with an EAV variant, I've realised that I should probably stick with a traditional structure, (and possibly use views to produce the output I would like).

  1. Grab OAI-PMH content

  2. For each record from OAI source, store it in Postgres (temp table? - alternatively use a table for each OAI source)

  3. Create Postgres indices/tables from xpath queries... a standard xpath query wouldn't work with all OAI sources, (some include non Dublin Core content such as DIDL)

  4. Run Sphinx over select Postgres tables

  5. (Empty the Postgres temp table?)

Then can use Sphinx for other sources as well...
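To make steps 1 to 3 a bit more concrete, here's a sketch of pulling the basic Dublin Core fields out of an OAI-PMH `ListRecords` response. It does the extraction in Python with the standard library rather than with xpath queries inside Postgres, so it's a stand-in for step 3, not the plan itself. The function name and the sample record are invented, and sources which wrap their metadata in something else (such as DIDL) would come back with empty lists.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_records(oai_xml):
    """Return one dict per <record>, ready to insert into a table."""
    root = ET.fromstring(oai_xml)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [el.text or "" for el in record.iterfind(".//dc:title", NS)]
        creators = [el.text or "" for el in record.iterfind(".//dc:creator", NS)]
        rows.append({"identifier": identifier, "titles": titles, "creators": creators})
    return rows


# Invented sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(extract_records(SAMPLE))
```

Each dict maps onto a row in a per-source table, which Sphinx can then index.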

Sunday, March 8, 2009

Biodiversity Heritage Library, Internet Archive

Recently, I've been looking at the amount of material available from each of these sites - the amount of content on the Internet Archive is impressive. The BHL have been harvesting this content, and first documented it this time last year.

That post doesn't mention the Solr interface though. It's worth knowing that the Solr port for the query is 8983, (in case you need to open this port in a firewall to allow it through!), especially as the Internet Archive's Advanced Search sends your formatted request through to the Solr server.

Example request:

there's some doubling up:

q=collection:biodiversity AND oai_updatedate:[2008-01-01T00:00:00Z TO 2008-01-31T00:00:00Z]


qin=collection:biodiversity AND oai_updatedate:[2008-01-01 TO 2008-01-31]
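The `q` value above is easy enough to generate for any date range. A small sketch, with made-up function names; the server address is deliberately left out, so this only builds the query string.

```python
from datetime import date
from urllib.parse import urlencode


def updatedate_query(collection, start, end):
    """Build the Solr-style query used above for a collection and date range."""
    return (
        f"collection:{collection} AND "
        f"oai_updatedate:[{start.isoformat()}T00:00:00Z TO {end.isoformat()}T00:00:00Z]"
    )


def as_query_string(collection, start, end):
    """URL-encode the query so it can be appended to a search URL."""
    return urlencode({"q": updatedate_query(collection, start, end)})


print(updatedate_query("biodiversity", date(2008, 1, 1), date(2008, 1, 31)))
```

Stepping through a year a month at a time keeps each result set to a manageable size.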

Example metadata requests:

Dublin Core:
- "Dublin Core metadata, as well as metadata specific to the item on IA (scan date, scanning equipment, creation date, update date, status of the item, etc)"

Sunday, March 1, 2009

Musical interlude: Del Ray by Sola Rosa


Get it here, more directly

Saturday, February 21, 2009

"Corporate drones" - some thoughts on "standard applications" and innovation...

Why the photo? Well, this post is about the aftermath of thoughts percolating since IO2009, and the result of conversations with various people...

I've heard quite a bit of talk, (in different contexts), about "standard applications". Standard applications are a good thing generally, and definitely make life easier for people who have to deal with IT.

I hope that when people talk about "standard applications", they're not going to wait for vendors to innovate and supply a "solution" - I can think of times when in-house development, or open-source solutions can be useful, at least in the short-term:
  • proving worth/value of a "solution" (e.g. where there's risk that it might fall flat, or conversely, be really popular)
  • no vendor supplied product is ready (often people will develop stuff to suit their own needs, and then be nice enough to share it)
  • there's a "niche" which provides little incentive for a vendor to develop/sell a solution.
  • a vendor supplied product exists, but needs tweaking or customisation... I'm thinking here particularly of Umlaut which is used in conjunction with SFX - which is nice enough to be a relatively "open" system (from what I can tell).
So instead of "corporate drones" who might only think in terms of outsourcing and vendor-supplied products, I'd like to see more "cooperate beings" who try on some level, to:
  • consider the availability of a commercial solution (will there ever be one?)
  • develop and use solutions from other providers
  • put their stuff out there (in-house development might be risky -perhaps this diffuses some of that risk??)
Hmmm... is this a rant, or do I have a point? I'd appreciate your comments!

Tuesday, February 17, 2009

Information Online 2009 (part 3 + end)

I've finished typing up my notes, (last night), and it's just as well... I've had enough!

I'm not sure if I took too many notes, but typing up each session seemed to take too long - often because each of these presentations leads you into exploring other areas... either by "see also" references, or the "what was I doing" effect. Hopefully ALIA will sort out wifi next year, (... and everyone will have a silent notebook/netbook keyboard that has good travel/feedback, and somehow, that'll include me!)

I've tagged most of the stuff I've come across, (I can't say I've been that systematic, but anyway) - here's what I've added to delicious for io2009.

Monday, February 16, 2009

James Robertson's (not quite) presentation: A18 "What do innovative intranets look like?"

Not exactly what James presented, but the same subject matter, so you get the idea... [edit 17/2/2009: actually this first Slideshare is closer... ooops!]

Full paper is still "coming soon"

Thursday, February 12, 2009

Information Online 2009 (part 2)

This session was one that I really liked - stuff you can use, especially as libraries are often having to prove their worth, and quantitative techniques never tell the whole story. Below are my notes, but I see Nerida has submitted the whole paper, so it's worth reading that (rather than my hastily scribbled notes, badly transcribed)

Session C14 – Nerida Hart, Land and Water Australia

“Evaluating information and knowledge services using narrative techniques – a case study”

This session is based on a methodology that was first used 2.5 years ago, but has been re-used numerous times since then... used in a special library, with 5 agencies, 40 staff, and 40,000 clients.

Wanted to focus on the biggest, value-added aspect of the work that they did – qualitative and improvement focussed.

Methods used were:

  • surveys

  • narrative techniques (anecdotal not stories) [Snowden, 2000.]

Cynefin complexity framework

Anecdote circles
  • explore the unexpected, and are allowed to follow tangents

  • used to relate personal experiences and link events in a meaningful way

What's involved?

  • Preparation (½ a day)

  • Discovery 6-12 people (no more!) (90-120 minutes)

  • Sensemaking (1 day) – cluster by theme

  • Intervention design (? time) work out the outcomes

Sessions are recorded and transcribed (anonymised). They are facilitated by an impartial facilitator, and typically you need around 20-30 minutes before you get past the “oh, the Library's so great” phase.

Need to prove improvement, productivity.

Timing for this needs to be perfect – tackle this when there's nothing to pre-occupy members of the anecdote circle.


Sunday, February 1, 2009

Information Online 2009 (intro + part 1)

Yes, this conference has been and gone... I'm slack!

As this was my second time attending Information Online, there was less "gloss".

As with most events there are a few sessions which stood out for me [edit 17 Feb 2009: I've added the link to the full paper below]:

Session B4 – Jane Burke, Serials Solutions

“What’s beyond the OPAC: discovery layer services”

Builds on earlier comments made by John Law, also part of Proquest

Federated search

Difficulty identifying appropriate resources.

“Put systems together that speak to them in a very simple way”
  • Users want to search first!
  • Connector tech [NISO MXG]
  • Don’t bury it, embed it
  • Don’t worry about result physical format
  • Full keyword searching
  • Facets
  • Relevance ranking
  • Simple user interface, graphical
  • Put facets on left not right (this is where Google puts its ads)
Open Source offerings:
  • VUfind – Villanova University
  • Summa – Netherlands/Swedish??
  • XC (extensible catalogue)

Friday, January 30, 2009

OpenURL & using rules to generate target urls

I already use rules with textseeka to generate urls (to different levels: volume; issue; articles), for targets such as Royal Society of New Zealand titles, using data from the incoming openurl requests...

When there are DOIs, it should be possible to do the same with titles such as Annual Reviews, where the publisher has a DOI resolver for their content, just as there are publishers' OpenURL-flavoured resolvers.
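This isn't the textseeka code, but the general idea of rules that build volume, issue or article level links from an incoming OpenURL can be sketched like this. The rule table, the function name and the `journal.example.org` templates are all hypothetical; only the OpenURL keys (`volume`, `issue`, `spage`) are the usual ones.

```python
from urllib.parse import parse_qs, quote

# Hypothetical rules for one target, most specific first:
# (keys that must be present in the request, URL template).
RULES = [
    (("volume", "issue", "spage"), "https://journal.example.org/{volume}/{issue}/{spage}"),
    (("volume", "issue"), "https://journal.example.org/{volume}/{issue}"),
    (("volume",), "https://journal.example.org/{volume}"),
    ((), "https://journal.example.org/"),
]


def target_url(openurl_query, rules=RULES):
    """Return the deepest link the incoming OpenURL data allows."""
    params = {
        key: quote(values[0], safe="")
        for key, values in parse_qs(openurl_query).items()
    }
    for required, template in rules:
        if all(key in params for key in required):
            return template.format(**params)
    return None


print(target_url("genre=article&issn=1234-5678&volume=12&issue=3&spage=45"))
```

A DOI-based rule fits the same shape: require the DOI and use the publisher's resolver address as the template.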

I've tagged some items which might be of interest, if this is your thing here: - I've just realised I hadn't tagged the URL Clearinghouse, which is a great resource, (but which I need to contribute more to!), but I've just fixed it now.

Monday, January 26, 2009

Dora shakes her booty

What can I say, I like retro stuff... and I like this twist:

Thursday, January 22, 2009

NTS:: RDA (Important Library Stuff)

Hmmm... it did look like RDA might be the sort of thing I was wanting... supposed to have the granularity I'd like, but just seeing the draft documents makes me just balk.

Once you download the zip file, and see the size of the pdfs, you realise how much work has been done, (a lot), but also how much work is involved in implementing it. Not being a cataloguer, getting my head around this would take too long, so I'm probably going to soften it, and go with something similar, but easier.

My grand plan, (which is all it is at this stage - whether it gets to be anything more is dubious), would be a central repository for the tags, and how they translate... whether this would be a networked thing remains to be seen.