tompasley::: misc ramblings and thoughts: August 2009

Recently I hit a slight snag on a fairly common problem... dealing with names. It's a tricky area: everyone has a name, but building what we know about names into software is actually a bit of a slog!

What I'm doing is trying to parse names (mainly author names) for txtckr, so that one of the output display formats can be a reference (APA, for example). To do this, I also need to untangle the "rft.au" information which is delivered through OpenURL, and I'm trying to build in some "forgiveness" to allow for people/companies that don't follow the specs properly!
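
As a rough illustration of the "forgiveness" idea, here is a minimal Python sketch (not txtckr's actual code) that pulls author values out of an OpenURL query string. It assumes the structured keys "rft.aulast" and "rft.aufirst" may or may not accompany "rft.au", and that some senders get the key case wrong; the function name author_fields is made up for this example.

```python
from urllib.parse import parse_qs


def author_fields(query):
    """Pull author-related values out of an OpenURL query string.

    Hypothetical helper: it only reads rft.aulast, rft.aufirst and rft.au,
    and ignores everything else in the query.
    """
    # Lower-case the keys so "RFT.AU" is treated the same as "rft.au".
    params = {key.lower(): values for key, values in parse_qs(query).items()}

    def first(key):
        # parse_qs gives a list per key; take the first non-blank value.
        for value in params.get(key, []):
            if value.strip():
                return value.strip()
        return None

    return {
        "last": first("rft.aulast"),
        "first": first("rft.aufirst"),
        "full": first("rft.au"),
    }
```

When only "full" comes back, the name still has to be split into given name and surname, which is where the real trouble starts.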

Things to consider:

With a full name, is it supplied as first name(s) then last name/surname, and if so, where does the surname begin? That's easy enough for a fair number of relatively simple names, but what about multi-word surnames, such as "van der Weerden"?
If you're going to receive name fragments, how do you build these sensibly into software so that you can give permutations of the name, e.g. Pasley, Tom == Pasley, T. == Tom Pasley == T. Pasley?
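
Both points above can be sketched in a few lines of Python. This is only one possible approach, under loud assumptions: the PARTICLES set is a tiny hand-picked list rather than anything authoritative, and the function names (split_name, initials, variants) are invented for the example. A name with a comma is read as "Surname, Given"; otherwise the surname is the last word plus any particles directly in front of it.

```python
# Assumed, incomplete list of surname particles (compared in lower case).
PARTICLES = {"van", "der", "den", "de", "la", "von", "di", "du"}


def split_name(raw):
    """Return (given, surname) from 'Surname, Given' or 'Given Surname'."""
    raw = " ".join(raw.split())  # collapse stray whitespace
    if "," in raw:
        surname, given = [part.strip() for part in raw.split(",", 1)]
        return given, surname
    words = raw.split()
    # The last word always belongs to the surname; walk left while the
    # preceding word is a known particle.
    start = len(words) - 1
    while start > 0 and words[start - 1].lower() in PARTICLES:
        start -= 1
    return " ".join(words[:start]), " ".join(words[start:])


def initials(given):
    """Turn 'Tom Henry' into 'T. H.'."""
    return " ".join(word[0].upper() + "." for word in given.split())


def variants(raw):
    """Return the set of display forms that can be built from one name."""
    given, surname = split_name(raw)
    if not given:
        return {surname}
    short = initials(given)
    return {
        f"{surname}, {given}",
        f"{surname}, {short}",
        f"{given} {surname}",
        f"{short} {surname}",
    }
```

With this, variants("Tom Pasley") and variants("Pasley, Tom") produce the same four forms, and split_name("Jan van der Weerden") keeps "van der Weerden" together. Two inputs could be treated as possibly the same person when their variant sets overlap, which is how "Pasley, T." would match "Tom Pasley". The particle rule is easy to fool, though: a given name such as "Van" in "Van Morrison" gets swallowed into the surname.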

No doubt I'm not the first person to tackle this problem, and I'm probably over-thinking things slightly, but I'm open to tips about projects by, or advice from, anyone else who's tackled it...

Thursday, August 6, 2009

Dealing with humans' (names)
