Saturday, March 28, 2009

NTS:: Metadata sources

More sources of article metadata:

citeulike
This data is useful because it groups related ids for each "view" of a resource (e.g. PMID & DOI for the same article).
"Article linkout data
Mapping CiteULike article_ids to resources on the web can be done with the linkout table."
"The current snapshot is available at http://static.citeulike.org/data/linkouts.bz2 Data is available from 2008-02-02 onwards."
"To understand the data in this file, you should refer to "The linkout formatter" section of the plugin developer's guide. This file contains a number of spam links. Although CiteULike filters spam postings, traces of the spam still remain in this table. In time this spam content will eventually be removed.
The file is a simple unix ("\n" line endings) text file with pipe ("|") delimiters. Literal pipes within the fields are represented escaped ("\|"). The columns are:
1. Article Id
2. Linkout type
3. ikey_1
4. ckey_1
5. ikey_2
6. ckey_2
NB If an article has n linkouts, then this will result in n rows in the file."
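
As a rough sketch of reading that format, the Python below splits each row on unescaped pipes and groups the rows by article id, which is what gives the "related ids" view. Only the delimiter, the escaping, and the column order come from the description quoted above. The sample rows, the keys in COLUMNS, and the function names are made up for illustration. How a literal backslash in front of a pipe is encoded isn't stated, so this sketch ignores that case.

```python
import re

# A pipe counts as a delimiter only if it is not preceded by a backslash.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")

# Hypothetical key names for the six documented columns.
COLUMNS = ("article_id", "linkout_type", "ikey_1", "ckey_1", "ikey_2", "ckey_2")


def parse_linkout_line(line):
    """Turn one pipe-delimited row into a dict keyed by COLUMNS."""
    raw_fields = _UNESCAPED_PIPE.split(line.rstrip("\n"))
    # Undo the "\|" escaping inside each field.
    fields = [field.replace("\\|", "|") for field in raw_fields]
    return dict(zip(COLUMNS, fields))


def group_by_article(lines):
    """Collect all linkout rows for each article id (n linkouts -> n rows)."""
    grouped = {}
    for line in lines:
        if not line.strip():
            continue
        row = parse_linkout_line(line)
        grouped.setdefault(row["article_id"], []).append(row)
    return grouped


# Invented sample rows, not taken from the real snapshot.
SAMPLE_LINES = [
    "42|DOI||10.1000/example||",
    "42|PMID|12345|||",
    "43|EXAMPLE||a\\|b||",
]

grouped = group_by_article(SAMPLE_LINES)
for article_id, rows in grouped.items():
    print(article_id, [row["linkout_type"] for row in rows])
```

For the real snapshot, the standard library's bz2.open("linkouts.bz2", "rt", encoding="utf-8") should yield lines that can be passed straight to group_by_article; the encoding is a guess on my part.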

BiodiversityHeritageLibrary
A 150 MB download, but most likely worth it.

NZETC (New Zealand Electronic Text Centre)
Some solid work. It's indexed by Index New Zealand (INNZ), so Z39.50 can be used to grab the data from INNZ. I'd like to see what can be done with the TEI-XML too.

NCBI
Worth listing for version 2 of their wonderful web services.

twitter

Hi, I'm Tom, and I'm not on Twitter... yet.

Twitter might be a great medium, but I find delicious is good enough for my purposes at the moment for my form of microblogging...

I suppose I'm still in 'broadcast mode', though I'll probably get there eventually.

As John C. Dvorak points out, there are a number of applications for twitter, and look how he turned 360!


tivo coming (to New Zealand) finally

Well, we've all seen the ads on our regular TV channels, but it's still going to be a while before we see the TiVo here in New Zealand.

In Australia, there were problems with the pricing for the networking add-on, something TVNZ has hopefully learned from, although if you're looking at wireless, it sounds like a definite maybe: "Availability of TiVo complementary products will be confirmed closer to launch."

It sounds like a nice option to buy the unit outright, but in the meantime, there are some TiVo alternatives, which might be worth considering... although it's hard to make comparisons when the final price for the TiVo unit has not been established.

Friday, March 13, 2009

NTS:: Some ideas for a Windows-friendly approach for basic indexing of OAI-PMH content (amongst other sources).

Thinking about my brief flirtation with an EAV variant, I've realised that I should probably stick with a traditional structure (and possibly use views to produce the output I would like).


  1. Grab OAI-PMH content

  2. For each record from OAI source, store it in Postgres (temp table? - alternatively use a table for each OAI source)

  3. Create Postgres indices/tables from XPath queries... a single standard XPath query wouldn't work with all OAI sources (some include non-Dublin Core content such as DIDL)

  4. Run Sphinx over select Postgres tables

  5. (Empty the Postgres temp table?)
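
The XPath part of step 3 can be tried out before touching Postgres. Below is a minimal sketch using Python's standard library, with a made-up OAI-PMH response held in a string. The sample records, the example.org identifiers, and the function name are all invented. It pulls Dublin Core titles and creators where an oai_dc block exists, and flags records that carry something else (such as DIDL) so they can be handled by a different query.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the path expressions below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented response: one Dublin Core record and one DIDL-only record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"/>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text):
    """Return one dict per OAI record, with Dublin Core fields when present."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Not Dublin Core (e.g. DIDL): needs its own source-specific query.
            rows.append({"identifier": identifier, "format": "other",
                         "titles": [], "creators": []})
            continue
        rows.append({
            "identifier": identifier,
            "format": "oai_dc",
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
        })
    return rows


records = extract_records(SAMPLE_RESPONSE)
for row in records:
    print(row["identifier"], row["format"], row["titles"])
```

The resulting dicts map fairly directly onto rows in a conventional table, which is the point of dropping the EAV idea.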


Then can use Sphinx for other sources as well...

Sunday, March 8, 2009

Biodiversity Heritage Library, Internet Archive

Recently, I've been looking at the amount of material available from each of these sites - the amount of content on the Internet Archive is impressive. The BHL have been harvesting this content, and first documented it this time last year.

That post doesn't mention the Solr interface though. It's worth knowing that the Solr port for the query is 8983 (in case you need to open this port in a firewall to allow it through!), especially as the Internet Archive's Advanced Search sends your formatted request through to the Solr server.

Example request:

http://homeserver7.us.archive.org:8983/solr/select?
q=collection%3Abiodiversity+AND+
oai_updatedate%3A%5B2008-01-01T00%3A00%3A00Z+
TO+2008-01-31T00%3A00%3A00Z%5D
&qin=collection%3Abiodiversity+
AND+oai_updatedate%3A%5B2008-01-01+TO+2008-01-31%5D
&fl=identifier,title
&wt=xml
&rows=100


There's some doubling up between the q and qin parameters:

q=collection:biodiversity AND oai_updatedate:[2008-01-01T00:00:00Z TO 2008-01-31T00:00:00Z]

and

qin=collection:biodiversity AND oai_updatedate:[2008-01-01 TO 2008-01-31]
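
To save hand-encoding those colons and brackets, here is a small sketch that builds the same kind of request URL with Python's urllib. The function name and its parameters are my own invention. It sends only q and leaves out qin, on the assumption that the duplicate isn't needed; I haven't confirmed that against the server. It only builds the URL and doesn't fetch anything.

```python
from urllib.parse import urlencode

# Host and port as in the example request above.
BASE_URL = "http://homeserver7.us.archive.org:8983/solr/select"


def build_query_url(collection, start_date, end_date,
                    fields=("identifier", "title"), rows=100):
    """Build a Solr select URL for items updated between two dates.

    Dates are plain YYYY-MM-DD strings; midnight UTC is appended to match
    the form used in the q parameter of the example request.
    """
    q = (f"collection:{collection} AND "
         f"oai_updatedate:[{start_date}T00:00:00Z TO {end_date}T00:00:00Z]")
    params = [
        ("q", q),
        ("fl", ",".join(fields)),
        ("wt", "xml"),
        ("rows", str(rows)),
    ]
    # safe="," keeps the comma in fl readable, as in the original request.
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_query_url("biodiversity", "2008-01-01", "2008-01-31")
print(url)
```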

Example metadata requests:

Dublin Core:
http://www.archive.org/download/verzeichnisderpa03osha/verzeichnisderpa03osha_dc.xml
MARCXML:
http://www.archive.org/download/verzeichnisderpa03osha/verzeichnisderpa03osha_marc.xml
Meta:
http://www.archive.org/download/verzeichnisderpa03osha/verzeichnisderpa03osha_meta.xml
- "Dublin Core metadata, as well as metadata specific to the item on IA (scan date, scanning equipment, creation date, update date, status of the item, etc)"
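
The three URLs above appear to follow one pattern: the item identifier, repeated, plus a suffix. A tiny sketch that generates them from an identifier returned by the Solr query follows; the pattern is inferred from this single example only, and the function name is hypothetical.

```python
# Suffixes seen in the example URLs above.
METADATA_SUFFIXES = {
    "dc": "_dc.xml",
    "marc": "_marc.xml",
    "meta": "_meta.xml",
}


def metadata_urls(identifier):
    """Guess the metadata download URLs for an Internet Archive identifier."""
    base = f"http://www.archive.org/download/{identifier}/{identifier}"
    return {kind: base + suffix for kind, suffix in METADATA_SUFFIXES.items()}


urls = metadata_urls("verzeichnisderpa03osha")
for kind, link in urls.items():
    print(kind, link)
```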

Sunday, March 1, 2009

Musical interlude: Del Ray by Sola Rosa

<a href="http://solarosa.bandcamp.com/track/del-ray">Del Ray by Sola Rosa</a>

Get it here, more directly