Sunday, March 8, 2009

Biodiversity Heritage Library, Internet Archive

Recently I've been looking at how much material is available from each of these sites, and the amount of content on the Internet Archive is impressive. The BHL has been harvesting this content, and first documented the harvesting this time last year.

That post doesn't mention the Solr interface, though. It's worth knowing that the Solr server listens on port 8983 (in case you need to open that port in a firewall), especially as the Internet Archive's Advanced Search sends your formatted request through to the Solr server.

Example request:

http://homeserver7.us.archive.org:8983/solr/select?
q=collection%3Abiodiversity+AND+
oai_updatedate%3A%5B2008-01-01T00%3A00%3A00Z+
TO+2008-01-31T00%3A00%3A00Z%5D
&qin=collection%3Abiodiversity+
AND+oai_updatedate%3A%5B2008-01-01+TO+2008-01-31%5D
&fl=identifier,title
&wt=xml
&rows=100


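There's some doubling up: the same query is sent twice, once as q with full timestamps and once as qin with date-only bounds. Decoded, the two parameters are: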

q=collection:biodiversity AND oai_updatedate:[2008-01-01T00:00:00Z TO 2008-01-31T00:00:00Z]

and

qin=collection:biodiversity AND oai_updatedate:[2008-01-01 TO 2008-01-31]
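
For anyone who would rather build that request in code than by hand, here is a minimal Python sketch. The host, port and parameter names are copied from the example above. The function name build_solr_url and its arguments are my own invention for illustration, and I'm assuming that q and qin should always carry the same date range, since that is all the example shows. It only builds the URL and doesn't send anything.

```python
from urllib.parse import urlencode

# Host and port are copied from the example request in this post; they are
# not guaranteed to be reachable today.
SOLR_SELECT = "http://homeserver7.us.archive.org:8983/solr/select"


def build_solr_url(collection, start, end, fields=("identifier", "title"), rows=100):
    """Return a select URL for items in `collection` updated between two dates.

    `start` and `end` are ISO dates such as "2008-01-01" (hypothetical helper).
    """
    # q uses full timestamps and qin the date-only form, as in the example.
    q = (
        f"collection:{collection} AND "
        f"oai_updatedate:[{start}T00:00:00Z TO {end}T00:00:00Z]"
    )
    qin = f"collection:{collection} AND oai_updatedate:[{start} TO {end}]"
    params = [
        ("q", q),
        ("qin", qin),
        ("fl", ",".join(fields)),
        ("wt", "xml"),
        ("rows", rows),
    ]
    # urlencode escapes ':', '[' and ']' and turns spaces into '+'; commas
    # are left alone so the fl list matches the original request.
    return SOLR_SELECT + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(build_solr_url("biodiversity", "2008-01-01", "2008-01-31"))
```

Called with the collection and dates from the example, this should give the same query string as the request above, joined onto one line.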

Example metadata requests:

Dublin Core:
http://www.archive.org/download/verzeichnisderpa03osha/verzeichnisderpa03osha_dc.xml
MARCXML:
http://www.archive.org/download/verzeichnisderpa03osha/verzeichnisderpa03osha_marc.xml
Meta:
http://www.archive.org/download/verzeichnisderpa03osha/verzeichnisderpa03osha_meta.xml
- "Dublin Core metadata, as well as metadata specific to the item on IA (scan date, scanning equipment, creation date, update date, status of the item, etc)"
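
All three metadata URLs follow the same pattern: the item identifier appears twice, followed by a suffix. Here is a small Python sketch of that pattern. The function name metadata_urls and the dictionary keys are my own, and I'm assuming every item follows the layout seen in this one example.

```python
# Base path and suffixes are taken from the example URLs in this post.
DOWNLOAD_BASE = "http://www.archive.org/download"
SUFFIXES = {"dc": "_dc.xml", "marc": "_marc.xml", "meta": "_meta.xml"}


def metadata_urls(identifier):
    """Return the Dublin Core, MARCXML and Meta URLs for one item.

    Hypothetical helper: assumes the <identifier>/<identifier><suffix>
    layout shown above holds for the item.
    """
    return {
        kind: f"{DOWNLOAD_BASE}/{identifier}/{identifier}{suffix}"
        for kind, suffix in SUFFIXES.items()
    }


if __name__ == "__main__":
    for kind, url in metadata_urls("verzeichnisderpa03osha").items():
        print(kind, url)
```

The identifiers returned by the Solr query (the identifier field in fl) are what you would pass in here.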
