So I’ve had a ‘big data dream’ in the back of my mind for a while now: a vast, integrated, dynamic, internet collaboration of creators of plant biodiversity images and data (plots, dets, collections, taxonomy). Imagine, for instance:
...an attractive, well-designed site that draws upon plant images from a wide range of different sources and permits the user to easily scroll through thumbnails organized by taxonomy, to find a match for his/her own images. Finding a suitable match, he/she can then easily make a (machine-generated, RDF) public statement that, “My plant, KalBar-SungaiDalam-PlotB-34, which I have given a morphotype taxon label of KalBar-SungaiDalam-POLYBLA4, is a good match (using images) to Jones’s Brunei-HutanKota-12, which he has determined to be Maasia sumatrana (Miq.) Mols, Kessler & Rogstad (Plantlist ID:
Nice idea, hey! However, at the same time I’ve become increasingly realistic about how busy people are, and about what we can and cannot expect collaborators to learn and do. It is clear to me now that the only way to realize this dream is for someone other than the data creators to do the bulk of the work needed to integrate these data. I’ve also noted the many barriers to collecting data into a single new biodiversity portal: most people will want to keep control of their data, publishing it in the way that best fits their own needs and capacity. So the tasks of the ‘integrator’ are:
Phew! Lots to do! But... I really think this ‘bespoke’ approach will in the end be most productive, because we are not talking about hundreds of key collaborators/resources, but only a few tens. This integration is not like launching a new social media platform that has to be scalable and generic.
In the process of moving towards this and discussing the idea with people, I’ve met many researchers who would like to be part of this effort, but don’t have the tech skills to build their own online database. In the model above, there is no central repository for data and images, so where do I point people? There are a number of existing user-focused biodiversity platforms (e.g., iSpot, iNaturalist, Project Noah), and I just discovered that iNaturalist has an API to get the data back out; I need to look into this more as an option to recommend in future. So far, however, I’ve most often found myself recommending Flickr as a great place to store the data about biodiversity observations, when those observations are represented by sets of photographs of individual plants, as they often are. The primary reasons for using Flickr are:
More about Flickr, below. There are also a lot of identified images of plants on Facebook, particularly in groups (e.g., the amazing Plants Community for Kalimantan Barat), and it may be possible to extract images and determinations via the Facebook API (e.g., for each image posted to a group, seek the last taxonomic name in comments on that photo). However, the information in Facebook is less useful than that in Flickr for building a networked plant data resource, because Facebook is fundamentally a social interaction platform, as opposed to Flickr’s object orientation: it is harder on Facebook to associate metadata (tags, descriptions, locations) with each ‘object.’
I’ve been developing some ideas about using Flickr as a biodiversity image platform, particularly during i) discussions about the adoption of Flickr as a repository for staff-generated images of plants in the Arnold Arboretum, and ii) conversations with Louise Neo, who has been taking amazing images of the plants of Singapore for a while now and storing them on Flickr (she also pointed me towards the amazing photos of Yeoh Yi Shuen and Ng Xin Yi). I’ll work towards writing a comprehensive set of guidelines for users (see a start I made here), but for the moment here are some pointers:
tpl:kew-2607792 can be added in the album’s description field; this gives an unambiguous, machine-linkable determination to a name from The Plant List.
high | mid | low, and for the latter,
s+ | s- | g+ | g- | f+ | f-: high confidence to species, low confidence to species, high confidence to genus, etc...; note that a specific epithet can still be given with a
g+, to indicate a very low quality det to a particular specific epithet. A text string can be included in the album’s description field, e.g.,
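Codes like these are easy for an integrator's script to pick out of an album description. The following is only a minimal sketch: it assumes each code appears as its own whitespace-separated token, that `f` stands for family, and it uses a made-up description string. The function name `parse_description` is hypothetical, and each contributor's real conventions may differ.

```python
import re

# Assumption: a Plant List ID looks like "tpl:kew-<digits>" and stands alone as a token.
TPL_ID = re.compile(r"(?<!\S)tpl:(kew-\d+)(?!\S)")
# Assumption: the confidence code is a stand-alone token such as "s+" or "g-".
CONF_CODE = re.compile(r"(?<!\S)([sgf])([+-])(?!\S)")
# Assumption: "f" means family (the text spells out only species and genus).
RANKS = {"s": "species", "g": "genus", "f": "family"}


def parse_description(text):
    """Extract the Plant List ID and det confidence from an album description."""
    tpl = TPL_ID.search(text)
    code = CONF_CODE.search(text)
    return {
        "tpl_id": tpl.group(1) if tpl else None,
        "rank": RANKS[code.group(1)] if code else None,
        "confidence": ("high" if code.group(2) == "+" else "low") if code else None,
    }


if __name__ == "__main__":
    # Hypothetical description text, for illustration only.
    print(parse_description("Tree by the river. tpl:kew-2607792 g+"))
```

Descriptions with no codes simply yield `None` values, so albums that have not been annotated yet can be flagged for follow-up rather than rejected.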
The Flickr API is awesome. See this small project for an example of using the API. The generic steps are:
method=flickr.photosets.getList&user_id=102148157%40N08; try it!)
method=flickr.photosets.getPhotos&photoset_id=72157642193981464; try it!)
method=flickr.photos.getInfo&photo_id=13078067683; try it!)
method=flickr.photos.getSizes&photo_id=13078067683; try it!).
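As a rough sketch of the four calls listed above (not the code from the small project mentioned), here is how the request URLs might be built in Python against Flickr's REST endpoint. The helper names `flickr_url`, `flickr_call` and `album_urls` are hypothetical, and `YOUR_API_KEY` is a placeholder: you need your own key from Flickr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FLICKR_REST = "https://api.flickr.com/services/rest/"


def flickr_url(method, api_key, **params):
    """Build a REST URL for one Flickr API method, asking for plain JSON."""
    query = {"method": method, "api_key": api_key,
             "format": "json", "nojsoncallback": 1}
    query.update(params)
    return FLICKR_REST + "?" + urlencode(query)


def flickr_call(method, api_key, **params):
    """Fetch one API response and decode it (needs network and a valid key)."""
    with urlopen(flickr_url(method, api_key, **params)) as resp:
        return json.load(resp)


def album_urls(api_key, user_id, photoset_id, photo_id):
    """The four calls from the steps above, in order."""
    return [
        flickr_url("flickr.photosets.getList", api_key, user_id=user_id),
        flickr_url("flickr.photosets.getPhotos", api_key, photoset_id=photoset_id),
        flickr_url("flickr.photos.getInfo", api_key, photo_id=photo_id),
        flickr_url("flickr.photos.getSizes", api_key, photo_id=photo_id),
    ]


if __name__ == "__main__":
    # IDs taken from the example calls above; the key is a placeholder.
    for url in album_urls("YOUR_API_KEY", "102148157@N08",
                          "72157642193981464", "13078067683"):
        print(url)
```

In a real harvester you would loop: take each album ID returned by the first call, feed it to the second, and feed each photo ID to the last two via `flickr_call`.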
With these calls you can easily and automatically build a database of a user’s images, with metadata. As discussed above, the particular solution for each committed contributor will be unique, allowing for a compromise between their choice of customization (e.g., of metadata codes) and of the ‘integrator’s’ needs, so each contributor script will probably look slightly different. Perhaps a good hybrid solution for many users would be to use Flickr to store the images and then post a simple (flat Darwin Core) CSV file of the associated metadata on a web-page that they can write to.
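To make the hybrid idea concrete, here is a minimal sketch of writing a flat Darwin Core CSV with Python's standard library. The column names are real Darwin Core terms, but the choice of columns, the function name `to_dwc_csv`, the date, and the image URL are assumptions for illustration; the plant ID and name are borrowed from the example statement earlier in the post.

```python
import csv
import io

# A small, assumed subset of Darwin Core terms; a real file may need more.
FIELDS = ["occurrenceID", "scientificName", "recordedBy", "eventDate",
          "decimalLatitude", "decimalLongitude", "associatedMedia"]


def to_dwc_csv(records):
    """Serialize a list of dicts to CSV text; missing fields are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical record: the date and media URL are placeholders.
    print(to_dwc_csv([{
        "occurrenceID": "Brunei-HutanKota-12",
        "scientificName": "Maasia sumatrana (Miq.) Mols, Kessler & Rogstad",
        "recordedBy": "Jones",
        "eventDate": "2014-01-01",
        "associatedMedia": "https://example.org/photos/1.jpg",
    }]))
```

The `csv` module quotes the author string automatically because it contains a comma, so the file stays parseable without the contributor having to think about escaping.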
As for the photos for each individual plant, the more the better! While photos are (usually) inferior to a physical collection for the purpose of making a good determination, if they are high resolution, and of the appropriate plant parts, accurate determinations can often be made. The ‘appropriate plant parts’ for diagnosis will vary from taxon to taxon, and the general botanist will usually not know if he/she is photographing the key morphological and even anatomical characters, but, if time permits, a thorough, destructive, photographic recording of the plant will increase the chances that key parts are included. As a minimum, I recommend:
This full set is 21 images for each album. The images should be cropped to highlight the intended part of the plant. Here’s an example from our Xmalesia project database. See Baskauf & Kirchoff 2008 for a more comprehensive discussion of standard views of plants.
I’ve just been looking into the latest from iNaturalist. Wow! If contributors can be enticed to use it, iNaturalist and its API provide a fantastic platform for collaborative biodiversity work. Definitely the best ‘out of the box’ platform I’ve seen. The image tools are less developed than Flickr, and people who already use and love Flickr are unlikely to want to put their pics in two places, but for people who primarily want to contribute biodiversity data, iNaturalist is definitely the better choice:
The API allows very easy, unauthenticated access to user data, returned as XML (Darwin Core Simple Records) or JSON. For example, the query (try it):
returns all plant observations for Indonesia (NB: cool bounding box tool!), including image URLs, as XML. Simple as that! The same API seems to be used for the website’s search (try the same query), showing a pretty map:
I.e., iNaturalist already has most of the cool features I’ve wanted (and sometimes tried) to incorporate into my own biodiversity platforms. So maybe... I might even use it for my own data ;-)
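For anyone wanting to script the kind of bounding-box query described above, here is a hedged sketch in Python. It is not the exact query from this post: the parameter names (`iconic_taxa[]`, `swlat`, `swlng`, `nelat`, `nelng`, `per_page`) and the `.json` / `.dwc` suffixes follow my understanding of iNaturalist's observations endpoint and should be checked against the current API documentation, and the coordinates are only a rough box around Indonesia. The function name `inat_url` is hypothetical.

```python
from urllib.parse import urlencode

INAT_BASE = "https://www.inaturalist.org/observations"


def inat_url(fmt, swlat, swlng, nelat, nelng,
             iconic_taxon="Plantae", per_page=50):
    """Build an observations query for one bounding box.

    fmt is the response format suffix, assumed to be "json" or "dwc".
    """
    params = [
        ("iconic_taxa[]", iconic_taxon),  # restrict to plants
        ("swlat", swlat), ("swlng", swlng),  # south-west corner
        ("nelat", nelat), ("nelng", nelng),  # north-east corner
        ("per_page", per_page),
    ]
    return f"{INAT_BASE}.{fmt}?{urlencode(params)}"


if __name__ == "__main__":
    # Rough, illustrative bounding box; not taken from the original query.
    print(inat_url("json", -11, 95, 6, 141))
```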