Inexpensive, easy-to-use data collection: Kobo Toolbox as an example.

(Note: originally posted on Feb 29, 2016.)

As promised in my previous post, this post is about using Kobo Toolbox as a data collection tool for archaeology. I also address important issues that have emerged as I design and use forms in Kobo Toolbox.

What is Kobo Toolbox? Kobo Toolbox is an open source suite of tools for the collection and analysis of humanitarian data, especially in remote places or after a disaster. Kobo Toolbox consists of three different tools: one for form design, one for data collection and another for data analysis (the last is rather simple and won’t be discussed here). Kobo Toolbox can be used from their website or installed on a local server. It was designed by a fairly large collective organized by the Harvard Humanitarian Initiative and supported by lots of heavy hitters, including the United Nations, the International Rescue Committee and the US government, through USAID. It is very well supported and has a very active and dynamic user and collaborator community. It appears to be sustainable. Similarly, the data collection form is based upon Enketo, another open source, well supported and actively growing tool.

Why Kobo Toolbox? First and most importantly, Kobo Toolbox is device independent (though there is an Android app as well). Instead of operating via an app, it runs as a webform that can be accessed through a browser on any device that can run one (the browser does need to be recent enough to handle HTML5; Google Chrome seems to be the best at this point). With the expansion of the use of smartphones throughout the world, this is incredibly significant. Anyone with a device with a browser can collect data. Personally, in trying to devise ways to collect archaeological data digitally, I spent thousands of dollars on software and hardware, and I was doing it “on the cheap”. As my two iPads aged substantially between field seasons, I became increasingly frustrated because I expected to need to purchase new iPads in the near future. I was stuck on the technology treadmill. The system was unsustainable. With Kobo Toolbox, an inexpensive smartphone (yes, they do exist) is all one needs. Data collection becomes much less restricted and the opportunities for collaboration with interested communities, especially in remote places, are much, much greater.

Second, web forms can be used OFF-LINE! That’s right, a WEB form that can be used OFF-LINE. This is essential for nearly all situations in archaeology. Even if you have access to the internet (through nearby WiFi or a cellular data connection), it is likely that your connection will be cut at some point, usually the most inconvenient one, leaving you unable to collect data. Not with Kobo Toolbox. You can continue to collect data; it will upload and synchronize once a data connection has been reestablished. How much data can be stored offline depends on the browser and its settings. Important update: questions are stored in question banks that can now be shared!
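The collect-now, upload-later behavior described above can be pictured with a toy model. This is only a sketch of the general store-and-forward idea, written in Python for readability; it is not how Kobo Toolbox or Enketo is actually implemented, and the names OfflineQueue and FakeServer are invented for illustration.

```python
from collections import deque


class FakeServer:
    """Stand-in for a remote server (hypothetical); refuses uploads while offline."""

    def __init__(self):
        self.online = False
        self.received = []

    def upload(self, record):
        if not self.online:
            raise ConnectionError("no data connection")
        self.received.append(record)


class OfflineQueue:
    """Keep submissions locally and send them once a connection is available."""

    def __init__(self, send):
        self._send = send        # callable that uploads one record
        self._pending = deque()  # records waiting to be uploaded, oldest first

    def submit(self, record):
        """Save the record locally, then try to upload everything pending."""
        self._pending.append(record)
        return self.sync()

    def sync(self):
        """Upload pending records in order; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break            # still offline: keep the record for later
            self._pending.popleft()
            sent += 1
        return sent

    def pending_count(self):
        return len(self._pending)
```

In use, submit() would be called for each completed form, and sync() again whenever the connection returns; nothing is discarded while the device is offline.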

Third, Kobo Toolbox includes an intuitive, web-based form design tool. Most importantly, this means that the “learning curve” or threshold is very low. I was able to create basic forms in minutes the very first time I tried the tool. However, for more advanced data collection, forms can be nearly as complicated as one desires. Data types are varied and include everything from alphanumeric to GPS location to images and audio. There are some qualifiers here; for example, barcodes can be collected using the Android app, but not the web form. Data can be collected by radio buttons, check boxes, drop down lists and many other ways. Options include skip logic (i.e. show certain questions based upon the response to previous questions) and validation criteria, both of which increase ease of use and the reliability of the data. Form construction can be even more complex if XLSForms (based upon the open standard XForms) are used. XLSForms can be designed using the ubiquitous Microsoft Excel (LibreOffice Calc could be used as well and the file saved as .xls). The tool, therefore, is incredibly easy to use from the very beginning, but can be as complex as the user demands.
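To make skip logic and validation a little more concrete, here is a minimal sketch, in Python, of what the rows of an XLSForm "survey" sheet might look like. The column names (type, name, label, required, relevant, constraint) follow the XLSForm convention; the question names, labels and the "plants" choice list are invented for illustration and are not taken from my actual form. A complete XLSForm would also need a "choices" sheet, and the rows would live in a spreadsheet rather than in the CSV text produced here.

```python
import csv
import io

# Column headers used by the XLSForm "survey" sheet.
COLUMNS = ["type", "name", "label", "required", "relevant", "constraint"]

# Hypothetical questions for a plant transect form (all names are invented).
SURVEY_ROWS = [
    {"type": "geopoint", "name": "location",
     "label": "Record your location", "required": "yes"},
    {"type": "select_one plants", "name": "plant",
     "label": "Which plant?", "required": "yes"},
    # Skip logic: only ask for a description when "other" was chosen.
    {"type": "text", "name": "plant_other",
     "label": "Describe the plant", "relevant": "${plant} = 'other'"},
    # Validation: percent cover must be between 0 and 100.
    {"type": "integer", "name": "cover",
     "label": "Percent cover", "constraint": ". >= 0 and . <= 100"},
    {"type": "image", "name": "photo", "label": "Take a photo"},
]


def survey_sheet_as_csv(rows):
    """Return the survey rows as CSV text, leaving unused cells blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(survey_sheet_as_csv(SURVEY_ROWS))
```

The "relevant" cell carries the skip logic and the "constraint" cell carries the validation rule; everything else is an ordinary question definition.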

Figure (HE sample form): A portion of the data collection form in Kobo Toolbox.

How about a quick example? In the fall of 2015, I taught a course entitled “Historical Ecology of the Lehigh Gap” at Muhlenberg College, which was “clustered” with another course, “Degradation and Restoration” taught by my colleague Kimberly Heiman. As a component of this course, we mapped plant distributions along a transect through the Lehigh Gap Nature Center. The Lehigh Gap is a Superfund site contaminated by heavy metals from a zinc factory. The purpose of the transect was to identify different plants that represented degraded or restored communities. I created the form (see example) in an evening and shared the link with the students.


In the field, students pulled up the link on their smartphones. Although there were some hiccups, the form worked like a charm. We had one phone that could not use the forms; we still do not know why. Some students found that one browser was preferable to others (Chrome seemed to be the best) and we had significant issues collecting photos. With a few exceptions, GPS coordinates were within their known error (c. 10 meters). The points that fell outside the expected 10 meters appear to have been random. That is, there was no apparent pattern that would suggest that particular phone brands, individual phones, or users were less accurate than others. The data was exported (via CSV) and imported into CartoDB, which students used to analyze the spatial distribution of plants. The map below shows one day of collection and only one plant, sweet birch, which in this case actually shows the effectiveness of prescribed burns on the northeastern portion of the transect (birch tend to pull heavy metals from the soil and reintroduce them into the ecosystem; controlled burns limit the growth of birch).
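A check like the 10-meter comparison above can be run directly on the exported CSV. The following is a minimal sketch in Python, assuming each row carries both the recorded GPS position and a reference position for its transect stake. That assumption, the column names (id, latitude, longitude, stake_latitude, stake_longitude) and the sample coordinates are all invented for illustration; they are not the actual export from our class.

```python
import csv
import io
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

# Invented sample rows standing in for an exported CSV file.
SAMPLE_CSV = """id,latitude,longitude,stake_latitude,stake_longitude
a,40.78000,-75.60000,40.78000,-75.60000
b,40.78050,-75.60000,40.78000,-75.60000
c,40.78004,-75.60000,40.78000,-75.60000
"""


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_outliers(csv_text, tolerance_m=10.0):
    """Return (row id, offset in meters) for rows beyond tolerance_m of their stake."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        offset = haversine_m(float(row["latitude"]), float(row["longitude"]),
                             float(row["stake_latitude"]),
                             float(row["stake_longitude"]))
        if offset > tolerance_m:
            flagged.append((row["id"], round(offset, 1)))
    return flagged


if __name__ == "__main__":
    print(flag_outliers(SAMPLE_CSV))
```

Grouping the flagged rows by phone or by student would be one way to look for the kind of pattern we did not find.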

Figure (HE sample map): A sample of data collected in Kobo Toolbox being displayed in CartoDB.

This was a particularly effective exercise. We were able to collect approximately 75 data points on 10 plants at 10-meter intervals (i.e. over a distance of 3/4 of a kilometer) in approximately 2 hours with 17 students. Data collection required no special tools, but students were able to collect and analyze a rich data set with relatively simple, intuitive tools.

Ok, it may appear that I am trying to “sell” Kobo Toolbox. However, let’s face it, there are some drawbacks to Kobo Toolbox.
First, data entry via a small screen virtually requires that typing be minimized; drop down boxes, radio buttons, etc. are far superior. This means that it is preferable to design these types of data entry into the form, which has the positive effect of standardizing data, but can also discourage the recording of important narrative data. Really, this is not a drawback of Kobo Toolbox, but of the device. Narrative data would be best collected via audio or text through voice recognition, but neither of these is ideal either.

Second, because Kobo Toolbox is designed around location collected through the device GPS, it is wonderful for survey, but the location aspect is less useful for excavation. If a sub-centimeter RTK GPS were connected to the device (via Bluetooth?), the location aspect of the form would be much more useful for archaeology, but that requires serious expense (or, perhaps not). The device could also be connected via Bluetooth to a total station, etc. for increased locational control. Excavation data is likely better collected via a tablet, rather than a phone.

Third, initially I wanted a tool that could be connected directly to a relational database; at this point, this cannot be easily done with Kobo Toolbox. However, because the Kobo export is a CSV file, the data can be synchronized with any database, relational or otherwise, with relatively little effort. I am now convinced that this sort of compartmentalization is actually preferable. With such a format, the user can decide how to store, analyze and archive their data. While I now prefer a PostGIS database accessed through LibreOffice Base, others may prefer other database types or GUIs. Compartmentalization means that no one tool is reliant upon any other, but it does mean that standardized (and preferably open) data formats are required to go between tools. Compartmentalization also means that forms must be designed with the database in mind and, vice versa, the database must be designed with the intention that all data will be arriving in CSV files (alternatives include KML and XLS; the exceptions are images, audio, etc., which must be entered into the database manually).
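As a rough illustration of how little effort the CSV route can take, here is a minimal sketch in Python that loads CSV text into a database table. It uses SQLite (from Python's standard library) purely as a stand-in so the example is self-contained; it is not my PostGIS setup, and the table name, column names and sample rows are invented for illustration. Every column is stored as text; a real import would want proper types and a geometry column.

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for a CSV export.
SAMPLE_CSV = """id,plant,latitude,longitude
1,sweet birch,40.7800,-75.6000
2,sweet birch,40.7801,-75.6001
"""


def load_csv(connection, table, csv_text):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so column names with spaces still work.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    connection.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "observations", SAMPLE_CSV)
    for row in conn.execute('SELECT plant, latitude FROM "observations"'):
        print(row)
```

Because the table is built from the CSV header, the form and the database have to agree on column names, which is exactly the design coupling described above.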

Fourth, I also initially wanted the ability to modify forms in the field. I was able to do this with my initial Filemaker Pro database, but only because I carried the server into the field. However, I adjusted forms largely during field testing. Adjusting forms in Kobo Toolbox once they have been deployed is difficult: it requires an internet connection, and new “projects” must be created with new URLs, etc. (side note: you can install Kobo Toolbox on your own server and take it into the field). However, once past the testing phase this may not be important; once data collection has begun in earnest, changing forms is usually a poor idea because it reduces consistency and makes synchronization more difficult.

Although Kobo Toolbox has some important limitations, I now consider these limitations to be beneficial. Open source tools that do one thing extremely well and that use open standards and open file formats are preferable because their output is useful in a wide variety of other tools (and in ways that I cannot even imagine).

Robust, Open, Flexible and Offline Digital Data Collection in the Field.

(Note: originally published on Sept. 25, 2015.)

First, a little background… My name is Ben Carter and I am currently an assistant professor of anthropology at Muhlenberg College in Allentown, PA. However, I came to the project described below long before I was lucky enough to get my current job. My essential perspective was forged in the fires of many years in “in-between” states: as a graduate student, an adjunct and a non-tenure-track faculty member. At the same time, I was trying to run a field school, because I believe in the fundamental pedagogical value of field school for those going into archaeology and for everyone else. However, because I operated in these in-between states, I often had severe budget and time constraints. Initially, I had hoped to employ digital tools in the field in order to save time, thus allowing me to spend more time with my students discussing important anthropological issues and less time on data entry. Although I have field tested a range of options, none have done what I want them to do (and most cost me MORE time). Indeed, my students will tell you that these attempts, while perhaps educational, were quite frustrating. Therefore, broadly speaking, my goal has been to develop a field data collection system that is inexpensive, easy to deploy and use, and results in data in a format that can be openly shared. These characteristics would not only ensure my own ease of use, but enable other archaeologists to give it a whirl. The project should yield a product useful to graduate students, faculty with severely limited resources, community projects, small CRM firms and anyone else.

Beyond my own experience, archaeologists have been shifting towards collecting data digitally in the field for some time. These systems can be as simple as using an app on a smart phone or as complex as setting up a WiFi network in the field, including a server (laptop) and multiple data collection devices (such as iPads). We have done this largely to reduce both the time spent converting paper-based data to digital and the errors introduced in this process. Also, a digital collection system is simply the first stage of a digital pipeline that funnels data to both online publishing and archival platforms. However, many of the systems developed within archaeology have been hampered by five major concerns:

First, many employ proprietary software (Filemaker Pro is one of the best known) and/or hardware that limit the integration, analysis and open dissemination of data and restrict flexibility. Similarly, once entangled in proprietary software and hardware, it is extremely difficult to leave, at least partially because these tools have a steep learning curve.

Second, the cost of proprietary software and the devices needed to deploy it limits its applicability to larger, well-funded academic projects or CRM companies. Smaller groups and individuals, including small academic teams, faculty at small, less-well-funded schools, graduate students, and small CRM firms, cannot sustainably afford these systems, even if they may be able to afford a one-time purchase.

Third, while proprietary software often supplies a wide array of options, making it attractive, it can meet neither all present requirements nor those that may arise in the future. An open, community-based tool can be relatively quickly re-purposed and modified to account for nearly any contingency, in the present or the future. All this should be possible with minimal time investment and assistance from an open source community.

Fourth, most of these data collection systems require either a wireless local network or a connection to a cellular network so that the collection tool, such as a tablet, can communicate with the database on a server. While this is logistically feasible for many (especially small, localized) excavations, it is not practical for field survey, where a cellular connection would be required. The tool we hope to design can be used offline and then synchronized with a database once a connection is established.

Fifth, while there are some serviceable open-source projects, these are platform specific (e.g., FAIMS is designed for Android). We prefer a browser based system in order to avoid the problems associated with restricting tools to a particular platform. We would like to build a field data collection system that is open, inexpensive, highly flexible, browser-based, and can be used offline. We believe that this exists in a set of open-source software known as

The goal of my project (or rather OUR project, for I am working with a fellow institute participant who will have a separate post, and other partners are being courted) is to develop a tool that remedies these issues. We hope that the end product of this project is a tool that:
1. Is browser-based (and, therefore, device independent)
2. Can be used off-line (now possible with HTML5)
3. Employs open source code
4. Can be connected to a database of choice (proprietary or open-source)
5. Is inexpensive and simple (at least on the user end once it has been built)
6. Promotes the open use of resulting data

We realize that this is a bit of a pipe dream, for we will never be able to do all of this in one year. We do plan to have a working prototype. We are feverishly working to figure out the details; more on that in coming posts.