Digital Data Collection: Tools

This section provides a brief description of each of the component tools. I mention other tools that may work similarly (note that I have tested only a few of these; I am relying on general similarities). I also include some basic limitations and critiques based on my own testing (no details of the testing are provided… yet).

One of the most important aspects of this project is interoperability. Digital tools need to work together and produce both digital and analog products (documents, maps, etc.). Otherwise the project is pointless. A digital tool whose data does not play well with other tools, or that does not make it easy for us to produce a final product, hinders our work by making it more expensive, time-consuming, and frustrating. The tools below play well together, but they are NOT the only valid, usable options.
These tools are a part of the system shown below:

[Figure: diagram of the digital data collection system]

Kobo Toolbox

Kobo Toolbox is a tool for building and filling out field data collection forms. Forms are designed in a web interface that is best used on a computer (though a tablet or even a phone will work). The design of the form determines what types of data can be collected and how they are collected. This may be the most important stage, and the user should consider the entire process (from collection to analysis to archiving) when designing the form (Kobo Toolbox has wonderful resources here).

Once the form is designed, it is deployed. Data can be collected via a web browser or an Android app, KoboCollect; I have used the web browser exclusively. The web interface provides a short URL for the deployed form, which must be loaded in your browser while you are still connected to the internet. As long as you don’t close that page, you can then enter data without a data connection (wired, wifi, or cellular). If you save the page in the browser (or on the homepage), you can even close it and reopen it when necessary. As forms are completed offline, a tally of the forms that have not yet been synchronized with the database is kept in the upper left-hand corner. Once the device is back online, the tally quickly disappears as the forms synchronize with your database, which lives on Kobo Toolbox servers (you can also host it on your own; see here). All new data is merged with previously collected data, even data submitted by multiple individuals using multiple devices. Synchronization was one of the major hang-ups with every tool I had used previously, but Kobo Toolbox makes it seamless and automatic.

To retrieve your data, simply sign in to the web interface and download it. All data except media can be downloaded as CSV or XLS (Microsoft Excel format); media can be downloaded as a zipped folder. To reconnect your data to your media, please see this page.
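The page mentioned above describes how to reconnect data and media. Purely as an illustration of the idea, the sketch below pairs each row of a CSV export with a file from the extracted media folder by matching filenames. It assumes, hypothetically, that the export has a column named `photo` holding the bare filename of each attached image and that the zipped media were extracted into a single directory tree; the column name, the `attach_media` helper, and the sample rows are invented for the example, so check them against your own export.

```python
import csv
import io
import tempfile
from pathlib import Path


def attach_media(csv_text, media_dir, media_column="photo"):
    """Add a media_path to each exported record (hypothetical helper).

    Assumes the export has a column named by media_column holding the bare
    filename of the attached file, and that the zipped media folder has been
    extracted somewhere under media_dir.
    """
    # Index every extracted file by its filename, searching all subfolders.
    # If two files share a name, only one of them ends up in the index.
    files = {p.name: p for p in Path(media_dir).rglob("*") if p.is_file()}
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        filename = row.get(media_column) or ""
        # None marks a record with no photo, or a photo that was not found.
        row["media_path"] = str(files[filename]) if filename in files else None
        records.append(row)
    return records


# Demo: a throwaway folder stands in for the extracted media zip.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "site12.jpg").write_bytes(b"")
    export = "site_number,photo\n12-VG-1,site12.jpg\nME-425-001,\n"
    records = attach_media(export, tmp)

linked = [(r["site_number"], r["media_path"] is not None) for r in records]
print(linked)  # [('12-VG-1', True), ('ME-425-001', False)]
```

Matching on filename alone breaks down if two submissions attach files with the same name; keying on a per-submission identifier as well would be safer.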

Although Kobo Toolbox is powered by Enketo web forms, which are also used by other tools (see list here), Kobo Toolbox has one of the most user-friendly interfaces and a near-zero adoption threshold. Notably, the extremely powerful Open Data Kit also uses Enketo, but is limited to Android devices. I simply advocate for an OS-agnostic data collection tool.

Alternatively, as part of the FileMaker environment, FileMaker Go can be used to collect data in the field. The most important benefit of such a system is that the Go app on your iOS (only) device interacts directly with your relational database! When I used FileMaker in 2011 and 2013, it worked incredibly well (and many other archaeologists have successfully used FileMaker in the field). The limitation was that, because I had no internet or data connection, I had to bring the server (my laptop) into the field and set up a wifi access point. A wifi connection had to be maintained between the collection tool, an iPad 2, and the server. This arrangement was therefore unacceptable for survey (and even problematic at the edges of the wifi range). Kobo Toolbox is much more adaptive, flexible, and mobile. Kobo Toolbox data, however, is not connected to your relational database, although it can be downloaded and reattached to it.
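Reattaching downloaded Kobo Toolbox data to a relational database amounts to inserting the rows of the CSV export into a table. The sketch below uses Python's built-in `sqlite3` module only as a self-contained stand-in for PostGIS (with PostgreSQL you would use a client library or the `COPY` command instead); the `sites` table, its columns, the `load_export` helper, and the placeholder county names are all hypothetical. Declaring `site_number` as the primary key means an accidental second import of the same export is rejected instead of silently duplicated.

```python
import csv
import io
import sqlite3


def load_export(conn, csv_text):
    """Append the rows of a Kobo-style CSV export to the sites table.

    The column names site_number and county are hypothetical; use the
    names from your own form.
    """
    rows = [(r["site_number"], r["county"]) for r in csv.DictReader(io.StringIO(csv_text))]
    with conn:  # commits on success, rolls back if any row is rejected
        conn.executemany("INSERT INTO sites (site_number, county) VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (site_number TEXT PRIMARY KEY, county TEXT)")
export = "site_number,county\n12-VG-1,County A\nME-425-001,County B\n"
loaded = load_export(conn, export)

# Importing the same export again violates the primary key.
try:
    load_export(conn, export)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM sites").fetchone()[0]
print(loaded, duplicate_rejected, count)  # 2 True 2
```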

Besides being an offline, device-independent data collection tool, Kobo Toolbox offers other advantages as well. The most important is that, because of the way data is entered, it is relatively “clean,” meaning that little time must be spent normalizing the data before it can be used.

I have two main critiques of Kobo Toolbox (for further discussion, see this post). First, while surveying, I wanted to see the spatial relationships between the sites I was locating. Kobo Toolbox has a few web-based analysis tools, but I lacked the necessary internet connection and could not take advantage of them. I had to use a supplemental iOS app (called iGIS) to see the relationships among collected data as well as with other spatial data (which can be loaded into iGIS). To offer this itself, Kobo Toolbox would need a whole new offline component that could “talk” to the collection form. I will say that this was not a huge headache, partly because the data I used in iGIS was exported from QGIS, which I was already using as a central component of my workflow. Clearly, only those with iOS devices will be able to take advantage of this solution. I hope to work on an Android solution soon.

The second problem with Kobo Toolbox (at least from my perspective) is that forms are not connected to other tables. That is, you can’t “call” the options for a drop-down menu from another table (as you can in true relational databases, and as I could using FileMaker). Being able to do so would both simplify the workflow and improve normalization.
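A minimal sketch of the lookup-table idea, again using SQLite through Python's `sqlite3` module as a stand-in for a relational database such as PostGIS. The allowed ceramic ware types live in their own table, a drop-down menu could be filled from it, and a foreign key rejects any value not on the list. The `ware_types` and `ceramics` tables and their values are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

# Lookup table: the single source of allowed values.
conn.execute("CREATE TABLE ware_types (ware TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO ware_types VALUES (?)", [("plain",), ("cord-marked",)])

# Data table whose ware column may only hold values from the lookup table.
conn.execute("""CREATE TABLE ceramics (
    id INTEGER PRIMARY KEY,
    site_number TEXT NOT NULL,
    ware TEXT NOT NULL REFERENCES ware_types (ware))""")

# The options for a drop-down menu come straight from the lookup table.
options = [w for (w,) in conn.execute("SELECT ware FROM ware_types ORDER BY ware")]

conn.execute("INSERT INTO ceramics (site_number, ware) VALUES (?, ?)", ("12-VG-1", "plain"))
try:
    # A value that is not on the list is refused by the database itself.
    conn.execute("INSERT INTO ceramics (site_number, ware) VALUES (?, ?)", ("12-VG-1", "Plain ware"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(options, rejected)  # ['cord-marked', 'plain'] True
```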

Neither of these problems overrides the benefits of Kobo Toolbox, not even close. Including these features would make Kobo Toolbox as perfect as software can be, at least from the perspective of one archaeologist.

PostGIS

PostGIS is simply a spatial extension of PostgreSQL, an “object-relational database management system.” That is, PostgreSQL is software for creating, organizing, and querying relational databases (using the SQL language). The PostGIS extension allows a PostgreSQL database to handle locational data. The main reason for choosing this system is that it is incredibly popular and widely used in industry and academia. It is open source and works on all major computer platforms; it is now the native database on Mac OS X servers. The database can run on a remote server or on your own computer. I prefer a graphical user interface, but pgAdmin, the “native” graphical client for accessing and editing PostgreSQL databases, is not intuitive to me.

There are two main reasons for using a relational database (see also this post). First, relational databases encourage normalized data, because data must be formatted in a particular way to be stored. Because Kobo Toolbox collects data in a normalized way (one that you have determined), data going from it into a PostGIS database are already normalized (data stored in Excel files is rarely normalized, and a great deal of time can be spent “cleaning” it). One might think that normalizing data is restrictive because it doesn’t allow for exceptions. While it is true that data within a single field must follow certain restrictions (which you have chosen), you can always add another field with different restrictions (also set by you). You can also always add a free-text field, but such fields are difficult to analyze and link (again, you should always record a narrative of the excavation alongside this type of data collection).

One of the key mantras of data organization is that each field should contain only one piece of data. For example, it is preferable to store first and last names in separate fields; although together they identify the person, they are two separate pieces of information. If needed, you can always concatenate the data later; it is more difficult to break pieces of data apart once they have been joined. Of course, you can and should do this in an Excel table as well, but in PostGIS the database can enforce restrictions on each of the two separate fields that it could not impose on a single combined field.
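The first-name/last-name example can be sketched as follows, with SQLite standing in for PostGIS; the `people` table and the names are hypothetical. Concatenating two fields takes a single expression, while splitting a combined field back apart forces a guess about where one value ends and the next begins. The `CHECK` constraint shows the kind of per-field restriction a database can enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people (
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL CHECK (length(last_name) > 0))""")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ana", "Lopez"), ("Mary Ann", "Van Dyke")])

# Joining two fields into one is a one-line query...
full_names = [n for (n,) in conn.execute(
    "SELECT first_name || ' ' || last_name FROM people ORDER BY last_name")]

# ...but splitting the combined value is ambiguous: a naive split on the
# first space files "Ann" under the last name.
naive = [tuple(n.split(" ", 1)) for n in full_names]

# The per-field restriction rejects an empty last name.
try:
    conn.execute("INSERT INTO people VALUES (?, ?)", ("Ana", ""))
    empty_rejected = False
except sqlite3.IntegrityError:
    empty_rejected = True

print(full_names)      # ['Ana Lopez', 'Mary Ann Van Dyke']
print(naive)           # [('Ana', 'Lopez'), ('Mary', 'Ann Van Dyke')]
print(empty_rejected)  # True
```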

Second, relational databases encourage connections across data. This, of course, is the entire purpose of relational databases, and the need for normalized data is partly due to their requirements. To relate two tables, a column (field) in each must contain data of the same format and matching content. Imagine two tables, one that describes sites and another that describes ceramics. Both tables include a column containing the site number. To connect the data in the two tables, the site number columns in both tables must be of the same format (e.g. TEXT) and must contain the same site numbers (otherwise the records won’t connect). There are multiple ways to connect these columns. Below is a graphical demonstration of two examples: an OUTER JOIN (where all records are included, both those that match in the linked columns and those that do not)

[Figure: Outer Joins]

and an INNER JOIN (where only records that can be linked, i.e. those with matching values in both tables, are included).

[Figure: Inner Joins]

If the site number for ME-425-001 were slightly different in one table (say, Me-425-001 instead of ME-425-001), the records for that site could not be joined. For records that do match, such as 12-VG-1, all of the information is now connected. In this simple example that may seem unimportant, but with a much larger database you can now, for instance, analyze ceramics by county.
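The joins described above can be reproduced in a few lines. This sketch uses SQLite through Python's `sqlite3` module as a stand-in for PostGIS and invents a minimal `sites` table (with placeholder county names) and a `ceramics` table of sherd counts; none of these values come from the figures. Text comparison with `=` is case-sensitive here, so the mistyped `Me-425-001` does not match. The example uses a LEFT OUTER JOIN, which keeps every site even without matching ceramics; PostgreSQL's FULL OUTER JOIN additionally keeps unmatched rows from the second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (site_number TEXT PRIMARY KEY, county TEXT)")
conn.execute("CREATE TABLE ceramics (site_number TEXT, sherd_count INTEGER)")
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [("12-VG-1", "County A"), ("ME-425-001", "County B")])
# Note the mistyped "Me-425-001": it will not match "ME-425-001".
conn.executemany("INSERT INTO ceramics VALUES (?, ?)",
                 [("12-VG-1", 14), ("12-VG-1", 3), ("Me-425-001", 7)])

# INNER JOIN: only rows whose site numbers match exactly.
inner = conn.execute("""
    SELECT s.site_number, c.sherd_count
    FROM sites AS s JOIN ceramics AS c ON s.site_number = c.site_number
    ORDER BY c.sherd_count""").fetchall()

# LEFT OUTER JOIN: every site, with None where no ceramics matched.
outer = conn.execute("""
    SELECT s.site_number, c.sherd_count
    FROM sites AS s LEFT OUTER JOIN ceramics AS c ON s.site_number = c.site_number
    ORDER BY s.site_number, c.sherd_count""").fetchall()

# Once joined, ceramics can be summarized by a site attribute such as county.
by_county = conn.execute("""
    SELECT s.county, SUM(c.sherd_count)
    FROM sites AS s JOIN ceramics AS c ON s.site_number = c.site_number
    GROUP BY s.county""").fetchall()

print(inner)      # [('12-VG-1', 3), ('12-VG-1', 14)]
print(outer)      # [('12-VG-1', 3), ('12-VG-1', 14), ('ME-425-001', None)]
print(by_county)  # [('County A', 17)]
```

The seven sherds recorded under the mistyped site number silently drop out of the county summary, which is exactly why consistent formatting in the linking columns matters.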

QGIS

QGIS is open-source GIS software that can connect directly to PostGIS (as most other GIS software can). QGIS is powerful, flexible, and extremely well supported by a community of users and contributors. While it is not equivalent to the powerful commercial products, it is easily powerful enough to handle all of the work that the vast majority of archaeologists will need to do. Custom user-contributed plugins enhance the core package, and because the code is open, you (or a paid contractor) could design a plugin to do something highly specialized. Importantly, QGIS is designed to interact with PostGIS databases (see details here). The DB Manager plugin also adds the ability to add data to a PostGIS database (see instructions here).

Given all of this, the central argument for using QGIS rather than proprietary software is that it is free and open source. I have frequently heard the argument that open-source software lacks the support available for commercial software. Although I think the QGIS user community is well informed and helpful, you cannot pick up the phone and talk to a customer representative, and getting answers to questions about QGIS can be a lengthy process. However, for those with the funds and the need, professional support for QGIS can be purchased (see this), which lets you use an open-source program, for all of the ethical reasons discussed previously, while still having support. QGIS also has extensive tutorials, guides, and workshops.

LibreOffice Base

LibreOffice is an open-source office (or productivity) suite. It includes Writer (word processing), Calc (spreadsheets), Impress (presentations), Draw (vector graphics and flowcharts), Base (databases), and Math (formula editing). Users of Microsoft Office will find LibreOffice familiar. LibreOffice Base is similar to Microsoft Access and can be used to access, control, and use a PostGIS database; it includes a native PostgreSQL connector, which makes it a good tool for the job. I initially used Microsoft Access to organize my relational data, and moving to LibreOffice Base seemed perfect. However, I increasingly interact with the PostGIS data through QGIS, because it handles spatial data better.