Digital Data Collection: Justification

Please read the Introduction first.

Why Digital Archaeological Data Collection?

This project begins with five concepts central to the practice of archaeology.

  • Archaeology is a field discipline.
  • Archaeological data is complex, relational and spatial.
  • Archaeology is public scholarship.
  • Archaeological projects are commonly short on resources, especially time, money and personnel.
  • Pedagogy is central to archaeology.

These are in no particular order and should be seen as interlocking and overlapping aspects of the field that interact in an iterative and hermeneutic manner. Let’s highlight the importance of these for a moment and address how they relate to digital data collection.

Field Archaeology

Archaeology is a field discipline. This means that we often work in difficult conditions, whether in the deserts of the American Southwest, the rainforests of Ethiopia, the tangled subterranean world of New York City, along a pipeline in the Northwest Territories or in the musty basement of a museum. We work where it is wet, damp, dry, dusty, muddy, windy, rainy, snowy, extremely hot or cold, buggy, and in mind-boggling combinations thereof. Suffice it to say, conditions in the field are never ideal, which places significant limits on how data is collected and, therefore, on the types and quantities of data collected. Survey crews must limit their tool kits to what they can carry. Excavation crews are also limited in what they can deploy in the field; for example, they frequently need to secure tools, artifacts and perhaps the site itself on a daily basis. Throughout this document, I will address important concerns about the effects of these conditions on digital data collection. I will note here, however, that, for the reasons laid out throughout, digital data collection offers significant benefits over paper-based data collection.

The reader will notice that there is no section on “tough laptops” or “ruggedizing your iPad.” I have taken the approach that archaeologists should choose the tool appropriate to their own preferences and past experiences. Indeed, it is likely that you already HAVE a digital tool that you use in the field, even if it is only a smartphone. I don’t argue for a best device or devices, nor do I express a preference for an operating system; instead, I argue for a data collection tool that is device independent. You can use whatever you have or want! In practice, this means that data collection must be browser-based: almost every device, whether it runs an Apple, Microsoft or Android operating system, has an internet browser.

The first reaction people have to this suggestion is, “An internet browser? Don’t you need internet for it to work?” The answer is a resounding “NO! Not anymore.” (Okay, you will eventually need internet, but not during data collection.) Because of new features of HTML5, browsers can store forms and data on your device for offline use. This is truly exciting, and fundamentally important: it means that you aren’t forced into a particular device (or operating system) by the app you choose for data collection. You can test components of this system right now (see the discussion of Kobo Toolbox in the Tools section for links).

Of course, this doesn’t work if you don’t have a device with a browser that you can take into the field. If you don’t already have one and are interested in testing the system proposed herein, you can go to the nearest computer store and use their devices to test some aspects of it (note that Best Buy has a 15-day return policy, which proved useful in testing devices). Alternatively, an inexpensive Android tablet can be purchased for c. $100 (e.g., I tested a Samsung Galaxy Tab A 7.0″, which costs c. $130). For many, this cost is not negligible, but data collection in the field, even on paper forms, is never completely free.

Complexity

Archaeological data is complex, relational and spatial. Arguably, archaeologists deal with “big” data on a daily basis, not in the sense that we have vast quantities of similar data (such as Amazon.com’s customer database), but in the sense that we deal with distinctly interrelated chunks of data. As such, our data is not so much the objects themselves (artifacts, ecofacts and features) as the relationships between these entities, which increase geometrically as the raw data increases arithmetically. Associations and relationships between artifacts, matrices, soils, ecofacts, features, environment, sites, archaeological cultures, strata, landscape and more are the key components of ALL archaeological work. One of the most important of these relationships is location and, perhaps more importantly, relative location. Archaeology is by definition spatial.
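To give a sense of the scale, here is a simple worked count. It considers only pairwise relationships among n recorded entities, which is a simplifying assumption; real associations also involve larger groupings, such as an assemblage within a stratum within a site.

```latex
P(n) = \binom{n}{2} = \frac{n(n-1)}{2}
\qquad
P(10) = \frac{10 \cdot 9}{2} = 45
\qquad
P(100) = \frac{100 \cdot 99}{2} = 4950
```

Ten times as many entities yields 110 times as many potential pairings, which is why recording the relationships, and not just the objects, matters so much.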

Data must be collected in a manner that best recognizes the centrality of relationships, spatial and otherwise. This is where digital data collection can excel. Data is frequently collected in a “flat” form (e.g., paper forms or spreadsheet tables). However, those forms are all interrelated, and they can be brought into a relational database, which reconstructs the vital relationships across space, time, artifact type, ecofacts, soils, etc. This is the power of archaeology and of archaeological data.
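As a minimal sketch of that reconstruction, assume two hypothetical flat forms (a context form and an artifact form, exported as CSV) that share a context ID. Python’s built-in sqlite3 module can load both and rejoin them. The field names, values and the choice of SQLite are illustrative assumptions, not features of any particular collection tool.

```python
import csv
import io
import sqlite3

# Hypothetical "flat" exports from two field forms (CSV text).
# Field names and values are illustrative only.
CONTEXT_FORM = """context_id,unit,level_no,soil_color
C1,N10E5,1,10YR 4/3
C2,N10E5,2,10YR 5/4
"""

ARTIFACT_FORM = """artifact_id,context_id,material,quantity
A1,C1,lithic,12
A2,C1,ceramic,3
A3,C2,lithic,5
"""


def load_csv(conn, table, text):
    """Create an untyped table from CSV text and insert its rows.

    Table and column names come from our own trusted forms here; never
    build SQL this way from untrusted input.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


conn = sqlite3.connect(":memory:")
load_csv(conn, "contexts", CONTEXT_FORM)
load_csv(conn, "artifacts", ARTIFACT_FORM)

# Joining on the shared context_id reattaches each artifact to the unit,
# level and soil in which it was found.
results = conn.execute(
    """
    SELECT a.artifact_id, a.material, c.unit, c.level_no, c.soil_color
    FROM artifacts AS a
    JOIN contexts AS c ON a.context_id = c.context_id
    ORDER BY a.artifact_id
    """
).fetchall()

# The same relationship supports questions across forms, e.g. lithics per level.
lithics_per_level = conn.execute(
    """
    SELECT c.level_no, SUM(CAST(a.quantity AS INTEGER))
    FROM artifacts AS a
    JOIN contexts AS c ON a.context_id = c.context_id
    WHERE a.material = 'lithic'
    GROUP BY c.level_no
    ORDER BY c.level_no
    """
).fetchall()

for row in results:
    print(row)
print(lithics_per_level)
```

The join is what a stack of separate paper forms cannot do on its own: it lets you ask, for instance, how many lithics came from each level, across every form at once.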

Public Scholarship

“All archaeology is public archaeology” (e.g., McGimsey 1989:73, and many, many other places). The truth of this statement is well demonstrated by the Society for American Archaeology’s “Principles of Archaeological Ethics.” The principles revolve around two foci: stewardship (i.e., responsibility to future publics) and responsibility to concerned communities. Because archaeological remains are nonrenewable resources (largely because it is the relationships that are important!), archaeologists are responsible for stewarding archaeological resources, including both those still in the ground and the notes, artifacts, etc. from excavation, so that they last as long as possible (Principles 1, 3, 4, 5, 7, 8). We are responsible for consulting with, engaging with and reporting to concerned communities, including, but not limited to, Native Americans (Principles 2, 3, 4, 6; see also the World Archaeological Congress (WAC) ethics). We work on material that does not belong to us, is nonrenewable and can directly affect descendant and local communities. Members of the SAA, which, at the very least, represents most archaeologists in North America, have agreed to work openly and are responsible to the public (as broad and nonspecific as that may be). Digital tools can be used to promote public archaeology.

To me, the Principles suggest that maximal data should be shared with all concerned parties, or even simply with “everyone” via the internet. Exactly how this is done has ethical implications as well. Certainly, sharing the locations of “sensitive” sites may result in greater looting or other “impacts,” which places stewardship at odds with our responsibility to report to the public. However, masking certain locational data is relatively easy (this has been done on a large scale by the Digital Index of North American Archaeology, DINAA). Beyond location, little other data should be redacted. Yet little archaeological data is ready to be shared. Besides simply messy data, one of the main roadblocks is that data collection procedures need to be explained. This means that metadata is extremely important. For example, sharing a database with a column labeled “drills-l” that contains measurements does not mean that anyone can understand those measurements. Even if one infers that the column is drill length (can you be sure?), it is not clear precisely how the measurement was taken, in what units, with what device or with what precision. The scholar who collected the data likely knows all of this very well and may even have reported it, but in order to share the data, the procedure and the justification behind it should be described in associated metadata. Clarity in data is part of our stewardship of the archaeological record. If others (archaeologists or otherwise) cannot understand our data 100 years from now, then we have effectively destroyed archaeological evidence. While metadata can be preserved in a variety of ways (e.g., in a separate document that describes the process for measuring drills), digital data collection makes it easier to keep metadata and data connected. For example, when designing a data collection form, explicit instructions (perhaps even a direct quote from a well-known manual) can be embedded within the form.
When a researcher (or perhaps a student) enters the data, they have the explicit instructions in front of them. As long as those instructions remain attached to the data, the metadata does not need to be recreated. This benefits archaeologists as well as the public. That is, the goal of eventually sharing archaeological data pushes us to create clearer, better defined and better described data. In a similar manner, digital data collection aids collaborative research.
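A minimal sketch of keeping instructions attached to the data, assuming a hypothetical form definition in which each field carries its own label, units and measurement instructions. Exporting the data together with a data dictionary generated from the same definition means a column such as drill_length_mm never travels without its explanation. The field names, the wording of the instructions and the two-CSV export are illustrative assumptions, not the behavior of any particular tool.

```python
import csv
import io

# Hypothetical form definition: each field carries its own label, units
# and measurement instructions, so the metadata travels with the data.
# The wording of the instructions is illustrative only.
FIELDS = [
    {
        "name": "artifact_id",
        "label": "Artifact catalog number",
        "units": "",
        "instructions": "Copy the number written on the artifact bag.",
    },
    {
        "name": "drill_length_mm",
        "label": "Drill length",
        "units": "mm",
        "instructions": "Maximum length from tip to base, measured with "
                        "digital calipers to the nearest 0.1 mm.",
    },
]


def export(records, fields):
    """Return (data_csv, dictionary_csv) so data and metadata are shared together."""
    names = [f["name"] for f in fields]
    data_buf = io.StringIO()
    writer = csv.DictWriter(data_buf, fieldnames=names)
    writer.writeheader()
    writer.writerows(records)

    # The data dictionary is generated from the same definition the form used.
    dict_buf = io.StringIO()
    dict_writer = csv.DictWriter(
        dict_buf, fieldnames=["name", "label", "units", "instructions"]
    )
    dict_writer.writeheader()
    dict_writer.writerows(fields)
    return data_buf.getvalue(), dict_buf.getvalue()


records = [{"artifact_id": "A-001", "drill_length_mm": 23.4}]
data_csv, dictionary_csv = export(records, FIELDS)
print(data_csv)
print(dictionary_csv)
```

Because the instructions shown to the person entering data and the published data dictionary come from one definition, they cannot drift apart.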

To this end, it should be noted that more and more funding agencies, especially those supported by taxes, such as the National Science Foundation and the National Endowment for the Humanities in the US, the Higher Education Funding Council for England in the UK and the European Research Council, are requiring the open sharing of data. Digital data collection, therefore, helps organize data so that it can be shared, as funders require, before it is even collected.

Open source tools! The Principles of Archaeological Ethics do not state that the tools used by archaeologists should be open source. Although below I suggest that limited resources are an important reason to use open source tools, there are also ethical arguments for doing so. In order to share the results of archaeological work, they need to be in a standard, well-supported and open format. Formats that are not open, that are controlled by money-making corporate interests, should be avoided. Proprietary formats are a “black box”; we cannot know exactly what is taking place in them. How can we rely upon data when we don’t really know what it looks like? How can we hope to retrieve it in the future? More importantly, data locked within proprietary formats is difficult to share (even between collaborating archaeologists). If data is shared in the original proprietary format, recipients will need to purchase software to use (or even view) it. Sharing data in proprietary formats therefore locks future users into specific, potentially expensive, software. This is not sharing. It’s like saying I will share this apple with you, but only if you go to the store and buy a special peeler that only works with this type of apple. Open source software is frequently designed around open formats, which means that, if you understand the code, you know exactly what is happening with the format. Even if the archaeologist does not understand the code, they can find someone who does, as long as it is open. It also means that, if you share your data in open formats, others can read it with many different tools. Some of these are open source, but many proprietary software packages (which many people already have on their computers) also support open formats alongside their own proprietary ones. Using open formats allows greater options for accessing and utilizing the data.
Many proprietary software packages can also produce open format files, but using software designed around open formats is preferable. Of course, not all open formats are created equal. Some are extremely well supported by a community of programmers; others are not. In order to satisfy archaeological ethics, the goal of this particular project is to maximize openness and “shareability.” This is best achieved through the use of open formats, which in turn is best achieved through the use of open source software built around those formats.

Resources

Although I view archaeological ethics as central to the justification for using open source software, our limited access to resources may be the most immediate and convincing argument. Very few archaeologists would argue that, as a whole, the discipline is awash in cash or that we have access to all of the resources needed to do our work. Data collection, analysis, artifact conservation, public presentations and museum exhibits could always be richer and more elaborate with better funding. Lack of funding is the most significant limitation on our ability to fulfill our ethical obligations. Open source software can help. No, it’s not a panacea, but it can help. The well-supported open source software advocated herein is frequently as powerful as commercial products (though often with different components and features) and is definitely as powerful as the vast majority of users will ever need. And, of course, it is free! Replacing just a few components of the archaeologist’s digital toolkit can free up funds while satisfying the ethical concerns noted above. Yet the amount of money saved is not huge (unless you are replacing your proprietary GIS software!) and, for those with large contracts and/or grants, the difference may be minimal.

The intended audience for this project includes archaeologists less at the upper economic echelon and more at the economic periphery of the discipline: individuals and groups whose access to even some of the most basic and important resources may be restricted, temporarily or long-term. This group might include graduate students, post-docs, unemployed archaeologists, non-tenure-track faculty, early career faculty (especially at small institutions), small contract firms and even avocational archaeologists in the process of becoming professionals. I hope that this will also benefit Indigenous archaeologists and archaeologists in non-Western countries. For those who occupy these statuses (or a combination thereof), resources and the “efficient” use thereof may be particularly important. I have experienced some of the more temporary of these statuses (and, obviously, will never be able to experience others), and getting through those times was particularly onerous. I am now in a privileged position and hope that my time, effort and increased access to resources can be leveraged to benefit others. When I was in graduate school and adjuncting, I knew others in similar situations who pirated software (I never did! Nope, nope, nope) because they saw it as the only way out of a conundrum: needing digital resources (especially software) to do their work, and thereby move to a more permanent and stable status, but having no way to get them. The tools and workflow proposed herein can help relieve some of that pressure so that these archaeologists can be more effective at being awesome archaeologists.

There is one more factor that compounds the economic impact of open software: innovation. I love to try new toys. When I am faced with a problem that can be addressed with software, I want to be able to try out alternatives. This is difficult with paid software, but incredibly easy with open source software: just read the instructions, download, and you are off and running. If you decide you don’t like it, uninstall it and try a different one. Of course, purveyors of proprietary software love to offer either a “limited time” trial or a “lite” version that you can take for a spin. That’s fine, but be sure to take the open source alternatives out for a spin as well before you invest. The ability to test-drive software has made it much easier for me to determine which best suits my needs and preferences. It also means that those on the economic periphery of the discipline can make well-informed choices. I have gotten to the point where I do my best to avoid paying for software, and I am attempting to replace all paid software with open source software. The process is not complete.

Archaeological Pedagogy

Pedagogy is central to archaeology. We are all teachers of archaeology, whether to students, community groups, government officials, peers outside our discipline or business interests. In our role as stewards, we are responsible for protecting the archaeological record through education. As such, I propose that digital tools are also pedagogical tools. Note that I have written elsewhere about the pedagogy of field schools.

Initially, my primary concern in developing a digital data collection system was simply to save time, so that I could spend more time with my students on more meaningful endeavors, such as interpretation and interacting with community members, rather than on how to fill out forms. To be honest, I have not yet seen that return (partially because I am still working out the kinks… but that may never end). More importantly, using the digital data collection system proposed herein, I have seen students become much more engaged with both the process of data collection and the epistemology of data itself. I used to simply make a form for the field school to use; I was the expert, right? Now, students have a much greater role in creating and modifying forms, and as such, they learn about data at a level I had not anticipated. The digital tool makes this far more feasible. Having students participate in the creation of paper forms seems almost ridiculous (possible, but a huge waste of resources, time and paper). Editing a digital form is relatively easy and thereby allows students to make choices, recognize the limitations imposed by those choices, and react and adjust. Not only that, but student-collected data is now much cleaner, because values are selected from drop-down boxes rather than handwritten on paper. This spares me the long hours I once spent cleaning data.
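A drop-down works because the form only accepts values from a fixed list. The sketch below, using a hypothetical material list and made-up handwritten entries, applies the same rule after the fact to free-text values, showing the kind of variants that a drop-down prevents and that would otherwise need cleaning.

```python
# Hypothetical choice list for a "material" field; in a digital form this
# would be presented as a drop-down, so only these values could be entered.
MATERIAL_CHOICES = ("lithic", "ceramic", "bone", "shell", "metal", "glass")


def validate_choice(value, choices):
    """Return the value if it is an allowed choice; otherwise raise ValueError."""
    if value not in choices:
        raise ValueError(f"{value!r} is not one of {choices}")
    return value


# Made-up free-text entries of the kind that turn up on paper forms:
# inconsistent capitalization, a trailing space and a misspelling.
handwritten = ["Lithic", "lithic ", "lithc", "lithic"]

accepted, rejected = [], []
for entry in handwritten:
    try:
        accepted.append(validate_choice(entry, MATERIAL_CHOICES))
    except ValueError:
        rejected.append(entry)

print("accepted:", accepted)
print("needs cleaning:", rejected)
```

Three of the four entries mean the same thing as the one that passes, yet each would have to be found and corrected by hand; with a drop-down, none of them could have been entered.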

Using open software is a pedagogical decision. Students frequently dislike open software; it is unlike the software they have seen. And yet, once they have mastered new software and assessed its usefulness for their own lives, they have learned not just a skill, but a way of looking for solutions and working around problems. They become adept at adaptation, at learning new things. This is much more likely when they are able to experiment with open source software. Otherwise, software is simply something they are given without their input. Conversely, because of its cost and restricted formats, teaching students to use only expensive software with proprietary formats works against learning. Teaching such software is fine, but teaching (or, really, helping your students exercise) decision making is more important.

McGimsey, Charles

1989 Perceptions of the Past: Public Archaeology and Moss-Bennett – Then and Now. Southeastern Archaeology 8(1).