(Note: Originally posted here: http://digitalarchaeology.msu.edu/crafting-work-flow-kobo-toolbox-postgis-qgis-libreoffice-base-pgadminiii/ on May 21, 2016)
Having largely decided on what tools to use (see previous posts: here, here and here), ironing out how this process will actually work has been a bit more difficult than hoped.
Two quick reminders. First, the goal of this project is to design (“stitch together” may a better term) tools for data collection and management (and eventually for archiving, etc.) that have relatively low adoption curves for most non-techie users. The primary audiences is for those on the “fringes” (though the fringes may be larger and perhaps more important than the “core”) of archaeology- those who have limited resources, such as graduate students, contingent faculty, faculty in small under-resourced schools, independent scholars, small contract firms, etc. Second, largely because of the first, all tools should be open access and the aim should be open access data (yes, perhaps with some- i.e., location- modified).
Kobo Toolbox and PostGIS form the essential core tools in this process. Kobo Toolbox is an easy to set up online/offline browser-based data collection tool. There really are no other OPEN ACCESS tools comparable to Kobo Toolbox (though there are numerous commercial tools). PostGIS is one (of many) spatial databases. I have chosen this largely because it is well-supported and widely used, but other database types would be useful as well.
Ok, down to the nitty-gritty. How to actually make this work!
First, we need to get data collected using Kobo Toolbox into the PostGIS database (because that is where the relational magic happens). This can be done through all three possible tools- QGIS, Libreoffice Base or pgadminIII. Determining which tool/method was the quickest, easiest and most accurate way took many, many hours of tinkering. I haven’t talked much about pgAdminIII, which is the GUI created to work with PostgreSQL databases and, therefore, will work best with PostgreSQL/ PostGIS data (though that doesn’t mean it is the best choice). QGIS and Libreoffice are designed to operate with a larger number of database types.
The key to understanding which tool is the most appropriate is remembering that your data is spatial. If you bring data into pgAdminIII or Libreoffice Base, they do not recognize what type of data is in each field (a.k.a. column). You have to specify the type for each column. In a large data table, this can be quite laborious, especially when using the PostGIS extension. However, QGIS is designed to work with spatial data. I found that importing recently-collected Kobo Toolbox data is best done through QGIS. Here’s how it’s done:
Once you have your PostGIS database up and running (I needed a friend to help do this, but once it is up and running, you are good to go), start a “New Project” in QGIS. Within QGIS, click on the “Add Delimited Text Layer” button (). The following shows the resultant display completed:
Most importantly, note that, QGIS identifies the X and Y fields as the fields automatically labeled by Kobo Toolbox as “_Location_longitute” and “_Location_latitude.” If QGIS does not identify these columns as the geometry fields, you can do so with the drop down menu. Click “OK.” In the next box, you will need to identify a CRS (Coordinate Reference System). Kobo data is in WGS 84 (EPSG 4326), which is the most common CRS (if you need to you can transform your data to a new CRS later). Although it is not perfect, I encourage the use of WGS 84 because it can be deployed through the web more easily (e.g., via Google Maps, CartoBD, etc.).
Perhaps most importantly, QGIS also recognizes the format of many of the other columns. This is incredibly important because certain functions can be done with certain types of data (e.g. only numbers stored as numbers can be used in calculations; only data stored as text can be used in categorical labeling within a map; etc.). To see the format that QGIS identified for each column, right click the new layer and select Layers, then the Fields tab. You should see this:
Note many different “Types” of data- QString, int, double. If I bring this database into PostGIS via LibreOffice Base or pgAdminIII, I will need to specify the types. Of course, there are always problems with allowing software to automatically do nearly anything. In the above, “Students/student16” is identified as “QString,” but in reality this is a Boolean field (True or False). In this case, it was collected as a radio button in Kobo Toolbox and identifies whether or not this particular student was involved in the collection of each data point. This can be corrected later, but we do not want to do that yet.
The data is now in QGIS, but still lives in the CSV file. QGIS simply knows where to look to get data so that it can be mapped and the types of data so that they can be used appropriately (e.g., digits formatted as text cannot be used in a calculation).
We want this data in our PostGIS database so that it can be related to other data.
First, more the CSV data to PostGIS. This is relatively simple with DB Manager in QGIS. First, be sure to establish a connection with your database (see this link) Now that you have a connection, you can interact with your database. DB Manager is a tool to interact with spatial databases. DB Manager is a plugin that is now part of the core download. If it is not apparent, you can always install it as a plug-in. Click on Database–> DB Manager –> DB Manager at the top of the screen.
You will see:
Expand “PostGIS” by clicking on the +. You should see your database (if not you will need to establish a connection).
Your view should now look something like this:
With the layer from your imported CSV highlighted in the Layers Panel in QGIS, click on “Import Layer/ File” (). Use settings similar to these:
Please note that you must identify the primary key as “_uuid” because this is a unique id assigned by Kobo Toolbox. Every table in a relational database must have a primary key that it uses to uniquely identify each record (row). You should not identify a column for geometry because there isn’t actually a column in the CSV file for this. QGIS will create it based upon the TWO columns you told it to use as X,Y coordinates and store it in a “geom” column.
Once you click OK, you should see a message that your data was successfully imported.
Although you have imported the data into your PostGIS database, it will not yet appear in your QGIS map. To do so, click on “Add PostGIS Layers” (). In the subsequent screen, establish a connection (if not already established through the browser, you may need to click on “New”). Then select the newly imported file. Your screen should look similar to this:
Once you have selected the appropriate file, click on “Add”. This will add a new layer from your PostGIS database. It should look similar to this:
Note that, in the image above, the newly imported PostGIS data (in red)sits directly above the data in the CSV file (in green).
Finally, one should note that, the main difference between the CSV file and the table in the PostGIS database is that the data is defined by type.
However, there are two additional components of a relational database that make this conversion important. First is the ability to establish relationships between tables (which you cannot do with CSV files). Second is the ability to update your data with new information.
Although there is no space (or time) for addressing both of these issues at this point, these are important to remember in the strategy.
In subsequent posts, I will address how to update your data with newly collected information and how to establish relationships. It should be noted that this can be done through the same three tools- LibreOffice Base, QGIS and pgAdminIII.