Digital Data Collection- Work flow

Ok, down to the nitty-gritty. How to actually make this work!

Collecting Data with Kobo Toolbox

All right, I’m going to cop out here a bit, but mainly because the documentation for Kobo Toolbox is quite good. Please see their website.

Installing Postgresql with PostGIS extension

Yeah, again, I’m going to cop out here a bit as well. Eventually, I will include excruciating detail here, but until then… Ideally, you should install PostGIS on a server and use it remotely, but that is probably the single most complex process of this entire system. However, to test the system, I suggest a local install. Of course, you won’t be able to share your data (without risking the security of your computer), but a local installation will get you used to how PostGIS works. I tested much of the below with PostGIS installed on a Digital Ocean droplet. To get started I suggested downloading the OpenGeo Suite, which includes PostGIS and will install a local version. They also have excellent documentation. Admittedly, I needed a great deal of help at this stage. Once again, thanks go to Muhlenberg’s Tim Clarke.

Getting Kobo Toolbox data into QGIS and PostGIS

Once you have data, we need to get it into the PostGIS database (because that is where the sharable relational magic happens). This can be done through three possible tools- QGIS, Libreoffice Base or pgadminIII. Determining which tool/method was the quickest, easiest and most accurate way took many, many hours of tinkering. I haven’t talked much about pgAdminIII, which is the GUI created to work with PostgreSQL databases and, therefore, will work best with PostgreSQL/ PostGIS data (and that also comes in the OpenGeo Suite mentioned above). That doesn’t mean it is the best choice for you or me. QGIS and Libreoffice are designed to operate with a larger number of database types, including PostGIS.

The key to understanding which tool is the most appropriate is remembering that your data is spatial. If you bring data into pgAdminIII or Libreoffice Base, they do not recognize what type of data is in each field (a.k.a. column). You have to specify the type for each column. In a large data table, this can be quite laborious, especially when using PostGIS. However, QGIS is designed to work with spatial data. I found that importing recently-collected Kobo Toolbox data is best done through QGIS. Here’s how it’s done:

Once you have your PostGIS database up and running, start a “New Project” in QGIS. Within QGIS, click on the “Add Delimited Text Layer” button (a). Find the file you downloaded from Kobo Toolbox (labeled below as test_clustergeo_mod.csv) The following shows the resultant display completed:

b

Most importantly, note that, QGIS identifies the X and Y fields as the fields automatically labeled by Kobo Toolbox as “_Location_longitute” and “_Location_latitude.” If QGIS does not identify these columns as the geometry fields, you can do so with the drop down menu. Click “OK.” In the next box, you will need to identify a CRS (Coordinate Reference System). Kobo data is in WGS 84 (EPSG 4326), which is the most common CRS (if you need to you can transform your data to a new CRS later; e.g. into UTM). Although it is not perfect, I encourage the use of WGS 84 because it can be deployed through the web more easily (e.g., via Google Maps, CartoBD, etc.).

QGIS also recognizes the format of many of the other columns. This is incredibly important because certain functions can be done with certain types of data (e.g. only numbers stored as numbers can be used in calculations; only data stored as text can be used in categorical labeling within a map; etc.). To see the format that QGIS identified for each column, right click the new layer and select Layers, then the Fields tab. You should see something like this:

c

Note the many different “Types” of data- QString, int, double. If I bring this database into PostGIS via LibreOffice Base or pgAdminIII, I will need to specify the types for each field. Of course, there are always problems with allowing software to automatically do nearly anything. In the above, “Students/student16” is identified as “QString,” but in reality this is a Boolean field (True or False). In this case, it was collected as a radio button in Kobo Toolbox and identifies whether or not this particular student was involved in the collection of each data point. This can be corrected later, but we do not want to do that yet.

The data is now in QGIS, but still lives in the CSV file. QGIS simply knows where to look to get data so that it can be mapped and the types of data so that they can be used appropriately (e.g., digits formatted as text cannot be used in a calculation), but the data itself does not have this information included. If you were to sent this csv to others, that data would not be included.

We want this data in our PostGIS database so that it can be related to other data.

First, move the CSV data to PostGIS. This is relatively simple with DB Manager in QGIS. First, be sure to establish a connection with your database (see this). Now that you have a connection, you can interact with your PostGIS database with DB Manager. DB Manager is a plugin that is now part of the core download. If it is not apparent, you can always install it as a plug-in. Click on Database–> DB Manager –> DB Manager at the top of the screen.

d

You will see:

e

Expand “PostGIS” by clicking on the +. You should see your database (if not you will need to establish a connection).

Your view should now look something like this:

f

With the layer from your imported CSV highlighted in the Layers Panel in QGIS, click on “Import Layer/ File” (g). Use settings similar to these:

h

Please note that you must identify the primary key as “_uuid” because this is a unique id assigned by Kobo Toolbox. Every table in a relational database must have a primary key that it uses to uniquely identify each record (row). You should not identify a column for geometry because there isn’t actually a single geometry column in the CSV file. QGIS will create it based upon the TWO columns you told it to use as X,Y coordinates and store it in a “geom” column.

Once you click OK, you should see a message that your data was successfully imported.

Although you have imported the data into your PostGIS database, it will not yet appear in your QGIS map. To do so, click on “Add PostGIS Layers” (i). In the subsequent screen, establish a connection (if not already established through the browser, you may need to click on “New”). Then select the newly imported file. Your screen should look similar to this:

j

Once you have selected the appropriate file, click on “Add”. This will add a new layer from your PostGIS database. It should look similar to this:

k

Note that, in the image above, the newly imported PostGIS data (in red)sits directly above the data in the CSV file (in green).

Now that the data is in PostGIS, it is much richer and easier to use because it follows the PostGIS rules about structure and each field is identified as a particular type of information.

Reconnecting Photos, Videos and Audios to CSV

When the data is downloaded from Kobo Toolbox, the CSV does not contain the media files. These need to be reconnected. The media files should not be inserted into a cell within the CSV file (as you can do in Excel)- it cannot be exported as a CSV to be stored/ used in PostGIS (or most other databases). Within the CSV you will need to link to the mediafiles. This means that you will need a location for the media files, which can be as simple as a file location (local or hosted) or a location on the web. Kobo Toolbox provides guidance on how to do this while the photos are hosted on their servers (see here). The following instructions are from their website:

Download your data and open in Microsoft Excel (or LibreOffice Calc or other spreadsheet manager).
In a new column next to the one that contains the filenames of your images, enter the following formula:

  • =HYPERLINK(“https://kc.kobotoolbox.org/attachment/original?media_file=username/attachments/”&A2)

In the formula replace ‘username’ with your own username and ‘A2’ with the first cell reference that contains the filename of your photo. Then copy the formula down for all your records.

It may not be wise to leave the images on Kobo Toolbox, but to a local location or a hosted server. Use the above instructions and modify the URL to point to the location of your images.

Shapefiles into PostGIS via QGIS

Frequently, you will also have shapefiles (ESRI’s open format for vector data) that you will want to put into your PostGIS database as well (e.g., county limits from your state GIS office).

The process to move shapefiles into PostGIS is relatively straightforward as long as you remember that you are “importing” into the PostGIS database. First, open your shapefile in QGIS. Open DB Manager, as discussed above. Be sure that your PostGIS database is open and connected.

dbmanagerimport

Once your PostGIS database is open and connected, click on the “Import” icon.

importshapetopgis

In the “Input” box you should see a list of vectors utilized by the current project. Select the shapefile to be exported. In the “Output table” select the appropriate schema (usually “public”) and name the new file. You shouldn’t need to select any of the other options. Click OK and your shapefile should now be imported into your PostGIS database.

Importantly, one of the most common error messages that I have gotten is the following message.

importshptopgiserror

This error is a result of different file restrictions in shapefiles versus PostGIS. Shapefiles allow data that is both multipart and singlepart in the same file, while PostGIS does not. Multipart data includes two locations that are not connected while single part is only one. Imagine two polygons that are a part of the same piece of property; multipart data would keep the two polygons together as a single record, singlepart would define them as two separate records. PostGIS does not allow the mixing of these two types of geometry within the same file. There are lots of ways to “fix” this situation, but clicking the “Create single-part geometries instead of multi-part” is one of the easiest. Of course, remember that you have disconnected two pieces of data. Depending upon your data this may or may not be important.

Raster files into PostGIS via QGIS DBManager

This section is still under development.