Why Use GeoPackage?

Recently, I rediscovered GeoPackage. It’s a long story, but I tried to use the format a while ago and it didn’t work so well. Then, when I tried to use QField the other day to collect some data, as I have done for years, it had (as you may not be surprised to hear) updated automatically. Some things had changed, mostly for the better and easy to adapt to. One change was that I could no longer load my DEM. It’s a bit of a monster (c. 20 miles long and a couple of miles wide, with a resolution of around one meter, about 1 GB) and I have always used TIFF format (or, more precisely, GeoTIFF). But QField was now forcing me to convert it to GeoPackage. So I shrugged and did it, easily, in QGIS, which is where the maps that I use in QField are built.

But, this may have changed my life!

Alright, maybe that’s being a bit dramatic. Here’s the thing. I want to share data and publish it openly. My first experience with this was a bit of a bear (see my data in the Journal of Open Archaeology Data). What’s the problem, you ask? First, the standard file format for vectors (point, line, polygon) is the shapefile (Do I hear some “boo”s?). Anyone who has used shapefiles knows that, in order to share one, you need to share a minimum of three files (.shp, .shx and .dbf), and usually more. If you have ten layers you would like to share with collaborators, this means you need to share around 50 files. That’s just ridiculous! It is an unnecessary hassle that makes it very difficult to collaborate and to version. The solution to that particular problem is a GeoJSON file. This solves a number of problems associated with shapefiles (see these links, 1, 2, and 3, for more information on other issues with shapefiles). But this still means that I need to share a separate file for each layer, so for five layers, that’s five files. Not too bad, right? By the way, one of the major benefits of a GeoJSON file is that all of the information is internal to the file. That means, for example, I can publish the file online and stream the data (in my case, using QGIS; if you want to try it, you can use my data on charcoal hearths in PA). So, the data can live in an online repository such as Zenodo or Open Context and I can visualize that same data in a GIS program (I recommend QGIS) along with any other layers that live locally. Because the data is stored in a repository, I can rely upon it being consistent and so can my collaborators.
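To show what “internal to the file” means in practice, here is a minimal sketch in Python using only the standard library. The sample feature (its name and coordinates) is invented purely for illustration and is not from my dataset, and the helper name summarize is my own. The point is simply that geometry and attributes travel together in one plain-text document.

```python
import json

# A made-up, one-feature GeoJSON document (illustrative values only).
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-76.2, 40.5]},
      "properties": {"name": "example hearth"}
    }
  ]
}
"""

def summarize(geojson_text):
    """Return (geometry type, attributes) for each feature in a GeoJSON string."""
    data = json.loads(geojson_text)
    return [(f["geometry"]["type"], f["properties"]) for f in data["features"]]

for geometry_type, attributes in summarize(SAMPLE):
    print(geometry_type, attributes)
```

The same function would work on text downloaded from a repository, since nothing outside the file is needed to interpret it.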

But that still means that each layer is a separate file, and you cannot use GeoJSON for rasters (that’s not totally true, but it certainly was not designed for it). So, what would work better? How about a file that holds all of your rasters and your vectors AND styles them? That’s what GeoPackage does. It’s actually a “container” in the form of a SQLite database, where each layer is a separate table. Rasters are stored as JPEG and PNG tiles: JPEGs provide compact (though lossy) compression, and PNGs are used at the edges because they support transparency.
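Because a GeoPackage is an ordinary SQLite file, you can peek inside one without any GIS software. Here is a minimal sketch using Python’s built-in sqlite3 module. It assumes the file follows the GeoPackage standard, which registers every layer in a table called gpkg_contents; the function name list_layers and the file name in the usage comment are my own placeholders.

```python
import sqlite3

def list_layers(gpkg_path):
    """Return (table_name, data_type) for each layer registered in a GeoPackage.

    data_type is typically "features" for vector layers and "tiles" for rasters.
    """
    con = sqlite3.connect(gpkg_path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows

# Usage (hypothetical file name):
# for name, kind in list_layers("my_project.gpkg"):
#     print(name, kind)
```

Each table_name returned is itself a table in the same file, which is what makes the single-file bundle possible.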

Imagine this. I complete an archaeological project that involves georeferenced historical data, original LiDAR data (e.g., as LAS files), derivatives from the LiDAR (such as DEM, hillshade, slope analysis, etc.), points collected in the field, various polygons (in my case, State Game Lands boundaries, Appalachian Trail boundary, etc.) and lines (historic and modern roads, etc.). I want to archive everything. The last time I did this, I archived each file separately. The only link they had was a description (see this) that discussed how each layer was derived and interconnected. But they still live as distinct, if tenuously connected, digital objects. GeoPackage, however, allows me to bundle all of this together (remember, it is a database) into a single package (i.e., file). I can then archive that file and everything REMAINS connected. So much easier for me and for any present or future collaborators, and so much better for digital preservation. If I do another project, I can either archive a new GeoPackage file or, if it is additional research using the same data, version the old one (retaining all versions, of course).

Lastly, as I mentioned above, it is very important for me to be able to archive data in an online repository AND be able to stream that data to my workstation (in QGIS). I could do this with GeoJSON, so I am a big fan. So far, I have not been able to figure out how to do this with GeoPackage, but I’m still investigating.

I would also like to be able to store the files online, stream those to my workstation AND visualize them on the web. There is one tool that promises to do this with GeoPackage (see this). You can use this link to see a test of some of my data (http://ngageoint.github.io/geopackage-js/?gpkg=http://ironallentownpa.org/Testsmay27a_4326.gpkg ). Sometimes it does not load (I don’t know why), and even when it loads, it does not seem to support rasters, which is a big problem.

Anyone out there with any thoughts, suggestions or recommendations, please comment below!

Juxtapose Test

This is a test of Juxtapose by the Knight Lab at Northwestern University. The two images below show the present (Jan 2019) compared to an aerial photo from 1938. The furnace, casting house and the “dwelling house” have all been demolished, along with other local buildings. The original buildings were identified using an 1828 insurance application for the furnace complex (valued at $5500).

A quick test of Harvard WorldMap

For a long time, I have been looking for a way to both collaborate on and publish geospatial data and maps. Harvard WorldMap may be the answer. It is certainly the best thing I have found so far. Although it is based upon GeoNode and you (perhaps with help) could get your own instance up and running, the key to Harvard WorldMap is that it also aggregates maps from other sources.

With Harvard WorldMap, users can upload layers, including vector (points, lines, polygons) and georeferenced raster (e.g., aerial photos or historic maps) layers. Formats are currently limited to shapefiles and GeoTIFFs. Once a layer is uploaded, the user must add metadata. This is a very good thing and a vital step in the production and sharing of any type of data, but is often difficult or imperfect for geospatial data.

The user can control who can view, edit and manage the layers. Until a layer is ready for sharing, the user can keep it private. If they want to collaborate with others, they can grant only those individuals permission to view, to edit, or to edit and manage.

Once added, layers can be downloaded in a number of useful formats (Zipped Shapefile, GML 2.0, GML 3.1.1, CSV, Excel, GeoJSON, GeoTIFF, JPEG, PDF, PNG, and KML). Layers can also be streamed to your desktop GIS program (you are using QGIS, right?) via Web Map Service (WMS). This means that, when you build other layers in your desktop program, you and all of your collaborators can work from the same streamed data rather than from separate files on each computer.
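To make the WMS idea a little more concrete: under the hood, a desktop GIS asks the server for map images with a GetMap request. The sketch below builds such a request URL for WMS version 1.1.1 using Python’s standard library. The endpoint URL and layer name are hypothetical placeholders, not WorldMap’s actual addresses, and the helper name wms_getmap_url is my own.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                      # empty string asks for the default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url(
    "https://example.org/geoserver/wms",
    "charcoal_hearths",
    (-76.5, 40.4, -76.0, 40.7),
)
print(url)
```

In QGIS you normally never write this URL yourself; you add the WMS endpoint once and the program issues requests like this as you pan and zoom.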

Layers can be aggregated into maps, for which access can also be restricted or not in the same way as layers. You can add your layers, but you can also use their search engine to find layers that are connected to Harvard WorldMap, such as maps from USGS or from ESRI. The selection is not yet amazing, but I was able to find a few maps for my work in Ecuador that I had not found elsewhere.

Vector layers can be styled by changing the marker shape, color, size and label.

This map can then be published. Here’s a test of some data collected by my students and me regarding charcoal production on the Blue Mountain in Pennsylvania. Take a look. Note that you can change the layers (both my uploaded layers and the basemap).