GeoTrellis

This proposal has been approved and the GeoTrellis project has been created.
Basics

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please add your feedback in the comments section.

Parent Project: 
Background: 

Azavea has been experimenting with techniques for accelerating the processing of spatial data for almost 10 years.  Early efforts focused on understanding and overcoming performance bottlenecks in a single type of raster processing activity: a weighted overlay operation that could support geographic prioritization.  After creating an initial prototype application that supported real-time weighted overlay operations for prioritizing residential real estate decisions, we received a small research and development grant under the U.S. Department of Agriculture’s Small Business Innovation Research (SBIR) grant program (#2006-33610-16777).  This seed grant enabled the development of DecisionTree, a software framework for geographic prioritization.  This early framework successfully supported real-time weighted overlay operations at the city and county scale, but because it supported only a single Map Algebra operation, its utility as a general framework was limited.

In 2010, work on two projects (a sustainable transit web app and an educational game) gave us an opportunity to implement a new framework that would draw on the experience gained with the previous projects but result in a more generic, low-latency geospatial data processing framework.  While MapReduce and its Hadoop implementation were attracting a great deal of attention for distributed processing, we elected to take a different approach: our primary use case was to provide real-time processing for web and mobile applications in which users could manipulate model parameters to generate new spatial data.  The need for responsive and scalable applications required a much lower-latency approach.  After evaluating several languages and architectures, we selected Scala as the language and the Akka framework to implement an actor model of distributed processing.

The first project, CommonSpace, was partially funded by the William Penn Foundation and produced a prototype framework.  The second block of work, performed for the Stroud Water Research Center under a grant from the National Science Foundation (DRL-0929763), required a low-latency raster computation system for a large watershed, along with a broad array of raster processing capabilities, in order to create a water education application with a game-like user experience.  Since then, the framework has been used to support a variety of applications, including planning, digital humanities, government infrastructure investment, and forest growth simulation and modeling.  Recent work has also extended the framework to support machine learning applications for crime forecasting.  In 2011, Azavea decided to release the new software, now called GeoTrellis, on GitHub under the GPLv3 license.  The project is currently integrated into two civic software products (HunchLab and OpenTreeMap) and other web applications.

After LocationTech’s formation in early 2013, we were encouraged to consider submitting GeoTrellis to the Eclipse Foundation. In anticipation of this, we recently released version 0.8.2 under the Apache 2 license.

Scope: 

GeoTrellis is a general framework for low-latency geospatial data processing, developed using Scala and Akka.  The core GeoTrellis framework can process both large and small raster data sets with low latency by distributing the computation across multiple threads, cores, CPUs and machines using the actor model of distributed processing.  The software can rapidly process raster data and distribute that processing, and it includes data import and conversion tools for the ARG data structure.

Description: 

The core GeoTrellis framework can process both large and small data sets with low latency by distributing the computation across multiple threads, cores, CPUs and machines.  The software can rapidly process raster data and distribute that processing, and it includes data import and conversion tools for the ARG data structure.

GeoTrellis is a general framework for low-latency geospatial data processing, developed using Scala and Akka.  The goal of the project is to transform user interaction with geospatial data by bringing the power of geospatial analysis to real-time, interactive web applications.  It is complementary to other open source geospatial projects such as GeoServer, OpenLayers and PostGIS.  GeoTrellis was designed to solve three core problems, with an initial focus on raster processing:

  1. Create scalable, high performance geoprocessing web services;
  2. Create distributed geoprocessing services that can act on large data sets; and
  3. Parallelize geoprocessing operations to take full advantage of multi-core architectures.
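Goal 3 can be pictured with a small example.  The following is a minimal sketch, not GeoTrellis code: it assumes a hypothetical row-major `Grid` type and uses Scala standard-library `Future`s rather than Akka actors to split a per-cell operation into horizontal bands of rows that are processed on separate threads.  The names `Grid`, `TileParallel` and `mapParallel` are invented for illustration.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical, simplified raster: a row-major grid of Int cells.
// This is NOT the GeoTrellis raster type; it only illustrates the idea.
final case class Grid(cols: Int, rows: Int, cells: Array[Int]) {
  require(cells.length == cols * rows, "cell count must equal cols * rows")
}

object TileParallel {
  // Apply a per-cell function to every cell, splitting the grid into
  // horizontal bands of rows and processing each band in its own Future.
  def mapParallel(g: Grid, bands: Int)(f: Int => Int)
                 (implicit ec: ExecutionContext): Grid = {
    require(bands > 0, "bands must be positive")
    val rowsPerBand = math.max(1, math.ceil(g.rows.toDouble / bands).toInt)
    val out = new Array[Int](g.cells.length)
    val work: Seq[Future[Unit]] =
      (0 until g.rows by rowsPerBand).map { startRow =>
        Future {
          val endRow = math.min(startRow + rowsPerBand, g.rows)
          var i = startRow * g.cols
          val stop = endRow * g.cols
          // Each band writes a disjoint index range, so no locking is needed.
          while (i < stop) { out(i) = f(g.cells(i)); i += 1 }
        }
      }
    // Block until every band is written (acceptable in a sketch; a real
    // service would compose the Futures instead of blocking).
    Await.result(Future.sequence(work), 30.seconds)
    g.copy(cells = out)
  }
}
```

With `scala.concurrent.ExecutionContext.Implicits.global` in scope, a call such as `TileParallel.mapParallel(grid, bands = 4)(_ * 2)` doubles every cell using up to four concurrent tasks.  Banding by rows keeps each task's writes contiguous; the trade-off in this sketch is that the call blocks until all bands finish.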

GeoTrellis organizes spatial data processing into Operations, and multiple operations can be composed into Models.  A geoprocessing model in GeoTrellis is built from smaller geoprocessing operations, each with well-defined inputs and outputs.  Operations include Local, Focal and Zonal operations for raster data, vector-to-raster conversion and network operations.
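The three raster operation families named above follow the standard Map Algebra categories: a Local operation computes each output cell from the same cell in its inputs, a Focal operation from a neighborhood around the cell, and a Zonal operation from all cells that share a zone.  The sketch below shows one of each on a hypothetical `Tile` type; the type and the function names (`localAdd`, `focalSum3x3`, `zonalSum`) are assumptions for illustration and are not the GeoTrellis API.

```scala
// Hypothetical, simplified raster: row-major Int cells. Not the GeoTrellis API.
final case class Tile(cols: Int, rows: Int, cells: Vector[Int]) {
  require(cells.length == cols * rows, "cell count must equal cols * rows")
  def get(c: Int, r: Int): Int = cells(r * cols + c)
}

object MapAlgebra {
  // Local: each output cell depends only on the same cell in the inputs.
  def localAdd(a: Tile, b: Tile): Tile = {
    require(a.cols == b.cols && a.rows == b.rows, "tiles must align")
    a.copy(cells = a.cells.zip(b.cells).map { case (x, y) => x + y })
  }

  // Focal: each output cell depends on a neighborhood around that cell.
  // Here the neighborhood is a 3x3 window; cells outside the grid are skipped.
  def focalSum3x3(t: Tile): Tile = {
    val out = for {
      r <- 0 until t.rows
      c <- 0 until t.cols
    } yield {
      val neighbors = for {
        dr <- -1 to 1
        dc <- -1 to 1
        rr = r + dr
        cc = c + dc
        if rr >= 0 && rr < t.rows && cc >= 0 && cc < t.cols
      } yield t.get(cc, rr)
      neighbors.sum
    }
    t.copy(cells = out.toVector)
  }

  // Zonal: aggregate `values` over the zones defined by a second raster.
  // Returns the sum of values for each zone id found in `zones`.
  def zonalSum(values: Tile, zones: Tile): Map[Int, Int] = {
    require(values.cols == zones.cols && values.rows == zones.rows, "tiles must align")
    zones.cells.zip(values.cells)
      .groupBy(_._1)
      .map { case (zone, pairs) => zone -> pairs.map(_._2).sum }
  }
}
```

Each function returns a new value rather than mutating its input, which is what lets operations like these be chained into larger models.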

GeoTrellis is designed to help developers create simple, standard REST services that return the results of geoprocessing models.  Just as an RDBMS optimizes queries, GeoTrellis will automatically parallelize and optimize geoprocessing models where possible.  In keeping with Scala's object-functional style, it is easy both to create new operations and to compose them with existing ones.
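To illustrate composition and deferred evaluation, here is a minimal sketch, assuming hypothetical `Op`, `Literal` and `Map2` types that are not part of GeoTrellis.  A model is first described as a tree of operations and only later run; because both inputs of each `Map2` are started before either is awaited, independent branches can be evaluated concurrently, which is one simple way a framework can parallelize a model without extra work from the developer.  The example builds a weighted overlay (each layer multiplied by its weight, then summed cell by cell) out of smaller operations.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical sketch, not the GeoTrellis API: an operation describes work
// that is only executed when `run` is called.
sealed trait Op[A] {
  def run(implicit ec: ExecutionContext): Future[A]
}

// A constant input to the model.
final case class Literal[A](value: A) extends Op[A] {
  def run(implicit ec: ExecutionContext): Future[A] = Future.successful(value)
}

// Combines the results of two operations with `f`.
final case class Map2[A, B, C](a: Op[A], b: Op[B], f: (A, B) => C) extends Op[C] {
  def run(implicit ec: ExecutionContext): Future[C] = {
    val fa = a.run // start both inputs before waiting on either,
    val fb = b.run // so independent branches can proceed concurrently
    for (x <- fa; y <- fb) yield f(x, y)
  }
}

object Model {
  type Cells = Vector[Int]

  // Weighted overlay: output cell = sum over layers of (weight * layer cell).
  // Weighting each layer is its own operation; the weighted layers are then
  // combined pairwise with a cell-by-cell sum.
  def weightedOverlay(layers: Seq[(Cells, Int)]): Op[Cells] = {
    require(layers.nonEmpty, "at least one layer is required")
    val scale = (cells: Cells, w: Int) => cells.map(_ * w)
    val add = (x: Cells, y: Cells) => x.zip(y).map { case (p, q) => p + q }
    val weighted: Seq[Op[Cells]] =
      layers.map { case (cells, w) => Map2(Literal(cells), Literal(w), scale) }
    weighted.reduce((l, r) => Map2(l, r, add))
  }
}
```

Describing the model before running it is what leaves room for optimization: an engine that can see the whole tree could, for example, fuse adjacent per-cell steps or share repeated inputs before executing anything.  This sketch performs no such optimization; it only shows the composition and the concurrent evaluation of independent branches.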

Why Here?: 

Azavea already maintains several open source projects, but most are vertical applications that would have less applicability as general frameworks.  GeoTrellis, however, is a general-purpose, horizontal framework that we believe can be applied to a variety of scenarios.  We also believe that increased usage across a broad range of use cases will be the best way for GeoTrellis to grow and mature.  The business community provided by the Eclipse Foundation, the presence of a geospatial community in the form of LocationTech, and the stated interest in assembling a set of projects focused on distributed spatial data computation all make LocationTech a compelling option.  In the long run, we would like to see GeoTrellis become useful to many more organizations, and we believe that LocationTech is the best path toward that goal.

Initial Contribution: 

GeoTrellis is currently available on GitHub at https://github.com/geotrellis/geotrellis.  We are considering submitting a second repository, geotrellis-ogc, but it would require some integration and testing before it would be ready, and it may be better to defer it to the future.  A third repository, ScalaDocs, is generated by an sbt command when the Jenkins build runs.  This generated documentation web site could be hosted by either Eclipse or Azavea.

The copyright is held by Azavea Inc.

Project Scheduling: 

We are working on an important refactor of the codebase that will result in version 0.9.  We would like to complete this work (expected in the next couple of weeks) before submitting the codebase for legal review.  Once incubation is complete and the project has been accepted, we would like to move the pre-1.0 version to LocationTech immediately.

Future Work: 

We feel that the following items are necessary for a 1.0 release.

  1. Completion of the ongoing refactoring of data sources and operations (targeted for the 0.9 release)
  2. Revised documentation to account for the current refactor
  3. Implementation of missing Local, Zonal and Focal functions
  4. Implementation of additional raster operations for extended neighborhoods (hydrology, cost distance, interpolation, kriging, viewshed, etc.)
  5. Enhancement of the actor model distribution system by adding Spark
  6. Support for multi-band rasters
  7. Python and JavaScript APIs to ease integration
  8. Database integration points for shapefiles (vector data) and PostGIS (vector and raster)
  9. Additional examples and improved client integration points

A full list of issues is maintained at https://github.com/geotrellis/geotrellis/issues.

Azavea has recently presented GeoTrellis at FOSS4G North America and other events.  Additional community-building activities are planned for the coming months, including:

  • LocationTech roadshow in Philadelphia
  • Eclipse Day at Google
  • Blog posts
  • Sample applications that demonstrate new features
People
Project Leads: 
Interested Parties: 
  • Azavea
  • Oracle
  • IBM
  • BoundlessGeo