GeoMesa

This proposal has been approved and the GeoMesa project has been created.

Basics

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please add your feedback in the comments section.

Parent Project: 
Background: 

As part of an effort to migrate a predictive spatial analytic to a cloud computing environment, we discovered that the spatial storage, querying, and transformation infrastructure that we had come to rely on from PostGIS and Geoserver was largely non-existent in the cloud. We implemented as much of the spatial foundation as was required to support the immediate task of migrating the predictive analytic to a Hadoop/Apache Accumulo operating environment. At the same time, we were exposed to larger data sets in the new environment and set about implementing a spatio-temporal indexing scheme to incorporate these data sets into our modeling approach. We built the indexing capability on top of Accumulo and wrapped it in Geotools interfaces to abstract away the details of the data store. After iterating on the structure of the spatial index, we compared its performance with a traditional geospatial stack consisting of PostGIS and Geoserver. On the Global Database of Events, Language, and Tone (GDELT), our indexing scheme on Accumulo significantly outperformed PostGIS on cold (non-cached) queries. We wrote up the design of the indexing scheme and the query planning process, together with the results, and submitted the paper to the IEEE BigData 2013 conference, where it was accepted. The paper found its way into the hands of OpenGeo, who contacted us and encouraged us to open-source the capability.

Scope: 

GeoMesa provides a foundation for storing, querying, and transforming spatio-temporal data in Accumulo. It implements the relevant Geotools interfaces, such as DataStore, FeatureReader, and FeatureWriter, which expose Accumulo as a data store for Geoserver and other Geotools-related projects. GeoMesa provides Geoserver plugins that facilitate exposing Accumulo as WMS, WFS, and WCS services over HTTP. GeoMesa integrates with the security models of Geoserver and Accumulo to provide cell-level security, using native security labels to control access to data. GeoMesa provides parallelized implementations of several spatial algorithms, such as nearest neighbors and distance predicates.

Description: 

GeoMesa is a suite of geospatial libraries and tools built on top of Geotools and Accumulo, a column-family-oriented distributed database. GeoMesa contains a spatio-temporal indexing structure that enables efficient storage, querying, and transformation of large spatio-temporal data sets. The spatio-temporal index and querying capabilities are exposed as Geotools DataStores, FeatureSources, FeatureReaders, and other relevant Geotools interfaces, thus giving users of the Geotools APIs transparent access to Accumulo as a data store. GeoMesa implements several spatial algorithms and transformation functions in parallel using Accumulo's custom iterator extension capability as well as Map/Reduce jobs for data ingest and processing. GeoMesa has a WFS plugin for Geoserver that allows Geoserver administrators to connect to an Accumulo data source and expose a table as an OGC service over HTTP. Additionally, GeoMesa provides support for rendering raster data stored in Accumulo through an implementation of Geoserver's WMS plugin interface.
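
To illustrate the general idea of a spatio-temporal index over a sorted key-value store, here is a minimal Python sketch. It is not GeoMesa's actual index layout: the key format (day, then a 1-degree grid cell, then a feature id), the `~` separator, and the names `cell_of`, `row_key`, `build_table`, and `query` are all assumptions made for illustration, and a sorted Python list stands in for an Accumulo table.

```python
from bisect import bisect_left
from datetime import date, timedelta
from math import floor


def cell_of(lon, lat):
    """Return the (column, row) of the 1-degree grid cell containing a point."""
    return floor(lon + 180), floor(lat + 90)


def row_key(lon, lat, day, feature_id):
    """Hypothetical sortable key: day, then grid cell, then feature id."""
    col, row = cell_of(lon, lat)
    return f"{day:%Y%m%d}~{col:03d}{row:03d}~{feature_id}"


def build_table(features):
    """Sort (key, value) pairs by key, as a sorted key-value store would."""
    return sorted(
        (row_key(lon, lat, day, fid), (lon, lat))
        for fid, lon, lat, day in features
    )


def query(table, min_lon, min_lat, max_lon, max_lat, start, end):
    """Return ids of features inside the bounding box and the day range."""
    min_col, min_row = cell_of(min_lon, min_lat)
    max_col, max_row = cell_of(max_lon, max_lat)
    hits = []
    day = start
    while day <= end:
        for col in range(min_col, max_col + 1):
            for row in range(min_row, max_row + 1):
                # Coarse step: scan only the keys sharing this day/cell prefix.
                prefix = f"{day:%Y%m%d}~{col:03d}{row:03d}~"
                i = bisect_left(table, (prefix,))
                while i < len(table) and table[i][0].startswith(prefix):
                    key, (lon, lat) = table[i]
                    # Fine step: exact bounding-box test on each candidate.
                    if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                        hits.append(key.split("~", 2)[2])
                    i += 1
        day += timedelta(days=1)
    return hits


features = [
    ("a", -78.5, 38.0, date(2013, 10, 1)),
    ("b", -78.4, 38.1, date(2013, 10, 2)),
    ("c", 10.0, 50.0, date(2013, 10, 1)),
    ("d", -78.45, 38.05, date(2013, 11, 1)),
]
table = build_table(features)
print(query(table, -79.0, 37.5, -78.0, 38.5, date(2013, 10, 1), date(2013, 10, 3)))
# -> ['a', 'b']
```

The query touches only the key ranges for the days and cells that overlap the search window, then applies an exact filter to the candidates. This coarse-scan-then-refine pattern is one common way to answer spatial queries from a store that only supports sorted range scans.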

Why Here?: 

GeoMesa rounds out the data stores of the Geotools suite of projects by providing a native cloud spatial infrastructure.  We feel that this will open up the distributed processing capabilities of cloud architectures to Geotools and facilitate the integration of very large spatial datasets.  

OpenGeo suggested that we consider open-sourcing GeoMesa through LocationTech and the Eclipse Foundation. The business-friendly approach of the Eclipse Foundation's governance of open-source projects aligns with our interest in seeing GeoMesa reach as broad a community as possible while still remaining viable in commercial efforts. CCRi's purpose in open-sourcing GeoMesa is to decouple it from the analytics that depend on it, mature and generalize its capabilities, and support its deployment in commercial and government projects, both with the OpenGeo suite and as a standalone capability. For the open-source projects that we use in our own systems, we prefer those with governance and commitment, and we would therefore like to see GeoMesa cross that threshold by becoming part of LocationTech.

Initial Contribution: 

We will provide one multi-module Maven project with the following sub-modules:

  1. accumulo-geo-core - core spatio-temporal indexing scheme, implementations of Geotools interfaces
  2. accumulo-geo-utils - GeoHash implementation and associated utilities, shapefile ingest command-line tools
  3. accumulo-geo-plugin - Geoserver WMS and WFS plugin implementations
  4. accumulo-geo-dist - assembly project to collect GeoMesa components into a distribution suitable for deployments
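
For readers unfamiliar with GeoHash, which the accumulo-geo-utils module implements, the following Python sketch shows the standard public encoding: longitude and latitude bits are interleaved by repeated bisection and written out in base 32. It is an independent illustration, not the code in the contribution, and the function name `geohash_encode` and its `precision` parameter are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lon, lat, precision=7):
    """Encode a lon/lat pair in degrees as a GeoHash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(10.40744, 57.64911, precision=6))  # -> u4pruy
```

Because each extra character only subdivides the previous cell, a shorter GeoHash is always a prefix of a longer one for the same point, which is what makes the encoding convenient as a key prefix in a sorted store.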

The copyright is held by Commonwealth Computer Research, Inc.

Libraries that we depend on:
  • commons-codec commons-codec (Apache)
  • commons-pool commons-pool (Apache)
  • com.vividsolutions jts (LGPL)
  • javax.transaction jta (Oracle?)
  • joda-time joda-time (Apache)
  • junit junit (CPL v1.0)
  • net.sf.ehcache ehcache-core (Apache)
  • org.apache.accumulo accumulo-core (Apache)
  • org.apache.accumulo accumulo-server (Apache)
  • org.apache.accumulo cloudtrace (Apache)
  • org.apache.hadoop hadoop-core (Apache)
  • org.apache.hadoop zookeeper (Apache)
  • org.apache.thrift libthrift (Apache)
  • org.codehaus.jackson jackson-mapper-asl (Apache)
  • org.geotools (GPL with CLASSPATH exception)
  • org.joda joda-convert (Apache)
  • org.scala-lang jline (Apache)
  • org.scala-lang scala-compiler (Apache)
  • org.scala-lang scala-library (Apache)
  • org.slf4j slf4j-api (Apache)
  • org.slf4j slf4j-log4j12 (Apache)
  • org.specs2 specs2_2.10 (Apache)
  • xerces xercesImpl (Apache)

 

Project Scheduling: 

We would like to contribute the initial source code (as a pre-1.0 version) as soon as the project is approved to be a part of LocationTech.  We feel that the following four items are necessary for a 1.0 release version.

  1. Integrate with Geoserver and Accumulo's security model to provide fine-grained multi-level security access to geospatial data stored in Accumulo.
  2. Implement a guided ingest capability. The index is flexible and can accommodate data sets with different spatial and temporal resolutions, so users need a guided ingest system that will configure the index appropriately for their data sets.
  3. Implement automatic tuning of the index based on the data set's resolutions and common query parameters.
  4. Support coordinate reference systems (CRS) and projections other than EPSG:4326.
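
As a concrete example of what supporting another CRS involves, the sketch below converts between EPSG:4326 (longitude/latitude in degrees) and EPSG:3857 (Web Mercator, in metres) using the standard spherical Mercator formulas. It is a self-contained illustration with assumed function names; a real implementation would more likely delegate to a library's CRS machinery than hand-code each projection.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def wgs84_to_web_mercator(lon, lat):
    """Project lon/lat in degrees (EPSG:4326) to x/y in metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse projection: x/y in metres back to lon/lat in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


x, y = wgs84_to_web_mercator(-78.48, 38.03)
print(web_mercator_to_wgs84(x, y))  # round-trips to approximately (-78.48, 38.03)
```

Note that the projection is undefined at the poles, so inputs are assumed to lie strictly between -90 and 90 degrees of latitude.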
 
Future Work: 

The following is a suggested roadmap but we would like to poll the community for prioritization.

  • Expand on the existing (E)CQL filtering capabilities.
  • Smooth the on-ramp for developers new to the project. 
  • Mature the support for line and polygon data sets.
  • Implement parallelized algorithms for k-nearest neighbors, measurements, spatial transformations, etc.
  • Documentation: expand the existing "getting started" documentation into a complete manual, including both user- and developer-level sections.
  • Improve performance of query planning.
  • Consider supporting multiple, simultaneous indexes, with query-planning informed by data-table-specific statistics.
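
To make the k-nearest-neighbors item concrete, here is a minimal Python sketch of one way such a search can be split across data partitions: each partition reports its own k nearest points, and the partial results are merged. This is an assumption-laden illustration, not GeoMesa's algorithm; the function names, the `(name, lon, lat)` tuple layout, and the use of haversine distance on a spherical Earth are all choices made for the example, and the partitions are processed sequentially here rather than in parallel.

```python
import heapq
import math

MEAN_EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(points, lon, lat, k):
    """Brute-force k nearest (name, lon, lat) tuples to the query point."""
    return heapq.nsmallest(k, points, key=lambda p: haversine_km(lon, lat, p[1], p[2]))


def k_nearest_partitioned(partitions, lon, lat, k):
    """Take the k nearest from each partition, then merge the candidates."""
    candidates = []
    for part in partitions:  # each iteration could run on a separate worker
        candidates.extend(k_nearest(part, lon, lat, k))
    return k_nearest(candidates, lon, lat, k)


points = [
    ("cville", -78.48, 38.03),
    ("richmond", -77.44, 37.54),
    ("dc", -77.04, 38.91),
    ("london", -0.13, 51.51),
]
print([p[0] for p in k_nearest_partitioned([points[:2], points[2:]], -78.5, 38.0, 2)])
# -> ['cville', 'richmond']
```

The merge step is correct because any point among the global k nearest must also be among the k nearest within its own partition.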

To grow the community, we have scheduled the following activities:

  • Present our paper, "Spatio-temporal Indexing in Non-Relational Distributed Databases", at the IEEE Big Data Conference 2013 in October.
  • Present at the 2013 Virginia GIS Conference, sponsored by the Virginia Association for Mapping and Land Information Systems (VAMLIS).

We would like to kick off the following activities:

  • Start a google group/mailing list for users and for developers.
  • Participate in the LocationTech roadshow in Washington D.C.
People
Project Leads: 
Interested Parties: 
  • Commonwealth Computer Research, Inc.
  • OpenGeo
  • EOIR
  • Oracle