Provide bindings between geospatial toolkits (which don't natively support large, scalable data stores) and distributed key-value stores.
- Apache Accumulo is the first and current implementation of this.
- HBase is the second system identified for support, due both to its similarity to Accumulo and to its popularity and userbase. https://ngageoint.github.io/geowave/gsoc2015.html
Other data stores will be evaluated on the following criteria:
- Scalability / distributed nature
- Lack of existing capability
- Userbase size and support
- New or novel features and capabilities
- Data store integration should go beyond simple storage and retrieval of data and focus on the entire concept of interacting with these datasets in real time: taking advantage of dynamic server-side processing and large-scale pre-processing (i.e., utilizing concepts like spatial subsampling, distributed rendering, geometry simplification, vector tiles, raster pyramids, statistics, etc.)
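To make "spatial subsampling" concrete, here is a minimal single-process sketch in Java. It assumes the simplest form of the technique: when rendering a map of width x height pixels, only the first feature that lands in each pixel is kept, because further features in an already-drawn pixel would add little or nothing to an image of small, opaque point symbols. The class and method names (`PixelSubsampler`, `subsample`) are hypothetical and not part of GeoWave. In a distributed setting the same filter could run server side, close to the data, so that far fewer features cross the network.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not GeoWave's API): pixel-based spatial subsampling.
public class PixelSubsampler {

    // A point in map coordinates, e.g. longitude/latitude.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Keeps only the first point that lands in each pixel of a width x height
     * image covering the extent [minX, maxX] x [minY, maxY].
     */
    static List<Point> subsample(List<Point> points,
                                 double minX, double minY, double maxX, double maxY,
                                 int width, int height) {
        Set<Long> occupiedPixels = new HashSet<>();
        List<Point> kept = new ArrayList<>();
        for (Point p : points) {
            // Ignore points outside the requested map extent.
            if (p.x < minX || p.x > maxX || p.y < minY || p.y > maxY) {
                continue;
            }
            // Convert map coordinates to pixel indices; clamp the max edge.
            int px = Math.min(width - 1, (int) ((p.x - minX) / (maxX - minX) * width));
            int py = Math.min(height - 1, (int) ((p.y - minY) / (maxY - minY) * height));
            long pixelId = (long) py * width + px;
            // Set.add returns false when this pixel already has a point.
            if (occupiedPixels.add(pixelId)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Point> points = List.of(
                new Point(0.1, 0.1),
                new Point(0.2, 0.2),   // same pixel as the first point
                new Point(9.9, 9.9),
                new Point(50, 50));    // outside the 0..10 extent
        List<Point> kept = subsample(points, 0, 0, 10, 10, 10, 10);
        System.out.println(kept.size()); // prints 2
    }
}
```

Keeping the first point per pixel is the cheapest policy; other choices (for example keeping the newest feature per pixel) fit the same structure.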
Provide bindings between geospatial toolkits and distributed compute frameworks
- MapReduce (under YARN) and Accumulo iterators are the current implementations of this.
- Spark integration is currently underway for certain algorithms, with a more general level of support to follow
- Tie-ins to GeoTrellis should be considered in the future
Other frameworks will be evaluated on the following criteria:
- Unique or different capabilities
- User interactivity and presentation
- Userbase size and support
- The intent of these systems is to allow people to ask meaningful questions, interact with the data, and develop custom high-level questions based on our low-level building blocks and components (e.g. clustering, probability density estimates, and hotspot analysis might be the low-level building blocks provided by GeoWave, which an end user or outside developer can leverage to answer questions such as what will happen where and when).
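As an illustration of one such low-level building block, the sketch below computes a Gaussian kernel density estimate on a regular grid, the kind of surface that hotspot analysis is often built on. It is a single-machine sketch under stated assumptions (2-D points, an isotropic Gaussian kernel with a user-chosen bandwidth, density sampled at cell centres); the class name `GridKde` is hypothetical and this is not GeoWave's distributed kernel density job.

```java
// Hypothetical sketch (not GeoWave's implementation): a Gaussian kernel
// density estimate evaluated at the centre of each cell of a regular grid.
public class GridKde {

    /**
     * @param points    array of {x, y} coordinates
     * @param minX      x of the grid's lower-left corner
     * @param minY      y of the grid's lower-left corner
     * @param cellSize  width and height of one grid cell
     * @param bandwidth standard deviation h of the Gaussian kernel
     * @return density[row][col], the estimated probability density per cell centre
     */
    static double[][] density(double[][] points, double minX, double minY,
                              double cellSize, int cols, int rows, double bandwidth) {
        double[][] grid = new double[rows][cols];
        double twoHSquared = 2 * bandwidth * bandwidth;
        // 2-D Gaussian normalisation 1 / (2 pi h^2), averaged over all n points.
        double norm = 1.0 / (Math.PI * twoHSquared * points.length);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double cx = minX + (c + 0.5) * cellSize;
                double cy = minY + (r + 0.5) * cellSize;
                double sum = 0;
                for (double[] p : points) {
                    double dx = p[0] - cx;
                    double dy = p[1] - cy;
                    sum += Math.exp(-(dx * dx + dy * dy) / twoHSquared);
                }
                grid[r][c] = sum * norm;
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        // Three clustered points and one outlier on a 3 x 3 grid of unit cells.
        double[][] points = {{0.4, 0.5}, {0.5, 0.6}, {0.6, 0.5}, {2.5, 2.5}};
        double[][] d = density(points, 0, 0, 1.0, 3, 3, 0.5);
        for (double[] row : d) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The nested loop is O(cells x points); a distributed version would partition the points (for example by index key range) and sum partial grids, which is the sort of work MapReduce or Spark jobs are suited to.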
Identify the geospatial toolkits to provide bindings for in (1) and (2)
- GeoTools / GeoServer is the current primary implementation.
- PDAL support has recently been added, and Mapnik support is coming soon
- GeoGig support is currently on our backlog, and something we are very, very interested in.
Other toolkits will be evaluated on the basis of:
- Lack of support for distributed systems
- Current userbase size and support
- General applicability to all the supported instances in (1) and (2)
Design goals for the above include
- Users of the geospatial systems integrated should be able to operate those systems in a natural manner with as little awareness of the distributed backend as possible.
- Users and developers should be able to opt in to the features and capabilities of the distributed system (such as cell-level security) and other aspects of it if they want to; this should not be required, and sane defaults should be provided where needed.
The project should provide a flexible spatio-temporal analytics platform, leveraging third party algorithms as much as possible and providing transparent interoperability (a common index, persistence, and data model, leveraging GeoTools as a commonality where appropriate).
GeoTools will be used as much as possible as a common geospatial framework to tie the various components (storage, compute, systems) together.
The project is intended to easily integrate across language boundaries. Although the project is written in Java, there are currently routines to generate C++ bindings.
Adds multi-dimensional indexing capability to Apache Accumulo
Adds support for geographic objects and geospatial operators to Apache Accumulo
Contains a GeoServer plugin to allow geospatial data in Accumulo to be shared and visualized via OGC standard services
Both raster and vector data models are supported
Provides MapReduce input and output formats for distributed processing and analysis of geospatial data
Provides a PDAL plugin for interacting with point cloud data in Accumulo through the PDAL library.
This project hopes to accelerate the ability of the open source geospatial community to leverage distributed computation and storage. Geospatial toolkits are currently limited to small sets of data, while the amount of available geospatial data is exploding. Providing the capability to keep up with this growth is key to allowing the geospatial community to leverage all the existing work done in the distributed computing arena.
By providing extensible interfaces, we hope that a community will grow around this challenge and that LocationTech will foster that growth. Many of the projects we would like to support, integrate with, and expand upon are currently LocationTech projects, so we feel this is an important group to interact with and an appropriate place to host and develop GeoWave.
We're hoping the smart people at Eclipse can clarify that. Here's an entry for "GeoWave" from a USPTO search:
Scientific apparatus and instruments and geodesic apparatus and instruments, namely, oil and gas well downhole survey and measurement equipment for drilling, completing, intervening in and producing fluids from oil, gas, condensate and water wells, well bores and boreholes, seismic instruments and associated surface equipment, in the nature of computers, software and electronic input/output and human interface devices, namely, graphical user interface software, all for processing, acquisition, interpretation of geophysical data for the purposes of underground exploration or extraction and for the processing, interpretation of seismic data for prospecting and extracting hydrocarbon deposits; scientific apparatus and instruments and geodesic apparatus and instruments, namely, vibration generators; scientific apparatus and instruments and geodesic apparatus and instruments, namely, vibration sensors; electrical cables; fibre-optic cables
- Initial contribution to LocationTech
- We expect to move all work on the project from our current public GitHub site to the LocationTech site as soon as possible.
Begin transition of issue tracking, code review, and CI systems to LocationTech-approved instances
- We currently use GitHub issues, GitHub pull requests, Travis CI, Jenkins (RPM builds), codecov.io (code coverage reporting), and Coverity as online services. We will need to work with LocationTech to reconcile this with Eclipse requirements and policies.
- Begin transition of issue tracking, code review, and CI systems to LocationTech-approved instances
1.0 Release Candidate
- Project is currently 95% feature complete for the 1.0 milestone
Additional work to be done in:
- Packaging/distribution (RPM, DEB, Cloudera Parcel?, etc.)
- Documentation (~70% complete)
- Examples (~20% complete)
- Exemplar projects - walkthroughs/blog series using the project to do specific tasks (0% complete)
- 1.0 Release Candidate
- 1.0 Final release
Distributed and subsampled map request rendering
- Enable rendering of data sets that are too large to render by traditional means.
- Current GSoC proposal, definitely on the road map
- Investigation to determine the feasibility/supportability of this
Maturing an analytics framework (look for opportunities with GeoTrellis)
- Addressing both tools for traditional analytics and more BI-type exploration, allowing users to interactively work with large data sets in a geospatial context.
Statistics based query planner (cost based query optimization)
- Required both for standard query planning as well as analytics pipelining, rendering decisions and other optimizations.
Statistics based probabilistic results/analytics
- Similar to the BlinkDB concept: allow for quick returns of imprecise values (if desired), but with constrained error bounds. This supports the goal of letting people work interactively with massive data sets.
- (Probably outside of GeoWave's repository: GeoWave will be able to generate the bindings, but an external extension that we build would use those bindings to integrate with Mapnik.)
- The initial area of investigation is to make this a general GeoTools interface, which would open it up to more than just GeoWave; if that turns out to be non-viable, a GeoWave-specific binding would be generated.
- We would like to leverage GeoWave as the geospatial index store and build an additional distributed graph and object store. The intent is to allow full provenance and history for any geospatial feature in our systems. We are looking to work closely with GeoGig on this.
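The statistics-based probabilistic results idea can be illustrated with the simplest possible case: estimating how many records match a query by evaluating it against a uniform random sample, and reporting an approximate 95% confidence interval. This sketch assumes simple random sampling and the normal approximation to the binomial, which is a deliberate simplification (BlinkDB itself uses stratified samples and more careful error estimation); the names `SampledCount` and `estimateCount` are hypothetical.

```java
// Hypothetical sketch of a BlinkDB-style approximate answer: estimate how many
// records in a large data set match a query from a uniform random sample, and
// report a ~95% confidence interval using the normal approximation.
public class SampledCount {

    static final class Estimate {
        final double count;      // estimated number of matching records
        final double halfWidth;  // +/- bound of the ~95% confidence interval

        Estimate(double count, double halfWidth) {
            this.count = count;
            this.halfWidth = halfWidth;
        }
    }

    /**
     * @param sampleMatches  for each sampled record, whether it matched the query
     * @param populationSize total number of records the sample was drawn from
     */
    static Estimate estimateCount(boolean[] sampleMatches, long populationSize) {
        int n = sampleMatches.length;
        if (n < 2 || populationSize < n) {
            throw new IllegalArgumentException("need 2 <= sample size <= population size");
        }
        int matches = 0;
        for (boolean m : sampleMatches) {
            if (m) {
                matches++;
            }
        }
        double p = (double) matches / n;                 // sample proportion
        double standardError = Math.sqrt(p * (1 - p) / n);
        // Finite population correction: the interval shrinks to zero as the
        // sample approaches the whole data set.
        double fpc = Math.sqrt((double) (populationSize - n) / (populationSize - 1));
        double z = 1.96;                                 // ~95% two-sided
        return new Estimate(p * populationSize, z * standardError * fpc * populationSize);
    }

    public static void main(String[] args) {
        // 100 sampled records, 20 of which match, out of one million in total.
        boolean[] sample = new boolean[100];
        for (int i = 0; i < 20; i++) {
            sample[i] = true;
        }
        Estimate e = estimateCount(sample, 1_000_000L);
        System.out.printf("~%.0f matching records (+/- %.0f)%n", e.count, e.halfWidth);
    }
}
```

The interval narrows roughly with the square root of the sample size, so a user who wants a tighter bound can trade more query time for more samples; that trade-off is what makes this kind of interactive exploration possible.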
- Upcoming presentation at the Accumulo Summit (April 28th and 29th)
The initial contribution will include all that is currently available here: https://github.com/ngageoint/geowave
- geowave-index: the n-dimensional index library
- geowave-store: a key-value storage abstraction that uses the index library for locality-preserving keys and a concept of data adapters for persisting a data model in the value
- geowave-accumulo: an Accumulo implementation of the abstract storage
- geowave-vector: GeoTools plugins to handle vector data effectively, including the GeoTools DataStore implementation
- geowave-raster: GeoTools plugins to handle raster data effectively, including an implementation of the GeoTools AbstractGridFormat
- geowave-ingest: an ingest framework with a command-line interface and a generalized engine for local and HDFS data ingest
- geowave-types: the basic types that support the ingest framework
- geowave-analytics: a basis for analytics with job definitions for common spatial analytics such as kernel density estimation and clustering
- geowave-test: a test framework for integration testing
- geowave-services: RESTful services for convenient interaction with a GeoWave store
- geowave-client: a Java client to interact with those services
- geowave-utils: standalone convenience methods and applications
- geowave-benchmark: benchmarking routines for a large generated representative point dataset
- geowave-examples: basic examples for other developers
- geowave-deploy: source code compilation and deployment routines including generating a C++ library to access GeoWave functionality with JNI
- packaging: packaging scripts for scalable deployment
- docs: AsciiDoc documentation that can generate PDF, HTML, man pages, or DocBook
Here's a list of high-level dependencies with license information: