LocationTech GeoWave

Basics
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the community. Please log in and add your feedback in the comments section.
Parent Project: 
Background: 
GeoWave was developed at the National Geospatial-Intelligence Agency (NGA) in collaboration with RadiantBlue Technologies and Booz Allen Hamilton. 
 
The NGA released GeoWave under the Apache 2.0 License on June 9, 2014, in the hope of making it possible for other organizations to benefit from the agency’s development efforts, and of reaping the benefits of innovation, creativity, and a far-reaching community of developers who approach problems from different perspectives.
Scope: 
LocationTech GeoWave leverages the scalability of a distributed key-value store for effective storage, retrieval, and analysis of massive geospatial datasets. It currently does so by providing plugins that connect GeoTools and PDAL to an Accumulo-based data store. The primary goal of GeoWave is to bridge the gap between popular geospatial projects, distributed key-value stores, and distributed processing frameworks. In many of these storage and compute systems, geospatial operations tend to be an afterthought or do not mesh well with the system's capabilities. Through GeoWave we intend to make them first-class citizens.
 
Explicitly in scope for this project are:
  1. Providing bindings between geospatial toolkits (which don't natively support large scalable data stores) and distributed key-value stores.
    1. Apache Accumulo is the first and current implementation of this.
    2. HBase is the second system identified for support, due both to its similarity to Accumulo and to its popularity and user base (see https://ngageoint.github.io/geowave/gsoc2015.html).
    3. Other datastores will be evaluated on the criteria of
      1. Scalability / distributed nature
      2. Lack of existing capability
      3. Userbase size and support
      4. New or novel features and capabilities
    4. Data store integration should go beyond simple storage and retrieval of data, and focus on the entire concept of interacting with these datasets in real time: taking advantage of dynamic server-side processing and large-scale pre-processing (i.e., utilizing concepts like spatial subsampling, distributed rendering, geometry simplification, vector tiles, raster pyramids, statistics, etc.)
  2. Providing bindings between geospatial toolkits and distributed compute frameworks
    1. MapReduce (under YARN) and Accumulo Iterators are the current implementations of this.
    2. Spark integration is currently underway for certain algorithms, with a more general level of support to follow.
    3. Tie-ins to GeoTrellis should be considered in the future
    4. Other frameworks will be evaluated on the criteria of
      1. Unique or different capabilities
      2. Efficiency
      3. User interactivity and presentation
      4. Userbase size and support
    5. The intent of these systems is to allow people to ask meaningful questions, interact with the data, and develop custom high-level questions based on our low-level building blocks and components. For example, clustering, probability density estimates, and hotspot analysis might be the low-level building blocks provided by GeoWave, which an end user or outside developer can leverage to answer questions such as what will happen where and when.
  3. Identifying the geospatial toolkits to provide bindings for in (1) and (2)
    1. GeoTools / GeoServer is the current primary implementation.
    2. PDAL support has recently been added, and Mapnik support is coming soon.
    3. GeoGig support is currently on our backlog, and is something we are very interested in.
    4. Other toolkits will be evaluated on the basis of
      1. Lack of support for distributed systems
      2. Current userbase size and support
      3. General applicability to all the supported instances in (1) and (2)

Design goals for the above include

  • Users of the geospatial systems integrated should be able to operate those systems in a natural manner with as little awareness of the distributed backend as possible.  
  • Users and developers should be able to opt in to the features and capabilities of the distributed system (such as cell-level security) if they want to. Doing so should not be required, and sane defaults should be provided where needed.
  • The project should provide a flexible spatio-temporal analytics platform, leveraging third party algorithms as much as possible and providing transparent interoperability (a common index, persistence, and data model, leveraging GeoTools as a commonality where appropriate).
  • GeoTools will be used as much as possible as a common geospatial framework to tie the various components (storage, compute, systems) together.
  • The project is intended to easily integrate across language boundaries.  Although the project is written in Java, there are currently routines to generate C++ bindings.

 

Description: 
LocationTech GeoWave leverages the scalability of a distributed key-value store for effective storage, retrieval, and analysis of massive geospatial datasets. 
Currently, GeoWave is an open source set of software that:
  • Adds multi-dimensional indexing capability to Apache Accumulo
  • Adds support for geographic objects and geospatial operators to Apache Accumulo
  • Contains a GeoServer plugin to allow geospatial data in Accumulo to be shared and visualized via OGC standard services
    • Both raster and vector data models are supported
  • Provides Map-Reduce input and output formats for distributed processing and analysis of geospatial data
  • Provides a PDAL plugin for interacting with point cloud data in Accumulo through the PDAL library.
In simplified terms, GeoWave currently attempts to do for Accumulo what PostGIS does for PostgreSQL.
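
To make "multi-dimensional indexing" on a sorted key-value store more concrete, here is a minimal sketch in Java. It assumes a Z-order (Morton) curve as a simple stand-in. The dependency list below names Uzaygezen, a Hilbert-curve library, so GeoWave's actual index is different. The class name, method names, and the 16-bit precision are hypothetical and chosen only for illustration.

```java
/** Illustrative only: a Z-order (Morton) key, not GeoWave's actual index code. */
final class ZOrderKeySketch {
    /** Bits of precision kept per dimension (hypothetical choice). */
    static final int BITS = 16;

    /** Scales a value from [min, max] to an integer cell in [0, 2^BITS - 1]. */
    static int normalize(double value, double min, double max) {
        long cells = 1L << BITS;
        long cell = (long) Math.floor((value - min) / (max - min) * cells);
        // Clamp so that value == max falls into the last cell.
        return (int) Math.max(0L, Math.min(cells - 1, cell));
    }

    /** Interleaves bits: x fills the even bit positions, y the odd ones. */
    static long interleave(int x, int y) {
        long key = 0L;
        for (int i = 0; i < BITS; i++) {
            key |= ((x >>> i) & 1L) << (2 * i);
            key |= ((y >>> i) & 1L) << (2 * i + 1);
        }
        return key;
    }

    /** Maps a longitude/latitude pair to a single sortable key. */
    static long encode(double lon, double lat) {
        return interleave(normalize(lon, -180.0, 180.0),
                          normalize(lat, -90.0, 90.0));
    }

    public static void main(String[] args) {
        long a = encode(-77.0369, 38.9072);  // two nearby points
        long b = encode(-77.0365, 38.9075);
        long c = encode(139.6917, 35.6895);  // one far away
        System.out.printf("a=%08x b=%08x c=%08x%n", a, b, c);
    }
}
```

Because the key is a single number, a store that keeps rows sorted by key can answer many spatial queries with a small number of range scans. Nearby points often, though not always, receive nearby keys; limiting the jumps a curve makes is one reason a Hilbert curve may be preferred over Z-order.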
 
Work is underway to extend the same capabilities to distributed key-value stores other than Accumulo, as well as to other geospatial frameworks. The next back-end implementation will be HBase, while Mapnik is the next targeted geospatial framework.
Why Here?: 

This project hopes to accelerate the ability of the open source geospatial community to leverage distributed computation and storage. Geospatial toolkits are currently limited to small sets of data, while the amount of available geospatial data is exploding. Providing the capability to keep up with this growth is key to allowing the geospatial community to leverage all the exciting work done in the distributed computing arena.

By providing extensible interfaces, we hope that a community will grow around this challenge and that LocationTech will foster that growth. Many of the projects we would like to support, integrate with, and expand upon are currently LocationTech projects, so we feel this is an important group to interact with and an appropriate place to host and develop GeoWave.

Project Scheduling: 
  • March 2015
    • Initial contribution to LocationTech
    • We expect to move all work on the project from our current public GitHub site to the LocationTech site as soon as possible
  • April 2015
    • Begin transition of issue/ticketing, code review, and CI systems to LocationTech-approved instances
      • We currently use GitHub issues, GitHub pull requests, Travis CI, Jenkins (RPM builds), codecov.io (code coverage reporting), and Coverity as online services. We will need to work with LocationTech to mesh this with Eclipse requirements/policies.
  • June 2015
    • 1.0 Release Candidate
      • Project is currently 95% feature complete for the 1.0 milestone
      • Additional work to be done in:
        • Packaging/distribution (RPM, DEB, Cloudera Parcel?, etc.)
        • Documentation (~70% complete)
        • Examples ( ~20% complete)
        • Exemplar projects - walkthroughs/blog series using the project to do specific tasks (0% complete)
  • July 2015
    • 1.0 Final release

 

Future Work: 


  • Distributed and subsampled map request rendering
    • Enable rendering of data sets that are too large to render by traditional means.
  • HBase support
    • Current GSoC proposal, definitely on the road map
  • Cassandra support
    • Investigation to determine the feasibility/supportability of this
  • Maturing an analytics framework (look for opportunities with GeoTrellis)
    • Addressing both tools for traditional analytics and more BI-style exploration, allowing users to interactively work with large data sets in a geospatial context.
  • Statistics based query planner (cost based query optimization)
    • Required both for standard query planning as well as analytics pipelining, rendering decisions and other optimizations.
  • Statistics based probabilistic results/analytics
    • Similar to the BlinkDB concept - allow for quick returns of imprecise values (if desired) - but with constrained error boundaries.  This goes toward the goal of letting people work interactively with massive sets of data.
  • Mapnik plugin
    • This will probably live outside of GeoWave's repository: GeoWave will be able to generate the bindings, and an external extension that we build will use those bindings to integrate with Mapnik.
    • The initial area of investigation is to make this a general GeoTools interface, which would open this up to more than just GeoWave; if that turns out to be non-viable, a GeoWave-specific binding would be generated.
  • GeoGig integration
    • We would like to leverage GeoWave as the geospatial index store, and build an additional distributed graph and object store.  The intent would be to allow for full provenance and history for any geospatial feature in our systems.  We are looking to work closely with GeoGig on this.
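
The "quick returns of imprecise values ... with constrained error boundaries" item in the list above can be illustrated with a small sketch. It assumes the simplest possible approach: uniform random sampling with replacement and a normal-approximation 95% confidence interval. This is neither BlinkDB's method nor anything GeoWave implements; the class and method names are hypothetical.

```java
import java.util.Random;

/** Illustrative only: estimates a count from a random sample, with an error margin. */
final class SampledCountSketch {

    /**
     * Estimates how many values fall in [lo, hi] by inspecting only sampleSize
     * randomly chosen values. Returns {estimate, margin}, where margin is the
     * half-width of an approximate 95% confidence interval.
     */
    static double[] estimateCount(double[] values, double lo, double hi,
                                  int sampleSize, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < sampleSize; i++) {
            double v = values[random.nextInt(values.length)];
            if (v >= lo && v <= hi) {
                hits++;
            }
        }
        double p = (double) hits / sampleSize;
        // Standard error of a sample proportion (normal approximation).
        double standardError = Math.sqrt(p * (1.0 - p) / sampleSize);
        double estimate = p * values.length;
        double margin = 1.96 * standardError * values.length;
        return new double[] {estimate, margin};
    }

    public static void main(String[] args) {
        // Hypothetical data: 100,000 evenly spread longitudes.
        double[] longitudes = new double[100_000];
        for (int i = 0; i < longitudes.length; i++) {
            longitudes[i] = -180.0 + 360.0 * i / longitudes.length;
        }
        double[] r = estimateCount(longitudes, -90.0, 90.0, 2_000, 42L);
        System.out.printf("about %.0f +/- %.0f values in range%n", r[0], r[1]);
    }
}
```

The trade-off is that the caller chooses the sample size: a larger sample narrows the margin at the cost of reading more data.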

 

Talks

  • Upcoming presentation at the Accumulo Summit (April 28th and 29th)
Provisioning
Source Repository Type: 
People
Initial Contribution: 

The initial contribution will include all that is currently available here: https://github.com/ngageoint/geowave

It includes:

  • geowave-index: the n-dimensional index library,
  • geowave-store: a key-value storage abstraction that uses the index library for locality-preserving keys and a concept of data adapters for persisting a data model in the value
  • geowave-accumulo: an Accumulo implementation of the abstract storage
  • geowave-vector: GeoTools plugins to handle vector data effectively, including the GeoTools DataStore implementation
  • geowave-raster: GeoTools plugins to handle raster data effectively, including the GeoTools AbstractGridFormat
  • geowave-ingest: an ingest framework with a command-line interface and a generalized engine for local and HDFS data ingest
  • geowave-types: the basic types that support the ingest framework
  • geowave-analytics: a basis for analytics with job definitions for common spatial analytics such as kernel density estimation and clustering
  • geowave-test: a test framework for integration testing
  • geowave-services: RESTful services for convenient interaction with a GeoWave store
  • geowave-client: a Java client to interact with those services
  • geowave-utils: standalone convenience methods and applications
  • geowave-benchmark: benchmarking routines for a large generated representative point dataset
  • geowave-examples: basic examples for other developers
  • geowave-deploy: source code compilation and deployment routines including generating a C++ library to access GeoWave functionality with JNI
  • packaging: packaging scripts for scalable deployment
  • docs: AsciiDoc sources from which PDF, HTML, man pages, or DocBook can be generated
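
Of the modules listed above, geowave-analytics names kernel density estimation as one of its common spatial analytics. As a rough illustration of the technique itself, here is a minimal single-machine sketch in Java. It assumes a Gaussian kernel over planar x/y coordinates and is not the geowave-analytics job definition, which is built for distributed processing; the class and method names are hypothetical.

```java
/** Illustrative only: a 2D Gaussian kernel density estimate on planar coordinates. */
final class KernelDensitySketch {

    /**
     * Estimates the density at (x, y) from the given points, where each point
     * is {x, y} and bandwidth is the kernel's standard deviation.
     */
    static double density(double[][] points, double x, double y, double bandwidth) {
        double h2 = bandwidth * bandwidth;
        double sum = 0.0;
        for (double[] p : points) {
            double dx = x - p[0];
            double dy = y - p[1];
            // Unnormalized Gaussian weight: closer points contribute more.
            sum += Math.exp(-(dx * dx + dy * dy) / (2.0 * h2));
        }
        // Normalize so the estimate integrates to 1 over the plane.
        return sum / (points.length * 2.0 * Math.PI * h2);
    }

    public static void main(String[] args) {
        // Hypothetical data: a tight cluster plus one outlier.
        double[][] points = {{0.0, 0.0}, {0.1, 0.0}, {0.0, 0.1}, {5.0, 5.0}};
        System.out.println("near cluster: " + density(points, 0.0, 0.0, 0.5));
        System.out.println("empty area:   " + density(points, 2.5, 2.5, 0.5));
    }
}
```

A larger bandwidth gives a smoother surface, and a smaller one gives sharper hotspots. Evaluating the estimate on a grid of cells yields a heat-map style raster.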

Here's a list of high-level dependencies with license information:

Library Name     Version        License
Uzaygezen        0.2            Apache 2.0
GeoTools         12.1           LGPL
JUnit            4.12           EPL
JTS              1.13           BSD, EPL
Hadoop           2.6.0          Apache 2.0
Accumulo         1.6.1          Apache 2.0
Guava            14.0-rc1       Apache 2.0
Kryo             2.21           New BSD
Avro             1.7.6          Apache 2.0
ANTLR            3.3            BSD
MRUnit           1.1.0          Apache 2.0
log4j            1.2.17         Apache 2.0
json-lib         2.4            Apache 2.0
Jersey           2.14           CDDL, GPL
Commons-CLI      1.2            Apache 2.0
Commons-Math     2.1            Apache 2.0
HttpClient       4.3.6          Apache 2.0
Spring-Security  3.1.4.RELEASE  Apache 2.0

 
