This post was originally published on the Azavea Blog. It is the first part of a two-part series written by Eugene Cheipesh on February 2nd, 2017.
The GeoTrellis team recently had an opportunity to benchmark GeoMesa and GeoWave, two big data projects aimed at providing distributed persistence and analysis for geospatial applications based on Apache Accumulo. We used GeoDocker, a collection of Docker containers, to overcome challenges associated with developing, testing, and deploying these projects.
Benchmarking Big Data Projects
As is often the case with applications of this sort (multiple interdependent systems which, in production, live on different machines), a fully functioning cluster is presupposed for even basic operations. For instance, Accumulo requires both HDFS and ZooKeeper to be configured and running to even initialize. Maintaining a local installation of these resources is guaranteed to introduce unnecessary pain. It increases complexity, slows down the development workflow, and introduces opportunities for environment incompatibilities.
Why use Docker?
Using Docker allows us to sidestep these difficulties by encapsulating configuration and scripting changes to state. This can ensure a consistent and predictable state that can be versioned and shared. Because these components are designed around the socket interface, that interface provides a natural boundary along which we can decompose their dependencies.
Thus, the expectation is that each Docker container will listen on a network port and possibly communicate with other containers over sockets. The existing GeoDocker images currently provide a “good enough” state to start development and can be extended and re-used for deployment.
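To make the socket boundary concrete, here is a minimal sketch of how a container's startup script might wait for the services it depends on before initializing. It is not taken from the GeoDocker images; the function names, the retry settings, and the host names and ports in the usage comment are assumptions chosen for illustration.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


def wait_for_services(services, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll until every (host, port) pair accepts connections, or give up."""
    pending = list(services)
    for _ in range(attempts):
        # Keep only the services that are still unreachable.
        pending = [(h, p) for (h, p) in pending if not port_is_open(h, p)]
        if not pending:
            return True
        time.sleep(delay)
    return False


# Hypothetical usage inside an Accumulo container's entrypoint
# (host names and the HDFS port are assumptions):
#   ready = wait_for_services([("zookeeper", 2181), ("hdfs-name", 8020)])
```

Because the check relies only on whether a port accepts connections, the same script should behave the same way whether the dependency runs in a neighboring container or on a separate machine.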
GeoDocker Accumulo docker-compose in action
- Environment variables passed to the container for minimally required configurations.
- Volume mounting configuration files when they are available from existing shared resources.
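The two configuration approaches listed above can be combined in a container's startup logic. The sketch below is one possible arrangement, not the GeoDocker implementation: it assumes a simple `key=value` file format, assumes that a mounted file takes precedence over environment variables, and uses hypothetical function and setting names.

```python
import os
from pathlib import Path


def parse_properties(text: str) -> dict:
    """Parse simple 'key=value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def load_settings(required, mounted_file=None, env=None) -> dict:
    """Use a mounted config file if present, else fall back to environment variables."""
    env = os.environ if env is None else env
    # Assumption: a volume-mounted file, when it exists, wins over the environment.
    if mounted_file is not None and Path(mounted_file).is_file():
        return parse_properties(Path(mounted_file).read_text())
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError(f"missing required environment variables: {missing}")
    return {name: env[name] for name in required}


# Hypothetical usage (variable names and path are illustrative only):
#   settings = load_settings(["ZOOKEEPERS", "INSTANCE_NAME"],
#                            mounted_file="/etc/app/site.properties")
```

Failing fast on a missing required variable keeps a misconfigured container from starting in a half-initialized state.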
Keep an eye out for Part 2…
Next we need to see exactly how we can use these Docker containers to deploy our application on top of AWS. In part 2, we’ll look at the tricks we used to overcome the problem of resource discovery on AWS.
As many of you know, LocationTech has been working closely with Bay Geo to organize one of the biggest cross-community events in our industry. Called CalGIS/LocationCon 2017, this event is bringing together the very best in open source and proprietary geospatial technology. The event will be held in Oakland, California on May 22 - 25, 2017 at the Oakland Convention Center.
I would like to encourage everyone on this list to submit papers and workshops. This is a fabulous way to demonstrate your open source tools to a new audience. I will have space set aside for a LocationTech code sprint. It would be amazing if we could get lots of projects doing workshops and demos, as well as traditional talks.
I would also like to encourage all of our member companies to support this new hybrid community event and format via sponsorship. As the home of commercially-friendly open source software projects, we have a great opportunity to show the traditional GIS community what we have in our working group.
Let me know if you have any questions about the event or need help during the submission process.
Please join us in welcoming Thea Aldrich, whom the Eclipse Foundation just hired to serve as a Developer Advocate for LocationTech. Thea works out of Austin, Texas, and you'll likely get a chance to meet her at some of our upcoming events.