GeoJSON is an open standard for describing geographical features. It supports the following data types:
Point
: an array of coordinates describing a single position.LineString
: an array of two or more positions.Polygon
: an array of LinearRing
coordinate arrays.Multipoint
: an array of positions.MultiLineString
: an array of LineStrings.MultiPolygon
: an array of Polygon coordinate arrays.A LinearRing
is a closed LineString
with four or more positions where the first and last positions are equivalent.
Geometries can be grouped together within a GeometryCollection
object. Geometries with additional information are called Features
. A group of Features
is called a FeatureCollection
.
The basic structure of a GeoJSON object is thus
{
"type": "FeatureCollection",
"features": [{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": "[0.0, 0.0]"
},
"properties": {
"prop0": "value0",
"prop1": "value1"
}
}]
}
A GeoJSON object may also have the following attributes:
csr
: Coordinate Reference System. The default CRS uses the WGS84 datum, with longitude and latitude units of decimal degrees. The CRS should only be described in the top-most level of a GeoJSON object, and may not be overridden by children objects, i.e. all coordinates within a single GeoJSON object must be consistent. Note that CRS may be removed from the GeoJSON specification.bbox
: Bounding Box. Describes information on the coordinate range of a GeoJSON object. It has 2^N values, where N is the dimension of the coordinate system.TopoJSON is a topological geospatial data interchange format based on GeoJSON. A TopoJSON topology represents one or more geometries that share sequences of positions called arcs. It supports the same geometry types as GeoJSON and may contain additional information describing non-geographical data. For example,
{
"type": "Topology",
"objects": {
"example": {
"type": "GeometryCollection",
"geometries": [{
"type": "Point",
"coordinates": "[0.0, 0.0]",
"properties": {
"prop0": "value0",
"prop1": "value1"
}
}]
}
},
"arcs": [
[[102, 0], [103, 1], [104, 0], [105, 1]],
[[100, 0], [101, 0], [101, 1], [100, 1], [100, 0]]
]
}
TopoJSON consists of a single topology object, with multiple named geometry objects. It must have an arcs
attribute whose value is an array of arcs, and may have transform
and/or bbox
attributes. The transform
attribute allows for more efficient serialisation by representing positions as integers rather than floats.
To transform from a quantized position to an absolute position:
This walkthrough of two popular python libraries for GIS gives a good overview of what data formats are supported; from Shapefiles to GeoJSON. Shapely does manipulating and analysing data, whereas Fiona performs reading and writing of data formats. Since Shapely is based on GEOS, it is efficient and supports many operations.
Other python libraries:
There is even a GeoJSON MIME type available!
MapBox are actively developing Geobuf which defines a .proto
file describing TopoJSON and GeoJSON objects. Their read-me provides some benchmarks. Interesting discussions are available in their issue tracker. Note that the Geobuf encoding schema is not yet stable.
An alternative source of inspiration is the PBF Format from OpenStreetMap.
Granted, if one only wanted to support basic GeoJSON data, the interface may be simplified. For example,
package geojson;
message FeatureCollection {
repeated Feature features = 2;
}
message Feature {
required Geometry geometry = 2;
optional string properties = 3;
optional string id = 4;
}
message GeometryCollection {
repeated Geometry geometries = 2;
}
message Geometry {
required GeometryType type = 2;
repeated double coordinates = 4;
}
enum GeometryType {
POINT = 0;
MULTIPOINT = 1;
LINESTRING = 2;
MULTILINESTRING = 3;
POLYGON = 4;
MULTIPOLYGON = 5;
GEOMETRYCOLLECTION = 6;
}
Further optimisations can (and probably should) be made, such as:
properties
object and opting to encode using the Value
message as in Mapbox’s implementation. Serializing to a string keeps the interface inuitive, but does not provide the best encoding possible.id
in the Feature
message could also be stored as a sint64
if the ID is itself an integer (which is common). Then, better compression can be acheived.Header
message to store dimension
, precision
and transform
data. This means coordinates can be stored as integers for better compression.