GeoJSON is an open standard for describing geographical features. It supports the following data types:
Point
: an array of coordinates describing a single position.LineString
: an array of two or more positions.Polygon
: an array ofLinearRing
coordinate arrays.Multipoint
: an array of positions.MultiLineString
: an array of LineStrings.MultiPolygon
: an array of Polygon coordinate arrays.
A LinearRing
is a closed LineString
with four or more positions where the first and last
positions are equivalent.
Geometries can be grouped together within a GeometryCollection
object. Geometries with additional
information are called Features
. A group of Features
is called a FeatureCollection
.
The basic structure of a GeoJSON object is thus
{
"type": "FeatureCollection",
"features": [{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": "[0.0, 0.0]"
},
"properties": {
"prop0": "value0",
"prop1": "value1"
}
}]
}
A GeoJSON object may also have the following attributes:
csr
: Coordinate Reference System. The default CRS uses the WGS84 datum, with longitude and latitude units of decimal degrees. The CRS should only be described in the top-most level of a GeoJSON object, and may not be overridden by children objects, i.e. all coordinates within a single GeoJSON object must be consistent. Note that CRS may be removed from the GeoJSON specification.bbox
: Bounding Box. Describes information on the coordinate range of a GeoJSON object. It has 2^N values, where N is the dimension of the coordinate system.
TopoJSON
TopoJSON is a topological geospatial data interchange format based on GeoJSON. A TopoJSON topology represents one or more geometries that share sequences of positions called arcs. It supports the same geometry types as GeoJSON and may contain additional information describing non-geographical data. For example,
{
"type": "Topology",
"objects": {
"example": {
"type": "GeometryCollection",
"geometries": [{
"type": "Point",
"coordinates": "[0.0, 0.0]",
"properties": {
"prop0": "value0",
"prop1": "value1"
}
}]
}
},
"arcs": [
[[102, 0], [103, 1], [104, 0], [105, 1]],
[[100, 0], [101, 0], [101, 1], [100, 1], [100, 0]]
]
}
TopoJSON consists of a single topology object, with multiple named geometry objects. It must
have an arcs
attribute whose value is an array of arcs, and may have transform
and/or
bbox
attributes. The transform
attribute allows for more efficient serialisation by
representing positions as integers rather than floats.
To transform from a quantized position to an absolute position:
- Multiply each quantized position element by the corresponding scale element.
- Add the corresponding translate element.
GeoJSON and Python
This walkthrough of two popular python libraries for GIS gives a good overview of what data formats are supported; from Shapefiles to GeoJSON. Shapely does manipulating and analysing data, whereas Fiona performs reading and writing of data formats. Since Shapely is based on GEOS, it is efficient and supports many operations.
Other python libraries:
There is even a GeoJSON MIME type available!
GeoJSON and Protocol Buffers
MapBox are actively developing Geobuf
which defines a .proto
file describing TopoJSON and GeoJSON objects. Their read-me provides some
benchmarks. Interesting discussions are available in their issue tracker.
Note that the Geobuf encoding schema is not yet stable.
An alternative source of inspiration is the PBF Format from OpenStreetMap.
Granted, if one only wanted to support basic GeoJSON data, the interface may be simplified. For example,
package geojson;
message FeatureCollection {
repeated Feature features = 2;
}
message Feature {
required Geometry geometry = 2;
optional string properties = 3;
optional string id = 4;
}
message GeometryCollection {
repeated Geometry geometries = 2;
}
message Geometry {
required GeometryType type = 2;
repeated double coordinates = 4;
}
enum GeometryType {
POINT = 0;
MULTIPOINT = 1;
LINESTRING = 2;
MULTILINESTRING = 3;
POLYGON = 4;
MULTIPOLYGON = 5;
GEOMETRYCOLLECTION = 6;
}
Further optimisations can (and probably should) be made, such as:
- not JSON serializing the
properties
object and opting to encode using theValue
message as in Mapbox's implementation. Serializing to a string keeps the interface inuitive, but does not provide the best encoding possible. - the
id
in theFeature
message could also be stored as asint64
if the ID is itself an integer (which is common). Then, better compression can be acheived. - the use of a
Header
message to storedimension
,precision
andtransform
data. This means coordinates can be stored as integers for better compression.