Mathlas repository

geo package

The geo package contains modules useful for dealing with geographical information.

coord_systems

The coord_systems module contains a routine for converting coordinates between EPSG coordinate systems. There is also a table with well-known EPSG coordinate system IDs and some of their common names.
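The module's own conversion routine is not shown in this README. As a rough illustration of what converting between two EPSG systems involves, the following standalone sketch maps EPSG:4326 (WGS84 longitude/latitude in degrees) to EPSG:3857 (Web Mercator, in metres) using the standard spherical Mercator formulas. The function name `wgs84_to_web_mercator` and the constants are assumptions made for this example and are not part of the package.

```python
import math

# Radius of the sphere used by EPSG:3857 (equal to the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0
# Web Mercator is conventionally only defined up to roughly this latitude.
MAX_LATITUDE_DEG = 85.06


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 metres (hypothetical helper)."""
    if abs(lat_deg) > MAX_LATITUDE_DEG:
        raise ValueError('latitude outside the usable Web Mercator range')
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == '__main__':
    # Approximate coordinates of central Madrid (illustrative values).
    print(wgs84_to_web_mercator(-3.7038, 40.4168))
```

For conversions between arbitrary EPSG systems a general-purpose projection library is normally needed; the closed-form approach above only works because this particular pair of systems has a simple relationship.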

distance

The distance module contains a routine to compute the ellipsoidal Earth distance (in km) between two points given by their EPSG:4326/WGS84 coordinates. The approximation is considered valid for distances below 475 km.
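The README does not say which formula the module uses. The 475 km limit coincides with the validity range of the FCC "ellipsoidal Earth projected to a plane" formula, so, assuming a formula of that kind is meant, a minimal standalone sketch could look as follows. The function name `fcc_distance_km` and the argument order are hypothetical and are not the module's API.

```python
import math


def fcc_distance_km(lat1, lon1, lat2, lon2):
    """Approximate distance in km between two WGS84 points given in degrees.

    Uses the FCC ellipsoidal-Earth-projected-to-a-plane formula, which is
    only intended for distances up to about 475 km.
    """
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    delta_lat = lat2 - lat1   # degrees
    delta_lon = lon2 - lon1   # degrees

    # Kilometres per degree of latitude and of longitude at the mean latitude.
    k1 = (111.13209
          - 0.56605 * math.cos(2 * mean_lat)
          + 0.00120 * math.cos(4 * mean_lat))
    k2 = (111.41513 * math.cos(mean_lat)
          - 0.09455 * math.cos(3 * mean_lat)
          + 0.00012 * math.cos(5 * mean_lat))

    return math.hypot(k1 * delta_lat, k2 * delta_lon)


if __name__ == '__main__':
    # Approximate coordinates of Madrid and Toledo (illustrative values).
    print(fcc_distance_km(40.4168, -3.7038, 39.8628, -4.0273))
```

The formula treats the region between the two points as flat, which is why its accuracy degrades as the separation grows.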

cadastre_spain

The cadastre_spain module contains routines for fetching (automatically, if correctly configured) and querying the information contained in the Spanish land registry. It also contains a fuzzy address matcher that converts approximate addresses into coordinates based on their Levenshtein distance [1] to well-known street names (municipality and province names must exactly match those in the cadastre).
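To give an idea of how a Levenshtein-based match behaves, here is a minimal standalone sketch. It is not the module's implementation: the helper names `levenshtein` and `closest_street` and the candidate list are assumptions made for this example, and the real matcher may normalise or weight street names differently.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_street(query, candidates):
    """Return the candidate with the smallest edit distance to the query,
    ignoring case and surrounding whitespace."""
    normalised = query.strip().lower()
    return min(candidates,
               key=lambda name: levenshtein(normalised, name.strip().lower()))


if __name__ == '__main__':
    # Hypothetical candidate list; real street names come from the cadastre.
    streets = ['Paseo de las Delicias', 'Calle Delicias']
    print(closest_street('Delicias', streets))
```

With plain edit distance, a query that omits the street type is closer to the shorter candidate, since fewer characters have to be inserted to reach it.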

The Spanish cadastre does not include information from the Basque Country or Navarre. If you need those regions, you might want to have a look at CartoCiudad, which covers all of Spain and has more detailed information on street names and postal codes (amongst other recent improvements).

More information on how the cadastre data is stored on its publicly accessible site can be found here.

The following code fetches and prints the list of provinces in the Spanish cadastre, relying solely on locally available information:

from mathlas.geo.cadastre_spain import SpanishCadastreFetcher
from mathlas.geo.cadastre_spain import SpanishCadastreLocationFetcher

# Print the list of provinces included in the Spanish cadastre:
cadastre_data = SpanishCadastreFetcher(online=False)
print('Provinces:')
for province in cadastre_data.get_provinces(force_offline=True):
    print('\t* {}'.format(province.name))

The following code fetches and prints ten municipalities in the province of Madrid from the Spanish cadastre:

# Print the first 10 municipalities in Madrid:
municipalities = cadastre_data.get_municipalities('Madrid', force_offline=True)
print('First 10 municipalities in Madrid:')
for municipality in list(municipalities)[:10]:
    print('\t* {}'.format(municipality.name))

Let's now try to find the EPSG:4326 coordinates of the Mathlas office in Madrid:

# Province and municipality names must exactly match (minus case and
# initial and final spaces) those in the cadastre data, but the street
# name and number can differ a little, since we're performing a fuzzy
# match.
# Let's first look for the street name and number with an exact match
f = SpanishCadastreLocationFetcher(online=False)
coords = f.coords('Madrid', 'Madrid', 'Paseo de las Delicias', 30, verbose=True)
print(coords)

The code will find an exact match at the Mathlas office location. The cadastre data corresponds to lots, not street numbers, so the marker will be positioned inside the building when plotted on a map.

Addresses tend to be entered by hand and often contain mistakes. Our algorithm tries to compensate for that:

# This will still find the correct location
coords = f.coords('Madrid', 'Madrid', 'Paseo Deliciss', 30, verbose=True)
print(coords)

Both calls will return the same (correct) location, shown on the map below:

When the quality of the input data degrades, you will start to get false matches. For example, Madrid has both a "Calle Delicias" and a "Paseo de las Delicias". The following example fails to find the location of the Mathlas office because the address does not specify whether it refers to the "Paseo" or the "Calle", and the algorithm chooses "Calle". The code will also warn you (in the console) that it could not find number 30 in "Calle Delicias" and will return the location of number 31 instead.

coords = f.coords('Madrid', 'Madrid', 'Delicias', 30, verbose=True)
print(coords)

This results in an incorrect location. In this case it happens to be near the real one, but that is just a coincidence:

References