
cristan/improved-un-locodes

 
 


An enhanced UN/LOCODE dataset with significant improvements:

Significantly better coordinates

The main reason this project exists is that the coordinates in the original UN/LOCODE list have major problems:

1. Only 80% of locations have coordinates

UN/LOCODEs without coordinates don't just include tiny villages, but also some of the world's most important cities, like Oakland (USOAK), Barcelona (ESBCN) and London (GBLON).

2. Many coordinates are just wrong

Quite a few coordinates have typos (ATWIS), but many others are just flat-out wrong (EGSCN).

This project aims to solve most of these cases by combining the data with data from OpenStreetMap's Nominatim API and Wikidata. Cases where these disagree have been manually curated.

3. Multiple coordinate formats

Most UN/LOCODE coordinates look like those of USNYC: 4042N 07400W. However, entries in Bhutan like BTPDL have decimal coordinates: 26.8128N 89.1903E. This project solves this with two columns: the Coordinates column now has only the UN/LOCODE style degrees, while the CoordinatesDecimal column has a decimal representation.
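To illustrate the difference between the two styles, here is a minimal sketch of converting a UN/LOCODE style coordinate (degrees and minutes plus a hemisphere letter) to decimal degrees. The function name `locode_to_decimal` is hypothetical and not part of this repo, and the sketch makes no claim about how the CoordinatesDecimal column is actually generated or formatted.

```python
import re

# UN/LOCODE style: DDMM + N/S, a space, then DDDMM + E/W (e.g. "4042N 07400W").
_LOCODE_COORD = re.compile(r"^(\d{2})(\d{2})([NS]) (\d{3})(\d{2})([EW])$")


def locode_to_decimal(coord):
    """Convert e.g. '4042N 07400W' to (latitude, longitude) in decimal degrees.

    Hypothetical helper for illustration only. Raises ValueError for anything
    that is not in the degrees-and-minutes style, such as '26.8128N 89.1903E'.
    """
    match = _LOCODE_COORD.match(coord.strip())
    if match is None:
        raise ValueError(f"not a UN/LOCODE style coordinate: {coord!r}")
    lat_deg, lat_min, lat_hem, lon_deg, lon_min, lon_hem = match.groups()
    # Minutes are sixtieths of a degree.
    lat = int(lat_deg) + int(lat_min) / 60
    lon = int(lon_deg) + int(lon_min) / 60
    # Southern and western hemispheres are negative in decimal notation.
    if lat_hem == "S":
        lat = -lat
    if lon_hem == "W":
        lon = -lon
    return lat, lon


if __name__ == "__main__":
    print(locode_to_decimal("4042N 07400W"))
```

Rejecting the decimal style outright, rather than guessing, makes mixed-format input easy to spot.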

CSV with improved locations

All this is solved with code-list-improved.csv. It has corrected coordinates, all in the same format, and far more of them: 98.6% of locations have coordinates, including all ports in the World Port Index.

Other features

Besides the improved UN/LOCODEs, there are other nifty things in this repo:

A defined hierarchy

An example: DEBHQ (Bahrenfeld) is in Hamburg (DEHAM), but how would you know these are essentially the same place?

For this, parents.csv was created, which looks like this:

Unlocode,Parent
DEBHQ,DEHAM

With this, you can easily find out these are related.
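As a rough sketch of how such a lookup could work, the snippet below loads data in the shape of parents.csv and checks whether two codes resolve to the same top-level place. The helper names (`load_parents`, `root_of`, `same_place`) are hypothetical, the inline sample contains only the row shown above, and following more than one level of parents is an assumption rather than something this README states.

```python
import csv
import io

# Inline sample in the same shape as parents.csv; only this one row is known.
PARENTS_CSV = """Unlocode,Parent
DEBHQ,DEHAM
"""


def load_parents(text):
    """Build a child -> parent mapping from CSV text with Unlocode,Parent columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Unlocode"]: row["Parent"] for row in reader}


def root_of(unlocode, parents):
    """Follow Parent links until reaching a code that has no parent.

    Assumption: chains may be several levels deep. The `seen` set guards
    against accidental cycles in the data.
    """
    seen = set()
    while unlocode in parents and unlocode not in seen:
        seen.add(unlocode)
        unlocode = parents[unlocode]
    return unlocode


def same_place(a, b, parents):
    """Two codes are related if they resolve to the same top-level code."""
    return root_of(a, parents) == root_of(b, parents)


if __name__ == "__main__":
    parents = load_parents(PARENTS_CSV)
    print(same_place("DEBHQ", "DEHAM", parents))
```

To use the real file, the text could be read with `open("parents.csv", encoding="utf-8").read()` instead of the inline sample.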

Actually working aliases

It's impossible to find out that "Vienna" and "Wien" are in fact the same city with UN/LOCODE ATVIE. That is, if you use the official dataset.

Not so much with aliases-improved.csv, which looks like this:

Unlocode,Alias
ATVIE,Wien
ATVIE,Vienna

This is much more usable than the aliases in the original, not only because of the improved user-friendliness, but mostly because of its sheer size: the official dataset has fewer than 100 aliases, while this one has over 670,000.
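A minimal sketch of a name lookup built on data in the shape of aliases-improved.csv might look like this. The function names are hypothetical, the inline sample holds only the two rows shown above, and both the case-insensitive matching and the possibility that one alias maps to several codes are assumptions of the sketch, not documented behavior of the file.

```python
import csv
import io
from collections import defaultdict

# Inline sample in the same shape as aliases-improved.csv.
ALIASES_CSV = """Unlocode,Alias
ATVIE,Wien
ATVIE,Vienna
"""


def build_alias_index(text):
    """Map each alias (case-folded) to the set of UN/LOCODEs that carry it."""
    index = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        index[row["Alias"].casefold()].add(row["Unlocode"])
    return index


def lookup(name, index):
    """Return all UN/LOCODEs known under this name, sorted; empty list if none."""
    # .get() avoids creating empty entries in the defaultdict for unknown names.
    return sorted(index.get(name.casefold(), ()))


if __name__ == "__main__":
    index = build_alias_index(ALIASES_CSV)
    print(lookup("Wien", index), lookup("vienna", index))
```

Returning a list rather than a single code keeps ambiguous names visible to the caller instead of silently picking one.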

About UN/LOCODES

The United Nations Code for Trade and Transport Locations (UN/LOCODE) is a code list maintained by UNECE (a United Nations agency) to facilitate trade. The list comes from the UNECE page and is released twice a year. However, this dataset is based on datasets/un-locode, which is already much better than the original (e.g. no more encoding problems).

Shameless plug

CargoProbe offers the functionality of this dataset as an API. The API also has some additional nifty features: for example, when you only have the city name "Rotterdam", it's clever enough to return the data of NLRTM instead of USRAJ.

Their main product is essentially track and trace for shipping containers. Check them out if that interests you.

License

UN/LOCODE data

All UN/LOCODE data is licensed under the ODC Public Domain Dedication and Licence (PDDL).

Nominatim data

ODbL 1.0. http://osm.org/copyright

Wikidata

CC-0 (No rights reserved)

All other contents in this repo

ODC Public Domain Dedication and Licence (PDDL)

