What we're building
We're going to track vessels as they approach and berth at major ocean terminals around the US. We will show a list of US ports, the ocean terminals that make up each port, and the vessels currently at each terminal. For those not familiar, port complexes are commonly made up of many ocean terminals. On the east coast, a single entity typically owns and operates the terminals that make up a port (think South Carolina Ports Authority, Georgia Ports). On the west coast (think Port of Los Angeles), you will often see many different MTOs (Marine Terminal Operators), each operating its own terminal, which together make up the larger port complex.
Why are we building this?
The main motivation behind this tool is to show off some of the functionality that Deliberate API provides. In this case, given the latitude and longitude of some trackable entity (a cargo vessel), we are able to find geographical sets of interest (ocean terminals) in close proximity to that entity. Here is the query used against Deliberate API to find the sets of interest.
curl -X POST https://dev-jeb.com/deliberate/api/external/v1/query \
-H "Authorization: Bearer $DELIBERATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"where": {
"tags": {"contains": "ocean_terminal"},
"spatial": {
"within_radius": {
"lat": <SHIP_LAT>,
"lng": <SHIP_LNG>,
"radius_meters": <RADIUS_METERS>
}
}
},
"limit": 10
}'
It is not hard to imagine following this same pattern to track other entities of interest. Any party that has access to the real-time location of a trackable entity could use this same pattern to build a similar tool.
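To make the "same pattern" idea concrete, here is a minimal Python sketch that builds the JSON body from the curl example with the tag as a parameter. The function name `build_proximity_query`, its defaults, and the input checks are illustrative assumptions and not part of Deliberate API; only the body layout comes from the query above.

```python
import json


def build_proximity_query(lat, lng, radius_meters, tag="ocean_terminal", limit=10):
    """Build the JSON body for the within_radius query shown above.

    The function name and the validation rules are assumptions made for
    this sketch; only the body layout is taken from the curl example.
    """
    if not -90 <= lat <= 90:
        raise ValueError("lat must be between -90 and 90")
    if not -180 <= lng <= 180:
        raise ValueError("lng must be between -180 and 180")
    if radius_meters <= 0:
        raise ValueError("radius_meters must be positive")
    return {
        "where": {
            "tags": {"contains": tag},
            "spatial": {
                "within_radius": {
                    "lat": lat,
                    "lng": lng,
                    "radius_meters": radius_meters,
                }
            },
        },
        "limit": limit,
    }


if __name__ == "__main__":
    # Coordinates chosen purely for illustration.
    print(json.dumps(build_proximity_query(32.78, -79.93, 500), indent=2))
```

Swapping the `tag` argument is all it would take to point the same query at a different kind of set, provided sets with that tag exist.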
Problem 1: Identifying US Ocean Ports and Terminals
This was, and still is, the most time-intensive part of the project. There is no public dataset that accurately represents the geographic boundaries of all US terminals and the ports they are part of. This is actually one of the first problems I wanted to solve with Deliberate API, well before building this tool. I wanted to build the most complete and accurate Port/Terminal dataset. Why? Because I work in the international freight software space and have immense respect for, and interest in, the industry... I love talking freight. So, I am constantly working to extend Deliberate API's high-fidelity US Port/Terminal dataset. Here is an example of a port complex represented within Deliberate API. We will use the ATLAS tool available through the Deliberate API dashboard to visualize this set.
You can see that the SC Port Complex is defined with 12,897 H3 indexes. This is a great example of multiple terminals being part of a single port complex. Not pictured here, but still captured by Deliberate API, are the two inland ports that are part of the SC port complex: Inland Port Dillon and Inland Port Greer. These two rail terminals are serviced by CSX and Norfolk Southern respectively and provide a critical link between inland and ocean logistical hubs.
Now that Deliberate API supports a good number of ports, we can use them to track vessels arriving and departing. One of the first things we do in our vessel tracker is build a map that identifies the ports Deliberate API supports and the terminals they contain. First, we use this query to find all the ports Deliberate API supports.
curl -X POST https://dev-jeb.com/deliberate/api/external/v1/query \
-H "Authorization: Bearer $DELIBERATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"where": {
"tags": {"contains": "sea_port"}
},
"limit": 100
}'
Next, we can loop through each port returned above and find all the sets tagged as ocean_terminal that are a subset of the port. You will see we use the relationship functionality of Deliberate API to do this.
curl -X POST https://dev-jeb.com/deliberate/api/external/v1/query \
-H "Authorization: Bearer $DELIBERATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"where": {
"tags": {"contains": "ocean_terminal"},
"relationships": {
"type": "SUBSET",
"direction": "outbound",
"to": {
"properties": {
"name": {"eq": "Port of South Carolina"}
}
}
}
}
}'
This map will allow us to drive the basic UI functionality of the vessel tracker. We can show the ports we support and know which terminals are part of each port (this is useful in and of itself). The next step is to subscribe to publicly available AIS data streams and start capturing vessel positions. This part is not very interesting and is pretty straightforward, so we will move ahead to what we do once we have vessel positions.
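The port-to-terminal map described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the tracker's actual code: `run_query` is a hypothetical callable that sends a query body and returns a list of sets, and the `properties.name` field on each returned set is assumed from the filter used in the relationship query. The response format is not shown in this post, so adjust accordingly.

```python
def ports_query(limit=100):
    """Body of the 'all sea ports' query shown above."""
    return {"where": {"tags": {"contains": "sea_port"}}, "limit": limit}


def terminals_query(port_name):
    """Body of the 'terminals that are a SUBSET of this port' query."""
    return {
        "where": {
            "tags": {"contains": "ocean_terminal"},
            "relationships": {
                "type": "SUBSET",
                "direction": "outbound",
                "to": {"properties": {"name": {"eq": port_name}}},
            },
        }
    }


def build_port_map(run_query):
    """Return {port name: [terminal names]}.

    run_query is a hypothetical callable: it takes a query body (dict) and
    returns a list of sets. ASSUMPTION: each set looks like
    {"properties": {"name": ...}}.
    """
    port_map = {}
    for port in run_query(ports_query()):
        name = port["properties"]["name"]
        terminals = run_query(terminals_query(name))
        port_map[name] = [t["properties"]["name"] for t in terminals]
    return port_map


if __name__ == "__main__":
    # Stand-in for the HTTP call, returning placeholder data.
    def fake_run_query(body):
        if body["where"]["tags"]["contains"] == "sea_port":
            return [{"properties": {"name": "Port of South Carolina"}}]
        return [{"properties": {"name": "Terminal A"}},
                {"properties": {"name": "Terminal B"}}]

    print(build_port_map(fake_run_query))
```

Passing the transport in as a callable keeps the looping logic separate from the HTTP details, which also makes it easy to exercise without network access.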
Vessel Proximity Detection
Deliberate API supports a spatial query that, given a latitude, longitude, and radius in meters, will return all the sets within the radius. We can layer a filter on top of this to further refine the results. For our use case, we will filter the results to only return sets that are tagged as ocean_terminal. The query will look like this:
curl -X POST https://dev-jeb.com/deliberate/api/external/v1/query \
-H "Authorization: Bearer $DELIBERATE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"where": {
"tags": {"contains": "ocean_terminal"},
"spatial": {
"within_radius": {
"lat": <SHIP_LAT>,
"lng": <SHIP_LNG>,
"radius_meters": <RADIUS_METERS>
}
}
},
"limit": 10
}'
So how is this proximity logic implemented? First, you need to know that all sets in Deliberate API are represented as collections of H3 indexes. So when we want to find all the sets within a radius of some point, we are really asking: "What H3 indexes are within RADIUS_METERS of this point, and what sets contain any of those H3 indexes?". Let's walk through some examples below to show my reasoning.
You will notice there are three sets drawn out in this screenshot (Orange, Purple, and Blue). You will also notice a red point. This is the point of interest (POI), which in our case is the vessel's current position. The green cell is an important cell that we will call the containing cell (CC). The CC is found by indexing the POI at a given resolution R. This resolution R is important, as it is how we bring RADIUS_METERS into the equation. To find R we ask the question "At what resolution is the side length of the hexagon equal to RADIUS_METERS (or closest to it)?". For a regular hexagon, the side length (represented in the screenshot by L) is equal to the distance from the center of the hexagon to any vertex. So we have generated a containing cell with a radius of roughly RADIUS_METERS.
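The "find R" step can be sketched as a lookup against a table of edge lengths. The values below are approximate average hexagon edge lengths in meters, rounded from the H3 documentation as best I can reproduce them, so treat them as illustrative and check them against the official table. Taking "closest" to mean smallest absolute difference is also an assumption of this sketch, as is the function name.

```python
# Approximate average H3 hexagon edge length in meters for resolutions 0-15.
# Rounded, illustrative values: verify against the H3 documentation.
APPROX_EDGE_LENGTH_M = [
    1281256, 483057, 182513, 68979, 26072, 9854, 3725, 1406,
    531, 201, 76, 29, 11, 4.1, 1.5, 0.58,
]


def resolution_for_radius(radius_meters):
    """Return the resolution whose edge length is closest to radius_meters.

    'Closest' here means smallest absolute difference, which is an
    assumption; other definitions (e.g. ratio-based) are possible.
    """
    return min(
        range(len(APPROX_EDGE_LENGTH_M)),
        key=lambda res: abs(APPROX_EDGE_LENGTH_M[res] - radius_meters),
    )


if __name__ == "__main__":
    for radius in (100, 500, 10000):
        print(radius, "->", resolution_for_radius(radius))
```

With R in hand, the CC would come from indexing the POI at that resolution using an H3 binding (in the h3-py v4 bindings the call is `h3.latlng_to_cell(lat, lng, res)`). That step is left out here so the sketch runs with the standard library alone.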
Example 1: Orange Set
The orange set is made up of one H3 index at a lower (less granular) resolution than the CC generated. Notice how the CC is wholly contained by the orange set. Therefore, the expected outcome is that the orange set is returned from the query.
- We are given SHIP_LAT, SHIP_LNG and RADIUS_METERS. First we find R by asking the question "At what resolution is the side length of the hexagon equal to RADIUS_METERS (or closest to it)?"
- We then index the POI at resolution R to get the CC.
- We generate a list of all the parents and children of the CC at every resolution in range(MIN_H3_RESOLUTION, MAX_H3_RESOLUTION). Notice how this list will contain the cell that makes up the orange set, as it is a parent of the CC.
- Deliberate API then queries the database for all the sets that contain any of the cells in the list.
- Deliberate API will filter the list down to only the sets that meet the filters passed in the query.
- The final list of sets is returned to the caller.
Example 2: Purple Set
The purple set is made up of five H3 indexes (the green color covers up the purple, i.e. the green cell would be purple if it were not our containing cell). Notice how the CC is contained by the purple set. Therefore, the expected outcome is that the purple set is returned from the query.
- We are given SHIP_LAT, SHIP_LNG and RADIUS_METERS. First we find R by asking the question "At what resolution is the side length of the hexagon equal to RADIUS_METERS (or closest to it)?"
- We then index the POI at resolution R to get the CC.
- We generate a list of all the parents and children of the CC at every resolution in range(MIN_H3_RESOLUTION, MAX_H3_RESOLUTION). Notice how this list will contain the CC itself, which in this case happens to be in the purple set.
- Deliberate API then queries the database for all the sets that contain any of the cells in the list.
- Deliberate API will filter the list down to only the sets that meet the filters passed in the query.
- The final list of sets is returned to the caller.
Example 3: Blue Set
The blue set is made up of three H3 indexes that are indexed at a higher (more granular) resolution than the CC generated. Notice how the CC is not contained by the blue set. However, part of the blue set does fall within the containing cell and is therefore within RADIUS_METERS of the POI (you might at this point think "well no, that is not always true" and you would be right. We will talk about some tradeoffs around this later). Therefore, the expected outcome is that the blue set is returned from the query.
- We are given SHIP_LAT, SHIP_LNG and RADIUS_METERS. First we find R by asking the question "At what resolution is the side length of the hexagon equal to RADIUS_METERS (or closest to it)?"
- We then index the POI at resolution R to get the CC.
- We generate a list of all the parents and children of the CC at every resolution in range(MIN_H3_RESOLUTION, MAX_H3_RESOLUTION). In this case, when we generate all the children of the CC, the one blue cell that fell into the CC will be included in the list.
- Deliberate API then queries the database for all the sets that contain any of the cells in the list.
- Deliberate API will filter the list down to only the sets that meet the filters passed in the query.
- The final list of sets is returned to the caller.
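The three walkthroughs share the same core: build the list of parents and children of the CC, then keep any set that shares a cell with that list. Here is a toy Python sketch of that idea. It does not use real H3 indexes. Instead, a cell is modeled as a tuple of digits where dropping the last digit gives the parent and appending a digit 0-6 gives a child, which mimics the seven-children hierarchy. The cell values, the "gray" control set, the shortened resolution range, and the function names are all made up for illustration, and the tag filter step is omitted.

```python
from itertools import product

# Toy resolution range so the candidate list stays tiny; real H3 uses 0-15.
MIN_RES, MAX_RES = 0, 4


def candidate_cells(cc):
    """Return the CC, all of its ancestors, and all descendants to MAX_RES.

    A toy cell at resolution r is a tuple of length r + 1.
    """
    # Prefixes of the tuple are the ancestors; the full tuple is the CC.
    cells = {cc[:n] for n in range(MIN_RES + 1, len(cc) + 1)}
    levels_below = MAX_RES - (len(cc) - 1)
    for depth in range(1, levels_below + 1):
        for digits in product(range(7), repeat=depth):
            cells.add(cc + digits)
    return cells


def sets_near(cc, sets_by_name):
    """Names of sets that contain at least one candidate cell."""
    candidates = candidate_cells(cc)
    return sorted(name for name, cells in sets_by_name.items()
                  if candidates & cells)


if __name__ == "__main__":
    cc = (3, 1, 4)  # containing cell at toy resolution 2
    sets_by_name = {
        "orange": {(3, 1)},                                # one coarser cell
        "purple": {(3, 1, 4), (3, 1, 5), (3, 1, 6),
                   (3, 1, 0), (3, 1, 1)},                  # includes the CC
        "blue": {(3, 1, 4, 2), (3, 1, 3, 0), (3, 1, 3, 1)},  # one finer cell in CC
        "gray": {(3, 2, 0)},                               # control: unrelated cell
    }
    print(sets_near(cc, sets_by_name))
```

With a real H3 binding the candidate list would come from the library's parent and children functions (for example `cell_to_parent` and `cell_to_children` in h3-py v4) rather than tuple slicing.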
Discussion
We have walked through three situations that could arise and seen that our logic does produce the desired outcome. However, there are some tradeoffs that we should discuss.
- When you index the POI at a resolution R, you get back the cell that contains the POI. This does not mean the POI fell at the exact center of the cell. The worst case is that the POI fell on the edge of the cell, for example the black dot you see in the screenshot above. If we went through the same process as in the examples above, we would still return the blue set. We would be wrong to do so, as technically the blue set is not within RADIUS_METERS of the POI. In fact, the POI is roughly RADIUS_METERS + 1/3 * L away from the blue set. This is an inaccuracy I have accepted. I have not yet found a better way to handle proximity checking that keeps query times reasonable.
- Notice that when talking about finding R above, we always included the "or closest to it" language. There are fixed resolutions in the H3 world (0-15), and each of these resolutions has a set side length. So really RADIUS_METERS should only be allowed to take on values that are valid side lengths at the different H3 resolutions. This is an upgrade we will be making in the future. This page contains a list of valid side lengths.
- This solution does not scale well when we get to larger radii. This is due to us needing to generate a list of all the parents and children of the CC at every resolution in range(MIN_H3_RESOLUTION, MAX_H3_RESOLUTION). At a large radius this list can get really big really fast. For example, I ran the query below against Deliberate API and killed my Fargate instance with an Out Of Memory error. This parent/child list would have contained roughly 330 million cells. Safe to say I will be capping the radius to something reasonable before I publish this post lol.
{
  "where": {
    "tags": { "contains": "ocean_terminal" },
    "spatial": {
      "within_radius": {
        "lat": 30.2345,
        "lng": -72.1234,
        "radius_meters": 10000
      }
    }
  },
  "limit": 10
}
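The growth described in the last bullet can be estimated with a few lines of Python. This sketch assumes every cell is a hexagon with exactly seven children per resolution step (pentagon cells have fewer) and that a 10,000 meter radius maps to resolution 5, whose average edge length is on the order of 9 to 10 km. The function name is made up for illustration.

```python
MIN_H3_RESOLUTION, MAX_H3_RESOLUTION = 0, 15


def candidate_cell_count(resolution):
    """Count the CC, its ancestors, and all descendants down to resolution 15.

    Assumes hexagon cells with exactly 7 children each.
    """
    ancestors = resolution - MIN_H3_RESOLUTION
    levels_below = MAX_H3_RESOLUTION - resolution
    descendants = sum(7 ** k for k in range(1, levels_below + 1))
    return ancestors + 1 + descendants


if __name__ == "__main__":
    print(candidate_cell_count(5))  # CC at resolution 5 -> 329554462
    print(candidate_cell_count(9))  # CC at resolution 9 -> 137266
```

The resolution 5 result lines up with the roughly 330 million cells mentioned above. Each step toward a coarser CC multiplies the list by about seven, which is why capping the radius is such an effective guard.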
So clearly this solution has some very real limitations. However, for reasonable radii it is very fast and fits well with the Deliberate API architecture. For the moment I am willing to take the hit on accuracy and large search radii to get the benefit of a fast and easy to use solution.