A traffic data service for OpenStreetMap

Started: 2025-12-11
Published: 2026-01-11
Last updated: 2026-01-15

I've encouraged hundreds of people to drop Google Maps and switch to freedom-respecting OpenStreetMap applications…and most of them find the lack of one major feature to be a dealbreaker.

This feature is traffic data - or more accurately, speed data. Applications need to know the average speed of traffic along a given road, allowing users to take less-congested routes and to get more realistic timeframes for their journeys.

In December 2025, I started making notes about what a community-owned speed data service might look like. Since I have never worked on such a project before (and couldn't find any prior art), I'm sharing my notes here to get feedback from the Internet. Please send me your comments.

Thanks to ceda_ei, niyabits, disaster2life, lunarequest, Thejesh GN, SphericalKat, Marc_marc[m], Mateusz_Konieczny, and Simon Poole for their feedback!

Can it work?

The most common objection I've heard is that there are too few OSM users for a crowdsourced speed data service to be effective.

My response is always the same - once this is implemented, we can at least tell people that "Yes, OSM applications have traffic data support, and the more people use it, the more accurate it will be."

That incentivizes people to use it, which contributes towards fixing the problem. The number of people who outright reject the idea of using OSM will decrease.

Also, it doesn't have to be only OSM clients that contribute to the service. For example, fitness apps like OpenTracks could do the same, contributing speed data for walking, running, hiking, cycling, and so on along sidewalks, trails, and other paths.

Incident reporting

Users should be able to report disruptive incidents like accidents, construction work, other road blocks, and even general traffic congestion.

Users should also be able to include photos and/or videos in their report.

Similar to OSM notes, reports could also be marked as resolved or reopened by other users.
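As a rough illustration of the report lifecycle described above, here is a minimal sketch of an incident report that other users can resolve or reopen. The class name, field names, and the two-state model are all assumptions made for this example, not a settled design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"


@dataclass
class IncidentReport:
    # All field names here are hypothetical.
    kind: str       # e.g. "accident", "construction", "congestion"
    lat: float
    lon: float
    reporter: str
    status: Status = Status.OPEN
    history: list = field(default_factory=list)  # (user, new_status) pairs

    def resolve(self, user: str) -> None:
        self._transition(user, Status.RESOLVED)

    def reopen(self, user: str) -> None:
        self._transition(user, Status.OPEN)

    def _transition(self, user: str, new_status: Status) -> None:
        # Refuse no-op transitions so the history only records real changes.
        if self.status is new_status:
            raise ValueError(f"report is already {new_status.value}")
        self.status = new_status
        self.history.append((user, new_status))
```

Keeping a history of who changed the status leaves room for reviewing disputed reports later.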

Incident reporting is inherently opt-in, and potentially more privacy-respecting than passive speed data collection. It also inherently requires a network connection and is unsuitable for offline use.

Prior art - closures.osm.ch.

Preventing false input

To prevent false reports, we could require registration. We could piggyback on OSM accounts and use OAuth for authorization. Long-standing OSM contributors with lots of edits may be considered more trustworthy than brand new ones.

We could also allow users to vote on incident reports and resolutions.

Passive speed data collection

This is basically a system for crowdsourcing GPS traces, except it does not require the user to manually start/stop recording.

At its simplest, when a user begins turn-by-turn routing in a client (like CoMaps, OsmAnd, OsmAPP, etc), and the user's device has GPS enabled, the client begins transmitting to the server -

an authentication header,
the user's current GPS coordinates,
the timestamp,
the mode of travel,
and other information, like GPS accuracy/error.

This happens at a set interval (e.g. every 1 second). This interval could be user-customizable, or automatically increased or decreased based on connectivity (more frequent on unmetered networks) and power information (more frequent when charging).

The client stops transmitting when the user exits turn-by-turn routing.
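To make the payload and the adaptive interval more concrete, here is a minimal sketch. The field names, the JSON encoding, and the multipliers in `upload_interval` are assumptions for illustration only; the authentication header would travel separately (e.g. as an HTTP header) and is not modelled here.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SpeedSample:
    # Hypothetical field names for the data listed above.
    lat: float         # degrees
    lon: float         # degrees
    timestamp: float   # Unix time in seconds
    mode: str          # e.g. "car", "bicycle", "foot"
    accuracy_m: float  # GPS accuracy reported by the device, in metres


def upload_interval(base_s: float, metered: bool, charging: bool) -> float:
    """Return the seconds to wait between uploads.

    The multipliers are placeholders: uploads become less frequent on
    metered networks and when the device is running on battery.
    """
    interval = base_s
    if metered:
        interval *= 5
    if not charging:
        interval *= 2
    return interval


def encode(sample: SpeedSample) -> str:
    """Serialize one sample as JSON for the request body."""
    return json.dumps(asdict(sample))
```

With a base interval of 1 second, a charging device on an unmetered network would upload every second, while a device on battery and a metered network would upload every 10 seconds.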

Preventing false input

The server needs to verify that the mode of travel chosen by the user is plausible, e.g. by checking the movement speed and by comparing the location against map data (a bicycle in the middle of a motorway is suspicious).
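The speed half of that check could look something like the sketch below, which estimates speed from two consecutive samples using the haversine formula and compares it to an upper bound per mode. The bounds in `MAX_SPEED_KMH` are made-up placeholders that would need tuning, and the comparison against map data is not covered here.

```python
import math

# Hypothetical upper bounds in km/h; real values would need tuning.
MAX_SPEED_KMH = {"foot": 15.0, "bicycle": 50.0, "car": 200.0}

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(a, b):
    """Average speed between samples a and b, each (lat, lon, timestamp_s)."""
    dt = b[2] - a[2]
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    return haversine_m(a[0], a[1], b[0], b[1]) / dt * 3.6


def mode_is_plausible(mode, a, b):
    """True if the speed between a and b does not exceed the bound for mode."""
    return speed_kmh(a, b) <= MAX_SPEED_KMH[mode]
```

For example, two samples 0.001 degrees of latitude apart (about 111 m) and 10 seconds apart work out to roughly 40 km/h, which this sketch would accept for a bicycle but reject for someone on foot. A real check would probably look at more than two samples, since GPS error can produce short spikes.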

An important attack against such a service is a simulated traffic jam, as popularly demonstrated by the artist Simon Weckert (see "Google Maps Hacks" on simonweckert.com, and "An Artist Used 99 Phones to Fake a Google Maps Traffic Jam" on Wired.com).

Such an attack could be used to manipulate traffic at a massive scale.

To prevent it, we could…

Require contributors to have an OSM account, possibly of a certain age and with a certain amount of activity.
~~Limit the number of devices per contributor~~ - this was my initial idea, but it became unnecessary once the next one occurred to me.
Only allow one client per user to upload data at a time. Thus, mounting such an attack would require a separate account for each phone.
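The "one uploading client per account" rule could be enforced along these lines. This is only a sketch: the in-memory dictionary, the `client_id` values, and the 30-second timeout after which an idle client loses its slot are all assumptions.

```python
class UploadSessions:
    """Allow at most one uploading client per account at any time."""

    def __init__(self, timeout_s: float = 30.0):
        # Hypothetical timeout: how long a silent client keeps its slot.
        self.timeout_s = timeout_s
        self._active = {}  # account -> (client_id, last_seen timestamp)

    def accept(self, account: str, client_id: str, now: float) -> bool:
        """Return True if this upload should be accepted."""
        current = self._active.get(account)
        if current is not None:
            active_client, last_seen = current
            # A different client is refused while the active one is still fresh.
            if active_client != client_id and now - last_seen < self.timeout_s:
                return False
        self._active[account] = (client_id, now)
        return True
```

With this in place, 99 phones logged in to the same account would produce only one accepted trace, so the attacker would need 99 accounts instead.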

We could also validate data based on the distance between nearby devices, e.g. multiple cars aren't likely to be on top of each other. That said, GPS inaccuracy limits the effectiveness of this idea.

Contributor privacy

While authentication is necessary to prevent false input, servers must store the data without personally identifiable information (PII) to protect contributors' privacy. (Open question: does the GDPR apply to this service? If so, what will our obligations be?)

Users may opt in to allowing servers to associate their usernames with the tracks.
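One way to picture the storage step: the server authenticates the upload, then keeps only an allow-list of fields and attaches the username only if the user opted in. The field names and the `to_stored_record` helper are hypothetical, and this sketch only handles the username; whether the remaining trace is anonymous enough is a separate question.

```python
# Hypothetical allow-list of fields that are kept in storage.
STORED_FIELDS = {"lat", "lon", "timestamp", "mode", "accuracy_m"}


def to_stored_record(sample: dict, username: str, opted_in: bool) -> dict:
    """Build the record to store from an authenticated upload.

    Anything outside STORED_FIELDS (tokens, device details, etc.) is dropped.
    The username is attached only when the user has opted in.
    """
    record = {key: value for key, value in sample.items() if key in STORED_FIELDS}
    if opted_in:
        record["username"] = username
    return record
```

Using an allow-list rather than a block-list means any new field a client starts sending is dropped by default until someone decides it is safe to store.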

Decentralization

If there's only a single centralized server, we get the usual issues of centralization -

The more users there are on a single server, the greater the incentive for bad actors (legal or illegal, state or non-state) to attack the server.

An attacker could gain access to the current location, movement, and mode of travel of all data contributors worldwide. The attacker could also perform traffic manipulation by sending false speed data and incident reports to clients.
With a centralized service, you also cannot be certain what code the server is actually running, or whether it's actually doing what it claims to do (e.g. storing the traces in an anonymized format).

If we have a federated architecture instead, anyone could host a speed data server. When a client queries from any one server, data would also be queried from all servers known and trusted by that server.

But federation also means any server can send false data to other servers, and - because this data is stored in an anonymized format - other servers can't rely on user reputation to validate it.

However, it should still be possible to validate incoming data from other servers, the same way we validate it for users. (Discussed above, in Preventing false input.)

We could also use machine learning (or just plain old static heuristics) to flag or reject suspicious data from servers - e.g. while the project and its contributor community are still small, it's unlikely that a hundred contributors would travel along the same road within 5 minutes.
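The "hundred contributors within 5 minutes" heuristic could be sketched as a sliding window per road. Everything here is an assumption: the `BurstDetector` name, the idea that each trace carries some opaque contributor token even after anonymization, and that samples arrive in roughly increasing time order.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flag ways that see too many distinct contributors in a short window."""

    def __init__(self, window_s: float = 300.0, max_contributors: int = 100):
        self.window_s = window_s
        self.max_contributors = max_contributors
        self._events = defaultdict(deque)  # way_id -> deque of (timestamp, token)

    def observe(self, way_id, contributor_token, timestamp) -> bool:
        """Record one sample; return True if the way now looks suspicious.

        Assumes timestamps arrive in non-decreasing order for each way.
        """
        events = self._events[way_id]
        events.append((timestamp, contributor_token))
        # Drop events that have fallen out of the time window.
        while events and timestamp - events[0][0] > self.window_s:
            events.popleft()
        distinct = {token for _, token in events}
        return len(distinct) >= self.max_contributors
```

The threshold would have to grow with the contributor base, otherwise a genuinely busy road would be flagged all the time.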

Personally, I'm in favor of decentralization, despite the issues.

Offline use

We can provide dumps of raw track data for offline use.

This would allow e.g. users of offline maps to know the average speed for different modes of transport along different ways. It's not as good as having live speed data information, but it's better than nothing.

Offline data consumers could reduce the storage needed on users' devices, by excluding older data and/or excluding data from certain modes of transport.
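As an illustration of how an offline consumer might trim a dump and turn it into average speeds, here is a small sketch. It assumes the raw tracks have already been matched to OSM ways and converted into records with a speed; the record keys (`way_id`, `mode`, `speed_kmh`, `timestamp`) are invented for this example.

```python
from collections import defaultdict


def average_speeds(records, now, max_age_s, modes):
    """Average speed per (way_id, mode), ignoring old or unwanted records.

    records: iterable of dicts with the hypothetical keys
             "way_id", "mode", "speed_kmh", and "timestamp" (Unix seconds).
    modes:   set of modes of transport to keep.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of speeds, count]
    for record in records:
        if now - record["timestamp"] > max_age_s:
            continue  # too old
        if record["mode"] not in modes:
            continue  # mode excluded by the consumer
        entry = totals[(record["way_id"], record["mode"])]
        entry[0] += record["speed_kmh"]
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

A consumer could go further and bucket the averages by hour of day or day of week, which would make offline estimates follow typical rush-hour patterns.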

Preventing freeloading

Some users may want to use live speed data, but - for privacy reasons - not want to contribute to it themselves.

We can either respect this and let them query the data without giving back, or we can try to prevent freeloading and enforce a system where users contribute to the data they use.

The server could maintain a score for each user. This score decreases when they query the data, and increases when they upload data. If it reaches 0, they can't query anymore.

Each user could be assigned a default starting score, so new users can query the data for some time without uploading anything.
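The scoring scheme above is simple enough to sketch directly. The class name, the default starting score, and the cost and reward values are placeholders, not proposals.

```python
class QueryCredits:
    """Per-user score: queries spend it, uploads earn it."""

    def __init__(self, starting_score: int = 100, query_cost: int = 1,
                 upload_reward: int = 1):
        # All three defaults are arbitrary placeholders.
        self.starting_score = starting_score
        self.query_cost = query_cost
        self.upload_reward = upload_reward
        self._scores = {}

    def score(self, user: str) -> int:
        """Current score; unseen users get the default starting score."""
        return self._scores.get(user, self.starting_score)

    def record_upload(self, user: str) -> None:
        self._scores[user] = self.score(user) + self.upload_reward

    def try_query(self, user: str) -> bool:
        """Spend score for one query; refuse if the user cannot afford it."""
        current = self.score(user)
        if current < self.query_cost:
            return False
        self._scores[user] = current - self.query_cost
        return True
```

How the cost and reward are balanced decides how much a user has to contribute per query, so those numbers would need real-world tuning.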

Scalability

While this system would initially have a limited number of users, we must anticipate a situation where it comes to be used by millions of users.

Other uses

This speed data service could potentially also be used to provide real time public transport data, in regions where such data does not exist - or, when it does, is not made public by transport corporations.

I've seen proprietary apps ask users if they are currently on a certain bus route or train route.

When a client with public transport (PT) routing support (like OsmAnd) detects that the user has boarded a vehicle (e.g. via movement speed), it could present the user with a list of possible routes they may have boarded.

If the user answers with a route, that could be used to provide an ETA for other passengers waiting at future stops of that route. Given enough users, you could see the real time location and the ETA of multiple buses/trains in a city.

The server must validate the user's specified route and ensure it matches up with the existing route data.

This could also be used to estimate the occupancy of a bus or train, giving waiting passengers an idea of how full a bus/train is, and whether or not they should wait for the next bus/train instead.

We could also attempt to estimate the current and future occupancy of businesses.

Another potential application for traffic/speed data is to find quiet and uncongested roads. Some users have expressed a desire to look up walking routes along quiet roads (i.e. with low traffic). Some may wish to find quiet areas to conduct events in, or even live in.