Aggregating Mobility Data to Protect Privacy

Jascha Franklin-Hodge

Former Chief Information Officer, City of Boston

Remix shows the complete transportation picture, including aggregated trips taken using new mobility options.

Why we aggregate data

Remix uses anonymized mobility data to help cities manage public space and plan transportation improvements that maximize safety and equity. But even anonymized mobility data can be sensitive. Individual trip records could be re-identified by combining anonymized data with other sources of information (such as a database of addresses) or through analysis of travel patterns over time.

Remix’s platform is designed to help cities gain insights about vehicles, fleets, and travel patterns, not to track individuals. Remix uses data aggregation so that the insights we show to our city customers do not reveal the travel behavior of individual people.

Read more about how Remix uses data to help cities manage mobility.

What is data aggregation

Data aggregation takes records of individual trips and combines them into data that allows cities to answer important questions without revealing any single person’s activity.

Because aggregated data combines multiple individual trips taken by multiple people into a single number, it becomes hard or impossible to extract information about a specific person’s mobility patterns.

But to perform data aggregation, you typically need to start with data about individual trips. You also need an idea of how you want to use the aggregated data. For example, the data aggregation you’d need to show the number of trips taken between two neighborhoods by time of day is different from the aggregation you’d need to show where companies are deploying vehicles each morning.
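As a rough illustration of the first example, the sketch below counts trips between pairs of neighborhoods by hour of day. The `Trip` record, its field names, and the sample values are hypothetical; this is not Remix's actual schema or pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:
    # Hypothetical individual trip record; field names are illustrative only.
    origin_neighborhood: str
    destination_neighborhood: str
    start_time: datetime


def trips_by_od_and_hour(trips):
    """Count trips per (origin, destination, hour of day)."""
    counts = Counter()
    for trip in trips:
        key = (
            trip.origin_neighborhood,
            trip.destination_neighborhood,
            trip.start_time.hour,
        )
        counts[key] += 1
    return counts


# Made-up sample data for illustration.
sample_trips = [
    Trip("Downtown", "Riverside", datetime(2019, 6, 3, 8, 5)),
    Trip("Downtown", "Riverside", datetime(2019, 6, 3, 8, 40)),
    Trip("Downtown", "Riverside", datetime(2019, 6, 4, 8, 15)),
    Trip("Riverside", "Downtown", datetime(2019, 6, 3, 17, 30)),
]

od_counts = trips_by_od_and_hour(sample_trips)
```

An aggregation that shows where vehicles are deployed each morning would group similar input records by a different key, which is why the intended use has to be known before the aggregation is designed.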

Once you identify the ways you want to use data, you can process input data about individual trips and vehicle movements to produce the aggregated data outputs that you need. If your use cases change, you may need to reprocess input data to produce new aggregations suitable for the new purpose.

How we aggregate data

Remix performs data aggregation on behalf of our customers and partners. The tools we provide to cities only show maps, reports, and statistics that combine multiple trips. These are designed to prevent anyone’s private travel behavior from being revealed. We also provide insights from data that does not contain location information, such as vehicle status history.

Our approach to aggregation starts by creating separate environments for different types of data.

Figure (data architecture): Data is stored in an isolated environment until it is aggregated.

Trip and vehicle data is imported from mobility providers using the Mobility Data Specification Provider API. This data is stored in an isolated, secured computing environment. Just as a bank improves security by handling large amounts of cash inside a special secured room, we handle our most sensitive data in a specially protected set of servers and databases.

Although all of Remix’s technology systems are heavily secured, access to the isolated environment is the most tightly restricted due to the sensitive nature of individual trip data. The systems in the isolated environment have limited connectivity to the internet, only a small subset of Remix’s technical staff are allowed access, and activity on these systems is logged and can be audited.

Inside the isolated environment, aggregation processes turn individual trip and vehicle records into the data aggregations used in the products we build for our clients. These data aggregations, which pose minimal risk of re-identification, are pushed into the application data environment, where our clients can access them as maps, reports, and statistics. Individual trip data never leaves the isolated environment, and our clients do not have direct access to data about individual trips through the Remix user interface.
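One common way to keep exported aggregates low-risk is to suppress any cell that combines too few trips. The sketch below illustrates that idea only; the article does not say that Remix uses a minimum-count rule, and the threshold value, function name, and zone labels are assumptions made for the example.

```python
from collections import Counter

# Hypothetical threshold; the article does not state a specific value.
MIN_TRIPS_PER_CELL = 5


def aggregate_for_export(trip_zones, min_count=MIN_TRIPS_PER_CELL):
    """Return per-zone trip counts, dropping zones with too few trips.

    trip_zones is an iterable with one zone label per individual trip.
    Only the returned counts would leave the isolated environment.
    """
    counts = Counter(trip_zones)
    return {zone: n for zone, n in counts.items() if n >= min_count}


# Made-up individual records that would stay in the isolated environment.
raw_trip_zones = ["A"] * 7 + ["B"] * 2 + ["C"] * 5

exported = aggregate_for_export(raw_trip_zones)
```

In this sketch, zone "B" is left out of the exported result because only two trips fall into it, while the raw per-trip list is never exported at all.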

Why trip data is important

As we develop new features, or as our customers and partners request new insights from the data, Remix can update the aggregation processes that live in the isolated environment. Because we retain some historical trip and vehicle data (subject to data retention policies and our contracts with cities), we are able to run the aggregation processes again to produce these new insights, both for past periods and going forward.

Were cities to receive aggregated data directly from mobility providers, it would limit their ability to ask new questions of data as city policy and provider business models evolve. For example, a city that agreed to receive data aggregated only at the neighborhood level might not be able to evaluate mobility to and from transit stations within that neighborhood without negotiating with providers for new aggregate data. With the rapid pace of change in urban mobility, cities need the flexibility to quickly use data in new ways as new challenges and questions emerge.
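The neighborhood example can be made concrete with a small sketch. The records, place names, and the `count_by` helper below are hypothetical. The point is only that station-level counts can be computed from trip-level records but cannot be recovered from neighborhood totals.

```python
from collections import Counter

# Hypothetical trip-end records: (neighborhood, nearest transit station or None).
trip_ends = [
    ("Eastside", "Central Station"),
    ("Eastside", "Central Station"),
    ("Eastside", None),
    ("Eastside", "Harbor Station"),
    ("Westside", None),
]


def count_by(records, key_index):
    """Count records by the chosen field, skipping records where it is missing."""
    return Counter(r[key_index] for r in records if r[key_index] is not None)


# The aggregation a city might originally have agreed to receive.
by_neighborhood = count_by(trip_ends, 0)

# A new question, answerable only because trip-level records are available.
by_station = count_by(trip_ends, 1)
```

A city holding only `by_neighborhood` would know that four trips ended in Eastside, but not how many of them ended near a transit station.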

Cities understandably want independent verification and the option to aggregate data themselves, especially when it pertains to the enforcement of permit requirements. For example, cities may want to be able to independently calculate whether companies are complying with equity rules or are removing broken vehicles from the street in a timely fashion. Disaggregated trip data has the potential to be independently verified and audited, while pre-aggregated data generally cannot.
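As an illustration of the second example, the sketch below computes the share of broken vehicles that were removed within a time limit, using vehicle status history. The two-hour limit, the status names, and the event layout are assumptions made for the example; they are not taken from any real permit or from the Mobility Data Specification.

```python
from datetime import datetime, timedelta

# Hypothetical permit rule: broken vehicles must be removed within 2 hours.
MAX_TIME_TO_REMOVE = timedelta(hours=2)

# Hypothetical status events: (vehicle_id, status, timestamp).
status_events = [
    ("v1", "broken", datetime(2019, 6, 3, 9, 0)),
    ("v1", "removed", datetime(2019, 6, 3, 10, 30)),
    ("v2", "broken", datetime(2019, 6, 3, 9, 0)),
    ("v2", "removed", datetime(2019, 6, 3, 13, 0)),
]


def removal_compliance_rate(events, limit=MAX_TIME_TO_REMOVE):
    """Return the share of completed removals that met the time limit."""
    reported = {}  # vehicle_id -> time it was first reported broken
    on_time = 0
    total = 0
    for vehicle_id, status, timestamp in sorted(events, key=lambda e: e[2]):
        if status == "broken":
            # Keep the earliest report until the vehicle is removed.
            reported.setdefault(vehicle_id, timestamp)
        elif status == "removed" and vehicle_id in reported:
            total += 1
            if timestamp - reported.pop(vehicle_id) <= limit:
                on_time += 1
    return on_time / total if total else None


rate = removal_compliance_rate(status_events)
```

Because the calculation starts from individual status events, a city could rerun it with a different limit or audit a single result, which a provider-supplied compliance percentage would not allow.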

Remix allows cities to get the benefits of trip data while protecting individual privacy through aggregation and a secure technology infrastructure.

Securing data

Remix uses a variety of techniques, policies, and tools to ensure its systems are secure. You can read about these at https://remix.com/security. Taken together, these make up a “defense in depth” approach designed to minimize the risk of unauthorized access to or exploitation of data or systems.