How to Improve Catching a Taxi in New York City (by Hailing Open Data)

Let’s face it: as convenient as the idea of a taxi is, hailing one and the process of making it to your destination can be awful. Taxis can be hard to find. You can get upstreamed. Your driver can change lanes recklessly–provoke angry drivers–and run red lights. Your driver can hassle you about using your credit card, or ask you where you’re going before they pick you up. You can forget your wallet, your keys and your cell phone–and have no idea what vehicle you were in. Worse, when your passenger experience rises to the level of filing a complaint, you have to show up for what’s essentially taxicab jury duty in order for the experience to even hit the driver’s record. In short, taxis are hard to get, and there aren’t that many incentives for promoting good driver behavior.

Why hasn’t technology solved these problems?

In the private sector, the public has found a partial solution to its frustration with Uber, an on-demand taxi-hailing service that can be activated via a customer’s web browser, text messages, or smartphone application. Users can also see the physical location of nearby Uber vehicles, along with an estimate of how long it will be until a taxi can pick them up.

Similarly useful is the fact that once hailed, Uber users can see their taxi driver’s photo (which aids in identification and helps validate that the driver is appropriately licensed), the driver’s vehicle type, and the driver’s average rating. Unfortunately, this information is only available after a car has been ordered, though Uber does seem to respond seriously to poor ratings (I once gave 3/5 stars to a tardy and difficult driver, and I got a personalized message within about forty-eight hours). Once I’ve ordered a car, the second image appears.


(Luckily, I can also immediately cancel it–sorry to get your hopes up, Nader)

For a well-functioning taxi market, the fundamental enabler is information. If we knew where all of the city’s empty cabs were at any given time, we could map that data to show citizens the best places to hail them. Moreover, if we could track individual taxis’ pickup and dropoff times from Point A to Point B, and we knew how much each trip cost, then we could also rank cabs to determine which ones take the fastest and cheapest routes. We might even figure out which drivers are committing fraud.
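To make the ranking idea concrete, here is a minimal sketch in Python. It assumes a hypothetical trip record with a cab identifier, duration, distance, and fare; the field names, the `rank_cabs` helper, and the sample numbers are illustrative only and are not taken from the city's data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trip:
    # Hypothetical record layout; the city's actual field names may differ.
    cab_id: str
    minutes: float  # pickup-to-dropoff duration
    miles: float    # trip distance
    fare: float     # fare in dollars


def rank_cabs(trips):
    """Order cab ids from lowest to highest fare per mile,
    breaking ties by minutes per mile."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])  # minutes, miles, fare
    for t in trips:
        if t.miles <= 0:
            continue  # skip records with no usable distance
        agg = totals[t.cab_id]
        agg[0] += t.minutes
        agg[1] += t.miles
        agg[2] += t.fare
    scores = {
        cab: (fare / miles, minutes / miles)
        for cab, (minutes, miles, fare) in totals.items()
    }
    return sorted(scores, key=scores.get)


# Made-up sample trips, purely for illustration.
trips = [
    Trip("A", 10, 2.0, 9.0),
    Trip("A", 20, 4.0, 15.0),
    Trip("B", 12, 2.0, 12.0),
    Trip("C", 30, 6.0, 27.0),
]
ranking = rank_cabs(trips)
print(ranking)
```

A real ranking would need to compare like with like (same time of day, similar origin and destination), since a cab that mostly works in midtown traffic will look slower and pricier per mile through no fault of its driver.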

Going even further, if users can “check in” to taxis with their smartphones and rate their passenger experience, users will have records of the cabs they took (facilitating lost item retrieval) and more crappy drivers will exit the taxi industry. Put another way, what is stopping us from creating Uber-Yelp for city taxis?

Surprisingly, not a lot. Each New York City taxicab is equipped with a GPS-enabled tracking system that provides data on a taxi’s location, pickups, dropoffs, and fare history. If the data is transmitted in real time, then all that’s standing in the way of a solution like this (besides some mathematicians, computer programmers, and a supercomputer) are the municipal gatekeepers of this data. If, as a private company, Uber will visualize its proprietary locational data for its customers, then why couldn’t New York City share its taxicab data with its owners–us, the citizens?

In the old model of governance, if it did anything at all, a municipal government would probably spend a jaw-dropping sum for a tardily delivered and semi-reliable IT system. But we are now on the precipice of entering the post-bureaucratic age–what Tim O’Reilly refers to as “government as a platform.” Government doesn’t need to build such a system (though it could), because the private sector is better-resourced and more appropriately equipped to meet this kind of market-driven need. Instead, harnessing the power of the Internet, all government would need to do is find a way to make this information available to the public in machine-readable formats, and the financial incentive to the best developers would take care of the rest.

I was recently able to witness the promise of releasing taxicab data in a February 27 presentation to Beth Noveck’s Government 3.0 course at NYU Wagner. Juliana Freire, a computer science professor at NYU Poly’s Visualization and Data Analysis (ViDA) Center, summarizes both the technical difficulty and the real-world practicality of working with this data. Freire’s presentation relies on a massive set of 2011 taxi fare and trip data provided to her team at NYU Poly and NYU CUSP by the city. By analyzing and visualizing trip data, Freire can identify all sorts of interesting patterns in taxicab routes, some of which point to broader societal insights (e.g., taxicab transportation patterns during Hurricane Irene). If the city were able to develop an application programming interface (API), or software that translates this collected data into a stream that other users can access, the right recipe for a better taxicab experience would be there.

But why should we stop at seeing a year’s worth of old data? To be sure, the privacy and technical hurdles are formidable. With regard to the former, prior to handing off the data, the city went to considerable effort to scrub important identifying information from the records. Clearly, the city is right to protect users’ credit card numbers and personal trip histories. With regard to the technical question, I have no doubt that transmitting 63 gigabytes of data on 170,000,000 annual taxicab trips in real time presents challenges. But the problem is not insurmountable, as Uber clearly demonstrates on a smaller scale.
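A quick back-of-the-envelope calculation puts those figures in perspective. The sketch below uses only the two numbers quoted above and assumes, for simplicity, decimal gigabytes and trips spread evenly over a 365-day year; real traffic would peak well above the average, and a live feed of GPS positions would carry more than completed trip records.

```python
TOTAL_BYTES = 63 * 10**9            # 63 gigabytes (decimal), figure quoted above
TRIPS_PER_YEAR = 170_000_000        # annual trips, figure quoted above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Average size of one trip record, and the average rate if the
# year's records were streamed evenly as trips completed.
bytes_per_trip = TOTAL_BYTES / TRIPS_PER_YEAR
trips_per_second = TRIPS_PER_YEAR / SECONDS_PER_YEAR
bytes_per_second = TOTAL_BYTES / SECONDS_PER_YEAR

print(f"{bytes_per_trip:.0f} bytes per trip")
print(f"{trips_per_second:.1f} trips per second")
print(f"{bytes_per_second / 1000:.1f} kilobytes per second")
```

Under these assumptions the averages come out to roughly 370 bytes per trip, a little over five trips per second, and about two kilobytes per second, which suggests the harder problems are collection, privacy, and governance rather than raw bandwidth.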

Moreover, the problem is becoming less intimidating by the day. Historically, the idea of large datasets is not new (think of the towering number of financial records at a company like Bank of America), and a number of factors are lowering the barriers to using big data. (In a separate presentation, CUSP’s Chief of Staff Michael Holland discussed why an increase in the number of sensors is also improving the collectibility of big data.) On her end, Freire argues that big data is growing in significance due to increases in the volume of data available, the number of people with access to that data, and the amount of computer processing available to deal with such data.

Ultimately, the idea of opening up New York City’s taxicab data is simply one application of a powerful open data movement now afoot. In speaking on the power of big data, MIT Professor and Human Dynamics Laboratory Director Sandy Pentland (who is, per Edge, “one of the most-cited computer scientists in the world”), offers the following:

“The fact that we can now begin to actually look at the dynamics of social interactions and how they play out, and are not just limited to reasoning about averages like market indices is for me simply astonishing. To be able to see the details of variations in the market and the beginnings of political revolutions, to predict them, and even control them, is definitely a case of Promethean fire. Big Data can be used for good or bad, but either way it brings us to interesting times. We’re going to reinvent what it means to have a human society.”

Because this blog is dedicated to #crowdlaw, it’s worth noting that the issue of taxicabs isn’t immediately connected to the crowdsourcing of feedback into the legislative process. However, the crowd can be involved in other ways–not least of which is supporting strong open data legislation like the bill scheduled to be implemented here in New York City one week ago. Moreover, this kind of application of open data has the potential to reform the entire taxicab regulatory process. If even one percent of the more than 18,000 riders each taxi carries a year submits a rating, do we still need such a bureaucratic complaint review system? True, on average, about three drivers share a taxi, but just like with a speed camera ticket, each complaint could be tied to a specific driver–or the three drivers could be jointly responsible for delivering quality customer service.
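The arithmetic behind that question is simple enough to spell out. This sketch treats 18,000 as a floor for annual riders per taxi and assumes, hypothetically, that ratings are split evenly among the three drivers who share a cab.

```python
RIDERS_PER_TAXI_PER_YEAR = 18_000  # lower-bound figure from the text
RATING_PERCENT = 1                 # one percent of riders leave a rating
DRIVERS_PER_TAXI = 3               # average number of drivers sharing a cab

# Integer arithmetic keeps the percentage calculation exact.
ratings_per_taxi = RIDERS_PER_TAXI_PER_YEAR * RATING_PERCENT // 100

# Assumption: ratings divide evenly among the drivers of a shared cab.
ratings_per_driver = ratings_per_taxi / DRIVERS_PER_TAXI

print(f"{ratings_per_taxi} ratings per taxi per year")
print(f"{ratings_per_driver:.0f} ratings per driver per year")
```

That works out to 180 ratings per taxi and about 60 per driver each year, or more than one rating per driver per week, even at a one percent response rate.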

Everyone should be able to hail a taxicab. We should also all hail open data.


Kevin can be found on Twitter as @kevinmhansen.

Note: This post was initially authored for the blog CrowdLaw and can also be found here.