Monday, May 20, 2013

How do airlines choose by how many customers to overbook flights?

A few years ago, I was returning home from a trip to the Florida Keys, which required two layovers. After my first flight, the airline announced that the next flight was overbooked. A \$500 voucher would be awarded to a customer who relinquished his or her seat. Since this was the beginning of my lazy summer before I started graduate school, I jumped at the opportunity and took the \$500 voucher and free hotel room for the night.

Overselling or overbooking is the sale of a volatile good or service in excess of actual capacity. -Wikipedia.

For the next year, the voucher rotted in my inbox until it expired, as I didn't take the opportunity to fly with that airline again. While airlines likely count on a fraction of the vouchers expiring, overbooking can maximize profits even when customers are paid off with these pricey vouchers and hotel rooms.

Consider that a fraction of flyers do not show up in time for their flights, due either to a delay in their preceding connecting flight or to personal circumstances. In anticipation of this, airlines overbook the plane (sell more tickets than capacity) and hope that just the right number of customers show up to fill it.

Let's assume that an airline gives full refunds for flights missed due to personal circumstances, or equivalently for the math, that all missed flights are due to delays in preceding connecting flights. Of course, airlines do not charge twice when a customer misses a connection because of a preceding delay and takes the next flight out. With this, an airline receives revenue from a passenger equal to the ticket price only when he or she actually boards the flight. Here, each empty seat is clearly lost revenue: if the seat is empty, the airline does not receive the revenue from the ticket sale.

Overbooking makes it likely that a flight is full of passengers, so the airline receives the maximum income (seat capacity * ticket price). But if the airline overbooks too much, it must fork out costly vouchers and hotel rooms to the passengers that get bumped from the flight and give them a seat on another plane, potentially perpetuating the cycle and, most importantly, decreasing the revenue. Obviously, if the airline overbooks by far too many seats, it is just giving out vouchers. Somewhere in between is the sweet spot that maximizes revenue.

Let's put ourselves in the place of the airline and say the cost (airline voucher + hotel room + ticket for next available flight + lost customer loyalty) of bumping a passenger is \$800, and we have a 100-seat plane that flies from SFO --> ORD at a ticket price of \$250**. By how many seats should we overbook the plane on this route?

Data-driven decisions. Out of the thousands of SFO --> ORD flights over the past ten years, our airline company knows:
• the total number of airline seat tickets sold: $A$
• the number of these $A$ customers that actually showed up on time for the flight: $B$
Given a random customer, the probability that he or she will show up for the flight is thus $p=B/A$. We will use $p=0.9$, close to this source that reports 7-8% of customers are no-shows.

We can treat the event that a customer boards the flight as being independent of the other passengers boarding*** and occurring with probability $p$. Our goal is to find the number of tickets beyond capacity that we should sell, which we call $x$. The number of customers $N$ that show up for their flight on the 100-seat plane is thus a binomial random variable with $100+x$ trials and probability of success $p$:
$P(N=n)=\binom{100+x}{n} p^{n}(1-p)^{100+x-n}$.
The term $p^{n}(1-p)^{100+x-n}$ is the probability that a specific set of $n$ out of the $100+x$ customers board their flight while the rest do not, whereas the term $\binom{100+x}{n}$ counts the number of such sets (we don't care which of the customers show up, just how many do!).
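
As a quick sanity check on this formula, here is a minimal Python sketch that evaluates it directly. The function name `show_up_pmf` and the choice $x=9$ are illustrative assumptions of mine, not part of the analysis above.

```python
from math import comb

def show_up_pmf(n, tickets, p):
    """P(N = n): exactly n of `tickets` independent customers show up."""
    return comb(tickets, n) * p**n * (1 - p) ** (tickets - n)

seats, x, p = 100, 9, 0.9   # x = 9 is only an illustrative overbooking level
tickets = seats + x
pmf = [show_up_pmf(n, tickets, p) for n in range(tickets + 1)]

print(sum(pmf))              # probabilities over all outcomes sum to 1 (up to rounding)
print(sum(pmf[seats + 1:]))  # chance that more than 100 customers show up
```

The second printed number is the probability that at least one passenger has to be bumped at this overbooking level.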

One approach might be to choose $x$ such that the expected value of $N$ is equal to the number of seats so that, on average, just the right number of customers show up:
$E(N)=(100+x)p=100$.
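Solving this for $x$ with our value $p=0.9$ gives
$x=\frac{100}{p}-100=\frac{100}{0.9}-100\approx 11.1$,
so this rule would have us sell about 111 tickets for the 100 seats.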

This approach is short-sighted since it does not take into account the cost of the airline ticket or the voucher award. For example, if the airline gave out \$1 million vouchers to bumped customers, it wouldn't overbook at all. A better approach is to find a formula for the expected value of the revenue of this flight with our policy of overbooking by $x$ customers and plot the expected revenue as a function of $x$ to see which $x$ maximizes revenue. The revenue $r=r(n)$ depends on the number of passengers $n$ out of $100+x$ ticket purchasers that actually show up. We get income from each person that boards the plane and lose income from each person we bump off of the plane in the case that we are over capacity ($n>100$):
$r(n) = 250n$ if $n<100$ [if fewer than 100 show, we get \$250 for each passenger that shows, and we don't lose any revenue since no customers were bumped.]
$r(n) = (250)(100) - 800(n-100)$ if $n\ge 100$ [if 100 or more show, we get \$250 only for the first 100 passengers, and we lose \$800 for each of the $(n-100)$ customers that were bumped.]
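
The two cases can be folded into one small function. This is a minimal sketch; the function name `revenue` and its keyword arguments are my own labels for the numbers used in this post.

```python
def revenue(n, seats=100, price=250, bump_cost=800):
    """Revenue r(n) when n ticket holders show up for the flight."""
    boarded = min(n, seats)        # at most `seats` passengers can board
    bumped = max(n - seats, 0)     # everyone beyond capacity is bumped
    return price * boarded - bump_cost * bumped

print(revenue(95))   # 23750: five empty seats, nobody bumped
print(revenue(100))  # 25000: exactly full
print(revenue(103))  # 22600: full plane minus three bumped passengers
```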

Now, the revenue that we expect to make, given an overbooking policy $x$, is:
$E(\text{revenue} \mid x)=\displaystyle \sum_{n=0}^{100+x} P(N=n \mid x)\, r(n)$.
The $P(N=n \mid x)$ is given by the binomial$(100+x,p)$ distribution a few lines above. Since we are more likely to get a full plane with increasing overbooking $x$, we become more and more likely to earn the maximum possible income \$(250)(100) from the flight as $x$ increases. On the other hand, we are also more and more likely to exceed a full plane as $x$ increases, and the \$800 cost of bumping passengers starts to erode our revenue stream.
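
To make the sum concrete, here is a minimal sketch that evaluates it with the exact binomial probabilities and scans a range of $x$. The names (`expected_revenue`, `best_x`) and the search range of 0 to 30 are my own assumptions for illustration.

```python
from math import comb

SEATS, PRICE, BUMP_COST, P_SHOW = 100, 250, 800, 0.9

def revenue(n):
    """r(n): ticket income from boarded passengers minus bumping costs."""
    return PRICE * min(n, SEATS) - BUMP_COST * max(n - SEATS, 0)

def expected_revenue(x, p=P_SHOW):
    """E(revenue | x): sum of P(N = n | x) * r(n) over n = 0..100+x."""
    tickets = SEATS + x
    return sum(
        comb(tickets, n) * p**n * (1 - p) ** (tickets - n) * revenue(n)
        for n in range(tickets + 1)
    )

candidates = range(0, 31)   # assumed search range for x
best_x = max(candidates, key=expected_revenue)
for x in (0, 5, best_x, 15):
    print(x, round(expected_revenue(x), 2))
```

With no overbooking ($x=0$) nobody is ever bumped, so the expected revenue is simply $250 \cdot 100 \cdot 0.9 = 22500$ dollars, which is a handy check on the code.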

Using the normal approximation to the binomial distribution (with a continuity correction), I plot the expected revenue as a function of overbooking $x$ in the graph below. Several observations from this plot aid our intuition.
• During a full flight, the revenue would be \$250(100 seats)=\$25000, the upper y-limit on this graph. Note that, in the long-run, we cannot expect to fill every airplane seat-- even if we choose a good $x$.
• Selling 100 tickets for 100 seats ($x=0$) does not maximize the revenue. The maximum expected revenue occurs when we sell 109 tickets! That is, revenue is maximized when we oversell the flight by 9 seats beyond capacity. [$x=9$ maximizes revenue, and is therefore the best choice.]
• Beyond 109 tickets, the revenue decreases because the cost of bumping customers (vouchers, a seat on the next flight, and the chance that the customer flies with a different airline in the future) outweighs the higher certainty of getting a full plane and getting income from 100 full seats. Eventually, when we overbook the plane by 46, the airline is expected to pay more for bumping passengers than it receives in ticket sales!
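
For readers who want to reproduce these numbers, the normal approximation with a continuity correction can be sketched as follows. This is my own minimal reconstruction under the stated assumptions (100 seats, \$250 tickets, \$800 bump cost, $p=0.9$), not the exact code behind the plot; the function names are hypothetical.

```python
from statistics import NormalDist

SEATS, PRICE, BUMP_COST, P_SHOW = 100, 250, 800, 0.9

def revenue(n):
    """r(n): ticket income from boarded passengers minus bumping costs."""
    return PRICE * min(n, SEATS) - BUMP_COST * max(n - SEATS, 0)

def expected_revenue_normal(x, p=P_SHOW):
    """Expected revenue with N approximated by a normal distribution."""
    tickets = SEATS + x
    # Match the binomial mean and standard deviation.
    dist = NormalDist(mu=tickets * p, sigma=(tickets * p * (1 - p)) ** 0.5)
    total = 0.0
    for n in range(tickets + 1):
        # Continuity correction: P(N = n) ~ P(n - 0.5 < Z < n + 0.5).
        prob = dist.cdf(n + 0.5) - dist.cdf(n - 0.5)
        total += prob * revenue(n)
    return total

for x in (0, 9, 45, 46):
    print(x, round(expected_revenue_normal(x), 2))
```

The printed values for $x=45$ and $x=46$ should straddle zero, matching the break-even point described in the last remark.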

It should be clear why and how airlines choose to overbook flights to maximize their profits. Each empty seat is lost money, but the airline must weigh this against the risk of paying for vouchers and hotels for customers that couldn't fit on the full flight-- and the lost customer loyalty that ensues*.

This analysis considers only the revenue of the airline. However, there is an externality associated with bumping passengers. Think about how a bumped passenger may lose out on one day of pay, how his or her employer loses out on one day of valuable work, and how the local ice cream shop loses out on one customer that would have otherwise taken his or her family out for ice cream that day.

* Lost customer loyalty was theoretically included in the "cost of bumping a customer" and the analysis holds.

** Ticket price changes with season! We can see how complicated this gets.

*** Realistically, airlines will have models that take into account customer demographics. Perhaps even customer-specific data: one with a history of missing flights can be assumed to be more likely to miss a flight again. Further, tickets sold in a group may be treated differently: e.g., a whole family buying a set of tickets vs. a single businessman. See this article for how complicated airline models realistically may be. An interesting factor is the airport from which one is flying. Think about it: leaving Las Vegas vs. Cleveland-- who is more likely to miss their flight?