Some questions on a SkyTrain meltdown
July 23, 2014
On Thursday July 17th, the SkyTrain system was shut down during the evening peak travel period due to a failed computer component. This left many passengers stranded both at SkyTrain stations and in SkyTrain cars for many hours. Then on Monday July 21st the SkyTrain system was brought to a halt by a tripped electric breaker protecting the SkyTrain’s operations centre. The power outage also halted the public announcement system.
Having two SkyTrain meltdowns in a row is statistically improbable. Improbable but not impossible… Hasty conclusions on the general state of the system shouldn’t be drawn from exceptional events at this stage:
Some observers have been quick to link the SkyTrain glitches to a lack of funding. We notice that the latest meltdown is linked to the extension of the SkyTrain (Evergreen line work)…
Identifying the root cause of the trouble is a good step. Translink, which seems to have learnt how to manage a crisis in Pyongyang, thinks it has then taken the adequate measure: suspending the electrician who is allegedly responsible for tripping the breaker.
We will note that if a breaker exists in the first place, it is so that it can trip, and the consequences of a trip should be known as well. Hence a first question:
- Was the risk of accidentally tripping a critical breaker during electrical work properly assessed? And its corollary: was the electrical work appropriately scheduled to minimize risks to SkyTrain operation?
The handling of crisis communication
A tripped breaker or something else shutting down a whole transit system is a rare occurrence, but not an unprecedented one:
During the great 2003 Northeast blackout, whole transit systems, in cities such as Toronto or New York, ground to a complete halt…
In such an occurrence, the question is: what is the response of the transit authority, and is it adequate?
- Does Translink expect people to roast in trains for hours without any information?
If a train evacuation plan was in place, something one would have expected to be decided within minutes of the SkyTrain halt (a tripped breaker is a priori quick and easy to troubleshoot, and its consequence on the time needed to “reboot” the system should be well known):
- Why didn’t Translink inform its customers about it?
Though the passenger announcement system was down, media such as Twitter were available (but used only to mention an unspecified “technical issue”). That brings us to another aspect of the issue.
Is the SkyTrain system well designed?
- In a crisis situation, more than ever, communication is key: the passenger information system should be isolated from the other control systems (and be able to run on onboard batteries…)
Also wrong by design is the fact that a SkyTrain “glitch” always seems to bring the whole SkyTrain system to its knees. The system seems to be too centralized. The corollary of this:
The more the system expands, and hence adds complexity (be it by miles of track or by number of trains in operation), the greater the chance of catastrophic glitches.
Their occurrence can be reduced by increasing the reliability of the system as it is (that can typically be achieved by providing redundancy on key parts…), but ultimately that will not prevent embarrassing issues where the whole SkyTrain system breaks down due to overly centralized management.
Better overall resilience could be achieved by a more decentralized system: having the different lines operated as independently as possible is a step in that direction. That would not necessarily mean fewer breakdowns overall, but a breakdown would be of much lesser consequence for the system (typically confined to one line). In that regard:
- With the advent of the Evergreen line (VCC-Douglas College), the Millennium line should be shortened to Waterfront-Lougheed, which should reduce the effect of catastrophic breakdowns
- The poor design of the Lougheed station, which can already be criticized for the lack of same-platform transfer between future Evergreen line trains (VCC-Douglas) and Millennium trains (Waterfront-Lougheed), can also be blamed for preventing one line from being operated in total disconnection from the other in normal operation (excluding OMC access)
- We have to celebrate, as a possibly unintended advantage, the fact that the Canada line is operated totally independently from the rest of the SkyTrain network
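The centralization argument can be put in numbers with a rough sketch. It assumes that each line glitches independently with the same daily probability; the 1% figure and the function names are invented for illustration and are not Translink data. In the centralized case any single glitch halts every line, while in the decentralized case a glitch stays confined to its own line.

```python
def p_any_glitch(p_single: float, n_parts: int) -> float:
    """Probability that at least one of n independent parts glitches."""
    return 1.0 - (1.0 - p_single) ** n_parts


def expected_lines_down(p_line: float, n_lines: int, centralized: bool) -> float:
    """Expected number of lines halted on a given day."""
    if centralized:
        # Any single glitch brings every line to a halt.
        return n_lines * p_any_glitch(p_line, n_lines)
    # Each glitch stays confined to the line where it happened.
    return n_lines * p_line


# Hypothetical daily glitch probability per line (an assumption, not a measurement).
P_GLITCH = 0.01

for n in (1, 2, 3, 4):
    print(
        f"{n} line(s): "
        f"P(any glitch)={p_any_glitch(P_GLITCH, n):.4f}  "
        f"centralized={expected_lines_down(P_GLITCH, n, True):.4f}  "
        f"decentralized={expected_lines_down(P_GLITCH, n, False):.4f}"
    )
```

Under these assumptions the chance of a glitch somewhere grows with the size of the network either way, but only in the centralized case does each glitch cost the whole network.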
The SkyTrain reliability is touted at 95%: that measures the percentage of trains running no more than 2 minutes behind schedule.
This measure carries little meaning for the customer:
- A train can run late, but as long as speed and frequency are maintained, so is the level of service for the customer.
The measure of SkyTrain reliability doesn’t give us a good idea of how “late” or “slow” the 5% of trains not “on time” are.
The problem is that when a SkyTrain is “running late”, it can very quickly mean hours of delay for the customer. In that light, 5% of trains “running late” could be considered way too many (a bit as if a driver faced an incident like a flat tire or an engine breakdown once a month, but should feel content because the rest of the month, or 95% of the time, the drive is uneventful…).
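To see why the on-time percentage says little about how bad the late trains are, here is a small sketch. The two delay samples are invented for illustration (100 trains each); only the 2-minute threshold comes from the definition quoted above.

```python
ON_TIME_THRESHOLD_MIN = 2  # a train is "on time" if no more than 2 minutes late


def on_time_rate(delays_min):
    """Share of trains running within the on-time threshold."""
    on_time = sum(1 for d in delays_min if d <= ON_TIME_THRESHOLD_MIN)
    return on_time / len(delays_min)


def mean_late_delay(delays_min):
    """Average delay, in minutes, of the trains that are not on time."""
    late = [d for d in delays_min if d > ON_TIME_THRESHOLD_MIN]
    return sum(late) / len(late) if late else 0.0


# Hypothetical samples: same share of late trains, very different lateness.
mild_day = [0] * 95 + [4] * 5
meltdown_day = [0] * 95 + [90] * 5

for name, delays in (("mild", mild_day), ("meltdown", meltdown_day)):
    print(name, on_time_rate(delays), mean_late_delay(delays))
```

Both samples score the same 95% on time, yet the late trains are 4 minutes behind in one case and an hour and a half behind in the other.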
As a matter of comparison, the reliability of French driverless subways is usually north of 99%.
At the risk of being at odds with Translink, a review of all of the above questions is necessary: the findings could possibly help reduce the occurrence of SkyTrain systemic issues, and will more certainly provide some guidance to improve the handling of such occurrences in the future.
 See Twenty Years of Experiences with driverless metros in France, J.M. Erbina and C. Soulas. As an example, the reliability of the Paris automated line 14 (percentage of passengers who waited less than 3 minutes during peak hours or less than 6 minutes during off-peak hours) is 99.8%.
 By definition, a “back-up” system is not working while the main system is… and back-up system issues are typically discovered when we need it, unless it is thoroughly and recurrently tested, which involves significant ongoing maintenance costs.
 As an example, in Paris each automated subway line (that is, lines 1 and 14) has its own central command center. That is also true of the Lille VAL system, which has 2 lines, opened in 1983 and 1989.