Bike Share Network Optimization
Graph theory meets urban mobility — processing 2M+ ridership records to optimize station placement across two Canadian cities.
Overview
This project applies network science and graph theory to analyze bike share systems in Vancouver and Toronto. By modeling stations as nodes and trips as weighted edges, we identified inefficiencies in station placement, over-saturated hubs, and underserved neighborhoods. The analysis culminated in data-driven recommendations for rebalancing and expansion.
The Problem
Bike share operators struggle with station imbalance — some stations overflow while others sit empty. Manual rebalancing is expensive and reactive. This project asks: can graph theory and clustering reveal structural inefficiencies before they become operational problems?
Questions Addressed
- 01. Which stations act as critical hubs, and what happens to network flow if they are removed?
- 02. Are there distinct geographic clusters of ridership that suggest natural zone boundaries?
- 03. How do seasonal patterns affect network topology, and where should new stations be placed?
- 04. What structural differences exist between the Vancouver and Toronto networks?
Methodology
Data Engineering
Ingested 2M+ raw ridership records from the Mobi (Vancouver) and Bike Share Toronto open data APIs. Cleaned and normalized station coordinates, trip durations, and timestamps using Power Query and Python. Built a repeatable ETL pipeline that reduced manual refresh time from 4 hours to under 15 minutes.
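The cleaning step could look roughly like the sketch below. It is illustrative only: the use of pandas, the column names (`start_time`, `end_time`, `start_station`, `end_station`), and the 1 minute to 24 hour plausibility window are all assumptions, not details taken from the project's pipeline.

```python
import pandas as pd


def clean_trips(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize a raw trip table. Column names are hypothetical."""
    df = raw.copy()

    # Parse timestamps; values that cannot be parsed become NaT.
    df["start_time"] = pd.to_datetime(df["start_time"], errors="coerce")
    df["end_time"] = pd.to_datetime(df["end_time"], errors="coerce")

    # Normalize station names so "  main st " and "Main St" match.
    for col in ("start_station", "end_station"):
        df[col] = df[col].str.strip().str.title()

    # Derive trip duration in minutes (NaN when a timestamp is missing).
    df["duration_min"] = (df["end_time"] - df["start_time"]).dt.total_seconds() / 60

    # Drop incomplete rows, then keep plausible durations only.
    # The 1 min to 24 h window is an illustrative threshold.
    df = df.dropna(subset=["start_time", "end_time", "start_station", "end_station"])
    df = df[df["duration_min"].between(1, 24 * 60)]
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "start_time": ["2023-07-01 08:00:00", "not a date",
                   "2023-07-01 09:00:00", "2023-07-01 10:00:00"],
    "end_time": ["2023-07-01 08:15:00", "2023-07-01 08:30:00",
                 "2023-07-01 08:50:00", "2023-07-01 10:30:00"],
    "start_station": ["  main st ", "A", "B", "c"],
    "end_station": ["oak ave", "B", "C", None],
})
cleaned = clean_trips(raw)
```

In this toy input only the first row survives: the second has an unparseable timestamp, the third ends before it starts, and the fourth has no destination station.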
Network Analysis
Constructed directed weighted graphs using NetworkX, with stations as nodes and trips as weighted edges. Computed centrality metrics (degree, betweenness, PageRank) to identify critical nodes. Applied DBSCAN clustering to detect spatial station groupings; it labeled 15% of stations as noise, indicating poorly connected outliers.
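A minimal sketch of the graph construction and two of the centrality metrics follows. The function names and the toy trip list are hypothetical, and inverting trip counts into a `distance` attribute for betweenness is one common modeling choice, not necessarily the one used in this project.

```python
import networkx as nx


def build_trip_graph(trips):
    """Build a directed graph whose edge weights count trips per (origin, destination)."""
    g = nx.DiGraph()
    for origin, dest in trips:
        if g.has_edge(origin, dest):
            g[origin][dest]["weight"] += 1
        else:
            g.add_edge(origin, dest, weight=1)
    return g


def centrality_table(g):
    """Return outbound trip count and betweenness for every station."""
    outbound = dict(g.out_degree(weight="weight"))
    # Shortest-path metrics read weights as distances, so busy links
    # are given short distances (assumed convention: 1 / trip count).
    for _, _, data in g.edges(data=True):
        data["distance"] = 1.0 / data["weight"]
    betweenness = nx.betweenness_centrality(g, weight="distance")
    return {
        n: {"outbound_trips": outbound[n], "betweenness": betweenness[n]}
        for n in g
    }


def outbound_share(g, stations):
    """Fraction of all trips that start at the given stations."""
    total = g.size(weight="weight")
    out = sum(w for _, _, w in g.out_edges(stations, data="weight"))
    return out / total


# Toy data: one (origin, destination) pair per ride.
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")]
g = build_trip_graph(trips)
table = centrality_table(g)
share_a = outbound_share(g, ["A"])
```

PageRank can be added with `nx.pagerank(g, weight="weight")`; recent NetworkX releases rely on SciPy for that call, so it is left out of the sketch to keep the dependencies minimal.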
Visualization & BI
Built interactive Power BI dashboards with custom DAX measures to track station utilization, trip volume by time-of-day, and cluster membership. Geospatial maps overlay network centrality scores on actual city maps for operator use.
SQL Analysis
Wrote analytical SQL queries to answer operational questions: top 10 origin-destination pairs, average trip duration by cluster, seasonal demand shifts. Results fed directly into the final presentation and recommendation deck.
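As an illustration, a top origin-destination query might be shaped like the one below, run here through Python's built-in `sqlite3` so the example is self-contained. The `trips` table, its columns, and the sample rows are assumptions made for the sketch; the project's actual schema and SQL dialect are not stated.

```python
import sqlite3

# Hypothetical schema: one row per trip.
TOP_PAIRS_SQL = """
SELECT start_station,
       end_station,
       COUNT(*) AS trips,
       ROUND(AVG(duration_min), 1) AS avg_duration_min
FROM trips
GROUP BY start_station, end_station
ORDER BY trips DESC, start_station, end_station
LIMIT ?
"""


def top_od_pairs(conn, limit=10):
    """Return the busiest origin-destination pairs with their average duration."""
    return conn.execute(TOP_PAIRS_SQL, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (start_station TEXT, end_station TEXT, duration_min REAL)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [("A", "B", 10.0), ("A", "B", 14.0), ("B", "A", 9.0), ("A", "C", 20.0)],
)
top_two = top_od_pairs(conn, limit=2)
```

The secondary sort on station names makes the ranking deterministic when several pairs have the same trip count.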
Key Results
Key Findings
Three stations in downtown Vancouver account for 28% of all outbound trips — removing any one causes measurable cascade failures in network flow.
DBSCAN identified 4 stable geographic clusters that align closely with existing city neighborhood boundaries, validating the approach for zone-based rebalancing.
Toronto and Vancouver networks have fundamentally different hub structures: Toronto is polycentric (multiple hubs), Vancouver is monocentric (one dominant hub).
Seasonal analysis shows a 40% drop in peripheral station usage in winter, suggesting temporary station deactivation as a cost-saving strategy.
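The clustering finding depends on how DBSCAN is parameterized, so a small sketch of spatial clustering on station coordinates may help. It assumes scikit-learn, the haversine metric, and illustrative values for `eps_km` and `min_samples`; the coordinates are made up, and none of these settings are taken from the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0


def cluster_stations(coords_deg, eps_km=0.5, min_samples=3):
    """Cluster (lat, lon) pairs given in degrees; label -1 marks noise."""
    # The haversine metric expects [lat, lon] in radians.
    coords_rad = np.radians(np.asarray(coords_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # convert km to radians on the sphere
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    labels = model.fit_predict(coords_rad)
    noise_share = float(np.mean(labels == -1))
    return labels, noise_share


# Made-up coordinates: two tight groups and one isolated station.
stations = [
    (49.2827, -123.1207), (49.2830, -123.1210), (49.2824, -123.1204),
    (49.2600, -123.2460), (49.2603, -123.2463), (49.2597, -123.2457),
    (49.3500, -123.0000),
]
labels, noise_share = cluster_stations(stations)
```

The share of stations labeled noise is sensitive to `eps_km` and `min_samples`, so a figure like the 15% reported above is best read alongside the parameters that produced it.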
Conclusion
Network science provides operators with a proactive lens on system health that operational dashboards alone cannot surface. The clustering and centrality analysis in this project directly supports a zone-based rebalancing strategy that could reduce truck dispatch costs by an estimated 20–30%. The methodology is portable to any city with open bike share data.