DATA SCIENCE | Sep 2024 – Present | 8 min read

Bike Share Network Optimization

Graph theory meets urban mobility — processing 2M+ ridership records to optimize station placement across two Canadian cities.

Python, R, SQL, Power Query, Excel, Power BI, NetworkX, DAX

Overview

This project applies network science and graph theory to analyze bike share systems in Vancouver and Toronto. By modeling stations as nodes and trips as weighted edges, we identified inefficiencies in station placement, over-saturated hubs, and underserved neighborhoods. The analysis culminated in data-driven recommendations for rebalancing and expansion.

The Problem

Bike share operators struggle with station imbalance — some stations overflow while others sit empty. Manual rebalancing is expensive and reactive. This project asks: can graph theory and clustering reveal structural inefficiencies before they become operational problems?

Questions Addressed

01. Which stations act as critical hubs, and what happens to network flow if they are removed?
02. Are there distinct geographic clusters of ridership that suggest natural zone boundaries?
03. How do seasonal patterns affect network topology, and where should new stations be placed?
04. What structural differences exist between the Vancouver and Toronto networks?

Methodology

Phase 1

Data Engineering

Ingested 2M+ raw ridership records from the Mobi (Vancouver) and Bike Share Toronto open data APIs. Cleaned and normalized station coordinates, trip durations, and timestamps using Power Query and Python. Built a repeatable ETL pipeline that reduced manual refresh time from 4 hours to under 15 minutes.
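As a rough illustration of the cleaning step, the sketch below parses a tiny inline CSV, normalizes station names, and drops rows with unparseable timestamps or implausibly short durations. The column names, the sample rows, and the 60-second threshold are all assumptions made for this example, not the schema or rules of the actual pipeline.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export; real column names differ between operators.
RAW_CSV = """trip_id,start_time,end_time,start_station,end_station
1,2024-07-01 08:00,2024-07-01 08:12,  Main St & Pine ,Harbour Centre
2,2024-07-01 09:00,2024-07-01 09:00,Main St & Pine,Main St & Pine
3,2024-07-01 10:30,2024-07-01 10:55,harbour centre,Main St & Pine
4,not-a-date,2024-07-01 11:00,Main St & Pine,Harbour Centre
"""

MIN_TRIP_SECONDS = 60  # assumed threshold for discarding false starts
TIME_FORMAT = "%Y-%m-%d %H:%M"


def normalize_station(name):
    """Collapse whitespace and upper-case so name variants share one key."""
    return " ".join(name.split()).upper()


def clean_trips(csv_text):
    """Parse raw rows, normalize station names, and drop invalid trips."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            start = datetime.strptime(row["start_time"], TIME_FORMAT)
            end = datetime.strptime(row["end_time"], TIME_FORMAT)
        except ValueError:
            continue  # skip rows with unparseable timestamps
        duration = (end - start).total_seconds()
        if duration < MIN_TRIP_SECONDS:
            continue  # skip zero-length, negative, or very short trips
        cleaned.append({
            "trip_id": int(row["trip_id"]),
            "start_station": normalize_station(row["start_station"]),
            "end_station": normalize_station(row["end_station"]),
            "duration_s": int(duration),
        })
    return cleaned


trips = clean_trips(RAW_CSV)
for trip in trips:
    print(trip)
```

Keeping the cleaning rules in one function is what makes a refresh repeatable: the same call can run against each new data drop without manual steps.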

Python, Power Query, Excel, SQL

Phase 2

Network Analysis

Constructed directed weighted graphs using NetworkX. Computed centrality metrics (degree, betweenness, PageRank) to identify critical nodes. Applied DBSCAN clustering to detect spatial station groupings, which flagged 15% of stations as noise, indicating poorly connected outliers.
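The graph construction can be sketched as follows, assuming trips have already been aggregated into (origin, destination, trip count) tuples. The station labels and counts are made up for illustration. One detail worth noting: NetworkX's betweenness treats edge weights as distances, so this sketch inverts trip counts to make busy links "short"; that inversion is an assumption of the example, not necessarily the weighting used in the project.

```python
import networkx as nx

# Hypothetical (origin, destination, trip_count) aggregates.
od_counts = [
    ("A", "B", 120), ("B", "A", 90),
    ("A", "C", 60), ("C", "A", 75),
    ("B", "C", 15), ("C", "D", 10), ("D", "C", 5),
]

# Stations become nodes; trip counts are stored under the "weight" attribute.
G = nx.DiGraph()
G.add_weighted_edges_from(od_counts)

# Weighted out-degree: total trips departing from each station.
out_strength = dict(G.out_degree(weight="weight"))

# Betweenness interprets weights as distances, so invert counts first.
for _, _, data in G.edges(data=True):
    data["distance"] = 1.0 / data["weight"]
betweenness = nx.betweenness_centrality(G, weight="distance")

for station in sorted(G.nodes):
    print(station, out_strength[station], round(betweenness[station], 3))
```

PageRank fits the same pattern through `nx.pagerank(G, weight="weight")`, which uses trip counts directly as link strength.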

Python, NetworkX, SciPy, DBSCAN

Phase 3

Visualization & BI

Built interactive Power BI dashboards with custom DAX measures to track station utilization, trip volume by time-of-day, and cluster membership. Geospatial maps overlay network centrality scores on actual city maps for operator use.

Power BI, DAX, R, ggplot2

Phase 4

SQL Analysis

Wrote analytical SQL queries to answer operational questions: top 10 origin-destination pairs, average trip duration by cluster, seasonal demand shifts. Results fed directly into the final presentation and recommendation deck.
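A query of the "top origin-destination pairs" kind might look like the sketch below. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module so that it is self-contained; the project itself lists PostgreSQL, and the table name, column names, and rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (start_station TEXT, end_station TEXT, duration_min REAL)"
)
# Hypothetical sample rows standing in for the cleaned trip table.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("A", "B", 18.0), ("A", "B", 22.0), ("A", "B", 20.0),
        ("B", "A", 25.0), ("B", "A", 15.0),
        ("A", "C", 6.0),
    ],
)

# Rank origin-destination pairs by volume, with average duration alongside.
TOP_PAIRS_SQL = """
SELECT start_station,
       end_station,
       COUNT(*) AS trip_count,
       AVG(duration_min) AS avg_duration_min
FROM trips
GROUP BY start_station, end_station
ORDER BY trip_count DESC, start_station, end_station
LIMIT 10
"""

top_pairs = conn.execute(TOP_PAIRS_SQL).fetchall()
for row in top_pairs:
    print(row)
conn.close()
```

The secondary sort on station names keeps the ranking deterministic when two pairs have the same trip count.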

SQL, PostgreSQL, Excel

Key Results

2M+ Ridership records processed

264 Stations analyzed

4 Network clusters identified

15% Noise stations flagged

#1 Best Presentation Award

2 Cities compared (Vancouver + Toronto)

Key Findings

Three stations in downtown Vancouver account for 28% of all outbound trips — removing any one causes measurable cascade failures in network flow.
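Two simple quantities sit behind a finding like this: the share of outbound trips that start at a set of stations, and the share of trips that touch a given station at either end. The sketch below computes both on made-up trips. The second figure is only a first-order proxy for the impact of removing a station; it does not model rerouting or the cascade effects described above, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical trips as (origin, destination) pairs.
trips = (
    [("H1", "P1")] * 4
    + [("H1", "H2")] * 3
    + [("H2", "P2")] * 2
    + [("P1", "P2")] * 1
)


def outbound_share(trips, stations):
    """Fraction of all trips that depart from any of the given stations."""
    outbound = Counter(origin for origin, _ in trips)
    return sum(outbound[s] for s in stations) / len(trips)


def share_touching(trips, station):
    """Fraction of trips that start or end at the station (first-order removal impact)."""
    touched = sum(1 for origin, dest in trips if station in (origin, dest))
    return touched / len(trips)


print("Outbound share of H1:", outbound_share(trips, ["H1"]))
for station in ("H1", "H2", "P2"):
    print(station, "touches", share_touching(trips, station))
```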

DBSCAN identified 4 stable geographic clusters that align closely with existing city neighborhood boundaries, validating the approach for zone-based rebalancing.
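For readers unfamiliar with DBSCAN on geographic points, here is a minimal sketch using scikit-learn with the haversine metric, which expects coordinates in radians and an `eps` expressed as a fraction of the Earth's radius. The coordinates, the 0.5 km radius, and `min_samples=3` are assumptions chosen for illustration; the parameters used in the project are not stated in this write-up.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088

# Hypothetical station coordinates (latitude, longitude) in degrees.
stations = np.array([
    [49.2800, -123.1200], [49.2805, -123.1210], [49.2795, -123.1195],  # group 1
    [49.2600, -123.1000], [49.2605, -123.1010], [49.2598, -123.0995],  # group 2
    [49.3500, -123.2500],                                              # isolated station
])

eps_km = 0.5      # assumed neighborhood radius
min_samples = 3   # assumed minimum stations (including the point itself) per dense region

model = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,  # convert km to radians on the unit sphere
    min_samples=min_samples,
    metric="haversine",
    algorithm="ball_tree",
)
labels = model.fit_predict(np.radians(stations))

# DBSCAN marks points outside any dense region with the label -1.
n_clusters = len(set(labels) - {-1})
noise_share = float(np.mean(labels == -1))
print("labels:", labels.tolist())
print("clusters:", n_clusters, "noise share:", round(noise_share, 3))
```

Stations labeled -1 are the "noise" stations: they do not have enough neighbors within the chosen radius to belong to any cluster.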

Toronto and Vancouver networks have fundamentally different hub structures: Toronto is polycentric (multiple hubs), Vancouver is monocentric (one dominant hub).
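One way to put a number on "monocentric versus polycentric" is a concentration index over each station's share of outbound trips. The sketch below uses a Herfindahl-Hirschman style index; this metric is an illustration chosen for this example, not the measure the project used, and the counts are invented.

```python
def hub_concentration(outbound_counts):
    """Sum of squared outbound-trip shares: 1/n when even, 1.0 when one hub has everything."""
    total = sum(outbound_counts.values())
    return sum((count / total) ** 2 for count in outbound_counts.values())


# Hypothetical outbound trip counts per station.
monocentric = {"hub": 70, "a": 10, "b": 10, "c": 10}
polycentric = {"hub1": 30, "hub2": 30, "hub3": 30, "a": 10}

print("monocentric:", round(hub_concentration(monocentric), 2))
print("polycentric:", round(hub_concentration(polycentric), 2))
```

A network dominated by one hub scores higher than one whose traffic is spread across several hubs, which gives a simple basis for comparing two cities.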

Seasonal analysis shows a 40% drop in peripheral station usage in winter, suggesting temporary station deactivation as a cost-saving strategy.

Conclusion

Network science provides operators with a proactive lens on system health that operational dashboards alone cannot surface. The clustering and centrality analysis in this project directly supports a zone-based rebalancing strategy that could reduce truck dispatch costs by an estimated 20–30%. The methodology is portable to any city with open bike share data.


View on GitHub → | ▶️ Watch Demo Video