The main activity feed on the Strava dashboard does a great job of keeping athletes up to date on the riding and running activities of other athletes that they’re following, but we thought it might be interesting for athletes to be able to engage more with the social activities of others on Strava: what activities are earning a ton of kudos, or sparking conversation among others being followed.
Enter the Minifeed, an experimental feature we recently introduced. The Minifeed is a widget on the Strava dashboard that displays comments, kudos, and activities involving athletes that you follow, in real time. If you haven’t already given it a try, you can enable it on the X-Feature page.
This blog post will describe some of the technical challenges we faced, decisions we made, and technologies we used to develop this feature on our server infrastructure.
When relevant events (comments, kudos, activities) occur on Strava, we need a system to identify these events, process relevant information, identify interested parties, and then finally deliver data to clients. We’ll cover these steps in roughly the same order that the event data follows.
Kafka is a distributed, durable publish-subscribe messaging system designed for very high throughput. We use Kafka throughout our infrastructure for various event and message processing tasks, including here for publication of application activity. When an athlete gives kudos, writes a comment, or creates an activity, an application server generates a corresponding event and publishes it to the relevant Kafka ‘topic’ (kudos, comment, activity).
Storm is a distributed real time computation system that allows for flexible, scalable processing of streams of data. Storm will handle many cluster management tasks automatically, including restarting and relocating worker processes when failures occur. As a developer, you specify the types and quantities of workers you want, their behavior, and their relationship to each other; Storm handles most of the rest. This makes it very easy for us to develop and deploy systems to process events from Kafka. There are three main important concepts in Storm:
- Spouts are simply sources of data streams that emit tuples.
- Bolts accept tuples from streams as input, and emit other tuples as output.
- Topologies specify networks of spouts and bolts. A topology specifies the spouts in the network, and defines the connections between those spouts and the bolts, as well as between bolts. For each bolt, it indicates which other bolts and spouts it should receive input from, the parallelism of the bolt (how many running instances of the bolt to create) and how that input is distributed, or grouped, across instances of the bolt.
There are several different types of grouping, including:
- shuffle grouping: tuples are distributed to bolts randomly
- all grouping: tuples are sent to *all* bolts
- fields grouping: tuples are sent to individual bolts based on a subset of fields in the tuple such that equal values for that set of fields always go to the same bolt
We use a storm topology to process events: determining what athletes should see each event, populating them with more detail, and finally emitting them to be sent to clients.
First, we have three spouts for handling events from the Strava application servers: a kudos spout, comment spout, and activity spout, all of which are instances of a Kafka spout that emits tuples consumed from a designated Kafka topic.
All three of these spouts then feed into a parse bolt (selected at random with shuffle grouping), which then determines what athletes should see each event in their Minifeed. We determine this based on the follower lists of the athletes involved, excluding those who are blocked by either. Of course, at any given time not all of these followers will be active on the Strava website, so we don’t want to bother doing anything to show events to those who won’t see them anyway. To handle this, we maintain a set of currently active site users (described in more detail below), and determine the intersection of the two sets: all active site users who should be shown the event in question. The parse bolt emits a tuple for each of these users with the user and event information.
Tuples from the parse bolt are then sent via shuffle grouping to a random event maker bolt, which reads the limited information provided with the event (e.g. IDs for athletes involved, activities, comments), and performs database reads to fetch details that we would like to display (e.g. athlete names, comment contents, activity titles). The bolt then emits the annotated result.
Finally, a sink bolt receives these annotated tuples and performs writes to a Redis instance, which is in turn read from when sending updates to clients.
We host a WebSocket server using the Play Framework. Web clients create WebSocket connections to this server when a Strava dashboard page with the Minifeed is loaded, and keep the connection open, listening for events as long as the page stays open in the browser. The server, in turn, reads from Redis and writes events to the WebSocket as they are read.
Before going into more detail on the data maintained by Redis, it’s helpful to understand more detail about how we determine “active” users. As mentioned above, we don’t want to process events for all Strava athletes all the time; it’s only necessary to handle events for those athletes who have the Minifeed enabled, and are active on the website. However, given the live nature of the Minifeed, if we only processed events for those athletes with an active WebSocket connection at any given time, the Minifeed would be empty when it is initially loaded. Instead, we consider any athlete that has connected at any point in the last week to be active, and maintain a list of recent feed items for each of these athletes in addition to handling events in a live fashion as they occur.
There are three different types of data handled by Redis in this scheme:
- Lists of recent feed events, one for each athlete, keyed by athlete ID. When an athlete initially loads the dashboard, before sending any live events, the list of recent feed events is read from Redis and sent to the client. The aforementioned sink bolt writes new events to this list, and keeps it trimmed to the most recent 25 events.
- The set of active subscribers. We maintain this in a global sorted set of athlete IDs with a “last active” timestamp as the score. This “last active” timestamp is updated by the websocket server once an hour as long as the connection is active, and the Storm topology removes entries more than a week old.
- Live events pub/sub: After sending the recent feed events to the client, the WebSocket server subscribes to a channel keyed by athlete ID. The Storm sink bolt publishes events to this channel, and the WebSocket server forwards them to the client as they are received. When the client disconnects, the server unsubscribes.
This is a general end-to-end overview of how the Minifeed works behind the scenes, there are some additional optimizations and bits of polish not detailed here both planned and already completed. As it’s currently an opt-in experimental feature, load on the system is fairly light compared to the rest of Strava infrastructure. Over time, as the number of users with the Minifeed enabled increases, it may be necessary to add additional caching layers, reconfigure the Storm topology to maximize cache hit rates, or reconsider how active subscribers are determined and handled. That said, since launching, it has been trouble-free, in no small part thanks to the convenience of our existing Kafka-based event logging system and the Storm and Play frameworks.