diff --git a/src/pages/case-study.mdx b/src/pages/case-study.mdx index 19f2481..28ab683 100644 --- a/src/pages/case-study.mdx +++ b/src/pages/case-study.mdx @@ -1,3 +1,9 @@ +--- +# Display h2 to h5 headings +toc_min_heading_level: 2 +toc_max_heading_level: 3 +--- + import LatencySlider from '@site/src/components/LatencySlider/LatencySlider' import TodoList from '@site/src/components/TodoList/TodoList' import Tabs from '@theme/Tabs' @@ -35,7 +41,7 @@ import { SnapshotID, } from '@site/src/components/Diagrams/index.js' -# Overview +## Overview Syncosaurus is a React-Javascript developer framework for rapidly building browser-based real-time, collaborative web applications backed by Cloudflare Workers and Durable Objects. @@ -48,9 +54,9 @@ In this case study, we will: - Explain how Syncosaurus was built - Discuss future improvements for Syncosaurus -# Introduction +## Introduction -## Defining Real-Time Collaboration Applications +### Defining Real-Time Collaboration Applications Broadly, real-time collaboration is a term used to describe software or technologies that allow multiple users to [work together on a project simultaneously](https://www.techopedia.com/definition/15608/real-time-collaboration). For our purposes, collaborative applications are browser-based web applications [that enable multiple users to simultaneously edit and maintain a synchronized view of an artifact](https://ably.com/blog/the-rise-of-realtime-collaboration) (e.g. a word document or an online whiteboard). @@ -60,7 +66,7 @@ With the rise of [remote work and distributed teams](https://medium.com/@anupamr While [a broader definition](https://www.microsoft.com/en-us/microsoft-365/business-insights-ideas/resources/real-time-collaboration-what-it-is-and-how-it-helps-your-business) of real-time collaboration applications can include video conferencing and messaging software, those types of applications have different technical requirements and will be excluded from the scope of this case study. 
-## Preface on Architecture +### Preface on Architecture At a fundamental level when multiple users are “collaborating” on the same artifact, they are exchanging data to manipulate the synchronized view of the artifact. Practically, this means that all users who are collaborating on the artifact will need to exchange information back and forth in real-time via the open internet. Although there are numerous architectures for web applications, [they can broadly be categorized ](https://www.geeksforgeeks.org/difference-between-client-server-and-peer-to-peer-network/)into two categories: @@ -77,17 +83,17 @@ At a fundamental level when multiple users are “collaborating” on the same a Though [it is possible](https://www.tag1consulting.com/blog/yjs-webrtc-part-1) to build decentralized real-time collaboration web apps, decentralized architectures are notoriously complex. Instead [many existing real-time collaborative apps use a form of the client-server architecture](https://www.figma.com/blog/how-figmas-multiplayer-technology-works/) due to their comparative [efficiency and ease of maintenance](https://fwx.finance/learn/article/centralized-app-vs-dapp). Therefore, we will assume that any application considering Syncosaurus will be using a form of centralized architecture. -# Building Real-Time Collaborative Applications Is Not Trivial +## Building Real-Time Collaborative Applications Is Not Trivial Initially, building real-time collaborative web apps may not seem different than building other web apps, however, if we break down each word of “real-time collaboration” into more concrete requirements, we quickly realize that is not the case. -## What Does Real-time Entail? +### What Does Real-time Entail? -### Defining Real-time +#### Defining Real-time Due to the laws of physics (e.g. 
the speed of light) and other causes of [network latency](https://aws.amazon.com/what-is/latency/), [true real-time](https://www.merriam-webster.com/dictionary/real%20time) on the internet can never be achieved. Therefore, real-time communication benchmarks for the Internet are often described in milliseconds - response times of up to [100 milliseconds](https://www.pubnub.com/blog/how-fast-is-realtime-human-perception-and-technology/) are usually categorized as real-time. The basis for this benchmark is a study that determined the average human requires [250 milliseconds](https://www.pubnub.com/blog/how-fast-is-realtime-human-perception-and-technology/) to register and process a visual event. -### Importance of Real-time +#### Importance of Real-time The need for real-time depends on the web application and generally apps that mirror in-person interaction or need rapid information sharing are strong candidates, including: @@ -99,7 +105,7 @@ To illustrate this need and the impact latency can have on a user’s experience -### Achieving Real-time Communication +#### Achieving Real-time Communication At the application layer of the internet, HTTP underpins much of the communication, however, it is designed around a request-response cycle pattern which is not necessarily conducive to real-time latency standards given that a roundtrip from client to server is required for all communication. 
Therefore, since the type of real-time collaboration applications we are considering uses a client-server architecture, need [bi-directional communication](https://en.wikibooks.org/wiki/Communication_Networks/Network_Basics) due to sub 100 ms latency requirements, and need [high data integrity](https://ably.com/topic/webrtc-vs-websocket#:~:text=While%20WebSocket%20works%20only%20over,the%20underlying%20reliability%20of%20TCP) to ensure all users see the same view, only a few communication options ([among several](https://rxdb.info/articles/websockets-sse-polling-webrtc-webtransport.html)) merit our consideration: @@ -109,7 +115,7 @@ At the application layer of the internet, HTTP underpins much of the communicati Note that using [streams over HTTP/2](https://getstream.io/blog/communication-protocols/) was also considered due to its bi-directional nature and built-in multiplexing, however, the need to broadcast data to multiple clients in real-time collaborative apps (which we will discuss later) made them unsuitable for our purposes. -#### Long Polling +##### Long Polling Long polling is a technique to emulate server push communications via normal HTTP requests. Long polling works like this: @@ -117,7 +123,7 @@ Long polling is a technique to emulate server push communications via normal HTT Although every browser supports long polling, it has high latency compared to other options such as WebSockets, and a risk of missing messages without extensive code on the client and server. Therefore, long polling [is generally better suited as a fallback option as opposed to a primary means of bi-directional communication](https://ably.com/topic/long-polling#:~:text=HTTP%20long%20polling%20solves%20the,make%20requests%20and%20servers%20respond.). 
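As a rough illustration of the client side of this cycle, the sketch below keeps one request open at a time and immediately re-issues it after each response. The `/poll` endpoint, the `since` query parameter, the `204` "nothing new" status, and the `[{ id, data }]` message shape are all assumptions made for this example, not part of any particular server.

```javascript
// Hypothetical long-polling client. `fetchFn` defaults to the global fetch;
// `maxRounds` exists only so the loop can be bounded in tests and demos.
async function longPoll(url, onMessage, { fetchFn = fetch, maxRounds = Infinity } = {}) {
  let cursor = 0; // id of the last message seen, sent back so none are skipped

  for (let round = 0; round < maxRounds; round++) {
    try {
      // The server holds this request open until it has news or times out.
      const res = await fetchFn(`${url}?since=${cursor}`);

      if (res.status === 200) {
        const messages = await res.json(); // assumed shape: [{ id, data }, ...]
        for (const msg of messages) {
          onMessage(msg);
          cursor = msg.id;
        }
      }
      // Any other status (e.g. 204 on server timeout): fall through and re-poll.
    } catch (err) {
      // Network failure: wait briefly before opening a new request.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
  return cursor;
}
```

Tracking a cursor is one way to address the missed-message risk noted above, at the cost of the extra client and server code the paragraph mentions.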
-#### WebSockets +##### WebSockets [WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket?retiredLocale=de) is an application layer protocol that provides a [full-duplex](https://www.comms-express.com/infozone/article/half-full-duplex/) communication channel over a single, long-lived connection between the client and server. This means that, similar to a phone call, the connection from the client to the server will stay open as long as the network is not interrupted and neither the client nor the server actively terminates it. This open connection enables clients and servers to freely exchange data without the overhead of the HTTP request-response cycle, but because it is built on top of [TCP](https://www.ibm.com/docs/ro/aix/7.1?topic=protocols-transmission-control-protocol), it still has guaranteed in-order message delivery. WebSockets work like this: @@ -127,25 +133,25 @@ Once the handshake is complete, the WebSocket connection is established and both WebSockets are [low latency compared to HTTP](https://ably.com/topic/websockets-pros-cons) and [are widely supported in modern browsers](https://caniuse.com/websockets), though they can be somewhat tricky to scale and maintain in production because the connections [have to remain open and there is a risk of message loss when connections are interrupted](https://ably.com/topic/websockets-pros-cons). -#### WebTransport +##### WebTransport WebTransport is an application layer protocol built on the [QUIC protocol, a more performant alternative to TCP](https://ably.com/blog/can-webtransport-replace-websockets) that still provides the benefits of guaranteed message delivery, with the performance of UDP. The WebTransport process works very similarly to WebSockets; however, the [handshake process](https://www.fastly.com/blog/quic-handshake-tls-compression-certificates-extension-study) is quicker. 
Unlike WebSockets though, WebTransport provides [multiplexing](https://en.wikipedia.org/wiki/Multiplexing), [which can decrease latency](https://www.videosdk.live/blog/websocket-vs-webtransport#what-is-websocket) and reduce [head-of-line blocking](https://en.wikipedia.org/wiki/Head-of-line_blocking) concerns. While the characteristics of WebTransport make it seem like a strong [fit for real-time collaborative](https://www.videosdk.live/blog/websocket-vs-webtransport#what-is-websocket) web apps, the technology has [yet to reach the same level of browser support as previously mentioned options](https://caniuse.com/webtransport). -### Choosing a Technique +#### Choosing a Technique With real-time collaborative apps requiring a communication option with low latency, bi-directional messaging, wide browser support, and minimal data loss, many developers, including us, opt to use WebSockets. As noted, though, WebSockets are not without their concerns, which will be discussed later in the case study. Now that we have a more concrete definition of real-time communication latency requirements (< 100 milliseconds) and the communication options to enable that, we will seek to define collaboration. -## What Does Collaboration Entail? +### What Does Collaboration Entail? As mentioned previously, when multiple clients are “collaborating” on the same artifact, they are exchanging data to manipulate the synchronized view of the artifact. This synchronized view is known as the application state: the data or variables that determine the application’s appearance or behavior, often referred to as the [“document”](https://diginomica.com/brief-history-collaborative-documents) in collaborative applications. To better illustrate the concepts of the current and subsequent sections in our case study, we will use a hypothetical example: a collaborative whiteboard application built using WebSockets. 
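To make the idea of a “document” concrete, here is a minimal sketch of what the whiteboard's application state and a change to it might look like. The shape fields, the `put`/`delete` message kinds, and the `applyChange` helper are hypothetical names chosen for illustration; they are not taken from any specific framework.

```javascript
// Hypothetical whiteboard document: a key-value map of shape id -> properties.
const whiteboard = {
  'shape-1': { type: 'circle', x: 40, y: 60, color: 'blue' },
  'shape-2': { type: 'rect', x: 120, y: 30, color: 'red' },
};

// Applies one change message to a document and returns an updated copy,
// leaving the original untouched.
function applyChange(doc, change) {
  switch (change.kind) {
    case 'put':
      // Merge the new properties into the existing shape (or create it).
      return { ...doc, [change.id]: { ...doc[change.id], ...change.props } };
    case 'delete': {
      const { [change.id]: _removed, ...rest } = doc;
      return rest;
    }
    default:
      return doc; // unknown message kinds are ignored
  }
}

// A change as it might arrive as JSON text over a WebSocket connection.
const incoming = JSON.stringify({ kind: 'put', id: 'shape-1', props: { color: 'green' } });
const next = applyChange(whiteboard, JSON.parse(incoming));
console.log(next['shape-1'].color); // prints "green"
```

Every client that applies the same changes in the same order ends up with the same document, which is the synchronized view the rest of this section is concerned with.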
-### Demonstrating Collaboration +#### Demonstrating Collaboration In our example whiteboard application, all users expect to simultaneously see the same shapes (i.e. a consistent application state) so that they can react and respond accordingly: @@ -153,7 +159,7 @@ In our example whiteboard application, all users expect to simultaneously see th If we assume the app is using a client-server architecture where the clients do not hold any state of their own (i.e. [thin clients](https://www.fortinet.com/resources/cyberglossary/thin-client)), the change-initiating client would need to wait until the change is received and confirmed by the server and then propagated to all clients before seeing the change reflected in its state. -### Consistency vs Latency +#### Consistency vs Latency One may assume that because our application is using WebSockets, the information would travel fast enough to meet real-time latency requirements. However, depending on where the backend server(s) is located (among other factors), [the distance between the client and the server can increase latency](https://www.inkandswitch.com/local-first/#1-no-spinners-your-work-at-your-fingertips). Unfortunately, this latency can create a laggy user experience for the change-initiating client while it waits to communicate with the server (note that this discussion alludes to the inverse relationship between state consistency and latency which is described by the [PACELC theorem](https://www.scylladb.com/glossary/pacelc-theorem/)). @@ -163,7 +169,7 @@ Using our whiteboard example, you can see that the users have a different experi To increase consistency in our application, we will now explore [tools and techniques](https://developer.mozilla.org/en-US/docs/Web/Performance) to help us reduce latency. 
-### Latency Reduction to Local Client-state +#### Latency Reduction to Local Client-state When striving to reduce latency in real-time applications for the change-initiating client, some potential approaches include: @@ -187,7 +193,7 @@ An important implication of implementing optimistic UI is that each client has a However, as we’ll see in the next section, maintaining multiple copies of client state and a server state in the context of shared editing means conflicts can and will arise. -### Inevitability of conflicts +#### Inevitability of conflicts Since the advent of Google Docs, the ability for multiple clients to simultaneously edit a document has [become commonplace](https://diginomica.com/brief-history-collaborative-documents). However, simultaneous editing [invariably leads to shared state conflicts](https://rocicorp.dev/blog/ready-player-two) because multiple clients can make changes to the same part of the state at the same time. Using our whiteboard app as a demonstration let’s say client A decides that they want to change the color of an existing blue shape to green, while client B decides they want to change the color of the same shape to red at the same time: @@ -195,7 +201,7 @@ Since the advent of Google Docs, the ability for multiple clients to simultaneou If these changes occur simultaneously, a conflict clearly occurs and to maintain a consistent, synchronized view across clients (i.e. 
state convergence), there must be some kind of conflict-resolution strategy in place. -### Conflict Resolution Strategies +#### Conflict Resolution Strategies There are [several well-known strategies](https://exaspark.medium.com/top-5-ways-to-implement-real-time-rich-text-editor-ranked-by-complexity-3bc26e3c777f) for resolving conflicts in a distributed state, including: @@ -205,7 +211,7 @@ There are [several well-known strategies](https://exaspark.medium.com/top-5-ways However, before comparing strategies, it is worth reiterating that the initial decision to focus on apps using the client-server architecture affords the option to designate the server as the source of truth and the machine with the sole authority to resolve conflicts to ensure state convergence. With a peer-to-peer architecture, by contrast, the only option to ensure state convergence is to use some kind of [consensus algorithm](https://www.baeldung.com/cs/consensus-algorithms-distributed-systems), which can be complex and difficult to implement. -#### Transactional Conflict Resolution +##### Transactional Conflict Resolution [Transactional conflict resolution](https://rocicorp.dev/blog/ready-player-two), also known as Client-side Prediction and Server Reconciliation [in video games](https://www.gabrielgambetta.com/client-side-prediction-server-reconciliation.html), is a technique to resolve conflicts using the intent of each state change rather than the outcome of each change to ensure that state converges. 
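The difference between intent and outcome can be sketched in a few lines. In this simplified example, clients send named mutations (the intent) instead of the resulting state, the server applies them in the order received, and a client replays its still-pending mutations on top of whatever the server last confirmed. The mutator names and the `serverApply` and `rebase` helpers are hypothetical and only meant to illustrate the idea.

```javascript
// Mutators describe intent ("move by 10") rather than outcome ("x is now 10").
const mutators = {
  setColor(doc, { id, color }) {
    return { ...doc, [id]: { ...doc[id], color } };
  },
  moveBy(doc, { id, dx }) {
    return { ...doc, [id]: { ...doc[id], x: doc[id].x + dx } };
  },
};

// Server: apply mutations in arrival order; the result is authoritative.
function serverApply(doc, mutations) {
  return mutations.reduce((d, m) => mutators[m.name](d, m.args), doc);
}

// Client: replay locally pending mutations on top of the authoritative state.
function rebase(authoritativeDoc, pending) {
  return serverApply(authoritativeDoc, pending);
}

const initial = { s1: { color: 'blue', x: 0 } };

// Clients A and B act concurrently on the same shape.
const fromA = [
  { name: 'setColor', args: { id: 's1', color: 'green' } },
  { name: 'moveBy', args: { id: 's1', dx: 10 } },
];
const fromB = [
  { name: 'setColor', args: { id: 's1', color: 'red' } },
  { name: 'moveBy', args: { id: 's1', dx: 5 } },
];

// The server happens to receive A's mutations first, then B's.
const authoritative = serverApply(initial, [...fromA, ...fromB]);

// Before B's mutations are confirmed, B sees the server state (A only)
// with its own pending mutations replayed on top.
const viewOfB = rebase(serverApply(initial, fromA), fromB);

console.log(authoritative.s1); // { color: 'red', x: 15 }
console.log(viewOfB.s1);       // { color: 'red', x: 15 }
```

Because both moves are replayed as intents, neither is lost, while the conflicting color change is settled by the order in which the server received the mutations.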
@@ -231,7 +237,7 @@ However, depending on the application, it is possible to build in any custom con While this strategy may not be best suited for use cases where LWW is not the desired default resolution strategy or for decentralized architectures, it is highly flexible and easy to reason about. -#### Conflict-Free Replicated Data Types (CRDTs) +##### Conflict-Free Replicated Data Types (CRDTs) A CRDT is a technique to solve conflicts using [a complex data structure that relies on the mathematical properties of commutativity](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) (i.e. states can be merged in any order and the same result will be produced) to ensure that conflicts are resolved and states converge. There are CRDTs designed for different use cases, but broadly, [there are two categories](https://jakelazaroff.com/words/an-interactive-intro-to-crdts/): @@ -246,7 +252,7 @@ Generally, although CRDTs [can be used in server authority models](https://exasp However, CRDTs require a lot of [computational and bandwidth overhead](https://www.figma.com/blog/how-figmas-multiplayer-technology-works/) and are [overkill for some use cases where they are applied](https://rocicorp.dev/blog/ready-player-two). -#### Operation Transformation (OT) +##### Operation Transformation (OT) OT is a technique to solve conflicts using [a set of algorithms that compare concurrent operations]() and detect if the operations will allow the state to converge. If not, the operations are modified (or transformed) to resolve conflicts before being applied to the state: @@ -256,34 +262,34 @@ Much like CRDTs, OT is known to be complex to reason about and can [theoreticall Unlike CRDTs, however, OT [is known to have less overhead](https://news.ycombinator.com/item?id=23806285#:~:text=You%20can%20approximately%20view%20it,store%20a%20lot%20of%20metadata.). 
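To give a feel for what “transforming” an operation means, the sketch below handles the simplest possible case: two concurrent text insertions into the same string. It is a toy example under stated assumptions (insert-only operations, a hypothetical `site` id used to break ties) and is far from a complete OT implementation, which must also cover deletions and longer operation histories.

```javascript
// An operation inserts `text` at character offset `pos`.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Transforms `op` so that it can be applied after the concurrent `other`
// has already been applied. Ties at the same position are broken by site id
// so that every replica makes the same choice.
function transformInsert(op, other) {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = 'helo';
const a = { pos: 3, text: 'l', site: 'A' }; // on its own: "hello"
const b = { pos: 4, text: '!', site: 'B' }; // on its own: "helo!"

// Replica 1 applies a, then b transformed against a.
const viaA = applyInsert(applyInsert(base, a), transformInsert(b, a));
// Replica 2 applies b, then a transformed against b.
const viaB = applyInsert(applyInsert(base, b), transformInsert(a, b));

console.log(viaA, viaB); // prints "hello! hello!"
```

Both replicas converge even though they applied the operations in different orders, which is the property OT is designed to guarantee.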
-### Choosing a Strategy +#### Choosing a Strategy As illustrated, no one-size-fits-all solution for conflict resolution exists because it depends on developer preference, application requirements, and the application’s architecture. However, because transactional conflict resolution meshes well with the client-server model, is the most flexible (it can even be used in tandem with CRDTs), and gives control back to the developer to resolve any conflict how they see fit, it is a natural fit for many real-time collaborative applications. -## Real-time Collaboration in an Application +### Real-time Collaboration in an Application As demonstrated WebSockets and transactional conflict resolution are a strong fit for many real-time, collaborative applications. However, there are additional considerations if we were to build a production-ready collaborative application. -### Rooms +#### Rooms Returning to our whiteboard example, once clients have sent a request to connect to the whiteboard, we’d need a way to maintain their WebSocket connection, handle incoming updates, and then broadcast those updates to each client's WebSocket connections. Multiple collaborative application framework providers ([Reflect](https://hello.reflect.net/concepts/rooms) and [Liveblocks](https://liveblocks.io/docs/concepts/how-liveblocks-works#Rooms)) refer to this concept of a group of active WebSocket connections and a document as a “Room.” It is also the term Syncosaurus adopted in its terminology and we will be using it throughout the rest of this case study. -### Client Routing +#### Client Routing Let’s also assume we want to extend our application to allow a user to create multiple whiteboards and collaborate on each one with a unique set of other users. To achieve this, we’d need a routing mechanism to ensure clients are connected to the correct room and can initially load the correct document. 
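Rooms and client routing can be sketched together in a few lines. In this simplified example a room holds one document and a set of connections, and a lookup function routes a client to the right room by its ID, creating the room on first use. The `Room` class, `routeToRoom` function, and message shapes are hypothetical; a connection is assumed to be any object with a `send(string)` method, such as a WebSocket.

```javascript
// Hypothetical Room: one document plus the live connections collaborating on it.
class Room {
  constructor(id) {
    this.id = id;
    this.doc = {};                // shared document for this room
    this.connections = new Set(); // objects exposing send(), e.g. WebSockets
  }

  join(conn) {
    this.connections.add(conn);
    // Send the current document so the new client can render it immediately.
    conn.send(JSON.stringify({ type: 'init', doc: this.doc }));
  }

  leave(conn) {
    this.connections.delete(conn);
  }

  update(key, value) {
    this.doc[key] = value;
    const msg = JSON.stringify({ type: 'update', key, value });
    // Broadcast only to the connections in this room.
    for (const conn of this.connections) conn.send(msg);
  }
}

// Routing: look up (or lazily create) the room for a given room ID.
const rooms = new Map();
function routeToRoom(roomID) {
  if (!rooms.has(roomID)) rooms.set(roomID, new Room(roomID));
  return rooms.get(roomID);
}
```

A server's connection handler would call `routeToRoom(roomID).join(socket)` when a client connects and `leave(socket)` when it disconnects, so updates to one whiteboard never reach clients working on another.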
-### Document Storage +#### Document Storage Finally, let’s assume we want to extend our whiteboard application to allow clients to collaborate on the same document across multiple sessions. To achieve this, we’d need a storage mechanism to enable the persistent storage and retrieval of documents upon room connection. -# Solutions +## Solutions Now that we understand some of the considerations of building real-time collaborative applications, let’s imagine we were to follow through with the implementation of our whiteboard application. Broadly, we’d have two options: - DIY, which means provisioning infrastructure and building everything from scratch and/or integrating existing tools - Commercial, which means using a comprehensive framework that addresses infrastructure, state syncing, and conflict resolution for real-time collaborative applications -## DIY +### DIY From a simplified perspective, implementing a DIY solution will consist of the following steps: @@ -299,84 +305,25 @@ From a simplified perspective, implementing a DIY solution will consist of the f As you can see, building from scratch is quite an undertaking, even if you take advantage of existing tools. However, this approach does offer greater control and customization over the entire system. -## Commercial +### Commercial Some of the most notable commercial solutions are [Liveblocks](https://liveblocks.io/) and [Reflect](https://reflect.net/), which are primarily differentiated by the conflict resolution model each one employs. These commercial solutions offer convenience by abstracting much of the discussed complexity of a DIY solution by automatically deploying the infrastructure and handling the backend logic while exposing a client-side SDK to developers. 
However, this convenience comes with a price - these solutions each use a monthly [freemium](https://www.investopedia.com/terms/f/freemium.asp#:~:text=our%20editorial%20policies-,What%20Is%20Freemium%3F,for%20supplemental%20or%20advanced%20features.) pricing model. Liveblocks paid tiers start at $99 / month and Reflect paid tiers start at $30 / month as of the time of writing this. Additionally, using a commercial solution limits the amount of control and customization the developer has over the backend of their application. -## Where does Syncosaurus fit in? +### Where Does Syncosaurus Fit In? Syncosaurus is a React Javascript client-side SDK with full ready-to-be-deployed backend functionality. It is best suited for developers of small-to-medium-sized applications who want to rapidly develop and ship real-time, collaborative features in their applications. Similar to commercial solutions like Liveblocks and Reflect, Syncosaurus exposes a client-side SDK while abstracting the backend logic and handling much of the deployment - the only thing a developer has to do to get the backend deployed is sign up for a Cloudflare account and use our CLI to deploy. However, unlike the commercial solutions, the Syncosaurus framework is free to use and open source so a developer can alter the default backend code if they choose. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
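| | DIY | Liveblocks | Reflect | Syncosaurus |
| --- | --- | --- | --- | --- |
| Open-source | | | | |
| Managed Service | | | | |
| Easy to Deploy | | | | |
| Conflict Resolution Mechanism | Any | CRDTs | Transactional Conflict Resolution | Transactional Conflict Resolution |
| Authentication Support | | | | |
| Offline Client Support | | | | |
| Analytics | | | | |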
- -# Using Syncosaurus + + +## Using Syncosaurus To better understand Syncosaurus we will explore the syncing model and then walk through how a developer might use the tool. -## Syncing Model +### Syncing Model As mentioned, Syncosaurus uses a real-time syncing model with transactional conflict resolution to keep state consistent across multiple clients: @@ -396,11 +343,11 @@ As mentioned, Syncosaurus uses a real-time syncing model with transactional conf - When more than one client is connected to a given room, any mutations from one client are broadcast to all other clients by the server. Because an update from the server is authoritative, all client states are guaranteed to converge. -## Development +### Development Now that we understand the fundamental syncing model of Syncosaurus, we will discuss how to use the framework. -### CLI Setup +#### CLI Setup The first step to use Syncosaurus is to install the CLI tool by running `npm install -g syncosaurus-cli` @@ -408,7 +355,7 @@ Next one can choose to add Syncosaurus to an existing React project using `npx s [TODO insert nice CLI graphic / gif] -### Create and launch +#### Create and Launch To use Syncosaurus in a React application’s code, one should import and initialize Syncosaurus: @@ -426,7 +373,7 @@ Next define the logic to create and/or join a room by passing in a RoomID to the synco.launch(roomID); ``` -### Model Data Shape +#### Model Data Shape Next, one should model the data shape that backs their application. Though the data shape of the key-value data store (i.e. document) is not strictly enforced, modeling it upfront allows a developer to better reason about their application logic @@ -447,7 +394,7 @@ The shape for our to-do list could look like this in JavaScript: } ``` -### Define Mutators +#### Define Mutators Next, we need to define the “write” logic for the application using mutators. 
Mutators are javascript developer-defined functions that contain the logic to update and manipulate the shared state based on user events in your application. @@ -461,7 +408,7 @@ function addTodo(tx, { id, text }) { } ``` -### Define Subscriptions +#### Define Subscriptions Finally, to render the shared state on the client, we need to read the data from our local key-value store and determine how to display it using subscriptions. A subscription is implemented in the client code using a custom React hook called `useSubscribe` that reads data from the local store and re-renders components when updates to the value(s) for a specific key or set of keys (known as a watchlist) in the local storage occur. @@ -480,7 +427,7 @@ A subscription to get all todos could look like this: ); ``` -### Putting it all together +#### Putting It All Together We have prepared a simple todo list application to illustrate the mutator and client code and the behavior we expect. @@ -577,7 +524,7 @@ Go ahead and enter some todos into Client 1 and watch them appear on Client 2 af -## Deployment +### Deployment After you’re satisfied with the local version of the application, it can be deployed to Cloudflare by running `npx syncosaurus deploy`. Once the CLI completes deploying, simply update the `server` URL provided to your Syncosaurus object at instantiation. Note the switch from the `ws` to the `wss` scheme. @@ -589,7 +536,7 @@ const synco = new Syncosaurus({ }); ``` -## Monitoring +### Monitoring After your application is deployed, you can monitor its usage and get help debugging via the analytics dashboard and a live tail log. 
@@ -602,11 +549,11 @@ The dashboard includes hourly time-series metrics related to errors and usage fo For even more detail, a tail logging session for your deployed backend can be started by running `npx syncosaurus tail` -# Building Syncosaurus +## Building Syncosaurus Now that we understand Syncosaurus at a high level, we will discuss how we built it, starting with a brief recap on requirements, then a discussion on architecture, followed by a discussion on decisions and tradeoffs made while building Syncosaurus. -## Requirements +### Requirements Before we can discuss the architecture of Syncosaurus, it’s important to summarize the requirements we’ve mentioned so far. Since we are building Syncosaurus to support production-ready applications, at a minimum it should support: @@ -616,9 +563,9 @@ Before we can discuss the architecture of Syncosaurus, it’s important to summa -## Architecture +### Architecture -### Server vs Serverless +#### Server vs Serverless There are many different models of cloud computing [ranging from infrastructure-as-a-service to serverless](https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-cloud-computing) and each one offers varying degrees of infrastructure upkeep and flexibility. While server-based models (e.g. virtual private servers) offer greater control over the infrastructure, when choosing a model for Syncosaurus we quickly narrowed in on a [serverless model](https://martinfowler.com/articles/serverless.html#WhatIsServerless) (i.e. 
[functions-as-a-service](https://www.redhat.com/en/topics/cloud-native-apps/what-is-faas)) due to its alignment with our use case of supporting the rapid development and release of real-time features in small-to-medium sized applications, including: @@ -628,13 +575,13 @@ There are many different models of cloud computing [ranging from infrastructure- Serverless is not without its tradeoffs, though, including [cold start issues and challenges around testing and debugging](https://www.cloudflare.com/learning/serverless/why-use-serverless/). -### Traditional Cloud vs Edge +#### Traditional Cloud vs Edge Several serverless model cloud providers offer both traditional and edge-based options (e.g. AWS Lambda vs AWS Lambda@Edge). Traditional cloud computing connotes remote servers in managed data centers that are not necessarily close to their clients, which can increase latency. Edge computing, by contrast, means servers and data storage are positioned “as close as possible” to the source of data with the intent of reducing latency. For Syncosaurus, we ultimately elected to deploy on the edge and accept tradeoffs around [added complexity and reduced robustness](https://www.kio.tech/en-us/blog/data-center/advantages-and-disadvantages-of-edge-computing) since reducing latency in real-time applications is paramount. -### Cloudflare Workers and Durable Objects +#### Cloudflare Workers and Durable Objects When choosing a provider to build on, we narrowed in on a couple of options. @@ -659,11 +606,11 @@ However, it should also be noted that DOs have limitations, most notably that th As we’ve defined, real-time collaborative applications should emulate at most a board meeting (~15 WebSocket connections), not a town hall (~100 WebSocket connections). Therefore, scaling concerns around the number of concurrent WebSocket connections and the number of requests per second were minor with the optimizations put in place, as we’ll discuss shortly. 
-## Engineering Challenges +### Engineering Challenges When engineering Syncosaurus, we had to be thoughtful about the efficiency of the syncing system and ensure that a developer using the framework would have a pleasant experience. In this section, we will discuss some of the considerations that went into those engineering decisions. -### Challenge: Authoritative Update Size +#### Challenge: Authoritative Update Size A naive implementation of authoritative updates sent from the server to clients would be to broadcast the entire state from the server to all clients when each change occurs. @@ -680,7 +627,7 @@ An alternative approach to prevent this is to send an update that only contains We ultimately decided on the delta update approach for the efficiency gains; however, this decision introduced the risk that a missed update by a client could lead to a divergent state. -### Challenge: Missed Authoritative Updates +#### Challenge: Missed Authoritative Updates As mentioned, a risk of using delta updates is the heightened potential for missed messages and divergent state. This is because [when a WebSocket connection temporarily drops and reconnects, missed updates can occur in the interim](https://socket.io/docs/v4/connection-state-recovery). Therefore, we needed to implement a connection state recovery mechanism that would bring the client back up to date in this scenario. @@ -690,14 +637,14 @@ To do so, we implemented an incrementing `batchID` which is sent by the DO as a Note that sending only the missed delta updates could reduce the state recovery message size and latency; however, the DO currently does not keep a log of the updates it broadcasts. This is an area for future investigation since it would likely create greater memory and storage demands on the DO. -### Challenge: Authoritative Update Frequency +#### Challenge: Authoritative Update Frequency Another area of consideration was the trigger mechanism of authoritative updates sent by the DO. 
Generally, we considered two approaches: - Event-driven model - Time-driven model -#### Event-driven model +##### Event-driven model Using an event-driven model means that for every message sent by a single client to the DO, M update messages are broadcast by the DO, where M represents the number of connected clients. While this approach requires simple logic, in production, it can quickly present a scaling concern. @@ -707,7 +654,7 @@ If we assume all clients are sending messages simultaneously and we are targetin According to [testing](https://news.ycombinator.com/item?id=37942258) done by Reflect, the target budget of a DO should be approximately 2,000 calls per second to `WebSocket.send` (a call to transmit a message over a WebSocket connection). This means that if an app wants to target 60 FPS (a standard [in gaming](https://www.100ms.live/blog/frame-rate#)), it can only have roughly 6 concurrent users (2,000 calls/second ÷ (6 users)^2 ≈ 56 messages/second per user). While a 6-user limit is suitable for some applications, it does shrink the pool of applications Syncosaurus could support. -#### Time-driven model +##### Time-driven model A time-driven model means that the DO would group state changes into a single update and send out a periodic message based on a configurable time frequency. @@ -719,7 +666,7 @@ The tradeoff with this approach is that a message will be sent whether an update -### Challenge: Re-rendering and Subscriptions +#### Challenge: Re-rendering and Subscriptions An additional consideration was reducing unnecessary client-side rendering in React and making our subscription system robust. A naive implementation of subscriptions would re-render the entire UI every time an update from the server was received. 
However, this could lead to [performance bottlenecks](https://medium.com/@nadeem.ahmad.na/react-bottlenecks-uncovered-a-comprehensive-guide-to-solving-common-performance-issues-in-your-app-90a3f98b3669) and does not take advantage of React’s model to re-render modular components individually when an update to the state they are concerned with occurs. @@ -729,27 +676,27 @@ To make subscriptions fit React’s granular rendering model, we decided to allo Though there may be additional memory overhead with memoizing query results and maintaining a key watchlist, eliminating unnecessary UI re-renders and making subscriptions more robust leads to a snappier UI and gives developers more control over the rendering of their application, respectively. -### Challenge: Ease of use +#### Challenge: Ease of use The last area of consideration was how to make Syncosaurus easier to use by other developers. After researching several solutions in the space and putting ourselves in a developer’s shoes, we decided to implement a couple of additional features: - Optional authentication support for room access using JWTs - An Analytics Dashboard to monitor and debug rooms -### Authentication +##### Authentication The Syncosaurus framework supports token-based authentication that allows a developer to enforce proper room access if they choose. JWTs are commonly utilized for this, but other types of token-based authentication like OAuth can be used as well. To implement authentication, a developer must provide two components to work with our authentication handler: - A token(s) or preferably a library or service that can generate new tokens - An authentication service that verifies the validity of tokens -#### Analytics Dashboard +##### Analytics Dashboard As demonstrated, Syncosaurus includes an analytics tool to easily view and analyze aggregate and single-room metrics for an application.
The dashboard allows a developer to gain insights into usage and debug their application if necessary. The analytics dashboard takes data from Cloudflare’s endpoints and visualizes it in a locally running front-end application that pulls from a custom-built GraphQL backend: -# Future of Syncosaurus +## Future of Syncosaurus And that's Syncosaurus! While it fully supports real-time collaborative applications like the puzzle shown on our landing page and beyond, there is still much room for improvement and feature parity with existing solutions. A few areas we plan to investigate next are: diff --git a/static/img/screenshots/comparison_table.png b/static/img/screenshots/comparison_table.png new file mode 100644 index 0000000..e9ecdb5 Binary files /dev/null and b/static/img/screenshots/comparison_table.png differ