This project is a Node.js-based analytics application that integrates MySQL for database management and Redis for caching, designed to efficiently handle high-traffic scenarios.
It is fully Dockerized for ease of deployment, performance optimization, and scalability, reflecting careful consideration at every architectural layer.
The following sections outline the key decisions I have made, explain the design choices, and demonstrate how these decisions contribute to a robust and optimized system architecture.
You can always launch the project by running:
docker compose up
You can rebuild and recreate the containers by running:
docker compose up --build
To launch the front-end:
cd /src/client-interface
npm run dev
In a separate terminal, run the following from the project root:
node proxy-client/client.js
This launches a demo script that keeps adding new data every second.
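For context, below is a minimal sketch of what such a demo client could look like; the endpoint path, payload fields, and interval are assumptions for illustration and may differ from the actual proxy-client/client.js.

```js
// Hypothetical sketch of a demo client that posts a fake tracking event every second.
// The endpoint path and payload shape are assumptions, not the actual client.js contents.
const API_URL = process.env.TRACKING_API_URL || 'http://localhost:3000/api/track';

function randomEvent() {
  const pages = ['/home', '/pricing', '/blog'];
  return {
    pageUrl: `https://example.com${pages[Math.floor(Math.random() * pages.length)]}`,
    referrerUrl: 'https://google.com',
    userAgent: 'demo-client/1.0',
    timestamp: new Date().toISOString(),
  };
}

setInterval(async () => {
  try {
    const res = await fetch(API_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(randomEvent()),
    });
    console.log(`Sent event, server responded with ${res.status}`);
  } catch (err) {
    console.error('Failed to send event:', err.message);
  }
}, 1000);
```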
A demo HTML page can be accessed here. Make sure the Docker containers are up and running before visiting this page.
The actual JS file containing the tracking code can be found here.
The overall architecture is based on a modular monolithic design, with each component having clear separation of concerns:
- Node.js serves as the primary application layer.
- MySQL provides structured storage for all analytics and session data.
- Redis is used to cache frequently accessed data, reducing the load on MySQL in high-traffic scenarios.
- Each component is containerized using Docker, ensuring that the entire system can be deployed consistently across environments, enhancing portability and reducing the risk of "it works on my machine" issues.
Important:
Redis is not a requirement for this assessment, but I chose to use it in this project for two main reasons:
- To demonstrate my skills with a caching layer, since this topic was discussed during my interview session.
- To demonstrate my proficiency with Docker in a CD setup where the whole deployment and configuration are managed from the same codebase.
- src/client-interface: Frontend code for the analytics dashboard (Nuxt3).
- src/config: Configuration file for the Sequelize ORM.
- src/controllers: Controllers for business logic and handling requests.
- src/cron: Cron job setup for aggregation tasks. I have not implemented it given the scope of the current assessment, but it can be extended to update aggregate tables.
- src/db: Database initialization scripts for MySQL.
- src/middleware: Contains middleware for rate limiting, global error handling, etc. (a minimal sketch of this kind of middleware appears after this list).
- src/models: Contains all the Sequelize ORM models.
- src/redis: Holds the advanced Redis configuration and all the cache-layer implementation code.
- src/routes: API routes for the backend.
- src/tracker: Client-side JavaScript tracker for embedding on websites.
- /proxy-client: A client script to simulate multiple websites sending data to the server.
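As referenced in the src/middleware entry above, here is a minimal, hypothetical sketch of a rate limiter and a global error handler, assuming an Express-style `(req, res, next)` middleware signature; the actual implementation in this repository may differ.

```js
// Hypothetical sketches only -- the real middleware in src/middleware may differ.

// Naive in-memory rate limiter: max N requests per IP per minute.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 100;
const hits = new Map();

function rateLimiter(req, res, next) {
  const now = Date.now();
  const entry = hits.get(req.ip) || { count: 0, windowStart: now };
  if (now - entry.windowStart > WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }
  entry.count += 1;
  hits.set(req.ip, entry);
  if (entry.count > MAX_REQUESTS) {
    return res.status(429).json({ error: 'Too many requests' });
  }
  next();
}

// Global error handler: registered last, identified by its four-argument signature.
function errorHandler(err, req, res, next) {
  console.error(err);
  res.status(err.status || 500).json({ error: err.message || 'Internal server error' });
}

module.exports = { rateLimiter, errorHandler };
```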
Complete database schema can be viewed in DB_SCHEMA.md.
Database initialization and seeding occur automatically when a new container is created. If no disk volume exists, the initialization routine runs and the database is created and seeded automatically. On subsequent restarts of the container, the data is not re-initialized until the container is destroyed along with its disk volume.
- Node.js Application: The central processing unit that handles API requests, processes session data, and communicates with MySQL and Redis.
- Nuxt 3 Frontend Application: The frontend dashboard that displays the various analytics.
- Tracker JS File: A JavaScript file containing the code that runs on client websites, collects metrics, and sends them back to the server.
- MySQL Database: The primary storage for structured data, including user sessions, page views, and other analytics data.
- Redis Cache: Provides a caching layer to store frequently queried data, reducing the load on MySQL.
- Docker Compose: Used to orchestrate the multi-container architecture, ensuring smooth communication between the services.
The decisions for this project were driven by an assumed need for scalability, performance optimization, and maintainability, as if this were a production-ready project.
Here’s a breakdown of the architectural decisions and their justifications:
- Why MySQL?
  Mainly because it was required by the assessment criteria itself, but I would have chosen it anyway as the relational database for its structured data handling, ACID compliance, and extensive community support. This aligns well with the analytics use case, where data integrity and relational querying are critical. Additionally, MySQL supports complex queries and indexing, making it a reliable option for aggregating large volumes of analytics data.
- Performance Considerations:
  I took into account that MySQL's performance can degrade under high load if not properly configured. To mitigate this, I optimized the database configuration (`my.cnf`) to:
  - Increase connection limits (`max_connections`).
  - Adjust the buffer pool size (`innodb_buffer_pool_size`) to efficiently handle high read/write operations.
  - Enable query caching to optimize repeated queries for analytics dashboards.
Disclaimer
Custom configuration is introduced in this project only to demonstrate my ability to work with advanced configuration parameters that might need to be tuned in a live production setup to achieve optimal performance. Such configuration may not have any immediate impact in this limited setup.
- Why Redis?
  Redis was added as a caching layer to offload frequently accessed data from MySQL. The reasoning here is simple: in a high-traffic analytics environment, querying the database for `page_id`, `browser_types`, `demographics`, etc. would require costly joins and extra database calls for every incoming page-analytics data point. By storing this lookup information in the cache layer, I removed many of the indexes that would otherwise have been needed for such read operations, and was therefore able to create highly write-optimized transactional tables. Redis, being an in-memory store, dramatically reduces response times for these requests.
- Design Consideration:
  Redis has been configured to persist data using AOF (Append-Only File), ensuring that the cached data is not lost on container restarts. Additionally, Redis memory usage has been optimized by setting a limit (`maxmemory`) and using the LRU (Least Recently Used) eviction policy to maintain memory efficiency.
- How it works:
  I have implemented a self-contained cache layer as the single source of truth for the lookup data tables. `SessionController` uses helper methods on the `cache` object to check whether a record exists for a particular value. If not, the cache layer automatically checks the database; if a record is found there, it updates the cache and returns the record. If the record is not found in the database either, then for the sake of this assessment no further validation is performed and it is assumed that the data needs to be persisted: a new record is inserted into the database, the cache is updated, and the record is returned to `SessionController`. A minimal sketch of this flow follows.
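The sketch below illustrates the read-through flow described above, assuming an ioredis-style client and Sequelize models; the helper name `getOrCreate`, the key format, and the TTL are assumptions rather than the actual code in src/redis.

```js
// Hypothetical read-through cache helper -- names and signatures are assumptions.
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

// Look up a record by a unique value (e.g. a browser name), falling back to
// the database and finally inserting a new row if nothing is found.
async function getOrCreate(model, field, value) {
  const cacheKey = `${model.name}:${field}:${value}`;

  // 1. Check the cache first.
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. Fall back to the database.
  let record = await model.findOne({ where: { [field]: value } });

  // 3. Not in the database either: persist it (no further validation here).
  if (!record) {
    record = await model.create({ [field]: value });
  }

  // 4. Update the cache (1-hour TTL assumed) and return the record.
  await redis.set(cacheKey, JSON.stringify(record), 'EX', 3600);
  return record.toJSON ? record.toJSON() : record;
}

module.exports = { getOrCreate };
```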
Disclaimer
Custom configuration is introduced in this project only to demonstrate my ability to work with advanced configuration parameters that might need to be tuned in a live production setup to achieve optimal performance. Such configuration may not have any immediate impact in this limited setup.
- Why Modular Monolith?
  A monolithic architecture was chosen for this project given the very limited scope of this assessment, yet each component (API, database interactions, session handling, etc.) is well modularized within the monolith. This design allows for better maintainability and a potential future transition to microservices if the project needs to scale.
- Benefits:
- Simplicity: Easier and much quicker to develop and manage at this stage compared to microservices.
- Performance: Since everything runs in a single process, there is minimal inter-service communication overhead.
- Potential for Future Scaling: The modular design ensures that any of these modules can be extracted into its own service if required.
- Why Docker?
  Docker was chosen to containerize the entire application for consistency across development, the demo, and a possible re-run of the code on your end, should you choose to see it in live action :) Docker ensures that the same versions of Node.js, MySQL, and Redis are used across environments, removing platform-specific issues.
- Why Centralize Configuration?
  Centralizing the configuration in a `.env` file ensures that sensitive credentials (e.g., database passwords) and environment-specific settings (e.g., ports, caching policies) are not hardcoded in the codebase or in the Docker files. This approach also makes it easy to switch configurations between development, staging, and production environments without changing the underlying code.
- Examples of Centralized Configurations (see the sketch below):
  - MySQL: Even though I could have converted the `sql` files into templates and leveraged the database name, user credentials, and connection details defined in `.env`, that would have introduced extra steps at container initialization to process those templates before the database is initialized. That goes beyond the scope of this assessment, so I did not take that route, but it can certainly be implemented.
  - Redis: Port and caching policies are defined centrally for easier adjustment.
  - Application Ports: Easily adjustable via the `.env` file, ensuring flexibility during deployment.
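For illustration, a hypothetical centralized config module that reads these values from `.env` might look like the following; the variable names and the use of dotenv are assumptions, not necessarily what this repository does.

```js
// Hypothetical centralized configuration -- environment variable names are assumptions.
require('dotenv').config();

module.exports = {
  app: {
    port: Number(process.env.APP_PORT) || 3000,
  },
  mysql: {
    host: process.env.MYSQL_HOST || 'mysql',
    port: Number(process.env.MYSQL_PORT) || 3306,
    database: process.env.MYSQL_DATABASE,
    username: process.env.MYSQL_USER,
    password: process.env.MYSQL_PASSWORD,
    dialect: 'mysql',
  },
  redis: {
    host: process.env.REDIS_HOST || 'redis',
    port: Number(process.env.REDIS_PORT) || 6379,
  },
};
```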
- Database Initialization:
  The MySQL initialization scripts are part of the codebase, and the whole process is automated via Docker: the database is initialized when a new container is constructed, and some sample data is seeded for proper visualization in the dashboard.
- Persistence Strategy:
  Both MySQL and Redis can be set up with persistent volumes to ensure that data is not lost when containers are restarted. That is beyond the scope of the current assessment, however, so assume any data will be lost if the containers are restarted.
This file will be minified, and all the assessment-related code will be stripped, to achieve maximum speed without affecting the performance of the host websites.
- `trackingApiUrl`, `trackingData`, `detectBrowser()`, `detectOperatingSystem()`, and `detectDeviceType()` are all clear and descriptive, making the purpose of each variable and function immediately understandable.
- The code prioritizes readability, which is essential in an assessment context, making it easy to follow and maintain.
- The `sendTrackingData()` function uses the Fetch API to send the collected data asynchronously, ensuring that the process does not block the webpage or affect its performance.
- There is built-in fault tolerance: the script keeps retrying until a `200` status code acknowledgement is received from the backend server (a sketch of this behaviour follows this list).
- The script collects key data such as the page URL, referrer URL, user-agent, browser type, operating system, and device type, which are essential for a web tracker.
- It timestamps the event using `new Date().toISOString()` for precise logging of when the event occurred, making the data useful for detailed analytics.
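As referenced above, here is a hedged sketch of what `sendTrackingData()` with this retry behaviour could look like; the retry delay, keepalive flag, endpoint path, and payload fields are assumptions, and the actual tracker in src/tracker may differ.

```js
// Hypothetical sketch of the tracker's send-and-retry logic -- details may differ from src/tracker.
function sendTrackingData(trackingApiUrl, trackingData, retryDelayMs = 2000) {
  fetch(trackingApiUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(trackingData),
    keepalive: true, // assumption: lets the request complete even during page unload
  })
    .then((response) => {
      if (response.status !== 200) {
        // Keep retrying until the backend acknowledges with a 200.
        setTimeout(() => sendTrackingData(trackingApiUrl, trackingData, retryDelayMs), retryDelayMs);
      }
    })
    .catch(() => {
      // Network failure: retry after a short delay.
      setTimeout(() => sendTrackingData(trackingApiUrl, trackingData, retryDelayMs), retryDelayMs);
    });
}

// Example payload, using the fields described above (the detect* helpers live in the tracker itself).
sendTrackingData('https://yourdomain.com/api/track', {
  pageUrl: window.location.href,
  referrerUrl: document.referrer,
  userAgent: navigator.userAgent,
  timestamp: new Date().toISOString(),
});
```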
The following snippet is placed in the client's website to load the tracker asynchronously:
<!-- Place this small snippet at the bottom of the HTML near <body> closing tag. -->
<script>
(function() {
var script = document.createElement('script');
script.src = 'https://yourdomain.com/tracker.js'; // Backend URL serving the tracker
script.async = true; // Non-blocking loading
var firstScript = document.getElementsByTagName('script')[0];
firstScript.parentNode.insertBefore(script, firstScript); // Insert script dynamically
})();
</script>
- This route serves the minified version of the JavaScript tracker when the client webpage requests it (a hypothetical sketch of such a route appears after the example below).
- The script is cached for a long period (`Cache-Control: public, max-age=31536000`) to optimize load times after the first request. This means that the client won't have to re-download the script repeatedly, improving performance.
- The small embed snippet is non-blocking because it loads the script asynchronously using `async = true`.
- The script dynamically creates a `<script>` tag, sets the `src` to your hosted `tracker.js`, and inserts it into the webpage, ensuring minimal impact on page load times.
- We can allow clients to pass parameters (e.g., `siteId`) so that the backend endpoint can send out custom JS tailored to a specific `website` or `client`.
Example:
<script async src="https://yourdomain.com/tracker.js?siteId=12345"></script>
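Below is a hypothetical sketch of such a route, assuming an Express backend; the file path and module layout are illustrative rather than the exact implementation in src/routes.

```js
// Hypothetical sketch of the route that serves the tracker -- assumes Express; paths are illustrative.
const express = require('express');
const path = require('path');

const router = express.Router();

router.get('/tracker.js', (req, res) => {
  // A per-site build could be selected here based on req.query.siteId (see the example above).

  // Long-lived caching so clients do not re-download the script on every page view.
  res.set('Cache-Control', 'public, max-age=31536000');
  res.type('application/javascript');

  // Serve the minified tracker file.
  res.sendFile(path.join(__dirname, '..', 'tracker', 'tracker.min.js'));
});

module.exports = router;
```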
The frontend dashboard is a Nuxt 3 based application. It is an SPA and self-explanatory. All the websites are loaded into a dropdown on page load, and there is a date-range selector. You can switch between different websites and date ranges to get different analytics.
Clicking on a `Date` point in the line graph drills down to show page view counts of individual pages within that website.
To run the frontend application, in the terminal, move to the `/src/client-interface` folder and run:
npm run dev
This project demonstrates a careful balance between performance optimization, scalability, and maintainability. The decision to use Docker ensures a flexible and isolated development environment, while MySQL and Redis were chosen for their performance characteristics in handling large volumes of analytics data.
Each architectural decision—from using Redis for caching to configuring resource limits on Docker containers—was made with a clear focus on demonstrating my ability to tackle advanced concepts and integrate different complex layers into a working system.
This setup is both robust and flexible, providing an excellent foundation for further expansion, such as integrating additional analytics features, moving to a microservices architecture, or scaling the database for even larger datasets.
- Horizontal Scaling: As traffic grows, we may consider introducing load balancers and clustering Redis to handle increased demand.
- Monitoring Integration: Adding a monitoring stack (e.g., Prometheus and Grafana) to visualize system performance and detect bottlenecks early.
- Microservices Migration: If the modular monolith grows in complexity, consider splitting core functionalities into microservices for greater scalability.