Nan An
Data Scientist, Software Developer, Mathematician, Physicist
The interactive globe displays data gathered from my analytics system. The system also stalls malicious requests.
© Nan An 2025 | Sapiens dominabitur astris
I wanted to build an efficient, scalable analytics system to track user activity on my website. Since I have a Cloudflare free-tier account, the following resources are available: Workers, KV caches, and D1 databases. Additionally, the system should serve API endpoints for a real-time 3D visualization of visitor locations. I designed the following specifications.
The tracking mechanism links sessions by IP address and session ID. The session IDs are signed so that clients cannot tamper with them. A single user can still generate multiple identities by changing IP address and clearing browsing data, but in general this gives a reasonable approximation of user activity.
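As a rough illustration of how a signed session ID can be issued and checked, here is a minimal sketch. The token layout "<sessionId>.<signature>", the function names, and the injected Mac function are all assumptions made for this example and not the site's actual implementation. In a real Worker the Mac would typically be an HMAC computed with a server-side secret, and its output is assumed to contain no dots (for example hex).

```typescript
// Hypothetical: any keyed MAC, e.g. HMAC-SHA-256 with a server-side secret.
// Its output is assumed to contain no "." characters (hex, for instance).
type Mac = (message: string) => string;

// Compare two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Assumed token layout for this sketch: "<sessionId>.<signature>".
function signSessionId(sessionId: string, mac: Mac): string {
  return `${sessionId}.${mac(sessionId)}`;
}

// Returns the session ID when the signature matches, otherwise null.
function verifySessionToken(token: string, mac: Mac): string | null {
  const dot = token.lastIndexOf(".");
  if (dot <= 0) return null;
  const sessionId = token.slice(0, dot);
  const signature = token.slice(dot + 1);
  return constantTimeEqual(signature, mac(sessionId)) ? sessionId : null;
}
```

A token whose ID part has been edited no longer matches its signature, so verifySessionToken returns null and the request can be treated as a new or invalid session.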
The system follows a pipeline structure to optimize performance. The frontend logs user activity and sends data to Worker A, which writes to a KV cache. To prevent excessive writes, Worker B (running on a cron job) periodically batches and writes data to the SQL database. Requests follow these general pathways:
Writes are cached as key-value pairs of the form W-[ID]-[No]: Value and remain valid for one hour. The [No] suffix is a sequence number that keeps multiple writes for the same ID from overwriting each other. Worker B periodically collects these cached writes and batches them into the database.
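The key scheme and the batching step can be sketched as follows. MemoryKV is an in-memory stand-in written for this example, not the Cloudflare KV API, and it does not simulate expiry; with a real KV namespace the one-hour lifetime would be set through the expirationTtl option of put. The helper names writeKey, parseWriteKey, and collectBatch are hypothetical.

```typescript
// In-memory stand-in for a KV namespace (assumption: no TTL handling here).
class MemoryKV {
  private store = new Map<string, string>();
  put(key: string, value: string): void { this.store.set(key, value); }
  get(key: string): string | null { return this.store.get(key) ?? null; }
  delete(key: string): void { this.store.delete(key); }
  list(prefix: string): string[] {
    return Array.from(this.store.keys()).filter((k) => k.indexOf(prefix) === 0);
  }
}

type ParsedKey = { id: string; no: number };

// Build a write-cache key of the form W-[ID]-[No].
function writeKey(id: string, no: number): string {
  return `W-${id}-${no}`;
}

// Split a write-cache key back into its ID and sequence number.
function parseWriteKey(key: string): ParsedKey | null {
  const match = /^W-(.+)-(\d+)$/.exec(key);
  return match ? { id: match[1], no: Number(match[2]) } : null;
}

// What Worker B might do on each cron tick: gather cached writes,
// group them by ID in [No] order, and remove them from the cache.
function collectBatch(kv: MemoryKV): Map<string, string[]> {
  const entries = kv.list("W-")
    .map((key) => ({ key, parsed: parseWriteKey(key) }))
    .filter((e): e is { key: string; parsed: ParsedKey } => e.parsed !== null)
    .sort((a, b) => a.parsed.no - b.parsed.no);

  const batch = new Map<string, string[]>();
  for (const { key, parsed } of entries) {
    const value = kv.get(key);
    if (value === null) continue;
    const values = batch.get(parsed.id) ?? [];
    values.push(value);
    batch.set(parsed.id, values);
    kv.delete(key);
  }
  return batch;
}
```

The returned map holds one ordered list of values per ID, which could then be written to the database in a single batched statement instead of one write per event.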
Reads are cached as key-value pairs of the form R-[ID]: Value, since only the most recent read needs to be retained. These entries expire after five minutes.
When Worker A attempts to read data, there are three potential sources: the write-cache, the read-cache, and the database. It first checks the write-cache for the most up-to-date data. If no entry is found, it falls back to the read-cache, and as a last resort, queries the database. This approach means that some retrieved data may be up to five minutes outdated, but this is an acceptable trade-off for improved performance.
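The lookup order described above can be sketched like this. The Sources interface and its method names are hypothetical stand-ins for the KV and D1 calls, kept synchronous for brevity. Repopulating the read-cache after a database hit is an assumption of this sketch; the text only states that reads are cached.

```typescript
// Hypothetical accessors standing in for KV and D1; each returns null on a miss.
interface Sources {
  latestWrite(id: string): string | null;     // newest W-[ID]-[No] entry
  cachedRead(id: string): string | null;      // R-[ID] entry
  queryDatabase(id: string): string | null;   // SQL lookup, the slowest path
  storeRead(id: string, value: string): void; // refresh R-[ID]
}

type Origin = "write-cache" | "read-cache" | "database";

function readRecord(
  id: string,
  src: Sources
): { value: string; origin: Origin } | null {
  // 1. The write-cache holds the most up-to-date data.
  const written = src.latestWrite(id);
  if (written !== null) return { value: written, origin: "write-cache" };

  // 2. The read-cache may lag by up to its five-minute lifetime.
  const cached = src.cachedRead(id);
  if (cached !== null) return { value: cached, origin: "read-cache" };

  // 3. Last resort: query the database.
  const stored = src.queryDatabase(id);
  if (stored === null) return null;
  src.storeRead(id, stored); // assumption: database hits refill the read-cache
  return { value: stored, origin: "database" };
}
```

Returning the origin alongside the value is only for illustration; it makes it easy to see which tier answered a given request.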
Thanks to Cloudflare’s generous free-tier offerings, all the resources used in this system are free.
Email: ann5@mcmaster.ca (Institution), admin@annan.eu.org (Website)
I’m a Data Scientist and Software Developer pursuing an M.Sc. in Computer Science at McMaster University. My research focuses on software modeling, and I have experience in machine learning, AI, and data-driven problem-solving. I’ve worked on large-scale data projects at the Ministry of Health and Front Row Ventures. I also contribute to open-source projects like Neo4j.
Coming from a background in mathematics and physics, I love optimization and thought experiments. I am an outdoors person and I love nature:
I find the connection between topological and algebraic structures beautiful.