Update on LoiLoNote School Server Overload and Prospects for Improvement 2020/5/1
Since Monday, April 13, there have been delays in our server connection for a few hours every morning. We apologize for the inconvenience and for not being able to provide consistent service.
We would like to report on our analysis and the prospects for improvement.
There are two major causes of this issue.
Rapid increase in access.
A major change in usage.
Cause 1: Rapid Increase in Access
Up until February 2020, our record high was 67 million accesses per day.
However, on April 13 we saw a record-breaking 270 million accesses per day, roughly four times our previous record.
This may be partly due to more schools starting to use our service in April (the beginning of the academic year in Japan), but our analysis indicates that it is largely due to a major change in how our existing customers use LoiLoNote School during the continued school closures of the COVID-19 pandemic.
Cause 2: Major Change in Usage
The server experiences heavy access at specific times
The server load peaks every day at 8:30, 9:45, 10:00, 10:45, 11:00, and 12:00 (JST).
We suspect that many schools set their daily health check and assignment deadlines to these times.
These times are used frequently all over Japan, so we kindly ask that you help us distribute the server load by either avoiding deadlines at these times or splitting students into smaller groups with different submission deadlines. Thank you for your cooperation.
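As a rough illustration of the request above, the following Python sketch splits a roster into groups and gives each group its own deadline, skipping any slot that lands on one of the peak times listed. The peak times come from this update; the function name, group size, step, and sample roster are hypothetical and are not part of LoiLoNote School.

```python
from datetime import datetime, timedelta

# Peak times reported in this update (JST). Everything else is hypothetical.
PEAK_TIMES = {"08:30", "09:45", "10:00", "10:45", "11:00", "12:00"}


def staggered_deadlines(students, base, group_size=40, step_minutes=10):
    """Split students into groups and give each group its own deadline.

    Deadlines start at `base` and advance by `step_minutes`. Any slot that
    lands exactly on a known peak time is skipped.
    """
    schedule = {}
    slot = base
    for start in range(0, len(students), group_size):
        while slot.strftime("%H:%M") in PEAK_TIMES:
            slot += timedelta(minutes=step_minutes)
        schedule[slot.strftime("%H:%M")] = students[start:start + group_size]
        slot += timedelta(minutes=step_minutes)
    return schedule


# Made-up roster of 120 students, first deadline at 9:35.
students = [f"student{i:03d}" for i in range(120)]
plan = staggered_deadlines(students, datetime(2020, 5, 7, 9, 35))
for deadline, group in plan.items():
    print(deadline, len(group))
```

With these sample values the three groups receive deadlines of 09:35, 09:55, and 10:05, so none of them falls on the 9:45 or 10:00 peak.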
Classes by whole grades or whole schools
Thank you for your cooperation in restricting the number of participants in a Class to fewer than 300.
Online classes have removed the usual limit of 40 students per class, and more and more schools are beginning to hold Classes for whole grades. Features such as the "Submission Box" and "Send" were developed for a standard class size of around 40 students, so this change caused unexpected server load.
We also did not expect that some schools would put all 2,000 students in one Class and use LoiLoNote School for school announcements.
Students use LoiLoNote School from home instead of at school
Before this issue, students were using iPads and Chromebooks managed by their schools, but now students are accessing LoiLoNote School from a variety of platforms, such as smartphones and shared family computers.
This variety of platforms is also one reason it took us time to identify the issue.
Why Hasn't This Issue Been Resolved Yet?
We sincerely apologize for the time it's taking to resolve this issue.
LoiLoNote School automatically creates additional servers according to the load and distributes access among them, so it is designed to withstand very heavy access. Currently, about 200 servers are running during the daytime.
However, because it is difficult to distribute a database across multiple servers, we have been addressing this issue by upgrading the database server itself.
Our database couldn't endure the amount of access since April 13th, so we upgraded our database server.
We are currently using a high-end AWS (Amazon Web Services) server.
However, this issue continued despite this upgrade.
Before this issue occurred, the CPU load was around 10%. However, when heavy access is concentrated at specific times, the CPU load suddenly spikes and all connections experience delays.
Although it took a lot of time to identify the issue, through continuous analysis we are beginning to see potential causes.
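The difference between the two tiers described in this section can be sketched in Python as follows. This is only an illustration under assumed numbers: the capacities and function names are hypothetical and do not reflect our actual configuration. The point is that app servers can be added in proportion to the load, while a single database server receives the whole load no matter how many app servers are running.

```python
import math

# Hypothetical capacities, chosen only for illustration.
REQUESTS_PER_APP_SERVER = 500   # requests/second one app server can handle
DATABASE_CAPACITY = 60_000      # requests/second the single database can handle


def app_servers_needed(requests_per_second, minimum=2):
    """Horizontal scaling: add app servers in proportion to the load."""
    return max(minimum, math.ceil(requests_per_second / REQUESTS_PER_APP_SERVER))


def database_utilization(requests_per_second):
    """The database is one server, so it absorbs the whole load by itself."""
    return requests_per_second / DATABASE_CAPACITY


for load in (6_000, 50_000, 100_000):
    print(load, app_servers_needed(load), f"{database_utilization(load):.0%}")
```

In this toy model the app tier keeps up by growing from 12 to 200 servers, but the database goes from 10% to well over 100% utilization, which is why upgrading or relieving the database is the harder part.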
Prospects for Improvement
On April 29th, we finally found some potential causes of this issue. We performed maintenance on May 1st from 0:00 AM to 1:00 AM (JST).
However, we also found that our app version contains code that causes this issue. We are still working to resolve this part.
After the midnight maintenance, the aforementioned issue did not occur on the morning of May 1st.
We are gradually making progress towards fully solving this issue.
We are already installing an APM (Application Performance Management) tool.
We will also strengthen the monitoring of our access logs so that we can respond to changes in usage and increases in server load.
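As a minimal sketch of what this kind of access-log monitoring could look like, the Python example below counts requests per minute and reports minutes that are far busier than a typical minute. The timestamp format, the threshold factor, and the sample data are all assumptions made for illustration. They do not describe our actual logs or tooling.

```python
from collections import Counter
from statistics import median


def busy_minutes(timestamps, factor=3.0):
    """Return minutes whose request count exceeds `factor` times the median.

    `timestamps` are strings like "2020-05-01T08:30:12" (a hypothetical log
    format). Truncating to 16 characters groups them by minute.
    """
    per_minute = Counter(ts[:16] for ts in timestamps)
    if not per_minute:
        return []
    typical = median(per_minute.values())
    return sorted(m for m, n in per_minute.items() if n > factor * typical)


# Made-up sample: quiet minutes with 2 requests each, one spike at 08:30.
sample = (
    [f"2020-05-01T08:{m:02d}:{s:02d}" for m in (27, 28, 29, 31) for s in (5, 35)]
    + [f"2020-05-01T08:30:{s:02d}" for s in range(20)]
)
print(busy_minutes(sample))
```

A check like this, run continuously, would surface new peak times as usage patterns shift rather than after delays are reported.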
You can check the server performance from the link below.