Re: Site Freezing and Odd Behavior

Posted: Sat Aug 13, 2022 8:27 pm
by Kent Briggs
escorpius wrote: Sat Aug 13, 2022 7:35 pm We run on AWS -- the monitors for the EC2 instance show no critical stress on the system at the time of the crashes (e.g., 7% of CPU).
The system is spending most of its time just switching between threads, processing packets in and out for every player. That doesn't seem to affect the CPU percentage much, which is why I'm usually more interested in the thread count.

Re: Site Freezing and Odd Behavior

Posted: Sat Aug 13, 2022 11:24 pm
by BlackKnite69
The site locked up last night and had to be reset. It has become so regular that tonight was the first night we had no games for most of the evening. Escorpius is correct: we will have no site unless this is fixed.

Re: Site Freezing and Odd Behavior

Posted: Sun Aug 14, 2022 9:40 am
by Kent Briggs
BlackKnite69 wrote: Sat Aug 13, 2022 11:24 pm The site locked up last night and had to be reset.
What does your Error Log file show at that time? How many players were logged in?

Re: Site Freezing and Odd Behavior

Posted: Sun Aug 14, 2022 9:56 am
by BlackKnite69
We had to reset the site. I did not look. I know there were about 12 players logged in at the time because I was chatting with players in the table chat and watching the game. When it froze, there were duplicate logins for some players, but not all. When I tried to refresh, I could not access the site.

Re: Site Freezing and Odd Behavior

Posted: Sun Aug 14, 2022 10:45 am
by Kent Briggs
BlackKnite69 wrote: Sun Aug 14, 2022 9:56 am We had to reset the site. I did not look.
The error log file will still be in your data folder.

Re: Site Freezing and Odd Behavior

Posted: Mon Aug 15, 2022 2:27 am
by escorpius
Approximately 2 minutes before tonight's freeze:

Code:

2022-08-14 21:26:15|System|Traffic - Seconds: 600, Bytes in: 28737, Bytes out: 3076561, Total: 3105298, Threads: 76, CPU: 0.0%, Memory: 32192 kb
As you can see, no stress from either memory or CPU on the server. Threads are at a level we operate on without issue. The server seems fine, which is also supported by AWS monitoring.
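
For anyone who wants to chart these records rather than eyeball them, here is a minimal Python sketch that splits a Traffic line into fields. The regular expression is inferred only from the single sample line above, and `parse_traffic` and the `bytes_per_sec` field are names made up for illustration, so treat it as an assumption about the format rather than a documented one.

```python
import re
from datetime import datetime

# Pattern inferred from the one sample line in this post (an assumption,
# not a documented log format).
TRAFFIC_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\|System\|Traffic - "
    r"Seconds: (?P<seconds>\d+), Bytes in: (?P<bytes_in>\d+), "
    r"Bytes out: (?P<bytes_out>\d+), Total: (?P<total>\d+), "
    r"Threads: (?P<threads>\d+), CPU: (?P<cpu>[\d.]+)%, "
    r"Memory: (?P<memory_kb>\d+) kb$"
)


def parse_traffic(line):
    """Return a dict of fields for a Traffic record, or None if no match."""
    m = TRAFFIC_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    rec = {k: int(v) for k, v in fields.items() if k not in ("ts", "cpu")}
    rec["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S")
    rec["cpu"] = float(fields["cpu"])
    # Normalize to a per-second rate so records logged at different
    # intervals can be compared directly.
    rec["bytes_per_sec"] = rec["total"] / rec["seconds"]
    return rec


sample = (
    "2022-08-14 21:26:15|System|Traffic - Seconds: 600, Bytes in: 28737, "
    "Bytes out: 3076561, Total: 3105298, Threads: 76, CPU: 0.0%, "
    "Memory: 32192 kb"
)
record = parse_traffic(sample)
print(record["threads"], round(record["bytes_per_sec"], 1))
```

The per-second rate is there so that records stay comparable if the logging interval is ever changed.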

However, the Mavens software hung such that we could not connect to the admin and players could do nothing. We forced the service to close using the Service Manager, restarted it, and everything ran fine thereafter. This is the common pattern.

The ErrorLog shows nothing during this time. Approximately 10 minutes before it began failing, we saw two lines in the Error Log:

Code:

2022-08-14 21:14:07|Invalid reconnect attempt from 172.xx.yy.zz
2022-08-14 21:16:12|Out of sequence packet (8) received from Silence, IP 76.xx.yy.zz
Note that the IP on the first line is for the Windows instance that is running the Mavens server.
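
To make this kind of check repeatable, here is a rough Python sketch that pulls the Error Log entries falling within a window before a freeze. It assumes each line is `timestamp|message`, as in the two lines above. The function name `entries_before`, the 15-minute window, and the 21:28:00 freeze time (roughly two minutes after the 21:26:15 Traffic record) are illustrative assumptions, not values taken from the logs.

```python
from datetime import datetime, timedelta


def entries_before(lines, freeze_time, window_minutes=15):
    """Return (timestamp, message) pairs logged within the window before freeze_time."""
    start = freeze_time - timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        stamp, sep, message = line.strip().partition("|")
        if not sep:
            continue  # no "|" separator, so not a log entry
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # first field is not a timestamp
        if start <= ts <= freeze_time:
            hits.append((ts, message))
    return hits


error_log = [
    "2022-08-14 21:14:07|Invalid reconnect attempt from 172.xx.yy.zz",
    "2022-08-14 21:16:12|Out of sequence packet (8) received from Silence, IP 76.xx.yy.zz",
]
# Assumed freeze time, for illustration only.
freeze = datetime(2022, 8, 14, 21, 28, 0)
recent = entries_before(error_log, freeze)
for ts, message in recent:
    print(ts.time(), message)
```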

Re: Site Freezing and Odd Behavior

Posted: Mon Aug 15, 2022 9:41 am
by Kent Briggs
escorpius wrote: Mon Aug 15, 2022 2:27 am Approximately 2 minutes before tonight's freeze:

Code:

2022-08-14 21:26:15|System|Traffic - Seconds: 600, Bytes in: 28737, Bytes out: 3076561, Total: 3105298, Threads: 76, CPU: 0.0%, Memory: 32192 kb
As you can see, no stress from either memory or CPU on the server. Threads are at a level we operate on without issue. The server seems fine, which is also supported by AWS monitoring.
Would AWS monitoring show a loss of internet connectivity or a DDOS attack, etc? One thing I would suggest is to shorten the Traffic Interval setting (Log Settings group) from 600 seconds to 60 so you get the Traffic record more frequently.

Re: Site Freezing and Odd Behavior

Posted: Mon Aug 15, 2022 3:36 pm
by escorpius
Yes, we run full-stack monitoring. Network monitoring shows no loss of server connectivity and triggered no alarms. Packet counts on the server appear to track the server's traffic closely, including the surge when the outage happened, which the outage itself caused. Even that surge is multiple orders of magnitude below a DDOS level and well within AWS thresholds: less than 1.5x the average packet throughput up to that point. It appears to be clearly an artifact of the outage.
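
For reference, the "less than 1.5x average" comparison can be sketched in Python as the ratio of the latest sample to the mean of the samples before it. The function name `surge_ratio` is hypothetical and the packet counts below are made-up placeholder values, not figures from our monitoring.

```python
from statistics import mean


def surge_ratio(samples):
    """Ratio of the most recent sample to the mean of all earlier samples."""
    if len(samples) < 2:
        raise ValueError("need at least one earlier sample plus the latest one")
    baseline = mean(samples[:-1])
    if baseline == 0:
        raise ValueError("baseline is zero, so the ratio is undefined")
    return samples[-1] / baseline


# Placeholder packet counts per monitoring interval (illustrative only).
packets_per_interval = [1000, 1100, 900, 1000, 1400]
ratio = surge_ratio(packets_per_interval)
is_surge = ratio >= 1.5  # 1.5x is the comparison point used in this post
print(f"{ratio:.2f}x baseline, surge: {is_surge}")
```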

I've lowered the Traffic Interval setting to 60 seconds for the Traffic record. If we have players, we will likely have sample data again.

Re: Site Freezing and Odd Behavior

Posted: Wed Aug 17, 2022 9:56 pm
by setspike
I run my site on AWS as well and am not seeing this behavior. Just curious, what instance type are you using?

Re: Site Freezing and Odd Behavior

Posted: Fri Aug 19, 2022 12:25 am
by escorpius
t3.large
Win Server 2012 R2
64-bit

And you?

We also make extensive use of the API and parse the logs in real time. Are you doing anything similar?