Software versions:
PHP 8.1.5 (cli) mysql Ver 8.0.29-0ubuntu0.20.04.3 for Linux on x86_64 ((Ubuntu))
After migrating our database to the new server and new software I noticed strange behaviour which could be shown with this simplified code snippet below:
<?php $host = '<remote host (external IP)>'; $db = '<db name>'; $user = '<user>'; $pass = '<password>'; $charset = 'utf8mb4'; $dsn = "mysql:host=$host;dbname=$db;charset=$charset"; $options = [ PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC, PDO::ATTR_EMULATE_PREPARES => false, PDO::ATTR_TIMEOUT => 5 ]; try { //executing first statement, no problem echo 'CREATING PDO OBJECT'.PHP_EOL; $pdo = new PDO($dsn, $user, $pass, $options); echo 'PDO OBJECT CREATED, PREPARING STATEMENT'.PHP_EOL; $stmt = $pdo->prepare('SELECT * FROM someObject WHERE objectID = 1'); echo 'STATEMENT PREPARED, EXECUTING STATEMENT'.PHP_EOL; $stmt->execute(); echo 'STATEMENT EXECUTED, FETCHING RESULT: '; $result = $stmt->fetch(); echo count($result).PHP_EOL; $sleep = 258; echo 'SLEEP: '.$sleep.PHP_EOL; sleep($sleep); echo 'WOKE UP'.PHP_EOL; //executing second statement after sleep, hangs echo 'PREPARING STATEMENT'.PHP_EOL; $stmt = $pdo->prepare('SELECT * FROM someObject WHERE objectID = 2'); //hangs here echo 'STATEMENT PREPARED, EXECUTING STATEMENT'.PHP_EOL; $stmt->execute(); echo 'STATEMENT EXECUTED, FETCHING RESULT: '; $result = $stmt->fetch(); echo count($result).PHP_EOL; } catch (PDOException $e) { throw new PDOException($e->getMessage(), (int)$e->getCode()); }
This script output is:
CREATING PDO OBJECT PDO OBJECT CREATED, PREPARING STATEMENT STATEMENT PREPARED, EXECUTING STATEMENT STATEMENT EXECUTED, FETCHING RESULT: 20 SLEEP: 258 WOKE UP PREPARING STATEMENT
After this, the process becomes stale and does nothing, PREPARING STATEMENT is the last sentence I see in the output until max execution time is reached – then the process is dropped by PHP. It happens always if logic reaches 258 seconds of database inactivity.
If I reduce sleep time to 257 seconds – it always works, I see a second result, and the script finishes successfully. For me, it looks like there is some parameter that blocks connection after 257 seconds of inactivity. On the MySQL side, I see following
SHOW FULL PROCESSLIST; +------+-----------------+---------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+-----------------+---------------------+------------------+ | <id> | <user> | <host> | <db> | Sleep | 1258 | | NULL | +------+-----------------+---------------------+------------------+
As you may see – Time is 1258 here, it never gets closed (only after it reaches MySQL wait_timeout). MySQL timeouts below
+-----------------------------------+----------+ | Variable_name | Value | +-----------------------------------+----------+ | connect_timeout | 10 | | delayed_insert_timeout | 300 | | have_statement_timeout | YES | | innodb_flush_log_at_timeout | 1 | | innodb_lock_wait_timeout | 50 | | innodb_rollback_on_timeout | OFF | | interactive_timeout | 28800 | | lock_wait_timeout | 31536000 | | mysqlx_connect_timeout | 30 | | mysqlx_idle_worker_thread_timeout | 60 | | mysqlx_interactive_timeout | 28800 | | mysqlx_port_open_timeout | 0 | | mysqlx_read_timeout | 30 | | mysqlx_wait_timeout | 28800 | | mysqlx_write_timeout | 60 | | net_read_timeout | 30 | | net_write_timeout | 60 | | replica_net_timeout | 60 | | rpl_stop_replica_timeout | 31536000 | | rpl_stop_slave_timeout | 31536000 | | slave_net_timeout | 60 | | ssl_session_cache_timeout | 300 | | wait_timeout | 28800 | +-----------------------------------+----------+ 23 rows in set (0.00 sec)
You may say that the DB connection shouldn’t be kept open if not needed and I agree with that, it is what I did to fix this problem. However, on the previous server, I noticed nothing like that and wondering what’s happening here. Please, don’t suggest using the persistent connection, I want to find the reason, not fix consequences.
Advertisement
Answer
This ended up being on the Azure side. Since both servers were hosted there, I found that Azure Firewall considers TCP connection dead and drops it after 4 minutes of inactivity as stated here https://docs.microsoft.com/en-us/azure/firewall/firewall-faq#what-is-the-tcp-idle-timeout-for-azure-firewall
It is about any connection, not only about SQL one. Maybe this would be useful to someone
Could be fixed by changing TCP kernel parameters default value: net.ipv4.tcp_keepalive_time to any value lower than 4 minutes