The following facts were tested and reproduced in three completely different environments, with SQL Server 2005, 2008, 2008 R2, 2012 and 2016.
- When Database Mail sends email to Postfix (2.10.1 and 3.1.2) SMTP servers, the servers report several “lost connection after CONNECT” errors, and a percentage of these result in lost email.
- When Database Mail sends email to non-Microsoft relay software (http://emailrelay.sourceforge.net/) installed on a Windows server, the relay reports event ID 1002, “winsock select error: 10053” (WSAECONNABORTED).
- A packet capture shows a behavior specific to Database Mail that we did not see with other systems. Database Mail opens one or more TCP connections, depending on the number of emails to send. Then, within microseconds or milliseconds, it sends a FIN/ACK followed by a RST/ACK on some of those connections, which Postfix and most other SMTP daemons interpret as a lost connection. The TCP connections that are not closed with a FIN/ACK complete normally.
- When Database Mail talks to a Microsoft SMTP service, it starts the same way, including the FIN/ACK on some of the TCP connections, but the SMTP conversation then continues from a different TCP source port and can be followed there by its sequence numbers. So the Microsoft SMTP service does not drop the whole SMTP session when the initial TCP connection is dropped.
- In other words, for some SMTP conversations the source port changes mid-conversation, usually right after the 220 welcome message, which causes a lost connection with most SMTP services.
- SQL Server Database Mail does not report any errors, even when messages are lost.
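To quantify how many connections are torn down this way, a capture can be reduced to per-connection segment lists and checked for a FIN followed by a RST before the client sends any SMTP command. The sketch below is a minimal illustration, assuming the client-to-server segments have already been exported (for example from Wireshark) into a hypothetical `(src_port, flags, payload_len)` format; the `Segment` type and the `aborted_after_connect` helper are illustrative names, not part of any tool.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """One client-to-server TCP segment (hypothetical export format)."""
    src_port: int     # client source port, used here to identify the connection
    flags: str        # e.g. "SYN", "ACK", "PSH/ACK", "FIN/ACK", "RST/ACK"
    payload_len: int  # TCP payload bytes carried by this segment


def aborted_after_connect(segments: Iterable[Segment]) -> List[int]:
    """Return source ports whose connection saw FIN then RST with no client payload.

    This matches the pattern described above: the client closes the connection
    (FIN/ACK, then RST/ACK) before sending any SMTP command, which Postfix
    logs as "lost connection after CONNECT".
    """
    by_port = defaultdict(list)
    for seg in segments:
        by_port[seg.src_port].append(seg)

    aborted = []
    for port, segs in by_port.items():
        flag_sets = [set(s.flags.split("/")) for s in segs]
        sent_data = any(s.payload_len > 0 for s in segs)
        # Index of the first segment carrying FIN, if any
        fin_idx = next((i for i, f in enumerate(flag_sets) if "FIN" in f), None)
        rst_after_fin = fin_idx is not None and any(
            "RST" in f for f in flag_sets[fin_idx + 1:]
        )
        if rst_after_fin and not sent_data:
            aborted.append(port)
    return sorted(aborted)


if __name__ == "__main__":
    capture = [
        # Aborted right after the handshake: FIN/ACK then RST/ACK, no data
        Segment(50001, "SYN", 0), Segment(50001, "ACK", 0),
        Segment(50001, "FIN/ACK", 0), Segment(50001, "RST/ACK", 0),
        # Normal session: client sends SMTP commands before closing
        Segment(50002, "SYN", 0), Segment(50002, "ACK", 0),
        Segment(50002, "PSH/ACK", 20), Segment(50002, "FIN/ACK", 0),
    ]
    print(aborted_after_connect(capture))  # [50001]
```

Counting those ports per batch against the number of messages Database Mail believes it sent would give a rough loss estimate to attach to a support case.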
I have always assumed that an SMTP session runs over a single TCP connection by design, as described in RFC 821 (since superseded by RFC 5321).
Right now, the only workaround we have found is to configure Database Mail to send email to an IIS SMTP service, which relays it to our main Postfix server. We would like to get rid of this middleman if possible.
Can the executable behind Database Mail (Databasemail.exe) be replaced by something that behaves more conventionally in terms of the SMTP conversation?
Any help or suggestions are welcome.
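As a baseline for comparison in a packet capture, and as one possible direction for a replacement sender, the sketch below uses Python's standard `smtplib` to deliver a batch of messages over a single SMTP session: one TCP connection from the 220 banner through QUIT, as RFC 821/5321 describe. It is not a drop-in replacement for Databasemail.exe; how messages would be pulled from the Database Mail queue is not shown, and the `send_batch` and `build_message` helpers are hypothetical.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, Iterable


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_batch(
    host: str,
    messages: Iterable[EmailMessage],
    port: int = 25,
    smtp_factory: Callable[..., smtplib.SMTP] = smtplib.SMTP,
) -> int:
    """Send every message over ONE SMTP session and return how many were accepted.

    The connection is opened once, reused for each message, and closed with
    QUIT when the `with` block exits, so the client source port never changes.
    """
    sent = 0
    with smtp_factory(host, port, timeout=30) as smtp:
        smtp.ehlo()
        for msg in messages:
            # send_message raises an SMTPException subclass if the server rejects
            # the message outright, so failures surface instead of being silent.
            smtp.send_message(msg)
            sent += 1
    return sent
```

For example, `send_batch("postfix.example.com", [build_message("sql@example.com", "ops@example.com", "Test", "Hello")])` (hypothetical host and addresses) should appear in a capture as a single TCP stream. The `smtp_factory` parameter exists only so the session logic can be exercised without a live server.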