EWS hangs while gromox-http stays active

souverain · 2026-05-22T08:31:39+00:00

Hello, We encountered a temporary issue with EWS access on a grommunio/gromox server. The webmail was working, but Apple Mail on macOS could no longer conn...

souverain

Follow-up / Additional information

The issue happened again after the initial recovery.

Before restarting gromox-http, I captured the state of the process while EWS was not responding.

At that time, gromox-http was still active and running:

PID: 16652
User: gromox
Elapsed time: 01:20:29
Threads: 47
Command: /usr/libexec/gromox/http

The local EWS endpoint was still timing out:

curl -sS -o /dev/null -w "HTTP=%{http_code} TIME=%{time_total}\n" --max-time 10 http://127.0.0.1:10080/EWS/Exchange.asmx

Result:

HTTP=000 TIME=10.001916
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
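The probe above can be wrapped so that a cron job or monitoring check can tell a hang from a normal answer. This is a minimal sketch, not an official grommunio tool: the function names and the labels "hang", "up" and "error" are made up for illustration, and the default URL is simply the one used in the test above.

```shell
# Classify one line of the curl -w output shown above ("HTTP=<code> TIME=<seconds>").
# The labels are illustrative names, not gromox terminology.
classify_probe() {
    code=${1#HTTP=}      # strip the leading "HTTP="
    code=${code%% *}     # keep only the status code, drop " TIME=..."
    case "$code" in
        000) echo hang ;;        # curl received no HTTP response at all
        5??) echo error ;;       # the server answered, but with a server error
        [1-4]??) echo up ;;      # any other answer, even 405, shows the endpoint responds
        *) echo unknown ;;
    esac
}

# Probe the local EWS endpoint with the same curl call as in the post.
# curl still prints the -w line when it times out, so the output can be classified.
probe_ews() {
    curl -sS -o /dev/null -w 'HTTP=%{http_code} TIME=%{time_total}\n' \
        --max-time 10 "${1:-http://127.0.0.1:10080/EWS/Exchange.asmx}" 2>/dev/null
}

# Example: classify_probe "$(probe_ews)"
```

The 405 seen after a restart counts as "up" here because a plain GET that is rejected still shows that the HTTP layer is answering.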

During the failure, ss showed many sockets owned by the gromox-http process stuck in CLOSE-WAIT on port 10443.

Examples, anonymized:

CLOSE-WAIT ... [::1]:10443 [::1]:xxxxx users:(("http",pid=16652,fd=xx))
CLOSE-WAIT ... [::ffff:127.0.0.1]:10443 [::ffff:127.0.0.1]:xxxxx users:(("http",pid=16652,fd=xx))

There was also at least one CLOSE-WAIT connection on port 10080:

CLOSE-WAIT ... [::ffff:127.0.0.1]:10080 [::ffff:127.0.0.1]:xxxxx users:(("http",pid=16652,fd=xx))
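A socket in CLOSE-WAIT means the peer has closed the connection and the local process has not yet closed its own end, so a growing count per port is worth tracking over time. The sketch below counts CLOSE-WAIT sockets per local port from ss output read on standard input. The function name is hypothetical, and it assumes the column layout of `ss -Htanp` (state in column 1, local address in column 4).

```shell
# Count CLOSE-WAIT sockets per local port.
# Reads "ss -Htanp"-style lines on stdin and prints "<port> <count>", sorted by port.
count_close_wait() {
    awk '$1 == "CLOSE-WAIT" {
             n = split($4, part, ":")   # local address:port; the port follows the last colon
             count[part[n]]++
         }
         END { for (port in count) print port, count[port] }' | sort -n
}

# Example: ss -Htanp | count_close_wait
```

Running it every few minutes and logging the result would show whether the count on 10443 climbs steadily before EWS stops answering.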

Some internal connections were still established, for example to [::1]:5000 and [::1]:6666.

The gromox-http threads were mostly waiting in:

hrtimer_nanosleep
futex_wait_queue
do_epoll_wait
do_sys_poll
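One way to collect such a list again is to read the per-thread `wchan` files under /proc. This is only a sketch: it assumes Linux, enough privileges to read another user's process (otherwise the kernel may report 0), and a hypothetical function name. The second argument exists only so the function can be pointed at a different root directory.

```shell
# Summarise the kernel wait channel of every thread of a process.
# $1 = PID, $2 = proc root (defaults to /proc). Prints "<wchan> <thread count>".
wchan_summary() {
    pid=$1
    proc_root=${2:-/proc}
    for f in "$proc_root/$pid"/task/*/wchan; do
        [ -r "$f" ] || continue
        # Command substitution strips trailing newlines, so each thread yields one line.
        printf '%s\n' "$(cat "$f")"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example: wchan_summary 16652
```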

The journal still showed repeated messages like:

exmdb-audit: truncated message /var/lib/gromox/user/example.tld/example-user:f1966080:m1966091 (rewrite)
exmdb-audit: truncated message /var/lib/gromox/user/example.tld/example-user:f1966080:m1966116 (rewrite)

I do not know whether these exmdb-audit messages are related to the EWS hang, but they appear repeatedly around the same period.

Restarting only gromox-http immediately restored EWS again:

systemctl restart gromox-http

After restart:

HTTP=405 TIME=0.000407

So the issue is reproducible on this system: gromox-http remains active, but EWS stops responding locally until the service is restarted.

The large number of CLOSE-WAIT sockets on 10443 may be relevant. Please let me know if there are specific debug logs, a backtrace, gcore, strace, lsof output, or any other diagnostic commands you would like me to capture if the issue happens again.
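For the next occurrence, the captures listed above could be prepared in advance. The sketch below only prints the commands instead of running them, so they can be reviewed first. The function name and the default output directory are hypothetical, the tools (ss, lsof, gdb, gcore) must be installed, and the backtrace is only likely to be useful if debug symbols for gromox are available.

```shell
# Print (not run) a capture sequence for a hung process.
# $1 = PID, $2 = output directory (illustrative default).
capture_plan() {
    pid=$1
    out=${2:-/tmp/gromox-hang}
    printf '%s\n' \
        "mkdir -p $out" \
        "ss -tanp > $out/ss.txt" \
        "lsof -p $pid > $out/lsof.txt" \
        "gdb -p $pid -batch -ex 'thread apply all bt' > $out/backtrace.txt" \
        "gcore -o $out/core $pid"
}

# Example (as root, before restarting the service):
#   capture_plan "$(systemctl show -p MainPID --value gromox-http)" | sh -x
```

Printing the plan keeps the function free of side effects; piping it to `sh -x` runs the steps in order and echoes each one.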

WalterH

souverain Which gromox version is installed?

souverain

WalterH gromox-3.7.224.m1fee87e-lp156.27.1.x86_64

souverain

Additional update:

The issue happened again this morning.

Before restarting gromox-http, the local EWS test returned:

curl: (28) Operation timed out after 10002 milliseconds with 0 bytes received
HTTP=000 TIME=10.002759

After restarting only gromox-http:

HTTP=405 TIME=0.000409

So the same pattern is confirmed again: gromox-http remains active, but EWS stops responding locally until the service is restarted.

WalterH

Try the newer gromox versions (3.7.226 for the supported edition, 3.7.239 for the community edition).

Mister2

I have the same issue on both grommunio servers I look after. One is running on openSUSE 16.0 and the other on openSUSE 15.6. Both were installed from the latest openSUSE 15.6 appliance ISO, and the former was updated using @WalterH's excellent upgrade recipe.

Running 'systemctl restart gromox-http' does seem to work, but it takes several minutes to complete on the command line. Once gromox-http has restarted, mail is delivered to macOS Mail again. I tried WalterH's suggestion and applied the gromox 3.7.239 update yesterday, but the issue persists. I have just seen that gromox 3.7.240 is available and will try that next; otherwise the system is fully up to date for an openSUSE 16.0 system.

As in @souverain's original post, email is delivered fine to iPhones and the Gromox-Web Client; the issue only affects the macOS Mail client.

souverain

WalterH I have updated to gromox-3.7.240.m66283a7-lp156.37.1.x86_64, but the situation is still the same :-(

souverain

Mister2 Yep, I have the same version, but nothing has changed; same issue :-(

1of16

Same problem with 3.7.226.ab0dc23-lp160.4.1 🙁
It started about one week ago.

weini

Same issue here with gromox 3.7.240.

Even worse, I can't go back to 3.7.130 (the last version that didn't have the issue, to my understanding), because that triggers another bug discussed here: https://community.grommunio.com/d/2685-debian-13-update-error-after-upgrading-libc-bin-package.

Can we please label this as BUG to get a bit more visibility?

SHP

Good morning.

Possibly the same issue here with gromox 3.7.226 (Debian 13), but with Outlook clients.
We have to restart gromox-http, and stopping the service takes approx. 1 minute.

Mister2

I now also have an Outlook client experiencing the same issue as reported by SHP. So both macOS Mail clients and Outlook (16) clients are now suffering from periodic outages (12 to 36 hrs) of email until they report the issue and I restart the gromox-http service. After around a minute the service restarts and mail flow is restored.

EDIT: Can we get this ticket raised to a 'BUG' as it appears several servers are now experiencing this issue.

mho

Same situation here with grommunio supported, gromox 3.7.226.ab0dc23.
The log contains these messages when the problem with Outlook (Outlook 2024) occurs:

Mai 28 07:12:56 server gromox-http[1643]: Rejecting connection from [::1]:37826: reached 400 connections (http.cfg:context_num)
Mai 28 07:13:05 server gromox-http[1643]: Rejecting connection from [::ffff:127.0.0.1]:50362: reached 400 connections (http.cfg:context_num)
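These rejection messages give a simple signal to watch for: once they appear, gromox-http has reached the connection limit named in the message (http.cfg:context_num). Whether connections stuck in CLOSE-WAIT are what fills those slots is an assumption, not something the log confirms. A minimal sketch for counting the rejections in journal text read on standard input, with a hypothetical function name:

```shell
# Count "Rejecting connection ... reached N connections" lines on stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the status clean
# while the printed count is still 0.
count_rejections() {
    grep -Ec 'Rejecting connection from .*reached [0-9]+ connections' || true
}

# Example: journalctl -u gromox-http --since today | count_rejections
```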

After restarting the gromox-http service (yes SHP, the restart takes 1 minute here too, which is quite long), all is fine again, usually for several hours.
For now I've opened a support ticket.

weini

Another attempt allowed me to downgrade to 3.7.130 now.
I first did a full upgrade "apt dist-upgrade" and then installed the older version of the gromox package.

souverain

Mister2 Poke @WalterH, can you do it please?

Andrew

I believe I have the same problem on EL9 with gromox 3.7.242.m0d06f09-40.1.
About twice a day, Outlook loses its connection. In the NGINX logs for Autodiscover, EWS, and MAPI I see "no live upstreams while connecting to upstream". Restarting gromox-http returns it to service, although, as the others have mentioned, it is suspiciously slow to restart, as if it were waiting for a timeout.

Duese6

I am seeing very similar behavior across multiple systems.

The last version that worked reliably for us was:
gromox-3.7.130

Since updating beyond that version, we started seeing occasional hangs or long-running requests within gromox-http.

Today I updated one system to:
gromox 3.7.242.m0d06f09-lp160.39.1 on openSUSE Leap 16.

With this version, the shutdown / restart behavior of gromox-http has clearly improved: it now stops cleanly and no longer hits the systemd timeout.

However, the underlying issue still seems to be present:
Outlook clients occasionally lose the connection, which suggests that some requests are still hanging or timing out during runtime.

This looks very similar to what is described here regarding long-running (possibly EWS) requests inside gromox-http.

Duese6

Small addition based on further testing:

I was able to correlate the behavior more closely.

Whenever Outlook loses the connection / appears to hang, gromox-http is also unable to stop cleanly at that exact moment. In that situation, stopping the service results in a timeout and SIGKILL.

Example:

Outlook hangs / disconnects
at the same time, systemctl stop gromox-http hits TimeoutStopSec and requires SIGKILL

This strongly suggests that there is an active request (likely still running inside gromox-http) blocking the shutdown.

When no client is hanging, gromox-http usually stops cleanly.

So it seems directly related to long-running / stuck requests rather than the shutdown logic itself.

souverain

Duese6 As for me, I install the latest updates every day, but the problem still persists.

WalterH

souverain Mister2 Poke @WalterH, can you do it please?

What to do?

souverain

WalterH Can we get this ticket raised to a 'BUG' as it appears several servers are now experiencing this issue.