Hi folks,
Can anyone help with a grommunio-index problem after updating to the latest community version?
Summary: I upgraded two near-identical hosts and did the stop/delete-indexes/start procedure. The index rebuild on host one works exactly as expected; the index rebuild on host two fails with "[FATAL] Send failed: Connection refused".
Question: What can cause a connection refused error like this? How can I diagnose and resolve it?
(My investigation is below; it looks like the connection to exmdb, but I'm not sure...)
Detail:
System config:
Grommunio on 2x Debian 12 hosts (arm64 in AWS), built automatically using Chef with the same cookbook and near-identical settings on each (differing only in hostname etc.).
uname -a
Linux hostname 6.1.0-30-cloud-arm64 #1 SMP Debian 6.1.124-1 (2025-01-12) aarch64 GNU/Linux
Also: Aptly and Chef on a third host. Aptly hosting a versioned mirror of the community edition with a "prod" repo containing community edition including gromox 2.31 (snapshot 8 August 2024) and a "dev" repo containing community edition including gromox 2.41.0.g97971a6-1+2.1 (snapshot 8 February 2025).
Upgrade process:
- Both systems on prod repo (gromox 2.31), update system software and ensure converged to target state with: apt-get update && apt-get upgrade && chef-client
- Check full functionality on both systems, snapshot EBS volumes in AWS for rollback
- Switch systems to dev repo (gromox 2.41), update again: apt-get update && apt-get upgrade
- Reboot both systems, log in to admin portal and front-end, check send/receive/webmail functionality
- Stop grommunio-index service & timer; remove legacy index files; restart index and timer
Results:
- First host: everything works exactly as expected. No error messages, searches through web UI working.
- Second host: everything works EXCEPT index upgrade. After the stop/delete/start process grommunio-index produces the following error:
grommunio-index -A
[2025-02-08T11:55:57] grommunio-index /var/lib/gromox/user/0/1 -e my.fqdn.com -o /var/lib/grommunio-web/sqlite-index/user@fqdn.com/index.sqlite3
[FATAL] Send failed: Connection refused
... exited with status 2
[2025-02-08T11:55:57] grommunio-index /var/lib/gromox/user/1/0 -e my.fqdn.com -o /var/lib/grommunio-web/sqlite-index/user@fqdn.com/index.sqlite3
[FATAL] Send failed: Connection refused
... exited with status 2
Manually running the update with verbose mode for just a single user produces the same error:
root@hostname:~# grommunio-index /var/lib/gromox/user/0/1 -e my.fqdn.com -o /var/lib/grommunio-web/sqlite-index/user@fqdn/index.sqlite3 --verbose
[FATAL] Send failed: Connection refused
Debugging:
I couldn't figure out how to get any more diagnostic information than "Connection refused", so I checked grommunio-index.cpp on GitHub. Only one block there produces a FATAL error with variable content, so I think the problem is in the code block below, which connects to exmdb:
static int single_mode()
{
    msg<DEBUG>("exmdb=", exmdbHost, ":", exmdbPort, ", user=", userpath.value(), ", output=", outpath.empty()? "<default>" : outpath);
    IndexDB cache;
    try {
        cache = IndexDB(userpath.value(), exmdbHost, exmdbPort, outpath, create, recheck);
        cache.refresh();
    } catch(const std::runtime_error& err) {
        msg<FATAL>(err.what());
        return RESULT_ARGERR_SEM;
    } catch(int e) {
        return e;
    }
    return 0;
}
I checked exmdb and it is listening on port 5000 with active connections (I'm pretty sure it is working properly because all main system functionality is working: mail in, mail out, webmail, ActiveSync):
root@hostname:/var/log# netstat | grep 5000
tcp6 0 0 ip6-localhost:47740 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:32772 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:42088 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:32770 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:47698 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:32772 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:47712 ESTABLISHED
tcp6 0 0 ip6-localhost:47712 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:47726 ip6-localhost:5000 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:47740 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:32770 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:47698 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:47726 ESTABLISHED
tcp6 0 0 ip6-localhost:5000 ip6-localhost:42088 ESTABLISHED
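One way to narrow this down: the netstat output above only lists established connections on the loopback address, while the failing command passes -e my.fqdn.com. A minimal Python sketch along these lines could show whether port 5000 accepts connections on every address that name resolves to. The function name probe and the timeout are illustrative, and the idea that -e selects the exmdb host (with 5000 as the port) is an assumption drawn from the command line and the netstat output, not confirmed behaviour.

```python
import socket

def probe(host, port, timeout=2.0):
    """Try a TCP connect to every address that `host` resolves to.

    Returns a list of (address, outcome) pairs, where outcome is "ok"
    or the error text reported by the OS (e.g. "Connection refused").
    """
    results = []
    seen = set()
    # getaddrinfo returns both IPv4 and IPv6 candidates where available
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        if sockaddr in seen:
            continue  # skip duplicate entries for the same address
        seen.add(sockaddr)
        with socket.socket(family, socktype, proto) as s:
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                outcome = "ok"
            except OSError as exc:
                # strerror can be None (e.g. on timeout), so fall back to str()
                outcome = exc.strerror or str(exc)
        results.append((sockaddr[0], outcome))
    return results
```

Calling probe("my.fqdn.com", 5000) and probe("localhost", 5000) on both hosts and comparing the outcomes would show whether the refusal is specific to the address behind the hostname rather than to exmdb as a whole.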
This is where I got stuck. The two hosts are virtually identical and the upgrade process was identical, including exact version numbers before and after the upgrade. Why would it work on one but not on the other?
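Since the hosts are meant to be identical, diffing their effective settings may surface the one difference that matters. The sketch below compares two key=value style config texts and reports the keys whose values differ. The function names and the sample keys and values are hypothetical, and both the key=value format and the choice of files to compare (for example, whichever files configure the exmdb listener and the exmdb client side) are assumptions.

```python
def parse_config(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def diff_configs(text_a, text_b):
    """Return {key: (value_a, value_b)} for keys that differ or are missing."""
    a, b = parse_config(text_a), parse_config(text_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}

# Hypothetical sample contents, standing in for the same file read from each host
host_one = "listen_ip = ::\nlisten_port = 5000\n"
host_two = "listen_ip = ::1\nlisten_port = 5000\n"
for key, (one, two) in diff_configs(host_one, host_two).items():
    print(f"{key}: host one={one!r}, host two={two!r}")
```

Reading the real file from each host and passing both texts to diff_configs keeps the comparison independent of comment and whitespace differences that a plain diff would flag.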