On-premise Sentry under High Load: How to Stay Up & Running

YURY MUSKI
3 min read · Jan 6, 2021


We are using self-hosted Sentry as our error monitoring system.

The setup flow is pretty simple: clone the repo, run ./install.sh, and enjoy your docker-compose stack. Sure, you'll need to update some config files, but generally that's it.
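
Roughly, the whole thing looks like this (a sketch — the repo URL and exact steps are assumptions based on the on-premise setup at the time of writing, check the current Sentry self-hosted docs):

# clone the on-premise repo and run the installer
git clone https://github.com/getsentry/onpremise.git
cd onpremise
./install.sh
# bring up the whole docker-compose stack
docker-compose up -d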

If you look under the hood you will find 24 (!!!) containers that do all the stuff for you. That is pretty cool and terrifying at the same time: in case of any issues you don't have a real guide on what to do (maybe only the GitHub issue history).
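
You can count them yourself (a quick sketch; the sentry_onpremise_ name prefix is the docker-compose project default here, yours may differ):

docker ps --format '{{.Names}}' | grep sentry_onpremise_ | wc -l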

The time will come when one of your applications generates an enormous number of errors. For me it was Dec 31 and ~1 million events.

The server has issues on all fronts: load average, RAM, and disk.

The load average issue is the easy one: just fix your app's errors or disable Sentry notifications ¯\_(ツ)_/¯ .

RAM and disk issues are similar: the server is out of free RAM and disk usage is over 90%, but disabling new events does not help with that.

RAM consumption

The investigation was pretty simple:

The docker stats command shows CPU/RAM usage per container, or you can use the ctop tool for that.
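
For example, a one-shot snapshot with just the columns we care about (a sketch using standard docker stats formatting):

docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"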

docker exec -i sentry_onpremise_redis_1 redis-cli info 

The info command shows that the eviction policy is noeviction, which means Redis will use all of your server's RAM and stop accepting new entries after that. Update the eviction policy and set maxmemory to a reasonable amount of RAM to avoid this.

docker exec -it sentry_onpremise_redis_1 redis-cli CONFIG SET maxmemory-policy volatile-ttl
docker exec -it sentry_onpremise_redis_1 redis-cli CONFIG SET maxmemory 10G
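
You can verify the new settings took effect by reading back the memory section (a sketch; the field names come from Redis's INFO memory output):

docker exec -i sentry_onpremise_redis_1 redis-cli info memory | grep -E 'maxmemory_policy|maxmemory_human|used_memory_human'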

NOTE: these settings will be lost when the container restarts, so the long-term solution is to mount an updated redis.conf into the container.
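
Something along these lines should work (a sketch — the mount path and service name are assumptions, check the redis service definition in your docker-compose.yml):

# bake the limits into a config file
cat > redis.conf <<'EOF'
maxmemory 10gb
maxmemory-policy volatile-ttl
EOF
# in docker-compose.yml, mount it into the redis service and point redis-server at it:
#   volumes:
#     - ./redis.conf:/usr/local/etc/redis/redis.conf
#   command: redis-server /usr/local/etc/redis/redis.conf
# then recreate the container
docker-compose up -d redis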

Disk usage

The next issue to handle is the disk space that suddenly disappeared.

Checking for the most space-consuming folders:

du -Sh / | sort -rh | head -15
30G /var/lib/docker/volumes/sentry-kafka/_data/ingest-events-0
24G /var/lib/docker/volumes/sentry-kafka/_data/events-0
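
A docker-native alternative is to let the CLI report per-volume sizes directly (a sketch; docker system df -v is part of the standard docker CLI):

docker system df -v | grep sentry-kafka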

Obviously, it's not only Redis that is full of cache: Kafka is full of events too.

The sentry-kafka volume is mounted into the sentry_onpremise_kafka_1 container.

Let's go inside it and check the size of the 'events' and 'ingest-events' topics:

docker exec -it sentry_onpremise_kafka_1 bash
kafka-log-dirs --describe --bootstrap-server localhost:9092 --topic-list ingest-events
kafka-log-dirs --describe --bootstrap-server localhost:9092 --topic-list events

Cleaning up the space used by a topic is a bit tricky: we should set the topic retention to a very low value and give Kafka some time to clean itself up the right way.

# set retention to 1 sec and wait for cleanup
kafka-configs --zookeeper zookeeper:2181 --entity-type topics --alter --entity-name ingest-events --add-config retention.ms=1000
kafka-configs --zookeeper zookeeper:2181 --entity-type topics --alter --entity-name events --add-config retention.ms=1000
# revert retention back
kafka-configs --zookeeper zookeeper:2181 --entity-type topics --alter --entity-name ingest-events --delete-config retention.ms
kafka-configs --zookeeper zookeeper:2181 --entity-type topics --alter --entity-name events --delete-config retention.ms
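
Before declaring victory, it's worth confirming the override is gone and the segments were actually cleaned up (a sketch, run from inside the Kafka container like the commands above):

# confirm no retention.ms override remains on the topic
kafka-configs --zookeeper zookeeper:2181 --entity-type topics --entity-name ingest-events --describe
# re-check the on-disk size of the topic partitions
kafka-log-dirs --describe --bootstrap-server localhost:9092 --topic-list ingest-events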

That's it: the server has a reasonable amount of free disk space again.
