Ceph recovery issue
Our Ceph environment has only one network, so the cluster (internal) network and the public network share the same links.
When the number of OSDs changes, the heavy data movement from backfill and recovery can saturate the network and leave clients unable to reach the cluster.
In this case, throttling recovery traffic is a good choice.
| Option | Description | Default |
|---|---|---|
| osd max backfills | The maximum number of concurrent backfills allowed to or from a single OSD. | 1 |
| osd recovery max active | The number of active recovery requests per OSD at one time. More requests accelerate recovery, but they place an increased load on the cluster. | 3 |
| osd recovery sleep | Time in seconds to sleep before the next recovery or backfill op. Increasing this value slows down recovery, so client operations are less impacted. | 0 |
| osd recovery sleep hdd | Time in seconds to sleep before the next recovery or backfill op for HDDs. | 0.1 |
| osd recovery sleep ssd | Time in seconds to sleep before the next recovery or backfill op for SSDs. | 0 |
| osd recovery sleep hybrid | Time in seconds to sleep before the next recovery or backfill op when OSD data is on an HDD and the OSD journal/DB is on an SSD. | 0.025 |
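Before tuning, it helps to confirm what an OSD is actually running with. A minimal sketch, assuming an OSD with id 0 and that the admin-socket command is run on the host where osd.0 lives:

```
# Query the values the osd.0 daemon is currently using (run on osd.0's host).
ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active
ceph daemon osd.0 config get osd_recovery_sleep_hdd
```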
Modify options on the fly:
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
or persistently, via the cluster configuration database:
ceph config set osd osd_recovery_max_active 1
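Once recovery has finished and the cluster is back to HEALTH_OK, remember to undo the throttle. A rough sketch, assuming the default of 3 from the table above; ceph config rm applies only if the value was set in the configuration database (Mimic and later):

```
# Watch recovery/backfill progress until the cluster reports HEALTH_OK.
ceph -s

# Revert the runtime override back to the default.
ceph tell osd.* injectargs '--osd_recovery_max_active 3'

# If the value was set in the configuration database, drop the override instead.
ceph config rm osd osd_recovery_max_active
```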
Written on October 11, 2018
