I don't really agree with this conclusion. If you can't measure very well, the safe bet is to disable THP: it can improve performance by some percentage in certain use cases, but it can totally destroy others. So when there isn't enough information, the potential gain/loss ratio is terrible... I would say "blindly disable THP", unless you can really afford use-case-specific, costly measurement activities and can prove to yourself that THP is beneficial in your use case.
It's much worse than that, though, because this isn't a case of measuring throughput with and then without, and seeing which is best. Rather, your application is sailing toward a submerged iceberg that, when it hits (could be next week), will stall your process, and potentially the entire box, for 60 seconds.
And it doesn't print a message like "yeah I stalled your box for the last 60 seconds in order to shuffle deckchairs around, sorry" in syslog.
So you pull your hair out trying to figure out why your nice stable service all of a sudden sets off Nagios at 2am for no obvious reason, every week or two.
As a counterpoint, consider that random recommendations from the internet can easily get outdated.
So apparently, transparent hugepages have some issues in their current implementation that can cause big performance losses in some cases. Seems to me like that's a bug, and I see no reason why that bug couldn't be fixed in the future.
By following random recommendations, you get into situations where the underlying problem has been fixed for ages, but people still cargo-cult some workaround that actually makes things worse with the new implementation.
More like: if you can't measure the difference, then definitely turn it off, because if it's on there's a non-zero chance of significant instability events in your future.
If I'm understanding their comments correctly, it's because the downside isn't just a possible non-improvement (or decrease) in performance; it's instability and unpredictable behavior. I worry that it could translate into those vague, difficult-to-reproduce "the application is weird/slow" reports.
Of course you could profile and measure performance to determine whether the warning applies, but is that something I should be doing for every part of the stack? I should, but should I prioritize it over x, y, or z?
I would also say the same if you host a Ruby or Python app, or anything using forking really.
Similar to the issues you had with Redis, the kernel change to enable THP by default totally destroyed CoW sharing for forked Ruby processes, despite Koichi Sasada's change to make the GC more CoW-friendly. Without disabling THP, a single run of GC marking can cause the entire heap to be copied into the child.
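The worst case follows from simple page-size arithmetic (a sketch, assuming the usual 4 KiB base pages and 2 MiB huge pages on x86-64):

```python
BASE_PAGE = 4 * 1024          # 4 KiB: normal x86-64 page
HUGE_PAGE = 2 * 1024 * 1024   # 2 MiB: THP huge page on x86-64

# After fork(), the first write to a shared page copies the whole page.
# With THP, flipping a single GC mark bit can therefore copy 2 MiB
# instead of 4 KiB into the child.
amplification = HUGE_PAGE // BASE_PAGE
print(amplification)  # → 512
```

So a GC pass that touches one word per huge page can, in the worst case, copy 512 times more memory than it would with normal pages.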
I feel the same: you should only use these kinds of performance improvements if you MUST, not just to gain speed willy-nilly. Speed always comes at a price, and if it's not needed, then it's not needed. Faster is not always better!
Do not blindly follow any recommendation on the Internet, please! Measure, measure and measure again!
It's also important to measure in your actual use case, and not just with benchmarks that seem "close enough". I know it sounds odd, but I've seen others adjust settings and then "prove" that it worked with a benchmark they claim is "representative", when in reality they didn't improve anything, because that "representative benchmark" differed from the real use case in precisely the way that made it not respond to the adjustment.
Blindly following "best practices" is bad enough, but "proving" that the changes work with crucially-different benchmarks is worse; and when it's some expensive consultant doing such things, I think it may even approach fraud.
I agree, with the caveat that in the case of THP you should disable it by default, and then measure to prove it's worth enabling. Or even better, set it to 'madvise' and let applications decide whether they want huge pages or not.
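In 'madvise' mode the kernel only considers huge pages for regions the application explicitly flags. A minimal sketch of the opt-in from the application side (Python 3.8+ on Linux; the MADV_HUGEPAGE constant doesn't exist on other platforms, hence the guard):

```python
import mmap

length = 4 * 1024 * 1024        # 4 MiB anonymous private mapping
buf = mmap.mmap(-1, length)

opted_in = False
if hasattr(mmap, "MADV_HUGEPAGE"):   # Linux-only constant
    try:
        # Ask the kernel to back this region with transparent huge pages.
        buf.madvise(mmap.MADV_HUGEPAGE)
        opted_in = True
    except OSError:
        pass  # e.g. kernel built without CONFIG_TRANSPARENT_HUGEPAGE
buf.close()
print(opted_in)
```

In C the equivalent is `madvise(addr, length, MADV_HUGEPAGE)` on a page-aligned region.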
It baffles me that THP became enabled by default (is it? I think it’s only a default on RHEL distros?). It really screws up many expectations that applications might assume about memory behavior (like the page size). In the majority of cases, THP is a bad, bad idea and anyone with perf or devops experience will agree with this I think.
Do you want to impose GC-like pause characteristics on every process on your box? And possibly double, triple, or 10x your memory usage? Then enable THP.
Since "enabled=always" is the kernel default value, anything that uses a stock kernel (example: Arch family) or has to build its own (Gentoo) will probably have it enabled by default.
I just checked, and my Gentoo and Manjaro systems have it set to "enabled=always".
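For reference, the active mode is the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled. A quick check (the path is the standard one; the file is absent if THP isn't compiled into the kernel):

```python
def bracketed(text):
    """Return the active value from a sysfs line like 'always [madvise] never'."""
    start = text.index("[") + 1
    return text[start:text.index("]")]

try:
    with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
        print(bracketed(f.read()))
except FileNotFoundError:
    print("THP not available on this kernel")
```

The sibling `defrag` file in the same directory uses the same bracketed format.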
It's enabled by default because it actually works fine in most cases. It has issues with certain workloads (databases, Hadoop, etc.), where you'll do much better if you allocate a Huge Pages region (not to be confused with THP) in advance.
Anyway, they recently added a new "defer" mode for defragmentation, so THP doesn't try to defragment (the main cause of the slowdown) upon allocation and instead triggers it in the background (via kswapd and kcompactd). This is now set to be the default. I think it's available in RedHat/CentOS 7.3+.
Depends what you mean by "works fine". IMHO any feature that can send the kernel off into a tens-of-seconds dream state underneath my process is just unforgivable and totally broken. Definitely good to hear that this is being done out of band now.
I guess what I said is that most of the time latency is not as important as throughput. And in those scenarios it generally works fine.
The best of both worlds, though (although it requires more manual work), is pre-allocating Huge Pages in advance and then letting the application use them (if the application supports it) or going through libhugetlbfs (if it doesn't).
Edit: changed hugetlbfs to libhugetlbfs, so it's easier to find how to do it with man libhugetlbfs
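Whether any explicit huge pages have been reserved shows up in /proc/meminfo. A small parser for the relevant counters (a sketch; the field names are the standard ones, and the sample values below are made up):

```python
def hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo contents."""
    counters = {}
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_"):
            key, _, value = line.partition(":")
            counters[key] = int(value.strip())
    return counters

# Hypothetical sample; on a real box, read open("/proc/meminfo").read() instead.
sample = """HugePages_Total:     128
HugePages_Free:      128
HugePages_Rsvd:        0
Hugepagesize:       2048 kB"""
print(hugepage_counters(sample))
```

If HugePages_Total is zero, nothing has been reserved and an explicit-huge-page allocation will fail.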
FWIW, Debian 9 (stretch) has it set to "madvise", but my Debian unstable machine has it to "always". Looking further, I can see that /boot/config-4.12.0-1-amd64 has:
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
and /boot/config-4.13.0-1-amd64 has:
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
So this is a recent change.
Edit: The linux kernel source says the default is always (in mm/Kconfig), and that's been true since 2011.
The debian package changelog says the change occurred in 4.13.4-1:
* thp: Enable TRANSPARENT_HUGEPAGE_ALWAYS instead of
TRANSPARENT_HUGEPAGE_MADVISE
The reason is not given in the changelog itself, but it's given in the git log of the debian packaging:
> As advised by Andrea Arcangeli - since commit 444eb2a449ef "mm: thp: set THP defrag by default to madvise and add a stall-free defrag option" this will generally be best for performance.
Edit 2: The mentioned commit (444eb2a449ef) dates back to 4.6, so presumably, at least some performance issues with transparent huge pages may be gone since that version of the kernel.
Interesting. I'm running Debian unstable, and recently my system would sometimes lock up under heavy memory pressure. I'm using VirtualBox, which has its own kernel module, so I can't be sure Linux itself is to blame, but the timing seems to coincide with when I switched to that kernel version. Maybe transparent hugepages uncovered a VirtualBox bug or even a kernel bug. And I care about worst case performance more than average performance, so I just now set it to "never".
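For completeness, the runtime switch is just a write to the same sysfs file (root required, and it doesn't persist across reboots; a sketch):

```python
def set_thp(mode):
    """Try to set the runtime THP mode; returns True on success."""
    assert mode in ("always", "madvise", "never")
    try:
        with open("/sys/kernel/mm/transparent_hugepage/enabled", "w") as f:
            f.write(mode)
        return True
    except OSError:
        return False  # THP not compiled in, not root, or read-only sysfs

print(set_thp("never"))
```

To make it stick across reboots you'd typically put `transparent_hugepage=never` on the kernel command line instead.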
The parent said "droplet" so I assume Digital Ocean. Unless you've installed the host yourself, from scratch, you can't be sure the option hasn't been changed.
Makes me think that your setting is a default and his was set by Digital Ocean.
No, I have another Ubuntu 16.04 machine at home - same kernel version, same settings. He must have installed kernel 4.11 manually, because linux-image-generic currently pulls kernel 4.4.0-101-generic on 16.04; settings depend on kernel version.
That is exactly the reason I wrote the post! That advice is based on a specific use case, a bug, or an outdated kernel. The jemalloc (Digital Ocean post) case is a good example: it just isn't (wasn't) aware of THP https://github.com/jemalloc/jemalloc/issues/243
I can only repeat it: "Measure, measure and measure again!"
https://alexandrnikitin.github.io/blog/transparent-hugepages...
Especially the conclusion is noteworthy:
> Do not blindly follow any recommendation on the Internet, please! Measure, measure and measure again!