Ryan333: RAM usage is an all-or-nothing affair. As long as you have enough RAM, you can keep loading more stuff and it won't impact performance as long as those processes aren't ALSO demanding CPU, GPU or disk resources -- in which case there would be a performance impact, but not because of RAM.
This isn't the whole story.
Operating systems give each process a virtual address space. To access anything in RAM, those virtual addresses must be mapped to physical addresses. What appears contiguous to a process may not be contiguous in physical RAM, and the higher your memory usage & the more processes (including drivers!) allocating and releasing memory, the more likely it is that you can't get contiguous physical memory (RAM is fragmented). Without contiguous physical memory you're less likely to be able to allocate huge pages (larger chunks of contiguous physical RAM), which means you need more page table entries. The page table is what maintains the virtual-to-physical mappings. Unfortunately walking the page table is *slow*, which is why CPUs have a cache to speed it up (the TLB). As with any cache, the more data (page table entries in this case) you try to retrieve through it, the more likely it is that the requested entry isn't found because it didn't fit in the cache, and then you're back to the slow page table walk.
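If you want to experiment with this yourself: on Linux a process can explicitly ask for huge pages when mapping memory, or at least hint that it wants them. Here's a minimal sketch, assuming huge pages have been reserved (e.g. via vm.nr_hugepages); otherwise the explicit mapping fails and we fall back to normal pages with a transparent-huge-page hint:

    #define _GNU_SOURCE            /* for MAP_HUGETLB / MADV_HUGEPAGE */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1UL << 30;    /* 1 GiB working set */

        /* Ask for an explicit huge-page mapping. This fails unless huge
         * pages have been reserved AND enough contiguous physical memory
         * is available to back them. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            /* Fall back to normal 4K pages and merely hint that the kernel
             * should use transparent huge pages for this range if it can. */
            p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
            }
            madvise(p, len, MADV_HUGEPAGE);   /* best effort only */
        }

        memset(p, 1, len);   /* touch the memory so it is actually backed by RAM */
        munmap(p, len);
        return 0;
    }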
https://en.wikipedia.org/wiki/Translation_Lookaside_Buffer#Performance_implications
https://en.wikipedia.org/wiki/Page_table
Depending on various things, it is possible for the OS to defragment RAM on the fly. For example, on Linux you can trigger compaction by writing 1 to /proc/sys/vm/compact_memory. (Unfortunately it can't compact all memory!)
https://sysctl-explorer.net/vm/compact_memory/
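A tiny sketch of doing that from a program rather than the shell (needs root, and a kernel built with CONFIG_COMPACTION):

    /* Trigger memory compaction on Linux by writing "1" to
     * /proc/sys/vm/compact_memory.
     * Equivalent to: echo 1 > /proc/sys/vm/compact_memory */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/vm/compact_memory", "w");
        if (!f) {
            perror("fopen /proc/sys/vm/compact_memory");
            return 1;
        }
        fputs("1\n", f);
        fclose(f);
        return 0;
    }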
I just ran a little benchmark and with a 1024MB working set, huge pages are up to about 20% faster, both for sequential and random access. If you're under heavy memory pressure and RAM is fragmented, you may not be able to use huge pages.
$ make run2
for i in 256M 512M 1024M; do echo "$i:"; ./test-tlb -H $i 64; ./test-tlb $i 64 ; ./test-tlb -Hr $i 64; ./test-tlb -r $i 64; done
256M:
S 4.03
s 5.95
R 98.08
r 113.60
512M:
S 4.04
s 5.85
R 99.36
r 116.10
1024M:
S 4.04
s 5.93
R 100.21
r 120.50
S stands for sequential, R for random. A capital letter means huge pages, a lowercase letter means normal pages. The numbers are memory access latency in nanoseconds. The benchmark is available here, though I'm running a slightly tweaked version for myself:
https://github.com/torvalds/test-tlb
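If you're wondering how numbers like these are measured at all: the usual trick is a pointer chase, where every load's address depends on the result of the previous load, so the accesses can't be overlapped or usefully prefetched. Here's a stripped-down sketch of that idea (not the actual test-tlb code):

    /* Pointer-chase latency sketch: build one big random cycle through a
     * buffer, then time dependent loads. Each load's address comes from the
     * previous load, so the loop measures full memory access latency. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        size_t n = (256UL << 20) / sizeof(size_t);   /* ~256MB working set */
        size_t *next = malloc(n * sizeof(size_t));
        if (!next)
            return 1;

        /* Sattolo's algorithm: yields a single cycle covering all n slots,
         * so the chase can't get stuck in a short (cache-friendly) loop. */
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;   /* crude randomness, fine for a sketch */
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        struct timespec t0, t1;
        size_t iters = 10UL * 1000 * 1000, idx = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t k = 0; k < iters; k++)
            idx = next[idx];                 /* dependent load chain */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
        printf("%.2f ns per access (checksum %zu)\n", ns / (double)iters, idx);
        free(next);
        return 0;
    }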
Reading the benchmark's README is very informative even if you don't intend to run it. Here's one of the more interesting tidbits:
the hugetlb case helps avoid TLB misses, but it has another less
obvious secondary effect: it makes the memory area be contiguous in
physical RAM in much bigger chunks. That in turn affects the caching
in the normal data caches on a very fundamental level, since you will
not see cacheline associativity conflicts within such a contiguous
physical mapping.
...
The effect is noticeable even with something like the 4-way L2 in
modern intel cores. The L2 may be 256kB in size, but depending on
the exact virtual-to-physical memory allocation, you might be missing
quite a bit long before that, and indeed see higher latencies already
with just a 128kB memory area.
In contrast, if you run a hugepage test (using as 2MB page on x86),
the contiguous memory allocation means that your 256kB area will be
cached in its entirety.
Of course the page table itself is in RAM too, and using huge pages would make it smaller. A process with a very large virtual address space using lots of normal pages (and other mappings) can consume lots of page table entries, even if the process doesn't use lots of RAM directly. Here's an example of someone whose page tables take up 2.5 gigabytes of RAM:
https://itectec.com/superuser/does-the-page-table-take-up-so-much-memory/
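You can check this on your own machine: on Linux the per-process page table size shows up as the VmPTE field in /proc/<pid>/status, and the system-wide total as the PageTables line in /proc/meminfo. A quick sketch that prints its own page table usage:

    /* Print how much RAM this process's page tables use, by reading the
     * VmPTE field from /proc/self/status (Linux-specific). */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) {
            perror("fopen /proc/self/status");
            return 1;
        }
        char line[256];
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "VmPTE:", 6) == 0) {
                fputs(line, stdout);   /* e.g. "VmPTE:        40 kB" */
                break;
            }
        }
        fclose(f);
        return 0;
    }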