Note that even without an acceleration structure ("direct summation" in N-body research terminology), a CUDA program or GLSL shader program can exceed 60 fps with 10,000 to 20,000 particles. A parallel, vectorized CPU code written in C, C++, or Fortran can do the same with over 5,000.
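For anyone unfamiliar with the term, direct summation just means every particle interacts with every other particle, so the cost grows as N^2. Here is a minimal sketch in plain Python, assuming gravitational point masses with Plummer softening. The function name, the `softening` parameter, and `G = 1` units are illustrative choices rather than anything from a specific code. It is far too slow for real-time use, but it has the same loop structure a CUDA kernel or vectorized CPU loop would parallelize.

```python
import math

def direct_sum_accelerations(pos, mass, softening=1e-3, G=1.0):
    """Return the acceleration on each particle by O(N^2) direct summation.

    pos  : list of (x, y, z) tuples
    mass : list of masses, same length as pos
    softening, G : illustrative defaults (assumptions, not from the thread)
    """
    n = len(pos)
    eps2 = softening * softening
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            # Softened squared distance avoids a singularity at close range.
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc
```

With 20,000 particles the inner loop body runs roughly 400 million times per frame, which is why this only reaches interactive rates on massively parallel hardware.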
FPS is a poor metric anyway; things like this should be measured in frame time instead. Either number is meaningless, though, without knowing the hardware it runs on.
Sure, I usually measure performance of methods like these in terms of FLOP/s; getting 50-65% of theoretical peak FLOP/s for any given CPU or GPU hardware is close to ideal.
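To make the conversion concrete, here is a rough sketch of going from a frame time to a fraction of peak FLOP/s for a direct-summation code. The figure of 20 floating-point operations per pairwise interaction is only a counting convention I am assuming here (different authors count the reciprocal square root differently), and `peak_flops` is whatever the vendor quotes for your hardware. Both helper names are hypothetical.

```python
def fps_to_frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps

def fraction_of_peak(n_particles, frame_time_s, peak_flops,
                     flops_per_interaction=20.0):
    """Estimate achieved FLOP/s as a fraction of theoretical peak.

    Assumes one full N x N force evaluation per frame and an assumed
    flop count per pairwise interaction (20 is a convention, not a law).
    """
    interactions = n_particles * n_particles
    achieved_flops = interactions * flops_per_interaction / frame_time_s
    return achieved_flops / peak_flops

# Example: 20,000 particles at 60 fps on a hypothetical 1 TFLOP/s device.
frame_time_s = fps_to_frame_time_ms(60.0) / 1000.0
print(round(fraction_of_peak(20_000, frame_time_s, 1.0e12), 2))  # prints 0.48
```

The same 60 fps would be only a few percent of peak on a device ten times faster, which is the point about raw frame rates meaning little without the hardware.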