Generating unique, random, and evenly-distributed numbers in ascending order with C++ (and in O(1) memory space)

Creating random numbers in C++ is a fairly easy task within itself. Many developers opt to use srand()
combined with the modulo operator or even use C++’s built-in functions like uniform_int_distribution
. However, a specialized problem arises when the following conditions need to be met:
- The generated numbers must be unique (no repeats).
- They need to be as evenly-distributed as possible. This way the resulting numbers aren’t incredibly skewed.
- The numbers needs to be in order (in this case, ascending) so that sorting is not necessary.
- The algorithm should be as quick as possible.
- The program needs to use as little memory space as possible.
These problems needed to be solved during development of combigen, a CLI tool that can generate combinations of an input that is heavily memory-optimized. This article discusses how I approached the problem and the eventual solution I was able to develop.
If you would like to skip down to the solution along with its explanation, scroll down to the section titled Solution. Otherwise, feel free to keep reading. The following approaches may work better for your use case and may offer some helpful guidelines for future development.
The first (and naive) approach: using srand()
One of the first things I tried just to see if it was possible was to just use a combination of srand()
and sorting to generate the random numbers I needed. The code looked something like this:
This code is fully functional and works. However, there are a lot of issues with this algorithm:
- The program needs O(n) memory space to hold these results. For 100,000
unsigned long long
values this isn’t too much, but what about 1,000,000? Or 100,000,000? Or even 1,000,000,000,000,000,000? This is way too much space to hold in RAM. - The program has to search through every single item in the vector to check if the number already exists. Worst-case scenario will be O(n) time just to check.
- The program now has to sort through the vector once all of the numbers have been generated. Although this is efficient, this still doesn’t meet our constraints.
- The numbers may be unique, but we have no guarantee if these are evenly distributed. This can be confirmed if the program is modified to use smaller numbers (e.g.
amount
is set to 10 andmax
is set to 100). - The runtime will vary greatly each time this algorithm is ran. It could be short, it could be incredibly long. This kind of performance is not acceptable.
Just for fun, I measured the time of this snippet of code (without any compiler optimizations) and it took 36.86 seconds to run.
While this may work for a small scale, this does not meet the requirements listed. Thus, a new approach will be necessary.
The second approach: using a set and C++’s built-in generators
Now that we know what not to do, let’s utilize some of the built-in features of C++. This should significantly reduce the runtime for our program. After doing some research, I wanted to approach the problem using the following tools:
set<>
— This data structure removes the need for sorting our numbers as well as checking if the data already exists. A set will only store unique values as well as maintain the order for our data.mt19937_64
— The Mersenne Twister Engine will be used to help generate random numbersrandom_device
— Cross-platform method of seeding the random engineuniform_int_distribution
— Ensure the generated numbers are evenly distributed
With all of this in mind, I developed the second approach that looked like this:
Not only does this significantly reduce the amount of time it takes to generate our random numbers, but it also allows us to achieve a more constant result.
Once again, I ran this on my test machine to measure the performance. This time, it always took roughly 0.13–0.16 seconds to run. That is a huge performance increase. Let’s also compare what prerequisites this approach meets:
- The generated numbers are indeed unique thanks to the
set<>
data structure. Plus, we avoided having to check if the generated number exists already since the data structure handles this for us. - The numbers are properly distributed thanks to
uniform_int_distribution
. - The numbers are already sorted thanks to the natures of the
set<>
data structure. - This algorithm is clearly fast as it can clearly generate numbers in O(n) time as well as give us constant runtime results.
So why continue to optimize? The only thing this approach fails to meet is to use as little memory as possible (specifically, this would require O(n) memory space). Plus, memory is incredibly cheap these days and most use-cases are really not going to require that much memory.
I’ll be completely honest, for most use-cases this approach is what you should use. Modern computers are so fast and efficient that even generating 10,000,000 numbers takes so little time (roughly 2 seconds on my machine) and the amount of memory used is only used for a short amount of time.
However, if you are trying to develop a program where CPU speed and disk spaces are your only bottleneck (which is what combigen is aimed for), then you should use the solution listed below. In this case, we will improve performance while also reducing the memory usage down to O(1).
Solution
After spending a long time researching the problem, I was able to derive an approach that allows for efficient number generation while still meeting the criteria listed at the beginning of this article. So, without further ado, here is the solution:
With the above code, we’ve now fulfilled the final requirement for our solution. The RandomIterator
class can now generate any range of numbers and still use the same amount of memory space. Thus, we’ve now reduced the memory space of the program from O(n) to O(1).
Plus, as an added bonus, I ran the program on my machine to see how long it would take to generate 1,000,000 of these random numbers. Consistently, it only took about 0.012–0.03 seconds to complete, which further improves the program’s performance.
I used a variation of this class to further development with combigen so that it could generate random combinations of data with as little memory as possible. If you would like to see how this was used, feel free to check out the library I developed called lazy-cartesian-product.
Conclusions
The best part about this solution is that the same algorithm can be used across different platforms. You can implement this same algorithm in Python, Java, C#, you name it. I wanted to make this available for people who may have needed the same (or similar) requirements I did.
This algorithm also offers a base to build off of. If you wanted to tweak it to generate numbers in descending order this can be done too. Or, this can be applied to a different number type, such as a float
or uint1024_t
. Regardless of how you use this, I hope you were able to learn something new today.