all vectors of a specific dimension summing to a particular number with elements from {0,1}

4 views (last 30 days)
This may be a number theory question, but I need to find ALL n-dimensional vectors(x_1,...,x_n) summing to j where each x_i is an element of {0,1}.
For example, for n=4 and j=2, then I need all possible ways x_1+x_2+x_3+x_4=2
Some cases that are apparent - (1, 1, 0, 0) or (1, 0, 1, 0). I need all such vectors for an arbitrary choice of n and j (including large values).
It would be nice if we could stack these vectors row by row in a matrix.
Any thoughts?
  2 Comments
the cyclist
the cyclist on 9 Aug 2016
Can you be more specific by what you consider to be "large" n? You are quickly going to run into memory problems trying to store all possible vectors.

Sign in to comment.

Answers (3)

the cyclist
the cyclist on 9 Aug 2016
Edited: the cyclist on 9 Aug 2016
This will do it, but will have memory issues when n gets "large", for some value of large.
n = 4;
j = 2;
M = dec2bin(0:2^n-1)-'0';
M(sum(M,2)~=j,:) = []
Note that this code is terrible in some senses. It actually creates every combination that does not add up to your specified sum ... and then discards them.
I ran this code for n=26 and j=13. It took about 80 seconds to complete. The final matrix requires about 2GB of memory, and the algorithm temporarily allocated about 13 GB.
n=28 will attempt to create a matrix larger than the default maximum array size.

Steven Lord
Steven Lord on 9 Aug 2016
How large are your "large values" and how much memory / time do you have?
The number of combinations are the total number of ways to select j items from n possibilities, or nchoosek(n, j). The list of indices is nchoosek(1:n, j).
inds = nchoosek(1:10, 5);
inds(77, :)
B = zeros(1, 10);
B(1, inds(77, :)) = 1 % One of the 252 candidates
C = zeros(1, 10);
C(1, inds(176, :)) = 1 % Another candidate, complementary to B
To determine how long it would take you to compute with each of those vectors at a rate of, say, 2 per second:
n = 20;
j = 10;
sec = seconds(nchoosek(n, j)/2);
hours(sec)
Depending on the values of n and j, you may want to convert seconds into minutes or years instead of hours.
  2 Comments
superkevin2009
superkevin2009 on 9 Aug 2016
The problem is n needs to be large, around 1000^2 or larger, and I think these functions won't work when n gets that large. Any thoughts?
Steven Lord
Steven Lord on 9 Aug 2016
Yes: rethink your approach for solving your problem. Let's see what would happen if j = 2.
>> years(seconds(nchoosek(1000^2, 2)))
ans =
15844.3534090365
At one vector per second, you'd need over FIFTEEN THOUSAND YEARS to process them all. And how much memory would you need to store these as full logical arrays?
>> sz = [nchoosek(1000^2, 2), 1000^2];
>> numElements = prod(sz);
>> exabyte = numElements/(1024^6)
exabyte =
0.433680435313333
You don't have that much memory available. Unless you have a machine with about 3% of Google's total storage capacity available to you.
With j = 3, you're looking at billions of years to process. I haven't checked how large j could be before you'd exceed the storage capacity of the observable universe, but I'm guessing maybe 4 or 5.

Sign in to comment.


John D'Errico
John D'Errico on 9 Aug 2016
Edited: John D'Errico on 9 Aug 2016
You are kidding, right? You have vectors in 1000^2 = 1,000,000 dimensions? And you want to generate all possible such vectors with elements that sum to a given number, where the elements may be 0 or 1?
Even if the sum is only a very small number, you will find it impossible to generate all such possible vectors, with trillions upon trillions of possibilities. And of course, each vector is itself huge.
Computers have finite capabilities. They are not infinitely capable, infinitely fast, with infinitely much RAM. People get spoiled.
You need to find a different way to solve your problem. Or you need to start working on smaller problems. Even computing all possible vectors of length only 200 (NOT 100^2 but only 200) to find all possible vectors that sum for example to 100, will be an impossible task.
I mean, how hard could that be? How many possible ways are there to generate a sum of 100. Easy, right?
nchoosek(200,100)
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
> In nchoosek (line 92)
ans =
9.05485146561032e+58
Do you really understand how large of a number 9e58 is? For example, the mass of the entire earth is only roughly 6e27 grams. There are roughly only 1e50 atoms in the earth. So lets say that every atom in the earth was able to store ONE of those vectors of length 200. Some desktop supercomputer you have. Mine is not that good. It would fall short by a factor of roughly 1 billion. You would need a billion earths to store that much information.
Thats only for vectors of length 200. You are asking about something that is exponentially worse. If you have a computer that could store one bit of information per cubic picometer, in a space the size of the observable universe, it would be insufficient to solve your problem.
Just for kicks...
9.461e+15 meters in a light year. The visible universe is say 13 billion light years across. A picometer is 1e-12 meters.
(9.5e15*13e12*1e12)^3
ans =
1.883652875e+123
It would be easy to swamp the capabilities of even that supercomputer with just a small such problem. You can want to do the impossible. Just don't expect to be successful. In this case, I hope you have a big budget for computer memory. And of course, a very fast computer.
  5 Comments
Walter Roberson
Walter Roberson on 10 Aug 2016
You could encode it in 1.66e17 * 8 bytes by packing 3 twenty-bit indices into a 64 bit integer. That would only require 1.3 exabyte. That is probably more storage than Facebook has available, but it is less storage than Google has available.

Sign in to comment.

Categories

Find more on MATLAB in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!