D.R. Kaprekar was a self-taught recreational mathematician, perhaps best known for the numbers that bear his name.
Today, I'll focus on Kaprekar's constant (as opposed to Kaprekar numbers).
The idea is a simple one, embodied in these 5 steps.
1. Take any 4 digit integer, and reduce it to its decimal digits.
2. Sort the digits in decreasing order.
3. Reverse the sequence of those digits, then recompose each of the two sorted digit sequences into a 4 digit number. If there were any 0 digits, they will become leading zeros on the smaller number. In this case, a leading zero is acceptable, and we still consider the number a 4 digit integer.
4. Subtract the smaller number from the larger. The result will never have more than 4 decimal digits. If it is less than 1000, then presume there are leading zero digits.
5. If necessary, repeat the above operation, until the result converges to a stable result, or until you see a cycle.
Since this process is deterministic, and always produces another 4 digit integer (of which there are only finitely many), it must eventually terminate either at an absorbing state or in a cycle.
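Steps 1 through 4 can be sketched as a single MATLAB anonymous function. This is only a sketch: kapStep is a hypothetical name, and passing the digit count d explicitly is an assumption made so that dec2base pads with the leading zeros the steps call for.

```matlab
% One pass of steps 1-4 for a d-digit integer, keeping leading zeros.
% kapStep is a hypothetical name used only for this sketch.
kapStep = @(n, d) sort(dec2base(n,10,d) - '0', 'descend') * 10.^(d-1:-1:0).' ...
                - sort(dec2base(n,10,d) - '0', 'ascend')  * 10.^(d-1:-1:0).';

a = kapStep(6174, 4)   % 7641 - 1467
b = kapStep(1234, 4)   % 4321 - 1234
c = kapStep(b, 4)      % 8730 - 0378
```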
For example, consider the number 6174.
7641 - 1467
ans = 6174
We get 6174 directly back. That seems rather surprising to me. But even more interesting, you will find that all 4 digit numbers (excluding the pure rep-digit numbers) always terminate at 6174 after at most a few steps. For example, if we start with 1234:
4321 - 1234
ans = 3087
8730 - 0378
ans = 8352
8532 - 2358
ans = 6174
and we see that after 3 iterations of this process, we end at 6174. Similarly, if we start with 9998, it too maps to 6174 after 5 iterations.
9998 ==> 999 ==> 8991 ==> 8082 ==> 8532 ==> 6174
Why should that happen? That is, why should 6174 always drop out in the end? Clearly, since this is a deterministic process which always produces another 4 digit integer (assuming we treat integers with a leading zero as 4 digit integers), we must end either in some cycle or at some absorbing state. But that all (non-pure rep-digit) starting points end at the same place seems just a bit surprising.
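The finite-state argument can also be checked directly: iterate from a starting value until some value repeats, then split the trajectory into its transient tail and its terminal cycle. A minimal sketch, where d, w, kap, trail, tail and cycle are illustrative names of my own choosing:

```matlab
d   = 4;                                   % number of digits (leading zeros kept)
w   = 10.^(d-1:-1:0).';                    % place values [1000;100;10;1]
kap = @(n) sort(dec2base(n,10,d)-'0','descend')*w - sort(dec2base(n,10,d)-'0','ascend')*w;

trail = 9998;                              % starting value
nxt   = kap(trail(end));
while ~ismember(nxt, trail)                % stop as soon as a value repeats
    trail(end+1) = nxt;                    %#ok<AGROW>
    nxt = kap(nxt);
end
first = find(trail == nxt, 1);             % first appearance of the repeated value
tail  = trail(1:first-1)                   % transient part of the trajectory
cycle = trail(first:end)                   % terminal cycle (length 1 = absorbing state)
```

Setting d = 2 (and re-creating w and kap, since an anonymous function captures d and w at the moment it is defined) and starting from 81 instead gives an empty tail and the 5-element cycle 81, 63, 27, 45, 9.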
I always like to start a problem by working on a simpler version, to see if it gives me some intuition about the process. I'll do the same thing here, but with two digit numbers. There are 100 possible two digit numbers, since we treat every one digit number as having a "tens" digit of 0.
N = (0:99)';
Next, form the Kaprekar mapping for 2 digit numbers. This is easier than you may think, since a few lines of vectorized code handle all possible inputs at once.
Ndig = dec2base(N,10,2) - '0';
Nmap = sort(Ndig,2,'descend')*[10;1] - sort(Ndig,2,'ascend')*[10;1];
I'll turn it into a graph, so we can visualize what happens. It also gives me an excuse to employ a very pretty set of tools in MATLAB.
G2 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,2)));
plot(G2)
Do you see what happens? All of the rep-digit numbers, like 11, 44, 55, etc., map directly to 0, and they stay there, since 0 also maps to 0. We can see that in the star on the lower right.
G2cycles = cyclebasis(G2)
G2cycles = 2x1 cell array
{1x1 cell} {1x5 cell}
G2cycles{1}
ans = 1x1 cell array
{'00'}
All other numbers eventually end up in the cycle:
G2cycles{2}
ans = 1x5 cell array
{'09'} {'45'} {'27'} {'63'} {'81'}
That is
81 ==> 63 ==> 27 ==> 45 ==> 09 ==> and back to 81
looping forever.
Another way of trying to visualize what happens with 2 digit numbers is to use symbolics. If we write any 2 digit number as 10*T + U, where I'll assume T >= U since we always sort the digits first, then one step of the process gives:
syms T U
(10*T + U) - (10*U+T)
ans = 9*T - 9*U
So after one iteration for 2 digit numbers, the result is ALWAYS 9*(T - U), a 2 digit number divisible by 9. Since T - U can only range from 0 to 9, there are only 10 possible results (00, 09, 18, ..., 81). So the 2-digit case must resolve itself rather quickly.
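The symbolic result is easy to confirm numerically on all 100 inputs, reusing the same vectorized mapping as above. In this sketch, T and U are simply the row-wise maximum and minimum digits, which is equivalent to sorting a pair of digits:

```matlab
N    = (0:99)';
Ndig = dec2base(N,10,2) - '0';
Nmap = sort(Ndig,2,'descend')*[10;1] - sort(Ndig,2,'ascend')*[10;1];

T = max(Ndig,[],2);                  % larger digit
U = min(Ndig,[],2);                  % smaller digit
assert(isequal(Nmap, 9*(T - U)))     % the symbolic result, checked on every input
vals = unique(Nmap).'                % every possible image of the 2 digit map
```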
What happens when we move to 3 digit numbers? Note that for any 3 digit number abc (without loss of generality, assume a >= b >= c), it almost looks like it reduces to the 2 digit problem, since we have abc - cba. The middle digit always cancels in the subtraction, leaving 99*(a - c). Does that mean we should expect a cycle at the end, as happens with 2 digit numbers? A simple modification to our previous code will tell us the answer.
N = (0:999)';
Ndig = dec2base(N,10,3) - '0';
Nmap = sort(Ndig,2,'descend')*[100;10;1] - sort(Ndig,2,'ascend')*[100;10;1];
G3 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,3)));   % 3 digit node names, e.g. '000', '495'
plot(G3)
This one is more difficult to visualize, since there are 1000 nodes in the graph. However, we can clearly see two disjoint groups.
We can use cyclebasis to tell us the complete story again.
G3cycles = cyclebasis(G3)
G3cycles = 2x1 cell array
{1x1 cell} {1x1 cell}
G3cycles{:}
ans = 1x1 cell array
{'000'}
ans = 1x1 cell array
{'495'}
And we see that all 3 digit numbers must either terminate at 000, or 495. For example, if we start with 181, we would see:
811 - 118
ans = 693
963 - 369
ans = 594
954 - 459
ans = 495
It will terminate there, forever trapped at 495. And cyclebasis tells us there are no other cycles besides the boring one at 000.
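The cancellation of the middle digit can be checked the same way on all 1000 three digit inputs. A sketch, where spread is my own name for a - c after sorting:

```matlab
N    = (0:999)';
Ndig = dec2base(N,10,3) - '0';
Nmap = sort(Ndig,2,'descend')*[100;10;1] - sort(Ndig,2,'ascend')*[100;10;1];

spread = max(Ndig,[],2) - min(Ndig,[],2);   % a - c after sorting
assert(isequal(Nmap, 99*spread))           % abc - cba = 99*(a - c)
vals = unique(Nmap).'                       % every possible image of the 3 digit map
```

As in the 2 digit case, only 10 results are possible after the first step, which is why the 3 digit map settles down so quickly.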
What is the maximum length of any such path to get to 495?
D3 = distances(G3,496) % Remember, MATLAB uses an index origin of 1
D3 = 1x1000
Inf 6 5 4 3 1 2 3 4 5 6 6 5 4 3 1 2 3 4 5 5 5 5 4 3 1 2 3 4 5
D3(isinf(D3)) = -inf; % some nodes can never reach 495, so they have an infinite distance
plot(D3)
The maximum number of steps needed to reach 495 is 6.
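As a cross-check that does not rely on the graph tools, a plain loop can iterate the 3 digit map from every starting value and count the steps to 495. A sketch; kap3, steps and the cap of 20 iterations are my own choices, and the cap only matters for the rep-digits, which fall to 000 and never reach 495:

```matlab
w    = [100; 10; 1];
kap3 = @(n) sort(dec2base(n,10,3)-'0','descend')*w - sort(dec2base(n,10,3)-'0','ascend')*w;

steps = nan(1000, 1);                % steps(n+1) = iterations from n to 495
for n = 0:999
    x = n;
    k = 0;
    while x ~= 495 && k < 20         % rep-digits stall at 000
        x = kap3(x);
        k = k + 1;
    end
    if x == 495
        steps(n+1) = k;
    end
end
maxSteps = max(steps)                % max ignores the NaN entries
numSix   = sum(steps == 6)
```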
find(D3 == 6) - 1
ans = 1x54
1 10 11 100 101 110 112 121 122 211 212 221 223 232 233 322 323 332 334 343 344 433 434 443 445 454 455 544 545 554
So the 3 digit number 100 required 6 iterations to eventually reach 495.
shortestpath(G3,101,496) - 1
ans = 1x7
100 99 891 792 693 594 495
I think I've rather exhausted the 3 digit case. It is now time to move on to the 4 digit problem, but we've already done the hard work: the same scheme computes the graph, and the graph theory tools do the rest.
N = (0:9999)';
Ndig = dec2base(N,10,4) - '0';
Nmap = sort(Ndig,2,'descend')*[1000;100;10;1] - sort(Ndig,2,'ascend')*[1000;100;10;1];
G4 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,4)));   % 4 digit node names, e.g. '0000', '6174'
plot(G4)
cyclebasis(G4)
ans = 2x1 cell array
{1x1 cell} {1x1 cell}
ans{:}
ans = 1x1 cell array
{'0000'}
ans = 1x1 cell array
{'6174'}
And here we see the behavior: 6174 is the one stable point, and the only non-zero ending state. There are no multi-step cycles like the one we found in the 2-digit case.
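The symbolic argument used for 2 and 3 digits extends to 4 digits. With sorted digits a >= b >= c >= d, abcd - dcba = 999*(a - d) + 90*(b - c), so the middle digits no longer cancel, and since 0 <= b - c <= a - d <= 9 there are exactly 55 possible results. A sketch that checks both statements on all 10000 inputs (p and q are my own names):

```matlab
N    = (0:9999)';
Ndig = dec2base(N,10,4) - '0';
S    = sort(Ndig,2,'descend');                        % columns hold a >= b >= c >= d
w    = [1000; 100; 10; 1];
Nmap = S*w - fliplr(S)*w;                             % fliplr(S) is the ascending sort

p = S(:,1) - S(:,4);                                  % a - d
q = S(:,2) - S(:,3);                                  % b - c
assert(isequal(Nmap, 999*p + 90*q))                   % abcd - dcba = 999*(a-d) + 90*(b-c)
numImages = numel(unique(Nmap))                       % distinct results after one step
```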
How many iterations were necessary at most before termination?
D4 = distances(G4,6175);
D4(isinf(D4)) = -inf;
plot(D4)
The plot tells the story here. The maximum number of iterations before termination is 7 for the 4 digit case.
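The same maximum can be reproduced without a graph by treating Nmap as a lookup table and advancing all 10000 starting values at once. A sketch, where steps and the 20-iteration cap are my own choices:

```matlab
N    = (0:9999)';
Ndig = dec2base(N,10,4) - '0';
w    = [1000; 100; 10; 1];
Nmap = sort(Ndig,2,'descend')*w - sort(Ndig,2,'ascend')*w;   % Nmap(n+1) = image of n

x     = N;                          % current value for every starting point
steps = nan(size(N));               % steps(n+1) = iterations from n to 6174 (NaN if never)
steps(x == 6174) = 0;
for k = 1:20
    x = Nmap(x + 1);                % advance all 10000 trajectories in one indexing step
    steps(isnan(steps) & x == 6174) = k;
end
maxSteps  = max(steps)              % max ignores the NaN entries (the rep-digits)
numReach  = nnz(~isnan(steps))
lastSeven = find(steps == 7, 1, 'last') - 1
```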
find(D4 == 7,1,'last') - 1
ans = 9985
shortestpath(G4,9986,6175) - 1
ans = 1x8
9985 4086 8172 7443 3996 6264 4176 6174
Can you go further? Are there 5 or 6 digit Kaprekar constants? Sadly, I have read that beyond 4 digits things break down a bit: there is no 5 digit (or higher) Kaprekar constant that every starting value converges to.
We can verify that fact, at least for 5 digit numbers.
N = (0:99999)';
Ndig = dec2base(N,10,5) - '0';
Nmap = sort(Ndig,2,'descend')*[10000;1000;100;10;1] - sort(Ndig,2,'ascend')*[10000;1000;100;10;1];
G5 = graph(N+1,Nmap+1,[],cellstr(dec2base(N,10,5)));   % 5 digit node names, e.g. '00000'
plot(G5)
cyclebasis(G5)
ans = 4x1 cell array
{1x1 cell} {1x2 cell} {1x4 cell} {1x4 cell}
ans{:}
ans = 1x1 cell array
{'00000'}
ans = 1x2 cell array
{'53955'} {'59994'}
ans = 1x4 cell array
{'61974'} {'63954'} {'75933'} {'82962'}
ans = 1x4 cell array
{'62964'} {'71973'} {'83952'} {'74943'}
The result here is 4 disjoint cycles. Of course the rep-digit cycle at 00000 must always stand on its own, but the other three cycles are also fully disjoint, with lengths 2, 4, and 4 respectively.
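Which cycle does each 5 digit start fall into? One way to sketch it, again using Nmap as a lookup table: push every start far enough along that it sits on its cycle, then label it by the smallest member of that cycle. The names label, reps and basinSize are mine, and the 100-step burn-in is an assumption that is more than enough here, since after one step only 55 distinct values (9999*(a-e) + 990*(b-d)) remain.

```matlab
N    = (0:99999)';
Ndig = dec2base(N,10,5) - '0';
w    = [10000; 1000; 100; 10; 1];
Nmap = sort(Ndig,2,'descend')*w - sort(Ndig,2,'ascend')*w;   % Nmap(n+1) = image of n

x = N;
for k = 1:100                   % burn-in: every trajectory ends up on its cycle
    x = Nmap(x + 1);
end
label = x;                      % label each start by the smallest member of its cycle
y = x;
for k = 1:3                     % cycle lengths are 1, 2 or 4, so 3 more steps cover an orbit
    y = Nmap(y + 1);
    label = min(label, y);
end
[reps, ~, idx] = unique(label);
basinSize = accumarray(idx, 1);
disp([reps basinSize])          % cycle representative, number of starts ending there
```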
This stems purely from some play on my part. Suppose I asked you to work with the sequence formed as 2*n*F_n + 1, where F_n is the n'th Fibonacci number. Part of me would not be surprised to find there is nothing simple we could do. But then, it costs nothing to try, and to see where MATLAB can take me in an exploratory sense.
n = sym(0:100).';
Fn = fibonacci(n);
Sn = 2*n.*Fn + 1;
Sn(1:10) % A few elements
ans = [1; 3; 5; 13; 25; 51; 97; 183; 337; 613]
For kicks, I tried asking ChatGPT. Given nothing more than the first 20 members of the sequence as integers, it decided this is a Perrin sequence and gave me a recurrence relation, but one that is in fact incorrect. A good effort from the AI, but a fail in the end.
Is there anything I can do? Try null! (Look carefully at the array generated by toeplitz. It is at least a pretty way to generate the matrix I needed.)
X = toeplitz(Sn,[1,zeros(1,4)]);
rank(X(5:end,:))
ans = 5
Hmm. So there is no nontrivial linear combination of those columns that yields all zeros, since the resulting matrix has full rank.
X = toeplitz(Sn,[1,zeros(1,5)]);
rank(X(6:end,:))
ans = 5
But if I take it one step further, we see the matrix above is now rank deficient. What does that tell me? It says there is some simple linear combination of the columns of X(6:end,:) that always yields zero. The previous test tells me there is no shorter constant coefficient recurrence relation, using fewer terms.
null(X(6:end,:))
ans = 
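For readers without the Symbolic Math Toolbox, a rough double-precision version of the same idea is possible if the sequence is kept short enough that every term is exact. This sketch swaps null for a least-squares solve with backslash (predict each term from the five before it, then round), and computes F_n with a plain loop; n = 0:15 is an arbitrary choice to keep the matrix reasonably conditioned.

```matlab
n = (0:15).';
F = zeros(size(n));
F(2) = 1;                                  % F_0 = 0, F_1 = 1
for k = 3:numel(n)
    F(k) = F(k-1) + F(k-2);
end
S = 2*n.*F + 1;                            % S_n = 2*n*F_n + 1, exact in double here

X = toeplitz(S, [1, zeros(1,5)]);          % row i holds S_{i-1}, S_{i-2}, ..., S_{i-6}
A = X(6:end, 2:6);                         % the five previous terms
b = X(6:end, 1);                           % the term to predict
c = round(A \ b).'                         % least-squares coefficients, rounded to integers
```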
Let me explain what those coefficients tell me. In fact, they yield a very nice recurrence relation for the sequence S_n, not unlike the original Fibonacci sequence it was based upon.
S(n+1) = 3*S(n) - S(n-1) - 3*S(n-2) + S(n-3) + S(n-4)
where the first 5 members of the sequence are [1 3 5 13 25]. So this is a 6 term linear constant coefficient recurrence relation. If it reminds you of the generating relation for the Fibonacci sequence, that is good, because it should. (Remember that I started the sequence at n == 0 if you decide to check it yourself.) We can test it out like this:
SfunM = memoize(@(N) Sfun(N));
SfunM(25)
ans = 3751251
2*25*fibonacci(sym(25)) + 1
ans = 
3751251
And indeed, it works as expected.
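function Sn = Sfun(n)
% S_n from the recurrence S(n+1) = 3*S(n) - S(n-1) - 3*S(n-2) + S(n-3) + S(n-4),
% seeded with S_0..S_4 = [1 3 5 13 25]. Plain recursion: the call count grows exponentially in n.
switch n
    case 0
        Sn = 1;
    case 1
        Sn = 3;
    case 2
        Sn = 5;
    case 3
        Sn = 13;
    case 4
        Sn = 25;
    otherwise
        % Same recurrence, shifted: S(n) = 3*S(n-1) - S(n-2) - 3*S(n-3) + S(n-4) + S(n-5)
        Sn = Sfun(n-5) + Sfun(n-4) - 3*Sfun(n-3) - Sfun(n-2) + 3*Sfun(n-1);
end
end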
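One caveat about the test above: memoize caches only the calls made through SfunM, while the recursive calls inside Sfun call Sfun directly, so they are not cached and the work grows exponentially with n. A simple loop avoids that. A sketch, with the array S stored so that S(k+1) holds S_k:

```matlab
S = zeros(1, 26);                 % S(k+1) holds S_k for k = 0..25
S(1:5) = [1 3 5 13 25];           % initial values S_0..S_4
for k = 6:numel(S)
    % S_m = 3*S_(m-1) - S_(m-2) - 3*S_(m-3) + S_(m-4) + S_(m-5), with m = k-1
    S(k) = 3*S(k-1) - S(k-2) - 3*S(k-3) + S(k-4) + S(k-5);
end
S25 = S(end)                      % S_25, computed with 21 loop iterations
```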
The beauty of this is that I started from nothing but a sequence of integers, derived from an expression where I had no rational expectation of finding a formula, and out dropped something pretty. I might call this explorational mathematics.
The next step of course is to go in the other direction. That is, given the derived recurrence relation, if I substitute the formula for S_n in terms of the Fibonacci numbers, can I prove it is valid in general? (Yes.) After all, without some proof, it may fail for n larger than 100. (I'm not sure how much I can cram into a single discussion, so I'll stop at this point for now. If I see interest in the ideas here, I can proceed further. For example, what was I doing with that sequence in the first place? And of course, can I prove the relation is valid? Can I do so using MATLAB?)
(I'll be honest, starting from scratch, I'm not sure it would have been obvious to find that relation, so null was hugely useful here.)
I've seen a number of linear programming questions pop up on Answers recently, showing some common failure modes of linprog that people seem not to understand.
One basic failure mode is an infeasible problem. What does this mean, and can it be resolved?
The most common failure mode seems to be an unbounded problem. What does this mean? How can it be avoided/solved/fixed? Is there some direction I can move in where the objective obviously grows without bound towards +/- inf?
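To make the infeasible and unbounded cases concrete, here is a pair of deliberately tiny problems of my own, for illustration only. It assumes the Optimization Toolbox. The linprog documentation lists exitflag -2 for no feasible point and -3 for an unbounded problem, but the exact flag reported can depend on the algorithm and on presolve, so read the printed flags rather than assuming a particular value.

```matlab
% linprog(f, A, b, Aeq, beq, lb, ub) minimizes f'*x subject to A*x <= b and lb <= x <= ub.

% Unbounded: minimize -x subject only to x >= 0; x can increase forever, so -x -> -Inf.
[~, ~, flagUnbounded] = linprog(-1, [], [], [], [], 0, []);

% Infeasible: the constraint x <= -1 contradicts the bound x >= 0.
[~, ~, flagInfeasible] = linprog(1, 1, -1, [], [], 0, []);

fprintf('unbounded case exitflag:  %d\n', flagUnbounded);
fprintf('infeasible case exitflag: %d\n', flagInfeasible);
```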
Finally, I also see questions where someone wants the tool to produce all possible solutions.
A truly good exposition of linear programming would probably turn into a complete course on the subject, and Answers is limited in how much I can write (plus I only have a finite amount of energy to keep writing). I'll try to answer each sub-question as a separate answer, but if someone else would like to offer their own take, feel free to do so as an answer, since it has been many years since I learned linear programming.