THE COMPLETE ARITHMETIZATION PROJECT - PART II: METHODS

T. E. Raptis © 2020

 

A PDF with all three parts is available here

Introduction

 

This is the second part in the presentation of a project whose main aim is to find a minimal set of functions for as many as possible of low level pattern operations without actually accessing the pattern contents. This is close to building an arithmetic analog of a RISC machine without the registers! The reader is strongly advised to read the first part due to a lot of unavoidable technical jargon from various topics without which a clear explanation of the methods and the true scope of this research would not be possible.

Ubiquity of arithmetic fractals

 

The first step in every attempt at utilizing the internal self-similarity of the Hamming hierarchy is to attempt a few runs over a set of small intervals SL and SL+1 in order to verify the persistence of any particular pattern. From then on, given a concatenation scheme as mentioned at section "Patterns without patterns" of the first part, any one map under examination gives rise to a particular sequence of outputs. This can be further processed for revealing any interesting characteristics that could show say, some recursive, self-similar structure by successive foldings over different intervals as for instance by comparingfor all k in terms of some appropriate distance for all k assuming a list of symmetry and /or algebraic operators acting as . Mostly, if such folding reveals the presence of a direct analytically expressible intermediate map which we shall call a "Reproducing Map" m over a list, then we have a situation where one can directly write a simple recursive formula for this case as it will be shown with several examples. All such cases result in the following recursion over lists (using matlab like notation)

 

(given some L0)

 

One may also need to generalize to a set of primitive maps and as

 

(given some L0 )

 

Unfortunately, there is no specific way to fully automate such a procedure since one can try an arbitrary list of symmetry and/or other algebraic operations upon this type of sequences. For similar reasons there is no direct or ad hoc criterion known to the author for predicting whether a sufficiently simple process or automaton will result in a fractal or otherwise recursive sequence that can effectively be compressed this way. As a rule of thumb, such a process should not be applied directly to functional composites the reason being that any particular simple symmetries present in intermediate sequences of outputs may be distorted or completely lost. One last observation on the usefulness of large sequences as those that will be presented is probably machine or generally, implementation dependent. Given a sufficiently large memory, early deployment of such sequences may guarantee a high throughput as a tradeoff between space and time complexity.

 

We may start with the simplest example which also takes care of another problem - that of the arbitrariness in the definition of an appropriate bit length across the hierarchy which is context dependent. Specifically, we have to define a 0th-order fundamental sequence for the natural bit length defined from the maximal power of 2 present which has an analytical expression as

 

 

Here we allow the presence of a transcendental function only for convenience of reference. It is always possible to extract this information otherwise, the simplest case being that of a precomputed list of shifts using direct comparison at least for any finite interval SL. Since exponential lengths play a significant role in our hierarchical analysis whenever we work with a maximal length L over a maximal SigmaL matrix we shall call this the Dominant length. The set of all previous instances of lengths associated with Mersenne intervals at all positions 2i - 1, i = 1, 2,..., L-1 we shall call a set of Subdominant lengths. It is possible to find a special reproducing map for all natural lengths as

 

 

At this point we must stress an important distinction with respect to the way output sequences should be evaluated. Specifically, we must discriminate between two major cases of sequences, those that we shall call "amending" over different hierarchy levels and vice versa. In a more technical context sequences that are non-amending should be termed stable across the hierarchy or simply "H-stable". This is to say that certain maps represented by certain automata may or may not be sensitive into a potentially infinite list containing zeros beyond any maximal power of 2. In any such case, one cannot iterate any recursive evaluation from any previous list by going up say like. For instance permutations as well as so called, combinatoric necklaces are sensitive to zero padding so they must be handled with care although it is possible to form also stable sequences in certain cases. Using the previous terminology, the natural length sequence is a strictly non-amending or H-stable one since it does not affect previous instances even as we go up a higher level at say, L+1.

 

There is at least one more non-amending fundamental sequence that can be recursively evaluated the same way and we shall refer to it as the 1st-order fundamental sequence. It is the well known Sum-of-Digits given as

 

 

This is trivial to find without any experimentation since it only adds a single bit for any new exponential subinterval following the natural length sequence. This particular sequence can be used to isolate any combinatoric subset for any k = 1,...,Lmax inside any SigmaLmax matrix. Notably, this is also the largest subset that can be isolated with the use of the standard binary Shannon entropy. As a matter of fact, all probabilities become also functions of the above sequence as so that they can also be arithmetized as well as the total binary entropy sequence. A word of caution here concerns the amending character of any entropy sequence in case it is not dependent on the total length being chosen inside a Hamming space but by the previous natural length sequence which differs in the estimation of p0 for each subdominant length. We can also analyze the standard Shannon's formula using the algebraic expression of the probabilities which after some manipulation can be decomposed for each integer as

 

 

where and is a complementation parity operator. This of course is only valid for the non-amening case inside the hierarchy otherwise one should use L à l(x).

 

In some cases we may be able to find also individual closed formulas, the same way one often does in other ordinary sequences like say the Fibonnaci one, the main difference being that here we are dealing with list recursions leading Non-Markovian processes with expanding memory needs. One may may also consult the famous OEIS database to check for these as well as a variety of other strange sequences admitting a variety of analytical expressions serving as individual formulas. To examine a primitive case, we may ask for the parity of each binary integer x as a string. Given a list representation of the Digit-Sum as a function DS2(x) this is always given by

 

 

Both this and the original DS2 are H-stable. It is straightforward to compute the global map of Pi and following the recursive unfolding over the Mersenne intervals one immediately recovers the simple law

 

 

Interestingly, the limit of such a sequence when interpreted as a rational 0.011001..., is known as the Thue-Morse constant and there is already a generating function for it.

 

Another such well known case with a slightly more complicated reproducing map is given in terms of the binary reflected or (N, k)-Gray code [see ]. In this case we have to introduce a symbol reversal or reflection operator R ( the fliplr command in matlab) to obtain

 

 

The above examples satisfy the conditions of a fractal sequence and the Hamming hierarchy offers an easy way to extract its contents in terms of some more or less simple maps.

 

There is a last 2D example which is sufficiently elementary that allows showing how to extend these principles beyond simple sequences given all standard bitwise Boolean operators like bitor, bitand and bitxor. It is a kind of "folk theorem" observed by many that such operations and their combinations give rise to fractal matrices. It is then possible to use the periodicities involved in the expansion of every interval to extract a simple recursive scheme for their generation as a mapping without actually invoking the original operators directly. One can see this by the use of the square pulse function which gives immediately the bit values for any power of 2 in the expansion of any pair of. A product of intervals corresponds to an algebraic product like Pk(x)Pk(y) for a biwise AND and similarly for the other two operations. Results can be summarized using the matricized form of any logical gate as

 

OR

0

1

XOR

0

1

AND

0

1

0

0

1

 

0

1

 

0

0

1

1

1

 

1

0

 

0

1

 

This immediately gives the exact form of recursion over matrices as a direct consequence of the scale invariance implied by the structure of the Hamming hierarchy in the form

 

 

In the above M0 is the initial condition provided by each of the matrix expressions of individual bit gates which also gives the exponents for each new level. It is trivial to verify the above recursion against the standard implementations via an oneliner as [x, y] = meshgrid( 0:2^L-1 ); m = bit*(x,y) where (*) stands for each of the snd, or and xor terms. We notice that at least two of the above matrices can be produced by the third and an adder since all three satisfy

 

2bitand(x, y) + bitxor(x, y) = 2bitor(x, y) - bitxor(x, y) = x + y

 

Additionally, once notices that the total bitxor matrix is solely composed of a certain class of permutations its action being automorphic over any intervals. No other closed formula is known to the author at the moment for this last case although all such can be further analyzed into the so called irreducible representations of the total permutation group. This requires an analysis beyond our scope here. Next we can examine cases where certain combinatoric objects also exhibit an inherent fractality or regularity of some other type and where the output of the main map corresponds to a whole output list out of decompositions or similar operations.

 

 

The complete run lengths tree

 

This is in fact the simplest example where only the use of the fundamental sequence of natural bit lengths is necessary. The exercise then is to extract all run lengths of all binary integers in any interval SL without decoding and reveal the fractal structure of the overall set. The trajectories of the following elementary dynamical system from any integer given its natural length as an initial condition suffices to extract its run length encoding

 

The run lengths can then be obtained from the differences plus a (mod 2) bit in front of the final list to denote whether this alternating sequence starts with a zero or a ones block. Below is a short matlab script realizing this with some additional safety for termination at zero.

 

function [seq traj] = int2rl( num, L )

bit = mod( num, 2 );

L0 = ln( num );

seq = [];

if L > L0, seq = [L – L0]; end

ints = [ num ]; new = num;

while num > 0

    new = 2^L – num – 1;

    L0 = ln( new ); num = new;

    seq = [ L – L0 ]; L = L0;

    ints = [ new ints ];

end

seq = [ bit seq ]; ints = ints( 2:end );

end

 

function L = ln(x)

% The below line can be exchanged with any other

% implementation that avoids using the log2 function

if x > 0, L = floor( log2( x ) ) + 1; end

end

 

The reason for the above is fairly simple. Using the difference between the dominant length and the subdominant ones, the trick is to isolate the first block of ones by projecting simultaneously to its complement hence the previous ones block becomes a zero block. Evaluating the difference of the resulting natural lengths one eventually ends up with the equivalent run lengths. At the same time, this leads to a shortening of the successive integers xn which makes this map contractive hence always converging towards a fixed point at zero.

 

Now we may ask what happens if we run this on all integers in any Mersenne interval and plot the trajectories simultaneously as a Boolean array indicating their positions. Here is a short script for doing this

 

L = 8, size = 2^L;

w = zeros( size ); mat = w; y(1) = 1;

for i = 1:size-1

     [ seq num ] = int2rl( i, L );

     w( num, i + 1 ) = w( num, i + 1 ) + 1;

     y( i + 1 ) = length( seq( 2:end ) );

end

figure( 1 ), imagesc( 1 - w ), colormap gray

figure( 2 ), bar(y), %trajectory lengths

 

The result in the first graph very close to a rooted binary tree as in the picture below and since this can be verified inductively, it holds for any interval. Hence the resulting run lengths set as a whole also contains an underlying arithmetic fractal structure for any interval.

 

RL.jpg

Another important fractal sequence obtained this way comes directly form the length of each run length list which is shown in the second graph below and which we may term the 2nd-Order fundamental sequence. To get an idea of how this looks over the [0,...,255] interval here is a bar plot of all lengths

 

RLdim.jpg

It is relatively trivial to examine its inner structure for recovering the exact reproducing map by direct folding for arbitrary dominant lengths to find by direct comparison the associated reproducing map as

 

 

where again, the reflection operator is used to denote a left-right flip of the previous list. There is still another interesting and stable sequence which shall be mentioned and which follows a different logic from the previous cases. This is related to the first element of any run length list which is identical to the first factor in any prime factorization of any integer x > 0 when written as x = 2C13C25C3... It is directly found by isolating the first element of all run lengths of all even integers in the interval [1,...,63] with the result shown below

 

2factors.jpg

This is a particular case of a sequence with a tree-like symmetry of which an individual closed formula is easier to derive then a reproducing map since it is always composition of L-1 periods which can be verified inductively for all levels of the hierarchy. Hence it can be written analytically as a summation over Mod( x, 2i ) for all i in the range [1,...,L-1]. While this is true for the first exponent C1 of any factorization, the reader can verify that a similar pattern holds for all Ci using the Hamming hierarchies over all prime base alphabets by using the generic formula for any Sigma matrix entries given in the fifth section of the first part but this is a problem outside our main aim here. One may ask what about the same run lengths for only the natural scale of lengths instead of the whole Hamming space but this is trivial to derive since one only has to remove the last block standing for padded zeros. The next immediately related problem comes from combinatorics. Specifically, the problem of locating the set of integer partitions for any integer utilizing only the set of all run lengths.

 

Integer partitions for free

To simplify things we may start from the fact that all run lengths in any interval SL already form the integer partitions of a given dominant length L and try to show that they also exhaust the set of possible partitions. It suffices to consider any arbitrary partition like say 1+1+...+1 = L or 1+2+3+...+n = L or anything similar and interpret the corresponding list of factors as an alternating sequence of blocks of ones and zeros. But we do know that any Sigma matrix contains any possible combination of such blocks and there can be no combinations for the same dominant length outside this construct hence the enumeration of at least all unsorted partitions must indeed be exhaustive. A technical detail that should be accounted for concerns that fact that every Mersenne interval is naturally divided in two symmetric complements by the bit-NOT involution so that these two provide the same partitions hence it suffices for every L to work only on the SL-1 interval. In the below we give the simplest possible method that extracts unsorted unique partitions.

 

function p = part( L )

size = d^(L - 1) – 1;

for i=2:size

     seq = int2rl( i, L );

     if seq(2:end) > 1, mat( i-1, 1:length( seq )-1 ) = seq(2:end); end 

end

end

 

If we really want the same set as a completely sorted and lexicographically ordered one we are forced to use some more symmetry operations that can be appended to the previous script as

 

p = flipud( unique(mat, 'rows') );

p = fliplr( unique( sort( p, 2 ), 'rows' ) );

p = flipud( sortrows(p, 1) );

pc = [];  

    for i=1:size(p, 1)

        pi = p(i, :);

        tmp = pi( p i> 0 );

        Lt = length(tmp);

        mat = [];

        for j=1:Lt

             t = circshift( tmp, [0 j] );

             mat = [mat; [t, zeros(1, n - Lt)]];

        end

        pc = [pc; unique(mat, 'rows')];

    end

 

It is then not surprising that some fractal order underlies this set too. Suffice it to say that the problem of integer partitions enters also such remote areas like group representations and particle spin in quantum field theories due to their relation with a combinatoric object known as a Young Tableau <link>. This is a topic too deep and out of scope to engage further here.

 

Arithmetizing string operations

 

Subwords

 

In order to examine more complex operations we also need to introduce some more low level operations like the sub-word extraction in a fully arithmetized form. By a slight abuse of notation we may try the following

 

* A "Low Pass" filter 'Low(x, n)' cutting the first n bits from an integer representing a string

 

* A "High Pass" filter 'High(x, m, L)' cutting the last m bits from an integer representing a string

 

* A "Band Pass" filter 'Sub(x, n, L-m, L)' extracting a substring of arbitrary length lesser than the natural bit length of its corresponding integer

 

It is obvious that the last case can be formed by a combination of the two previous ones which should also commute under an exchange of lengths as

 

Standard expressions for the first two can always be given as

 

 

Thus everything reduces to a ternary instruction set including solely shifts, additions and periodicities. Some more observations are possible regarding an interesting algebraic symmetry underlying the above filters in association with reflections or pattern bit reversals. In abstract operator language we may write this as and since R is involutive this is equivalent to . This states the obvious fact that a high pass filter on any ordered patter can be exchanged with a low pass filter for its mirror image and then inverted again while for any palindromes we must have as well as . Additionally, for any De Brujin sequence of binary integers of length L with k common bits between adjacent elements we may write a condition over any such pair as.

 

Bit rotations

For integers in the same cyclic group C for any constant length expansion there is an obvious expression moving just one symbol which can also serve as a generic decoder since one can grasp one symbol at a time. An explicit formula for a single left to right cycle step is as below the opposite being also obvious by exchange of length arguments

 

 

By convention we can set all initial conditions at the minimal values of each cycle whenever there is no ambiguity (unique elements being lesser than L). There is at least one other interesting method given via concatenation. For instance, any string say, abcd can be doubled as abcdabcd in which case, one can use the Sub function to obtain one by one all arithmetic instances of a cyclic group. Since such a concatenation is equivalent to x + 2Lx for any one only has to use a multiplier c = 2L + 1 so that all such permutations can be obtained via the formula

 

 

The combinatorial necklaces formed by any list of cyclic permutations is sensitive to the presence of zero padding when taken on a Sigma matrix across the hierarchy so that it is not something that can give rise to any ordered pattern as the ones studied in previous sections. One can still follow the natural length sequence to obtain different classes of necklaces and their Lyndon words as the corresponding generators. These can be separated using the equivalence of two criteria, one being the absence of smaller periods (power of a smaller words), the other being the number of unique elements, thus one can readily write the condition L - length( unique({xi}) ) == 0 as a sieve to separate all such. This particular action over strings does not appear to give rise to any specifically interesting sequence apart from the sequence of Lyndon generators for each group which has been extensively studied in the past. More data can be found in the OEIS database on the relevant lemma.

 

Symbolic Substitutions

There is an important application of the above for all bounded strings when trying to arithmetize symbolic substitutions. Evidently, every cycle contains a sequence of updates composed by two main actions, a pattern locator or "matched filter" that isolates each instance from the left column of any rules LUT and a replacement or update act. Specifically, in all 0 level grammars with some replacement rule as say, abcd à dba, a set of rotations of four copies of the original string in subsequent bit positions suffices to verify the presence of any particular neighborhood matching a regexp in which case the original string manipulations can be used to form the new string. Generally, for a rule LUT with a maximal number of symbols for both left and right columns kmax one only has to form a large trajectory of rotations of any candidate redex string as x à {x1, ..., xk}, |xi| = |x| where together with the arithmetic form of all columns say as and then use the total instruction

 

In the above, χ is just a Boolean indicator as a matching condition. Replacement is then just a cut-and-paste action using the previously defined string operations for any intermediate k < L for which matching occurs. Additionally one can readily arithmetize any rule LUT as a single large integer plus the self-descriptive encoding of an appropriate divisor as explained in Part I in the total 'viral' form

 

[Div, 1][0...0][L1][R1]... [LN] [RN]

 

A word of caution is due here since there is a tricky substance regarding the use of the '0' symbol. Since there may be instances of rule sets like 'w00' à '0w0' an ambiguity would occur every time we have a number of 0s at the end of any block. This can only be overcome by a shift of the symbolic alphabet implying kmax à kmax + 1. Since any reductions and derivations only care about correlations between symbols any such remains invariant with respect to a shift of the symbolic alphabet.

 

Notice also that the existence of a non-polynomial representation in the so called Zeckendorf Arithmetics based on Fibonacci numbers also allows multiplications and divisions solely based on symbolic substitutions thus allowing the reduction of a complete ALU unit out to no more than an adder plus shifts and modulos albeit, at the cost of computing arbitrarily large terms of the Fibonacci sequence. Since substitutions are truly universal operations including such constructs as the 'lambda' or combinatory calculi, the role of any underlying symmetries inherited from the wreath product structure affecting the totality of all possible axioms and derivations across a Hamming hierarchy is an important ingredient which has to be carefully dissected and analyzed further. This shall be attempted in the last part of this series.

 

Bit reversals

This trick is something that has repeatedly been used in cryptographic algorithms for enhancing randomization due to the disordered nature of the correspondence between resulting bit patterns and their integer encoding. Yet, we will be able to reveal some partial hidden order using the same technique as in the first section. Firstly, we may examine the disordered nature of such a map between integers directly as in the two graphs below where the first shows all 'bytes' in [0...255] as a sequence and the second uses a 255 x 255 using inputs and outputs as coordinates

R8.jpgR8mat.jpg

The easiest way to obtain them at once is by using two functions from the communications package, bi2de and de2bi which serve as encoders and decoders respectively so that one only has to use the oneliner bi2de( fliplr( de2bi(i,L) ) ) for each instance.

We may now utilize a different representation known from dynamical system as the method of delayed coordinates to try revealing some interesting structure underneath. For this we assume at the moment that the sequence obtained in the first graph is a time series in which case we use a plot as plot( y(1:end-1), y(2:end),'.' ) where y stands for an array holding all reversed integers. The result is shown in the below

 

Rattractor.jpg

Now this appears to hide a fairly regular situation and yet such mappings are well known to be chaotic when used for a recursion. The particular way this was constructed allows using it as a means to reach any other reversed integer directly starting from the origin or some other inside a smaller integer and yet, such a method is not really good since it asks for an increasing number of steps as we move away from an initial small interval. The question then is whether we can find a better approach for fully arithmetizing bit reversal. The answer is in the affirmative using the following trick.

 

Assume a symbolic sequence given as abcde and perform a first step of a cyclic permutation and then proceed with the same on the next subword from the right as below

 

abcdeà [e, abcd] à [ed, abc] à [edc, ab] à [edcba]

 

This now has a constant number of L-1 steps and can be directly written using solely a combination of the previous arithmetized functions for string subwords and cyclic permutations so it is effectively reducible and hence arithmetizable. We can now make a step further and ask for decomposing the previous steps for any string in any interval SL and examine its structure which will basically be a permutation over the whole interval. This can be done with the below script which shows each different step as a separate map over a whole interval

 

clc, close all, clear all

n = 8; size = 2^n - 1;

for i=1:size

    s = [0 de2bi( i, n )];

    for j=1:n-1

         seq = [s(1:j), s(n+1), s(j+2:n), s(j+1)];  

         mat(j, i) = bi2de( seq(2:n+1) );

    end

end

hold on, for i=1:n-1, plot( mat(i,:), '.' ), end, hold off

 

swapmaps.gif

This representation has now the advantage that each separate map can be expressed also analytically with an additional trick. Checking each one of them we see that they all correspond to a branching between three linear maps with a varying cross section over exponential intervals as for instance in the below plots of mat(6, :) and mat(7,:)

 

map6.jpg map7.jpg

It becomes evident that the above is just the superposition of a product of the identity with a pair of square pulse functions P(x, k) presented at the first part. The square pulses act as indicators thus "tagging"" all inputs at once with their respective intervals of validity. At the end we obtain a formula like

Notice the shift in the exponent argument which is used to start with an all ones block instead of an all zeros one while the sign change moves above or below the main diagonal. With the notation introduced previously, one should be able to obtain any integer with a reversed pattern as

.

A last observation on an essential ingredient of our approach here is as follows. It was stressed in the first part that the use of a hierarchical method for studying maps over the integers entails a kind of 2D representation of them due to their necessary pairing with an arbitrary expansion length. In this case we are forced to further increase this into an actually 3D representation including also an indicator over intervals. This technique of data tagging has some unique characteristics regarding branched functions and it shall be generalized appropriately for a much more interesting application that was prescribed as the UMUX device in the last installment of these notes.

Parenthesized languages and Dyck Words

We know from theory that all finite automata accept regular languages and by a theorem due to Chomsky and Schutzenberger, we also know that any context-free language can be represented as an intersection of two simpler ones, the first being a regular one and the second, known as a Dyck language can be defined as with each particular string known as a Dyck word. It is of course irrelevant here whether we use actual parentheses as symbols. For the purposes of arithmetization we can just as well write or.

This way we are back in the context of the Hamming hierarchy plus one or more additional conditions which can be used to provide an appropriate arithmetic sieve for the actual numbers of which the expansion corresponds to a Dyck word in much the same way that Lyndon words were previously associated with minimal generators of cyclic permutations. It is obvious that only one of the two possible encodings suffices the other one being immediately obtainable from its complement while we are forced to work with only even levels of the hierarchy. Additionally, use of the hierarchy allows investigating any underlying fractality of the resulting sets. It is also clear from the beginning that any such sequences over any interval S2L shall be non-amending due to the inherent sensitivity on the constant expansion length at each level.

There are two possible sieves, the first being possible as a decimation strategy where one locates one by one all innermost pairs of the form '01' or '10' and eliminates them until no more pairs are found and the string is exhausted otherwise it is rejected. It is more interesting to check a second one where one can make a combined use of certain simpler properties in previous sections. Eventually, any particular subset with special properties can be isolated via a product of appropriate indicators for as long as a sufficient number of necessary characteristics can be properly expressed, analytically or otherwise. This is a generic strategy for writing functional sieves over large intervals. In this case it is possible to isolate three essential properties

·         The convention used say as '(' à 1, ')' à 0 immediately restricts attention to all odd integers in any S2L. We shall refer to these as the odd and even conventions.

·         For each of the two conventions consistency demands that the opposite symbol must be the last one. This also restricts to either all odd positions of the first half of any Sigma matrix thus being in an interval [0,...,22L - 1- 1], or the other half for the even convention.

·         Equality of left and right symbols also restricts attention to the central combinatoric subset for all even orders of the hierarchy. This can immediately be located by first making use of the Digit-Sum sequence using the recursion of the first section to isolate only those instances with DS2 = k filtered out. The number of possible Dyck words per interval is known by the terms of the Catalan sequence as .

·         Lastly, using the run length analysis of the previous sections in the int2rl function given before with a slight modification in encoding. In here we need strictly negative integers for zero blocks and positive ones for all ones blocks taking all successive partial summands. This effectively allows writing the last condition as simply the total summand of the resulting sequence being positive or zero.

The above leads directly to the below scripts for the odd convention only since the other is just its complement

function [ idx dword] = alldyck( n )

m = 2*n; k = 0;

idx = zeros(1, 2^m );

idx(2:2:end) = 1; % adopt the odd convention counting from zero!

idx = idx.*( sd2( m  ) == n ); % stay inside the central factorial

for i=1:length(idx)

    if idx( i )

        rl = int2rl( i - 1, m ); % extract run lengths / mind counting from zero!

        sgn = rl(1); rl = rl(2:end);

        if sgn, rl( 2:2:end ) = -rl( 2:2:end ); else rl( 1:2:end ) = -rl( 1:2:end ); end % reverse encoding   

        if  any( cumsum( rl ) < 0  % exclude negative partial sums

            idx( i ) = 0;

        else

            k = k + 1; dword(k) = i - 1;

            disp( [ num2str(i-1),' : ', num2str( de2bi( i-1, m) ) ] )

        end

    end

end

disp(['Total of ',num2str(k),' Dyck words found: ', num2str( 100*k/choose(m, n) ),'% of the factorial'])

end

 

function s = ds2( xp )

s = 0; for i=1:xp,  s = [s, s+1];

end

 

The reader can readily verify the correct counting for various orders (n) following the Catalan sequences. The resulting output idx is again a large Boolean indicator over the whole S2n interval for the presence of integers corresponding to valid Dyck words. We can now perform again a run length analysis using the standard runlength library for binary sequences finding again an interesting symmetric structure inherited from the multiperiodic structure of the Hamming hierarchy. The below graph shows exactly this structure for the case n = 5 as a bar graph of the total run length values for the original indicator together with the absolute values of its all zero blocks. The reason for the second is that any such contains solely isolated bits thus only the distances between them is important

RLindicator.jpg0blocks.jpg

Checking the contents of this sequences one can find a repetitive spiking pattern with subdivision. For instance, many of the spikes are of the form c1 = 2k and ck = 2k-1+1, followed by terms of the form Floor( ck/2 ) but many other spikes do not appear predictable. Searching through the entire OEIS database does not revealed anything hence this particular sequence is probably not known or extensively studied as yet.

.