Saturday, June 11, 2016

What kind of strings have Kolmogorov complexity equal to their length ?

I have been doing some reading about Information Theory and Kolmogorov complexity lately. I just thought about the following problem. Consider a language of all C programs (strings) which print themselves, formally Let $$L_{self} = \{ w\,\, | w\,\, \mbox{is a C program which prints itself} \}$$.
Following is an example of a C program which prints itself (on a side-note the trick behind the construction is to use the ASCII values for quotes and newlines). Clearly the Kolmogorov complexity of each w is less than or equal to the length of w -- given that we have constructed a program of length same as the string w.
I'm wondering if this is indeed optimal ? -- that is given a w, if its possible to produce a program to generate w and size strictly less than |w|.

char qt=34;
char nl=10;
char *str="char qt=34;%cchar nl=10;%cchar *str=%c%s%c;%cint main() {%c printf(str,nl,nl,qt,str,qt,nl,nl,nl,nl);%c}%c";
int main() {
printf(str,nl,nl,qt,str,qt,nl,nl,nl,nl);
}


BTW, compile the program with
gcc -w

Saturday, March 07, 2015

Some Special Cases Of the Sequence Assembly Problem.

Sequence Assembly (SA) is a well studied problem in Bioinformatics. To quickly summarize the problem: let $\Sigma$ be some alphabet and $S =\{ s_1 , s_2 \ldots s_n\}$ be a set of strings from $\Sigma$. The goal of the SA problem is re-construct a string $s_a$ such that every $s_i \in S$ is a sub-string in $s_a$. Although its very tempting to reduce the SA problem to the Shortest Super String Problem (which happens to be $\mathcal{NP}$-hard), the repeats in genome makes the SA problem complex and ambiguous.

Any way one of my friends showed me this problem yesterday, which is a much simpler special case of the SA problem. Let $s_{\pi}$ be a some permutation (unknown) of $\Sigma$. Now given a set $S' = \{s'_1,s'_2\ldots s'_m\}$ of sub-sequences of $s_{\pi}$, we want to re-construct this un-known $s_{\pi}$. Just like the SA problem for a given $S'$ there could be several answers for $s_{\pi}$ -- in which case we want to find out the lexicographically smallest permutation. If $\Sigma_i |s'_i| = N$ the problem could be solved efficiently by reducing it to a topological sorting algorithm on a DAG. Following is a quick solution to this problem I wrote while watching the RSA vs PAK world cup cricket match -- I'm little disappointed that RSA lost despite a ferocious knock by AB. The construction of the DAG from $S'$ takes $O(Nlog(N)$ operations, however the cost of topological sorting to report the answer is still $O(N)$ operations.

// 03/06/2015: vamsik (at) engr.uconn.edu//
#include <cmath>
#include <cstdio>
#include <map>
#include <list>
#include <unordered_map>
#include <iostream>
#include <algorithm>
#include <cassert>
using namespace std;

struct CDirectedGraph{
map<unsigned int, map<unsigned int, bool> > m_aEdges;
unordered_map<unsigned int, unsigned int> m_aInDegree;

inline CDirectedGraph() : m_aEdges(), m_aInDegree() {}

inline void add_edge(unsigned int i, unsigned int j){
if((m_aEdges[i]).find(j) == (m_aEdges[i]).end()){
if(m_aInDegree.find(j) == m_aInDegree.end()){
m_aInDegree[j] = 0;
}
m_aInDegree[j]++;
(m_aEdges[i])[j] = true;
}
}

inline void top_sort(){
map<unsigned int, bool> snodes;
unsigned int v;
for(auto vItr=m_aEdges.begin(); vItr!=m_aEdges.end(); ++vItr){
if(!m_aInDegree[vItr->first]){
snodes[vItr->first] = true;
}
}

while(!snodes.empty()){
auto vItr = snodes.begin();
v = vItr->first;
auto eEnd = m_aEdges[v].end();
auto eBeg = m_aEdges[v].begin();

for(auto eItr=eBeg;  eItr!=eEnd; ++eItr){
m_aInDegree[eItr->first]--;
if(!m_aInDegree[eItr->first]){
snodes[eItr->first] = true;
}
}
printf("%u ", v);
snodes.erase(vItr);
}
}
};

int main() {
unsigned int N;
unsigned int K, pnode, cnode;
scanf("%u", &N);
CDirectedGraph diGraph;

for(unsigned int i=0; i<N; i++){
scanf("%u", &K);
scanf("%u", &pnode);
for(unsigned int j=0; j<K-1; j++){
scanf("%u", &cnode);
pnode = cnode;
}
}
diGraph.top_sort();
return 0;
}


Sunday, November 16, 2014

Pseudo Polynomial Dynamic Programming

Sub Set Sum and Integer Knapsack are few examples of combinatorial optimization problems with pseudo polynomial running time algorithms. Often we the implementation of these pseudo polynomial dynamic programming algorithms is done using a huge array to hold intermediate sub problems (e.g. $OPT[K][n]$ would indicate a presence of a subset in the first $n$ integer elements ($x_1,x_2\ldots x_n$) which sum to $K$). If $N$ is the bound on the subset value you were looking for then the space complexity would $\Theta(Nn)$ -- which is clearly exponential on the input size which is $O(n\log(N))$ bits. To build robust algorithms -- reduce the dependence on the value of $N$ -- for these pseudo polynomial algorithms, it would be efficient if we could exploit the underlying DAG (Directed Acyclic Graph) nature of any dynamic programming formulation. But that might be a little bit more code to traverse the sub-problems in a topological order. However we can make our life a little easy by using a hash table to keep track of the sub-problems, which might add to the worst case asymptotic runtime by a factor of $O(log(nN))$. On the other hand the average case of sub-problem lookup would be a constant and would run fairly efficiently in practice.

The following code is a quick illustration of using a hash tables to keep track (and also lookup) of sub-problems in pseudo polynomial dynamic programming algorithms. You can read the problem statement here . Notice the obvious generalization of the problem (scheduling with constant number of resources) from the way I have formulated the dynamic program.

// 11/15/2014: vamsik@engr.uconn.edu//
#include
#include
#include
#include
#include
#include
using namespace std;
typedef std::pair KeyType;
struct KeyTypeHasher{
inline size_t operator()(const KeyType& a) const {
return (a.first)*1729 + a.second;
}
};
typedef std::unordered_map SubProbType;
typedef typename SubProbType::iterator SubProbItr;

bool IsFeasible(const std::vector& A, unsigned int G){
SubProbType sub_problems_even, sub_problems_odd;
SubProbItr itr;

//init//
if(A[0] <= G){
sub_problems_even[KeyType(A[0], 0)] = true;
sub_problems_even[KeyType(0, A[0])] = true;
}

for(size_t i=1; i < A.size(); i++){
SubProbType& curr_problems =
(i%2) ? sub_problems_even : sub_problems_odd;
SubProbType& next_problems =
(i%2) ? sub_problems_odd : sub_problems_even;

if(curr_problems.empty()){
return false;
}
next_problems.clear();
//create new sub-problems in the next level//
for(itr = curr_problems.begin();
itr != curr_problems.end(); ++itr){
const KeyType &a = itr->first;

if( (a.first + A[i]) <= G){
next_problems[ KeyType((a.first+A[i]), a.second)] = true;
}

if( (a.second + A[i]) <= G){
next_problems[ KeyType(a.first, (a.second + A[i]))] = true;
}
}
curr_problems.clear();
}

return ((A.size())%2) ? !(sub_problems_even.empty()) : !(sub_problems_odd.empty());
}

int main() {
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
size_t T,N;
unsigned int G, a;
scanf("%lu", &T);
for(size_t i=0; i < T; i++){
std::vector A;
scanf("%lu %u",&N, &G);
for(size_t i=0; i < N; i++){
scanf("%u", &a);
A.push_back(a);
}
printf("%s\n", IsFeasible(A, G) ? "YES" : "NO");
}
return 0;
}


Monday, August 25, 2014

Planarity Puzzle

I'm obsessed with the untangling of the planar graphs. I'm now able to untangle a 30-vertex planar graph in ~240sec see below (yellow one is the final solution):

Sunday, August 10, 2014

Breaking cube into five tets

Picture says it all:

Sunday, January 05, 2014

Useful properties of the function $$f(x) = \frac{x}{x-k}$$

The function f(x) = x/(x-k) has some very useful properties -- especially when we restrict x to Integers. Let me start with a simple question about Maxima and Minima of f(x). Applying elementary Calculus may not be useful since f(x) ↦ +∞ as x ↦ k+ (goes to -∞ from the other side and is discontinuous). A quick plot on WolframAlpha reveals the vertical asymptotes [http://wolfr.am/JSXDdV].

Now lets move our attention to the case where x is restricted to Integers. First lets consider the function f(x) when x takes on integers strictly greater than k (i.e. x > k). Notice that f(x) is decreasing the maximum in this case occurs when x = k+1. This gives us the following useful fact:
$$f(n) = \frac{n}{n-k} \leq k+1 \,\,\,\,\,\,\, n \in \{ x\, | \, x > k \vee x \in \cal{I} \}$$

The above inequality is quite handy in your combinatorial analysis tool book. Let me give you a simple application of this inequality by proving that nlog(n) = O(log(n!)) (also recall that log(n!) = O(nlog(n)) which means nlog(n) = Θ(log(n!))).

$$\begin{array}{lcl} && \frac{n}{n-k} \leq k+1 \,\,\,\,\,\, k \geq 1\\ && \\ \rightarrow && n \leq (n-k)\times (k+1) \\ \rightarrow && n < (n-k) \times (k+k) = 2 (n-k)\times k && \\ &&\\ \rightarrow && n < 2(n-1)\times 1\\ \rightarrow && n < 2(n-2)\times 2\\ \rightarrow && n < 2(n-3)\times 3\\ && \ldots \\ \rightarrow && n < 2(n-(n-1))\times (n-1) = 2(1\times (n-1)\\ &&\\ &&\mbox{multiplying all the above n-1 inequalities you get the following} \\ &&\\ \rightarrow&& n^{n-1} < \displaystyle\Pi_{k=1}^{n-1} 2(n-k)\times k = 2^n ((n-1)!)^2 \\ \rightarrow&& n^n < 2^n (n!)^2 \\ \rightarrow&& \log(n^n) < \log(2^n(n!)^2) = 2\log(n!) + n \\ \rightarrow&& n\log(n) < 2\log(n!) + n < 2\log(n!) + \log(n!) = 3\log(n!) \\ \rightarrow&& n\log(n) = O(\log(n!)) \,\, \Box\\ \end{array}$$