Saturday, November 28, 2015

[IDL 2 CPP] Translate IDL Array Subscript Ranges to C++

I started a project to translate a large amount of IDL code to C++ for the sake of performance. IDL stands for Interactive Data Language. This is IDL's homepage, and here is its Wikipedia page.

The original data processing program, written in IDL, is too large to translate by hand, so I decided to develop a translator that converts it into C++ source code. I've worked on it for a couple of weeks, and today I finished the ranged array part. I think it's interesting, so I'm going to share the development experience with you.

I am using flex/bison to generate the parser. Here are the grammar rules for IDL's arrays and subscripts (ask me for sample code):

postfix_expression:
      primary_expression
    | postfix_expression '[' array_subscripts ']'
    ;

array_subscripts:
      array_subscript
    | array_subscripts ',' array_subscript
    ;

array_subscript:
      expression
    | range_or_whole_range ':' range_or_whole_range
    ;

range_or_whole_range:
      expression
    | '*'
    ;

There are no action codes here yet. These grammar rules cover simple array subscripts, which access a single element of an array, as well as subscript ranges. The simple case is easy to translate. For example, the IDL code:

a = b[1]

which would be translated to

a = b[1];

That is exactly the same code in C++. It gets more complex when the code uses a ranged array, like this:

a = b[1:10]

Since C++ has no equivalent syntax, we have to build some basic support in C++. First, there should be an object that can represent a range of an array. Say we have this:

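class ArrayDimDesc
{
public:
    int size;
    int startIndex;  // first index of the range
    int endIndex;    // last index of the range
};

class RangedArray
{
public:
    void* array;                   // the underlying array data
    int Dimensions;                // number of dimensions in use
    ArrayDimDesc arrayDimDesc[8];  // one descriptor per dimension
};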

So the IDL ranged array b[1:10] can be represented by a RangedArray object. This makes things easier: replace b[1:10] with a function call to MakeArrayRange1D(), and the whole statement can be translated just like the simple array subscript case.
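
To make the idea concrete, here is a minimal sketch of what a function like MakeArrayRange1D() might look like. It is not the translator's actual runtime: the extra `size` parameter, the reading of `size` as the number of elements in the dimension, and the helper `RangeLength` are all assumptions made for illustration. The translated code shown in this post passes only the array, the start index, and the end index.

```cpp
#include <cassert>

// Mirrors the descriptor classes above, declared as structs so the
// members are accessible from the helper functions.
struct ArrayDimDesc {
    int size;        // assumed meaning: number of elements in this dimension
    int startIndex;  // first selected index (inclusive)
    int endIndex;    // last selected index (inclusive, as in IDL)
};

struct RangedArray {
    void* array;     // borrowed pointer to the source data, not owned
    int Dimensions;  // number of dimensions in use
    ArrayDimDesc arrayDimDesc[8];
};

// Hypothetical signature: `size` is added here only so the sketch can
// check the bounds of the requested range.
RangedArray MakeArrayRange1D(void* array, int size, int start, int end) {
    assert(0 <= start && start <= end && end < size);
    RangedArray r{};
    r.array = array;
    r.Dimensions = 1;
    r.arrayDimDesc[0] = {size, start, end};
    return r;
}

// Hypothetical helper: number of elements selected along dimension `dim`.
int RangeLength(const RangedArray& r, int dim) {
    return r.arrayDimDesc[dim].endIndex - r.arrayDimDesc[dim].startIndex + 1;
}
```

With a 12-element array `b`, `MakeArrayRange1D(b, 12, 1, 10)` describes the same ten elements as the IDL expression b[1:10], without copying any data.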

Let me show you a more complex piece of IDL code:

coeffbk = where(sset.bkmask[nord:*, 1, 2:3] NE 0)

And the translated C++ code is:

coeffbk = where(MakeArrayRange3D(sset.bkmask, nord, -2, 1, -1, 2, 3) != 0);
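
In the translated call, each dimension becomes a (start, end) pair: nord:* becomes nord, -2; the scalar subscript 1 becomes 1, -1; and 2:3 stays 2, 3. So -1 marks a single index and -2 marks the `*` wildcard. The sketch below shows one way the runtime could turn such a pair into concrete bounds. The function name `ResolveRange`, the constant names, and the `size` parameter are assumptions, and the sketch only handles a sentinel in the end position.

```cpp
#include <cassert>
#include <utility>

// Sentinel values read off the translated example above.
constexpr int kScalarEnd = -1;    // subscript was a single index, e.g. "1"
constexpr int kToLastIndex = -2;  // range end was "*", e.g. "nord:*"

// Hypothetical helper: converts a (start, end) pair as emitted by the
// translator into inclusive bounds for a dimension with `size` elements.
std::pair<int, int> ResolveRange(int start, int end, int size) {
    if (end == kScalarEnd) {
        end = start;     // a scalar selects exactly one element
    } else if (end == kToLastIndex) {
        end = size - 1;  // "*" runs to the last valid index
    }
    assert(0 <= start && start <= end && end < size);
    return {start, end};
}
```

For a dimension of 10 elements, `ResolveRange(4, -2, 10)` would give the bounds 4 and 9, which is what IDL's 4:* selects.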

I haven't finished all the work yet, but the main idea is as shown above. There is still a lot of work to do:
1. Memory management
2. Introduce smart pointers?
3. Is it necessary to move the MakeArrayRange3D call in front of the statement?
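
On the first two points, one possible direction is to let the range object share ownership of the data it refers to, so the buffer stays alive as long as any range still uses it. The following is only a sketch of that idea, not a decision made in this project: `SharedRange1D` and `MakeSharedRange1D` are hypothetical names, and storing the data as a `std::vector<double>` is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical 1-D variant of RangedArray that shares ownership of its data.
struct SharedRange1D {
    std::shared_ptr<std::vector<double>> data;  // keeps the buffer alive
    int startIndex;                             // inclusive
    int endIndex;                               // inclusive

    int size() const { return endIndex - startIndex + 1; }

    // Index 0 refers to the first element of the range, not of the buffer.
    double& operator[](int i) const { return (*data)[startIndex + i]; }
};

SharedRange1D MakeSharedRange1D(std::shared_ptr<std::vector<double>> data,
                                int start, int end) {
    assert(0 <= start && start <= end &&
           end < static_cast<int>(data->size()));
    return SharedRange1D{std::move(data), start, end};
}
```

The trade-off is a reference count update every time a range is created or destroyed, which may matter in tight loops of translated code.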



==== BISON grammar rules for IDL ranged array ======================
postfix_expression:
      primary_expression
    | postfix_expression '[' array_subscripts ']'
        {
            $$ = AllocBuff();

            // check if array_subscripts is a range
            if(IsRangedSubscripts($3))
            {
                FillMinusOneToEmptyRange($3);
                // case 1: range, compose a ranged array; $3 is subscripts
                // MakeArrayRange(array, startIndex, endIndex)
                if($3->dimension == 1)
                {
                    // 1 dim
                    sprintf_s($$, TEXT_BUFFER_LEN, "MakeArrayRange1D(%s, %s, %s)",
                        $1,
                        $3->subscriptsRange[0].rangeStart,
                        $3->subscriptsRange[0].rangeEnd);
                }
                else if($3->dimension == 2)
                {
                    sprintf_s($$, TEXT_BUFFER_LEN, "MakeArrayRange2D(%s, %s, %s, %s, %s)",
                        $1,
                        $3->subscriptsRange[0].rangeStart,
                        $3->subscriptsRange[0].rangeEnd,
                        $3->subscriptsRange[1].rangeStart,
                        $3->subscriptsRange[1].rangeEnd);
                }
                else if($3->dimension == 3)
                {
                    sprintf_s($$, TEXT_BUFFER_LEN, "MakeArrayRange3D(%s, %s, %s, %s, %s, %s, %s)",
                        $1,
                        $3->subscriptsRange[0].rangeStart,
                        $3->subscriptsRange[0].rangeEnd,
                        $3->subscriptsRange[1].rangeStart,
                        $3->subscriptsRange[1].rangeEnd,
                        $3->subscriptsRange[2].rangeStart,
                        $3->subscriptsRange[2].rangeEnd);
                }
                else
                {
                    // unsupported dimension
                    yyerror("Unsupported array dimension in ranged array, dimension is %d.\n", $3->dimension);
                    YYABORT;
                }
            }
            else
            {
                // case 2: scalar, simple and easy case
                char *subscript_buf = AllocBuff();
                subscript_buf[0] = 0;
                int pos = 0;
                for(int i=0; i<$3->dimension; i++)
                {
                    // only TEXT_BUFFER_LEN - pos bytes remain after the text already written
                    sprintf_s((subscript_buf+pos), TEXT_BUFFER_LEN - pos, "[%s]", $3->subscriptsRange[i].rangeStart);
                    pos = strlen(subscript_buf);
                }

                // ##ATTENTION: IDL array subscripts are different from those of C++##
                sprintf_s($$, TEXT_BUFFER_LEN, "%s%s",
                    $1,
                    subscript_buf);

                // release subscript_buf
            }

            printf("array final code: %s\n", $$);
        }
    | postfix_expression '.' IDENTIFIER
        {
            $$ = AllocBuff();
            sprintf_s($$, TEXT_BUFFER_LEN, "%s.%s", $1, $3);
        }
    | function_call
    ;

array_subscripts:
      array_subscript
        {
            $$ = Malloc(sizeof(struct ArraySubscripts));
            $$->dimension = 1;
            strcpy($$->subscriptsRange[0].rangeStart, $1->rangeStart);
            strcpy($$->subscriptsRange[0].rangeEnd, $1->rangeEnd);

            // release $1
        }
    | array_subscripts ',' array_subscript
        {
            // keep extending the list built so far; the default $$ = $1
            // does not apply once a rule has its own action
            $$ = $1;
            strcpy($$->subscriptsRange[$$->dimension].rangeStart, $3->rangeStart);
            strcpy($$->subscriptsRange[$$->dimension].rangeEnd, $3->rangeEnd);
            $$->dimension++;
        }
    ;

array_subscript:
      expression
        {// single subscript
            struct ArraySubscript *pArraySubscript = Malloc(sizeof(struct ArraySubscript));
            strcpy(pArraySubscript->rangeStart, $1);
            pArraySubscript->rangeEnd[0] = '\0';  // empty range end marks a scalar subscript
            $$ = pArraySubscript;
        }
    | range_or_whole_range ':' range_or_whole_range
        {// Ranged subscript
            struct ArraySubscript *pArraySubscript = Malloc(sizeof(struct ArraySubscript));
            strcpy(pArraySubscript->rangeStart, $1);
            strcpy(pArraySubscript->rangeEnd, $3);
            $$ = pArraySubscript;
        }
    ;

range_or_whole_range:
      expression
    | '*' { $$ = "*"; }
    ;
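
The actions above call IsRangedSubscripts() and FillMinusOneToEmptyRange(), which are not listed in this post. Based on how the grammar uses them, they might look roughly like the sketch below. The struct layouts and the buffer sizes `kMaxDims` and `kTextLen` are assumptions; only the field names are taken from the grammar actions.

```cpp
#include <cassert>
#include <cstring>

// Assumed sizes; the real definitions are not shown in the post.
constexpr int kMaxDims = 8;
constexpr int kTextLen = 256;

struct ArraySubscript {
    char rangeStart[kTextLen];
    char rangeEnd[kTextLen];  // empty string for a scalar subscript
};

struct ArraySubscripts {
    int dimension;
    ArraySubscript subscriptsRange[kMaxDims];
};

// True if at least one dimension was written as "start:end".
bool IsRangedSubscripts(const ArraySubscripts* s) {
    for (int i = 0; i < s->dimension; ++i) {
        if (s->subscriptsRange[i].rangeEnd[0] != '\0') return true;
    }
    return false;
}

// Gives every scalar subscript an explicit "-1" end, so each dimension
// contributes a (start, end) pair to the generated MakeArrayRangeND call.
void FillMinusOneToEmptyRange(ArraySubscripts* s) {
    for (int i = 0; i < s->dimension; ++i) {
        if (s->subscriptsRange[i].rangeEnd[0] == '\0') {
            std::strcpy(s->subscriptsRange[i].rangeEnd, "-1");
        }
    }
}
```

Under this reading, a subscript list such as [nord:*, 1] counts as ranged because of its first dimension, and the second dimension is then padded to 1, -1.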