Rusty-parser optimization
In developing multi-dimensional arrays in rusty-parser project, we stumbled upon an issue that parsing take insanely slow sometimes, even with a very simple script:
var a = array([[[1]]]);
print(a, type(a));
This seemingly naive script takes more than 2 seconds to parse even in release build.
real 0m2.447s
user 0m2.446s
sys 0m0.001s
This inefficiency quickly bloats as you add another level of nested arrays. Even replacing the function call with another extra level of array:
var a = [[[[1]]]];
makes it unreasonably slow.
real 0m28.689s
user 0m28.689s
sys 0m0.000s
The cause
I knew recursive descent parser (or LL(k) parsers in general) needs to backtrack if one of the candidates does not match if there are multiple ways to parse a specific code. However, I did not expect exponential growth with simple expression like array literals.
This is important since we want to build array manipulation language and the level of nested arrays can be 3 to 4 levels quickly.
I found that we use a pattern like below pretty often.
fn array_rows(i: Span) -> IResult<Span, Vec<Vec<Expression>>> {
let (r, (mut val, last)) = pair(
many0(terminated(array_row, ws(tag(";")))), opt(array_row)
)(i)?;
if let Some(last) = last {
val.push(last);
}
Ok((r, val))
}
This means a list of array_row
separated by a semicolon, optionally with a trailing semicolon, so both [1; 2]
and [1; 2; ]
are accepted. However, this will call array_row
twice if the input has no semicolons. In itself, this will only double the number of backtracks, but nesting this kind of parsers will quickly grow the number of backtracks.
The fix is easy. Instead of try <array_row> ;
and try <array_row>
again, we can parse <array_row>
first, and concatenate ;
later.
fn array_rows(i: Span) -> IResult<Span, Vec<Vec<Expression>>> {
terminated(separated_list0(char(';'), array_row), opt(ws(char(';'))))(i)
}
It’s a bit more complicated code, but the effect is remarkable as we see later.
Measuring invokes
I injected code like below to count the calls to each parser function. Since nom
doesn’t allow us to put context variables, we need to put a global variable to keep track of number of calls. Thankfully, it’s not too difficult, even though Rust doesn’t allow mutable globals:
static ARRAY_LIT: std::sync::atomic::AtomicUsize = std::sync::atomic::AtomicUsize::new(0);
pub(crate) fn array_literal(i: Span) -> IResult<Span, Expression> {
ARRAY_LIT.fetch_add(1, Relaxed);
// ...
Ok((r, Expression::new(ExprEnum::ArrLiteral(val), span)))
}
Before the improvement to array_rows
:
array_rows: 225888
array_literal_calls: 112968
primary_expression_calls: 5421384
postfix_expression_calls: 1807132
fn_invoke_arg: 4
cmp_expr: 903566
expr_calls: 451783
After:
array_rows: 28848
array_literal_calls: 28872
primary_expression_calls: 692424
postfix_expression_calls: 230812
fn_invoke_arg: 4
cmp_expr: 115406
expr_calls: 57703
We can apply the same optimization to array_row
, which does almost the same thing as array_rows
with ,
delimiter.
array_rows: 7536
array_literal_calls: 7560
primary_expression_calls: 90504
postfix_expression_calls: 30172
fn_invoke_arg: 4
cmp_expr: 15086
expr_calls: 7543
We can further optimize cmp_expr
parser to not repeat expr
array_rows: 516
array_literal_calls: 540
primary_expression_calls: 3132
postfix_expression_calls: 1048
fn_invoke_arg: 4
fn_invoke: 1048
cmp_expr: 1048
expr_calls: 524
I also optimized postfix_expression
and assign_expr
in a similar manner, and now the performance look optimal:
array_rows: 3
array_literal_calls: 5
primary_expression_calls: 6
postfix_expression_calls: 7
fn_invoke_arg: 1
fn_invoke: 7
cmp_expr: 7
assign_expr_calls: 7