Discussion:
Bug in regex compiler for language definition files
Andrius Rinkevicius
2014-08-24 11:38:28 UTC
Olivier,
this is probably somewhat related to my earlier post. It looks like I found
a bug in the regex pattern compiler for language files.
To reproduce, create a js file with
var foo =document.body.textAnchor;
or something similar. Notice the highlighting.
Now modify the all-javascript.bfinc file so that line 1733 reads:
<element pattern="=[ ]/" is_regex="1" highlight="js-brackets">
This should be equivalent to the original, which had the pattern "= /".
However, the highlighting behavior changes after restarting bf. The change
is a little bit weird: the "document" string is not highlighted for some
reason. But if we add two spaces between = and document, then document gets
correct highlighting again.
Is this a bug? Or maybe I am doing something incorrectly?
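As a sanity check on the claimed equivalence, here is a minimal sketch using Python's re module as a stand-in. This is an assumption for illustration only: Bluefish's pattern compiler is a separate implementation, so the sketch shows what conventional regex semantics predict, not what bf actually does. Under those semantics the two forms accept exactly the same strings, and neither one matches the test line at all, so the pattern change by itself should not affect how "document" is highlighted.

```python
import re

# Test line from the report above.
line = "var foo =document.body.textAnchor;"

# Non-regex form "= /", escaped so it is treated as a literal string.
literal = re.compile(re.escape("= /"))
# Regex form "=[ ]/": a bracket expression holding a single space.
bracket = re.compile(r"=[ ]/")

samples = ["= /", "=/", "=  /", line]


def compare(texts):
    """Return (text, literal_matches, bracket_matches) for each text."""
    return [(t, bool(literal.search(t)), bool(bracket.search(t))) for t in texts]


for text, lit, br in compare(samples):
    print(repr(text), lit, br)
```

Both columns agree for every sample, which supports reading the observed difference as a side effect of how the pattern is compiled rather than of what it means.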
Andrius
Olivier Sessink
2014-08-24 19:44:18 UTC
Yes, the regex compiler is not perfect. It works well if the regex
pattern does not overlap with other patterns, but if it does overlap it
has issues. This may sound a little cryptic: all elements in a single
context are compiled into one long regex-like pattern that eventually
becomes a DFA table. If some of the DFA states overlap for multiple
subpatterns, the compiler has some trouble dealing with that.
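
To make "overlapping DFA states" less cryptic, here is a minimal sketch of the general idea: several patterns compiled together by subset construction, so that one DFA state can belong to more than one pattern at once. It is not the Bluefish compiler. The pattern names, the pattern set, and the restriction to plain sequences of character sets are all assumptions chosen to keep the example small.

```python
from collections import deque

# Hypothetical patterns. Each one is a sequence of character sets, so a
# set such as {" "} plays the role of the bracket expression [ ].
PATTERNS = {
    "assign": [{"="}],
    "regex-start": [{"="}, {" "}, {"/"}],
    "equals": [{"="}, {"="}],
}


def build_dfa(patterns):
    """Subset construction over all patterns at once.

    A DFA state is a frozenset of (pattern_name, position) pairs.
    """
    alphabet = sorted(set().union(*(s for seq in patterns.values() for s in seq)))
    start = frozenset((name, 0) for name in patterns)
    table = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in table:
            continue
        table[state] = {}
        for ch in alphabet:
            nxt = frozenset(
                (name, pos + 1)
                for name, pos in state
                if pos < len(patterns[name]) and ch in patterns[name][pos]
            )
            if nxt:
                table[state][ch] = nxt
                queue.append(nxt)
    return start, table


def shared_states(start, table):
    """States other than the start state that serve more than one pattern."""
    return [s for s in table if s != start and len({name for name, _ in s}) > 1]


def accepted(state, patterns):
    """Names of the patterns that are fully matched in this state."""
    return sorted(name for name, pos in state if pos == len(patterns[name]))


start, table = build_dfa(PATTERNS)
for state in shared_states(start, table):
    print(sorted(state), "accepts:", accepted(state, PATTERNS))
```

In this sketch the state reached after "=" is shared by all three patterns, and it is already an accepting state for one of them while the other two are still in progress. Shared states of this kind are where a combined compiler has to get the bookkeeping right.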

There are also issues with the JavaScript language file. I fixed quite a
few a while ago, but I remember that not everything was solved. There might
be error output in the console when the JavaScript file is loaded.

Olivier
--
Bluefish website http://bluefish.openoffice.nl/
Blog http://oli4444.wordpress.com/
Andrius Rinkevicius
2014-08-26 17:31:59 UTC
Olivier,
interestingly enough, I am not seeing any error messages about overlapping
patterns (if I disable the jQuery syntax; the jQuery lang file has a number
of overlaps with the generic js file, so it needs to be fixed).
I had a look at many lang files, and all of them use multiple regex
expressions and contexts. It is hard to understand why some of them work
and exactly this one does not. I guess I need to keep looking for the best
solution by trial and error.
One possible cause is that the regex pattern I am trying to use consists of
symbols from the main context: "= /". But still, it is not clear to me why
it works in non-regex form.
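One way to narrow down the trial and error is to list which elements in a context start with the same character, since those are the candidates for overlap. The sketch below is only a rough aid under stated assumptions: the XML fragment is hypothetical (only the first element comes from this thread, the other patterns and highlight names are invented), and it compares first characters literally, so a regex pattern that begins with a bracket expression or an escape would need real parsing.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical context fragment; only the first element is from the thread.
SAMPLE = """
<context>
  <element pattern="=[ ]/" is_regex="1" highlight="js-brackets" />
  <element pattern="=" highlight="js-operator" />
  <element pattern="==" highlight="js-operator" />
  <element pattern="var" highlight="js-keyword" />
</context>
"""


def group_by_first_char(xml_text):
    """Group element patterns by first character, keeping only shared ones."""
    groups = defaultdict(list)
    for element in ET.fromstring(xml_text).iter("element"):
        pattern = element.get("pattern")
        if pattern:
            groups[pattern[0]].append(pattern)
    return {ch: pats for ch, pats in groups.items() if len(pats) > 1}


print(group_by_first_char(SAMPLE))
```

For the sample above this reports the three patterns that begin with "=", which are the ones worth testing together.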
Andrius