DFASTAR is a DFA lexer generator. It reads a lexical grammar and generates a DFA state machine, in C/C++ source code, that identifies the tokens defined in that grammar. DFASTAR creates very fast lexers. It has two options for choosing lexer size: 'ts' for small and 'tm' for medium. The medium-size lexers are about 10% faster than the small lexers.
High-performance Lexers
Considerable research and effort went into making DFASTAR lexers fast. As a result, the C/C++ Lexer Speed Test shows that a DFASTAR lexer can process 31,286,000 tokens per second (in memory) when reading C/C++ source code. In this test, the DFASTAR lexer was 85% faster than a lexer created by Flex. With the 'tm' option of DFASTAR, the generated lexer reads 34,290,000 tokens per second.
Small Lexers
In the C/C++ test, DFASTAR generated a lexer as small as the one generated by Flex, but the DFASTAR lexer was 85% faster. Comparison with Flex lexers is difficult because Flex generates a program rather than a lexer. For testing, I had to manually copy code from the Flex output and paste it into the test program.
Generation & Build Time
The build time for DFASTAR lexers is short: about 2 seconds for lexical grammars that have fewer than 2,000 keywords.
Table-Driven vs Direct Code
DFASTAR and Flex generate table-driven lexers. Table-driven lexers compile and link very fast compared to direct-code lexers. The number of lines of code generated by DFASTAR is also small compared to direct-code lexers.
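To illustrate what "table-driven" means in general, here is a minimal sketch in C. It is not DFASTAR output, and every name in it (next_token, next_state, accept_token, the TOK_ and S_ constants) is hypothetical. It assumes a toy grammar with only identifiers and unsigned integers. The point is that the scanning loop stays the same small piece of code however large the grammar grows; only the tables change.

```c
#include <stddef.h>

/* Hypothetical token codes, character classes and states (not DFASTAR's). */
enum { TOK_END = 0, TOK_IDENT, TOK_NUMBER, TOK_ERROR };
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };
enum { S_START, S_IDENT, S_NUMBER, N_STATES, S_NONE = -1 };

/* Map a character to its class (a column of the transition table). */
static int char_class(unsigned char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return C_LETTER;
    if (c >= '0' && c <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Transition table: next_state[state][class]; S_NONE means "no move". */
static const signed char next_state[N_STATES][N_CLASSES] = {
    /*               letter   digit     other  */
    /* S_START  */ { S_IDENT, S_NUMBER, S_NONE },
    /* S_IDENT  */ { S_IDENT, S_IDENT,  S_NONE },
    /* S_NUMBER */ { S_NONE,  S_NUMBER, S_NONE },
};

/* Token reported when the DFA stops in each state. */
static const int accept_token[N_STATES] = { TOK_ERROR, TOK_IDENT, TOK_NUMBER };

/* Scan one token at *pp, advance *pp past it, store the lexeme length in *len. */
static int next_token(const char **pp, size_t *len) {
    const char *p = *pp;
    while (*p == ' ' || *p == '\t' || *p == '\n') p++;   /* skip whitespace */
    if (*p == '\0') { *pp = p; *len = 0; return TOK_END; }

    const char *start = p;
    int state = S_START;
    for (;;) {
        int s = next_state[state][char_class((unsigned char)*p)];
        if (s == S_NONE) break;      /* no transition: the token ends here */
        state = s;
        p++;
    }
    if (state == S_START) p++;       /* unrecognised character: consume it */

    *len = (size_t)(p - start);
    *pp = p;
    return accept_token[state];
}
```

A caller would set a pointer to the start of the input buffer and call next_token repeatedly until it returns TOK_END. A direct-code lexer would instead encode each state as its own block of branches, so the amount of generated code grows with the number of states.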
Keywords and Identifiers
DFASTAR lexers can recognize keywords and identifiers simultaneously. This is faster than classifying all words as identifiers and then doing a symbol-table lookup to discover that a word is a keyword.
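The idea can be sketched with a small hand-written DFA in C. This is only an illustration under assumptions, not DFASTAR's generated code: the grammar has a single keyword, "if", and all names (scan_word, word_next, word_accept, the W_, K_ and Q_ constants) are hypothetical. The keyword gets its own path through the states, so by the time the word ends the final state already says whether it was the keyword or an ordinary identifier, with no lookup afterwards.

```c
#include <stddef.h>

/* Hypothetical token codes, character classes and states. */
enum { W_NONE = 0, W_IDENT, W_KW_IF };
enum { K_I, K_F, K_LETTER, K_DIGIT, K_OTHER, K_CLASSES };
enum { Q_START, Q_I, Q_IF, Q_ID, Q_STATES, Q_DEAD = -1 };

/* 'i' and 'f' get their own classes because the keyword "if" uses them. */
static int word_class(unsigned char c) {
    if (c == 'i') return K_I;
    if (c == 'f') return K_F;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return K_LETTER;
    if (c >= '0' && c <= '9') return K_DIGIT;
    return K_OTHER;
}

/* word_next[state][class]; Q_DEAD means the word ends before this character. */
static const signed char word_next[Q_STATES][K_CLASSES] = {
    /*              'i'   'f'   letter digit   other  */
    /* Q_START */ { Q_I,  Q_ID, Q_ID,  Q_DEAD, Q_DEAD },
    /* Q_I     */ { Q_ID, Q_IF, Q_ID,  Q_ID,   Q_DEAD },
    /* Q_IF    */ { Q_ID, Q_ID, Q_ID,  Q_ID,   Q_DEAD },
    /* Q_ID    */ { Q_ID, Q_ID, Q_ID,  Q_ID,   Q_DEAD },
};

/* Q_I accepts as an identifier ("i"); only Q_IF accepts as the keyword. */
static const int word_accept[Q_STATES] = { W_NONE, W_IDENT, W_KW_IF, W_IDENT };

/* Scan the longest word at the start of text and store its length in *len. */
static int scan_word(const char *text, size_t *len) {
    const char *p = text;
    int state = Q_START;
    for (;;) {
        int s = word_next[state][word_class((unsigned char)*p)];
        if (s == Q_DEAD) break;
        state = s;
        p++;
    }
    *len = (size_t)(p - text);
    return word_accept[state];
}
```

With this table, scanning "if (x)" ends in the keyword state after two characters, while "iffy" passes through the keyword state and ends as an identifier. A real grammar with many keywords would have many more states, but the cost per character stays one table lookup.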