Author: José Carlos Martínez Núñez | A01639664
- Table of Contents
- Lexical Categories
- Color Schemes
- Installation
- Usage/Examples
- The implemented algorithms and their execution time
- Screenshots
- Time Complexity
- The SpeedUp by using multiple threads
- Ethical Implications that this type of technology could have on society
- Acknowledgments
- References
- Contributing
The Lexical Categories supported by this program are:
- Preprocessor Keywords
- Reserved Words
- Types
- Operators
- Booleans
- Grouping Characters
- Multi Line Comments
- Single Line Comments
- Strings
- Function Names
- Class Identifiers
- Identifiers
- Package Names
These are all matched with their respective regular expressions in the Lexer.l file.
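As a hypothetical sketch (the actual rules live in Lexer.l and may differ), a few of these categories could be matched with flex rules along these lines, where `color()` is an illustrative helper rather than the project's real API:

```lex
%%
"#include"|"#define"|"#ifdef"   { color(PREPROCESSOR, yytext);  }
"true"|"false"                  { color(BOOLEAN, yytext);       }
"//".*                          { color(LINE_COMMENT, yytext);  }
\"([^"\\\n]|\\.)*\"             { color(STRING, yytext);        }
[A-Za-z_][A-Za-z0-9_]*          { color(IDENTIFIER, yytext);    }
%%
```

Flex tries all rules against the input and picks the longest match (ties go to the earlier rule), which is why keyword rules are listed before the general identifier rule.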
The syntax-highlighter supports two color schemes for each lexical category.
```sh
# Generate Lexical Analyzer
flex Lexer.l

# Compile Generated Analyzer with the main.cpp
g++ -pthread -std=c++17 Lexer.cpp main.cpp -o syntax-highlighter

# Run the highlighter on a file or directory
./syntax-highlighter [FILE | DIRECTORY]
```

The output file will be saved in `output/[FILE | DIRECTORY].html`.
Once the program opens a file, the lexical analyzer loops through its characters. When a match is found, the resulting string is HTML-encoded: the matched characters are scanned again and every HTML-specific symbol (<, >, etc.) is replaced with its entity equivalent. Then, depending on the selected color scheme, a color is assigned to the matched string and it is written to the output file.
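The HTML-encoding step described above could look roughly like the following sketch (the function name `html_encode` is illustrative; the project's actual implementation is adapted from the Stack Overflow answer cited in the references):

```cpp
#include <cassert>
#include <string>

// Sketch of the HTML-encoding step: scan each character of a matched
// token and replace HTML-specific symbols with their entity equivalents
// before the token is written to the output file.
std::string html_encode(const std::string& token) {
    std::string out;
    out.reserve(token.size());  // at least as long as the input
    for (char c : token) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#39;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Note that `&` must be handled alongside `<` and `>`, otherwise already-encoded entities would be produced for text that merely contains ampersands.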
The output of the program is an html document with the lexical categories in their respective colors. The console output will be the execution time of the program measured in milliseconds.
Note: The execution time of the program will vary depending on your computer specs.
The following example files are located in the examples folder.
*(screenshots: source file and highlighted output)*

**“example.js” Execution Time: 2 milliseconds**

*(screenshots: source file and highlighted output)*

**“example.cpp” Execution Time: 1 millisecond**

*(screenshots: source file and highlighted output)*

**“example.c” Execution Time: 1 millisecond**
Flex (fast lexical analyzer generator) is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers (also known as “scanners” or “lexers”).
A Flex lexical analyzer usually has time complexity O(n) in the length of the input. That is, it performs a constant number of operations for each input symbol.
Apart from the code generated by the lexical analyzer, once a token is matched we run another linear pass over it to escape HTML special characters.
Since the matched tokens together cover the input, this extra work is also O(n) in total, so the overall time complexity of the program remains O(n).
This is consistent with the fast execution times shown in the previous section.
For these tests a total of 898 files were used (located in the /examples folder), spread across 320 subfolders and divided among the three languages JavaScript, C and C++.
In the single-threaded run, the total time taken to convert all files was 9313 milliseconds.
In the multi-threaded run, the total time taken to convert all files was 4634 milliseconds.
The SpeedUp of a program is calculated by the following formula:

$$S_p = \frac{T_1}{T_p}$$

Where:

- $p$ is the number of processors (or cores)
- $T_1$ is the time taken to execute the single-processor version of the program
- $T_p$ is the time taken to execute the multi-processor version of the program using $p$ processors
- Lastly, $S_p$ is the SpeedUp obtained by using $p$ processors.
Using the above formula and the measurements from the previous section, the SpeedUp with a total of 8 threads is $S_8 = T_1 / T_8 = 9313 / 4634 \approx 2.01$.
We can conclude that by using 8 threads we practically doubled the speed of our program.
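A minimal sketch of how the file list could be divided among worker threads (the names `highlight_file`, `highlight_all` and the striped work split are illustrative assumptions, not the project's actual implementation):

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Counter standing in for the real per-file work, so the sketch is testable.
std::atomic<int> files_done{0};

// Placeholder for the real per-file lexing + HTML output step.
void highlight_file(const std::string& path) {
    (void)path;
    ++files_done;
}

// Each thread t processes files t, t + num_threads, t + 2*num_threads, ...
// so the threads cover disjoint slices of the file list with no locking.
void highlight_all(const std::vector<std::string>& files, unsigned num_threads) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < files.size(); i += num_threads)
                highlight_file(files[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

Because each file is independent of the others, this kind of split needs no synchronization beyond joining the threads, which is why the observed SpeedUp is close to linear until I/O becomes the bottleneck.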
This type of technology has many uses beyond syntax highlighters. Tools like Flex let us build lexical analyzers for all kinds of purposes, from writing interpreters for other programming languages to automating processes that require token identification. One example of how these technologies can help society is analyzing laws to detect ambiguous wording, or, in the research field, analyzing dead languages from our past so that we can understand more about other civilizations. Performing these tasks with the help of computers greatly reduces the time it would take to do them manually. It is incredibly important that technologies like these are used for good and not for personal gain.
This technology has significant implications for society, as it can speed up most time-consuming tasks across programming applications. Building on the example above, it can dramatically accelerate the process of analyzing languages, letting us make the advances described above orders of magnitude faster by taking advantage of all the computing power at our disposal.
The following resources were used as example code and reference for this program:

- Funchal, G. (2011, April 14). Most efficient way to escape XML/HTML in C++ string?. Stack Overflow. https://stackoverflow.com/a/5665377
- Levine, John R.; Mason, Tony; Brown, Doug (1992). lex & yacc (2nd ed.). O’Reilly. p. 279. ISBN 1-56592-000-7. A freely available version of lex is flex.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request