While working on my first Arduino project, I needed fast communication between the Arduino board and my laptop. I was monitoring an analog input (the reading of a Hall effect sensor) and some other variables that my Arduino program was using, all at a high frequency of 10 kHz. That produced around 50 kB/s of data that I had to push through the serial connection. I finally accomplished this by modifying the wiring_serial.c file in the Arduino library.
You can download the modified file here (see below). The instructions on how to use the file are also below.
I first started interfacing with the Arduino from Java, but soon ran into the problem of standard vs. nonstandard baud rates. The standard baud rates of 57600 and 115200 baud are a poor match for the 16 MHz Arduino clock, because neither can be derived from it exactly. While 57600 is quite reliable, there are many errors in the data stream from PC to Arduino at 115200. And 57600 was just too slow for my purposes. The solution would be to use baud rates that can be derived from the 16 MHz clock using the formula
baud = 16,000,000/(16 * n);
where n = 1, 2, … is an integer. So I thought I would use 250, 500 or 1000 kilobaud (n = 4, 2 and 1). I was relieved when I found that the FT232RL chip, the USB/serial interface for the Arduino board, supports all of these. The RXTX Java library, however, does not support nonstandard baud rates. Since this is the only Java serial library for Windows I know of, I had to move on and use a different tool.
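To see why the standard rates fit the 16 MHz clock badly, the formula can be turned around: pick the integer divider n closest to the requested rate and compare the rate that actually comes out. The following is only a sketch that runs on a PC. The helper names are made up for illustration, and n is taken to start at 1, since n = 0 would divide by zero.

```c
#include <assert.h>
#include <stdint.h>

#define F_CPU_HZ 16000000UL  /* assumed 16 MHz Arduino clock */

/* Integer divider n (at least 1) that comes closest to the requested rate. */
static uint32_t nearest_divider(uint32_t baud)
{
    assert(baud > 0);
    /* F_CPU / (16 * baud), rounded to the nearest integer */
    uint32_t n = (uint32_t)((F_CPU_HZ + 8UL * baud) / (16UL * baud));
    return n == 0 ? 1 : n;
}

/* Baud rate the clock would really produce with that divider. */
static uint32_t actual_baud(uint32_t baud)
{
    return (uint32_t)(F_CPU_HZ / (16UL * nearest_divider(baud)));
}

/* Relative error in tenths of a percent; -35 means -3.5 %. */
static int32_t baud_error_permille(uint32_t baud)
{
    int32_t actual = (int32_t)actual_baud(baud);
    return (actual - (int32_t)baud) * 1000 / (int32_t)baud;
}
```

With these helpers, 115200 baud comes out about 3.5 % too slow and 57600 about 2.1 % too fast, while 250000, 500000 and 1000000 have no error at all.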
Mathematica 6 with the new dynamic functionality and the handy NETLink package was the perfect alternative. Also the built-in plotting functions came in handy.
Soon I had a 1 Mbaud link up and running. Everything worked fine when sending data from the Arduino to the PC. But bytes were dropped when sending even short commands to the Arduino. At first I thought this was a limitation of the ATmega168 microcontroller, that it simply could not process the incoming bytes fast enough. But looking at the Serial library, the file wiring_serial.c, I found that the library was simply not optimized.
The biggest problem was the use of division in the interrupt routine that reads bytes from the USART controller and stores them into a buffer. Since the ATmega168 doesn’t have an instruction for division, the division takes around 200 clock cycles. That makes the whole routine run for around 15μs (microseconds). But with 1Mbaud serial communication, there is only 10μs to process one byte. With the help of westfw on the Arduino forum, we found that there is a simple fix for this.
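The timing argument is plain arithmetic and can be checked on a PC. The sketch below assumes a 16 MHz clock and 10 bits on the wire per byte (start bit, 8 data bits, stop bit). The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_MHZ 16ULL         /* assumed 16 MHz clock: 16 cycles per microsecond */
#define BITS_PER_FRAME 10ULL  /* assumed 8N1 framing: start + 8 data + stop */

/* Time between two consecutive bytes at the given baud rate, in nanoseconds. */
static uint64_t byte_period_ns(uint32_t baud)
{
    assert(baud > 0);
    return BITS_PER_FRAME * 1000000000ULL / baud;
}

/* Execution time of a routine of the given length, in nanoseconds. */
static uint64_t cycles_to_ns(uint32_t cycles)
{
    return cycles * 1000ULL / CPU_MHZ;
}

/* Non-zero if the interrupt routine finishes before the next byte arrives. */
static int isr_keeps_up(uint32_t isr_cycles, uint32_t baud)
{
    return cycles_to_ns(isr_cycles) < byte_period_ns(baud);
}
```

A routine of about 240 cycles (15 μs) keeps up at 115200 baud, where a byte arrives roughly every 87 μs, but not at 1 Mbaud, where a byte arrives every 10 μs.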
The library uses a ring buffer that is defined in the wiring_serial.c file as
#define RX_BUFFER_SIZE 128
unsigned char rx_buffer[RX_BUFFER_SIZE];
int rx_buffer_head = 0;
int rx_buffer_tail = 0;
The head and tail are the locations of the first and the last byte of the data in the buffer. When data is inserted or removed, the head or the tail is incremented, respectively. To make sure that they wrap around and stay in the range 0 to RX_BUFFER_SIZE - 1, the following code is used
rx_buffer_tail = (rx_buffer_tail + 1) % RX_BUFFER_SIZE;
The problem is that % gets compiled as a real modulo operation, instead of the much faster bitwise and, & 127. With the indices declared as signed int, the compiler does not make this substitution on its own. The difference is around 200 clock cycles, or some 13μs, compared to 1 cycle. Also, int is not necessary, since I don’t expect anyone to use buffer sizes above 256 when there’s only 1 kB of SRAM on the ATmega168. The solution has two steps:
- define the variables as unsigned char
unsigned char rx_buffer_head = 0;
unsigned char rx_buffer_tail = 0;
- use the following code to wrap the values
rx_buffer_tail = rx_buffer_tail + 1;
rx_buffer_tail %= RX_BUFFER_SIZE;
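Put together, the two steps give a ring buffer along the following lines. This is a minimal sketch that runs on a PC, not the actual contents of wiring_serial.c: the functions rx_store, rx_read and rx_available are stand-ins I named for this example, and the assert is only there to show the invariant.

```c
#include <assert.h>

#define RX_BUFFER_SIZE 128  /* must be a power of two for the cheap wrap */

static unsigned char rx_buffer[RX_BUFFER_SIZE];
static unsigned char rx_buffer_head = 0;  /* next free slot */
static unsigned char rx_buffer_tail = 0;  /* oldest unread byte */

/* What the receive interrupt would do with a freshly received byte. */
static void rx_store(unsigned char c)
{
    unsigned char next = rx_buffer_head + 1;
    next %= RX_BUFFER_SIZE;           /* unsigned, power of two: a single AND */
    assert(next < RX_BUFFER_SIZE);    /* the wrap keeps the index in range */
    if (next != rx_buffer_tail) {     /* buffer full: the byte is dropped */
        rx_buffer[rx_buffer_head] = c;
        rx_buffer_head = next;
    }
}

/* Number of bytes waiting to be read. */
static int rx_available(void)
{
    return (unsigned char)(rx_buffer_head - rx_buffer_tail) % RX_BUFFER_SIZE;
}

/* Returns the oldest byte, or -1 if the buffer is empty. */
static int rx_read(void)
{
    if (rx_buffer_head == rx_buffer_tail)
        return -1;
    unsigned char c = rx_buffer[rx_buffer_tail];
    rx_buffer_tail = rx_buffer_tail + 1;
    rx_buffer_tail %= RX_BUFFER_SIZE;
    return c;
}
```

Because every value involved is an unsigned char and the size is a power of two, the compiler is free to replace each % with a mask, so no division routine ends up in the interrupt.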
Modifying all places with the % operation in the wiring_serial.c file results in a much improved function. I did some tests and found:
- An empty sketch (empty setup() and loop()) with the standard library takes 976 bytes. With the optimization it is only 852 bytes. That’s a difference of 124 bytes for everybody, whether or not they’re using Serial. (The reason is that the interrupt routine is always linked.) The other functions in Serial are somewhat shorter as well.
- The interrupt routine is heavily utilized when receiving data, and with the current library it takes around 250 cycles just to read one byte. That’s 15μs. When reading 10 kB/s, 15% of the processor power is spent on that. With 1 Mbaud, the routine is not fast enough to read the incoming bytes and they are dropped. Also, the standard practice is to have if (Serial.available() > 0) in the loop(), which takes as much time as the interrupt. The modified routine happily works at 1 Mbaud (1,000,000 baud). The interrupt routine takes only 4μs to execute, and that leaves plenty of time to process incoming bytes that arrive every 10μs. Serial.available() (14 cycles) and Serial.read() (18 cycles) also run much faster, taking only a few cycles.
Since these optimizations don’t change anything except that they substantially improve the speed and code size, it would be nice to modify the wiring_serial.c file accordingly in the core Arduino library.
The other nice feature would be a buffer for outgoing data. The functions Serial.print and Serial.println send all bytes at once and block the program until all bytes are sent. It can take a couple of milliseconds to send just a few bytes on slower connections like 9600 or 19200 baud. But it makes a lot of difference for faster connections as well. This can be avoided by buffering the outgoing data.
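The processor load quoted above follows directly from the byte rate and the length of the interrupt routine. Here is a small sketch of that arithmetic for a PC, assuming a 16 MHz clock. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define F_CPU_HZ 16000000ULL  /* assumed 16 MHz clock */

/* Share of CPU time spent in the receive interrupt, in tenths of a percent. */
static uint32_t isr_load_permille(uint32_t bytes_per_second, uint32_t cycles_per_byte)
{
    uint64_t busy_cycles = (uint64_t)bytes_per_second * cycles_per_byte;
    assert(busy_cycles <= F_CPU_HZ);  /* above 100 % the routine cannot keep up */
    return (uint32_t)(busy_cycles * 1000ULL / F_CPU_HZ);
}
```

For 10 kB/s and 250 cycles per byte this gives 156, that is about 15.6 %. With a 64-cycle routine (4 μs) the same data rate costs 4 %.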
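The idea of buffered output can be sketched as follows. This is not the code from my modified file, just a minimal illustration that runs on a PC: tx_enqueue, tx_enqueue_string and tx_on_register_empty are hypothetical names, and the last one stands in for the interrupt that fires when the UART is ready for the next byte.

```c
#include <assert.h>

#define TX_BUFFER_SIZE 32  /* power of two so the wrap stays cheap */

static unsigned char tx_buffer[TX_BUFFER_SIZE];
static unsigned char tx_head = 0;  /* next free slot, advanced by the writer */
static unsigned char tx_tail = 0;  /* next byte to send, advanced by the "interrupt" */

/* Queue one byte without waiting for the UART. Returns 1 on success, 0 if full. */
static int tx_enqueue(unsigned char c)
{
    unsigned char next = tx_head + 1;
    next %= TX_BUFFER_SIZE;
    if (next == tx_tail)
        return 0;  /* full: a real driver would have to wait or drop here */
    tx_buffer[tx_head] = c;
    tx_head = next;
    return 1;
}

/* Queue as much of a string as fits. Returns the number of bytes queued. */
static int tx_enqueue_string(const char *s)
{
    int queued = 0;
    assert(s != 0);
    while (*s && tx_enqueue((unsigned char)*s)) {
        s++;
        queued++;
    }
    return queued;
}

/* Stand-in for the transmit interrupt: returns the next byte to put on the
   wire, or -1 if nothing is pending. */
static int tx_on_register_empty(void)
{
    if (tx_head == tx_tail)
        return -1;
    unsigned char c = tx_buffer[tx_tail];
    tx_tail = tx_tail + 1;
    tx_tail %= TX_BUFFER_SIZE;
    return c;
}
```

The caller returns as soon as the bytes are in the buffer, and the slow part, waiting for each byte to leave the UART, happens in the background. As with the receive buffer, one slot always stays empty, so a buffer of 32 holds at most 31 pending bytes.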
Until those features are included in the Arduino libraries, I posted the modified version of wiring_serial.c here. To use it, download the file wiring_serial.c and overwrite the old version in the directory ARDUINODIR\hardware\cores\arduino. Unfortunately, I lost the file while moving to a different server. It is also very outdated by now, so you are safer downloading the updated official version.
The size of the outgoing buffer can be set by modifying the code
#define TX_BUFFER_SIZE 32
at the beginning of the file. Just replace 32 with your desired buffer size. But make sure to use a power of 2, i.e. one of the values 4, 8, 16, 32, 64, 128 or 256. With other values, the compiler can’t optimize the % statements and you are back to the old speeds. Setting the buffer size to 0 will use the original unbuffered output. That saves memory and the code that would be used by the buffered write routines, in case you don’t need buffered output.
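A wrong buffer size is easy to overlook, because the code still works and is only slow. One way to catch it is a check at compile time. The following sketch is an assumption of how that could look, not something that was in my file: IS_POWER_OF_TWO, wrap_mod, wrap_mask and check_wrap_equivalence are illustrative names.

```c
#include <assert.h>

#define TX_BUFFER_SIZE 32

/* Non-zero for 1, 2, 4, 8, ...; zero for everything else. */
#define IS_POWER_OF_TWO(n) ((n) > 0 && (((n) & ((n) - 1)) == 0))

/* Refuse to compile with a size that would bring back the slow division. */
#if TX_BUFFER_SIZE != 0 && !IS_POWER_OF_TWO(TX_BUFFER_SIZE)
#error "TX_BUFFER_SIZE must be 0 or a power of two"
#endif

#if TX_BUFFER_SIZE > 0
/* The wrap as written in the source ... */
static unsigned char wrap_mod(unsigned char i)
{
    return i % TX_BUFFER_SIZE;
}

/* ... and the mask the compiler can turn it into. */
static unsigned char wrap_mask(unsigned char i)
{
    return i & (TX_BUFFER_SIZE - 1);
}

/* Both forms agree for every possible index value. */
static void check_wrap_equivalence(void)
{
    for (unsigned i = 0; i < 256; i++)
        assert(wrap_mod((unsigned char)i) == wrap_mask((unsigned char)i));
}
#endif
```

Changing TX_BUFFER_SIZE to, say, 100 stops the build with the error message, and 0 skips the buffered helpers entirely.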
I tested and use this version on my ATmega168. The file should work on ATmega8 as well. It compiles, but I couldn’t test it. If you find any problems or have a suggestion, drop a comment.