Microcontroller C programming

Almost everyone who programs microcontrollers runs into C sooner or later; it is the most popular language among hardware programmers. There are plenty of books about the language itself, so this article skips the basics and looks instead at how effectively C can be used in embedded systems.

Good examples of efficient code, meaning faster execution and smaller size, are easy to find. Writing truly optimal code requires knowing how the compiler works. We will not analyze compilers here, but a few rules and tricks go a long way toward efficient code.

The Beginning

What should you do when the program no longer fits in memory, or some part of it is not fast enough? You might say that those parts are best written in assembler, but is that the only solution? Maybe there is a way to do it in C.
It is no secret that the same result can be achieved in different ways: some programmers prefer dynamic variables, others arrays; some prefer case, others if statements. This is programming style, and almost everyone has their own. The examples below use the AVR-GCC compiler with the optimization keys “-O0” (no optimization), “-O3” (optimize for speed), and “-Os” (optimize for code size). The “-Os” and “-O3” optimizations give similar results.

What C cannot do

C is a high-level language, which means it is not tied to particular hardware. This is why programmers have no direct access to processor resources such as the stack, flags, or registers. For example, in pure C:
• It is impossible to check whether an arithmetic operation overflowed (to check this, you have to read the overflow flag);
• It is impossible to organize multithreaded operation, because switching threads requires saving the register values that hold each thread’s state.
Of course, for those tasks we can use libraries such as io.h.
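
For the overflow case there is a common workaround that stays within portable C: unsigned arithmetic wraps around, so a sum that is smaller than one of its operands must have overflowed. The sketch below is only an illustration and is not taken from the original listings; the function name add_u8_checked is made up for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Adds two 8-bit values and reports overflow without reading the
   CPU carry flag. add_u8_checked is a hypothetical helper name. */
bool add_u8_checked(uint8_t a, uint8_t b, uint8_t *sum)
{
    *sum = (uint8_t)(a + b);   /* result wraps modulo 256 */
    return *sum < a;           /* a wrapped sum is smaller than an operand */
}
```

The comparison usually costs more than reading the hardware flag would, which is the price of staying portable.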

Harvard Architecture

Usually, we program for the von Neumann architecture (program and data share the same memory space). In the Harvard architecture, however, there are several kinds of memory (Flash, EEPROM, RAM), and they differ from each other. Traditional C provides no support for different kinds of memory. It would be convenient to write something like this:

ram char buffer[10];  // Array in RAM
disk char file[10];   // Array on disk

for (i = 0; i < 10; i++)  // Writing 10 chars '0'
{
    file[i] = '0';        // To disk
}
strncpy(buffer, file, 10);  // From disk to buffer

Because this is not supported, we have to use special functions to work with the different kinds of memory.

An array of structures or structure of arrays

The structure is one of the most convenient constructions in C. Structures make code easier to read and analyze, and they are one way to place related data in memory in a fixed order. But let’s see whether using structures is always a good idea.
For example, let’s describe 10 sensors:

struct SENSOR
{
    unsigned char state;
    unsigned char value;
    unsigned char count;
};

struct SENSOR Sensors[10];

What does the compiler do to read the value of sensor x? It multiplies x by 3 (the size of the structure) and adds 1 (the offset of the value field). So a multiplication is needed just to read one byte, which is very inefficient. It is better to use separate arrays:

unsigned char Sensor_states[10];
unsigned char Sensor_values[10];
unsigned char Sensor_counts[10];

This is less readable, but the code runs faster because no multiplication is needed. On the other hand, structures are the better choice when you need operations on whole records, such as copying.
Note that in this case the compiler replaced the multiplication with shifts, but with more complicated structures this is not possible.
The results:

==============================================================
                                  -O0              -O3
                              words  clocks    words  clocks
==============================================================
Reading from structure          16     19        12     13
Reading from array               9     12         6      7
--------------------------------------------------------------
Ratio (structure/array)        1.8    1.6       2.0    1.9
==============================================================
Copy of structure               41     81        26     42
Copy of array elements          44     55        43     49
--------------------------------------------------------------
Ratio (structure/array)        0.9    1.5       0.6    0.9
==============================================================

Listings can be viewed here.
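
The two layouts compared above can be put side by side in a small sketch. It is not the code behind the measurements; the helper names read_value_aos, read_value_soa, and value_offset_aos are invented here to make the address arithmetic visible.

```c
#include <stddef.h>

#define SENSOR_COUNT 10

/* Array of structures: field of element x sits at
   base + x * sizeof(struct SENSOR) + offset of the field. */
struct SENSOR {
    unsigned char state;
    unsigned char value;
    unsigned char count;
};
struct SENSOR Sensors[SENSOR_COUNT];

/* Structure of arrays: element x sits at base + x, no scaling needed. */
unsigned char Sensor_values[SENSOR_COUNT];

unsigned char read_value_aos(unsigned char x) { return Sensors[x].value; }
unsigned char read_value_soa(unsigned char x) { return Sensor_values[x]; }

/* Byte offset the compiler has to compute for Sensors[x].value
   (hypothetical helper, for illustration only). */
size_t value_offset_aos(size_t x)
{
    return x * sizeof(struct SENSOR) + offsetof(struct SENSOR, value);
}
```

With three one-byte fields the structure is typically 3 bytes long, so value_offset_aos(x) is the "multiply by 3 and add 1" from the text, while the separate array needs only the index itself.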

Branching “Switch”

In C, multi-way conditions are conveniently written with switch(), and the compiler optimizes this construction well. But there are some nuances: switch promotes its controlling variable to int, even if the variable is of char type. For example:

char a;

char With_switch()
{
    switch(a)
    {
        case '0': return 0;
        case '1': return 1;
        case 'A': return 2;
        case 'B': return 3;
        default:  return 255;
    }
}

char With_if()
{
    if(a=='0') return 0;
    else if(a=='1') return 1;
    else if(a=='A') return 2;
    else if(a=='B') return 3;
    else return 255;
}

The type conversion adds code size. Listings are here.
Results:

====================================
                     -O0     -O3
                    words   words
====================================
With_switch           57      33
With_if               40      25
------------------------------------
Ratio (switch/if)    1.4     1.3
====================================

Signed bytes

Since AVR-GCC version 3.2, it is no longer possible to pass a char variable to a function, or return one from it, as a single byte. char types are always widened to int.

char b;
unsigned char get_b_unsigned()
{
return b;
}
signed char get_b_signed()
{
return b;
}

After compiling with the “-O3” key, the result does not depend on the variable “b”. Listing fragment:

get_b_unsigned:
    lds r24,b     ; LSB to r24
    clr r25       ; MSB = 0
    ret

get_b_signed:
    lds r24,b     ; LSB to r24
    clr r25       ; MSB = 0
    sbrc r24,7    ; skip next instruction if bit 7 is clear (value >= 0)
    com r25       ; MSB = 0xFF otherwise
    ret

Although the result is a single byte, GCC always computes the second byte as well. If the number is unsigned, the MSB is always zero; if it is signed, the processor has to execute 2 additional instructions.
In many cases this does not play a significant role in performance, but if you know that the values will never be negative, it is better to use unsigned types. This matters most to those who like to use a lot of functions, although such code is already slower because of the frequent function calls.
To make char unsigned by default, “-funsigned-char” can be set in the makefile. This makes every plain char unsigned; otherwise, the compiler’s own default applies.
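
The difference between the two listings can be reproduced in portable C. In this sketch (the function names widen_unsigned, widen_signed, and high_byte are made up for illustration), the same byte pattern 0x90 ends up with a different upper byte depending only on the signedness of the type.

```c
/* The parameter type decides how the byte is widened to int. */
int widen_unsigned(unsigned char b) { return b; }  /* upper byte becomes 0x00 */
int widen_signed(signed char b)     { return b; }  /* upper byte copies the sign */

/* Second-lowest byte of the widened value (hypothetical helper). */
unsigned int high_byte(int v)
{
    return ((unsigned int)v >> 8) & 0xFFu;
}
```

Here high_byte(widen_unsigned(0x90)) is 0x00, while high_byte(widen_signed(-112)) is 0xFF; -112 has the same bit pattern as 0x90 in a signed byte. The extra work for the signed case corresponds to the sbrc/com pair in the listing.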

Help the compiler

Sometimes the branches of a large program have common parts, e.g., they end with the same statements: clear a buffer, increment a counter, set a flag, and so on. It is not always convenient to put those operations into one function or macro. The compiler can merge them by itself; it just needs a little help.
Let’s see an example. The function does something depending on a variable, and then performs the same operations: it increments a counter, resets the state, and adds the length to the index (this is just an example for demonstration). Let’s write a switch() statement:

void long_branch(unsigned char c)
{
    switch(c)
    {
        case 'a':
            UDR = 'A';
            count++;
            index+=length;
            state=0;
            break;
        case 'b':
            UDR = 'B';
            state=0;
            count++;
            index+=length;
            break;
        case 'c':
            UDR = 'C';
            index+=length;
            state=0;
            count++;
            break;
        default:
            error=1;
            state=0;
            break;
    }
}

Compile this with the “-O3” key. The result is 66 words. Let’s reorder the statements:

void long_branch_opt(unsigned char c)
{
    switch(c)
    {
        case 'a':
            UDR = 'A';
            count++;
            index+=length;
            state=0;
            break;
        case 'b':
            UDR = 'B';
            count++;
            index+=length;
            state=0;
            break;
        case 'c':
            UDR = 'C';
            count++;
            index+=length;
            state=0;
            break;
        default:
            error=1;
            state=0;
            break;
    }
}

The compilation gives 36 words.
What happened? Nothing special: after reordering, every branch ends with the same sequence of statements. The compiler recognizes the identical tails, compiles them once, and puts a JMP to that code in the other places. It is important that the identical statements are at the ends of the branches; otherwise, this doesn’t work.
In real programs this is not always possible, but:

  1. Sometimes it can be arranged artificially by adding code;
  2. Not all branches must end identically; there can be several groups of branches with different endings.

So code size can be reduced just by changing the order of statements.
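
The second point can be illustrated with a sketch in which the branches fall into two groups, each with its own identical tail. It is a hypothetical variation on the example above: plain variables stand in for the article’s globals (tx_reg instead of the UDR register, pos instead of index, err_flag instead of error), and whether a given compiler version actually merges the tails is an assumption that should be checked in the generated listing.

```c
/* Stand-ins for the article's globals; tx_reg is an ordinary variable
   here, not the real UART data register. */
unsigned char tx_reg, count, pos, length = 3, state = 1, err_flag;

void grouped_branch(unsigned char c)
{
    switch (c) {
    case 'a':
        tx_reg = 'A';
        count++;            /* tail of group 1 */
        pos += length;
        state = 0;
        break;
    case 'b':
        tx_reg = 'B';
        count++;            /* tail of group 1, same order */
        pos += length;
        state = 0;
        break;
    case 'x':
        tx_reg = 'X';
        err_flag = 1;       /* tail of group 2 */
        state = 0;
        break;
    default:
        tx_reg = '?';
        err_flag = 1;       /* tail of group 2, same order */
        state = 0;
        break;
    }
}
```

Keeping the statements of each tail in exactly the same order is what gives the optimizer a chance to emit each tail once.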

Why the “heaps” are needed

Many programmers like to use dynamic memory. For this purpose, a special structure is used: the heap. On computers, the heap is managed by the operating system, but on microcontrollers, where there is no operating system, the compiler creates a special segment for it. The standard library also defines the functions malloc and free for allocating and freeing memory.
Dynamic memory is sometimes convenient, but the price of that convenience is high, and when resources are limited it can be critical.
What happens when a heap is used? Let’s write a simple program that doesn’t use dynamic memory:

char a[100];

void main(void)
{
    a[30]=77;
}

The compiled code is small. A write to an array element takes two clock cycles because the address of each element is known in advance. The program size is 50 words, and data memory use is 100 bytes. The main() function executes in 9 cycles, including stack initialization.
The same program using the heap:

#include <stdlib.h>  // malloc, free

char * a;

void main(void)
{
    a=malloc(100);
    a[30]=77;
    free(a);
}

The program size is 325 words, and data memory use is 114 bytes. A write to an array element now takes 6 cycles (5 opcodes). The main() function executes in 147 cycles, including stack initialization.
The program grew by 275 words: malloc takes 157 words, free takes 104 words, and the other 14 words are spent on calling those functions. Access to array elements also becomes more complicated. The initialization of the array writes 0 to each element. The 14 extra bytes of data memory are used as follows: 10 bytes for the variables that organize the heap, 2 bytes for the pointer to the array, and 2 bytes in front of the memory block that store its size, which the free function uses.
So it is better not to use dynamic memory when resources are minimal.
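
When some run-time allocation is still needed, one possible alternative is a small pool of fixed-size blocks reserved at compile time. This is a technique swapped in for illustration, not something the measurements above cover; the names pool_alloc and pool_free and the block sizes are arbitrary.

```c
#include <stddef.h>

#define BLOCK_SIZE  16   /* bytes per block (arbitrary for this sketch) */
#define BLOCK_COUNT 4    /* number of blocks (arbitrary for this sketch) */

static unsigned char pool[BLOCK_COUNT][BLOCK_SIZE];
static unsigned char in_use[BLOCK_COUNT];

/* Returns a free block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    unsigned char i;
    for (i = 0; i < BLOCK_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return pool[i];
        }
    }
    return NULL;
}

/* Marks a block as free again; pointers outside the pool are ignored. */
void pool_free(void *p)
{
    unsigned char i;
    for (i = 0; i < BLOCK_COUNT; i++) {
        if (p == (void *)pool[i]) {
            in_use[i] = 0;
            return;
        }
    }
}
```

The bookkeeping costs one byte per block, and the memory use is fixed and visible at link time, at the price of a single block size.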

Typical errors

Let’s go through a few typical errors; knowing them helps to avoid trouble.

Reading string from flash memory

AVR-GCC cannot tell whether a pointer refers to program memory or data memory; by default it assumes RAM. To read from Flash memory, you should use a macro from the “pgmspace.h” library:

#include <io.h>
#include <pgmspace.h>

prog_char hello_str[]="Hello AVR!";

void puts(char * str)
{
    while(PRG_RDB(str) != 0)
    {
        PORTB=PRG_RDB(str++);
    }
}

void main(void)
{
    puts(hello_str);
}
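
Current avr-libc releases spell the same idea with PROGMEM and pgm_read_byte from <avr/pgmspace.h>. The sketch below uses those two names; everything else is an assumption made for illustration: the function flash_to_ram is invented, and the #else branch is a host-side fallback (not part of avr-libc) so the code can also be tried on a PC.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (assumption for off-target testing, not avr-libc). */
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#endif

const char hello_str[] PROGMEM = "Hello AVR!";

/* Copies a NUL-terminated string from flash into a RAM buffer of n bytes.
   Truncates if needed and returns the number of characters copied,
   not counting the terminating NUL. */
size_t flash_to_ram(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    if (n == 0)
        return 0;
    while (i + 1 < n) {
        char c = (char)pgm_read_byte(src + i);
        if (c == '\0')
            break;
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

Every byte goes through pgm_read_byte, so the flash address is never dereferenced as if it were a RAM pointer.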

Reading bit from port

void Wait_for_bit()
{
    while( PINB & 0x01 );
}

When optimization is turned on, the compiler evaluates (PINB & 0x01) once, stores the result in a register, and then keeps testing that register. The compiler doesn’t know that PINB can change at any moment, independently of program flow. To avoid this, use the macro from the file “sfr_defs.h” (which is included by “io.h”). For example:

void Wait_for_bit()
{
    while ( bit_is_set(PINB,0) );
}

Waiting for interrupt flag

The function has to wait until an interrupt occurs:

unsigned char flag;

void Wait_for_interrupt()
{
    while(flag==0);
    flag=0;
}

SIGNAL(SIG_OVERFLOW0)
{
    flag=1;
}

The problem is the same: the compiler doesn’t know that the flag can change at any time. The solution is to declare the variable volatile:

volatile unsigned char flag;

void Wait_for_interrupt()
{
    while(flag==0);
    flag=0;
}

SIGNAL(SIG_OVERFLOW0)
{
    flag=1;
}

Delay

This function is meant to produce a delay:

void Big_Delay()
{
    long i;
    for(i=0;i<1000000;i++);
}

The problem is hidden in compiler optimization. The compiler sees that the function doesn’t do anything: it returns no value and changes no global or local variables. Such a function can be optimized down to nothing, although in practice the compiler leaves several cycles.
To avoid this, use a macro from “delay.h”, or put an assembler instruction inside the loop to force the compiler to generate the full loop:

#define nop() {asm("nop");}

void Big_Delay()
{
    long i;
    for(i=0;i<1000000;i++) nop();
}
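
Another way to keep the loop, shown here as a sketch rather than as the article’s method, is to declare the counter volatile. Every read and write of a volatile object has to be performed, so the compiler cannot drop the loop. The function name busy_wait is made up, and the actual delay per iteration depends on the clock and the generated code, so it has to be calibrated on the target.

```c
/* Spins for the given number of iterations and returns the final count.
   The volatile counter forces every increment and comparison to happen. */
unsigned long busy_wait(unsigned long iterations)
{
    volatile unsigned long i;
    for (i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
    return i;
}
```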

Source:
https://myavr.narod.ru/c_style.htm
