circuitcellar.com

Issue 146 September 2002
Build Your Own 8051 Web Server


RUN-TIME PROFILING

Web pages are for human consumption, and to a person a 100-ms response time feels snappy. Using a Finisar (formerly Shomiti; go to www.finisar.com for more information) Surveyor Lite, I measured the time my 8051 web server took to serve the 7-KB web page to a 500-MHz Pentium machine running MSIE 5.5 [2]. It took 60 ms. By comparison, my 100-MHz Pentium server running Apache takes about 32 ms to serve the same page, so the 8051’s response is respectable. For these measurements, I cleared the browser’s page cache before each run to make sure the browser was actually transferring the web page rather than just displaying a cached copy.

To find out how much time my web server spent on various tasks, I added debug code that set a port pin when the CPU began a task and cleared the pin when it finished. Because many of the tasks I was interested in run several times while the page is being served, I had to add up all of the pulse on-times. That is hard to do with an oscilloscope, so I used an 82C54 timer chip. The 80C51 port pin drives the 82C54 gate input; while the gate is high, the 82C54 counts transitions from a 1-MHz oscillator, which yields the accumulated pulse width with 1-µs resolution. I set up a second 82C54 counter to count the number of times an event ran. Table 2 summarizes the run times.
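The 82C54 arrangement amounts to a gated accumulator plus an event counter. The host-side C model below only illustrates that bookkeeping; it is not code that ran on the 8051 or the timer. It assumes the debug pin is sampled once per microsecond (standing in for the 1-MHz clock) and that the second counter registers one event per low-to-high transition of the pin. The names `profile_t` and `profile_pin()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two-counter setup. */
typedef struct {
    uint32_t on_time_us;  /* accumulated pulse width, 1 tick = 1 us */
    uint32_t runs;        /* number of times the task started */
} profile_t;

/* pin[i] is the debug-pin level sampled at microsecond i. */
profile_t profile_pin(const uint8_t *pin, size_t n_us)
{
    profile_t p = {0, 0};
    uint8_t prev = 0;

    for (size_t i = 0; i < n_us; i++) {
        if (pin[i]) {
            p.on_time_us++;      /* gate high: count this 1-MHz tick */
            if (!prev)
                p.runs++;        /* assumed: rising edge = task started */
        }
        prev = pin[i];
    }
    return p;
}
```

For a pin trace of `0,1,1,1,0,0,1,1,0,1`, the model reports 6 µs of accumulated on-time over 3 runs.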

The total of 47.4 ms falls short of the 60 ms it takes to transfer the page. When I added up the intervals between the 8051 sending an Ethernet frame and the browser’s 500-MHz Pentium responding, I came up with 8.5 ms, which accounts for most of the difference. It’s mind-boggling, but true: the 8051 spends time waiting for a Pentium.
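Spelled out, the arithmetic behind "most of the difference" is as follows; the last line is simply time these measurements did not attribute to any task.

```latex
\begin{aligned}
60\ \text{ms} - 47.4\ \text{ms} &= 12.6\ \text{ms} && \text{(time not covered by Table 2)}\\
\frac{8.5\ \text{ms}}{12.6\ \text{ms}} &\approx 0.67 && \text{(share spent waiting for the browser's responses)}\\
12.6\ \text{ms} - 8.5\ \text{ms} &= 4.1\ \text{ms} && \text{(still unattributed)}
\end{aligned}
```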

Searching for and replacing tags is the most time-consuming task. My web page uses tags as placeholders for dynamic values, such as the temperature. As the server sends the page, it searches for these tags and replaces each one with the appropriate value.
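As an illustration of the search-and-replace step, here is a minimal C sketch built on strstr() and memcpy(). It assumes a NUL-terminated page template already in RAM and handles one tag string per call; the function name `fill_tags()` and the `<!--TEMP-->` tag syntax in the usage example are hypothetical, not the article’s actual code or tag format.

```c
#include <stddef.h>
#include <string.h>

/* Copy the NUL-terminated template tmpl into out, replacing every
   occurrence of tag with value. Returns the output length, or 0 if the
   tag is empty or out (out_size bytes) is too small. */
size_t fill_tags(char *out, size_t out_size, const char *tmpl,
                 const char *tag, const char *value)
{
    size_t tag_len = strlen(tag);
    size_t val_len = strlen(value);
    size_t n = 0;
    const char *p = tmpl;
    const char *hit;

    if (tag_len == 0)
        return 0;
    while ((hit = strstr(p, tag)) != NULL) {  /* the expensive scan */
        size_t chunk = (size_t)(hit - p);
        if (n + chunk + val_len >= out_size)
            return 0;
        memcpy(out + n, p, chunk);            /* text before the tag */
        n += chunk;
        memcpy(out + n, value, val_len);      /* dynamic value */
        n += val_len;
        p = hit + tag_len;                    /* resume after the tag */
    }
    size_t rest = strlen(p);
    if (n + rest >= out_size)
        return 0;
    memcpy(out + n, p, rest + 1);             /* tail plus terminating NUL */
    return n + rest;
}
```

For example, `fill_tags(buf, sizeof buf, "<p>T=<!--TEMP--> C</p>", "<!--TEMP-->", "21.5")` writes `<p>T=21.5 C</p>`. Each strstr() call resumes just past the previous match, but it still walks the text character by character, which is where the time goes.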

It turns out that the strstr() function is the time hog, and after some investigation I found that this is true of strstr() in general. That makes sense: it has to scan a lot of text, comparing each character of the text against the corresponding character of the search string, and it may find many partial matches before it finally finds a complete one. One way to speed up the process would be to tightly limit the range of text strstr() has to search. Another approach would be to keep an index of the offsets of the tags, but the index would need to be rebuilt each time a page was added or modified.
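The index-of-offsets idea can be sketched as two steps: a one-time scan that records where each tag sits (rerun whenever the page is added or modified), and a serve-time fill that uses only block copies between the recorded offsets. This is a hypothetical design under simple assumptions (one tag string, at most `MAX_TAGS` occurrences, page held in memory as a C string); `tag_index_t`, `build_tag_index()`, and `fill_indexed()` are illustrative names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TAGS 8  /* hypothetical limit for this sketch */

/* Offsets of each tag occurrence, in ascending order. */
typedef struct {
    size_t count;
    size_t offset[MAX_TAGS];
} tag_index_t;

/* One-time scan. Returns 0 on success, -1 if the tag is empty
   or occurs more than MAX_TAGS times. */
int build_tag_index(tag_index_t *idx, const char *page, const char *tag)
{
    size_t tag_len = strlen(tag);
    const char *p = page, *hit;

    idx->count = 0;
    if (tag_len == 0)
        return -1;
    while ((hit = strstr(p, tag)) != NULL) {
        if (idx->count == MAX_TAGS)
            return -1;
        idx->offset[idx->count++] = (size_t)(hit - page);
        p = hit + tag_len;
    }
    return 0;
}

/* Serve-time fill: no searching, only copies between known offsets.
   Returns the output length, or 0 if out is too small. */
size_t fill_indexed(char *out, size_t out_size, const char *page,
                    size_t page_len, const tag_index_t *idx,
                    size_t tag_len, const char *value)
{
    size_t val_len = strlen(value), n = 0, pos = 0;

    for (size_t i = 0; i < idx->count; i++) {
        size_t chunk = idx->offset[i] - pos;
        if (n + chunk + val_len >= out_size)
            return 0;
        memcpy(out + n, page + pos, chunk);    /* static text */
        n += chunk;
        memcpy(out + n, value, val_len);       /* dynamic value */
        n += val_len;
        pos = idx->offset[i] + tag_len;        /* skip over the tag */
    }
    size_t rest = page_len - pos;
    if (n + rest >= out_size)
        return 0;
    memcpy(out + n, page + pos, rest);
    out[n + rest] = '\0';
    return n + rest;
}
```

The trade-off is the one the text describes: serving avoids strstr() entirely, but the index goes stale as soon as the page text changes.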

The second most time-consuming task is copying the web page from flash memory to RAM with memcpy(). Why not skip this step and copy directly from flash memory to the CS8900A? Again, the tags are the problem: they must be replaced with actual values, and you can’t replace them while they are in flash memory. A faster approach might be to copy directly from flash memory to the CS8900A, looking for tags along the way. But then you would have a thorny problem with the TCP checksum: it is computed over the entire segment, yet it must be inserted in the TCP header at the beginning of the segment, ahead of the data it covers.

It’s interesting to note that a checksum is computed a whopping 38 times to transfer a single web page. The transfer is made up of 19 Ethernet frames, 11 from my 8051 server and eight from the browser: three frames to establish the connection, two to transfer the HTML page, eight to transfer the image, and six that are just acks. Every frame, incoming or outgoing, needs two checksums, one for the IP header and one for the TCP segment, so 19 frames × 2 = 38 checksums. I was glad I wrote the checksum code in assembler!
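For reference, the checksum used in both the IP header and the TCP segment is the Internet checksum defined in RFC 1071: the ones’-complement of the ones’-complement sum of the data taken as 16-bit big-endian words. The portable C version below is a sketch for clarity, not the author’s 8051 assembler routine. The TCP checksum also covers a pseudo-header (source and destination IP addresses, protocol, and TCP length), which this single-buffer sketch does not handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes of data (RFC 1071 style).
   A 32-bit accumulator is ample for Ethernet-sized buffers. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];  /* next big-endian word */
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;              /* odd byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);         /* fold carries back in */
    return (uint16_t)~sum;
}
```

A convenient property on the receive side: recomputing the checksum over a header whose checksum field is already correct yields 0, so verification needs no separate comparison.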

I can’t help but wonder how much a 16-bit CPU would speed things up simply by virtue of its 16-bit word size. The checksum would certainly run faster, because the sum is done over 16-bit chunks, and the CS8900A’s I/O is 16 bits wide. Other tasks, such as memcpy() and strstr(), may need custom library code, because many 16-bit compilers implement these operations 1 byte at a time.
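To make the point about byte-at-a-time library routines concrete, here is a hypothetical word-wise copy, `copy16()`, of the kind a custom library might provide. It assumes both buffers are 16-bit aligned, which declaring them as `uint16_t` arrays guarantees; on a 16-bit CPU each loop iteration then moves two bytes instead of one. Whether a given compiler’s own memcpy() already does this is toolchain-specific.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy nbytes from src to dst one 16-bit word at a time, plus a final
   byte when nbytes is odd. Both buffers must hold at least
   (nbytes + 1) / 2 words. */
void copy16(uint16_t *dst, const uint16_t *src, size_t nbytes)
{
    size_t nwords = nbytes / 2;

    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];                 /* two bytes per iteration */
    if (nbytes & 1) {
        /* character-type access to the trailing byte is allowed by
           C's aliasing rules */
        ((unsigned char *)(dst + nwords))[0] =
            ((const unsigned char *)(src + nwords))[0];
    }
}
```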