???
08/20/05 14:54
 
#99736 - tradeoff
Responding to: ???'s previous message
There is usually a tradeoff between code size and speed.
For example, an algorithm-based CRC-16 takes around 20 asm lines (approx. 30 bytes), whereas the table-driven version takes 512 bytes for the tables alone... Here, the increase in code size is less pronounced, but the "tricks" certainly take up some code space.
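
To illustrate the CRC-16 size/speed tradeoff mentioned above, here is a minimal sketch in C rather than '51 asm. The post does not say which CRC-16 variant is meant, so the polynomial 0x1021 with initial value 0xFFFF (CRC-16/CCITT-FALSE) is an assumption, and the function names are made up for this example. The table is 256 entries of 2 bytes, i.e. the 512 bytes quoted above; on a '51 it would typically be a constant in code memory, while here it is built at run time just to keep the sketch short.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed variant: CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). */
#define CRC16_POLY 0x1021u
#define CRC16_INIT 0xFFFFu

/* Small but slow: eight shift/XOR steps per input byte. */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = CRC16_INIT;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* 256 entries x 2 bytes = 512 bytes of table. */
static uint16_t crc16_table[256];

/* Fill the table once; entry n is the CRC register after clocking byte n. */
void crc16_table_init(void)
{
    for (unsigned n = 0; n < 256; n++) {
        uint16_t crc = (uint16_t)(n << 8);
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
        crc16_table[n] = crc;
    }
}

/* Large but fast: one table lookup per input byte. */
uint16_t crc16_table_driven(const uint8_t *data, size_t len)
{
    uint16_t crc = CRC16_INIT;
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ crc16_table[(crc >> 8) ^ data[i]]);
    return crc;
}
```

Both functions return the same value for the same input; crc16_table_init() has to be called once before crc16_table_driven() is used.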

My "fast" solution (I called it "behemoth") is huge compared to "standard code", as it is quite an enormous tree with a different branch for almost every input.
It started with dividing the shift into bigger parts according to the '51's capabilities, performed one after the other: first the byte-shifts, then the nibble-shifts and finally the bit-shifts (#1). Then I built the "tree", so that bytes which are completely zero don't get shifted by nibbles and bits (#2). The final version contains some tune-ups, e.g. replacing two clr c by one anl Rx,#3Fh (and two other "tricks" - see the description in the header) (#3). Here is the evolution of times and size:
         -------- cycles ------|- bytes -
         worst   best   average   size
#1        81      20      50.6     77
#2        69      18      34.8    208
#3        64      18      32.5    204
-----
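
The staged idea of version #1 (byte-shifts, then nibble-shifts, then bit-shifts) can be sketched in C roughly as follows. This is not the actual '51 code: the operand width (4 bytes), the direction (logical right shift), the byte order and the name shr32_staged are all assumptions made for this illustration. The zero-byte "tree" of #2 and the tune-ups of #3 are not shown.

```c
#include <stdint.h>

/* Hypothetical layout: v[0] is the least significant byte, v[3] the most.
   Logical right shift of the 4-byte value by n bits (n = 0..31). */
void shr32_staged(uint8_t v[4], uint8_t n)
{
    uint8_t bytes = (uint8_t)((n >> 3) & 3);  /* whole bytes to move */
    uint8_t i;

    /* Stage 1: byte-shifts, plain moves with zero fill at the top. */
    if (bytes) {
        for (i = 0; i < 4; i++)
            v[i] = (i + bytes < 4) ? v[i + bytes] : 0;
    }

    /* Stage 2: at most one nibble-shift (4 bits at once). */
    if (n & 4) {
        for (i = 0; i < 3; i++)
            v[i] = (uint8_t)((v[i] >> 4) | (v[i + 1] << 4));
        v[3] >>= 4;
    }

    /* Stage 3: up to three single-bit shifts, carry passed from MSB to LSB. */
    for (n &= 3; n; n--) {
        uint8_t carry = 0;
        for (i = 4; i-- > 0; ) {
            uint8_t next = v[i] & 1;
            v[i] = (uint8_t)((v[i] >> 1) | (carry << 7));
            carry = next;
        }
    }
}
```

With this split, no shift count needs more than three byte moves, one nibble step and three bit steps, instead of up to 31 single-bit passes over all four bytes.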
But nowadays we see increasing amounts of code memory in microcontrollers, so maybe it's time to change strategy.
Btw. isn't it possible to make the choice of "strategy" (speed vs. size) an option in the compilers?

One more word of caution - the solutions are "speed-optimal" for the standard 12-clocker '51s and for the 2- and 6-clockers, which have the same instruction-cycle structure as the standard. The 4- and 1-clockers have a different number of instruction cycles for various instruction groups. My "behemoth" uses quite a lot of jumps, and jumps take longer than other instructions on single-clockers; they might also spoil the jump-cache on the >=40MHz variants (SiLabs, uPSD34xx). Multiply and divide also tend to take significantly longer on single-clockers relative to other instructions, so it might turn out that the "conventional" solution is comparable in execution time to these "tuned-up" ones.

Jan Waclawek


PS. IMHO you should ask Craig about using the results in SDCC - although IANAL.
PS2. Craig, isn't it possible to see the other solutions, too?

