[ARMedslack] ARMv4 assembler optimizations for OpenSSL

Michael Langfinger slackware at langfinger.org
Wed Oct 3 17:30:56 UTC 2012


 I just found out that the openssl package of Slackware for ARM 14.0 
 doesn't use the assembler optimizations available for ARMv4 in OpenSSL. 
 Since all the packages are built for the baseline architecture of 
 ARMv5te, enabling the optimization shouldn't affect compatibility with 
 any of the platforms that are supported by Slackware for ARM 14.0.* It 
 would, however, have a huge impact on the performance of OpenSSL and 
 very likely of all the programs that use the OpenSSL libraries (like OpenSSH).

 (* I am not 100% sure about that because I am no expert regarding the 
 different ARM architectures, so maybe I am wrong here. Maybe someone on 
 the mailing list with more expertise on this matter can confirm my 
 assumption or correct me?)

 I rebuilt the OpenSSL package and ran some tests on my Sheevaplug. As 
 you can see, the results are pretty impressive (the original output of 
 the "openssl speed" command is much longer; this is just an excerpt):

 OpenSSL 1.0.1c, default package (no optimization)

 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 

 md5               3309.68k    11805.99k    34086.83k    64614.06k    
 aes-128 cbc      10003.75k    11224.32k    11537.49k    11619.33k    
 aes-192 cbc       8812.01k     9592.04k     9819.65k     9879.89k     
 aes-256 cbc       7777.37k     8376.77k     8548.95k     8593.75k     
 sha256            2670.09k     6356.61k    11432.70k    14294.36k    
 sha512             412.84k     1650.65k     2386.60k     3272.36k     

                  sign    verify    sign/s verify/s
 rsa  512 bits 0.002360s 0.000218s    423.8   4589.0
 rsa 1024 bits 0.012267s 0.000625s     81.5   1599.4
 rsa 2048 bits 0.074701s 0.002067s     13.4    483.8
 rsa 4096 bits 0.494286s 0.007278s      2.0    137.4

 OpenSSL 1.0.1c, with ARMv4 assembler optimization enabled

 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 

 md5               4078.73k    14215.30k    38787.16k    68390.57k    
 aes-128 cbc      15095.54k    16847.00k    17382.31k    17521.32k    
 aes-192 cbc      13213.38k    14536.19k    14932.91k    15036.76k    
 aes-256 cbc      11756.32k    12785.60k    13089.37k    13167.96k    
 sha256            5125.39k    12176.15k    21252.61k    26089.81k    
 sha512            1446.27k     5780.57k     8251.65k    11255.81k    

                   sign    verify    sign/s verify/s
 rsa  512 bits 0.001101s 0.000106s    907.9   9456.0
 rsa 1024 bits 0.005549s 0.000313s    180.2   3190.6
 rsa 2048 bits 0.035971s 0.001120s     27.8    892.8
 rsa 4096 bits 0.257692s 0.004279s      3.9    233.7

 Short summary (performance increase, numbers rounded)

 aes-128 cbc: +50% (16 bytes)
 aes-192 cbc: +50% (16 bytes)
 aes-256 cbc: +50% (16 bytes)
 sha256: +90% (16 bytes)
 sha512: +250% (16 bytes)

 rsa 512 bits: +115% (sign) / +105% (verify)
 rsa 1024 bits: +120% / +100%
 rsa 2048 bits: +110% / +85%
 rsa 4096 bits: +100% / +70%
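
 The rounded figures above can be re-derived from the two excerpts. What 
 follows is only a minimal sketch: the names baseline, optimized and 
 percent_increase are made up for illustration, and the values are copied 
 by hand from the 16-byte column and the sign/s column of the tables.

```python
# Values copied from the two "openssl speed" excerpts in this post.
# Ciphers/digests: 16-byte column, in 1000s of bytes per second.
# RSA: signatures per second (sign/s column).
baseline = {
    "aes-128 cbc": 10003.75,
    "aes-192 cbc": 8812.01,
    "aes-256 cbc": 7777.37,
    "sha256": 2670.09,
    "sha512": 412.84,
    "rsa 512 sign/s": 423.8,
    "rsa 1024 sign/s": 81.5,
    "rsa 2048 sign/s": 13.4,
    "rsa 4096 sign/s": 2.0,
}
optimized = {
    "aes-128 cbc": 15095.54,
    "aes-192 cbc": 13213.38,
    "aes-256 cbc": 11756.32,
    "sha256": 5125.39,
    "sha512": 1446.27,
    "rsa 512 sign/s": 907.9,
    "rsa 1024 sign/s": 180.2,
    "rsa 2048 sign/s": 27.8,
    "rsa 4096 sign/s": 3.9,
}


def percent_increase(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Hypothetical helper dict: gain per benchmark, unrounded.
gains = {
    name: percent_increase(baseline[name], optimized[name])
    for name in baseline
}

for name, gain in gains.items():
    print(f"{name:>16}: +{gain:.1f}%")
```

 The summary rounds these values to the nearest 5 or 10 percent, so small 
 differences from the script's output are expected.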

 If you want to test this yourself, just pass the target "linux-armv4" 
 to the Configure script from OpenSSL, or apply the patch [1] to 
 the debian-targets.patch file before running the SlackBuild script. 
 Warning: As OpenSSL is removed during the build process, you won't be 
 able to log in with SSH during the build, or afterwards until you 
 reinstall the openssl package. So don't forget to temporarily enable 
 telnet or something similar, especially if you only have remote access 
 to the machine.
 I found out about the assembler optimization from the Raspberry Pi 
 forum [2].


 [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=676533
 [2] http://www.raspberrypi.org/phpBB3/viewtopic.php?f=66&t=8433
