SSE Optimizations for Intel and AMD

Closed Posted Jul 14, 2002 Paid on delivery

*** Please note: if you have not done SSE programming before, you will probably not be able to meet the requirements. MMX is not the same. ***

I have 24 lines of C++ code that I need optimized as much as possible using SSE instructions for the Intel Pentium 3 and AMD Athlon processors (both of which support SSE instructions). The code to optimize is a nested loop that is executed millions of times in the final application, which is why speed is so important. I have isolated the code into one simple function in a Win32 console application. The optimized version should also check for SSE support at the beginning of the function and, if it is not present, simply branch to the existing unoptimized version.
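To illustrate the start-of-function check described above, here is a minimal sketch; it is not the attached code. It uses compiler intrinsics and modern C++ instead of the inline assembly and Visual C++ 6.0/7.0 target named in the deliverables, so treat it as an outline only. The names `cpuHasSse`, `scaleArray`, and the scaling loop itself are hypothetical stand-ins for the real function.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>  // __cpuid; older compilers such as VC++ 6.0 would need inline asm instead
#endif
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0  // build target has no SSE: only the scalar path is compiled
#endif

// Returns true when the CPU reports SSE support.
bool cpuHasSse() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                    // CPUID leaf 1: feature flags
    return (regs[3] & (1 << 25)) != 0;   // EDX bit 25 = SSE
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse") != 0;
#else
    return false;                        // unknown platform: assume no SSE
#endif
}

// Hypothetical unoptimized routine: out[i] = in[i] * k.
static void scaleScalar(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * k;
}

// Hypothetical optimized routine: four multiplications per iteration.
static void scaleSse(const float* in, float* out, std::size_t n, float k) {
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    const __m128 vk = _mm_set1_ps(k);
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(in + i), vk));
#endif
    for (; i < n; ++i) out[i] = in[i] * k;  // leftover elements
}

// Entry point called from C++: check once, then branch.
void scaleArray(const float* in, float* out, std::size_t n, float k) {
    assert(in != nullptr && out != nullptr);
    static const bool useSse = SKETCH_HAVE_SSE && cpuHasSse();
    if (!useSse) {
        scaleScalar(in, out, n, k);  // existing unoptimized version
        return;
    }
    scaleSse(in, out, n, k);
}
```

The detection result is cached in a function-local static so the CPUID query runs once rather than on each of the millions of calls. On Windows, `IsProcessorFeaturePresent(PF_XMMI_INSTRUCTIONS_AVAILABLE)` is another way to ask the same question.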

## Deliverables

A complete self-contained .cpp file containing the function is attached.

1) Performance requirement: when finished, the attached sample program must run in no more than half the time of the original on a machine with a Pentium 3 CPU. For example, if the existing code takes 100 seconds, the optimized version must take 50 seconds or less.
2) Look at the attached code; there is no mystery here. You should feel comfortable with SSE, the code, and requirement (1) before bidding.
3) The SSE code must be integrated into the attached sample function using inline assembly. The function to be optimized is called from C++ and branches into the SSE inline assembly section if SSE support exists.
4) The code must compile successfully with no warnings at warning level 3 of the Visual C++ 6.0 or 7.0 compiler.
5) The results of the calculations must not change or lose precision when optimized. The input values should be assumed to be random single-precision floating-point numbers.
6) The SSE code should execute multiple multiplications in parallel where beneficial, and multiple additions in parallel where beneficial. SSE supports up to 4 single-precision floating-point operations executing in parallel.

Complete and fully functional working program(s) in executable form, as well as complete source code of all work done. Complete copyrights to all work purchased.
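Requirements (1), (5), and (6) can be checked mechanically. The sketch below is one way to do that, under stated assumptions: the kernel `out[i] = a[i] * b[i] + c[i]` is a hypothetical stand-in for the attached loop, intrinsics replace the required inline assembly, and `mulAddScalar`, `mulAddPacked`, `sameBits`, and `secondsFor` are illustrative names only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0  // no SSE at build time: packed version degrades to scalar
#endif

// Hypothetical reference kernel: out[i] = a[i] * b[i] + c[i].
void mulAddScalar(const float* a, const float* b, const float* c,
                  float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}

// Same kernel with four multiplications and four additions per iteration.
void mulAddPacked(const float* a, const float* b, const float* c,
                  float* out, std::size_t n) {
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        const __m128 vb = _mm_loadu_ps(b + i);
        const __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];  // leftover elements
}

// Bit-for-bit comparison of two result arrays (requirement 5).
bool sameBits(const std::vector<float>& x, const std::vector<float>& y) {
    return x.size() == y.size() &&
           (x.empty() ||
            std::memcmp(x.data(), y.data(), x.size() * sizeof(float)) == 0);
}

// Wall-clock seconds for `reps` calls of `kernel` (requirement 1).
template <typename Kernel>
double secondsFor(Kernel kernel, int reps) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) kernel();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A harness would time both versions with `secondsFor`, check that the packed time is at most half the scalar time, and confirm `sameBits` on the outputs. Two caveats apply to requirement (5). A 32-bit build that does scalar math on the x87 FPU may keep intermediates at higher precision than SSE, which rounds every result to single precision, so bit-identical output is not guaranteed there. Likewise, splitting a running sum into four parallel partial sums changes the order of additions and can change the rounded result.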

## Platform

The code must run on Windows XP and on all Windows XP-compatible CPUs (on a CPU without SSE support, it falls back to the unoptimized version).

## Deadline information

Must be completed by 7/25/2002. Please note the performance requirement.

C Programming PHP

Project ID: #2860500

About the project

2 proposals Remote project Active Jun 20, 2004

2 freelancers are bidding on average $43 for this job

bitrake

See private message.

$34 USD in 14 days

psconsultants

See private message.

$51 USD in 14 days