Anti-aliased StretchBlt() / pure Win32/x64 C++ (repost)


See detailed reqs.

## Deliverables


Since first posting this project I found I could get a good enough solution using Gdiplus; however I would still much prefer a custom implementation which:

(a) Removes the dependence on Gdiplus

(b) Is faster than the Gdiplus version.

Please see the attached zip for the exact spec. The .zip contains the current Gdiplus-based function and screenshots of its result on XP (with the default theme active).

What I need you to do is replicate the functionality exactly. Note this will involve both bilinear and bicubic filtering algorithms. You can, however, assume that the bilinear algorithm will only ever be used to scale up, while the bicubic will only ever be used to scale down. (See code in attached .zip)
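As a rough illustration of the bilinear half of the job, here is a minimal sketch of one bilinear sample in fixed-point arithmetic. The function name `BilinearSample`, the 16.16 coordinate format and the rounding are assumptions made for illustration; they are not taken from the .zip and are not claimed to reproduce the Gdiplus output bit for bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the .zip): samples a 32-bit-per-pixel buffer
// at a 16.16 fixed-point position and blends the four nearest pixels.
static uint32_t BilinearSample(const uint32_t* src, int width, int height,
                               uint32_t x16, uint32_t y16)
{
    const int x0 = (int)(x16 >> 16);
    const int y0 = (int)(y16 >> 16);
    assert(src && x0 < width && y0 < height);   // debug-only check

    // Clamp the right/bottom neighbours at the image edge.
    const int x1 = (x0 + 1 < width)  ? x0 + 1 : x0;
    const int y1 = (y0 + 1 < height) ? y0 + 1 : y0;

    // Top 8 bits of the fraction: weights in 0..255 out of 256.
    const uint32_t fx = (x16 >> 8) & 0xFFu;
    const uint32_t fy = (y16 >> 8) & 0xFFu;

    const uint32_t p00 = src[y0 * width + x0];
    const uint32_t p10 = src[y0 * width + x1];
    const uint32_t p01 = src[y1 * width + x0];
    const uint32_t p11 = src[y1 * width + x1];

    uint32_t out = 0;
    // One pass per 8-bit channel; kept as a loop here only for readability.
    for (int shift = 0; shift < 32; shift += 8) {
        const uint32_t c00 = (p00 >> shift) & 0xFFu;
        const uint32_t c10 = (p10 >> shift) & 0xFFu;
        const uint32_t c01 = (p01 >> shift) & 0xFFu;
        const uint32_t c11 = (p11 >> shift) & 0xFFu;
        const uint32_t top = c00 * (256u - fx) + c10 * fx;   // horizontal blend, top row
        const uint32_t bot = c01 * (256u - fx) + c11 * fx;   // horizontal blend, bottom row
        const uint32_t c   = (top * (256u - fy) + bot * fy + 32768u) >> 16; // vertical blend, rounded
        out |= c << shift;
    }
    return out;
}
```

Sampling halfway between a black and a white pixel (`x16 = 0x8000`) gives mid-gray in every channel. A production version would walk the destination with a fixed step per pixel rather than calling a helper.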

The most important requirements here are:

1. That your implementation is deterministic in producing exactly the same result as the existing function (within reason).

Specifically, on XP with the default theme active, use [url removed, login to view] (for testing just link with [url removed, login to view] and assume the DLL is present) to draw the background of a progress bar, and then draw the fill part over the top (let's say to half-way/50%). If you draw this to an offscreen memory DC with a 32-bit DIB-section HBITMAP selected into it (on a COLOR_BTNFACE background), then call my function (see .zip) to blt from your memory DC to some real HDC (most likely in response to WM_PAINT), you'll see what the result looks like.

Note there are also screenshots in the .zip which illustrate the result as well.

What I need is a drop-in replacement for the SmoothSBlt() function which produces as close a match as you can get to the existing Gdiplus-based implementation but without any dependency on Gdiplus (or anything else).

2. I also need your implementation to be as fast as possible so for starters absolutely 0 error checking. You can use the assert() macro from <assert.h> or include specialist checks inside #ifdef _DEBUG ... #endif blocks, but nothing at all for release builds.

2a. I would also prefer that you avoid using a C++ class (thus avoiding the overhead of vtable/vftable/this pointer etc.) and just produce a .cpp/.h pair where public functions are declared in the .h and any private functions are simply declared/defined using the "static" keyword in the .cpp, **unless** you use a single static object which you create on the stack on startup, and which is wholly contained in the .cpp.

Using namespacing for public functions is fine but not required. In fact the only function that should be declared in the header file is SmoothSBlt(). (I suggest you change the name to SmoothSBlt_NGdiplus() or SSBlt() or something similar, so that I (and you) can easily have both versions available to produce test programs where you can visually inspect the results by screen-grabbing and zooming in using any bitmap app you like.)

2b. We're going for absolute speed here, so the following also apply:

2bi. If at any point you need random values, don't call rand() or any custom RNG function, simply pre-determine a large enough array of suitable random values and store this in the .cpp as a static array.
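A minimal sketch of the pre-computed table idea, under stated assumptions: the name `g_noise`, the macro `NEXT_NOISE`, the table size and the byte values are all arbitrary placeholders, and a real table would be larger and filled with values generated offline.

```cpp
#include <cstdint>

// Table size is a power of two so that wrap-around is a single AND.
enum { NOISE_COUNT = 16 };
static_assert((NOISE_COUNT & (NOISE_COUNT - 1)) == 0, "NOISE_COUNT must be a power of two");

// Arbitrary illustrative values standing in for numbers generated ahead of time.
static const uint8_t g_noise[NOISE_COUNT] = {
    0x3A, 0xC1, 0x07, 0x9E, 0x55, 0xE2, 0x18, 0x7B,
    0xA4, 0x2F, 0xD9, 0x60, 0x8C, 0x13, 0xF6, 0x4D
};

// Reads the next value and advances the caller's index; no function call involved.
#define NEXT_NOISE(idx) (g_noise[(idx)++ & (NOISE_COUNT - 1)])
```

The caller keeps its own `unsigned` index, so `NEXT_NOISE(i)` expands to one table read plus an increment.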

2bii. Do not include unnecessary functions. Ideally write the whole lot into the single SSBlt() - or whatever name you use - function, with no calls to anything else (unless it's the single static object I mentioned earlier).

2biii. Following on from that, don't use any unnecessary GDI/USER/stdlib functions. For example instead of calling SetRectEmpty(&rcExample); simply do: *((ULONGLONG*)&rcExample) = 0; *((ULONGLONG*)&rcExample + 1) = 0; thus avoiding the overhead of calling a function when you don't need to, and also using the most efficient way to zero the object. You could if you wish include a justifying assert() such as assert(sizeof(RECT) == (sizeof(ULONGLONG) << 1));

(The principal advantage in using ULONGLONG is that when compiled for x64, ULONGLONG (which is the Windows type corresponding to unsigned __int64) becomes a single primitive type, and the two statements are basically guaranteed to become exactly two machine code instructions in the compiled exe. Whereas if you set .left, .top, .right, .bottom individually to 0 the optimiser might miss it.)

Also don't be tempted to use memset() for this type of thing. It is preferable to other functions, but it's still an unnecessary function call which means unnecessary saving of registers, unnecessary stack reservation, and unnecessary calling convention code.

2c. Another way to potentially speed this up is to combine the two operations. i.e. rather than doing the bilinear stretch then the bicubic shrink, if you could find some way to integrate the two that would be brilliant, but again, that's up to you. The status quo is fine as long as you make it as fast (efficient) as possible.

2d. Also, as another example, if you use any loops which are of either fixed iteration count, or one of a discrete set of fixed iteration counts depending on other things, unwind these loops completely: i.e. include every in-loop statement individually with no loop.

2di. Also apply this as far as possible for loops that aren't 100% predictable, but where they are just inline the unwound 'sub-loop'. Note: I don't care how much stack/heap mem you use (within reason) specifically in relation to code: i.e. because speed is so important here, I'd rather see an 'ugly' low-level (heavily commented) set of code which is much larger than could be achieved with more function use.
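To make the unwinding request concrete, here is a small sketch for a loop with a fixed count of four, such as one row of a 4-tap filter. Both function names are hypothetical and the weights are supplied by the caller; the sketch only contrasts the two forms and says nothing about which weights match Gdiplus.

```cpp
#include <cstdint>

// Loop form: four iterations, always.
static int32_t Filter4Loop(const int32_t* px, const int32_t* w)
{
    int32_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += px[i] * w[i];
    return sum;
}

// Unwound form: every in-loop statement written out, no counter, no branch.
static int32_t Filter4Unrolled(const int32_t* px, const int32_t* w)
{
    return px[0] * w[0]
         + px[1] * w[1]
         + px[2] * w[2]
         + px[3] * w[3];
}
```

Both forms return the same value for the same inputs; in the final code the unwound body would be pasted directly at the point of use rather than called.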

2e. Any other way you can think of speeding up the function, and please back up your ideas by trying them out and discarding any that don't actually work in practice. Note you can use XP with SP2 or SP3 (and themed) exclusively as your test OS. No need to test on any other Windows versions.

3. Absolutely do not involve any hidden windows or anything of that nature. There should be no need since you're simply manipulating bitmap data in a single thread. (Essentially, this needs to be lean in order to be fast so absolutely nothing unnecessary.)

4. **BONUS STAGE** Once you have completed the main part of the project I will guarantee a decent and negotiable bonus payment if you can also build in a genuinely-time-saving caching system, where you maintain a reasonably small cache containing a subset of all the results of every call to the function.

You then, on entry to the function, compare the parameters and iff there is an exact match in the cache, you just blt the cached copy.

4a. Of course, iff the HDC address doesn't match OR the HBITMAP selected into it doesn't match the cached address, you will need to compare the actual HBITMAPS. I realise this is potentially adding overhead but that's the challenge of this extra step: can you build in the cache such that overall the function performs significantly quicker than without the cache, even if it might be very slightly slower when there is no cache match.

4b. Iff possible the cache should be contained in the function itself (again, minimise function calling)
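One possible shape for the parameter comparison in step 4, sketched with a small fixed-size cache held in function-local statics. Everything here is an assumption for illustration: the `BltKey` fields, the slot count and the round-robin replacement are hypothetical, and the step of comparing actual bitmap contents (4a) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical key: the parameters that must match exactly for a cache hit.
struct BltKey {
    const void* srcBits;   // address of the source pixel data
    int32_t     sw, sh;    // source width/height
    int32_t     dw, dh;    // destination width/height
};

// Returns true on an exact match (a real version would blt the cached copy);
// otherwise records the key, overwriting the oldest slot when full.
static bool LookupOrInsert(const BltKey& k)
{
    enum { SLOTS = 4 };                 // power of two, so wrap is an AND
    static BltKey   cache[SLOTS];       // the cache lives inside the function
    static unsigned used = 0, next = 0;

    assert(k.sw > 0 && k.sh > 0 && k.dw > 0 && k.dh > 0);   // debug-only check

    for (unsigned i = 0; i < used; ++i) {
        if (cache[i].srcBits == k.srcBits &&
            cache[i].sw == k.sw && cache[i].sh == k.sh &&
            cache[i].dw == k.dw && cache[i].dh == k.dh)
            return true;
    }
    cache[next] = k;
    next = (next + 1) & (SLOTS - 1);
    if (used < SLOTS) ++used;
    return false;
}
```

The first call with a given key misses and stores it; a repeat call with identical parameters hits. Whether this pays off overall depends on how the miss path cost compares with the stretch itself.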

5. Finally, must work on Windows XP+ and must not prevent execution on Win2K (other than assuming [url removed, login to view] is available). Also, please explicitly state the function calling convention when declaring or defining a function (unless it's __thiscall when calling an instantiated C++ class object). As a general rule: if your function takes no parameters use __stdcall; otherwise use __fastcall. Only use __cdecl if you need to. Also don't bother trying to write inline functions and don't use intrinsics. You can use #define'd macros however.

Note: The function I use to wrap drawing of progress bar via [url removed, login to view] is not included in the .zip but it does apply some 'touching up' post SmoothSBlt() and also does a 'pseudo-alpha-blend' by combining the final HBITMAP bits with COLOR_BTNFACE with some ratio, so the .bmp in the zip is not really definitive. It is included mainly for the look of the 'gray' part of the bar.

Also note **IMPORTANT** the progress bar in the .bmp actually uses two calls to the existing SmoothSBlt() function. The first reduces the original (actual size) bitmap to 1/8th actual size. The second then stretches back to normal size. If you really need the exact function then once the funds are escrowed I can give you whatever you need.

That's it. Serious bidders only please.

Many thanks.

* * *This broadcast message was sent to all bidders on Sunday Aug 19, 2012 9:29:24 PM:

[This is just to force the site to start/display message thread with invited workers.]

* * *This broadcast message was sent to all bidders on Sunday Aug 19, 2012 9:58:28 PM:

Brief note on caching (if you decide to include it): use only custom types. For example if you maintain your cache as a linked-list, don't use std::list, write a specialised POD struct{} and store your own first/last pointers. Again, this is an efficiency/speed consideration and is a theme you should adopt throughout the whole thing.
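A minimal sketch of the kind of hand-rolled list described in this note, assuming caller-owned nodes (for example from a static pool). The names `CacheNode`, `CacheList`, `CacheAppend` and `CacheFind`, and the choice of a 32-bit key, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Plain-old-data node: no constructors, no virtuals, no std:: containers.
struct CacheNode {
    CacheNode*  next;
    uint32_t    key;     // hypothetical lookup key
    const void* bits;    // hypothetical pointer to the cached pixels
};

// The list is just its own first/last pointers.
struct CacheList {
    CacheNode* first;
    CacheNode* last;
};

// Appends a caller-owned node at the tail in constant time.
static void CacheAppend(CacheList* list, CacheNode* node)
{
    assert(list && node);               // debug-only check
    node->next = nullptr;
    if (list->last)
        list->last->next = node;
    else
        list->first = node;
    list->last = node;
}

// Linear search from the head; returns nullptr when the key is absent.
static CacheNode* CacheFind(const CacheList* list, uint32_t key)
{
    for (CacheNode* n = list->first; n; n = n->next)
        if (n->key == key)
            return n;
    return nullptr;
}
```

Because nothing here allocates, the nodes can sit in a static array inside the .cpp and the list never touches the heap.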

Skills: Assembly, C Programming


About the Employer:
( 30 reviews ) Leeds, United Kingdom

Project ID: #2769675

3 freelancers are bidding on average $334 for this job

