C++ CUDA kernel for transposing 2D arrays

A C++ CUDA kernel for a transpose function is to be programmed. The kernel should be state of the art, which means more than a plain copy (e.g. coalesced memory access for both reading and writing, if useful). The main goal is speed! It will be used with an RTX 2080 card. The input data is a 2D array (an image) of float values. In general, the X and Y sizes are not equal, not multiples of 256, and vary from image to image.
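A minimal sketch of the shared-memory tiling approach (the classic tiled transpose) follows. It assumes row-major storage, an input of height rows by width floats, and a separate output buffer, so it does not work in place. The names transposeTiled, transposeGPU, TILE_DIM and BLOCK_ROWS, and the 32x8 block shape, are illustrative choices, not measured optima for the RTX 2080. Each block stages a 32x32 tile in shared memory, so consecutive threads of a warp touch consecutive global addresses on both the read and the write. The bounds checks handle sizes that are not multiples of the tile size.

```cuda
// Sketch: tiled transpose of a row-major float image of arbitrary size.
// TILE_DIM / BLOCK_ROWS are assumed tuning values, not benchmarked optima.
#include <cuda_runtime.h>

constexpr int TILE_DIM = 32;   // tile edge, equal to the warp size
constexpr int BLOCK_ROWS = 8;  // each thread copies TILE_DIM / BLOCK_ROWS = 4 elements

// in:  height x width (row-major)   out: width x height (row-major)
__global__ void transposeTiled(const float* __restrict__ in,
                               float* __restrict__ out,
                               int width, int height)
{
    // One extra column of padding avoids shared-memory bank conflicts
    // when the tile is later read column-wise.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;  // input column
    int y = blockIdx.y * TILE_DIM + threadIdx.y;  // input row

    // Coalesced read: consecutive threadIdx.x -> consecutive input addresses.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS) {
        if (x < width && y + j < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(size_t)(y + j) * width + x];
    }
    __syncthreads();  // outside any branch: every thread reaches it

    // Swap the block indices so the write is coalesced in the output too.
    x = blockIdx.y * TILE_DIM + threadIdx.x;  // output column (= input row)
    y = blockIdx.x * TILE_DIM + threadIdx.y;  // output row (= input column)

    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS) {
        if (x < height && y + j < width)
            out[(size_t)(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
    }
}

// Launch helper (hypothetical name). d_in and d_out are distinct device buffers.
cudaError_t transposeGPU(const float* d_in, float* d_out,
                         int width, int height, cudaStream_t stream = 0)
{
    dim3 block(TILE_DIM, BLOCK_ROWS);
    dim3 grid((width + TILE_DIM - 1) / TILE_DIM,
              (height + TILE_DIM - 1) / TILE_DIM);
    transposeTiled<<<grid, block, 0, stream>>>(d_in, d_out, width, height);
    return cudaGetLastError();  // reports launch-configuration errors
}
```

A 32x8 block (256 threads, four elements per thread) is a common starting point. Alternatives such as 32x16 are worth benchmarking on the target card, since the best shape depends on occupancy and image size.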

An example of how to use the kernel must also be provided, e.g. loading an image with the NVIDIA SDK, transposing it and saving it.
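The following is a sketch of the requested usage example. It assumes that "NVIDIA SDK" means the CUDA Samples helper header helper_image.h, from the cuda-samples Common directory, which must be on the include path (e.g. nvcc -I<cuda-samples>/Common). It also assumes the input is a single-channel binary (P5) PGM. The header's sdkLoadPGM<float> and sdkSavePGM<float> convert 8-bit grey values to floats scaled to [0, 1] and back; check your copy of the header if the scaling matters. The file includes a compact shared-memory tiled transpose kernel so it compiles on its own. transposeImageFile is a hypothetical helper name. The kernel is timed with CUDA events and reported as effective bandwidth, counting one read and one write of every element.

```cuda
// Sketch: load PGM -> transpose on GPU -> save PGM, using CUDA Samples helpers.
// Assumes helper_image.h (cuda-samples/Common) is on the include path.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <helper_image.h>  // sdkLoadPGM / sdkSavePGM

constexpr int TILE_DIM = 32, BLOCK_ROWS = 8;  // assumed tuning values

// Tiled transpose: in is height x width, out is width x height (row-major).
__global__ void transposeTiled(const float* __restrict__ in, float* __restrict__ out,
                               int width, int height)
{
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];  // +1 avoids bank conflicts
    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && y + j < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(size_t)(y + j) * width + x];
    __syncthreads();
    x = blockIdx.y * TILE_DIM + threadIdx.x;  // output column
    y = blockIdx.x * TILE_DIM + threadIdx.y;  // output row
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && y + j < width)
            out[(size_t)(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}

// Hypothetical helper: returns false if loading, a CUDA call, or saving fails.
bool transposeImageFile(const char* inPath, const char* outPath)
{
    float* h_img = nullptr;  // must be NULL so sdkLoadPGM allocates it with malloc
    unsigned int w = 0, h = 0;
    if (!sdkLoadPGM<float>(inPath, &h_img, &w, &h)) return false;  // values in [0, 1]

    size_t bytes = (size_t)w * h * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_img, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 block(TILE_DIM, BLOCK_ROWS);
        dim3 grid((w + TILE_DIM - 1) / TILE_DIM, (h + TILE_DIM - 1) / TILE_DIM);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        transposeTiled<<<grid, block>>>(d_in, d_out, (int)w, (int)h);
        cudaEventRecord(stop);
        ok = cudaEventSynchronize(stop) == cudaSuccess && cudaGetLastError() == cudaSuccess;
        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        // Effective bandwidth: each element is read once and written once.
        if (ok) printf("%ux%u: %.3f ms, %.1f GB/s\n", w, h, ms, 2.0 * bytes / (ms * 1e6));
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        ok = ok && cudaMemcpy(h_img, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // The transposed image has swapped dimensions: width h, height w.
    if (ok) ok = sdkSavePGM<float>(outPath, h_img, h, w);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_img);
    return ok;
}
```

A minimal driver could be int main(int argc, char** argv) { return (argc == 3 && transposeImageFile(argv[1], argv[2])) ? 0 : 1; }. The printed time comes from a single cold launch. For meaningful speed numbers on the RTX 2080, warm up first and average over many launches.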

Skills: C++ Programming, C Programming, CUDA


About the Employer:
(13 reviews) Jena, Germany

Project ID: #26894759

Awarded to:


Hi, I’m a CUDA programmer and I’m familiar with this kind of job: a 2D transpose in CUDA using shared memory, for arbitrary sizes.

$167 USD in 1 day
(0 Reviews)

4 freelancers are bidding an average of $236 for this job

(7 Reviews)

Hi, I have been working with CUDA for the last 4 years. Please let me know how much performance improvement you are expecting. Also, if possible, please share the current performance metrics of the code you are running. Please d…

$200 USD in 7 days
(2 Reviews)

Hi, I have expertise in image processing and CUDA kernels, and I have been using CUDA extensively for parallel processing of large datasets.

$278 USD in 3 days
(0 Reviews)