parallel programming (Histogram)

$30-250 USD

Closed
Posted over 4 years ago

Paid on delivery
In this project, you will develop a complete CUDA program to compute the histogram of an input array. You will implement the histogram on the device (GPU). After the device histogram has been computed, your program will also compute the histogram sequentially on the CPU and compare that result with the device-computed one. If they match, the program prints "Test PASSED" to the screen before exiting.

Assume the histogram has 256 bins, i.e., bin 0, bin 1, …, and bin 255. An input value i is mapped to bin i.

Use the following pseudocode for array initialization:

    int *A;
    A = malloc(sizeof(int) * N);  // N is the size of the input array
    int init = 1325;
    for (i = 0; i < N; i++) {
        init = 3125 * init % 65537;
        A[i] = init % 256;
    }

Task 1 - Basic CUDA program using global memory

Develop a CUDA program in which the GPU threads collectively perform the histogram calculation. Use an atomic instruction so that only one thread at a time updates any individual location in the global histogram array.

Task 2 - CUDA program that takes advantage of shared memory

In Task 1, you will find that the speedup of your GPU program over the CPU version is very limited because of the atomic accesses to the global histogram array. Modify the code from Task 1 to try to improve the speedup by using GPU shared memory and registers.

Record your runtime for different input array sizes, as shown in the following table, for Task 1 and Task 2, and compute the speedup from the GPU computation time and the CPU computation time. I did not specify the thread block size; you may explore different thread block sizes to find the best one for each input array size. A thread block size of 256 is the most obvious choice.

Optional: You can also include the memory transfer time between the CPU and GPU in the GPU computation time (in that case, it may be fair to also include the time for array initialization in the CPU computation time) and recompute the speedup.
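The initialization pseudocode and the two tasks above could be sketched roughly as follows. This is only one possible reading of the description, not the client's reference solution. All names (NUM_BINS, initArray, histogramCPU, histogramGlobal, histogramShared, runAndVerify) are made up for illustration. The grid-stride loops, the unsigned int counters, and the missing CUDA error checking are assumptions of this sketch. The Task 2 kernel uses only shared memory and does not attempt the register-level optimization the description also mentions.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define NUM_BINS 256  // bins 0..255, as stated in the description

// Fill A with the sequence given in the project's pseudocode.
void initArray(int *A, int N) {
    int init = 1325;
    for (int i = 0; i < N; i++) {
        init = 3125 * init % 65537;
        A[i] = init % 256;
    }
}

// Sequential CPU reference: value i is counted in bin i.
void histogramCPU(const int *A, int N, unsigned int *hist) {
    for (int b = 0; b < NUM_BINS; b++) hist[b] = 0;
    for (int i = 0; i < N; i++) hist[A[i]]++;
}

// Task 1 sketch: every update is an atomicAdd on the global histogram.
__global__ void histogramGlobal(const int *A, int N, unsigned int *hist) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N; i += stride)
        atomicAdd(&hist[A[i]], 1u);
}

// Task 2 sketch: each block accumulates into a private shared-memory
// histogram, then merges it into the global histogram once per bin.
__global__ void histogramShared(const int *A, int N, unsigned int *hist) {
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N; i += stride)
        atomicAdd(&local[A[i]], 1u);
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        if (local[b] != 0) atomicAdd(&hist[b], local[b]);
}

// Hypothetical host helper: runs one kernel, compares against the CPU
// result, prints "Test PASSED" on a match, and returns whether it matched.
bool runAndVerify(int N, bool useShared, int blockSize) {
    assert(N > 0 && blockSize > 0);
    int *A = (int *)malloc(sizeof(int) * N);
    initArray(A, N);

    unsigned int cpu[NUM_BINS], gpu[NUM_BINS];
    histogramCPU(A, N, cpu);

    int *dA;
    unsigned int *dHist;
    cudaMalloc(&dA, sizeof(int) * N);
    cudaMalloc(&dHist, sizeof(cpu));
    cudaMemcpy(dA, A, sizeof(int) * N, cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, sizeof(cpu));

    int grid = (N + blockSize - 1) / blockSize;
    if (useShared) histogramShared<<<grid, blockSize>>>(dA, N, dHist);
    else           histogramGlobal<<<grid, blockSize>>>(dA, N, dHist);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(gpu, dHist, sizeof(gpu), cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int b = 0; b < NUM_BINS; b++)
        if (cpu[b] != gpu[b]) ok = false;
    if (ok) printf("Test PASSED\n");

    cudaFree(dA);
    cudaFree(dHist);
    free(A);
    return ok;
}
```

A main function could call, for example, runAndVerify(131072, false, 256) for Task 1 and runAndVerify(1048576, true, 256) for Task 2. The shared-memory version usually helps because most atomic updates then contend only within a block, but the actual gain depends on the GPU and should be measured.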
Time                        131072 (128*1024)    1048576 (1024*1024)
CPU computation time
GPU computation time
GPU memory transfer time

Note that the compile command for a CUDA program that uses atomic instructions should include the -arch compiler option. The following command can be used to compile the CUDA source file histogram.cu:

nvcc [login to view URL] -o histogram -arch=sm_30
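One way to collect the numbers for the table is sketched below, using CUDA events for GPU work and std::chrono for CPU work. The helper names (timeGpuMs, timeCpuMs, speedup, printRow) are hypothetical, and the speedup is assumed to be CPU time divided by GPU time, which the description implies but does not spell out. The lambdas need C++11, so an older toolkit may need -std=c++11 on the nvcc command line. Recent toolkits no longer accept sm_30, in which case the -arch value would have to match the GPU actually used.

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Time whatever GPU work `launch` enqueues on the default stream,
// in milliseconds, using CUDA events.
template <typename Launch>
float timeGpuMs(Launch launch) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    launch();                    // e.g. a kernel launch or a cudaMemcpy
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the recorded work has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Time a CPU-side callable in milliseconds with a monotonic clock.
template <typename Work>
double timeCpuMs(Work work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Assumed definition: speedup = CPU time / GPU time.
double speedup(double cpuMs, double gpuMs) {
    assert(gpuMs > 0.0);
    return cpuMs / gpuMs;
}

// Print one column of the table for input size N, with the speedup both
// without and with the memory transfer time (the optional variant).
void printRow(int N, double cpuMs, double gpuMs, double transferMs) {
    printf("N = %d\n", N);
    printf("  CPU computation time:     %.3f ms\n", cpuMs);
    printf("  GPU computation time:     %.3f ms\n", gpuMs);
    printf("  GPU memory transfer time: %.3f ms\n", transferMs);
    printf("  Speedup (kernel only):    %.2f\n", speedup(cpuMs, gpuMs));
    printf("  Speedup (with transfer):  %.2f\n",
           speedup(cpuMs, gpuMs + transferMs));
}
```

In use, the histogram kernel launch would go inside the lambda passed to timeGpuMs, the cudaMemcpy calls inside a second timeGpuMs call, and the sequential loop inside timeCpuMs. Running each measurement several times and discarding the first run is a common way to avoid counting one-time CUDA startup cost.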
Project ID: 22640222

About the project

7 proposals
Remote project
Active 4 yrs ago

7 freelancers are bidding on average $206 USD for this job
I have read your description and am very interested in your project. I am confident I can finish it on time. I am an experienced and skilled CUDA/OpenMP/MPI programmer with 5+ years of experience in software development, and I have finished many projects like this. I will ensure the best quality and keep to your deadline. Please contact me so we can discuss the details. Working with me, you will gain a good experience and a good friend, and save time and money. Best regards!
$120 USD in 3 days
5.0 (89 reviews)
6.2
Hello, I am a CUDA expert with experience in algorithm design. I have developed many algorithms using CUDA, and I would like to implement the histogram algorithm using CUDA. Please contact me to discuss the details and the timeline.
$300 USD in 1 day
5.0 (4 reviews)
5.3
Hi there. I have plenty of experience in C++ and CUDA, and I have done a similar project. Let's have a chat about the project. I would be glad to work on it.
$180 USD in 1 day
5.0 (12 reviews)
3.8
Hi, I am Goerge. If you ping me, I can give you the result in an hour. Thanks.
$250 USD in 1 day
4.9 (6 reviews)
3.8
No problem! I have read your description carefully and am very interested in your project. I have been working on desktop apps with C/C++, C#, Python and Java for 7 years. I think I can do it perfectly. If you hire me, you will get cool results. I can work full-time in your time zone. Best regards
$140 USD in 7 days
0.0 (0 reviews)
0.0
Hi, I have about seven years of experience in C and CUDA. I have developed similar algorithms in CUDA. I have two GPU cards, Tesla and Pascal. I will be able to complete your project as per your requirements and well within time. Thanks, Ajay
$200 USD in 3 days
0.0 (0 reviews)
0.0

About the client

Flag of UNITED STATES
Fairborn, United States
0.0
0
Payment method verified
Member since Apr 25, 2019

Client Verification

Other jobs from this client

data structures hash table
$30-250 USD