This project aims to classify news articles using hyperdimensional (HD) computing and to implement the text classification algorithm in OpenCL. HD computing has two main components:
1. Encoder. It accepts a sequence of input letters and produces a vector. Each vector has D = 10,000 components, and each component is a signed integer. The encoder performs three vector operations: addition, multiplication, and permutation. It also has an item memory that maps an input letter to a D-dimensional random vector.
2. Associative memory. It finds, among a set of pre-stored vectors, the one that is most similar to an input vector.
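To make the two components concrete, here is a minimal Python sketch. It is not the reference implementation from the papers or the MATLAB code. It assumes bipolar (+1/-1) item vectors, cyclic rotation as the permutation, letter trigrams (`n=3`), and cosine similarity in the associative memory. All names (`ItemMemory`, `permute`, `encode`, `cosine`, `nearest`) are hypothetical and chosen only for illustration.

```python
import math
import random

D = 10_000  # vector dimensionality, as stated in the description


class ItemMemory:
    """Maps each input letter to a fixed random vector (assumed bipolar: +1/-1)."""

    def __init__(self, dim=D, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def lookup(self, letter):
        # Create the random vector on first use, then always return the same one.
        if letter not in self.table:
            self.table[letter] = [self.rng.choice((-1, 1)) for _ in range(self.dim)]
        return self.table[letter]


def permute(v, shift=1):
    """Permutation, assumed here to be a cyclic rotation by `shift` positions."""
    shift %= len(v)
    return v[-shift:] + v[:-shift]


def encode(text, item_memory, n=3):
    """Encode text as the sum (addition) of its n-gram vectors.

    Each n-gram vector is the element-wise product (multiplication) of its
    letter vectors, where earlier letters are rotated more (permutation).
    """
    total = [0] * item_memory.dim
    for i in range(len(text) - n + 1):
        gram = [1] * item_memory.dim
        for j, letter in enumerate(text[i:i + n]):
            rotated = permute(item_memory.lookup(letter), n - 1 - j)
            gram = [g * r for g, r in zip(gram, rotated)]
        total = [t + g for t, g in zip(total, gram)]
    return total


def cosine(a, b):
    """Cosine similarity between two integer vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query, stored):
    """Associative memory lookup: label of the most similar pre-stored vector."""
    return max(stored, key=lambda label: cosine(query, stored[label]))
```

In an OpenCL port, the element-wise loops in `encode` and the dot products in `cosine` are the natural candidates for parallel work-items, since each vector component can be processed independently.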
The details of these two components and of the text classification algorithm are provided in the files below. These two papers use HD computing for the language recognition task, but you should reuse it for classifying news topics as shown in . In the language recognition task there are 21 European languages to classify, so the associative memory holds 21 pre-stored vectors. Here there are 8 news topics to classify, so the associative memory will hold 8 pre-stored vectors. You can download the dataset from the link below. Use the Reuters-21578 R8 subset, which has 8 classes.
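The sketch below shows one way the pre-stored class vectors could be built and used. It is an illustration under stated assumptions, not the method from the papers. It assumes each training line holds a class label, a tab, and the document text, which should be checked against the downloaded files. It also assumes bipolar item vectors, trigrams, rotation as the permutation, and cosine similarity. The function names and the toy documents are hypothetical.

```python
import math
import random
from collections import defaultdict

D = 10_000  # vector dimensionality from the description
N = 3       # n-gram size (assumption)


def make_item_memory(dim=D, seed=0):
    """Return a lookup function mapping a letter to a fixed random +1/-1 vector."""
    rng = random.Random(seed)
    memory = {}

    def lookup(letter):
        if letter not in memory:
            memory[letter] = [rng.choice((-1, 1)) for _ in range(dim)]
        return memory[letter]

    return lookup


def encode(text, lookup, dim=D, n=N):
    """Sum of n-gram vectors; each n-gram multiplies rotated letter vectors."""
    total = [0] * dim
    for i in range(len(text) - n + 1):
        gram = [1] * dim
        for j, letter in enumerate(text[i:i + n]):
            shift = n - 1 - j
            vec = lookup(letter)
            rotated = vec[-shift:] + vec[:-shift] if shift else vec
            gram = [g * r for g, r in zip(gram, rotated)]
        total = [t + g for t, g in zip(total, gram)]
    return total


def train(lines, lookup, dim=D):
    """Build one prototype per class by adding up its encoded training documents."""
    prototypes = defaultdict(lambda: [0] * dim)
    for line in lines:
        # Assumed line format: "<label>\t<document text>"
        label, text = line.rstrip("\n").split("\t", 1)
        doc = encode(text, lookup, dim)
        prototypes[label] = [p + d for p, d in zip(prototypes[label], doc)]
    return dict(prototypes)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text, prototypes, lookup, dim=D):
    """Return the label of the prototype most similar to the encoded text."""
    query = encode(text, lookup, dim)
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Hypothetical toy documents, for illustration only (not taken from the dataset).
toy_lines = [
    "earn\tnet profit rose in the quarter",
    "earn\tquarterly net profit and dividend",
    "crude\toil prices and crude barrels",
    "crude\topec crude oil output",
]
```

Calling `train(toy_lines, make_item_memory())` yields two prototypes for the toy data. With the full R8 training set the same loop would yield eight, one per news topic, and `classify` would then play the role of the associative memory search.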
The MATLAB code for language recognition is available in . You can use it for further hints about the algorithm and to double-check the correctness of your OpenCL implementation.
1. [url removed, login to view]~smimarog/textmining/datasets/
2. [url removed, login to view]