Images in a corpus share certain characteristics that hold across every image in it. For instance, they will
all have the same size (i.e., the same number of rows and columns), are either color or grayscale images, are consistently named,
and each image is associated with metadata, if any is available: date of acquisition, modality of acquisition, copyright owner, and
an image annotation in natural-language text.
For your project, you will search the Internet for pictures that meet criteria of your own choosing. We want to
procure pictures that are free of copyright issues. Record in the metadata the source from which you obtained each picture.
You are also free to take pictures yourself and add them to the corpus.
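One possible way to keep the metadata described above is a small record per image, saved as JSON. The following is only a sketch: the class name, the field names, the example file name, and the example URL are all assumptions made for illustration, not a required schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageMetadata:
    """One metadata record per image (field names are illustrative assumptions)."""
    filename: str
    source: str                             # where the picture was obtained, or "self" for your own photos
    acquired_on: Optional[str] = None       # date of acquisition, e.g. "2024-03-15"
    modality: Optional[str] = None          # modality of acquisition, e.g. "digital camera"
    copyright_owner: Optional[str] = None
    annotation: Optional[str] = None        # natural-language description of the image


def to_json(record: ImageMetadata) -> str:
    """Serialize a record to a JSON string with stable key order."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Hypothetical example record; the URL is a placeholder.
record = ImageMetadata(
    filename="sports_0001.jpg",
    source="https://example.org/photos/1",
    annotation="Two players contesting a header.",
)
print(to_json(record))
```

Fields that are unknown stay as `None`, which matches the "if any" wording above: not every image will have every piece of metadata.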
The data for this project are images stored in a hierarchically structured directory. Let us call the top-level directory
dirA (replace dirA with a suitable and meaningful directory name, such as sports or travel). This directory will
contain some images as well as other (sub)directories. Let us call these (sub)directories dirA1 and dirA2. Each of these
subdirectories in turn will contain some images and may contain other (sub)directories. This (sub)directory structure may
extend to an arbitrary number of levels.
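Because the nesting depth is not fixed, any program that reads the corpus has to visit the directories recursively. Here is a minimal sketch using `os.walk` from the Python standard library. The function name `find_images` and the set of file extensions are assumptions chosen for illustration; adjust them to the file types in your own corpus.

```python
import os
from typing import Iterator

# Assumed set of image extensions; extend or trim it to match your corpus.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".pbm"}


def find_images(root: str) -> Iterator[str]:
    """Yield the path of every image file under root, at any depth."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        for name in sorted(filenames):
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                yield os.path.join(dirpath, name)


# Replace "dirA" with the name of your own top-level directory.
for path in find_images("dirA"):
    print(path)
```

`os.walk` descends into every subdirectory on its own, so the same code handles one level of nesting or many.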
The images in the directories can be of different sizes as well as different image file types (e.g., tiff, png, jpeg, pbm). I suggest
that the images be of uniform size; in any case, they should not be larger than 640 × 480 pixels.
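If a picture exceeds the size limit, it has to be scaled down. The sketch below only computes the target dimensions, using integer arithmetic and preserving the aspect ratio; the actual resampling would be done with an image library of your choice. It assumes the limit means 640 pixels wide by 480 pixels high, and the function name `fit_within` is hypothetical.

```python
from typing import Tuple

# Assumed reading of the limit: width 640, height 480.
MAX_WIDTH, MAX_HEIGHT = 640, 480


def fit_within(width: int, height: int,
               max_width: int = MAX_WIDTH,
               max_height: int = MAX_HEIGHT) -> Tuple[int, int]:
    """Return (width, height) scaled down to fit the limit, keeping the aspect ratio.

    Images that already fit are returned unchanged; nothing is ever enlarged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width <= max_width and height <= max_height:
        return width, height
    # Cross-multiply to find the binding constraint without floating point.
    if width * max_height >= height * max_width:
        # Width is the binding constraint.
        return max_width, max(1, height * max_width // width)
    # Height is the binding constraint.
    return max(1, width * max_height // height), max_height


print(fit_within(1280, 960))   # (640, 480)
print(fit_within(1000, 1000))  # (480, 480)
print(fit_within(320, 240))    # (320, 240)
```

Scaling every image to exactly the same size would instead require cropping or padding, since pictures with different aspect ratios cannot all reach one size by scaling alone.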