I am working on a security problem as part of my university research: identifying malicious images. I have a large set of fuzzed/malformed images that were used as test inputs and categorized as Pass/Warning/Fail according to the test results on various mobile platforms.

I need help devising a feature extraction algorithm/methodology that identifies features from the image metadata, which I can then use with a tool such as Weka to try various machine learning algorithms. The goal is to predict which future images are likely to cause crashes.

In other words, I have to mine attributes from the images (most likely from the metadata) that can be fed into Weka in order to detect malicious images.

I previously used features such as pixel information and histogram distributions, extracted with tools like ImageJ, to help classify the images. However, I am looking for a better way, from a security standpoint, to identify and quantify features from the image or its metadata.
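
As a rough illustration of what such an extractor might look like, here is a minimal Python sketch. It assumes JPEG input and uses only the standard library. The function names (`byte_entropy`, `jpeg_segment_features`, `extract_features`, `to_arff`) and the chosen features (file size, byte entropy, number of marker segments, largest declared segment length, and a flag for a declared length that overruns the file) are illustrative assumptions, not an established method. The output is an ARFF string that Weka should be able to load, with the Pass/Warning/Fail label as the last attribute.

```python
import math
import struct
from collections import Counter

CLASSES = ("Pass", "Warning", "Fail")  # labels taken from the test results


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def jpeg_segment_features(data: bytes) -> dict:
    """Walk JPEG marker segments up to start-of-scan (0xFFDA).

    Simplified on purpose: fill bytes and length-less markers are not
    handled, so any loss of sync is simply reported as bad_length.
    """
    feats = {"is_jpeg": int(data[:2] == b"\xff\xd8"),
             "n_segments": 0, "max_segment_len": 0, "bad_length": 0}
    if not feats["is_jpeg"]:
        return feats
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:          # expected a marker here
            feats["bad_length"] = 1
            break
        if data[pos + 1] == 0xDA:      # entropy-coded data follows
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        feats["n_segments"] += 1
        feats["max_segment_len"] = max(feats["max_segment_len"], seg_len)
        # The declared length includes its own two bytes.
        if seg_len < 2 or pos + 2 + seg_len > len(data):
            feats["bad_length"] = 1
            break
        pos += 2 + seg_len
    return feats


def extract_features(data: bytes, label: str) -> dict:
    """Combine whole-file and structural features into one row."""
    row = {"file_size": len(data), "entropy": round(byte_entropy(data), 4)}
    row.update(jpeg_segment_features(data))
    row["label"] = label
    return row


def to_arff(rows, relation="fuzzed_images") -> str:
    """Serialize feature rows as ARFF text; the class attribute goes last."""
    names = [k for k in rows[0] if k != "label"]
    lines = [f"@relation {relation}"]
    lines += [f"@attribute {n} numeric" for n in names]
    lines.append("@attribute label {" + ",".join(CLASSES) + "}")
    lines.append("@data")
    for r in rows:
        lines.append(",".join(str(r[n]) for n in names) + "," + r["label"])
    return "\n".join(lines) + "\n"
```

In use, each image file would be read with `open(path, "rb").read()`, passed to `extract_features` together with its known label, and the collected rows written out with `to_arff`. Structural features of this kind are only one guess at what separates crashing inputs from benign ones. Other formats (PNG chunks, EXIF/TIFF tags) would need their own parsers.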

