PARSING SCRIPT for *.PPTX slide content (PowerPoint feature extraction)

In Progress

Phase 1:


We need JSON that describes the slide structure in basic terms:

1) texts = array of text objects. When a text has multiple lines with different font sizes, each line should be treated as a subtext object, with its relative position noted (line 1: position 0,0; line 2: 0,1; line 3: 0,2).

2) images = extract each image file to storage and note its path; also note the crop, position and size of the image, preferably in pixels or as % of slide width/height.

3) notes = extract the slide notes into the JSON structure.

A sample structure for a sample slide is attached separately. The JSON in that file is illustrative: it describes what needs to be done rather than detailing each field.
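As a rough illustration of the three Phase 1 items, here is a minimal Python sketch of how one slide's JSON might be assembled. It assumes the text lines, font sizes, image rectangles and notes have already been read from the .pptx file. The function names (`split_subtexts`, `image_geometry`), the field names and the storage path are hypothetical and are not taken from the attached sample. It also assumes shape geometry arrives in EMU (914400 per inch) and converts it at 96 pixels per inch.

```python
import json

EMU_PER_PIXEL = 9525  # 914400 EMU per inch / 96 pixels per inch (assumed DPI)


def split_subtexts(lines):
    """Turn (text, font_size) lines into a text object.

    Lines sharing one font size stay a single text; otherwise each line
    becomes a subtext with a relative position [0, line_index].
    """
    sizes = {size for _, size in lines}
    if len(sizes) <= 1:
        return {"text": "\n".join(t for t, _ in lines),
                "fontsize": next(iter(sizes), None)}
    return {"subtexts": [
        {"text": t, "fontsize": size, "position": [0, i]}
        for i, (t, size) in enumerate(lines)
    ]}


def image_geometry(left, top, width, height, slide_w, slide_h):
    """Express an EMU rectangle in pixels and in percent of the slide."""
    def px(v):
        return round(v / EMU_PER_PIXEL)

    def pct(v, total):
        return round(100 * v / total, 2)

    return {
        "px": {"x": px(left), "y": px(top), "w": px(width), "h": px(height)},
        "percent": {"x": pct(left, slide_w), "y": pct(top, slide_h),
                    "w": pct(width, slide_w), "h": pct(height, slide_h)},
    }


# Hypothetical slide: one three-line text, one image, one notes string.
slide = {
    "texts": [split_subtexts([("Title", 32), ("Subtitle", 18), ("Footnote", 10)])],
    "images": [{
        "path": "storage/slide1_image1.png",  # illustrative path only
        "crop": {"left": 0, "top": 0, "right": 0, "bottom": 0},
        **image_geometry(914400, 457200, 4572000, 3429000, 9144000, 6858000),
    }],
    "notes": "Speaker notes for this slide.",
}
print(json.dumps(slide, indent=2))
```

Keeping both pixel and percent values in the output leaves the choice between the two units open, since the brief allows either.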

Phase 2:


A) map to layout

The general idea is to extract a grid layout from the content of the slide. A slight tolerance is needed for items, ideally as an option of 5-10 px, so that text slightly overlapping an image is still considered text within that image and is therefore given the same grid position / label. The top-left corner of an image determines which row/column the image belongs to.

B) Layout 2 - grid

If there are multiple images, a grid should be projected onto the image to determine the number of columns / rows.

Number of columns: determined by the maximal number of images+texts in any column

Number of rows: determined by the maximal number of images+texts in any row
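One possible reading of the grid projection is sketched below in Python. It assumes that each item is represented by its top-left corner in pixels, that left edges within the tolerance of each other share a column, and that top edges within the tolerance share a row. This clustering rule and the names `cluster`, `index_of` and `build_grid` are assumptions made for illustration, not something the brief specifies.

```python
def cluster(values, tolerance):
    """Group sorted coordinates; a value joins the current group when it
    lies within `tolerance` of that group's first (smallest) value.
    Returns one anchor coordinate per group."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][0] <= tolerance:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [g[0] for g in groups]


def index_of(value, anchors, tolerance):
    """Index of the anchor whose group contains `value`."""
    for i, anchor in enumerate(anchors):
        if 0 <= value - anchor <= tolerance:
            return i
    raise ValueError(f"{value} does not belong to any group")


def build_grid(items, tolerance=10):
    """Assign a (row, col) to every item from its top-left corner."""
    xs = cluster([it["x"] for it in items], tolerance)
    ys = cluster([it["y"] for it in items], tolerance)
    placed = [
        {**it,
         "col": index_of(it["x"], xs, tolerance),
         "row": index_of(it["y"], ys, tolerance)}
        for it in items
    ]
    return {"columns": len(xs), "rows": len(ys), "items": placed}


# Hypothetical top-left corners (pixels) of five images on one slide.
images = [
    {"name": "a", "x": 10, "y": 10},
    {"name": "b", "x": 210, "y": 14},
    {"name": "c", "x": 12, "y": 200},
    {"name": "d", "x": 208, "y": 205},
    {"name": "e", "x": 410, "y": 8},
]
grid = build_grid(images, tolerance=10)
print(grid["columns"], grid["rows"])
```

With these sample corners the sketch reports 3 columns and 2 rows, because corners that differ by only a few pixels fall into the same column or row.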

Texts located within the bounding box of an image plus 5-10 pixels (ideally configurable as an option) should be treated as text with the same grid position as that image. The top-left corner of the text determines which image the text is pinned to. If the text crosses multiple images, its span should be noted. If the text has multiple lines, each with a different font size, it should be marked as subtext (with position).
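The pinning rule can be sketched in Python as follows. Boxes are assumed to be `(x, y, width, height)` tuples in pixels, and the helper names (`expanded`, `contains`, `overlaps`, `pin_text`) are hypothetical. The text is pinned to the first image whose tolerance-expanded box contains the text's top-left corner, and the span lists every image whose expanded box the text box overlaps.

```python
def expanded(box, tolerance):
    """Grow a box by `tolerance` pixels on every side."""
    x, y, w, h = box
    return (x - tolerance, y - tolerance, w + 2 * tolerance, h + 2 * tolerance)


def contains(box, px, py):
    """True when the point (px, py) lies inside the box (edges included)."""
    x, y, w, h = box
    return x <= px <= x + w and y <= py <= y + h


def overlaps(a, b):
    """True when two boxes share some area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def pin_text(text_box, images, tolerance=5):
    """Pin a text box to an image and note which images it spans.

    `images` maps an image name to its box. The top-left corner of the
    text decides the pinned image; overlap decides the span.
    """
    tx, ty = text_box[0], text_box[1]
    pinned = None
    for name, box in images.items():
        if contains(expanded(box, tolerance), tx, ty):
            pinned = name
            break
    span = [name for name, box in images.items()
            if overlaps(text_box, expanded(box, tolerance))]
    return {"pinned_to": pinned, "span": span}


# Hypothetical layout: two images side by side, one caption across both.
images = {"A": (0, 0, 100, 100), "B": (110, 0, 100, 100)}
caption = (-3, 90, 150, 20)    # starts 3 px left of image A
stray = (300, 300, 50, 10)     # far from both images

print(pin_text(caption, images, tolerance=5))
print(pin_text(stray, images, tolerance=5))
```

Passing `tolerance` as a parameter keeps the 5-10 pixel margin configurable, as the brief asks. The caption here starts 3 px outside image A but is still pinned to it, with a span covering both images.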


Skills: Java, Perl, PHP, Python, XML

Project ID: #4633519

Awarded to:


By tomorrow I will show you a demo of phase 1 (written in Python) so you can decide for yourself whether I should continue with the 2nd phase.

$200 USD in 8 days
(1 Review)

6 freelancers are bidding on average $206 for this job


Java Experts here! We can do it for you.

$263 USD in 30 days
(2 Reviews)

Dear Employer, I have gone through the functional requirements and attached files of the given project. I can do this. Let's discuss.

$200 USD in 7 days
(1 Review)

I have gone through all the attached documents. I can do this very well. Let's go ahead.

$200 USD in 7 days
(0 Reviews)

I can do this job. If you need a high-quality PowerPoint slideshow, I am ready to make it as soon as possible. I have made a lot of PowerPoint presentations, and based on my experience I offer you this service.

$111 USD in 3 days
(0 Reviews)

I can do it.

$263 USD in 20 days
(0 Reviews)