Create a pyspark code which runs xpath querys parallelized to extract content from HTML

This project was successfully completed by bibhu049 for €88 EUR in 3 days.

Get free quotes for a project like this
Project Budget
€30 - €250 EUR
Completed In
3 days
Total Bids
Project Description


here is an example of Data i have and the result i expect:

[url removed, login to view]

The Code for extracting the xpath querys should be fast, parallelized via the spark cluster. The XPATH-Query /HTML Extracting should be failure tolerant.

Only answers/proposal which mentions spark / pyspark will be considered.


Completed by:
Skills Required

Looking to make some money?

  • Set your budget and the timeframe
  • Outline your proposal
  • Get paid for your work

Hire Freelancers who also bid on this project

    • Forbes
    • The New York Times
    • Time
    • Wall Street Journal
    • Times Online