Here is an example of the data I have and the result I expect:
[url removed, login to view]
The code for running the XPath queries should be fast and parallelized across the Spark cluster. The XPath query / HTML extraction should be fault tolerant.
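Here is a minimal sketch of the fault-tolerance requirement. Each document goes through a function that never raises: parse failures and invalid queries are recorded in the result instead of failing the Spark task. Several things are assumptions. The `QUERIES` mapping, its field names, and the helper names `extract` and `extract_on_spark` are hypothetical. The standard library's `xml.etree.ElementTree` stands in for a real XPath engine, so only its limited XPath subset is available and pages must be well-formed markup. For real-world HTML, lxml would be the more usual choice.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical field -> query mapping. ElementTree supports only a limited
# XPath subset (e.g. ".//tag" and "[@attr='value']" predicates, no text()).
QUERIES: Dict[str, str] = {
    "title": ".//title",
    "price": ".//span[@class='price']",
}


def extract(html: str, queries: Dict[str, str] = QUERIES) -> Dict[str, object]:
    """Run every query against one document and never raise.

    Fields that cannot be extracted stay None; the reasons are collected in
    "_errors", so one malformed page cannot fail the whole Spark task.
    """
    result: Dict[str, object] = {name: None for name in queries}
    errors: List[str] = []
    result["_errors"] = errors
    try:
        root = ET.fromstring(html)
    except ET.ParseError as exc:  # markup is not well-formed
        errors.append(f"parse: {exc}")
        return result
    for name, path in queries.items():
        try:
            node: Optional[ET.Element] = root.find(path)
        except SyntaxError as exc:  # invalid or unsupported path expression
            errors.append(f"query {name}: {exc}")
            continue
        if node is not None:
            # Concatenate all text inside the matched element.
            result[name] = "".join(node.itertext()).strip()
    return result


def extract_on_spark(html_docs: List[str], queries: Dict[str, str] = QUERIES,
                     num_slices: Optional[int] = None) -> List[Dict[str, object]]:
    """Distribute extract() over the cluster; PySpark is imported lazily."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("xpath-extract").getOrCreate()
    rdd = spark.sparkContext.parallelize(html_docs, num_slices)
    return rdd.map(lambda doc: extract(doc, queries)).collect()
```

Because failures come back as data, the driver can inspect the `_errors` field afterwards, for example with `rdd.filter(lambda r: r["_errors"])`. Spark's task retries would not help here, since they would only repeat a deterministic parse error.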
Only answers/proposals that mention Spark / PySpark will be considered.
My approach to the problem would be: [url removed, login to view] a Spark RDD from the HTML content. 2. Write a method to parse the HTML and extract the required information (using the BeautifulSoup Python package). [url removed, login to view] this method on spar…
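The three steps of this proposal could be sketched roughly as follows in PySpark. This is an assumption-laden sketch, not the bidder's code. The names `parse_page`, `run`, and `input_dir` are hypothetical. The extracted fields (page title and link targets) are placeholders for the "required information". The standard library's `html.parser` stands in for BeautifulSoup, so the example has no third-party parsing dependency. Step 1 uses `SparkContext.wholeTextFiles`, which yields one `(path, content)` record per file.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class _TitleLinkParser(HTMLParser):
    """Collects <title> text and <a href> values (stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(path_and_html: Tuple[str, str]) -> Dict[str, object]:
    """Step 2: parse one (path, html) pair; errors are recorded, not raised."""
    path, html = path_and_html
    parser = _TitleLinkParser()
    try:
        parser.feed(html)
        parser.close()
        return {"path": path, "title": parser.title.strip(),
                "links": parser.links, "error": None}
    except Exception as exc:  # keep one bad page from failing the whole task
        return {"path": path, "title": None, "links": [], "error": repr(exc)}


def run(input_dir: str) -> List[Dict[str, object]]:
    """Steps 1 and 3 on the cluster; PySpark is imported lazily."""
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    # Step 1: one RDD record per HTML file, as (path, content) pairs.
    pages = sc.wholeTextFiles(input_dir)
    # Step 3: apply the parse method on the executors and gather the results.
    return pages.map(parse_page).collect()
```

With BeautifulSoup installed, `parse_page` could instead build a `BeautifulSoup(html, "html.parser")` object and query that. The surrounding `map` would stay the same.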
18 freelancers are bidding an average of €184 for this job.
We are a team of highly skilled professional software developers. We cover all kinds of development, including web software, IT, and mobile apps for iOS and Android.