Completed

Web service scraping

This project was successfully completed by SigmaVisual for $144 USD in 3 days.

Get free quotes for a project like this
Employer working
Completed by:
Project Budget
$30 - $250 USD
Completed In
3 days
Total Bids
14
Project Description

Hello,
I need someone who have experience with scraping data from web service/API help me with this work. If you are student, do not apply. I need professional people who have experience with python scraping that can finish it for me. Knowledge of visualization will helpful for bonus. Here is the job. We have a website with web service call like a survey interview, depend on the previous question, it will display the next question. So I need to get all the content of the survey.

The first request to the server is something like:
http://localhost/?s=Start&description=DES1&value=VAL1&units=UNIT&description=DES2&value=VAL2&description=DES3&value=VAL3
It will create a session variable for current session to use in the 2nd request as well as return the first content (next question).

Then there will be a series of questions for this above session with value Yes, No, Back (first question don't have Back) call with the URL like below:
http://localhost/?m=NAME&session=530cc4c3000082aafmd&navigationMethod=[Yes, No, Back]

Each time we click Yes/No/Back button, the navigationMethod variable will change accordingly and return a json content for the next question:
{"variable":
{
"variable1": "value1",
"variable2": "value2",
"variable3": "value3",
"variable4": "value4",
"variable5": "value5",
"variable6": "value6",
"variable7": "value7"
}
}

At the end, there will be no question, just a json return with a lot of information. But now, the session variable is killed so we can not get back to previous question.

What I need you to do is to write a python script that can help me to scrap all the data and save to a text file.

What I think we need to do is write a breadth-first search or depth-first search function to traverse along the tree that can save each state when we click the button. The state will include both the hierarchy of current node (like 1.2.1.1.1.2.1 ... [1 for Yes, 2 for No]) as well as the content (json returnted). And then when the session variable no longer work, we will need to create a new session variable and get the previous state we have saved and continue, continue until we traverse all the leaves of tree.


Below is requirements:

+ It should be a function that I can call from command line like:
CMD >>>> python [url removed, login to view] -DES1 -VAL1 -UNIT1 -DES2 -VAL2 -DES3 -VAL3 [url removed, login to view]

+ The content of text file should have some way that allow me to track the hierachy of questions, which one display first, which one is second, which one is the child of which one. [Because it limit to 4000 character, I put the sample in text file.]

The deep of each tree may be up to 50 levels, even more than that.


Bonus 1: If you can have some way that allow me to filter the content it return, like I only want to save variable3, variable5, variable6 from the json return to text file. There is a bonus for it.

Bonus 2: If you can format the data and write a tool allow me to import the text file [url removed, login to view] and display it like a tree diagram like this example [url removed, login to view] , whenever I click to one note, i can get the data popup, that would be great. There is an extra bonus.

Bonus 3: If you can write a tree diagram allow me to load the text file [url removed, login to view] and make change, and save it back to the original file. There is also another extra bonus.

Let me know with your price:

Basic: XX$
Bonus 1: XX$
Bonus 2: XX$
Bonus 3: XX$

Looking to make some money?

  • Set your budget and the timeframe
  • Outline your proposal
  • Get paid for your work

Hire Freelancers who also bid on this project

    • Forbes
    • The New York Times
    • Time
    • Wall Street Journal
    • Times Online