I've worked on this for another project; however, that work can't be shared with you in this case. It is still possible to produce similar results with a new line of thought. Here's the approach:
1) Select a problem statement (yes, you want to predict coauthor relationships, but that's not a problem statement yet), perhaps framed around a methods/algorithm stance
2) I wish to use dblp's public domain dataset for this (if you've worked on it and parsed it before, that will be helpful)
3) Prepare the sections, methods, implementation and results portions of the project
4) Prepare the code and documentation (optional). Here's a query: when you say documentation, do you mean the project report or documentation for the code? We can discuss this if needed.
5) Prepare the report (15 to 20 pages)
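To give a feel for what a methods/algorithm stance in step 1 could look like, here is a minimal sketch of one possible baseline: score a pair of authors by the Jaccard overlap of their existing coauthor sets. This is only an illustration under my own assumptions, not the final method. The function names `build_coauthor_graph` and `jaccard_score` are hypothetical, and the author lists are toy data invented for the example, not parsed from dblp.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of authors they have published with."""
    graph = defaultdict(set)
    for authors in papers:
        # Every pair of distinct authors on one paper becomes an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def jaccard_score(graph, a, b):
    """Jaccard overlap of two authors' coauthor sets (0.0 if both are empty)."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    union = na | nb
    if not union:
        return 0.0
    return len(na & nb) / float(len(union))


# Toy author lists, invented for illustration (not taken from dblp).
papers = [
    ["Alice", "Bob"],
    ["Alice", "Carol"],
    ["Bob", "Carol", "Dave"],
]
graph = build_coauthor_graph(papers)
score = jaccard_score(graph, "Alice", "Dave")
```

A score like this could serve as one feature for the classifiers. The actual feature set would follow from the problem statement we agree on.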
Deliverables
We shall have 3 milestone payments and 3 progress updates. Platform: Python 2. Package: code files, PDF report, figures used, trained classifier files (2-week span)
Summary
It's my understanding that you need a complete package and are not concerned with its internal details. Also, you will not be able to offer resources apart from the remuneration. Hence, I do not have to allocate time for explaining each and every line.
Please let me know your concerns, queries, etc., even if I'm not online. I shall reply ASAP.
Thanks! Enjoy the weekend.