Given three inputs (recipe, equipment_name, and part count), determine the expected processing cycle time. Build a model using TensorFlow and Python to test and demonstrate the solution.
recipe and equipment_name are alphanumeric.
part count ranges from 5 to 25, with 10-20 being most common.
cycle times range from a few minutes to tens of minutes.
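To make the three inputs concrete, here is a minimal sketch of a validated record type. The class name `CycleRecord`, the `cycle_time_min` field, and the choice to reject underscores or spaces (via `str.isalnum`) are assumptions for illustration, not part of the requirements.

```python
from dataclasses import dataclass

# Bounds taken from the requirements above (part count 5-25).
MIN_PARTS, MAX_PARTS = 5, 25


@dataclass(frozen=True)
class CycleRecord:
    """One observation: the three inputs plus the measured cycle time in minutes.

    Hypothetical structure; field names mirror the dataset description.
    """
    recipe: str
    equipment_name: str
    part_count: int
    cycle_time_min: float

    def __post_init__(self):
        # recipe and equipment_name must be alphanumeric, e.g. "recipe1", "tool2".
        for field_name in ("recipe", "equipment_name"):
            value = getattr(self, field_name)
            if not value.isalnum():
                raise ValueError(f"{field_name} must be alphanumeric, got {value!r}")
        if not MIN_PARTS <= self.part_count <= MAX_PARTS:
            raise ValueError(
                f"part_count must be {MIN_PARTS}-{MAX_PARTS}, got {self.part_count}")
        if self.cycle_time_min <= 0:
            raise ValueError("cycle_time_min must be positive")
```

Rejecting bad records at construction time keeps malformed rows out of training, which is also a natural place to hook in the exception logic mentioned later.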
The application should provide simulated data for at least 3-4 equipment names, with 4-5 recipes per equipment name (although the number of equipment names and recipes is expected to be unlimited, constrained only by the training data). The simulated cycle times can be linear in part count, although a more complex simulation (explained in your proposal) might be valuable. Cycle times can be normally distributed, although in practice the distribution has a long tail.
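As one possible long-tailed simulation, the sketch below keeps the linear mean and a roughly 2% normal spread but occasionally adds an exponentially distributed delay, which produces a right-skewed tail. The tail probability (5%) and delay scale (10% of the mean) are assumptions chosen for illustration, not values from the requirements, and `simulate_cycle_time` is a hypothetical name.

```python
import random


def simulate_cycle_time(setup_min, per_part_min, part_count, rng,
                        rel_sd=0.02, tail_prob=0.05, tail_scale=0.10):
    """Draw one simulated cycle time in minutes.

    Mean: linear model, setup_min + per_part_min * part_count.
    Body: normal noise with relative standard deviation rel_sd.
    Tail (assumed): with probability tail_prob, add an exponential delay whose
    mean is tail_scale * mean, giving a right-skewed long tail.
    """
    mean = setup_min + per_part_min * part_count
    t = rng.gauss(mean, rel_sd * mean)
    if rng.random() < tail_prob:
        t += rng.expovariate(1.0 / (tail_scale * mean))
    return max(t, 0.0)  # guard against negative times for very small means


# Example: setup 20 min + 5 min/part gives a 120-minute mean for 20 parts.
rng = random.Random(42)
samples = [simulate_cycle_time(20.0, 5.0, 20, rng) for _ in range(10_000)]
```

Setting `tail_prob=0.0` recovers the plain normal simulation, so both variants can be compared with the same training code.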
In the real world, available training sets contain about 1000-5000 known cycle times per equipment, split approximately equally among recipes but with a preference for 10, 15, and 20 parts (all other values are represented to some degree). Some indication of the model's accuracy (loss) as training progresses is therefore required. The application should learn continuously: each cycle time should be calculated and then used to train the model, so that predictions improve over time. (Some exception-handling logic will eventually prevent training on certain data, but for now all data can be used for training.)
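The predict-then-train loop and loss reporting described above can be sketched without a framework so the structure is easy to see. This is a minimal sketch, assuming a per-(equipment_name, recipe) linear model updated by plain stochastic gradient descent, with an exponential moving average of squared error as the progress indicator; `OnlineCycleTimeModel` and `demo` are hypothetical names. In the requested TensorFlow version, the hand-written update would be replaced by a Keras model trained incrementally, for example by calling `Model.train_on_batch` on each new observation.

```python
import random
from collections import defaultdict


class OnlineCycleTimeModel:
    """Per-(equipment_name, recipe) linear model trained one observation at a time.

    prediction = setup + rate * (part_count / PART_SCALE)
    A pair that has never been seen starts at (0, 0), so new equipment and
    recipes are added simply by observing them.
    """

    PART_SCALE = 25.0  # scales part_count into (0, 1] so plain SGD stays stable

    def __init__(self, learning_rate=0.1, loss_smoothing=0.01):
        self.lr = learning_rate
        self.alpha = loss_smoothing                    # weight of newest squared error
        self.params = defaultdict(lambda: [0.0, 0.0])  # key -> [setup, rate]
        self.running_loss = None                       # EMA of squared error (min^2)

    def predict(self, equipment_name, recipe, part_count):
        # .get avoids creating an entry just because a prediction was requested.
        setup, rate = self.params.get((equipment_name, recipe), (0.0, 0.0))
        return setup + rate * part_count / self.PART_SCALE

    def update(self, equipment_name, recipe, part_count, actual_min):
        """Compare the prediction with the observed time, then take one SGD step."""
        x = part_count / self.PART_SCALE
        p = self.params[(equipment_name, recipe)]
        error = actual_min - (p[0] + p[1] * x)
        # Gradient step on 0.5 * error^2 with respect to setup and rate.
        p[0] += self.lr * error
        p[1] += self.lr * error * x
        sq = error * error
        if self.running_loss is None:
            self.running_loss = sq
        else:
            self.running_loss = (1 - self.alpha) * self.running_loss + self.alpha * sq
        return self.running_loss


def demo(steps=5000, seed=0):
    """Stream simulated jobs for one tool/recipe and record the running loss."""
    rng = random.Random(seed)
    model = OnlineCycleTimeModel()
    history = []
    for _ in range(steps):
        parts = rng.randint(5, 25)
        true_mean = 20.0 + 5.0 * parts  # hypothetical setup and per-part times
        actual = rng.gauss(true_mean, 0.02 * true_mean)
        history.append(model.update("tool1", "recipe1", parts, actual))
    return model, history
```

Printing `history` every few hundred steps gives the requested view of loss as training progresses. Unseen pairs predict 0.0 until data arrives; a production model might instead fall back to an equipment-level or global estimate.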
The code should be documented such that it can be understood and extended by a moderately competent programmer.
Please feel free to ask any questions. (PS: There is clearly a statistical solution to this problem that may be technically superior; however, machine learning is the preferred approach at this point because the goal is to expand to other data sets that don't have obvious statistical solutions.)
One point seems to be causing confusion: we expect you to build a simulated dataset based on the information in this project description. The dataset is simple, with four fields:
recipe, equipment_name, part_count, and cycle_time
recipe is, for example, "recipe1", "recipe2", etc.
equipment_name is "tool1", "tool2", etc.
part_count is 5-25 with clusters around 20 and 15
cycle_time is a value that differs for each recipe. For example, "recipe1" may have a 120-minute cycle time for 20 wafers with a standard deviation of about 2%, while "recipe2" may have a 13-minute cycle time, and so forth. The time can be assumed to be linear: cycle_time = setup_time + process_time_per_part * part_count.
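The dataset described above can be generated with a short script. This is a sketch under stated assumptions: setup and per-part times for each (tool, recipe) pair are drawn at random from hypothetical ranges, 60% of jobs use 10, 15, or 20 parts (the rest are uniform over 5-25, so every count appears), and noise is normal with a 2% relative standard deviation. All function names are hypothetical.

```python
import csv
import io
import random

FIELDS = ["recipe", "equipment_name", "part_count", "cycle_time"]


def make_catalog(n_tools=4, n_recipes=5, rng=None):
    """Assign hypothetical (setup_min, per_part_min) values to every tool/recipe pair."""
    rng = rng or random.Random(0)
    catalog = {}
    for t in range(1, n_tools + 1):
        for r in range(1, n_recipes + 1):
            setup = rng.uniform(2.0, 20.0)      # assumed range, minutes
            per_part = rng.uniform(0.5, 5.0)    # assumed range, minutes per part
            catalog[(f"tool{t}", f"recipe{r}")] = (setup, per_part)
    return catalog


def sample_part_count(rng, preferred=(10, 15, 20), p_preferred=0.6):
    """Mostly 10/15/20 parts, but every count from 5 to 25 appears sometimes."""
    if rng.random() < p_preferred:
        return rng.choice(preferred)
    return rng.randint(5, 25)


def generate_rows(catalog, rows_per_tool=1000, rel_sd=0.02, seed=1):
    """Simulate rows_per_tool jobs per tool, split equally among its recipes."""
    rng = random.Random(seed)
    tools = sorted({tool for tool, _ in catalog})
    rows = []
    for tool in tools:
        recipes = sorted(r for t, r in catalog if t == tool)
        for i in range(rows_per_tool):
            recipe = recipes[i % len(recipes)]  # round-robin gives an equal split
            setup, per_part = catalog[(tool, recipe)]
            parts = sample_part_count(rng)
            mean = setup + per_part * parts     # linear cycle-time model
            rows.append({"recipe": recipe, "equipment_name": tool,
                         "part_count": parts,
                         "cycle_time": round(rng.gauss(mean, rel_sd * mean), 2)})
    return rows


def to_csv_text(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With the defaults this yields 4 tools x 5 recipes and 1000 rows per tool, inside the 1000-5000 range quoted for real training sets; raising `rows_per_tool` or the catalog size exercises the "unlimited" equipment and recipe requirement.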