Mock Data Gen with Machine Learning Module - 01/12/2023 02:56 EST
- Status: Closed
- Prize: $500
- Entries Received: 1
- Winner: td7x
Winning submissions will include a GitHub repo of the software, complete with documentation and CI/CD using GitHub Workflows.
Employer reserves all rights to the software created under this contest but will redistribute the software under an Open Source license. All dependencies must have permissive OSI approved licenses and the software must be runnable offline, without dependence on an external web service or datastore and without dependency on specialized hardware.
Simple faker or charade libraries can be used for mock data in software development, but using them can be labor-intensive because a developer must select the correct method and identify the input parameters for each data field. Developers already have enough cognitive overhead and need a fake data solution that uses existing data models/schemas to create the fake data with zero configuration.
A NodeJS module that produces semantically accurate fake data from an arbitrary data model or schema with zero configuration. We are primarily a TypeScript/NodeJS shop and describe the requirements from that perspective, but submissions that are Rust-based and compile to WASM are more than welcome. Runtime portability (in-browser, Bun, Cloudflare, etc.) is preferred, but NodeJS is required.
Data model handlers for GraphQL SDL and JSONSchema are required. Extra preference will be given to submissions with additional handlers for TypeScript type definitions and protobufs.
Various fake data handlers should be supported. Required is a handler that accepts a single field name from the data model and returns semantically correct mock data consistent with the larger data model. Extra preference will be given to submissions with additional handlers that accept a GraphQL request shape (returning a GraphQL response shape) and a handler that does not accept an argument and returns an object for the data model (that could be stringified into JSON).
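The handler shapes described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the expected implementation: the `Schema` type covers only a tiny subset of JSON Schema, and `Pick`, `mockValue`, `fieldHandler`, `objectHandler`, and `stubPick` are hypothetical names. The stub picks values by type only; the actual module would choose generators from field semantics.

```typescript
// Tiny, assumed subset of JSON Schema (illustration only).
type ObjectSchema = {
  readonly type: "object";
  readonly properties: Readonly<Record<string, Schema>>;
};
type Schema =
  | { readonly type: "string" }
  | { readonly type: "integer" }
  | { readonly type: "boolean" }
  | ObjectSchema;

type JsonValue = string | number | boolean | { readonly [key: string]: JsonValue };

// Hypothetical strategy: given a field name and its schema, produce a value.
type Pick = (fieldName: string, schema: Schema) => JsonValue;

// Walk the schema; object nodes recurse, leaf nodes are delegated to `pick`.
const mockValue = (pick: Pick): Pick => (fieldName, schema) =>
  schema.type === "object"
    ? Object.fromEntries(
        Object.entries(schema.properties).map(
          ([name, child]): readonly [string, JsonValue] => [name, mockValue(pick)(name, child)],
        ),
      )
    : pick(fieldName, schema);

// Required handler: one field name in, one mock value out (undefined if unknown).
const fieldHandler = (pick: Pick, model: ObjectSchema) =>
  (fieldName: string): JsonValue | undefined => {
    const schema = model.properties[fieldName];
    return schema === undefined ? undefined : mockValue(pick)(fieldName, schema);
  };

// Optional handler: no argument, returns an object for the whole data model.
const objectHandler = (pick: Pick, model: ObjectSchema) => (): JsonValue =>
  mockValue(pick)("", model);

// Deterministic stub strategy, keyed by type via a lookup map (placeholder only).
const stubByType: Readonly<Record<Schema["type"], (name: string) => JsonValue>> = {
  string: (name) => `mock-${name}`,
  integer: () => 42,
  boolean: () => true,
  object: () => ({}),
};
const stubPick: Pick = (fieldName, schema) => stubByType[schema.type](fieldName);

const userModel: ObjectSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
    address: { type: "object", properties: { city: { type: "string" } } },
  },
};

console.log(JSON.stringify(objectHandler(stubPick, userModel)()));
```

Passing the strategy in as an argument keeps the schema walk pure and lets tests swap in a deterministic stub without mocks.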
It is expected that this software will utilize existing generators such as FakerJs, ChanceJs, CasualJs and RandExpJs just as other higher level tools do:
Unlike these existing tools, this software will not hard-code handling for individual basic field types, an approach that limits a tool to those types and requires significant configuration for anything else. How we overcome this limit is the crux of what makes this software different. Perhaps NLP string or vector comparisons can be used to select the correct generator function from the field name, with only unmatched requests falling back to an LLM. LangChain seems like an attractive pattern and technology for this.
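One way the string-comparison idea might look is sketched below, using a Dice coefficient over character bigrams to match a field name against a registry of generator names. Everything here is an assumption for illustration: the registry keys, the stub generators (stand-ins for FakerJs or ChanceJs calls), the `matchGenerator` name, and the 0.5 threshold. A vector-embedding comparison could slot into the same place as `dice`.

```typescript
// Hypothetical generator registry: the keys and stub generators are
// illustrative stand-ins, not real FakerJs/ChanceJs APIs.
type Generator = () => string;

const generators: Readonly<Record<string, Generator>> = {
  "first name": () => "Ada",
  "last name": () => "Lovelace",
  "email address": () => "ada@example.com",
  "phone number": () => "555-0100",
  "street address": () => "12 Analytical Way",
};

// Split camelCase / snake_case / kebab-case field names into lowercase words.
const normalize = (name: string): string =>
  name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/[_\-\s]+/g, " ")
    .trim()
    .toLowerCase();

// Character bigrams of a string, e.g. "mail" -> ["ma", "ai", "il"].
const bigrams = (text: string): readonly string[] =>
  Array.from({ length: Math.max(text.length - 1, 0) }, (_, i) => text.slice(i, i + 2));

// Dice coefficient over bigram sets: 0 (nothing shared) to 1 (identical sets).
const dice = (a: string, b: string): number => {
  const setA = new Set(bigrams(a));
  const setB = new Set(bigrams(b));
  const shared = [...setA].filter((gram) => setB.has(gram)).length;
  const total = setA.size + setB.size;
  return total === 0 ? 0 : (2 * shared) / total;
};

type Match = { readonly key: string; readonly score: number };

// Best-scoring registry key, or undefined when nothing clears the threshold
// (the case the brief suggests handing to an LLM).
const matchGenerator = (fieldName: string, threshold = 0.5): Match | undefined => {
  const target = normalize(fieldName);
  const best = Object.keys(generators)
    .map((key): Match => ({ key, score: dice(target, key) }))
    .reduce((top, next) => (next.score > top.score ? next : top), { key: "", score: 0 });
  return best.score >= threshold ? best : undefined;
};

console.log(matchGenerator("firstName"));
console.log(matchGenerator("zzz"));
```

Returning `undefined` for a weak match, rather than a best guess, gives the caller a clear signal for when to escalate to a slower or more expensive strategy.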
Code will be written in strict TypeScript with strong typing and be compatible with Bun, Deno, and NodeJS. Code will be "Clean" and robust. OOP patterns are to be avoided in favor of "strategic" functional programming use. eslint-plugin-functional/recommended is great; using additional FP libraries such as fp-ts or Ramda is not required. In general:
- Small composable functions.
- No nested code.
- Avoid if statements. Branches are only ok in the simplest and unavoidable use cases. Simple clean ternaries are fine.
- Along with avoiding branching, absolutely no try/catch.
- Never throw.
- No control loops.
- No unbounded iterators.
- Use maps rather than a switch or if/else.
- Functions should be small, pure, and composable.
- Separate configuration from code.
- Use arrow function syntax.
- Avoid async/await as one can accidentally block the event loop.
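A few of the guidelines above, taken together, might look like the sketch below: errors returned as values instead of thrown, a lookup map instead of a switch, and `map` instead of a loop. The `Result` type, the `defaults` table, and the function names are hypothetical and chosen only for illustration.

```typescript
// Errors as values: a Result type takes the place of throw and try/catch.
type Result<T> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: string };

const ok = <T>(value: T): Result<T> => ({ ok: true, value });
const fail = (error: string): Result<never> => ({ ok: false, error });

type FieldType = "string" | "integer" | "boolean";
type Scalar = string | number | boolean;

// Configuration kept apart from logic: a lookup map replaces a switch.
// The values are deterministic placeholders, not real generators.
const defaults: Readonly<Record<FieldType, () => Scalar>> = {
  string: () => "sample",
  integer: () => 0,
  boolean: () => false,
};

// Type guard built from the map's own keys, so the two cannot drift apart.
const isFieldType = (value: string): value is FieldType =>
  Object.keys(defaults).includes(value);

// One simple ternary; unknown types become an error value, never an exception.
const generate = (type: string): Result<Scalar> =>
  isFieldType(type) ? ok(defaults[type]()) : fail(`no generator for type: ${type}`);

// Bounded iteration over a finite array with map rather than a control loop.
const generateAll = (types: readonly string[]): readonly Result<Scalar>[] =>
  types.map(generate);

console.log(generateAll(["string", "integer", "uuid"]));
```

Because every function returns a value for every input, callers compose them without defensive wrappers, and each piece can be tested in isolation.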
Fine-grained testing of LangChain does not seem completely straightforward, but its testability is improving and the LangSmith debugger should probably be used. Code should be decoupled so that mocks can be avoided. Vitest or Jest should be used with fast-check as well as static assertions. Strict TDD is preferred but not required. Writing tests throughout development, and not at the end, is required. The important thing is that testable code is cleaner, simpler, and more robust. Tested code is easier to change.
The test suite should also prove the software works.