
Closed
Paid on delivery
Hey! I’m looking to hire an experienced developer to build a universal product-detail scraping pipeline that takes a product URL (any website) and returns a complete structured product record.

This is not a “simple HTML parse.” Many target sites are React/Next/Vue, load content via XHR/GraphQL, hide details behind tabs/accordions/modals, and lazy-load images/PDFs. The solution needs to reliably extract everything a human can see on the page, plus the underlying data used to render it.

What the scraper must do (high level)

Given a product URL, the pipeline should:
- Load the page like a real user (handle cookies/overlays).
- Capture all content from multiple sources (DOM + network + interactions).
- Use GPT API strategically to increase accuracy (field mapping, variant extraction, doc classification, completeness checks).
- Output a strict, validated JSON record + optional Excel export.

Data fields I need to extract (core)

Required output fields:
- Product name
- manufacturer / brand
- description (clean, human-readable)
- images[] (high quality URLs, deduped; include context/alt when possible)
- documents[] (PDF/spec sheets/install guides/warranties/BIM/etc., classified)
- options[] / variants (SKUs if available; option dimensions like color/size/material; availability if available)
- attributes{} (everything else: specs, dimensions, sustainability/certifications, compliance info, finish codes, etc.)
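To make the required output fields concrete, here is a minimal sketch of how the record could be modeled so that its structure stays deterministic even when nothing was found. The key names (`name`, `brand`, `variants`, `kind`) and the helper `add_image` are assumptions chosen for illustration; the brief only names the fields loosely.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional


@dataclass
class Image:
    url: str
    alt: Optional[str] = None  # context/alt text when the page provides it


@dataclass
class Document:
    url: str
    kind: Optional[str] = None  # hypothetical labels, e.g. "spec_sheet", "warranty", "bim"


@dataclass
class Variant:
    sku: Optional[str] = None
    options: Dict[str, str] = field(default_factory=dict)  # e.g. {"color": "Oak"}
    availability: Optional[str] = None


@dataclass
class ProductRecord:
    # Scalars default to None and collections to empty, so every key is always present.
    name: Optional[str] = None
    brand: Optional[str] = None
    description: Optional[str] = None
    images: List[Image] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)
    attributes: Dict[str, Any] = field(default_factory=dict)

    def add_image(self, url: str, alt: Optional[str] = None) -> None:
        """Append an image unless the same URL was already recorded (dedupe by URL)."""
        if all(img.url != url for img in self.images):
            self.images.append(Image(url, alt))

    def to_json(self) -> str:
        """Serialize with sorted keys so repeated runs produce the same layout."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = ProductRecord(name="Example Chair")
    record.add_image("https://example.com/a.jpg", "front view")
    record.add_image("https://example.com/a.jpg")  # duplicate URL, ignored
    print(record.to_json())
```

An empty `ProductRecord()` still serializes all seven keys, which is one way to satisfy a "nulls, empty arrays" requirement before a formal JSON schema is layered on top.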
Additionally (for completeness & auditability):
- Full rendered page text: pageText
- Sectioned text (headings/paragraphs/lists/tables): pageTextSections
- Structured data capture: JSON-LD + embedded state blobs (e.g., __NEXT_DATA__) + meta tags
- Network payload evidence: selected API responses that contain product truth (saved with URL + snippet/hash)
- Provenance per field: source + confidence + evidence snippet

Universal extraction approach I want (technical requirements)

Tech stack (preferred):
- Playwright (preferred) or Puppeteer for browser automation
- Node.js or Python acceptable
- GPT API integration for: mapping, variants, document classification, and completeness audit loops

Must-have capabilities:
- JS-rendered content support (wait for hydration; not just raw HTML)
- Network interception: capture JSON/GraphQL responses during load + interactions
- Interaction replay:
  - scroll for lazy loads
  - expand accordions (“See more”, “Specs”, “Downloads”)
  - click tabs
  - open modals/drawers (e.g., availability, downloads)
  - attempt variant selection and record deltas
- Asset harvesting: harvest images & PDFs from DOM and network responses (not only <a href> / <img src>)
- Anti-fragility:
  - robust waiting (not only networkidle)
  - retry logic
  - consistent error reporting
- Output validation:
  - JSON schema validation
  - deterministic structure even when fields are missing (nulls, empty arrays)

How GPT should be used (important)

I have a GPT API key and want it used heavily but intelligently:
- Decide page type and extraction plan (product vs category vs doc page)
- Identify which network payloads contain product data
- Normalize messy specs into key/value
- Reconstruct variants/options from partial signals
- Classify documents (spec sheet vs install vs warranty vs BIM)
- Run a completeness audit and suggest the next actions (click this / expand that) until the record is complete

Rule: GPT must not hallucinate. If uncertain, output null + evidence + recommended next action.
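One possible way to enforce the no-hallucination rule together with per-field provenance is a grounding check that runs after every model call. The sketch below is only an illustration: the proposal format (`value`, `evidence`, `confidence`, `next_action`), the `ground` function, and the 0.6 threshold are all assumptions, and no real GPT call is made. A value is accepted only when its evidence snippet appears verbatim in a captured source and the value itself appears in that snippet; this is deliberately strict and would reject normalized values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldResult:
    value: Optional[str]
    source: Optional[str]      # name of the captured source that contains the evidence
    confidence: float
    evidence: Optional[str]    # snippet the model claims to have copied
    next_action: Optional[str] = None


def _norm(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences do not matter."""
    return " ".join(text.split()).lower()


def ground(proposal: dict, sources: Dict[str, str], min_confidence: float = 0.6) -> FieldResult:
    """Accept a model-proposed value only if its evidence is found in a captured source."""
    value = proposal.get("value")
    evidence = proposal.get("evidence") or ""
    confidence = float(proposal.get("confidence") or 0.0)
    next_action = proposal.get("next_action")

    # Look for the evidence snippet in the DOM text, network payloads, etc.
    matched = None
    if evidence.strip():
        for name, text in sources.items():
            if _norm(evidence) in _norm(text):
                matched = name
                break

    grounded = (
        value is not None
        and matched is not None
        and _norm(str(value)) in _norm(evidence)
        and confidence >= min_confidence
    )
    if grounded:
        return FieldResult(str(value), matched, confidence, evidence)

    # Uncertain: null value, keep whatever evidence exists, recommend a next step.
    fallback = "expand remaining tabs/accordions and re-run extraction"
    return FieldResult(None, matched, confidence, evidence or None, next_action or fallback)


if __name__ == "__main__":
    captured = {"dom": "<h1>Aero   Chair</h1> by Acme", "network": '{"sku": "AC-100"}'}
    print(ground({"value": "Aero Chair", "evidence": "Aero Chair", "confidence": 0.9}, captured))
    print(ground({"value": "Walnut", "evidence": "Finish: Walnut", "confidence": 0.95}, captured))
```

The second call returns a null value with a recommended next action, because the claimed evidence does not occur in anything that was actually captured.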
Deliverables

- A runnable scraper (CLI or small service) that accepts a product URL and outputs:
  - [login to view URL] (structured)
  - optional [login to view URL]
- A “self-healing” completeness loop with logs:
  - what interactions were performed
  - what was missing
  - what sources were used (DOM/network/GPT/OCR if used)
- Documentation:
  - setup instructions
  - how to add sites / tune extraction
  - how to run in headless mode
- Basic test set: run on ~10 diverse product URLs (Shopify + custom + [login to view URL] + heavy JS) and show outputs

Nice-to-haves

- Dockerfile
- Queue/scheduler support for batch runs
- Proxy support (only if needed)
- Optional OCR fallback using screenshot + vision for hard edge cases
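The "self-healing" completeness loop from the deliverables could be structured roughly as follows. This is a sketch under stated assumptions: `REQUIRED`, `missing_fields`, and `completeness_loop` are hypothetical names, and the `actions` mapping stands in for real browser interactions (and for a GPT audit step that would choose the next action); here each action is just a callable that returns newly found fields.

```python
from typing import Callable, Dict, List, Tuple

# Assumed key names for the required fields of the record.
REQUIRED = ["name", "brand", "description", "images", "documents", "variants", "attributes"]
EMPTY = (None, "", [], {})


def missing_fields(record: dict) -> List[str]:
    """A field counts as missing when it is absent, None, or an empty string/container."""
    return [key for key in REQUIRED if record.get(key) in EMPTY]


def completeness_loop(
    record: dict,
    actions: Dict[str, Callable[[dict], dict]],
    max_rounds: int = 5,
) -> Tuple[dict, List[dict]]:
    """Try one untried action per round until nothing is missing or options run out."""
    log: List[dict] = []
    tried = set()
    for round_no in range(1, max_rounds + 1):
        missing = missing_fields(record)
        if not missing:
            break
        # Pick the first missing field that still has an untried action.
        target = next((f for f in missing if f in actions and f not in tried), None)
        if target is None:
            log.append({"round": round_no, "missing": missing, "action": None, "filled": []})
            break
        tried.add(target)
        found = actions[target](record) or {}
        filled = []
        for key, value in found.items():
            # Only fill gaps; never overwrite a value that was already extracted.
            if key in missing and value not in EMPTY:
                record[key] = value
                filled.append(key)
        log.append({"round": round_no, "missing": missing, "action": target, "filled": filled})
    return record, log


if __name__ == "__main__":
    start = {"name": "Aero Chair", "brand": "Acme", "description": "A chair.",
             "images": ["https://example.com/a.jpg"], "documents": [], "variants": [],
             "attributes": {"width": "60 cm"}}
    stub_actions = {
        "documents": lambda r: {"documents": ["https://example.com/spec.pdf"]},
        "variants": lambda r: {},  # stands in for an interaction that found nothing
    }
    final, audit_log = completeness_loop(start, stub_actions)
    for entry in audit_log:
        print(entry)
```

Each log entry records what was missing, which action was performed, and what it filled, which covers the first two logging requirements; the source of each filled value would need to be carried in the action's return value.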
Project ID: 40228485
124 proposals
Remote project
Active 26 days ago
124 freelancers are bidding on average $163 USD for this job

I am confident that my skills in PHP, Python, Data Processing, Web Scraping, and Software Architecture align with the requirements of the project "Hiring Developer: Universal Product Page Scraper (JS-rendered sites) + GPT-assisted extraction + structured JSON output -- 2." I am eager to discuss the full project scope and adjust the budget accordingly to ensure a successful outcome. Please review my 15-year profile to see the quality of work I have delivered. Let's start this project together; I am ready to show my commitment from the start. Looking forward to your response.
$175 USD in 7 days
8.7

Hi there, I understand you need a robust product-detail scraping pipeline that can handle complex web architectures and extract comprehensive product data. This project requires a solution that not only captures visible content but also the underlying data, ensuring accuracy and completeness. I propose to utilize Playwright for browser automation, enabling us to interact with dynamic content and handle various loading mechanisms effectively. The integration of the GPT API will enhance our data extraction capabilities, allowing for intelligent mapping and classification of product details. Throughout the development process, I will maintain clear communication and provide regular updates to ensure the project aligns with your expectations. My focus will be on delivering a high-quality, reliable solution that meets all specified requirements. I am excited about the opportunity to work on this project and would love to discuss it further. Please feel free to reach out. Best regards, Burhan Ahmad TechPlus
$800 USD in 10 days
8.2

Hello! I am a seasoned developer specializing in building sophisticated scraping pipelines. I am excited about your project to create a universal product-detail scraper that can handle complex JS-rendered sites. My approach involves utilizing Playwright or Puppeteer for browser automation, integrating GPT API strategically, and ensuring robust error handling and output validation. I am confident in my ability to deliver a reliable solution that extracts all necessary product information accurately and efficiently. I am eager to discuss the technical details and requirements further to bring your vision to life. Let's collaborate to create a cutting-edge scraping tool that meets your needs. Looking forward to the opportunity! Thank you.
$180 USD in 3 days
8.2

Hello, With my deep understanding of JavaScript, Python, and web scraping, I am well-equipped to tackle your unique challenges in extracting data from JS-rendered sites. I am highly skilled in using Playwright and Puppeteer for browser automation, and have substantial experience in Node.js that aligns with your tech stack preferences. My expertise goes beyond simple HTML parsing; my solutions handle cookies, overlays, XHR/GraphQL content loading, document classification, and content hidden behind tabs, accordions, modals, and more. Moreover, I believe my proficiency in using the GPT API will be invaluable for your project. I understand how important it is to use this tool strategically to ensure accurate results without hallucinations. I can design your scraper to decide page types and extraction plans intelligently, identify the network payloads containing product data, and normalize complex product specifications into a neat key/value format. Additionally, my clean coding will ensure that the delivered JSON records maintain a deterministic structure even when some fields are missing. Lastly, my ability to build scalable solutions and my meticulous attention to detail align with your requirements for a self-sufficient crawler. I am adept at creating structured logs that track interactions performed, ingredient details as evidence, sources used (DOM/network/GPT/OCR), and my cod Thanks!
$155 USD in 5 days
7.2

Hi there! I am a seasoned developer with extensive experience in web development, Node.js, React, and Excel automation. I am confident in my ability to create a universal product-detail scraping pipeline that will meet all your requirements. I have successfully completed similar projects in the past and am well-equipped to handle the complexities of scraping data from various sources. My approach will ensure accurate and reliable extraction of product details from any website. I am excited to deliver a top-notch solution for you. Let's get started!
$139 USD in 7 days
7.4

Hi! My name is Marjan and I'm here to offer you my services as a skilled applicant with over a decade of experience working on Freelancer.com. I believe I am the best-fit candidate for this project due to my extensive experience. I would like to have a discussion to confirm that we are both on the same page. Once the scope is locked, I will start working on it right away.
$140 USD in 7 days
6.6

Hi there! I am excited about the opportunity to develop the universal product-page scraper you need. With extensive experience in web scraping, particularly with JavaScript-rendered sites using tools like Playwright and Puppeteer, I can build a robust solution that captures all visible and hidden data from product pages. My approach will ensure accurate extractions through network interception and precise interactions with the page elements, effectively leveraging the GPT API to enhance data accuracy and completeness without hallucination. I propose a delivery timeline of 14 days to ensure thorough testing across various product URLs.
$120 USD in 5 days
6.6

Hi there! I’m thrilled to propose my services for your universal product-detail scraping project. With extensive experience in developing robust web scraping solutions and expertise in handling JS-rendered content, I’m confident I can deliver a meticulous and efficient pipeline that meets your specifications. Leveraging tools like Playwright for browser automation and integrating the GPT API strategically, I will ensure accurate extraction of all required product data, including variants, images, and documents. My approach guarantees that the solution is resilient to changes on the target sites, effectively managing complex interactions and lazy-load scenarios. We'll curate the scraper to output structured JSON records and, if needed, an Excel export. Based on your needs, I propose a timeline of 14 days for development. Please let me know how we can move forward!
$250 USD in 14 days
6.7

Hello Bhoomika S., I am a skilled developer with expertise in PHP, Python, Web Scraping, Software Architecture, JSON, Scrapy, Data Extraction, BeautifulSoup, and Selenium. I understand your need for a universal product-detail scraping pipeline for JS-rendered sites, incorporating GPT-assisted extraction for structured JSON output. My approach involves using Playwright or Puppeteer for browser automation, Node.js or Python, and strategically integrating the GPT API for accuracy enhancement. I will ensure the scraper loads pages like a real user, captures all visible content, and delivers a strict, validated JSON record with optional Excel export. Additionally, I will implement a self-healing completeness loop and provide comprehensive documentation for ease of use. I am eager to discuss how I can assist you further with this project. Thank you for considering my proposal. Best regards,
$140 USD in 7 days
5.9

Hello, I have over 7 years of experience in Data Processing, Web Scraping, Data Extraction, Python, and Scrapy. I have carefully reviewed the requirements for the project and am confident in my ability to deliver a solution that meets your needs. To create the universal product-detail scraping pipeline, I will use Playwright for browser automation, ensuring support for JS-rendered content and network interception. I will integrate the GPT API strategically to enhance accuracy in field mapping, variant extraction, and completeness checks. The pipeline will capture all visible content on the page, including images, documents, options, attributes, and structured data. Additionally, I will implement a self-healing completeness loop that logs interactions, missing data, and data sources used. The output will consist of a structured JSON record and an optional Excel export. The solution will also include documentation for setup, site addition, and running in headless mode. For further discussion and to delve into the specifics of the project, please connect with me in the chat. You can visit my Profile here: https://www.freelancer.com/u/HiraMahmood4072 Thank you.
$100 USD in 2 days
6.0

Hi there! I am excited about the opportunity to develop a universal product-detail scraping pipeline for you. With expertise in data extraction and web scraping, I will ensure the scraper can handle complex websites efficiently. Leveraging Playwright for browser automation and integrating the GPT API strategically will enhance accuracy and completeness in data extraction. The deliverables will include structured JSON records and an optional Excel export. Let's discuss the next steps and your specific requirements further. How can I assist you further with this project?
$155 USD in 1 day
5.9

Hello Dear! I write to introduce myself. I'm Engineer Toriqul Islam. I was born and grew up in Bangladesh. I speak and write English at a native level. I hold a B.Sc. in Computer Science & Engineering from Rajshahi University of Engineering & Technology (RUET). I love to work on web design & development projects.

Web Design & Development: I am a full-stack web developer with more than 10 years of experience. My design approach is always modern and simple, which attracts people to it. I have built websites for a wide variety of industries and worked with many companies. All clients have given me good reviews. Client satisfaction is my first priority.

Technologies We Use (custom website development using full-stack development):
1. HTML5
2. CSS3
3. Bootstrap4
4. jQuery
5. JavaScript
6. Angular JS
7. React JS
8. Node JS
9. WordPress
10. PHP
11. Ruby on Rails
12. MYSQL
13. Laravel
14. .Net
15. CodeIgniter
16. React Native
17. SQL / MySQL
18. Mobile app development
19. Python
20. MongoDB

What you'll get?
• Fully Responsive Website on All Devices
• Reusable Components
• Quick response
• Clean, tested and documented code
• Completely met deadlines and requirements
• Clear communication

You are cordially welcome to discuss your project. Thank You! Best Regards, Toriqul Islam
$80 USD in 4 days
5.9

Hello Bhoomika S., I am Maryam Abbas, a seasoned developer with 4 years of experience in PHP and Web Scraping. I have carefully reviewed the project requirements for the Universal Product Page Scraper. To achieve the desired outcome, I propose utilizing Playwright for browser automation, incorporating GPT API strategically for accuracy enhancement, and ensuring robust error handling and output validation through JSON schema validation. My extensive experience with various technologies equips me to deliver successful projects consistently. You can explore my portfolio at https://www.freelancer.pk/u/maryam951 Let's discuss your project further to explore the details. Best regards, Maryam Abbas
$30 USD in 2 days
5.9

Hiring Developer: Universal Product Page Scraper (JS-rendered sites) + GPT-assisted extraction + structured JSON output -- 2
I’m a full-stack software engineer with expertise in React, Node.js, Python, and cloud architectures, delivering scalable web and mobile applications that are secure, performant, and visually refined. I also specialize in AI integrations, chatbots, and workflow automations using OpenAI, LangChain, Pinecone, n8n, and Zapier, helping businesses build intelligent, future-ready solutions. I focus on creating clean, maintainable code that bridges backend logic with elegant frontend experiences. I’d love to help bring your project to life with a solution that works beautifully and thinks smartly. To review my samples and achievements, please visit: https://www.freelancer.com/u/GameOfWords Let’s bring your vision to life. Connect with me today, and I’ll deliver a solution that works flawlessly and exceeds expectations.
$100 USD in 7 days
5.8

Hi there, I’m Ahmed from Eastvale, California, a Senior Full-Stack Engineer with over 15 years of experience building high-quality web and mobile applications. After reviewing your job posting, I’m confident that my background and skill set make me an excellent fit for your project, "Hiring Developer: Universal Product Page Scraper (JS-rendered sites) + GPT-assisted extraction + structured JSON output -- 2". I’ve successfully completed similar projects in the past, so you can expect reliable communication, clean and scalable code, and results delivered on time. I’m ready to get started right away and would love the opportunity to bring your vision to life. Looking forward to working with you. Best regards, Ahmed Hassan
$120 USD in 2 days
5.2

High Quality, Low Price

You’re looking for a universal, JS-rendered product-page scraper that reliably extracts DOM + network + hidden data, enriches it with GPT, and outputs a fully validated product JSON record. I can deliver this with a precise, stable, and self-healing architecture. I’ve built Playwright-based pipelines that handle dynamic frameworks (React/Next/Vue), expand UI interactions, capture GraphQL/XHR payloads, and integrate GPT for field mapping, document classification, and completeness auditing.

My solution approach:
• Build a Playwright-powered pipeline that loads pages like a real user, expands tabs/accordions, scrolls for lazy loads, and intercepts all JSON/GraphQL network payloads.
• Use GPT to normalize specs, extract variants, classify documents, and run iterative completeness checks without hallucination (nulls + evidence when uncertain).
• Output strict, schema-validated JSON including product details, variants, attributes, documents, provenance, and extracted structured data (JSON-LD, meta, NEXT_DATA).
• Provide a clean CLI/service + logs, Excel export, test URLs, and documentation, with optional Docker, queue support, and proxy handling.

Let’s jump on a quick chat if you are looking for a reliable and best problem-solver for your project. Best regards, Muamer Kaukovic
$140 USD in 7 days
5.3

As a seasoned developer with over 7 years of experience, I believe my comprehensive skill set aligns with your project's demands. My proficiency in Python, Node.js, and web scraping makes me well-equipped to tackle not just a "simple HTML parse" but the intricate task of rendering various JS-heavy websites and capturing all relevant content for a complete structured product record. Furthermore, having worked extensively on diverse web development projects, I am accustomed to sites that "load content via XHR/GraphQL, hide details behind tabs/accordions/modals, and lazy-load images/PDFs". My ability to create custom solutions using modern tools like Playwright or Puppeteer for browser automation ensures that I can precisely cater to your needs. Lastly, I am an AI enthusiast, and my fluency in implementing GPT-enabled solutions distinguishes me from other freelancers. I know how to use GPT effectively yet intelligently to ensure that it doesn't hallucinate and provides the best possible outcomes for your data extraction needs. From using the GPT API for field mapping and variant extraction to running a completeness audit and suggesting next actions until the record is complete, I can handle these complex tasks for this project.
$30 USD in 7 days
6.5

Hello! I'm excited about your project to build a universal product-detail scraping pipeline. I understand that you need a robust solution that can handle complex JS-rendered sites and extract detailed product information reliably. This aligns perfectly with my experience in web scraping and data extraction, where I successfully developed a similar solution, efficiently capturing data from various interactive websites.

✅ My Plan:
- Utilize Playwright for accurate page rendering and handling various user interactions.
- Implement GPT API for intelligent data mapping, extracting variants, and ensuring completeness checks.
- Capture data from both the DOM and network responses, ensuring you receive a comprehensive product record.
- Structure the output in strict JSON format, with an optional Excel export for your convenience.

Are there specific product websites you want to prioritize in the testing phase, or any particular features you consider must-haves? Best regards, Hongqiang Chen
$190 USD in 2 days
5.1

Greetings! I see you're looking to develop a universal product-detail scraping pipeline that can handle complex, JS-rendered sites and return structured product records. This is an exciting challenge! I’d approach it by utilizing tools like Playwright or Puppeteer for browser automation to mimic real user interactions, ensuring we capture all relevant content, including details hidden behind dynamic elements. Integrating GPT API would enhance the accuracy of data extraction, helping to normalize messy specs and classify documents effectively. With a focus on creating a robust, reliable output in JSON format, I would ensure the scraper supports various scenarios, from lazy-loaded images to variant selections. My experience in web scraping and data processing would be instrumental in building a solution that meets your needs. Best regards, Mehran Riaz
$180 USD in 3 days
5.2

Hi there, We understand you need a universal, self-healing product-detail scraping pipeline using Playwright with network interception and GPT-assisted extraction to return a strict, validated JSON record (plus optional Excel) with full provenance, structured data capture, variant reconstruction, document classification, and completeness audit loops. SEO Global Team has built advanced scraping and data-engineering systems handling React/Next/Vue apps, GraphQL/XHR interception, anti-fragile automation, schema validation, and intelligent GPT orchestration without hallucination, delivering production-ready CLI services with Docker and batch support. We will architect a Playwright-driven extraction engine with DOM + network harvesting, interaction replay, asset capture, and evidence logging, integrate GPT for mapping, normalization, variant reconstruction, and audit loops with null-safe outputs, enforce JSON schema validation, and deliver a documented CLI service with test runs across diverse JS-heavy sites. Do you prefer Node.js or Python for long-term maintainability? Will this run in your infrastructure or require proxy rotation support? What timeline are you targeting for the first production-ready version? Looking forward to working with you, SEO Global Team
$140 USD in 7 days
5.0

Saint Augustine, United States
Payment method verified
Member since Jun 21, 2024
$30-250 USD