Data Collection Automation

In a Market Where Being 3 Seconds Faster Wins

Resell Community Platform

How North America's #1 Cook Group achieved real-time monitoring of hundreds of Korean e-commerce sites with 0% message loss

Faster Than Anyone, Missing Nothing

The Extreme Requirements of Real-Time Limited Edition Monitoring

A 'Cook Group' is a large-scale purchasing community that shares information and provides proxy buying services for limited edition items. Knowing when limited sneakers or collaboration items drop on Nike, Musinsa, Kasina, Adidas, and other sites is literally money. The problem was that this information needed to be discovered 'faster than anyone else'.

Visualization showing various bypass methods needed per site

Hundreds of sites each with different security policies requiring unique bypass strategies

Visualization showing complex site access structures

Overcoming complex access restrictions: Cloudflare, captcha, rate limits

In this market, '3 seconds' is life or death. Being 3 seconds slower than competitors means losing member trust. And we needed to monitor hundreds of Korean e-commerce sites simultaneously, each with different security policies and different ways of blocking access. Cloudflare, bot detection, captcha, IP blocking... This was a completely different challenge from simple data scraping.

Existing solutions in the market all tried to 'fit us into their solution'. But the Cook Group's requirements were different. Simple collection wasn't the goal. The goal was Collection → Analysis → Notification in one flow, faster than competitors, without missing a single item. Browser engines like Selenium or Playwright are accurate but too slow. Direct HTTP requests are fast, but then each site's security has to be bypassed in its own way. And messages couldn't be lost: the first delivery must be fast, and the loss rate must be 0%.

Specific Challenges

  • Need to monitor hundreds of Korean e-commerce sites simultaneously in real-time
  • Different security policies per site (Cloudflare, captcha, bot detection, IP blocking, etc.)
  • Extreme speed competition where being 3 seconds slower than competitors means losing trust
  • Browser engines (Selenium, Playwright) are accurate but too slow
  • Direct HTTP requests are fast but require different bypass logic for each site - extremely high development difficulty
  • Various monitoring types needed: new release detection, restock alerts, price changes for specific items
  • Not a single message can be lost - first delivery must be fast, loss probability must be 0%
  • Existing market solutions offered zero flexibility, each taking an 'adapt to our solution' approach

We approached the solution vendors in the market. They all said we had to adapt to their solution. But our requirements weren't that simple. We needed to watch hundreds of sites simultaneously, be faster than competitors, and not miss a single message. We needed someone who could actually do this.

Cook Group Operations Team

30-50% Faster Than Competitors, 0% Loss Rate

A Custom Crawling Framework Redesigned from the Network Layer

Since simple collection wasn't the goal, we had to build from scratch. These requirements were impossible to meet with generic market solutions. We designed a multiprocessing-based lightweight crawler framework, modularized site-specific bypass strategies, and built a lossless notification system based on Kafka message queues.

The first challenge was each site's security barriers. Musinsa, Nike, Kasina, Adidas... hundreds of sites each block bots in different ways. Some use Cloudflare, some use a proprietary captcha, some implement IP rate limiting. Breaking through required a deep understanding of the network layer. We analyzed each site's authentication flow, session management, and request patterns to design optimized bypass strategies per site.

But we couldn't use browser engines like Selenium. They're accurate but too slow. Launching a real browser, rendering pages, parsing the DOM - precious seconds are wasted at each step. So we send direct HTTP requests with headers, cookies, and auth tokens precisely matched to each site. We built our own lightweight crawler engine with a proxy manager, a header generator, and retry logic.
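To make the header generator and retry logic concrete, here is a minimal sketch rather than the production engine. The names build_headers and fetch_with_retry, the backoff values, and the injected send callable (standing in for a real HTTP client and proxy manager) are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical transport signature: takes (url, headers), returns the response body.
Send = Callable[[str, Dict[str, str]], str]


def build_headers(user_agent: str,
                  cookies: Dict[str, str],
                  token: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers from a user agent, session cookies, and an optional token."""
    headers = {"User-Agent": user_agent, "Accept": "application/json"}
    if cookies:
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def fetch_with_retry(send: Send,
                     url: str,
                     headers: Dict[str, str],
                     max_attempts: int = 3,
                     base_delay: float = 0.1,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call send() until it succeeds, backing off exponentially between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(url, headers)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise AssertionError("unreachable")
```

Passing the transport in as a parameter keeps the retry policy independent of any particular HTTP library, which is one way a shared engine could stay the same while per-site details change.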

Automatic lowest-price calculation algorithm that systematizes complex business logic

The key was the 'Template Method Pattern'. The framework handles common crawling logic (proxy management, header generation, error handling, retries), while site-specific parts (parsing logic, auth logic) are separated into modules. This means when a new site is added or an existing site changes structure, only that module needs modification. There is no need to touch the entire system. And any developer can modify just the collection part or the parsing part with a minimal learning curve and apply the change to the system.
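A minimal sketch of that split, assuming hypothetical names (SiteCrawler, ExampleShopCrawler, the shop.example URL) and a JSON response shape invented for illustration: the base class fixes the order of steps in run(), and each site module only supplies its own URL, auth headers, and parser.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Hypothetical transport: takes (url, headers), returns the response body as text.
Fetch = Callable[[str, Dict[str, str]], str]


class SiteCrawler(ABC):
    """Common flow lives here; site-specific steps are left abstract."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch

    def run(self) -> List[dict]:
        # Template method: the sequence is fixed, the steps are overridable.
        headers = self.auth_headers()
        body = self._fetch(self.listing_url(), headers)
        return self.parse(body)

    @abstractmethod
    def listing_url(self) -> str: ...

    @abstractmethod
    def auth_headers(self) -> Dict[str, str]: ...

    @abstractmethod
    def parse(self, body: str) -> List[dict]: ...


class ExampleShopCrawler(SiteCrawler):
    """One site module; the URL and payload shape are made up for this sketch."""

    def listing_url(self) -> str:
        return "https://shop.example/api/new-arrivals"

    def auth_headers(self) -> Dict[str, str]:
        return {"Accept": "application/json"}

    def parse(self, body: str) -> List[dict]:
        return json.loads(body)["items"]
```

If the example shop changed its response format, only ExampleShopCrawler.parse would need to change; run() and every other site module would stay untouched.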

We solved the message loss problem with Kafka. Plain webhook delivery can lose messages when the network is unstable or the receiver fails. By implementing a Kafka-based message queue, we created a structure that delivers as fast as possible while retaining messages until receipt is confirmed. Result: 0% message loss rate.
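The "retain until receipt is confirmed" idea can be sketched without a broker. The class below is an in-memory stand-in, not Kafka and not the deployed code: OffsetLog, drain, and lag are hypothetical names, and the committed offset only moves forward after a successful send, which mirrors committing a consumer offset after processing rather than before.

```python
from typing import Callable, List


class OffsetLog:
    """Append-only log with a committed offset, as a toy model of at-least-once delivery."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._committed = 0  # index of the first message not yet confirmed delivered

    def append(self, message: str) -> int:
        """Store a message and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def drain(self, send: Callable[[str], None]) -> int:
        """Deliver messages in order; stop at the first failure and keep the offset."""
        delivered = 0
        while self._committed < len(self._log):
            try:
                send(self._log[self._committed])
            except Exception:
                break  # not confirmed: the same message is retried on the next drain
            self._committed += 1  # advance only after send() returned normally
            delivered += 1
        return delivered

    @property
    def lag(self) -> int:
        """Number of messages still waiting for confirmed delivery."""
        return len(self._log) - self._committed
```

One consequence of this style of delivery is that a message may be sent more than once (for example, if the send succeeded but the confirmation was lost), so receivers generally need to tolerate or deduplicate repeats.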

Discord notification example

Real-time Discord alerts - extensible architecture for various external channels

The external notification system was designed with the concept of 'external channels'. Discord is just one channel. Slack, Telegram, webhook, email - any channel can be added as a plugin. When a customer wants a new notification channel, we just add the channel module without touching the existing system.
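As an illustration of the channel-as-plugin idea, here is a small registry sketch. Everything in it is assumed for the example: register_channel, notify, and the OUTBOX list that stands in for real Discord or Slack API calls.

```python
from typing import Callable, Dict, List, Tuple

Channel = Callable[[str], None]

_CHANNELS: Dict[str, Channel] = {}
OUTBOX: List[Tuple[str, str]] = []  # stand-in for real network calls in this sketch


def register_channel(name: str) -> Callable[[Channel], Channel]:
    """Decorator that adds a sender function to the registry under the given name."""
    def decorator(fn: Channel) -> Channel:
        _CHANNELS[name] = fn
        return fn
    return decorator


@register_channel("discord")
def send_discord(message: str) -> None:
    OUTBOX.append(("discord", message))  # a real module would call the channel's API here


@register_channel("slack")
def send_slack(message: str) -> None:
    OUTBOX.append(("slack", message))


def notify(message: str, channel_names: List[str]) -> None:
    """Send one message to each named channel; fail early on unknown names."""
    unknown = [n for n in channel_names if n not in _CHANNELS]
    if unknown:
        raise KeyError(f"unregistered channels: {unknown}")
    for name in channel_names:
        _CHANNELS[name](message)
```

Adding Telegram or email in this sketch means writing one more decorated function; notify() and the existing channels do not change.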

And we made all of this controllable from a web service. Which sites to monitor, which keywords to watch, where to send notifications - all configurable on the web without developers.

Monitoring system login screen

Secure login system - permission management and access control

Main dashboard

Main dashboard - overall monitoring status at a glance

Real-time collection status is viewable on the logging monitor screen. When each crawler ran, how many products it collected, whether there were errors - all displayed in real-time. Problems are immediately visible, and individual crawlers can be toggled on/off directly from the web.

Real-time logging monitor screen

Detailed control and real-time logging monitoring - crawler status per site

Collected data can be utilized like a search engine. Search indexed products by keyword, check price change history, filter products by specific conditions. Not just 'a system that sends notifications', but an extensible structure for 'a platform that accumulates and analyzes data'.

Product search engine

Search collected and indexed products - search engine functionality for data utilization

Core Features of the Built System

Simultaneous Monitoring of Hundreds of Sites

Real-time monitoring of hundreds of Korean e-commerce sites including Musinsa, Nike, Kasina, and Adidas. When a site is added, just add a module.

30-50% Faster Than Competitors

Maximized speed with lightweight crawlers using direct HTTP requests instead of browser engines. Delivers notifications 30-50% faster than competitors.

0% Message Loss Rate

Kafka-based message queue ensures not a single notification is missed. Achieved both fast delivery and perfect reliability.

Custom Bypass Strategies Per Site

Modularized bypass strategies for different security policies per site - Cloudflare, captcha, IP blocking. Built with a deep understanding of the network layer.

Extensible Notification Channels

Any channel - Discord, Slack, Telegram, webhook - can be added as a plugin. Send notifications to the customer's preferred channel.

Web-Based Integrated Management

Manage monitoring targets, keywords, and notification settings directly on the web without developers. Real-time logging and search engine included.

Earning the Trust of North America's #1 Cook Group

Strengthening Market Position with Overwhelming Speed and Stability

After system deployment, the client secured a clear competitive advantage: notifications 30-50% faster than competitors, and rock-solid stability that never misses a message. Member trust increased, and that trust led to member growth in a virtuous cycle. Most importantly, the development time needed to respond to site changes or add new sites was dramatically reduced.

  • 30-50%↑ Notification Speed: Faster notification delivery vs competitors
  • 0% Message Loss Rate: Perfect delivery guaranteed with Kafka queue
  • 100s+ Sites Monitored: Major Korean e-commerce sites monitored simultaneously
  • 80%↓ Response Dev Time: Just modify the module when sites change

What Actually Changed

Always first to notify, before competitors

In limited edition notifications, 3 seconds is critical. Now the client always sends notifications before competitors. Members feel 'this is the fastest' and that directly translates to retention and new signups.

Never miss a single notification

Before, notifications sometimes arrived late or not at all. When members say 'I didn't get the notification last time', it's fatal. Now with Kafka, every message is guaranteed to be delivered. 0% message loss.

Quick response to site changes

E-commerce sites frequently change their structure. Before, we had to dig through the entire system each time. Now we just modify the relevant site module. Response development time was reduced by over 80%. Even the ops team, not just developers, can make simple modifications.

Fast addition of new sites

Requests to add new e-commerce sites used to take weeks. Now we just create a parser module following the framework, so it takes days. This is thanks to a clear architecture that any developer can understand and contribute to.

We kept hearing 'we can't do that' from market solutions. But OTOworks was different. They heard our requirements and said 'this is how we'll do it', and they actually delivered. Not just faster than competitors, but no missed messages, quick response to site changes. Now it's clear why we're #1 in North America.

Cook Group Operations Team
#1 Cook Group in North America

Complex Data Collection, Can You Do It Faster Than Competitors?

Wondering whether hundreds of sites at once and zero message loss are even possible? This client also kept hearing 'can't be done' from the market. Let's talk and find a way.