Freelance & Business · Automation · Scraping · Node.js

How I Automated My Freelance Prospection Pipeline

Andy Garcia
February 28, 2026 · 10 min read

I spent three months manually checking Malt, Upwork, and LinkedIn every morning. Then I built a pipeline that does it for me — scraping, scoring, and sending Telegram alerts before my coffee is ready. Here's exactly how I did it and how you can too.

1. The Problem: Manual Prospection Is a Full-Time Job

As a freelance fullstack dev based near Toulouse, I work primarily through Malt and Upwork, with occasional LinkedIn leads. The morning ritual was always the same: open three tabs, scroll through dozens of listings, mentally filter out the irrelevant ones, copy-paste interesting opportunities into a Notion page, then forget about half of them by afternoon.

The real issue isn't that there aren't enough missions — there are plenty. The issue is signal-to-noise ratio. Most listings are either a bad tech fit, underpaid, or already three days old. By the time I responded to good ones, someone else had already landed an interview.

I'm decent at React Native, Next.js, and building AI integrations. I have clear preferences: minimum 400€/day, remote-first, projects lasting more than two months. Those constraints should filter 90% of noise automatically. I just needed a machine to do it.

2. The Architecture

The system runs on the same VPS I use for other automation projects — a €6/month Hostinger instance running Ubuntu 24.04. Nothing exotic. The pipeline has four stages:

Pipeline Overview

🕷️ Scraper — Malt / Upwork / LinkedIn
🔍 Filter — keywords + budget
⚖️ Scorer — weighted match
📱 Alert — Telegram bot

node-cron triggers the full pipeline every 90 minutes · MongoDB handles deduplication

Each stage is a separate Node.js module. They share a thin interface: the scraper outputs an array of raw listing objects, the filter trims it, the scorer ranks what remains, and the alert module sends whatever crosses the threshold. MongoDB sits alongside as a simple seen-IDs store to avoid duplicate alerts.
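The contract between stages is small enough to sketch in a few lines. This is an illustrative composition, not the actual orchestrator (that appears in full later); the function and option names here are mine, not the repo's.

```javascript
// Minimal sketch of how the four stages compose. Names are illustrative;
// the real modules appear in the sections below.
async function runOnce({ scrapers, passesFilter, scoreListing, threshold = 70 }) {
  // Stage 1: every scraper returns an array of raw listing objects
  const raw = (await Promise.all(scrapers.map((s) => s()))).flat();

  // Stages 2–4: filter hard, score what remains, keep what crosses the threshold
  return raw
    .filter(passesFilter)
    .map((l) => ({ ...l, score: scoreListing(l) }))
    .filter((l) => l.score >= threshold)
    .sort((a, b) => b.score - a.score);
}
```

Keeping the stages as plain functions with this shape is what makes adding a new platform cheap: it's just one more entry in the `scrapers` array.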

3. Scraping Strategy: Not Getting Blocked

Malt has a public search that returns HTML. Upwork exposes an RSS feed for job searches (underrated — no scraping needed). LinkedIn is the hardest of the three, so I hit its public jobs search page with Puppeteer, and only sparingly.

The golden rules I follow to stay unbanned:

  1. Rotate User-Agent strings on every request (I keep a pool of 12)
  2. Add 2–5 seconds of jitter between requests — never fire in rapid succession
  3. Use the platform's official API or RSS where it exists (Upwork, some LinkedIn endpoints)
  4. Run scrapes every 90 minutes, not every 5 — no platform cares about hourly crawls
  5. Respect robots.txt for the public pages; I'm not extracting private data

Here's the core scraper structure for Malt:

// scrapers/malt.js
const axios = require('axios');
const cheerio = require('cheerio');

const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15',
  // ... 10 more
];

async function scrapeMalt(query = 'fullstack react next.js') {
  const ua = USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
  const url = `https://www.malt.fr/s?q=${encodeURIComponent(query)}&sort=date`;

  const { data } = await axios.get(url, {
    headers: {
      'User-Agent': ua,
      'Accept-Language': 'fr-FR,fr;q=0.9',
      'Accept': 'text/html,application/xhtml+xml',
    },
    timeout: 10000,
  });

  const $ = cheerio.load(data);
  const listings = [];

  $('[data-mission-id]').each((_, el) => {
    const $el = $(el);
    listings.push({
      id: $el.attr('data-mission-id'),
      title: $el.find('.mission-title').text().trim(),
      budget: parseFloat($el.find('.mission-budget').text().replace(/[^0-9.]/g, '')),
      description: $el.find('.mission-description').text().trim(),
      skills: $el.find('.mission-skill').map((_, s) => $(s).text().trim()).get(),
      postedAt: new Date($el.find('time').attr('datetime')),
      url: 'https://www.malt.fr' + $el.find('a.mission-link').attr('href'),
      source: 'malt',
    });
  });

  // Jitter: wait 2–4 seconds before returning
  await new Promise(r => setTimeout(r, 2000 + Math.random() * 2000));
  return listings;
}

For Upwork, I use their job search RSS endpoint which is public and returns clean XML — Cheerio parses it in under 50ms. For LinkedIn I use Puppeteer only once a day during off-peak hours (6am), targeting the public jobs search page.
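To show what that RSS path looks like without pulling in dependencies, here's a stripped-down item extractor. In the real pipeline Cheerio in xmlMode does this job properly; the regex version below is only a sketch of the output shape, and the field names mirror standard RSS tags (guid, title, link, description, pubDate).

```javascript
// Dependency-free sketch of pulling items out of an RSS feed string.
// For real feeds, prefer cheerio.load(xml, { xmlMode: true }).
function parseRssItems(xml) {
  const items = [];
  const itemRe = /<item>([\s\S]*?)<\/item>/g;

  // Grab the inner text of one tag inside an <item> block
  const field = (block, tag) => {
    const m = block.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
    return m ? m[1].trim() : '';
  };

  let m;
  while ((m = itemRe.exec(xml)) !== null) {
    const block = m[1];
    items.push({
      id: field(block, 'guid'),
      title: field(block, 'title'),
      description: field(block, 'description'),
      url: field(block, 'link'),
      postedAt: new Date(field(block, 'pubDate')),
      source: 'upwork',
    });
  }
  return items;
}
```

The point is that the output matches the Malt scraper's listing shape exactly, so everything downstream is source-agnostic.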

4. Smart Filtering: Cutting the Noise

Before scoring, I filter aggressively. A listing that doesn't pass the filter never reaches the scorer. This keeps the Telegram feed clean and prevents alert fatigue.

// filters/index.js
const REQUIRED_TECH = ['react', 'next.js', 'nextjs', 'node', 'typescript'];
const BONUS_TECH   = ['react native', 'python', 'ai', 'openai', 'langchain'];
const BLOCKED_TECH = ['php', 'wordpress', 'drupal', 'cobol', 'sharepoint'];

const MIN_BUDGET_DAY = 350; // €/day minimum
const MIN_DURATION_WEEKS = 4;

function passesFilter(listing) {
  const desc = (listing.title + ' ' + listing.description).toLowerCase();

  // Hard blocklist — instant reject
  if (BLOCKED_TECH.some(t => desc.includes(t))) return false;

  // Must mention at least one core tech
  const hasRequired = REQUIRED_TECH.some(t => desc.includes(t));
  if (!hasRequired) return false;

  // Budget check (if available)
  if (listing.budget && listing.budget < MIN_BUDGET_DAY) return false;

  // Duration check (if mentioned)
  const durationMatch = desc.match(/(\d+)\s*(mois|months?|semaines?|weeks?)/i);
  if (durationMatch) {
    const value = parseInt(durationMatch[1]);
    const unit  = durationMatch[2].toLowerCase();
    const weeks = unit.startsWith('mois') || unit.startsWith('month') ? value * 4 : value;
    if (weeks < MIN_DURATION_WEEKS) return false;
  }

  return true;
}

module.exports = { passesFilter };

In practice this filter rejects about 75% of raw listings. The remaining 25% go to the scorer.

5. Scoring System: Ranking What Remains

Each surviving listing gets a weighted score out of 100. I tuned the weights based on what I actually care about when choosing a mission.

| Factor | Max points | Logic |
|---|---|---|
| Tech stack match | 35 | Bonus tech stack hits (React Native, AI, Python) |
| Budget vs. rate | 25 | Linear scale from 350€ to 600€+ per day |
| Client rating | 20 | Platform reputation / review count |
| Project duration | 10 | Preference for 2–6 month missions |
| Remote flag | 10 | 10 pts if fully remote, 5 pts if hybrid, 0 if on-site only |
// scoring/index.js
// Bonus stack list, kept in sync with filters/index.js (the scorer needs it too)
const BONUS_TECH = ['react native', 'python', 'ai', 'openai', 'langchain'];

function scoreListing(listing) {
  let score = 0;
  const desc = (listing.title + ' ' + listing.description).toLowerCase();

  // Tech match (35 pts)
  const bonusHits = BONUS_TECH.filter(t => desc.includes(t)).length;
  score += Math.min(bonusHits * 10, 35);

  // Budget (25 pts)
  if (listing.budget) {
    const budgetScore = Math.min(
      ((listing.budget - 350) / (600 - 350)) * 25,
      25
    );
    score += Math.max(budgetScore, 0);
  }

  // Client rating (20 pts)
  if (listing.clientRating) {
    score += (listing.clientRating / 5) * 20;
  }

  // Duration (10 pts)
  if (listing.durationWeeks) {
    if (listing.durationWeeks >= 8 && listing.durationWeeks <= 24) {
      score += 10;
    } else if (listing.durationWeeks > 4) {
      score += 5;
    }
  }

  // Remote (10 pts)
  if (desc.includes('full remote') || desc.includes('100% remote')) score += 10;
  else if (desc.includes('remote') || desc.includes('télétravail')) score += 5;

  return Math.round(score);
}

module.exports = { scoreListing };

Listings with a score ≥ 70 trigger an instant Telegram alert. Scores between 50 and 69 are batched into a daily digest sent at 8am. Anything below 50 is discarded silently.
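Those thresholds boil down to a three-way switch. A tiny helper (hypothetical name, not in the repo) makes the routing explicit:

```javascript
// Route a scored listing: instant alert, daily digest, or silent drop.
// Thresholds match the ones described above.
function routeByScore(score) {
  if (score >= 70) return 'alert';   // instant Telegram message
  if (score >= 50) return 'digest';  // batched into the 8am digest
  return 'discard';                  // never shown
}
```

Keeping the cutoffs in one place makes them trivial to tune later when you review which alerts you actually acted on.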

6. Telegram Alerts: Format That Makes You Act

A good alert needs to contain everything I need to decide whether to apply, without opening the browser. I settled on this format:

Example Telegram message

🔥 HIGH MATCH — Score: 84/100

Senior Next.js + AI Integration Dev

📍 Remote · 4 mois · 520€/j

🏢 Client: 4.8★ (47 avis) · Malt

Stack: Next.js, OpenAI API, Supabase, TypeScript. Mission de refonte d'un SaaS RH avec intégration IA...

→ malt.fr/mission/abc123

Posté il y a 2h · Alerte à 07:34

// alerts/telegram.js
const axios = require('axios');

const BOT_TOKEN  = process.env.TELEGRAM_BOT_TOKEN;
const CHAT_ID    = process.env.TELEGRAM_CHAT_ID;
const SCORE_HIGH = 70;

// Rough relative-age formatter for the alert footer
function timeSince(date) {
  const mins = Math.round((Date.now() - new Date(date).getTime()) / 60000);
  if (mins < 60) return `il y a ${mins} min`;
  const hours = Math.round(mins / 60);
  if (hours < 24) return `il y a ${hours}h`;
  return `il y a ${Math.round(hours / 24)}j`;
}

function formatMessage(listing, score) {
  const emoji  = score >= 80 ? '🔥' : score >= 70 ? '⚡' : '📋';
  const label  = score >= 80 ? 'HIGH MATCH' : score >= 70 ? 'GOOD MATCH' : 'DIGEST';
  const budget = listing.budget ? `${listing.budget}€/j` : 'Budget n/c';
  const remote = listing.isRemote ? 'Remote' : listing.city || 'On-site';
  const age    = timeSince(listing.postedAt);

  return [
    `${emoji} ${label} — Score: ${score}/100`,
    `*${listing.title}*`,
    `📍 ${remote} · ${listing.duration || '?'} · ${budget}`,
    listing.clientRating
      ? `🏢 Client: ${listing.clientRating}★ · ${listing.source}`
      : `🏢 Source: ${listing.source}`,
    '',
    listing.description.slice(0, 200) + '...',
    '',
    `→ ${listing.url}`,
    `Posté ${age} · Alerte à ${new Date().toLocaleTimeString('fr-FR')}`,
  ].join('\n');
}

async function sendAlert(listing, score) {
  const text = formatMessage(listing, score);
  await axios.post(
    `https://api.telegram.org/bot${BOT_TOKEN}/sendMessage`,
    { chat_id: CHAT_ID, text, parse_mode: 'Markdown' }
  );
}

module.exports = { sendAlert };

7. The Stack

Node.js — runtime for all scripts
Cheerio — HTML parsing / static scraping
Puppeteer — JS-rendered pages (LinkedIn)
MongoDB — deduplication store (seen IDs)
node-cron — scheduled scraper runs
Telegram Bot API — push alerts to phone
axios — HTTP requests
dotenv — secrets management

The entire thing sits in a single repo, around 600 lines of code. No framework overhead, no transpilation step — plain CommonJS modules that PM2 keeps alive and restarts on crash.

// index.js — main orchestrator
const cron     = require('node-cron');
const { scrapeMalt }   = require('./scrapers/malt');
const { scrapeUpwork } = require('./scrapers/upwork');
const { passesFilter } = require('./filters');
const { scoreListing } = require('./scoring');
const { sendAlert, sendDigest } = require('./alerts/telegram');
const db       = require('./db'); // MongoDB helper

// node-cron can't express "every 90 minutes" in a single expression
// (the minutes field only goes up to 59), so two schedules alternate:
// one on the hour every three hours, one on the half-hour in between.
async function runPipeline() {
  console.log('[cron] Starting pipeline run...');

  const raw = [
    ...await scrapeMalt('fullstack next.js react'),
    ...await scrapeUpwork('nextjs typescript remote'),
  ];

  const newListings = await db.filterSeen(raw); // dedup by listing.id
  const filtered    = newListings.filter(passesFilter);
  const scored      = filtered
    .map(l => ({ ...l, score: scoreListing(l) }))
    .sort((a, b) => b.score - a.score);

  await db.markSeen(newListings.map(l => l.id));

  for (const listing of scored) {
    if (listing.score >= 70) {
      await sendAlert(listing, listing.score);
    }
  }

  console.log(`[cron] Done. ${scored.length} scored, ${
    scored.filter(l => l.score >= 70).length
  } alerts sent.`);
}

cron.schedule('0 0 0,3,6,9,12,15,18,21 * * *', runPipeline);  // 00:00, 03:00, ...
cron.schedule('0 30 1,4,7,10,13,16,19,22 * * *', runPipeline); // 01:30, 04:30, ...

// Daily digest at 08:00
cron.schedule('0 8 * * *', async () => {
  const digest = await db.getPendingDigest(); // score 50–69 from last 24h
  if (digest.length) await sendDigest(digest);
});
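The db helper the orchestrator calls only needs two operations. Here's that contract sketched as an in-memory Set; the real store is MongoDB (a unique index on the listing id gives the same dedup guarantee), and only the method names filterSeen/markSeen come from the code above — the factory name is mine.

```javascript
// Seen-IDs store contract, sketched in memory. Swap the Set for a
// MongoDB collection with a unique index on the listing id in production.
function createSeenStore() {
  const seen = new Set();
  return {
    // Keep only listings whose id has never been seen before
    async filterSeen(listings) {
      return listings.filter((l) => !seen.has(l.id));
    },
    // Remember ids so the next run skips them
    async markSeen(ids) {
      ids.forEach((id) => seen.add(id));
    },
  };
}
```

Because the interface is just two async methods, the storage backend can change without touching the orchestrator.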

8. Results

I've been running this pipeline for about three months. Here's what changed:

  • < 5 min — average response time to high-score missions (was: 3–6 hours)
  • 3 — contracts landed in the first month (was: 1 in 2 months)
  • ~20 min — daily time spent on prospection (was: 45–90 min)

The biggest win isn't the contracts themselves — it's the mental load reduction. I no longer start my morning anxiously scrolling. I have a cup of coffee, check Telegram, and maybe spend 20 minutes reading the two or three alerts that came in overnight. Everything else is noise that the pipeline already handled.

One thing I didn't expect: the speed advantage is huge. Responding to a Malt listing within 30 minutes of posting gets a significantly higher response rate than responding after a day. The pipeline gave me that edge without any extra effort.

9. How to Build Yours: Step-by-Step Outline

You don't need to copy my exact setup. Here's a minimal path to get something working in a weekend:

  1. Pick your target platform
     Start with one source. Upwork RSS is the easiest entry point — no scraping, just XML parsing. Get that working before adding Malt or LinkedIn.

  2. Define your hard filters
     Write down your blocklist tech, minimum budget, and minimum duration. Be ruthless. If you're unsure, err toward filtering more — you can loosen it later.

  3. Build a minimal scorer
     Start with just two factors: tech match and budget. Weights can be refined once you see real alerts. A simple 0–100 score is enough.

  4. Set up a Telegram bot
     Create a bot via @BotFather (5 minutes), get your token, find your chat ID by messaging the bot and calling getUpdates. That's your alert channel.

  5. Run it manually first
     Don't set up cron until you've run the script manually a few times and verified the output looks right. Fix your selectors and filters against real data.

  6. Deploy on a VPS with PM2
     A €5–6/month VPS is plenty. Install Node, clone your repo, run `pm2 start index.js --name prospector`. PM2 will restart it on crash and survive reboots.

  7. Iterate on the scoring weights
     After two weeks of alerts, review which ones you actually applied to. Increase weights for factors that correlated with good missions, decrease others.
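The "find your chat ID" part of step 4 trips people up. After you message your bot once, the getUpdates response is JSON and the id you need sits at result[n].message.chat.id. A small extractor (hypothetical helper, not part of my repo) shows exactly where to look:

```javascript
// Pull the chat id out of a parsed Telegram getUpdates response body.
// Fetch it with: GET https://api.telegram.org/bot<TOKEN>/getUpdates
// after sending your bot at least one message.
function extractChatId(getUpdatesBody) {
  const updates = getUpdatesBody.result || [];
  const last = updates[updates.length - 1];
  return last && last.message ? last.message.chat.id : null;
}
```

Once you have the id, a single sendMessage call with your token and that chat_id confirms the alert channel works end to end.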

Want the full source?

I'm considering releasing a cleaned-up version of this as an open-source template. If you're interested, reach out via the contact form — if enough people ask, I'll publish it.


Andy Garcia · Toulouse, France