TenderIntel

Getting Started

What is TenderIntel?

TenderIntel is an AI-powered tender aggregation platform built by Ideazshuttle LLC. It automatically scans Pakistan's leading newspapers every morning, extracts tender notices using OCR and large language models, and presents them in a single searchable interface.

Instead of manually checking multiple newspaper websites and classified sections each day, procurement teams and contractors can log in once and see all active tenders in one place — searchable by sector, status, and deadline.

Who is this platform for?

TenderIntel is designed for:

  • Procurement officers who need to monitor government and corporate tenders daily.
  • Contractors & vendors looking for new business opportunities across sectors.
  • Consultants & law firms tracking public procurement for clients.
  • Corporate compliance teams monitoring competitor tender activity.

Do I need to install anything?

No. TenderIntel runs entirely in your browser. There are no plugins, no desktop apps, and no spreadsheets to manage. Every page is server-rendered — all data is live from the database.

Data Sources

Which newspapers are currently monitored?

TenderIntel scrapes the following active e-paper sources daily:

  • Business Recorder
  • Daily Ausaf
  • Daily Express
  • Daily Pakistan
  • Dawn
  • Dunya News
  • Jang
  • Nawaiwaqt
  • The Nation
  • The News

The full list of newspapers and their status is available on the Newspapers page.

Why are some newspapers shown as inactive?

A newspaper is marked inactive when its e-paper portal is temporarily unavailable, has changed its URL structure, or blocks automated access. The team monitors these and re-enables them once access is restored.

Inactive newspapers do not run during the daily auto-scrape to avoid failed jobs and error noise.

How far back does the data go?

Historical data depends on when each newspaper was first added to the platform. From that date forward, all scraped tenders are retained indefinitely in the database. You can use the Manual Scrape tool to backfill any specific past date as long as the newspaper's e-paper archive is still available online.

Can I request a new newspaper to be added?

Yes. Contact Ideazshuttle at info@ideazshuttle.com with the newspaper name and its e-paper URL. The team will assess the site's structure and add a compatible fetcher — typically within a few business days.

How It Works

What happens during a scrape?

Each newspaper goes through a 6-stage automated pipeline:

  1. Fetch Pages
     Downloads all e-paper page images for the target date from the newspaper's website. Pages are fetched as high-resolution JPEGs (~500 KB–2 MB each).
  2. OCR: Text Extraction
     Each page image is processed by Tesseract OCR to extract raw text. For Urdu newspapers, specialised OCR models handle Nastaliq script. Confidence scores are logged per page.
  3. Image Storage
     Page images are uploaded to S3 (or saved locally as a fallback). Each image is addressable by newspaper + date + page number.
  4. LLM Extraction
     OCR text is sent to a large language model which identifies tender notices, then extracts structured fields: title, company, sector, deadline, reference number, and requirements.
  5. Deduplication
     Before saving, each extracted tender is matched against existing records by newspaper + date + page + title fingerprint. Exact duplicates are skipped; near-duplicates are updated.
  6. Database Save & Log
     New and updated tenders are written to the MySQL database. A ScrapeLog entry is created recording pages, OCR confidence, tender counts, and duration.
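
The six stages can be pictured as a chain of function calls. The sketch below is only an illustration of that flow, not the platform's actual code: every function name, the ScrapeLog fields, and the 60.0 confidence cut-off are assumptions, and each stage is passed in as a callable standing in for the real fetcher, Tesseract, S3 client, LLM, and database layer. It also applies the rule described elsewhere in this FAQ that low-confidence pages are skipped.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ScrapeLog:
    """Hypothetical run summary; the real ScrapeLog columns are not documented here."""
    pages: int = 0
    skipped_pages: int = 0
    tenders_saved: int = 0
    duration_s: float = 0.0


def run_pipeline(
    fetch_pages: Callable[[], Iterable[bytes]],    # stage 1: page images for one date
    ocr: Callable[[bytes], tuple[str, float]],     # stage 2: returns (text, confidence)
    store_image: Callable[[int, bytes], None],     # stage 3: keyed by page number
    extract: Callable[[str], list[dict]],          # stage 4: LLM -> structured tenders
    save_if_new: Callable[[int, dict], bool],      # stages 5-6: True if a row was written
    min_confidence: float = 60.0,                  # assumed threshold, not the real value
) -> ScrapeLog:
    log = ScrapeLog()
    started = time.monotonic()
    for page_number, image in enumerate(fetch_pages(), start=1):
        log.pages += 1
        text, confidence = ocr(image)
        store_image(page_number, image)
        if confidence < min_confidence:
            # Garbled pages never reach the LLM.
            log.skipped_pages += 1
            continue
        for tender in extract(text):
            if save_if_new(page_number, tender):
                log.tenders_saved += 1
    log.duration_s = time.monotonic() - started
    return log
```

Passing the stages in as callables keeps the sketch runnable without Tesseract, S3, or a database, and mirrors how each stage could be swapped or tested on its own.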

How accurate is the extraction?

Accuracy depends on two factors:

  • OCR confidence — English-language papers (Dawn, The News, Business Recorder) consistently achieve 85–95% OCR accuracy on clean newsprint. Urdu papers vary more due to complex script rendering.
  • LLM extraction — Structured fields like tender title, company, and deadline are extracted with high precision for well-formatted notices. Freeform requirement text may be partially summarised.

Pages with OCR confidence below the threshold are skipped automatically to avoid polluting the database with garbled extractions.

Always verify submission deadlines against the original printed notice before acting on them.

How does deduplication work?

When a tender is extracted, the system generates a fingerprint from: newspaper_id + published_date + page_number + normalised_title. If a matching fingerprint already exists in the database, the record is updated (e.g. corrected OCR confidence or additional fields) rather than inserted as a duplicate.

This means re-running a scrape for the same date is safe — it will not create duplicate tender entries.
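
As a rough illustration of the fingerprint, the four fields can be joined and hashed. This is a minimal sketch under assumptions: the normalisation rules (lower-casing, replacing punctuation with spaces, collapsing whitespace), the "|" separator, and the choice of SHA-256 are all illustrative and are not confirmed details of TenderIntel.

```python
import hashlib
import re
import unicodedata
from datetime import date


def normalise_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFKC", title).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # anything that is not a word char or whitespace
    return " ".join(text.split())


def tender_fingerprint(newspaper_id: int, published_date: date,
                       page_number: int, title: str) -> str:
    """Hash newspaper_id + published_date + page_number + normalised_title."""
    parts = (
        str(newspaper_id),
        published_date.isoformat(),
        str(page_number),
        normalise_title(title),
    )
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Because the title is normalised before hashing, small OCR differences in spacing, case, or punctuation between two runs map to the same fingerprint, which is what makes a re-scrape update a record instead of duplicating it.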

Using the App

How do I search for tenders?

Go to Tenders in the top navigation. The filter bar supports:

  • Keyword search — full-text search across tender title, summary, and requirements. Uses MySQL boolean full-text search for fast, relevant results.
  • Sector filter — dropdown with all sectors present in the database. Select one to narrow results to a specific industry.
  • Status filter — filter by Active, Closing Soon, or Expired.

Filters combine — you can search for "hospital equipment" in the "Healthcare" sector with status "Active" simultaneously.
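
The way the three filters combine can be sketched as a parameterised MySQL query. The table name `tenders`, the column names, and the decision to require every keyword (the `+` prefix) are assumptions made for illustration; only the use of boolean full-text search comes from the description above.

```python
import re
from typing import Optional


def build_tender_query(keywords: str = "", sector: Optional[str] = None,
                       status: Optional[str] = None) -> tuple[str, list[str]]:
    """Return (sql, params) for a driver that uses %s placeholders."""
    clauses: list[str] = []
    params: list[str] = []

    # Drop characters that MySQL boolean mode would treat as operators.
    words = re.sub(r'[+\-<>()~*"@]', " ", keywords).split()
    if words:
        clauses.append(
            "MATCH(title, summary, requirements) AGAINST (%s IN BOOLEAN MODE)"
        )
        params.append(" ".join(f"+{word}" for word in words))  # +word = must be present
    if sector:
        clauses.append("sector = %s")
        params.append(sector)
    if status:
        clauses.append("status = %s")
        params.append(status)

    where = " AND ".join(clauses) or "1=1"
    return f"SELECT * FROM tenders WHERE {where}", params
```

Calling `build_tender_query("hospital equipment", "Healthcare", "Active")` yields one AND-ed WHERE clause with three placeholders, so user input never ends up inside the SQL string itself.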

What does the Dashboard show?

The Dashboard gives you a real-time snapshot:

  • Active Tenders — total tenders currently open (Active + Closing Soon).
  • Closing Soon — tenders whose submission deadline is within 7 days.
  • Expired — tenders past their deadline, kept for reference.
  • Today's New — tenders published on today's date.
  • Tenders by Sector — bar chart of the top 10 sectors by tender volume.
  • Recent Tenders — the 8 most recently added tenders with quick links.

What are the Deadline Alerts?

The Deadline Alerts page groups active tenders by urgency:

  • Due Today — deadline is today. Act immediately.
  • Due Tomorrow — one day left to prepare.
  • This Week — deadline within 7 days.
  • Next Week — deadline between 8–14 days out.

Tenders without a recorded deadline are counted separately. Check these on the Tenders page to confirm their actual closing date from the original notice.
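
The grouping rule can be written as a small function of the days remaining. This is a sketch of the rule as described above, not the page's actual code; the function name and the "No Deadline" label for the separately counted tenders are assumptions.

```python
from datetime import date
from typing import Optional


def alert_bucket(deadline: Optional[date], today: date) -> Optional[str]:
    """Return the Deadline Alerts group for a tender, or None if it is not listed."""
    if deadline is None:
        return "No Deadline"      # counted separately (label is hypothetical)
    days_left = (deadline - today).days
    if days_left < 0:
        return None               # already expired, so not an alert
    if days_left == 0:
        return "Due Today"
    if days_left == 1:
        return "Due Tomorrow"
    if days_left <= 7:
        return "This Week"
    if days_left <= 14:
        return "Next Week"
    return None                   # more than two weeks out
```

The checks run from most to least urgent, so a tender due tomorrow is reported as Due Tomorrow and not also as This Week.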

How do I read the status badges on tenders?

Each tender carries one of four status badges:

  • Active (green, pulsing) — open for submission; deadline is more than 7 days away.
  • Closing Soon (amber) — deadline is within 7 days. Prioritise review.
  • Expired (grey) — deadline has passed. Retained for research and audit purposes.
  • Unknown (blue) — no deadline could be extracted from the original notice.

Statuses are recalculated automatically each day based on the current date vs. the submission deadline.
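
The daily recalculation amounts to comparing the deadline with the current date. The sketch below assumes that a tender whose deadline is today is still open (it appears as Due Today on the alerts page); the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def tender_status(deadline: Optional[date], today: date) -> str:
    """Map a submission deadline to one of the four status badges."""
    if deadline is None:
        return "Unknown"          # no deadline could be extracted
    days_left = (deadline - today).days
    if days_left < 0:
        return "Expired"          # deadline has passed
    if days_left <= 7:
        return "Closing Soon"     # within 7 days, including today
    return "Active"               # more than 7 days away
```

Because the result depends only on the deadline and today's date, re-running it every day moves a tender from Active to Closing Soon to Expired without storing any extra state.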

Manual Scraping

When should I use Manual Scrape?

Use Manual Scrape to:

  • Backfill a specific past date that the auto-scraper missed (e.g. server was down).
  • Re-scrape today's paper after a newspaper corrects a page (OCR may improve).
  • Test a newly configured newspaper before it joins the daily schedule.
  • Run a single newspaper immediately without waiting for 7:00 AM PKT.

How long does a manual scrape take?

Duration depends on the number of pages and the newspaper's image size:

  • Small papers (4–8 pages) — typically 3–6 minutes.
  • Large papers (16–24 pages) — typically 8–15 minutes.

The progress bar shows live stage updates (fetching pages → OCR 1/N → LLM 1/N → saving tenders to DB). You do not need to keep the browser tab open; the job runs in the background.

Use Scrape All to run every active newspaper sequentially for a chosen date. Each newspaper has a 15-minute timeout before the batch moves to the next.

What does "partial" status mean in a scrape result?

Partial means the pipeline started but could not complete fully. Common causes:

  • The newspaper's website returned no pages (e.g. the e-paper for that date hasn't been published yet, or the URL pattern changed).
  • All fetched pages had OCR confidence below the minimum threshold and were skipped.
  • The newspaper timed out during a Scrape All batch run (>15 min for that paper).

A partial run still saves any successfully extracted tenders up to the point of failure.
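
A sequential batch with a per-newspaper timeout, such as Scrape All, could look roughly like the sketch below. It is an assumption-laden illustration: `scrape_one` is a hypothetical callable returning the number of pages it processed, the "success" and "failed" labels are invented for the example, and a plain thread cannot be force-stopped, so a timed-out job here simply keeps running in the background while the batch moves on.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def scrape_all(newspapers: list[str], scrape_one: Callable[[str], int],
               timeout_s: float = 15 * 60) -> dict[str, str]:
    """Run newspapers one at a time; report a status string per newspaper."""
    results: dict[str, str] = {}
    for name in newspapers:
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(scrape_one, name)
        try:
            pages_processed = future.result(timeout=timeout_s)
            # No usable pages (none published, or all below the OCR threshold).
            results[name] = "success" if pages_processed > 0 else "partial"
        except FutureTimeout:
            results[name] = "partial"   # took longer than the per-paper limit
        except Exception:
            results[name] = "failed"    # hypothetical label for an outright error
        finally:
            pool.shutdown(wait=False)   # do not block the batch on a stuck job
    return results
```

A production version would more likely run each newspaper in a separate process or pass a cancellation signal into the job, so that a timed-out scrape actually stops consuming CPU.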

Auto Schedule

When does the automatic scrape run?

The daily auto-scrape runs at 7:00 AM Pakistan Standard Time (PKT, UTC+5) every day. It processes all newspapers marked as active in the database.

By 7:00 AM most Pakistani newspaper e-papers have published their day's edition, making this an optimal collection time. Results are available in the Tenders list within 15–30 minutes of the run starting, depending on how many newspapers are active.

What happens if the server is offline at 7:00 AM?

The scheduler is configured with a 1-hour misfire grace window. If the server restarts within 60 minutes of 7:00 AM, the missed job will execute automatically on startup.

If the server was offline for more than an hour, you can trigger a manual backfill from the Manual Scrape page by selecting the missed date.
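
The grace-window rule can be expressed as a short decision function. This models the behaviour described above in plain Python and is not a view into APScheduler's internals; the function name and the three return labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Pakistan currently observes no daylight saving time, so a fixed UTC+5 offset is enough.
PKT = timezone(timedelta(hours=5), "PKT")
GRACE = timedelta(hours=1)


def missed_run_action(now: datetime, hour: int = 7, minute: int = 0) -> str:
    """Decide what happens to today's run if the server comes up at `now`.

    `now` must be timezone-aware; it is converted to PKT before comparing.
    """
    now = now.astimezone(PKT)
    scheduled = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if now < scheduled:
        return "wait"              # today's run has not been missed yet
    if now - scheduled <= GRACE:
        return "run_now"           # inside the 1-hour misfire grace window
    return "manual_backfill"       # too late; use Manual Scrape for this date
```

For example, a server that restarts at 7:45 AM PKT falls in the "run_now" case, while one that restarts at 8:30 AM PKT needs a manual backfill.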

Can the schedule be changed?

Yes. The schedule is set in app/scheduler.py using APScheduler's CronTrigger. Change the hour and minute values and restart the server. A matching entry also exists in pipeline/celery_app.py for Celery-based deployments.

Tender Statuses

What is the difference between Active and Closing Soon?

Both statuses mean the tender is still open for submission:

  • Active — deadline is more than 7 days from today.
  • Closing Soon — deadline is 7 days or fewer from today. This is a visual alert to prompt immediate action.

Statuses transition automatically — an Active tender becomes Closing Soon as its deadline approaches, without any manual intervention.

Are expired tenders deleted?

No. Expired tenders are retained permanently in the database. They are useful for:

  • Researching which organisations issued tenders in a given period.
  • Estimating future tender volumes and patterns by sector.
  • Audit trails and procurement history.

Use the Status: Expired filter on the Tenders page to view them.

Technical Details

What technology powers TenderIntel?

TenderIntel is a fully server-rendered Python web application:

  • Backend — FastAPI (Python 3.11+) with async SQLAlchemy ORM.
  • Database — MySQL with full-text indexes for keyword search.
  • OCR — Tesseract (English + Urdu) with per-page confidence scoring.
  • LLM — Large language model for structured tender field extraction.
  • Scheduling — APScheduler (embedded) + Celery (optional distributed).
  • Image storage — AWS S3 (primary) with local disk fallback.
  • Frontend — Jinja2 server-rendered HTML; no JavaScript frameworks.
  • Stealth browsing — undetected-chromedriver for newspapers requiring JS rendering.

Is there an API available?

Yes. TenderIntel exposes a versioned REST API at /api/v1/. The interactive documentation (Swagger UI) is available at /docs and the OpenAPI schema at /openapi.json.

The API can be used to integrate tender data into other internal systems, Power BI dashboards, or custom notification workflows.
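
Since the individual endpoints are not listed in this FAQ, one way to start an integration is to read them from the published OpenAPI schema. The sketch below relies only on the /openapi.json location and the /api/v1/ prefix stated above; the function names are hypothetical, and `base_url` is whatever address your deployment uses.

```python
import json
from urllib.request import urlopen


def fetch_schema(base_url: str) -> dict:
    """Download and parse /openapi.json from a TenderIntel deployment."""
    with urlopen(base_url.rstrip("/") + "/openapi.json", timeout=10) as response:
        return json.load(response)


def list_v1_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs for every operation under /api/v1/."""
    http_methods = {"get", "post", "put", "patch", "delete"}
    found: list[tuple[str, str]] = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        if not path.startswith("/api/v1/"):
            continue
        for method in operations:
            # Path items can also hold non-operation keys such as "parameters".
            if method in http_methods:
                found.append((method.upper(), path))
    return found
```

Running `list_v1_endpoints(fetch_schema(base_url))` against a live instance prints the actual routes, which avoids guessing endpoint names when wiring up Power BI or a notification workflow.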

Can the system handle multiple concurrent scrape jobs?

Each manual scrape runs in its own background thread, so several manual scrapes can run concurrently. The Scrape All batch runs newspapers sequentially (one at a time) to avoid overloading the server's CPU and memory with simultaneous OCR workloads.

Newspapers that require a stealth browser (Chrome) use a mutex lock to prevent concurrent driver launches, which avoids a Windows file-rename conflict when multiple papers start simultaneously.
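
The locking pattern can be illustrated with a single module-level lock around the driver start-up. This is a minimal sketch: `launch` is a hypothetical callable standing in for whatever code constructs the Chrome driver, and only the launch is serialised, so the scrapes themselves still run in parallel once their browsers are up.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# One lock shared by every scrape thread in the process.
_driver_launch_lock = threading.Lock()


def launch_driver_serialised(launch: Callable[[], T]) -> T:
    """Call `launch()` while holding the lock so only one driver starts at a time."""
    with _driver_launch_lock:
        return launch()
```

Holding the lock only for the launch keeps the critical section short: a second scrape waits for the first browser to finish starting, not for the first scrape to finish.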

Support

Who do I contact if a newspaper stops working?

Contact the Ideazshuttle team at info@ideazshuttle.com. Include the newspaper name, the date you attempted to scrape, and any error message shown on the Manual Scrape page. The team typically responds within one business day.

A tender has incorrect information — how do I report it?

Email info@ideazshuttle.com with the tender ID (visible in the URL of the tender detail page, e.g. /ui/tenders/abc123), the field that appears incorrect, and the correct value from the original notice. Data corrections are applied manually and are used to improve the LLM prompts for future extractions.

Where can I learn more about Ideazshuttle?

Visit ideazshuttlellc.com to learn about Ideazshuttle's full range of AI engineering, agentic systems, and strategic advisory services. Ideazshuttle has offices in Harrisonburg, VA (HQ); Fujairah, UAE; and Karachi, Pakistan.