
lqtri/WebPage-Segmentation--WPS-


WebPage-Segmentation--WPS-

Introduction

This is WPS-DB, our webpage segmentation method. It differs from other methods such as VIPS and Block-o-matic: WPS-DB uses DBSCAN instead of k-means to cluster the page data.
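The clustering step can be illustrated with a minimal, from-scratch DBSCAN sketch. It is not the WPS-DB code: it assumes, hypothetically, that each page element is reduced to one (x, y) point such as the centre of its bounding box, and the helper names dbscan and region_query, the sample points, and the eps and min_samples values are all illustrative. Unlike k-means, DBSCAN does not need the number of clusters in advance; it grows clusters from dense neighbourhoods and labels isolated points as noise (-1).

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, idx, eps):
    """Hypothetical helper: indices of all points within eps of points[idx], itself included."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                seeds.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


# Hypothetical element centres in pixels.
elements = [
    (10, 10), (12, 11), (11, 14),         # a tight group, e.g. a header bar
    (200, 300), (203, 305), (198, 302),   # another group, e.g. a sidebar
    (500, 50),                            # an isolated element
]
print(dbscan(elements, eps=10, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Raising min_samples above the size of every group (for example to 4 here) turns all points into noise, which shows how strongly the density parameters shape the resulting segmentation.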

Testing on Stack Overflow (Questions tab)

https://stackoverflow.com/questions

Testing on Stack Exchange

https://stackexchange.com

Testing on more pages (using Block-O-Matic's dataset)

The results are available in this Google Drive folder:

https://drive.google.com/drive/folders/1uEAfsyFiR82Vejc26fgoWBLR1VpSaI-b?usp=sharing

Usage

  • Install the dependencies: pip install -r requirments.txt

  • Run WPS-DB in one of two ways:

    • Download our Jupyter Notebook and run your tests in it.
    • Run the command: python WPS_DB_Test.py <your webpage's url>

  • Check the Screenshots folder in the current working directory to see the segmentation layout.

About

Webpage segmentation using DBSCAN
