
    Clustering web pages based on doc type structure in a distributed manner

    thesis paper.pdf (653.9Kb)
    Date
    2015
    Publisher
    BRAC University
    Author
    Kader, Kazi Samiul
    Nawar, Sagufa
    Ananna, Nusrat Sharmin
    Khan, Sarah
    URI
    http://hdl.handle.net/10361/4381
    Abstract
    Web page clustering is an important part of modern web technology. By grouping similar web pages together, we can find related information, suggest similar choices, and so on. All modern search engines depend on web page clustering. The topic is interesting because it presents both a novel academic challenge and a practical application. In this thesis, we clustered web pages using their HTML tag structure. We represented each web page as a vector of tag percentages and clustered the vectors using the k-means and DBSCAN clustering algorithms. We selected k-means and DBSCAN because they are well-known clustering algorithms and because they have not been applied together and compared in the field of web page clustering as we did in this thesis. After clustering three different categories of five websites in three stages, both algorithms achieved a minimum clustering accuracy of over 88% relative to the original clusters. We used the Weka data mining software for this process because it is well tested in terms of accuracy and efficiency and is open source.
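
The representation described in the abstract, a vector of tag percentages per page, can be illustrated with a short sketch. This is not the authors' implementation: the thesis used Weka, whereas the code below is plain Python with a minimal k-means written for illustration only. The names `TagCounter`, `tag_percentages`, `k_means`, the tag vocabulary, and the four sample pages are all hypothetical. DBSCAN is not shown.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import random


class TagCounter(HTMLParser):
    """Counts every opening tag seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_percentages(html, vocabulary):
    """Return the share (0-100) of each vocabulary tag among all tags in html."""
    parser = TagCounter()
    parser.feed(html)
    total = sum(parser.counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [100.0 * parser.counts[tag] / total for tag in vocabulary]


def k_means(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample pages: two table-heavy, two paragraph-heavy.
pages = [
    "<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>",
    "<html><body><table><tr><td>c</td></tr><tr><td>d</td></tr></table></body></html>",
    "<html><body><p>x</p><p>y</p><p>z</p></body></html>",
    "<html><body><p>u</p><p>v</p></body></html>",
]
vocabulary = ["table", "tr", "td", "p"]  # assumed tag vocabulary
vectors = [tag_percentages(page, vocabulary) for page in pages]
labels = k_means(vectors, k=2)
print(labels)
```

Percentages rather than raw counts are used so that a long page and a short page with the same layout map to nearby vectors, which is the property the tag-structure approach relies on.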
    Keywords
    Computer science and engineering; Web page clustering
     
    Description
    This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2015.
     
    Cataloged from PDF version of thesis report.
     
    Includes bibliographical references (pages 56-58).
    Department
    Department of Computer Science and Engineering, BRAC University
    Collections
    • Thesis & Report, BSc (Computer Science and Engineering)

    Copyright © 2008-2019 Ayesha Abed Library, Brac University 