Cleaning of web scraped data with Python

Tarannum, Tasnuva

dc.contributor.advisor	Majumdar, Mahbub Alam
dc.contributor.author	Tarannum, Tasnuva
dc.date.accessioned	2019-07-14T07:13:43Z
dc.date.available	2019-07-14T07:13:43Z
dc.date.copyright	2019
dc.date.issued	2019-04
dc.identifier.other	ID 14101133
dc.identifier.uri	http://hdl.handle.net/10361/12354
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2019.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 24-25).
dc.description.abstract	Today, data plays an essential role in people's daily activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information or knowledge can be derived from large quantities of data. However, studies show that many such applications fail to work effectively. High-quality data is a key to today's business success. The quality of any large real-world data collection depends on several factors, among which the source of the data is often the decisive one. It is now recognized that an inordinate proportion of the data in most data sources is dirty. Clearly, a database application with a high proportion of dirty data is not reliable for the purpose of data mining or deriving business intelligence, and the quality of decisions based on such business intelligence is also unreliable. To ensure high data quality, enterprises need a process, methodologies, and resources to monitor and analyze the quality of data, as well as methodologies for preventing, detecting, and fixing dirty data. This thesis focuses on improving data quality in database applications with the help of current data cleaning methods. It gives a systematic and comparative description of the research issues related to improving data quality, and addresses several research issues related to data cleaning. In the first part of the thesis, the literature on data cleaning and data quality is reviewed and discussed. Building on this review, a rule-based taxonomy of dirty data is proposed in the second part of the thesis.
The proposed taxonomy summarizes the dirtiest data types and also serves as the basis on which the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process was developed. This leads to the design of the DDS process in the proposed data cleaning framework described in the third part of the thesis. The framework retains the most appealing characteristics of existing data cleaning approaches, and improves the efficiency and effectiveness of data cleaning as well as the degree of automation during the data cleaning process. Finally, a set of approximate string matching algorithms is studied and experimental work has been undertaken. Approximate string matching is an essential part of many data cleaning approaches and has been studied closely for many years. The experimental work in the thesis confirmed the claim that there is no clear best technique. It shows that the characteristics of the data, such as the size of a dataset, the error rate in a dataset, the type of strings in a dataset, and even the type of typographical error in a string, have a significant effect on the performance of the selected techniques. Likewise, the characteristics of the data also affect the choice of suitable threshold values for the selected matching algorithms. The results of these experiments provide the key improvement in the design of the "algorithm selection mechanism" in the data cleaning framework, which enhances the performance of the data cleaning system in database applications.	en_US
dc.description.statementofresponsibility	Tasnuva Tarannum
dc.format.extent	25 pages
dc.language.iso	en	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Web scraped data	en_US
dc.subject	Python	en_US
dc.subject.lcsh	Scripting languages (Computer science)
dc.subject.lcsh	Dataindsamling
dc.title	Cleaning of web scraped data with Python	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B. Computer Science and Engineering

Files in this item

Name: 14101133_CSE.pdf
Size: 866.6Kb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1480]
