dc.contributor.advisor: Majumdar, Mahbub Alam
dc.contributor.author: Tarannum, Tasnuva
dc.date.accessioned: 2019-07-14T07:13:43Z
dc.date.available: 2019-07-14T07:13:43Z
dc.date.copyright: 2019
dc.date.issued: 2019-04
dc.identifier.other: ID 14101133
dc.identifier.uri: http://hdl.handle.net/10361/12354
dc.description: This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2019. [en_US]
dc.description: Cataloged from PDF version of thesis.
dc.description: Includes bibliographical references (pages 24-25).
dc.description.abstract: Today, data plays an essential role in people's day-to-day activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information or knowledge can be derived from large quantities of data. However, investigations show that many such applications fail to work effectively. High-quality data is a key to today's business success. The quality of any large real-world data collection depends on different factors, among which the source of the data is often the crucial one. It is now recognized that an inordinate proportion of the data in most data sources is dirty. Clearly, a database application with a high proportion of dirty data is not reliable for data mining or for deriving business intelligence, and the quality of decisions made on the basis of such business intelligence is also unreliable. To ensure high-quality data, enterprises need a process, methodologies, and resources to monitor and analyze the quality of data, as well as methodologies for preventing, detecting, and repairing dirty data. This thesis focuses on improving data quality in database applications with the help of current data cleaning methods. It gives a systematic and comparative description of the research issues related to improving data quality and addresses a number of research issues related to data cleaning. In the first part of the thesis, the related literature on data cleaning and data quality is reviewed and discussed. Building on this review, a rule-based taxonomy of dirty data is proposed in the second part of the thesis.
The proposed taxonomy summarizes the principal dirty data types and is also the basis on which the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process was developed. This leads to the design of the DDS process in the proposed data cleaning framework described in the third part of the thesis. This framework retains the most appealing characteristics of existing data cleaning approaches and improves the efficiency and effectiveness of data cleaning as well as the degree of automation during the data cleaning process. Finally, a set of approximate string matching algorithms is studied and experimental work has been undertaken. Approximate string matching is an important part of many data cleaning approaches and has been well studied for many years. The experimental work in the thesis confirmed the statement that there is no clear best technique. It shows that the characteristics of the data, such as the size of a dataset, the error rate in a dataset, the type of strings in a dataset, and even the kind of typographical error in a string, have a significant effect on the performance of the selected techniques. Similarly, the characteristics of the data also affect the selection of suitable threshold values for the selected matching algorithms. The findings based on these experimental results provide a fundamental improvement in the design of the "algorithm selection mechanism" in the data cleaning framework, which improves the performance of the data cleaning system in database applications. [en_US]
dc.description.statementofresponsibility: Tasnuva Tarannum
dc.format.extent: 25 pages
dc.language.iso: en [en_US]
dc.rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject: Web scraped data [en_US]
dc.subject: Python [en_US]
dc.subject.lcsh: Scripting languages (Computer science)
dc.subject.lcsh: Dataindsamling
dc.title: Cleaning of web scraped data with Python [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Department of Computer Science and Engineering, Brac University
dc.description.degree: B. Computer Science and Engineering
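
Illustrative note (not part of the catalogued record): the abstract describes approximate string matching with a data-dependent threshold but does not name the algorithms studied. The sketch below is only an assumption-laden illustration of the general idea, using Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the sample strings, and the threshold values are hypothetical and are not taken from the thesis.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_near_duplicates(values, threshold=0.85):
    """Return pairs of non-identical values whose similarity meets the threshold.

    The threshold is a hypothetical default; the abstract notes that a
    suitable value depends on the characteristics of the data.
    """
    pairs = []
    for i, left in enumerate(values):
        for right in values[i + 1:]:
            if left != right and similarity(left, right) >= threshold:
                pairs.append((left, right))
    return pairs


# Hypothetical sample values, as might appear in scraped records.
names = ["Brac University", "BRAC Universty", "Dhaka University"]
strict_pairs = find_near_duplicates(names, threshold=0.85)
loose_pairs = find_near_duplicates(names, threshold=0.70)
```

With the stricter threshold only the misspelled variant is paired with its original, while the looser threshold also pairs genuinely different names, which is one way to see why the choice of threshold matters.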

