dc.contributor.advisor | Sadeque, Farig Yousuf | |
dc.contributor.author | Hossain, Ariyan | |
dc.contributor.author | Haque, Rakinul | |
dc.contributor.author | Hannan, Khondokar Mohammad Ahanaf | |
dc.contributor.author | Rafa, Nowreen Tarannum | |
dc.contributor.author | Musarrat, Humayra | |
dc.date.accessioned | 2024-06-13T11:33:47Z | |
dc.date.available | 2024-06-13T11:33:47Z | |
dc.date.copyright | 2023 | |
dc.date.issued | 2023-09 | |
dc.identifier.other | ID 20101099 | |
dc.identifier.other | ID 20101290 | |
dc.identifier.other | ID 20101079 | |
dc.identifier.uri | http://hdl.handle.net/10361/23457 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 60-66). | |
dc.description.abstract | Machine learning, when implemented without proper restraint, has the potential to uncover and reproduce data biases resulting from human error. A prominent source of this problem is word embedding, a technique for representing textual input as vectors that is applied across machine learning and natural language processing (NLP) tasks. Word embeddings inherit bias because they are trained on text data that frequently reflects societal prejudice. These biases can become deeply established in the embeddings, producing unfair or biased results in AI applications. Efforts have been made to recognise and lessen certain prejudices, but comprehensive bias elimination remains a difficult task. In NLP systems, contextualized word embeddings have taken the place of traditional embeddings as the preferred source of representational knowledge. Since biases of various kinds have already been discovered in standard word embeddings, it is critical to evaluate the biases contained in their replacements as well. Our focus is on transformer-based language models, primarily BERT, which produce contextual word embeddings. To measure the extent to which gender bias exists, we apply methods such as the cosine similarity test and the direct bias test, and ultimately detect bias through the probability with which the models fill a [MASK] token. Based on this probability, we develop a novel metric called MALoR to observe bias. Finally, to mitigate the bias, we continue pretraining these models on a gender-balanced dataset created by applying Counterfactual Data Augmentation (CDA). To ensure consistency, we perform our experiments on different gender pronouns and nouns: “he-she”, “his-her” and “male names-female names”. These debiased models can then be used across several applications. | en_US |
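As context for the cosine similarity test mentioned in the abstract, the following is a minimal, illustrative sketch, not the thesis's exact procedure: it compares BERT's static wordpiece embedding vectors for occupation words against those for "he" and "she". The model checkpoint and word choices are assumptions for demonstration only.

```python
# Illustrative sketch of a cosine similarity bias probe on BERT's
# input (wordpiece) embedding table. Assumes the HuggingFace
# `transformers` library and the `bert-base-uncased` checkpoint.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
emb = model.get_input_embeddings().weight  # [vocab_size, hidden_dim]

def vec(word: str) -> torch.Tensor:
    """Look up the static embedding vector for a single wordpiece."""
    return emb[tokenizer.convert_tokens_to_ids(word)]

# A skew between cos(word, "he") and cos(word, "she") hints at gender bias.
for target in ("doctor", "nurse"):
    sim_he = torch.cosine_similarity(vec(target), vec("he"), dim=0).item()
    sim_she = torch.cosine_similarity(vec(target), vec("she"), dim=0).item()
    print(f"{target}: cos(he)={sim_he:.3f}  cos(she)={sim_she:.3f}")
```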
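The [MASK]-filling probe could look roughly like the sketch below, assuming the same library and checkpoint. This is a simplified illustration of the idea of comparing fill probabilities for gendered pronouns; it is not the authors' MALoR implementation, which aggregates such probabilities into a metric as described in the thesis.

```python
# Illustrative sketch: probability that BERT fills [MASK] with "he" vs. "she".
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pronoun_fill_probs(template: str) -> dict:
    """Return P("he") and P("she") for the [MASK] slot in `template`."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos].softmax(dim=-1).squeeze(0)
    return {w: probs[tokenizer.convert_tokens_to_ids(w)].item() for w in ("he", "she")}

# A large gap between the two probabilities signals gender bias
# in the model's contextual predictions for this template.
print(pronoun_fill_probs("[MASK] is a doctor."))
```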
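Counterfactual Data Augmentation, used in the abstract to build the gender-balanced dataset for continued pretraining, could be sketched as follows. The swap list here is a small illustrative subset, not the thesis's actual word list; a full implementation needs careful handling of ambiguous pairs such as "her" mapping to both "his" and "him".

```python
# Illustrative sketch of Counterfactual Data Augmentation (CDA): each
# sentence is duplicated with gendered terms swapped, so the combined
# corpus is balanced across gender pronouns and nouns.
import re

SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    """Swap each gendered token for its counterpart, preserving case."""
    def repl(m):
        word = m.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, sentence, flags=re.IGNORECASE)

corpus = ["He is a doctor and his patients trust him."]
balanced = corpus + [counterfactual(s) for s in corpus]
print(balanced)  # original plus "She is a doctor and her patients trust her."
```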
dc.description.statementofresponsibility | Ariyan Hossain | |
dc.description.statementofresponsibility | Rakinul Haque | |
dc.description.statementofresponsibility | Khondokar Mohammad Ahanaf Hannan | |
dc.description.statementofresponsibility | Nowreen Tarannum Rafa | |
dc.description.statementofresponsibility | Humayra Musarrat | |
dc.format.extent | 76 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Natural language processing | en_US |
dc.subject | Gender bias | en_US |
dc.subject | Debiasing | en_US |
dc.subject | Continued pretraining | en_US |
dc.subject.lcsh | Natural language processing (Computer science) | |
dc.title | Exploration and mitigation of gender bias in word embeddings from transformer-based language models | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |