Real time identification of phishing attacks through machine learning enhanced browser extensions - Nature

Abstract

Phishing attacks continue to rank among the most dangerous online threats. Attackers create phony websites in an attempt to obtain personal data. This study presents a framework for a Google Chrome browser extension that uses machine learning to examine URLs and visual components in order to identify phishing websites in real time. The proposed system gathers and examines data from websites, extracts hybrid features including lexical, structural, and visual layout parameters, and classifies them using support vector machine (SVM), decision tree (DT), and random forest (RF) algorithms. The grey wolf optimizer (GWO) selects the most discriminative features, which reduces computational cost and makes detection easier. The GWO-enhanced random forest model performed well on benchmark datasets such as the Berkeley ML Archives and PhishTank, achieving a Matthews correlation coefficient (MCC) of 0.96 and an accuracy of 98.7%. The Chrome extension uses this model to assess URLs and visual similarity in real time and to display warnings that adapt to user actions. The proposed system improves on current anti-phishing solutions because it performs better in real time, has a lower false-positive rate, and can handle obfuscated URLs. By placing intelligent security at the browser level, this work provides a practical, user-centered defense against phishing attacks that change over time.

Introduction

Phishing attacks continue to rank among the most prevalent online threats.
They steal users’ personal information by using phony websites that mimic authentic ones. According to the Internet Crime Complaint Center’s (IC3) 2022 report1, phishing was the most prevalent kind of cybercrime, accounting for about 30% of all complaints and causing significant financial losses. Traditional blacklists are no longer sufficient as phishing websites grow increasingly sophisticated. Contemporary detection systems therefore depend more and more on machine learning and hybrid feature extraction techniques to spot subtle similarities in URL structure, graphic design, and website behavior. People between the ages of 30 and 39 were the most likely to report phishing scams2. The Telephone-operated Crime Survey for England and Wales (TCSEW) estimates that in 2023 there were 5 million distinct phishing websites. Additionally, 90% of IT workers remain extremely concerned about email phishing, according to the IRONSCALES poll3. The past few years have also seen a discernible increase in phishing schemes: IBM’s 2023 analysis found that 16% of company data breaches were directly due to phishing attacks.

As several reports highlight, phishing attacks target diverse demographics and exploit various platforms. For example, the IC3 2022 report revealed that individuals aged 30–39 were most commonly affected, the Anti-Phishing Working Group (APWG) reported 5 million distinct phishing sites in 2023, and the TCSEW found that those aged 25 to 44 were frequently targeted in England and Wales. These and other findings are summarized in Table 1. As shown in Table 1, phishing attacks have increased significantly over the years, with a notable rise in online scams targeting individuals aged 30–39.

Table 1 Reports on phishing attacks and scams.
Anti-Phishing Working Group (APWG) data

These deceptive practices involve the distribution of email spam intended to mislead individuals into revealing confidential information or credentials8. The entities most frequently impersonated in these scams are those with which users regularly interact, such as financial institutions, email services, cloud-based platforms, and entertainment services9. The consequences go beyond individual loss: they can include theft of money, unauthorized system access, and the spread of targeted attacks within companies10. Personalized emails11 make a successful system breach and the theft of personal data much more likely. Picture this: someone receives an official-looking email from their bank reporting a serious problem with their account12. The email urges the recipient to act quickly and directs them to a fake website that looks like the bank’s real portal. Because the victim unknowingly enters their login information on this fake website, the attackers gain full access to the victim’s money13.

Fig. 1 A multi-phase cyberattack lifecycle showing how exploitation and phishing work.

The multi-stage attack chain shown in Fig. 1 starts with phishing, exploiting vulnerabilities, or using weak passwords to get into a system. This leads to the installation of malware and the theft of credentials. Google Chrome extensions are add-ons that extend what the Chrome browser can do online14.

Fig. 2 Extension button in Microsoft Edge15.

Figure 2 shows how Google Chrome extensions work and why they matter for web browsing. With these Chrome extensions, developers can build on the browser’s architecture to create new solutions that serve many different users.
People like Google Chrome as a web browser because it offers many useful tools, such as password management, content filtering, and translation. AdBlock and uBlock are examples of add-ons that block ads, and LastPass and Dashlane are examples of password managers. Google Translate helps users read content in other languages16. Chrome also emphasizes safety and privacy with features like HTTPS, which keeps browsing activity private and safe from prying eyes, and supports extensions such as Grammarly. In short, Google Chrome lets people work safely and in a way that meets their needs, and it is likely to remain useful over time17. Our research proposes a Chrome extension that stops phishing using machine learning, URL analysis, and visual similarity recognition. By connecting academic research with real-world use, it gives people a way to protect themselves from evolving cyberthreats in real time.

Fig. 3 Chrome extension interface for phishing detection.

Figure 3 shows all of the components of the Chrome extension that are meant to identify phishing attempts and illustrates how the program functions as a whole. Subfigure (a) displays the Phishing Detection interface, which shows the current model version, the number of URLs checked, phishing alerts, and the system’s current status. Subfigure (b) shows a real-time message that warns users when a website looks suspicious or might be a scam; the user can choose “Go Back” or “Continue Anyway.” Subfigure (c) shows the detection history record, which lists the URLs that were examined, their classification labels (phishing or legitimate), and how accurate each prediction was.
In Subfigure (d), users can use the model configuration options to turn features on or off, such as automatic model updates, feature selection, and HTML and URL analysis. Together, these interfaces show how the whole phishing detection system works: how it quickly scans and classifies URLs, raises alerts, and manages the model in a simple way. This stops phishing attacks at the web browser.

Research question

The methodology section describes the machine learning methods we used and the new way in which we combined information from page visuals and URLs. The results and discussion sections show how our approach compares with other methods and how it improves accuracy, precision, and recall. The conclusion emphasizes the study’s implications for cybersecurity and proposes new research avenues. This study addresses the increasing complexity and prevalence of phishing attacks, which pose substantial risks to individuals and businesses worldwide. Although older phishing detection methods rest on sound ideas, they may not work as well in a world where phishing strategies are always changing. This project produces an easy-to-use Chrome extension built on machine learning optimization and visual similarity. It is better at finding and stopping phishing attacks in real time because it uses more advanced methods, such as URL analysis and clustering algorithms, to classify websites. Packaging the method as a browser extension makes it easy to adopt and bridges the gap between research and real-world use. The proposal is a step forward for internet safety and gives individuals the peace of mind they need to browse.

Literature review

Phishing tricks people into giving criminals private information by presenting fraudulent links as genuine ones.
People don’t know that these links are fake, so when they click on them, they end up on the attacker’s page instead of the one they wanted to visit. Unknowingly sharing important information with a site that appears genuine can be very dangerous for both people and businesses. People who fall for these scams put their own and their company’s information in danger18. Traditionally, phishing was detected by keeping a blacklist of fake URLs and domains and comparing it against the links that people clicked on or typed in. This approach had many problems, especially when a fake link was not yet on the list, which made it easy for attackers to trick users. Researchers therefore used machine learning to determine whether URLs were legitimate by looking at their features19. Across a number of comparisons, the Random Forest algorithm was shown to work very well. In response to the need for better phishing detection, researchers built browser add-ons, such as Chrome plug-ins, that use machine learning algorithms to improve cybersecurity20. These add-ons check the validity of URLs immediately and let users know if someone is trying to steal their information. They have shown promise in lowering the risks of phishing attacks in validation and practical testing. As a result, users are better protected from the growing number of cyberthreats online.

Cyberattacks, notably phishing attempts, rose during the COVID-19 pandemic. Although many complex phishing detection solutions are readily available, many struggle to recognize fake code and phishing pages. This could be due to problems such as reduced detection precision and a limited ability to adapt to new phishing techniques. Furthermore, incorrect detection results may stem from relying on arbitrarily chosen URL-based classification features21. This paper presents a detection system and avoidance tool to address these concerns.
Using machine learning, particularly unsupervised learning algorithms, this model incorporates an explosive detection algorithm14. The criteria built into the algorithm prioritize URL-based characteristics that attackers frequently abuse to deceive users into accessing counterfeit websites22. Experiments were undertaken in a controlled lab setting using a diverse dataset acquired from platforms such as PhishTank and the University of California’s Berkeley Machine Learning archive. The study’s output is a Chrome add-on for phishing detection based on the proposed model. To stop phishing attempts before they start, this add-on gives users relevant warnings and practical safety tips to follow while visiting questionable websites. This malware detection and mitigation method aims to reduce cybercrime and associated problems by decreasing spam and fake websites23. Phishing attacks are a common social engineering tactic used by malicious actors to obtain private user data, such as passwords and usernames, without authorization. Because attackers continuously create new unlisted websites, traditional defenses such as maintaining an updated blacklist of known phishing websites have proven unsuccessful24. By assessing URL attributes and classifying them as legitimate or fraudulent, machine learning techniques offer a promising way to enhance phishing detection through insights drawn from broad datasets.

Deep learning for URL/content signals

Recent studies revisit pure-DL pipelines on URLs and page code. Haq et al. show that compact 1D-CNNs over tokenized URLs can outperform classic ML baselines on several benchmarks, highlighting DL’s ability to learn n-gram–like patterns without manual features. Nature’s 2024 empirical study compares deterministic vs.
probabilistic neural networks for URL classification and reports gains from uncertainty-aware models, indicating that calibration matters for deployment25.

Hybrid and ensemble learners (multi-algorithm, stacking)

Hybrid “super-learner” ensembles that blend heterogeneous models continue to report state-of-the-art results. Rao et al. (2025) combine diverse ML learners via a super-learner for mobile phishing detection, showing that ensembles beat single models on stability and accuracy across datasets. Multiple 2024–2025 works echo this: stacking/voting mixtures of classic ML, DL, and hybrid DL reduce variance and improve robustness on imbalanced data26. A broad 2025 survey further documents that ensembles combined with modern representation learning dominate malicious-URL detection, and it recommends careful metric choice beyond accuracy (e.g., PR-AUC, MCC).

Multi-feature pipelines (URL + HTML/visual) and browser-side systems

Beyond URL strings, several systems integrate source-code/DOM/visual signals. PhiUSIIL (Computers & Security) provides a large, diverse URL dataset with engineered features (e.g., URLTitleMatchScore, TLDLegitimateProb) and similarity-based detection ideas, enabling evaluation of multi-feature approaches. Browser-integrated detectors continue to appear: “NoPhish” (arXiv, 2024) details a Chrome-extension pipeline using ML models; other extension-centric efforts emphasize real-time alerts but often lack rigorous fusion or imbalance-aware optimization27.

Table 2 Comparative summary of recent studies on phishing URL detection using multi-feature and ensemble approaches.

Table 2 provides a comparative analysis of recent phishing URL detection studies conducted between 2024 and 2025, highlighting differences in feature utilization, model architectures, fusion strategies, and evaluation metrics. Because it can accurately classify complex, high-dimensional data, the Random Forest approach is a preferred choice for this task.
Adding phishing detection to people’s online activity through Chrome browser extensions or add-ons is one way to accomplish this. By instantly alerting users when they come across links that appear dubious, these solutions help users stay safe online. Users become more vigilant as a result, and their defenses against phishing attacks are strengthened. A number of machine learning-based methods for phishing detection have been proposed in recent research. Table 3 illustrates the differences between several methods of identifying phishing, including what they contribute, what they lack, and what they might do in the future.

Table 3 Survey of existing works.

Table 3 presents a comparative analysis of recent phishing detection studies from 2023 to 2025, covering classical machine learning, deep learning, and hybrid optimization-based approaches. Each study varies in its feature scope, algorithmic design, and practical applicability across different phishing scenarios (URL-only, hybrid, or email-integrated).

Recent optimizer developments (2023–2025)

Recent research confirms that the classical grey wolf optimizer (GWO) is an established method for feature selection, with numerous advanced variants created in the last three years. Li et al.38 introduced an adaptive-mechanism GWO that improves the balance between exploration and exploitation in high-dimensional feature spaces, showing better convergence behavior than the standard GWO. Thakur39 proposed an Adaptive-Weight GWO (AWGWO) that dynamically alters the weights of the coefficients to enhance the detection of Android malware. Barik et al.40 also used an optimizer based on deep learning (EGSO-CNN) to find phishing URLs. Pentapalli et al.41 demonstrated that hybrid optimization strategies can significantly enhance decision-making systems pertaining to phishing.
Ovabor42 examined hybrid Firefly GWO mechanisms specifically for the classification of phishing. These advancements indicate that optimization research in cybersecurity is evolving toward adaptive, hybrid, and deep-learning-compatible techniques. Accordingly, our work positions the standard Binary GWO not as the newest optimizer but as a lightweight, stable, and computationally efficient method that is well suited to real-time, browser-embedded phishing detection. To contextualize our design choice, we compare our GWO-based approach against these contemporary optimizers (2023–2025) and demonstrate that, while recent variants may offer marginal performance gains in offline or GPU-supported environments, the classical GWO achieves competitive accuracy and MCC with significantly lower computational cost, making it more practical for on-device browser extensions.

Methodologies

Existing research methodologies have several drawbacks:

Existing systems find it difficult to distinguish phishing websites from normal websites because phishing websites look similar to the websites they imitate.

Existing systems have higher false-alarm rates because feature values can be the same for legitimate and phishing websites, and they suffer from low detection accuracy.

The current system is unable to detect whether the visited website’s domain name is similar to a well-known domain name. If this occurs, SpoofGuard will alert visitors to the phishing website if they are not using the usual port.

Most existing phishing detection systems base their classification on URL attributes only. They pay too little attention to visual similarity-based features, which identify fake websites by comparing their visual appearance with legitimate websites in terms of content such as page layout, page style, etc.
This study did not involve direct interaction with human participants or identifiable personal data; therefore, informed consent was not required.

When models are built with typical machine learning methods, human expertise is needed for the feature extraction and selection process. These steps are completed independently of the classification process and cannot be merged with it to improve the model’s performance in a single step.

An outline of the problem

Phishing attacks have become a critical threat in the digital era, targeting individuals and organizations to steal sensitive data, such as passwords, credit card information, and personal identification details. To deceive consumers into divulging personal information, these attacks employ social engineering, phony websites, and deceptive emails43. Despite improvements in cybersecurity, phishing techniques continue to evolve and become more sophisticated, making it more difficult for existing detection systems to stay up to date. Common defenses, such as blacklists and heuristic-based models, struggle to keep up with attackers’ evolving strategies. Phishing detection systems that are accurate, dynamic, and real-time are therefore urgently needed to safeguard consumers while they are online44.

Source phishing email samples

We collected samples from different sources and assembled a collection of more than fifty phishing emails for testing and analysis. Most of the samples came from the Berkeley Information Security Office, and additional samples came from the email corpus at our university. We used text mining to examine the samples for phrases that might indicate phishing. Table 4 shows a list of high-risk phrases that are often used in phishing emails, along with their weights. These weights indicate how likely it is that the word will appear in phishing material: the higher the value, the stronger the association with fraudulent emails.

Table 4 Part of the suspicious word list45,46.
Phishing emails often contain specific words that indicate fraudulent intent. Table 4 lists common phishing-related terms along with their assigned weights, which help in identifying patterns in phishing content.

Text mining

Identification techniques used47

The phishing email samples were treated as documents and ingested into our analysis. More specifically, we computed the term frequency–inverse document frequency (tf-idf) for every word in the corpus. The tf-idf metric measures how much importance a term carries within a corpus or group of documents. The term frequency reduces the bias of different document lengths by counting the occurrences of a particular term in a document and normalizing by the length of the document:

$$tf(t)=\frac{\text{Number of times term } t \text{ appears in a document}}{\text{Total number of terms in the document}}$$ (1)

The inverse document frequency, on the other hand, logarithmically scales down high-frequency terms and scales up low-frequency ones to capture the importance of a term throughout the corpus:

$$idf(t)=\log\left(\frac{\text{Total number of documents}}{\text{Number of documents with term } t \text{ in it}}\right)$$ (2)

To balance term frequency with document importance, the tf-idf score for each word was computed as:

$$tf\_idf(t)=tf(t)\times idf(t)$$

As shown in Fig. 4, this made it possible to identify and prioritize terms characteristic of phishing in the corpus, which later became the basis for further analysis and development of the phishing detection model.

The proposed system starts from the Phishing Website Dataset.
This dataset contains 80,000 instances, of which 50,000 are legitimate website instances and 30,000 are phishing website instances. An index SQL file contains the website link, the result (0: legitimate, 1: phishing), and the HTML file names (instances)48. Next, the visual features are extracted from visual blocks. Each visual block can be regarded as a rectangle, for which we extract the position coordinates of the top-left corner together with the length and width. In addition, URL components such as the protocol, subdomain, primary domain, domain, base URL, and hostname are extracted.

Fig. 4 Part of the suspicious word list.

Figure 4 illustrates the distribution of suspicious term weights computed using the Term Frequency–Inverse Document Frequency (TF–IDF) model for phishing emails. The bar chart highlights how specific terms exhibit higher relevance in phishing contexts compared to regular communication. The words with the greatest TF–IDF scores are “Request” (weight = 2.657), “Dear” (weight = 1.000), and “Subject” (weight = 0.988). Their high weights reflect frequent use in phishing emails, which makes them useful indicators for spotting such emails.

Although TF-IDF remains a widely used baseline for lexical feature extraction, recent information-retrieval (IR) research (2022–2025) has shifted toward embedding-driven and transformer-based representations such as BERT, Sentence-BERT (SBERT), DistilBERT, SimCSE, and contrastive retrieval models. These modern techniques capture contextual semantics, paraphrasing relationships, and subtle lexical variations far more effectively than TF-IDF, making them increasingly popular in phishing-URL analysis and malicious-content detection.
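As a concrete illustration of the tf-idf weighting in Eqs. (1) and (2), the following minimal Python sketch scores a term in a small corpus. The toy corpus, the function names, the use of the natural logarithm, and the zero weight given to terms that appear in no document are all assumptions made for illustration; none of them is taken from the study's email samples or code.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Eq. (1): occurrences of `term` divided by the document length."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Eq. (2): log(total documents / documents containing `term`)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: a term seen in no document gets zero weight
    return math.log(len(corpus) / containing)  # assumption: natural logarithm


def tf_idf(term, doc_tokens, corpus):
    """Product of term frequency and inverse document frequency."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus, one token list per email.
corpus = [
    "dear customer verify your account".split(),
    "dear colleague meeting agenda attached".split(),
    "urgent request verify your password".split(),
    "lunch menu for friday".split(),
]

score_verify = tf_idf("verify", corpus[0], corpus)
print(f"tf-idf of 'verify' in the first email: {score_verify:.4f}")
```

A term that occurs in every document receives an idf of zero under this definition, so only terms concentrated in a subset of the emails end up with high weights.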
Several recent phishing-detection studies also integrate hybrid deep-retrieval pipelines and optimization-enhanced embedding models to improve semantic understanding of misleading URLs and webpage text. However, such approaches typically require significant computational resources, GPU acceleration, and large memory footprints, which make them unsuitable for deployment inside lightweight, browser-based phishing detection systems. In this work, we enhance the classical TF-IDF signal by integrating it with MCC-optimized feature selection, visual similarity cues, and URL-based statistical features, enabling it to perform competitively without the computational overhead associated with modern embedding architectures.

Visual feature extraction

The proposed approach to finding phishing sites depends heavily on how visual features are obtained. It uses appearance-based techniques to help the system find fake websites that look like real ones. Each site is first visited with a headless browser, such as Selenium with ChromeDriver, and both the HTML and a screenshot of the page are captured. The CSS and DOM layout reveal the different parts of the homepage. For each block, which may hold content such as logos, input forms, and navigation bars, we record its position, size, and shape. These shapes describe what the page looks like. Logos and other objects are matched using perceptual hashing (pHash) and template matching. After that, color histograms, background gradients, and font styles are used to find visual inconsistencies. Cosine similarity and the Structural Similarity Index (SSIM) measure how similar the test page is to the legitimate templates. The Grey Wolf Optimizer (GWO) is used to refine the extracted features, which include location, color, style, and similarity metrics. These features are combined into one feature vector.
These visual features are finally combined with URL- and content-based properties into a complete feature matrix that makes the model more robust and easier to interpret. This integrated approach improves the accuracy and interpretability of detection results, makes the system more resistant to zero-day phishing attacks, and catches subtle visual mimicry that other methods often miss.

Fig. 5 Phishing detection procedure using visual feature extraction.

Figure 5 depicts the step-by-step workflow of the visual feature extraction procedure used in the proposed phishing detection framework. The process begins with webpage rendering and screenshot capture, in which each website is processed in a controlled setting to extract both HTML and visual data. The page is then divided into visual blocks according to the CSS layout and Document Object Model (DOM). To obtain the structural and visual characteristics of the webpage, two analyses run in parallel: positional attribute extraction and color and style analysis. These feed into object or logo matching and into layout similarity measurement, which examines how closely the layout resembles those of trusted websites. The resulting data is transformed into feature vectors, which are then joined with URL- and content-based features to form a complete feature set. This combined data is fed into the GWO-optimized classifier to identify phishing sites more precisely. The figure illustrates how several structural and visual analysis layers cooperate to improve the detection model’s precision and dependability.
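The block geometry and layout similarity steps described above could be sketched as follows. This is a minimal illustration under stated assumptions: `VisualBlock`, `layout_vector`, the page-size normalization, the top-to-bottom block ordering, and the example coordinates are all hypothetical and chosen only to make the idea concrete. The study's pipeline also uses pHash, SSIM, and color and style features, which are not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class VisualBlock:
    """A rectangular page region: top-left corner plus width and height."""
    x: float
    y: float
    width: float
    height: float


def layout_vector(blocks, page_width, page_height):
    """Flatten blocks into [x, y, w, h, ...], normalized by the page size."""
    vector = []
    # Assumption: order blocks top-to-bottom, then left-to-right.
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        vector.extend([
            block.x / page_width,
            block.y / page_height,
            block.width / page_width,
            block.height / page_height,
        ])
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, zero-padded to equal length."""
    size = max(len(u), len(v))
    u = list(u) + [0.0] * (size - len(u))
    v = list(v) + [0.0] * (size - len(v))
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical layouts: a logo block and a login-form block on a 1280x800 page.
reference = [VisualBlock(40, 20, 200, 60), VisualBlock(340, 200, 600, 300)]
candidate = [VisualBlock(44, 24, 200, 60), VisualBlock(340, 210, 600, 290)]

similarity = cosine_similarity(
    layout_vector(reference, 1280, 800),
    layout_vector(candidate, 1280, 800),
)
print(f"layout similarity: {similarity:.4f}")
```

A high layout similarity on an unrelated domain is only one signal. In the framework described here it would be one entry in the fused feature vector, not a verdict on its own, and the source does not state a decision threshold.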
Feature selection with grey wolf optimizer (GWO)

When clustering relevant websites, we consider HTML and JavaScript features (such as website forwarding, onMouseOver, right-click, etc.), domain-based features (such as DNS, website traffic, page rank, etc.), address bar features (such as the presence of an IP address, URL length, the presence of an @ symbol, etc.), and abnormal-based features (such as Request URL, links in tags, SFH, etc.). Then, a Grey Wolf Optimizer (GWO) feature selection method based on interpolation is used to choose the required features from the extracted features. We used a binary GWO to find a small group of discriminative features from the fused representation. The optimizer finds a mask that keeps high-value features by balancing validation F1 against subset size. With a pack size of 25 and 60 iterations, GWO consistently produced smaller subsets without hurting detection performance. The next step is word embedding. Given the high dimensionality of the extracted features, feature selection was essential to reduce redundancy, improve classification efficacy, and avert overfitting. To choose the best feature subset, a binary Grey Wolf Optimizer (GWO) was used.

$$\vec{D}=\left|\vec{C}\cdot\vec{X}_{p}(t)-\vec{X}(t)\right|$$ (3)

$$\vec{X}(t+1)=\vec{X}_{p}(t)-\vec{A}\cdot\vec{D}$$ (4)

Here:

$\vec{X}_{p}(t)$ = position of the prey (best solution found so far)

$\vec{X}(t)$ = current position of a grey wolf

$\vec{A}$, $\vec{C}$ = coefficient vectors

$\vec{D}$ = hunting behavior (the wolf's distance from the prey)

The grey wolf optimizer (GWO) algorithm is inspired by the social hierarchy and hunting mechanism of grey wolves in nature.
It has been successfully applied to complex optimization problems owing to its strong exploration-exploitation balance and its ability to converge efficiently toward optimal solutions. In this research, GWO is used to select an optimal subset of discriminative features from the combined feature space (URL, domain, HTML, and visual features) to enhance phishing detection performance49. We select the Grey Wolf Optimizer (GWO) for wrapper-based feature selection (FS) due to its low hyper-parameter footprint and strong exploration-exploitation balance in binary FS settings. Prior studies demonstrate that GWO (including its binary and hybrid forms) attains competitive or better accuracy with smaller subsets relative to PSO/GA, while also minimizing tuning overhead. Recent variants from 2024 to 2025 have further improved convergence and subset compactness, indicating that GWO-based FS remains competitive. We therefore use a binary GWO wrapper that optimizes a class-imbalance-aware objective (MCC) instead of accuracy.

Why standard GWO was selected

In the past three years, many improved GWO variants and hybrid meta-heuristics have been released. However, our system is meant to be used as a real-time browser extension, which places strict limits on memory use, latency, and computational overhead. Recent studies, such as Adaptive-Mechanism GWO (Li et al., 2025), Adaptive-Weight GWO (Thakur, 2024), hybrid Firefly–GWO models for phishing detection (Ovabor, 2024), and deep-learning-driven optimization pipelines (Barik et al., 2025), illustrate that an enhanced exploration-exploitation balance can yield performance improvements. But these more advanced versions usually need more complicated hyperparameter tuning, bigger population sizes, longer search cycles, or GPU-enabled environments, none of which is feasible on the client side in Chrome.
The standard binary grey wolf optimizer (GWO), on the other hand, strikes the best balance between accuracy and ease of use. It offers minimal hyperparameters (pack size and iterations), ensuring reliability across devices; stable convergence behavior on mixed-type phishing features (URL lexical, visual, DOM-based); high MCC performance even under moderate iteration budgets (≤ 60); and low inference latency, making it suitable for on-page real-time classification (< 50 ms). To ensure fairness, Section X includes a comparative discussion referencing optimizers proposed between 2023 and 2025, demonstrating that although these enhanced variants may outperform classical GWO under offline, high-resource conditions, the standard binary GWO achieves competitive performance with significantly lower computational cost. This makes it a more practical and reproducible choice for real-world browser-integrated phishing detection. Institutional approval for the research was obtained, and all methods were performed in accordance with the relevant guidelines and regulations.

Implementation workflow

Feature extraction: Extract URL, domain, and visual-based features from 80,000 website samples (50,000 legitimate and 30,000 phishing).
Feature encoding: Normalize feature values and convert categorical attributes into numeric format.
Initialization: Set population size $N = 25$ (wolves). Initialize random binary feature masks (1 = selected, 0 = removed). Define iteration count $T = 60$.
Fitness evaluation: Compute the fitness function for each feature subset based on:

$$\text{Fitness} = w_1\left(1 - F1_{\text{score}}\right) + w_2\,\frac{|S|}{|F|}$$ (5)

where $F1_{\text{score}}$ measures classifier performance, and $|S|/|F|$ penalizes larger subsets.
Position update: Wolves adjust feature subsets using α, β, and δ guidance until convergence.
Termination and selection: The process continues until the maximum number of iterations is reached or no significant improvement is observed.
The final selected subset (α-wolf) is used for model training.

Model integration

The grey wolf optimizer (GWO) algorithm is based on how grey wolves hunt and how they organize themselves in groups. It has been effectively used for intricate optimization problems owing to its robust exploration-exploitation balance and its capacity to converge efficiently toward optimal solutions. This study employs GWO to identify an optimal subset of discriminative features from the integrated feature space (including URL, domain, HTML, and visual features) to improve phishing detection efficacy [50]. We choose the Grey Wolf Optimizer (GWO) for wrapper-based feature selection (FS) because it has a small hyper-parameter footprint and a good balance between exploration and exploitation in binary FS settings. Previous research indicates that GWO (along with its binary and hybrid variants) achieves competitive or superior accuracy with smaller subsets than PSO/GA, while also reducing tuning overhead. Recent variants from 2024 to 2025 further improve convergence and subset compactness, indicating that GWO-based FS remains current. We therefore use a binary GWO wrapper that directly optimizes a class-imbalance-aware objective (MCC) instead of accuracy.

Since phishing features include heterogeneous numeric and categorical attributes, a single kernel function struggles to represent all decision boundaries effectively [51, 52]. Moreover, SVM tends to be sensitive to noise and irrelevant features, leading to suboptimal generalization in high-dimensional, mixed-type datasets. Therefore, the superior performance of the Random Forest can be theoretically attributed to its ensemble-based variance reduction, robustness to noise, and ability to model complex nonlinear relationships.
This aligns with ensemble theory and prior studies indicating that Random Forest classifiers often achieve higher stability and lower generalization error than single learners like DT or margin-based models like SVM [53, 54].

Visual similarity feature extraction (reproducible specification)

Given a suspect page $S$ and a legitimate reference page $R$ for the same brand/domain, compute a vector of layout, appearance, and content similarities that is robust to minor styling changes but sensitive to brand-spoofing.

Preprocessing & alignment

Viewport: fixed 1366 × 768 px, deviceScaleFactor = 1.
Screenshot: full-page; crop the above-the-fold 1366 × 900 px region.
Normalization: convert to RGB; gamma 2.2; resize both to $H \times W$ (e.g., $900 \times 1366$).
DOM strip: remove <script>, <style>, and hidden nodes (display: none, opacity: 0).
To reduce viewpoint drift, estimate an affine alignment from keypoints (below) and warp $S \to \tilde{S}$ so it best overlays $R$.

Multicue similarity features

I. Layout grid & element geometry (structure). Partition the page into a $G \times G$ grid (default $G = 6$). For each cell $c$:
Block density $b_c$: ratio of rendered pixels belonging to visible DOM boxes.
Text density $t_c$: OCR character count / area (see § 3).
Dominant tag map $m_c$: one-hot of {logo, nav, hero, form, footer, other} from rule-based heuristics.
Features:

$$\text{L2-BD} = \frac{1}{G^2}\sum_{c} a\,\left(b_c^S - b_c^R\right)^2, \qquad \text{L2-TD} = \frac{1}{G^2}\sum_{c} b\,\left(t_c^S - t_c^R\right)^2$$ (6)

$$\text{Jacc-Tag} = \frac{|M^S \cap M^R|}{|M^S \cup M^R|}$$ (7)

where $M$ is the set of cells labeled with the same dominant tag. $G^2$ = total number of grid cells. $b_c^S$ and $b_c^R$ = block densities of the suspect (S) and reference (R) pages at cell $c$.
$t_c^S$ and $t_c^R$ = text densities of the same two pages at cell $c$. $a$ and $b$ = weighting coefficients controlling the relative contribution of each feature term in the respective loss. $M^S$ = the set of predicted tags for the suspect page (S). $M^R$ = the corresponding set of reference tags for the reference page (R).

II. Appearance (global & local).
Color histogram similarity (HSV, 32 × 32 × 8 bins):

$$\text{HistSim} = \frac{\sum_{i} a\,\min\left(h_i^S, h_i^R\right)}{\sum_{i} b\,h_i^R}$$ (8)

Here: $h_i^S$ = histogram value (or normalized bin frequency) for bin $i$ in the suspect (S) distribution. $h_i^R$ = histogram value for the same bin $i$ in the reference (R) distribution. $a$ and $b$ = weighting coefficients that balance numerator and denominator contributions. $i = 1, 2, \dots, N$, where $N$ is the total number of histogram bins.
Perceptual hash (pHash/dHash, 64-bit): Hamming distance $d_H$; use $1 - d_H/64$.
SSIM on luminance $Y$: mean SSIM over 3 × 3 tiles.
Edge-HOG: HOG (cell 8 × 8, 9 bins); cosine similarity of HOG vectors.

III. Keypoint logo/visual anchor matching.
Detector: ORB (n = 1500, FAST threshold 20).
Descriptor match: Hamming, Lowe ratio $< 0.75$; estimate homography $H$ with RANSAC (max reprojection 4 px).
Features: match count $M$, inlier ratio $M_{\text{in}}/M$, mean reprojection error $e_{\text{proj}}$.
Logo region similarity: detect brand logo in $R$ via template (max-normed cross-correlation); warp with $H$; compute SSIM in that ROI.

IV. Form & CTA semantics.
Extract bounding boxes of <input>, <button>, and <a> with role = button from the DOM.
Compute earth mover's distance (EMD) between the 2D distributions of form elements' centers (normalized coordinates).
Compare placeholder/text strings via TF-IDF cosine; include the difference in number of fields.

V. OCR text consistency.
OCR both pages (Tesseract, English + brand language).
Build bigram TF-IDF vectors from above-the-fold text; cosine similarity $\cos(\mathbf{v}_S, \mathbf{v}_R)$.
Penalize brand-name edits via Levenshtein distance within logo/header zones.

VI. DOM tree shape.
Serialize visible DOM to an unordered multiset of tag 3-grams along ancestor paths (e.g., header > nav > a).
Similarity = Jaccard of these 3-grams; report Tree-Edit Approx via normalized Levenshtein on tag sequences at depth ≤ 4.

Feature vector and scaling

Concatenate all scalars into $\mathbf{x}_{vis} \in \mathbb{R}^{k}$ (typically $k \approx 30$ to $60$). Apply robust scaling (median/IQR) fitted on training data. Missing cues (e.g., OCR failure) are filled with feature-wise medians plus a binary "missing" indicator.

Decision features (for fusion)

For late fusion, we pass calibrated probabilities from a visual-only model $p_{\text{visual}}$ trained on $\mathbf{x}_{vis}$ (RF + Platt scaling). The fusion meta-learner consumes $[p_{\text{URL}}, p_{\text{visual}}, \mathbf{x}_{meta}]$, where $\mathbf{x}_{meta}$ may include the top-k visual cues (e.g., SSIM, pHash, inlier ratio).

Thresholds & calibration

Calibrate $p_{\text{visual}}$ with isotonic regression on a held-out set. Choose thresholds to meet FP rate ≤ 1% while maximizing MCC (reported in Results).

Chrome extension development for phishing prevention

This approach can be adapted to many websites: machine learning techniques find potentially dangerous links, and the discovered links are marked with a CSS attribute and then shown on the page. The model can use real-time web-based information such as domain name, SSL certificate, web traffic, and web hosting provider to verify a URL. The study aims to uncover any links that a hacker might have inserted into a website to steal user credentials or infect the victim's computer with malware.
These suspicious links have been categorized using a machine-learning method. Moreover, a list of URLs is. Links with very faint visibility and specific spacing characteristics are visible on the webpage.

Overview of Technology

Random forest algorithm: A random forest is an ensemble learning approach that combines many decision trees into a single model. Each tree in a random forest is constructed independently by selecting a random subset of features from the dataset and using them to split the data into subsets. This randomness helps prevent over-fitting and improves the model's generalization capabilities. When making predictions, the random forest combines the forecasts from each tree to arrive at a final prediction. Combining trees enhances the model's accuracy and reduces variance. After testing several algorithms, we chose the random forest algorithm for our data sample and developed a Chrome extension around it.

Clustering approach

Before feature optimization and classification, a clustering mechanism is applied to organize websites into meaningful groups based on their similarities in URL structure, content, and visual features.

Purpose of clustering

The main goal of clustering is to group websites that behave alike and are structured alike. For example, phishing sites often have too many subdomains, overly long URLs, or layouts that closely resemble each other. By grouping the sites, the model can find hidden connections that might not be clear when looking at each site on its own.

Technique used

We used the K-Means algorithm, an unsupervised machine learning method, to split the dataset into k groups. A cluster is a group of websites that have similar features. The Elbow Method, which balances the compactness and separation of clusters, was used to choose the number of clusters (k) empirically from the data.
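As a minimal sketch of the grouping described above (not the authors' implementation), the following pure-Python K-Means clusters a handful of hypothetical, already-standardized two-dimensional feature vectors and reports the within-cluster sum of squares (inertia) for several values of k, which is the quantity an Elbow analysis inspects. The function names, the toy points, and the seed are assumptions introduced only for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical standardized feature vectors: two well-separated groups.
sites = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Inertia per candidate k; an elbow analysis looks for where the drop flattens.
inertia_by_k = {k: kmeans(sites, k)[2] for k in (1, 2, 3)}
labels, centroids, _ = kmeans(sites, 2)
print(inertia_by_k)
print(labels)
```

On this toy data the inertia falls sharply from k = 1 to k = 2, which is the kind of bend the Elbow Method looks for; real feature vectors would need the standardization step applied first.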
Process overview

Step 1: Feature Standardization: All URL-based, content-based, and visual features are normalized to a common scale to ensure fair distance computation.
Step 2: Centroid Initialization: The algorithm randomly initializes k centroids in the multi-dimensional feature space.
Step 3: Assignment Step: Each website instance is assigned to the nearest centroid based on Euclidean distance.
Step 4: Update Step: The centroids are recalculated as the mean of all instances assigned to that cluster.
Step 5: Steps 3 and 4 are repeated until convergence (i.e., when centroid movement becomes minimal).

Fig. 6 The architecture of the suggested Chrome extension for phishing detection.

Figure 6 shows how the suggested Chrome extension detects phishing. The process begins in the user's browser environment. The system tracks the URL and webpage metadata in real time while the user is browsing. The feature extraction module then examines both visual and URL-based features. After that, the grey wolf optimizer (GWO)-based classifier examines these features and picks out the ones that matter most. It then determines whether the site is legitimate or a scam that aims to steal personal information.

Fig. 7 System architecture of the proposed phishing detection framework.

Figure 7 depicts the full GWO-based phishing detection system. The architecture has several parts. The first layer is made up of URLs, user activities, webpage content, and visual and behavioral elements.
A classification module employs machine learning methods such as Decision Tree, SVM, and Random Forest to distinguish a legitimate website from one that tries to deceive users into giving up personal information. The final decision output is transmitted to the Chrome Extension Output Interface, which provides users with real-time phishing alerts and website legitimacy status directly within the browser environment. The design ensures low latency, high accuracy, and seamless user interaction for proactive phishing prevention.

Fig. 8 Architecture of the proposed phishing detection framework.

Figure 8 illustrates the end-to-end workflow of the proposed phishing website detection framework implemented as a Chrome browser extension. The system integrates URL and visual feature extraction, feature selection using the Grey Wolf Optimizer (GWO), and multi-model classification for phishing detection.

Justification of GWO hyperparameters (pack size = 25, iterations = 60)

We selected GWO's pack size $N$ and iteration budget $T$ to balance convergence quality with training time. Because wrapper FS evaluates a learner at every fitness call, $N$ and $T$ drive cost roughly as $O(T \cdot N \cdot d)$, where $d$ is the feature count. The binary-GWO wrapper maximizes an imbalance-aware score with sparsity control:

$$\max_{S \subseteq F} J(S) = \text{MCC}(S) - \lambda \cdot \frac{|S|}{|F|},$$ (9)

$F$ = the full feature set, where $|F|$ denotes the total number of available features. $S \subseteq F$ = a selected subset of features. $\text{MCC}(S)$ = the Matthews Correlation Coefficient obtained using the subset $S$. $\lambda$ = the regularization coefficient (penalty term) controlling the trade-off between model accuracy and feature subset size,
with $\lambda$ set small (e.g., 0.01–0.05) to prefer compact subsets when predictive utility is tied.

Protocol. We ran a budgeted grid over $N \in \{10, 25, 40\}$ and $T \in \{40, 60, 100\}$ using stratified 5 × 2 CV on the training split, early stopping (patience = 10 iterations without MCC improvement), and a fixed random seed set $\{0, \dots, 4\}$. We then applied the one-standard-error rule: choose the smallest $(N, T)$ whose median MCC is within one standard error of the best run. $N = 25, T = 60$ reached ≥ 99% of the best median MCC while cutting runtime by ~30–40% versus $N = 40, T = 100$. Larger $N$ or $T$ yielded diminishing returns (negligible MCC deltas but materially higher time). Early stopping triggered in most folds before iteration 60 when $N = 25$, indicating an adequate exploration–exploitation balance.

Table 5 Performance of the MCC-optimized grey wolf optimizer (GWO) under different pack sizes and iterations.

Table 5 summarizes the experimental evaluation of the MCC-optimized Grey Wolf Optimizer (GWO) feature selection framework across varying pack sizes ($N$) and iteration counts ($T$). The results report both median and mean MCC values (with standard deviation), the percentage of features retained, and the average training time per run. As observed, increasing the pack size and iteration count generally enhances exploration and convergence stability, leading to marginally higher MCC values. However, larger configurations (e.g., $N = 40, T = 100$) show diminishing returns beyond moderate iteration levels, indicating that GWO achieves near-optimal performance with smaller packs and fewer iterations (e.g., $N = 25, T = 60$). We fix $N = 25$ and $T = 60$ as the smallest configuration within one standard error of the best MCC, yielding competitiv
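
To make the objective in Eq. (9) and the roles of $N$ and $T$ concrete, the following is a minimal pure-Python sketch of a binary GWO wrapper, not the authors' implementation. It assumes the common textbook update (leaders α, β, δ, a coefficient a that decays linearly from 2 to 0, and the absolute-distance form of the encircling step) and a sigmoid transfer function for binarization, none of which the text specifies. The `toy_score` function and the `INFORMATIVE` index set are hypothetical stand-ins for the cross-validated MCC of a classifier trained on the selected features.

```python
import math
import random

def binary_gwo(score, n_features, pack_size=25, iterations=60, lam=0.02, seed=0):
    """Maximize J(S) = score(S) - lam * |S| / |F| over binary feature masks."""
    rng = random.Random(seed)

    def objective(mask):
        if not any(mask):  # an empty subset cannot be scored
            return float("-inf")
        return score(mask) - lam * sum(mask) / n_features

    pack = [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pack_size)]
    best_mask, best_j = None, float("-inf")
    for t in range(iterations):
        ranked = sorted(pack, key=objective, reverse=True)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        j_alpha = objective(alpha)
        if j_alpha > best_j:  # remember the best mask seen so far
            best_mask, best_j = alpha[:], j_alpha
        a = 2.0 * (1 - t / iterations)  # decays linearly from 2 toward 0
        new_pack = []
        for wolf in pack:
            new_wolf = []
            for d in range(n_features):
                pulled = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    pulled += leader[d] - A * dist
                x = pulled / 3.0  # average of the three leader-guided moves
                # Sigmoid transfer maps the continuous position to P(bit = 1).
                prob = 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))
                new_wolf.append(1 if rng.random() < prob else 0)
            new_pack.append(new_wolf)
        pack = new_pack
    for wolf in pack:  # score the final pack as well
        j = objective(wolf)
        if j > best_j:
            best_mask, best_j = wolf[:], j
    return best_mask, best_j

# Hypothetical stand-in for MCC(S): fraction of "informative" features kept.
INFORMATIVE = (0, 3, 5)

def toy_score(mask):
    return sum(mask[i] for i in INFORMATIVE) / len(INFORMATIVE)

mask, j = binary_gwo(toy_score, n_features=12, seed=0)
print(mask, round(j, 4))
```

Each call to `objective` corresponds to one fitness evaluation, so a real wrapper that trains a classifier inside `score` pays that training cost once per wolf per iteration, which is why the pack size and iteration budget dominate runtime.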
