Combatting WordPress Form Spam: Effective Solutions

Security
May 22, 2024

Against Evolving Cybersecurity Challenges

Building a high-performance WordPress site matters, but securing it against spambots and injection attacks is equally critical for site owners. High-performance sites deliver better user experiences, attract more visitors, and support business objectives. Without adequate security measures, such as effective anti-spam solutions and hardened defenses against cyberattacks, a site is vulnerable to malicious activity that compromises user data and undermines its credibility.

For those who experienced the early days of the dot-com era, from the 1990s to the year 2000, the World Wide Web was vastly different from what we know today. Terms like spam or malware were far less familiar to everyday users, and the internet served primarily as a tool for navigation and accessing publicly available information. Email inboxes were used for correspondence without the interference of spam filters.

Fast forward to the present day, and our inboxes are inundated with spam, whether in work email or in personal accounts such as Gmail or Outlook. Despite the immense investments that major companies like Google and Microsoft have made to protect their users, these unwanted messages still find their way into our inboxes.

A Target for Spambots and Bad Actors

Given the popularity of WordPress as a CMS, it often becomes a target for bots and malicious actors seeking vulnerabilities in themes and plugins. Spambots, in particular, aim to spam comment sections and contact forms. Common anti-spam solutions like Google reCAPTCHA and Akismet are frequently used to mitigate spam. However, these measures are not foolproof, as bots continually evolve and become more sophisticated, outpacing the capabilities of many anti-spam plugins.

Based on our experience, the majority (70-80%) of form spammers fall into the category of basic script form spammers. These spammers typically fetch the form once and then initiate a looping submission process. They lack recorded page visits and do not execute JavaScript, employing simplistic methods to spam forms.

When we launched this site, we made a conscious decision to build our Contact Us form without a form plugin. While it’s entirely feasible to achieve the same functionality with CSS, HTML, and PHP, it does demand additional time and effort. The rationale behind this approach is straightforward: it grants us greater control over our form’s construction.

That being said, it’s not a matter of being opposed to plugins. On the contrary, plugins like Contact Form 7 or Gravity Forms are fantastic tools, offering an array of built-in features that streamline form creation on WordPress platforms effortlessly.

Building The Contact Us Form

Since we built our form from scratch, we needed to take some fundamental security measures into consideration. These include securing the form with nonces and sanitizing the input fields.

In WordPress, a nonce (number used once) is a security feature used to protect against certain types of malicious attacks, particularly CSRF (Cross-Site Request Forgery) attacks. A nonce is essentially a token that is generated and associated with a particular action or request. It ensures that the action being performed originates from the expected user and not from a malicious third party. Despite the name, a WordPress nonce is not strictly single-use: it is a time-limited hash that stays valid for up to 24 hours by default.

WordPress uses nonces primarily in forms and URLs that perform sensitive actions, such as saving settings, deleting posts, or updating user information. These nonces are included as hidden fields in forms or as parameters in URLs. When the form is submitted or the URL is accessed, WordPress verifies that the nonce provided matches the one it expects for that specific action. If the nonce is valid, the action is allowed to proceed; otherwise, the request is rejected.
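
To make the verification step concrete, here is a minimal sketch in plain PHP of a time-limited action token. It is not WordPress's implementation (in WordPress you would call wp_nonce_field() in the form and wp_verify_nonce() in the handler). The function names, the secret, and the 12-hour tick are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive a token for an action from a server-side
 * secret and a coarse time "tick", so the token expires on its own.
 */
function create_action_token(string $action, string $secret, int $now, int $tickSeconds = 43200): string
{
    $tick = intdiv($now, $tickSeconds);
    return hash_hmac('sha256', $tick . '|' . $action, $secret);
}

/**
 * Hypothetical helper: accept a token issued in the current or the
 * previous tick, and only for the same action and secret.
 */
function verify_action_token(string $token, string $action, string $secret, int $now, int $tickSeconds = 43200): bool
{
    foreach ([0, 1] as $age) {
        $expected = create_action_token($action, $secret, $now - $age * $tickSeconds, $tickSeconds);
        // hash_equals() compares in constant time to avoid timing leaks
        if (hash_equals($expected, $token)) {
            return true;
        }
    }
    return false;
}
```

Because the action name is part of the hash, a token issued for one action is rejected when replayed against another, which is the property the paragraph above relies on.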

On top of that, we also use regular expressions (regex) to validate email addresses, which supports data integrity, protection against malicious input, and a smooth user experience.

1. Ensuring Data Integrity

Format Validation: Regex can verify that the input matches the standard format of an email address (e.g., name@example.com). This ensures that the data stored in your system conforms to expected patterns, reducing errors and inconsistencies.

Consistency: By enforcing a specific format, regex helps maintain consistent data across your application, making it easier to manage and process email addresses.

2. Security

Protection Against Injection Attacks: Strict email validation with regex helps reduce exposure to injection attacks, such as SQL injection or email header injection. Malicious inputs often exploit poorly validated fields to inject harmful code, and a strict pattern rejects many of them by allowing only characters that are valid in email addresses. It is one layer of defense rather than a substitute for prepared statements and output escaping.

Mitigation of Spam and Phishing: Validating email addresses can reduce the likelihood of accepting addresses from automated scripts (bots) used for spamming or phishing. While regex alone won’t fully prevent such activities, it’s a crucial first step in a broader security strategy.

3. User Experience

Immediate Feedback: By validating email addresses on the client side (e.g., using JavaScript), you can provide immediate feedback to users, helping them correct mistakes before submission. This improves the user experience and reduces frustration.

Reduction in Failed Communications: Ensuring that email addresses are valid and correctly formatted reduces the chances of failed email communications, such as bounced emails, which can affect both user experience and your application’s functionality.
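
As a rough illustration of the format and security points above, the sketch below validates an address with a conservative pattern. The function name and the pattern are our assumptions, not a full RFC 5322 validator, and some unusual but legal addresses will be rejected.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true when $email looks like a plausible address.
 */
function is_plausible_email(string $email): bool
{
    // \A and \z anchor to the true start and end of the string. A plain "$"
    // also matches just before a trailing newline, which would let
    // "user@example.com\n" through and open the door to header injection.
    $pattern = '/\A[a-z0-9](?:[a-z0-9._+-]*[a-z0-9])?'   // local part
             . '@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+' // one or more domain labels
             . '[a-z]{2,}\z/i';                          // top-level domain

    // 254 characters is the commonly cited upper bound for an address
    return strlen($email) <= 254 && preg_match($pattern, $email) === 1;
}
```

PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL, or WordPress's is_email(), are reasonable alternatives if you prefer not to maintain a pattern yourself.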

When combining all of that, it looks something like this:


// Check if the form has been submitted
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['submit'])) {
    try {
        // Initialize rate limiter and limit requests
        $rateLimiter = new RateLimiter();
        $rateLimiter->limit($_SERVER['REMOTE_ADDR']);

        // Check nonce for security
        $nonce = sanitize_text_field(wp_unslash($_POST['contact_nonce'] ?? ''));
        if (!wp_verify_nonce($nonce, 'submit_contact')) {
            $formMessage = "Nonce verification failed. Please try again.";
        } elseif (!empty($_POST['middlename'])) {
            // Honeypot field check - should be empty
            $formMessage = "Bot detected.";
        } else {
            // Sanitize input data; wp_unslash() removes the slashes WordPress adds to $_POST
            $fname = sanitize_text_field(wp_unslash($_POST['fname'] ?? ''));
            $lname = sanitize_text_field(wp_unslash($_POST['lname'] ?? ''));
            $email = sanitize_email(wp_unslash($_POST['email'] ?? ''));
            $phone = sanitize_text_field(wp_unslash($_POST['phone'] ?? ''));
            $company_website = sanitize_text_field(wp_unslash($_POST['company_website'] ?? ''));
            $business_type = sanitize_text_field(wp_unslash($_POST['business_type'] ?? ''));
            $message_content = sanitize_textarea_field(wp_unslash($_POST['message_content'] ?? ''));

            // Validate email with regex. The optional group allows one-character
            // local parts, and \z (unlike $) does not match before a trailing newline.
            if (!preg_match('/\A[a-zA-Z0-9](?:[a-zA-Z0-9._-]*[a-zA-Z0-9])?@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\z/', $email)) {
                $emailError = 'Please check your email address.';
            } else {
                // Continue with your form processing
            }
        }
    } catch (LimitExceeded $e) {
        $formMessage = "Too many submissions. Please wait before trying again.";
    }
}

We considered Google reCAPTCHA, since it has been around for a long time and is well known for its anti-spam protection. However, we found it a poor fit where GDPR compliance matters, and its challenges can hurt the user experience. Cloudflare Turnstile emerged as an outstanding alternative to Google reCAPTCHA.

Turnstile is available at no cost and can be integrated into your website regardless of whether you’re using Cloudflare or not. Unlike conventional CAPTCHA systems, Turnstile offers an invisible mode, which eliminates the need for users to solve bothersome puzzles and keeps the user experience seamless. Turnstile is also lightweight, adheres to GDPR regulations, and operates without relying on cookies.
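
The widget alone is not enough: the token it places in the cf-turnstile-response form field has to be checked on the server against Cloudflare's siteverify endpoint. Below is a minimal sketch of that check in plain PHP. The function names are hypothetical, and the transport is injectable so the logic can be exercised without network access. In WordPress you would more likely send the request with wp_remote_post().

```php
<?php
declare(strict_types=1);

const TURNSTILE_VERIFY_URL = 'https://challenges.cloudflare.com/turnstile/v0/siteverify';

/**
 * Default transport (hypothetical helper): POST form fields and return the
 * raw response body, or null on network failure.
 */
function http_post_form(string $url, array $fields): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields),
            'timeout' => 5,
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

/**
 * Hypothetical helper: true only when Cloudflare confirms the token.
 */
function verify_turnstile(string $token, string $secret, ?string $remoteIp = null, ?callable $post = null): bool
{
    if ($token === '') {
        return false; // Widget did not run or the field was stripped
    }
    $fields = ['secret' => $secret, 'response' => $token];
    if ($remoteIp !== null) {
        $fields['remoteip'] = $remoteIp;
    }
    $post = $post ?? 'http_post_form';
    $body = $post(TURNSTILE_VERIFY_URL, $fields);
    if ($body === null) {
        return false; // Fail closed on transport errors
    }
    $result = json_decode($body, true);
    return is_array($result) && ($result['success'] ?? false) === true;
}
```

A handler would call something like verify_turnstile($_POST['cf-turnstile-response'] ?? '', $secretKey, $ip) before processing the submission, where $secretKey is the secret key issued for your site.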

Adding Extra Security Layers

Given the sophisticated nature of spambots, we further tightened our Contact form security with both honeypot and rate-limiting methods, the latter also referred to as throttling.

Honeypots serve as hidden form fields within web forms, strategically placed to intercept spam submissions. These fields are embedded in the form’s HTML code but are invisible to human users, often through the use of CSS to hide them from view. The concept relies on the fact that legitimate users will never see or interact with these hidden fields. However, spambots, which typically fill out all fields in a form indiscriminately, will inadvertently populate the honeypot fields.

When the form is submitted, our server-side processing script checks these honeypot fields. If it detects that any of these fields contain data, it recognizes the submission as likely spam and discards it immediately. This method is effective against many basic spambots that do not differentiate between visible and hidden fields, thereby reducing the number of unwanted submissions.
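
The server-side check can be as small as the sketch below. The function name is hypothetical. The default field name middlename matches the honeypot field in our form handler, and any other names you pass in are your own choice.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when any honeypot field arrived non-empty.
 * $post is the submitted form data (for example $_POST).
 */
function honeypot_tripped(array $post, array $honeypotFields = ['middlename']): bool
{
    foreach ($honeypotFields as $field) {
        $value = $post[$field] ?? '';
        // A real visitor never sees the field, so anything other than an
        // empty (or whitespace-only) string is treated as a bot signal.
        if (!is_string($value) || trim($value) !== '') {
            return true;
        }
    }
    return false;
}
```

When the check trips, it is usually better to discard the submission quietly than to explain why, so the bot's operator learns nothing about which field gave it away.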

Despite their effectiveness, honeypots have limitations. More advanced spambots, or automated tools like XRumer, are designed to recognize and bypass honeypot fields by mimicking human behavior more closely. These sophisticated tools can ignore hidden fields, rendering honeypots less effective.

To address this limitation, we employed server-side rate limiting to prevent abuse of our form submission system. Our rate-limiting function controls the number of requests allowed from a single IP address within a specified time period. This prevents spammers or malicious actors from overwhelming our system with excessive submissions.

Controlling Form Submission Limit

Rate limiting works by tracking the number of requests made by each IP address in a distributed key-value store such as Redis. When a request arrives, the function increments a counter associated with the requester’s IP address. If the count exceeds a predefined threshold within a set time window (typically measured in seconds or minutes), the function raises a rate-limit-exceeded error.
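
To show the counting logic on its own, here is a small in-memory sketch of a fixed-window counter. A PHP array stands in for Redis, so the state only lives for one PHP process. The class name and the defaults of 3 requests per 60 seconds are assumptions for illustration, and the clock is passed in so the behavior is easy to test.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory fixed-window counter, for illustration and tests only.
 */
final class FixedWindowCounter
{
    /** @var array<string, array{count: int, expiresAt: int}> */
    private array $buckets = [];
    private int $maxRequests;
    private int $windowSeconds;

    public function __construct(int $maxRequests = 3, int $windowSeconds = 60)
    {
        $this->maxRequests = $maxRequests;
        $this->windowSeconds = $windowSeconds;
    }

    /** Record one request for $ip at time $now; return true if it is allowed. */
    public function allow(string $ip, int $now): bool
    {
        $bucket = $this->buckets[$ip] ?? null;
        if ($bucket === null || $now >= $bucket['expiresAt']) {
            // First request of a new window: start counting and fix the expiry
            $bucket = ['count' => 0, 'expiresAt' => $now + $this->windowSeconds];
        }
        $bucket['count']++;
        $this->buckets[$ip] = $bucket;

        return $bucket['count'] <= $this->maxRequests;
    }
}
```

The window starts at the first request and does not move when later requests arrive, so a blocked client is allowed again once the original window ends.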

Using Redis to store key-value pairs, such as IP addresses, offers several advantages over using PHP sessions for rate-limiting:

1. Performance

Speed: Redis is an in-memory data store, which means it can read and write data extremely quickly compared to PHP sessions that typically rely on file storage or database systems.

Efficiency: Its efficient handling of in-memory operations makes Redis ideal for high-throughput scenarios like rate-limiting.

2. Scalability

Distributed Nature: Redis can be distributed across multiple servers, allowing for horizontal scaling. This is crucial for handling large volumes of traffic and maintaining consistent performance.

Consistency Across Multiple Servers: In a distributed system, using Redis ensures that the rate-limiting logic is consistent across all servers handling requests, whereas PHP sessions are typically stored locally or in a session management system that may not scale as effectively.

3. Data Persistence and Expiration

Built-in Expiry: Redis natively supports setting expiration times for keys, which is useful for implementing rate-limiting by automatically expiring the count of requests after a specified period.

Persistence Options: While primarily an in-memory store, Redis can also persist data to disk, providing a balance between speed and durability.

4. Ease of Use and Flexibility

Simple API: Redis provides a straightforward API for setting, getting, and managing key-value pairs, making it easy to implement and maintain rate-limiting logic.

Advanced Data Structures: Redis supports various data structures like lists, sets, and hashes, which can be useful for more complex rate-limiting strategies.

5. Robustness

Atomic Operations: Redis operations can be atomic, ensuring that read-modify-write cycles (like incrementing request counts) are safe and free from race conditions.

High Availability: Redis supports replication, clustering, and failover mechanisms, enhancing the reliability of your rate-limiting solution.

Redis is often used in WordPress to relieve performance bottlenecks by storing frequently accessed data in memory, which improves response times and scalability. It is also versatile, serving uses such as caching, session storage, and real-time analytics.

In our case, we use it to store a short-lived request counter for each IP address.

Below is the source code of the rate-limiting function. You can find the complete source code on GitHub.


class LimitExceeded extends Exception {
    public function __construct($message = "Rate limit exceeded", $code = 429, $previous = null) {
        parent::__construct($message, $code, $previous);
    }
}

class RateLimiter {
    private $redis;

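    public function __construct() {
        $this->redis = new Redis();
        try {
            // Depending on the phpredis version, connect() may return false or throw
            if (!$this->redis->connect('127.0.0.1', 6379)) {
                throw new Exception("Redis connect() returned false");
            }
        } catch (Exception $e) {
            error_log($e->getMessage());
            throw new Exception("Failed to connect to Redis", 0, $e);
        }
    }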

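    public function limit($ipAddress) {
        $key = "rate_limit:" . $ipAddress;
        $current = $this->redis->incr($key);
        if ($current === 1) {
            // Start the one-minute window on the first request only.
            // Resetting the TTL on every request would keep extending the window.
            $this->redis->expire($key, 60);
        }
        if ($current > 3) {
            throw new LimitExceeded();
        }
    }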
}

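// Note: CF-Connecting-IP and X-Forwarded-For are request headers that a client can forge.
// Only trust them when the server accepts traffic exclusively from your proxy.
function getRealUserIp() {
    if (!empty($_SERVER['HTTP_CF_CONNECTING_IP'])) {
        $candidate = $_SERVER['HTTP_CF_CONNECTING_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $ipList = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
        $candidate = trim($ipList[0]);
    } else {
        $candidate = $_SERVER['REMOTE_ADDR'];
    }
    // Fall back to the socket address if the header value is not a valid IP
    return filter_var($candidate, FILTER_VALIDATE_IP) ? $candidate : $_SERVER['REMOTE_ADDR'];
}

// Only apply rate limiting for POST requests (form submissions)
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    try {
        $rateLimiter = new RateLimiter();
        $ipAddress = getRealUserIp();
        $rateLimiter->limit($ipAddress);
    } catch (LimitExceeded $exception) {
        http_response_code(429);
        die("Rate Limit Exceeded. Please wait before trying again.");
    } catch (Exception $exception) {
        // Redis is unavailable: log it and let the request through (fail open)
        error_log($exception->getMessage());
    }
}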

// Continue with your form processing or other logic here

Yet, we acknowledge the limitations of implementing rate limits on our Contact form, given that many bots use proxies with different IP addresses to evade rate limiting and blocklisting, as well as to potentially distribute the load across multiple servers [1].

However, we still think that reducing spam or abuse in our form submissions is essential for upholding the integrity of our website and conserving resources for authentic user interactions.

Other Possible Solutions

We could also integrate third-party email validation APIs like Mailgun, SendGrid, or ZeroBounce into our Contact form, in addition to using regular expressions (regex) for validation before submission. This would help ensure that only valid and working email addresses are accepted, thereby reducing spam and improving the quality of form submissions. This additional checkpoint not only safeguards our form submission process but also enhances user experience by minimizing submission errors.

Other add-ons we could implement include placing a proxy in front of the /form/submit endpoint to check for the presence of cookies, as many spammers neglect to include them, and embedding JavaScript within the form to automatically add a timestamp value in a hidden field. These additional layers of validation make it more difficult for automated spam bots to successfully submit the form undetected.
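
The timestamp idea could be checked on the server roughly as follows. This is a sketch under our own assumptions: the function name is hypothetical, the hidden field is assumed to carry Unix seconds as text, and the 3-second and 1-hour bounds are placeholders to tune per form.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decide whether the time between rendering the form
 * and submitting it looks like a human filled it in.
 *
 * @param mixed $renderedAt Raw value of the hidden timestamp field
 */
function timestamp_looks_human($renderedAt, int $now, int $minSeconds = 3, int $maxSeconds = 3600): bool
{
    // Missing or malformed value: the JavaScript that sets the field did not run
    if (!is_string($renderedAt) || preg_match('/\A\d{1,10}\z/', $renderedAt) !== 1) {
        return false;
    }
    $elapsed = $now - (int) $renderedAt;

    // Too fast suggests a script; too old (or in the future) suggests a replayed or forged value
    return $elapsed >= $minSeconds && $elapsed <= $maxSeconds;
}
```

A plain timestamp can be forged by a bot that reads the page source, so in practice it is worth signing the value on the server or treating this check as one signal among several.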

Conclusion

While our anti-spam form implementation methods, including Turnstile, honeypot, and rate limiting, have proven effective in reducing spam emails in our Contact form, the most challenging spammers utilize headless browsers. These sophisticated bots possess cookies, execute JavaScript, and closely resemble regular users.

Combating them often requires advanced anti-spam and bot software, such as Cloudflare Bot Management, which leverages TLS fingerprinting combined with threat intelligence to combat malicious bot traffic. TLS fingerprinting involves analyzing subtle variations in the cryptographic handshake during the establishment of a TLS connection, enabling the identification and classification of different clients, such as web browsers or bots.

The use of Machine Learning (ML) and Artificial Intelligence (AI) to detect and prevent spam and malicious bots is gaining momentum. These trained models learn from vast amounts of data, adapt to new threats, and automate complex detection and response processes, showing significant potential in combating malicious bots and online payment fraud. However, as with Cloudflare’s Enterprise Bot Management, such ML models are expensive, requiring significant upfront costs as well as ongoing expenses for operation and maintenance. Many small businesses and companies lack the resources to pursue such endeavors. Therefore, they must use other means at their disposal to combat cyberattacks and keep their applications as secure as possible.

Source [1]: https://ieeexplore.ieee.org/abstract/document/9519384