OSINT experiment: Trying to scrape completed contact forms

I recently completed a contact form on a technology website to send in a query.

The site looked like it was built on WordPress.

After populating the usual fields, rather than displaying an email confirmation or populating a dropdown with some standard confirmation text, the site instead displayed a copy of my query on a confirmation URL.

The URL structure looked familiar to me.

I was almost certain it was Contact Form 7 — which I used to use before jumping ship to WP Forms, which is the form embedded in this site.

I was wrong. There are services for detecting which themes and plugins a WordPress site is running, but in this case all I had to do was inspect the source HTML:

The confirmation URL looked like this:

http://www.website.com/contact-us?contact-form-id=243&contact-form-sent=24245&contact-form-hash=353533598988958&_wpnonce=cdf79b8067#contact-form-865

So we can clearly identity four variables:

  • The contact form ID (a three digit number). Its raw string is contact-form-id
  • The contact form confirmation number (a five digit number). Its raw string is contact-form-sent
  • A 40 character hash, in UTF-8. Its raw string is contact-form-hash.
  • A second unique form identifier: ‘Wpnonce’

The contact form hash alone is enough to prevent people from being able to access other users’ completed forms simply be polling URLs with sequential contact form IDs.

I presumed that the plugin was generating these confirmation messages with the appropriate SEO tags.

But —being the naturally inquisitive type, this immediately made me wonder: are these confirmation URLs ephemeral? And even if they are supposed to be in the deep web, might any have slipped through the net and be retrievable in the wild?

I went searching.


Query Building


After analyzing the URL structure of the confirmation page, my next task was building a Google query that might dredge these up.

Clearly the confirmation messages might vary, so polling for a specific URL pattern seemed like a better approach than looking up text strings.

I figured that "contact-us?contact-form-id=" should be common to all form completions using this WordPress plugin.

First query, no success:

The URLs either don’t seem to persist or are well de-indexed.

But simply searching for the text string dug up some interesting information — potentially content created through integrations:

A form has been partially completed and held online. But it looks like a bot submission:

Unfortunately, directly looking for queries doesn’t yield any obviously exposed form completions.

I decide to look for inurl: contact-form-hash and I start to find actual form confirmations:

Unfortunately these sites haven’t piped the form output to the confirmation page.

And if they did, what’s a word somebody might use in a form? Regards, perhaps?

Note: any time you run advanced operators like this, you are virtually guaranteed to start getting this:

Which merely draws up more Russian spam:

Unfortunately these are not cached in Google:

One interesting observation is that the backend fields are sometimes captured by search engines, probably due to a misconfiguration:

We can find some more by searching for:

Or:


Outcome

This OSINT adventure didn’t yield too much in the way of useful information — but it did reassure me that my form completions are probably not being scraped. But every challenge like this is certainly an interesting experiment.

Some things I picked up from this:

  • Contact form completion URLs are generally properly deindexed and so are on the deep web rather than the open internet — as they should be.
  • The URL structure includes several unique tokens, including a hash, which would make attempting to guess live completion URLs, or brute force them, basically impossible.
  • The only completion available ‘in the wild’ on the open web are from Russian spammers.
  • Sometimes, form fields that should be kept within the backend are exposed to Google, making it possible to identify site owners’ private email addresses.