🔴 Day 9 — Google Dorking & OSINT

Before a professional ethical hacker touches a single tool against a target, they spend hours — sometimes days — gathering information using only publicly available sources. This phase is called reconnaissance. And it starts somewhere most people would never think to look: Google.

Today you will learn to use Google the way security professionals do — not as a search engine, but as an intelligence tool. You’ll be surprised how much sensitive information is sitting in plain sight, indexed and searchable, waiting to be found by anyone who knows the right operators.

Google Dorking isn’t about hacking Google. It’s about using Google’s own features — features designed for finding specific types of content — to surface information that organisations accidentally expose to the public internet. Login pages left unprotected. Configuration files indexed by crawlers. Sensitive documents uploaded to public servers. Internal directories browsable without authentication.

I always tell students: the best recon happens before you touch a network. Everything you learn from Google, WHOIS, certificate transparency, and LinkedIn requires zero interaction with the target’s systems. That means it’s completely passive and creates no logs on their end, as long as you read the search results rather than clicking through to the target’s own pages. It also gives you intelligence that shapes everything you do next.

⚖️ The Legal Framework — Read This First

Google Dorking only searches publicly available, indexed information, so running the searches is generally legal. However, there are clear lines to respect:

Searching for information is legal — it’s what Google is designed for
Using found public information in authorised security assessments is standard practice
Reporting exposed information to the organisation responsibly is ethical and often welcomed
Accessing credentials or systems found through dorking without permission is illegal
Downloading confidential documents you were not intended to access crosses legal lines
Using found vulnerabilities to attack systems without authorisation is a criminal offence


What Is OSINT — And Why Every Pentest Starts Here

OSINT stands for Open Source Intelligence — information gathered from publicly available sources. The term comes from intelligence agencies, where “open source” means publicly accessible rather than classified. In ethical hacking, OSINT is the passive reconnaissance phase — you learn as much as possible about a target without sending a single packet to their network.

Why does it come first? Because the more you know going in, the more targeted and efficient your active testing becomes. A company’s tech stack, their employee names and emails, their domain structure, their software versions, their publicly exposed files — all of this shapes your attack surface map before you’ve written a single command. Intelligence professionals have a saying: “Time spent in reconnaissance is seldom wasted.” It applies equally here.

🔍
Google Dorking
Find publicly indexed sensitive files, login pages, exposed directories

📡
Shodan
Search internet-connected devices, find exposed services globally

📧
theHarvester
Collect email addresses, subdomains, IPs from public sources

🌐
WHOIS & DNS
Domain ownership, name servers, IP ranges, registration data

🔐
Cert Transparency
Find subdomains that appear in public SSL/TLS certificate logs


Core Google Search Operators — The Building Blocks

Google search operators are special commands that filter and target search results with precision. You’ve probably used some of them without knowing they were called “operators.” In security work, we combine them to find very specific types of content. Here are the operators every ethical hacker needs to know.

Operator | What It Does | Example
--- | --- | ---
site: | Restrict results to one domain | site:target.com
filetype: | Find specific file types | filetype:pdf site:target.com
inurl: | Find pages with keyword in URL | inurl:admin site:target.com
intitle: | Find pages with keyword in title | intitle:"index of" site:target.com
intext: | Find pages containing text in body | intext:"password" filetype:log
cache: | View Google’s cached version of a page (Google retired cached pages in 2024, so this no longer works) | cache:target.com/page
related: | Find sites similar to a domain | related:target.com
" " | Exact phrase match | "index of /backup"
- (minus) | Exclude a term or operator from results | site:target.com -site:www.target.com
OR | Match either term | filetype:sql OR filetype:db
💡 Combining operators: The real power comes from combining operators. site:target.com filetype:pdf intitle:"confidential" finds PDF files on a specific domain whose title contains “confidential.” Each operator narrows the result set — stacking them produces surgical precision.
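If you keep dorks in notes or scripts, a tiny helper makes the stacking explicit. Below is a minimal Python sketch. build_dork and google_search_url are hypothetical helper names. The code only assembles the query string and a URL-encoded search link. It sends no requests, so deciding what to open, and when, stays a manual step inside your authorised scope.

```python
from urllib.parse import quote_plus


def build_dork(*terms: str) -> str:
    """Join operator terms with single spaces; each extra term narrows the results."""
    return " ".join(term.strip() for term in terms if term.strip())


def google_search_url(dork: str) -> str:
    """Return a Google search URL for the dork. Nothing is sent; open it yourself."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


dork = build_dork("site:target.com", "filetype:pdf", 'intitle:"confidential"')
print(dork)  # site:target.com filetype:pdf intitle:"confidential"
print(google_search_url(dork))
```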

Dorking by Category — What Security Professionals Actually Look For

Now let’s combine those operators into real dork queries, organised by what they find. In an authorised security assessment, these help you understand what a target has accidentally exposed. I’ve structured these by category so you understand the intent behind each query — not just the syntax.

🗂️ Finding Exposed Files & Documents

File exposure dorks — for use on authorised assessment targets only
# Find sensitive documents accidentally indexed
site:target.com filetype:pdf
site:target.com filetype:xls OR filetype:xlsx
site:target.com filetype:doc OR filetype:docx
# Configuration and data files often containing credentials
site:target.com filetype:env
site:target.com filetype:conf OR filetype:config
site:target.com filetype:sql
site:target.com filetype:log
site:target.com filetype:bak OR filetype:backup
# Find files that contain sensitive keywords
site:target.com intext:"confidential" filetype:pdf
site:target.com filetype:xls intext:"password"
intext:"username" intext:"password" filetype:log

🔓 Finding Login Pages & Admin Panels

Login and admin panel discovery
# Admin panels and control pages
site:target.com inurl:admin
site:target.com inurl:login
site:target.com inurl:dashboard
site:target.com inurl:wp-admin
site:target.com inurl:phpmyadmin
site:target.com intitle:"admin" inurl:admin
# Specific application login pages
site:target.com intitle:"Webmin"
site:target.com intitle:"cPanel"
site:target.com inurl:"/jenkins"
site:target.com inurl:"/jira"

📁 Finding Open Directories

Open directory listing — browsable folders on web servers
# "Index of" is the title of auto-generated Apache/Nginx directory listings
intitle:"index of" site:target.com
intitle:"index of" "parent directory" site:target.com
# Find open directories with specific file types
intitle:"index of" ".env" site:target.com
intitle:"index of" "backup" site:target.com
intitle:"index of" "/uploads" site:target.com
# What you find in open directories:
# Source code, database dumps, configuration files,
# uploaded user files, application backups
# These findings are high-severity in any security assessment
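Once a listing like this is confirmed in scope, triaging it by hand gets slow. The sketch below uses Python’s standard html.parser. It assumes the usual Apache/Nginx autoindex markup, where each entry is a plain <a href> link. The RISKY_SUFFIXES list is an illustrative assumption. The sketch parses HTML you already have, because fetching the page yourself is active interaction with the target.

```python
from html.parser import HTMLParser

# Illustrative suffixes worth a closer look (an assumption, extend as needed)
RISKY_SUFFIXES = (".env", ".sql", ".bak", ".zip", ".tar.gz", ".conf", ".config")


class IndexOfParser(HTMLParser):
    """Collect the page title and entry links from an auto-generated directory listing."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.entries = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Skip Apache sort links (?C=N;O=D), parent-directory links and absolute URLs
            if href and not href.startswith(("?", "/", "../", "http")):
                self.entries.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_listing(html):
    """Return (title, entries) for a saved 'Index of' page."""
    parser = IndexOfParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.entries


def risky_entries(entries):
    """Keep only entries whose names end in one of RISKY_SUFFIXES."""
    return [e for e in entries if e.lower().endswith(RISKY_SUFFIXES)]


sample = """<html><head><title>Index of /backup</title></head><body>
<h1>Index of /backup</h1><pre>
<a href="?C=N;O=D">Name</a> <a href="/">Parent Directory</a>
<a href="site-2023.tar.gz">site-2023.tar.gz</a>
<a href="db.sql">db.sql</a>
<a href="readme.txt">readme.txt</a>
</pre></body></html>"""

title, entries = parse_listing(sample)
print(title)                   # Index of /backup
print(risky_entries(entries))  # ['site-2023.tar.gz', 'db.sql']
```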

🖥️ Finding Exposed Technology & Error Pages

Technology fingerprinting through public pages
# Error pages reveal software versions and server paths
site:target.com "PHP Parse error"
site:target.com "Microsoft OLE DB Provider" "error"
site:target.com intext:"mysql_fetch_array()"
site:target.com "Traceback (most recent call last)"
# Version-specific application pages
site:target.com "powered by WordPress" inurl:wp-content
site:target.com intitle:"Apache Tomcat" "HTTP Status"
site:target.com intitle:"phpinfo()"
# phpinfo() pages expose PHP config, server paths, and sensitive settings
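The same signatures work in reverse when you triage pages collected during an authorised assessment. This minimal sketch assumes you already have the page body as a string. ERROR_SIGNATURES is a hypothetical, deliberately small set built from the dorks above, not a complete fingerprinting database. The type hints need Python 3.9 or later.

```python
import re

# Hypothetical, minimal signature set taken from the dorks above
ERROR_SIGNATURES = {
    "PHP": re.compile(r"PHP (Parse|Fatal) error"),
    "Python": re.compile(r"Traceback \(most recent call last\)"),
    "MySQL via PHP": re.compile(r"mysql_fetch_array\(\)"),
    "Microsoft OLE DB": re.compile(r"Microsoft OLE DB Provider"),
    "phpinfo() page": re.compile(r"<title>[^<]*phpinfo\(\)", re.IGNORECASE),
}


def fingerprint(body: str) -> list[str]:
    """Return the names of every signature that appears in a page body."""
    return [name for name, pattern in ERROR_SIGNATURES.items() if pattern.search(body)]


print(fingerprint("PHP Parse error: syntax error in /var/www/html/index.php on line 12"))  # ['PHP']
```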

🔑 Finding Credentials & API Keys in Public Code

Code repositories and public pastebins — common credential leakage vectors
# GitHub — developers accidentally commit secrets
site:github.com "target.com" "password"
site:github.com "target.com" "api_key"
site:github.com "target.com" "secret_key"
site:github.com "target.com" ".env"
# Pastebin and similar sites
site:pastebin.com "target.com" "password"
site:pastebin.com "@target.com"
# Why this works: developers push code to GitHub including .env files
# containing database passwords, AWS keys, API tokens
# This is one of the most common causes of serious breaches
# It’s a primary target in every bug bounty recon phase
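Defenders can run this kind of check on their own repositories before anything is pushed. This is a minimal sketch of pattern-based secret detection. The three rules in SECRET_PATTERNS are illustrative assumptions. Dedicated scanners such as gitleaks or TruffleHog ship far larger, tested rulesets. The sample key is AWS’s published documentation example, not a real credential.

```python
import re

# Illustrative rules only (an assumption, not a complete ruleset)
SECRET_PATTERNS = {
    # AWS access key IDs: AKIA (long-term) or ASIA (temporary) plus 16 characters
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # key = value assignments for common secret-looking variable names
    "Generic API key/secret assignment": re.compile(
        r"(?i)\b(api_key|secret_key|password|db_password)\s*[=:]\s*['\"]?([^\s'\"]{6,})"
    ),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text):
    """Yield (line_number, rule_name) for every line that matches a rule."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                yield lineno, name


sample_env = """APP_NAME=shop
DB_PASSWORD=Winter2024!
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
DEBUG=false"""

for lineno, rule in scan_text(sample_env):
    print(lineno, rule)
```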


The Google Hacking Database — Thousands of Ready-Made Dorks

You don’t need to build every dork from scratch. The Google Hacking Database (GHDB) — maintained by Exploit-DB — is an archive of thousands of proven dork queries, each tagged by category. Security researchers contribute their discoveries and the database grows continuously.

Footholds
Entry points and vulnerable login pages

Files w/ Passwords
Credentials in indexed documents

Sensitive Dirs
Open directory listings

Vuln Servers
Pages revealing servers with known vulnerabilities

Error Messages
Verbose errors revealing internals

Using the GHDB — how to search and apply dorks
# Step 1: Visit the GHDB
https://www.exploit-db.com/google-hacking-database
# Step 2: Filter by category (e.g. "Files Containing Passwords")
# Step 3: Pick a dork relevant to your target's technology
# Step 4: Add "site:target.com" to scope it to your authorised target
# Example GHDB dork → scoped to a specific target
# GHDB entry: intitle:"index of" "wp-config.php"
intitle:"index of" "wp-config.php" site:target.com
# wp-config.php contains WordPress database credentials
# Finding it in an open directory = critical severity
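Scoping a batch of GHDB dorks (Step 4 above) is easy to script. This minimal sketch assumes each dork is a plain string copied from the GHDB. scope_dork and scope_all are hypothetical names. The code replaces any existing positive site: term with your authorised domain, keeps -site: exclusions, and drops duplicates.

```python
import re


def scope_dork(dork: str, domain: str) -> str:
    """Replace any positive site: term with the authorised domain.

    Exclusions such as -site:www.example.org are kept as they are.
    """
    without_site = re.sub(r"(?<![-\w])site:\S+", "", dork)
    return " ".join(without_site.split() + [f"site:{domain}"])


def scope_all(dorks, domain):
    """Scope every dork and drop duplicates while keeping the original order."""
    return list(dict.fromkeys(scope_dork(d, domain) for d in dorks))


ghdb_dorks = [
    'intitle:"index of" "wp-config.php"',
    "inurl:phpmyadmin site:example.org",
    'intitle:"index of" "wp-config.php"',  # duplicates are common when merging lists
]
for query in scope_all(ghdb_dorks, "target.com"):
    print(query)
```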


Beyond Google — The OSINT Toolkit

Google is powerful but it’s just the start. Professional OSINT uses a collection of tools — each surfacing a different type of publicly available information. All of these tools work with data that’s already public; none of them require touching the target’s systems.


TOOL
Shodan — The Internet’s Device Search Engine

Shodan is a search engine that continuously scans the internet and indexes what it finds — not web pages, but the services running on open ports. It is one of the most powerful passive reconnaissance tools for finding internet-connected devices, exposed services, and vulnerable systems — all without touching anything yourself.

Shodan — searches that reveal exposed infrastructure
# Search at shodan.io — free account gives basic access
https://www.shodan.io
# Find all hosts for a specific organisation
org:"Target Company Ltd"
hostname:target.com
# Find specific vulnerable software versions
product:"Apache" version:"2.2.8"
product:"vsftpd" version:"2.3.4"
# These find servers still running the same versions as Metasploitable 2
# Find exposed databases
port:3306 org:"Target Company" # MySQL exposed
port:5432 org:"Target Company" # PostgreSQL exposed
port:27017 org:"Target Company" # MongoDB exposed
# Shodan CLI — use from Kali terminal
pip3 install shodan
shodan init YOUR_API_KEY
shodan host 203.0.113.10 # Full info on a specific IP
shodan search --limit 10 "apache 2.2.8"
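The shodan package installed above also has a Python API. This minimal sketch assumes the library’s Shodan class and its search() method, which returns a dictionary with total and matches keys. summarise_matches is a hypothetical helper that works on plain dictionaries, so you can test it offline without an API key.

```python
import ipaddress


def _sort_key(row):
    """Sort by IP version, then numerically by address, port and product."""
    address = ipaddress.ip_address(row[0])
    return (address.version, address, row[1], row[2])


def summarise_matches(matches):
    """Reduce Shodan search matches (banner dicts) to unique (ip, port, product) rows."""
    rows = {(m["ip_str"], m["port"], m.get("product", "")) for m in matches}
    return sorted(rows, key=_sort_key)


def search_shodan(api_key, query):
    """Run a search with the official shodan package; needs network access and an API key."""
    import shodan  # imported here so summarise_matches works without the package installed

    api = shodan.Shodan(api_key)
    try:
        results = api.search(query)
    except shodan.APIError as exc:
        raise SystemExit(f"Shodan error: {exc}")
    print(f"Total results: {results['total']}")
    return summarise_matches(results["matches"])
```

Usage, with network access and your own key: search_shodan("YOUR_API_KEY", "hostname:target.com").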

🎯 Why Shodan matters for defenders: Shodan sees your internet-exposed infrastructure the same way attackers do. Running a Shodan search for your own organisation’s IP ranges before a security assessment tells you exactly what attackers see — and often reveals forgotten servers, exposed services, and outdated software that nobody on the internal team knew was reachable.

TOOL
theHarvester — Email Addresses & Subdomains

theHarvester automates the collection of email addresses, subdomains, and employee names from public sources such as search engines, DNS, and third-party data services (the available sources vary between versions). It’s pre-installed on Kali and produces structured output you can use directly in follow-on social engineering assessments or subdomain scanning.

theHarvester — email and subdomain collection
# Basic syntax
theHarvester -d target.com -b google
# -d = target domain, -b = data source
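# Use multiple sources simultaneously; available sources differ between versions,
# so run theHarvester -h to list yours (some, such as hunter, need an API key)
theHarvester -d target.com -b google,bing,linkedin,hunter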
# Save results to a file
theHarvester -d target.com -b all -f ~/Day9/harvester_results
# Sample output
[*] Emails found:
j.smith@target.com
admin@target.com
hr@target.com
[*] Subdomains found:
mail.target.com
dev.target.com
staging.target.com
vpn.target.com
# dev.target.com → often runs older, less-patched software
# staging.target.com → may lack production security controls
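Once theHarvester (or crt.sh) hands you a subdomain list, the first triage pass in the annotations above is easy to automate. In this minimal sketch, the INTERESTING keywords and reasons are hypothetical, taken from the notes in this section. Matching is on whole labels, so developer.target.com is not mistaken for dev.

```python
# Hypothetical keyword list based on the notes above, not a standard
INTERESTING = {
    "dev": "development environment, often older or less-patched software",
    "staging": "may lack production security controls",
    "test": "test environments are often less secure",
    "internal": "name suggests it was not meant to be public",
    "legacy": "potentially old software",
    "vpn": "remote-access entry point",
    "api": "API surface worth mapping",
}


def flag_subdomains(subdomains, domain):
    """Return {subdomain: [reasons]} for in-scope subdomains whose labels match a keyword."""
    flagged = {}
    for sub in sorted({s.strip().lower() for s in subdomains}):
        if not sub.endswith("." + domain):
            continue  # out of scope
        # "legacy-app.target.com" -> ["legacy", "app"]
        labels = sub[: -len(domain) - 1].replace("-", ".").split(".")
        reasons = [why for key, why in INTERESTING.items() if key in labels]
        if reasons:
            flagged[sub] = reasons
    return flagged


subs = ["mail.target.com", "dev.target.com", "staging.target.com",
        "vpn.target.com", "legacy-app.target.com", "cdn.other.com"]
for sub, reasons in flag_subdomains(subs, "target.com").items():
    print(sub, "->", "; ".join(reasons))
```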


WHOIS & Certificate Transparency — Domain & Subdomain Intelligence

WHOIS and certificate transparency — from your Kali terminal
# WHOIS — domain registration and ownership info (many registrars now redact registrant details for privacy)
whois target.com
Registrant Name: Jane Smith
Registrant Email: admin@target.com
Name Server: ns1.target.com
Creation Date: 2018-03-14
Registrar: GoDaddy
# IP range WHOIS — find all IP blocks owned by a company
whois 203.0.113.10
NetRange: 203.0.113.0 - 203.0.113.255
OrgName: Target Company Ltd
# Certificate Transparency — find subdomains from publicly logged SSL/TLS certs
# Publicly trusted certificates are recorded in public CT logs; crt.sh searches these logs.
https://crt.sh/?q=%.target.com
# The % wildcard finds ALL subdomains with certificates
mail.target.com
internal.target.com ← interesting — sounds internal
legacy-app.target.com ← “legacy” = potentially old software
test.target.com ← test environments often less secure
# From Kali — certificate transparency via CLI
curl -s "https://crt.sh/?q=%.target.com&output=json" | \
python3 -c "import sys,json; [print(c['name_value']) \
for c in json.load(sys.stdin)]" | sort -u
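Running the same certificate transparency lookup in Python gives you a clean list to feed into later steps. This minimal sketch uses only the standard library. It assumes crt.sh’s JSON output keeps the name_value field the one-liner above relies on. parse_crtsh also splits entries that carry several names and folds *. wildcard names into their parent domain.

```python
import json
import urllib.parse
import urllib.request


def parse_crtsh(json_text, domain):
    """Extract unique hostnames under `domain` from crt.sh JSON output."""
    names = set()
    for entry in json.loads(json_text):
        # name_value can hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # fold wildcard certs into the parent name
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


def fetch_crtsh(domain, timeout=60):
    """Query crt.sh (this contacts crt.sh, not the target). Needs network access."""
    # urlencode turns the % wildcard into %25, as crt.sh expects
    url = "https://crt.sh/?" + urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_crtsh(resp.read().decode("utf-8"), domain)
```

Usage: fetch_crtsh("target.com") returns the sorted subdomain list.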


🗺️ Complete OSINT Workflow — The Professional Approach

Here’s how I structure OSINT for a real security assessment. Every step feeds information into the next — building a complete picture before a single active scan is run. I want you to use this same structure during your Day 9 task.

STEP 1 — DOMAIN & IP INTELLIGENCE
Run WHOIS on the primary domain. Look up IP ranges. Identify all nameservers. What registrar? When was it registered? Are there other domains registered to the same email? Build your IP range list for Nmap later.
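WHOIS reports address blocks as ranges, but CIDR notation is the easiest form to paste into Nmap and scope documents. This minimal sketch uses Python’s ipaddress module. It assumes ARIN-style NetRange or RIPE-style inetnum lines like the sample WHOIS output earlier. netrange_to_cidrs is a hypothetical helper name. Ranges that don’t fall on clean boundaries come back as several blocks.

```python
import ipaddress
import re


def netrange_to_cidrs(line: str) -> list[str]:
    """Convert a WHOIS range such as 'NetRange: 203.0.113.0 - 203.0.113.255' to CIDR blocks."""
    # Drop an ARIN "NetRange:" or RIPE "inetnum:" label if present
    value = re.sub(r"^\s*(?:NetRange|inetnum)\s*:\s*", "", line, flags=re.IGNORECASE)
    # Accept a plain hyphen or an en dash (common after copy-paste) between the addresses
    first, last = re.split(r"\s*[-\u2013]\s*", value.strip())
    start, end = ipaddress.ip_address(first), ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


print(netrange_to_cidrs("NetRange: 203.0.113.0 - 203.0.113.255"))  # ['203.0.113.0/24']
```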

STEP 2 — SUBDOMAIN ENUMERATION
crt.sh for certificate transparency. theHarvester for search engine results. DNS brute force later (Day 22). Document every subdomain — note which ones sound interesting (dev, staging, internal, legacy, api).

STEP 3 — PEOPLE & EMAIL INTELLIGENCE
theHarvester for email addresses. LinkedIn for employee names and roles (note technical roles — developers, sysadmins). Understand email format (j.smith@, jsmith@, firstname.lastname@). This feeds social engineering assessments.
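Working out the email format is a matching exercise once you have a few confirmed addresses and the names that go with them. In this minimal sketch, FORMATS is a hypothetical candidate set based on the examples in this step. Real organisations may also use middle initials, or numbers to separate people with the same name.

```python
import re
from collections import Counter

# Hypothetical candidate local-part formats covering the examples above
FORMATS = {
    "f.last": lambda first, last: f"{first[0]}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "first.last": lambda first, last: f"{first}.{last}",
    "firstlast": lambda first, last: f"{first}{last}",
    "first": lambda first, last: first,
}


def infer_email_format(people):
    """people: (first_name, last_name, email) tuples. Count how often each format matches."""
    counts = Counter()
    for first_name, last_name, email in people:
        fn = re.sub(r"[^a-z]", "", first_name.lower())
        ln = re.sub(r"[^a-z]", "", last_name.lower())
        if not fn or not ln:
            continue  # formats can't be tested without both names
        local_part = email.split("@", 1)[0].lower()
        for name, build in FORMATS.items():
            if build(fn, ln) == local_part:
                counts[name] += 1
    return counts


people = [("John", "Smith", "j.smith@target.com"),
          ("Priya", "Patel", "p.patel@target.com"),
          ("Sam", "Lee", "sam.lee@target.com")]
print(infer_email_format(people).most_common(1))  # [('f.last', 2)]
```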

STEP 4 — GOOGLE DORKING
Run systematic dorks: files, login pages, open directories, error pages, code repositories. Document every finding with the exact URL, query used, and what was exposed. High-severity findings get flagged immediately.

STEP 5 — SHODAN & INTERNET EXPOSURE
Search Shodan for the target’s IP ranges. What services are internet-facing? What software versions? Any unusual ports open? Compare against what the organisation thinks is exposed versus what Shodan reveals.
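The comparison in this step is a set difference. This minimal sketch assumes you’ve written both the organisation’s declared exposure and the Shodan findings down as (ip, port) pairs. compare_exposure is a hypothetical name.

```python
import ipaddress


def _sort_key(pair):
    """Sort by IP version, then numerically by address and port."""
    ip, port = pair
    address = ipaddress.ip_address(ip)
    return (address.version, address, port)


def compare_exposure(expected, observed):
    """Compare declared (ip, port) exposure with what Shodan reports."""
    expected, observed = set(expected), set(observed)
    return {
        # Reachable according to Shodan, but missing from the organisation's list
        "unexpected": sorted(observed - expected, key=_sort_key),
        # Declared but absent from Shodan's data; this does not prove the port is
        # closed, because Shodan's scans can be incomplete or out of date
        "not_seen": sorted(expected - observed, key=_sort_key),
    }


declared = [("203.0.113.10", 443), ("203.0.113.10", 80)]
shodan_seen = [("203.0.113.10", 80), ("203.0.113.10", 443), ("203.0.113.25", 3306)]
print(compare_exposure(declared, shodan_seen))
```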

STEP 6 — COLLATE & PRIORITISE
Compile everything into a structured document. Priority targets for active scanning: interesting subdomains, exposed services on unusual ports, old software versions. This becomes your Nmap target list, ready for the Day 8 scanning workflow on the real engagement.


🎯 Day 9 Practical Task — OSINT on a Permitted Target

For today’s tasks, use your own domain (if you have one) or domains explicitly provided for security training. Do not run OSINT workflows on organisations without authorisation — even passive techniques can create paper trails.

📋 DAY 9 CHECKLIST
1
Learn Google operators hands-on — practice on SecurityElites.com
site:securityelites.com
site:securityelites.com filetype:pdf
site:securityelites.com inurl:course
How many pages does Google have indexed? What file types appear? Note how site: restricts results to one domain.

2
Browse the GHDB and understand its categories
Visit exploit-db.com/google-hacking-database. Browse 3 categories. For each, read 5 dorks and understand what they’re looking for. This builds pattern recognition for what misconfiguration each dork type exploits.

3
Run theHarvester against a practice domain
mkdir ~/Day9
theHarvester -d securityelites.com -b google -f ~/Day9/harvest
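What emails and subdomains does it find? Open the report files it saves (recent versions write XML and JSON). If your version rejects google as a source, run theHarvester -h to see which sources it supports. This is the kind of output you’ll generate on client domains in real, authorised assessments.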

4
Find subdomains via certificate transparency
curl -s "https://crt.sh/?q=%.securityelites.com&output=json" | \
python3 -c "import sys,json; [print(c['name_value']) for c in json.load(sys.stdin)]" | sort -u
Or simply visit crt.sh and search %.securityelites.com in the browser.

⭐ BONUS CHALLENGE — OSINT on Yourself

Run a quick OSINT check on yourself. Google your name in quotes. Google your email address. Check your email address with the Email Breach Checker Tool to see whether it has appeared in any known data breaches. Search GitHub for your email. What did you find? Most people are surprised. Share what surprised you most on Telegram with #Day9Done 🔍

🔍
You’ve learned to find what people accidentally leave public.
That’s one of the most valuable skills in this field.

Nine days in. You have the mindset, the lab, the command line, the file system, networking, subnetting, packet analysis, network scanning, and now reconnaissance. Day 10 introduces password attacks — one of the most searched topics in security — where we learn how credentials get cracked and what makes passwords resilient.

Day 10: Password Attacks →

Frequently Asked Questions — Day 9

Can companies stop Google from indexing their sensitive pages?
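Yes, through several mechanisms. A robots.txt file can instruct search crawlers not to crawl specific paths. It doesn’t restrict access, a blocked URL can still appear in results if other pages link to it, and the file itself is public, so it can point attackers straight at the paths you wanted hidden. The <meta name="robots" content="noindex"> tag tells Google not to index a page, and an X-Robots-Tag: noindex HTTP header does the same for non-HTML files such as PDFs. Google has to be able to crawl the page to see that instruction, so don’t also block the page in robots.txt. Moving sensitive resources behind authentication prevents indexing entirely because Google can’t authenticate. The right approach is defence in depth: authentication, access controls, AND telling crawlers to stay away, rather than relying on any single method.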
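Python’s built-in robots.txt parser shows the crawling-versus-access distinction clearly. This minimal sketch uses a hypothetical robots.txt. A compliant crawler skips the disallowed paths, but the file is public and lists exactly where those paths are.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration
robots_txt = """User-agent: *
Disallow: /backup/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler will skip the disallowed path...
print(parser.can_fetch("Googlebot", "https://target.com/backup/db.sql"))  # False
print(parser.can_fetch("Googlebot", "https://target.com/blog/"))          # True

# ...but anyone can read the file and see exactly which paths were meant to stay hidden
disallowed = [line.split(":", 1)[1].strip()
              for line in robots_txt.splitlines() if line.lower().startswith("disallow:")]
print(disallowed)  # ['/backup/', '/admin/']
```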
What is the difference between passive and active reconnaissance?
Passive reconnaissance gathers information without directly interacting with the target’s systems — Google searches, WHOIS lookups, Shodan searches, certificate transparency. The target never knows you looked. Active reconnaissance involves direct interaction — Nmap scans, DNS queries to their servers, web crawling. Active recon creates logs on the target’s systems. In a professional assessment, passive recon always comes first. Never start active scanning without proper scope documentation.
What should I do if I find sensitive information exposed by an organisation?
If you’re on an authorised assessment — document it and include it in your report with impact and remediation guidance. If you stumble upon it by accident on an organisation you have no relationship with — responsible disclosure is the ethical path. Contact the organisation’s security team (check for a security.txt file at /.well-known/security.txt, a security@company.com address, or a security policy page), describe what you found without going further, and give them time to fix it. Do not access, download, or use the exposed data. Many organisations have bug bounty programmes that reward exactly this kind of responsible reporting.
How do I protect my own organisation from Google Dorking?
Run a systematic dorking exercise against your own domain regularly — use the same techniques attackers would. Configure robots.txt correctly, protect sensitive directories with authentication rather than relying on obscurity, audit what files are publicly accessible on your web servers, review GitHub repositories for accidentally committed credentials, and set up Google Search Console to monitor what Google has indexed about your domain. The best way to find what’s exposed is to look for it yourself before someone else does.

Mr Elite
Founder, SecurityElites.com | Penetration Tester | Educator

The first time I ran a proper OSINT workflow on an authorised target, I found an exposed database backup containing 50,000 customer records — sitting in a public S3 bucket, indexed by Google. Not hacked. Just misconfigured and forgotten. That finding — which I reported responsibly — was more valuable than any exploit. Know how to look before you know how to attack.
