What is Katana
Katana is a fast web crawler made by ProjectDiscovery. It supports both headless and standard (non-headless) crawling and is designed for use in automation workflows. For example, Katana can crawl a target and store all crawled data, or crawl a site and store only the URLs that accept input. This cheat sheet gives an overview of the tool's functionality and provides real-world examples from existing workflows.
Install Katana
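A minimal install sketch, assuming a working Go toolchain is already on the machine. The module path is the one published by ProjectDiscovery; the `install_katana` function name is only an illustrative wrapper.

```shell
# Install the latest Katana release with the Go toolchain.
# Assumes `go` is installed and $(go env GOPATH)/bin is on your PATH.
install_katana() {
  go install github.com/projectdiscovery/katana/cmd/katana@latest
}
```

After running `install_katana`, `katana -version` should print the installed version.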
Basic Usage
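The simplest invocation crawls a single target with default settings. The sketch below wraps it in a small helper; the `crawl_basic` name and the example URL are placeholders, not part of Katana.

```shell
# Crawl one target with Katana's defaults (depth 3, standard mode).
# $1: target URL, e.g. https://example.com
crawl_basic() {
  katana -u "$1"
}
```

Usage: `crawl_basic https://example.com` prints each discovered URL to standard output.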
Katana Cheat Sheet
Katana Input Commands
COMMAND | DESCRIPTION |
---|---|
`-u, -list` | target url / list to crawl |
`-resume` | resume scan using resume.cfg |
`-e, -exclude` | exclude host matching specified filter ('cdn', 'private-ips', cidr, ip, regex) |
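
As a sketch of the input options, the helper below crawls every URL in a file while skipping hosts that match the `cdn` and `private-ips` filters. The flag names come from Katana's `-h` output and may vary between versions; the function name and file name are hypothetical.

```shell
# Crawl all targets listed in a file, excluding CDN and private-IP hosts.
# $1: file with one URL per line (hypothetical, e.g. urls.txt)
crawl_list() {
  katana -list "$1" -exclude cdn,private-ips
}
```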
Katana Configuration Options
COMMAND | DESCRIPTION |
---|---|
`-r, -resolvers` | list of custom resolvers (file or comma separated) |
`-d, -depth` | maximum depth to crawl (default 3) |
`-jc, -js-crawl` | enable endpoint parsing / crawling in javascript files |
`-jsl, -jsluice` | enable jsluice parsing in javascript files (memory intensive) |
`-ct, -crawl-duration` | maximum duration to crawl the target for (s, m, h, d) (default s) |
`-kf, -known-files` | enable crawling of known files (all,robotstxt,sitemapxml); a minimum depth of 3 is required to ensure all known files are properly crawled |
`-mrs, -max-response-size` | maximum response size to read (default 9223372036854775807) |
`-timeout` | time to wait for request in seconds (default 10) |
`-aff, -automatic-form-fill` | enable automatic form filling (experimental) |
`-fx, -form-extraction` | extract form, input, textarea & select elements in jsonl output |
`-retry` | number of times to retry the request (default 1) |
`-proxy` | http/socks5 proxy to use |
`-H, -headers` | custom header/cookie to include in all http requests in header:value format (file) |
`-config` | path to the katana configuration file |
`-fc, -form-config` | path to custom form configuration file |
`-flc, -field-config` | path to custom field configuration file |
`-s, -strategy` | visit strategy (depth-first, breadth-first) (default "depth-first") |
`-iqp, -ignore-query-params` | ignore crawling same path with different query-param values |
`-tlsi, -tls-impersonate` | enable experimental client hello (ja3) tls randomization |
`-dr, -disable-redirects` | disable following redirects (default false) |
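
A sketch combining several configuration options: a deeper crawl with JavaScript endpoint parsing, known-file discovery, and a time limit. The flags are taken from Katana's `-h` output and the specific values (depth 5, ten minutes) are arbitrary assumptions chosen for illustration.

```shell
# Deeper crawl: depth 5, parse JS files for endpoints, fetch robots.txt and
# sitemap.xml, and stop after ten minutes.
# $1: target URL
crawl_deep() {
  katana -u "$1" -d 5 -jc -kf all -ct 10m
}
```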
Katana Debug Options
COMMAND | DESCRIPTION |
---|---|
`-hc, -health-check` | run diagnostic check up |
`-elog, -error-log` | file to write sent requests error log |
Katana Headless Mode Options
Let Katana crawl with a real browser to avoid fingerprint-based blocking by targets and WAFs. Your traffic will carry the fingerprint of a legitimate web browser.
COMMAND | DESCRIPTION |
---|---|
`-hl, -headless` | enable headless hybrid crawling (experimental) |
`-sc, -system-chrome` | use locally installed chrome browser instead of the one katana installs |
`-sb, -show-browser` | show the browser on the screen with headless mode |
`-ho, -headless-options` | start headless chrome with additional options |
`-nos, -no-sandbox` | start headless chrome in --no-sandbox mode |
`-cdd, -chrome-data-dir` | path to store chrome browser data |
`-scp, -system-chrome-path` | use specified chrome browser for headless crawling |
`-noi, -no-incognito` | start headless chrome without incognito mode |
`-cwu, -chrome-ws-url` | use chrome browser instance launched elsewhere with the debugger listening at this URL |
`-xhr, -xhr-extraction` | extract xhr request url,method in jsonl output |
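
A headless crawl might look like the sketch below, which assumes Chrome is already installed locally and reuses it rather than letting Katana download its own copy. Flag names follow Katana's `-h` output; the helper name is hypothetical.

```shell
# Headless crawl using the locally installed Chrome browser.
# $1: target URL
crawl_headless() {
  katana -u "$1" -headless -system-chrome
}
```

When running as root or inside a container, adding `-no-sandbox` is often necessary for Chrome to start.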
Katana Passive Crawling
Crawl a target passively, without ever touching it, by querying third-party sources such as the Wayback Machine.
COMMAND | DESCRIPTION |
---|---|
`-ps, -passive` | enable passive sources to discover target endpoints |
`-pss, -passive-source` | passive source to use for url discovery (waybackarchive,commoncrawl,alienvault) |
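
A passive lookup could be sketched as follows, restricted to two of the sources named above. This assumes a Katana version that still ships passive mode; check `katana -h` first, since flag availability changes between releases.

```shell
# Discover endpoints from third-party archives only; no requests are sent
# to the target itself.
# $1: target URL or domain
crawl_passive() {
  katana -u "$1" -passive -passive-source waybackarchive,commoncrawl
}
```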
Katana Scope Options
Define what is in scope and out of scope for the crawl, including filters and excludes for file types (e.g., don't store crawled videos, JPGs, or fonts).
COMMAND | DESCRIPTION |
---|---|
`-cs, -crawl-scope` | in scope url regex to be followed by crawler |
`-cos, -crawl-out-scope` | out of scope url regex to be excluded by crawler |
`-fs, -field-scope` | pre-defined scope field (dn,rdn,fqdn) or custom regex (e.g., '(company-staging.io\|company.com)') (default "rdn") |
`-ns, -no-scope` | disables host based default scope |
`-do, -display-out-scope` | display external endpoints from scoped crawling |
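
To illustrate scoping, the sketch below restricts the crawl to the exact hostname (`fqdn`) and skips any URL matching a logout pattern so the session is not ended mid-crawl. The `logout` regex is an assumed example; flag names follow Katana's `-h` output.

```shell
# Stay on the target's exact hostname and never follow logout links.
# $1: target URL
crawl_scoped() {
  katana -u "$1" -fs fqdn -cos 'logout'
}
```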
Katana Filters
Configure Katana to match, filter, or exclude results using the following options.
COMMAND | DESCRIPTION |
---|---|
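`-mr, -match-regex` | regex or list of regex to match on output url (cli, file) |
`-fr, -filter-regex` | regex or list of regex to filter on output url (cli, file) |
`-f, -field` | field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir) |
`-sf, -store-field` | field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir) |
`-em, -extension-match` | match output for given extension (e.g., -em php,html,js) |
`-ef, -extension-filter` | filter output for given extension (e.g., -ef png,css) |
`-mdc, -match-condition` | match response with dsl based condition |
`-fdc, -filter-condition` | filter response with dsl based condition |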
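
A small filtering sketch: drop static assets such as images, fonts, and video from the results. It uses the `-ef` (`-extension-filter`) flag as described in Katana's `-h` output; confirm it against your installed version. The extension list is an arbitrary example.

```shell
# Exclude common static-asset extensions from the crawl output.
# $1: target URL
crawl_filtered() {
  katana -u "$1" -ef png,jpg,css,woff,mp4
}
```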
Katana Rate Limiting
Configure the number of threads and the maximum number of requests per second or per minute.
COMMAND | DESCRIPTION |
---|---|
`-c, -concurrency` | number of concurrent fetchers to use (default 10) |
`-p, -parallelism` | number of concurrent inputs to process (default 10) |
`-rd, -delay` | request delay between each request in seconds |
`-rl, -rate-limit` | maximum requests to send per second (default 150) |
`-rlm, -rate-limit-minute` | maximum number of requests to send per minute |
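
For fragile targets, a gentler crawl can be sketched like this. The values (5 fetchers, 20 requests per second) are assumptions to tune per target, and the flag names follow Katana's `-h` output.

```shell
# Throttled crawl: fewer concurrent fetchers and a lower request rate
# than the defaults (10 fetchers, 150 requests per second).
# $1: target URL
crawl_gentle() {
  katana -u "$1" -c 5 -rl 20
}
```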
How To Update Katana
COMMAND | DESCRIPTION |
---|---|
`-up, -update` | update katana to latest version |
`-duc, -disable-update-check` | disable automatic katana update check |
Katana Output File Options
Write Katana crawl data to a file and control the output format and verbosity.
COMMAND | DESCRIPTION |
---|---|
`-o, -output` | file to write output to |
`-sr, -store-response` | store http requests/responses |
`-srd, -store-response-dir` | store http requests/responses to custom directory |
`-or, -omit-raw` | omit raw requests/responses from jsonl output |
`-ob, -omit-body` | omit response body from jsonl output |
`-j, -jsonl` | write output in jsonl format |
`-nc, -no-color` | disable output content coloring (ANSI escape codes) |
`-silent` | display output only |
`-v, -verbose` | display verbose output |
`-debug` | display debug output |
`-version` | display project version |
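
As an output sketch, the helper below writes compact JSONL (no raw requests/responses and no response bodies) to a file, which keeps the result small enough for later processing. Flag names follow Katana's `-h` output; the function and file names are hypothetical.

```shell
# Write compact JSONL results to a file, printing results only (no banner).
# $1: target URL
# $2: output file (hypothetical, e.g. crawl.jsonl)
crawl_to_jsonl() {
  katana -u "$1" -jsonl -omit-raw -omit-body -silent -o "$2"
}
```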
Katana Example Commands
Katana Output Query Parameters
Build a list of URL input injection fields from a target:
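One way to do this, sketched under the assumption that Katana's `-f qurl` output field (URLs that include a query string) is what is wanted here. The helper name and output file are placeholders.

```shell
# Crawl a target and keep only URLs that carry query parameters,
# which are the usual candidates for input injection testing.
# $1: target URL
# $2: output file (hypothetical, e.g. query-urls.txt)
list_query_urls() {
  katana -u "$1" -f qurl -silent -o "$2"
}
```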
Do the same, but from an httpx scan output text file:
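A sketch of the list-based variant, assuming the httpx output file holds one live URL per line with no extra probe columns. The helper and file names are hypothetical.

```shell
# Crawl every URL from an httpx results file and keep only URLs that
# carry query parameters.
# $1: httpx output file, one URL per line (hypothetical, e.g. httpx.txt)
# $2: output file (hypothetical, e.g. query-urls.txt)
list_query_urls_from_file() {
  katana -list "$1" -f qurl -silent -o "$2"
}
```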
Conclusion
We hope you found this Katana cheat sheet useful and that it helps you get started with this powerful web crawler by ProjectDiscovery.