
What is Katana

Katana is a fast web crawler made by ProjectDiscovery. The tool supports both headless and standard (non-headless) crawling, with a focus on use in automation workflows. For example, Katana could be used to crawl a target and store all crawled data, or to crawl a site and store all URLs that contain inputs. The following Katana cheat sheet aims to provide an overview of the tool's functionality and give real-world examples from existing workflows.

Install Katana

go install github.com/projectdiscovery/katana/cmd/katana@latest

Basic Usage

katana -u target-domain.com 

Katana Cheat Sheet

Katana Input Commands

COMMAND                               DESCRIPTION
-u, -list string[]                    target url / list to crawl
-resume string                        resume scan using resume.cfg
-e, -exclude string[]                 exclude host matching specified filter ('cdn', 'private-ips', cidr, ip, regex)
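For example, the input flags can be combined to crawl a whole list of hosts while skipping CDN-hosted ones (the file name urls.txt is just a placeholder):

```shell
# Crawl every URL listed in urls.txt (one per line), skipping hosts behind known CDNs
katana -list urls.txt -exclude cdn

# Resume a previously interrupted scan from its resume.cfg file
katana -resume resume.cfg
```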

Katana Configuration Options

COMMAND                               DESCRIPTION
-r, -resolvers string[]               list of custom resolvers (file or comma separated)
-d, -depth int                        maximum depth to crawl (default 3)
-jc, -js-crawl                        enable endpoint parsing / crawling in javascript files
-jsl, -jsluice                        enable jsluice parsing in javascript files (memory intensive)
-ct, -crawl-duration value            maximum duration to crawl the target for (s, m, h, d) (default s)
-kf, -known-files string              enable crawling of known files (all, robotstxt, sitemapxml); a minimum depth of 3 is required to ensure all known files are properly crawled
-mrs, -max-response-size int          maximum response size to read (default 9223372036854775807)
-timeout int                          time to wait for request in seconds (default 10)
-aff, -automatic-form-fill            enable automatic form filling (experimental)
-fx, -form-extraction                 extract form, input, textarea & select elements in jsonl output
-retry int                            number of times to retry the request (default 1)
-proxy string                         http/socks5 proxy to use
-H, -headers string[]                 custom header/cookie to include in all http requests, in header:value format (file)
-config string                        path to the katana configuration file
-fc, -form-config string              path to custom form configuration file
-flc, -field-config string            path to custom field configuration file
-s, -strategy string                  visit strategy (depth-first, breadth-first) (default "depth-first")
-iqp, -ignore-query-params            ignore crawling the same path with different query-param values
-tlsi, -tls-impersonate               enable experimental client hello (ja3) tls randomization
-dr, -disable-redirects               disable following redirects (default false)
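Several of these options are commonly combined. A sketch of a deeper crawl with JavaScript parsing, a custom cookie, and traffic routed through a local proxy (the target, cookie value, and proxy address are placeholders):

```shell
# Crawl 5 levels deep, parse JS files for endpoints, attach a session cookie,
# and route requests through a local intercepting proxy (e.g. Burp Suite)
katana -u target-domain.com -d 5 -jc -H "Cookie: session=placeholder" -proxy http://127.0.0.1:8080
```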

Katana Debug Options

COMMAND                               DESCRIPTION
-health-check, -hc                    run diagnostic check up
-elog, -error-log string              file to write sent requests error log

Katana Headless Mode Options

Allow Katana to crawl using a real browser, to prevent fingerprint-based blocking by targets or WAFs; your traffic will appear to come from a legitimate web browser's fingerprint.

COMMAND                               DESCRIPTION
-hl, -headless                        enable headless hybrid crawling (experimental)
-sc, -system-chrome                   use the locally installed chrome browser instead of the one installed by katana
-sb, -show-browser                    show the browser on the screen with headless mode
-ho, -headless-options string[]       start headless chrome with additional options
-nos, -no-sandbox                     start headless chrome in --no-sandbox mode
-cdd, -chrome-data-dir string         path to store chrome browser data
-scp, -system-chrome-path string      use the specified chrome browser for headless crawling
-noi, -no-incognito                   start headless chrome without incognito mode
-cwu, -chrome-ws-url string           use a chrome browser instance launched elsewhere with the debugger listening at this URL
-xhr, -xhr-extraction                 extract xhr request url, method in jsonl output
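A sketch of a hybrid headless crawl that reuses the system's Chrome install and also captures XHR requests (the target and output file name are placeholders):

```shell
# Headless hybrid crawl with the locally installed Chrome,
# extracting XHR request URLs/methods into JSONL output
katana -u target-domain.com -headless -system-chrome -xhr-extraction -jsonl -o headless-crawl.jsonl
```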

Katana Passive Crawling

Crawl a target passively (without ever touching the target) using third-party sources such as the Wayback Machine.

COMMAND                               DESCRIPTION
-ps, -passive                         enable passive sources to discover target endpoints
-pss, -passive-source string[]        passive source to use for url discovery (waybackarchive, commoncrawl, alienvault)
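A minimal passive run might look like this; no requests are sent to the target itself, only to the listed third-party sources (the target and output file name are placeholders):

```shell
# Discover historical endpoints from the Wayback Machine and Common Crawl only
katana -u target-domain.com -passive -passive-source waybackarchive,commoncrawl -o passive-urls.txt
```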

Katana Scope Options

Scope Katana by defining what is in scope and out of scope, including filters and excludes for file types, e.g. don't store crawled videos, JPGs, fonts, etc.

COMMAND                               DESCRIPTION
-cs, -crawl-scope string[]            in-scope url regex to be followed by crawler
-cos, -crawl-out-scope string[]       out-of-scope url regex to be excluded by crawler
-fs, -field-scope string              pre-defined scope field (dn, rdn, fqdn) or custom regex (e.g. '(company-staging.io|company.com)') (default "rdn")
-ns, -no-scope                        disables host-based default scope
-do, -display-out-scope               display external endpoints from scoped crawling
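A sketch combining the scope flags, reusing the example regex from the -fs description above (the company domains and the 'logout' exclusion are placeholders):

```shell
# Follow only URLs on the staging and production domains, skip logout links,
# and still print the out-of-scope endpoints that were discovered
katana -u company.com -cs '(company-staging.io|company.com)' -cos 'logout' -display-out-scope
```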

Katana Filters

Configure Katana to match, filter, or exclude crawl results based on the following configuration options.

COMMAND                               DESCRIPTION
-mr, -match-regex string[]            regex or list of regex to match on output url (cli, file)
-fr, -filter-regex string[]           regex or list of regex to filter on output url (cli, file)
-f, -field string                     field to display in output (url, path, fqdn, rdn, rurl, qurl, qpath, file, ufile, key, value, kv, dir, udir)
-sf, -store-field string              field to store in per-host output (url, path, fqdn, rdn, rurl, qurl, qpath, file, ufile, key, value, kv, dir, udir)
-em, -extension-match string[]        match output for given extension (e.g. -em php,html,js)
-ef, -extension-filter string[]       filter output for given extension (e.g. -ef png,css)
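The field and extension flags pair naturally: for example, keeping only URLs that carry query parameters while dropping static assets (the target and output file name are placeholders):

```shell
# Print only the qurl field (URLs with query strings), filtering out images and CSS
katana -u target-domain.com -f qurl -ef png,jpg,css -o params.txt
```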

Katana Rate Limiting

Configure the number of concurrent threads and the number of requests per second or per minute for Katana.

COMMAND                               DESCRIPTION
-c, -concurrency int                  number of concurrent fetchers to use (default 10)
-p, -parallelism int                  number of concurrent inputs to process (default 10)
-rd, -delay int                       request delay between each request in seconds
-rl, -rate-limit int                  maximum requests to send per second (default 150)
-rlm, -rate-limit-minute int          maximum number of requests to send per minute
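For fragile targets, a gentler profile than the defaults might look like this (the target and exact numbers are placeholders; tune them to the engagement):

```shell
# Low-and-slow crawl: 5 concurrent fetchers, capped at 20 requests per second
katana -u target-domain.com -c 5 -rl 20
```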

How To Update Katana

COMMAND                               DESCRIPTION
-up, -update                          update katana to the latest version
-duc, -disable-update-check           disable automatic katana update check

Katana Output File Options

Write Katana crawl data to files in various formats.

COMMAND                               DESCRIPTION
-o, -output string                    file to write output to
-sr, -store-response                  store http requests/responses
-srd, -store-response-dir string      store http requests/responses to a custom directory
-or, -omit-raw                        omit raw requests/responses from jsonl output
-ob, -omit-body                       omit response body from jsonl output
-j, -jsonl                            write output in jsonl format
-nc, -no-color                        disable output content coloring (ANSI escape codes)
-silent                               display output only
-v, -verbose                          display verbose output
-debug                                display debug output
-version                              display project version
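A sketch combining the structured-output flags, useful when the crawl feeds later tooling (the target, output file, and directory names are placeholders):

```shell
# Emit JSONL records and archive raw HTTP requests/responses in a custom directory
katana -u target-domain.com -jsonl -o crawl.jsonl -store-response -store-response-dir responses/
```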

Katana Example Commands

Katana Output Query Parameters

Build a list of URL input injection fields from a target:

katana -u target-domain.com -f qurl -o qurl-output.txt

Do the same, but from an httpx scan output text file:

cut -d " " -f 1 httpx.txt | katana -f qurl -o qurl-httpx.txt
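The cut step keeps only the first space-separated field of each line, which is where httpx prints the URL; a quick stand-alone illustration of that step (the example URLs are placeholders):

```shell
# Simulate two lines of httpx output ("url [status]") and keep only the URLs
printf 'https://a.example [200]\nhttps://b.example [301]\n' | cut -d " " -f 1
# prints:
# https://a.example
# https://b.example
```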

Conclusion

We hope you found this Katana cheat sheet useful, and it helps you get started with this powerful web crawler by Project Discovery.