
API Reference

The OnCrawl REST API is used for accessing your crawl data as well as managing your projects and your crawls.

In order to use this API you need to have an OnCrawl account, an active subscription and an access token.

The current version of the web API is known as V2. Although we don’t expect it to change much, it is still considered under development.

We try to keep breaking changes to a minimum, but this is not 100% guaranteed.

Requests

All API requests should be made to the /api/v2 prefix, and will return JSON as the response.

HTTP Verbs

When applicable, the API tries to use the appropriate HTTP verb for each action:

Verb Description
GET Used for retrieving resources.
POST Used for creating resources.
PUT Used for updating resources.
DELETE Used for deleting resources.

Parameters and Data

curl "https://app.oncrawl.com/api/v2/projects" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
        "project": {
            "name": "Project name",
            "start_url": "https://www.oncrawl.com"
        }
    }
EOF
import requests

requests.post("https://app.oncrawl.com/api/v2/projects", json={
  "project": {
    "name": "Project name",
    "start_url": "https://www.oncrawl.com"
  }
})

Any parameters not included in the URL should be encoded as JSON with a Content-Type of application/json.

Additional parameters are sometimes specified via the querystring, even for POST, PUT and DELETE requests.

When a complex object is required to be passed via the querystring, the rison encoding format is used.

Errors

Format of an error message

{
  "type": "error_type",
  "code": "error_code",
  "message": "Error message",
  "fields": [{
    "name": "parameter_name",
    "type": "field_error_type",
    "message": "Error message"
  }]
}

When an error occurs, the API returns a JSON object with the following properties:

Property Description
type An attribute that groups errors based on their nature.
code
optional
A more specific attribute to let you handle specific errors.
message
optional
A human readable message describing the error.
fields
optional
List of field-related errors.

Quota error message

{
 "type": "quota_error",
 "message": "Not enough quota" 
}

Forbidden error message

{
  "type": "forbidden",
  "code": "no_active_subscription"
}

Fields related errors

{
 "type": "invalid_request_parameters",
 "fields": [{
  "name": "start_url",
  "type": "required",
  "message": "The start URL is required."
 }]
}

Permissions errors

The following errors occur if you are not allowed to perform a request.

Type Description
unauthorized Returned when the request is not authenticated.
HTTP Code: 401
forbidden Returned when the request is authenticated but the action is not allowed.
HTTP Code: 403
quota_error Returned when the current quota does not allow the action to be performed.
HTTP Code: 403

The forbidden error is usually accompanied by a code key, as shown in the example above.

Validations errors

The following errors are caused by an invalid request. In most cases this means the request will not complete unless the parameters are changed.

Type Description
invalid_request Returned when the request has incompatible values or does not match the API specification.
HTTP Code: 400
invalid_request_parameters Returned when the value does not meet the required specification for the parameter.
HTTP Code: 400
resource_not_found Returned when any resource referred to in the request is not found.
HTTP Code: 404
duplicate_entry Returned when the request provides a duplicate value for an attribute that is specified as unique.
HTTP Code: 400

Operation failure errors

These errors are returned when the request was valid but the requested operation could not be completed.

Type Description
invalid_state_for_request Returned when the requested operation is not allowed for current state of the resource.
HTTP Code: 409
internal_error Returned when the request could not be completed due to a bug on OnCrawl’s side.
HTTP Code: 500

Authentication

To authorize, use this code:

# With shell, you can just pass the correct header with each request
curl "https://app.oncrawl.com/api/v2/projects" \
  -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

response = requests.get("https://app.oncrawl.com/api/v2/projects",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

Make sure to replace {ACCESS_TOKEN} with your own access token.

OnCrawl uses access tokens to allow access to the API. You can create tokens from your settings panel if your subscription allows it.

OnCrawl expects the access token to be included in all API requests to the server.
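For example, with Python’s requests library you can set the header once on a session and reuse it for every call (a minimal sketch; {ACCESS_TOKEN} is a placeholder for your own token):

import requests

API_BASE = "https://app.oncrawl.com/api/v2"

# Reusable session that sends the Authorization header with every request.
session = requests.Session()
session.headers.update({"Authorization": "Bearer {ACCESS_TOKEN}"})

# Any call made through the session is now authenticated.
projects = session.get(API_BASE + "/projects").json()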

An access token may be created with various scopes:

Scope Description
account:read Gives read access to all account-related data.
Examples: Profile, invoices, subscription.
account:write Gives write access to all account-related data.
Examples: Close account, update billing information.
projects:read Gives read access to all projects’ and crawls’ data.
Examples: View crawl reports, export data.
projects:write Gives write access to all projects’ and crawls’ data.
Examples: Launch crawl, create project.

OnCrawl Query Language

OnCrawl provides a JSON-style language that you can use to execute queries.

This is referred to as OQL, for OnCrawl Query Language.

An OQL query has a tree-like structure composed of nodes.

A node can be terminal, in which case it is referred to as a leaf, or be a compound of other nodes.

An OQL query must start with a single root node.

Leaf nodes

Example of OQL using a field node:

{
  "field": [ "field_name", "filter_type", "filter_value" ]
}
Node Description
field Apply a filter on a field.

The value of a field node is an array with 3 values: the field name, the filter type and the filter value.

Compound nodes

Example OQL using an and node:

{
  "and": [ {
    "field": [ "field_name", "filter_type", "filter_value" ]
  }, {
    "field": [ "field_name", "filter_type", "filter_value" ]   
  }]
}
Node Description
and Execute a list of nodes using the logical operator AND.
or Execute a list of nodes using the logical operator OR.

Common filters

OQL to retrieve pages found in the structure:

{
  "field": [ "depth", "has_value", "" ]
}
Filter type Description
has_no_value The field must have no value.
has_value The field must have any value.

String filters

OQL to retrieve pages with “cars” in title

{
  "field": [ "title", "contains", "cars" ]
}
Filter type Description
contains The field’s value must contain the filter value.
endswith The field’s value must end with the filter value.
startswith The field’s value must start with the filter value.
equals The field’s value must be strictly equal to the filter value.

Numeric filters

OQL to retrieve pages with less than 10 inlinks:

{
  "field": [ "follow_inlinks", "lt", "10" ]
}

OQL to retrieve pages between depth 1 and 4

{
  "field": [ "depth", "between", [ "1", "4" ]]
}
Filter type Description
gt The field’s value must be greater than the filter value.
gte The field’s value must be greater than or equal to the filter value.
lt The field’s value must be less than the filter value.
lte The field’s value must be less than or equal to the filter value.
between The field’s value must be between both filter values (lower inclusive, upper exclusive).

Filters options

OQL to retrieve urls within /blog/{year}/:

{
  "field": [ "urlpath", "startswith", "/blog/([0-9]+)/", { "regex": true } ]
}

The filters equals, contains, startswith and endswith can take options as the fourth parameter of the field node as a JSON object.

Property Description
ci
boolean
true if the match should be case insensitive.
regex
boolean
true if the filter value is a regex.

Pagination

The majority of endpoints returning resources such as projects and crawls are paginated.

HTTP request

Example of paginated query

curl "https://app.oncrawl.com/api/v2/projects?offset=50&limit=100&sort=name:desc" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

response = requests.get(
    "https://app.oncrawl.com/api/v2/projects?offset={offset}&limit={limit}&sort={sort}"
    .format(
        offset=50,
        limit=100,
        sort='name:desc'
    ),
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

The HTTP query expects the following parameters:

Parameter Description
offset
optional
The offset for matching items.
Defaults to 0.
limit
optional
The maximum number of matching items to return.
Defaults to 10.
sort
optional
How to sort matching items, order can be asc or desc.
Natural ordering is from most recent to least recent.
filters
optional
The OQL filters used for the query.
Defaults to null.

Because filters is a JSON object that needs to be passed in the querystring, the rison encoding format is used.
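As an illustration, the OQL filter {"field": ["name", "contains", "blog"]} becomes (field:!(name,contains,blog)) once rison-encoded. The sketch below passes such a hand-written rison string in the querystring (the filter itself is only an example; a library such as the third-party prison package can also produce the encoding):

import requests

# OQL filter {"field": ["name", "contains", "blog"]} written in rison by hand.
filters = "(field:!(name,contains,blog))"

response = requests.get(
    "https://app.oncrawl.com/api/v2/projects",
    params={"filters": filters, "limit": 10},
    headers={"Authorization": "Bearer {ACCESS_TOKEN}"},
).json()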

The sort parameter is expected to be in the format {name}:{order}, where {name} is the field to sort on and {order} is either asc or desc.

HTTP response

Example of paginated response

{
  "meta": {
    "offset": 0,
    "limit": 10,
    "total": 100,
    "filters": "<OQL>",
    "sort": [
      [ "name", "desc" ]
    ]
  },
  "projects": [ "..." ]
}

The HTTP response always follows the same pattern: a meta key describing the pagination and a key named after the requested resource (here, projects) containing the matching items.

The meta key returns a JSON object that allows you to easily paginate through the resources:

Property Description
offset The offset used for the query.
Defaults to 0.
limit The limit used for the query.
Defaults to 10.
total The total number of matching items.
sort The sort used for the query.
Defaults to null.
filters The OQL filters used for the query.
Defaults to {}.
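A minimal sketch of walking through every page of a paginated endpoint using offset, limit and meta.total (endpoint and keys as documented above):

import requests

headers = {"Authorization": "Bearer {ACCESS_TOKEN}"}
url = "https://app.oncrawl.com/api/v2/projects"

projects = []
offset, limit = 0, 100
while True:
    page = requests.get(url, params={"offset": offset, "limit": limit}, headers=headers).json()
    projects.extend(page["projects"])
    offset += limit
    # Stop once the offset goes past the total number of matching items.
    if offset >= page["meta"]["total"]:
        break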

Data API

The Data API allows you to explore, aggregate and export your data.

There are 3 main sources: the Crawl Report, the Crawl over Crawl and the Logs Monitoring.

Each source can have one or several data types behind it.

Data types

For Crawl Reports:

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/<data_type>" \
  -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

response = requests.get("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/<data_type>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

For Log monitoring:

curl "https://app.oncrawl.com/api/v2/data/project/<project_id>/log_monitoring/<data_type>" \
  -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

response = requests.get("https://app.oncrawl.com/api/v2/data/project/<project_id>/log_monitoring/<data_type>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

For Crawl over Crawl:

curl "https://app.oncrawl.com/api/v2/data/crawl_over_crawl/<coc_id>/<data_type>" \
  -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

response = requests.get("https://app.oncrawl.com/api/v2/data/crawl_over_crawl/<coc_id>/<data_type>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

A data type is the nature of the objects you are exploring, each data type has its own schema and purpose.

Source Data type Description
Crawl report pages List of crawled pages of your website.
Crawl report links List of all links of your website.
Crawl report clusters List of duplicate clusters of your website.
Crawl report structured_data List of structured data of your website.
Crawl over Crawl pages List of compared pages.
Logs monitoring pages List of all URLs.
Logs monitoring events List of all events.
Pages
Represents an HTML page of the website.
Links
Represents a link between two pages.
Example: an ‘href’ link to another page.
Clusters
Represents a cluster of pages that are considered similar.
A cluster has a size and an average similarity ratio.
Structured data
Represents a structured data item found on a page.
Supported formats are: JSON-LD, RDFa and microdata.
Events
Represents a single line of a log file.
Available only in logs monitoring.

Data Schema

HTTP Request

Example of a fields request

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/<data_type>/fields" \
  -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

fields = requests.get("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/<data_type>/fields",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json().get('fields', [])

HTTP Response

Example of HTTP response

{
  "fields": [{
    "name": "canonical_evaluation", 
    "type": "enum", 
    "arity": "one", 
    "values": [
         "matching", 
         "not_matching", 
         "not_set"
    ],
    "actions": [
     "has_no_value", 
     "not_equals", 
     "equals", 
     "has_value"
    ], 
    "agg_dimension": true, 
    "agg_metric_methods": [
     "value_count", 
     "cardinality"
    ], 
    "can_display": true, 
    "can_filter": true, 
    "can_sort": false,
    "user_select": true, 
    "category": "HTML Quality"
  }, "..."]
}
Property Description
name The name of the field
type The field’s type (natural, float, hash, enum, bool, string, percentage, object, date, datetime, ratio)
arity If the field is multivalued, can be one or many.
values List of possible values for enum type.
actions List of possible filters of this field.
agg_dimension true if can be used as a dimension in aggregate queries.
agg_metric_methods List of available aggregations methods for this field.
can_display true if the field can be retrieved in search or export queries.
can_filter true if the field can be used in query filters.
can_sort true if the field can be sorted on in search or export queries.
category
deprecated
Do not use.
user_select
deprecated
Do not use.
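For example, the schema can be used to discover which fields accept filters and which can serve as aggregation dimensions (a sketch; the crawl ID is a placeholder and the pages data type is just one choice):

import requests

fields = requests.get(
    "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/fields",
    headers={"Authorization": "Bearer {ACCESS_TOKEN}"},
).json()["fields"]

# Fields you are allowed to filter on, with the filters each one accepts.
filterable = {f["name"]: f["actions"] for f in fields if f["can_filter"]}

# Fields that can be used as a dimension in aggregate queries.
dimensions = [f["name"] for f in fields if f["agg_dimension"]]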

Search Queries

Search queries allow you to explore your data by filtering, sorting and paginating.

HTTP Request

Search for crawled pages with a 301 or 404 HTTP status code.

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
        "offset": 0,
        "limit": 10,
        "fields": [ "url", "status_code" ],
        "sort": [
            { "field": "status_code", "order": "asc" }
        ],
        "oql": {
            "and":[
                {"field":["fetched","equals",true]},
                {"or":[
                    {"field":["status_code","equals",301]},
                    {"field":["status_code","equals",404]}
                ]}
            ]
        }
    }
EOF
import requests

response = requests.post("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' },
    json={
      "offset": 0,
      "limit": 10,
      "fields": [ "url", "status_code" ],
      "sort": [
          { "field": "status_code", "order": "asc" }
      ],
      "oql": {
        "and":[
            {"field":["fetched","equals",true]},
            {"or":[
                {"field":["status_code","equals",301]},
                {"field":["status_code","equals",404]}
            ]}
        ]
      }
    }
).json()

The HTTP request expects a JSON object as its payload, with the following properties:

Property Description
limit
optional
Maximum number of matching results to return.
offset
optional
An offset for the returned matching results.
oql
optional
An OnCrawl Query Language object.
fields
optional
List of fields to retrieve for each matching result.
sort
optional
Ordering of the returned matching results.

The sort parameter is expected to be an array of objects, each with a field key (the field to sort on) and an order key (either asc or desc).

HTTP response

{
  "meta": {
    "columns": [
      "url", 
      "inrank", 
      "status_code", 
      "meta_robots", 
      "fetched"
    ], 
    "total_hits": 1, 
    "total_pages": 1
  }, 
  "oql": {
    "and": [
      { "field": [ "fetched",  "equals",  true ] }, 
      {
        "or": [
          { "field": [ "status_code", "equals", 301 ] }, 
          { "field": [ "status_code", "equals", 404 ] }
        ]
      }
    ]
  }, 
  "urls": [
    {
      "fetched": true, 
      "inrank": 8, 
      "meta_robots": null, 
      "status_code": 301, 
      "url": "http://www.website.com/redirect/"
    }
  ]
}

The response will be a JSON object with an urls key, an oql key and a meta key.

The urls key will contain an array of matching results.

The oql key will contain the OnCrawl Query Language object used for filtering.

The meta key will contain the following keys:

Property Description
columns List of returned fields. They are the keys used in urls objects.
total_hits Total number of matching results.
total_pages
deprecated
Total number of pages according to limit and total_hits.

Aggregate Queries

Average load time of crawled pages

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/aggs" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
      "aggs": [{
        "oql": {
          "field": ["fetched", "equals", "true"]
        },
        "value": "load_time:avg"
      }]
    }
EOF
import requests

response = requests.post("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/aggs",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' },
    json={
      "aggs": [{
        "oql": {
          "field": ["fetched", "equals", "true"]
        },
        "value": "load_time:avg"
      }]
    }
).json()

The returned JSON looks like:

{
  "aggs": [
    {
      "cols": [
        "load_time:avg"
      ],
      "rows": [
        [
          183.41091954022988
        ]
      ]
    }
  ]
}

HTTP Request

This HTTP endpoint expects a JSON object as its payload with a single aggs key whose value is an array of aggregate queries.

An aggregate query is an object with the following properties:

Property Description
oql
optional
An OnCrawl Query Language object to match a set of items.
By default it will match all items.
fields
optional
Specify how to create buckets of matching items.
value
optional
Specify how to aggregate matching items.
By default it will return the number of matching items.

How to aggregate items

By default an aggregate request will return the count, but you can also perform a different aggregation using the value parameter.

The expected format is <field_name>:<aggregation_type>.

For example, load_time:avg returns the average load time of the matching items.

But not all fields can be aggregated and not all aggregations are available on all fields.

To know which aggregations are available on a field you can check the agg_metric_methods value returned by the Data Schema endpoint.

The available methods are:

min
Returns the minimal value for this field.
max
Returns the maximal value for this field.
avg
Returns the average value for this field.
sum
Returns the sum of all the values for this field.
value_count
Returns how many items have a value for this field.
cardinality
Returns the number of different values for this field.
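A short sketch that uses the Data Schema endpoint to list every field on which an avg aggregation can be requested (the crawl ID is a placeholder):

import requests

fields = requests.get(
    "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/fields",
    headers={"Authorization": "Bearer {ACCESS_TOKEN}"},
).json()["fields"]

# Keep only the fields whose agg_metric_methods include "avg",
# then build the corresponding value strings, e.g. "load_time:avg".
avg_values = [
    f["name"] + ":avg"
    for f in fields
    if "avg" in f.get("agg_metric_methods", [])
]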

How to create simple buckets

Average inrank by depth

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/aggs" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
      "aggs": [{
        "fields": [{
            "name": "depth"
        }],
        "value": "inrank:avg"
      }]
    }
EOF
import requests

response = requests.post("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/aggs",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' },
    json={
      "aggs": [{
        "fields": [{
            "name": "depth"
        }],
        "value": "inrank:avg"
      }]
    }
).json()

Page count by range of inlinks

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/aggs" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
      "aggs": [{
        "fields": [{
          "name": "nb_inlinks_range",
          "ranges": [
            {
              "name": "under_10",
              "to": 10
            },
            {
              "name": "10_50",
              "from": 10,
              "to": 51
            },
            {
              "name": "more_50",
              "from": 51
            }
          ]
        }]
      }]
    }
EOF
import requests

response = requests.post("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages/aggs",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' },
    json={
      "aggs": [{
        "fields": [{
          "name": "nb_inlinks_range",
          "ranges": [
            {
              "name": "under_10",
              "to": 10
            },
            {
              "name": "10_50",
              "from": 10,
              "to": 51
            },
            {
              "name": "more_50",
              "from": 51
            }
          ]
        }]
      }]
    }
).json()

When performing an aggregation, you can create buckets for your matching items using the fields parameter which takes an array of JSON objects.

The simplest way is to use the field’s name like so: {"name": "field_name"}.

It will return the item count for each distinct value of field_name.

But not all fields can be used to create a bucket.

To know which fields are available as a bucket you can check the agg_dimension value returned by the Data Schema endpoint.

How to create range buckets

If field_name has too many distinct values, it can be useful to group them into ranges.

To do so you can add a ranges key that takes an array of ranges. A range is a JSON object with the following keys:

Property Description
name
required
The name that will be returned in the JSON response for this range.
from
optional
The lower bound of this range (inclusive).
to
optional
The upper bound of this range (exclusive).

Only numeric fields can be used with range buckets.

Export Queries

Export all pages from the structure.

curl "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF > my_export.csv
    {
         "export": "true",
         "fields": ["url"],
         "oql": {
            "field":["depth","has_value", ""]
        }
    }
EOF
import requests

response = requests.post("https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' },
    json={
      "export": True,
      "fields": ["url"],
       "oql": {
          "field":["depth","has_value", ""]
      }
    }
)

An export query allows you to save the result of your search query as a CSV file.

It does not suffer from the 10K items limitation and allows you to export all of the matching results.

To export the result of your search query as csv, simply add export: true in the JSON payload.

HTTP response

The response of the query will be a streamed CSV file.
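Since the body is streamed, a common pattern with the requests library is to write it to disk chunk by chunk instead of loading it in memory (a sketch; the file name and crawl ID are placeholders):

import requests

response = requests.post(
    "https://app.oncrawl.com/api/v2/data/crawl/<crawl_id>/pages",
    headers={"Authorization": "Bearer {ACCESS_TOKEN}"},
    json={"export": True, "fields": ["url"], "oql": {"field": ["depth", "has_value", ""]}},
    stream=True,
)

# Write the streamed CSV to disk without holding it all in memory.
with open("my_export.csv", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)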

Projects API

The Projects API allows you to manage all your projects and your crawls.

With this API you can, for example, create and delete projects, manage crawl configurations, schedule and launch crawls, and monitor running crawls.

Projects

List projects

Get list of projects.

curl "https://app.oncrawl.com/api/v2/projects" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

projects = requests.get("https://app.oncrawl.com/api/v2/projects",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Request

The projects can be paginated and filtered using the parameters described in the pagination section.

The fields available for the sort and filters are:

Property Description
id The project ID.
name The project’s name.
start_url The project’s start URL.
features The project’s enabled features.

HTTP Response

{
   "meta":{
      "filters":{},
      "limit":100,
      "offset":0,
      "sort":null,
      "total":1
   },
   "projects": [
      "<Project Object>",
      "<Project Object>"
   ]
}

A JSON object with a meta key, described in the pagination section, and a projects key containing the list of projects.

Get a project

Get a project.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

project = requests.get("https://app.oncrawl.com/api/v2/projects/<project_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Response

{
  "project": {
    "id": "592c1e1cf2c3a42743d14350",
    "name": "OnCrawl",
    "start_url": "http://www.oncrawl.com/",
    "user_id": "54dce0f264b65e1eef3ef61b",
    "is_verified_by": "google_analytics",
    "domain": "oncrawl.com",
    "features": [
        "at_internet",
        "google_search_console"
    ],
    "last_crawl_created_at": 1522330515000,
    "last_crawl_id": "5abceb9303d27a70f93151cb",
    "limits": {
        "max_custom_dashboard_count": null,
        "max_group_count": null,
        "max_segmentation_count": null,
        "max_speed": 100
    },
    "log_monitoring_data_ready": true,
    "log_monitoring_processing_enabled": true,
    "log_monitoring_ready": true,
    "crawl_config_ids": [
        "5aa80a1303d27a729113bb2d"
    ],
    "crawl_ids": [
        "5abceb9303d27a70f93151cb"
    ],
    "crawl_over_crawl_ids": [
        "5abcf43203d27a1ecf100b2c"
    ]
  },
  "crawl_configs": [
    "<CrawlConfig Object>"
  ],
  "crawls": [
    "<Crawl Object>"
  ]
}

The HTTP response is a JSON object with three keys: project, crawl_configs and crawls.

The project’s properties are:

Property Description
id The project ID.
name The project’s name.
start_url The project’s start URL.
user_id The ID of the project’s owner.
is_verified_by Holds how the project’s ownership was verified.
Can be google_analytics, google_search_console, admin or null.
domain The start URL’s domain.
features List of project’s enabled features.
last_crawl_id The ID of the latest created crawl.
last_crawl_created_at UTC timestamp of the latest created crawl, in milliseconds.
Defaults to null.
limits An object with customized limits for this project.
log_monitoring_data_ready true if the project’s log monitoring index is ready to be searched.
log_monitoring_processing_enabled true if the project’s files for the log monitoring are automatically processed.
log_monitoring_ready true if the project’s log monitoring configuration was submitted.
crawl_over_crawl_ids The list of Crawl over Crawl IDs attached to this project.
crawl_ids The list of crawl IDs for this project.
crawl_config_ids The list of crawl configuration IDs for this project.
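For example, last_crawl_id can be combined with the crawls endpoint to fetch the most recent crawl of a project (a sketch; the IDs are placeholders):

import requests

headers = {"Authorization": "Bearer {ACCESS_TOKEN}"}

project = requests.get(
    "https://app.oncrawl.com/api/v2/projects/<project_id>", headers=headers
).json()["project"]

# Fetch the most recently created crawl of this project, if there is one.
if project["last_crawl_id"]:
    crawl = requests.get(
        "https://app.oncrawl.com/api/v2/crawls/" + project["last_crawl_id"],
        headers=headers,
    ).json()["crawl"]
    print(crawl["status"], crawl["created_at"])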

Create a project

Create a project.

curl -X POST "https://app.oncrawl.com/api/v2/projects" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
        "project": {
            "name": "Project name",
            "start_url": "https://www.oncrawl.com"
        }
    }
EOF
import requests

requests.post("https://app.oncrawl.com/api/v2/projects", json={
      "project": {
        "name": "Project name",
        "start_url": "https://www.oncrawl.com"
      }
  },
  headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

Property Description
name
required
The project’s name, must be unique.
start_url
required
The project’s start URL, starting with http:// or https://.

HTTP Response

Examples of HTTP response

{
  "project": "<Project Object>"
}

An HTTP 200 status code is returned with the created project returned directly as the response within a project key.

Delete a project

Delete a project.

curl -X DELETE "https://app.oncrawl.com/api/v2/projects/<project_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

requests.delete("https://app.oncrawl.com/api/v2/projects/<project_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

No HTTP parameters.

HTTP Response

Returns an HTTP 204 status code if successful.

Scheduling

The scheduling of crawls allows you to start your crawl at a later date, run it automatically on a periodic basis, or both.

Schedule your crawls to be run every week or every month and never think about it again.

List scheduled crawls

Get list of scheduled crawls.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>/scheduled_crawls" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

scheduled_crawls = requests.get("https://app.oncrawl.com/api/v2/projects/<project_id>/scheduled_crawls",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Request

The scheduled crawls can be paginated using the parameters described in the pagination section.

There are no sort or filter parameters available.

HTTP Response

{
   "meta":{
      "filters": {},
      "limit":50,
      "offset":0,
      "sort": null,
      "total":1
   },
   "scheduled_crawls": [
      {
         "config_id":"59f3048cc87b4428618d7c44",
         "id":"5abdeb0f03d27a69ef169c52",
         "project_id":"592c1e1cf2c3a42743d14350",
         "recurrence":"week",
         "start_date":1522482300000
      }
   ]
}

A JSON object with a meta key, described in the pagination section, and a scheduled_crawls key containing the list of scheduled crawls for this project.

Create a scheduled crawl

HTTP request

Create a scheduled crawl.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>/scheduled_crawls" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
        "scheduled_crawl": {
            "config_id": "59f3048cc87b4428618d7c49",
            "recurrence": "week",
            "start_date": 1522482300000
        }
    }
EOF
import requests

requests.post("https://app.oncrawl.com/api/v2/projects", json={
    "scheduled_crawl": {
        "config_id": "59f3048cc87b4428618d7c49",
        "recurrence": "week",
        "start_date": 1522482300000
    }
  },
  headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

The request is expected to be a JSON object with a scheduled_crawl key and the following properties:

Property Description
config_id
required
The ID of the crawl configuration to schedule.
recurrence
optional
Can be day, week, 2weeks or month.
start_date
required
A UTC timestamp in milliseconds for when to start the first crawl.
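A minimal sketch of building such a timestamp with Python’s standard library (the date is only an example; it yields 1522482300000, the value used in the example above):

from datetime import datetime, timezone

# March 31, 2018 at 07:45 UTC, expressed as a UTC timestamp in milliseconds.
start = datetime(2018, 3, 31, 7, 45, tzinfo=timezone.utc)
start_date = int(start.timestamp() * 1000)  # 1522482300000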

HTTP Response

Examples of HTTP response

{
   "scheduled_crawl":{
      "config_id":"59f3048cc87b4428618d7c29",
      "id":"5abdeb0f03d27a69ef169c53",
      "project_id":"592c1e1cf2c3a42743d14350",
      "recurrence":"week",
      "start_date":1522482300000
   }
}

An HTTP 200 status code is returned with the created scheduled crawl returned directly as the response within a scheduled_crawl key.

Delete a scheduled crawl

Delete a scheduled crawl.

curl -X DELETE "https://app.oncrawl.com/api/v2/projects/<project_id>/scheduled_crawls/<scheduled_crawl_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

requests.delete("https://app.oncrawl.com/api/v2/projects/<project_id>/scheduled_crawls/<scheduled_crawl_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

No HTTP parameters.

HTTP Response

Returns an HTTP 204 status code if successful.

Crawls

Launch a crawl

Launch a crawl.

curl -X POST "https://app.oncrawl.com/api/v2/projects/<project_id>/launch-crawl?configId=<crawl_config_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

requests.post("https://app.oncrawl.com/api/v2/projects/<project_id>/launch-crawl?configId=<crawl_config_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

You have to pass a configId parameter in the query string with the ID of the crawl configuration you want to launch.

HTTP Response

Example of HTTP response

{
  "crawl": "<Crawl Object>"
}

Returns an HTTP 200 status code if successful with the created crawl returned directly as the response within a crawl key.

List crawls

Get list of crawls.

curl "https://app.oncrawl.com/api/v2/crawls" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

crawls = requests.get("https://app.oncrawl.com/api/v2/crawls",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Request

The crawls can be paginated and filtered using the parameters described in the pagination section.

The fields available for the sort and filters are:

Property Description
id The crawl’s ID.
user_id The crawl’s owner ID.
project_id The crawl’s project ID.
status The crawl’s status.
Can be running, done, cancelled, terminating, pausing, paused, archiving, unarchiving or archived.
created_at The crawl’s creation date as UTC timestamp in milliseconds.
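For instance, to list only the running crawls, most recent first, you can pass a rison-encoded OQL filter on status together with a sort (a sketch; the rison string is written by hand):

import requests

# OQL filter {"field": ["status", "equals", "running"]} encoded in rison.
filters = "(field:!(status,equals,running))"

running = requests.get(
    "https://app.oncrawl.com/api/v2/crawls",
    params={"filters": filters, "sort": "created_at:desc"},
    headers={"Authorization": "Bearer {ACCESS_TOKEN}"},
).json()["crawls"]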

HTTP Response

{
   "meta":{
      "filters":{},
      "limit":100,
      "offset":0,
      "sort":null,
      "total":1
   },
   "crawls": [
      "<Crawl Object>",
      "<Crawl Object>"
   ]
}

A JSON object with a meta key, described in the pagination section, and a crawls key containing the list of crawls.

Get a crawl

Get a crawl.

curl "https://app.oncrawl.com/api/v2/crawls/<crawl_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

crawl = requests.get("https://app.oncrawl.com/api/v2/crawls/<crawl_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Response

{
   "crawl": {
     "id":"5a57819903d27a7faa253683",
     "project_id":"592c1e1cf2c3a42743d14341",
     "user_id":"54dce0f264b65e1eef3ef61b",
     "link_status":"live",
     "status":"done",
     "created_at":1515684249000,
     "ended_at":1515685455000,
     "fetched_urls":10,  
     "end_reason":"max_url_reached",
     "features":[
        "at_internet"
     ],
      "crawl_config": "<CrawlConfig Object>",
      "cross_analysis_access_logs": null,
      "cross_analysis_at_internet": {
         "dates":{
            "from":"2017-11-26",
            "to":"2018-01-10"
         }
      },
      "cross_analysis_google_analytics": {
        "error": "No quota remaining."
      },
      "cross_analysis_majestic_back_links": {
         "stores":[
            {
               "name":"www.oncrawl.com",
               "success": true,
               "sync_date":"2017-10-27"
            }
         ],
         "tld":{
            "citation_flow":35,
            "name":"oncrawl.com",
            "trust_flow":29
         }
      }
   }
}

The HTTP response is a JSON object with a single crawl key containing the crawl’s data.

The crawl’s properties are:

Property Description
id The crawl ID.
project_id The crawl’s project ID.
user_id The crawl’s owner ID.
link_status The links index status.
Can be live or archived.
status The crawl’s status.
Can be running, done, cancelled, terminating, pausing, paused, archiving, unarchiving or archived.
created_at Date of the crawl’s creation, as a UTC timestamp in milliseconds.
ended_at Date of the crawl’s termination, as a UTC timestamp in milliseconds.
fetched_urls Number of URLs that were fetched for this crawl.
end_reason A code describing why the crawl stopped.
This value may not be present.
features List of features available for this crawl.
crawl_config The crawl configuration object used for this crawl.
cross_analysis_access_logs Dates used by the Logs monitoring cross analysis.
null if no cross analysis were done.
cross_analysis_at_internet Dates used by the AT Internet cross analysis.
null if no cross analysis were done.
cross_analysis_google_analytics Dates used by the Google Analytics cross analysis.
null if no cross analysis were done.
cross_analysis_majestic_back_links Majestic cross analysis metadata.
null if no cross analysis available.

End reasons

ok
All the URLs of the structure have been crawled.
crawl_already_running
A crawl with the same configuration was already running.
quota_reached_before_start
When a scheduled crawl could not run because of missing quota.
quota_reached
When the URL quota was reached during the crawl.
max_url_reached
When the maximum number of URLs defined in the crawl configuration was reached.
max_depth_reached
When the maximum depth defined in the crawl configuration was reached.
user_cancelled
When the crawl was manually cancelled.
user_requested
When the crawl was manually terminated and a partial crawl report was produced.
no_active_subscription
When no active subscription was available.
stopped_progressing
Technical end reason: at the end of the crawl, there are still unfetched URLs, but for some reason the crawler is unable to fetch them. To prevent the crawler from iterating indefinitely, we abort the fetch phase when, after three attempts, it still has not managed to crawl those pages.
max_iteration_reached
Technical end reason: the crawl progressed abnormally slowly. This can happen, for example, when the website’s server is very busy and randomly drops connections. We abort the fetch phase after 500 iterations when we detect this pathological server behavior.

Get a crawl's progress

Get a crawl's progress.

curl "https://app.oncrawl.com/api/v2/crawls/<crawl_id>/progress" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

progress = requests.get("https://app.oncrawl.com/api/v2/crawls/<crawl_id>/progress",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

You can call this endpoint for a running crawl in order to follow its progress.

It allows you, for example, to monitor whether the crawler encounters an abnormal number of errors.

HTTP request

This endpoint takes no parameters.

HTTP Response

{
   "progress":{
      "analysis":{
         "progress":0.0
      },
      "fetch":{
         "depth_progress":[
            {
               "cumulative_fetched":1,
               "depth":1,
               "statuses":{
                  "fetched_2xx":1,
                  "fetched_3xx":0,
                  "fetched_4xx":0,
                  "fetched_5xx":0,
                  "unfetched_exception":0,
                  "unfetched_robots_denied":0
               },
               "total_known_urls":1
            },
            {
               "cumulative_fetched":36,
               "depth":2,
               "statuses":{
                  "fetched_2xx":29,
                  "fetched_3xx":6,
                  "fetched_4xx":0,
                  "fetched_5xx":0,
                  "unfetched_exception":0,
                  "unfetched_robots_denied":0
               },
               "total_known_urls":36
            }
         ],
         "max_depth":5,
         "samples":{
            "fetched_2xx":[
               {
                  "fetch_date":1522400745000,
                  "fetch_duration":214,
                  "status_code":200,
                  "url":"https://www.oncrawl.com/"
               }
            ]
         },
         "total_fetched":127,
         "total_known_urls":127
      }
   }
}

The HTTP response is a JSON object with a single progress key containing the crawl’s progression.

The properties are:

Property Description
fetch.total_known_urls The total number of discovered URLs during the crawl.
fetch.total_fetched The total number of fetched URLs during the crawl.
fetch.max_depth The current crawler’s depth.
fetch.samples A list of URL samples per status.
It varies during the crawl and may not have a sample for a status.
fetch.depth_progress A detailed status per depth.
analysis.progress A decimal between 0.0 and 1.0 that gives the progression of the analysis.

Fetch statuses

fetched_2xx
Status code between 200 and 299
fetched_3xx
Status code between 300 and 399
fetched_4xx
Status code between 400 and 499
fetched_5xx
Status code between 500 and 599
unfetched_exception
Unable to fetch a URL (e.g. a server timeout).
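As a sketch of the monitoring mentioned above, the snippet below polls the progress endpoint while the crawl is running and totals the 4xx and 5xx statuses reported per depth (endpoints and keys as documented in this section; the 60-second interval is arbitrary):

import time
import requests

headers = {"Authorization": "Bearer {ACCESS_TOKEN}"}
base = "https://app.oncrawl.com/api/v2/crawls/<crawl_id>"

while requests.get(base, headers=headers).json()["crawl"]["status"] == "running":
    fetch = requests.get(base + "/progress", headers=headers).json()["progress"]["fetch"]

    # Total the 4xx/5xx counts reported for every depth reached so far.
    errors = sum(
        d["statuses"]["fetched_4xx"] + d["statuses"]["fetched_5xx"]
        for d in fetch["depth_progress"]
    )
    print(fetch["total_fetched"], "fetched,", errors, "errors")
    time.sleep(60)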

Update crawl state

HTTP request

Pause a running crawl

curl "https://app.oncrawl.com/api/v2/crawls/<crawl_id>/pilot" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
        "command": "pause"
    }
EOF
import requests

requests.post("https://app.oncrawl.com/api/v2/crawls/<crawl_id>/pilot", json={
      "command": "pause"
  },
  headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

You have to pass a JSON object with a command key containing the desired command.

The crawl’s commands are:

Command Description
cancel Cancel the crawl. It won’t produce a report.
Crawl must be running or paused.
resume Resume a paused crawl.
Crawl must be paused.
pause Pause a crawl.
Crawl must be running.
terminate Terminate a crawl early. It will produce a report.
Crawl must be running or paused.
unarchive Un-archive all crawl’s data.
Crawl must be archived or link_status must be archived.
unarchive-fast Un-archive crawl’s data except links.
Crawl must be archived.

HTTP Response

Example of HTTP response

{
  "crawl": "<Crawl Object>"
}

Returns an HTTP 200 status code if successful with the updated crawl returned directly as the response within a crawl key.

Delete a crawl

Delete a crawl.

curl -X DELETE "https://app.oncrawl.com/api/v2/crawls/<crawl_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

requests.delete("https://app.oncrawl.com/api/v2/crawls/<crawl_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

No HTTP parameters.

HTTP Response

Returns an HTTP 204 status code if successful.

Crawls Configurations

List configurations

Get list of crawl configurations.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

crawl_configs = requests.get("https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Request

The endpoint does not take any parameters.

HTTP Response

{
   "crawl_configs": [
      "<CrawlConfig Object>",
      "<CrawlConfig Object>"
   ]
}

A JSON object with a crawl_configs key containing the list of crawl configurations.

Get a configuration

Get a configuration.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs/<crawl_config_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

crawl_config = requests.get("https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs/<crawl_config_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
).json()

HTTP Response

{
  "crawl_config": {
       "agent_kind":"web",
       "ajax_crawling":false,
       "allow_query_params":true,
       "alternate_start_urls": [],
       "at_internet_params": {},
       "crawl_subdomains":false,
       "custom_fields":[],
       "dns": [],
       "extra_headers": {},
       "filter_query_params":false,
       "google_analytics_params":{},
       "google_search_console_params":{},
       "http_auth":{ },
       "id":"592c1f53973cb53b75287a79",
       "js_rendering":false,
       "majestic_params":{},
       "max_depth":15,
       "max_speed":10,
       "max_url":2000000,
       "name":"default",
       "query_params_list":"",
       "resource_checker":false,
       "reuse_cookies":false,
       "robots_txt":[],
       "scheduling_period":null,
       "scheduling_start_date":null,
       "scheduling_timezone":"Europe/Paris",
       "sitemaps":[],
       "start_url":"http://www.oncrawl.com/",
       "strict_sitemaps":true,
       "trigger_coc":false,
       "use_cookies":true,
       "use_proxy":false,
       "user_agent":"OnCrawl",
       "user_input_files":[],
       "watched_resources":[],
       "webhooks":[],
       "whitelist_params_mode":true
    }
}

The HTTP response is a JSON object with the crawl configuration inside a crawl_config key.

The crawl configuration’s base properties are:

Property Description
agent_kind The type of user agent.
Values are web or mobile.
ajax_crawling true if the website should be crawled as a pre-rendered JavaScript website, false otherwise.
allow_query_params true if the crawler should follow URL with query parameters, false otherwise.
alternate_start_urls List of alternate start URLs. All those URLs will start with a depth of 1.
They must all belong to the same domain.
at_internet_params Configuration for AT Internet cross analysis.
The AT Internet cross analysis feature is required.
crawl_subdomains true if the crawler should follow links of all the subdomains.
Example: http://blog.domain.com for http://www.domain.com.
custom_fields Configuration for custom fields scraping.
The Data Scraping feature is required.
dns Override the crawler’s default DNS.
extra_headers Defines additional headers for the HTTP requests done by the crawler.
filter_query_params true if the query string of URLs should be stripped.
google_analytics_params Configuration for the Google Analytics cross analysis.
The Google Analytics cross analysis feature is required.
google_search_console_params Configuration for the Google Search Console cross analysis.
The Google Search Console cross analysis feature is required.
http_auth Configuration for the HTTP authentication of the crawler.
id The ID of this crawl configuration.
js_rendering true if the crawler should render the crawled pages using JavaScript.
The Crawl JS feature is required.
majestic_params Configuration for the Majestic Back-links cross analysis.
The Majestic Back-Links feature is required.
max_depth The maximum depth after which the crawler will stop following links.
max_speed The maximum speed at which the crawler should go, in URLs per second. Valid values are 0.1, 0.2, 0.5, 1, 2, 5, then every multiple of 5 up to your maximum allowed crawl speed.
To crawl above 1 URL/s you need to verify the ownership of the project.
max_url The maximum number of fetched URLs after which the crawler will stop.
name The name of the configuration.
Only used as a label to easily identify it.
query_params_list If filter_query_params is true, this is a comma-separated list of query parameter names to filter. The whitelist_params_mode parameter defines how to filter them.
resource_checker true if the crawler should watch for requested resources during the crawl, false otherwise. This feature requires js_rendering:true.
reuse_cookies
deprecated
Not used anymore.
robots_txt List of configured virtual robots.txt.
The project’s ownership must be verified to use this option.
scheduling_period
deprecated
Not used anymore.
scheduling_start_date
deprecated
Not used anymore.
scheduling_timezone
deprecated
Not used anymore.
sitemaps List of sitemaps URLs.
start_url The start URL of the crawl.
This URL should not be a redirection to another URL.
strict_sitemaps true if the crawler should strictly follow the sitemaps protocol, false otherwise.
trigger_coc true if the crawler should automatically generate a Crawl over Crawl at the end.
The Crawl over Crawl feature is required.
use_cookies true if the crawler should keep the cookies returned by the server between requests, false otherwise.
use_proxy true if the crawler should use the OnCrawl proxy which allows it to keep a static range of IP addresses during its crawl.
user_agent Name of the crawler; this name will appear in the user agent sent by the crawler.
user_input_files List of ingested data files IDs to use in this crawl.
The Data Ingestion feature is required.
watched_resources List of patterns to watch if resource_checker is set to true.
webhooks List of webhooks to call during the crawl.
whitelist_params_mode true if the query_params_list should be used as a whitelist, false if it should be used as a blacklist.

Create a configuration

Create a crawl configuration.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
    {
        "crawl_config": {
            "name": "New crawl configuration",
            "start_url": "https://www.oncrawl.com",
            "user_agent": "OnCrawl",
            "max_speed": 1
        }
    }
EOF
import requests

requests.post("https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs", json={
        "crawl_config": {
            "name": "New crawl configuration",
            "start_url": "https://www.oncrawl.com",
            "user_agent": "OnCrawl",
            "max_speed": 1
        }
  },
  headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

The expected HTTP request is exactly the same format as the response when you retrieve a crawl configuration.

The id is automatically generated by the API for any new crawl configuration and must not be part of the payload.

The only required fields are name, start_url, user_agent and max_speed.
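For instance, one way to create a configuration is to copy an existing one, drop its server-generated id and give it a new unique name (a sketch; the IDs are placeholders):

import requests

headers = {"Authorization": "Bearer {ACCESS_TOKEN}"}
base = "https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs"

# Copy an existing configuration, remove the server-generated id and rename it.
config = requests.get(base + "/<crawl_config_id>", headers=headers).json()["crawl_config"]
config.pop("id", None)
config["name"] = "Copy of " + config["name"]

created = requests.post(base, headers=headers, json={"crawl_config": config}).json()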

HTTP Response

Examples of HTTP response

{
  "crawl_config": "<CrawlConfig Object>"
}

An HTTP 200 status code is returned with the created crawl configuration returned directly as the response within a crawl_config key.

AT Internet

{
  "at_internet_params": {
    "api_key": "YOUR_API_KEY",
    "site_id": "YOUR_SITE_ID"
  }
}

A subscription with the AT Internet feature is required to use this configuration.

You can request an API Key in API Accounts within the settings area of your AT Internet homepage.

This API Key is necessary to allow OnCrawl to access your AT Internet data.

The site_id specifies from which site we should collect the data.

The HTTP requests that you need to whitelist are:

Note: You must replace the {site_id} of both URLs with the actual site ID.

Without this, OnCrawl won’t be able to fetch the data.

Google Analytics

{
  "google_analytics_params": {
    "email": "local@domain.com",
    "account_id": "12345678",
    "website_id": "UA-12345678-9",
    "profile_id": "12345678"
  }
}

A subscription with the Google Analytics feature is required to use this configuration.

You have to provide the following properties:

Property Description
email Email of your Google account.
account_id ID of your Google Analytics account.
website_id ID of your website in Google Analytics.
profile_id ID of the website’s profile to use for cross analysis.

To use a Google Account you must first give access to your analytics data to OnCrawl using OAuth2.

For now you must use the OnCrawl web client to add your Google account.

Google Search Console

{
  "google_search_console_params": {
    "email": "local@domain.com",
    "websites": [
      "https://www.oncrawl.com"
    ],
    "branded_keywords": [
      "oncrawl",
      "on crawl",
      "oncrowl"
    ]
  }
}

A subscription with the Google Search Console feature is required to use this configuration.

You have to provide the following properties:

Property Description
email Email of your Google account.
websites List of the websites URLs from your Google Search Console to use.
branded_keywords List of keywords that the crawler should consider as part of a brand.

To use a Google Account you must first give access to your analytics data to OnCrawl using OAuth2.

For now you must use the OnCrawl web client to add your Google account.

Majestic

{
  "majestic_params": {
    "access_token": "ABCDEF1234"
  }
}

A subscription with the Majestic feature is required to use this configuration.

You have to provide the following properties:

Property Description
access_token An access token that the crawler can use to access your data.

You can create an access token authorizing OnCrawl to access your Majestic data from your Majestic account.

Custom fields

Documentation not available yet.

Webhooks

Documentation not available yet.

DNS

{
  "dns": [{
    "host": "www.oncrawl.com",
    "ips": [ "82.34.10.20", "82.34.10.21" ]
  }, {
    "host": "fr.oncrawl.com",
    "ips": [ "82.34.10.20" ]
  }]
}

The dns configuration allows you to resolve one or several domains to a different IP address than they normally would resolve to.

This can be useful to crawl a website in pre-production as if it was already deployed on the real domain.

Extra HTTP headers

{
  "extra_headers": {
    "Cookie": "lang=fr;",
    "X-My-Token": "1234"
  }
}

The extra_headers configuration allows you to inject custom HTTP headers to each of the crawl’s HTTP requests.

HTTP Authentication

{
  "http_auth": {
    "username": "user",
    "password": "1234",
    "scheme": "Digest",
    "realm": null
  }
}

The http_auth configuration allows you to crawl sites behind an authentication.

It can be useful to crawl a website in pre-production that is password protected before its release.

Property Description
username
required
Username to authenticate with.
password
required
Password to authenticate with.
scheme
required
How to authenticate. Available values are Basic, Digest and NTLM.
realm
optional
The authentication realm.
For NTLM this corresponds to the domain.

Robots.txt

{
  "robots_txt": [{
    "host": "www.oncrawl.com",
    "content": "CONTENT OF YOUR ROBOTS.TXT"
  }]
}

The robots_txt configuration allows you to override, for a given host, its robots.txt.

It can be used, for example, to test a new robots.txt before deploying it, or to let the crawler ignore rules of the live robots.txt.

Because you can make the crawler ignore the robots.txt of a website, it is necessary to verify the ownership of this project to use this feature.

For now you can only verify the ownership using the OnCrawl application.

Update a configuration

Update a crawl configuration.

curl "https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs" \
    -H "Authorization: Bearer {ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -X PUT \
    -d @- <<EOF
    {
        "crawl_config": {
            "name": "New crawl configuration",
            "start_url": "https://www.oncrawl.com",
            "user_agent": "OnCrawl",
            "max_speed": 1
        }
    }
EOF
import requests

requests.put("https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs", json={
        "crawl_config": {
            "name": "New crawl configuration",
            "start_url": "https://www.oncrawl.com",
            "user_agent": "OnCrawl",
            "max_speed": 1
        }
  },
  headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

It takes the same parameters as a crawl configuration creation, except that the name cannot be modified and must stay the same.

HTTP response

It returns the same response as a crawl configuration creation.

Delete a configuration

Delete a configuration.

curl -X DELETE "https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs/<crawl_config_id>" \
    -H "Authorization: Bearer {ACCESS_TOKEN}"
import requests

requests.delete("https://app.oncrawl.com/api/v2/projects/<project_id>/crawl_configs/<crawl_config_id>",
    headers={ 'Authorization': 'Bearer {ACCESS_TOKEN}' }
)

HTTP request

No HTTP parameters.

HTTP Response

Returns an HTTP 204 status code if successful.