Vision AI API Documentation

About Vision AI
Computer Vision API Documentation

asticaVision is a general-purpose computer vision model available through a REST API. It gives developers a comprehensive suite of image analysis and understanding capabilities to build into their own applications.


Process images and extract insights with the asticaVision API to automate image moderation, categorization, face recognition, and object detection.

REST API ‐ Examples
Computer Vision Code Samples
Endpoint
https://vision.astica.ai/describe
Basic CURL Usage
curl --location --request POST 'https://vision.astica.ai/describe' \
--header 'Content-Type: application/json' \
--data '{
  "tkn": "your_api_key",
  "modelVersion": "2.5_full",
  "input": "https://www.astica.org/inputs/analyze_3.jpg",
  "visionParams": "gpt, describe, describe_all, tags, objects",
  "objects_custom_kw": "",
  "gpt_prompt": "",
  "prompt_length": "90"
}'
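The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names `build_payload` and `describe` and the 60-second timeout are assumptions, while the endpoint and JSON field names are taken from the curl example above.

```python
import json
import urllib.request

# Endpoint copied from the curl example.
ENDPOINT = "https://vision.astica.ai/describe"


def build_payload(api_key, image, vision_params, model_version="2.5_full"):
    """Assemble the JSON body using the field names from the curl example."""
    return {
        "tkn": api_key,
        "modelVersion": model_version,
        "input": image,  # HTTPS link to an image, or a base64 string
        "visionParams": vision_params,
    }


def describe(payload, timeout=60):
    """POST the payload and return the decoded JSON response.

    The timeout is an assumed value, not a documented limit.
    """
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `describe(build_payload("your_api_key", "https://www.astica.org/inputs/analyze_3.jpg", "describe, tags"))`. Listing only the needed vision parameters keeps the request from defaulting to every capability.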
Vision Inputs
Input List
Vision API Inputs

Each call to the asticaVision API should specify the computer vision parameters you want to run. If you do not supply any parameters, the API defaults to all of them, and you will be billed for every one of those transactions on each call.

List of Astica Vision Inputs
tkn
string
Your astica API key.
modelVersion
string
Model versions can be one of the following:
  • 2.5_full
  • 2.1_full
  • Older models:
    • 1.0_full
    • 2.0_full
input
string
Supports a valid HTTPS link to an image file or a base64-encoded string.
  • Max Filesize: 20MB
  • Max Resolution: 16000x16000
  • Compatible File Types
    • .PNG
    • .JPG
    • Version 2.5 or higher:
      • .BMP
      • .JFIF
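For local files, the base64 form of `input` can be prepared along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, "20MB" is read as 20 × 1024 × 1024 bytes, and the string is sent as plain base64 without a data-URI prefix, which the text above does not specify.

```python
import base64
from pathlib import Path

# Assumption: the documented "Max Filesize: 20MB" is read as 20 MiB.
MAX_BYTES = 20 * 1024 * 1024


def encode_image(data: bytes) -> str:
    """Return the base64 text for raw image bytes, enforcing the size limit."""
    if len(data) > MAX_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_BYTES}")
    # Assumption: plain base64 with no "data:image/...;base64," prefix.
    return base64.b64encode(data).decode("ascii")


def load_image_input(path: str) -> str:
    """Read a local image file and return a value for the "input" field."""
    return encode_image(Path(path).read_bytes())
```

Checking the size before encoding avoids uploading a payload that the service would be expected to reject.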
visionParams
string
Comma-separated list of the vision parameters to run. If left blank, all available parameters are used.
gpt_prompt (optional)
string
Use a custom prompt to control the GPT-S description produced by Vision AI. Natural-language prompts can request a specific writing style, output language, and more.
  • Min characters: 8
  • Max characters: 150 (version 2.1 or lower)
  • Max characters: 325 (version 2.5 or higher)
  • Required vision parameter: gpt or gpt_detailed
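The prompt limits above can be checked on the client before a request is sent. The sketch below is illustrative only: `prompt_limit` and `check_gpt_prompt` are hypothetical helpers, versions below 2.5 are assumed to share the 150-character limit, and an empty prompt is treated as "no custom prompt", as in the curl example.

```python
def prompt_limit(model_version: str) -> int:
    """Return the maximum prompt length for a model version such as "2.5_full"."""
    number = model_version.split("_")[0]
    version = tuple(int(part) for part in number.split("."))
    # Assumption: every version below 2.5 uses the 150-character limit.
    return 325 if version >= (2, 5) else 150


def check_gpt_prompt(prompt: str, model_version: str, vision_params: str) -> None:
    """Raise ValueError if the custom prompt breaks a documented rule."""
    if prompt == "":
        return  # no custom prompt requested
    requested = {name.strip() for name in vision_params.split(",")}
    if not requested & {"gpt", "gpt_detailed"}:
        raise ValueError("gpt_prompt requires the gpt or gpt_detailed parameter")
    limit = prompt_limit(model_version)
    if not 8 <= len(prompt) <= limit:
        raise ValueError(f"prompt must be 8 to {limit} characters long")
```

For example, `check_gpt_prompt("Describe the image in French", "2.5_full", "gpt, tags")` passes silently, while a 200-character prompt with `2.1_full` raises `ValueError`.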
prompt_length (optional)
integer
Controls the ideal number of words returned in the GPT-S description. Defaults to 90 for "gpt" and 125 for "gpt_detailed".
  • Min value: 8
  • Max value: 125 (version 2.1 or lower)
  • Max value: 250 (version 2.5 or higher)
objects_custom_kw (optional)
string
Comma-separated string of keywords to detect. Used to annotate and detect bounding boxes for domain-specific or otherwise narrowly defined objects.
  • Max characters: 245
  • Required model version: 2.5 or higher
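A keyword list can be turned into the `objects_custom_kw` string roughly as follows. The helper name `build_custom_keywords` is hypothetical, and the ", " separator is an assumption modelled on the `visionParams` examples; the text only states that the value is comma-separated and at most 245 characters.

```python
def build_custom_keywords(keywords, max_chars=245):
    """Join keywords into one comma-separated string within the length limit."""
    cleaned = [word.strip() for word in keywords if word.strip()]
    for word in cleaned:
        if "," in word:
            # A comma inside a keyword would be read as a separator.
            raise ValueError(f"keyword contains a comma: {word!r}")
    joined = ", ".join(cleaned)
    if len(joined) > max_chars:
        raise ValueError(f"keyword string is {len(joined)} characters; limit is {max_chars}")
    return joined
```

Blank entries are dropped and surrounding whitespace is trimmed, so `build_custom_keywords(["hard hat", " safety vest ", ""])` yields `"hard hat, safety vest"`.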
Vision Parameters
Parameter List
Vision API Parameters

Each request to the API can contain one or more vision parameters, which determine which capabilities are engaged. Multiple Vision AI functions can be combined in a single API call.

If left blank, all available parameters will be used. For cost-effective usage, request only the vision parameters that are relevant to your use case.

Specifying Vision Parameters
Use a comma-separated string:
"visionParams": "gpt_detailed, describe, describe_all, faces, moderate",
List of Computer Vision Parameters
describe
Returns a caption which describes the image.
describe_all (v2.0 or higher)
Returns multiple auxiliary captions that describe the image.
text_read (v2.0 or higher)
Returns all text found in the image. OCR results include word-level bounding boxes.
gpt
Produces a detailed paragraph of roughly 90 words describing the image.
gpt_detailed
Produces a highly detailed paragraph with increased verbosity and improved logic and reasoning for custom prompting. Using this parameter increases the processing time of the request by several seconds.
faces
Returns the age and gender of all faces detected in the image.
objects
Returns the name and bounding box of detected objects.
objects_custom (v2.5 or higher)
Returns the bounding boxes of all custom objects listed in the user-provided objects_custom_kw.
objects_color (v2.5 or higher)
Returns the color codes of all detected objects (including custom objects).
moderate
Returns a flag and a score for each type of sensitive material (adult, racy, and gory content) checked in the image.
tags
Returns a list of descriptive terms which describe the image.
brands
Returns a list of brands that have been identified, for example a logo on a cup or a t-shirt.
celebrities
Returns a list of celebrities and other known persons that have been detected in the photo.
landmarks
Returns a list of known locations and landmarks found in the photo. For example, the Eiffel Tower.
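The parameter list above can be encoded as a small client-side check that builds the `visionParams` string. This is a sketch, not part of the API: the names `build_vision_params` and `MIN_VERSION` are hypothetical, parameters with no stated version are assumed to work on every model, and an empty list is rejected on purpose so that a request does not silently default to all parameters.

```python
KNOWN_PARAMS = {
    "describe", "describe_all", "text_read", "gpt", "gpt_detailed",
    "faces", "objects", "objects_custom", "objects_color", "moderate",
    "tags", "brands", "celebrities", "landmarks",
}

# Minimum model versions stated in the parameter list.
MIN_VERSION = {
    "describe_all": (2, 0),
    "text_read": (2, 0),
    "objects_custom": (2, 5),
    "objects_color": (2, 5),
}


def build_vision_params(params, model_version):
    """Validate parameter names against the model version and join them."""
    if not params:
        raise ValueError("an empty list would run (and bill) every parameter")
    number = model_version.split("_")[0]
    version = tuple(int(part) for part in number.split("."))
    for name in params:
        if name not in KNOWN_PARAMS:
            raise ValueError(f"unknown vision parameter: {name}")
        required = MIN_VERSION.get(name)
        if required is not None and version < required:
            raise ValueError(f"{name} is not available in model {model_version}")
    return ", ".join(params)
```

For instance, requesting `objects_custom` with `2.1_full` raises `ValueError`, while the five parameters from the example above pass with `2.5_full`.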
Vision Responses
Responses List
Vision API Responses

A successful request to Astica Vision AI returns a response. The fields it contains depend on the inputs and vision parameters provided in the request.

If the Vision response returns an error, more information can be found in the Vision API Errors section.

List of Vision Responses
caption
dict (object)
Primary image caption or alt-text of the image.
caption_list
list (array)
Additional captions describing the image.
caption_GPTS
string
Detailed long-form description of the image.
tags
list (array)
List of tags that describe the scene of the image.
colors
dict (object)
Primary and accent colors found within the image.
moderate
dict (object)
Ratings of any sensitive content that was found.
objects
list (array)
List of all detected objects and their bounding boxes.
objects_custom
list (array)
List of all detected custom objects and their bounding boxes.
colors_object
list (array)
List of color codes for each detected object in the image.
faces
list (array)
List of faces detected in the image, with age, gender, and bounding boxes.
categories
list (array)
List of general categories describing the image.
brands
list (array)
List of known brands and logos detected in the image.
landmarks
list (array)
List of known geographical landmarks found in the image.
celebrities
list (array)
List of known public figures found in the image.
astica
dict (object)
Request specification and transaction details.
Vision API Response Output

The following output was produced by submitting this image.

{
  "astica": {
    "request": "vision",
    "requestType": "analyze",
    "modelVersion": "2.5",
    "api_qty": 30.778
  },
  "caption": {
    "text": "a dog standing on its hind legs and a person standing on the sidewalk",
    "confidence": 1
  },
  "caption_GPTS": "In the image, we see a dog standing on its hind legs on a sidewalk, reaching out to a person standing nearby. The person is wearing black sneakers and jeans. The dog appears to be a German Shepherd, and it is interacting playfully with the person. The scene takes place in a park, with green grass and trees in the background. The person is extending their hand towards the dog, possibly offering a treat or engaging in a game of fetch with a frisbee. The interaction between the dog and the person is heartwarming and playful, showcasing a bond between human and animal.",
  "caption_list": [
    {
      "text": "a dog being held by a person in a park",
      "conf": 0.333,
      "bbox": {
        "w": 190,
        "h": 461,
        "x": 166,
        "y": 0
      },
      "guid_object": "0_9aca5"
    },
    {
      "text": "the dog's face is a man's hand and a dog's leg",
      "conf": 0.133,
      "bbox": {
        "w": 190,
        "h": 461,
        "x": 166,
        "y": 0
      },
      "guid_object": "0_9aca5"
    },
    {
      "text": "the image is a german shepherd dog and a man",
      "conf": 0.133,
      "bbox": {
        "w": 256,
        "h": 413,
        "x": 304,
        "y": 45
      },
      "guid_object": "1_fe687"
    },
    {
      "text": "a dog standing on its hind legs to reach a person's hand",
      "conf": 0.133,
      "bbox": {
        "w": 256,
        "h": 413,
        "x": 304,
        "y": 45
      },
      "guid_object": "1_fe687"
    },
    {
      "text": "a dog and a person playing with a frisbee",
      "conf": 0.133,
      "bbox": {
        "w": 256,
        "h": 413,
        "x": 304,
        "y": 45
      },
      "guid_object": "1_fe687"
    },
    {
      "text": "the shoe of a person standing on a skateboard",
      "conf": 0.133,
      "bbox": {
        "w": 124,
        "h": 67,
        "x": 223,
        "y": 394
      },
      "guid_object": "2_dbc99"
    },
    {
      "text": "a person's feet and shoes on a sidewalk",
      "conf": 0.133,
      "bbox": {
        "w": 124,
        "h": 67,
        "x": 223,
        "y": 394
      },
      "guid_object": "2_dbc99"
    },
    {
      "text": "a person wearing black shoes and jeans standing on a sidewalk",
      "conf": 0.133,
      "bbox": {
        "w": 124,
        "h": 67,
        "x": 223,
        "y": 394
      },
      "guid_object": "2_dbc99"
    },
    {
      "text": "the dog's paw is a woman's hand",
      "conf": 0.133,
      "bbox": null,
      "guid_object": ""
    },
    {
      "text": "a dog standing on its hind legs to get a treat from a person",
      "conf": 0.133,
      "bbox": null,
      "guid_object": ""
    },
    {
      "text": "a dog and a person standing on a sidewalk",
      "conf": 0.133,
      "bbox": null,
      "guid_object": ""
    },
    {
      "text": "the dog's face is the same as the human",
      "conf": 0.133,
      "bbox": null,
      "guid_object": ""
    }
  ],
  "objects": [
    {
      "name": "Person",
      "conf": 0.83,
      "guid_object": "0_9aca5",
      "confidence": 0.83,
      "rectangle": {
        "x": 166,
        "y": 0,
        "w": 190,
        "h": 461
      }
    },
    {
      "name": "Cat",
      "conf": 0.63,
      "guid_object": "1_fe687",
      "confidence": 0.63,
      "rectangle": {
        "x": 304,
        "y": 45,
        "w": 256,
        "h": 413
      }
    },
    {
      "name": "Sneakers",
      "conf": 0.54,
      "guid_object": "2_dbc99",
      "confidence": 0.54,
      "rectangle": {
        "x": 223,
        "y": 394,
        "w": 124,
        "h": 67
      }
    }
  ],
  "objects_custom": [],
  "tags": [
    {
      "name": "grass",
      "confidence": 0.87
    },
    {
      "name": "plant",
      "confidence": 0.8
    },
    {
      "name": "reed",
      "confidence": 0.7
    },
    {
      "name": "sky",
      "confidence": 0.71
    },
    {
      "name": "tree",
      "confidence": 0.84
    },
    {
      "name": "weed",
      "confidence": 0.67
    },
    {
      "name": "animal",
      "confidence": 0.72
    },
    {
      "name": "floor",
      "confidence": 0.73
    },
    {
      "name": "ledge",
      "confidence": 0.68
    },
    {
      "name": "log",
      "confidence": 0.65
    },
    {
      "name": "stand",
      "confidence": 0.88
    },
    {
      "name": "blanket",
      "confidence": 0.76
    },
    {
      "name": "fur",
      "confidence": 0.73
    },
    {
      "name": "pillow",
      "confidence": 0.65
    },
    {
      "name": "curtain",
      "confidence": 0.72
    },
    {
      "name": "dress shirt",
      "confidence": 0.7
    },
    {
      "name": "selfie",
      "confidence": 0.6
    },
    {
      "name": "tie",
      "confidence": 0.84
    },
    {
      "name": "woodpecker",
      "confidence": 0.51
    },
    {
      "name": "brown",
      "confidence": 0.84
    },
    {
      "name": "brown bear",
      "confidence": 0.66
    },
    {
      "name": "enclosure",
      "confidence": 0.7
    },
    {
      "name": "stone",
      "confidence": 0.79
    },
    {
      "name": "stare",
      "confidence": 0.66
    },
    {
      "name": "walk",
      "confidence": 0.82
    },
    {
      "name": "german shepherd",
      "confidence": 0.67
    },
    {
      "name": "black",
      "confidence": 0.86
    },
    {
      "name": "catch",
      "confidence": 0.83
    },
    {
      "name": "sheepdog",
      "confidence": 0.72
    },
    {
      "name": "dog",
      "confidence": 1
    },
    {
      "name": "man",
      "confidence": 0.86
    },
    {
      "name": "park",
      "confidence": 0.89
    },
    {
      "name": "paw",
      "confidence": 0.76
    },
    {
      "name": "play",
      "confidence": 0.81
    },
    {
      "name": "shepherd",
      "confidence": 0.79
    },
    {
      "name": "woman",
      "confidence": 0.87
    }
  ],
  "colors": {
    "dominantColors": [
      "#dccbcb",
      "#2e2729",
      "#92766a"
    ],
    "accentColor": "#1c181b",
    "list": [
      "#b89789",
      "#93756a",
      "#1c181b",
      "#ece0e3",
      "#403737",
      "#ccb6b4",
      "#6d584b"
    ]
  },
  "colors_object": [
    {
      "guid_object": "0_9aca5",
      "primary": [
        "#9d8a78",
        "#282529",
        "#ddd6d9"
      ],
      "accent": "#161418",
      "list": [
        "#3a363a",
        "#bea48d",
        "#7c7163",
        "#ddd6d9",
        "#161418"
      ]
    },
    [
      {
        "guid_object": "1_fe687",
        "primary": [
          "#b1957f",
          "#443932",
          "#e9dcd9"
        ],
        "accent": "#282120",
        "list": [
          "#e9dcd9",
          "#282120",
          "#957e6b",
          "#605144",
          "#cdac93"
        ]
      }
    ],
    [
      {
        "guid_object": "2_dbc99",
        "primary": [
          "#433e37",
          "#af987d",
          "#f2dfd7"
        ],
        "accent": "#2a2727",
        "list": [
          "#c5ab92",
          "#5c5547",
          "#9a8569",
          "#f2dfd7",
          "#2a2727"
        ]
      }
    ]
  ],
  "categories": [],
  "faces": [],
  "brands": [],
  "landmarks": [],
  "celebrities": [],
  "readResult": {
    "stringIndexType": "TextElements",
    "content": "",
    "pages": [
      {
        "height": 486,
        "width": 729,
        "angle": 0,
        "pageNumber": 1,
        "words": [],
        "spans": [
          {
            "offset": 0,
            "length": 0
          }
        ],
        "lines": []
      }
    ],
    "styles": []
  },
  "moderate": {
    "isAdultContent": false,
    "isRacyContent": false,
    "isGoryContent": false,
    "adultScore": 0.01,
    "racyScore": 0.01,
    "goreScore": 0
  },
  "metadata": {
    "width": 729,
    "height": 486,
    "format": "jpg"
  },
  "GPT_level": 0,
  "status": "success"
}
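In the output above, entries in `caption_list` and `colors_object` share a `guid_object` value with entries in `objects`, and `colors_object` mixes bare objects with single-item lists. The sketch below shows one way to join these fields on the client. The helper names are hypothetical, the `sample` dictionary is a trimmed excerpt of the output above, and treating `guid_object` as a join key is an inference from that output rather than a documented guarantee.

```python
from collections import defaultdict


def captions_by_object(response):
    """Attach each caption to the detected object that shares its guid_object."""
    grouped = defaultdict(list)
    for caption in response.get("caption_list", []):
        guid = caption.get("guid_object")
        if guid:  # captions with an empty guid describe the whole image
            grouped[guid].append(caption["text"])
    return [
        {
            "name": obj["name"],
            "rectangle": obj["rectangle"],
            "captions": grouped.get(obj["guid_object"], []),
        }
        for obj in response.get("objects", [])
    ]


def flatten_object_colors(response):
    """Return colors_object as a dict keyed by guid_object.

    Entries may be a dict or a list of dicts, as in the sample output.
    """
    colors = {}
    for entry in response.get("colors_object", []):
        items = entry if isinstance(entry, list) else [entry]
        for item in items:
            colors[item["guid_object"]] = item
    return colors


# Trimmed excerpt of the sample output above.
sample = {
    "caption_list": [
        {"text": "a dog being held by a person in a park", "guid_object": "0_9aca5"},
        {"text": "the dog's paw is a woman's hand", "guid_object": ""},
    ],
    "objects": [
        {
            "name": "Person",
            "guid_object": "0_9aca5",
            "rectangle": {"x": 166, "y": 0, "w": 190, "h": 461},
        }
    ],
    "colors_object": [
        {"guid_object": "0_9aca5", "accent": "#161418"},
        [{"guid_object": "1_fe687", "accent": "#282120"}],
    ],
}

people = captions_by_object(sample)
object_colors = flatten_object_colors(sample)
```

Normalizing `colors_object` up front means later code can look up colors by object without checking the shape of each entry.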
Vision Errors
Error Handling
Vision API Errors

If the API encounters an issue while processing a request, it returns an error response instead of the usual output.

Vision API Error Format
{
    "status": "error",
    "error": "Unable to handle asticaVision v2.1_full request."
}
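Client code can check the `status` field before reading any other part of the response. The following is a minimal sketch: `VisionAPIError` and `unwrap` are hypothetical names, and only the `status` and `error` fields shown above are assumed to exist.

```python
class VisionAPIError(Exception):
    """Raised when a response carries "status": "error"."""


def unwrap(response: dict) -> dict:
    """Return the response unchanged, or raise if it reports an error."""
    if response.get("status") == "error":
        raise VisionAPIError(response.get("error", "unknown error"))
    return response
```

Wrapping every decoded response in `unwrap` keeps error handling in one place, so parsing code only ever sees successful responses.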
More About Vision AI
asticaVision Features

1. Face Detection (Age and Gender):
asticaVision API detects and analyzes faces within images. In addition to detecting the presence of faces, it estimates the age and gender of each individual, providing demographic information for targeted marketing or personalized user experiences.

2. Object Detection:
The API identifies a wide range of objects within images, so users can check for the presence of specific items or elements. Applications include inventory management, surveillance, and visual search.

3. Image Tagging and Categorization:
asticaVision API uses machine learning techniques to automatically generate descriptive tags and assign appropriate categories to images based on their visual content. This feature simplifies the process of organizing and indexing large image databases, streamlining the search and retrieval of relevant images based on specific keywords or themes.

4. Content Moderation:
The API scores images for adult, racy, and gory content so that applications can automatically detect and filter out inappropriate material. This reduces the manual effort required to moderate visual content on websites, social platforms, and digital applications.

5. Automatic Image Description and Captioning:
Leveraging its advanced computer vision capabilities, asticaVision API can automatically generate descriptive text and captions for images, providing an accurate and concise summary of the visual content. This feature not only helps in enhancing the accessibility of images for visually impaired users but also aids in improving the overall user experience by offering meaningful and contextual information about the images.
