Main Content

sqlStatistics

Retrieve statistics about database table or SQL query

Since R2025a

    Description

    databaseStats = sqlStatistics(dataSourceInfo,source) returns a table of statistics about a database table or an SQL query by using information about the data source contained in dataSourceInfo, which can be either a connection object or a data source name. By default, sqlStatistics returns the minimum and maximum values. source contains the name of a database table or an SQL query.

    example

    databaseStats = sqlStatistics(dataSourceInfo,source,Name=Value) specifies additional options using one or more name-value arguments. For example, you can return the unique values in a database column by specifying StatsInfo="uniquevalues".

    example

    Examples

    collapse all

    This example assumes you have already stored your username and password with your data source in the Database Explorer app.

    First, specify the name of the data source and a database query. Then, retrieve the minimum and maximum values for each variable.

    dataSourceInfo = "PostgreSQL Native";
    source = "SELECT * FROM nyctaxi";
    databaseStats = sqlStatistics(dataSourceInfo,source)
    databaseStats =
    
      19×4 table
    
             VariableName          VariableType              Min                         Max           
        _______________________    ____________    ________________________    ________________________
    
        "vendorid"                  "double"       {[                   1]}    {[                   2]}
        "tpep_pickup_datetime"      "datetime"     {[01-Jun-2015 12:20:59]}    {[11-Jun-2015 05:32:45]}
        "tpep_dropoff_datetime"     "datetime"     {[01-Jun-2015 12:21:17]}    {[03-Nov-2016 15:34:38]}
        "passenger_count"           "double"       {[                   0]}    {[                   9]}
        "trip_distance"             "double"       {[                   0]}    {[            10083318]}
        "pickup_longitude"          "double"       {[            -87.8606]}    {[             85.0471]}
        "pickup_latitude"           "double"       {[                   0]}    {[             55.3178]}
        "ratecodeid"                "double"       {[                   1]}    {[                  99]}
        "store_and_fwd_flag"        "string"       {["N"                 ]}    {["Y"                 ]}
        "dropoff_longitude"         "double"       {[            -84.8394]}    {[                   0]}
        "dropoff_latitude"          "double"       {[                   0]}    {[            480.2500]}
        "payment_type"              "double"       {[                   1]}    {[                   5]}
        "fare_amount"               "double"       {[                -300]}    {[                 940]}
        "extra"                     "double"       {[             -9.6100]}    {[             20.2000]}
        "mta_tax"                   "double"       {[             -0.5000]}    {[             20.5000]}
        "tip_amount"                "double"       {[                 -80]}    {[            980.9100]}
        "tolls_amount"              "double"       {[             -5.5400]}    {[            900.9700]}
        "improvement_surcharge"     "double"       {[             -0.3000]}    {[              0.3000]}
        "total_amount"              "double"       {[                -300]}    {[            990.7100]}
    

    Specify the name of the data source, the SQL query, and the name of the database column that you want to return the unique values for. Then, retrieve the values by using the StatsInfo name-value argument.

    dataSourceInfo = "PostgreSQL Native";
    source = "SELECT * FROM nyctaxi";
    columnName = "payment_type";
    databaseStats = sqlStatistics(dataSourceInfo,source,VariableNames=columname,StatsInfo="unique")
    databaseStats =
    
      1×3 table
    
         VariableName     VariableType       Unique    
        ______________    ____________    _____________
    
        "payment_type"      "double"      {[2 1 4 3 5]}
    

    Input Arguments

    collapse all

    Data source information, specified as a connection object, string scalar, or character vector. Specifying data source information depends on the connection method:

    • Specify dataSourceInfo as connection object by using the database function and retrieving your credentials using getSecret. This method assumes you have not stored your user credentials with the data source when using the Database Explorer app. Supported connection objects include:

      • database.odbc.connection

      • database.jdbc.connection

      • database.mysql.connection

      • database.postgre.connection

      • sqlite

    • Specify dataSourceInfo as a character vector or string scalar if you have already created your data source and saved your user credentials with the Database Explorer app. Supported data sources include:

      • ODBC

      • JDBC

      • MySQL

      • PostgreSQL

    Source, specified as a string scalar or character vector, contains the database table name or SQL query.

    Name-Value Arguments

    collapse all

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: StatsInfo="numunique"

    Variable names in the database, specified as a string array, character vector, or cell array. You can return all variables by setting VariableNames="all".

    Statistics information, specified as a string array, character vector, or cell array. You can choose from the following values for StatsInfo:

    • "min" — Minimum value

    • "max" — Maximum value

    • "nummissing" — Number of missing values

    • "numunique" — Number of unique values

    • "unique" — List of unique values

    • "all" — All statistics

    If you do not specify StatsInfo, sqlStatistics returns the minimum and maximum values by default.

    Note

    Setting StatsInfo="all" can be time consuming for large data sets. For improved efficiency, choose the minimal set of values that you need.

    Output Arguments

    collapse all

    Database statistics, returned as a table containing the columns VariableName, VariableType, Min, and Max by default. Instead of returning the default Min and Max columns, you can use the StatsInfo name-value argument to return any subset of the following columns:

    • Min

    • Max

    • NumMissing

    • NumUnique

    • Unique

    Version History

    Introduced in R2025a