Big Data with MATLAB

Explore, analyze, and develop predictive models on big data

MATLAB simplifies working with big data: it integrates with your existing big data storage and adapts to your data processing needs based on the resources available.

With MATLAB, you can:

  • Access big data from a variety of storage systems, such as traditional file systems, cloud-based storage (AWS® S3, Azure® Blob), SQL and NoSQL databases, and data platforms
  • Clean, analyze, and develop machine learning models on smaller sample data
  • Scale up and apply the same code on big data without rewriting your algorithms
  • Use processing power tailored to your needs, from your local machine to traditional HPC clusters, Spark™ clusters, and cloud data platforms

“High-performance computing with MATLAB enables us to process previously unanalyzed big data. We translate what we learn into an understanding of how human activities affect the health of ecosystems to inform responsible decisions about what humans do in the ocean and on land.”

Access Data

You can use MATLAB to read data from large collections of files, databases, data platforms, and cloud storage systems. Datastores in MATLAB let you access data that does not fit into the memory of a single computer or that is distributed across multiple files. These datastores support various file formats (CSV, Parquet, MDF, etc.) and storage systems (AWS S3, Azure Blob, HDFS, databases, data platforms). You can also create your own datastores for custom file formats.
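As a minimal sketch of the datastore idea, the example below reads a folder of CSV files in small chunks instead of loading everything at once. The file names, variable names, and the tiny `ReadSize` are assumptions chosen for illustration; two small files in a temporary folder stand in for a large collection.

```matlab
% Create two small CSV files in a temporary folder (illustrative data only).
folder = tempname;
mkdir(folder);
writetable(table((1:3)', [10; 20; 30], 'VariableNames', {'id', 'value'}), ...
    fullfile(folder, 'part1.csv'));
writetable(table((4:6)', [40; 50; 60], 'VariableNames', {'id', 'value'}), ...
    fullfile(folder, 'part2.csv'));

% The datastore presents all matching files as one logical table.
ds = tabularTextDatastore(fullfile(folder, '*.csv'));
ds.ReadSize = 2;   % read at most 2 rows per call (small only for the demo)

% Process the data one chunk at a time; only one chunk is in memory at once.
total = 0;
numRows = 0;
while hasdata(ds)
    chunk = read(ds);                      % a table with at most ReadSize rows
    total = total + sum(chunk.value);
    numRows = numRows + height(chunk);
end

fprintf('Rows read: %d, sum of value: %d\n', numRows, total);
```

The same loop applies when the location points to remote storage rather than a local folder, provided the storage system is one the datastore supports and credentials are configured.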

Illustration shows that you can access data from CSV files, Parquet files, spreadsheets, databases, data platforms such as Databricks, Domino, and Cloudera, and cloud storage such as AWS, Azure, and more.
List of functions that clean, prepare, explore, analyze, and develop machine learning models on big data.

Explore, Clean, Transform, and Develop Predictive Models

With MATLAB, you can perform data analysis and data engineering on big data efficiently. MATLAB supports predicate pushdown for Parquet files, so you can filter big data at the source and read only the rows you need. Once the data is read, you can transform and combine data from different datastores for preprocessing and data engineering.
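The sketch below illustrates both ideas on a small Parquet file: a row filter that is applied while reading, and a transformation applied to a datastore. It assumes a release that provides `rowfilter` and the `RowFilter` option of `parquetread` (R2022a or later, to the best of our knowledge); the file name and variable names are hypothetical.

```matlab
% Write a small Parquet file to a temporary location (illustrative data only).
file = [tempname '.parquet'];
T = table((1:6)', [5; 15; 25; 35; 45; 55], 'VariableNames', {'id', 'value'});
parquetwrite(file, T);

% Build a filter on the "value" variable and apply it during the read,
% so that only matching rows are returned.
rf = rowfilter("value");
filtered = parquetread(file, "RowFilter", rf.value > 30);

% Transform a datastore: the function runs on each block as it is read.
pds = parquetDatastore(file);
tds = transform(pds, @(t) addvars(t, t.value * 2, 'NewVariableNames', 'doubled'));
out = readall(tds);

disp(filtered);
disp(out);
```

Datastores with matching numbers of reads can be paired in a similar way with `combine`, which is omitted here to keep the sketch short.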

MATLAB tall arrays use a lazy evaluation framework, which lets you run in-memory table and timetable-based code on big data without rewriting it. Operations on tall arrays are deferred until you request a result, so MATLAB can combine them and minimize the number of passes through the data. Tall arrays support hundreds of data manipulation, mathematical, statistical, and machine learning functions, which you can use for simple statistical analysis or for developing predictive models on big data.
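A minimal sketch of lazy evaluation with tall arrays follows. The data, file names, and variable names are made up for illustration, and `mapreducer(0)` is used on the assumption that running in the local MATLAB session is preferred for a small demo.

```matlab
% Run tall-array calculations in the local MATLAB session (no parallel pool).
mapreducer(0);

% Create two small CSV files to stand in for a large data set.
folder = tempname;
mkdir(folder);
writetable(table((1:3)', [10; 20; 30], 'VariableNames', {'id', 'value'}), ...
    fullfile(folder, 'part1.csv'));
writetable(table((4:6)', [40; 50; 60], 'VariableNames', {'id', 'value'}), ...
    fullfile(folder, 'part2.csv'));

% Wrap the datastore in a tall table; no data is read at this point.
ds = tabularTextDatastore(fullfile(folder, '*.csv'));
tt = tall(ds);

% These lines look like ordinary table code but only queue up the work.
avgValue = mean(tt.value);
maxValue = max(tt.value);

% gather triggers the evaluation; requesting both results together lets
% MATLAB compute them in a shared pass over the data.
[avgValue, maxValue] = gather(avgValue, maxValue);

fprintf('Mean: %g, max: %g\n', avgValue, maxValue);
```

Calling `gather` once for several results, rather than once per result, is the usual way to benefit from the deferred evaluation.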

Integrate and Run on Your Big Data IT Infrastructure

MATLAB can help you process big data efficiently by integrating with your existing infrastructure. You can scale up and run your MATLAB code interactively using parallel processing, as well as in deployed production mode. You can deploy analytics in streaming and batch applications royalty-free. You can also run your MATLAB code and models with big data on cloud data platforms such as Databricks, Domino Data Lab, and Google® BigQuery.
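One way to picture the scaling step is that the analysis code stays the same and only the execution environment changes. The sketch below assumes Parallel Computing Toolbox may or may not be installed and falls back to the local session when no pool is available; the in-memory vector is a stand-in for real data, and cluster or platform setup is outside its scope.

```matlab
% Choose where tall-array calculations run. The checks below are one possible
% way to detect Parallel Computing Toolbox, not the only one.
hasParallel = license('test', 'Distrib_Computing_Toolbox') && ...
    exist('parpool', 'file') == 2;

pool = [];
if hasParallel
    pool = gcp;   % current pool, or a new default pool if preferences allow it
end

if ~isempty(pool)
    mapreducer(pool);   % run tall-array work on the parallel pool workers
else
    mapreducer(0);      % run in the local MATLAB session
end

% The analysis itself does not depend on the choice made above.
x = tall((1:1000)');
sumOfSquares = gather(sum(x.^2));

fprintf('Sum of squares: %d\n', sumOfSquares);
```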

Illustration shows a person running MATLAB code on Databricks, Domino, AWS, Azure and other big data systems.