Motivation
One of the most critical pieces of the puzzle that goes into optimizing oil and gas field productivity is the analysis of historical production data. Robust production data analysis techniques, such as decline curve analysis (DCA), assist production engineers with assessing well productivity-related metrics such as decline rates, average oil and gas production rates, and cumulative production. More importantly, these techniques enable forecasting oil and gas production to accurately evaluate the current financial performance and long-term viability of an asset.
Production data analysis techniques rely on mathematical models based on well performance parameters to fit historical oil and gas production data via regression analysis. DCA, for instance, uses mathematical functions that exhibit exponential-like decline. The Arps equation is arguably the most popular and broadly adopted DCA approach in the oil and gas industry. Like most DCA models, Arps brings several important advantages over more complex techniques like rate-transient analysis. First, DCA is computationally inexpensive and easy to implement. Second, DCA features a small number of parameters, so regression results are easier to interpret. Third, DCA can handle large data sets, such as dense historical production records from single or multiple wells. However, just like with any other model, the quality of input data is critical to obtaining a physically valid DCA model for data analysis, production forecasting, and petroleum economics.
In this paper, we explain how to use MATLAB^{®} to develop and deploy a DCA-based data analysis workflow for oil and gas production data. We discuss how to format and preprocess historical production data sets, create custom regression models, generate a production forecast, and perform a petroleum economics analysis. Our main objective is to demonstrate how MATLAB can help encapsulate and address complexities involved in the workflow by following simple yet robust software development practices. For demonstration purposes, we will use historical oil and gas production reports publicly available on the Texas Railroad Commission (TRRC) website.
Production Data Analysis Web App
We will describe the main steps of the oil and gas production data analysis workflow in MATLAB, along with software development best practices useful for designing more reliable and maintainable applications for oil and gas production.
Even though our app is deployed online, end users can interact with the web app in the same way as any other traditional desktop application. Figure 1 shows the Import tab in the app for dragging and dropping historical production records. Once a data set is added, the app performs various data preprocessing operations and plots the resulting data. The processes associated with data import and data preprocessing are explained in detail in the Data Import and Data Preprocessing sections, respectively.
After importing the data, users can perform DCA regression analysis. Figure 2 shows the Regression tab in the app to adjust the regression settings and choose whether to use average production rate or cumulative production data of a particular fluid phase, either oil or gas, to fit the model. Note that the app also allows the user to pick the start and end points of the regression window, making it possible to create DCA regressions for specific timeframes of interest, for instance, to assess production behavior before and after a well intervention operation. Once all parameters are defined, users can click on the Fit Data option for the underlying optimization algorithm to estimate the set of DCA parameters that best fit the data set. The app then displays a summary of the parameters used along with the coefficient of determination (R^{2}) for each fluid phase. We provide more details of the DCA process in the Decline Curve Analysis section.
To create an oil and gas output projection based on the DCA model, users only need to define how many years ahead to forecast, and the app calculates average production rates and total production for both fluid phases separately. As seen in Figure 3, the app allows the user to specify the forecast time in years using a slider or a numeric text field. As the preferred forecast time is modified, the average output rate and cumulative production graphs are automatically updated.
Performing petroleum economic analysis based on production forecast data is the final stage of this production data analysis workflow. Petroleum economics is used by asset managers to make well-informed decisions at every stage, from field-scale analysis to individual well analysis. Once a new forecast is created in the app, the data is instantly transferred to the Economics tab, where the user may modify parameters such as taxes, capital expenditure (CAPEX), operating expenses (OPEX), and oil and gas prices. Next, the app automatically updates the net present value (NPV) curve on the graph by internally computing the NPV, breakeven point, recession point (if applicable), internal rate of return, and return on investment. Figure 4 shows an example NPV curve that crosses from negative to positive (breakeven point), achieves a maximum value (maximum NPV), and then starts to decay, suggesting that the well will be too expensive to operate from that point forward. The app also shows the highest predicted NPV along with an estimated date of realization. We discuss these implementation details in the Economic Analysis section.
Let’s review some software development capabilities in MATLAB used in the web app we developed to execute the oil and gas production data analysis workflow.
Software Architecture
You probably noticed in the previous section that the app’s components exhibit a linear relationship, where one component’s input depends on another component’s output. Note, for example, how the DCA tab depends on the data that the Import tab provides, which, in turn, depends on the data set that the user provides. The Economics tab is dependent on the production forecast supplied by the Forecast tab, just as the Forecast tab is dependent on the DCA model provided by the DCA tab. This is a classic example of the pipes and filters software architecture pattern. Given this well-known software architecture, we base the application logic on its design pattern concepts. Table 1 describes the components, or filters, along with their responsibilities, inputs, and outputs.
| Component | Responsibility | Inputs | Outputs |
|---|---|---|---|
| Data Import | Parse TRRC production records | CSV file | Raw data table |
| Data Preprocessing | Clean the data and compute derived variables | Raw data table | Preprocessed table |
| Decline Curve Analysis | Fit DCA regression models | Preprocessed table | DCA regression parameters |
| Production Forecasting | Project future production | DCA parameters, forecast time | Forecast table |
| Economic Analysis | Compute financial metrics | Forecast table, financial parameters | NPV curve and economic indicators |
Apart from these components, we established two overarching modules: Analysis and Results. These modules group components with closely related responsibilities. Figure 5 depicts the software architecture under discussion.
The pipes and filters architecture offers several advantages, such as the ability to break complex workflows into independent, cooperative components, making the software easy to debug and maintain. Additionally, you can expand the functionality of your program by adding new components; the only requirement is that new components expose application programming interfaces (APIs) compatible with your pipes or, more formally, software connectors. One way to extend your Results subsystem, for example, would be to include a Report component that uses MATLAB Report Generator™ to generate reports designed specifically with production engineers or asset managers in mind.
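In code, the pipes and filters pattern amounts to composing single-responsibility functions so that each filter's output feeds the next filter's input. The Python sketch below is a toy illustration of the idea only; all filter names and data values are ours, not the app's, and the real components are MATLAB functions.

```python
from functools import reduce

# Hypothetical filters mirroring the app's components; each one consumes
# the previous stage's output and returns its own enriched result.
def data_import(path):
    return {"source": path, "raw": [10, None, 8]}

def preprocess(d):
    return {**d, "clean": [x for x in d["raw"] if x is not None]}

def dca(d):
    return {**d, "params": {"qi": d["clean"][0]}}

def forecast(d):
    return {**d, "forecast": [d["params"]["qi"] * 0.9]}

def economics(d):
    return {**d, "npv": sum(d["forecast"]) * 80}

def pipeline(*filters):
    """Connect filters with 'pipes': the output of one feeds the next."""
    return lambda x: reduce(lambda acc, f: f(acc), filters, x)

analyze = pipeline(data_import, preprocess, dca, forecast, economics)
result = analyze("sampleWellData.csv")
```

Because each filter only depends on its input's shape, a new component (such as a Report filter) can be spliced into the chain without touching the others.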
Sample Production Data
Using the TRRC’s Public GIS Viewer tool, we generated a population of 200 wells that were randomly picked throughout the Eagle Ford Shale’s black oil window in South Texas. Next, we created the bubble plot for cumulative oil production depicted in Figure 6 using Mapping Toolbox™.
The average daily oil output from Karnes County wells from this population, producing oil from the Lower Eagle Ford formation, is displayed in Figure 7. Note the decline in daily oil production rate, which closely resembles a power-law decay, exactly what Arps DCA is intended to describe.
For the production data analysis method covered in this article, we randomly selected a well from the full population of wells previously mentioned. To maintain its anonymity, we will simply refer to this well as the “sample well” throughout this article. Well metadata such as operator, location, etc., will not be disclosed.
Production Data Analysis Workflow
The previous section laid the foundation for the production data analysis workflow adopted in our web app. This section concentrates on the software implementation of the components mentioned in the Software Architecture section, emphasizing how to use MATLAB to construct the functionality required for every component. Assuming users have a fundamental knowledge of computer programming, we have included links in this paper to additional resources with further details on these steps.
The workflow is divided into five different subsections. In Data Import, we describe the process of importing TRRC data sets into MATLAB and some of the challenges associated with this specific type of data structure. In Data Preprocessing, we explore the different options available in MATLAB for data cleaning and outlier detection and discuss how to automate this process. The definition of bespoke regression models in MATLAB, such as Arps DCA, and their application for production forecasting are covered in the subsections on Decline Curve Analysis and Production Forecasting, respectively. Finally, we cover how to utilize userdefined economic indicators to get financial insights from oil and gas production forecasts in Economic Analysis.
Data Import
Working with typical data types such as Excel^{®} spreadsheets, comma-separated value (CSV) files, image files, audio and video files, and many others is made easier by the robust data import capabilities of MATLAB. Although we focus on locally stored data, we want to highlight that MATLAB comes with built-in interfaces to cloud services such as Amazon^{®} Web Services, Microsoft^{®} Azure^{®}, and Google Cloud Platform™ (see Using MATLAB and Simulink in the Cloud), as well as relational and NoSQL databases via Database Toolbox™.
Preparing a Sample Well Data Set
Our sample well’s oil and gas production record is kept in a local CSV file that follows the TRRC template’s formatting. Although tables from CSV files can be read and imported into MATLAB directly, this specific template comes with some difficulties. First, there is a mixture of numerical data (such as net oil and gas output) and metadata (including the operator’s name, location, etc.). Second, certain data points are missing and are labeled as NO RPT. Finally, the datetime format is unusual in that it does not indicate the exact day that a particular report was recorded. A screenshot of the sample well’s data set is displayed in Figure 8.
Despite these difficulties, MATLAB is still able to process this data set through the Import Tool. This is a graphical application that allows you to further customize the import parameters by indicating the data types that correspond to each column, defining the data range of interest, and selecting whether to include rows that have missing data entries.
Here, we began by selecting Number as the appropriate data type and highlighting the oil and gas production data from the table. Next, we defined a custom datetime type for the values in Column 1 for MATLAB to correctly read and process report dates. After doing this, we renamed Columns 1, 2, and 4 as reportDates, netOil, and netGas, respectively.
Creating a Custom Import Function
After following the necessary steps to prepare and correctly import the sample well’s data set, we used code generation features in MATLAB to create the function wellDataParser, which automatically fulfills the data preparation steps described earlier. The wellDataParser function can be used to import TRRC data sets from CSV files directly from the MATLAB command line. This function returns a table containing raw data that can be used in the analysis that follows, just as described in the Software Architecture section.
Since wellDataParser is created as a white-box function, you can modify it to support more functionality. We, for example, calculated the last day of each report month and updated all report dates in reportDates accordingly, as shown in Figure 9.
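While wellDataParser itself is MATLAB code generated by the Import Tool, the same parsing logic can be sketched in a few lines of Python. The column names and month/year date format below are assumptions for illustration, not the exact TRRC template.

```python
import calendar
import csv
import io
from datetime import date

def _num(s):
    """'NO RPT' entries become missing values (None)."""
    return None if s.strip() == "NO RPT" else float(s)

def well_data_parser(csv_text):
    """Parse a TRRC-style CSV (hypothetical layout): report dates carry
    only month/year, so each is shifted to the last day of its month."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        month, year = (int(x) for x in rec["Date"].split("/"))
        last_day = calendar.monthrange(year, month)[1]
        rows.append({"reportDate": date(year, month, last_day),
                     "netOil": _num(rec["Oil"]),
                     "netGas": _num(rec["Gas"])})
    return rows

# Two months of made-up reports, one with a missing oil entry
sample = "Date,Oil,Gas\n01/2024,4500,9000\n02/2024,NO RPT,8500\n"
table = well_data_parser(sample)
```

The missing entries deliberately survive parsing as None so that the preprocessing stage, not the importer, decides how to handle them.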
Pro Tip: You can use datastores in conjunction with Parallel Computing Toolbox™ to import hundreds, or even thousands, of TRRC data sets concurrently.
Data Preprocessing
As we saw in the last section, data sets occasionally contain invalid or missing elements, such as Inf or NaN values. These can be readily removed by applying the built-in MATLAB function rmmissing to the data set. But data can also contain anomalies that are hard to identify through visual inspection.
Cleaning a Production Data Set
Fortunately, inconsistent data can be cleaned out using the Data Cleaner app. This app explores and visualizes the data, and it defines rules for outlier detection. Figure 10 displays a snapshot of the Data Cleaner app with a visual analysis of netOil and netGas from our sample well. After defining the actions required to clean our data set, we used the Data Cleaner app to generate the custom function wellDataCleaner, which contains the preprocessing actions we had performed in the app.
Data Augmentation
The data set is clean at this point. The next step is to define the variables needed to perform DCA regression analysis, namely operative time (i.e., flowing time), average daily oil and gas rates, and cumulative oil and gas production. Since wellDataCleaner is a white-box function, we added the necessary code to calculate the missing features flowTime, oilRate, gasRate, cumOil, and cumGas, and insert them into the output table to conform with the requirements of the Software Architecture plan. Figure 11 shows the preprocessed table for our sample well.
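As a rough illustration of this augmentation step, the Python sketch below derives flow time, average daily rate, and cumulative production from monthly net oil volumes. This is our own simplified rendering; the app performs the equivalent computation in MATLAB inside wellDataCleaner.

```python
import calendar
from datetime import date

def augment(report_dates, net_oil):
    """Derive flowTime (days since the first report), average daily oil
    rate, and cumulative oil from monthly net volumes (illustrative)."""
    rows, cum = [], 0.0
    t0 = report_dates[0]
    for d, vol in zip(report_dates, net_oil):
        days = calendar.monthrange(d.year, d.month)[1]  # days covered by the report
        cum += vol
        rows.append({
            "flowTime": (d - t0).days,  # days elapsed since the first report
            "oilRate": vol / days,      # bbl/day averaged over the month
            "cumOil": cum,              # running total, bbl
        })
    return rows

# Two made-up monthly reports: 3,100 bbl in January, 2,900 bbl in February
rows = augment([date(2024, 1, 31), date(2024, 2, 29)], [3100.0, 2900.0])
```

Note that the monthly volume is divided by the days in that month, so the derived rate is a calendar-average, not an instantaneous rate.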
Progress on Software Architecture
The development of the production data analysis pipeline is illustrated in Figure 12, where the custom MATLAB functions wellDataParser and wellDataCleaner denote the software components for data import and data preprocessing. The blue arrow in this illustration represents the raw table, which is the output of the former function and the input of the latter. In MATLAB, the relationship between these two parts can be expressed as follows:

>> well = wellDataParser("sampleWellData.csv");
>> well = wellDataCleaner(well);
The code above clearly illustrates the pipe connecting the data import and data preprocessing components. Observe that wellDataCleaner uses memory efficiently by overwriting the well variable in place rather than creating a new one.
Pro Tip: MATLAB provides numerous examples for using the Data Cleaner app. Please refer to the MATLAB Documentation for more information.
Decline Curve Analysis
Many practitioners view the Arps equations as the standard model for DCA in oil and gas production data analysis. The most popular form is the rate-based hyperbolic decline, which describes the decay of average daily production over time as a function of three regression parameters—initial rate \( q_{i} \) (volume/day), decline rate \( D \) (1/day), and the \( b \)-factor—and with respect to a reference time \( t_{i} \) (days). This equation is defined as follows:
\( q(t; t_{i}) = \frac{q_{i}}{\left[1 + bD(t - t_{i})\right]^{\frac{1}{b}}} \)
where \( q(t; t_{i}) \) represents the production rate at \( t > t_{i} \). The second equation, related to cumulative production, is obtained by integrating Equation 1 over time, from reference time \( t_{i} \) to time \( t > t_{i} \). As a result, the production-based Arps equation is defined as follows:
\( Q(t; t_{i}) = Q_{i} + \frac{q_{i}^{b}}{D(b - 1)}\left(q^{1-b}(t; t_{i}) - q_{i}^{1-b}\right) \)
where \( Q(t; t_{i}) \) represents cumulative production at \( t > t_{i} \). Observe that Equation 2 incorporates the initial cumulative production \( Q(t_{i}) \equiv Q_{i} \) as a parameter. This parameter can be obtained from the production record of a well or defined as an extra regression parameter. Table 2 summarizes the units we adopted for the DCA analysis.
| Fluid | Production Rate | Cumulative Production |
|---|---|---|
| Oil | Barrels per day (bbl/day) | Barrels (bbl) |
| Gas | Million standard cubic feet per day (MMscf/day) | Billion standard cubic feet (Bcf) |
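Equations 1 and 2 translate directly into code. The Python sketch below is our own illustration (the app implements the equivalent in MATLAB): it evaluates both equations and cross-checks Equation 2 against a trapezoidal integration of Equation 1, using parameter values of the same order of magnitude as those reported later in Table 3.

```python
def arps_rate(t, qi, D, b, ti=0.0):
    """Equation 1: rate-based hyperbolic Arps decline."""
    return qi / (1.0 + b * D * (t - ti)) ** (1.0 / b)

def arps_cum(t, qi, D, b, Qi=0.0, ti=0.0):
    """Equation 2: cumulative production, i.e., Equation 1 integrated over time."""
    q = arps_rate(t, qi, D, b, ti)
    return Qi + qi**b / (D * (b - 1.0)) * (q**(1.0 - b) - qi**(1.0 - b))

# Cross-check: numerically integrating Equation 1 from 0 to 1,000 days
# should closely match the closed-form Equation 2.
qi, D, b = 1500.0, 0.009, 1.14          # illustrative values, bbl/day and 1/day
dt = 0.5
ts = [i * dt for i in range(2001)]       # 0 to 1,000 days in half-day steps
rates = [arps_rate(t, qi, D, b) for t in ts]
numeric = sum(dt * (rates[i] + rates[i + 1]) / 2.0 for i in range(len(ts) - 1))
analytic = arps_cum(1000.0, qi, D, b)
```

The agreement between the numeric and analytic totals confirms that the two equations are consistent with each other term by term.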
Creating DCA Regression Models
We created bespoke DCA regression models based on Equations 1 and 2 for each fluid phase independently using the Curve Fitter app from Curve Fitting Toolbox™. Here, assuming that the well table is in your workspace, we will walk you through the four steps we used to develop a rate-based DCA regression for average daily oil production:
Step 1: Import well and assign the x and y data to well.flowTime and well.oilRate, respectively. Notice that the oil rate peaks and then starts to decrease at about 120 days; therefore, you should not include any data before this point.
Step 2: Use the Custom Equation option to create the rate-based DCA model. MATLAB will attempt to perform the regression right away, which will likely fail due to convergence issues caused by one or more regression parameters getting too close to zero, becoming negative, or growing extremely large. Resolve this issue by defining physically meaningful bounds for each parameter. A good initial guess for each parameter will also help the algorithm converge faster.
Step 3: Check the summary statistics of the operation and make sure that the resulting regression parameters are reasonable and within the specified bounds. MATLAB shows their values along with their 95% confidence bounds. The resulting sum of squared errors, R^{2}, and root mean square error (RMSE) give you a better understanding of the goodness of the regression. MATLAB refers to these summary statistics as the goodness of fit (gof).
Step 4: Use the Export option to generate a function that contains the resulting regression model along with the gof statistics.
To create the remaining DCA regressions, we repeated the same steps. As a result, we created and modified the functions getDCAParamRate and getDCAParamProd for running rate-based and production-based DCA regressions, based on Equations 1 and 2, respectively. A screengrab of the Curve Fitter app indicating where to start each step is shown in Figure 13.
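The Curve Fitter app is interactive, but Step 2's advice (physically meaningful bounds plus a sensible initial guess) applies to any nonlinear solver. As a hedged illustration, the Python sketch below fits Equation 1 to synthetic noisy rates with SciPy's curve_fit; the data and parameter values are made up for the example and do not come from the sample well.

```python
import numpy as np
from scipy.optimize import curve_fit

def arps_rate(t, qi, D, b):
    # Equation 1 with ti = 0 (the regression window starts at the rate peak)
    return qi / (1.0 + b * D * t) ** (1.0 / b)

# Synthetic daily oil rates from known parameters, plus 2% multiplicative noise
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1500.0, 80)
q_obs = arps_rate(t, 1500.0, 0.009, 1.1) * (1.0 + 0.02 * rng.standard_normal(t.size))

# Bounds keep the parameters positive and finite; the initial guess p0
# gives the trust-region solver a reasonable starting point.
popt, _ = curve_fit(
    arps_rate, t, q_obs,
    p0=[1000.0, 0.01, 1.0],                       # initial guess [qi, D, b]
    bounds=([0.0, 1e-6, 0.01], [1e5, 1.0, 2.0]),  # lower and upper bounds
)
qi_fit, D_fit, b_fit = popt
```

Without the bounds, an unconstrained solver can wander into negative or near-zero parameter values, which is exactly the convergence failure Step 2 warns about.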
The regression parameters and \( R^{2} \) for every DCA regression and fluid phase in our sample well are listed in Table 3. Since cumulative output data is generally smoother than average daily rates, it is not surprising that production-based DCA produces the highest \( R^{2} \) scores.
| Parameter | Rate-Based DCA (Oil) | Rate-Based DCA (Gas) | Production-Based DCA (Oil) | Production-Based DCA (Gas) |
|---|---|---|---|---|
| \( Q_{i} \) | 192095 | 84475.5 | 192095 | 84475.5 |
| \( q_{i} \) | 1530.6 | 984.06 | 1098.68 | 698.46 |
| \( b \) | 1.1398 | 1.279 | 0.9388 | 0.7806 |
| \( D \) | 0.0089 | 0.004 | 0.0045 | 0.0015 |
| \( R^{2} \) score | 0.945 | 0.784 | 0.999 | 0.997 |
We designed getDCAParamRate and getDCAParamProd to have the sole responsibility of computing the DCA parameters for Equations 1 and 2. They are, however, unable to forecast cumulative production or average daily output. Therefore, it is necessary to operationalize the DCA models by developing MATLAB functions that accept the DCA regression parameters as inputs and return the required result, such as flow rate or volume.
To this end, we added two functions to our architecture, flowRate(t,dcaParam) and cumulProd(t,dcaParam), where t is either a positive scalar or a vector of positive scalars, and dcaParam is a MATLAB structure that holds \( q_{i} \), \( Q_{i} \), \( b \), \( D \), and \( t_{i} \). These two functions are utility functions, meaning that they do not depend on a specific data set but only on the DCA regression parameters. As such, both flowRate and cumulProd can be packaged together into an independent utility package.
We used these two functions to compare the DCA regressions for our data set, both production-based and rate-based. A comparison of the production-based and rate-based DCA methods for examining average daily oil and gas output is presented in Figures 14a and 14b. Both regression models perform well in estimating production rates, producing quite comparable outcomes.
Figures 15a and 15b compare predicted cumulative oil and gas production, respectively, against reported historical production. Like with average daily rates, both DCA models do a good job with predicting cumulative production while exhibiting a very similar outcome.
Progress on Software Architecture
Figure 16 illustrates the workflow’s current state. A key point here is the addition of a utility software component, which the DCA component uses to calculate expected average rates and cumulative production values.
We had the option to include this utility package right within the DCA component. However, in the interest of maintainability and responsibility-driven design, we chose to keep them apart. This approach significantly increases maintainability because components can be built and maintained independently so long as their function signatures do not change. Likewise, it adds flexibility so that other DCA models can be added in the future without disrupting the workflow.
Additionally, keeping utility functions separate makes the code more readable in general; anyone else who works on it will know exactly what each part is meant to do, cutting down on development time.
A modular software design such as the one suggested here is far easier to maintain and grow than cramming hundreds or even thousands of lines of code into “god” functions that are challenging to test and manage, even though modularity may seem more expensive upfront.
Production Forecasting
For this software component, we leveraged the regression parameters derived from the DCA component to estimate future oil and gas production using the utility functions flowRate and cumulProd discussed in the previous section. Since all the functions needed to generate a production forecast were now available, we only had to write the code for invoking them.
In doing so, we created the function generateForecast, which takes the well’s DCA regression parameters for each fluid and the requested forecast time as input arguments, generating a full production forecast by performing the following internal tasks:
 Generating a vector of future flow times
 Generating a vector of future dates
 Calculating oil and gas decline rates
 Computing cumulative oil and gas production
As a result, generateForecast returns a new table with the same format as the original input table, except that the production metrics are forward-looking values generated with the DCA model.

The following subsection describes how generateForecast works. We assumed that the production forecast begins one calendar month after the last reported date and is reported monthly until it reaches the intended outlook time.
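The internal tasks above can be sketched as follows. This is our own simplified Python rendering of what a function like generateForecast might do, with a hypothetical rate_fn argument standing in for the flowRate utility; the actual MATLAB implementation is not listed in the article.

```python
import calendar
from datetime import date

def month_end(year, month):
    """Last calendar day of a given month."""
    return date(year, month, calendar.monthrange(year, month)[1])

def generate_forecast(last_report, last_flow_time, years, rate_fn):
    """Build monthly forecast rows starting one calendar month after the
    last reported date (rate_fn stands in for the flowRate utility)."""
    rows = []
    y, m = last_report.year, last_report.month
    t = last_flow_time
    for _ in range(12 * years):
        m += 1
        if m > 12:
            y, m = y + 1, 1
        t += calendar.monthrange(y, m)[1]  # advance flow time by one month
        rows.append({"date": month_end(y, m), "flowTime": t,
                     "oilRate": rate_fn(t)})
    return rows

# Illustrative 2-year forecast after 10 years (3,650 days) of production,
# using an Arps hyperbolic rate function with made-up parameters.
fc = generate_forecast(date(2023, 12, 31), 3650, 2,
                       rate_fn=lambda t: 1500.0 / (1 + 1.1 * 0.009 * t) ** (1 / 1.1))
```

Each forecast row mirrors the preprocessed table's format (date, flow time, rate), which is what lets the Economics component consume it through the same pipe.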
Generating a Production Forecast
The resulting oil and gas prediction for our sample well for the next 50 years, beginning on January 31, 2024, is displayed in Figure 17. Compared to its rate-based DCA equivalent, the production-based DCA model forecasts a slightly steeper decrease in oil output over time, as seen in Figure 17a. However, Figure 17b shows that the difference in projections appears to be more pronounced for average daily gas output.
The difference between forecast models becomes more significant for cumulative oil and gas production, as shown in Figures 18a and 18b. Note that the difference in predicted cumulative production is over 50,000 barrels of oil and 300 million standard cubic feet of gas, which can introduce uncertainties in the economic evaluation of the prospect.
Given that different models may produce different production estimates, it is crucial to experiment with different DCA models and select the one that most closely matches the production behavior of the field being studied. One way to make this decision is to leverage production data from existing wells to facilitate the comparison of models. To this end, you may consider adding more DCA models to the DCA component, but doing so will increase the complexity of your functions. To avoid this, you can instead resort to an objectoriented design where each object corresponds to a DCA model. This way, you increase your system’s maintainability and reliability as these objects can be managed separately, just like the pipes and filters software architecture itself.
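One way to realize the object-oriented design suggested above is to give every DCA model the same small interface, so models remain interchangeable inside the pipeline. The Python sketch below is illustrative only; the class and method names are ours, and the article's app does not implement these classes.

```python
import math
from abc import ABC, abstractmethod

class DCAModel(ABC):
    """Each DCA model is its own object; the pipeline only calls rate()."""
    @abstractmethod
    def rate(self, t: float) -> float: ...

class ArpsHyperbolic(DCAModel):
    def __init__(self, qi, D, b):
        self.qi, self.D, self.b = qi, D, b
    def rate(self, t):
        return self.qi / (1.0 + self.b * self.D * t) ** (1.0 / self.b)

class Exponential(DCAModel):
    # Limiting case of Arps as b approaches 0, kept as a separate model
    def __init__(self, qi, D):
        self.qi, self.D = qi, D
    def rate(self, t):
        return self.qi * math.exp(-self.D * t)

# Interchangeable models evaluated at the same flow time (made-up parameters)
models = [ArpsHyperbolic(1500.0, 0.009, 1.1), Exponential(1500.0, 0.009)]
rates_at_100 = [m.rate(100.0) for m in models]
```

Because both classes satisfy the same interface, swapping the model used by the forecast component requires no change to the rest of the pipeline, which is the maintainability benefit described above.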
Progress on Software Architecture
Figure 19 illustrates our progress with our software architecture up until this point. As we mentioned earlier, the production forecast component leverages the DCA utility package created in the previous section to generate an oil and gas production outlook.
Economic Analysis
The last step in our workflow is to translate a production forecast into financial metrics. Asset managers rely on this kind of analysis, commonly referred to as petroleum economics, to make decisions about keeping wells online or divesting them. Investors use this analysis to evaluate the viability of purchasing existing wells. However, the trustworthiness of any petroleum economics analysis strongly depends on the accuracy of the production forecast and associated financial metrics.
Example Evaluation Case
Suppose that an investor is interested in purchasing our sample well, and the asset owner agrees to sell it for $1.8 million (USD). Your goal, as the investor, is to determine whether this investment is financially sound. To help you make a better decision, the owner gives you some additional information:
 Well operation cost (OPEX): $2,000/month
 Oil production tax: 36%
 Gas production tax: 16%
In addition, your finance department recommends considering a discount rate of 10% and average oil and gas prices of $80/barrel and $2.78/thousand cubic feet, respectively, for the analysis. Furthermore, you would like to introduce variability in oil and gas prices to better understand the resilience of your investment when facing tough market fluctuations.
We included the price variability feature in the MATLAB function econAnalysis, which we created to carry out the entire economic analysis. This function takes a forecast table and a MATLAB structure containing the required financial parameters (e.g., discount rate, CAPEX, OPEX, tax rates, etc.) and calculates net and cumulative cash flow, breakeven and recession points, and the internal rate of return (IRR). We relied on xirr from Financial Toolbox™ to compute the IRR.
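The article does not list econAnalysis's internals, so as a rough illustration of the kind of computation involved, the Python sketch below discounts a monthly cash flow stream using the example case's financial inputs ($1.8M CAPEX, $2,000/month OPEX, 36% oil tax, $80/bbl, 10% annual discount rate). The flat production stream and all function names are ours, not the app's, and only the oil phase is included.

```python
def npv_curve(monthly_oil_bbl, oil_price=80.0, oil_tax=0.36,
              opex=2000.0, capex=1_800_000.0, annual_discount=0.10):
    """Running NPV of a monthly cash flow stream (oil phase only);
    break-even is the first month the cumulative NPV turns positive."""
    monthly_discount = (1.0 + annual_discount) ** (1.0 / 12.0)
    npv, curve, breakeven = -capex, [], None
    for i, vol in enumerate(monthly_oil_bbl, start=1):
        cash = vol * oil_price * (1.0 - oil_tax) - opex  # after-tax net revenue
        npv += cash / monthly_discount ** i              # discount to time zero
        curve.append(npv)
        if breakeven is None and npv > 0:
            breakeven = i
    return curve, breakeven

# Flat 2,000 bbl/month stream over 10 years for illustration
# (not the sample well's actual forecast)
curve, be = npv_curve([2000.0] * 120)
```

With these inputs, each month nets $100,400 after tax and OPEX, so the $1.8M purchase price is recovered, on a discounted basis, in the twentieth month.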
Petroleum Economics Results
Figure 21 displays the resulting NPV for this project, from which you can draw several conclusions. First, if you decide to move forward, you could break even on or around February 11, 2026. The IRR is another great indicator of the health of your project. Another interesting observation is that you will make close to $5 million USD if you hold onto this project at least until December 2053—that is, for almost 30 years. Be aware that this analysis does not consider any future expenses related to maintenance, increases in operational costs, etc., so the optimistic return may be misleading. However, the NPV curve shows that you could make about $3 million USD by December 31, 2031, that is, eight years into your project, which would represent a 166.7% return on your investment.
Progress on Software Architecture
The economic analysis component completes the production data analysis workflow. As shown in Figure 22, the architecture remained unchanged at this stage, as the petroleum economics analysis function did not require developing or integrating any external utility components. Nonetheless, the flexibility to add complementary modules and utility packages remains.
Summary
In this paper, we developed a pipes and filters software architecture to automate the analysis of oil and gas production data. We dissected and thoroughly detailed each step of the workflow, showing how MATLAB allowed us to carry out intricate tasks that would have required hours of coding in a separate programming language.
We demonstrated how to automatically create MATLAB functions that complete complex tasks with a single function call, and we discussed the maintainability and reliability benefits of a modularized software design, while noting that additional data formats, cleaning techniques, DCA models, and other utility packages for creating plots and figures might be required in the future.
Finally, it is worth noting that we purposefully decided not to discuss advanced software design practices, such as object-oriented design, as this article is intended as introductory material. Nonetheless, MATLAB is an object-oriented programming language, so you have complete freedom to use the learnings from this article to create software architectures tailored for specific use cases like cloud deployment and C/C++ code generation.