We get it.
The molehill of data collected from one simulation is enough to make an engineer look like an ant, but it’s doable.
However, what happens when you add a mountain of big data from the Internet of Things (IoT), digital twins, designs of experiments (DoEs), product lifecycle management (PLM) and more?
Well, according to IBM, 90 percent of the information the IoT collects will remain unused. As for the information collected from the alphabet soup of engineering tools? Much of it results in piecemeal analytics and computation at best. Unfortunately, leaving that data untouched could leave a lot of interesting results and product innovations on the table.
“With the Big Data solution in MINESET, we are able to handle millions of records both within visualizations and from the Machine Learning perspective, in an interactive manner,” Suzana Djurcilov, Technical Marketing Manager of MINESET at ESI Group, said.
Data analytics and machine learning are the next important steps in the workflow for extracting patterns from simulation results and IoT data. They help engineers find and visualize patterns and results faster than any human could. Better yet, they give engineers the ability to discover patterns and predict the most probable behavior of a system.
How ESI MINESET Crunches Billions of Rows of Data
Design space exploration of a fictional thermal product. (Image courtesy of ESI Group.)
So, what is MINESET? How can machine learning tools help an everyday engineer turn into a product lifecycle fortune teller?
MINESET is a web-based client-server application that lets engineers visualize millions of records interactively. The data it parses can be as diverse as a temperature reading or a GPS location.
“The client front end is a web-based application,” Djurcilov said. “The back end is the server that provides the storage for the data and the calculations.”
MINESET uses the open-source Apache Spark engine to crunch and process large volumes of data. Spark grants access to the numbers and provides the tools to process them. Specifically, Spark’s API distributes data across the nodes of a cluster, while its aggregation tools process data on the server so the results can be plotted, charted or aggregated. In short, Spark does the heavy calculations and handles the distribution.
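Spark’s aggregations follow a partition-then-merge pattern: each partition reduces its own slice of the data to a small partial result, and only those partial results are combined. The pure-Python sketch below imitates that pattern for a per-sensor mean. It illustrates the idea only and is not MINESET’s or Spark’s code; the record layout and the function names (`partition`, `partial_aggregate`, `merge`) are hypothetical.

```python
from collections import defaultdict


def partition(records, n_parts):
    """Split records round-robin into n_parts chunks (stand-in for a cluster's partitions)."""
    return [records[i::n_parts] for i in range(n_parts)]


def partial_aggregate(chunk):
    """Per-partition step: accumulate a [sum, count] pair for each key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in chunk:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def merge(partials):
    """Merge step: add up partial sums and counts, then divide once per key."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}


records = [
    ("sensor_a", 20.0), ("sensor_b", 30.0), ("sensor_a", 22.0),
    ("sensor_b", 34.0), ("sensor_a", 24.0),
]
means = merge([partial_aggregate(chunk) for chunk in partition(records, n_parts=2)])
print(means)  # {'sensor_a': 22.0, 'sensor_b': 32.0}
```

Carrying (sum, count) pairs rather than per-partition means is what makes the merge exact; averaging partition averages would over-weight small partitions.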
MINESET uses two-thirds of the records provided to train a predictive model and holds back the remaining third to validate it. This lets engineers know how well their predictive models perform on data the model has not seen.
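A two-thirds/one-third holdout split can be sketched as follows. The article does not say how MINESET decides which records go into each third, so a seeded random shuffle is assumed here. The DoE data, the 80-degree pass criterion and the threshold "model" are all hypothetical stand-ins for a real predictive model.

```python
import random


def holdout_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of rows; the first train_fraction is for fitting, the rest for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, rows):
    """Fraction of (input, label) rows that the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical DoE runs: (max_temperature, passed), where a run passes below 80 degrees.
runs = [(t, t < 80) for t in range(60, 105)]
train, validate = holdout_split(runs)

# Toy stand-in for a predictive model: a pass/fail threshold learned from training rows only.
threshold = max(t for t, passed in train if passed) + 0.5


def model(t):
    return t < threshold


print(len(train), len(validate), accuracy(model, validate))
```

Scoring only on the held-back third matters because a model can fit its own training rows perfectly, as this toy threshold does, while still misclassifying unseen runs near the boundary.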
“In terms of visualization, this involves the aggregation of the data, so you could plot, for example, a line chart,” Djurcilov noted. “The server calculates what it takes to reduce the visuals and takes care of the machine learning results. There is a limitation in the browser that you can only display up to 10,000 pieces of visuals without the browser slowing down too much. Eventually you end up with a small enough amount of data to visualize it. MINESET is an application that combines tightly visualization of data and machine learning results.”
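The server-side reduction described above can be illustrated by bucketing: a long series is cut into at most 10,000 consecutive buckets, and each bucket is sent to the browser as a (min, mean, max) triple so a line chart can still show peaks. This is a minimal sketch assuming a simple per-bucket summary; MINESET’s actual reduction strategy is not described in the article, and `reduce_for_browser` is a hypothetical name.

```python
import math


def reduce_for_browser(values, max_points=10_000):
    """Collapse a long series into at most max_points (min, mean, max) buckets."""
    if len(values) <= max_points:
        return [(v, v, v) for v in values]
    size = math.ceil(len(values) / max_points)  # consecutive samples per bucket
    buckets = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        buckets.append((min(chunk), sum(chunk) / len(chunk), max(chunk)))
    return buckets


# One million hypothetical sensor samples, far more than a browser chart should draw.
series = [math.sin(i / 5_000) for i in range(1_000_000)]
reduced = reduce_for_browser(series)
print(len(reduced))  # 10000
```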
In other words, the software gives engineers the ability to clean, sample, create predictive models and visualize big data. These models can then be used to predict future outcomes in the real-world system or feed back into design.
“It’s hard to find ‘aha’ moments,” Djurcilov said. “The software can help engineers design a response curve or response surface model (RSM). It can plot out variable inputs or look at statistical covariance. It can then look into what parameters have the most control of the outcomes and which fall to the laws of diminishing returns.”
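In its simplest, one-variable form, a response surface is a quadratic fitted to DoE results by least squares, and the vertex of the fitted curve estimates the best input setting. The sketch below solves the 3×3 normal equations directly, so it needs only the standard library. The fan-speed data are hypothetical and lie exactly on `T = 2s^2 - 12s + 96`, so the recovered coefficients can be checked. The article does not describe how MINESET itself fits a response surface.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 by solving the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    # Augmented matrix [A | t] with A[i][j] = sum of x^(i + j).
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    # Back substitution yields the coefficients (a, b, c).
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coef)


# Hypothetical DoE: fan speed s (thousand rpm) vs. peak temperature T (deg C).
speed = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
temp = [86.0, 82.5, 80.0, 78.5, 78.0, 78.5, 80.0]
a, b, c = fit_quadratic(speed, temp)
best_speed = -b / (2 * c)  # vertex of the fitted parabola (a minimum when c > 0)
print(round(a, 2), round(b, 2), round(c, 2), round(best_speed, 2))
```

With several input parameters, the same idea extends to cross terms such as x1·x2 and a larger system of normal equations.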
Additionally, MINESET is designed to democratize data analysis. Its machine learning and analytics tools do not require any programming. Instead, the user interface (UI) uses drag and drop to set up each assessment, along with the target variables and parameters to explore.
Once the statistics are collected and the model is completed, engineers can use the Evidence Visualizer to better understand the results in an interactive setting. These tools can show engineers how their design parameters and decisions interact with the final product. “With the Evidence Visualizer an engineer can explore and put into numbers the effects of confining the range of values for variants of a simulation, whether individually or in combination,” Djurcilov noted.
Useful Big Data Charts for Engineers
So, which MINESET tools are most useful when assessing the design space of a product? Djurcilov is fond of parallel coordinate charts, decision trees, column importance and what-if charts. These tools offer engineers a clearer path toward an optimal outcome throughout a product’s lifecycle.
Parallel coordinate chart (top left) shows multiple lines crossing multiple axes. Each line is a scenario in your DoE and each axis represents a parameter value or result. This tool can quickly show where results are concentrated, which variables are correlated and which scenarios don’t meet the engineer’s criteria. (Image courtesy of ESI Group.)
Parallel coordinate charts almost look like an unraveling rope. They are very useful when looking for an optimal set of parameters from a DoE.
The chart contains several evenly spaced vertical axes. Each axis represents an input parameter or output variable and spans its range of values.
Each line represents one run in the DoE and crosses every axis exactly once. Each crossing tells the user the parameter setting or resulting output value for that run.
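Drawing a parallel coordinate chart comes down to scaling every column onto its own axis and joining each run’s crossings into one polyline. This sketch assumes min-max scaling to a 0-to-1 axis height; the column names echo those in the decision-tree figure, but the run values and the function name `to_axis_positions` are hypothetical.

```python
def to_axis_positions(runs, columns):
    """Map each run's values to 0..1 heights, with one independently scaled axis per column."""
    lo = {c: min(r[c] for r in runs) for c in columns}
    hi = {c: max(r[c] for r in runs) for c in columns}
    polylines = []
    for r in runs:
        # One (axis_index, height) vertex per column, so each line crosses each axis once.
        # A column holding a single value is drawn at mid-height.
        polylines.append([
            (i, 0.5 if hi[c] == lo[c] else (r[c] - lo[c]) / (hi[c] - lo[c]))
            for i, c in enumerate(columns)
        ])
    return polylines


runs = [
    {"rpm1": 1000, "rpm2": 2000, "max_temp": 82.0},
    {"rpm1": 1500, "rpm2": 1800, "max_temp": 77.0},
    {"rpm1": 2000, "rpm2": 2200, "max_temp": 72.0},
]
for line in to_axis_positions(runs, ["rpm1", "rpm2", "max_temp"]):
    print(line)
# [(0, 0.0), (1, 0.5), (2, 1.0)]
# [(0, 0.5), (1, 0.0), (2, 0.5)]
# [(0, 1.0), (1, 1.0), (2, 0.0)]
```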
Hovering the mouse over a line makes it bold so that it stands apart from the other runs in the chart. This makes it easier to see how that particular run performed.
The tool also makes it easy to see how runs concentrate around particular parameter values and outcomes: the denser a section of an axis, the more runs gravitate to that value. This helps engineers determine correlations between variables and outcomes.
“Parallel coordinates are also very convenient for robustness analysis,” Djurcilov noted. “When you look at your results you can highlight the failure options and understand the patterns. You can also tell MINESET to set parameters to a specific range. By setting up these filters you can drill into the data to see the runs that fill all the requirements and play around with what works or doesn’t work.”
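The range filters described in the quote behave like a conjunction of interval tests: a run survives only if it sits inside every selected range. The hypothetical sketch below keeps the runs that meet a maximum-temperature limit and an `rpm1` window. The limits, data and `filter_runs` name are illustrative only.

```python
def filter_runs(runs, ranges):
    """Keep runs whose values fall inside every (low, high) range, like brushing several axes."""
    return [
        r for r in runs
        if all(low <= r[col] <= high for col, (low, high) in ranges.items())
    ]


runs = [
    {"rpm1": 1000, "rpm2": 2000, "max_temp": 82.0},
    {"rpm1": 1500, "rpm2": 1800, "max_temp": 77.0},
    {"rpm1": 2000, "rpm2": 2200, "max_temp": 72.0},
    {"rpm1": 2000, "rpm2": 1800, "max_temp": 79.5},
]
# Requirement: peak temperature of at most 80 with rpm1 limited to 1,200-2,000.
passing = filter_runs(runs, {"max_temp": (float("-inf"), 80.0), "rpm1": (1200, 2000)})
print(len(passing))  # 3
```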
MINESET can also create decision and regression trees, where each branch of the tree represents a parameter value (or range of values) shared by a subset of the runs in the DoE.
“Decision and Regression Trees are two sides of the same coin, the difference being in the type of target variable they can be applied to,” Djurcilov explained. Decision trees predict a discrete target, such as pass or fail, whereas regression trees predict a continuous target, such as a temperature. An engineer can also convert a regression tree into a decision tree by binning the continuous target into discrete classes.
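Binning a continuous target into classes is what lets a decision tree work on a quantity such as maximum temperature. This minimal sketch uses Python’s `bisect` module. The edges (80 and 90 degrees), labels and `bin_target` name are hypothetical, since the article does not describe MINESET’s binning rules.

```python
import bisect


def bin_target(values, edges, labels):
    """Map continuous values to class labels using sorted bin edges.

    labels[0] covers v < edges[0], labels[i] covers edges[i-1] <= v < edges[i],
    and labels[-1] covers v >= edges[-1].
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect.bisect_right(edges, v)] for v in values]


max_temps = [71.2, 78.9, 80.0, 84.6, 91.3]  # hypothetical run results, deg C
print(bin_target(max_temps, edges=[80.0, 90.0], labels=["pass", "marginal", "fail"]))
# ['pass', 'pass', 'marginal', 'marginal', 'fail']
```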
This decision tree shows the user the path of parameter settings for rpm2, rpm1, Grille Loss Coefficient and rpm3 that are most likely to produce favorable results with respect to maximum temperature. The parameter values are binned to make the chart easier to read. (Image courtesy of ESI Group.)
The decision tree can show mini pie charts along each branch. Each pie chart shows the proportion of successful runs among all the runs whose parameter settings match that branch. The engineer can then use the chart as a quick guide to parameter settings that trend toward more favorable outcomes, in this case a lower maximum temperature.
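Each branch in such a tree comes from a split on one parameter, and the pie chart on a branch is the share of successful runs that reach it. The sketch below finds the single best threshold on one parameter using Gini impurity and reports the success ratio on each side. Gini is a common splitting criterion, but the article does not confirm that MINESET uses it. A full tree learner would repeat this over every parameter and recurse into each branch. The `rpm2` values and outcomes are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of boolean success labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)


def best_split(values, success):
    """Threshold on one parameter minimising weighted Gini (needs two or more distinct values)."""
    best = None
    for threshold in sorted(set(values))[1:]:
        left = [s for v, s in zip(values, success) if v < threshold]
        right = [s for v, s in zip(values, success) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    _, threshold, left, right = best
    # Success ratio per branch: what the mini pie chart on each branch would show.
    return threshold, sum(left) / len(left), sum(right) / len(right)


rpm2 = [1600, 1800, 2000, 2200, 2400, 2600]
low_temp = [False, False, True, True, True, False]  # True = max temperature within target
print(best_split(rpm2, low_temp))  # (2000, 0.0, 0.75)
```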
Based on this Column Importance chart, “param2” is the parameter of greatest importance to the purity of the final product. Param1 is the next most important, followed by param6. The remaining parameters have no meaningful effect on product purity, so they can safely be ignored in future optimization calculations to save computational time. (Image courtesy of ESI Group.)
Another useful chart created by MINESET is the Column Importance chart. It is designed to help engineers determine which parameters have the strongest influence on final performance.
“Unlike other machine learning methods that measure covariance and correlation of individual parameters to the output variable, Column Importance measures contribution from the input parameters in a cumulative way,” Djurcilov said.
“It shows which parameter is the most important, and how much it discriminates the target, followed by the second most important and the additional contribution from any other parameter,” Djurcilov added.
This tool gives engineers focus when optimizing a process or design. Acting as a razor, it cuts parameters out of future analyses once their added contribution falls off due to diminishing returns.
“By the time you have five to six parameters, there is generally a cut-off, at which point additional parameters no longer affect the results,” Djurcilov explained. “This is an important factor in multi-domain optimization studies where you might end up with several hundred parameters. There is no way you can look and understand all those parameters and their interactions. We can tell you the handful of parameters that influence the results and the rest you can discard.”
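The cumulative idea behind Column Importance, and the cut-off described above, can be illustrated with a greedy forward selection. At each step the sketch adds the parameter that most increases how well the chosen parameters jointly explain the target, and it stops when the gain becomes negligible. Here, "explain" is measured as one minus the normalized conditional entropy of the target. That measure, the `min_gain` cut-off and the data are assumptions, not MINESET’s documented algorithm. The parameter names echo the figure, but the values are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def purity(runs, columns, target):
    """1 - H(target | columns) / H(target): share of target uncertainty the columns explain."""
    groups = defaultdict(list)
    for r in runs:
        groups[tuple(r[c] for c in columns)].append(r[target])
    conditional = sum(len(g) / len(runs) * entropy(g) for g in groups.values())
    return 1.0 - conditional / entropy([r[target] for r in runs])  # assumes target not constant


def column_importance(runs, candidates, target, min_gain=0.01):
    """Greedily add the column that most raises joint purity; stop when the gain < min_gain."""
    chosen, ranking, current = [], [], 0.0
    remaining = list(candidates)
    while remaining:
        gains = {c: purity(runs, chosen + [c], target) - current for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # diminishing returns: the remaining columns add (almost) nothing
        chosen.append(best)
        remaining.remove(best)
        current += gains[best]
        ranking.append((best, round(current, 3)))  # cumulative purity after adding `best`
    return ranking


# Hypothetical full-factorial DoE: purity depends mostly on param2, partly on param1, not on param4.
runs = [
    {"param1": p1, "param2": p2, "param4": p4, "pure": p2 == 2 or (p2 == 1 and p1 == 1)}
    for p1, p2, p4 in product([0, 1], [0, 1, 2], [0, 1])
]
print(column_importance(runs, ["param1", "param2", "param4"], "pure"))
# [('param2', 0.667), ('param1', 1.0)]
```

Because each step scores the candidate together with the parameters already chosen, a parameter that only matters in combination with another still shows up, which a one-at-a-time correlation would miss.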
The what-if visualizer helps engineers predict which parameter settings will lead to success or failure. The chart compares the success and failure rates of the general data set with a subset based on parameter selections. (Image courtesy of ESI Group.)
Once engineers have narrowed down their target parameters, they can use a what-if visualizer to experiment with their data and determine the distribution of output values across individual parameters. As with the decision and regression trees, the aim is to determine which parameter settings will optimize the results. The benefit is that the what-if visualizer digs deeper and lets the user choose parameter settings interactively.
“In the what-if visualizer the design space exploration really takes off due to the fact that you can manually explore combinations of various parameters in different ranges and see what will happen with the design,” Djurcilov said.
The what-if visualizer consists of a series of bar charts that inform the user how many runs were successful and how many have outputs that are out of target.
The vertical bar charts at the top give a quick comparison of the overall success rate with the success rate of a subset of the data based on the chosen parameter settings. The same information is represented in a horizontal bar chart. This visualization helps engineers see how successes and failures stack up based on the values of the target variable used to measure success.
Below these charts is a list of parameters. For each parameter, a set of bar charts shows how many runs failed to meet compliance in each bin of parameter settings. By clicking on a bin, an engineer can select one or more parameter settings for the what-if analysis. MINESET then recalculates the probabilities and displays the changes in the top horizontal and vertical bar charts.
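At its core, a what-if selection is a conditional success rate. The selection filters the runs to the chosen bins and compares their success rate with the overall rate, while the per-parameter charts count failures in each bin. The sketch below assumes runs whose parameters are already binned. The bin labels, the `ok` target and the function names are hypothetical.

```python
from collections import Counter


def success_rate(runs, target):
    """Share of runs whose boolean target is True (NaN for an empty selection)."""
    return sum(r[target] for r in runs) / len(runs) if runs else float("nan")


def what_if(runs, target, selection):
    """Overall success rate vs. the rate among runs inside every selected bin."""
    subset = [r for r in runs if all(r[col] in bins for col, bins in selection.items())]
    return success_rate(runs, target), success_rate(subset, target), len(subset)


def failures_by_bin(runs, target, col):
    """Count failed runs per bin of one parameter (the per-parameter bar chart)."""
    return Counter(r[col] for r in runs if not r[target])


runs = [
    {"rpm1": "low", "grille_loss": "low", "ok": False},
    {"rpm1": "low", "grille_loss": "high", "ok": False},
    {"rpm1": "mid", "grille_loss": "low", "ok": True},
    {"rpm1": "mid", "grille_loss": "high", "ok": False},
    {"rpm1": "high", "grille_loss": "low", "ok": True},
    {"rpm1": "high", "grille_loss": "high", "ok": True},
]
print(failures_by_bin(runs, "ok", "rpm1"))  # Counter({'low': 2, 'mid': 1})
overall, selected, n = what_if(runs, "ok", {"rpm1": {"mid", "high"}, "grille_loss": {"low"}})
print(overall, selected, n)  # 0.5 1.0 2
```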
In short, MINESET is a tool for crunching big data. For instance, it can be used to find the parameter settings most likely to lead to success or failure. It can also create a machine learning model of the system that can be fed back into the design or lifecycle of the product. The result is a tool that can better predict a system’s behavior, allowing engineers to improve the overall designs of their products.
ESI Group has sponsored ENGINEERING.com to write this article. All opinions are mine, except where quoted or stated otherwise. —Shawn Wasserman.