The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, partly because of the importance of incorporating domain knowledge into the data-analysis strategy in these fields. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches. We map established distinctions in data science for the purposes of description, prediction, and causal inference to common research goals. Example applications to predict prison violence, assess the prevalence of non-communicable diseases, and explain adverse birth outcomes are presented. The meta-mapping should improve the understanding between computational disciplines and the social and health sciences.
A computational data-driven framework for the development of Reduced Order Models (ROMs), with application to Chemical Vapor Deposition (CVD) reactors, is presented. Describing and predicting the behavior of CVD reactors requires complex high-fidelity Computational Fluid Dynamics (CFD) models with millions of unknowns, whose solution has a significantly high computational cost, making the development of ROMs imperative. The reduction of the dimension of the problem is based on the combination of the Method of Snapshots (MoS), a variant of Proper Orthogonal Decomposition (POD), with properly trained Artificial Neural Networks (ANNs). Snapshots are the sequential states of the reactor during the transition from one steady state to another after a certain disturbance of one of the controlled parameters. They are computed by a simplified, low-fidelity, time-dependent CFD model without chemical reactions. The aim is to develop a ROM of satisfactory accuracy based on low-fidelity data obtained at low computational cost. The snapshots are used to find the dominant eigenvectors that span the solution space, as well as to train neural networks to quickly predict the time-dependent coefficients of the ROM. The ROM prediction alone is not fully satisfactory, owing to the low-fidelity data used in its development, but it is accurate enough that, when fed into a complete, large-scale process model, the solution converges in significantly less time. In conclusion, a remarkable acceleration of the calculations is achieved, which reduces the computational cost of parametric analysis and of the general study of the prevailing physical and chemical phenomena of the process.
P. A. Gkinis, E. D. Koronaki, A. Skouteris, I. G. Aviziotis, A. G. Boudouvis, Chem. Eng. Sci. 199 (2019) 371-380.
E. D. Koronaki, P. A. Gkinis, L. Beex, S. P. A. Bordas, C. Theodoropoulos, A. G. Boudouvis, Comput. & Chem. Eng. 121 (2019) 148-157.
R. Spencer, P. A. Gkinis, E. D. Koronaki, D. I. Gerogiorgis, S. P. A. Bordas, A. G. Boudouvis, Comput. & Chem. Eng. 149 (2021) 107289.
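The projection step at the heart of the Method of Snapshots can be sketched in a few lines. The example below is a minimal, self-contained illustration with synthetic data (not the CVD model): snapshots are collected as columns of a matrix, the dominant POD modes are extracted via the singular value decomposition, and the time-dependent ROM coefficients (the quantities the trained ANNs would predict) are obtained by projection.

```python
import numpy as np

# Hypothetical snapshot matrix: each column is the reactor state
# (e.g. a discretized velocity/temperature field) at one time instant.
rng = np.random.default_rng(0)
n_dof, n_snap, true_rank = 500, 40, 3
# Synthetic low-rank data standing in for low-fidelity CFD snapshots.
X = rng.standard_normal((n_dof, true_rank)) @ rng.standard_normal((true_rank, n_snap))

# POD via the thin SVD of the snapshot matrix (equivalent to the
# Method of Snapshots when n_snap << n_dof).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 3                      # number of retained POD modes
Phi = U[:, :r]             # dominant eigenvectors spanning the solution space

# Time-dependent ROM coefficients; in the presented framework these are
# what the trained neural networks predict for new parameter values.
a = Phi.T @ X              # shape (r, n_snap)

# Low-dimensional reconstruction of the snapshots.
X_rom = Phi @ a
rel_err = np.linalg.norm(X - X_rom) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Because the synthetic data here are exactly rank 3, three modes reconstruct the snapshots to machine precision; for real CFD snapshots the truncation rank is chosen from the decay of the singular values.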
In this presentation, we will briefly survey several applications we have had the opportunity to work on: steganalysis, watermark removal, myocardium segmentation after a heart attack, 1D physiological signal compression and reconstruction, airplane protocol attack detection, and autofocus in digital holography. Moreover, we will present four collaborations with practical applications: Faurecia, Colruyt, BMW, and firefighters.
Poroelastic media are composed of a deformable porous solid with a viscous fluid percolating through its pores. When the solid compartment undergoes large deformation, a hyperelastic model is required (poro-hyperelasticity); this is usually handled via phenomenological homogenized models that neglect the complex interdependency between the micro and macro scales. One reason is that the mathematical two-scale analysis is only straightforward under infinitesimal strain theory. Exploiting the potential of artificial neural networks (ANNs) for fast and reliable upscaling and localization procedures, we propose an incremental numerical approach for the analysis of poroelastic media under finite deformation, assuming infinitesimal strain within each time increment. Darcy's experiment is reconstructed numerically, and the mechanical response of brain tissue under a uniaxial cyclic test is simulated and studied.
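The incremental idea, traversing a finite deformation through steps small enough for infinitesimal strain theory to hold within each step, can be sketched with a deliberately simple 1D example (my own illustration, not the talk's poroelastic model): the accumulated small-strain increments converge to the finite (Hencky) strain.

```python
import numpy as np

# A bar stretched from stretch 1.0 to 1.5. The finite deformation is
# split into increments small enough that infinitesimal strain theory
# holds within each increment; the increments are then accumulated.
lam_final = 1.5
n_inc = 1000
lams = np.linspace(1.0, lam_final, n_inc + 1)

eps_total = 0.0
for lam_old, lam_new in zip(lams[:-1], lams[1:]):
    # Infinitesimal strain increment measured on the current configuration.
    d_eps = (lam_new - lam_old) / lam_old
    eps_total += d_eps

# The accumulated increments approach the finite (Hencky) strain ln(lambda),
# with an error that vanishes as the increments are refined.
print(eps_total, np.log(lam_final))
```

The same logic underpins the incremental scheme for the poroelastic problem: each increment is a legitimate small-strain problem (where two-scale analysis is straightforward), and the finite-deformation response emerges from their accumulation.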
With accelerating population growth and increasing urbanization, the standard archetype of a building used merely as a shelter must evolve toward an energy self-sufficient building. The DATA4WIND project presents a solution for achieving this vision faster by relying on urban wind power utilization. The goal of the project is to ensure reliable prediction of aerodynamic information on local wind-flow patterns, essential for urban wind power utilization, by introducing new approaches. One approach is based on a hybrid data-assimilation platform and a strategy enabling synergy between computational and experimental wind engineering. An additional track of the project concerns the development of a computationally less demanding approach using reduced-order modeling techniques that satisfy the main prerequisite: sufficient accuracy of the numerical predictions.
Many materials and structures consist of numerous slender struts or fibers. Due to the manufacturing process of man-made struts (and the growth process of biological fibers), their mechanical response fluctuates from strut to strut, as well as locally within each strut. In the associated mechanical models, each strut is often represented by a string of beam elements. The parameter input fields of each string are ideally such that the local fluctuations and the fluctuations between individual strings are accurately captured. The aim of this contribution is to identify the parameters of the random fields of which each set of input fields is considered to be a realization. As only a few sets of input fields are available (due to the time constraints of the envisaged experimental techniques), this identification problem is ill-posed. A probabilistic identification approach based on Bayes' theorem is employed to treat this ill-posedness, as well as the involved uncertainties.
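As a minimal illustration of such a Bayesian treatment (an assumed toy setting with invented numbers, not the contribution's actual model), the sketch below identifies the mean and standard deviation of a scalar random stiffness from only four hypothetical strut measurements, evaluating the posterior of Bayes' theorem on a parameter grid:

```python
import numpy as np

# Hypothetical data: a few measured stiffness values, one per strut,
# regarded as realizations of a random variable N(mu, sigma).
y = np.array([10.2, 9.7, 10.8, 9.9])  # arbitrary units

# Grid over the parameters to be identified.
mu = np.linspace(8.0, 12.0, 201)
sigma = np.linspace(0.1, 2.0, 191)
MU, SIG = np.meshgrid(mu, sigma, indexing="ij")

# Bayes' theorem: posterior proportional to likelihood times prior
# (a flat prior is used here for simplicity).
loglik = sum(-0.5 * ((yi - MU) / SIG) ** 2 - np.log(SIG) for yi in y)
post = np.exp(loglik - loglik.max())
post /= post.sum()

# With so few observations the posterior stays broad, reflecting the
# ill-posedness; the MAP estimate is only one summary of it.
i, j = np.unravel_index(post.argmax(), post.shape)
print("MAP mu:", mu[i], "MAP sigma:", sigma[j])
```

In the actual contribution the quantity of interest is a random field (with a correlation structure along each strut) rather than a scalar, but the structure of the inference, a likelihood over a few realizations combined with a prior, is the same.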
Air pollution is a threat to public health, having negative effects on human health and well-being. This research seminar aims to (1) predict air pollution at a fine spatial resolution using the "land use regression" model, (2) assess population exposure to air pollution in Luxembourg and its surrounding areas, and (3) discuss ongoing work, opportunities, and challenges for future research.
The difference between theory and practice is that in theory, the two mostly agree; in practice, they do not.
CERATIZIT is a global player in high-performance materials in the midst of its digitalization. ML is nowadays a pervasive technology with plenty of possible applications for an innovative industrial player. But what differentiates a successful project from a failed one? How can a researcher extract value from data instead of spending valuable resources merely generating data?
This talk aims to give you an idea of some of the main ML applications currently active at CERATIZIT, along with some tips for making your next industrial partnership a success.
The Bayes factor (BF) is used in Bayesian model comparison and selection. Unlike information-theoretic approaches, it implicitly penalizes the number of parameters in a model. The BF can be used for both nested and non-nested models and is invariant to data transformations. Nevertheless, it is sensitive to the prior parameter specification: for weak prior distributions it may favor a different model than frequentist methods of model selection do.
This phenomenon is known as the Jeffreys-Lindley paradox. The BF is also undetermined when improper priors are used. The pseudo-Bayes factor (PsBF), however, is not affected by the Jeffreys-Lindley paradox, and partial Bayes factors, such as the intrinsic Bayes factor (IBF) and the fractional Bayes factor (FBF), are well defined for improper priors and are likewise unaffected by the paradox. Thus, model selection should also report at least the PsBF. If the data set is large, the IBF and FBF should be reported as well; both are also less sensitive to outliers.
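The prior sensitivity behind the Jeffreys-Lindley paradox can be reproduced in a few lines. The toy example below (my own illustration, not from the talk) compares M0: theta = 0 against M1: theta ~ N(0, tau^2) for normal data with unit variance, computing BF_01 analytically from the sufficient statistic; as the prior scale tau grows, the BF increasingly favors the null model.

```python
import numpy as np

def normal_pdf(x, var):
    # Density of N(0, var) evaluated at x.
    return np.exp(-0.5 * x * x / var) / np.sqrt(2 * np.pi * var)

# Toy setup: y_i ~ N(theta, 1). M0: theta = 0 (null);
# M1: theta ~ N(0, tau^2), with the prior scale tau to be varied.
rng = np.random.default_rng(1)
n = 50
y = rng.normal(0.3, 1.0, n)   # data mildly away from the null
ybar = y.mean()

# BF_01 via the sufficient statistic ybar ~ N(theta, 1/n):
# the marginal of ybar is N(0, 1/n) under M0 and N(0, tau^2 + 1/n) under M1.
for tau in (0.5, 5.0, 50.0):
    bf01 = normal_pdf(ybar, 1 / n) / normal_pdf(ybar, tau**2 + 1 / n)
    print(f"tau = {tau:5.1f}  BF_01 = {bf01:.3g}")
```

As tau grows, BF_01 grows without bound, so a sufficiently diffuse prior makes the BF favor M0 even when a frequentist test would reject it; this is exactly the sensitivity that the PsBF, IBF, and FBF are designed to mitigate.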
I will introduce the research and show results based on synthetic data, and then explain how this will be applied to (eco)hydrological models with real data.
In chemistry and physics, the employment of machine learning (ML) methods is having a transformative impact, advancing modeling and improving our understanding of complex molecules and materials. Each ML method comprises a mathematically well-defined procedure, and an increasingly large number of easy-to-use ML packages for modeling atomistic systems are becoming available. While current approaches mainly focus on developing and improving ML model architectures, training sets have not received as much attention. Yet training sets are key to the performance of any ML model, determining its applicability range and predictive power. In this talk, I will address an inherent bias of the reference data caused by their nonuniform nature. Using examples of ML force fields trained to reproduce the potential energy surface of molecules, I will demonstrate that commonly employed measures of the quality of ML models, such as the root mean square error, do not provide the full picture. Finally, I will show how combining unsupervised and supervised ML methods can effectively widen the applicability range of ML models to the fullest capabilities of the dataset.
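A minimal sketch of why a single RMSE can hide the effect of a nonuniform training set (an assumed toy potential, not data from the talk): a polynomial surrogate is fitted to samples that are dense near the minimum of a Morse-like curve and sparse on its tail, and the error is then evaluated per region.

```python
import numpy as np

rng = np.random.default_rng(2)

def pes(r):
    # Morse-like 1D potential standing in for a molecular PES (illustrative).
    return (1.0 - np.exp(-(r - 1.0))) ** 2

# Nonuniform reference data: dense near the minimum, sparse at large r,
# mimicking the bias of datasets sampled from equilibrium configurations.
r_train = np.concatenate([rng.uniform(0.6, 1.4, 200), rng.uniform(1.4, 3.0, 5)])
e_train = pes(r_train)

# Hypothetical surrogate "force field": a least-squares polynomial.
coef = np.polyfit(r_train, e_train, 4)

def rmse(r):
    return np.sqrt(np.mean((np.polyval(coef, r) - pes(r)) ** 2))

print("overall RMSE:             ", rmse(np.linspace(0.6, 3.0, 400)))
print("well-sampled region RMSE: ", rmse(np.linspace(0.6, 1.4, 100)))
print("sparse region RMSE:       ", rmse(np.linspace(1.4, 3.0, 100)))
```

The least-squares fit is dominated by the densely sampled region, so an RMSE aggregated over the whole range can look acceptable while the error in the under-sampled region is markedly worse, which is the kind of behavior a single global error measure fails to reveal.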