Instituto Mixto Universidad Carlos III de Madrid - Banco de Santander en Big Data Financiero (IC3BS): Research articles



Title Authors Journal Abstract
Collaborative hierarchy maintains cooperation in asymmetric games Antonioni, A., Pereda, M., Cronin, K.A., Tomassini, M. and Sánchez, A. Scientific Reports Vol 8, Article 5375 (2018) Data from experiments carried out in our group show that a hierarchy formed on the basis of group performance has no detrimental effects, whereas assigning rankings individually leads to a decrease in cooperation. This happens because subjects interpret rankings as a reputation that carries information about which subjects cooperated in the previous phase.
Unsupervised scalable statistical method for identifying influential users in online social networks Azcorra, A., Chiroque, L.F., Cuevas, R., Fernández Anta, A., Laniado, H., Lillo, R.E., Romo, J. and Sguera, C. Scientific Reports Vol 8, Article 6955 (2018) Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users who can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method based on outlier detection, Massive Unsupervised Outlier Detection (MUOD), to support the identification of influential users. MUOD is scalable and can therefore be used in large OSNs. Moreover, it labels outliers as shape, magnitude, or amplitude outliers, depending on their features. This allows classifying outlier users into multiple classes, which are likely to include different types of influential users. Applied to a subset of roughly 400 million Google+ users, MUOD automatically identified and discriminated sets of outlier users whose features are associated with different definitions of influential users, such as the capacity to attract engagement, the capacity to attract a large number of followers, or high infection capacity.
Adding levels of complexity enhances robustness and evolvability in a multilevel genotype-phenotype map Catalán, P., Wagner, A., Manrubia, S. and Cuesta, J.A. Journal of the Royal Society Interface Vol 15, 20170516 (2018) Using a toy model (toyLIFE), we show that adding levels of complexity to a protein model that is not robust to mutations (the HP model) makes the model highly robust. Since robustness and evolvability are two sides of the same coin, our results show that complexity enhances evolutionary adaptation.
Quantitative account of social interactions in a mental health care ecosystem: cooperation, trust and collective action Cigarini, A., Vicens, J., Duch, J., Sánchez, A. and Perelló, J. Scientific Reports Vol 8, Article 3794 (2018) In this paper we use data obtained from game theory experiments, designed in collaboration with all the collectives in the mental health ecosystem, to shed light on different aspects of mental health care. In particular, we show that people with different diagnoses behave differently in a suite of games, and we identify the main characteristics that have to be fostered in order to resocialize them.
Goodness-of-fit test for the functional linear model based on randomly projected empirical processes Cuesta-Albertos, J.A., García-Portugués, E., Febrero-Bande, M. and González-Manteiga, W. Annals of Statistics (2018) We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-n convergence rates and circumvent the curse of dimensionality. The weak convergence of the empirical process is obtained conditionally on a random direction, whilst the almost sure equivalence between testing for significance on the original and on the projected functional covariate is proved. The finite-sample properties of the tests are illustrated in a simulation study for a variety of linear models, underlying processes, and alternatives. The software provided implements the tests and allows the replication of simulations and data applications.
Equal status in Ultimatum Games promotes rational sharing Han, X., Cao, S., Bao, J.Z., Wang, W.X., Zang, B., Gao, Z.Y. and Sánchez, A. Scientific Reports Vol 8, Article 1222 (2018) These are the first experiments on Ultimatum Games on networks, designed to check predictions from earlier simulation models. Subjects take part in the game in both roles, proposer and responder. Our data show that status equality promotes rational sharing, while the network structure leads to fairer offers than in well-mixed populations. This is in stark contrast with the simulation predictions and highlights the relevance of having high-quality data available to understand human behavior.
Cooperation on dynamic networks within an uncertain reputation environment Lozano, P., Antonioni, A., Tomassini, M. and Sánchez, A. Scientific Reports Vol 8, Article 9093 (2018) When reputation can be faked, cooperation can still be maintained, at the expense of honest subjects who are deceived by dishonest ones. We present here an agent-based simulation model inspired by, and calibrated against, the data obtained in the experiment. The results show that the collective behavior is qualitatively similar in larger systems and stable over longer time horizons.
Applying automatic text-based detection of deceptive language to police reports: extracting behavioral patterns from a multi-step classification model to understand how we lie to the Police Quijano-Sánchez, L., Liberatore, F., Camacho-Collados, J. and Camacho-Collados, M. Knowledge-Based Systems, Vol 149, pp 155-168 (2018) Filing a false police report is a crime that has dire consequences for both the individual and the system. In fact, it may be charged as a misdemeanor or a felony. For society, a false report results in the loss of police resources and the contamination of police databases used to carry out investigations and to assess the risk of crime in a territory. In this research, we present VeriPol, a model for the detection of false robbery reports based solely on their text. This tool, developed in collaboration with the Spanish National Police, combines Natural Language Processing and Machine Learning methods in a decision support system that provides police officers with the probability that a given report is false. VeriPol has been tested on more than 1,000 reports from 2015 provided by the Spanish National Police. Empirical results show that it is extremely effective in discriminating between false and true reports, with a success rate of more than 91%, improving by more than 15% on the accuracy of expert police officers on the same dataset. The underlying classification model can be analysed to extract patterns and insights showing how people lie to the police (as well as how to get away with false reporting). In general, the more details provided in the report, the more likely it is to be honest. Finally, a pilot study carried out in June 2017 has demonstrated the usefulness of VeriPol in the field.
Physics of human cooperation: experimental evidence and theoretical models Sánchez, A. Journal of Statistical Mechanics: Theory and Experiment, Vol 2018, 024001 (2018) In this paper, the available data from experiments on cooperation on networks are reviewed and discussed critically. One of the main points of the paper is that progress can only be made by contrasting theoretical models with the available data and by generating specifically designed data from additional experiments.
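The goodness-of-fit test of Cuesta-Albertos et al. listed above builds its statistics from a marked empirical process of residuals indexed by a randomly projected functional covariate. As a minimal sketch, assuming curves discretized on a common grid, residuals already obtained from some fitted functional linear model, and a single normalized Gaussian direction h, the hypothetical helper below (not the authors' software) evaluates that process and two continuous functionals of it: a Cramér-von Mises and a Kolmogorov-Smirnov statistic. Estimating the slope and calibrating critical values are not shown.

```python
import numpy as np

def projected_marked_process_stats(X, residuals, rng):
    """Hypothetical helper: CvM and KS statistics of the marked empirical
    process R_n(u) = n^{-1/2} * sum_i e_i * 1{<X_i, h> <= u}.

    X         : (n, p) array, each row a curve discretized on p grid points
    residuals : (n,) array of residuals e_i from a fitted functional linear
                model (assumed to be computed elsewhere)
    rng       : numpy Generator used to draw the random direction h
    """
    n, p = X.shape
    # Random direction: a normalized Gaussian vector (a simplifying assumption).
    h = rng.standard_normal(p)
    h /= np.linalg.norm(h)
    # Scalar index of every curve: its projection <X_i, h>.
    proj = X @ h
    # Evaluating R_n at the sorted projections amounts to a cumulative sum
    # of the residuals taken in that order (ties are ignored).
    order = np.argsort(proj)
    R = np.cumsum(residuals[order]) / np.sqrt(n)
    # CvM: mean of R_n^2 over the empirical distribution of the projections.
    cvm = np.mean(R ** 2)
    # KS: supremum of |R_n|.
    ks = np.max(np.abs(R))
    return cvm, ks
```

Because R_n is a cumulative sum along the ordering induced by <X_i, h>, a misfit that varies systematically with the projected covariate makes the partial sums drift away from zero, which inflates both statistics.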


Title Authors Journal Abstract
Copying@Scale: Using harvesting accounts for collecting correct answers in a MOOC Alexandron, G., Ruipérez-Valiente, J.A., Chen, Z., Muñoz-Merino, P.J. and Pritchard, D.E. Computers & Education, Vol 108, pp 96-114 (2017) This paper presents a detailed study of a form of academic dishonesty that involves the use of multiple accounts for harvesting solutions in a Massive Open Online Course (MOOC). It is termed CAMEO (Copying Answers using Multiple Existence Online). CAMEO is detected using educational data mining. The study has three main goals: determining the prevalence of CAMEO, studying its detailed characteristics, and inferring the motivation(s) for using it.
Improving the Graphical Lasso Estimation for the Precision Matrix Through Roots of the Sample Covariance Matrix Avagyan, V., Alonso, A.M. and Nogales, F.J. Journal of Computational and Graphical Statistics Vol 26, Issue 4, pp 865-872 (2017) In this article, we focus on the estimation of a high-dimensional inverse covariance (i.e., precision) matrix. We propose a simple improvement of the graphical Lasso (glasso) framework that is able to attain better statistical performance without significantly increasing the computational cost. The proposed improvement is based on computing a root of the sample covariance matrix to reduce the spread of the associated eigenvalues. Through extensive numerical results, using both simulated and real datasets, we show that the proposed modification improves the glasso procedure. Our results reveal that the square-root improvement can be a reasonable choice in practice.
High-fat diet induces metabolic changes and reduces oxidative stress in female mouse hearts Barba, I., Miró-Casas, E., Torrecilla, J.L., Pladevall, E., Tejedor, S., Sebastián-Pérez, R., Ruiz-Meana, M., Berrendero, J.R., Cuevas, A. and García-Dorado, D. Journal of Nutritional Biochemistry, Vol. 40, Pages 187-193 (2017) In this work, we study the differences induced by sex and diet in the metabolic phenotype and mitochondrial function of mice and their relation to cardiac events. The methodology includes the use of variable selection techniques on nuclear magnetic resonance spectra in order to detect relevant metabolites and improve classification performance.
An Efficient Industrial Big-data Engine Basanta-Val, P. IEEE Transactions on Industrial Informatics, Vol. PP, Issue 99 (2017) Current trends in industrial systems opt for the use of different big-data engines as a means to process huge amounts of data that cannot be processed with an ordinary infrastructure. The number of issues an industrial infrastructure has to face is large and includes challenges such as the definition of efficient architecture setups for different applications and the definition of specific models for industrial analytics. In this context, the article explores the development of a medium-size big-data engine (i.e., implementation) able to improve performance in map-reduce computing by splitting the analytic into different segments that may be processed by the engine in parallel using a hierarchical model.
Patterns for Distributed Real-Time Stream Processing Basanta-Val, P., Fernández-García, N., Sánchez-Fernández, L. and Arias-Fisteus, J. IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 11 (2017) In recent years, big data systems have become an active area of research and development. Stream processing is one of the potential application scenarios of big data systems, where the goal is to process a continuous, high-velocity flow of information items. High-frequency trading (HFT) in stock markets and trending topic detection in Twitter are examples of stream processing applications. In some cases (for instance, in HFT), these applications have end-to-end quality-of-service requirements and may benefit from the use of real-time techniques. Taking this into account, the present article analyzes, from the point of view of real-time systems, a set of patterns that can be used when implementing a stream processing application. For each pattern, we discuss its advantages and disadvantages, as well as its impact on application performance, measured as response time, maximum input frequency and changes in utilization demands due to the pattern.
Predictable remote invocations for distributed stream processing Basanta-Val, P., Fernández-García, N. and Sánchez-Fernández, L. Future Generation Computer Systems, DOI: 10.1016/j.future.2017.08.023 (2017) Typical big-data infrastructure includes multiple machines, with data accessed remotely through request-response patterns from different remote locations. Currently, most state-of-the-art remote invocation techniques focus on models for distributed interactions and have not explored the advantages offered by parallel computing, such as those available when running on distributed stream processors. In this context, the article focuses on the definition of a predictable remote procedure call (RPC) able to take advantage of distributed stream processing technology.
Distance-weighted discrimination of face images for gender classification Benito, M., García-Portugués, E., Marron, J.S. and Peña, D. Stat Vol 6, pp 231–240 (2017) We illustrate the advantages of distance-weighted discrimination for classification and feature extraction in a high-dimension low-sample-size (HDLSS) situation. The HDLSS context is a gender classification problem of face images in which the dimension of the data is several orders of magnitude larger than the sample size. We compare distance-weighted discrimination with Fisher's linear discriminant, support vector machines and principal component analysis by exploring their classification interpretation through insightful visuanimations and by examining the classifiers' discriminant errors. This analysis enables us to make new contributions to the understanding of the drivers of human discrimination between men and women.
On the use of reproducing kernel Hilbert spaces in functional classification Berrendero, J.R., Cuevas, A. and Torrecilla, J.L. Journal of the American Statistical Association, DOI: 10.1080/01621459.2017.1320287, (2017) This paper provides: (a) Explicit expressions for the optimal (Bayes) rule in several classification problems of equivalent Gaussian processes. (b) An interpretation, in terms of mutual singularity, for the “near perfect classification” phenomenon described by Delaigle and Hall (2012) and an asymptotically optimal rule under singularity. (c) As an application, we propose a natural variable selection method and discuss the conditions for optimality. The approach relies on some classical results in the RKHS theory.
Modelling Electricity Swaps with Stochastic Forward Premium Models Blanco, I., Peña, J.I. and Rodríguez, R. The Energy Journal, Vol. 39, No. 2 (2017) We present a new model for pricing electricity swaps. We posit that swap electricity prices result from at least three driving forces. First, a stochastic factor acting as an anchor of the level of the forward curve. This is the average "consensus" price for the contracts within a maturity slot (yearly, quarterly, and monthly). Second, an element reflecting deterministic trend-seasonal components, because we assume the market expects weather-related variations in demand. Third, a part accounting for (mean-reverting) stochastic deviations from the last two factors. These deviations depend on time to maturity and length of delivery period. By using a Multivariate Normal Inverse Gaussian (MNIG) distribution, our model embodies realistic probabilities of occurrence of extreme prices. Finally, we test the model using EEX data for the German market.
Humans expect generosity Brañas-Garza, P., Rodríguez-Lara, I. and Sánchez, A. Scientific Reports, 7, Article number: 42446 (2017) Data analysis of experiments with the Dictator game in different setups and countries shows that the majority of people expect generosity from strangers in situations where sharing is non-enforceable.
Combining Multivariate Volatility Forecasts: An Economic-Based Approach Caldeira, J.F., Moura, G.V., Nogales, F.J. and Santos, A.A.P. Journal of Financial Econometrics Vol 15, Issue 2, pp 247-285 (2017) We devise a novel approach to combine predictions of high-dimensional conditional covariance matrices using economic criteria based on portfolio selection. The combination scheme takes into account not only the portfolio objective function but also the portfolio characteristics in order to define the mixing weights. Three important advantages are that i) it does not require a proxy for the latent conditional covariance matrix, ii) it does not require optimization of the combination weights, and iii) it can be calibrated in order to adjust the influence of the best-performing models.
Control charts based on parameter depths Cascos, I. and López Díaz, M. Applied Mathematical Modelling, Vol 53, pp 487–509 (2018) Control charts are designed to monitor on-going production processes by tracking subsequent samples of the production using some statistic of a quality characteristic. We propose to track the parameter depths of estimates of a parameter by means of depth (D)-charts, or the associated depth-based ranks by means of r-charts. More precisely, given a general parameter (e.g. mean, standard deviation or pair given by mean and standard deviation) and some historical data of the production, the parameter depth of an estimate of the parameter on new samples of the production with regard to the historical data is computed. The process is considered to be out-of-control when the depth of the estimate of the parameter falls below some given threshold (control limit). Some control limits of specific D-charts are obtained under the assumption of normality of the quality characteristic.
Adaptive multiscapes: an up-to-date metaphor to visualize molecular adaptation Catalán, P., Arias, C.F., Cuesta, J. and Manrubia, S. Biology Direct, 12:7 (2017) This paper proposes an update to Wright's fitness landscapes that incorporates the most recent discoveries in molecular evolution.
T-Hoarder: A framework to process Twitter data streams Congosto, M., Basanta-Val, P. and Sanchez-Fernandez, L. Journal of Network and Computer Applications, Vol. 83, Pages 28-39 (2017) This paper describes T-Hoarder: a framework that enables tweet crawling, data filtering, and which is also able to display summarized and analytical information about the Twitter activity with respect to a certain topic or event in a web-page. T-Hoarder is capable of managing very large experiments both in duration (more than one year) and size (millions of tweets).
Functional Principal Component Regression and Functional Partial Least-squares Regression: An Overview and a Comparative Study Febrero-Bande, M., Galeano, P. and González-Manteiga, W. International Statistical Review Vol 85, Issue 1, pp 61–83 (2017) Functional data analysis is a field of growing importance in Statistics. In particular, the functional linear model with scalar response is surely the model that has attracted the most attention in both theoretical and applied research. Two of the most important methodologies used to estimate the parameters of the functional linear model with scalar response are functional principal component regression and functional partial least-squares regression. We provide an overview of estimation methods based on these methodologies and discuss their advantages and disadvantages. We emphasise that the role played by the functional principal components and by the functional partial least-squares components used in estimation appears to be very important for estimating the functional slope of the model. A functional version of the best subset selection strategy usual in multiple linear regression is also analysed. Finally, we present an extensive simulation study to compare the performance of all the considered methodologies.
Langevin diffusions on the torus: estimation and applications García-Portugués, E., Sørensen, M., Mardia, K.V. and Hamelryck, T. Statistics and Computing, pp 1–22 (2017) We introduce stochastic models for the continuous-time evolution of angles and develop their estimation. We focus on studying Langevin diffusions with stationary distributions equal to well-known distributions from directional statistics, since such diffusions can be regarded as toroidal analogues of the Ornstein–Uhlenbeck process. We propose three approximate likelihoods that are computationally tractable and investigate their empirical performance. The software package sdetorus implements the estimation methods and applications presented in the paper.
Disentangling the effects of selection and loss bias on gene dynamics Iranzo, J., Cuesta, J.A., Manrubia, S., Katsnelson, M.I. and Koonin, E.V. Proceedings of the National Academy of Sciences (USA), Early Edition, Vol. 114, No. 28 (2017) We combine mathematical modeling of genome evolution with comparative analysis of prokaryotic genomes to estimate the relative contributions of selection and intrinsic loss bias to the evolution of different functional classes of genes and mobile genetic elements.
A divisive clustering method for functional data with special consideration of outliers Justel, A. and Svarc, M. Advances in Data Analysis and Classification (2017) This paper presents DivClusFD, a new divisive hierarchical method for the unsupervised classification of functional data. Data of this type have the peculiarity that differences among clusters may be caused by changes in level as well as in shape. Different clusters can be separated in different subregions, and there may be no subregion in which all clusters are separated. At each division step, the DivClusFD method explores the functions and their derivatives at several fixed points, seeking the subregion in which the highest number of clusters can be separated.
The BIG CHASE: A decision support system for client acquisition applied to financial networks Liberatore, F. and Quijano-Sánchez, L. Decision Support Systems, Vol. 98, Pages 49-58 (2017) The paper presents a case study of a client acquisition decision support system for "Banco Santander, S.L.". In it, a reliability graph is built from client and transaction data provided by the bank. This graph models relationships based on a probability-of-traversal function that includes social measures. Then, an optimization procedure tailored to be efficient on very large sparse graphs with millions of nodes and edges identifies the most reliable sequence of clients that a manager should contact to reach a specific target.
What do we really need to compute the Tie Strength? An empirical study applied to Social Networks Liberatore F. and Quijano-Sánchez L. Computer Communications, Volume 110, Pages 59-74 (2017) The paper empirically presents the relative importance of different social variables for the computation of the tie strength and proposes a computational model independent of the Social Networks' domain. It includes the first dataset publicly available to explicitly include tie strength measures.
Distribution of genotype networks sizes in sequence-to-structure genotype-phenotype maps Manrubia, S. and Cuesta, J.A. Journal of the Royal Society Interface, Vol. 14, Issue 129 (2017) Using very simple statistical arguments, we explain the observed distributions of genotype network sizes (the number of genotypes that yield the same phenotype).
Equilibria, information and frustration in heterogeneous network games with conflicting preferences Mazzoli, M. and Sánchez, A. Journal of Statistical Mechanics: Theory and Experiment, DOI: 10.1088/1742-5468/aa9347 (2017) This paper presents a simulation model to address the problem of people interacting on a network who have to choose between two options when the population is heterogeneous. Heterogeneity is introduced by assigning to every individual a preference for one of the two options. The paper shows that the population then ends up in different situations depending on the type of network and the specific interaction. The model can be used to generate data about specific applications where this generic mechanism of identity is relevant.
Detecting Steps Walking at very Low Speeds Combining Outlier Detection, Transition Matrices and Autoencoders from Acceleration Patterns Munoz-Organero, M. and Ruiz-Blaquez, R. Sensors, 17(10), 2274 (2017) This work develops and validates a new algorithm for detecting steps while walking at very low speeds (between 30 and 40 steps per minute) based on data from a single triaxial accelerometer. The algorithm chains three consecutive phases. First, outlier detection based on the Mahalanobis distance is performed on the sensed data to find candidate points in the acceleration time series that may contain a segment of foot-ground contact. Second, the acceleration segments around the pre-detected outliers are used to compute transition matrices in order to capture temporal dependencies. Finally, autoencoders trained on data segments containing transition matrices of labelled steps are used to decide whether an outlier corresponds to a low-speed step.
Automatic detection of traffic lights, street crossings and urban roundabouts combining outlier detection and deep learning classification techniques based on GPS traces while driving Munoz-Organero, M., Ruiz-Blaquez, R. and Sánchez-Fernández, L. Computers, Environment and Urban Systems, DOI: 10.1016/j.compenvurbsys.2017.09.005 (2017) This article presents a novel mechanism for the automatic detection of urban infrastructure elements that influence driving, such as traffic lights, street crossings and roundabouts. To minimize system requirements and simplify data collection from many users with minimal impact on them, only GPS traces from a mobile device while driving are used. Acceleration and speed time series are derived from the GPS data. An outlier detection algorithm is first used to detect locations of abnormal driving (which may be due to infrastructure elements or to particular traffic conditions). Using deep learning tools, the speed and acceleration patterns are then analysed automatically to extract relevant features, which are classified as a traffic light, street crossing, urban roundabout or other element.
Improving transportation networks: Effects of population structure and decision making policies Pablo-Martí, F. and Sánchez, A. Scientific Reports, 7, Article number: 4498 (2017) In this paper we introduce a method to analyze data from transportation networks in order to identify the criteria used to decide how they have been built. The method can also be used to optimize an existing network subject to different types of constraints reflecting strategic decisions.
The emergence of altruism as a social norm Pereda, M., Brañas-Garza, P., Rodríguez-Lara, I. and Sánchez, A. Scientific Reports 7, Article number: 9684 (2017) Experimental data show very clearly that people are generous, insofar as they give money to others even when they are allowed to keep all of it without any punishment. In this work we introduce a simulation model that allows us to understand the experimental data in terms of human behavior arising from reinforcement learning. For the model to reproduce the data properly, we show that mistakes during the process must be taken into account, as a deterministic learning process does not fit the data quantitatively.
Fast and robust estimators of variance components in the nested error model Pérez, B., Molina, I., Thieler, A., Fried, R. and Peña, D. Statistics and Computing Vol 27, Issue 6, pp 1655–1675 (2017) Usual fitting methods for the nested error linear regression model are known to be very sensitive to the effect of even a single outlier. Robust approaches for the unbalanced nested error model with proven robustness and efficiency properties, such as M-estimators, are typically obtained through iterative algorithms. These algorithms are often computationally intensive and require robust estimates of the same parameters to start, but so far no robust starting values have been proposed for this model. This paper proposes computationally fast robust estimators for the variance components under an unbalanced nested error model, based on a simple robustification of the fitting-of-constants method, or Henderson method III. These estimators can be used as starting values for other iterative methods. Our simulations show that they are highly robust to various types of contamination of different magnitudes.
Make it personal: A social explanation system applied to group recommendations Quijano-Sanchez, L., Sauer, C., Recio-Garcia, J.A. and Diaz-Agudo, B. Expert Systems with Applications, Vol. 76, Pages 36-48 (2017) The paper proposes a Personalized Social Individual Explanation approach for group recommenders. Its goal is to study how to best explain proposed items to social groups performing joint activities and how to enhance users’ reactions towards a recommender system by recalling the groups’ affective bonds.
Bayesian Analysis of the Stationary MAP Ramírez-Cobo, P., Lillo, R.E. and Wiper, M.P. Bayesian Analysis Vol 12, Number 4, pp. 1163–1194 (2017) In this article we describe a method for carrying out Bayesian estimation for the two-state stationary Markov arrival process (MAP2), which has been proposed as a versatile model in a number of contexts. The approach is illustrated on both simulated and real data sets, where the performance of the MAP2 is compared against that of the well-known MMPP2. As an extension of the method, we estimate the queue length and virtual waiting time distributions of a stationary MAP2/G/1 queueing system, a matrix generalization of the M/G/1 queue that allows for dependent inter-arrival times. Our procedure is illustrated with applications in Internet traffic analysis.
Directional multivariate extremes in environmental phenomena Torres, R., De Michele, C., Laniado, H. and Lillo, R.E. Environmetrics Vol 28 (2017) Several environmental phenomena can be described by different correlated variables that must be considered jointly in order to be more representative of the nature of these phenomena. For such events, identifying extremes through marginal analysis is inappropriate. Extremes have usually been linked to the notion of quantile, which is an important tool to analyze risk in the univariate setting. We propose to identify multivariate extremes and analyze environmental phenomena in terms of the directional multivariate quantile, which allows us to analyze the data considering all the variables involved in the phenomena, as well as to look at the data in interesting directions that can better describe an environmental catastrophe. Because many references in the literature propose extreme detection based on copula models, we also generalize the copula method by introducing the directional approach. The advantages and disadvantages of the nonparametric proposal that we introduce and of the copula methods are discussed in the paper. We show with simulated and real data sets how considering the first principal component direction can improve the visualization of extremes. Finally, two case studies are analyzed: a synthetic case of flood risk at a dam (a three-variable case) and a real case of sea storms (a five-variable case).
Next-Generation Big Data Analytics: State of the Art, Challenges, and Future Research Topics Lv, Z., Song, H., Basanta-Val, P., Steed, A. and Jo, M. IEEE Transactions on Industrial Informatics Vol. 13, Issue 4 (Aug. 2017) The term big data occurs more frequently now than ever before. A large number of fields and subjects, ranging from everyday life to traditional research fields (e.g., geography and transportation, biology and chemistry, medicine and rehabilitation), involve big data problems. The popularization of various types of networks has diversified the types, issues, and solutions of big data more than ever before. In this paper, we review recent research in data types, storage models, privacy, data security, analysis methods, and applications related to network big data. Finally, we summarize the challenges and development of big data to predict current and future trends.
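The first phase of the step-detection algorithm of Munoz-Organero and Ruiz-Blaquez flags candidate foot-contact points by their Mahalanobis distance. A minimal sketch, assuming the distance is computed on raw triaxial samples against the sample mean and covariance (the paper may compute it over different windows or features), could look like the following; the function name and the `threshold` parameter are hypothetical, and the transition-matrix and autoencoder phases are not reproduced.

```python
import numpy as np

def mahalanobis_candidates(acc, threshold):
    """Hypothetical helper: flag samples of a triaxial accelerometer signal
    whose squared Mahalanobis distance from the sample mean exceeds
    `threshold`.

    acc       : (n, 3) array of x, y, z acceleration samples
    threshold : cut-off on the squared distance (an assumed parameter)
    Returns the indices of candidate points and all squared distances.
    """
    mu = acc.mean(axis=0)
    cov = np.cov(acc, rowvar=False)   # 3x3 sample covariance (ddof=1)
    inv_cov = np.linalg.inv(cov)
    centered = acc - mu
    # d_i^2 = (a_i - mu)^T S^{-1} (a_i - mu), evaluated for every row at once
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.flatnonzero(d2 > threshold), d2
```

A threshold near 16.27, roughly the 0.999 quantile of a chi-square distribution with 3 degrees of freedom, is one conventional choice when the samples are approximately Gaussian; it is an assumption here, not a value reported in the paper.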


Title Authors Journal Abstract
D-trace estimation of a precision matrix using adaptive Lasso penalties Avagyan, V., Alonso, A.M. and Nogales, F.J. Advances in Data Analysis and Classification DOI: (2016) The accurate estimation of a precision matrix plays a crucial role in the current age of high-dimensional data explosion. To deal with this problem, one of the prominent and commonly used techniques is ℓ1-norm (Lasso) penalization of a given loss function. This approach guarantees the sparsity of the precision matrix estimate for properly selected penalty parameters. However, ℓ1-norm penalization often fails to control the bias of the resulting estimator because of its overestimation behavior. In this paper, we introduce two adaptive extensions of the recently proposed ℓ1-norm penalized D-trace loss minimization method. They aim at reducing the bias produced in the estimator.
Stock Return Serial Dependence and Out-of-Sample Portfolio Performance. DeMiguel, A.V., Nogales, F.J. and Uppal, R. The Review of Financial Studies, Vol. 27, Issue 4, pp. 1031–1073 (2014) We study whether investors can exploit serial dependence in stock returns to improve out-of-sample portfolio performance. We show that a vector-autoregressive (VAR) model captures stock return serial dependence in a statistically significant manner.
Dating multiple change points in the correlation matrix Galeano, P. and Wied, D. Sociedad de Estadística e Investigación Operativa (2016) DOI 10.1007/s11749-016-0513-3 A nonparametric procedure for detecting and dating multiple change points in the correlation matrix of sequences of random variables is proposed. The procedure is based on a recently proposed test for changes in correlation matrices at an unknown point in time. Although the procedure requires constant expectations and variances, only mild assumptions on the serial dependence structure are made. The convergence rate of the change point estimators is derived and the asymptotic validity of the procedure is proved. Moreover, the performance of the proposed algorithm in finite samples is illustrated by means of a simulation study and the analysis of a real data example with financial returns. These examples show that the algorithm has high power in finite samples.
Multiperiod portfolio optimization with multiple risky assets and general transaction costs Mei, X., DeMiguel, V. and Nogales, F.J. Journal of Banking & Finance Vol 69, pp. 108–120 (2016) We analyze the optimal portfolio policy for a multiperiod mean–variance investor facing multiple risky assets in the presence of general transaction costs. For proportional transaction costs, we give a closed-form expression for a no-trade region, shaped as a multi-dimensional parallelogram, and show how the optimal portfolio policy can be efficiently computed for many risky assets by solving a single quadratic program. For market impact costs, we show that at each period it is optimal to trade to the boundary of a state-dependent rebalancing region. Finally, we show empirically that the losses associated with ignoring transaction costs and behaving myopically may be large.
Common Seasonality in Multivariate Time Series Nieto, F.H., Peña, D. and Saboyá, D. Statistica Sinica, Vol. 26, pp. 1389–1410 (2016) Common factors for seasonal multivariate time series are usually obtained by first filtering the series to eliminate the seasonal component and then extracting the nonseasonal common factors.
Monitoring multivariate variance changes Pape, K., Wied, D. and Galeano, P. Journal of Empirical Finance Vol 39, pp. 54–68 (2016) We propose a model-independent multivariate sequential procedure to monitor changes in the vector of componentwise unconditional variances in a sequence of p-variate random vectors. The asymptotic behavior of the detector is derived and the consistency of the procedure is established. A detailed simulation study illustrates the performance of the procedure under different types of data-generating processes. We conclude with an application to the log returns of a group of DAX-listed assets.
Generalized Dynamic Principal Components Peña, D. and Yohai, V.J. Journal of the American Statistical Association, Vol. 111, No. 515, pp. 1121–1131 (2016) Brillinger defined dynamic principal components (DPC) for time series based on a reconstruction criterion. He gave a very elegant theoretical solution and proposed an estimator which is consistent under stationarity. Here we propose a new, entirely empirical approach to DPC.
Dependence patterns for modeling simultaneous events. Rodríguez, J., Lillo, R.E. and Ramírez Cobo, P. Reliability Engineering & System Safety, Vol. 154, pp. 19–30 (2016) In this paper we examine in detail some of the modeling capabilities of the stationary m-state BMAP with simultaneous events up to size k, denoted BMAP_m(k). Specifically, we study the forms of the autocorrelation functions of the inter-event times and event sizes.
Analyzing the Impact of Using Optional Activities in Self-Regulated Learning Ruipérez-Valiente, J.A., Muñoz-Merino, P.J., Delgado Kloos, C., Niemann, K., Scheffel, M. and Wolpers, M. IEEE Transactions on Learning Technologies, Vol. 9, Issue 3 (July–Sept. 2016) This paper analyzes the use of optional activities in an online educational environment in two case studies with a Self-Regulated Learning approach. We found that the level of use of optional activities was low, and that optional activities not related to learning were used more. Students achieved the goals they set more than 50 percent of the time, and they voted positively on their peers' comments. We also found that gender and the type of course can influence which optional activities are used.
Functional outlier detection by a local depth with application to NOx levels Sguera, C., Galeano, P. and Lillo, R.E. Stochastic Environmental Research and Risk Assessment, Vol. 30, Issue 4, pp. 1115–1130 (2016) This paper proposes methods to detect outliers in functional data sets, where the task of identifying atypical curves is carried out using the recently proposed kernelized functional spatial depth (KFSD).


Title Authors Journal
Daily rhythms in mobile telephone communication Aledavood, T., López, E., Roberts, S., Reed-Tsochas, F., Moro, E., Dunbar, R. and Saramäki, J. PLoS ONE 10, e0138098 (2015)
Short-Range Mobility and the Evolution of Cooperation: An Experimental Study Antonioni, A., Tomassini, M. and Sánchez, A. Scientific Reports 5, 10282 (2015).
Time series segmentation procedures to detect, locate and estimate change-points Badagian, A.L., Kaiser, R. and Peña, D. In: Festschrift for Prof. Heiler, Empirical Economic and Financial Research – Theory, Methods and Practice, Beran, J., Feng, Y. and Hebbel, H. (eds.) Springer, Berlin. 2015.
Revealing patterns of local species richness along environmental gradients with a novel network tool. Baudena, M., Sánchez, A., Georg, C.P., Ruíz-Benito, P., Zavala, M.A., Rodríguez, M.A. and Rietkerk, M.G. Scientific Reports 5, 11561 (2015).
Reputation drives cooperative behaviour and network formation in human groups Cuesta, J.A., Gracia-Lázaro, C., Ferrer, A., Moreno, Y. and Sánchez, A. Scientific Reports 5, 7843 (2015).
Performance of Social Network Sensors During Hurricane Sandy. Kryvasheyeu, Y., Chen, H., Moro, E., Van Hentenryck, P. and Cebrian, M. PLoS ONE 10, e0117288 (2015)
Detection and evaluation of emotions in Massive Open Online Courses. Leony, D., Muñoz-Merino, P.J., Ruipérez-Valiente, J.A., Pardo, A., Arellano, D. and Delgado Kloos, C. Journal of Universal Computer Science, vol. 21, no. 5, pp. 638–655 (2015)
Social Media Fingerprints of Unemployment. Llorente, A., García-Herranz, M., Cebrián, M. and Moro, E. PLoS ONE 10, e0128692 (2015)
Precise effectiveness strategy for analyzing the effectiveness of students with educational resources and activities in MOOCs Muñoz-Merino, P.J., Ruipérez-Valiente, J.A., Alario-Hoyos, C., Pérez-Sanagustín, M. and Delgado Kloos, C. Computers in Human Behavior, vol. 47, pp. 108–118 (2015)
A Software Engineering Model for the Development of Adaptation Rules and its Application in a Hinting Adaptive E-learning System Muñoz-Merino, P.J., Delgado Kloos, C., Muñoz-Organero, M. and Pardo, A. Computer Science and Information Systems, vol. 12, no. 1, pp. 203–231 (2015)
Rethinking Statistics with Big Data: learning from George Box Peña, D. Quality Technology & Quantitative Management 12, 1 (2015).
ALAS-KA: A learning analytics extension for better understanding the learning process in the Khan Academy platform Ruipérez-Valiente, J.A., Muñoz-Merino, P.J., Leony, D. and Delgado Kloos, C. Computers in Human Behavior, vol. 47, pp. 139–148 (2015).
Theory must be informed by experiments (and back) - Comment on "Universal scaling for the dilemma strength in evolutionary games", by Z. Wang et al. Sánchez, A. Physics of Life Reviews 14, 52-53 (2015).
From seconds to months: an overview of multi-scale dynamics of mobile telephone calls Saramäki, J. and Moro, E. Eur. Phys. J. B 88, 164 (2015).