Antonioni, A., Pereda, M., Cronin, K.A., Tomassini, M. and Sánchez, A.
Scientific Reports Vol 8, Article 5375 (2018)
Data from experiments carried out in our group show that a hierarchy formed on the basis of group performance has no detrimental effects,
yet when ranking is assigned individually we observe a decrease in cooperation. This arises because subjects interpret rankings as a reputation that carries information about which subjects were cooperators in the previous phase.
Catalá, P., Wagner, A., Manrubia, S. and Cuesta, J.A.
Journal of the Royal Society Interface Vol 15, 20170516 (2018)
Using a toy model (toyLIFE), we show that adding levels of complexity to a protein model that is not robust to mutations (the HP model)
renders the model highly robust. Since robustness and evolvability are two sides of the same coin, our results show that complexity enhances evolutionary adaptation.
Cigarini, A., Vicens, J., Duch, J., Sánchez, A. and Perelló, J.
Scientific Reports Vol 8, Article 3794 (2018)
In this paper we use data obtained from game theory experiments, designed in collaboration with all the collectives in the mental health ecosystem, to shed light on different aspects of mental health care.
In particular, we show that people with different diagnoses behave differently across a suite of games, and we identify the main characteristics that have to be fostered in order to resocialize them.
Cuesta-Albertos, J.A., García-Portugués, E., Febrero-Bande, M., and González-Manteiga, W.
Annals of Statistics (2018)
We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-n convergence rates and circumvent the curse of dimensionality.
The weak convergence of the empirical process is obtained conditionally on a random direction, and the almost sure equivalence between testing for significance on the original functional covariate and on the projected one is proved. The finite sample properties of the tests are illustrated in a simulation study for a variety of linear models, underlying processes, and alternatives.
The software provided implements the tests and allows the replication of simulations and data applications.
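The construction above can be illustrated with a minimal Python sketch of the test statistics for the significance case, where the marks are the centered responses. It assumes curves sampled on a common equispaced grid, approximates inner products by Riemann sums, and uses a Gaussian random walk as the random direction; these choices and all function names are hypothetical, and the calibration of the test (e.g., by bootstrap) is not shown.

```python
import math
import random


def random_direction(m, rng):
    """Hypothetical choice of random direction h: a Gaussian random walk on the grid."""
    walk, level = [], 0.0
    for _ in range(m):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def project(curves, direction, dt):
    """Approximate <X_i, h> by a Riemann sum on a common equispaced grid."""
    return [sum(x * h for x, h in zip(curve, direction)) * dt for curve in curves]


def projected_significance_stats(responses, curves, direction, dt):
    """CvM and KS statistics of R_n(u) = n^{-1/2} sum_i e_i 1{<X_i, h> <= u}.

    Marks are e_i = Y_i - mean(Y), i.e. residuals under the null hypothesis
    of no effect (testing for significance). Ties among projections are
    assumed absent.
    """
    n = len(responses)
    mean_y = sum(responses) / n
    marks = [y - mean_y for y in responses]
    proj = project(curves, direction, dt)
    order = sorted(range(n), key=lambda i: proj[i])
    process, running = [], 0.0
    for i in order:  # evaluate R_n at each projected point, left to right
        running += marks[i]
        process.append(running / math.sqrt(n))
    cvm = sum(r * r for r in process) / n  # integral of R_n^2 w.r.t. the empirical cdf
    ks = max(abs(r) for r in process)
    return cvm, ks
```

Large values of either statistic point against the null; turning them into p-values requires a calibration step that this sketch leaves out.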
Han, X., Cao, S., Bao, J.Z., Wang, W.X., Zang, B., Gao, Z.Y. and Sánchez, A.
Scientific Reports Vol 8, Article 1222 (2018)
These are the first experiments on Ultimatum Games on networks, designed to check predictions from earlier simulation models. Subjects take part in the game in both roles, proposer and responder. Our data show that status equality promotes rational sharing, while the influence of structure leads to fairer offers compared to well-mixed populations. This is in stark contrast with the simulation predictions and highlights the relevance of having high-quality data available to understand human behavior.
Filing a false police report is a crime that has dire consequences for both the individual and the system. In fact, it may be charged as a misdemeanor or a felony. For society, a false report results in the loss of police resources and the contamination of police databases used to carry out investigations and to assess the risk of crime in a territory. In this research, we present VeriPol, a model for the detection of false robbery reports based solely on their text. This tool, developed in collaboration with the Spanish National Police, combines Natural Language Processing and Machine Learning methods in a decision support system that provides police officers with the probability that a given report is false. VeriPol has been tested on more than 1,000 reports from 2015 provided by the Spanish National Police.
Empirical results show that it is extremely effective in discriminating between false and true reports, with a success rate of more than 91%, improving by more than 15% on the accuracy of expert police officers on the same dataset. The underlying classification model can be analyzed to extract patterns and insights showing how people lie to the police (as well as how to get away with false reporting). In general, the more details provided in the report, the more likely it is to be honest. Finally, a pilot study carried out in June 2017 demonstrated the usefulness of VeriPol in the field.
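The abstract does not specify the classifier, so the following is only a hedged illustration of the general idea of scoring a report's text: a multinomial naive Bayes model with Laplace smoothing that returns a probability for each class. Naive Bayes is a swapped-in technique, not necessarily VeriPol's model, and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (Spanish accented letters included)."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


class NaiveBayesReportClassifier:
    """Multinomial naive Bayes with Laplace smoothing over word counts."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, text):
        """Posterior probability of each class for a single report."""
        n_docs = sum(self.doc_counts.values())
        log_post = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for w in tokenize(text):
                # Laplace-smoothed word likelihood
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            log_post[c] = score
        top = max(log_post.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(s - top) for c, s in log_post.items()}
        z = sum(unnorm.values())
        return {c: v / z for c, v in unnorm.items()}
```

Any training texts used with this sketch would be the user's own labelled reports; toy strings are only useful for checking that the code runs.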
Journal of Statistical Mechanics: Theory and Experiment, Vol 2018, 024001 (2018)
In this paper, the available data from experiments on cooperation on networks are reviewed and discussed critically.
One of the main points of the paper is that progress can only be made by contrasting theoretical models with the available data and by generating specifically designed data from additional experiments.
Alexandron, G., Ruipérez-Valiente, J.A., Chen, Z., Muñoz-Merino, P.J. and Pritchard, D.E.
Computers & Education, Vol 108, pp 96-114 (2017)
This paper presents a detailed study of a form of academic dishonesty that involves the use of multiple accounts for harvesting solutions in a Massive Open Online Course (MOOC). It is termed CAMEO – Copying Answers using Multiple Existence Online. The detection of CAMEO is done using educational data mining. The study has three main goals: determining the prevalence of CAMEO, studying its detailed characteristics, and inferring the motivation(s) for using it.
Journal of Computational and Graphical Statistics Vol 26, Issue 4, pp 865-872 (2017)
In this article, we focus on the estimation of a high-dimensional inverse covariance (i.e., precision) matrix. We propose a simple improvement of the graphical Lasso (glasso) framework that attains better statistical performance without significantly increasing the computational cost. The proposed improvement is based on computing a root of the sample covariance matrix to reduce the spread of the associated eigenvalues. Through extensive numerical results, using both simulated and real datasets, we show that the proposed modification improves the glasso procedure. Our results reveal that the square-root improvement can be a reasonable choice in practice.
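A minimal sketch of the root step, assuming a symmetric positive definite sample covariance S: a cyclic Jacobi eigen-decomposition gives S = V diag(λ) Vᵀ, and S^(1/2) = V diag(√λ) Vᵀ has eigenvalue ratio √(λmax/λmin), so the spread is compressed. How the root enters the glasso estimator follows the paper and is not reproduced here; the function names are hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V), where column k of V is the k-th eigenvector.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # rotation angle chosen so that the (p, q) entry vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def sqrtm_psd(s):
    """Symmetric square root S^(1/2) = V diag(sqrt(lambda)) V^T."""
    vals, v = jacobi_eigh(s)
    n = len(s)
    roots = [math.sqrt(max(lam, 0.0)) for lam in vals]
    return [[sum(v[i][k] * roots[k] * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eigen_spread(s):
    """Ratio of largest to smallest eigenvalue (assumes positive definiteness)."""
    vals, _ = jacobi_eigh(s)
    return max(vals) / min(vals)
```

For S = [[2, 1], [1, 2]] the eigenvalues are 3 and 1, so the spread drops from 3 for S to √3 for its root.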
Barba, I., Miró-Casas, E., Torrecilla, J.L., Pladevall, E., Tejedor, S., Sebastián-Pérez, R., Ruiz-Meana, M., Berrendero, J.R., Cuevas, A. and García-Dorado, D.
Journal of Nutritional Biochemistry, Vol. 40, Pages 187-193 (2017)
In this work, we study the differences induced by sex and diet in the metabolic phenotype and mitochondrial function of mice and their relation to cardiac events. The methodology includes the use of variable selection techniques with nuclear magnetic resonance spectra in order to detect relevant metabolites and improve classification performance.
on Industrial Informatics, Vol. PP, Issue 99 (2017)
Current trends in industrial systems favor the use of different big-data engines as a means to process huge amounts of data that cannot be processed with an ordinary infrastructure. An industrial infrastructure faces a large number of issues, including challenges such as defining efficient architecture setups for different applications and defining specific models for industrial analytics. In this context, the article explores the development of a medium-size big-data engine (i.e., an implementation) able to improve performance in map-reduce computing by splitting the analytic into different segments that the engine may process in parallel using a hierarchical model.
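The split-then-combine idea can be sketched with the standard library: the input is split into segments, each segment is mapped in parallel, and partial results are merged pairwise, level by level (a tree reduction). Word counting stands in for the analytic, and the names and segment sizes are hypothetical; threads keep the example simple, though CPU-bound work in CPython would usually call for processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(data, n_segments):
    """Cut the input into at most n_segments contiguous segments."""
    size = max(1, -(-len(data) // n_segments))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_segment(segment):
    """Hypothetical analytic: word counts within one segment."""
    return Counter(word for line in segment for word in line.split())


def tree_reduce(partials, pool):
    """Merge partial results pairwise, one level of the hierarchy at a time."""
    while len(partials) > 1:
        pairs = [partials[i:i + 2] for i in range(0, len(partials), 2)]
        partials = list(pool.map(lambda p: p[0] + p[1] if len(p) == 2 else p[0], pairs))
    return partials[0] if partials else Counter()


def hierarchical_map_reduce(data, n_segments=4, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_segment, split(data, n_segments)))
        return tree_reduce(partials, pool)
```

The tree reduction needs about log2(k) merge rounds for k segments, and each round's merges are independent, which is what allows them to run in parallel.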
Basanta-Val, P., Fernández-García, N., Sánchez-Fernández,L. and Arias-Fisteus, J.
IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue: 11 (2017)
In recent years, big data systems have become an active area of research and development. Stream processing is one of the potential application scenarios of big data systems, where the goal is to process a continuous, high-velocity flow of information items. High frequency trading (HFT) in stock markets or trending topic detection in Twitter are some examples of stream processing applications. In some cases (like, for instance, in HFT), these applications have end-to-end quality-of-service requirements and may benefit from the usage of real-time techniques. Taking this into account, the present article analyzes, from the point of view of real-time systems, a set of patterns that can be used when implementing a stream processing application. For each pattern, we discuss its advantages and disadvantages, as well as its impact on application performance, measured as response time, maximum input frequency and changes in utilization demands due to the pattern.
Typical big-data infrastructure includes multiple machines whose data are accessed remotely, with request–response patterns, from different locations. Most state-of-the-art remote invocation techniques focus on models for distributed interactions and have not explored the advantages of parallel computing, such as those offered by distributed stream processors. In this context, the article focuses on the definition of a predictable remote procedure call (RPC) able to take advantage of distributed stream processing technology.
Journal of the American Statistical Association, DOI: 10.1080/01621459.2017.1320287, (2017)
This paper provides: (a) Explicit expressions for the optimal (Bayes) rule in several classification problems of equivalent Gaussian processes. (b) An interpretation, in terms of mutual singularity, for the “near perfect classification” phenomenon described by Delaigle and Hall (2012) and an asymptotically optimal rule under singularity. (c) As an application, we propose a natural variable selection method and discuss the conditions for optimality. The approach relies on some classical results in the RKHS theory.
We present a new model for pricing electricity swaps. We posit that swap electricity prices result from at least three driving forces. First, a stochastic factor that acts as an anchor for the level of the forward curve; this is the average “consensus” price for the contracts within a maturity slot (yearly, quarterly, and monthly). Second, an element reflecting deterministic trend-seasonal components, because we assume the market expects weather-related variations in demand. Third, a part accounting for (mean-reverting) stochastic deviations from the first two factors. These deviations depend on time to maturity and length of delivery period. By using a Multivariate Normal Inverse Gaussian (MNIG) distribution, our model embodies realistic probabilities of occurrence of extreme prices. Finally, we test the model using EEX data for the German market.
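The three-factor decomposition can be illustrated by simulation. In this hedged sketch the anchor is a Gaussian random walk, the trend-seasonal part is a single annual cosine, and the deviation is a discretized Ornstein-Uhlenbeck (mean-reverting) process; Gaussian innovations replace the paper's MNIG distribution, and the dependence on time to maturity and delivery period is omitted. All parameter names and values are illustrative assumptions.

```python
import math
import random


def simulate_swap_prices(n_days, rng, anchor0=50.0, anchor_vol=0.5,
                         season_amp=5.0, kappa=0.2, dev_vol=1.0):
    """Daily prices as anchor + deterministic seasonality + mean-reverting deviation.

    Gaussian shocks are used for simplicity; the paper uses an MNIG law
    to capture extreme prices.
    """
    anchor, deviation = anchor0, 0.0
    prices = []
    for day in range(n_days):
        anchor += anchor_vol * rng.gauss(0.0, 1.0)                   # stochastic level factor
        seasonal = season_amp * math.cos(2 * math.pi * day / 365.0)  # deterministic seasonality
        deviation += -kappa * deviation + dev_vol * rng.gauss(0.0, 1.0)  # Euler step of OU
        prices.append(anchor + seasonal + deviation)
    return prices
```

Setting both volatilities to zero leaves only the anchor level plus the seasonal curve, which is a convenient sanity check.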
We devise a novel approach to combine predictions of high-dimensional conditional covariance matrices using economic criteria based on portfolio selection. The combination scheme takes into account not only the portfolio objective function but also the portfolio characteristics in order to define the mixing weights. Three important advantages are that (i) it does not require a proxy for the latent conditional covariance matrix, (ii) it does not require optimization of the combination weights, and (iii) it can be calibrated to adjust the influence of the best performing models.
Control charts are designed to monitor ongoing production processes by tracking successive samples of the production using some statistic of a quality characteristic. We propose to track the parameter depths of estimates of a parameter by means of depth (D)-charts, or the associated depth-based ranks by means of r-charts. More precisely, given a general parameter (e.g., the mean, the standard deviation, or the pair formed by mean and standard deviation) and some historical data of the production, we compute the parameter depth of an estimate of the parameter, obtained from new samples of the production, with regard to the historical data. The process is considered to be out of control when the depth of the estimate falls below a given threshold (control limit). Some control limits of specific D-charts are obtained under the assumption that the quality characteristic is normally distributed.
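A minimal univariate sketch of a D-chart for the mean: historical data are cut into blocks of the new sample size to obtain reference estimates, the halfspace (Tukey) depth of each new estimate is computed against them, and a sample is flagged when its depth falls below the control limit. The blockwise reference set, the choice of depth function and the names are assumptions for illustration; the paper's normal-theory control limits are not reproduced.

```python
import statistics


def halfspace_depth_1d(x, reference):
    """Tukey (halfspace) depth of x with respect to a univariate reference sample."""
    below = sum(1 for r in reference if r <= x)
    above = sum(1 for r in reference if r >= x)
    return min(below, above) / len(reference)


def historical_estimates(history, sample_size, estimator=statistics.fmean):
    """Estimates of the parameter on consecutive, non-overlapping historical blocks."""
    starts = range(0, len(history) - sample_size + 1, sample_size)
    return [estimator(history[i:i + sample_size]) for i in starts]


def d_chart(history, new_samples, limit, estimator=statistics.fmean):
    """Return (depth, out_of_control) for each new sample."""
    sample_size = len(new_samples[0])
    reference = historical_estimates(history, sample_size, estimator)
    results = []
    for sample in new_samples:
        depth = halfspace_depth_1d(estimator(sample), reference)
        results.append((depth, depth < limit))
    return results
```

Passing a different `estimator` (for example `statistics.stdev`) tracks another scalar parameter; a pair such as (mean, standard deviation) would need a bivariate depth instead.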
Congosto, M., Basanta-Val, P. and Sanchez-Fernandez, L.
Journal of Network and Computer Applications, Vol. 83, Pages 28-39 (2017)
This paper describes T-Hoarder, a framework that enables tweet crawling and data filtering and can also display, on a web page, summarized and analytical information about Twitter activity related to a given topic or event. T-Hoarder can manage very large experiments, both in duration (more than one year) and in size (millions of tweets).
García-Portugués, E., Sørensen, M., Mardia, K.V. and Hamelryck, T.
Statistics and Computing, pp 1–22, (2017)
We introduce stochastic models for the continuous-time evolution of angles and develop their estimation. We focus on Langevin diffusions whose stationary distributions are well-known distributions from directional statistics, since such diffusions can be regarded as toroidal analogues of the Ornstein–Uhlenbeck process. We propose three computationally tractable approximate likelihoods and investigate their empirical performance. The software package sdetorus implements the estimation methods and applications presented in the paper.
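For a von Mises target density p(θ) ∝ exp(κ cos(θ − μ)), the Langevin diffusion dΘ = (σ²/2) ∂θ log p(Θ) dt + σ dW has drift −(σ²κ/2) sin(Θ − μ), a circular analogue of the Ornstein-Uhlenbeck pull toward μ. The sketch simulates it by Euler-Maruyama with wrapping to [−π, π). It only illustrates the model, not the paper's approximate likelihoods or the sdetorus package, and its function names are hypothetical.

```python
import math
import random


def wrap(angle):
    """Map an angle to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def simulate_vm_langevin(theta0, mu, kappa, sigma, dt, n_steps, rng):
    """Euler-Maruyama path of the Langevin diffusion with von Mises(mu, kappa) stationary law."""
    theta = wrap(theta0)
    path = [theta]
    for _ in range(n_steps):
        drift = -0.5 * sigma ** 2 * kappa * math.sin(theta - mu)  # (sigma^2/2) * d/dtheta log p
        theta = wrap(theta + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        path.append(theta)
    return path


def circular_mean(angles):
    """Mean direction of a sample of angles."""
    return math.atan2(sum(math.sin(a) for a in angles), sum(math.cos(a) for a in angles))
```

Over a long path the mean direction of the simulated angles should settle near μ, reflecting the stationary von Mises law.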
Iranzo, J., Cuesta, J.A., Manrubia, S., Katsnelson, M.I. and Koonin, E.V.
Proceedings of the National Academy of Sciences (USA), Early Edition, vol. 114 no. 28 (2017)
We combine mathematical modeling of genome evolution with comparative analysis of prokaryotic genomes to estimate the relative contributions of selection and intrinsic loss bias to the evolution of different functional classes of genes and mobile genetic elements.
Advances in Data Analysis and Classification (2017) doi.org/10.1007/s11634-017-0290-1
This paper presents DivClusFD, a new divisive hierarchical method for the unsupervised classification of functional data. Data of this type have the peculiarity that differences among clusters may be caused by changes in level as well as in shape. Different clusters can be separated in different subregions, and there may be no subregion in which all clusters are separated. In each division step, the DivClusFD method explores the functions and their derivatives at several fixed points, seeking the subregion in which the highest number of clusters can be separated.
Decision Support Systems, Vol. 98, Pages 49-58 (2017)
The paper presents a case study of a client acquisition decision support system for "Banco Santander, S.L.". A reliability graph is built from client and transaction data provided by the bank; this graph models relationships through a probability-of-traversal function that includes social measures. Then, an optimization procedure tailored to be efficient on very large sparse graphs with millions of nodes and edges identifies the most reliable sequence of clients that a manager should contact to reach a specific target.
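Since the reliability of a sequence of contacts is the product of its traversal probabilities, maximizing it is equivalent to minimizing the sum of −log p, which makes Dijkstra's algorithm applicable because every such cost is non-negative. This is a generic sketch under that assumption, with edge probabilities taken as given; it is not the paper's tailored optimization procedure, and the dictionary-of-dictionaries graph format and function name are hypothetical.

```python
import heapq
import math


def most_reliable_path(graph, source, target):
    """Path maximizing the product of traversal probabilities.

    graph: dict node -> dict neighbour -> probability in (0, 1].
    Returns (path, reliability), or (None, 0.0) if target is unreachable.
    """
    dist = {source: 0.0}  # accumulated -log(reliability)
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, {}).items():
            nd = d - math.log(p)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])
```

A strong first link does not guarantee the best route: an intermediary with a weak first tie but a very strong second tie can yield the more reliable chain overall.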
The paper empirically evaluates the relative importance of different social variables for the computation of tie strength and proposes a computational model that is independent of the social network's domain. It also provides the first publicly available dataset that explicitly includes tie strength measures.
Journal of Statistical Mechanics: Theory and Experiment, DOI: 10.1088/1742-5468/aa9347 (2017)
This paper presents a simulation model of people who interact on a network and have to choose between two options when the population is heterogeneous: every individual is assigned a preference for one of the two options. The paper shows that the population then ends up in different states depending on the type of network and the specific interaction. The model can be used to generate data for specific applications where this generic mechanism of identity is relevant.
This work develops and validates a new algorithm for detecting steps while walking at very low speed (between 30 and 40 steps per minute), based on data from a single triaxial accelerometer. The algorithm chains three consecutive phases. First, outlier detection based on the Mahalanobis distance is applied to the sensed data to find candidate points in the acceleration time series that may contain a segment of foot-ground contact. Second, the acceleration segments around the pre-detected outliers are used to compute transition matrices that capture temporal dependencies. Finally, autoencoders trained on data segments containing transition matrices of labelled steps are used to decide whether an outlier corresponds to a low-speed step.
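The first two phases can be sketched as follows, assuming samples given as (x, y, z) tuples: Mahalanobis distances from the sample mean flag candidate points, and a segment around a candidate is turned into a row-normalized transition matrix between quantized acceleration-magnitude levels. The threshold, the magnitude-based quantization and the names are hypothetical choices, and the autoencoder stage is omitted.

```python
import math


def mean_cov(samples):
    """Sample mean and (n - 1)-normalized covariance of 3-D samples."""
    n = len(samples)
    mu = [sum(s[k] for s in samples) / n for k in range(3)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(3)] for i in range(3)]
    return mu, cov


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def mahalanobis_outliers(samples, threshold):
    """Indices whose Mahalanobis distance to the sample mean exceeds threshold."""
    mu, cov = mean_cov(samples)
    prec = inv3(cov)
    flagged = []
    for idx, s in enumerate(samples):
        d = [s[k] - mu[k] for k in range(3)]
        d2 = sum(d[i] * prec[i][j] * d[j] for i in range(3) for j in range(3))
        if math.sqrt(d2) > threshold:
            flagged.append(idx)
    return flagged


def transition_matrix(segment, n_levels, lo, hi):
    """Row-normalized transition frequencies between quantized magnitude levels."""
    def level(v):
        k = int((v - lo) / (hi - lo) * n_levels)
        return min(max(k, 0), n_levels - 1)

    states = [level(math.sqrt(x * x + y * y + z * z)) for x, y, z in segment]
    counts = [[0] * n_levels for _ in range(n_levels)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
```

Each row of the resulting matrix is either a probability distribution over next levels or all zeros when a level never occurs in the segment.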
Munoz-Organero, M., Ruiz-Blaquez, R. and Sánchez-Fernández, L.
Computers, Environment and Urban Systems, DOI: 10.1016/j.compenvurbsys.2017.09.005 (2017)
This article presents a novel mechanism for the automatic detection of urban infrastructure elements that influence driving, such as traffic lights, street crossings and roundabouts. To minimize system requirements and simplify data collection from many users with minimal impact on them, only GPS traces from a mobile device during driving are used. Acceleration and speed time series are derived from the GPS data. An outlier detection algorithm is first used to detect locations of abnormal driving (which may be due to infrastructure elements or to particular traffic conditions). Using deep learning tools, the speed and acceleration patterns are then analyzed automatically to extract relevant features, which are classified as a traffic light, street crossing, urban roundabout or other element.
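Deriving speed and acceleration series from GPS fixes can be sketched as follows: great-circle (haversine) distances between consecutive fixes give speeds, and differences of consecutive speeds give accelerations. The (time, latitude, longitude) input format, the mean Earth radius and the function names are assumptions; the outlier detection and deep learning stages are not shown.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_and_acceleration(fixes):
    """fixes: list of (t_seconds, lat, lon).

    Returns speeds (m/s) between consecutive fixes and accelerations (m/s^2)
    between consecutive speeds, each speed being attributed to the midpoint time.
    """
    speeds, mid_times = [], []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        speeds.append(haversine_m(la0, lo0, la1, lo1) / (t1 - t0))
        mid_times.append((t0 + t1) / 2)
    accels = [(v1 - v0) / (m1 - m0)
              for v0, v1, m0, m1 in zip(speeds, speeds[1:], mid_times, mid_times[1:])]
    return speeds, accels
```

Raw GPS fixes are noisy, so in practice the speed series would usually be smoothed before differencing; that step is left out to keep the sketch minimal.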
In this paper we introduce a method to analyze data from transportation networks in order to identify the criteria used to decide how they have been built. The method can also be used to optimize an existing network subject to different types of constraints reflecting strategic decisions.
Pereda, M., Brañas-Garza, P., Rodríguez-Lara, I. and Sánchez, A.
Scientific Reports 7, Article number: 9684 (2017)
Experimental data show very clearly that people are generous, insofar as they give money to others when they are allowed to keep all of it without any punishment. In this work we introduce a simulation model that explains the experimental data in terms of human behavior arising from reinforcement learning. For the model to reproduce the data properly, we show that mistakes during the learning process must be taken into account, as a deterministic learning process does not fit the data quantitatively.
Quijano-Sanchez, L., Sauer, C., Recio-Garcia, J.A. and Diaz-Agudo, B.
Expert Systems with Applications, Vol. 76, Pages 36-48 (2017)
The paper proposes a Personalized Social Individual Explanation approach for group recommenders. Its goal is to study how to best explain proposed items to social groups performing joint activities and how to enhance users’ reactions towards a recommender system by recalling the groups’ affective bonds.
The term big data occurs more frequently now than ever before. A large number of fields and subjects, ranging from everyday life to traditional research fields (e.g., geography and transportation, biology and chemistry, medicine and rehabilitation), involve big data problems. The spread of various types of networks has diversified the types, issues and solutions of big data more than ever before. In this paper, we review recent research in data types, storage models, privacy, data security, analysis methods, and applications related to network big data. Finally, we summarize the challenges and development of big data to predict current and future trends.
Advances in Data Analysis and Classification, DOI:
The accurate estimation of a precision matrix plays a crucial role in the current age of high-dimensional data explosion. To deal with this problem, one of the prominent and commonly used techniques is ℓ1 norm (Lasso) penalization of a given loss function. This approach guarantees the sparsity of the precision matrix estimate for properly selected penalty parameters. However, ℓ1 norm penalization often fails to control the bias of the obtained estimator because of its overestimation behavior. In this paper, we introduce two adaptive extensions of the recently proposed ℓ1 norm penalized D-trace loss minimization method, which aim to reduce the bias of the estimator.
The Review of Financial Studies, Vol. 27, Issue 4, Pages 1031–1073 (2014)
We study whether investors can exploit serial dependence in stock returns to improve out-of-sample portfolio performance. We show that a vector-autoregressive (VAR) model captures stock return serial dependence in a statistically significant manner.
Journal of Banking
Vol 69, pp 108-120, (2016)
We analyze the optimal portfolio policy for a multiperiod mean–variance investor facing multiple risky assets in the presence of general transaction costs. For proportional transaction costs, we give a closed-form expression for a no-trade region, shaped as a multi-dimensional parallelogram, and show how the optimal portfolio policy can be efficiently computed for many risky assets by solving a single quadratic program. For market impact costs, we show that at each period it is optimal to trade to the boundary of a state-dependent rebalancing region. Finally, we show empirically that the losses associated with ignoring transaction costs and behaving myopically may be large.
Journal of the American Statistical Association, Vol 111, Issue 515, pp 1121-1131 (2016)
Brillinger defined dynamic principal components (DPC) for time series based on a reconstruction criterion. He gave a very elegant theoretical solution and proposed an estimator that is consistent under stationarity. Here we propose a new, entirely empirical approach to DPC.
Rodríguez, J., Lillo, R.E. and Ramírez Cobo, P. (2016).
Reliability Engineering & System Safety, 154, 19-30.
In this paper we examine in detail some of the modeling capabilities of the stationary m-state BMAP with simultaneous events up to size k, denoted BMAP_m(k). Specifically, we study the forms of the autocorrelation functions of the inter-event times and event sizes.
This paper analyzes the use of optional activities in an online educational environment in two case studies with a Self-Regulated Learning approach. We found that the level of use of optional activities was low, and that optional activities not related to learning were used more. Students achieved the goals they set more than 50 percent of the time, and they voted positively on their peers' comments. We also found that gender and the type of course can influence which optional activities are used.
Stochastic Environmental Research and Risk Assessment, Volume 30, Issue 4, pp 1115–1130 (2016)
This paper proposes methods to detect outliers in functional data sets, in which atypical curves are identified using the recently proposed kernelized functional spatial depth (KFSD).
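A hedged sketch of KFSD for curves sampled on a common grid: with kernel k and feature map φ, the depth of x with respect to a sample y_1, ..., y_n is 1 − (1/n)‖Σ_i (φ(x) − φ(y_i))/‖φ(x) − φ(y_i)‖‖, which can be computed from kernel evaluations alone, skipping curves equal to x. A Gaussian kernel on the discretized L2 distance is assumed, the bandwidth is a free parameter, and curves with the lowest depths are outlier candidates; the threshold used in the paper is not reproduced, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Assumed kernel: Gaussian on the discretized L2 distance between two curves."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return math.exp(-d2 / sigma ** 2)


def kfsd_depths(curves, sigma):
    """Kernelized functional spatial depth of every curve w.r.t. the whole sample."""
    n = len(curves)
    gram = [[gaussian_kernel(a, b, sigma) for b in curves] for a in curves]
    depths = []
    for x in range(n):
        # skip curves that coincide with x in feature space (undefined unit vector)
        idx = [i for i in range(n) if gram[x][x] + gram[i][i] - 2 * gram[x][i] > 1e-12]
        norm = {i: math.sqrt(gram[x][x] + gram[i][i] - 2 * gram[x][i]) for i in idx}
        total = 0.0
        for i in idx:
            for j in idx:
                # <phi(x) - phi(y_i), phi(x) - phi(y_j)> from kernel values only
                inner = gram[x][x] + gram[i][j] - gram[x][i] - gram[x][j]
                total += inner / (norm[i] * norm[j])
        # total is the squared norm of the sum of unit vectors
        depths.append(1.0 - math.sqrt(max(total, 0.0)) / n)
    return depths
```

A curve lying far from the rest sees all its unit vectors point the same way, so their sum is long and its depth approaches 1/n; central curves get cancelling directions and higher depth.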