|Adam D. Lelkes|
Ph.D. DissertationAlgorithms and Complexity Results for Learning and Big Data
Investigating Rumor News Using Agreement-Aware Search. Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Ádám D. Lelkes, Cong Yu, and Jiawei Han. To appear in Proceedings of the 2018 ACM on Conference on Information and Knowledge Management (CIKM 2018).
A Confidence-Based Approach for Balancing Fairness and Accuracy. With Benjamin Fish and Jeremy Kun. In Proceedings of the 2016 SIAM International Conference on Data Mining (SDM 2016).
Interactive Clustering of Linear Classes and Cryptographic Lower Bounds. With Lev Reyzin. In Proceedings of the 26th International Conference on Algorithmic Learning Theory (ALT 2015). slides
Network installation under convex costs. With Alexander Gutfraind, Jeremy Kun and Lev Reyzin. In Journal of Complex Networks, Volume 4, Issue 2.
Biclique coverings, rectifier networks and the cost of ε-removal. With Szabolcs Iván, Judit Nagy-György, Balázs Szörényi, and György Turán. In Proceedings of the 16th International Workshop on Descriptional Complexity of Formal Systems (DFCS 2014).
Improved algorithms for splitting full matrix algebras. With Gábor Ivanyos and Lajos Rónyai. JP Journal of Algebra, Number Theory and Applications 28 (2013), 141--156.
Small zero divisors in maximal orders of Mn(Q) (BSc thesis)
Small zero divisors in maximal orders of Mn(Q) (Scientific Student Conference 2011 at the Budapest University of Technology and Economics)
CoauthorsBenjamin Fish, Alexander Gutfraind, Szabolcs Iván, Gábor Ivanyos, Jeremy Kun, Judit Nagy-György, Lev Reyzin, Lajos Rónyai, Balázs Szörényi, György Turán.
Investigating Rumor News Using Agreement-Aware Search
In recent years, rumor news has been generated by humans as well as robots in order to attract readership, influence opinions, and increase internet click revenue. Its detrimental effects have become a worldwide phenomenon, leading to confusion over facts and causing mistrust about media reports. However, evaluating the veracity of news stories can be a complex and cumbersome task, even for experts. One of the challenging problems in this context is to automatically understand different points of view, i.e., whether other news articles reporting on the same problem agree or disagree with the reference story. This can then lead to the identification of news articles that propagate false rumors.
Here, we propose a novel agreement-aware search framework, Maester, for dealing with the problem of rumor detection. Given an investigative question summarizing some news story or topic, it will retrieve related articles to that question, assign and display top articles from agree, disagree, and discuss categories to users, and thus provide a more holistic view. Our work makes two observations. First, relatedness can commonly be determined by keywords and entities occurred in both questions and articles. Second, the level of agreement between the investigative question and the related news article can often be decided by a few key sentences. Accordingly, we design our approach for relatedness detection to focus on keyword/entity matching using gradient boosting trees, while leveraging recurrent neural networks and posing attentions to key sentences to infer the agreement level. Our evaluation is based on the dataset from the Fake News Challenge (FNC). Extensive experiments demonstrate up to an order of magnitude improvement of Maester over all baseline methods for agreement-aware search as well as slightly improved accuracy based on the same metrics used in FNC.
A Confidence-Based Approach for Balancing Fairness and Accuracy
We study three classical machine learning algorithms in the context of algorithmic fairness: adaptive boosting, support vector machines, and logistic regression. Our goal is to maintain the high accuracy of these learning algorithms while reducing the degree to which they discriminate against individuals because of their membership in a protected group. Our first contribution is a method for achieving fairness by shifting the decision boundary for the protected group. The method is based on the theory of margins for boosting. We empirically compare our method with other variants of these learning algorithms as well as results in previous papers in the fairness literature. Our method, in addition to outperforming many of the prior algorithms in terms of accuracy and low discrimination, also allows for a fast and transparent quantification of the trade-off between bias and error. Our second contribution addresses the shortcomings of the bias-error trade-off studied in most of the algorithmic fairness literature. We demonstrate that even hopelessly naive modifications of a biased algorithm, which cannot be reasonably said to be `fair,' can still achieve low bias and high accuracy. To help to distinguish between these naive algorithms and more sensible algorithms we propose a new measure of fairness, called resilience to random bias (RRB). We demonstrate that RRB distinguishes well between our naive and sensible fairness algorithms. RRB together with bias and accuracy provides a more complete picture of the fairness of an algorithm.
On the Computational Complexity of MapReduce
In this paper we study the MRC model, which aims to formally capture distributed MapReduce computations. We show that the class of regular languages, and moreover all of sublogarithmic space, lies in constant-round MRC. In addition, we prove that, conditioned on a weak version of the Exponential Time Hypothesis, there are strict hierarchies within MRC so that increasing the number of rounds or the amount of time per processor increases the power of MRC. Our work lays the foundation for further analysis relating MapReduce to established complexity classes. Our results also hold for Valiant's BSP model of parallel computation and the MPC model of Beame et al.
Interactive Clustering of Linear Classes and Cryptographic Lower Bounds
We study an interactive model of supervised clustering introduced by Balcan and Blum (2008), where the clustering algorithm has query access to a teacher. We give an efficient algorithm clustering linear functionals over finite fields, which implies the learnability of parity functions in this model. We also present an efficient clustering algorithm for hyperplanes which are a natural generalization of the problem of clustering linear functionals over Rd. We also give cryptographic hardness results for interactive clustering. In particular, we show that, under plausible cryptographic assumptions, the interactive clustering problem is intractable for the concept classes of polynomial-size constant-depth threshold circuits, Boolean formulas, and finite automata.
Fair boosting: a case study
We study the classical AdaBoost algorithm in the context of fairness. We use the Census Income Dataset as a case study. We empirically evaluate the bias and error of four variants of AdaBoost relative to an unmodified AdaBoost baseline, and study the trade-offs between reducing bias and maintaining low error. We further define a new notion of fairness and measure it for all of our methods. Our proposed method, modifying the hypothesis output by AdaBoost by shifting the decision boundary for the protected group, outperforms the state of the art for the census dataset.
Network installation under convex costs
We study the Neighbor Aided Network Installation Problem (NANIP) introduced previously which asks for a minimal cost ordering of the vertices of a graph, where the cost of visiting a node is a function of the number of neighbors that have already been visited. This problem has applications in resource management and disaster recovery. In this paper we analyze the computational hardness of NANIP. In particular we show that this problem is NP-hard even when restricted to convex decreasing cost functions, give a linear approximation lower bound for the greedy algorithm, and prove a general sub-constant approximation lower bound. Then we give a new integer programming formulation of NANIP and empirically observe its speedup over the original integer program.
Biclique coverings, rectifier networks and the cost of ε-removal
We relate two complexity notions of bipartite graphs: the minimal weight biclique covering number Cov(G) and the minimal rectifier network size Rect(G) of a bipartite graph G. We show that there exist graphs with Cov(G)≥Rect(G)3/2−ϵ. As a corollary, we establish that there exist nondeterministic finite automata (NFAs) with ε-transitions, having n transitions total such that the smallest equivalent ε-free NFA has Ω(n3/2−ϵ) transitions. We also formulate a version of previous bounds for the weighted set cover problem and discuss its connections to giving upper bounds for the possible blow-up.
Improved algorithms for splitting full matrix algebras
Let K be an algebraic number field of degree d and discriminant Δ over Q. Let A be an associative algebra over K given by structure constants such that A ≅ Mn(K) holds for some positive integer n. Suppose that d, n and |Δ| are bounded. In a previous paper a polynomial time ff-algorithm was given to construct explicitly an isomorphism A → Mn(K). Here we simplify and improve this algorithm in the cases n≤43, K=Q, and n=2, with K=Q(√(-1)) or K=Q(√(-3)). The improvements are based on work by Y. Kitaoka and R. Coulangeon on tensor products of lattices.
Small zero divisors in maximal orders of Mn(Q)
The topic of this paper stems from the representation theory of finite dimensional algebras: let A be an associative algebra given by structure constants over an algebraic number field K, which is isomorphic to the full matrix algebra Mn(K); we would like to construct explicitly an isomorphism A→Mn(K). We consider the case K=Q. In order to construct an isomorphism, we need to find a rank one matrix with a small Frobenius norm in a maximal Z-order Λ in A, where the Frobenius norm is inherited from an embedding of A into Mn(R) (a maximal Z-order in A is essentially Mn(Z) transformed by an invertible rational matrix). Additionally, in order to deduce the time bound of the algorithm, we need to find a tight upper bound for the minimal norm of rank one matrices.
We use lattices in Euclidean spaces to examine this problem, thus establishing a firm link between the theory of lattices and the representation theory of algebras. Furthermore, besides being interesting on their own, this topic has a number of connections with important problems in algebraic geometry and number theory, including the very famous Birch and Swinnerton-Dyer conjecture.
In a recent paper G. Ivanyos, L. Rónyai and J. Schicho proved that there exists a rank one matrix in Λ whose Frobenius norm is less than n. By slightly modifying the proof, we can infer that this theorem holds with the Hermite constant γn as upper bound instead of n. Thus a naturally arising question is whether this upper bound is optimal, i.e. if there is an invertible real matrix Q such as the minimal Frobenius norm of QAQ-1 for rank one matrices A in Λ is γn.
In this paper, after summarizing the background of the problem and introducing the necessary concepts, we first prove some general theorems about the upper bound for the minimal norm of rank 1 matrices. Specifically, we show that the optimal upper bound is the so-called Bergé-Martinet constant. In the second part of the paper we examine the structure of the lattice QMn(Z)Q-1, prove a stronger version of a theorem by Kitaoka about tensor products of lattices, and present a substantially improved version of the Ivanyos-Rónyai-Schicho algorithm in small dimensions.