We explore the feasibility of a database storage engine housing up to 307 billion genetic Single Nucleotide Polymorphisms (SNPs) for online access. We evaluate database storage engines and implement a solution that accounts for factors such as dataset size, information gain, cost, and hardware constraints. Our solution provides a full-featured functional model of scalable storage and querying for researchers exploring the SNPs in the human genome. We address the scalability problem by building physical infrastructure and comparing its final cost with that of a major cloud provider.
Keywords: Big Data; Billion Records; Cassandra; Data Reduction; Distributed Computing; Economical Computing; Edge Computing; Elasticsearch; Genomics; MySQL; NoSQL; PWM; SNP.