BitQuery

BitQuery is a search engine for open source repositories (OSR), driven by the GitHub [1] API and built on D3 [2].

BitQuery pursues two main objectives:

  • (I) Provide an automatic OSR categorization system for data science teams and software developers, promoting discoverability, technology transfer and coexistence
  • (II) Establish visual data exploration and topic-driven navigation of GitHub users and organizations for collaborative reproducible research (CRR) and web deployment

The BitQuery architecture consists of three abstraction layers, following the visual analytics approach [3]:

  • GitHub API based parser layer (Data Management)
  • Smart Data layer (Analysis)
  • D3-3D Visu layer (Visualization)

1. GitHub is the world's largest code hosting platform for version control and collaboration.

2. D3.js is a JavaScript library for producing dynamic, interactive data visualizations in web browsers.

3. Visual analytics: Definition, process, and challenges. Lecture notes in computer science, 4950:154–176 (D. Keim et al., 2008)


Special Edition

The application spectrum of BitQuery Special Edition is illustrated by exploring and querying the following data types:

DCF

  • R packages on GitHub
    A massive collection of R packages on GitHub including both the official CRAN packages and the developer versions.
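DCF (Debian Control File) is the "Field: value" format used by the DESCRIPTION file of an R package. As a rough illustration of what reading such a record involves, here is a minimal JavaScript sketch. It is not the BitQuery parser (which is written in R), and the function name `parseDcf` and the sample package are made up for this example.

```javascript
// Parse a single DCF record (e.g. an R package DESCRIPTION file) into an object.
// Continuation lines start with whitespace and are appended to the previous field.
function parseDcf(text) {
  const record = {};
  let currentField = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;             // skip blank lines
    if (/^\s/.test(line) && currentField) {
      record[currentField] += " " + line.trim();  // continuation line
      continue;
    }
    const sep = line.indexOf(":");
    if (sep === -1) continue;                     // not a "Field: value" line
    currentField = line.slice(0, sep).trim();
    record[currentField] = line.slice(sep + 1).trim();
  }
  return record;
}

// Hypothetical DESCRIPTION content, made up for illustration.
const description = [
  "Package: examplepkg",
  "Version: 0.1.0",
  "Title: An Example Package",
  "Description: A made-up package used only",
  "    to illustrate the DCF format.",
].join("\n");

const pkg = parseDcf(description);
console.log(pkg.Package, pkg.Version);
```

The sketch handles one record only; a file with several records separated by blank lines would need an extra split step.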

JSON

  • npm packages on GitHub
    npm is the package manager for JavaScript and the world’s largest software registry.
  • PHP packages on GitHub
    Composer is the package manager for the PHP programming language.
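Both npm (`package.json`) and Composer (`composer.json`) describe a package in a JSON manifest, with slightly different author fields. The sketch below shows one possible way to reduce both to a common record. The function name `normalizeManifest`, the output field names and the sample manifests are assumptions made for this example, not part of BitQuery.

```javascript
// Reduce a parsed package.json (npm) or composer.json (Composer) manifest
// to a small common record. The output shape is hypothetical.
function normalizeManifest(manifest, ecosystem) {
  let authors = [];
  if (Array.isArray(manifest.authors)) {
    // composer.json style: "authors" is an array of objects with a "name"
    authors = manifest.authors.map((a) => a.name);
  } else if (typeof manifest.author === "string") {
    // package.json style: "author" as a plain string
    authors = [manifest.author];
  } else if (manifest.author && manifest.author.name) {
    // package.json style: "author" as an object
    authors = [manifest.author.name];
  }
  return {
    ecosystem,
    name: manifest.name || "",
    version: manifest.version || "",
    description: manifest.description || "",
    keywords: Array.isArray(manifest.keywords) ? manifest.keywords : [],
    authors,
  };
}

// Made-up manifests for illustration.
const npmManifest = JSON.parse(
  '{"name":"example-lib","version":"1.2.3","keywords":["demo"],"author":{"name":"Jane Doe"}}'
);
const composerManifest = JSON.parse(
  '{"name":"vendor/example","description":"Demo package","authors":[{"name":"John Doe"}]}'
);

const npmRecord = normalizeManifest(npmManifest, "npm");
const composerRecord = normalizeManifest(composerManifest, "composer");
console.log(npmRecord.name, composerRecord.name);
```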

YAML

  • Ebooks from GITenberg
    The GITenberg project is a Free and Open, Collaborative, Trackable and Scriptable digital library. It curates and publishes highly usable and attractive ebooks in the public domain. Currently there are over 43,000 books in GITenberg.
  • DSSV 2017 members (YAML encoded Metadata)
    Data Science, Statistics & Visualisation (DSSV 2017) is a Satellite Conference of the 61st World Statistics Congress, promoted by IASC and held on 12-14 July 2017 at Instituto Superior Técnico, Lisbon, Portugal. For more information, see the official DSSV website.

Markdown

  • GitHub READMEs
    Markdown is a lightweight and easy-to-use syntax for styling all forms of writing on the GitHub platform.

BitTrinity

BitTrinity is the driving technology of BitQuery. It retrieves data from GitHub, postprocesses it and exports it to the appropriate visualization schemes. It comprises the following main components:

  • GitHub API based parser layer: extracts data from GitHub
  • Smart Data layer: transforms Big Data into value, processing the data semantics and metadata via dynamic calibration of metadata configurations, text mining (TM) models and clustering methods
  • D3-3D Visu layer, or BitQuery Visual Analytics application (VA-App), which is powered by two JavaScript libraries for producing dynamic data visualizations in web browsers: D3.js and Three.js.
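To make the first component more concrete: the GitHub REST API exposes a repository search endpoint at `https://api.github.com/search/repositories`, which takes a query string `q` built from free-text terms and `key:value` qualifiers. The following is a minimal sketch of assembling such a request URL. The helper name `buildRepoSearchUrl` and the sample query are assumptions for this example; the actual parser layer is written in R and may query GitHub differently.

```javascript
// Build a URL for the GitHub repository search endpoint.
// `terms` are free-text search terms, `qualifiers` are key:value filters
// such as { language: "R" }.
function buildRepoSearchUrl(terms, qualifiers = {}, perPage = 30) {
  const parts = [...terms];
  for (const [key, value] of Object.entries(qualifiers)) {
    parts.push(`${key}:${value}`);
  }
  const params = new URLSearchParams({
    q: parts.join(" "),
    per_page: String(perPage),
  });
  return `https://api.github.com/search/repositories?${params}`;
}

// Example query, made up for illustration.
const searchUrl = buildRepoSearchUrl(["text mining"], { language: "R" }, 10);
console.log(searchUrl);
```

The sketch only builds the URL. Sending the request, authentication and paging through the results are left out.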

The API parser layer and the Smart Data layer are programmed in R using various CRAN packages (see the References). The design and implementation of the D3-3D Visu layer is described in detail in the VA-App section.
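As a rough idea of the kind of text mining the Smart Data layer performs, the sketch below compares short documents in a plain term-frequency vector space model using cosine similarity. This is a generic textbook technique shown in JavaScript for illustration only. It is not the TManalyzer implementation, and all function names and sample texts are assumptions.

```javascript
// Split a text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build a sparse term-frequency vector (term -> count) for one document.
function termFrequencies(text) {
  const tf = new Map();
  for (const term of tokenize(text)) {
    tf.set(term, (tf.get(term) || 0) + 1);
  }
  return tf;
}

// Cosine similarity between two sparse term vectors; 0 if either is empty.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, weight] of a) {
    dot += weight * (b.get(term) || 0);
  }
  const norm = (v) =>
    Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Made-up repository descriptions for illustration.
const docA = termFrequencies("Text mining framework for R");
const docB = termFrequencies("A framework for text mining applications");
const docC = termFrequencies("Interactive network visualization");

const simAB = cosineSimilarity(docA, docB);
const simAC = cosineSimilarity(docA, docC);
console.log(simAB > simAC); // the two text mining descriptions are closer
```

Similarity scores of this kind can serve as input to a clustering step, which groups repositories with related descriptions.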

BitQuery VA-App

BitQuery VA-App creates an interactive network visualization that lets users overview, sort, zoom, filter and query the data. Additional components such as legends, a tooltip and a search field provide detailed information on chosen subsets or single data nodes.

BitQuery VA-App was designed in full compliance with the visual analytics mantra:

"Analyze first - show the important - zoom, filter and analyze further - details on demand."

The VA-App is written in JavaScript and CoffeeScript using D3.js, Three.js and some npm packages. The (simplified) component diagram of the software infrastructure is given below. For more information, see d3VA - D3 for Visual Analytics: source code, libraries, components


BitQuery VA-App infrastructure, implemented via CoffeeScript classes



Main VA-App components


Tooltip

Details on demand
Shows detailed information on the chosen data node, e.g. package title, version etc.

TooltipSearch

Zoom and filter
Supports various search parameters, e.g. package or author name.
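A search of this kind can be pictured as a case-insensitive substring match over selected node fields. The sketch below is a guess at the general idea, not the TooltipSearch source. The node fields `name` and `author`, the function `searchNodes` and the sample data are hypothetical.

```javascript
// Return the nodes whose name or author contains the query (case-insensitive).
// An empty query returns all nodes unchanged.
function searchNodes(nodes, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return nodes;
  return nodes.filter((node) =>
    [node.name, node.author].some((field) =>
      (field || "").toLowerCase().includes(needle)
    )
  );
}

// Made-up nodes for illustration.
const sampleNodes = [
  { name: "examplepkg", author: "Jane Doe" },
  { name: "plotting-tools", author: "John Smith" },
  { name: "textutils", author: "Jane Roe" },
];

const byAuthor = searchNodes(sampleNodes, "jane");
const byName = searchNodes(sampleNodes, "Plot");
console.log(byAuthor.length, byName.length);
```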

Legend

Zoom and filter
Interactive legends that filter and project data subsets according to various dimensions and parameters.

Orbits

Overview
Flexible layout settings for the radial graph scheme (by orbits).
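In a radial scheme, each node is assigned to an orbit (a concentric circle) and the nodes on one orbit are spread evenly around it. The following sketch computes such positions. It assumes a hypothetical `level` field per node and made-up radius settings, and is not taken from the Orbits component itself.

```javascript
// Place nodes on concentric orbits around the origin.
// `orbitOf(node)` returns the orbit index (0 = innermost).
function orbitLayout(nodes, orbitOf, baseRadius = 100, gap = 60) {
  // Group nodes by orbit index, preserving input order.
  const groups = new Map();
  for (const node of nodes) {
    const orbit = orbitOf(node);
    if (!groups.has(orbit)) groups.set(orbit, []);
    groups.get(orbit).push(node);
  }
  // Spread the members of each orbit evenly over the full circle.
  const placed = [];
  for (const [orbit, members] of groups) {
    const radius = baseRadius + orbit * gap;
    members.forEach((node, i) => {
      const angle = (2 * Math.PI * i) / members.length;
      placed.push({
        ...node,
        orbit,
        x: radius * Math.cos(angle),
        y: radius * Math.sin(angle),
      });
    });
  }
  return placed;
}

// Made-up nodes for illustration.
const orbitNodes = [
  { id: "a", level: 0 },
  { id: "b", level: 0 },
  { id: "c", level: 1 },
];
const positions = orbitLayout(orbitNodes, (node) => node.level);
console.log(positions.map((p) => [p.id, p.x.toFixed(1), p.y.toFixed(1)]));
```

The resulting `x` and `y` values could then be bound to SVG or WebGL elements by whichever rendering library is in use.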

GraphLayout

Overview
Highly customizable graph layout that visualizes the selected data.

NodesFilter

Zoom and filter
Creates and organizes the legends and performs interactions between them and the graph layout.
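One way to picture the interaction between legends and graph layout: a legend lists the distinct values of one data dimension with their counts, and toggling legend entries selects which nodes are passed on to the layout. The sketch below illustrates that idea only. The functions `legendEntries` and `filterByLegend`, the `language` field and the sample nodes are hypothetical.

```javascript
// Build legend entries (distinct value + node count) for one dimension.
function legendEntries(nodes, dimension) {
  const counts = new Map();
  for (const node of nodes) {
    const value = node[dimension];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return [...counts].map(([value, count]) => ({ value, count }));
}

// Keep only the nodes whose value for `dimension` is active in the legend.
function filterByLegend(nodes, dimension, activeValues) {
  return nodes.filter((node) => activeValues.has(node[dimension]));
}

// Made-up nodes for illustration.
const repoNodes = [
  { name: "examplepkg", language: "R" },
  { name: "example-lib", language: "JavaScript" },
  { name: "textutils", language: "R" },
];

const entries = legendEntries(repoNodes, "language");
const visibleNodes = filterByLegend(repoNodes, "language", new Set(["R"]));
console.log(entries, visibleNodes.length);
```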

VisTransform

Overview
Prepares and filters data for the visualization and manages transformation settings.

BitQueryForced

Overview
Creates an interactive network visualization that lets users overview, sort, zoom, filter and query the data.

References


Publications and working papers


R Packages

  • github: Provides access to the Github v3 API. R package version 0.9.8. C. Scheidegger (2016)
  • rgithubS: Provides access to the GitHub v3 API. Special edition: search, statistics, parsers. R package version 0.9.9. C. Scheidegger and L. Borke (2017)
  • tm: A framework for text mining applications within R. R package version 0.7-5. I. Feinerer and K. Hornik (2018)
  • TManalyzer: Provides IR tools in 3 text mining models: BVSM, GVSM(TT) and LSA. It is complemented by metadata analytics and document clustering functionality. R package version 0.6.0. L. Borke (2017)


Our GitHub organizations


bemined

CRAN & GitHub Mining infrastructure


b2net

Collaboration Network


d3VA

D3 for Visual Analytics

Our Team



lborke

Lukas Borke

Software engineer and data scientist

polarstern

Svetlana Bykovskaya

Data visualization specialist and data scientist