Topic: "Graph-based analysis of big data"
Big data is a term that describes the large volume of data, both structured and
unstructured, that inundates a business on a day-to-day basis. But it is not the amount
of data that is important; it is what organizations do with the data that matters. With
the help of graphs, big data can be analyzed for insights that lead to better decisions
and strategic business moves. Graphs are ubiquitous, and the volume and diversity of
graph data are growing strongly.
The continuous growth of new real-time applications produces a huge amount of data that is modeled as graphs. A graph database represents this data as interconnections between objects, and the study of these interconnections leaves the following research gaps:
-Overhead reduction of Cypher queries.
-Manual attribute-based graph summarization.
-Complex link analysis to discover fraud patterns in big data analysis.
-Detect and prevent fraud as it happens in real time.
Interactive graph analytics supported by suitable visualizations is highly desirable to put
the human in the loop for exploring and analyzing graph data. The existing separation
between interactive query processing with graph databases and batch-oriented graph
analytics should thus be overcome by providing all kinds of analysis in a unified,
distributed platform with support for interactive and visual analysis. Some graph systems,
e.g., Blazegraph, System G and Titan, try to move in this direction, but there are still many
open issues in finding suitable visualizations and interaction forms for the different kinds
of analysis. At the same time, this poses a number of challenges for suitable
implementations, which are observed as follows:
Problem 1: Cypher queries mostly request complete nodes and relationships, which
causes a considerable overhead due to inefficiencies in the data structure used by Neo4j.
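One way to picture the overhead in Problem 1 is to compare returning complete node records with returning only the properties a query needs. The sketch below is a toy in-memory model in Python, not Neo4j itself and not a measurement of its storage layer. The `nodes` list, the property names, and the helpers `fetch_full` and `fetch_projected` are hypothetical; they only stand in for the difference between a Cypher `RETURN n` and a projection such as `RETURN n.name`.

```python
import json

# Hypothetical node records; real nodes may carry many more properties.
nodes = [
    {"id": i, "name": f"user{i}", "age": 20 + i % 50, "bio": "x" * 200}
    for i in range(1000)
]

def fetch_full(records):
    """Analogue of `MATCH (n) RETURN n`: every property of every node."""
    return [dict(r) for r in records]

def fetch_projected(records, keys):
    """Analogue of `MATCH (n) RETURN n.name`: only the requested properties."""
    return [{k: r[k] for k in keys} for r in records]

# Serialized size is used here as a rough proxy for transfer overhead.
full_bytes = len(json.dumps(fetch_full(nodes)))
projected_bytes = len(json.dumps(fetch_projected(nodes, ["name"])))
print(f"full: {full_bytes} bytes, projected: {projected_bytes} bytes")
```

The gap between the two sizes grows with the number and size of unused properties, which is why projecting properties is a common first step when query overhead matters.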
Problem 2: Graph visualization and summarization: Graph visualization methods are
primarily designed to lay out a graph well in a big data setting, so that it is easier for
users to understand the graph visually. However, as graphs become large, displaying an
entire graph on a limited computer screen is challenging from both the usability and
the visual performance perspectives. To overcome the problems raised by large graph
sizes, navigation, interaction and summarization techniques are often incorporated into
graph visualization tools.
Graph summarization techniques are crucial in such domains, as they can assist in
uncovering useful insights about the patterns hidden in the underlying data. Earlier
graph summarization work aims to produce small and informative summaries based on
user-selected node attributes and relationships, and allows users to interactively drill
down or roll up to navigate through summaries with different resolutions. Earlier we used
the K-Snap method, which only deals with categorical node attributes (a categorical or
discrete variable is one that has two or more categories). In the real world, however, many
node attributes are numerical, such as the age of a social network user or the number of
publications of an author in a coauthorship network. Simply running the graph
summarization method on numerical attributes will result in summaries with large sizes
(at least as large as the number of distinct numerical values).
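The size problem with numerical attributes can be seen in a small sketch. The code below is a simplified attribute-based grouping in Python, not the K-Snap method itself: it groups nodes by attribute value and counts the edges between groups. The toy graph, the `summarize` and `age_bucket` helpers, and the fixed bucket width of 20 are all assumptions made for illustration; choosing good cut points for numerical attributes is exactly the open question.

```python
from collections import Counter

# Hypothetical toy graph: node -> age, plus undirected edges.
ages = {"a": 21, "b": 22, "c": 23, "d": 35, "e": 36, "f": 58}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]

def summarize(node_attr, edges, key=lambda v: v):
    """Group nodes by key(attribute) and count edges between the groups."""
    group_of = {n: key(v) for n, v in node_attr.items()}
    sizes = Counter(group_of.values())
    links = Counter()
    for u, v in edges:
        pair = tuple(sorted((group_of[u], group_of[v])))
        links[pair] += 1
    return sizes, links

def age_bucket(age, width=20):
    """Map a numerical age to a categorical range label (assumed width)."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

# Grouping on raw values: one group per distinct age.
raw_sizes, _ = summarize(ages, edges)
# Grouping on bucketed values: far fewer groups.
bucket_sizes, bucket_links = summarize(ages, edges, key=age_bucket)
print(len(raw_sizes), "groups on raw ages")
print(len(bucket_sizes), "groups on bucketed ages:", dict(bucket_sizes))
```

On raw ages every node ends up in its own group, so the summary is as large as the graph; after bucketing, the six nodes collapse into two groups.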
Problem 3: Complex link analysis to discover fraud patterns in big data analysis:
Uncovering fraud rings requires you to traverse data relationships with high computational
complexity. This problem is exacerbated as a fraud ring grows with the size of incremental
Problem 4: Detect and prevent fraud as it happens in real time: To prevent a
fraud ring, you need real-time link analysis on an interconnected dataset, from the time a
false account is created to when a fraudulent transaction occurs.
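A minimal sketch of what incremental link analysis for Problems 3 and 4 could look like is given below. It assumes, purely for illustration, that accounts belong to the same ring when they share an identifier such as a phone number or address, and it uses a union-find structure so that each new account is linked as it arrives instead of re-traversing the whole graph. The `RingTracker` class, the identifier names, and the `alert_size` threshold are hypothetical and not taken from any specific fraud detection system.

```python
class RingTracker:
    """Incrementally groups accounts that share any identifier (union-find)."""

    def __init__(self, alert_size=3):
        self.parent = {}
        self.accounts = {}  # component root -> number of accounts in it
        self.alert_size = alert_size

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra
            merged = self.accounts.get(ra, 0) + self.accounts.pop(rb, 0)
            self.accounts[ra] = merged
        return ra

    def add_account(self, account_id, identifiers):
        """Link a new account; return True if its component hits alert_size."""
        node = ("account", account_id)
        is_new = node not in self.parent
        root = self._find(node)
        if is_new:
            self.accounts[root] = self.accounts.get(root, 0) + 1
        for kind, value in identifiers.items():
            self._union(node, (kind, value))
        return self.accounts[self._find(node)] >= self.alert_size


tracker = RingTracker(alert_size=3)
alerts = [
    tracker.add_account("A1", {"phone": "555-0100", "address": "1 Main St"}),
    tracker.add_account("A2", {"phone": "555-0101", "address": "1 Main St"}),
    tracker.add_account("A3", {"phone": "555-0101", "address": "9 Oak Ave"}),
]
print(alerts)  # [False, False, True]
```

The third account shares a phone number with the second, which already shares an address with the first, so the component reaches three accounts at the moment the third account is created. A real deployment would need richer signals than component size, but the sketch shows why per-event updates are cheaper than repeated full traversals.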