
Ying Zhao, PhD

Research Professor
Department of Information Sciences




This section discusses several ideas about Big Data, including the Navy's goals for Big Data, the use of Latent Semantic Analysis (LSA), and applications of Lexical Link Analysis (LLA).
Navy's Goals

One of the Navy's major goals is to find automated methods for detecting patterns in unstructured Big Data, yet few such tools have been applied, especially in the areas of acquisition and intelligence. Lexical Link Analysis (LLA) is a text analysis method for Big Data with a proven research track record.

Lexical links in LLA are derived from word pairs discovered through bi-gram analysis: two words are linked when they appear next to each other in the original documents. These word pairs then form a linked word-pair network. We couple social network methodology with text data mining to analyze word meanings as if the words were members of a social community. Lexical links help reveal overlaps or gaps between two or more big data sets.
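The bi-gram step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LLA implementation: the tokenizer, the function names (`tokenize`, `word_pairs`, `pair_network`, `overlaps_and_gaps`), and the toy documents are all hypothetical. The sketch counts adjacent word pairs within each document, turns each pair into an edge of a word network, and treats pairs found in every data set as overlaps and pairs found in only one data set as gaps.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer (an assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def word_pairs(documents: list[str]) -> Counter:
    # Bi-gram analysis: count adjacent word pairs inside each document;
    # pairs never span a document boundary.
    counts: Counter = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def pair_network(pairs: Counter) -> dict[str, set[str]]:
    # Each word pair becomes an undirected edge between its two words.
    graph: dict[str, set[str]] = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def overlaps_and_gaps(datasets: dict[str, list[str]]):
    # Overlap: pairs found in every data set. Gap: pairs found in only one data set.
    pair_sets = {name: set(word_pairs(docs)) for name, docs in datasets.items()}
    shared = set.intersection(*pair_sets.values())
    unique = {}
    for name, pairs in pair_sets.items():
        others = set().union(*(p for n, p in pair_sets.items() if n != name))
        unique[name] = pairs - others
    return shared, unique


if __name__ == "__main__":
    # Invented toy documents, for illustration only.
    data = {
        "set_a": ["contract award schedule", "program cost growth"],
        "set_b": ["contract award schedule", "ship maintenance cost"],
    }
    shared, unique = overlaps_and_gaps(data)
    print(sorted(shared))  # [('award', 'schedule'), ('contract', 'award')]
    print(sorted(unique["set_a"]))  # [('cost', 'growth'), ('program', 'cost')]
```

With real data, a tokenizer that handles stop words and phrases would replace the regular expression; the counting and set logic stays the same.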

By leveraging this text analysis, LLA can account for how information is categorized, ranked, and sorted in globally interacting social and semantic networks. Current internet search ranking methods require established links, such as web hyperlinks, citation networks, or other forms of crowd-sourced collective intelligence. These methods are not applicable to internal government data or public social media data, because such data contain no hyperlinks. Furthermore, current methods mainly score popular information and do not rank emerging or anomalous information, which may become more critical in contextually varied applications. For example, intelligence analysis may need to identify anomalous information in order to take proactive action for national security. Resource management may need to identify emerging information to allocate limited resources well, anticipate and forecast upcoming events, and maximize return on investment.
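To make the contrast with hyperlink-based ranking concrete, the sketch below runs standard PageRank over links derived from the text itself, with an edge from each word to the word that follows it. The graph construction, the parameter values (damping 0.85, at most 100 iterations), and the function names `lexical_link_graph` and `pagerank` are illustrative assumptions. It shows only that link analysis can run without hyperlinks; it is not LLA's own scoring, which also targets emerging and anomalous information.

```python
import re


def lexical_link_graph(text: str) -> dict[str, set[str]]:
    # Directed edge w1 -> w2 whenever w2 immediately follows w1 (self-loops skipped).
    tokens = re.findall(r"[a-z]+", text.lower())
    graph: dict[str, set[str]] = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
    return graph


def pagerank(graph: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    # Standard PageRank by power iteration; dangling nodes spread their rank uniformly.
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    ranks = pagerank(lexical_link_graph("data link data link analysis"))
    print(max(ranks, key=ranks.get))  # link
```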

By representing the text as a network in which each node is a word or word pair standing for a concept or idea, LLA can measure authority and expertise centralities. We have further developed a theory of system self-awareness (SSA) that balances authority and expertise in a network. With this theory, LLA can not only discover the currently authoritative ideas but also predict which ideas might emerge or grow to become authoritative in the future. This presents decision makers with previously unavailable emerging patterns and themes, along with the levels of analysis, thereby reducing the workload of human analysts, helping them overcome blind spots, and avoiding overlooked knowledge.
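How LLA's authority and expertise centralities are computed is not spelled out here. As a familiar stand-in only, the sketch below computes Kleinberg's HITS authority and hub scores on a small directed word-pair network: a node is a strong authority when strong hubs point to it, and a strong hub when it points to strong authorities. The analogy to LLA's centralities, the `hits` function, and the toy graph are assumptions; this is not the LLA or SSA computation.

```python
import math


def hits(graph: dict[str, set[str]], iterations: int = 50):
    # Kleinberg's HITS on a directed graph given as {node: set of successors}.
    nodes = set(graph) | {w for outs in graph.values() for w in outs}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of nodes pointing to v.
        auth = {v: 0.0 for v in nodes}
        for u, outs in graph.items():
            for w in outs:
                auth[w] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # Hub update: sum of the authority scores of nodes v points to.
        hub = {v: sum(auth[w] for w in graph.get(v, ())) for v in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {v: x / norm for v, x in hub.items()}
    return auth, hub


if __name__ == "__main__":
    # Hypothetical word-pair network: edge w1 -> w2 for each observed pair (w1, w2).
    pairs = {"big": {"data"}, "open": {"data"}, "data": {"analysis"}}
    auth, hub = hits(pairs)
    print(max(auth, key=auth.get))  # data
```

Both score vectors are normalized to unit length at each step, so only their relative sizes are meaningful.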

LLA is related to PageRank; Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA); AutoMap and the Organizational Risk Analyzer (ORA) for computing social network metrics; InXight, AlchemyAPI, and Semantica for entity extraction, text analysis, and sentiment analysis; WordNet; and Apache Lucene, OpenNLP, and Mahout. LLA was developed by taking the best of each of these.

By applying learning agents to process big data sets in parallel, we can usually analyze a data set in one to two weeks, much faster than humans. These results were obtained by leveraging intelligent agent technology through an educational license with Quantum Intelligence, Inc. CLA is a computer-based learning agent, or agent collaboration tool, capable of ingesting and processing big data sources in parallel, which allows us to scale up to big data. This implementation is parallel and distributed and can be further adapted to the Hadoop MapReduce framework.
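The map/reduce pattern can be sketched for the word-pair counting step: each chunk of documents is counted independently (map), and the partial counts are merged (reduce). This is a single-machine illustration under stated assumptions, not CLA or a Hadoop job; the round-robin chunking and the function names `map_chunk`, `reduce_counts`, and `parallel_pair_counts` are hypothetical. Threads keep the example self-contained; because of CPython's global interpreter lock, CPU-bound counting would more naturally use `ProcessPoolExecutor` or a real MapReduce cluster.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(documents: list[str]) -> Counter:
    # Map step: count adjacent word pairs (bi-grams) in one chunk of documents.
    counts: Counter = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    # Reduce step: merge two partial counts (all counts are positive, so + is safe).
    return a + b


def parallel_pair_counts(documents: list[str], n_chunks: int = 4) -> Counter:
    # Split documents round-robin into independent chunks, map them concurrently,
    # then fold the partial results together.
    chunks = [documents[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())
```

Because the chunks never share state, the result is the same as counting all documents in one pass, whatever the number of chunks.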

We have been using LLA to improve the use and understanding of data within DoD. In acquisition research, for example, the US DoD acquisition process is so complex that there has been a critical need for automation, validation, and discovery to help acquisition professionals, decision makers, and researchers reveal the interrelationships among data elements and business processes. We applied LLA to extract links, compare trends, and discover previously unknown patterns (Zhao, MacKinnon & Gallup, 2010, 2011, 2012, 2013a & 2013b) in more than ten years of data from the three services (Army, Navy, and Air Force). In Naval recruiting, Facebook, Twitter, and many other social networking sites offer virtual environments for meeting possible candidates who could fit service entry profiles. Sponsored by the Navy Recruiting Command, we collected and analyzed the public "footprints" of Facebook users using LLA, which resulted in a list of selected individuals who could become strong officer candidates for the U.S. Navy.

In the aftermath of the Haiti earthquake, U.S. military and civil organizations provided rapid and extensive relief operations. The challenge is to sift through the data collected during such real-life events to build an overall picture of how the various organizations (military and civil) actually collaborated. SOUTHCOM and USAID used Twitter and the HAITI HA/DR Community of Interest (COI) on the All Partners Access Network (APAN) for real-time information gathering and dissemination during the crisis. We analyzed approximately 10,000 documents collected from sources such as Twitter, Facebook, news-feed websites, official PDF briefing documents, situation reports, forums, and blogs. The sensemaking goal was to use LLA to develop utilities and measures for analyzing trends and fostering interagency synergies (Zhao, MacKinnon & Gallup, 2011 & 2012).