DataBlaze

Make Your Data Useful

Built on the principles of a standard Domain Vocabulary, DataBlaze helps organizations speed up data acquisition and data curation, resulting in the intelligent implementation of a domain-centric data lake.

Let us help your organization facilitate the intelligent implementation of a domain-centric data management infrastructure.


Data Acquisition

DataBlaze's ingestion process loads data into the data lake in its original form, enabling a schema-on-read approach when exploring data from the lake.
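To illustrate the schema-on-read idea, here is a minimal sketch in plain Python: records are stored verbatim in a landing file, and a schema is applied only at read time. The file name, field names, and schema are hypothetical, not DataBlaze's actual implementation.

```python
import json
import os
import tempfile

# Schema-on-read: the landing zone keeps records exactly as received;
# a schema is applied only when the data is read back for exploration.
landing = os.path.join(tempfile.mkdtemp(), "orders.jsonl")

raw_records = [
    '{"order_id": "A1", "amount": "19.99", "extra_field": true}',
    '{"order_id": "A2", "amount": "5.00"}',
]
with open(landing, "w") as f:
    f.write("\n".join(raw_records))  # stored verbatim, no schema enforced

def read_with_schema(path, schema):
    """Apply a {field: type-cast} schema while reading; unknown fields are ignored."""
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            yield {k: cast(rec[k]) for k, cast in schema.items() if k in rec}

rows = list(read_with_schema(landing, {"order_id": str, "amount": float}))
print(rows)  # [{'order_id': 'A1', 'amount': 19.99}, {'order_id': 'A2', 'amount': 5.0}]
```

Note that the schema lives with the reader, not the storage: a different consumer could read the same landing file with a different schema, which is the point of schema-on-read.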

DataBlaze's data ingestion is powered by the open standards of Apache NiFi. DataBlaze also supports a number of other tools for ingesting data from various source types, including:
  • Apache Flume for high-speed streaming.
  • Apache Kafka for messaging-based (publish/subscribe) ingestion.
  • Data services for acquiring data from ERP systems.
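The publish/subscribe model that Kafka provides can be sketched in a few lines of plain Python (a toy in-process broker standing in for a real Kafka client; the topic name and message are hypothetical):

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish/subscribe broker illustrating the Kafka-style ingestion model."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)  # push each message to every live subscriber

landing_zone = []  # stands in for the data-lake landing zone
broker = MiniBroker()
broker.subscribe("sensor-events", landing_zone.append)
broker.publish("sensor-events", {"device": "d1", "temp": 21.5})
print(landing_zone)  # [{'device': 'd1', 'temp': 21.5}]
```

The producer never knows who consumes: new sinks (auditing, metrics) can subscribe to the same topic without touching the ingestion code, which is why pub/sub suits multi-consumer lake ingestion.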

Maintenance of Domain Vocabulary

Rawcubes' Domain Vocabulary tool, integrated with DataBlaze, allows organizations to maintain their business vocabulary in the form of a taxonomy or an OWL ontology. The standard domain vocabulary embedded in the product offering can be customized to suit an organization's hierarchy; depending on the size of the organization, the vocabulary can be very large and complex to maintain. With the Rawcubes Domain Vocabulary Editor (DVE), organizations can manage the business vocabulary at the department level, where users of a group or department can create their own custom view of it.
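A taxonomy-style vocabulary and a department-level "custom view" can be sketched as follows; the term names are illustrative assumptions, not Rawcubes' shipped vocabulary:

```python
# A business vocabulary as a simple taxonomy (term -> parent term; None = root).
taxonomy = {
    "Revenue": None,
    "Product Revenue": "Revenue",
    "Service Revenue": "Revenue",
    "Cost": None,
    "Shipping Cost": "Cost",
}

def subtree(taxonomy, root):
    """A department-level custom view: the root term plus all its descendants."""
    view = {root}
    changed = True
    while changed:  # keep absorbing children until the view stops growing
        changed = False
        for term, parent in taxonomy.items():
            if parent in view and term not in view:
                view.add(term)
                changed = True
    return view

finance_view = subtree(taxonomy, "Revenue")
print(sorted(finance_view))  # ['Product Revenue', 'Revenue', 'Service Revenue']
```

An OWL representation would express the same parent/child relationships as class subsumption; the dictionary form above is simply the smallest structure that shows the idea.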

Domain and Metadata Mapper

Because DataBlaze ingests data of diverse types from many sources into the data lake, it is equally important to maintain the efficiency of data capture. The domain is represented as a business vocabulary (a domain-specific language, or DSL). The mapper first maps source-system metadata to the target system (the Raw Data Zone); once data ingested from the various sources has landed, normally in the landing zone, the data available in the data lake is mapped to the business vocabulary. Mapping the business vocabulary to ingested data helps maintain the efficiency of data capture, and DataBlaze uses the business vocabulary for all searching and discovery of data.
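The core of such a mapper is a lookup from source-system column names to business terms. A minimal sketch, with hypothetical column names and mappings:

```python
# Mapping source-system metadata to business vocabulary terms
# (column names and terms here are illustrative assumptions).
source_metadata = ["CUST_NM", "ORD_AMT", "ORD_DT"]
vocabulary_map = {
    "CUST_NM": "Customer Name",
    "ORD_AMT": "Order Amount",
    "ORD_DT": "Order Date",
}

def map_to_vocabulary(columns, mapping):
    """Return {source column: business term}; unmapped columns are flagged."""
    return {c: mapping.get(c, "<unmapped>") for c in columns}

mapped = map_to_vocabulary(source_metadata + ["LEGACY_X"], vocabulary_map)
print(mapped["ORD_AMT"], mapped["LEGACY_X"])  # Order Amount <unmapped>
```

Flagging unmapped columns rather than dropping them is what keeps the capture efficient: gaps in the vocabulary surface immediately instead of silently losing data from discovery.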

Augmented Data Discovery

DataBlaze's augmented data discovery feature enables business users to search for and discover data using natural language processing and machine-learning-based data patterns. With augmented data discovery, business users can explore the data without writing a single line of query: the machine-learning-based automated pattern detection service detects data patterns and offers them for searching and discovering data. Business users can also explore data in the data zones simply by writing queries in natural language, based on the business vocabulary.

Data Advisor, a self-learning service, helps users explore the data based on data patterns; the available patterns give a glimpse of what is in the data lake and how it can be discovered. Query Advisor educates users by showing which data exploration patterns are available: instead of users querying the data, our data discovery tells users which query patterns can be applied to the data. Based on users' satisfaction with the results, Data Advisor continues to train itself and improves at building discovery patterns.
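At its simplest, vocabulary-driven discovery matches words in a natural-language question to business terms and selects a query pattern accordingly. A toy sketch (the vocabulary, table names, and patterns are hypothetical, and real NLP would go well beyond keyword matching):

```python
# Match question words to business terms, then pick a matching query pattern.
vocabulary = {"revenue": "fact_sales.amount", "region": "dim_region.name"}
patterns = {
    frozenset(["revenue"]): "SELECT SUM(amount) FROM fact_sales",
    frozenset(["revenue", "region"]):
        "SELECT r.name, SUM(s.amount) FROM fact_sales s "
        "JOIN dim_region r ON s.region_id = r.id GROUP BY r.name",
}

def discover(question):
    """Reduce a question to its recognized business terms and look up a pattern."""
    terms = frozenset(w for w in question.lower().split() if w in vocabulary)
    return patterns.get(terms, "no matching pattern")

sql = discover("total revenue by region")
print(sql)
```

A self-learning advisor would additionally rank patterns by past user satisfaction and suggest them proactively; the lookup above is only the matching step.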

Data Flow Pipeline

The DataFlow builder allows operations to be performed on ingested data. Any supported operation can be applied to data from any zone except the landing zone; as a rule, the data in the landing zone is always preserved in its original form. Only after data has landed in the landing zone can operations be applied to it.

Any operation dropped onto the data pipeline can be persisted or processed in memory and passed on to the next operation in the pipeline. The processing of operations on the data pipeline is performed by Spark SQL and Spark ML. Each operation is processed asynchronously, allowing high performance.
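The pipeline idea, with the landing zone preserved untouched, can be sketched in plain Python (Spark is not used here; the operations and field names are illustrative):

```python
import copy

# A minimal data-flow pipeline: each operation takes records and returns records,
# processed in memory and handed to the next stage.
landing_zone = [
    {"id": 1, "price": "10"},
    {"id": 2, "price": "bad"},   # malformed row, dropped downstream only
    {"id": 3, "price": "30"},
]

def cast_price(records):
    """Cast price to float, dropping rows that cannot be parsed."""
    out = []
    for r in records:
        try:
            out.append({**r, "price": float(r["price"])})
        except ValueError:
            pass
    return out

def add_tax(records):
    """Derive a new column from the cleaned price."""
    return [{**r, "price_with_tax": round(r["price"] * 1.2, 2)} for r in records]

def run_pipeline(source, operations):
    data = copy.deepcopy(source)  # the landing zone itself stays untouched
    for op in operations:
        data = op(data)
    return data

curated = run_pipeline(landing_zone, [cast_price, add_tax])
print(len(landing_zone), len(curated))  # 3 2 -- the raw copy is preserved
```

Because each stage only ever works on a copy, the malformed row still exists in the landing zone and can be reprocessed later with a better parser, which is exactly why the landing zone is kept immutable.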

HDFS Layout / Storage Provisioning

DataBlaze uses HDFS as its storage infrastructure. By design, HDFS looks like a typical OS file system, consisting of a directory hierarchy that contains the stored data in files. Storage provisioning allows organizations to design the HDFS directory structure, organizing storage and determining how and where data is stored. Based on industry standards, storage provisioning provides a standard layout structure, built around the type of data to be stored and how it will be accessed. Access to the storage file structure is controlled by Apache Ranger policies.

The standard storage structure is defined in the form of Data Zones, which help with access performance, change management, and data archival. Data security policies, auditing, and archival are defined for each Data Zone. Layout provisioning provides intuitive user interfaces to manage, map, and control the data zones.
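A zone-based layout reduces to a path-building convention. A small sketch of what a provisioner might generate (the zone names, root, and `/<root>/<zone>/<source>/<dataset>` scheme are illustrative assumptions, not DataBlaze's actual layout):

```python
import posixpath

# Zone-based layout provisioning: build HDFS-style paths from a zone template.
ZONES = ("landing", "raw", "curated")

def zone_path(zone, source, dataset, root="/data"):
    """Return the HDFS-style directory for a dataset in a given zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return posixpath.join(root, zone, source, dataset)

paths = [zone_path(z, "erp", "orders") for z in ZONES]
print(paths)
# ['/data/landing/erp/orders', '/data/raw/erp/orders', '/data/curated/erp/orders']
```

Fixing the path scheme in one place is what makes per-zone policies practical: a Ranger rule can then target `/data/landing/*` as a whole rather than enumerating individual datasets.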

Integrated Governance Model

The Rawcubes Data Governance model uses a number of open-source frameworks: Apache Atlas provides data cataloging and business metadata maintenance, while monitoring and auditing are based on Elasticsearch, Kibana, and Logstash.

Data security and user security are controlled by Apache Ranger.