Welcome to Innovative Technologies

On-Demand Risk Management

On-Demand Risk Management uses Value at Risk (VaR), a risk assessment measure widely used in risk management models. VaR draws on current market conditions, the time and day of the investment, and other factors to estimate, at a chosen confidence level, the potential loss on a given investment. VaR can be calculated across days to build comparison models for investment portfolios. Visualization of the results can also be integrated to make them easier to understand and to support deeper analysis. The reference application can be enhanced and customized to each customer's requirements.
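As a rough illustration of the measure described above, the sketch below computes a one-period VaR by historical simulation. The choice of historical simulation, the nearest-rank quantile rule, and the names `historical_var` and `portfolio_var` are assumptions made for this example; the reference application may use a different VaR method.

```python
import math


def historical_var(returns, confidence=0.95):
    """One-period VaR as a fraction of portfolio value (historical simulation).

    `returns` holds past period returns as fractions, e.g. -0.02 for -2%.
    The result is the loss that was not exceeded in `confidence` of the
    observed periods, using the nearest-rank quantile rule.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Turn returns into losses (a negative return is a positive loss).
    losses = sorted(-r for r in returns)
    # Nearest-rank quantile: the smallest loss with at least `confidence`
    # of the observations at or below it.
    rank = math.ceil(confidence * len(losses))
    # Report a gain at the quantile as zero risk rather than negative risk.
    return max(losses[rank - 1], 0.0)


def portfolio_var(portfolio_value, returns, confidence=0.95):
    """VaR expressed in currency units for a portfolio of the given value."""
    return portfolio_value * historical_var(returns, confidence)
```

Calling `portfolio_var` once per day on a rolling window of returns would give the day-by-day series needed for the portfolio comparisons mentioned above.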

Clickstream reference application

The application processes user interactions at a particular website in real time and immediately outputs predictions, recommendations or alerts to be displayed back to the user or sent to third parties. This is a general-purpose application that can be adapted to various situations, such as:

- collecting user clicks on advertisements and predicting which advertisements will be clicked on, in order to choose personalized advertisements with a higher chance of being clicked and thereby improve ad monetization
- collecting user interaction with an e-commerce website, such as products viewed, searches executed and product categories visited, for the purpose of recommending similar products to those currently displayed
- collecting user interaction with a subscription-based website for the purpose of predicting whether the user is likely to abandon the service (so-called “churn”). An alert could be sent to the website administrator, and at-risk users could be given special offers to reduce churn.

In each case, a machine learning model is built from historical user interactions, object properties and the user's current context on the web page. The model is trained in batch mode and deployed to score events in real time.
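The split between batch training and real-time scoring can be sketched as follows. This is a minimal illustration that assumes a logistic regression model fitted by stochastic gradient descent; the function names and the feature layout are hypothetical, and the reference application may use a different model and serving stack.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_batch(rows, labels, epochs=200, learning_rate=0.5):
    """Offline step: fit logistic-regression weights on historical interactions.

    `rows` is a list of equal-length numeric feature vectors and `labels`
    holds 1 (e.g. clicked, churned) or 0 for each row.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, features))
            error = sigmoid(z) - label
            # Gradient step on the log-loss for this single example.
            bias -= learning_rate * error
            weights = [w - learning_rate * error * v
                       for w, v in zip(weights, features)]
    return weights, bias


def score_event(model, features):
    """Online step: probability of the positive outcome for one live event."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, features)))
```

The model returned by `train_batch` would be produced periodically from stored history, while `score_event` is cheap enough to call on each incoming interaction.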

Estimation of Fire-Risk in insurance using Big Data (Hadoop) and Machine Learning

The insurance sector is one of the most promising fields for the application of machine learning in industry. Estimating the risk of a particular claim, which involves estimating the possible claim amount relative to the insured value of the insured entity, is an extremely important goal for any insurance company. Our solution aims to estimate the severity of possible fire-based losses (one of the largest sources of high-value claims) for the properties insured by the agency. Fires are inherently difficult to predict because of their low probability of occurrence and their dependence on a wide variety of factors, many of which, such as weather and geography, are beyond human control. However, fire-based claims are also characterized by very high claim amounts because of the severity of the loss suffered by the property. These two factors combine to make fires the source of very large losses in the insurance sector. The ability to measure the fire risk of a particular property quantifiably and reliably would therefore be highly beneficial to any insurance agency, as it would lead to much more accurate pricing models.

Our project aims to harness the power of big data and advanced machine learning techniques to estimate the fire risk associated with a particular property. We take into account a wide variety of features to calculate the ratio between the claim amount in the event of a fire and the insured value of the property. We consider demographic, location-based, weather-based, crime-based and other descriptive variables to capture representative features that serve as effective indicators of fire risk and of the claim amount in the event of a fire. This information is then fed into machine learning algorithms that model the risk associated with the property with high accuracy and remove guesswork from the process. The model also identifies which subset of factors is most important in determining fire risk, while taking into account far more variables than a human could consider. Our use of advanced machine learning techniques, coupled with highly scalable parallelized algorithms, enables us to provide results quickly and reliably at the time of claims pricing. In addition, this capability allows the company to estimate its exposure on its current policies and take appropriate risk-mitigation measures to minimize the possibility of future losses.
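The target quantity and the exposure estimate described above can be written down in a few lines. This is only a sketch: `predict_ratio` is a hypothetical stand-in for the trained model, and the `insured_value` key is an assumed field name, neither of which comes from the actual solution.

```python
def loss_ratio(claim_amount, insured_value):
    """Training target: claim amount as a fraction of the insured value."""
    if insured_value <= 0:
        raise ValueError("insured_value must be positive")
    return claim_amount / insured_value


def portfolio_exposure(policies, predict_ratio):
    """Total predicted loss if every policy suffered a fire.

    `policies` is an iterable of dicts with an "insured_value" key
    (hypothetical schema). `predict_ratio` is any callable mapping a
    policy to a predicted loss ratio, standing in for the trained model.
    """
    return sum(predict_ratio(p) * p["insured_value"] for p in policies)
```

Weighting each term by an estimated probability of fire would turn this conditional figure into an expected loss, which is a separate modelling step not shown here.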

This project solves an important business problem in the insurance space by making use of the large amount of data already collected by the firm during the insurance process. Our work on this problem has reached the advanced stages of a working prototype, and we aim to have a finished application ready for use in the near future.

Data Engineering Best Practices

Technologies in the Big Data space have been evolving rapidly, and the goal of this project is to understand these technologies and tools and how to use them effectively. This project lays the foundation for all the other projects we do in the Big Data space.

The primary goal of this project is to arrive at best practices when dealing with huge amounts of data.

The specific objectives are to:

- Develop best practices for ingesting large amounts of data into a Hadoop cluster in various input formats (Parquet, Avro, SequenceFiles, etc.)
- Arrive at an optimal cluster setup and resource utilization when processing data using Spark
- Arrive at best practices for efficient indexing and searching using Elasticsearch
- Evaluate new tools and popular frameworks as alternatives to current technologies
- Study the feasibility of using Google Dataflow/Apache Beam as an alternative to existing Hadoop tools
- Study Apache Flink and Apache Samza as alternatives to Apache Spark Streaming for streaming applications
- Continue to explore data visualization tools for data presentation and data analytics

One of the datasets we have chosen for this project is the 'StackOverflow Data Dump' available at https://archive.org/details/stackexchange. We plan to add more publicly available datasets in the future.
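As a starting point for working with this dataset, the sketch below streams a posts file and counts rows by post type without loading the whole file into memory. It assumes the layout of the public dump, where each post is a `<row>` element carrying a `PostTypeId` attribute (1 for questions, 2 for answers); check the dump's own documentation before relying on it. The function name `count_post_types` is chosen for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_post_types(xml_source):
    """Count <row> elements by their PostTypeId attribute.

    `xml_source` may be a file path or a binary file-like object.
    Rows are processed one at a time as the parser reaches them.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            counts[elem.get("PostTypeId")] += 1
            # Drop the row's attributes and text once counted to limit memory use.
            elem.clear()
    return counts
```

The same loop could emit one record per row for conversion into Parquet or Avro before loading into the cluster.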

Another goal of this project is to increase the company's visibility in the open source community by participating in data science challenges posted on sites like Kaggle (https://www.kaggle.com/) and by exploring the possibility of contributing to open source projects in the Big Data space.