

Sunday, September 4, 2016

Latest Data Scientist Interview Questions With Answers.

The most frequently asked data scientist interview questions, with answers, for both freshers and experienced candidates.



1) What are the key skills required for a data scientist?

A data scientist should have the following skills:

Database knowledge: database management, data blending, querying, data manipulation
Predictive analytics: basic descriptive statistics, predictive modeling, advanced analytics
Big data knowledge: big data analytics, unstructured data analysis, machine learning
Presentation skills: data visualization, insight presentation, report design

2) What is collaborative filtering?

Collaborative filtering is a technique for building a recommendation system from user behavioral data. Its key components are users, items, and user-item interests (such as ratings).

A familiar example of collaborative filtering is the “recommended for you” section on online shopping sites, which is generated from your browsing and purchase history.
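As a minimal sketch of user-based collaborative filtering, with hypothetical ratings data: score how similar two users' tastes are, then recommend items that the most similar users liked.

```python
import math

# Hypothetical user-item ratings (the users, items, and interests).
ratings = {
    "alice": {"book": 5, "movie": 3, "game": 4},
    "bob":   {"book": 4, "movie": 4, "game": 5},
    "carol": {"book": 1, "movie": 5, "game": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users over their co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

# Bob's tastes are closer to Alice's than to Carol's, so items
# Alice liked would be recommended to Bob first.
sim_alice_bob = cosine_similarity(ratings["alice"], ratings["bob"])
sim_carol_bob = cosine_similarity(ratings["carol"], ratings["bob"])
```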

3) What are the tools used in Big Data?

Tools commonly used in Big Data include:

Hadoop
Hive
Pig
Flume
Mahout
Sqoop
4) What are KPIs, design of experiments, and the 80/20 rule?

KPI: A Key Performance Indicator is a metric that measures how well a business process is performing; it is typically reported through spreadsheets, reports, or charts.

Design of experiments: The initial process of planning how data will be split, sampled, and set up for statistical analysis.

80/20 rule: The observation that roughly 80 percent of effects come from 20 percent of causes; for example, 80 percent of your income may come from 20 percent of your clients.
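The 80/20 rule is easy to check against data. A small sketch, using hypothetical per-client revenue figures:

```python
# Hypothetical revenue per client, one entry per client.
incomes = [500, 300, 80, 50, 40, 30, 20, 15, 10, 5]
incomes.sort(reverse=True)

top_clients = incomes[:len(incomes) // 5]   # the top 20% of clients
share = sum(top_clients) / sum(incomes)     # their share of total income
```

Here the top two clients out of ten account for about 76% of total income, close to the 80/20 pattern.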

5) What is MapReduce?

MapReduce is a framework for processing large data sets: the input is split into subsets, each subset is processed on a different server (the map phase), and the partial results are then combined (the reduce phase).
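The map and reduce phases can be sketched in a few lines of plain Python; here a word count, with each chunk standing in for a subset processed on a different server:

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each chunk would normally live on a different server.
chunks = ["big data big", "data big"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
result = reduce_phase(mapped)
```

In a real cluster the map calls run in parallel and a shuffle step groups the pairs by key before reducing, but the data flow is the same.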


6) What is clustering? What properties can clustering algorithms have?

Clustering is an unsupervised learning method that divides a data set into natural groups, or clusters, of similar records.

Properties of clustering algorithms include being:

Hierarchical or flat
Iterative
Hard or soft
Disjunctive
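A minimal illustration of a flat, hard, iterative clustering algorithm is 1-D k-means, sketched here in plain Python (the data and k are hypothetical):

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """A minimal 1-D k-means: alternate assigning each point to its
    nearest centroid and moving each centroid to its cluster's mean."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious natural groups, around 1.0 and around 10.0.
data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
centers = kmeans_1d(data, 2)
```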
7) What are some statistical methods useful for a data scientist?

Statistical methods useful for data scientists include:

Bayesian methods
Markov processes
Spatial and cluster processes
Rank statistics, percentiles, outlier detection
Imputation techniques
The simplex algorithm
Mathematical optimization
8) What is time series analysis?

Time series analysis can be done in two domains: the frequency domain and the time domain. In time series analysis, the future output of a process is forecast from its past data using methods such as exponential smoothing and log-linear regression.
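Exponential smoothing can be sketched in a few lines; the smoothed value after the last observation serves as the one-step-ahead forecast (the demand figures are hypothetical):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each new observation is weighted by
    alpha and blended with the running smoothed value. The final smoothed
    value is the one-step-ahead forecast."""
    smoothed = series[0]
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

demand = [10, 12, 11, 13, 12]            # hypothetical monthly demand
forecast = exponential_smoothing(demand, alpha=0.5)
```

Larger alpha values track recent data more closely; smaller values smooth out noise more aggressively.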

9) What is correlogram analysis?

Correlogram analysis is a common form of spatial analysis in geography. It consists of a series of estimated autocorrelation coefficients calculated for different spatial relationships. It can also be used to construct a correlogram for distance-based data, when the raw data are expressed as distances rather than values at individual points.
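The building block of a correlogram is the estimated autocorrelation coefficient at each lag. A minimal sketch, shown here for a time-ordered series rather than spatial data:

```python
def autocorrelation(x, lag):
    """Estimated autocorrelation coefficient of a series at a given lag."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x)
    covariance = sum((x[i] - mean) * (x[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance

# An alternating series: strong negative correlation at lag 1,
# strong positive correlation at lag 2.
series = [1, 2, 1, 2, 1, 2, 1, 2]
correlogram = [autocorrelation(series, lag) for lag in (1, 2, 3)]
```

Plotting these coefficients against the lag produces the correlogram itself.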

10) What is a hash table?

In computing, a hash table is a map of keys to values. It is a data structure used to implement an associative array: a hash function computes an index into an array of slots, from which the desired value can be fetched.
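A minimal sketch of the idea, ignoring collisions for the moment:

```python
SIZE = 8
slots = [None] * SIZE                 # the array of slots

def put(key, value):
    slots[hash(key) % SIZE] = value   # hash function -> slot index

def get(key):
    return slots[hash(key) % SIZE]    # same index, so the same slot

put("name", "Ada")
value = get("name")                   # fetched from the computed slot
```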

11) What are hash table collisions? How are they avoided?

A hash table collision happens when two different keys hash to the same slot. Since two items cannot be stored in the same array slot, the collision must be resolved.

Of the many techniques for handling collisions, two common ones are:

Separate chaining:
Each slot holds a secondary data structure (typically a linked list) that stores all items hashing to that slot.

Open addressing:
When a slot is occupied, other slots are probed (for example, using a second hash function) and the item is stored in the first empty slot found.
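Separate chaining can be sketched as follows; forcing the table size to 1 makes every key collide, which shows the chains at work:

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list (chain) of (key, value)
    pairs, so colliding keys simply share a slot."""

    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # existing key: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new (possibly colliding) key

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

# A table with a single slot forces every key to collide.
table = ChainedHashTable(size=1)
table.put("a", 1)
table.put("b", 2)
```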

12) What is imputation? What are the different types of imputation techniques?

Imputation is the process of replacing missing data with substituted values. The main types of imputation techniques are:

Single imputation
Hot-deck imputation: the missing value is filled in from a randomly selected similar record in the same dataset
Cold-deck imputation: works like hot-deck imputation, but the donor record is selected from another dataset
Mean imputation: the missing value is replaced with the mean of that variable over all other cases
Regression imputation: the missing value is replaced with a value predicted from the other variables
Stochastic regression imputation: like regression imputation, but a random residual term is added to the prediction to preserve variability
Multiple imputation
Unlike single imputation, multiple imputation estimates the missing values several times, producing multiple completed datasets whose results are combined
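Mean imputation, the simplest of the techniques above, can be sketched in a couple of lines (the age values are hypothetical, with None marking missing entries):

```python
def mean_impute(values):
    """Replace None (missing) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [25, None, 31, 28, None]   # hypothetical data with missing entries
imputed = mean_impute(ages)       # missing ages become 28.0, the mean
```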
13) Which imputation method is more favorable?

Although single imputation is widely used, it does not reflect the uncertainty created by missing data. Multiple imputation, which does capture that uncertainty, is therefore more favorable than single imputation when data are missing at random.

14) Explain what is n-gram?

An n-gram is a contiguous sequence of n items from a given sequence of text or speech. N-grams are used in probabilistic language models that predict the next item from the previous n − 1 items (an (n − 1)-order Markov model).
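Extracting n-grams from a token sequence is a one-liner; a minimal sketch:

```python
def ngrams(tokens, n):
    """All contiguous length-n subsequences of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Bigrams (n = 2) of a six-word sentence: five overlapping pairs.
bigrams = ngrams("the cat sat on the mat".split(), 2)
```

Counting how often each n-gram occurs in a corpus is the basis of a simple n-gram language model.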

15) What are the criteria for a good data model?

The criteria for a good data model include:

It can be easily consumed
It scales gracefully when the data changes or grows
It provides predictable performance
It can adapt to changing requirements
