40 Questions On Probability For Data Science

This article, 40 Questions On Probability For Data Science, was updated in February 2024 on Tai-facebook.edu.vn.

Introduction

Probability forms the backbone of many important data science concepts from inferential statistics to Bayesian networks. It would not be wrong to say that the journey of mastering statistics begins with probability. This skilltest was conducted to help you identify your skill level in probability.

A total of 1249 people registered for this skill test. The test was designed to test the conceptual knowledge of probability. If you are one of those who missed out on this skill test, here are the questions and solutions. You missed the real-time test, but you can read this article to find out how you could have answered correctly.

Here is the leaderboard ranking for all the participants.

Are you preparing for your next data science interview? Then look no further! Check out the comprehensive 'Ace Data Science Interviews' course, which encompasses hundreds of questions like these along with plenty of videos, support and resources. And if you are looking to brush up your probability skills even more, we have covered the topic comprehensively in the 'Introduction to Data Science' course!

Overall Scores

Below is the distribution of scores, which will help you evaluate your performance.

You can access the final scores here. More than 300 people participated in the skill test and the highest score obtained was 38. Here are a few statistics about the distribution.

Mean Score: 19.56

Median Score: 20

Mode Score: 15

This was also the first test where someone scored as high as 38! The community is getting serious about DataFest.

Useful Resources

Basics of Probability for Data Science explained with examples

Introduction to Conditional Probability and Bayes theorem for data science professionals

1) Let A and B be events on the same sample space, with P (A) = 0.6 and P (B) = 0.7. Can these two events be disjoint?

A) Yes

B) No

Solution: (B)

P(A∪B) = P(A)+P(B)-P(A∩B).

Two events are disjoint if P(A∩B) = 0. If A and B were disjoint, P(A∪B) = 0.6+0.7 = 1.3

Since a probability cannot be greater than 1, these two events cannot be disjoint.

2) Alice has 2 kids and one of them is a girl. What is the probability that the other child is also a girl? 

You can assume that there are an equal number of males and females in the world.

A) 0.5

B) 0.25

C) 0.333

D) 0.75

Solution: (C)

The outcomes for two kids can be {BB, BG, GB, GG}

Since it is mentioned that one of them is a girl, we can remove the BB option from the sample space. Therefore the sample space has 3 options while only one fits the second condition. Therefore the probability the second child will be a girl too is 1/3.

3) A fair six-sided die is rolled twice. What is the probability of getting 2 on the first roll and not getting 4 on the second roll?

A) 1/36

B) 1/18

C) 5/36

D) 1/6

E) 1/3

Solution: (C)

The two events mentioned are independent. The first roll of the die is independent of the second roll. Therefore the probabilities can be directly multiplied.

P(getting first 2) = 1/6

P(no second 4) = 5/6

Therefore P(getting first 2 and no second 4) = 1/6* 5/6 = 5/36

4) 

A) True

B) False

Solution: (A)

P(A∩C^c) is the probability of the part of A that lies outside C. Adding P(C) to it gives P(A∪C). P(B∩A^c∩C^c) is the probability of the part of B that lies outside both A and C. Adding it to P(A∪C) gives P(A∪B∪C).

5) Consider a tetrahedral die and roll it twice. What is the probability that the number on the first roll is strictly higher than the number on the second roll?

Note: A tetrahedral die has only four sides (1, 2, 3 and 4). 

A) 1/2

B) 3/8

C) 7/16

D) 9/16

Solution: (B)

(1,1)  (2,1)  (3,1)  (4,1)

(1,2)  (2,2)  (3,2)  (4,2)

(1,3)  (2,3)  (3,3)  (4,3)

(1,4)  (2,4)  (3,4)  (4,4)

There are 6 out of 16 possibilities where the first roll is strictly higher than the second roll, so the probability is 6/16 = 3/8.
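The count for the tetrahedral die can be checked by brute force. The snippet below is a minimal sketch that enumerates the 16 equally likely pairs; the variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely (first roll, second roll) pairs for a four-sided die.
outcomes = list(product(range(1, 5), repeat=2))

# Keep the pairs where the first roll is strictly higher than the second.
favorable = [(first, second) for first, second in outcomes if first > second]

probability = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), probability)
```

The same enumeration works for a die with any number of sides by changing the range.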

6) Which of the following options cannot be the probability of any event? 

A) -0.00001

B) 0.5

C) 1.001

A) Only A

B) Only B

C) Only C

D) A and B

E) B and C

F) A and C

Solution: (F)

A probability always lies between 0 and 1, so neither -0.00001 nor 1.001 can be the probability of an event.

7) Anita randomly picks 4 cards from a deck of 52-cards and places them back into the deck ( Any set of 4 cards is equally likely ). Then, Babita randomly chooses 8 cards out of the same deck ( Any set of 8 cards is equally likely). Assume that the choice of 4 cards by Anita and the choice of 8 cards by Babita are independent. What is the probability that all 4 cards chosen by Anita are in the set of 8 cards chosen by Babita?

A)48C4 x 52C4

B)48C4 x 52C8

C)48C8 x 52C8

D) None of the above

Solution: (A)

The total number of possible combinations is 52C4 (Anita's choice of 4 cards) * 52C8 (Babita's choice of 8 cards).

For the 4 cards that Anita chooses to be among the 8 cards that Babita chooses, the number of favorable combinations is 52C4 (the 4 cards selected by Anita) * 48C4 (the other 4 cards Babita selects from the remaining 48, since Anita's 4 cards are common to both). The probability is therefore (52C4 * 48C4) / (52C4 * 52C8) = 48C4 / 52C8.

Question Context 8:

A player is randomly dealt a sequence of 13 cards from a deck of 52-cards. All sequences of 13 cards are equally likely. In an equivalent model, the cards are chosen and dealt one at a time. When choosing a card, the dealer is equally likely to pick any of the cards that remain in the deck.

8) If you dealt 13 cards, what is the probability that the 13th card is a King?

A) 1/52

B) 1/13

C) 1/26

D) 1/12

Solution: (B)

Since we are not told anything about the first 12 cards that are dealt, the probability that the 13th card dealt is a King is the same as the probability that the first card dealt, or in fact any particular card dealt, is a King. This equals 4/52 = 1/13.

9) A fair six-sided die is rolled 6 times. What is the probability of getting all outcomes as unique?

A) 0.01543

B) 0.01993

C) 0.23148

D) 0.03333

Solution: (A)

For all the outcomes to be unique, we have 6 choices for the first roll, 5 for the second roll, 4 for the third roll and so on.

Therefore the probability of getting all unique outcomes is 6!/6^6 = 720/46656, which is about 0.01543.

10) A group of 60 students is randomly split into 3 classes of equal size. All partitions are equally likely. Jack and Jill are two students belonging to that group. What is the probability that Jack and Jill will end up in the same class? 

A) 1/3

B) 19/59

C) 18/58

D) 1/2

Solution: (B)

Assign a different number to each student from 1 to 60. Numbers 1 to 20 go in group 1, 21 to 40 go to group 2, 41 to 60 go to group 3.

All possible partitions are obtained with equal probability by a random assignment of these numbers. It doesn't matter which student we start with, so we are free to start by assigning a random number to Jack and then assigning a random number to Jill. After Jack has been assigned a random number, there are 59 numbers available for Jill, and 19 of these will put her in the same group as Jack. Therefore the probability is 19/59.

A) 2.75

B) 3.35

C) 4.13

D) 5.33

Solution: (A)

Tosses = 2 * (1/4)[probability of selecting coin A] + 3 * (3/4)[probability of selecting coin B] = 2.75

12) Suppose a life insurance company sells a $240,000 one year term life insurance policy to a 25-year old female for $210. The probability that the female survives the year is .999592. Find the expected value of this policy for the insurance company.

A) $131

B) $140

C) $112

D) $125

Solution: (C)

P(company pays out the claim) = 1 - 0.999592 = 0.000408

P(company does not pay out) = 0.999592

The amount of money the company loses if it pays out = 240,000 – 210 = $239,790

The money it gains otherwise is $210

Expected loss from a claim = 239,790 * 0.000408 = 97.8

Expected gain from the premium = 210 * 0.999592 = 209.9

Therefore the expected value = 209.9 – 97.8, which is about $112
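The expected value of the policy can be reproduced in a few lines. This is a minimal sketch using only the numbers given in the question; the variable names are illustrative.

```python
premium = 210            # amount the company collects up front
payout = 240_000         # amount paid if the policyholder dies during the year
p_survive = 0.999592
p_die = 1 - p_survive

# Net result for the company in each scenario, weighted by its probability.
expected_value = p_survive * premium + p_die * (premium - payout)
print(round(expected_value, 2))
```

The result is about $112, which matches option C.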

13) 

A) True

B) False

Solution: (A)

The above statement is true. You would need to know that

P(A|B) = P(A∩B)/P(B)

Multiplying the three terms gives P(A∩B∩C^c), hence the equation holds true.

14) When is an event A independent of itself?

A) Always

B) If and only if P(A)=0

C) If and only if P(A)=1

D) If and only if P(A)=0 or 1

Solution: (D)

An event can be independent of itself only when there is no chance of it happening or when it is certain to happen. Events A and B are independent when P(A∩B) = P(A)*P(B). If B = A, then P(A∩A) = P(A), so independence requires P(A) = P(A)*P(A), which holds only when P(A) = 0 or 1.

15) Suppose you are in the final round of the "Let's make a deal" game show and you are supposed to choose from three doors: 1, 2 and 3. One of the three doors has a car behind it and the other two doors have goats. Let's say you choose Door 1 and the host opens Door 3, which has a goat behind it. To maximize your probability of winning, which of the following options would you choose?

A) Switch your choice

B) Retain your choice

C) It doesn't matter; the probability of winning or losing is the same with or without revealing one door

Solution: (A)

I would recommend reading this article for a detailed discussion of the Monty Hall’s Problem. 
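For readers who prefer to verify the answer directly, the following is a minimal sketch that enumerates every equally likely combination of car position and first pick. It assumes the standard rules, which the question implies but does not spell out: the host always opens a door that hides a goat and is not the contestant's pick.

```python
from fractions import Fraction
from itertools import product

doors = (1, 2, 3)
stay_wins = 0
switch_wins = 0
cases = 0

# Enumerate every equally likely (car position, first pick) combination.
for car, pick in product(doors, repeat=2):
    # The host opens a door that is neither the pick nor the car.
    # If two such doors exist, either one leads to the same result.
    opened = next(d for d in doors if d != pick and d != car)
    # Switching means taking the one door that is neither picked nor opened.
    switched = next(d for d in doors if d != pick and d != opened)
    stay_wins += (pick == car)
    switch_wins += (switched == car)
    cases += 1

p_stay = Fraction(stay_wins, cases)
p_switch = Fraction(switch_wins, cases)
print(p_stay, p_switch)
```

Staying wins only when the first pick was already the car, so switching wins in the remaining two thirds of the cases.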

16) Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. What is the probability that there are no red flower plants in the five offspring? 

A) 23.7%

B) 37.2%

C) 22.5%

D) 27.3%

Solution: (A)

The probability of an offspring being red is 0.25, thus the probability of an offspring not being red is 0.75. Since all the pairs are independent of each other, the probability that none of the offspring is red is (0.75)^5 = 0.237. You can think of this as a binomial with all failures.

17) A roulette wheel has 38 slots – 18 red, 18 black, and 2 green. You play five games and always bet on red slots. How many games can you expect to win?

A) 1.1165

B) 2.3684

C) 2.6316

D) 4.7368

Solution: (B)

The probability that it would be Red in any spin is 18/38. Now, you are playing the game 5 times and all the games are independent of each other. Thus, the number of games that you can expect to win is 5*(18/38) = 2.3684

18) A roulette wheel has 38 slots, 18 are red, 18 are black, and 2 are green. You play five games and always bet on red. What is the probability that you win all the 5 games?

A) 0.0368

B) 0.0238

C) 0.0526

D) 0.0473

Solution: (B)

The probability that it would be Red in any spin is 18/38. Now, you are playing the game 5 times and all the games are independent of each other. Thus, the probability that you win all the games is (18/38)^5 = 0.0238

19) Some test scores follow a normal distribution with a mean of 18 and a standard deviation of 6. What proportion of test takers have scored between 18 and 24?

A) 20%

B) 22%

C) 34%

D) None of the above

Solution: (C)

Here we need to calculate the Z scores for the values 18 and 24, using the mean μ = 18 and the standard deviation σ = 6.

Z= (X-μ)/σ

For 18 as X,

Z = (18-18)/6  = 0. Looking at the Z table, we find that 50% of people have scores below 18.

For 24 as X,

Z = (24-18)/6  = 1. Looking at the Z table, we find that about 84% of people have scores below 24.

Therefore around 34% of people have scores between 18 and 24.
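Instead of a Z table, the same proportion can be read off the normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

# Test scores: mean 18, standard deviation 6.
scores = NormalDist(mu=18, sigma=6)

# Proportion of test takers scoring between 18 and 24.
share_between = scores.cdf(24) - scores.cdf(18)
print(round(share_between, 4))
```

The value is roughly 0.34, in line with option C.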

20) A jar contains 4 marbles. 3 Red & 1 white. Two marbles are drawn with replacement after each draw. What is the probability that the same color marble is drawn twice?

A) 1/2

B) 1/3

C) 5/8

D) 1/8

Solution: (C)

If the marbles are of the same color then it will be 3/4 * 3/4 + 1/4 * 1/4 = 5/8.

21) Which of the following events is most likely? 

A) At least one 6 when 6 dice are rolled

B) At least 2 sixes when 12 dice are rolled

C) At least 3 sixes when 18 dice are rolled

D) All the above have same probability

Solution: (A)

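The probability of a '6' turning up in a roll of a die is P(6) = (1/6) and P(6') = (5/6). Each option asks for "at least" a number of sixes, so we subtract the probability of getting fewer sixes from 1:

Case 1: 1 - (5/6)^6 = 0.6651

Case 2: 1 - (5/6)^12 - 12 * (1/6) * (5/6)^11 = 0.6187

Case 3: 1 - (5/6)^18 - 18 * (1/6) * (5/6)^17 - 153 * (1/6)^2 * (5/6)^16 = 0.5973

Thus, the highest probability is Case 1.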
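Because each option asks for "at least" a number of sixes, the comparison is between binomial tail probabilities. The sketch below computes them exactly; `prob_at_least` is a hypothetical helper name introduced here for illustration.

```python
from math import comb

def prob_at_least(k, n, p=1/6):
    """P(at least k successes in n independent trials with success probability p)."""
    below_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - below_k

for k, n in [(1, 6), (2, 12), (3, 18)]:
    print(f"at least {k} sixes in {n} dice: {prob_at_least(k, n):.4f}")
```

The first event comes out as the most likely of the three, which agrees with answer A.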

22) Suppose you were interviewed for a technical role. 50% of the people who sat for the first interview received the call for second interview. 95% of the people who got a call for second interview felt good about their first interview. 75% of people who did not receive a second call, also felt good about their first interview. If you felt good after your first interview, what is the probability that you will receive a second interview call?

A) 66%

B) 56%

C) 75%

D) 85%

Solution: (B)

Let's assume there are 100 people who gave the first round of interview. 50 people got the call for the second round. Out of these, 95% felt good about their interview, which is 47.5. The other 50 people did not get a call; out of these, 75% felt good about their interview, which is 37.5. Thus, the total number of people who felt good after giving their interview is (37.5 + 47.5) = 85. Out of these 85 people who felt good, only 47.5 got the call for the next round. Hence, the probability of success is (47.5/85), which is about 0.56.

Another, more formal way to solve this problem is Bayes' theorem. I leave it to you to check for yourself.

23) A coin of diameter 1 inch is thrown on a table covered with a grid of lines each two inches apart. What is the probability that the coin lands inside a square without touching any of the lines of the grid? You can assume that the person throwing has no skill in throwing the coin and is throwing it randomly.


A) 1/2

B) 1/4

C) π/3

D) 1/3

Solution: (B)

Think about where the center of the coin can be when it lands on the 2-inch grid without touching the lines of the grid.

Suppose the yellow region is a 1-inch square centered inside the 2-inch outer square. If the center of the coin falls in the yellow region, the coin will not touch the grid lines. Since the total area is 4 and the area of the yellow region is 1, the probability is 1/4.

24) There are a total of 8 bows of 2 each of green, yellow, orange & red. In how many ways can you select 1 bow? 

A) 1

B) 2

C) 4

D) 8

Solution: (C)

You can select one bow out of four different bows, so you can select one bow in four different ways. 

25) Consider the following probability density function: What is the probability for X≤6 i.e. P(x≤6)


A) 0.3935

B) 0.5276

C) 0.1341

D) 0.4724

Solution: (B)

To calculate the area of a particular region of a probability density function, we need to integrate the function under the bounds of the values for which we need to calculate the probability.

Therefore on integrating the given function from 0 to 6, we get 0.5276

26) In a class of 30 students, approximately what is the probability that two of the students have their birthday on the same day (defined by same day and month) (assuming it’s not a leap year)?

For example – Students with birthday 3rd Jan 1993 and 3rd Jan 1994 would be a favorable event.

A) 49%

B) 52%

C) 70%

D) 35%

Solution: (C)

The number of distinct pairs of students in a class of 30 is 30 * (30-1)/2 = 435.

Now, there are 365 days in a year (assuming it's not a leap year). Thus, the probability that a given pair of students has different birthdays is 364/365. Treating the 435 pairs as approximately independent, the probability that no two people have the same birthday is about (364/365)^435 = 0.303.

Thus, the probability that two people would have their birthdays on the same date is about 1 – 0.303 = 0.697, or roughly 70%.
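The pairwise calculation is an approximation, so it is worth comparing it with the exact birthday probability. The following is a minimal sketch assuming 365 equally likely birthdays; `p_shared_birthday` is a hypothetical helper name.

```python
from math import prod

def p_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_all_distinct = prod((days - i) / days for i in range(n))
    return 1 - p_all_distinct

exact = p_shared_birthday(30)

# Pairwise approximation: every one of the 435 pairs must differ.
pairs = 30 * 29 // 2
approx = 1 - (364 / 365) ** pairs

print(round(exact, 4), round(approx, 4))
```

Both values are close to 70%, so the approximation leads to the same answer choice.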

27) Ahmed is playing a lottery game where he must pick 2 numbers from 0 to 9 followed by an English alphabet (from 26-letters). He may choose the same number both times.

If his ticket matches the 2 numbers and 1 letter drawn in order, he wins the grand prize and receives $10405. If just his letter matches but one or both of the numbers do not match, he wins $100. Under any other circumstance, he wins nothing. The game costs him $5 to play. Suppose he has chosen 04R to play.

What is the expected net profit from playing this ticket?

A) $-2.81

B) $2.81

C) $-1.82

D) $1.82

Solution: (B)

Expected value in this case:

E(X) = P(grand prize)*(10405-5) + P(small)*(100-5) + P(losing)*(-5)

P(grand prize) = (1/10)*(1/10)*(1/26) = 1/2600

P(small) = 1/26 - 1/2600. The subtraction excludes the case where he gets both the letter and the numbers right, since that case is the grand prize.

P(losing) = 1 - 1/26, the probability that the letter does not match.

Substituting these values gives an expected value of about $2.81
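The expected net profit can be checked with exact fractions. This is a minimal sketch; it takes the probability of winning nothing to be the probability that the letter does not match, 1 - 1/26, and the variable names are illustrative.

```python
from fractions import Fraction

p_grand = Fraction(1, 10) * Fraction(1, 10) * Fraction(1, 26)
p_small = Fraction(1, 26) - p_grand   # letter matches, numbers do not both match
p_lose = 1 - Fraction(1, 26)          # letter does not match

# Net profit in each case is the prize minus the $5 ticket price.
expected_profit = p_grand * (10405 - 5) + p_small * (100 - 5) + p_lose * (-5)
print(float(expected_profit))
```

The three probabilities sum to 1, and the expected profit rounds to $2.81.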

28) Assume you sell sandwiches. 70% people choose egg, and the rest choose chicken. What is the probability of selling 2 egg sandwiches to the next 3 customers?

A) 0.343

B) 0.063

C) 0.44

D) 0.027

Solution: (C)
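The sandwich answer follows from the binomial formula with n = 3, x = 2 and p = 0.7, assuming customers choose independently. The sketch below uses a hypothetical helper, `binomial_pmf`, to do the arithmetic.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 2 egg sandwiches among the next 3 customers.
p_two_egg = binomial_pmf(2, 3, 0.7)
print(round(p_two_egg, 3))
```

The result is 3 * 0.49 * 0.3 = 0.441, which rounds to option C.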

Question context: 29 – 30

HIV is still a very scary disease to even get tested for. The US military tests its recruits for HIV when they are recruited. They are tested on three rounds of Elisa (an HIV test) before they are termed to be positive.

The prior probability of anyone having HIV is 0.00148. The true positive rate for Elisa is 93% and the true negative rate is 99%.

29) What is the probability that a recruit has HIV, given he tested positive on the first Elisa test?

A) 12%

B) 80%

C) 42%

D) 14%

Solution: (A)

I recommend going through the Bayes updating section of this article for the understanding of the above question.

30) What is the probability of having HIV, given he tested positive on Elisa the second time as well?

A) 20%

B) 42%

C) 93%

D) 88%

Solution: (C)

I recommend going through the Bayes updating section of this article for the understanding of the above question.
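Both answers can be reproduced by applying Bayes' theorem twice, using the posterior after the first test as the prior for the second. This is a minimal sketch that assumes the two test results are independent given the recruit's true status; `posterior` is a hypothetical helper name.

```python
def posterior(prior, true_positive_rate, true_negative_rate):
    """P(disease | positive test) by Bayes' theorem."""
    false_positive_rate = 1 - true_negative_rate
    p_positive = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / p_positive

after_first = posterior(0.00148, 0.93, 0.99)
# The first posterior becomes the prior for the second test.
after_second = posterior(after_first, 0.93, 0.99)
print(round(after_first, 3), round(after_second, 3))
```

The first update gives roughly 12% and the second roughly 93%, matching the two solutions.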

C) You have the same probability of winning in guessing either, hence whatever you guess there is just a 50-50 chance of winning or losing

D) None of these

Solution: (C)

32) The inference using the frequentist approach will always yield the same result as the Bayesian approach.

A) TRUE

B) FALSE

Solution: (B)

The frequentist approach is highly dependent on how we define the hypothesis, while the Bayesian approach helps us update our prior beliefs. Therefore the frequentist approach might result in an opposite inference if we declare the hypothesis differently. Hence the two approaches might not yield the same results.

33) Hospital records show that 75% of patients suffering from a disease die due to that disease. What is the probability that 4 out of the 6 randomly selected patients recover?

A) 0.17798

B) 0.13184

C) 0.03295

D) 0.35596

Solution: (C)

Think of this as a binomial since there are only 2 outcomes, either the patient dies or he survives.

Here n = 6 and x = 4, with p = 0.25 (probability of surviving, the success) and q = 0.75 (probability of dying, the failure).

P(X = x) = nCx p^x q^(n-x) = 6C4 (0.25)^4 (0.75)^2 = 0.03296 (listed as 0.03295 in the options)

34) The students of a particular class were given two tests for evaluation. Twenty-five percent of the class cleared both the tests and forty-five percent of the students were able to clear the first test.

Calculate the percentage of students who passed the second test given that they were also able to pass the first test.

A) 25%

B) 42%

C) 55%

D) 45%

Solution: (C)

This is a simple problem of conditional probability. Let A be the event of passing the first test.

B is the event of passing the second test.

P(A∩B) is the probability of passing both tests.

P(passing the second given passing the first) = P(A∩B)/P(A)

= 0.25/0.45, which is around 55%

35) While it is said that the probabilities of having a boy or a girl are the same, let’s assume that the actual probability of having a boy is slightly higher at 0.51. Suppose a couple plans to have 3 children. What is the probability that exactly 2 of them will be boys?

A) 0.38

B) 0.48

C) 0.58

D) 0.68

E) 0.78

Solution: (A)

Think of this as a binomial distribution where getting a success is a boy and failure is a girl. Therefore we need to calculate the probability of getting 2 out of three successes.

P(X = x) = nCx p^x q^(n-x) = 3C2 (0.51)^2 (0.49)^1 = 0.382

36) Heights of 10 year-olds, regardless of gender, closely follow a normal distribution with mean 55 inches and standard deviation 6 inches. Which of the following is true?

A) We would expect more number of 10 year-olds to be shorter than 55 inches than the number of them who are taller than 55 inches

B) Roughly 95% of 10 year-olds are between 37 and 73 inches tall

C) A 10-year-old who is 65 inches tall would be considered more unusual than a 10-year-old who is 45 inches tall

D) None of these

Solution: (D)

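None of the above statements is true. A normal distribution is symmetric about its mean, so as many 10-year-olds are shorter than 55 inches as are taller (A). Roughly 95% of values lie within two standard deviations of the mean, that is between 43 and 67 inches, while 37 to 73 inches spans three standard deviations on either side (B). Heights of 65 and 45 inches are both 10 inches from the mean, so they are equally unusual (C).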

37) About 30% of human twins are identical, and the rest are fraternal. Identical twins are necessarily the same sex, half are males and the other half are females. One-quarter of fraternal twins are both males, one-quarter both female, and one-half are mixed: one male, one female. You have just become a parent of twins and are told they are both girls. Given this information, what is the probability that they are identical?

A) 50%

B) 72%

C) 46%

D) 33%

Solution: (C)

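This is a classic problem of Bayes theorem.

P(I) denotes the probability of being identical and P(~I) denotes the probability of not being identical.

P(I) = 0.3

P(~I) = 0.7

P(both girls | I) = 0.5 and P(both girls | ~I) = 0.25

P(I | both girls) = (0.3*0.5) / (0.3*0.5 + 0.7*0.25) = 0.15/0.325, which is around 46%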

38) Rob has fever and the doctor suspects it to be typhoid. To be sure, the doctor wants to conduct the test. The test results positive when the patient actually has typhoid 80% of the time. The test gives positive when the patient does not have typhoid 10% of the time. If 1% of the population has typhoid, what is the probability that Rob has typhoid provided he tested positive?

A) 12%

B) 7%

C) 25%

D) 31.5%

Solution: (B)

We need to find the probability of having typhoid given he tested positive.

= P(testing +ve and having typhoid) / P(testing positive)

= (0.8*0.01) / (0.8*0.01 + 0.1*0.99) = 0.008/0.107, which is around 7%

39) Jack has two coins in his hand. Out of the two coins, one is a real coin and the second one is a faulty one with Tails on both sides. He blindfolds himself to choose a random coin and tosses it in the air. The coin falls down with Tails facing upwards. What is the probability that this tail is shown by the faulty coin?

A) 1/3

B) 2/3

C) 1/2

D) 1/4

Solution: (B)

We need to find the probability of the coin being faulty given that it showed tails.

P(Faulty) = 0.5

P(getting tails) = 3/4

P(faulty and tails) =0.5*1 = 0.5

Therefore the probability of the coin being faulty given that it showed tails is 0.5/(3/4) = 2/3

40) A fly has a life between 4-6 days. What is the probability that the fly will die at exactly 5 days?

A) 1/2

B) 1/4

C) 1/3

D) 0

Solution: (D)

Here the lifetime is a continuous random variable, so its probabilities are described by a density function rather than a mass function. The probability of an event is calculated by finding the area under the curve over the given range. Since we are trying to calculate the probability of the fly dying at exactly 5 days, the area under the curve at that single point is 0. Put another way, we could never even verify that the fly died at exactly 5 days, since we cannot measure time with infinite precision.

End Notes

If you missed out on this competition, make sure you compete in the ones coming up shortly. We are giving cash prizes worth $10,000+ during the month of April 2023.

If you have any questions or doubts feel free to post them below.

Check out all the upcoming skilltests here.


40 Questions To Test A Data Scientist On Time Series

Introduction

Time series forecasting and modeling play an important role in data analysis. Time series analysis is a specialized branch of statistics used extensively in fields such as Econometrics and Operations Research. This skilltest was conducted to test your knowledge of time series concepts.

Here is the leaderboard ranking for all the participants.

Overall Scores

Below is the distribution of scores, which will help you evaluate your performance.

You can access the scores here. More than 300 people participated in the skill test and the highest score obtained was 38. Here are a few statistics about the distribution.

Mean Score: 17.13

Median Score: 19

Mode Score: 19

Useful Resources

A Complete Tutorial on Time Series Modeling in R

A comprehensive beginner’s guide to create a Time Series Forecast (with Codes in Python)

1) Which of the following is an example of a time series problem?

Solution: (E)

All the above options have a time component associated.

2) Which of the following is not an example of a time series model?

Solution: (D)

Naïve approach: Estimating technique in which the last period’s actuals are used as this period’s forecast, without adjusting them or attempting to establish causal factors. It is used only for comparison with the forecasts generated by the better (sophisticated) techniques.

In exponential smoothing, older data is given progressively-less relative importance whereas newer data is given progressively-greater importance.

In time series analysis, the moving-average (MA) model is a common approach for modeling univariate time series. The moving-average model specifies that the output variable depends linearly on the current and various past values of a stochastic (imperfectly predictable) term.

3) Which of the following can’t be a component for a time series plot?

Solution: (E)

A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period. Hence, seasonal time series are sometimes called periodic time series.

A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.

Trend is defined as the ‘long term’ movement in a time series without calendar related and irregular effects, and is a reflection of the underlying level. It is the result of influences such as population growth, price inflation and general economic changes. The following graph depicts a series in which there is an obvious upward trend over time.

Quarterly Gross Domestic Product

Noise: In discrete time, white noise is a discrete signal whose samples are regarded as a sequence of serially uncorrelated random variables with zero mean and finite variance.

Thus all of the above mentioned are components of a time series.

4) Which of the following is relatively easier to estimate in time series modeling?

5) The below time series plot contains both Cyclical and Seasonality component.

A) TRUE

B) FALSE

Solution: (B)

Clusters of observations are frequently correlated with increasing strength as the time intervals between them become shorter. This needs to be true because time series forecasting is based on previous observations rather than on the currently observed data, unlike classification or regression.

7) Smoothing parameter close to one gives more weight or influence to recent observations over the forecast. 

Solution: (A)

It may be sensible to attach larger weights to more recent observations than to observations from the distant past. This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past; the smallest weights are associated with the oldest observations:

$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots$$

where 0≤α≤1 is the smoothing parameter. The one-step-ahead forecast for time T+1 is a weighted average of all the observations in the series y1,…,yT. The rate at which the weights decrease is controlled by the parameter α.

Solution: (B)

Table 7.1 shows the weights attached to observations for four different values of α when forecasting using simple exponential smoothing. Note that the sum of the weights even for a small α will be approximately one for any reasonable sample size.

Observation   α=0.2          α=0.4          α=0.6          α=0.8
yT            0.2            0.4            0.6            0.8
yT−1          0.16           0.24           0.24           0.16
yT−2          0.128          0.144          0.096          0.032
yT−3          0.102          0.0864         0.0384         0.0064
yT−4          (0.2)(0.8)^4   (0.4)(0.6)^4   (0.6)(0.4)^4   (0.8)(0.2)^4
yT−5          (0.2)(0.8)^5   (0.4)(0.6)^5   (0.6)(0.4)^5   (0.8)(0.2)^5
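The weights in the table follow the pattern α(1−α)^j for the observation j steps back from yT. The sketch below regenerates them; `ses_weights` is a hypothetical helper name used only for illustration.

```python
def ses_weights(alpha, n_lags):
    """Weights alpha * (1 - alpha)**j for y_T, y_{T-1}, ..., with j = 0 .. n_lags - 1."""
    return [alpha * (1 - alpha) ** j for j in range(n_lags)]

for alpha in (0.2, 0.4, 0.6, 0.8):
    print(alpha, [round(w, 4) for w in ses_weights(alpha, 6)])
```

With α close to one, almost all of the weight sits on the most recent observation, which is the point of question 7.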

9) The last period’s forecast was 70 and demand was 60. What is the simple exponential smoothing forecast with alpha of 0.4 for the next period.

Solution: (D)

Yt-1 = 70 (last period's forecast)

St-1 = 60 (last period's demand)

Alpha = 0.4

Substituting the values into Alpha*St-1 + (1 - Alpha)*Yt-1, we get

0.4*60 + 0.6*70 = 24 + 42 = 66
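The same update can be written as a one-line function. This is a minimal sketch of the simple exponential smoothing recursion; the function and argument names are illustrative.

```python
def ses_forecast(previous_forecast, actual, alpha):
    """Next-period forecast: alpha * actual + (1 - alpha) * previous forecast."""
    return alpha * actual + (1 - alpha) * previous_forecast

next_forecast = ses_forecast(previous_forecast=70, actual=60, alpha=0.4)
print(next_forecast)
```

With the values from the question this gives a forecast of 66 for the next period.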

10) What does autocovariance measure?

Solution: (D)

Option D is the definition of autocovariance.

11) Which of the following is not a necessary condition for weakly stationary time series?

Solution: (D)

For a Gaussian time series, weak stationarity implies strict stationarity.

12) Which of the following is not a technique used in smoothing time series? 

Solution: (C)

Time series smoothing and filtering can be expressed in terms of local regression models. Polynomials and regression splines also provide important techniques for smoothing. CART based models do not provide an equation to superimpose on time series and thus cannot be used for smoothing. All the other techniques are well documented smoothing techniques.

13) If the demand is 100 during October 2024, 200 in November 2024, 300 in December 2024, 400 in January 2023. What is the 3-month simple moving average for February 2023?

Solution: (A)

X` = (x(t-3) + x(t-2) + x(t-1)) / 3

(200 + 300 + 400) / 3 = 900/3 = 300
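A simple moving average forecast is just the mean of the last few observations. The snippet below is a minimal sketch using the four monthly demand values from the question; `simple_moving_average` is a hypothetical helper name.

```python
def simple_moving_average(values, window):
    """Average of the last `window` observations, used as the next-period forecast."""
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    return sum(values[-window:]) / window

demand = [100, 200, 300, 400]  # the four monthly values from the question
print(simple_moving_average(demand, window=3))
```

Only the last three months enter the 3-month average, so the first value of 100 has no effect on the forecast.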

14) Looking at the below ACF plot, would you suggest to apply AR or MA in ARIMA modeling technique?

15) Suppose, you are a data scientist at Analytics Vidhya. And you observed the views on the articles increases during the month of Jan-Mar. Whereas the views during Nov-Dec decreases.

16) Which of the following graph can be used to detect seasonality in time series data?

17) Stationarity is a desirable property for a time series process.

Mean is constant and does not depend on time

The time series under consideration is a finite variance process

These conditions are essential prerequisites for mathematically representing a time series to be used for analysis and forecasting. Thus stationarity is a desirable property.

18) Suppose you are given a time series dataset which has only 4 columns (id, Time, X, Target).

What would be the rolling mean of feature X if you are given the window size 2?

Note: X column represents rolling mean.

A) 

B) 

C) 

D) None of the above

Solution: (B)

X` = (x(t-2) + x(t-1)) / 2

Based on the above formula: (100 + 200)/2 = 150; (200 + 300)/2 = 250 and so on.

19) Imagine, you are working on a time series dataset. Your manager has asked you to build a highly accurate model. You started to build two types of models which are given below.

Model 1: Decision Tree model

Model 2: Time series regression model

At the end of evaluation of these two models, you found that model 2 is better than model 1. What could be the possible reason for your inference?

Solution: (A)

A time series regression model is similar to an ordinary regression model, so it is good at finding simple linear relationships. A tree-based model, though efficient, will not be as good at finding and exploiting linear relationships.

20) What type of analysis could be most effective for predicting temperature on the following type of data.

Solution: (A)

The data is obtained on consecutive days and thus the most effective type of analysis will be time series analysis.

21) What is the first difference of temperature / precipitation variable?

Solution: (B)

73.17-35 = 38.17

27.05 - 73.17 = -46.12 and so on.

13.75 – 36.36 = -22.61
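As a quick check of the differencing arithmetic, here is a small sketch. The function name `first_difference` is hypothetical, and only the first three values that appear in the solution are used.

```python
def first_difference(series):
    """Return x[t] - x[t-1] for each consecutive pair, rounded to 2 decimals."""
    return [round(b - a, 2) for a, b in zip(series, series[1:])]


# First three observations taken from the worked solution above
temps = [35.0, 73.17, 27.05]
print(first_difference(temps))  # [38.17, -46.12]
```

The differenced series is always one element shorter than the original, because the first observation has no predecessor.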

22) Consider the following set of data: 

{23.32 32.33 32.88 28.98 33.16 26.33 29.88 32.69 18.98 21.23 26.66 29.89}

What is the lag-one sample autocorrelation of the time series?

Solution: (C)

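$$\hat{\rho}_1 = \frac{\sum_{t=2}^{T}(x_{t-1}-\bar{x})(x_t-\bar{x})}{\sum_{t=1}^{T}(x_t-\bar{x})^2}$$

$$= \frac{(23.32-\bar{x})(32.33-\bar{x})+(32.33-\bar{x})(32.88-\bar{x})+\cdots}{\sum_{t=1}^{T}(x_t-\bar{x})^2}$$

= 0.130394786

where $\bar{x}$ is the mean of the series, which is 28.0275.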
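The lag-one sample autocorrelation can be reproduced with plain Python. This is a minimal sketch that follows the formula above; the function name `lag_one_autocorrelation` is an assumption, and the data is the series given in the question.

```python
def lag_one_autocorrelation(x):
    """Lag-one sample autocorrelation: lagged cross-products over total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[t - 1] - mean) * (x[t] - mean) for t in range(1, n))
    denominator = sum((value - mean) ** 2 for value in x)
    return numerator / denominator


series = [23.32, 32.33, 32.88, 28.98, 33.16, 26.33,
          29.88, 32.69, 18.98, 21.23, 26.66, 29.89]
r1 = lag_one_autocorrelation(series)
print(round(sum(series) / len(series), 4))  # 28.0275
print(round(r1, 4))                         # 0.1304
```

Note that the numerator has one term fewer than the denominator, since the first observation has no lagged partner.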

23) Any stationary time series can be approximated by the random superposition of sines and cosines oscillating at various frequencies.

A random superposition of sines and cosines oscillating at various frequencies is white noise. White noise is weakly stationary. If the white noise variates are also normally distributed (Gaussian), the series is also strictly stationary.

Solution: (D)

Joint stationarity is defined based on the above two mentioned conditions.

27) For MA (Moving Average) models, the pair σ² = 1 and θ = 5 yields the same autocovariance function as the pair σ² = 25 and θ = 1/5.
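One way to check this statement is to compute the MA(1) autocovariances directly. The sketch below assumes that σ is meant as the noise variance and uses the standard MA(1) results γ(0) = σ²(1 + θ²), γ(1) = σ²θ and γ(h) = 0 for h > 1. The function name `ma1_autocovariance` is hypothetical.

```python
import math


def ma1_autocovariance(sigma2, theta, max_lag=2):
    """Autocovariances gamma(0..max_lag) of x_t = w_t + theta * w_{t-1}, Var(w_t) = sigma2."""
    gammas = [sigma2 * (1 + theta ** 2), sigma2 * theta]
    gammas += [0.0] * (max_lag - 1)  # all lags beyond 1 are zero for MA(1)
    return gammas[: max_lag + 1]


pair_a = ma1_autocovariance(sigma2=1.0, theta=5.0)
pair_b = ma1_autocovariance(sigma2=25.0, theta=1 / 5)

# Compare with a tolerance, since 1/5 is not exactly representable in floating point
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(pair_a, pair_b))
print(pair_a, pair_b, same)
```

Both parameter pairs give γ(0) = 26 and γ(1) = 5, which is why an MA(1) model cannot be identified from its autocovariance function alone.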

28) How many AR and MA terms should be included for the time series by looking at the above ACF and PACF plots?

Solution: (B)

A strong negative correlation at lag 1 suggests an MA term, and there is only 1 significant lag.

29) Which of the following is true for white noise? 

30) For the following MA(3) process $y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \theta_3\varepsilon_{t-3}$, where $\varepsilon_t$ is a zero mean white noise process with variance $\sigma^2$

Solution: (B)

The variance is the variance of the disturbances divided by (1 minus the square of the autoregressive coefficient),

which in this case is: 1/(1 - 0.2^2) = 1/0.96 ≈ 1.042
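The calculation above can be written as a small helper. This sketch assumes a stationary AR(1) process with disturbance variance 1 and coefficient 0.2, as in the worked numbers; the function name `ar1_variance` is hypothetical.

```python
def ar1_variance(noise_variance, phi):
    """Unconditional variance of a stationary AR(1) process y_t = phi * y_{t-1} + e_t."""
    if abs(phi) >= 1:
        raise ValueError("the process is stationary only when |phi| < 1")
    return noise_variance / (1 - phi ** 2)


print(round(ar1_variance(1.0, 0.2), 4))  # 1.0417
```

The guard on `phi` reflects that the formula only holds when the autoregressive coefficient is strictly less than 1 in absolute value.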

Solution: (B)

33) Second differencing in time series can help to eliminate which trend?

34) Which of the following cross validation techniques is better suited for time series data?

Solution: (D)

Time series is ordered data, so the validation data must be ordered too. Forward chaining ensures this. It works as follows:

fold 1 : training [1], test [2]

fold 2 : training [1 2], test [3]

fold 3 : training [1 2 3], test [4]

fold 4 : training [1 2 3 4], test [5]

fold 5 : training [1 2 3 4 5], test [6]
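The fold layout above can be generated programmatically. This is a minimal sketch in plain Python, assuming the data has already been cut into six consecutive time blocks; the function name `forward_chaining_splits` is hypothetical.

```python
def forward_chaining_splits(n_blocks):
    """Yield (train_blocks, test_block) pairs where training always precedes testing."""
    return [
        (list(range(1, k + 1)), [k + 1])
        for k in range(1, n_blocks)
    ]


for fold, (train, test) in enumerate(forward_chaining_splits(6), start=1):
    print(f"fold {fold} : training {train}, test {test}")
```

Because every test block comes strictly after its training blocks, no future information leaks into the model during validation.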

35) BIC penalizes complex models more strongly than the AIC. 

N = number of observations

At relatively low N (7 and less) BIC is more tolerant of free parameters than AIC, but it is less tolerant at higher N (as the natural log of N exceeds 2).
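The crossover at N = 7 follows from the penalty terms in the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(N) - 2 ln(L), where k is the number of free parameters. The sketch below compares only those penalty terms; the function names are hypothetical.

```python
import math


def aic_penalty(k):
    """Penalty term of AIC for k free parameters."""
    return 2 * k


def bic_penalty(k, n):
    """Penalty term of BIC for k free parameters and n observations."""
    return k * math.log(n)


for n in (5, 7, 8, 100):
    stricter = "BIC" if bic_penalty(1, n) > aic_penalty(1) else "AIC"
    print(f"N={n}: ln(N)={math.log(n):.3f}, stricter penalty per parameter: {stricter}")
```

Since ln(7) is just under 2 and ln(8) is just over 2, BIC becomes the stricter criterion from N = 8 onwards.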

36) The figure below shows the estimated autocorrelation and partial autocorrelations of a time series of n = 60 observations. Based on these plots, we should. 

Solution: (B)

The autocorrelation shows a definite trend and the partial autocorrelation is choppy. In such a scenario, taking a log would be of no use; differencing the series to obtain a stationary series is the only option.

Question Context (37-38)

These results summarize the fit of a simple exponential smooth to the time series.

Solution: (B)

The predicted value from the exponential smooth is the same for all 3 years, so all we need is the value for next year. The expression for the smooth is

$$\text{smooth}_t = \alpha\, y_t + (1 - \alpha)\,\text{smooth}_{t-1}$$

Hence, for the next point, the next value of the smooth (the prediction for the next observation) is

$$\text{smooth}_n = \alpha\, y_n + (1 - \alpha)\,\text{smooth}_{n-1}$$

= 0.3968*0.43 + (1 – 0.3968)* 0.3968

= 0.3297
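The recursion behind simple exponential smoothing is easy to sketch. The example below is illustrative only: the function name, the choice to initialise the smooth with the first observation, and the short series are all assumptions, and it does not reproduce the fitted values from the question's summary table.

```python
def simple_exponential_smoothing(y, alpha):
    """Apply smooth_t = alpha * y_t + (1 - alpha) * smooth_{t-1}, starting from y[0]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smooth = y[0]  # assumed initialisation; other conventions exist
    smoothed = []
    for value in y:
        smooth = alpha * value + (1 - alpha) * smooth
        smoothed.append(smooth)
    return smoothed


# Hypothetical observations, not taken from the quiz
observations = [0.20, 0.35, 0.30, 0.43]
fitted = simple_exponential_smoothing(observations, alpha=0.3968)

# The forecast is flat: every future horizon uses the last smoothed value
forecast_next_three = [fitted[-1]] * 3
print(forecast_next_three)
```

This makes the point from the solution explicit: a simple exponential smooth has no trend term, so the predictions for all three future years are identical.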

38) Find 95% prediction intervals for the predictions of temperature in 1999. 

39) Which of the following statement is correct?

Solution: (C)

Autoregressive component: AR stands for autoregressive. The autoregressive order is denoted by p. When p = 0, there is no autocorrelation in the series. When p = 1, the series is autocorrelated up to one lag.

Integrated: In ARIMA time series analysis, the integrated component is denoted by d. Integration is the inverse of differencing. When d = 0, the series is stationary and we do not need to difference it. When d = 1, the series is not stationary, and we need to take the first difference to make it stationary. When d = 2, the series has been differenced twice. Differencing more than twice is usually not reliable.

Moving average component: MA stands for moving average, and its order is denoted by q. In ARIMA, q = 1 means that the model includes one lagged error term, i.e. the errors are autocorrelated at one lag.

40) In a time-series forecasting problem, the seasonal indices for quarters 1, 2, and 3 are 0.80, 0.90, and 0.95 respectively. What can you say about the seasonal index of quarter 4?

Solution: (B)

The seasonal indices must sum to 4, since there are 4 quarters. 0.80 + 0.90 + 0.95 = 2.65, so the seasonal index for the 4th quarter must be 4 - 2.65 = 1.35, and B is the correct answer.
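The same reasoning works for any number of seasons, as the short sketch below shows. The function name `missing_seasonal_index` is hypothetical, and it assumes multiplicative seasonal indices that average to 1.

```python
def missing_seasonal_index(known_indices, n_periods=4):
    """Return the one remaining index so that all indices sum to n_periods."""
    if len(known_indices) != n_periods - 1:
        raise ValueError("exactly one index must be missing")
    return n_periods - sum(known_indices)


q4 = missing_seasonal_index([0.80, 0.90, 0.95])
print(round(q4, 2))  # 1.35
```

An index above 1 means quarter 4 runs above the yearly average, which balances the three below-average quarters.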

End Notes

If you missed out on this competition, make sure you compete in the ones coming up shortly. We are giving cash prizes worth $10,000+ during the month of April 2023.

If you have any questions or doubts feel free to post them below.

Check out all the upcoming skilltests here.


Introduction To Git For Data Science

The data science and engineering fields are interacting more and more because data scientists are working on production systems and joining R&D teams. We want to make it simpler for data scientists without prior engineering experience to understand the core engineering best practices.

We are building a manual on engineering subjects like Git, Docker, cloud infrastructure, and model serving that we hear data science practitioners think about.

Introduction to Git

A version control system called Git is made to keep track of changes made to a source code over time.

Typically, there is a single central repository (referred to as “origin” or “remote”), which individual users clone to their local machine (called “local” or “clone”). Users “push” and “merge” their completed work back into the central repository once they have stored relevant work (referred to as “commits”) on their computers.

Difference between Git and GitHub

Git serves as both the foundational technology, for tracking and merging changes in a source code, and its command-line client (CLI).

An online platform called GitHub was created on top of Git technology to make it simpler to use. Additionally, it provides capabilities like automation, pull requests, and user management. GitLab (a comparable hosting platform) and Sourcetree (a desktop Git client) are two additional options.

Git for Data Science

In data science, we analyze data using models and algorithms. A model may be created by more than one person, which makes it hard to manage and update at the same time. Git makes this easier by storing previous versions and allowing many people to work on the same project simultaneously.

Let’s look into some terms of Git which are very common among developers

Terms

Repository − “Database” containing all of a project’s branches and commits

Branch − A repository’s alternative state or route of development.

Merge − Combining two (or more) branches into one branch, one truth.

Clone − The process of locally copying a remote repository.

Origin − The remote repository from which the local clone was made.

Main/Master − Common names for the root branch, which is the main source of truth, include “main” and “master.”

Stage − Choosing which files to include in the next commit.

Commit − A stored snapshot of the staged modifications made to the file(s) in the repository.

HEAD − A reference to the current commit in your local repository.

Push − Sending changes to a remote repository so that others can see them.

Pull − Pulling is the process of adding other people’s updates to your personal repository.

Pull Request − Before merging your modifications to main/master, use the pull request mechanism to examine and approve them.

To carry out the operations described above, we need a few commonly used commands. Let's go through them below −

git init − Create a new repository on your local computer.

git clone − Copy an existing remote repository to your computer so you can begin editing it.

git add − Select the file or files to save (staging).

git status − Show the files you have modified.

git commit − Store a copy of the selected file(s) as a snapshot (commit).

git push − Send your saved snapshots (commits) to the remote repository.

git pull − Pull recent commits made by others onto your own computer.

git branch − Create or remove branches.

git checkout − Switch branches or discard local modifications to file(s).

git merge − Merge branches to create a single branch, or a single truth.

Rules for a Smooth Git Process

There are some rules that keep the process of uploading a project to GitHub smooth:

Don’t push datasets

Git is used to track, manage, and store code, but it is not good practice to put datasets in it. To keep track of data, there are many good data-versioning tools available.

Don’t push secrets

Don’t use --force

The --force option has legitimate uses, but it is generally not recommended. When a push is rejected with an error, it can be tempting to force the push onto the server, but this can overwrite other people's commits and is not a good approach.

Do small commits with clear descriptions

Beginner developers may not be used to making small commits, but small commits are recommended because they make the development process much clearer and help with future updates. Writing a good, clear description also makes that process much easier.

Conclusion

A version control system called Git is made to keep track of changes made to source code over time. Without a version control system, collaboration between multiple people working on the same project is complete confusion. Git serves as both the foundational technology for tracking and merging changes in source code and its command-line client (CLI). An online platform called GitHub was created on top of Git technology to make it simpler to use. Additionally, it provides capabilities like automation, pull requests, and user management.

25 Essential Computer Science Interview Questions {Updated For 2023}

Introduction to Computer Science Interview Questions and Answers

So you have finally found your dream job in Computer Science but are wondering how to crack the 2023 Computer Science interview and what could be the probable Computer Science Interview Questions. Every Computer Science interview is different, and the job scope is different too. Keeping this in mind, we have designed the most common  Computer Science interview Questions and answers to help you get success in your interview.


1. What is a file?

A file is a named location that stores data or information permanently. A file is always stored inside a storage device using a file name (e.g., STUDENT.MARKS). A file name typically has a primary and secondary name separated by a “.” (DOT).

2. What is a class?

A class is a blueprint from which objects are created. A class contains methods and variables associated with an instance of a class.

3. What is an object?


4. What is a constructor?

A constructor is a method used to create an object of a class. There are two types of constructors: default and parameterized.

5. What are the different OOP principles?

The basic OOP principles are as follows:

Encapsulation

Abstraction

Inheritance

Polymorphism

6. What is inheritance?


7. What is polymorphism?

Polymorphism is the ability of an object to take on multiple forms. Polymorphism is commonly used in OOP when a parent class reference refers to a child class object.

8. What are the instance and class variables?


9. Compare the method and constructor?

Method: Used to perform some function or operation.

Method: Has a return type.

10. What is a singleton class?

11. What are the steps for creating the object?

Abc a= new Abc();

12. What are the different types of access modifiers?

• Protected – Visible to package and subclass.

13. Which operator has the highest precedence in Java?

The operators with the highest precedence are the postfix operators, i.e. () [].

14. What is an array?

An array is a container that holds a fixed number of values of a single data type.

15. What is the difference between the equals() method and the == operator?

equals() is a method that compares the content of the strings, whereas == is an operator that compares the object references of the strings.

16. Is string class final?

Yes. The String class is declared final, so it cannot be subclassed.

17. What is a wrapper class?

To access the primitive data type as an object, we use the wrapper class. They are the following:-

Primitive Type Wrapper class

boolean Boolean

char Character

byte Byte

short Short

int Integer

long Long

float Float

double Double

18. What is the difference between overloading and overriding?

19. What is multiple inheritance in Java?

Java supports multiple inheritance only through interfaces, i.e., the ability of a class to implement more than one interface. A class can implement multiple interfaces but cannot extend multiple classes.

20. What is a stream?

Output Stream: Used to write data into a destination.

21. What is a Character stream?

A Java character stream is used to perform input and output of 16-bit Unicode characters. The main classes used are FileReader and FileWriter, which internally use FileInputStream and FileOutputStream; the basic difference is that FileReader and FileWriter read and write two bytes at a time.

22. What is a Byte stream?

The main classes related to byte streams are FileInputStream and FileOutputStream.

23. What is an Interface?

The Interface is a reference type in Java, similar to the class, but it’s a collection of abstract methods. A class can implement multiple interfaces.

24. What is the difference between class and Interface?

Below are the differences between an interface and a class:

An interface cannot be instantiated.

An interface doesn’t have any constructors.

An interface traditionally has only abstract methods (since Java 8, it may also contain default and static methods).

A class implements an interface and extends a class.

An interface can extend multiple interfaces.

25. What is an abstract class?

A class that contains the abstract keyword in a declaration is called an abstract class. The properties of the abstract class are as follows:-

Abstract classes may or may not contain abstract methods, but if a class has at least one abstract method, it must be declared abstract.

The abstract class cannot be instantiated.

To use an abstract class, another class must inherit from (extend) it.

If we inherit from an abstract class, we must provide implementations for all its abstract methods, unless the subclass is itself declared abstract.


Python Treatment For Outliers In Data Science

What is Feature Engineering?

When a dataset has a lot of features, feature engineering can become a challenging and interesting task.

The number of features can significantly impact the model, so feature engineering is an important task in the Data Science life cycle.

Feature Improvements

Feature engineering covers many key factors; here, let's discuss outliers. This is one of the more interesting topics, and it is easy to understand in layman's terms.

Outlier

An outlier is an observation of a data point that lies an abnormal distance from other values in a given population. (odd man out)

Like in the following data point (Age)

18,22,45,67,89,125,30

An outlier is an object(s) that deviates significantly from the rest of the object collection.

List of Cities

New York, Los Angeles, London, France, Delhi, Chennai

It is an abnormal observation during the Data Analysis stage, that data point lies far away from other values.

List of Animals

cat, fox, rabbit, fish

An outlier is an observation that diverges from well-structured data.

The root cause for the Outlier can be an error in measurement or data collection error.

Quick ways to handle outliers

Outliers can either be a mistake or just variance (as in the examples mentioned above).

If we find that an outlier is due to a mistake, we can ignore it.

If we find that it is due to variance in the data, we can work with it.

In the picture of the apples, can we find the odd one out? Hopefully, yes!

But a huge list of values for a given feature/column from a .csv file can be really challenging for the naked eye.

First and foremost, the best way to find outliers in a feature is visualization.

What are the possible causes of an outlier?

The most common reasons are listed below.

Missing values in a dataset.

Data did not come from the intended sample.

Errors occur during experiments.

Not an error at all, just a value that is unusual compared with the rest.

A distribution more extreme than normal.

That’s fine, but you might still have questions about outliers if you love Data Analytics, Data Mining, and Data Science.

Let’s have a quick discussion on those.

Understand more about Outlier

Outliers tell us how one or more data points in a given data set differ significantly from the overall pattern. Simply put, they are the odd one (or odd many) out. This could be the result of an error during data collection.

Generally, outliers affect statistical results during the EDA process. A quick example is a summary statistic such as the mean of a data set, which an extreme value can pull away from the typical values, misleadingly suggesting that the data values are higher than they really are.
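A short sketch makes the effect on the mean concrete. The sample values below are hypothetical and chosen only for illustration; the comparison with the median is added as a contrast and is not part of the original discussion.

```python
import statistics

# Hypothetical measurements with one extreme value (95)
values = [10, 12, 11, 13, 12, 11, 95]
without_outlier = [v for v in values if v != 95]

print("mean with outlier:     ", round(statistics.mean(values), 2))
print("mean without outlier:  ", round(statistics.mean(without_outlier), 2))
print("median with outlier:   ", statistics.median(values))
print("median without outlier:", statistics.median(without_outlier))
```

A single extreme point roughly doubles the mean here, while the median barely moves, which is why the mean alone can give a misleading picture during EDA.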

Positive Relationship: when the correlation coefficient is close to 1.

Negative Relationship: when the correlation coefficient is close to -1.

Independent: when X and Y are independent, the correlation coefficient is close to zero (0).
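The three cases can be illustrated with the Pearson correlation coefficient. This is a minimal sketch in plain Python; the function name `pearson_r` and all sample data are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 3))   # rises with x
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 3))   # falls as x rises
print(round(pearson_r(x, [2, 1, 3, 1, 2]), 3))    # no linear pattern
```

Keep in mind that independence implies a correlation near zero, but a correlation near zero does not by itself prove independence, because the coefficient only measures linear association.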

We can learn about the data collection process from outliers and their observations: analyzing how they occur helps minimize them and set guidelines for future data collection.

Even though outliers make analysis results less consistent and significantly affect statistical power, there are challenges and roadblocks to removing them in a few situations.

DO or DO NOT (Drop Outlier)

Before dropping outliers, we must analyze the dataset with and without them to better understand their impact on the results.

If it is obvious that an outlier is due to an incorrectly entered or measured value, you can certainly drop it. No issues in that case.

If you find that an outlier affects your assumptions but does not change the results, you may drop it straight away.

If the outlier affects both your assumptions and your results, simply drop it and proceed with your further steps.

Finding Outliers

So far we have discussed what outliers are, how they affect a dataset, and whether or not we can drop them. Let's now see how to find them in a given dataset. Are you ready?

We will look at simple methods first: univariate and multivariate analysis.

Univariate method: You are probably familiar with univariate analysis, which works with one variable/feature from the given data set. To look at outliers here, we apply a box plot to understand the nature of the outliers and where exactly they are.

Let's see some sample code. I am taking titanic.csv as a sample for my analysis, and here I am considering age.

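```python
import matplotlib.pyplot as plt
import seaborn as sns

# df_titanic is assumed to be a DataFrame already loaded from titanic.csv
plt.figure(figsize=(5, 5))
sns.boxplot(y='age', data=df_titanic)
```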



You can see the outliers on the top portion of the box plot visually in the form of dots.

Multivariate method: I am taking titanic.csv as a sample for my analysis, and here I am considering age and passenger class.

```python
plt.figure(figsize=(8, 5))
sns.boxplot(x='pclass', y='age', data=df_titanic)
```

We can also use histogram and scatter plot visualizations to identify outliers.

Mathematically, outliers can be found with the Z-Score and Interquartile Range (IQR) Score methods.

Z-Score method: Each value is standardized as z = (x - mean) / SD, so that the data has a mean of 0 and a standard deviation (SD) of 1, as in the standard normal distribution.

Let’s consider the ages of a group of kids below, collected during stage one of the data science life cycle. Before going into further analysis, the data scientist wants to remove outliers. Looking at the code and output, we can understand the essence of finding outliers using the Z-score method.

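```python
import numpy as np

kids_age = [1, 2, 4, 8, 3, 8, 11, 15, 12, 6, 6, 3, 6, 7, 12, 9, 5, 5, 7, 10, 10, 11, 13, 14, 14]

mean = np.mean(kids_age)
std = np.std(kids_age)
print("Mean of the kids' age in the given series:", mean)
print("Standard deviation of the kids' age in the given series:", std)

threshold = 3  # flag values more than 3 standard deviations from the mean
outlier = []
for i in kids_age:
    z = (i - mean) / std
    if abs(z) > threshold:
        outlier.append(i)
print('Outlier in the dataset is (Teen agers):', outlier)
```

Output

Outlier in the dataset is (Teen agers): []

With a threshold of 3, no age in this small sample is flagged: the largest value, 15, has a z-score of about 1.76. Only a looser one-sided cutoff (for example z > 1.75) would flag 15.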

(IQR) Score method: The data is divided into quartiles (Q1, Q2, and Q3). Please refer to the picture Outliers Scaling above. The ranges are as below.

25th percentile of the data – Q1

50th percentile of the data – Q2

75th percentile of the data – Q3

Let’s take the junior boxing weight category series from the given data set and figure out the outliers.

```python
import numpy as np
import seaborn as sns

jr_boxing_weight_categories = [25, 30, 35, 40, 45, 50, 45, 35, 50, 60, 120, 150]

# method='midpoint' needs NumPy >= 1.22; older versions name this argument `interpolation`
Q1 = np.percentile(jr_boxing_weight_categories, 25, method='midpoint')
Q2 = np.percentile(jr_boxing_weight_categories, 50, method='midpoint')
Q3 = np.percentile(jr_boxing_weight_categories, 75, method='midpoint')

IQR = Q3 - Q1
print('Interquartile range is', IQR)

low_lim = Q1 - 1.5 * IQR
up_lim = Q3 + 1.5 * IQR
print('low_limit is', low_lim)
print('up_limit is', up_lim)

# Keep only the values that fall outside the [low_lim, up_lim] range
outlier = []
for x in jr_boxing_weight_categories:
    if x < low_lim or x > up_lim:
        outlier.append(x)
print('outlier in the dataset is', outlier)
```

Output

outlier in the dataset is [120, 150]

sns.boxplot(x=jr_boxing_weight_categories)

Looking at the boxplot, we can see where the outliers sit in the plot.

So far, we have discussed what outliers are, what they look like, whether outliers are good or bad for a data set, and how to find them using matplotlib/seaborn visualizations and statistical methods.

Now, let's conclude with correcting or removing the outliers and taking the appropriate decision. We can use the same Z-score and IQR score conditions to correct or remove outliers on demand because, as mentioned earlier, outliers are not always errors; they may simply be unusual values.
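Both options, removing and correcting, can be sketched with the IQR limits. The example below is an illustration under stated assumptions: it uses the standard-library `statistics.quantiles` with the inclusive method rather than NumPy's midpoint rule, so the limits differ slightly from the earlier calculation, and "correcting" is interpreted here as capping values at the limits. The function name `iqr_limits` is hypothetical.

```python
import statistics


def iqr_limits(values, k=1.5):
    """Return (lower, upper) fences at k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


weights = [25, 30, 35, 40, 45, 50, 45, 35, 50, 60, 120, 150]
low, up = iqr_limits(weights)

# Option 1: remove the outliers entirely
removed = [w for w in weights if low <= w <= up]

# Option 2: keep every row but cap extreme values at the fences
capped = [min(max(w, low), up) for w in weights]

print("limits:", low, up)
print("removed:", removed)
print("capped: ", capped)
```

Removing shrinks the dataset, while capping preserves its size, so the choice depends on whether the extreme rows carry information you still need.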

Hope this article helps you to understand outliers in a zoomed-in view from all aspects. Let’s come up with another topic shortly. Until then, bye for now! Thanks for reading! Cheers!!

The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.


Top Data Science Jobs In Gurgaon Available For Data Scientists In 2023

Analytics Insight has churned out the top Data Science jobs in Gurgaon available in 2023

Data Scientist at Airtel

Airtel is known as one of the largest telecom service providers for customers and businesses in India. It also operates in 18 countries with products such as 2G, 3G and 4G wireless services, high-speed home broadband as well as DTH. The company consists of more than 403 million customers across the world.

Responsibilities: The data scientist needs to research, design, implement as well as evaluate novel Computer Vision algorithms, work on large-scale datasets and create scalable systems in versatile application fields. The candidate is required to work closely with the customer expertise team, research scientist teams as well as product engineering teams to drive model implementations along with new algorithms. The candidate also needs to interact with the customer to gain a better understanding of the business problems and help them by implementing machine learning solutions.

Qualifications: A candidate is required to have practical experience in Computer Vision and more than three years in building production-scale systems in either Computer Vision, deep learning or machine learning. There should be coding skills in one programming language and a clear understanding of deep learning CV evaluation metrics such as mAP, F_beta and PR curves as well as face detection, facial recognition and OCR. The candidate also needs to have 2-3 years of modelling experience working with Pytorch, MxNet and Tensorflow along with object detection approaches such as Faster RCNN, YOLO and CenterNet.

Data Scientist at BluSmart

BluSmart is known as the first and leading all-electric ride-hailing mobility service in India. It has a mission to steer urban India towards a sustainable means of transportation by building a comprehensive electric on-demand mobility platform with smart charging and smart parking. The company will provide efficient, affordable, intelligent as well as reliable mobility.

Responsibilities: The candidate is required to do a geospatial and time-based analysis of business vectors like time-travelled, fare, trip start and many more to optimise fleet utilisation and deployment as well as develop strategies to deploy electric vehicles and chargers in Delhi-NCR along with Mumbai by using data from thousands of trip from BluSmart cabs. The data scientist will create a new experimental framework to collect data and build tools to automate data collection by using open-source data analysis and visualisation tools.

Qualifications: The candidate is required to have sufficient knowledge of data analytics, machine learning, and programming languages such as R, SQL and Python. The candidate needs to have practical experience with data analytics, machine learning and business intelligence tools such as Tableau with smart mathematical skills.

Associate Data Scientist at Pee Safe

Responsibilities: The data scientist should derive actionable insights from data to be used in real-time in all decision-making processes for the company and implement multiple processes across different departments to enhance business metrics. The candidate needs to create new models or improve existing models to be used for the supply chain, demand predictions, logistics and many more.

Qualifications: The candidate should have a Bachelor’s degree in Statistics, Mathematics, Computer Science, Engineering or any other relevant field. The candidate is required to have at least two to three years of practical experience in quantitative analytics or data modelling. It is essential to have a clear understanding of predictive modelling, machine learning, clustering, classification techniques, algorithms, programming language as well as Big Data frameworks and visualisation tools such as Cassandra, Hadoop, Spark and Tableau. The candidate must have strong problem-solving skills with sufficient knowledge of Excel.

Data Scientist at Siemens Limited

Siemens is popularly known as a technology company focused on industry, infrastructure, mobility as well as healthcare. It aims in creating technologies for more resource-efficient factories along with resilient supply chains to transform industries.

Responsibilities: The candidate is required to design software solutions supplemented with Artificial Intelligence and machine learning based on the customer requirements within architectural or design guidelines. The candidate also needs to be involved in the coding of features, bug fixing as well as delivering solutions to scripting and quality guidelines. The person is responsible for ensuring integration and submission of solutions into software configuration management system, performing regular technical coordination and timely reporting.

Qualifications: The candidate must have a strong knowledge of Data Science, Artificial Intelligence, machine learning, deep learning, exploratory analysis, predictive modelling, prescriptive modelling and Cloud systems with a B.E/B. Tech/ CA/ M. Tech in science background. The candidate should have practical experience in data visualisation tools, statistical computer languages, data architecture and machine learning techniques. It is essential to have a good knowledge of querying SQL, no SQL databases, data mining techniques, AWS services, computing tools as well as end-to-end Data Science pipelines into production.

Data Scientist at Mastercard

Mastercard is known as the global technology company in the financial industry, especially payments. It has a mission to connect an inclusive digital economy to benefit everyone by making transactions safe and accessible. It works in more than 210 countries with secure data and networks, innovations and solutions.

Qualifications: The candidate should have practical experience in data management, support decks, SQL Server, Microsoft BI Stack, Python, campaign analytics, SSIS, SSAS, SSRS and data visualisation tools. It is essential to have a Bachelor’s or Master’s degree in Computer Science, IT, Engineering, Mathematics, Statistics or any relevant field.
