Analytical Scientist– Avery Dennison, Pune


Here is an exciting opening – Avery Dennison is looking for a scientist who is a subject matter expert in one or more areas of chromatography, spectroscopy, thermal analysis, and imaging & surface science. The Analytical Scientist should have in-depth knowledge and hands-on experience with various analytical instruments used in the coatings, adhesives, films and papers industries.

Designation – Analytical Scientist

Location – Pune

About employer – Avery Dennison

About Avery Dennison

Avery Dennison is a global leader in labeling and packaging materials and solutions. The company’s applications and technologies are an integral part of products used in every major market and industry. With operations in more than 50 countries and more than 26,000 employees worldwide, Avery Dennison serves customers with insights and innovations that help make brands more inspiring and the world more intelligent. Headquartered in Pasadena, California, the company reported sales from continuing operations of $6 billion in 2012.

Job Description:

The Analytical Scientist will implement analytical and materials science capabilities within Avery’s Global Innovation Center in Pune, India. The scientist will be both an instrumental contributor and a subject matter expert in one or more areas of chromatography, spectroscopy, thermal analysis, and imaging & surface science.

The Analytical Scientist is dedicated to the further growth of Avery Dennison in India and globally through the deployment of new and improved materials science and analytical capabilities. This individual will be instrumental in building analytical technical depth in the Global Innovation Center in India and will lead analytical/materials science discussion and evaluation of new techniques with internal and external partners.

This person will strive to position the Global Innovation Center in India as a leading analytical & materials science resource for the whole of Avery Dennison.

The Analytical Scientist will have in-depth knowledge and hands-on experience with various analytical instruments used in the coatings, adhesives, films and papers industries. This individual will also possess strong skills in problem solving, data analysis & interpretation, and strong communication skills, both verbal and written. This role will be based in Pune, India but will require regular and extensive interactions with Avery Dennison analytical centers in US, Europe and China.

Responsibilities:

Run and maintain analytical instruments, collect data, interpret data, develop test methods

Interact and interface with internal and external customers to provide problem solving, failure analysis, materials science data and understanding

Standardize and implement test and analysis methods within R&D

Interface with the equipment supplier base to select and specify the instruments for purchase and keep abreast of new developments in the analytical field

Perform benchmarking of existing Avery products and competition products

Develop database of material ageing performance and develop predictive correlations of ageing behaviour

Issue test and analysis reports

Interface with the global analytical community in Avery to share best practices and learnings and to leverage capacity

Develop customized test methods for specific product applications

Act in a manner consistent with Avery Dennison leadership principles

Qualification and Skills Required:

Minimum Bachelor’s degree in engineering or science

5-10 years of industrial experience with expertise in operating analytical instrumentation.

Experience with thermal analysis (DSC, TGA, Rheometer), chromatographic techniques (GPC, GCMS, HPLC), spectroscopy (FTIR), surface science (contact angle, zeta potential), and imaging techniques (SEM, optical microscopy)

Materials and product knowledge. Preferable to have some knowledge of Labeling, Identification, Decoration and Graphics materials and components (adhesives, films, paper, coating, printing, converting, packaging materials)

Highly developed collaboration capability – working through matrices while driving project execution

Strong communication skills are a must; should have good verbal, written and presentation skills for both technical and non-technical personnel

Willingness to learn and be customer facing

Creative problem solving

High degree of ingenuity and creativity

Reliable discussion partner with internal and external customers

Effective project management (timelines, etc.)

Sense of urgency

Domestic and international travel (10-20%) will be a requirement

Interested candidates can apply for this job at this PAGE.

If you want to stay updated on the latest analytics jobs, follow our job postings on Twitter or like our Careers in Analytics page on Facebook.


Validation & Verification Of Analytical Methods

This training course is the easiest and most complete resource for every analyst who is starting out in the laboratory field, as well as for experienced analysts. It offers not only theoretical but also practical training using Excel sheets for calculation, including: a full validation example (an Excel sheet covering validation through to measurement uncertainty calculation for analysis and for sampling, in a very easy way), accuracy and precision control charts ready to be used for any method, a ready-to-use ANOVA Excel sheet, and the guidelines followed in this course, attached along with videos. After you finish this training and apply every step, you will be able to validate or verify any method, which is essential for accreditation according to the ISO 17025/2023 edition.

Note: All guidelines used are attached.


Goals

Learn with me how to develop methods for the analysis of the analyte of interest. What are the studies required to develop any method?

Matrix effect studies, the chemistry of target analytes, and the steps to follow to arrive at the required method for the analytes of interest.

How to select an instrument that is fit for the method.

Learn also how to evaluate most of the performance parameters according to international guidelines to meet the requirements of ISO 17025/2023 edition.

Learn the differences between verification & validation of methods and when you need to verify or validate the method.

Learn also how to ensure the validity of results and to get accurate and reliable results.

Learn how to calculate the limit of detection and the limit of quantitation in three different ways (see the sketch after this list).

Selectivity and specificity for analytes of interest.

Linearity in easy steps (Method linearity).

Learn the differences between repeatability & reproducibility and how to ensure the repeatability of a specific method to analyze the required analytes.

Accuracy, precision & trueness.

Inter-laboratory comparison.

Intra-laboratory comparison.

Proficiency test (PT) and Certified reference materials (CRM).

Measurement uncertainty calculation for analytical measurements and for sampling (Excel sheet attached).

Finally, create your own Excel sheet with a complete example in a very easy procedure.
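One of those approaches, sketched here with hypothetical cell references, estimates both limits from the standard deviation of blank or low-concentration responses (assumed to be in B2:B11) and the slope of the calibration curve (assumed to be in E1):

LOD: =3.3 * STDEV(B2:B11) / $E$1

LOQ: =10 * STDEV(B2:B11) / $E$1

The other approaches typically covered are visual evaluation and the signal-to-noise ratio.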

Prerequisites

A computer with Excel installed

Internet connection

Computer with any specification

A willingness to listen to every lecture many times

Basic computer fundamentals

Paper and pen

It Doesn’t Take A Rocket Scientist

Good news/bad news: The Columbia disaster has brought renewed attention to spaceflight, but so far, much of that attention lacks any real clarity of understanding. Rather than train the spotlight on our space program’s fairly desperate need for both funding and vision, Columbia seems to have ushered in open season on NASA. Congressional hearings rehash hoary old debates about the value of our space program, chastising the agency and calling for hastily conceived reforms. Many people with whom I’ve been privileged to work closely inside and around NASA share my concern that we may be on the verge of making irreversible decisions that future generations will regret. The Bush administration’s announcement of a redirection of the space program, which was pending at press time, may address some issues raised by the Columbia investigation, but it’s sure to miss some more fundamental problems, problems that are deep, structural and, if you believe in the value of space exploration, critical to our place in the 21st century.

In a decade of professional practice in large-scale urban, medical and institutional architecture, I have always started any new project with an investigation into institutional memory. I need to know how previous programs arrived at their final designs before I feel qualified to propose next-generation solutions. But almost immediately after I arrived at NASA in 1997, I learned that trying to gather such information in the 18,000-employee, 16-facility agency was tough going. The standard response when I requested data on old projects was a quizzical stare. As I began working on the design of the TransHab, an inflatable habitat for long crew expeditions like a Mars mission, I realized I needed solid dimensions for Skylab interiors and furnishings. Those drawings always seemed archived somewhere beyond reach. Eventually I just went over to the Skylab 1G Trainer at Space Center Houston’s visitor center with a tape measure and some gum-soled shoes. I’m sure it gave a few tourists a real thrill to come into the Trainer exhibit and find me dangling from the ceiling.

The heart of the matter: As has been pointed out with regard to the Columbia disaster, there is within NASA a creeping lack of interest in real expertise. When any bureaucracy supports its mandarin culture over real intellectual capital–precisely what the board that investigated the Columbia disaster accused NASA of doing–it becomes stagnant rather than productive.

product are readily available.

But even these measures won’t fully address the squandering of hard-won expertise, because the problem isn’t confined to a failure of archiving. Any team that takes on a project is going to amass some truly valuable information. What happens then? At NASA, more often than not, project teams get disbanded and people with unique knowledge get poached away. Whereas other industries actively encourage the capture of knowledge in team environments–where the sum of knowledge is measurably greater than any individual effort–NASA seems unaware of the value of a stable, successful team and its ability to store, transmit, and use accumulated knowledge.

the feats that made NASA great. Finally earning their approval after three days of vigorous work felt like the greatest achievement of my life.

40 years of spaceflight, the results of thousands of failures large and small. As Charlie Feltz told us, “engineers learn by failures. We’ve had a lot of failures.”

Here’s an idea: Why don’t we borrow a pattern from design disciplines like architecture and industrial design, and develop “studios” populated by specialists from different fields–and when one project is done, try keeping the team together.

It’s not that NASA hasn’t taken first steps toward developing a meaningful shuttle replacement–it’s just that those steps invariably ended in a stumble. In the past three years, we’ve seen three separate programs proposed: the Second Generation Reusable Launch Vehicle (2GRLV), the Space Transportation Architecture Study (STAS) and the Space Launch Initiative (SLI). Each set forth overarching new strategies and architectures for human spaceflight that differed only slightly in scope. And each took a few toddling steps before the rug was pulled out. (Just three months before the Columbia accident, NASA diverted the SLI’s $4.5 billion budget to help cover the needs of the shuttle and International Space Station programs.)

Now NASA is pushing a new program dubbed the Orbital Space Plane, which is widely touted as the plan to replace the space shuttle. There is some confusion in this, since not one of the specifications of the Orbital Space Plane as currently envisioned could match the shuttle’s capacity for crew support, nor its sheer power as a high-tonnage launch system.

The inherited crew-transfer component had originally been conceived as one element of a broad, upgradable, long-term system, capable of carrying up to 10 passengers for full-up missions, and of active docking and orbital operations. As the OSP emerged last spring, it had fewer and fewer of those characteristics, until it became the pint-size version of a passive-crew-rescue vehicle envisioned today. A competition that was already under way–and from which several potential bidders had been eliminated–had been radically rescoped to meet immediate political goals within soaring budgetary shortfalls.

Why? Probably because there wasn’t enough vision or commitment behind the shuttle-replacement plans to begin with.

system and requested that the crew-transfer part of the program be fast-tracked; or, if that approach didn’t seem responsive enough to the needs of the day, the table should have been swept clean and the process started afresh with a new set of problems on the boards.

ideas by inviting competition and reopening the field of design solutions? Most likely cost savings and superior design.

Here is the recent history of shuttle-replacement systems in a nutshell: Propose and study a succession of systems, then fight to keep a single subcomponent going when the budget is slashed–without considering its long-term compatibility with the rest of the human spaceflight program. All these separate pieces somehow need to be made to fit the next wave of big-picture plans. And the Big Plan bogey keeps shifting–it’s anyone’s guess how the OSP will fit into the Bush Administration’s new space initiative. When the components’ utility in a new scenario is hard to prove, they get shot down–no matter how much effort has already gone into their development. They become ideas that go back on the shelf, only to get reinvented by future generations.

As many wise space pundits have said in recent times, NASA needs a challenge. Without a broad, external challenge backed up by consistent support and political will, it seems unlikely that the kind of heroic effort and vision that characterized the first decade and a half of NASA’s existence will re-emerge.

What these pundits are really bemoaning is the lack of consistent vision, which ultimately stems from an issue that is much larger and older than NASA, and whose nature is of profound interest to architects and master planners, because it has a powerful effect on the kind and scale of projects we may build. Simply put, undertaking what we call Great Projects–projects of a large, public scope whose completion will require 10 years or more–is very difficult in a democracy.

Under our democratic system, it is inherently impossible to ensure that any long-term program will receive funding, or remain consistently funded, from year to year. From this perspective, the four terms of FDR’s nearly unchallenged administration may well have been critical not only to the establishment of the Works Progress Administration but, more important, to the completion of many individual WPA projects.

Certainly in today’s politically polarized environment, a shift from a Democratic to a Republican administration (or vice versa) often portends the cancellation of many unfinished public projects–for example, the several major human spaceflight programs axed before the end of February 2001, less than a month after George W. Bush’s inauguration.

When budgets are cut, the public needs to be aware that this will result in the loss of valuable programs and personnel. But for those losses to matter to the American people, a truly inspiring vision for NASA must be articulated. And when politicians announce new NASA initiatives, whether to the Moon or Mars or beyond, the public must listen hard within the announcement for a coherent plan and a powerful commitment–including, of course, the funding–to deliver the mission itself and not just the idea of mission.

On this point, the Columbia Accident Investigation Board’s report is very clear: “It is the view of the Board that the previous attempts to develop a replacement vehicle for the aging Shuttle represent a failure of national leadership.”

NASA proved a long time ago that it can answer a profound and improbable challenge, as it did with the great Moon mission announced by President Kennedy in 1961. But it is not up to NASA to supply the vision itself. That falls to our leaders. If they do supply the vision, it’s a safe bet that a truly renewed NASA will do an extraordinary job of bringing it to fruition.

by NASA

Adams’s Critique

Adams first worked to improve the original proposals, rearranging certain elements to suit the crew. Here, she shows two potential seating arrangements meant to enhance socialization. Even these layouts, however, would be too cramped and awkward.

Pre-existing Design

A group of structural engineers had drawn up the initial designs before the architects got involved. There was no up or down in these plans–astronauts in different areas would be inverted relative to one another. Besides being disorienting for the crew, this design also constituted an inefficient use of the total volume.

Outsider In

Constance Adams stands before the Lunar Landscape section of Johnson Space Center’s Starship Gallery. “I am one of the people who live in the boundary world between the space ‘insiders’ and the general educated public,” she says.

by Brent Humphreys

11 New Analytical Functions In Google Sheets For 2023

Coming hot on the heels of last year’s batch of new lambda functions, Google recently announced another group of new analytical functions for Sheets.

Included in this new batch are the long-awaited LET function, 8 new array manipulation functions, a new statistical function, and a new datetime function.

Let’s begin with a look at the new array functions. The LET function is at the end of the post.

New Array Manipulation Functions

Although there are eight new array manipulation functions, you can think of them as four pairs of horizontal/vertical array functions.

Although most of this behavior was already possible with existing functions, these new functions simplify the syntax and are therefore a welcome upgrade. They also maintain function parity with Excel, which will help folks coming from that world.

TOROW Function

TOROW transforms a range into a single row.

This formula turns the input array in A1:C2 into a single row:

=TOROW(A1:C2)

It has optional arguments that determine how to handle blank cells or error values, and also whether to scan down columns or across rows when scanning the input range.
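For example, assuming the optional arguments follow the same order as the equivalent Excel function (an ignore flag, then a scan-by-column flag, which is an assumption worth checking against the linked documentation), a call like this would skip blank cells and read the input down each column first:

=TOROW(A1:C2, 1, TRUE)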

More information in the Google Documentation.

TOCOL Function

The TOCOL function transforms a range into a single column. It behaves in the same way as the FLATTEN function.

This formula in cell A5 transposes each row into a column format and puts them one atop another:

=TOCOL(A1:C2)

Like the TOROW function, it has optional arguments that determine how to handle blank cells or error values, and also whether to scan down columns or across rows when scanning the input range.

More information in the Google Documentation.

CHOOSEROWS Function

Given a range of data, the CHOOSEROWS function lets you select rows by row number.

For example, this formula selects the first, second, and fourth rows:

=CHOOSEROWS(A1:C5, 1, 2, 4)

This can also be achieved, of course, by the FILTER function or the QUERY function. However, this new function is a nice, lightweight alternative for when you know the row numbers and don’t need to perform a conditional test on each row.
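For comparison, one way to sketch the same selection with FILTER is to test each row’s position against the list of wanted row numbers:

=FILTER(A1:C5, ISNUMBER(MATCH(ROW(A1:C5)-ROW(A1)+1, {1;2;4}, 0)))

CHOOSEROWS is clearly the lighter option when the row numbers are already known.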

More information in the Google Documentation.

CHOOSECOLS Function

CHOOSECOLS is a welcome addition to the function family.

It lets you select specific columns from a range, which previously required a QUERY function and was more awkward because you use letter references for the columns.
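For the example range used below (A1:C5), the older QUERY equivalent would look something like this, with the columns named by letter rather than by position:

=QUERY(A1:C5, "select A, C", 0)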

This example selects the first and third columns from the input range:

=CHOOSECOLS(A1:C5, 1, 3)

More information in the Google Documentation.

WRAPROWS Function

WRAPROWS takes a 1-dimensional range (a row or a column) and turns it into a 2-dimensional range by wrapping the rows.

It takes three arguments: 1) an input range, 2) a wrap count, which is the maximum number of elements in the new rows, and 3) a pad value, to fill in any extra cells.

In this example, the formula wraps the first row into multiple rows with a max of two elements on each row. Notice the second comma in the formula. This sets the pad value to blank, which means cell B8 is blank.
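To make that concrete, suppose A1:I1 holds the numbers 1 through 9 (a hypothetical input). The formula below then returns four full rows (1 2, 3 4, 5 6, 7 8) followed by a final row containing 9 and one blank padded cell: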

=WRAPROWS(A1:I1, 2,)

More information in the Google Documentation.

WRAPCOLS Function

WRAPCOLS takes a 1-dimensional range (a row or a column) and turns it into a 2-dimensional range by wrapping the columns.

It takes three arguments: 1) an input range, 2) a wrap count, which is the maximum number of elements in the new columns, and 3) a pad value, to fill in any extra cells.

In this example, the formula wraps the first row into multiple columns with a max of two elements down each column. Notice the second comma in the formula. This sets the pad value to blank, which means cell E4 is blank.

=WRAPCOLS(A1:I1, 2,)

More information in the Google Documentation.

VSTACK Function

The VSTACK function stacks ranges of data vertically.

For example, you could VSTACK to easily combine two (or more) datasets:

=VSTACK(A1:B5, D2:E5)

It takes the data in the range A1:B5 (which includes a header row) and appends the data from D2:E5 underneath, so you have it all in a single table:

More information in the Google Documentation.

HSTACK Function

HSTACK combines data ranges horizontally.

For example, this formula in A14 combines three data ranges horizontally:

=HSTACK(A1:A5, C3:C7, A7:A11)

More information in the Google Documentation.

Margin Of Error Function

The MARGINOFERROR function calculates the margin of error for a range of values for a given confidence level.

It takes two arguments: 1) the range of values, and 2) the confidence level.

Let’s use this dataset as an example:

The MARGINOFERROR function is:

=MARGINOFERROR(C2:C11, 0.99)

which calculates the margin of error at a confidence level of 99%:

This is perhaps easier to understand on a chart, showing the margin of error bars (1 standard deviation) and the mean value (red line):
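For intuition, the result is in the same spirit as the classic confidence-interval half-width (a critical value times the standard deviation divided by the square root of the sample size). A rough hand-rolled comparison under a normal approximation, shown purely as an illustration and not guaranteed to reproduce MARGINOFERROR’s exact output, would be:

=CONFIDENCE(0.01, STDEV(C2:C11), COUNT(C2:C11))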

More information in the Google Documentation.

Epoch To Date Function

EPOCHTODATE converts a Unix epoch timestamp to a regular datetime (in Coordinated Universal Time, UTC).

It takes two arguments: 1) the Unix epoch timestamp, and 2) an optional unit argument.

A Unix timestamp looks like this:

1676300687

The EPOCHTODATE function takes this as an input (e.g. the Unix timestamp is in cell A1 in this example):

=EPOCHTODATE(A1)

The output of the function is:

2/13/2023 15:04:47
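Before this function existed, a common manual equivalent was to divide the timestamp by 86,400 (the number of seconds in a day) and add the result to the Unix epoch date, formatting the cell as a date-time:

=A1/86400 + DATE(1970,1,1)

The optional second argument of EPOCHTODATE covers timestamps expressed in milliseconds or microseconds rather than seconds, which the manual approach would have to handle separately.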

More information in the Google Documentation.

LET Function

The LET function lets you define named variables in your formulas. It’s a powerful technique that reduces duplicated expressions in your formulas.

LET Example 1

Consider this LET example, which categorizes the total sales using an IFS function.

The LET function allows us to define a “sales” variable that references the SUM of sales in column C. We can reuse the variable “sales” anywhere else in this formula:
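A sketch of such a formula, where the sales range C2:C11 and the category thresholds are assumptions made purely for illustration, might look like this:

=LET( sales , SUM(C2:C11) , IFS( sales > 1000 , "High" , sales > 500 , "Medium" , TRUE , "Low" ) )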

Without LET, the formula has to repeat the SUM expression, which makes changing this formula more difficult:
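With the same assumed range and thresholds, the equivalent formula without LET spells out the SUM expression in every condition:

=IFS( SUM(C2:C11) > 1000 , "High" , SUM(C2:C11) > 500 , "Medium" , TRUE , "Low" )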

LET Example 2

This formula — using LET, SEQUENCE, and FILTER — will get all the weekdays (Monday – Friday) for the year ahead, starting from today:

=LET( dates , SEQUENCE(365, 1, TODAY(), 1) , FILTER( dates , WEEKDAY(dates, 2) < 6 ) )

By using LET, you can avoid repeating the SEQUENCE expression.

More information in the Google Documentation.

Why Is Software Engineer And Data Scientist Collaboration Important?

Collaboration between software engineers and data scientists is necessary for the betterment of the organization.

Data scientists are excellent mathematicians with a wide range of interdisciplinary knowledge and exceptional analytical skills. This expert’s job is to determine the best training method for machine intelligence. They should go through all of the available algorithms to find the one that is most suited for resolving the issues with the project and figure out precisely what is wrong. Data scientists must work with software developers, such as committed Laravel engineers, to boost the company’s competitive edge. Comparatively to software development, such as Laravel application development, working with data is more research-focused. The technical aspect of the problem may be handled by a Laravel developer. Both data scientists and engineers must feel accountable for the issue and be able to contribute to the project at any level. Continuous communication allows for the early detection of any possible discrepancies. In this post, we’ll look more closely at the difficulties that software developers and data scientists encounter along the process and discuss ways to enhance their interaction.

Issues that Software Engineers and Data Scientists Might Face and Their Solutions

Working directly with data, scientists assist engineers in gaining the research and analytical abilities necessary to produce better code. Users of data warehouses and data lakes are exchanging information more effectively, which improves project flexibility and yields longer-lasting, more enduring solutions. The developer and data scientist are working together to improve the business’s choices as well as the goods it offers to customers. However, issues might come up during work, and specialists will need to work together to find solutions:

Gaining knowledge from the data

The developer tends to focus more on issues that are based on particular needs, whereas the data scientist might locate the issue by identifying new data sources that can be included in predictive models.

Solution: The data scientist should concentrate on the more theoretical aspects of research and discovery, while the developer should concentrate on the execution of the solution, the needs for which are progressively identified.

Data of poor quality

Poor quality is attributed to mistakes made during the data collection and sampling processes. Issues with data quality also make it challenging for data scientists to feel confident that they are acting ethically, and they present challenges for developers because the data scientist initially delivers an incomplete product. It’s important to note that initiatives in both software engineering and data science fail frequently, with up to 75% of software projects failing and 87% of data science projects never reaching production. Even though data scientists are the main consumers of data, it is their job to address problems with data quality. The developer receives the assignment soon after and then begins their portion.

Combining data from many sources

Data must frequently be merged from many areas where it is located for analysis. Lack of documentation, inconsistent schemas, and several potential meanings for data labels are all aspects that make the data challenging to comprehend.

The developer’s and data scientist’s task is to locate and construct keys that integrate many sources into templates to learn from and enhance the customer experience. The only problem is that data is kept in silos.

Describing job requirements to developers

The issue of misunderstanding might occur when data scientists and developers communicate. Given their numerous duties, developers frequently have little interest in the data scientist’s tools.

Solution: The data scientist should thoroughly describe the issue and solicit the engineering team’s assistance to get high-calibre data.

How Data Scientists and Software Engineers Can Work Together:

The following scenario may occur when transmitting production data to data scientists — they might have either very little access or a lot of access to the database. In the first instance, they repeatedly ask for access to the data export, but in the second, they repeatedly run queries that have an impact on the live database. To address this issue, a method of transferring all raw data to data scientists in a setting distinct from production must be established. The fundamental concept is that we store everything flat in a location that is simple for data scientists to access since we never know what data may be required in the future. It makes perfect sense for a software developer to generate storage space.


Interview With Data Scientist And Top Kaggler, Mr. Steve Donoho

It’s our pleasure to introduce a top data scientist (as per Kaggle rankings), Mr. Steve Donoho, who has generously agreed to do an exclusive interview for Analytics Vidhya. Steve is living a dream that most of us only think about! He is the founder and Chief Data Scientist at Donoho Analytics Inc., tops the Kaggle ranking for data scientists, and chooses his own areas of interest.

Prior to this, he worked as Head of Research for Mantas and as Principal for SRA International Inc. On the education front, Steve completed his undergraduate studies at Purdue University, followed by an M.S. and a Ph.D. from Illinois University. His interests and work include an interesting mix of problems in the areas of insider trading, money laundering, excessive markup, and customer attrition.

On a personal front, Steve likes trekking and playing card and board games with his family (Rummikub, Euchre, Dutch Blitz, Settlers of Catan, etc.).

Kunal: Welcome Steve! Thanks for accepting the offer to share your knowledge with our audience of Analytics Vidhya. Kindly tell us briefly about yourself and your career in Analytics and how you chose this career.

Steve: When I was in grad school, I was good at math and science so everyone told me, “You should be an engineer!”  So I got a degree in computer engineering, but I found that designing computers was not so interesting to me. I found what I really loved to do was to analyze things and to use computers as a tool to analyze things. So for any young person out there who is good at math and science, I recommend you ask yourself, “Do I love to analyze things?”  If so, a career as a data scientist may be the thing for you. In my career, I have mainly worked in financial services because data abounds in the financial services world, and it is a very data-driven industry.  I enjoy looking for fraud because it gives me an opportunity to think like a crook without actually being one.

Kunal: So, how and when did you start participating in Kaggle competitions?

Steve: I found out about Kaggle a couple years ago from an article in the Wall Street Journal.  The article was about the Heritage Health Prize, and I worked on that contest. But I was quickly drawn into other contests because they all looked so interesting.

Kunal: How frequently do you participate in these competitions, and how do you choose which ones to participate in?

Kunal: Team vs. Self?

Steve: I usually enter contests by myself.  This is mainly because it can be difficult to coordinate with teammates while juggling a job, contest, etc.

Kunal: Which was the most interesting / difficult competition you have participated in till date?

Steve: The GE Flight Quest was very interesting. The challenge was to predict when a flight was going to land given all the information about its current position, weather, wind, airport delays, etc.  After being in that contest, when I looked up and saw an airplane in the sky, I found myself thinking, “I wonder what that airplane’s Estimated Arrival Time is, and will it be ahead of schedule or behind?” I have also liked the hack-a-thons, which are contests that last only 24 hours – they totally change the way you approach a problem because you don’t have as much time to mull it over.

Kunal: What are the common tools you use for these competitions and your work outside of Kaggle?

Steve: I mostly use the R programming language, but I also use Python scikit-learn especially if it is a text-mining problem.  For work outside Kaggle, data is often in a relational database so a good working knowledge of SQL is a must.

Kunal: Any special pre-processing / data cleansing exercise which you found immensely helpful? How much time do you spend on data-cleansing vs. choosing the right technique / algorithm?

Steve: Well, I start by simply familiarizing myself with the data.  I plot histograms and scatter plots of the various variables and see how they are correlated with the dependent variable.  I sometimes run an algorithm like GBM or Random Forest on all the variables simply to get a ranking of variable importance.  I usually start very simple and work my way toward more complex if necessary.  My first few submissions are usually just “baseline” submissions of extremely simple models – like “guess the average” or “guess the average segmented by variable X.”  These are simply to establish what is possible with very simple models.  You’d be surprised that you can sometimes come very close to the score of someone doing something very complex by just using a simple model.

A next step is to ask, “What should I actually be predicting?”  This is an important step that is often missed by many – they just throw the raw dependent variable into their favorite algorithm and hope for the best.  But sometimes you want to create a derived dependent variable.  I’ll use the GE Flight Quest as an example – you don’t want to predict the actual time the airplane will land; you want to predict the length of the flight; and maybe the best way to do that is to use that ratio of how long the flight actually was to how long it was originally estimated to be and then multiply that times the original estimate.

I probably spend 50% of my time on data exploration and cleansing depending on the problem.

Kunal: Which algorithms have you used most commonly in your final submissions?

Steve: It really depends on the problem.  I like to think of myself as a carpenter with a tool chest full of tools.  An experienced carpenter looks at his project and picks out the right tools. Having said that, the algorithms that I get the most use out of are the old favourites: R’s GBM package (Generalized Boosted Regression Models), Random Forests, and Support Vector Machines.

Kunal: What are your views on traditional predictive modeling techniques like Regression, Decision tree?

Steve: I view them as tools in my tool chest. Sometimes simple regression is just the right tool for a problem, or regression used in an ensemble with a more complex algorithm.

Kunal: Which tools and techniques would you recommend an Analytics newbie to learn? Any specific recommendation for learning tools with big data capabilities?

Steve:  I don’t know if I have a good answer for this question.

Kunal: I have been working in Analytics Industry for some time now, but am new to Kaggle. What would be your tips for someone like me to excel on this platform?

Steve: Here are some thoughts based on my experience:

Knowledge of statistics & machine learning is a necessary foundation.  Without that foundation, a participant will not do very well.  BUT what differentiates the top 10 in a contest from the rest of the pack is their creativity and intuition.

The more tools you have in your toolbox, the better prepared you are to solve a problem.  If I only have a hammer in my toolbox, and you have a toolbox full of tools, you are probably going to build a better house than I am.  Having said that, some people have a lot of tools in their toolbox, but they don’t know *when* to use *which* tool.  I think knowing when to use which tool is very important.  Some people get a bunch of tools in their toolbox, but then they just start randomly throwing a bunch of tools at their problem without asking, “Which tool is best suited for this problem?”  The best way to learn this is by experience, and Kaggle provides a great platform for this.

Thanks, Steve, for sharing these nuggets of gold. Really appreciated!

If you like what you just read & want to continue your analytics learning, subscribe to our emails or like our Facebook page.

Image (background) source: theninjamarketingblog

