By Padma Priya Chitturi
- Use Apache Spark for data processing with these hands-on recipes
- Implement end-to-end, large-scale data analysis better than ever before
- Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data
Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw, unstructured data sets with ease.
This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to difficult problems in data science using Spark's data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.
What you'll learn
- Explore the topics of data mining, text mining, Natural Language Processing, information retrieval, and machine learning.
- Solve real-world analytical problems with large data sets.
- Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale.
- Get hands-on experience with algorithms like classification, regression, and recommendation on real datasets using the Spark MLlib package.
- Learn about numerical and scientific computing using NumPy and SciPy on Spark.
- Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models.
About the Author
Padma Priya Chitturi is Analytics Lead at Fractal Analytics Pvt Ltd and has over 5 years of experience in big data processing. Currently, she is part of capability development at Fractal and is responsible for solution development for analytical problems across multiple business domains at large scale. Prior to this, she worked for an airlines product on a real-time processing platform serving one million user requests/sec at Amadeus Software Labs. She has worked on realizing large-scale deep networks (Jeffrey Dean's work at Google Brain) for image classification on the big data platform Spark. She works closely with big data technologies such as Spark, Storm, Cassandra, and Hadoop. She was an open source contributor to Apache Storm.
Table of Contents
- Big Data Analytics with Spark
- Tricky Statistics with Spark
- Data Analysis with Spark
- Clustering, Classification, and Regression
- Working with Spark MLlib
- NLP with Spark
- Working with Sparkling Water - H2O
- Data Visualization with Spark
- Deep Learning on Spark
- Working with SparkR
Apache Spark for Data Science Cookbook by Padma Priya Chitturi