TomTom Spark Cardio + Music Review

Updated in November 2023.

The TomTom Spark comes in multiple variants with different features to serve different needs. I will be reviewing the Spark Cardio + Music, which is the high-end variant among 4 models. The other three are the Spark (with GPS tracking), the Spark Cardio (with heart-rate monitor) and the Spark Music (with a built-in music player that holds up to 500 songs). Note that all the models have GPS trackers.

TomTom Spark Pros

Nice and comfortable fit

Accurate GPS tracking

Reliable heart rate data

Good battery life

Play music without phone



TomTom Spark Cons

Poor software app

Does not sync your data automatically

Build quality is average


TomTom Spark Display and Design

It comes with a 2-inch monochrome LCD, which is easy to read in almost every condition except at night. You can cover the screen with your palm to turn on the backlight, but it isn’t convenient. With a 168×144-pixel resolution, the display looks sharp and is easily readable.

The strap is made of rubbery silicone that sometimes gets sweaty while working out. It uses a watch-style clasp with 3 locks, which is great for a safe and comfortable fit. The tracker looks quite robust and heavy, but it weighs just 47 grams. The face measures 2.7×1.5×0.5 inches.

TomTom Spark Features and App

The TomTom Spark offers a bunch of sports-dedicated features along with music playback, GPS and a heart rate sensor. The only complaint I have is with the smartphone application, as it has very limited options and customization. You have to pair your tracker with your Mac or PC using TomTom MySports Connect to sync your activities and customize them.

It supports a wide range of activities: Cycle, Freestyle, Gym, Indoor, Run, Swim and Treadmill. Each of these activities allows you to customize your goals and settings. For example, you can adjust the length of the pool for swimming, set a distance for running, or even set the wheel size of your cycle.

TomTom Spark Performance

While running, it records heart rate, GPS-tracked distance, calories burned and time. It offers a lot of training options, which can be accessed from the app itself. It alerts you with vibration feedback when you hit your goal or finish a lap. You can set pace and heart-rate zones, and the Spark will notify you if you move outside them.

It also allows you to race against your previous run, which means if you run on a particular route every day, you can set it to race against the same time and pace again. This obviously helps you know your progress without even looking at the stats.

In terms of accuracy and stats, TomTom Spark is very impressive if you actually use all its functions.

TomTom Spark Battery

The battery life depends entirely on how you use the watch. If you are an avid user and use all the features, like music, GPS and the heart-rate sensor, for one hour each day, it won’t last more than 5-6 days. If you only use it to keep track of your activities and don’t use it very often, it may last up to 2 weeks.


The TomTom Spark Cardio + Music is a complete sports watch with some exciting features that keep you entertained while running or working out and help you keep better track of your activities. But is it good enough to beat the competitors in the same price segment? I think that depends on what you want from your sports watch. If you are a hardcore fitness freak, this is the one for you. If you care more about looks and want a simple app experience, you might find it slightly below your expectations.


DJI Spark Drone Review: A Powerful Little Flying Machine For The Average Person


The Spark is about 5.6-inches on each side if you don’t count the reach of the folding propellers. It retails for $499. Stan Horaczek

Unless you’re a creative professional or need to inspect remote power lines without getting electrocuted, you probably don’t need a drone. That’s what makes buying one so tricky. The best consumer drone on the market right now—at least in my opinion—is the DJI Mavic Pro, but the $1,000+ price tag (which only goes up when you consider the mostly-necessary add-ons) reserves it for serious enthusiasts or the top-hat-and-monocle set for which $1,000 wouldn’t even cover their weekly caviar budget.

Enter the DJI Spark. Not only does it try to bring the price down closer to impulse purchase territory, but it also tries to reduce the learning curve required to send a craft up in the air.

At a $500 base price, this 10.5-ounce drone ports over some of the popular features from its higher-end sibling and adds some consumer-friendly tweaks to make it more accessible for the average consumer pilot who is trying to chase their kids around the park rather than make a major motion picture. Plus, it has gesture control that allows you to command this little aircraft simply by waving your hand, which has done wonders for our Darth Vader impression.


Setting up the Spark drone is a relatively simple process that takes about 10 minutes once you have the battery charged and the DJI app downloaded. Sync up a smartphone with the Spark’s ad hoc wifi signal and you’ll soon see a remote feed from the drone’s built-in camera.

Setup will, however, require that you register your new flying machine with DJI. Recently, the FAA stopped requiring users to register small craft like this, and DJI responded by implementing restrictions of its own. If you don’t agree to the terms of a DJI account, functionality like range and top speed is restricted.

You can use the smartphone app to fly the Spark or spring for the optional dedicated controller, which will set you back an extra $150. You’ll also want to make sure the Spark has the latest firmware installed.

This carrying case is part of the $699 Fly More Combo, which comes with extra propellers (shown), an extra battery, propeller guards, a dedicated battery charger, and the $150 remote. Stan Horaczek


Having flown just about every consumer-oriented model on the market and even a few high-end models, I contend that drones are still clunky to fly. Practice helps, of course, and there are some very talented pilots out there, but the process of learning to effectively operate a drone is often a tedious endeavor, experienced in 12 minute bursts between bouts of battery charging. That’s what makes the idea of a simple drone like the Spark or its main competitor, the Yuneec Breeze, so appealing.

But happily, getting the Spark off the ground is simple. The relatively low maximum speed of about 13 miles per hour (it can climb to 31 MPH if you add the optional remote) keeps things moving at a manageable pace, even when you try to push it. Using the sticks of the dedicated wireless controller feels a lot more precise than the app, but it also conflicts with the overall simplicity of the product, so I found myself preferring the app, even though it’s certainly less precise and responsive.

To further lower the learning curve, DJI has brought its Quick Shot modes, which automatically perform complex maneuvers without any input from the pilot. For instance, the Dronie (the name is a mashup of “drone” and “selfie”) mode locks onto a subject, then flies away and up to create a shot like what you might expect to see at the end of a movie. Rocket sends the drone straight up over a subject to its maximum height, and Circle does loops around whatever it’s tracking. Helix mode might be the most interesting because it sends the drone in an expanding spiral while also climbing in elevation. It’s impressive to look at and would require serious piloting skills to pull off in manual mode.

These effects really do look impressive once you review the footage, but most of them require tons of open space. Even in a park full of sprawling soccer fields, the fly away and Helix shots were a little nerve-racking because the drone goes so far. You can stop the flight path early, obviously, but there’s a learning curve for gauging the proper venue for each shot mode.

Object tracking works well, too, even if it does sometimes go through a false start. I’d like to be able to track a little closer than the minimum range, but then I also don’t want to crash a drone into a running subject, so I understand the motivation.

The total range on the Spark lets it go 262 feet away and 164 feet in the air when controlling it with the app. You can greatly extend the range to 1.2 miles by using the optional remote, but the 12 minutes of effective battery life make that seem a little too far for comfort. For most purposes, the range with the smartphone app felt more than sufficient.

The Spark has a 12-megapixel camera with several shooting options, including three-shot bursts and interval capture for creating basic time lapse videos. Stan Horaczek

The Camera

The efficacy of the gimbal (the rotating mount that gyroscopically stabilizes the camera) would almost be surprising if DJI didn’t already have a very solid reputation in that arena. It needs some adjustment to get things perfectly level before take off, and you may notice a little tilt in the horizon on some of my sample videos, but it does a solid job of keeping things steady once you’re up and running. Changing the angle of the camera, however, is a little jerky, especially if you’re using the app, so it’s best to set that before you start filming rather than adjusting on the fly, so to speak.

There’s a camera on the bottom of the drone to help it hover in place, a feature with which I was very impressed. I expected stability from a DJI drone, and I got it.

The 3D object avoidance system is on the front of the drone, so if you fly directly toward an object like a wall or a tree, it will avoid it with aplomb. But if you start flying it around backward or even sideways, you’re free to smash it into whatever object you want. This isn’t so much an issue if you’re keeping control over the craft, but it makes me a little wary of trying things like having it follow a subject running along a trail in the woods.

Gesture control

Controlling the Spark with the wave of a hand was a big selling point when it was first announced and it really does look cool the first couple times you do it. Landing it in your hand is also a pretty nifty trick. After regular use, though, it’s clear that gesture control isn’t quite ready for many practical uses. You have to move slowly and in a pretty exaggerated fashion for it to really work, and even then, it has to be in very close range. It’s cool, but it’s not quite ready to redefine the drone experience just yet. However, DJI has a history of improving features with firmware updates, so I wouldn’t be surprised to see the gesture control improve considerably, even before the next generation comes along.

Everything else

As a package, the Spark has a lot going for it. Each battery claims about 16 minutes of juice, and that actually seems pretty accurate in terms of total runtime. When you get to around four minutes left, the app will start squawking at you to bring your craft home, so it’s better to plan for around 10-12 minutes per charge if you don’t want to be biting your nails as it tries to land before the last seconds tick off the battery clock. Other things like extensive maneuvering and strong wind can also have an effect on battery drain.

Twelve minutes of effective flight time might seem short, but that’s the nature of these little flying machines. The Mavic Pro promises 27 minutes on a single battery, but that’s also a bigger, more expensive craft. That said, if you’re going to drop $500 on the Spark, I would automatically factor in the cost of at least one extra $50 battery, if not more.

The GPS and location settings worked well and I didn’t have any runaway drone scares, which I’ve had in the past. The app is diligent about alerting you about flight restrictions in your area. I found that most places have some kind of alert, though I was still able to fly without a problem. One thing that takes some getting used to is what happens when you push the “return home” button. Rather than flying straight to its home base, it first climbs to a higher elevation, which can be a problem if you’re in an area with lots of obstacles like trees or utility wires.

While the Spark’s small size makes it easy to carry around, it doesn’t do much to quiet the horrible buzzing sound produced by the propellers. So, while it’s nimble enough to fly indoors, it certainly isn’t inconspicuous. Even when it’s high above your head, you can still hear a pronounced whir from the propellers.

The form factor can make it look like a toy, but it feels sturdy and substantial in your hand. You can have the drone take off and land from your hand as well, but I found the process a little finicky and had better luck just using the ground or some other flat, stable surface. Stan Horaczek


I genuinely enjoyed my time with the DJI Spark. The app feels a little rough around the edges at times (the adjustment slider for the camera angle drove me nuts), but the overall package is much closer to the balance of quality and price that makes sense for the average consumer. If you’re enchanted by the gesture control, I’d recommend you temper your expectations a little, at least for now, since future firmware updates could improve it considerably.

If you’re going to pull the trigger, I’d also recommend thinking long and hard about upgrading to the Fly More Combo, which comes with a battery charger, an extra battery, propeller guards, a carrying case, and the dedicated controller, which is $150 all on its own.


App compatibility: iOS and Android

Battery life: 16 minutes (max)

Price: $499 (drone only)

Rating: 4/5

Play Your Music Wirelessly With The Blackberry Music Gateway

What is Blackberry Music Gateway?

Previously, we had to connect our devices to a speaker with a cord to hear the music. Many times, this prevented us from charging the device, taking it into the other room to answer an email, or doing anything else with it. Cords tied the device down, forcing you to interrupt the music session whenever you needed it. While this may be just fine for someone blasting music to their own delight at home, for individuals at parties it just isn’t acceptable. Blackberry Music Gateway solves this problem by acting as a connection portal (or Gateway) attached to the speakers, linking the device that holds the music with the gadget that plays it.

The good thing about this device is that you don’t need to own a Blackberry to connect to it. As long as your mobile device supports Bluetooth or NFC, you can easily pair it with the Gateway and blast your music wirelessly. And yes, when I say “mobile devices”, that includes your laptop as well.

How Do I Set it Up?

First off, you need to connect your device to the Gateway before connecting to the speakers. To do this, make sure your Gateway is plugged into some sort of power source. This can be a laptop if you are on the go (through the USB port), your car headphone jack, or even an outlet if you’re stationary. From there, you should see a light come on.

You should expect to see a blue, red, or green light at any time through the life of your Blackberry Music Gateway.

Green always means that it’s powered on.

A red light always means you don’t have a connection. If it’s flashing quickly, this should be an alert to you that you lost your connection. If blinking slowly, this simply means you haven’t attempted a connection just yet.

The next color is blue. If it is blinking fast, then good news: you’re connected. If not, it’s still good news, because it means your Gateway is playing a song or sound. If blue is mixed with red, the Gateway is currently attempting to connect.

After the Gateway is plugged into a power source, press the top of the Gateway device. Then go to your Blackberry (or any other mobile device) and activate Bluetooth. Once Bluetooth is activated, you should be able to see the name “BlackBerry Music Gateway” in the list. Once it is selected and paired, the light should be blue.

To pair with NFC, Near-Field Communication, power the Gateway on, press the top of the Gateway, and activate NFC on your BlackBerry. From there, tap the Blackberry on top of the Music Gateway to activate.

My Experience with Music Gateway

During my week with the Blackberry Music Gateway, I found it to be a device fitting for a get-together or even while on the road. When testing it out on multiple platforms (in the car, through a television, and traditional speakers), I found it to be quite useful. iPhone and Mac games were able to have a more amplified sound when being hooked up to the speakers, increasing the gaming experience. When testing it on the Blackberry, that’s where I truly was able to appreciate the freedom of not being tied down by cords when listening to music and doing other tasks.


Do you need a Blackberry to get the best experience? No, not at all. RIM found a great way to make this platform-blind. Consumers will still not have the device in mind, though, because of the Blackberry name. Many will say, “I don’t have a Blackberry, so why would I need the Blackberry Music Gateway?” This wasn’t necessarily done on purpose or by accident by RIM. The device is made for Blackberry devices, and the ability to work with other devices is just a coincidence. That being said, paying $50 to not be tied down by cords while using your Bluetooth device is still something I see as not a bad purchase. Let’s just say, once Apple introduces an NFC iPhone, you’ll probably appreciate the purchase even more!

The Blackberry Music Gateway is sold on the RIM website for $49.95.

Ari Simon

Ari Simon has been a writer with Make Tech Easier since August 2011. Ari loves anything related to technology and social media. When Ari isn’t working, he enjoys traveling and trying out the latest tech gadget.


A Beginners Guide To Spark Dataframe Schema

This article was published as a part of the Data Science Blogathon.


Datatypes in Spark

Spark supports all the traditional data types like String, Long, Int, Double, etc. A lot of documentation is available on them online. I will speak of 2 particular datatypes in Spark which are Struct and Array, which are very useful while working with semi-structured data like a JSON or an XML. In this guide, I will take a sample JSON and show you how a schema can be manipulated in Scala to handle data better when it is too complicated to work with.

So, let’s start with creating a JSON record and create a DataFrame from it.

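import org.apache.spark.sql.types._

val sample_json_1="""{"a":{"b":"1","c":"2","d":[1,2,3,4,5],"e":{"key1":"value1","key2":"value2","key3":"value3"}},"d":1,"g":2}"""
// Assumed reconstruction: build the DataFrame from the JSON string (spark-shell, where spark.implicits._ is in scope)
val df_a=spark.read.json(Seq(sample_json_1).toDS)
val sch_a=df_a.schema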

Now that we have the data frame and its schema, let’s see what operations are allowed on the schema.

Struct Data Types of Spark

A schema is a Struct made up of a list or array of StructFields. Struct is a data type defined as StructType in the org.apache.spark.sql.types package, and StructField is defined in the same package. As per Spark’s official documentation, a StructField contains a lot of attributes, but we will focus on just 3 of them: field name, field data type, and field nullability. The field name and field data type attributes are mandatory. To know all the attributes that it supports, please refer to the official documentation. Now let’s start with the operations that can be done on a schema.

Listing fields, Their Names, and Indexes

You can get the list of all field names from the schema using either the “names” or “fieldNames” method on the schema as shown below

sch_a.names
sch_a.fieldNames

Sometimes the field names alone are not enough, since the schema holds all the information about those fields. To get the fields with all their attributes, use the “fields” method.


To get the index of the field in the schema, “fieldIndex” can be used.
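As a rough sketch of the two lookups just described, the snippet below builds a small schema by hand and calls “fields” and “fieldIndex” on it. The name demoSchema and its contents are hypothetical stand-ins for sch_a (only the top-level names “a”, “d” and “g” are borrowed from the sample JSON), so the same calls should work unchanged on the real schema.

```scala
import org.apache.spark.sql.types._

// Hypothetical hand-built stand-in for sch_a (top-level fields "a", "d", "g").
val demoSchema = StructType(Seq(
  StructField("a", StructType(Seq(StructField("b", StringType)))),
  StructField("d", LongType),
  StructField("g", LongType)
))

// fields returns an Array[StructField]; each entry carries the name,
// data type, nullability and metadata of one field.
val allFields: Array[StructField] = demoSchema.fields
allFields.foreach { f =>
  println(s"${f.name}: ${f.dataType.simpleString}, nullable=${f.nullable}")
}

// fieldIndex returns the zero-based position of a field and throws an
// IllegalArgumentException if no field has that name.
val indexOfG: Int = demoSchema.fieldIndex("g")
println(indexOfG)
```

Because “fields” is an ordinary array, the index from “fieldIndex” can be used to pick out a single StructField, as in demoSchema.fields(indexOfG).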


DataTypes in StructFields

As mentioned earlier, a StructField contains a data type. This data type can itself contain many fields, each with its own data type, as we will see later in the guide. To get the data type of a field in the schema, read the “dataType” attribute of its StructField.


Data types can also be rendered as simple strings that can be used with Hive. These strings can be used as is while writing the create table statement to define the data type of the columns in that Hive table.
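The sketch below shows one way to read a field’s data type and print it as a simple string. It assumes the “simpleString” method is what the text has in mind (“catalogString” is a closely related method on the same class), and demoSchema is again a hypothetical hand-built schema, not the one inferred from the sample JSON.

```scala
import org.apache.spark.sql.types._

// Hypothetical schema with one nested struct and one plain column.
val demoSchema = StructType(Seq(
  StructField("a", StructType(Seq(
    StructField("b", StringType),
    StructField("d", ArrayType(LongType))
  ))),
  StructField("g", LongType)
))

// Two equivalent ways to reach a field's data type:
// by name through apply, or by position through fields and fieldIndex.
val typeOfA: DataType = demoSchema("a").dataType
val typeOfG: DataType = demoSchema.fields(demoSchema.fieldIndex("g")).dataType

// simpleString renders the type in a compact, Hive-style notation.
println(typeOfA.simpleString) // struct<b:string,d:array<bigint>>
println(typeOfG.simpleString) // bigint
```

Note that a Spark LongType is printed as bigint, which is the name Hive uses for 64-bit integers.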


You can also go look for the first element in the schema using head operation.

sch_a.head

Adding New Fields to Schema

To add a new field to the schema, you can use either the “add” method or the shorthand “:+”, as shown below

// Option 1: the "add" method
val add_field_to_schema=StructType(sch_a.add(StructField("newfield",StringType)))
// Option 2: the ":+" shorthand (same result; run one or the other)
val add_field_to_schema=StructType(sch_a:+StructField("newfield",StringType))

Deleting a Field From Schema

To remove a field from the schema “diff” method can be used as shown below. The field “g” is being removed from the schema.

val del_field_from_schema = StructType(sch_a.diff(Seq(StructField("g",LongType))))

Concatenating 2 Schemas

Let’s say you have 2 schemas and you need to merge them into a single schema. To do so, follow the code block below, where I create an additional DataFrame so that we have 2 schemas to merge.

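val jsonb="""{"h":"newfield","i":"123","j":45.78}"""
// Assumed reconstruction: build the second DataFrame the same way as df_a
val df_b=spark.read.json(Seq(jsonb).toDS)
val sch_b=df_b.schema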

Fig 7 Creating a new data frame so that both schemas can be merged

Now, to merge the schema of “df_a” and “df_b” data frames, the “++” operator can be used. It can also be used to merge 2 lists or arrays.

val merge_two_schema=StructType(sch_a++sch_b)

Working with Subfields

Now with the basic operations taken care of, let’s take an example where we want to read the sample JSON (sample_json_1) into a data frame but keep the field “a.e” as a string. When we first read the sample JSON into a data frame, the field “a.e” was read as a Struct.

This can be done simply by changing the data type of the said field in the schema. To start with, if you look carefully at the output of the “sch_a.fields” output, you can see, there are only 3 elements namely “a”, “d” and “g”. Each element is a StructField and so consists of a field name and field datatype. The datatype of field “a” is StructType, which contains its children. So to make any changes to the data type of field “a.e”, we have to first get datatype of field “a”

val subfield_a=sch_a.fields(sch_a.fieldIndex("a")).dataType.asInstanceOf[StructType]

Now that we have the subfields of field “a” separated in a variable, we can update the data type of field “e”. Unfortunately, there is no direct way of doing that, so we will remove the element first and then add it back with the new data type.

val subfield_a_datatype_e=subfield_a.fields(subfield_a.fieldIndex("e")).dataType
val subfield_a_updated=StructType(subfield_a.diff(Seq(StructField("e",subfield_a_datatype_e))):+StructField("e",StringType))

This newly updated subfield must now be written back into the parent schema as the data type of field “a” before the schema is used to read the sample JSON, so that the change takes effect

val sch_a_update=StructType(sch_a.diff(Seq(StructField("a",subfield_a))):+StructField("a",subfield_a_updated))

When the JSON is read with the modified schema, the JSON values in the “a.e” field remain unparsed and are read as a string instead of a Struct
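To make the difference concrete, here is a small self-contained sketch that reads the same record twice, once with the inferred schema and once with a schema that declares “a.e” as a string. The names demoJson, stringESchema, inferred and forced are hypothetical, the schema is written out by hand rather than derived with “diff”, and the local SparkSession is only there so the snippet can run outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("schema-demo").getOrCreate()
import spark.implicits._

// Shortened, hypothetical version of the sample record.
val demoJson = """{"a":{"b":"1","e":{"key1":"value1","key2":"value2"}},"g":2}"""

// Hand-written schema in which a.e is a plain string instead of a struct.
val stringESchema = StructType(Seq(
  StructField("a", StructType(Seq(
    StructField("b", StringType),
    StructField("e", StringType)
  ))),
  StructField("g", LongType)
))

// Same input, two reads: inferred schema versus the supplied schema.
val inferred = spark.read.json(Seq(demoJson).toDS)
val forced = spark.read.schema(stringESchema).json(Seq(demoJson).toDS)

inferred.select("a.e").printSchema() // e is reported as a struct
forced.select("a.e").printSchema()   // e is reported as a string

// With the supplied schema the nested object comes back as raw JSON text.
val rawE: String = forced.select("a.e").head.getString(0)
println(rawE)
```

Keeping a nested object as a string like this can be handy when its keys vary from record to record and you would rather parse it later.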


JSON of Schema

There are 2 more methods that I would like to mention: “json” and “prettyJson”. Both convert the Struct value into JSON, and I have found them helpful in different use cases. You can explore them as well.
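A minimal sketch of the two methods follows, using a hypothetical two-column schema named demoSchema. It also shows DataType.fromJson, which is not covered in this guide but is the usual way to turn the JSON string back into a schema, for instance when a schema is stored in a file.

```scala
import org.apache.spark.sql.types._

// Hypothetical schema used only for this illustration.
val demoSchema = StructType(Seq(
  StructField("h", StringType),
  StructField("j", DoubleType)
))

// json gives a compact single-line string; prettyJson gives an indented one.
val compact: String = demoSchema.json
val pretty: String = demoSchema.prettyJson
println(compact)
println(pretty)

// Round trip: rebuild the schema from its JSON representation.
val restored: StructType = DataType.fromJson(compact).asInstanceOf[StructType]
println(restored == demoSchema)
```

The compact form is convenient for storing a schema alongside the data, while the pretty form is easier to read when checking a deeply nested schema by eye.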

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Apple Music First Look

Apple Music first look – Spotify threat but questions linger

Start Apple Music for the first time, and you’re faced with a ball pond of musical genres from which you’re to choose your favorites. They help shape the first tab in the UI, “For You”, after which point the music you play, the playlists you create (complete with custom artwork, if you want), and the tracks you save as favorites refine it further.

Apple is leaning hard on the idea of human curation rather than an AI being left to juggle tracks, however. In fact, there are various different types of human interaction: in the “For You” tab, for instance, playlists follow the lines of “Inspired by…” or “Bring the Big Rock” and are curated by in-house experts. “Bring the Big Rock,” for instance, is the handiwork of “Apple Rock” who, despite the generic name, is an actual person who works at the company.

Those Apple Music Editors will be responsible for hundreds of playlists in each category of music, organized by activities like BBQing and Commuting. The latest playlist will appear at the top of the list, and if you follow a genre you’ll also get a notification added to your Connect timeline about any new content.

Then there are “Curators” who are experts but outside of Apple. Exactly who that list will include is still unconfirmed, but expect names in the manner of music site Pitchfork.

The Radio tab is where Apple’s flagship station, Beats 1, lives. Live, 24/7, and across more than 100 countries around the world, it’ll be run by DJs including Zane Lowe in Los Angeles, Ebro Darden in New York, and Julie Adenuga in London. Contrary to the new breed of on-demand stations, Beats 1 is a return to the traditional days of radio: everybody hearing the same thing, at the same time.

Currently, Beats 1 is playing a prerecorded clip of Zane Lowe, and eventual programming is yet to be confirmed. Also unclear at this point is how Apple will signpost upcoming content on Beats 1: right now, there’s no schedule of shows, so catching your favorite genre, track, or an interview with your artist of choice is a matter of luck.

I’d be surprised if Apple didn’t add some sort of alerts system eventually, perhaps tied into your choice of artists included in your Connect timeline.

There are other Featured Stations within the Radio tab, basically iTunes Radio stations. Quite a lot of Apple Music is recognizably pulled from earlier audio offerings from the company, in fact; from an artist or track, for instance, you can start a radio station or create a Genius Playlist.

Connect, the next tab along, is one of those things the success of which hinges on how eagerly artists embrace it. In effect it’s a mini mailing list, through which they can share anything from a thought-of-the-day status update to photos, audio clips, and videos.

Artists themselves will need to be verified by Apple before they can claim a Connect page. Once that’s done, they’ll get an extra “Post Status” button in the Apple Music app, along with the option to also share across other social networks like Facebook and Instagram. What it doesn’t do is the reverse: Apple is clearly positioning Connect as the home for artist updates, rather than just another spoke in their outreach strategy.

As for sharing content, what the recipient will be able to do with that depends not only on whether they have Apple Music themselves, but also if it was a status produced by Apple or the artist.

Say, for instance, you want to share an album with a friend via email. They’ll get a link which, if they have Apple Music installed, will automatically open that album in the app. If they don’t have it, they’ll get a preview webpage – though not a playable version of the audio, not even the 30-second snippet iTunes offers before you buy – and a link to download Apple Music.

If, though, you share a Connect update shared by a musician themselves – a behind-the-scenes photo of a setlist, perhaps – then that will apparently be visible, since it’s the artist’s content not Apple’s. While I can understand the distinction from a licensing point of view, I can also see it being a little confusing to users at first.

The final tab is My Music, and of all Apple’s varied attempts at music services and apps before, it’s probably the best I’ve seen in terms of straightforward usability. Here, purchased music local to the phone, saved tracks streaming from Apple Music, and offline tracks from Apple Music all co-mingle in a single list, with an overarching search to help navigate through them.

Offline playback is one of the most useful features of Spotify and other streaming music services, particularly if your commute takes you out of cellular coverage. Right now, you can flag a track for offline use by hitting the “Make available offline” button; tracks are thus available in perpetuity, as long as you have an active subscription.

Unfortunately, you can only have offline playback on a single device, though currently you can have five devices connected to a single-user Apple Music account.

There are still more than a few questions lingering overall. Apple isn’t saying exactly how many songs will be available, only that it’s “millions”, nor will it say how much the streaming catalog overlaps with the tracks available to download from iTunes. Supposedly most of the catalog is in there, but it’s an evolving thing and, just as Netflix sometimes gains content but loses other movies, it will change over time.

Apple isn’t talking about what partnerships, if any, there might be in the works. One of the reasons I like Spotify, for instance, is that it integrates so tightly with Sonos, but it’s unclear at this point whether that will be the case for Apple Music. You’ll get Bluetooth streaming, and versions for Windows and – eventually – Android, but they’re the only certainties right now.

Update: Apple Music will be at 256 kbps. In comparison, Beats Music uses a 320 kbps bitrate, as does Spotify, while Tidal offers a high-bitrate option.

Apple Music feels familiar, which is probably a good thing. Much of it feels educated and shaped by previous Apple experiments: iTunes Radio, Genius, iTunes Match, even the little-loved Ping. While there’s something to be said about the joy of venturing out and discovering new music, there’s also a lot to be said for simply knowing where to go to find a reliable and entertaining soundtrack to your day.

If its three month trial is sufficient to convince the ears that having people shaping playlists is delivering better mixes than an algorithm alone might, then rivals will have to raise their own game to compete. Otherwise, Apple is counting on users wanting to plunder the majority of its music catalog for a set monthly fee, and not being bothered to go comparison shopping across alternatives such as Spotify.


A Comprehensive Guide To Apache Spark RDD And PySpark

This article was published as a part of the Data Science Blogathon


Hadoop is widely used in the industry to examine large data volumes. The reason for this is that the Hadoop framework is based on a basic programming model (MapReduce), which allows for a scalable, flexible, fault-tolerant, and cost-effective computing solution.


Apache Spark is an innovative cluster computing platform that is optimized for speed. It is based on Hadoop MapReduce and extends the MapReduce architecture to be used efficiently for a wider range of calculations, such as interactive queries and stream processing. Spark’s key feature is in-memory cluster computing, which boosts an application’s processing speed.

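Components of Apache Spark

Apache Spark Core

Spark Core is the general execution engine on which the other components are built; it provides the in-memory computing and scheduling that they rely on.

Apache Spark SQL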

Spark SQL is a component built on top of Spark Core that introduces SchemaRDD, a new data abstraction that supports structured and semi-structured data.

Listed below are the four libraries of Spark SQL.

DataFrame API

Interpreter & Optimizer

SQL Service

Data Source API

Spark Streaming

To execute streaming analytics, Spark Streaming makes use of Spark Core’s quick scheduling functionality. It ingests data in mini-batches and transforms it using RDD (Resilient Distributed Dataset) transformations. The DStream is the most basic stream unit; it comprises a sequence of RDDs that process real-time data.
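The mini-batch idea can be illustrated without a cluster. The following is a plain-Python sketch, not Spark code: the helper `micro_batches`, the sample events, and the batch size are hypothetical names chosen for illustration, and real Spark Streaming forms batches by time interval rather than by item count. It groups an incoming stream into small batches and applies the same transformation to each one, loosely mirroring how a DStream is a sequence of RDDs.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical incoming events; in a real job these would arrive over time.
events = ["a", "b", "a", "c", "b", "a", "d"]

# Size of every batch, then the same transformation applied batch by batch.
batch_sizes = [len(batch) for batch in micro_batches(events, 3)]
upper_batches = [[e.upper() for e in batch] for batch in micro_batches(events, 3)]

print(batch_sizes)
print(upper_batches)
```

Each batch is processed independently with the same logic, which is the property that lets Spark reuse its batch engine for streaming workloads.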

MLlib (Machine Learning Library):

MLlib is Spark’s collection of machine learning libraries. Thanks to Spark’s distributed, memory-based architecture, MLlib works as a distributed machine learning framework. The MLlib developers have benchmarked it against Alternating Least Squares (ALS) implementations.


GraphX:

GraphX is a Spark-based distributed graph processing framework. It provides an API for expressing graph computation that can model user-defined graphs using the Pregel abstraction API, along with an optimized runtime for this abstraction.

Installation of Apache Spark:

We’ll need to go through a few steps to get started with Apache Spark and the PySpark library. If you’ve done nothing like this before, it can be a little perplexing, but don’t fear. We’ll make it happen.

Installation Prerequisites:

One of the prerequisites for installing Spark is Java. The first step in getting Apache Spark and PySpark fully operational is to make sure we have everything we need: Java 8, Python 3, and the ability to extract .tar files are all required.

Let’s look at what Java version you have installed on your desktop computer. If you’re using Windows, open the Command Prompt by going to Start, typing cmd, then pressing Enter. Type the following command there:

$java -version

Followed by the command;

$javac -version

If you don’t already have Java and Python installed on your computer, install them before moving on to the next step.

Download and set up path

1) Verifying Scala and Spark installation:

For Linux-based systems:

The following steps show how to install Apache Spark on a Linux-based system.

Download the Scala tar file from the Scala download page, then extract it with the following command.

$ tar xvf scala-2.11.6.tgz

Scala software files:

To move the Scala software files to the directory (/usr/local/scala), use the commands below.

$ su -
Password:
# mv scala-2.11.6 /usr/local/scala
# exit

Set PATH for Scala;

The command to set PATH for Scala:

$ export PATH=$PATH:/usr/local/scala/bin

Scala Installation Verification:

It’s a good idea to double-check everything after installation. To check if Scala is installed, run the following command.

$ scala -version

Scala installation in Windows:

Open a command prompt and type cd to go to the bin directory of the installed Scala, as seen below.

This is the scala shell, where we may type programs and view the results directly in the shell. The command below can check the Scala version.

Downloading Apache Spark

Visit the Spark download page to get the most recent version of Spark. We’ll be using the spark-1.3.1-bin-hadoop2.6 version for this guide. Once the download finishes, you can find the Spark tar file in your download folder.

Extract the downloaded file into that folder. The next thing you need to add is the winutils.exe file for the underlying Hadoop version that Spark will use.

The command for extracting the Spark tar file is:

$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz

Moving the Spark files:

The instructions below will move the Spark software files to the directory (/usr/local/spark).

$ su -
Password:
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit

Setting up the environment for Spark:

Add the following line to the ~/.bashrc file. It appends the location of the Spark program files to the PATH variable.

export PATH=$PATH:/usr/local/spark/bin

Command for sourcing the ~/.bashrc file :

$ source ~/.bashrc

Spark Installation verification:

Write the following command for opening the Spark shell.

$ spark-shell

Apache Spark launch

Let us now launch our Spark to view it in all of its magnificence. To run Spark, open a new command prompt and type spark-shell. Spark will be up and running in a new window.

What exactly is Apache Spark?

Apache Spark is a data processing framework that can handle enormous data sets quickly and distribute processing duties across many computers, either on its own or with other distributed computing tools.


PySpark is the Python interface to Apache Spark. It is an excellent tool for performing large-scale exploratory data analysis and for building machine learning pipelines and data platform ETLs. PySpark is well worth learning if you’re already familiar with Python and libraries like Pandas, since it will help you construct more scalable analytics and pipelines. This post shows how to get started with PySpark and execute typical tasks.

PySpark Environment:

There are a few different ways to get started with Spark:


You can create your cluster using bare metal or virtual computers. For this option, Apache Ambari is a valuable project, but it’s not my preferred method for getting up and running quickly.

Most cloud providers have Spark clusters:

AWS offers EMR and Google Cloud Platform has DataProc. DataProc is a faster way to an interactive environment than self-hosting.

Spark solutions are available from companies such as Databricks and Cloudera, making it simple to get started with Spark.

It’s simple to get started with a Spark cluster and notebook environment in the Databricks Community Edition. I built a cluster with the Spark 2.4 runtime and Python 3. To run the code that uses the Pandas UDFs feature, you’ll need at least Spark version 2.3.

How to import Apache Spark in the notebook?

To use PySpark in your Jupyter notebook, simply run the following command to install the PySpark pip package:

pip install pyspark

The same command also works on Kaggle: just type “pip install pyspark” and Apache Spark will be installed and ready to use.

Python will work with Apache Spark because it is on your system’s PATH. If you wish to use something like Google Colab, run the following block of code, which will automatically set up Apache Spark:

!tar xf spark-3.0.3-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.0.3-bin-hadoop2.7"
import findspark
findspark.init()

Apache Spark Dataframes

The Spark data frame is the most important data type in PySpark. This object functions similarly to data frames in R and Pandas and can be thought of as a table dispersed throughout a cluster. If you wish to use PySpark for distributed computation, you’ll need to work with Spark data frames rather than conventional Python data types.

Operations in PySpark are postponed until a result is actually needed in the pipeline. For example, you can define operations for importing a data set from S3 and applying a variety of transformations to the data frame, but these operations are not applied right away. Instead, a graph of transformations is recorded, and when the data is actually needed, for example when writing the results back to S3, the transformations are applied as a single pipeline operation. This approach avoids pulling the entire data frame into memory and allows for more efficient processing across a cluster of machines. With Pandas data frames, by contrast, everything is pulled into memory and every operation is applied immediately.
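The difference between recording a transformation and running it can be shown with a small plain-Python model. This is only a sketch of the idea, not how Spark is implemented: the class `LazyPipeline` and the helper `double` are hypothetical names invented for this example. Transformations only append to a list of steps, and nothing is computed until the action `collect()` is called.

```python
class LazyPipeline:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new pipeline, nothing is computed yet.
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also just recorded.
        return LazyPipeline(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: apply every recorded step, in order, in a single pass.
        rows = iter(self._source)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records every element that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyPipeline([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # no work has happened so far
result = pipeline.collect()       # the action triggers the whole pipeline

print(calls_before_action, result, len(calls))
```

Because the whole chain is known before any work starts, an engine built this way is free to plan the steps together instead of materializing every intermediate result.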

Apache Spark Web UI–Spark Execution

To monitor the progress of your Spark/PySpark application, resource consumption of Spark cluster, and Spark configurations, Apache Spark provides a set of Web UI/User Interfaces (Jobs, Stages, Tasks, Storage, Environment, Executors, and SQL).

These user interfaces are useful for better understanding how Spark runs the Spark/PySpark Jobs. Your application code is a set of instructions that tells the driver to perform a Spark Job and then lets the driver decide how to do so using executors.

Transformations are the instructions given to the driver, and an action is what causes the transformations to run. Here, we’re reading a CSV file and checking the DataFrame’s count. Let’s have a look at how the Spark UI renders an application.

By default, Spark includes an API for reading delimiter files, such as comma, pipe, and tab-separated files, as well as many options for handling with and without headers, double quotes, data types, and so on.

The Spark UI is divided into the tabs below.

Spark Jobs







RDD Programming with Apache Spark

Consider the example of a word count, which counts each word in a document. Consider the following text as input, which is saved in the home directory as an input.txt file.

input.txt − input file.

“Watch your thoughts; they become words. Watch your words; they become actions. Watch your actions; they become habits. Watch your habits; they become character. Watch your character; it becomes your destiny.”

Create RDD in Apache Spark:

Let us create a simple RDD from the text file. Use the following command to create a simple RDD.

Word count Transformation:

The goal is to count the number of words in a file. Create a flat map, flatMap(line ⇒ line.split(" ")), to separate each line into words.

We execute the word count logic using the following command. Because this is not an action, but a transformation (pointing to a new RDD or telling Spark what to do with the data), there will be no output once you run it.
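To make the word count logic concrete, here is a plain-Python model of the three steps (flatMap, map, then a reduce by key) applied to the sample text above. It is a sketch of what the pipeline computes, not Spark code, and the variable names are chosen for illustration only. Splitting on a single space keeps punctuation attached to words, just as a bare split(" ") would.

```python
from collections import defaultdict
from itertools import chain

# The sample input, one sentence per line.
lines = [
    "Watch your thoughts; they become words.",
    "Watch your words; they become actions.",
    "Watch your actions; they become habits.",
    "Watch your habits; they become character.",
    "Watch your character; it becomes your destiny.",
]

# flatMap step: split every line into words and flatten into one sequence.
words = list(chain.from_iterable(line.split(" ") for line in lines))

# map step: pair every word with the value 1.
pairs = [(word, 1) for word in words]

# reduceByKey step: add up the values that share the same key.
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(len(words))
print(counts["Watch"], counts["your"], counts["they"])
```

In Spark the same three steps are only recorded at this point; the counting itself happens once an action such as saving the result is called.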

Current RDD:

If you want to know what the current RDD is while working with the RDD, use the following command. For debugging, it will display a description of the current RDD and its dependencies.

Persistence of Transformations:

You can use the persist() or cache() methods on an RDD to mark it as persistent. It will be stored in memory on the nodes the first time it is computed in an action. To save the intermediate transformations in memory, run the command below.
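The effect of marking a dataset as persistent can be modelled in a few lines of plain Python. This is a simplified sketch under stated assumptions: `ToyDataset` and `build_words` are hypothetical names, and the model ignores storage levels, eviction, and distribution across nodes. It shows only the core behaviour described above: without caching, every action recomputes the data; with caching, the first action stores the result and later actions reuse it.

```python
class ToyDataset:
    """Tiny stand-in for an RDD: recomputes on every action unless cached."""

    def __init__(self, compute):
        self._compute = compute   # function that builds the data
        self._persist = False
        self._cached = None
        self.compute_calls = 0    # how many times the data was rebuilt

    def cache(self):
        # Only marks the dataset as persistent; nothing is computed yet.
        self._persist = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.compute_calls += 1
        data = self._compute()
        if self._persist:
            self._cached = data   # stored the first time an action runs
        return data

    def count(self):
        return len(self._materialize())

    def collect(self):
        return list(self._materialize())


def build_words():
    return [w.lower() for w in ["Watch", "your", "thoughts"]]


plain = ToyDataset(build_words)
plain.count()
plain.collect()

cached = ToyDataset(build_words).cache()
cached.count()
cached.collect()

print(plain.compute_calls, cached.compute_calls)
```

The uncached dataset is rebuilt once per action, while the cached one is built a single time, which is why persisting pays off when several actions reuse the same intermediate result.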

Applying the Action:

Performing an action, such as storing all transformations, produces a text file. The absolute path of the output folder is passed as a string argument to the saveAsTextFile(” “) method. To save the output to a text file, use the command below. The ‘output’ folder is in the current location in the following example.

Examining the Results:

Open another terminal and go to your home directory (Spark keeps running in the first terminal). To check the output directory, use the instructions below.

The following command is used to see the output from the part-00000 file.


(watch,3) (are,2) (habits,1) (as,8) (beautiful,2) (they, 7) (look,1)

The following command is used to see the output from the part-00001 file.

Output 1:

(walk, 1) (or, 1) (talk, 1) (only, 1) (love, 1) (care, 1) (share, 1)

(1) Create Data Frame:

val data = Seq(
  // ... earlier rows are missing from the source ...
  ("Michael","Rose","","2000-05-19","M",4000)
)
val columns = Seq("firstname","middlename","lastname","dob","gender","salary")
val df = spark.createDataFrame(data).toDF(columns:_*)
... f.split(",") })

Apache Spark RDD Operations

Transformations based on RDDs–Transformations are lazy operations that yield another RDD instead of updating an RDD.

RDD actions are operations that cause RDD values to be computed and returned.

A Spark RDD transformation yields another RDD, and transformations are lazy, which means they don’t run until an action is called on the RDD. flatMap, map, reduceByKey, filter, and sortByKey are some RDD transformations; each returns a new RDD instead of updating the current one.

How to load data in Apache Spark?

map() —

The map() transformation is used to do complex operations, such as adding a column, changing a column, and so on. The output of map transformations always has the same amount of records as the input.

In our word count example, we add a new column with the value 1 for each word; the RDD returns PairRDDFunctions, which contain key-value pairs, with a word of type String as the key and 1 of type Int as the value. I’ve defined the rdd3 variable with type.

flatMap() —

After applying the function, the flatMap() transformation flattens the RDD and returns a new RDD. In the example below, it splits each record in an RDD by space first, then flattens it. Each record in the resulting RDD has a single word.
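The contrast between map() and flatMap() is easiest to see by counting records. The sketch below uses plain Python lists to model the two behaviours; the sample records are hypothetical and no Spark API is involved.

```python
from itertools import chain

# Two hypothetical input records.
records = ["Project Gutenberg", "Alice in Wonderland"]

# map-like behaviour: one output record per input record (each one a list).
mapped = [record.split(" ") for record in records]

# flatMap-like behaviour: the per-record lists are flattened into single words.
flat_mapped = list(chain.from_iterable(record.split(" ") for record in records))

print(len(mapped), mapped)
print(len(flat_mapped), flat_mapped)
```

The mapped result still has two records, while the flattened result has one record per word.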


Filtering records in an RDD is done with the filter() transformation. Here, we keep only the words that begin with the letter “a”.


sortByKey() is a function that allows you to sort your data by key.

The sortByKey() transformation sorts RDD elements by key. We first use the map transformation to change RDD[(String,Int)] to RDD[(Int,String)] and then use sortByKey to sort on the integer value. Finally, foreach with a println statement prints every word in the RDD as a key-value pair together with its count.

//Print rdd6 result to console
rdd6.foreach(println)
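The swap-then-sort trick can be modelled in plain Python as follows. The word counts are hypothetical sample values, and the sketch only mirrors the logic: Python’s sorted() is stable, whereas no particular order among equal keys should be assumed for a distributed sort.

```python
# Hypothetical (word, count) pairs, as produced by a word count.
word_counts = [("Wonderland", 2), ("Project", 3), ("in", 2), ("Alice's", 1)]

# map step: swap each pair so that the count becomes the key.
swapped = [(count, word) for word, count in word_counts]

# sortByKey step: order the pairs by their (integer) key.
sorted_by_count = sorted(swapped, key=lambda pair: pair[0])

for pair in sorted_by_count:
    print(pair)
```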

reduceByKey() :

reduceByKey() merges the values of each key using the supplied function. In our case, it reduces the entries for each word by applying the sum function to the values. The resulting RDD contains the unique words and their counts.

Apache Spark RDD Actions

We’ll stick with our word count example for now. The foreach() action is used to manage accumulators, write to a database table, or access external data sources, but foreachPartition() is more efficient since it allows you to conduct heavy initializations once per partition. Let’s look at some more actions on our word count example.

max–This function is used to return the max record.

println("Max Record : " + datMax._1 + "," + datMax._2)

fold–This function aggregates the elements of each partition, and then the results for all of the partitions.

val sum = acc+v
sum

$ Output: fold: 20
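The two-stage behaviour of fold, first within each partition and then across the partition results, can be sketched in plain Python. The function `fold` and the partition layout below are hypothetical and exist only for illustration. The model assumes, as Spark’s fold is documented to do, that the zero value is applied once per partition and once more when the partial results are merged, which is why a neutral zero value (0 for addition) matters.

```python
from functools import reduce


def fold(partitions, zero, op):
    """Fold each partition starting from zero, then fold the partial results."""
    per_partition = [reduce(op, partition, zero) for partition in partitions]
    return reduce(op, per_partition, zero)


def add(acc, v):
    return acc + v


# Hypothetical data spread over three partitions.
partitions = [[1, 2, 3], [4, 5], [5]]

total = fold(partitions, 0, add)
shifted = fold(partitions, 1, add)  # a non-neutral zero value is counted four times

print(total, shifted)
```

With a zero value of 1, the result grows by one for each of the three partitions plus one more for the final merge, so the answer depends on how the data happens to be partitioned.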

reduce–This function reduces the records to a single value; we can use it to count or sum.

println("dataReduce Record : " + totalWordCount._1)

Collect–Returns an array of all data from RDD. When working with large RDDs with millions or billions of records, be cautious about using this method because the driver may run out of memory.

println("Key:" + f._1 + ", Value:" + f._2) })

saveAsTextFile–we can use the saveAsTextFile action to write the RDD to a text file.

What is PySpark RDD?

The PySpark RDD (Resilient Distributed Dataset) is a core data structure in PySpark that is a fault-tolerant, immutable distributed collection of items, which means you can’t change it after you’ve created it. RDD divides each dataset into logical partitions that can be computed on separate cluster nodes.

PySpark is the Python API for Apache Spark, an open-source distributed computing framework used for big data processing and analytics. It allows developers to write Spark applications using Python, leveraging the power and scalability of Spark’s distributed computing capabilities. PySpark provides a high-level interface for working with distributed datasets, enabling tasks like data manipulation, querying, and machine learning. It seamlessly integrates with other Python libraries and offers a familiar programming experience for Python developers. PySpark supports parallel processing, fault tolerance, and in-memory caching, making it well-suited for handling large-scale data processing tasks in a distributed computing environment.

How to read CSV or JSON files into DataFrame

Using the csv("path") or format("csv").load("path") methods of DataFrameReader, we can read a CSV file into a PySpark DataFrame. These methods take the file path to read from as an input. With the format() method you can specify data sources by their fully qualified names, but for built-in sources you can simply use their short names (csv, json, parquet, jdbc, text, etc.).

df = spark.read.format("org.apache.spark.sql.csv").load("/tmp/resources/zipcodes.csv")
df.printSchema()

Loading a CSV file in PySpark is a little more difficult. Because there is no local storage in a distributed environment, the file’s path must point to a distributed file system such as HDFS, the Databricks file store (DBFS), or S3.

When I use PySpark, I usually work with data stored in S3. Many databases provide an unload to S3 feature, and you can also move files from your local workstation to S3 via the AWS dashboard. I’ll be using the Databricks file system (DBFS) for this article, which gives paths in the manner of /FileStore. The first step is to upload the CSV file that you want to work with.

file_location = "/FileStore/tables/game_skater_stats.csv"
df = spark.read.format("csv").option("inferSchema", True).option("header", True).load(file_location)
display(df)
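What the header and inferSchema options do can be approximated in plain Python with the standard csv module. This is only a rough model: the sample rows, the column names, and the `infer` helper are hypothetical, and Spark’s actual type inference samples the data and supports more types than this sketch.

```python
import csv
import io

# Hypothetical CSV content standing in for a file on distributed storage.
raw = "player_id,goals,team\n8471214,2,WSH\n8478402,1,EDM\n"


def infer(value):
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


reader = csv.reader(io.StringIO(raw))
header = next(reader)  # header=True: the first line supplies the column names
rows = [dict(zip(header, (infer(v) for v in row))) for row in reader]

print(header)
print(rows)
```

Without the header option the first line would be treated as data, and without schema inference every column would stay a string.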

The next snippet shows how to save the data frame from the previous snippet as a parquet file on DBFS, then reload the data frame from the parquet file.

df.write.save('/FileStore/parquet/game_skater_stats', format='parquet')
df = spark.read.load("/FileStore/parquet/game_skater_stats")
display(df)

How to write a PySpark DataFrame to a CSV file?

df.write.option("header",True) .csv("/tmp/spark_output/zipcodes")

Writing Data:

It’s not a good idea to write data to local storage while using PySpark, just as it isn’t when reading data. You should instead use a distributed file system like S3 or HDFS. If you’re going to use Spark to process the results, parquet is a decent format to save data frames in.

df.write.save('/FileStore/parquet/game_stats', format='parquet')

Create a data frame:

To generate a DataFrame from a list, we’ll need the data, so let’s get started by creating the data and columns we’ll need.

columns = ["language","count"]
data = [("Java", "20000"), ("Python", "100000"), ("c#", "3000")]

The toDF() method of PySpark RDD is used to construct a DataFrame from an existing RDD. Because RDD lacks columns, the DataFrame is generated with the default column names “_1” and “_2” to represent the two columns we have.

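# Build an RDD from the data list first, then name the columns explicitly
rdd = spark.sparkContext.parallelize(data)
columns = ["language","users_count"]
dfFromRDD1 = rdd.toDF(columns)
dfFromRDD1.printSchema()

Convert PySpark RDD to DataFrame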

The RDD’s toDF() function is used in PySpark to convert RDD to DataFrame. We’d have to change RDD to DataFrame because DataFrame has more benefits than RDD. For example, DataFrame is a distributed collection of data arranged into named columns that give optimization and efficiency gains, comparable to database tables.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('PySpark create using parallelize()').getOrCreate()
dept = [("Finance",10),("Marketing",20),("Sales",30),("IT",40)]
rdd = spark.sparkContext.parallelize(dept)

To begin, pass a Python list object to the sparkContext.parallelize() function to generate an RDD.

If you have data in a list, meaning a collection held in the PySpark driver’s memory, that collection is parallelized when you construct an RDD from it.

deptColumns = ["dept_name","dept_id"]
df2 = rdd.toDF(deptColumns)
df2.printSchema()

Convert PySpark DataFrame to Pandas

The toPandas() function converts a PySpark DataFrame to a Python Pandas DataFrame. PySpark works on several machines, whereas pandas runs on a single node. If you’re working on a machine learning application with massive datasets, PySpark can process operations much faster than pandas.

First, we have to create data frames in PySpark.

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Pyspark data frames to pandas').getOrCreate()
data = [("James","","Smith","36636","M",60000),
        ("Michael","Rose","","40288","M",70000)]
columns = ["first_name","middle_name","last_name","dob","gender","salary"]
pysparkDF = spark.createDataFrame(data = data, schema = columns)
pysparkDF.printSchema()

toPandas() collects all records in the PySpark DataFrame and sends them to the driver program, so it should only be used on a small subset of the data. With a larger dataset, the application can crash because the driver runs out of memory.

pandasDF = pysparkDF.toPandas()
print(pandasDF)

Most commonly used PySpark functions

PySpark show() :

PySpark DataFrame show() displays the contents of a DataFrame in a Table Row and Column Format. The column values are truncated at 20 characters by default, and only 20 rows are displayed.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('pyspark show()').getOrCreate()
columns = ["Seqno","Quote"]
data = [("1", "Be the change that you wish to see in the world"),
        ("2", "Everyone thinks of changing the world, but no one thinks of changing himself.")]
df = spark.createDataFrame(data,columns)

Let’s look at how to display the complete contents of the Quote column, which are truncated at 20 characters.
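The truncation rule itself can be modelled in plain Python. This sketch reflects my understanding of how show() shortens long cells (keep the first 17 characters and append three dots when the limit is 20); treat that detail as an assumption rather than a specification. The helper name `truncate_cell` is hypothetical, and passing a width of 0 stands in for disabling truncation.

```python
def truncate_cell(value, width=20):
    """Shorten a cell the way a 20-character display limit would."""
    text = str(value)
    if width > 0 and len(text) > width:
        # Very small widths leave no room for the trailing dots.
        return text[:width] if width < 4 else text[: width - 3] + "..."
    return text


quote = "Be the change that you wish to see in the world"

shortened = truncate_cell(quote)        # default limit of 20 characters
full_text = truncate_cell(quote, 0)     # 0 models "do not truncate"

print(shortened)
print(full_text)
```

Short values such as the Seqno column pass through unchanged, which is why only the Quote column appears cut off.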

PySpark filter():

If you’re coming from a SQL background, you can use the where() clause instead of the filter() method to filter the rows from an RDD/DataFrame depending on the specified condition or SQL expression.

from pyspark.sql.types import StructType,StructField
from pyspark.sql.types import StringType, IntegerType, ArrayType
data = [
    (("James","","Smith"),["Java","Scala","C++"],"OH","M"),
    (("Anna","Rose",""),["Spark","Java","C++"],"NY","F"),
    (("Julia","","Williams"),["CSharp","VB"],"OH","F"),
]
schema = StructType([
    StructField('name', StructType([
        StructField('firstname', StringType(), True),
        StructField('middlename', StringType(), True),
        StructField('lastname', StringType(), True)
    ])),
    StructField('languages', ArrayType(StringType()), True),
    StructField('state', StringType(), True),
    StructField('gender', StringType(), True)
])
df = spark.createDataFrame(data = data, schema = schema)
df.printSchema()

To filter the rows from a DataFrame, use Column with the condition. You can express complex conditions by referring to column names with dfObject.colname.

df.filter(df.state == "OH").show(truncate=False)

PySpark map():

PySpark map (map()) is an RDD transformation that applies the transformation function (lambda) to each RDD/DataFrame element and returns a new RDD.

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]") \
    .appName("pyspark map()").getOrCreate()
data = ["Project","Gutenberg’s","Alice’s","Adventures",
        "in","Wonderland","Project","Gutenberg’s","Adventures",
        "in","Wonderland","Project","Gutenberg’s"]
rdd = spark.sparkContext.parallelize(data)

RDD map() transformations are used to do sophisticated operations, such as adding a column, changing a column, converting data, and so on. The output of a map transformation always has the same number of records as the input.

rdd2 = rdd.map(lambda x: (x,1))
for element in rdd2.collect():
    print(element)

PySpark select():

PySpark select() is a transformation function that returns a new DataFrame with the selected columns. It may pick single, multiple, column by index, all columns from a list, and nested columns from a DataFrame.

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Pyspark Select()').getOrCreate()
data = [("James","Smith","USA","CA"),
        ("Michael","Rose","USA","NY")]
columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)

By passing column names to the select() function, you can choose a single column or several columns from the DataFrame. This produces a new DataFrame with the selected columns because a DataFrame is immutable. The DataFrame contents are displayed using the show() function.

df.select("firstname").show(truncate=False)

PySpark Join():

PySpark Join is used to join two DataFrames together, and by chaining them together, you can join several DataFrames. It supports all fundamental SQL join types, including INNER, LEFT OUTER, RIGHT OUTER, LEFT ANTI, LEFT SEMI, CROSS, and SELF JOIN.

emp = [(1,"Smith",-1,"2023","10","M",3000),
       (2,"Rose",1,"2010","20","M",4000),
       (3,"Williams",1,"2010","10","M",1000),
       (4,"Jones",2,"2005","10","F",2000),
]
empColumns = ["emp_id","name","superior_emp_id","year_joined",
              "emp_dept_id","gender","salary"]
empDF1 = spark.createDataFrame(data=emp, schema = empColumns)
empDF1.printSchema()
dept = [("Finance",10),
        ("Marketing",20),
        ("Sales",30),
        ("IT",40)
]
deptColumns = ["dept_name","dept_id"]
deptDF1 = spark.createDataFrame(data=dept, schema = deptColumns)
deptDF1.printSchema()

Inner join is PySpark’s default and most commonly used join. It connects two datasets (emp and dept) based on key columns, and rows from either dataset are dropped from the result if their keys have no match in the other.
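To make the matching rule concrete, here is a plain-Python model of an inner join on a department id. The rows are hypothetical and deliberately include one employee and one department without a partner. The dictionary lookup assumes each department id appears only once, which a general join does not require.

```python
# Hypothetical rows: (emp_id, name, emp_dept_id) and (dept_name, dept_id).
emp = [(1, "Smith", 10), (2, "Rose", 20), (3, "Williams", 10), (4, "Jones", 50)]
dept = [("Finance", 10), ("Marketing", 20), ("Sales", 30)]

# Index the departments by key so that each employee can be matched quickly.
dept_by_id = {dept_id: dept_name for dept_name, dept_id in dept}

# Inner join: keep an employee only when its key exists on the other side.
inner = [
    (emp_id, name, dept_id, dept_by_id[dept_id])
    for emp_id, name, dept_id in emp
    if dept_id in dept_by_id
]

for row in inner:
    print(row)
```

Jones (department 50) and the Sales department (id 30) have no partner, so neither appears in the joined result.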

empDF1.join(deptDF1, empDF1.emp_dept_id == deptDF1.dept_id, "inner") \
    .show(truncate=False)

Frequently Asked Questions

Big data engineers identify patterns in large data sets and design algorithms to make raw data more useful to businesses. This IT position requires a diverse range of technical abilities, including a thorough understanding of SQL database design and several programming languages.

Skillsets and responsibilities for big data engineers:

Analytical abilities

Data visualization abilities

Knowledge of business domains and big data tools.

Programming abilities

Problem-solving abilities.

Data mining Techniques

About Myself:

This is Lavanya from Chennai. I am a passionate writer and enthusiastic content maker. The most intractable problems always thrill me. I am currently pursuing my B.E., in Computer Engineering and have a strong interest in the fields of data engineering, machine learning, data science, and artificial intelligence, and I am constantly looking for ways to integrate these fields with other disciplines such as science and chemistry to further my analysis goals.


I hope you found this blog post interesting! You should now be familiar with Apache Spark and PySpark RDD operations and functions, as well as the scope of big data. In this article, we looked at how to install and use the Spark framework with Python and covered some of the RDD functions available in the Spark environment.


If you have questions about Spark RDD Operations, please contact us. I will gladly assist you in resolving them.

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

