1.    Data types

In this part, we introduce a few data types that are used in this tutorial. A data type defines the kind of values a piece of data can hold and the operations that can be performed on it. Most programming languages support the common data types integer, real and boolean. Python supports these as well, along with more advanced built-in types such as tuples, lists and dictionaries.

Integer:

          The first basic data type is the integer. Integers are numbers without a decimal point. In Python 3, the integer type is called int, and it has no fixed size limit, so it can hold and handle very large numbers (over 10^10000, for example). In languages such as C++ or C#, the built-in integer types have a fixed size and therefore a maximum value. Python handles big numbers automatically, limited only by the available memory.
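As a small sketch of this behaviour (the variable names below are only for illustration), we can build a number far beyond the range of a 64-bit integer and keep doing exact arithmetic with it:

```python
# Integers have no decimal point and are of type int.
count = 42
print(type(count))        # <class 'int'>

# Python 3 integers grow as needed, so this does not overflow.
big = 10 ** 10000
bigger = big + 1

print(isinstance(big, int))   # True
print(bigger - big)           # 1
print(big > 2 ** 64)          # True
```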

Float:

          The second type is float. Floats are numbers that contain a decimal point and are used to represent real numbers. In Python 3, this type is called float.
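The short sketch below shows a few float values. It also illustrates one point the text above does not cover: floats are stored as binary approximations, so it is usually safer to compare them with a tolerance. The variable names are illustrative only.

```python
import math

price = 19.99
print(type(price))        # <class 'float'>

# Dividing two integers with / always gives a float.
ratio = 7 / 2
print(ratio)              # 3.5

# Floats are approximations, so exact comparison can surprise us.
total = 0.1 + 0.2
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True
```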

String:

Strings are sequences of character data. The string type in Python is called str. String literals may be delimited using either single or double quotes. All the characters between the opening delimiter and matching closing delimiter are part of the string. A string in Python can contain as many characters as you wish. The only limit is your machine’s memory resources.
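For instance, the following sketch (with made-up example strings) shows that single and double quotes produce the same kind of value, and that one quote style can appear inside a string delimited by the other:

```python
single = 'data science'
double = "data science"

print(type(single))         # <class 'str'>
print(single == double)     # True

# An apostrophe is fine inside a double-quoted string.
quote = "It's a string"
print(len(quote))           # 13
```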

Lists:

          Lists are used to store multiple values in a single variable. A list is a special kind of variable that can hold more than one value at a time. Imagine you have several cars, let’s say 5, and you want to store their names: without a list you would need 5 variables. What if there were 5000 cars? Iterating through 5000 separate variables would be impractical. A list solves this problem.

You can access the elements of a list by index. In Python 3, a list is indexed from 0 to n-1, where n is the length of the list.

          Lists come with built-in operations for adding new elements, deleting elements and changing the values of existing elements. These operations are very useful for data scientists working with data.
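The operations described above can be sketched as follows. The car names are invented for the example; append and remove are standard list methods.

```python
cars = ["Audi", "BMW", "Ford", "Honda", "Toyota"]

# Indexing runs from 0 to n-1.
print(cars[0])                # Audi
print(cars[len(cars) - 1])    # Toyota

cars.append("Volvo")          # add a new element at the end
cars.remove("BMW")            # delete an element by value
cars[0] = "Acura"             # change the value of an element

print(cars)   # ['Acura', 'Ford', 'Honda', 'Toyota', 'Volvo']
```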

          We have just gone through some basic data types and structures in Python that are very useful for solving data science problems. Hopefully you now have a brief overview of these data types and are familiar enough with them to feel more comfortable when you code in Python.

2.    If, While, For

In this part we discuss some control flow statements of Python 3. Most programming languages have control flow statements such as if else, while and for. These statements control the order in which the code runs and how it behaves in different situations.

If else statements:

In order to write useful programs, we almost always need the ability to check conditions and change the behavior of the program accordingly. Conditional statements give us this ability. The simplest form is the if statement; adding an else clause gives the if else statement, which has the general form:

if <expr>:
    STATEMENTS_1        # executed if condition evaluates to True
else:
    STATEMENTS_2        # executed if condition evaluates to False

Each statement inside the if block of an if else statement is executed in order if the boolean expression evaluates to True. The entire block of statements is skipped if the boolean expression evaluates to False, and instead all the statements under the else clause are executed.

There is no limit on the number of statements that can appear under the two clauses of an if else statement, but there has to be at least one statement in each block.

Sometimes there are more than two possibilities and we need more than two branches. One way to express a computation like that is a chained conditional:

if <expr1>:
    STATEMENTS_A
elif <expr2>:
    STATEMENTS_B
else:
    STATEMENTS_C
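As a concrete sketch of a chained conditional, the example below sorts a temperature into one of three labels. The value and the thresholds are hypothetical and chosen only for illustration.

```python
temperature = 15   # hypothetical reading in degrees Celsius

if temperature < 0:
    label = "freezing"
elif temperature < 20:
    label = "cool"
else:
    label = "warm"

print(label)   # cool
```

Only the first branch whose condition is true runs; the remaining branches are skipped.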


While:

          One strength of computers is that they can do the same thing over and over again without getting tired. Iteration means executing the same block of code repeatedly, potentially many times. A programming structure that implements iteration is called a loop.

          The while statement is used for repeated execution as long as an expression is true. As long as <expr> remains true, the while loop runs the <statement(s)> again and again. The expression is checked before each pass, so the loop body must eventually make it false, otherwise the loop never ends.

while <expr>:
    <statement(s)>
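A minimal sketch of a while loop is a countdown. The variable name is illustrative; note that the body changes the variable so that the condition eventually becomes false.

```python
countdown = 3

while countdown > 0:
    print(countdown)      # prints 3, then 2, then 1
    countdown -= 1        # move toward the stopping condition

print("done")
```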

For:

          A for loop is used for iterating over a sequence (a list, a tuple, a string …). The for loop in Python is not like the traditional counter-based for loop in C++ or other languages; it works more like an iterator method as found in other object-oriented programming languages.

for <var> in <sequence>:
    <statement(s)>

3.    Functions

In Python, a function is a group of related statements that performs a specific task. Functions help break our program into smaller, modular chunks. As our program grows larger and larger, functions keep it organized and manageable. Furthermore, they avoid repetition and make code reusable. In Python 3, functions have this form:

def <function name>(<parameters>):
        statement(s)

For example, if you want to calculate the area of several triangles, you can write one function that does the work and call it wherever you need it, instead of repeating the same code several times.
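One possible version of such a function is sketched below. The name triangle_area and its parameters are our own choice for this example, and it assumes the base and the height of the triangle are known.

```python
def triangle_area(base, height):
    """Return the area of a triangle from its base and height."""
    return 0.5 * base * height

# The same function is reused for different triangles.
print(triangle_area(3, 4))      # 6.0
print(triangle_area(10, 2.5))   # 12.5
```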

4.    NumPy, Pandas

In this part we introduce two powerful Python libraries that are useful for organizing and preprocessing data and for performing many complex linear algebra calculations.

a.     NumPy:

NumPy is the fundamental package for scientific computing with Python. It contains among other things: a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, useful linear algebra, Fourier transform, and random number capabilities.

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
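To give a feel for the N-dimensional array and for broadcasting, here is a small sketch. It assumes NumPy is already installed, and the numbers are arbitrary.

```python
import numpy as np

# A 2-dimensional array with 2 rows and 3 columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)    # (2, 3)

# Broadcasting: the 1-D row is added to every row of the matrix.
row = np.array([10, 20, 30])
shifted = matrix + row
print(shifted.tolist())    # [[11, 22, 33], [14, 25, 36]]

# Linear algebra: matrix product of the array with its transpose.
product = matrix @ matrix.T
print(product.tolist())    # [[14, 32], [32, 77]]
```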

To install NumPy on your Raspberry Pi, just open a terminal and run this command:

conda install numpy

To test whether NumPy is installed on your computer, just import it in your Python file.
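A minimal check could look like this. The alias np is a common convention rather than a requirement, and printing the version is optional.

```python
import numpy as np

# If the import above did not raise an error, NumPy is available.
print(np.__version__)
```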

If no error is reported, the installation succeeded. There are many interesting things inside the NumPy library. In this tutorial we won’t list them all; instead, we will explain the NumPy-related code used in this tutorial as we meet it.

b.     Pandas

          Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

  • Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
  • Ordered and unordered (not necessarily fixed-frequency) time series data.
  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
  • Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure
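As an illustration of tabular data with differently typed columns, the sketch below builds a small DataFrame. It assumes pandas is already installed, and the car names, years and prices are made up for the example.

```python
import pandas as pd

# Columns of different types: text, integer and float.
cars = pd.DataFrame({
    "name": ["Audi", "BMW", "Ford"],
    "year": [2015, 2018, 2020],
    "price": [21000.0, 35000.0, 18500.0],
})
print(cars.shape)    # (3, 3)

# Select the rows that satisfy a condition on a labeled column.
newer = cars[cars["year"] > 2016]
print(newer["name"].tolist())    # ['BMW', 'Ford']

# Compute a simple statistic on one column.
mean_price = cars["price"].mean()
print(round(mean_price, 2))
```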

To install pandas on your Raspberry Pi, just open a terminal and run this command:

conda install pandas

To test whether pandas is installed on your computer, just import it in your Python file.
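A minimal check could look like this. The alias pd is a common convention rather than a requirement.

```python
import pandas as pd

# If the import above did not raise an error, pandas is available.
print(pd.__version__)
```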

5.    Scikit-learn

Scikit-learn (sklearn) is one of Python’s most popular machine learning libraries. It has complex algorithms built in: you just need to pass in your data, wait for it to compute and get the results, as easy as eating candy.

The functionality that scikit-learn provides includes:

  • Regression, including Linear Regression
  • Classification, including Logistic Regression and K-Nearest Neighbors
  • Clustering, including K-Means and K-Means++
  • Model selection
  • Preprocessing, including Min-Max Normalization

The power of scikit-learn will greatly aid your creation of robust machine learning programs. This library is needed to run the machine learning algorithms, which are very powerful tools for solving data science problems.
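To show the "insert data and get results" workflow, here is a minimal sketch that uses two of the features listed above, Min-Max normalization and linear regression. It assumes scikit-learn is already installed, and the data is invented so that it follows y = 2x + 1 exactly.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: one feature per sample, with y = 2x + 1.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]

# Preprocessing: rescale the feature to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.ravel())

# Regression: fit a line to the original data, then predict a new value.
model = LinearRegression()
model.fit(X, y)
prediction = float(model.predict([[5.0]])[0])
print(round(prediction, 2))   # 11.0
```

Most scikit-learn estimators follow this same pattern: create the object, call fit on the data, then call predict or transform.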

To install scikit-learn on your Raspberry Pi, just open a terminal and run this command:

conda install scikit-learn

To test whether scikit-learn is installed on your computer, just import it in your Python file.
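A minimal check could look like this. Note that the package is installed under the name scikit-learn but imported under the name sklearn.

```python
import sklearn

# If the import above did not raise an error, scikit-learn is available.
print(sklearn.__version__)
```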

If it doesn’t report any error, as in the figure above, the installation succeeded. At this point we have almost everything needed to start solving data science problems. In the next chapter, we will discuss descriptive statistics, which are a very effective way to understand data. We will also look at some basic probability distribution functions.