Pandas - DataFrame

DataFrame description:

  • two-dimensional labeled data
  • supports different data types
  • easy to manipulate, e.g. reshaping, slicing, grouping
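
The three manipulations named above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook; the sample values and variable names (first_two, mean_age, long_form) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({
    'age': [22, 55, 43, 31],
    'country': ['uk', 'us', 'de', 'uk'],
})

# Slicing: take the first two rows by position
first_two = df.iloc[:2]

# Grouping: mean age per country
mean_age = df.groupby('country')['age'].mean()

# Reshaping: turn the wide table into a long (column, value) table
long_form = df.melt(var_name='column', value_name='value')

print(first_two)
print(mean_age)
print(long_form)
```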

Used for:

  • data wrangling
  • visualization
  • data processing
  • exploratory data analysis
  • creating models

Create a DataFrame from a dict

In [2]:
import pandas as pd

# Each dict key becomes a column; each list holds that column's values
data = {'age': [22, 55, 43],
        'names': ['A', 'B', 'C'],
        'country': ['uk', 'us', 'de'],
        }

df = pd.DataFrame(data)
print(df)
   age names country
0   22     A      uk
1   55     B      us
2   43     C      de

Stats about DataFrame

In [3]:
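print(df.describe())  # summary statistics for the numeric columns
print(df.info())      # info() prints its report itself and returns None, so print() adds a trailing "None"
print(df.shape)       # (number of rows, number of columns)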
             age
count   3.000000
mean   40.000000
std    16.703293
min    22.000000
25%    32.500000
50%    43.000000
75%    49.000000
max    55.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
age        3 non-null int64
names      3 non-null object
country    3 non-null object
dtypes: int64(1), object(2)
memory usage: 152.0+ bytes
None
(3, 3)
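
By default describe() only covers numeric columns, which is why only age appears in the output above. As a hedged sketch, assuming the same sample data, passing include='all' also summarizes the text columns; the variable name summary is an assumption for this example.

```python
import pandas as pd

data = {'age': [22, 55, 43],
        'names': ['A', 'B', 'C'],
        'country': ['uk', 'us', 'de']}
df = pd.DataFrame(data)

# include='all' adds count/unique/top/freq rows for non-numeric columns;
# statistics that do not apply to a column are filled with NaN
summary = df.describe(include='all')
print(summary)

# dtypes gives the column types without the extra text that info() prints
print(df.dtypes)
```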

Use specific columns

In [4]:
# Only the listed keys of data are used as columns
df = pd.DataFrame(data, columns=['age', 'country'])
print(df)
   age country
0   22      uk
1   55      us
2   43      de
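
The columns argument is applied while the DataFrame is built. If the DataFrame already exists, columns can also be selected afterwards with a list of names. The sketch below assumes the same sample data; the names subset and with_missing and the column 'city' are hypothetical and only serve the illustration.

```python
import pandas as pd

data = {'age': [22, 55, 43],
        'names': ['A', 'B', 'C'],
        'country': ['uk', 'us', 'de']}
df = pd.DataFrame(data)

# Select columns from an existing DataFrame with a list of names
subset = df[['age', 'country']]
print(subset)

# A name in columns= that is not a key of the dict ('city' is hypothetical)
# yields a column filled with NaN instead of raising an error
with_missing = pd.DataFrame(data, columns=['age', 'city'])
print(with_missing)
```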

Filter Rows

In [5]:
# df['age'] < 50 is a boolean mask; only rows where it is True are kept
dfFiltered = df[df['age'] < 50]
print(dfFiltered)
   age country
0   22      uk
2   43      de
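
Note that the filtered rows keep their original index labels (0 and 2). Masks can also be combined. The following is a minimal sketch under the assumption of the same two-column data; the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'age': [22, 55, 43],
                   'country': ['uk', 'us', 'de']})

# Combine conditions with & (and) or | (or); each condition needs parentheses
young_uk = df[(df['age'] < 50) & (df['country'] == 'uk')]
print(young_uk)

# isin() keeps rows whose value is in a given list
uk_or_de = df[df['country'].isin(['uk', 'de'])]
print(uk_or_de)

# query() expresses the same age filter as a string
under_50 = df.query('age < 50')
print(under_50)
```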

NumPy array <-> pandas DataFrame

You can use:

  • np.array(yourDataFrameVariable)
  • yourDataFrameVariable.values

to transform your pandas DataFrame into a NumPy array

In [6]:
import numpy as np
print(type(dfFiltered))
print(dfFiltered)

npArray = np.array(dfFiltered)
print(type(npArray))
print(npArray)

# The array stores neither column names nor index labels, so defaults (0, 1, ...) are used
dfAgain = pd.DataFrame(npArray)
print(type(dfAgain))
print(dfAgain)
<class 'pandas.core.frame.DataFrame'>
   age country
0   22      uk
2   43      de
<class 'numpy.ndarray'>
[[22 'uk']
 [43 'de']]
<class 'pandas.core.frame.DataFrame'>
    0   1
0  22  uk
1  43  de
In [7]:
npArray = dfFiltered.values
print(type(npArray))
print(npArray)
<class 'numpy.ndarray'>
[[22 'uk']
 [43 'de']]
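
The round trip above loses the column names and the index labels. One possible way to keep them, sketched here under the assumption of the same filtered data, is to pass them back in when rebuilding the DataFrame. The example also uses to_numpy(), which the pandas documentation recommends over .values; the names arr and restored are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({'age': [22, 55, 43],
                   'country': ['uk', 'us', 'de']})
df_filtered = df[df['age'] < 50]

# to_numpy() returns the same kind of array as .values
arr = df_filtered.to_numpy()

# Hand the original labels back when rebuilding the DataFrame
restored = pd.DataFrame(arr,
                        columns=df_filtered.columns,
                        index=df_filtered.index)

# The mixed array has dtype object, so numeric columns are converted back explicitly
restored = restored.astype({'age': 'int64'})
print(restored)
print(restored.dtypes)
```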