Python Tutorial

Data Handling

Introduction

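Python is a versatile programming language widely used for data handling tasks. The standard library includes the csv and json modules for reading and writing common file formats, and third-party libraries such as pandas and numpy add tabular and array-based tools that streamline data manipulation and analysis.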

Key Concepts

  • pandas: A dataframe-based library for tabular data manipulation and analysis.
  • numpy: A library for numerical operations and array-based data structures.
  • csv: A module for reading and writing data in Comma-Separated Values (CSV) format.
  • json: A module for reading and writing data in JavaScript Object Notation (JSON) format.
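
As a minimal sketch of the two standard-library modules listed above, the following reads CSV text and round-trips it through JSON. The sample data and the names csv_text, rows, and round_tripped are assumptions made for illustration; io.StringIO stands in for a file on disk.

```python
import csv
import io
import json

# Hypothetical in-memory CSV text standing in for a file on disk.
csv_text = "name,age\nAda,36\nLinus,28\n"

# csv.DictReader yields one dict per row; every value is read as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Convert 'age' to int explicitly, since the csv module does no type inference.
for row in rows:
    row["age"] = int(row["age"])

# Serialize the rows to a JSON string, then parse that string back.
json_text = json.dumps(rows)
round_tripped = json.loads(json_text)
```

Note that the csv module leaves type conversion to the caller, whereas pandas infers column types when it reads a file.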

Techniques and Tools

Data Import and Export:
  • Importing data from CSV: pd.read_csv('data.csv')
  • Exporting data to CSV: df.to_csv('output.csv') (pass index=False to leave the row index out of the file)
  • Importing data from JSON: pd.read_json('data.json')
  • Exporting data to JSON: df.to_json('output.json')
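
The import and export calls above can be sketched as a round trip. This example assumes pandas is installed and uses a small made-up DataFrame; to keep it self-contained it writes to strings (both to_csv and to_json return a string when no path is given) rather than to the file names used above.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are for illustration only.
df = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 28]})

# With no path argument, to_csv returns the CSV text.
# index=False keeps the row index out of the output.
csv_text = df.to_csv(index=False)
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# orient="records" produces a JSON array with one object per row.
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text), orient="records")
```

Passing a file path instead of omitting it (or instead of the StringIO wrapper) gives the on-disk behavior shown in the list.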
Data Manipulation:
  • Selecting columns: df[['column1', 'column2']]
  • Filtering rows: df[df['column'] > 10]
  • Grouping data: df.groupby('column').agg({'value': 'mean'})
  • Calculating statistics: df['column'].mean()
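
To make the manipulation calls above concrete, here is a small sketch on an in-memory DataFrame. The data and the column names department, age, and salary are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "department": ["eng", "eng", "ops", "ops"],
    "age": [25, 41, 35, 29],
    "salary": [100.0, 140.0, 90.0, 70.0],
})

# Selecting columns: a list of names returns a new DataFrame.
subset = df[["department", "salary"]]

# Filtering rows: a boolean mask keeps only the rows where it is True.
over_30 = df[df["age"] > 30]

# Grouping data: mean salary per department, indexed by department.
mean_by_dept = df.groupby("department").agg({"salary": "mean"})

# Calculating statistics: mean of a single column.
overall_mean = df["salary"].mean()
```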
Data Analysis:
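  • Plotting data: plt.plot(df['x'], df['y']) (requires import matplotlib.pyplot as plt)
  • Regression analysis: stats.linregress(df['x'], df['y']) (requires from scipy import stats)
  • Time series analysis: index the data by date and resample it, e.g. df.set_index('date')['y'].resample('W').mean(), assuming a datetime column named 'date' (the old pd.TimeSeries class no longer exists in pandas)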
  • Machine learning: Integration with libraries like scikit-learn for data modeling and prediction
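
The regression and time series items above can be sketched as follows, assuming pandas and SciPy are installed. The data is synthetic (y = 2x + 1 exactly), and the column names date, x, and y are assumptions for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical daily readings with an exactly linear relationship y = 2x + 1.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
})
df["y"] = 2.0 * df["x"] + 1.0

# Regression analysis: the result exposes slope, intercept, rvalue, and more.
result = stats.linregress(df["x"], df["y"])

# Time series analysis: index by date, then average over 3-day bins.
series = df.set_index("date")["y"]
three_day_mean = series.resample("3D").mean()
```

Here result.slope and result.intercept recover the coefficients used to build the data, and three_day_mean holds one averaged value per 3-day bin.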

Python Example

import matplotlib.pyplot as plt
import pandas as pd

# Import data from CSV (assumes data.csv has numeric 'age' and 'salary' columns)
df = pd.read_csv('data.csv')

# Filter rows where 'age' is greater than 30
filtered_df = df[df['age'] > 30]

# Calculate the mean of 'salary'
mean_salary = df['salary'].mean()

# Create a scatter plot of 'age' vs 'salary'
plt.scatter(df['age'], df['salary'])
plt.xlabel('Age')
plt.ylabel('Salary')
plt.show()

Conclusion

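Together, pandas, numpy, and the standard csv and json modules cover the common data handling workflow: loading data, selecting and aggregating it, and passing the results on to plotting, statistics, or machine learning libraries. Understanding these key concepts lets you manipulate, analyze, and visualize data effectively across a wide range of applications.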