1. Data Collection
Using pandas to read data from a CSV file.
import pandas as pd
# Load data from a CSV file
data = pd.read_csv('data.csv')
print(data.head())
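To try the loading step without a file on disk, here is a minimal sketch that reads CSV text from memory. The column names and values in `csv_text` are made up for illustration and are not from any real `data.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """name,score,city
Ana,90,Lima
Ben,,Oslo
Ana,90,Lima
"""

# read_csv accepts any file-like object, not only a path
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # type inferred for each column
```

The empty `score` field is read as a missing value, which is why that column is inferred as floating point rather than integer.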
2. Data Cleaning
Handling missing values and duplicates.
# Check for missing values
print(data.isnull().sum())
# Forward-fill missing values (fillna(method='ffill') is deprecated in recent pandas versions)
data = data.ffill()
# Remove duplicates
data.drop_duplicates(inplace=True)
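The cleaning steps above can be checked on a small hand-made frame. This is only a sketch: the `toy` DataFrame and its column names are assumptions chosen so that the effect of each step is easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing score and one exact duplicate row
toy = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Cy"],
    "score": [90.0, np.nan, 70.0, 70.0],
})

missing_before = toy.isnull().sum()  # missing count per column
filled = toy.ffill()                 # each gap takes the value from the row above
cleaned = filled.drop_duplicates().reset_index(drop=True)

print(missing_before)
print(cleaned)
```

Forward fill copies whatever value sits in the previous row, so it suits ordered data such as time series better than unordered records, and it cannot fill a gap in the first row.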
3. Data Exploration
Basic statistics and visualizations.
# Summary statistics
print(data.describe())
# Visualize data distribution
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(data['column_name'], bins=30)
plt.show()
4. Data Transformation
Creating new features and encoding categorical variables.
# Creating a new column
data['new_column'] = data['existing_column'] * 2
# One-hot encoding for categorical variables
data = pd.get_dummies(data, columns=['categorical_column'])
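To see what one-hot encoding does to the column layout, here is a small sketch on made-up data. The `size` and `color` columns are hypothetical stand-ins for `existing_column` and `categorical_column`.

```python
import pandas as pd

# Hypothetical toy data with one numeric and one categorical column
toy = pd.DataFrame({"size": [1, 2, 3], "color": ["red", "blue", "red"]})

# Derived feature, same pattern as new_column above
toy["size_doubled"] = toy["size"] * 2

# The original 'color' column is replaced by one indicator column per category
encoded = pd.get_dummies(toy, columns=["color"])

print(encoded.columns.tolist())
print(encoded)
```

Each distinct category becomes its own column, so a column with many distinct values can widen the frame considerably.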
5. Data Analysis
Performing group operations and aggregations.
# Group by and aggregate
grouped_data = data.groupby('category_column').agg({'value_column': 'mean'})
print(grouped_data)
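When more than one statistic per group is useful, named aggregation is one way to write it. The sketch below uses made-up `category` and `value` columns, and the output names `mean_value` and `n_rows` are chosen here purely for illustration.

```python
import pandas as pd

# Hypothetical toy data: two groups with two rows each
toy = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Each keyword names an output column: (source column, aggregation)
summary = toy.groupby("category").agg(
    mean_value=("value", "mean"),
    n_rows=("value", "count"),
)

print(summary)
```

Reporting the row count next to the mean helps show when a group average rests on very few observations.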
6. Data Visualization
Creating plots to visualize relationships.
# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='feature1', y='feature2', hue='category_column')
plt.title('Feature1 vs Feature2')
plt.show()
7. Machine Learning
Simple model training using scikit-learn.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Splitting the dataset
X = data[['feature1', 'feature2']]
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Training a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Making predictions
predictions = model.predict(X_test)
8. Model Evaluation
Assessing model performance.
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')
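For intuition about what the two metrics measure, they can be computed by hand with NumPy. The arrays below are made-up values, not output from the model above. The formulas are the standard definitions: MSE is the mean of the squared residuals, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

residuals = y_true - y_pred

# Mean squared error: average of the squared residuals
mse_manual = np.mean(residuals ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f'Mean Squared Error: {mse_manual}')
print(f'R^2 Score: {r2_manual}')
```

MSE is expressed in squared units of the target, so it is only comparable between models predicting the same target. R^2 is unitless, equals 1 for a perfect fit, and can be negative when the model does worse than always predicting the mean.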