DataDive

1. Data Collection

Using pandas to read data from a CSV file.

import pandas as pd

# Load data from a CSV file
data = pd.read_csv('data.csv')
print(data.head())
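
Real files often need a little help at load time. The sketch below is one way to try `read_csv` options without a file on disk. The CSV text, the column names (`date`, `city`, `sales`), and the variable `sample` are made up for illustration and are not part of the original dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """date,city,sales
2024-01-01,Oslo,10
2024-01-02,Oslo,
2024-01-03,Bergen,7
"""

# parse_dates asks pandas to convert the 'date' column to datetimes;
# the empty 'sales' field on the second row is read as NaN
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(sample.dtypes)
print(sample.isnull().sum())
```

Checking `dtypes` right after loading is a cheap way to catch columns that were read as text when you expected numbers or dates.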

2. Data Cleaning

Handling missing values and duplicates.

# Check for missing values
print(data.isnull().sum())

# Fill missing values by carrying the last valid value forward
data = data.ffill()

# Remove duplicates
data.drop_duplicates(inplace=True)
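
To see what these cleaning steps actually do, here is a small sketch on a toy DataFrame. The columns `id` and `value` and the variable names are invented for the example. Note that a forward fill cannot fill a missing value in the first row, because there is nothing before it to carry forward.

```python
import numpy as np
import pandas as pd

# Toy data: a missing value in the first row, and a repeated id
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "value": [np.nan, 10.0, np.nan, 30.0, np.nan],
})

# Count missing values per column before cleaning
missing_before = df.isnull().sum()

# Forward fill: each NaN takes the last valid value above it
filled = df.ffill()

# The first row is still NaN because no earlier value exists
missing_after = filled.isnull().sum()

# After filling, the two rows with id 2 are identical, so one is dropped
deduped = filled.drop_duplicates()

print(missing_before["value"], missing_after["value"], len(deduped))
```

Whether forward fill is appropriate depends on the data. It usually makes sense for ordered data such as time series, and less so for unordered rows.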

3. Data Exploration

Basic statistics and visualizations.

# Summary statistics
print(data.describe())

# Visualize data distribution
import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(data['column_name'], bins=30)
plt.show()
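
A histogram is easier to read alongside a couple of numbers. The sketch below, on made-up values, compares the mean with the median and flags unusually large values using the 1.5 x IQR rule of thumb. That rule is a common convention chosen here for illustration, not something the steps above require.

```python
import pandas as pd

# Toy column with one value far from the rest
df = pd.DataFrame({"column_name": [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]})

summary = df["column_name"].describe()

# A mean well above the median hints at a right-skewed distribution
mean_value = summary["mean"]
median_value = summary["50%"]

# Interquartile range and the conventional upper fence
iqr = summary["75%"] - summary["25%"]
upper_fence = summary["75%"] + 1.5 * iqr

# Rows above the fence are candidates for a closer look, not automatic errors
outliers = df[df["column_name"] > upper_fence]

print(mean_value, median_value)
print(outliers)
```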

4. Data Transformation

Creating new features and encoding categorical variables.

# Creating a new column
data['new_column'] = data['existing_column'] * 2

# One-hot encoding for categorical variables
data = pd.get_dummies(data, columns=['categorical_column'])
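
The effect of one-hot encoding is easiest to see on a tiny table. In this sketch the values `red` and `green` are invented. The column names follow the placeholders used above.

```python
import pandas as pd

df = pd.DataFrame({
    "existing_column": [1, 2, 3],
    "categorical_column": ["red", "green", "red"],
})

# Derived feature: double the existing column
df["new_column"] = df["existing_column"] * 2

# Each category becomes its own indicator column, and the
# original 'categorical_column' is removed from the result
encoded = pd.get_dummies(df, columns=["categorical_column"])

print(encoded.columns.tolist())
# ['existing_column', 'new_column',
#  'categorical_column_green', 'categorical_column_red']
```

For linear models, passing `drop_first=True` to `get_dummies` drops one indicator per variable, which avoids perfectly redundant columns.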

5. Data Analysis

Performing group operations and aggregations.

# Group by and aggregate
grouped_data = data.groupby('category_column').agg({'value_column': 'mean'})
print(grouped_data)
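
When you want more than one statistic per group, pandas named aggregation keeps the output columns readable. This is a sketch on toy data. The output names `mean_value`, `max_value`, and `n_rows` are arbitrary choices for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "category_column": ["a", "a", "b", "b", "b"],
    "value_column": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Each keyword is an output column: (input column, aggregation)
group_summary = df.groupby("category_column").agg(
    mean_value=("value_column", "mean"),
    max_value=("value_column", "max"),
    n_rows=("value_column", "size"),
)

print(group_summary)
```

Including the group size next to the mean helps you spot groups whose average rests on very few rows.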

6. Data Visualization

Creating plots to visualize relationships.

# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=data, x='feature1', y='feature2', hue='category_column')
plt.title('Feature1 vs Feature2')
plt.show()
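
A scatter plot shows the shape of a relationship. A correlation coefficient puts a rough number on it. The sketch below uses made-up values for `feature1` and `feature2` and computes the Pearson correlation, which only captures linear association.

```python
import pandas as pd

df = pd.DataFrame({
    "feature1": [1, 2, 3, 4, 5],
    "feature2": [2, 4, 5, 4, 5],
})

# Pearson correlation between the two columns (pandas default method)
corr = df["feature1"].corr(df["feature2"])

print(round(corr, 3))  # 0.775
```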

7. Machine Learning

Simple model training using scikit-learn.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Splitting the dataset
X = data[['feature1', 'feature2']]
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions
predictions = model.predict(X_test)
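
If you do not have a dataset at hand, the same steps can be tried end to end on synthetic data. In this sketch the data is generated from a known linear rule (coefficients 3 and -2, intercept 1, plus a little noise), all of which are assumptions made for the example, so you can check that the fitted model recovers them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known relationship (illustrative only)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature1": rng.normal(size=200),
    "feature2": rng.normal(size=200),
})
noise = rng.normal(scale=0.1, size=200)
df["target"] = 3.0 * df["feature1"] - 2.0 * df["feature2"] + 1.0 + noise

X = df[["feature1", "feature2"]]
y = df["target"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# The learned parameters should be close to 3, -2 and 1
print(model.coef_, model.intercept_)
```

Fitting on data where you already know the answer is a quick sanity check that the pipeline is wired up correctly before you trust it on real data.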

8. Model Evaluation

Assessing model performance.

from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')
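
It helps to know what these two numbers measure. The sketch below computes both by hand with NumPy on four made-up values and compares them with scikit-learn. MSE is the average squared error, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy true values and predictions (illustrative only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: mean of the squared differences
mse_manual = np.mean((y_true - y_pred) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE is in the same units as the target, which is often easier to read
rmse = np.sqrt(mse_manual)

print(mse_manual, mean_squared_error(y_true, y_pred))  # 0.125 0.125
print(r2_manual, r2_score(y_true, y_pred))
```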
