Data Science and Big Data Analytics Lab

This repo contains all my assignments from the DSBDA Lab in Semester 6.

| Sr. | Name | Description |
| --- | --- | --- |
| 1 | Data Wrangling I | Perform the following operations using Python on any open-source dataset (e.g., data.csv): 1. Import all required Python libraries. 2. Locate an open-source dataset (e.g., https://www.kaggle.com) and provide a clear description and its source URL. 3. Load the dataset into a pandas DataFrame. 4. Data Preprocessing: Check for missing values using `isnull()`, get initial statistics using `describe()`, and provide variable descriptions and types. Check the dimensions of the DataFrame. 5. Data Formatting and Normalization: Check and convert data types (character, numeric, integer, factor, logical). 6. Turn categorical variables into quantitative variables. In addition to the code and output, explain every operation clearly. |
| 2 | Data Wrangling II | Create an "Academic performance" dataset of students and: 1. Scan all variables for missing values and inconsistencies, and handle them appropriately. 2. Scan numeric variables for outliers and handle them appropriately. 3. Apply transformations to variables (for better scaling, linearity, or normality). Document your approach properly. |
| 3 | Descriptive Statistics: Measures of Central Tendency and Variability | Perform the following: 1. On `nba.csv`: Provide summary statistics (mean, median, min, max, standard deviation) grouped by a categorical variable. 2. On `iris.csv`: Display basic statistics (percentile, mean, standard deviation) for each Iris species (`setosa`, `versicolor`, `virginica`). Provide code, output, and explanations. |
| 4 | Data Visualization I | Using the inbuilt `titanic` dataset (891 rows): 1. Use Seaborn to find patterns. 2. Plot a histogram for 'fare' to see the price distribution. |
| 5 | Data Visualization II | Using the `titanic` dataset: 1. Plot a box plot for the 'age' distribution across 'sex' and survival status. 2. Write observations based on the plots. |
| 6 | Data Visualization III | Using the `iris.csv` dataset: 1. List all features and their types. 2. Create histograms for each feature. 3. Create box plots for each feature. 4. Compare distributions and identify outliers. |
| 7 | Data Analytics I | Create a Linear Regression model in Python/R to predict home prices using the Boston Housing Dataset (https://www.kaggle.com/c/boston-housing). Objective: Predict house prices using the features. |
| 8 | Data Analytics II | 1. Implement Logistic Regression on `Social_Network_Ads.csv`. 2. Compute the confusion matrix and derive TP, FP, TN, FN, Accuracy, Error rate, Precision, Recall. |
| 9 | Data Analytics III | Implement a simple Naïve Bayes classifier using Python/R on `iris.csv`. Compute the confusion matrix and derive TP, FP, TN, FN, Accuracy, Error rate, Precision, Recall. |
| 10 | Text Analytics | 1. Extract a sample document and apply: Tokenization, POS Tagging, Stop words removal, Stemming, Lemmatization. 2. Calculate Term Frequency and Inverse Document Frequency representations. |
| 11 | Hadoop Word Count | Write a Java program for a Word Count application using the Hadoop Map-Reduce framework in a local-standalone setup. |
| 13 | Apache Spark Word Count | Write a simple program in Scala using the Apache Spark framework for Word Count. |
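
For the outlier step in Assignment 2, one common approach is the interquartile-range (IQR) rule. The snippet below is a minimal, dependency-free sketch and not necessarily the method used in the assignment notebooks: the `math_scores` values are made up for illustration, and `iqr_outliers` and `clip_to_fences` are hypothetical helper names.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


def clip_to_fences(values, lower, upper):
    """Handle outliers by capping each value at the nearest fence."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical marks for ten students; 150 is an impossible score out of 100.
math_scores = [62, 67, 70, 71, 73, 75, 78, 80, 82, 150]

lower, upper, outliers = iqr_outliers(math_scores)
cleaned = clip_to_fences(math_scores, lower, upper)

print("fences:", lower, upper)
print("outliers:", outliers)
print("cleaned:", cleaned)
```

Capping at the fences is only one way to handle outliers; dropping the rows or replacing the values with the median are equally reasonable choices, depending on the dataset.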
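
Assignments 8 and 9 both ask for TP, FP, TN, FN, accuracy, error rate, precision, and recall. The sketch below shows how these follow from the true and predicted labels of a binary classifier. The label lists are invented for illustration, and the function names are hypothetical; in the lab itself a library routine would typically produce the confusion matrix.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the metrics listed in the assignment from the four counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 1 = purchased, 0 = not purchased.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, fp, tn, fn)
print(tp, fp, tn, fn)
print(metrics)
```

For the three-class `iris.csv` task, the same counts can be taken per class by treating one species as positive and the other two as negative.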
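
The term-frequency and inverse-document-frequency step of Assignment 10 can be illustrated without any NLP library. This sketch assumes whitespace tokenization and the plain `log(N / df)` form of IDF; the three sample sentences are invented. Libraries such as scikit-learn use smoothed IDF variants, so their numbers will differ from these.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF: how often each term occurs, relative to the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Invented sample corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [doc.lower().split() for doc in docs]

idf = inverse_document_frequency(tokenized)
tfidf = [
    {term: tf * idf[term] for term, tf in term_frequency(doc).items()}
    for doc in tokenized
]

for doc, weights in zip(docs, tfidf):
    print(doc, "->", weights)
```

Terms that appear in most documents, such as "the", get a low weight, while terms unique to one document, such as "mat", get a higher one.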
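
Assignments 11 and 13 implement Word Count in Java (Hadoop) and Scala (Spark). The Python sketch below only mimics the map, shuffle, and reduce phases in memory to show the data flow; it does not use the Hadoop or Spark APIs, and the function names and input lines are hypothetical.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


# Invented input lines standing in for the input file.
lines = ["Hello Hadoop hello Spark", "hello world"]

pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle_phase(pairs).items())
print(counts)
```

In the real frameworks the shuffle is performed by the runtime across machines; only the map and reduce logic is written by hand.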

tirthraj07/DSBDA-LAB