Data Transformations in Databricks
A collection of Databricks notebooks that I wrote for my big data management class assignments and projects, along with notebooks from my personal projects. The class assignments involved using HiveQL and Databricks SQL to load data from DBFS (Databricks File System), a distributed file system that is very similar to Hadoop's HDFS (Hadoop Distributed File System). I also used SparkR frequently in this class to transform and wrangle data in Spark DataFrames before loading it into new tables with Spark SQL. My personal projects focused on carrying my newfound knowledge of Spark DataFrames over to Python by applying the data transformation skills I learned in class with PySpark.
Below, you'll find links to the notebooks in HTML format for better readability and visuals. I titled each notebook after the skill learned and added a brief description of the objectives achieved.
Notebook | Description |
---|---|
Creating Tables | Loaded data into a newly created table using Databricks SQL |
Partitioning Tables | Created a table with dynamic partitioning and loaded data using HiveQL |
Data Analysis with SparkR | Loaded data into a Spark DataFrame using Spark SQL and analyzed it with SparkR functions |
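
As a rough illustration of the dynamic partitioning idea from the Partitioning Tables notebook, here is a minimal Python sketch that only builds the HiveQL statements such a load typically needs. It is not the code from the notebook: the helper `dynamic_partition_statements` and the table and column names (`sales_partitioned`, `sales_staging`, `order_id`, `amount`, `region`) are hypothetical, and I am assuming the standard Hive settings `hive.exec.dynamic.partition` and `hive.exec.dynamic.partition.mode` apply to the target environment.

```python
def dynamic_partition_statements(target, source, columns, partition_col):
    """Build HiveQL statements for a dynamic-partition load.

    Every table and column name passed in is a hypothetical placeholder.
    """
    # Hive expects the partition column to be the last column selected.
    select_list = ", ".join(list(columns) + [partition_col])
    return [
        # Turn on dynamic partitioning for the session.
        "SET hive.exec.dynamic.partition = true",
        # 'nonstrict' allows all partition columns to be resolved dynamically.
        "SET hive.exec.dynamic.partition.mode = nonstrict",
        # Partition values are taken from the data instead of being hard-coded.
        f"INSERT INTO TABLE {target} PARTITION ({partition_col}) "
        f"SELECT {select_list} FROM {source}",
    ]


statements = dynamic_partition_statements(
    target="sales_partitioned",      # hypothetical partitioned table
    source="sales_staging",          # hypothetical staging table
    columns=["order_id", "amount"],  # hypothetical non-partition columns
    partition_col="region",          # hypothetical partition column
)

for stmt in statements:
    print(stmt)
```

In a Databricks notebook, each string could then be passed to `spark.sql(stmt)`, assuming both tables already exist and the target table was created with `PARTITIONED BY (region ...)`.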