{ "cells": [ { "cell_type": "markdown", "id": "e0a8ceab", "metadata": {}, "source": [ "# Lab 5: Git/GitHub Tutorial\n", "\n", "GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.\n", "\n", "## GitHub essentials\n", "Before we head to the basics, create a GitHub account or log in to your existing one.\n", "### Repositories\n", "* A repository is usually used to organize a single project.\n", "* Repositories can contain folders and files, images, videos, spreadsheets, and data sets -- anything your project needs. \n", "* Often, repositories include a README file: a file with information about your project, written in the plain text Markdown language. A Markdown language cheat sheet: https://www.markdownguide.org/cheat-sheet/\n", "\n", "Now let's practice!\n", "\n", "### Branches\n", "* Branching lets you have different versions of a repository at one time.\n", "* Your repository has only one branch named `main` by default, but you can create additional branches off of `main` in your repository. This is helpful when you want to add new features to a project without changing the main source of code. \n", "* The work done on different branches will not show up on the `main` branch until you merge it. You can use branches to experiment and make edits before committing them to `main`.\n", "![alt text](diagram.png)\n", "\n", "Practice: Create a `readme-edits` branch.\n", "\n", "\n", "### Commits\n", "* When you created a new branch in the previous step, GitHub brought you to the code page for your new `readme-edits` branch, which is a copy of `main`.\n", "\n", "* You can make and save changes to the files in your repository. On GitHub, saved changes are called commits. 
\n", "\n", "* Each commit has an associated commit message, which is a description that captures the history of your changes so that other contributors can understand what you’ve done and why.\n", "\n", "Practice: Edit the README.md file and commit your changes. This branch should now contain content that differs from `main`.\n", "\n", "\n", "### Pull Requests\n", "* Now that you have changes in a branch off of `main`, you can open a pull request.\n", "* Pull requests are the heart of collaboration on GitHub. When you open a pull request, you're proposing your changes and requesting that someone review and pull in your contribution and merge it into their branch. Pull requests show diffs, or differences, of the content from both branches. The changes, additions, and subtractions are shown in different colors.\n", "* As soon as you make a commit, you can open a pull request and start a discussion, even before the code is finished. By using GitHub's `@mention` feature in your pull request message, you can ask for feedback from specific people or teams.\n", "\n", "Practice: Create a pull request. Now your collaborators can review your edits and make suggestions.\n", "\n", "### Merging your pull request\n", "\n", "* In this final step, you will merge your `readme-edits` branch into the `main` branch. After you merge your pull request, the changes on your `readme-edits` branch will be incorporated into `main`." ] }, { "cell_type": "markdown", "id": "6a8cd96c", "metadata": {}, "source": [ "## Use Git/GitHub with Jupyter Notebook\n", "\n", "\n", "### Add a notebook to the GitHub repository\n", "* Copy the highlighted HTTPS repository URL, and clone the GitHub repository to our machine by running the following in the terminal: `git clone https://github.com/usrname/repositoryname.git`. If the repository is named `lab05`, this creates a `lab05` directory on our machine that is linked to the `lab05` repository on GitHub. \n", "\n", "* Let’s push some notebooks to the repository. 
We copy a notebook to the directory where we cloned the `lab05` repository: `cp /some/path/analysis1.ipynb /path/of/lab05/`. \n", "\n", "* To push `analysis1.ipynb` to GitHub, we first need to tell the local git client to start tracking the file using `git add analysis1.ipynb`. (If you want to upload more than one notebook, enter the folder of the GitHub repository that you want to update and run `git add .` instead.) You can check which files are staged for the next commit with `git status`. \n", "\n", "* Now let’s commit the changes: `git commit -m \"Adds customer data analysis notebook\"`\n", "\n", "* Let’s push this commit to GitHub: `git push`\n", "\n", "### Integrate the remote changes to your local directory\n", "* `git pull`\n", "\n", "* If you have uncommitted local changes that the pull would overwrite, git refuses to merge. You need to commit, stash, or discard those changes first.\n", "\n", "### Three options\n", "* Commit the change: stage it with `git add`, then run `git commit -m \"My message\"`\n", "* Stash the change using `git stash`, do the merge, and then restore the stashed change using `git stash pop`\n", "* Discard the local changes using `git reset --hard` for all changes or `git checkout -- filename` for a specific file. Be careful: this throws away uncommitted work.\n", "\n" ] }, { "cell_type": "markdown", "id": "aed385f2", "metadata": {}, "source": [ "## Motivation for git\n", "\n", "Good practice in programming project management requires a version control\n", "system.\n", "\n", "Old school techniques are usually bad.\n", "\n", "- Versioning by filename is a disaster.\n", "\n", " - `mythesis_v1.tex`, `mythesis_v2.tex`, `mythesis_last_v3.tex`\n", " - Creates clutter\n", " - Filenames rarely contain information other than chronology\n", " - Parallel independent changes are super hard to keep track of\n", " - Did you finally notice a problem in v119 that has been around for a while,\n", " but you have no idea where the error was introduced?\n", "\n", "- Sharing files with others is a disaster.\n", "\n", " - 
Emailing files sucks --- only magnifies the problems above\n", " - The track changes feature in Google Docs or Word --- not so useful for anything\n", " complex\n", "\n", "- Disaster recovery is a disaster.\n", " - Oh F#@K! Did I just overwrite all my work from last night??!!!?\n", "\n", "![title](phd101212s.png)\n", "\n", "Modern version control techniques are usually great.\n", "\n", "Modern tools to promote collective intelligence.\n", "\n", "- Automated history of everything\n", "\n", " - not just files, but whole projects with folders and subfolders\n", " - who, what, when, and (most important) why\n", "\n", "- Automated sharing of everyone's latest edits\n", "\n", " - no more emailing files around\n", "\n", "- Easier disaster recovery with distributed VCSes like Git or Mercurial (see\n", " later)\n", "\n", "- Support for automated testing (we'll cover this in future lectures)\n", "\n", "- Infinite sandboxes for clutter-free, fear-free experimentation\n", "\n", " - this is where Git especially shines -- main topic today\n", "\n", "- CAVEAT 1: All of this works best with plain text files\n", "\n", "- CAVEAT 2: All of this works best with a highly modular file structure\n", "\n", "- The git feedback effect:\n", " - Using git encourages positive changes to your workflow.\n", " - And making your workflow more git-friendly will make your work better\n", " overall.\n", "\n", "## Brief history of version control\n", "\n", "![title](CVCS-vs-DVCS.png)\n", "\n", "- Local Version Control\n", "\n", " - Mainly just reduced clutter and automated tracking of chronology...\n", "\n", "- Centralized Version Control\n", "\n", " - Allows group work on the same files...\n", " - Single point of failure --- there is only a single \"real\" repository\n", " - Backing up is a separate process\n", " - File locks --- create \"race conditions\" for committing changes to that\n", " \"real\" repository\n", " - What if you lose internet?\n", " - Branching is cumbersome, so people don't do it (and 
have trouble reconciling\n", " disparate histories when they do)\n", "\n", "- Distributed Version Control\n", " - Resolves most of the above issues...\n", " - Many separate and independent repos; all are \"first-class\" citizens\n", " - Can make commits locally even without internet...\n", " - ...but can transfer history and information between repositories\n", " - Branching is lightweight and easy (mainly in Git)\n", "\n", "## Core concepts in git\n", "\n", "Branches and DAGs\n", "\n", "A DAG is a directed acyclic graph. In git, the nodes are commits, and each commit points back to its parent commit(s).\n", "\n", "Let's calibrate people's intuitions about git terminology: how many branches are\n", "in this DAG?\n", "\n", "![title](DAG_example.png)\n", "\n", "The best way to get comfortable conceptually with git branching is to see it in\n", "action.\n", "\n", "So let's do an exercise...\n", "\n", "Things to keep in mind during our exercise\n", "\n", "- Just a quick tour\n", "\n", "- Some git operations are of a \"send it out\" variety, while others are of a\n", " \"bring it in\" variety\n", "\n", " - important to keep straight which are of which flavor\n", "\n", "- Some git operations are repo-wise, while others are branch-wise\n", "\n", "Your git branching sandbox\n", "\n", "Open a browser to this URL: https://learngitbranching.js.org/?NODEMO\n", "\n", "Other resources for git:\n", "\n", "- https://gitimmersion.com/\n", "- http://think-like-a-git.net/\n", "- http://ndpsoftware.com/git-cheatsheet.html\n", "- https://ohshitgit.com/\n", "- http://gitready.com/\n", "- https://explainshell.com/" ] }, { "cell_type": "markdown", "id": "d36dc3c5", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, 
"nbformat": 4, "nbformat_minor": 5 }