This post is aimed at people working with code, scripts and/or any other means of textual files. I try to give my point of view on revision control and git as revision control system in particular.
The first thing you should ask yourself is: why using revision control in the first place? I think that’s a good and fair question. A lot of people I talk to see revision control as something that’s for developers in projects with multiple people working on the same code to build history and provide a single point of truth. I think revision control in that situation indeed is needed (in fact, I think it is a necessity).
Let’s now look at the situation of a DBA. Most people I work with built up their own bundle of scripts to circumvent constantly redoing stuff they’ve worked out in the past, and/or scripts they gotten or borrowed from other people (in fact, if you do not keep a copy of the excellent script bundles of Tanel Poder and Kerry Osborne you’re either brilliant enough to do this all from the top of your head, or stupid enough not to use them or know them, with a fair chance of falling in the second category).
Probably most (if not: all) people change scripts, create entirely new ones, experiment, etc. How many times where you searching for this specific version/modification you made *somewhere*, but forgot where it was? In my own situation, when doing research, experiments and investigations, I use a lot of virtual machines (with/without ASM, different operating system versions, different Oracle versions; there are huge differences between 184.108.40.206/2/3/4, and Oracle version 12), and need my scripts and what to conventiently move changes and use newly created scripts among these machines. I don’t think I even have to go in the area of seeing the history of changes to a certain script or the repository as a whole: this is something you will use when using revision control, or miss sorely if your scripts are not in revision control.
Once you or your team is convinced you need revision control the immediate second thing which always pops up (in the situations I worked in, I am aware there are other revision control systems) is subversion or git? If you look at the title of this blog you know where I end up, but it’s good to give this some thought. Subversion is a decent revision control system, with which I’ve worked with great pleasure in the past. The history I’ve read on subversion is that it was made to be an open source version of the CVS revision control system and overcome some of the problems/limitations of it. Subversion is a revision control system that works in a client/server way: there is the central repository, and clients check out the source from that. This is no problem when client and server are on the same machine or in the local network, and even less a problem if there’s only one user.
But what if there are multiple persons working with it? And these people are located at vast distances from each other? And some of the people do not have internet access all the time? The problems that arise from that are a lot of potential problems with concurrent versions, performance can be very bad, because you need to connect to the central repository, and you need a connection to the repository in the first place to commit your change. From what I’ve read from the git revision control system, these were some of the problems Linus Torvalds wanted to overcome with git.
A git repository always works local on the machine you are actually working. You can link remote git repositories and push your changes to one or more repositories, or the other way around pull changes from a remote repository to your repository. I guess most people immediately understand why this easily overcomes a lot of problems that where the result of having a single point as the repository. You always work in your own repository.
I’ve found this (git versus subversion) yet another topic people easily get into a ‘religious discussion’, which means the discussion isn’t about actual properties and pros and con’s of both revision control systems, but on personal preference and sometimes ego. Another thing which is closely related to this is there is a learning curve if you need to start with using git.
I work on Mac OSX. In order to get git, just install the Xcode package from the App Store (free). On Oracle Linux, yum install the git package. I guess on RedHat this works the same. On Windows, download the installer from http://msysgit.github.com/ and run it. (please mind the examples in this blog are on OSX, and applicable to linux)
Once you’ve installed git, the next thing to do is set your credentials:
$ git config --global user.name "John Doe" $ git config --global user.email firstname.lastname@example.org
I’ve set some aliases (which I’ve found in articles on git), to shorten typing a bit:
$ git config --global alias.co=checkout $ git config --global alias.ci=commit $ git config --global alias.st=status $ git config --global alias.hist=log --pretty=format:"%h %ad | %s%d [%an]" --graph --date=short
Okay, at this point you are set, but no repository has been created yet.
Create a repository
The next obvious step is to create a repository:
$ git init
This creates a .git directory in the current working directory, which means you are now in the root directory of the newly created repository. Please mind you can initialise a repository in a root directory of a current project. It will still be an empty repository.
One of the other ways to start using a git repository which might have caught your eyes a few times, is clone ($ git clone git://github.com/username/reponame.git); which creates a local revision controlled copy of a remote git repository.
Let’s say I got a directory which contains a file ‘a’, and a subdirectory ‘tt’ which contains ‘b’ and ‘c':
$ find . . ./a ./tt ./tt/b ./tt/c
Now start a git repository in the root:
$ git init Initialized empty Git repository in /Users/fritshoogland/t/.git/
Now let’s look at how our repository looks like:
$ git status On branch master Initial commit Untracked files: (use "git add ..." to include in what will be committed) a tt/ nothing added to commit but untracked files present (use "git add" to track)
Here we see git showing us that there are no files in it, but that it sees our test files, which are currently untracked. Let’s add all the files:
$ git add *
If we look at the status of the repository, we see the files are added, but not yet committed:
$ git status On branch master Initial commit Changes to be committed: (use "git rm --cached <file>..." to unstage) new file: a new file: tt/b new file: tt/c
And commit the additions:
$ git commit -m 'initial commit.' [master (root-commit) 735ed81] initial commit. 3 files changed, 3 insertions(+) create mode 100644 a create mode 100644 tt/b create mode 100644 tt/c
If you commit something, git wants you to add a remark with this commit. This can be done with the git command by adding ‘-m’ and a comment within quotation marks, or by omitting this, which makes git fire up an editor, in which you can type a comment.
At this point we have our files committed to the repository, and the files and repository are completely in sync:
$ git status On branch master nothing to commit, working directory clean
Now let’s change something. My file ‘a’ has got one line in it, with a single ‘a’. I add a second line to the file ‘a’ reading ‘second line':
$ cat a a second line
Now let’s ask git for the status:
$ git status On branch master Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git checkout -- <file>..." to discard changes in working directory) modified: a no changes added to commit (use "git add" and/or "git commit -a")
Here we see that git knows about the modification in file a. The next thing we see is the line ‘Changes not staged for commit’. This is something which is different from subversion. Instead of changes which are committed, git follows a two stage approach: a changed file needs to be staged for commit first, after which it can be committed. This allows you to group changed files for a commit, instead of committing all.
Now let’s stage the change as indicated by the help provided with ‘git status’, using ‘git add’. Please mind the ‘add’ command is used both for making files version controlled and adding them to the stage list. I issued a subsequent git status to show the new status:
$ git add a $ git status On branch master Changes to be committed: (use "git reset HEAD <file>..." to unstage) modified: a
We can look at the difference between the version in git and changes made by using diff –cached:
$ git diff --cached diff --git a/a b/a index 7898192..c4fbe32 100644 --- a/a +++ b/a @@ -1 +1,2 @@ a +second line
Now let’s commit the change, and look at the status:
$ git commit -m 'added line to a.' [master 17ec0e8] added line to a. 1 file changed, 1 insertion(+) $ git status On branch master nothing to commit, working directory clean
Now we build a tiny bit of history, we can look at it using git log. This command can be used to look at the changes of the entire repository, or at the history of a specific file. I use the alias ‘hist’ created above:
$ git hist a * c8ec93c 2014-05-25 | added line to a. (HEAD, master) [FritsHoogland] * 739f11a 2014-05-25 | initial commit. [FritsHoogland]
One common thing you want to do is to see what changes are made to a file. To see what the changes are between the current version in the repository (which is called ‘HEAD’, which means ‘last commit in current branch’), use git diff, and the change hash (look at the above example of git hist):
$ git diff 739f11a a diff --git a/a b/a index 7898192..c4fbe32 100644 --- a/a +++ b/a @@ -1 +1,2 @@ a +second line
Of course you can also checkout that specific version (that’s the reason for having version control!):
$ git checkout 739f11a a
Now the file a is reverted to the version without the line ‘second line’ in it. Because we changed the file a now again, git considers it changed:
$ git status On branch master Changes to be committed: (use "git reset HEAD <file>..." to unstage) modified: a
So after checking out this version you can choose to commit this as new version, or revert it back to it’s original version using get reset HEAD (as described with the status command).
However, everything described here until now are changes and version control done locally. How about collaboration with a team, like subversion has with the subversion server? Well, this is where git is inherently different. Git works in a peer-to-peer fashion, instead of a client-server way like subversion.
You can link your repository to a remote git repository, and push changes made in your local repository to a remote repository, or pull changes made in a remote repository to your own local repository. Especially if you work with a team, this is how you can centralise the source code in a very organised way (you can add a web interface for example).
These is how I create a remote repository on my synology NAS:
$ ssh Username@nas.local # log on to remote server $ mkdir burn.git # create directory for repo $ cd $_ $ git init --bare # create empty git repository $ git update-server-info # not sure if this is needed $ git remote add origin Username@nas.local:burn.git # add remote $ git push -u origin master # push local master to origin
One of the things I use to keep track of current development version and the public client version of scripts is tags. This command tags the last commit with “prod 1.0″:
$ git tag "prod 1.0" HEAD
In order to use tags, you need to see which tags exist. This is very simple with the ‘git tag’ command. To look at the tags of a remote git repository, use:
$ git ls-remote --tags Username@nas.local:gdb_macros.git
If you want to get the whole repository for production usage, without the versioning, use the ‘git archive’ command (remove –remote and argument for archiving from a local repository). This version creates a gzipped tarball:
$ git archive master --remote Username@nas.local:gdb_marcos.git --format=tar | gzip > gdb_macros.tgz
You can also archive a tag, instead of the latest version. This version creates the files in the current directory:
$ git archive "prod 1.0" --remote Username@nas.local:gdb_macros.git --format=tar | tar xf -
I hope this blog encouraged you to put your files in a (git) repository.