Items tagged with: GIT
♲ Eriks Links ():
It’s Magit! And you’re the magician! · Endless Parentheses
There’s nothing I can praise about Magit that hasn’t been written in a dozen blogs already, but since Jonas started a kickstarter campaign for it I knew I had to say something. If you use Magit, you already know the greatness of it. And if you don’t, hopefully I can convince you to try it in time to back the campaign.
Article word count: 2190
HN Discussion: https://news.ycombinator.com/item?id=19006036
Posted by afiori (karma: 44)
Post stats: Points: 94 - Comments: 58 - 2019-01-26T12:37:12Z
\#HackerNews #fossil #git
If you start out using one DVCS and later decide you like the other better, you can easily move your content¹.
Fossil and Git are very similar in many respects, but they also have important differences. See the table below for a high-level summary and the text that follows for more details.
Keep in mind that you are reading this on a Fossil website, and though we try to be fair, the information here might be biased in favor of Fossil. Ask around for second opinions from people who have used both Fossil and Git.
¹Git does not support wiki, tickets, or tech-notes, so those elements will not transfer when exporting from Fossil to Git.
2.0 Differences Between Fossil And Git
Differences between Fossil and Git are summarized by the following table, with further description in the text that follows.
| GIT | FOSSIL |
|---|---|
| File versioning only | Versioning, Tickets, Wiki, and Technotes |
| Ad-hoc, pile-of-files key/value database | Relational SQL database |
| Bazaar-style development | Cathedral-style development |
| Designed for Linux development | Designed for SQLite development |
| Lots of little tools | Stand-alone executable |
| One check-out per repository | Many check-outs per repository |
| Remembers what you should have done | Remembers what you actually did |
| GPL | BSD |
2.1 Feature Set
Git provides file versioning services only, whereas Fossil adds integrated wiki, ticketing & bug tracking, embedded documentation, and Technical notes. These additional capabilities are available for Git as 3rd-party and/or user-installed add-ons, but with Fossil they are integrated into the design. One way to describe Fossil is that it is "github-in-a-box".
If you clone Gitʼs self-hosting repository you get just Gitʼs source code. If you clone Fossilʼs self-hosting repository, you get the entire Fossil website - source code, documentation, ticket history, and so forth.
For developers who choose to self-host projects (rather than using a 3rd-party service such as GitHub) Fossil is much easier to set up, since the stand-alone Fossil executable together with a 2-line CGI script suffice to instantiate a full-featured developer website. To accomplish the same using Git requires locating, installing, configuring, integrating, and managing a wide assortment of separate tools. Standing up a developer website using Fossil can be done in minutes, whereas doing the same using Git requires hours or days.
The baseline data structures for Fossil and Git are the same (modulo formatting details). Both systems store check-ins as immutable objects referencing their immediate ancestors and named by a cryptographic hash of the check-in content.
The difference is that Git stores its objects as individual files in the ".git" folder or compressed into bespoke "pack-files", whereas Fossil stores its objects in a relational (SQLite) database file. To put it another way, Git uses an ad-hoc pile-of-files key/value database whereas Fossil uses a proven, general-purpose SQL database. This difference is more than an implementation detail. It has important consequences.
With Git, one can easily locate the ancestors of a particular check-in by following the pointers embedded in the check-in object, but it is difficult to go the other direction and locate the descendants of a check-in. It is so difficult, in fact, that neither native Git nor GitHub provides this capability. With Git, if you are looking at some historical check-in then you cannot ask "what came next" or "what are the children of this check-in".
Fossil, on the other hand, parses essential information about check-ins (parents, children, committers, comments, files changed, etc.) into a relational database that can be easily queried using concise SQL statements to find both ancestors and descendants of a check-in.
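To make the ancestor/descendant point concrete, here is a minimal sketch of querying a parent/child link table in both directions with SQLite. The `link` table and its columns are hypothetical and far simpler than Fossil's real schema; the point is only that once the links are rows in a table, walking toward children is the same kind of query as walking toward parents.

```python
import sqlite3

# Hypothetical, simplified schema: one row per parent -> child link.
# This is an illustration only, not Fossil's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (parent TEXT NOT NULL, child TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO link (parent, child) VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")],
)


def descendants(conn, checkin):
    """Return every check-in reachable by following parent -> child links."""
    rows = conn.execute(
        """
        WITH RECURSIVE d(id) AS (
            SELECT child FROM link WHERE parent = ?
            UNION
            SELECT link.child FROM link JOIN d ON link.parent = d.id
        )
        SELECT id FROM d ORDER BY id
        """,
        (checkin,),
    ).fetchall()
    return [row[0] for row in rows]


def ancestors(conn, checkin):
    """Return every check-in reachable by following child -> parent links."""
    rows = conn.execute(
        """
        WITH RECURSIVE a(id) AS (
            SELECT parent FROM link WHERE child = ?
            UNION
            SELECT link.parent FROM link JOIN a ON link.child = a.id
        )
        SELECT id FROM a ORDER BY id
        """,
        (checkin,),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants(conn, "b"))
print(ancestors(conn, "e"))
```

The two functions differ only in which column they start from, which is the symmetry the paragraph above describes.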
Leaf check-ins in Git that lack a "ref" become "detached", making them difficult to locate and subject to garbage collection. This "detached head" problem has caused untold grief for countless Git users. With Fossil, all check-ins are easily located using a variety of attributes (parents, children, committer, date, full-text search of the check-in comment) and so detached heads are simply not possible.
The ease with which check-ins can be located and queried in Fossil has resulted in a huge variety of reports and status screens that show project state in ways that help developers maintain enhanced awareness and comprehension and avoid errors.
2.3 Cathedral vs. Bazaar
Fossil and Git promote different development styles. Git promotes a "bazaar" development style in which numerous anonymous developers make small and sometimes haphazard contributions. Fossil promotes a "cathedral" development model in which the project is closely supervised by a highly engaged architect and implemented by a clique of developers.
Nota Bene: This is not to say that Git cannot be used for cathedral-style development or that Fossil cannot be used for bazaar-style development. They can be. But those modes are not their design intent nor their low-friction path.
Git encourages a style in which individual developers work in relative isolation, maintaining their own branches and occasionally rebasing and pushing selected changes up to the main repository. Developers using Git often have their own private branches that nobody else ever sees. Work becomes siloed. This is exactly what one wants when doing bazaar-style development.
Fossil, in contrast, strives to keep all changes from all contributors mirrored in the main repository (in separate branches) at all times. Work in progress from one developer is readily visible to all other developers and to the project leader, well before the code is ready to integrate. Fossil places a lot of emphasis on reporting the state of the project, and the changes underway by all developers, so that all developers and especially the project leader can maintain a better mental picture of what is happening, and better situational awareness.
2.4 Linux vs. SQLite
Git was specifically designed to support the development of Linux. Fossil was specifically designed to support the development of SQLite.
Both SQLite and Linux are important pieces of software. SQLite is found on far more systems than Linux. (Almost every Linux system uses SQLite, but there are many non-Linux systems such as iPhones, PlayStations, and Windows PCs that use SQLite.) On the other hand, for those systems that do use Linux, Linux is a far more important component.
Linux uses a bazaar-style development model. There are thousands and thousands of contributors, most of whom do not know each other's names. Git is designed for this scenario.
SQLite uses cathedral-style development. 95% of the code in SQLite comes from just three programmers, 64% from just the lead developer. And all SQLite developers know each other well and interact daily. Fossil is designed for this development model.
2.5 Lots of little tools vs. Self-contained system
Git consists of many small tools, each doing one small part of the job, which can be recombined (by experts) to perform powerful operations. Git has a lot of complexity and many dependencies and requires an "installer" script or program to get it running.
Fossil is a single self-contained stand-alone executable with hardly any dependencies. Fossil can be (and often is) run inside a minimally configured chroot jail. To install Fossil, one merely puts the executable on $PATH.
The designer of Git says that the unix philosophy is to have lots of small tools that collaborate to get the job done. The designer of Fossil says that the unix philosophy is "it just works". Both individuals have written their DVCSes to reflect their own view of the "unix philosophy".
2.6 One vs. Many Check-outs per Repository
A "repository" in Git is a pile-of-files in the ".git" subdirectory of a single check-out. The check-out and the repository are inseparable.
With Fossil, a "repository" is a single SQLite database file that can be stored anywhere. There can be multiple active check-outs from the same repository, perhaps open on different branches or on different snapshots of the same branch. Long-running tests or builds can be running in one check-out while changes are being committed in another.
2.7 What you should have done vs. What you actually did
Git puts a lot of emphasis on maintaining a "clean" check-in history. Extraneous and experimental branches by individual developers often never make it into the main repository. And branches are often rebased before being pushed, to make it appear as if development had been linear. Git strives to record what the development of a project should have looked like had there been no mistakes.
Fossil, in contrast, puts more emphasis on recording exactly what happened, including all of the messy errors, dead-ends, experimental branches, and so forth. One might argue that this makes the history of a Fossil project "messy". But another point of view is that this makes the history "accurate". In actual practice, the superior reporting tools available in Fossil mean that the added "mess" is not a factor.
One commentator has mused that Git records history according to the victors, whereas Fossil records history as it actually happened.
2.8 GPL vs. BSD
Git is covered by the GPL license whereas Fossil is covered by a two-clause BSD license.
Consider the difference between GPL and BSD licenses: GPL is designed to make writing easier at the expense of making reading harder. BSD is designed to make reading easier at the expense of making writing harder.
To a first approximation, the GPL license grants the right to read source code to anyone who promises to give back enhancements. In other words, the act of reading GPL source code (a prerequisite for making changes) implies acceptance of the license which requires updates to be contributed back under the same license. (The details are more complex, but the foregoing captures the essence of the idea.) A big advantage of the GPL is that anybody can contribute to the code without having to sign additional legal documentation because they have implied their acceptance of the GPL license by the very act of reading the source code. This means that a GPL project can legally accept anonymous and drive-by patches.
The BSD licenses, on the other hand, make reading much easier than the GPL, because the reader need not surrender proprietary interest in their own enhancements. On the flip side, BSD and similarly licensed projects must obtain legal affidavits from authors before new content can be added into the project. Anonymous and drive-by patches cannot be accepted. This makes signing up new contributors for BSD licensed projects harder.
The licenses on the implementations of Git and Fossil only apply to the implementations themselves, not to the projects which the systems store. Nevertheless, one can see a more GPL-oriented world-view in Git and a more BSD-oriented world-view in Fossil. Git encourages anonymous contributions and siloed development, which are hallmarks of the GPL/bazaar approach to software, whereas Fossil encourages a more tightly collaborative, cliquish, cathedral-style approach more typical of BSD-licensed projects.
3.0 Missing Features
Most of the capabilities found in Git are also available in Fossil and the other way around. For example, both systems have local check-outs, remote repositories, push/pull/sync, bisect capabilities, and a "stash". Both systems store project history as a directed acyclic graph (DAG) of immutable check-in objects.
But there are a few capabilities in one system that are missing from the other.
3.1 Features found in Fossil but missing from Git
\* The ability to show descendants of a check-in. Both Git and Fossil can easily find the ancestors of a check-in. But only Fossil shows the descendants. (It is possible to find the descendants of a check-in in Git using the log, but that is sufficiently difficult that nobody ever actually does it.)
\* Wiki, Embedded documentation, Trouble-tickets, and Tech-Notes. Git only provides versioning of source code. Fossil strives to provide other related configuration management services as well.
\* Branches in Fossil have persistent names that are propagated to collaborators via push and pull. All developers see the same name on the same branch. Git, in contrast, uses only local branch names, so developers working on the same project can (and frequently do) use a different name for the same branch.
\* Fossil keeps track of all repositories and check-outs and allows operations over all of them with a single command. For example, in Fossil it is possible to request a pull of all repositories on a laptop from their respective servers, prior to taking the laptop off network. Or it is possible to do "fossil all status" to see if there are any uncommitted changes that were overlooked prior to the end of the workday.
\* Fossil supports an integrated web interface. Some of the same features are available using third-party add-ons for Git, but they do not provide nearly as many features and they are not nearly as convenient to use.
3.2 Features found in Git but missing from Fossil
\* Because of its emphasis on recording history exactly as it happened, rather than as we would have liked it to happen, Fossil deliberately does not provide a "rebase" command. One can rebase manually in Fossil, with sufficient perseverance, but it is not something that can be done with a single command.
\* Push or pull a single branch. The fossil push, fossil pull, and fossil sync commands do not provide the capability to push or pull individual branches. Pushing and pulling in Fossil is all or nothing. This is in keeping with Fossilʼs emphasis on maintaining a complete record and on sharing everything between all developers.
~~What the Fuck~~ :) WTF terminal dashboard
WTF is a personal information dashboard for your terminal, developed for those who spend most of their day in the command line. It allows you to monitor systems, services, and important information that you otherwise might keep browser tabs open for, the kinds of things you don’t always need visible, but do check in on every now and then.
Keep an eye on your OpsGenie schedules, Google Calendar, Git and GitHub repositories, and New Relic deployments.
See who’s away in BambooHR, which Jira tickets are assigned to you, and what time it is in Barcelona.
It even has weather. And clocks. And emoji.
- Download the stand-alone, compiled binary.
- Unzip the downloaded file.
- From the command line, cd into the newly-created /wtf directory.
- From the command line, run the app: ./wtf
... or install from source:
go get -u github.com/wtfutil/wtf
cd $GOPATH/src/github.com/wtfutil/wtf
make install
make run
By default WTF looks in a ~/.config/wtf/ directory for a YAML file called config.yml. If the ~/.config/wtf/ directory doesn’t exist, WTF will create that directory on start-up, and then display instructions for creating a new configuration file.
In other words, WTF expects to have a YAML config file at: ~/.config/wtf/config.yml.
Example Configuration Files
A couple of example config files are provided in the _sample_configs/ directory of the Git repository.
To try out WTF quickly, copy simple_config.yml into ~/.config/wtf/ as config.yml and relaunch WTF. You should see the app launch and display the Security, Clocks and Status widgets onscreen.
Custom Configuration Files
To try out different configurations (or run multiple instances of WTF), you can pass the path to a config file via command line arguments on start-up.
To load a custom configuration file (i.e., one that’s not ~/.config/wtf/config.yml), pass in the path to the configuration file as a parameter on launch:
$> wtf --config=path/to/custom/config.yml
A number of top-level attributes can be set to customize your WTF install. See Attributes for details.
WTF uses the Grid layout system from tview to position widgets onscreen. It’s not immediately obvious how this works, so here’s an explanation:
Think of your terminal screen as a matrix of letter positions, say 100 characters wide and 58 characters tall.
Columns break up the width of the screen into chunks, each chunk a specified number of characters wide. If we wanted ten columns that are each ten characters wide:
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
Rows break up the height of the screen into chunks, each chunk a specified number of characters tall. If we wanted to have five rows:
[10, 10, 10, 10, 18]
The co-ordinate system starts at top-left and defines how wide and tall a widget is. If we wanted to put a 2-col, 2-row widget in the bottom-right corner of the screen, we’d position it at:
top: 4 // top starts in the 4th row
left: 9 // left starts in the 9th column
height: 2 // span down rows 4 & 5 (28 characters in size, total)
width: 2 // span across cols 9 & 10 (20 characters in size, total)
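The arithmetic behind a widget's position can be sketched in a few lines. This is only an illustration of the sums involved, not how WTF or tview computes layout internally. The helper `widget_box` is hypothetical, and it assumes `top` and `left` count from 1, as the "4th row" and "9th column" comments read; whether the real configuration counts from 0 or 1 should be checked against the WTF documentation.

```python
def widget_box(columns, rows, top, left, height, width):
    """Return (x, y, w, h) in characters for a widget placed on the grid.

    Assumption: top and left are 1-based positions into rows and columns.
    """
    x = sum(columns[: left - 1])                   # characters to the left of the widget
    y = sum(rows[: top - 1])                       # lines above the widget
    w = sum(columns[left - 1 : left - 1 + width])  # total width of the spanned columns
    h = sum(rows[top - 1 : top - 1 + height])      # total height of the spanned rows
    return x, y, w, h


columns = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
rows = [10, 10, 10, 10, 18]

print(widget_box(columns, rows, top=4, left=9, height=2, width=2))
```

The widget's size is simply the sum of the column and row entries it spans, so uneven rows (like the final 18-line row here) change the widget's height without changing its `height` setting.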
The heart of WTF is the modules. A module is a discrete unit of functionality that extracts data from some source and packages that data for display.
For example, the New Relic module uses New Relic’s API to retrieve a list of the latest deploys and packages that information as a list for display in the “New Relic” widget.
The Clocks module takes a list of timezones and packages that information as a list of city/time pairs for display in the “Clocks” widget.
The following top-level attributes are configurable in config.yml. See this example config file for more details.
wtf:
  colors:
    background: "red"
    border:
      focusable: "darkslateblue"
      focused: "orange"
      normal: "gray"
  grid:
    # How _wide_ the columns are, in terminal characters. In this case we have
    # six columns, each of which is 35 characters wide
    columns: [35, 35, 35, 35, 35, 35]
    # How _high_ the rows are, in terminal lines. In this case we have five rows
    # that support ten lines of text, one of three lines, and one of four
    rows: [10, 10, 10, 10, 10, 3, 4]
  openFileUtil: open # the name of the utility to call to open files
  refreshInterval: 1 # the app refreshes once per second
  term: "xterm-256color"
#wtf #gnu #linux #console #terminal #util #tool #dashboard #soft #programm #go #git #github #command
Jeff Trull (@JaafarTrull) tweeted:
Fantastic presentation on Magit (#git mode in #Emacs) by John Wiegley at the MV meetup last month: https://t.co/aShBmLKF0S https://twitter.com/JaafarTrull/status/1080319045943971840?s=17
♲ Montag (email@example.com):
Hello @Friendica Support
When I try to update my Friendica node with git pull, an error message appears:
$ git pull
*** Please tell me who you are.
Run
  git config --global user.email "firstname.lastname@example.org"
  git config --global user.name "Your Name"
to set your account's default identity.
Omit --global to set the identity only in this repository.
fatal: empty ident name (for <...>) not allowed
Did anybody else see this error message? It only appears on the Friendica core repository; updating the addons works fine.
#friendica #git #github
HN Discussion: https://news.ycombinator.com/item?id=18309596
Posted by wheresvic1 (karma: 2030)
Post stats: Points: 145 - Comments: 18 - 2018-10-26T14:39:34Z
\#HackerNews #architecture #git #the
Git enables the maintenance of a digital body of work (often, but not limited to, code) by many collaborators using a peer-to-peer network of repositories. It supports distributed workflows, allowing a body of work to either eventually converge or temporarily diverge.
This chapter will show how various aspects of Git work under the covers to enable this, and how it differs from other version control systems (VCSs).
To understand Gitʼs design philosophy better it is helpful to understand the circumstances in which the Git project was started in the Linux Kernel Community.
The Linux kernel was unusual, compared to most commercial software projects at that time, because of the large number of committers and the high variance of contributor involvement and knowledge of the existing codebase. The kernel had been maintained via tarballs and patches for years, and the core development community struggled to find a VCS that satisfied most of their needs.
Git is an open source project that was born out of those needs and frustrations in 2005. At that time the Linux kernel codebase was managed across two VCSs, BitKeeper and CVS, by different core developers. BitKeeper offered a different view of VCS history lineage than that offered by the popular open source VCSs at this time.
Days after BitMover, the maker of BitKeeper, announced it would revoke the licenses of some core Linux kernel developers, Linus Torvalds began development, in haste, of what was to become Git. He began by writing a collection of scripts to help him manage email patches to apply one after the other. The aim of this initial collection of scripts was to be able to abort merges quickly so the maintainer could modify the codebase mid-patch-stream to manually merge, then continue merging subsequent patches.
From the outset, Torvalds had one philosophical goal for Git—to be the anti-CVS—plus three usability design goals:
\* Support distributed workflows similar to those enabled by BitKeeper
\* Offer safeguards against content corruption
\* Offer high performance
These design goals have been accomplished and maintained, to a degree, as I will attempt to show by dissecting Gitʼs use of directed acyclic graphs (DAGs) for content storage, reference pointers for heads, object model representation, and remote protocol; and finally how Git tracks the merging of trees.
Despite BitKeeper influencing the original design of Git, it is implemented in fundamentally different ways and allows even more distributed plus local-only workflows, which were not possible with BitKeeper. Monotone, an open source distributed VCS started in 2003, was likely another inspiration during Gitʼs early development.
Distributed version control systems offer great workflow flexibility, often at the expense of simplicity. Specific benefits of a distributed model include:
\* Providing the ability for collaborators to work offline and commit incrementally.
\* Allowing a collaborator to determine when his/her work is ready to share.
\* Offering the collaborator access to the repository history when offline.
\* Allowing the managed work to be published to multiple repositories, potentially with different branches or granularity of changes visible.
Around the time the Git project started, three other open source distributed VCS projects were initiated. (One of them, Mercurial, is discussed in Volume 1 of The Architecture of Open Source Applications.) All of these dVCS tools offer slightly different ways to enable highly flexible workflows, which centralized VCSs before them were not capable of handling directly. Note: Subversion has an extension named SVK maintained by different developers to support server-to-server synchronization.
Today popular and actively maintained open source dVCS projects include Bazaar, Darcs, Fossil, Git, Mercurial, and Veracity.
Now is a good time to take a step back and look at the alternative VCS solutions to Git. Understanding their differences will allow us to explore the architectural choices faced while developing Git.
A version control system usually has three core functional requirements, namely:
\* Storing content
\* Tracking changes to the content (history including merge metadata)
\* Distributing the content and history with collaborators
Note: The third requirement above is not a functional requirement for all VCSs.
The most common design choices for storing content in the VCS world are with a delta-based changeset, or with directed acyclic graph (DAG) content representation.
Delta-based changesets encapsulate the differences between two versions of the flattened content, plus some metadata. Representing content as a directed acyclic graph involves objects forming a hierarchy which mirrors the contentʼs filesystem tree as a snapshot of the commit (reusing the unchanged objects inside the tree where possible). Git stores content as a directed acyclic graph using different types of objects. The "Object Database" section later in this chapter describes the different types of objects that can form DAGs inside the Git repository.
On the history and change-tracking front most VCS software uses one of the following approaches:
\* Linear history
\* Directed acyclic graph for history
Again Git uses a DAG, this time to store its history. Each commit contains metadata about its ancestors; a commit in Git can have zero or many (theoretically unlimited) parent commits. For example, the first commit in a Git repository would have zero parents, while the result of merging three branches at once would have three parents.
Another primary difference between Git and Subversion (along with Subversionʼs linear-history ancestors) is Gitʼs ability to directly support branching in a way that records most merge history cases.
Figure 6.1: Example of a DAG representation in Git
Git enables full branching capability using directed acyclic graphs to store content. The history of a file is linked all the way up its directory structure (via nodes representing directories) to the root directory, which is then linked to a commit node. This commit node, in turn, can have one or more parents. This affords Git two properties that allow us to reason about history and content in more definite ways than the family of VCSs derived from RCS do, namely:
\* When a content (i.e., file or directory) node in the graph has the same reference identity (the SHA in Git) as that in a different commit, the two nodes are guaranteed to contain the same content, allowing Git to short-circuit content diffing efficiently.
\* When merging two branches we are merging the content of two nodes in a DAG. The DAG allows Git to "efficiently" (as compared to the RCS family of VCS) determine common ancestors.
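As a rough illustration of the second property, the sketch below models a commit graph as a mapping from each commit to its parents and finds the best common ancestors of two commits. The graph, the commit names, and the helper functions are all hypothetical, and the search is deliberately naive; it is not the algorithm Git itself uses.

```python
# Hypothetical commit graph: commit id -> list of parent ids.
parents = {
    "A": [],          # root commit, zero parents
    "B": ["A"],
    "C": ["B"],       # tip of one branch
    "D": ["B"],       # tip of another branch
    "M": ["C", "D"],  # merge commit with two parents
}


def ancestors(commit):
    """Return the set containing the commit and everything reachable via parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(left, right):
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(left) & ancestors(right)
    return {
        c
        for c in common
        if not any(c != other and c in ancestors(other) for other in common)
    }


print(merge_bases("C", "D"))
```

Because every commit records its parents, the common ancestor falls out of two graph walks and a set intersection; no per-file revision history has to be consulted.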
VCS solutions have handled content distribution of a working copy to collaborators on a project in one of three ways:
\* Local-only: for VCS solutions that do not have the third functional requirement above.
\* Central server: where all changes to the repository must transact via one specific repository for it to be recorded in history at all.
\* Distributed model: where there will often be publicly accessible repositories for collaborators to "push" to, but commits can be made locally and pushed to these public nodes later, allowing offline work.
To demonstrate the benefits and limitations of each major design choice, we will consider a Subversion repository and a Git repository (on a server), with equivalent content (i.e., the HEAD of the default branch in the Git repository has the same content as the Subversion repositoryʼs latest revision on trunk). A developer, named Alex, has a local checkout of the Subversion repository and a local clone of the Git repository.
Let us say Alex makes a change to a 1 MB file in the local Subversion checkout, then commits the change. Locally, the checkout of the file mimics the latest change and local metadata is updated. During Alexʼs commit in the centralized Subversion repository, a diff is generated between the previous snapshot of the files and the new changes, and this diff is stored in the repository.
Contrast this with the way Git works. When Alex makes the same modification to the equivalent file in the local Git clone, the change will be recorded locally first, then Alex can "push" the local pending commits to a public repository so the work can be shared with other collaborators on the project. The content changes are stored identically for each Git repository that the commit exists in. Upon the local commit (the simplest case), the local Git repository will create a new object representing a file for the changed file (with all its content inside). For each directory above the changed file (plus the repository root directory), a new tree object is created with a new identifier. A DAG is created starting from the newly created root tree object pointing to blobs (reusing existing blob references where the fileʼs content has not changed in this commit) and referencing the newly created blob in place of that fileʼs previous blob object in the previous tree hierarchy. (A blob represents a file stored in the repository.)
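The way identifiers follow from content can be sketched with Python's hashlib. The sketch assumes the classic SHA-1 object format, in which an object's id is the hash of a short header (`<type> <size>` plus a NUL byte) followed by the body. The tree helper is a simplification: it handles only a flat directory of regular files with mode 100644, and the file names and contents are made up for illustration.

```python
import hashlib


def object_id(kind, body):
    """SHA-1 over '<kind> <size>', a NUL byte, and then the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    """Id of a blob holding the given file content (bytes)."""
    return object_id("blob", content)


def tree_id(files):
    """Id of a flat tree of regular files, given {name: content bytes}.

    Simplification: no subdirectories, and every entry uses mode 100644.
    """
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        entry_sha = bytes.fromhex(blob_id(files[name]))
        body += b"100644 " + name.encode() + b"\x00" + entry_sha
    return object_id("tree", body)


# Hypothetical working directory before and after editing one file.
before = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
after = dict(before, README=b"hello, world\n")

print(blob_id(before["main.c"]) == blob_id(after["main.c"]))  # unchanged file keeps its blob id
print(tree_id(before) != tree_id(after))                      # changed file gives the tree a new id
```

Only the edited file gets a new blob id; the untouched file's blob can be referenced again as-is, while the tree that lists both files necessarily receives a new identifier.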
At this point the commit is still local to the current Git clone on Alexʼs local device. When Alex "pushes" the commit to a publicly accessible Git repository this commit gets sent to that repository. After the public repository verifies that the commit can apply to the branch, the same objects are stored in the public repository as were originally created in the local Git repository.
There are a lot more moving parts in the Git scenario, both under the covers and for the user, requiring them to explicitly express intent to share changes with the remote repository separately from tracking the change as a commit locally. However, both levels of added complexity offer the team greater flexibility in terms of their workflow and publishing capabilities, as described in the "Gitʼs Origin" section above.
In the Subversion scenario, the collaborator did not have to remember to push to the public remote repository when ready for others to view the changes made. When a small modification to a larger file is sent to the central Subversion repository the delta stored is much more efficient than storing the complete file contents for each version. However, as we will see later, there is a workaround for this that Git takes advantage of in certain scenarios.
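The storage trade-off can be illustrated by comparing the size of a full snapshot with the size of a textual delta for a one-line change. This sketch uses Python's difflib purely for illustration; it is not the delta format that Subversion or Git actually use, and the file contents are made up.

```python
import difflib

# A hypothetical large text file and a one-line modification to it.
old_lines = [f"line {i}\n" for i in range(20_000)]
new_lines = list(old_lines)
new_lines[10_000] = "line 10000 (modified)\n"

# Snapshot storage keeps the complete new content.
full_snapshot = "".join(new_lines)

# Delta storage keeps only what changed (plus a little context).
delta = "".join(difflib.unified_diff(old_lines, new_lines, "old", "new"))

snapshot_size = len(full_snapshot.encode())
delta_size = len(delta.encode())
print(snapshot_size, delta_size)
```

For a small edit to a large file the delta is a tiny fraction of the snapshot, which is the efficiency advantage the paragraph above attributes to the Subversion scenario.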
Today the Git ecosystem includes many command-line and UI tools on a number of operating systems (including Windows, which was originally barely supported). Most of these tools are built on top of the Git core toolkit.
Due to the way Git was originally written by Linus, and its inception within the Linux community, it was written with a toolkit design philosophy very much in the Unix tradition of command line tools.
The Git toolkit is divided into two parts: the plumbing and the porcelain. The plumbing consists of low-level commands that enable basic content tracking and the manipulation of directed acyclic graphs (DAG). The porcelain is the smaller subset of git commands that most Git end users are likely to need to use for maintaining repositories and communicating between repositories for collaboration.
While the toolkit design has provided enough commands to offer fine-grained access to functionality for many scripters, application developers complained about the lack of a linkable library for Git. Since the Git binary calls die(), it is not reentrant and GUIs, web interfaces or longer running services would have to fork/exec a call to the Git binary, which can be slow.
Work is being done to improve the situation for application developers; see the "Current And Future Work" section for more information.
Letʼs get our hands dirty and dive into using Git locally, if only to understand a few fundamental concepts.
First, to create a new initialized Git repository on our local filesystem (using a Unix-inspired operating system), we can do:
$ mkdir testgit
$ cd testgit
$ git init
Now we have an empty, but initialized, Git repository sitting in our testgit directory. We can branch, commit, tag and even communicate with other local and remote Git repositories. Even communication with other types of VCS repositories is possible with just a handful of git commands.
The git init command creates a .git subdirectory inside of testgit. Letʼs have a peek inside it:
.git/
|-- HEAD
|-- config
|-- description
|-- hooks
|   |-- applypatch-msg.sample
|   |-- commit-msg.sample
|   |-- post-commit.sample
|   |-- post-receive.sample
|   |-- post-update.sample
|   |-- pre-applypatch.sample
|   |-- pre-commit.sample
|   |-- pre-rebase.sample
|   |-- prepare-commit-msg.sample
|   |-- update.sample
|-- info
|   |-- exclude
|-- objects
|   |-- info
|   |-- pack
|-- refs
    |-- heads
    |-- tags
The .git directory above is, by default, a subdirectory of the root working directory, testgit. It contains a few different types of files and directories:
* Configuration: the .git/config, .git/description and .git/info/exclude files essentially help configure the local repository.
* Hooks: the .git/hooks directory contains scripts that can be run on certain lifecycle events of the repository.
* Staging Area: the .git/index file (which is not yet present in our tree listing above) will provide a staging area for our working directory.
* Object Database: the .git/objects directory is the default Git object database, which contains all content or pointers to local content. All objects are immutable once created.
* References: the .git/refs directory is the default location for storing reference pointers for both local and remote branches, tags and heads. A reference is a pointer to an object, usually of type tag or commit. References are managed outside of the Object Database to allow the references to change where they point to as the repository evolves. Special cases of references may point to other references, e.g. HEAD.
The .git directory is the actual repository. The directory that contains the working set of files is the working directory, which is typically the parent of the .git directory (or repository). If you were creating a Git remote repository that would not have a working directory, you could initialize it using the git init --bare command. This would create just the pared-down repository files at the root, instead of creating the repository as a subdirectory under the working tree.
Another file of great importance is the Git index: .git/index. It provides the staging area between the local working directory and the local repository. The index is used to stage specific changes within one file (or more), to be committed all together. Even if you make changes related to various types of features, like changes can be committed together, so that each commit message describes them more logically. To selectively stage specific changes in a file or set of files you can use git add -p.
The Git index, by default, is stored as a single file inside the repository directory. The paths to the repository, the index and the working directory can be customized using environment variables.
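As a rough sketch of how such overrides could be resolved, the snippet below reads the GIT_DIR, GIT_INDEX_FILE and GIT_WORK_TREE environment variables and falls back to simplified defaults. The function name git_areas and the fallback values are assumptions made for illustration; Git itself applies more elaborate discovery rules.

```python
import os


def git_areas(env=None):
    """Return the paths of the repository, index and working directory.

    Simplified defaults, assumed for illustration: the repository is ./.git,
    the index is <repository>/index, and the working directory is ".".
    """
    env = os.environ if env is None else env
    git_dir = env.get("GIT_DIR", ".git")
    return {
        "repository": git_dir,
        "index": env.get("GIT_INDEX_FILE", os.path.join(git_dir, "index")),
        "working directory": env.get("GIT_WORK_TREE", "."),
    }


# No overrides, then a hypothetical setup with a separate repository and checkout.
print(git_areas({}))
print(git_areas({"GIT_DIR": "/srv/repo.git", "GIT_WORK_TREE": "/srv/checkout"}))
```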
It is helpful to understand the interactions that take place between these three areas (the repository, index and working areas) during the execution of a few core Git commands:
* git checkout [branch]: This will move the HEAD reference of the local repository to the branch reference path (e.g. refs/heads/master), populate the index with this head data and refresh the working directory to represent the tree at that head.
* git add [files]: This will cross reference the checksums of the files specified with the corresponding entries in the Git index to see if the index for staged files needs updating with the working directoryʼs version. Nothing changes in the Git directory (or repository).
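The checksum comparison described for git add can be illustrated with a toy model. Everything here is an assumption made for the sketch: the index is a plain dictionary from path to checksum, the working directory is a dictionary from path to file content, and add is a hypothetical function, not the real command.

```python
import hashlib


def checksum(content):
    """Checksum used by this toy index (SHA-1 of the raw content)."""
    return hashlib.sha1(content).hexdigest()


def add(index, working_dir, paths):
    """Update index entries whose checksum differs; return the updated paths."""
    updated = []
    for path in paths:
        digest = checksum(working_dir[path])
        if index.get(path) != digest:
            index[path] = digest
            updated.append(path)
    return updated


working_dir = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
index = {}

first = add(index, working_dir, ["README", "main.c"])   # both paths are new
working_dir["README"] = b"hello, world\n"
second = add(index, working_dir, ["README", "main.c"])  # only README changed

print(first)
print(second)
```

Only entries whose checksum no longer matches the working directory version are rewritten, which is the cross-referencing step the list item describes.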
Let us explore what this means more concretely by inspecting the contents of files under the .git directory (or repository).
$ GIT_DIR=$PWD/.git
$ cat $GIT_DIR/HEAD
ref: refs/heads/master
$ MY_CURRENT_BRANCH=$(cat .git/HEAD | sed 's/ref: //g')
$ cat $GIT_DIR/$MY_CURRENT_BRANCH
cat: .git/refs/heads/master: No such file or directory
We get an error because no branch reference exists until the first commit is made. HEAD already names the default branch, master, whether or not that reference exists yet.
Now if we make a new commit, the master branch is created by default for this commit. Let us do this (continuing in the same shell, retaining history and context):
$ git commit -m "Initial empty commit" --allow-empty
$ git branch
* master
$ cat $GIT_DIR/$MY_CURRENT_BRANCH
3bce5b130b17b7ce2f98d17b2998e32b1bc29d68
$ git cat-file -p $(cat $GIT_DIR/$MY_CURRENT_BRANCH)
What we are starting to see here is the content representation inside Gitʼs object database.
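The HEAD lookup performed in the shell sessions above can also be sketched in a few lines of Python. This is a simplified illustration that only understands loose reference files; resolve_head is a hypothetical helper, and the directory it inspects is a throwaway stand-in built by the script, not a real repository.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return (ref_name, commit_id); commit_id is None if the branch has no commits."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return None, head  # HEAD holds an object id directly
    ref_name = head[len("ref: "):]
    ref_path = os.path.join(git_dir, ref_name)
    if not os.path.exists(ref_path):
        return ref_name, None
    with open(ref_path) as f:
        return ref_name, f.read().strip()


# Build a throwaway directory that mimics a freshly initialized repository.
with tempfile.TemporaryDirectory() as git_dir:
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    before = resolve_head(git_dir)

    # Stand-in for the first commit: create the branch reference file by hand.
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("3bce5b130b17b7ce2f98d17b2998e32b1bc29d68\n")
    after = resolve_head(git_dir)

print(before)
print(after)
```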
Figure 6.2: Git objects
Git has four basic primitive objects that every type of content in the local repository is built around. Each object type has the following attributes: type, size and content. The primitive object types are:
* Tree: an element in a tree can be another tree or a blob, when representing a content directory.
* Blob: a blob represents a file stored in the repository.
* Commit: a commit points to a tree representing the top-level directory for that commit as well as parent commits and standard attributes.
* Tag: a tag has a name and points to a commit at the point in the repository history that the tag represents.
All object primitives are referenced by a SHA, a 40-digit hexadecimal object identity, which has the following properties:
* If two objects are identical they will have the same SHA.
* If two objects are different they will have different SHAs.
* If an object was only copied partially or another form of data corruption occurred, recalculating the SHA of the current object will identify such corruption.
The first two properties of the SHA, relating to identity of the objects, are most useful in enabling Gitʼs distributed model (the second goal of Git). The latter property enables some safeguards against corruption (the third goal of Git).
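These properties can be observed directly by computing an object identity by hand. The sketch below hashes a header of the form "type size" followed by a NUL byte and then the content, which is how Git derives the identity of a loose object; the helper name git_object_id is an assumption made for illustration.

```python
import hashlib


def git_object_id(obj_type, content):
    """SHA-1 over the header "<type> <size>", a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


a = git_object_id("blob", b"hello world\n")
b = git_object_id("blob", b"hello world\n")
c = git_object_id("blob", b"hello world")  # partial copy, final byte missing

print(a)
print(a == b)  # identical content gives the same identity
print(a == c)  # the partial copy no longer matches the recorded identity
```

Recomputing the identity of a stored object and comparing it with the name it was stored under is enough to notice a partial copy such as c.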
Despite the desirable results of using DAG-based storage for content and merge histories, for many repositories delta storage will be more space-efficient than using loose DAG objects.
Git tackles the storage space problem by packing objects in a compressed format, using an index file which points to offsets to locate specific objects in the corresponding packed file.
Figure 6.3: Diagram of a pack file with corresponding index file
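The idea of an index that points to offsets inside a packed file can be sketched with a toy format. This is not the actual Git pack layout: build_pack and read_object are hypothetical helpers, objects are identified by a SHA-1 of their raw content, and the index is an in-memory dictionary.

```python
import hashlib
import zlib


def build_pack(objects):
    """Concatenate compressed objects; return (pack_bytes, index).

    index maps object id -> (offset, length) inside pack_bytes.
    """
    pack = bytearray()
    index = {}
    for content in objects:
        object_id = hashlib.sha1(content).hexdigest()
        compressed = zlib.compress(content)
        index[object_id] = (len(pack), len(compressed))
        pack += compressed
    return bytes(pack), index


def read_object(pack, index, object_id):
    """Locate one object through the index and decompress only that slice."""
    offset, length = index[object_id]
    return zlib.decompress(pack[offset:offset + length])


pack, index = build_pack([b"first file\n", b"second file\n"])
wanted = hashlib.sha1(b"second file\n").hexdigest()
print(read_object(pack, index, wanted))
```

The lookup never scans the pack: the index yields the offset, and only the bytes of the requested object are decompressed.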
We can count the number of loose (or unpacked) objects in the local Git repository using git count-objects. Now we can have Git pack loose objects in the object database, remove loose objects already packed, and find redundant pack files with Git plumbing commands if desired.
The pack file format in Git has evolved, with the initial format storing CRC checksums for the pack file and index file in the index file itself. However, this meant there was the possibility of undetectable corruption in the compressed data since the repacking phase did not involve any further checks. Version 2 of the pack file format overcomes this problem by including the CRC checksums of each compressed object in the pack index file. Version 2 also allows packfiles larger than 4 GB, which the initial format did not support. As a way to quickly detect pack file corruption the end of the pack file contains a 20-byte SHA1 sum of the ordered list of all the SHAs in that file. The emphasis of the newer pack file format is on helping fulfill Gitʼs second usability design goal of safeguarding against data corruption.
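A per-object checksum of the compressed data, as described for the version 2 index, can be illustrated as follows. The entry layout and the helper names index_entry and is_intact are assumptions for this sketch and do not mirror the on-disk format.

```python
import zlib


def index_entry(compressed):
    """Record the size and CRC32 of one compressed object."""
    return {"size": len(compressed), "crc32": zlib.crc32(compressed)}


def is_intact(compressed, entry):
    """Check the compressed bytes against the checksum stored in the index entry."""
    return zlib.crc32(compressed) == entry["crc32"]


compressed = zlib.compress(b"some object content\n")
entry = index_entry(compressed)

# Simulate corruption of the stored data by flipping the bits of the last byte.
damaged = bytearray(compressed)
damaged[-1] ^= 0xFF
damaged = bytes(damaged)

print(is_intact(compressed, entry))
print(is_intact(damaged, entry))
```

Because the checksum covers the compressed bytes, damage can be noticed without decompressing the object first.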
For remote communication Git calculates the commits and content that need to be sent over the wire to synchronize repositories (or just a branch), and generates the pack file format on the fly to send back using the desired protocol of the client.
As mentioned previously, Git differs fundamentally from the RCS family of VCSs in its approach to merge history. Subversion, for example, represents file or tree history in a linear progression; whatever has a higher revision number will supersede anything before it. Branching is not supported directly, only through an unenforced directory structure within the repository.
Figure 6.4: Diagram showing merge history lineage
Let us first use an example to show how this can be problematic when maintaining multiple branches of a project. Then we will look at a scenario to show its limitations.
When working on a "branch" in Subversion at the typical root branches/branch-name, we are working on a directory subtree adjacent to the trunk (typically where the live or master-equivalent code resides). Let us say this branch is to represent parallel development of the trunk tree.
For example, we might be rewriting a codebase to use a different database. Part of the way through our rewrite we wish to merge in upstream changes from another branch subtree (not trunk). We merge in these changes, manually if necessary, and proceed with our rewrite. Later that day we finish our database vendor migration code changes on our branches/branch-name branch and merge our changes into trunk. The problem with the way linear-history VCSs like Subversion handle this is that there is no way to know that the changesets from the other branch are now contained within the trunk.
DAG-based merge history VCSs, like Git, handle this case reasonably well. Assuming the other branch does not contain commits that have not been merged into our database vendor migration branch (say, db-migration in our Git repository), we can determine—from the commit object parent relationships—that a commit on the db-migration branch contained the tip (or HEAD) of the other upstream branch. Note that a commit object can have zero or more (bounded by only the abilities of the merger) parents. Therefore the merge commit on the db-migration branch knows it merged in the current HEAD of the current branch and the HEAD of the other upstream branch through the SHA hashes of the parents. The same is true of the merge commit in the master (the trunk equivalent in Git).
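The parent-relationship reasoning above amounts to a reachability check over the commit graph. The sketch below uses a hypothetical five-commit history (the commit names and the ancestors helper are invented for illustration) in which the db-migration branch merges another upstream branch and is then merged into master.

```python
def ancestors(commits, tip):
    """Return every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen


# Hypothetical history: commit id -> list of parent ids.
commits = {
    "root": [],
    "upstream-1": ["root"],
    "db-1": ["root"],
    "db-merge": ["db-1", "upstream-1"],    # db-migration merges the upstream branch
    "master-merge": ["root", "db-merge"],  # master merges db-migration
}

print("upstream-1" in ancestors(commits, "db-merge"))
print("upstream-1" in ancestors(commits, "master-merge"))
print("upstream-1" in ancestors(commits, "db-1"))
```

The upstream commit is reachable from both merge commits but not from the branch as it stood before the merge, which is exactly the information a linear history cannot provide.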
A question that is hard to answer definitively using DAG-based (and linear-based) merge histories is which commits are contained within each branch. For example, in the above scenario we assumed we merged into each branch all the changes from both branches. This may not be the case.
For simpler cases Git has the ability to cherry-pick commits from other branches into the current branch, assuming the commit can be applied cleanly to the branch.
As mentioned previously, Git core as we know it today is based on a toolkit design philosophy from the Unix world, which is very handy for scripting but less useful for embedding inside or linking with longer running applications or services. While there is Git support in many popular Integrated Development Environments today, adding this support and maintaining it has been more challenging than integrating support for VCSs that provide an easy-to-link-and-share library for multiple platforms.
To combat this, Shawn Pearce (of Googleʼs Open Source Programs Office) spearheaded an effort to create a linkable Git library with more permissive licensing that did not inhibit use of the library. This was called libgit2. It did not find much traction until a student named Vicent Martí chose it for his Google Summer of Code project last year. Since then Vicent and GitHub engineers have continued contributing to the libgit2 project, and created bindings for numerous other popular languages such as Ruby, Python, PHP, .NET languages, Lua, and Objective-C.
Shawn Pearce also started a BSD-licensed pure Java library called JGit that supports many common operations on Git repositories. It is now maintained by the Eclipse Foundation for use in the Eclipse IDE Git integration.
Other interesting and experimental open source endeavours outside of the Git core project are a number of implementations using alternative datastores as backends for the Git object database such as:
* jgit_cassandra, which offers Git object persistence using Apache Cassandra, a hybrid datastore using Dynamo-style distribution with BigTable column family data model semantics.
* jgit_hbase, which enables read and write operations to Git objects stored in HBase, a distributed key-value datastore.
* libgit2-backends, which emerged from the libgit2 effort to create Git object database backends for multiple popular datastores such as Memcached, Redis, SQLite, and MySQL.
All of these open source projects are maintained independently of the Git core project.
As you can see, today there are a large number of ways to use the Git format. The face of Git is no longer just the toolkit command line interface of the Git Core project; rather it is the repository format and protocol to share between repositories.
As of this writing, most of these projects, according to their developers, have not reached a stable release, so work in this area still needs to be done, but the future of Git appears bright.
In software, every design decision is ultimately a trade-off. As a power user of Git for version control and as someone who has developed software around the Git object database model, I have a deep fondness for Git in its present form. Therefore, these lessons learned are more of a reflection of common recurring complaints about Git that are due to design decisions and focus of the Git core developers.
One of the most common complaints by developers and managers who evaluate Git has been the lack of IDE integration on par with other VCS tools. The toolkit design of Git has made this more challenging than integrating other modern VCS tools into IDEs and related tools.
Earlier in Gitʼs history some of the commands were implemented as shell scripts. These shell script command implementations made Git less portable, especially to Windows. I am sure the Git core developers did not lose sleep over this fact, but it has negatively impacted adoption of Git in larger organizations due to portability issues that were prevalent in the early days of Gitʼs development. Today a project named Git for Windows has been started by volunteers to ensure new versions of Git are ported to Windows in a timely manner.
An indirect consequence of designing Git around a toolkit with a lot of plumbing commands is that new users get lost quickly. From confusion about all the available subcommands to not being able to understand error messages because a low-level plumbing task failed, there are many places for new users to go astray. This has made adopting Git harder for some developer teams.
Even with these complaints about Git, I am excited about the possibilities of future development on the Git Core project, plus all the related open source projects that have been launched from it.