Downloading an Application's Entire Source Code Through an Exposed Git Directory
During a penetration test on a web application, it is very common to use automated tools such as DIRB to find sensitive files and directories. DIRB takes a dictionary-based approach to look for files and directories that are "hidden", that is, not linked anywhere on the site.
DIRB is included in Kali, but can also be downloaded from http://dirb.sourceforge.net/.
To run the tool, just provide the URL to the website you would like to test:
# dirb https://www.example.com
Running this tool against our target website turned up the folder ".git", which is used by the source code management system of the same name, Git. The folder had been exposed to the Internet, possibly because it was forgotten about or because the web admin never expected anyone to find it.
For those unfamiliar with Git, it is a popular source code management system that lets developers keep track of every change made to the files in a repository. Because Git stores the committed versions of those files inside it, the ".git" folder contains a copy of the application's source code.
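A hit like this can be confirmed by hand by requesting ".git/HEAD", since in a normal repository that file holds either a branch reference or a commit hash. The sketch below is an illustration rather than part of the original test: the function names are made up, it assumes curl is installed, and it only recognizes SHA-1 style hashes.

```shell
# Hypothetical helpers (names are illustrative, not from the original test).

# Succeeds if the text on stdin looks like the contents of .git/HEAD:
# either "ref: refs/..." (a branch is checked out) or a bare 40-hex commit hash.
looks_like_git_head() {
    grep -Eq '^(ref: refs/|[0-9a-f]{40}$)'
}

# Fetches <base-url>/.git/HEAD and reports whether it looks like a real repository.
check_git_exposed() {
    base_url="${1%/}"
    if curl -s --max-time 10 "${base_url}/.git/HEAD" | looks_like_git_head; then
        echo "exposed: ${base_url}/.git/"
    else
        echo "not detected: ${base_url}/.git/"
        return 1
    fi
}

# Example (only against systems you are authorized to test):
# check_git_exposed https://www.example.com
```

Checking the content instead of just the status code helps avoid false positives on servers that answer every request with a 200 page.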
In our case, our target server was also misconfigured to allow for directory listings, making our job much easier.
With the directory listings flaw and the tool "wget", it is pretty straightforward to recursively download every file from the repository.
# wget -r https://www.example.com/.git
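A slightly more defensive invocation is sketched below. The options are standard wget flags, but the wrapper name and the choice of flags are assumptions, not what was used in the original test. --no-parent keeps the crawl from climbing out of ".git" through the "Parent Directory" link in each listing, and -e robots=off stops a robots.txt file from blocking the download.

```shell
# Hypothetical wrapper around wget (the name mirror_git_dir is illustrative).
# Run it from an empty working directory: with -nH the files land in ./.git/.
mirror_git_dir() {
    base_url="${1%/}"
    # -r             recurse through the directory listings
    # --no-parent    never follow links above /.git/
    # -nH            do not create a host-named directory (e.g. www.example.com/)
    # -e robots=off  ignore robots.txt exclusions
    wget -r --no-parent -nH -e robots=off "${base_url}/.git/"
}

# Example: mirror_git_dir https://www.example.com
```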
While wget is running, something interesting can be observed: the repository contains ".pack" files several megabytes in size. Pack files, in simple terms, are files in which Git stores many objects (such as versions of the source code) in compressed form, accompanied by an index used to locate each object. More information about pack files can be found at https://schacon.github.io/gitbook/7_the_packfile.html.
Once wget has finished downloading the folder, we are left with a local copy of the ".git" folder.
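For a closer look at what a pack file holds, git verify-pack can list its objects. This is a small sketch that assumes git is installed and the download has finished; the function name is hypothetical.

```shell
# Hypothetical helper: list the objects stored in every pack of a repository.
# Usage: list_pack_objects path/to/.git
list_pack_objects() {
    git_dir="$1"
    for idx in "${git_dir}"/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue   # skip if the repository has no packs
        # -v prints one line per object: hash, type, size, packed size, offset
        git verify-pack -v "$idx"
    done
}

# Example: list_pack_objects .git
```

Lines of type "blob" correspond to file contents, so the number of blobs gives a rough idea of how much source code the pack contains.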
If we try to run a Git command under this folder, for example "git status", it will return an error indicating that certain files have an incorrect filename. This is because wget also downloaded the HTML index files (e.g. index.html?C=D, index.html?C=M) for each folder and its sub-folders.
To use Git normally, it is necessary to eliminate these extra HTML files that don't belong. We can do so recursively with the "find" command and then try the git command again.
# find .git -type f -name 'index.htm*' -delete
# git status
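Before deleting anything, it can be worth previewing what the pattern matches. The sketch below wraps the same find expression in two hypothetical helper functions (the names are illustrative). The pattern leaves Git's own ".git/index" file alone, because that name has no ".htm" suffix.

```shell
# Hypothetical helpers around the find command used above.

# Dry run: print the listing pages wget saved, without deleting them.
preview_listing_files() {
    find "$1" -type f -name 'index.htm*' -print
}

# Delete the saved listing pages (index.html, index.html?C=D, ...).
remove_listing_files() {
    find "$1" -type f -name 'index.htm*' -delete
}

# Example:
# preview_listing_files .git
# remove_listing_files .git
```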
The Git repository is functional again, but because we only downloaded the ".git" folder and our "project" directory is otherwise empty, Git will list all the files as if they had been deleted. However, we can recover all the "deleted" files using the following command:
# git checkout -- .
This restores all of the application's missing files into the working directory.
With a full copy of the application's source code, the possibilities are endless. We can look for database credentials, run static analysis on the code, find hidden debugging parameters or vulnerabilities, and much more. We have basically converted a blackbox penetration test into a whitebox one.
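As a starting point for that kind of review, Git itself can search both the recovered files and the commit history. The following is a rough sketch: the keyword list and the function names are assumptions chosen for illustration, and a real review would tune them to the application.

```shell
# Hypothetical helpers; the keyword pattern is only an example.
SECRET_PATTERN='password|passwd|secret|api_key'

# Search the tracked files in the working directory for likely credentials
# (case-insensitive, extended regex, with line numbers).
grep_secrets_in_tree() {
    git grep -n -i -E "$SECRET_PATTERN"
}

# List commits whose diffs add or remove a matching line, so that
# credentials removed from the current files can still be found.
grep_secrets_in_history() {
    git log --oneline -i -G"$SECRET_PATTERN"
}

# Example (run inside the recovered repository):
# grep_secrets_in_tree
# grep_secrets_in_history
```

Searching the history matters because a secret that was committed once and later removed is still stored in the repository's objects.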
• Proprietary and sensitive information such as the source code can be obtained.
• The attack surface increases drastically, since we know the entire file and folder structure of the web application.
• By analyzing the source code, it is easier to find vulnerabilities (SQLi, file uploaders, RCE, etc.), even ones that might not have been possible to find otherwise.
The most direct solution is simply not to have the ".git" folder in a folder that is exposed to the Internet. However, if it is necessary to have it there, then there are other solutions available:
• Disable directory listing on the server.
• Configure mod_rewrite to disallow access to the ".git" directory (RewriteRule "^(.*/)?\.git/" - [F,L]).
• Assign the ".git" directory to a different user with strict permissions, so that the user the web server runs as cannot access it.
This post was originally written by Lenin Alevsk and translated by Roberto Salgado. The original post in Spanish can be found here.
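The last option could be applied roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the repository path is supplied by the caller, and the directory is assumed to already belong to an account other than the web server's. Changing the owner with chown generally requires root, so it is left as a comment.

```shell
# Hypothetical helper: remove all group/other permissions from a .git directory
# so that only its owner can read or traverse it.
lock_down_git_dir() {
    git_dir="$1"
    # If needed, first hand the directory to a non-web-server account, e.g.:
    #   chown -R deploy:deploy "$git_dir"    # "deploy" is a placeholder user
    chmod -R go-rwx "$git_dir"
}

# Example: lock_down_git_dir /var/www/html/.git   (placeholder path)
```

Afterwards, requesting ".git/HEAD" through the web server should fail, which is a quick way to check that the change took effect.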