Recently I completed a project in which I had to build an add-on for a marketing database application.  The add-on needed to upload a spreadsheet (I decided to use CSV as it is the easiest to work with), find all duplicates (in both the database and the spreadsheet itself) and then insert all unique entries into the database.  So, I stripped out all sensitive information and created these generic scripts to show you how to do it too.

There are two version of the script.   The first version is simple, it uploads the CSV file and then imports it to the database.  The other version displays analyzes the data from the file first and shows the user which entries are unique and which are duplicates, the user has to confirm this in order to import the data.

Uploading the file

First we need to get the file from the user.  In this case we will request the file through a simple HTML form, such as the one below.

<form enctype="multipart/form-data" action="importCSV.php" method="POST">
   <p>Choose a file to upload : <input name="file" type="file" /></p>
   <p><input type="submit" value="Upload File" /></p>
</form>

The file will be uploaded via the form and past to the importCSV.php script.  Now, we want to verify that the file being passed to the server is a CSV file.  If we don’t verify the file type, then the server might accept a file from a user with malicious intentions.


if($_FILES['file']['type'] != "application/vnd.ms-excel"){
   die("This is not a CSV file.");
}
elseif(is_uploaded_file($_FILES['file']['tmp_name'])){
   //Process 'file'
}
else{
   die("You shouldn't be here");
}

The first if statement checks to see that the file is a CSV file.  If it is not a CSV file, the process dies and reports an error.  The elseif statement then checks that the file was uploaded via HTTP POST, if so the file will be processed.  If the file was not uploaded via HTTP POST, then you will want to kill the process as you don’t know where the file came from.

NOTE: The name of the file input from the form is the same name as the $_FILES[''] variable.

Import the data

Now that the file has been uploaded and verified we want to import it into the database.  First, we need to connect to the database, then we need to get the information from the file and finally create an INSERT statement for each line of the CSV file.

//Connect to the database
$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$dbname = 'petstore';
$link = mysql_connect($dbhost, $dbuser, $dbpass) or die('Error connecting to mysql server');
mysql_select_db($dbname);

//Process CSV file
$handle = fopen($_FILES['file']['tmp_name'], "r");
$data = fgetcsv($handle, 1000, ";"); //Remove if CSV file does not have column headings
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
   $att0 = mysql_real_escape_string($data[0]);
   $att1 = mysql_real_escape_string($data[1]);
   $att2 = mysql_real_escape_string($data[2]);
   $att3 = mysql_real_escape_string($data[3]);

   $sql = "INSERT INTO `tbl_name` (
               `attribute0` ,
               `attribute1` ,
               `attribute2` ,
               `attribute3`
               )
               VALUES ('" . $att0 . "', '" . $att1 . "', '" . $att2 . "', '" . $att3 . "')";

   mysql_query($sql);
}

NOTE: You may need to change fgetcsv() parameters if you have a CSV file with lines longer than 1000 characters or if the value delimiter is not a semicolon.

NOTE: the first time fgetcsv() is called, the data is not used. This is because the first line of the file is the column headings. If your CSV file does not have column headings you should remove this line.

NOTE: The CSV file must be formatted identically every time in order for the script to be reusable.

Now to put the whole script together.


if($_FILES["file"]["type"] != "application/vnd.ms-excel"){
   die("This is not a CSV file.");
}
elseif(is_uploaded_file($_FILES['file']['tmp_name'])){
   //Connect to the database
   $dbhost = 'localhost';
   $dbuser = 'root';
   $dbpass = 'password';
   $dbname = 'petstore';
   $link = mysql_connect($dbhost, $dbuser, $dbpass) or die('Error connecting to mysql server');
   mysql_select_db($dbname);

   //Process the CSV file
   $handle = fopen($_FILES['file']['tmp_name'], "r");
   $data = fgetcsv($handle, 1000, ";"); //Remove if CSV file does not have column headings
   while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
      $att0 = mysql_real_escape_string($data[0]);
      $att1 = mysql_real_escape_string($data[1]);
      $att2 = mysql_real_escape_string($data[2]);
      $att3 = mysql_real_escape_string($data[3]);

      $sql = "INSERT INTO `tbl_name` (
                 `attribute0` ,
                 `attribute1` ,
                 `attribute2` ,
                 `attribute3`
                 )
                 VALUES ('" . $att0 . "', '" . $att1 . "', '" . $att2 . "', '" . $att3 . "')";

      mysql_query($sql);
   }
   mysql_close($link);
   echo "CSV file successfully imported.";
}
else{
   die("You shouldn't be here");
}

This simple upload script is also available for download here.

Analyze the data

Most of the time we will need to first analyze the imported data in order to find duplicates or errors in the data.  In order to do this we will have to first check each row, then report back to the user on all rows (indicating which rows have errors and which are good), then after user confirmation we will have to import the data.

First, I renamed the importCSV.php script to analyzeCSV.php.  Then, I changed the file upload form’s action to analyzeCSV.php.  Next, I removed the mysql_query() call and included a check to verify that the information is not already in the database.  If the entry is unique, the insert statement is added to a session variable which holds all unique inserts.  The script also adds each row of the CSV file to a table which indicates if the row is unique or not.

if($_FILES["file"]["type"] != "application/vnd.ms-excel"){
   die("This is not a CSV file.");
}
elseif(is_uploaded_file($_FILES['file']['tmp_name'])){
   session_start()
   //Connect to the database
   $dbhost = 'localhost';
   $dbuser = 'root';
   $dbpass = 'password';
   $dbname = 'petstore';
   $link = mysql_connect($dbhost, $dbuser, $dbpass) or die('Error connecting to mysql server');
   mysql_select_db($dbname);

   //Process the CSV file
   $findings = "
<form method=\"post\" action=\"importCSV.php\">
<table width=\"100%\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">
   <tr>
      <td>Result</td>
      <td>Attribute 0</td>
      <td>Attribute 1</td>
      <td>Attribute 2</td>
      <td>Attribute 3</td>
   </tr>";

   $handle = fopen($_FILES['file']['tmp_name'], "r");
   $data = fgetcsv($handle, 1000, ";"); //Remove if CSV file does not have column headings
   while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
      $att0 = mysql_real_escape_string($data[0]);
      $att1 = mysql_real_escape_string($data[1]);
      $att2 = mysql_real_escape_string($data[2]);
      $att3 = mysql_real_escape_string($data[3]);

      //Check if row is in database already
      $sql = "SELECT *
              FROM `tbl_name`
              WHERE `attribute0` = '" . $att0 . "'";    //In this example attribute0 is the primary key
      $result = mysql_query($sql);
      $count = mysql_num_rows($result);
      if($count > 0){
         $findings = $findings . "
   <tr>
      <td bgcolor=\"#FF0000\">DB Duplicate</td>
      <td>" . $att0 . "</td>
      <td>" . $att1 . "</td>
      <td>" . $att2 . "</td>
      <td>" . $att3 . "</td>
   </tr>";
      }

    //Row is unique
    else{
       //Add INSERT statement to INSERT queue
       $_SESSION['insert'] = $_SESSION['insert'] . "INSERT INTO `tbl_name` (
                                                       `attribute0` ,
                                                       `attribute1` ,
                                                       `attribute2` ,
                                                       `attribute3`
                                                       )
                                                       VALUES ('" . $att0 . "', '" . $att1 . "', '" . $att2 . "', '" . $att3 . "');";

      //Add row for row to findings table and mark unique
      $findings = $findings . "
   <tr>
      <td bgcolor=\"#00FF00\">&nbsp;</td>
      <td>" . $att0 . "</td>
      <td>" . $att1 . "</td>
      <td>" . $att2 . "</td>
      <td>" . $att3 . "</td>
   </tr>";
      }
   }
   mysql_close($link);

   $findings = $findings . "
   <tr>
      <td colspan=\"5\"><div align=\"center\"><input type=\"submit\" value=\"Confirm\" /></div></td>
   </tr>
</table>
</form>";
   echo $findings;
}
else{
   die("You shouldn't be here");
}

However, when I built my add-on I found that the CSV files being uploaded contained duplicates inside of the file, so we need to check that each row that isn’t already in the database isn’t in the queue to be added to the database.

//Check if row is already in INSERT queue
 elseif(strpos($_SESSION['insert'], "'" . $att0 . "'") != false){
   $findings = $findings . "
 <tr>
    <td bgcolor=\"#FF0000\">File Duplicate</td>
    <td>" . $att0 . "</td>
    <td>" . $att1 . "</td>
    <td>" . $att2 . "</td>
    <td>" . $att3 . "</td>
    </tr>";
 }

Now that all entries have been checked, and the findings have been reported back to the user.  After user confirmation, the system must execute all the insert statements.  This is done in a new script called importCSV.php which looks like this:


session_start();

//Make sure the insert query queue has been initialized
if(!isset($_SESSION['insert'])){
   die("You shouldn't be here.");
}

//Connect to the database
$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$dbname = 'petstore';
$link = mysql_connect($dbhost, $dbuser, $dbpass) or die('Error connecting to mysql server');
mysql_select_db($dbname);

$queries = explode(';', $_SESSION['insert']);

foreach($queries as $query){
   if($query != ""){
      mysql_query($query);
   }
}

mysql_close($link);

echo "Done";

The session variable ‘insert’ is the insert statement queue.  It is broken apart using explode() and the delimiter is a semicolon.  Each query is then executed and the system reports when done.

NOTE: The importCSV.php (in both the simple case and the analyze case) does not do any sort of error handling if a query fails to execute.

You can download upload, analyze, import script files here.

Conclusion

That’s it, you are free to download and use these scripts however you like.  Just remember to give credit where credit is due.

Once again here are the download links for the scripts:

Simple Upload and Import CSV

Upload, Analyze, and Import CSV

If you have any questions, please submit them as comments on this page.  This way other users can see your question and my answer.