Simple CSV file parsing in a Symfony Command class

Sometimes, in a Symfony command, you may need to parse a .csv file, for example to load some data into a database. This small snippet (a function) reads a delimited values file (the code below uses ";" as the separator) and returns a PHP array, usable in a command (or elsewhere).

Command class

The mechanism is divided into two parts: an array of options and a parseCSV() function. Here is the code:

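use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Finder\Finder;

class ImportationCommand extends ContainerAwareCommand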
{
    // change these options about the file to read
    private $csvParsingOptions = array(
        'finder_in' => 'app/Resources/',
        'finder_name' => 'fixtures.csv',
        'ignoreFirstLine' => true
    );

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        [...]

        // use the parseCSV() function
        $csv = $this->parseCSV();
    }

    /**
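     * Parse the CSV file described by $csvParsingOptions (fields separated by ";")
     *
     * @return array one array of column values per line of the file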
     */
    private function parseCSV()
    {
        $ignoreFirstLine = $this->csvParsingOptions['ignoreFirstLine'];

        $finder = new Finder();
        $finder->files()
            ->in($this->csvParsingOptions['finder_in'])
            ->name($this->csvParsingOptions['finder_name'])
        ;
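        // Finder yields SplFileInfo objects; keep the last match, if any
        $csv = null;
        foreach ($finder as $file) { $csv = $file; }
        if ($csv === null) {
            throw new \RuntimeException('CSV file not found: ' . $this->csvParsingOptions['finder_name']);
        }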

        $rows = array();
        if (($handle = fopen($csv->getRealPath(), "r")) !== FALSE) {
            $i = 0;
            while (($data = fgetcsv($handle, null, ";")) !== FALSE) {
                $i++;
                if ($ignoreFirstLine && $i == 1) { continue; }
                $rows[] = $data;
            }
            fclose($handle);
        }

        return $rows;
    }
}

  • 2013-04-18 Matt Robinson

    That's what I figured; it's why I didn't submit a pull request, because it wouldn't have been simple any more :)

    If you're doing a quick & dirty one-off command, then I wouldn't have bothered with the options array or even the parseCsv method, just cram everything into the execute() method, run it and be done with it. I'm not proud ;)

  • 2013-04-18 Emanuele Gaspari Castelletti

    I think you misread the title: "Simple" because I don't want that kind of flexibility in this case; I'm going to use this command only once in my project, and I don't want to pass the filename as a console parameter or have to remember it. And "in Symfony command class" because I don't need services or other external tools here. It's up to you to add whatever complexity you need.
    Thanks for explaining how I can configure a service; if you look closely at my GitHub you will notice that this platform is built mainly by me, even if it's not the best code in the world. But your observations will surely be useful for novices.

    That said, every point you raised is right and I agree with you about memory, Finder, etc. Above all about SplFileObject; I need to read up on it because I don't use it often.
    Your code is very well made; it would be interesting to have a "brick" with it embedded.

    cheers

  • 2013-04-17 Matt Robinson

    Well I started to do just that, but really there's more than one thing wrong and the better solution would be not to parse the CSV in the command at all. So instead I'll explain how I'd go about doing something like this.

    First, you've set your options in a private array. Since this is a Console command, you should make them optional arguments with defaults instead, then they can be overridden from the command line. For example https://gist.github.com/ina... -- you call the command with "app/console demo:load-fixtures [filename]" and if you don't give the filename it'll use the default one.

    Alternatively (since your command class is container-aware) you could read the filename from a container parameter set in app/config/config.yml or app/config/parameters.yml by using something like $this->container->getParameter('fixtures_file'). Depends how configurable you want it to be.

    Secondly, there's no need to use Finder at all in your example. The only thing you're using it for is to get a fileinfo object which you call realpath() on, but that's already a function anyway. You could instead replace lines 27-35 with $csv = new \SplFileObject(realpath('app/Resources/fixtures.csv'));

    Third, you're loading the entire CSV file into memory. That might be fine for small files, but with a big file it won't be. You can instead just act on each row as it's loaded and then not have to worry about memory.

    Finally, since the intention of your command seems to be to load fixtures, you should make a fixture loader as a separate class and define it as a service. It's really easy in Symfony2 with just a few lines to app/config/config.yml. Then (because your command is a ContainerAwareCommand) you can get that service from the container and pass the filename to it. This keeps both your command and your loader simple and testable, and makes the fixture loader reusable (within your web app, for example).

    The loader should open the CSV and read each row, pass it to a hydrator that creates a database object, a validator to check it, and then your Doctrine entity manager to persist it. If you want to be clever, you can configure a prototype object in your service container, then just clone it for each row, which would let you inject services, set default values, etc etc. It sounds like a lot of work and a lot of moving parts, but it's not really and the upshot is that all the parts are small and simple, which makes them less likely to go wrong, and easier to replace if you need to change something.

    Still, it's all a bit too complicated for a small snippet like this, so instead, here's a link to a simple CSV File class that lets you foreach it to get each row as a named array (names from the first row) https://github.com/inanimat...
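The row-by-row approach described in the comment above can be sketched with plain PHP. This is a minimal sketch, not the linked class: iterateCsvRows() and loadRows() are hypothetical names, the ";" delimiter is taken from the original snippet, and the hydrator, validator and persister are assumed to be simple callables standing in for real services. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: yield each CSV row as an associative array keyed by
// the column names found on the first line. Only one row is held in memory.
function iterateCsvRows(string $path, string $delimiter = ';'): \Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open $path");
    }

    try {
        $header = fgetcsv($handle, 0, $delimiter, '"', '\\');
        if ($header === false) {
            return; // empty file: nothing to yield
        }
        while (($data = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
            // fgetcsv() returns array(null) for a blank line; skip those,
            // and skip rows whose field count does not match the header
            if ($data === array(null) || count($data) !== count($header)) {
                continue;
            }
            yield array_combine($header, $data);
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical loader: hydrate, validate and persist one row at a time.
// Returns the number of rows that passed validation and were persisted.
function loadRows(iterable $rows, callable $hydrate, callable $isValid, callable $persist): int
{
    $loaded = 0;
    foreach ($rows as $row) {
        $object = $hydrate($row);
        if (!$isValid($object)) {
            continue;
        }
        $persist($object);
        $loaded++;
    }

    return $loaded;
}
```

A command could then call something like loadRows(iterateCsvRows($filename), $hydrate, $isValid, $persist), where $persist wraps the entity manager. Rows with the wrong number of fields are silently skipped here; throwing an exception instead may be the safer choice for real fixtures.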

  • 2013-04-17 Emanuele Gaspari Castelletti

    The Finder result has to be iterated, and since only one file matches, the foreach runs exactly once. This code is quite quick and dirty :)

    Your observations are very interesting and I think they should be implemented to ship better code.
    Would you be willing to contribute to https://github.com/inmareli...

    Then I will update this guide.

  • 2013-04-17 Matt Robinson

    What's happening on line 33? Are you just getting the last file found with that name in that folder? What's the point of that? Seems weird to be using Finder if you already know the filename and folder - just make a new SplFileInfo object directly.

    Also, since Finder returns SplFileInfo objects (well, extended versions of the same), you can do $object = $csv->openFile(); then $object->setFlags(\SplFileObject::READ_CSV) and foreach it to get your rows instead of using fopen() and fgetcsv().
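The SplFileObject suggestion above can be sketched as follows. This is an illustrative sketch only: readCsvRows() is a hypothetical name, the ";" delimiter is taken from the original snippet, and the extra READ_AHEAD, SKIP_EMPTY and DROP_NEW_LINE flags are an assumption added here so that blank lines (including the one after the final newline) are not returned as rows.

```php
<?php
// Hypothetical helper: read a semicolon-delimited file with SplFileObject
// instead of fopen()/fgetcsv(), returning one array of values per line.
function readCsvRows(string $path, bool $ignoreFirstLine = true): array
{
    $file = new \SplFileObject($path, 'r');
    $file->setFlags(
        \SplFileObject::READ_CSV
        | \SplFileObject::READ_AHEAD
        | \SplFileObject::SKIP_EMPTY
        | \SplFileObject::DROP_NEW_LINE
    );
    // delimiter, enclosure and escape character, all given explicitly
    $file->setCsvControl(';', '"', '\\');

    $rows = array();
    $isFirst = true;
    foreach ($file as $row) {
        if ($isFirst) {
            $isFirst = false;
            if ($ignoreFirstLine) {
                continue; // drop the header line
            }
        }
        $rows[] = $row;
    }

    return $rows;
}
```

With a Finder result, $csv->openFile() returns the same kind of SplFileObject, so the setFlags() and setCsvControl() calls would apply unchanged.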

  • 2013-04-17 Pascal Borreli

    cvsParsongOptions -> cvsParsingOptions