Strip repository of large files before release

Issue #40 resolved
Martin Sandve Alnæs created an issue

As we did in fenics when moving to git, we can rewrite the repository so that large files are not part of the repository, keeping it small. This makes the repository history incompatible with forks and branches, and should not be done lightly. In particular, if we do it, it should be done only once and before the upcoming release.
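
For reference, a minimal sketch of what such a history rewrite could look like, assuming it is run on a fresh clone; the listed paths are only placeholders, not the actual large files:

```python
#!/usr/bin/env python
"""Sketch: strip large data directories from the entire git history.

The paths below are placeholders, not the real list of large files.
After the rewrite every commit gets a new hash, which is exactly why
existing forks and branches become incompatible.
"""
import subprocess

LARGE_PATHS = ["data/", "demo/data/"]  # placeholders

def strip_history(paths):
    # Rewrite every commit on every branch and tag as if the paths never existed
    index_filter = "git rm -r --cached --ignore-unmatch " + " ".join(paths)
    subprocess.check_call([
        "git", "filter-branch", "--prune-empty",
        "--index-filter", index_filter,
        "--tag-name-filter", "cat",
        "--", "--all",
    ])

if __name__ == "__main__":
    strip_history(LARGE_PATHS)
```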

Before doing this we need a solution for reference data, which I suggest we get from the ffc regression tests. Maybe the ffc regression test data scripts can be written as a py.test plugin.
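
As a rough idea of what such a plugin could look like, here is a minimal conftest.py fixture; the directory name cbcflow-reference-data and the environment variable are just guesses:

```python
# conftest.py sketch -- directory name and environment variable are assumptions.
import os
import pytest

@pytest.fixture(scope="session")
def reference_data_dir():
    """Return the local reference data checkout, or skip the test if missing."""
    path = os.environ.get(
        "CBCFLOW_REFERENCE_DATA",
        os.path.join(os.path.dirname(__file__), "cbcflow-reference-data"),
    )
    if not os.path.isdir(path):
        pytest.skip("reference data not found; fetch it first")
    return path
```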

We should also consider storing the meshes in a separate location, as dolfin does.

What do you think about these three points?

Comments (10)

  1. Øyvind Evju

    I'm not sure I understand your point about it being incompatible with forks and branches. Anyway, it would be beneficial, since we could freely add more demos/examples/benchmarks without worrying about the size of the repository.

    I believe reference solutions and meshes should be kept separate from the repo, and there aren't many other large files...

  2. Martin Sandve Alnæs reporter

    The only way to remove large files from the repository completely, i.e. such that they are not stored in .git/ forever, is to rewrite the entire repository history as if the files were never there in the first place. If you make a fork of the repository before doing this, every commit in the fork and in the rewritten repository will represent a different state and have a different hash. If you merge the fork back, you end up with two copies of the entire history. Hint: that's not twice as good ;)

  3. Martin Sandve Alnæs reporter

    I'll take a look at porting over the regression test scripts from ffc, as I wrote them (based on work by Anders Logg).

    Øyvind, can you discuss solutions for mesh files with Johannes Ring? Maybe we can use the same solution as dolfin does.

    We have to get these in place before we do the history rewrite and then make sure we don't have any outstanding branches to merge.

  4. Øyvind Evju

    Ah, that makes sense. Well, I think we have reasonably good control of the forks and branches of the repository now, don't you? ;)

  5. Øyvind Evju

    dolfin stores all its meshes on fenicsproject.org. We could use cbc.simula.no; however, the simplest option would be to keep separate Bitbucket projects under simula_cbc. Maybe two projects called cbcflow-data and cbcflow-regression?

  6. Martin Sandve Alnæs reporter

    Ok. For the data we can add a simple script to cbcflow to clone/pull the latest data.
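
    Something along these lines, assuming a cbcflow-data repository under simula_cbc (the URL is a guess):

    ```python
    # Sketch of a fetch-data script; the repository URL is an assumption.
    import os
    import subprocess

    DATA_URL = "https://bitbucket.org/simula_cbc/cbcflow-data.git"  # assumed
    DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")

    def update_data():
        # Clone on first use, pull afterwards
        if os.path.isdir(os.path.join(DATA_DIR, ".git")):
            subprocess.check_call(["git", "pull"], cwd=DATA_DIR)
        else:
            subprocess.check_call(["git", "clone", DATA_URL, DATA_DIR])

    if __name__ == "__main__":
        update_data()
    ```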

    For the regression tests it's a bit trickier because we want to match versions; I'll adapt the ffc scripts for this.

    The regression tests should not write to reference/; they should always just write to output/. The scripts I'll adapt will simply copy output/ to the reference repository. You don't want to rerun the tests to store data to reference/!
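
    A rough sketch of that update step; the reference repository location and the way the version is passed in are assumptions:

    ```python
    # Sketch: copy the test output/ into a local checkout of the reference
    # repository and commit it labelled with the cbcflow version.
    import os
    import shutil
    import subprocess
    import sys

    OUTPUT_DIR = "output"
    REFERENCE_REPO = "../cbcflow-reference-data"  # assumed local checkout

    def update_references(version):
        for name in os.listdir(OUTPUT_DIR):
            src = os.path.join(OUTPUT_DIR, name)
            dst = os.path.join(REFERENCE_REPO, name)
            # Replace any existing copy of this entry in the reference checkout
            if os.path.isdir(dst):
                shutil.rmtree(dst)
            elif os.path.isfile(dst):
                os.remove(dst)
            if os.path.isdir(src):
                shutil.copytree(src, dst)
            else:
                shutil.copy2(src, dst)
        subprocess.check_call(["git", "add", "-A"], cwd=REFERENCE_REPO)
        subprocess.check_call(
            ["git", "commit", "-m", "Update references for cbcflow %s" % version],
            cwd=REFERENCE_REPO)

    if __name__ == "__main__":
        update_references(sys.argv[1])  # e.g. the cbcflow version being tested
    ```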
