Backuppy

Timeline

50 most recent check-ins

2021-01-19
15:00
_match_move_chains_in_changed_files_by_hash(): The set of moved files was created twice under different names. Removed one, as creating both is redundant. Leaf check-in: 5eb679355d user: thomas tags: fix_crash_during_source_update
14:56
Implemented fix for [664df7e8adc2a06e0cf324be5d76720b3c96e4df]. Instead of collecting moved files as a list of tuples (file, new location), collect them in a map {new location: file} and only add a file if its target location is not already present as a key in the map. This prevents multiple files with the same destination from being added to the moved files. check-in: 8c84ec0f0f user: thomas tags: fix_crash_during_source_update
14:52
Removed unused variable in test code. check-in: cc8b9e924f user: thomas tags: fix_crash_during_source_update
14:51
Fixed wrong asserts in the last-added test. check-in: 1f405e875c user: thomas tags: fix_crash_during_source_update
13:17
Renamed test case to have a better name. check-in: 5521705bff user: thomas tags: fix_crash_during_source_update
12:58
Added currently failing test case for a bug in the UpdateFileSourceController. See ticket [664df7e8adc2a06e0cf324be5d76720b3c96e4df]. check-in: dd3e67cbf6 user: thomas tags: fix_crash_during_source_update
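
The fix described in check-in 8c84ec0f0f can be illustrated with a minimal sketch. This is not the project's code: the function name collect_moved_files, the (file, new location) tuple layout, and the sample paths are assumptions made for illustration only.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple


def collect_moved_files(
    candidates: Iterable[Tuple[str, PurePosixPath]],
) -> Dict[PurePosixPath, str]:
    """Map each new location to the first file that claims it (hypothetical helper)."""
    moved: Dict[PurePosixPath, str] = {}
    for file_name, new_location in candidates:
        # Keep only the first file per destination; later claims are skipped,
        # so no two moved files can end up with the same target location.
        if new_location not in moved:
            moved[new_location] = file_name
    return moved


# Hypothetical sample data: two files compete for the same destination.
candidates = [
    ("a.txt", PurePosixPath("/new/x.txt")),
    ("b.txt", PurePosixPath("/new/x.txt")),
    ("c.txt", PurePosixPath("/new/y.txt")),
]
moved = collect_moved_files(candidates)
```

Keying the map by destination makes the uniqueness check a constant-time dictionary lookup, whereas a list of tuples would need a scan per candidate.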
2021-01-18
17:08
Implemented more benchmarks and optimized code using these. Leaf check-in: 7147ae4c94 user: thomas tags: trunk
17:06
UpdateFileSourceController: Removed logging entries causing excessive output. Fixed an error in a logging string and made some minor performance improvements: use a set instead of a list for membership tests, and avoid an unnecessary list copy. Closed-Leaf check-in: e6e6ceb9b2 user: thomas tags: more_benchmarking
17:04
FileHashController: Removed debug logging that causes massive output on larger targets. check-in: 4862acf26a user: thomas tags: more_benchmarking
11:07
Replaced remaining dedent() calls in DirectoryController. check-in: 3e4769001b user: thomas tags: more_benchmarking
10:40
Wrap textwrap.dedent in a functools.lru_cache to avoid the overhead of repeated dedent() calls. This mitigates the performance impact of dedent() calls inside of loops. Also add some cached_dedent() calls around multi-line statements forgotten in [ca8a37be5bfd256248f0dabb46c732d741202b7683965f666270b52c59e135bf]. check-in: 8303002c63 user: thomas tags: more_benchmarking
10:05
backuppy-runner.py: Mark as executable and make sure to load the correct main() of the current checkout. check-in: 124bc286f8 user: thomas tags: trunk
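
The caching approach from check-in 8303002c63 might look roughly like the sketch below. The name cached_dedent appears in the check-in comment, but its actual signature and cache size are not shown there, so the definition here is an assumption. The SQL text and the Directory table are hypothetical.

```python
import functools
import textwrap


@functools.lru_cache(maxsize=None)
def cached_dedent(text: str) -> str:
    """Dedent each distinct input string once and reuse the result afterwards."""
    return textwrap.dedent(text)


# Hypothetical multi-line statement, written as a triple-quoted raw string.
QUERY = r"""
    SELECT name
    FROM Directory
    WHERE parent_id = ?
"""

# Inside a loop, only the first call does the dedent work.
# All later calls are cache hits keyed on the string itself.
for _ in range(1000):
    statement = cached_dedent(QUERY)

cache_stats = cached_dedent.cache_info()
```

Because the statements are constant strings, the cache stays as small as the number of distinct statements in the code base.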
2020-12-09
13:05
DirectoryController: Moved the shared query for get_directory_id and get_directory_and_parent_id into a member variable initialized in __init__(). check-in: a7d151250a user: thomas tags: more_benchmarking
12:40
DirectoryController: Moved dedent() call out of the inner loop to mitigate most of the incurred performance loss. check-in: e72c8a289e user: thomas tags: more_benchmarking
12:22
Unified usage of multiline SQL statements. Uniformly use triple-quoted raw strings wrapped in textwrap.dedent() calls. This is mostly a whitespace change without changing semantics. But it unifies different styles in the code base and makes the SQL log more readable. check-in: ca8a37be5b user: thomas tags: more_benchmarking
12:20
Model: Replaced one executemany() call with a series of execute() calls. check-in: 6e07c2daf4 user: thomas tags: more_benchmarking
12:19
Database: Changed one of the indices for better coverage. check-in: ec655b9523 user: thomas tags: more_benchmarking
2020-12-08
18:38
Benchmark script: Added benchmark for UpdateTargetController._database_update_changed_files(). Made database path configurable by adding it to the script arguments. Added option to dump the SQL log. check-in: f58a8b8378 user: thomas tags: more_benchmarking
17:34
Model: Added option to dump all executed SQL statements in a log file. This option is meant for debugging and (performance-) analysis of executed statements. check-in: 8cc1bf6a48 user: thomas tags: more_benchmarking
2020-11-15
13:43
Database: Disallow negative file sizes (check size_bytes >= 0) check-in: efb15aa068 user: thomas tags: trunk
12:18
Database: Disallow slashes in directory and file names, except for leading slashes, which are still allowed. This is a last line of defense; the controller methods should never attempt to insert invalid file names. check-in: 88678692a3 user: thomas tags: trunk
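
The two database-level checks from 2020-11-15 could be expressed as CHECK constraints, as in the sketch below. It assumes an SQLite database accessed through Python's sqlite3 module, which the check-in comments do not confirm. The File table layout and the try_insert helper are hypothetical; only the column name size_bytes comes from the log.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    """
    CREATE TABLE File (
        name TEXT NOT NULL
            -- Strip leading slashes, then reject any slash that remains.
            CHECK (ltrim(name, '/') NOT LIKE '%/%'),
        size_bytes INTEGER NOT NULL
            CHECK (size_bytes >= 0)
    )
    """
)


def try_insert(name: str, size_bytes: int) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        connection.execute(
            "INSERT INTO File (name, size_bytes) VALUES (?, ?)",
            (name, size_bytes),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_plain = try_insert("file.txt", 0)
accepted_leading_slash = try_insert("/leading", 10)
rejected_inner_slash = try_insert("dir/file.txt", 1)
rejected_negative_size = try_insert("file2.txt", -1)
```

Constraints of this kind catch invalid rows even when a controller bug lets them through, which matches the "last line of defense" role described above.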
2020-11-14
22:53
Added benchmarks for UpdateTargetController: finding and adding new files and directories. check-in: 0084ff93df user: thomas tags: more_benchmarking
22:31
DirectoryController: Removed unused variable and enumerate() in get_directory_id(). check-in: b206d5943c user: thomas tags: trunk
2020-11-13
14:36
UpdateFileSourceController: Minor optimizations for _database_update_moved_files(), removed logging of all updated entries. Included this method in the benchmark script, but it did not reveal any real performance issues. check-in: 937a19d56f user: thomas tags: trunk
12:08
Benchmark script: Fix issue when the directory count is less than the requested directory tree depth. check-in: b753f61537 user: thomas tags: trunk
2020-11-11
23:37
Optimized runtime of UpdateFileSourceController._partition_files_in_dir() when the file count is large. The method previously did a full table scan of the File relation for each visited directory. It is now possible to traverse a million files without major issues. check-in: d724c64829 user: thomas tags: trunk
22:40
UpdateFileSourceController: Further optimized _partition_files_in_dir(). Removed the duplicated list comprehension with filtering using lambdas. Instead use os.path.join and loop over the data only once. This cuts the execution time in half again. Closed-Leaf check-in: 75b2f305b3 user: thomas tags: optimize_partition_files_in_dir
21:39
Database Schema: Added index that speeds up querying the files contained in a given directory. This is used by the UpdateFileSourceController in _partition_files_in_dir(). check-in: 4ef0aaa647 user: thomas tags: optimize_partition_files_in_dir
21:23
DirectoryController.add_directories(): Support passing any Iterable[Path], instead of only Sized[Path]. check-in: 96c3efd265 user: thomas tags: optimize_partition_files_in_dir
21:22
Benchmark script: Further optimized and fixed the script. It now runs decently and commits after each larger write for easier investigations using the generated data. check-in: f2a252b3b2 user: thomas tags: optimize_partition_files_in_dir
18:51
Benchmark script: Heavily optimized random string generation. Added new benchmark: UpdateFileSourceController._partition_files_in_dir() which gets slow on larger inputs (> 100ms per iteration) check-in: f6bfb6aac4 user: thomas tags: optimize_partition_files_in_dir
17:26
Implemented batch modes for source updating and target updating. Implements most of [a8875de545a384633d43b4a6d8ba63ad7be88357]. check-in: 383132226c user: thomas tags: trunk
17:24
Implemented batch mode to update all backup targets. Closed-Leaf check-in: 01ecc17835 user: thomas tags: implement_cli_batch_processing
17:17
CLI: Fixed formatting and output texts for the Update Source CLIs. check-in: 1fbcb878f1 user: thomas tags: implement_cli_batch_processing
16:28
Imported fixes from trunk. check-in: f92d358d49 user: thomas tags: implement_cli_batch_processing
16:25
FileHashController.get_hash_strings_for_file(): Use the already existing database view FileHashesPivot instead of re-implementing the exact same query. check-in: 86899017e3 user: thomas tags: trunk
16:11
FileHashController: Fixed bug in get_hash_strings_for_file(), which was somehow missing the target_id in the SQL query, which is part of the primary key. Added that to the query and wrote a unit test that failed prior to implementing the fix. check-in: 0eec5de377 user: thomas tags: trunk
15:56
Tests: Refactored test code: Moved some duplicated helper objects and functions into tests.helpers, unifying the slight differences in each object structure. Moved there are FileData, create_file(), and create_files(). The latter two take a fake filesystem, target and FileData to create test files. check-in: 188946c2a0 user: thomas tags: trunk
14:42
TargetController & CLI: Added progress reporting when reading the total file system size. check-in: 4f0746c898 user: thomas tags: implement_cli_batch_processing
14:25
Import fixes from trunk check-in: 4e062f4b0f user: thomas tags: implement_cli_batch_processing
2020-11-10
22:43
CLI: Added batch option to update all available file sources. check-in: e97df0479c user: thomas tags: implement_cli_batch_processing
22:23
TargetController: Added get_available_sources() and get_available_targets(), each returning a list of Target instances where each target is available. Both directly return Target instances, because the information has to be queried anyway. check-in: beff17bc47 user: thomas tags: implement_cli_batch_processing
22:21
Renamed TargetController.get_available_source_names() and .get_available_target_names(), to get_all{source,target}_names(), as the former uses the term available in a confusing manner. A target is 'available' if the base_path is mounted and an accessible directory. check-in: f9f64a5f2f user: thomas tags: implement_cli_batch_processing
12:43
Optimized FileController.add_files() and FileHashController.store_hashes_for_files(). Both functions now run significantly faster. These optimizations mainly benefit UpdateFileSourceController.update_source(). It previously took several hours to insert ~ 400k files during the source update if the controller found a large number of new files. The same now runs in at most one minute. check-in: bd9b817b56 user: thomas tags: trunk
11:54
Benchmark script: Use tqdm to add progress indicators in the data generation step, which takes some time on larger data sets. check-in: f0d534115a user: thomas tags: trunk
10:36
UpdateFileSourceController: Removed excessive logging when many new files are found, as that hangs Cutelog. Instead, log the actual progress of the method. check-in: 32cdae19e6 user: thomas tags: trunk
10:34
Updated the benchmark script to check the performance of some database operations in the UpdateFileSourceController that are slow. check-in: b071f3d4fc user: thomas tags: trunk
10:08
Handle broken symbolic links and special files, like character devices or block devices, that may be present via symbolic link. These get created by Wine in its prefixes. Restructured how walking file system trees is implemented: the walker is now exhausted by a controller method that performs the filtering internally. Closes [736f8bb4f950f1db4287e13e83b56ffbfa78c133]. check-in: 13c7fe543c user: thomas tags: trunk
2020-11-09
19:48
TargetController.create_filtered_filesystem_walker(): Skip all non-files and broken symbolic links during file system iteration. This skips special files like block devices, character devices, sockets, fifos, etc. and broken symbolic links. It prevents issues when such files occur inside a regular file system. Closed-Leaf check-in: 11fbefb6b6 user: thomas tags: handle_broken_symbolic_links
14:45
Dependencies: Added pytest-xdist to the development requirements. Run the tests in parallel when executing run_tests.sh. check-in: 8ffb610884 user: thomas tags: handle_broken_symbolic_links
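
The filtering described in check-in 11fbefb6b6 can be sketched as follows. This is not the implementation of TargetController.create_filtered_filesystem_walker(). The function walk_regular_files and the demo file names are hypothetical. The demo creates a symbolic link, so it assumes a platform and user account that permit this.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import Iterator


def walk_regular_files(base_path: Path) -> Iterator[Path]:
    """Yield only regular files (or links resolving to them) below base_path."""
    for directory, _subdirectories, file_names in os.walk(base_path):
        for file_name in file_names:
            path = Path(directory) / file_name
            try:
                # Path.stat() follows symbolic links, so a broken link raises.
                mode = path.stat().st_mode
            except OSError:
                continue  # broken symbolic link or otherwise unreadable entry
            # Skips block devices, character devices, sockets, fifos, etc.
            if stat.S_ISREG(mode):
                yield path


# Small demonstration with one regular file and one broken symbolic link.
with tempfile.TemporaryDirectory() as temporary_directory:
    root = Path(temporary_directory)
    (root / "regular.txt").write_text("data")
    os.symlink(root / "missing_target", root / "broken_link")
    found = sorted(path.name for path in walk_regular_files(root))
```

Checking the file type on the resolved target covers both cases from the check-in comment at once: special files reached directly and special files reached through a symbolic link.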