Scanmode
adlerweb committed May 21, 2017
1 parent 8db9d53 commit ed43058
Showing 44 changed files with 3,227 additions and 1 deletion.
674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

43 changes: 43 additions & 0 deletions README
@@ -0,0 +1,43 @@
# AdAr - Another dumb Archive

(sorry, ATM german only)

AdAr is a further development based on the system "DiBaS (Digitales Bildarchiv Saffig)", which was built to archive the photo collection of the Geschichtsverein Saffig (the Saffig historical society). In this project the system was extended with document-related features such as OCR and contact management, among others.

AdAr is available in German only for now. The PHP code is provided under the terms of the GPLv3 or later. Some libraries included in this repo are under other licenses, which can be found in the respective project folder.

Warning: tinkered together, partly from historical code. Do not use in production without a careful review.

If the PHP EXIF extension is installed, it is used.
If pdftotext is installed, it is used.

## Usage
I actively use the system for data storage. For this, PDF files with a text layer are generated (see tools/) and then uploaded.

## Installation

- Requires a web server with PHP >=5.6 and EXIF support
- Requires a MySQL database
- Requires [composer](https://getcomposer.org/)
- tesseract >=3
  - Used to run OCR on images
  - not really tested, language preset to German
- pdftotext
  - Used to extract text from PDF files


- Copy the files to the web server
- The folders data/* and tpl/cache/ must be writable by the web server
- Create a MySQL database and import doc/mysql.sql
- Add the credentials to config.php
  - Optional: adjust the name of the installation (ADAR_PROGNAME)
  - Optional: set an e-mail address in ADAR_INFOMAIL_TO; if set, an e-mail is sent to this address for every newly created entry
- Install dependencies: ```composer install```
- cron.php should be run regularly as the web server user; otherwise temporary files are not cleaned up and OCR is not executed. It uses relative paths, so run it from the installation directory
  - e.g. ```*/15 * * * * cd /var/www && /usr/bin/php -f cron.php > /var/log/adar.cron.log``` in the crontab
- Login with admin/admin
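
The requirements and writable folders listed above could be checked before the first login with a small preflight script. The following is only a sketch and not part of AdAr: the function names `adar_preflight` and `adar_find_binary` are hypothetical, and the directory list is an assumption based on the folders named in this README and the `.gitignore` files in this commit. It does not check the tesseract version or the database connection.

```php
<?php
// Hypothetical preflight check, not part of the AdAr repository.

// Returns true if an executable file called $name exists in a PATH directory.
function adar_find_binary($name)
{
    foreach (explode(PATH_SEPARATOR, (string) getenv('PATH')) as $dir) {
        $candidate = $dir . DIRECTORY_SEPARATOR . $name;
        if ($dir !== '' && is_file($candidate) && is_executable($candidate)) {
            return true;
        }
    }
    return false;
}

// Collects hard errors and soft warnings for an installation in $baseDir.
function adar_preflight($baseDir, array $binaries = array('tesseract', 'pdftotext'))
{
    $result = array('errors' => array(), 'warnings' => array());

    if (version_compare(PHP_VERSION, '5.6.0', '<')) {
        $result['errors'][] = 'PHP >= 5.6 required, found ' . PHP_VERSION;
    }
    if (!extension_loaded('exif')) {
        $result['errors'][] = 'PHP EXIF extension is not loaded';
    }

    // Assumption: these are the directories the web server writes to.
    foreach (array('data/cache', 'data/org', 'data/tmp', 'tpl/cache') as $dir) {
        $path = rtrim($baseDir, '/') . '/' . $dir;
        if (!is_dir($path) || !is_writable($path)) {
            $result['errors'][] = 'Not a writable directory: ' . $path;
        }
    }

    // Missing command-line tools only produce warnings.
    foreach ($binaries as $binary) {
        if (!adar_find_binary($binary)) {
            $result['warnings'][] = 'Tool not found on PATH: ' . $binary;
        }
    }

    return $result;
}

// Usage: check the directory this script is stored in.
$report = adar_preflight(__DIR__);
foreach ($report['errors'] as $message) {
    echo 'ERROR: ' . $message . "\n";
}
foreach ($report['warnings'] as $message) {
    echo 'WARNING: ' . $message . "\n";
}
```

Saved next to config.php and run once as the web server user, the script prints one line per problem and nothing if every check passes. Missing tools are reported as warnings rather than errors because only OCR and PDF text extraction depend on them.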

## Notes
- There is currently no graphical user management, so the password cannot be changed in the UI. In general it is advisable to set up authentication at the web server level. Users can be edited in SQL; suitable password hashes can be generated with the function [session_getNewPasswordHash](https://github.com/adlerweb/awtools/blob/master/session.php#L137).
- Backups.
- More backups.
166 changes: 166 additions & 0 deletions api.php
@@ -0,0 +1,166 @@
<?PHP

/**
* AdAr - Another dumb Archive
*
* AJAX API
*
* @package adar
* @author Florian Knodt <[email protected]>
*/

if(!file_exists('config.php') ||!is_readable('config.php')) {
die('Missing configuration');
}

require_once('config.php'); //Config
require_once('lib/mysql.wrapper.php'); //ATools->MySQL
require_once('vendor/adlerweb/awtools/session.php'); //ATools->Session-Manager

if(!$GLOBALS['adlerweb']['session']->session_isloggedin()) {
//Send the status header before any output, then stop so no data is served
header('HTTP/1.0 403 Forbidden');
die('Invalid session');
}

$requestData= $_REQUEST;

$columns = array(
// datatable column index => database column name
0 => array(false, 'ItemID', array('<a href="?m=content_detail&id=%s">%s</a>', array('ItemID', 'ItemID'))),
1 => array(false, 'Caption', false),
2 => array(false, 'Format', false),
3 => array(false, 'Date', false),
4 => array('CONCAT(`Sender`.`FamilyName`,", ",`Sender`.`GivenName`)', 'S_Sender', array('<a href="?m=contact_create&id=%s">%s</a>', array('Sender', 'S_Sender'))),
5 => array('CONCAT(`Receiver`.`FamilyName`,", ",`Receiver`.`GivenName`)', 'S_Receiver', array('<a href="?m=contact_create&id=%s">%s</a>', array('Receiver', 'S_Receiver')))
);

$colout = array();
$colout_f = array();
$colout_done = array();
foreach($columns as $col) {
$colout_done[] = $col[1];
if($col[0]) {
$colout[] = $col[0].' AS `'.$col[1].'`';
$colout_f[] = $col[0].' AS `'.$col[1].'`';
}else{
$colout[] = '`'.$col[1].'`';
}

if($col[2]) {
foreach($col[2][1] as $tcol) {
if(!in_array($tcol, $colout_done)) {
$colout[] = $tcol;
$colout_done[] = $tcol;
}
}
}
}

$sql_data = "SELECT ";
$sql_data .= implode(", ", $colout);
$sql_data .= " FROM Items
LEFT JOIN `Contacts` AS `Sender` ON `Items`.`Sender` = `Sender`.`CID`
LEFT JOIN `Contacts` AS `Receiver` ON `Items`.`Receiver` = `Receiver`.`CID` ";

$sql_anz = "SELECT COUNT(`Items`.`ItemID`) as anz ";

$sql_anz .= " FROM Items
LEFT JOIN `Contacts` AS `Sender` ON `Items`.`Sender` = `Sender`.`CID`
LEFT JOIN `Contacts` AS `Receiver` ON `Items`.`Receiver` = `Receiver`.`CID` ";

// getting total number records without any external filters
//$anzq=$GLOBALS['adlerweb']['sql']->query($sql_anz.$sql_filter);
$anzq=$GLOBALS['adlerweb']['sql']->query_single($sql_anz);
if(!$anzq) {
$totalData=0;
}else{
$totalData=$anzq['anz'];
}
$totalFiltered = $totalData;

$sql_filter_data = array();
$sql_filter = " WHERE 1 = ?";
$sql_filter_data[] = 1;

// getting records as per search parameters
for($i=0; $i<count($columns); $i++) {
if( !empty($requestData['columns'][$i]['search']['value']) ){
if($columns[$i][0]) {
$sql_filter.=" AND (".$columns[$i][0].") LIKE ? ";
$sql_filter_data[] = '%'.$requestData['columns'][$i]['search']['value'].'%';
}else{
$sql_filter.=" AND `".$columns[$i][1]."` LIKE ? ";
$sql_filter_data[] = '%'.$requestData['columns'][$i]['search']['value'].'%';
}
}
}

if(!empty($requestData['search']['value'])) {
$sql_filter.="
AND (
`ItemID` LIKE ? OR
`Caption` LIKE ? OR
`Description` LIKE ? OR
`Format` LIKE ? OR
CONCAT(`Sender`.`FamilyName`,\", \",`Sender`.`GivenName`) LIKE ? OR
CONCAT(`Receiver`.`FamilyName`,\", \",`Receiver`.`GivenName`) LIKE ?
) ";
$sql_filter_data[] = '%'.$requestData['search']['value'].'%';
$sql_filter_data[] = '%'.$requestData['search']['value'].'%';
$sql_filter_data[] = '%'.$requestData['search']['value'].'%';
$sql_filter_data[] = '%'.$requestData['search']['value'].'%';
$sql_filter_data[] = '%'.$requestData['search']['value'].'%';
$sql_filter_data[] = '%'.$requestData['search']['value'].'%';
}

if(count($sql_filter_data) > 1) {
$anzq=$GLOBALS['adlerweb']['sql']->querystmt_single($sql_anz.$sql_filter, str_repeat('s', count($sql_filter_data)), $sql_filter_data);
if(!$anzq) {
$totalFiltered=0;
}else{
$totalFiltered=$anzq['anz'];
}
}

if(isset($requestData['order'][0]['column']) && isset($requestData['order'][0]['dir'])) {
if(!in_array($requestData['order'][0]['dir'], array('ASC', 'DESC', 'asc', 'desc'))) die('Errr?');
//Only accept column indexes that exist in $columns
$ordercol = (int)$requestData['order'][0]['column'];
if(!isset($columns[$ordercol])) die('Errr?');
$sql_filter.=" ORDER BY `".$columns[$ordercol][1]."` ".$requestData['order'][0]['dir'].' ';
}
if(isset($requestData['start']) && isset($requestData['length']) && $requestData['length'] > 0)
$sql_filter.="LIMIT ".(int)$requestData['start']." ,".(int)$requestData['length']." "; // adding length

$query = $GLOBALS['adlerweb']['sql']->querystmt($sql_data.$sql_filter, str_repeat('s', count($sql_filter_data)), $sql_filter_data);
$data = array();
if($query) {
foreach($query as $row) { // preparing an array
$nestedData=array();

foreach($columns as $col) {
if($col[2]) {
$argdata = array(
$col[2][0]
);
foreach($col[2][1] as $in) {
$argdata[] = $row[$in];
}
$nestedData[] = call_user_func_array('sprintf', $argdata);
}else{
$nestedData[] = $row[$col[1]];
}
}

$data[] = $nestedData;
}
}

$json_data = array(
"recordsTotal" => intval( $totalData ), // total number of records
"recordsFiltered" => intval( $totalFiltered ), // total number of records after searching, if there is no searching then totalFiltered = totalData
"data" => $data // total data array
);

//Cast draw to int so user input is never echoed back verbatim
if(isset($requestData['draw'])) $json_data['draw'] = intval($requestData['draw']);

echo json_encode($json_data); // send data as json format

?>
40 changes: 40 additions & 0 deletions composer.json
@@ -0,0 +1,40 @@
{
"name": "adlerweb/adar",
"type": "project",
"description": "AdAr - Another dumb Archive - Document archiving solution",
"keywords": ["archive", "DMS", "PDF", "OCR"],
"homepage": "https://github.com/adlerweb/adar",
"license": "GPL-3.0",
"authors": [
{
"name": "Florian Knodt",
"email": "[email protected]"
}
],
"support": {
"issues": "https://github.com/adlerweb/adar/issues"
},
"require": {
"php": ">=5.6",
"datatables/datatables": "1.10.*",
"components/jquery": "3.2.*",
"components/jqueryui": "1.12.*",
"smarty/smarty": "3.1.*",
"adlerweb/calender-date-input": "dev-master",
"pixabay/jquery-tageditor": "dev-master",
"adlerweb/awtools": "0.2.*",
"koala-framework/library-silkicons": "1.3"
},
"repositories": [
{
"type": "vcs",
"url": "https://github.com/adlerweb/calendarDateInput"
},{
"type": "vcs",
"url": "https://github.com/adlerweb/jQuery-tagEditor"
},{
"type": "vcs",
"url": "https://github.com/adlerweb/awtools"
}
]
}
27 changes: 27 additions & 0 deletions config.php
@@ -0,0 +1,27 @@
<?PHP

/**
* AdAr - Another dumb Archive
*
* Main Configuration
*
* @package adar
* @author Florian Knodt <[email protected]>
*/

error_reporting(E_ALL);

define("AW_SQL_SERV", "localhost");
define("AW_SQL_USER", "adar");
define("AW_SQL_PASS", "testinstallation");
define("AW_SQL_DATB", "adar");
define("AW_SQL_DEBUG", true);
define("AW_SQL_DEBUG_SHOW", false);

define("SMARTY_CACHE", false);

define("ADAR_PROGNAME", 'AdAr - Another dumb Archive');

define("ADAR_INFOMAIL_TO", '');
define("ADAR_INFOMAIL_FROM", 'ADAR <adar@localhost>');
?>
79 changes: 79 additions & 0 deletions cron.php
@@ -0,0 +1,79 @@
<?PHP

/**
* AdAr - Another dumb Archive
*
* System for archiving photos and documents
*
* @package adar
* @author Florian Knodt <[email protected]>
*/

if(!file_exists('config.php') ||!is_readable('config.php')) {
die('Missing configuration');
}
require_once('config.php'); //Config
require_once('lib/mysql.wrapper.php'); //ATools->MySQL
require_once('lib/ocr.php');

//Step 1: Temp Cleanup
echo "Temp Cleanup\n";
$dir = opendir('data/tmp/');
while (($file = readdir($dir)) !== false) {
if(filetype('data/tmp/' . $file) == 'file' && filectime('data/tmp/' . $file) <= time()-(12*60*60)) {
echo " Delete: data/tmp/".$file."\n";
unlink('data/tmp/' . $file);
}
}
closedir($dir);
echo "DONE!\n";

//Step 2: OCR
echo "OCR...\n";
$list = $GLOBALS['adlerweb']['sql']->query('SELECT ItemID,Description FROM `Items` WHERE OCRStatus = 1');
if($list->num_rows > 0) {
while($item = $list->fetch_object()) {
echo " OCR for ".$item->ItemID."\n";
$ocr='';
if(file_exists('data/org/'.$item->ItemID.'.png')) {
echo " PNG OCR\n";
$ocr = ocr('data/org/'.$item->ItemID.'.png');
}elseif(file_exists('data/org/'.$item->ItemID.'.jpg')) {
echo " JPG OCR\n";
$ocr = ocr('data/org/'.$item->ItemID.'.jpg');
}elseif(file_exists('data/org/'.$item->ItemID.'.pdf')) {
echo " PDF TXT...";
exec('pdftotext -layout '.escapeshellarg('data/org/'.$item->ItemID.'.pdf').' '.escapeshellarg('data/tmp/'.$item->ItemID.'.txt'));
if(!file_exists('data/tmp/'.$item->ItemID.'.txt') || !($text = file_get_contents('data/tmp/'.$item->ItemID.'.txt')) || strlen(trim($text)) < 100) {
echo "FAILED\n PDF OCR\n";
//Fallback to optical method
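exec('convert -density 400 '.escapeshellarg('data/org/'.$item->ItemID.'.pdf').' '.escapeshellarg('data/tmp/'.$item->ItemID.'.png'));
//For a single-page PDF, convert typically writes one file without a page suffix
if(file_exists('data/tmp/'.$item->ItemID.'.png')) {
$ocr .= ocr('data/tmp/'.$item->ItemID.'.png');
unlink('data/tmp/'.$item->ItemID.'.png');
}
//For multi-page PDFs, the pages are numbered <ItemID>-0.png, <ItemID>-1.png, ...
$page=0;
while(file_exists('data/tmp/'.$item->ItemID.'-'.$page.'.png')) {
$ocr .= ocr('data/tmp/'.$item->ItemID.'-'.$page.'.png');
unlink('data/tmp/'.$item->ItemID.'-'.$page.'.png');
$page++;
}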
}else{
echo "OK\n";
$ocr = $text;
}
if(file_exists('data/tmp/'.$item->ItemID.'.txt')) unlink('data/tmp/'.$item->ItemID.'.txt');
}else{
echo "No original?!\n";
}

if($ocr != '') {
$desc = '';
if($item->Description != '') {
$desc = $item->Description."\n\n---\n\n";
}
$desc .= $ocr;
$GLOBALS['adlerweb']['sql']->querystmt("UPDATE `Items` SET `Description` = ?, `OCRStatus` = 2 WHERE ItemID = ?;", 'ss', array($desc, $item->ItemID));
echo " Added ".strlen($ocr)." chars\n";
}
}
}
echo "DONE!\n";

?>
4 changes: 4 additions & 0 deletions data/cache/.gitignore
@@ -0,0 +1,4 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore
4 changes: 4 additions & 0 deletions data/org/.gitignore
@@ -0,0 +1,4 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore
4 changes: 4 additions & 0 deletions data/tmp/.gitignore
@@ -0,0 +1,4 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore
12 changes: 12 additions & 0 deletions doc/TODO.txt
@@ -0,0 +1,12 @@
- search tags
- user management
- installer
- combine API/Liveforms
- contact management
- add insert API
- search: date ranges/datepicker
- form recognition
- more file types (LibreOffice API?)
- GPG instead of SHA256
- gettext / translations
