Commit 76e006a5 authored by Leonard Marschke

initial student commit

Pipeline #7020 failed with stages
in 38 seconds
*
!main.py
!app
!util
!application.cfg
!requirements.txt
!uwsgi.ini
# Created by https://www.gitignore.io/api/pycharm,python
### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff:
Pipfile
.idea/
# ignore data directory for local development
data
somedb.sqlite
tests/somedb.sqlite
# Gradle:
.idea/**/gradle.xml
.idea/**/libraries
# Mongo Explorer plugin:
.idea/**/mongoSettings.xml
## File-based project format:
*.iws
## Plugin-specific files:
# IntelliJ
/out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# dotenv
.env
# virtualenv
.venv
venv/
ENV/
# Spyder project settings
.spyderproject
# Rope project settings
.ropeproject
# End of https://www.gitignore.io/api/pycharm,python
/coverage/
image: lmmdock/build-environment
stages:
  - test
  - build
  - deploy
before_script:
  - mkdir -p build
python-style:
  stage: test
  script:
    - wget https://sre18.pages.rechenknecht.net/misc/pylintrc -O .pylintrc
    - pip3 install -r requirements.txt
    - pip3 install pylint
    - pylint main.py app tests util
python-test:
  stage: test
  script:
    - pip3 install -r requirements.txt
    - pip3 install coverage
    - coverage run --branch --omit='tests/*','*/site-packages/*','*/dist-packages/*' -m unittest discover -p 'Test*.py'
    - coverage html -d build/coverage
    - coverage report
  coverage: '/^TOTAL.*?(\d{1,3})%$/'
  artifacts:
    paths:
      - build/coverage
    expire_in: 1 week
build-api-definition:
  stage: build
  image: node:10
  script:
    - yarn global add api2html
    - mkdir -p build/api
    - api2html docs/api-definition.yml -o build/api/index.html
  artifacts:
    paths:
      - build/api
    expire_in: 1 week
pages:
  stage: deploy
  script:
    - mkdir -p public
    - cp -av build/api public/api
  only:
    - master
  artifacts:
    paths:
      - public
    expire_in: 1 week
FROM lmmdock/flask-webserver
COPY ./ /app
RUN pip3 install -r requirements.txt
# Flask Training REST API
This project contains a simple [Flask](http://flask.pocoo.org/) application.
Flask is a Python-based microframework for building small to mid-sized applications and APIs.
We wrote an OpenAPI v3 definition that defines the endpoints that should be implemented in this exercise.
You can find the definition file in this repository at [docs/api-definition.yml](docs/api-definition.yml).
We recommend viewing the rendered definition [here](https://www-technologien.pages.rechenknecht.net/flask-training-api-reference/api).
## Quickstart
We provide a very basic implementation in [app/views_v1.py](app/views_v1.py).
Please clone this repository on your machine and complete the tasks given in the comments.
Please have a look at [docs/api-definition.yml](docs/api-definition.yml) for a definition of the endpoints that should be implemented.
## Tasks
In this assignment we will build a REST API which serves pictures with metadata from a DB.
We feed these pictures into the DB ourselves by letting clients call the `/v1/images/fetch` route.
To have some data to play with, we use a freely available image set, normally used for deep learning, which you can find [here](https://image-annotations.marschke.me/NAACL/).
You are supposed to perform the following tasks:
1. Read this README and make sure you understand it
1. Look into the code and understand what it does in general
1. Answer the questions stated in `get_answers` in [app/views_v1.py](app/views_v1.py)
1. Implement the API itself by looking at https://www-technologien.pages.rechenknecht.net/flask-training-api-reference/api or using the API definition directly: [docs/api-definition.yml](docs/api-definition.yml)
1. Make sure you understand the concept of routes and Blueprints / API versioning by paths
1. Try to run the API and test it with a simple request (such as the curl request below)
1. We suggest starting with the `/images/fetch` route because it will get the data into your DB
1. Implement the other GET requests; the simplest one is the request for a single image. To get image data from the database, check out the Peewee documentation.
1. Hand in your solution as stated in the route `/v1/answers`
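When implementing the `GET /images` route, the `limit` and `offset` query parameters from [docs/api-definition.yml](docs/api-definition.yml) need to be validated (limit: default 100, minimum 1, maximum 500; offset: default 0, minimum 0). The following is only a minimal sketch of how this could look; the helper name `parse_pagination` is made up for illustration and is not part of the stub code.

```python
def parse_pagination(args):
    """Return (limit, offset) or raise ValueError for invalid values.

    `args` is any mapping of query parameter names to strings,
    for example Flask's `request.args`.
    """
    try:
        limit = int(args.get('limit', 100))
        offset = int(args.get('offset', 0))
    except (TypeError, ValueError):
        raise ValueError('limit and offset must be integers') from None
    if not 1 <= limit <= 500:
        raise ValueError('limit must be between 1 and 500')
    if offset < 0:
        raise ValueError('offset must not be negative')
    return limit, offset


print(parse_pagination({}))                               # (100, 0)
print(parse_pagination({'limit': '20', 'offset': '40'}))  # (20, 40)
```

A route could catch the `ValueError` and answer with the `400` response that the definition describes for invalid arguments.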
## Getting started
This repository contains a generic Flask REST API framework, which mainly consists of the Flask library itself and some other helpful tools.
One of these tools is the Flask wrapper, which provides helpful error messages in case of a program failure (500 Server Error) and some cool additional features to deal with new web security stuff such as CORS (check it out ;) ).
**Please check out [this guide](https://blog.philipphauer.de/restful-api-design-best-practices/) for best practices in RESTful API design.**
You will also find an object-relational mapping (ORM) library named [Peewee](http://docs.peewee-orm.com/en/latest/) to handle connections to our database. If you are not familiar with ORMs, you should check out [this neat guide](https://stackoverflow.com/questions/1279613/what-is-an-orm-and-where-can-i-learn-more-about-it#answer-1279678).
Because we were lazy people we created a little wrapper for Peewee as well, which makes the integration into Flask a little bit easier.
To access the database object, you therefore have to import `database_holder` from `app`.
You need this for doing cool things like:
```python
from app import database_holder
from app.models import Image, Caption

with database_holder.database.transaction():
    # save() returns the number of modified rows, so keep the object itself
    image_object = Image(someProp='someValue')
    image_object.save()
    # An exception could be raised here in case of a program failure.
    # If it breaks here, the DBMS reverts all actions done in the transaction,
    # so you will not end up in an invalid state.
    Caption(someOtherProp='someValue', image=image_object).save()
```
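To see the rollback behaviour in isolation, here is a small sketch that uses the `sqlite3` module from the standard library instead of our Peewee wrapper. The table and the simulated failure are invented for this example; only the idea (everything inside the transaction is reverted on an exception) carries over to the project.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE image (id INTEGER PRIMARY KEY, src TEXT)')
connection.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if the block raises an exception.
    with connection:
        connection.execute("INSERT INTO image (src) VALUES ('a.jpg')")
        raise RuntimeError('simulated program failure')
except RuntimeError:
    pass

row_count = connection.execute('SELECT COUNT(*) FROM image').fetchone()[0]
print(row_count)  # 0, because the insert was rolled back
```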
We recommend you to use an IDE like PyCharm for developing this piece of software.
If you are not familiar with IDEs in general you should use this project as a starting point.
Trust us, you will have many (Python) projects where a nice IDE is a real gamechanger.
## Development setup
When executing the tests, you have to set the `PYTHONPATH` to the base directory of this project.
`PYTHONPATH` adds directories to the search path the Python interpreter uses for resolving imports.
When you execute the `run.py` file from the project root, you most likely will not have to modify your `PYTHONPATH`.
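If you are unsure what the search path does, the following sketch may help. It creates a throwaway directory as a stand-in for the project root (the directory and the module name `demo_module` are made up) and shows that a module becomes importable once its directory is on `sys.path`, which is similar in effect to setting `PYTHONPATH` before starting Python.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the project root directory.
project_root = tempfile.mkdtemp()
with open(os.path.join(project_root, 'demo_module.py'), 'w') as handle:
    handle.write('GREETING = "hello"\n')

# Similar in effect to starting Python with PYTHONPATH=<project_root>.
sys.path.insert(0, project_root)
importlib.invalidate_caches()

demo_module = importlib.import_module('demo_module')
print(demo_module.GREETING)  # hello
```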
To run this project you need a recent Python interpreter (we recommend at least Python 3.6; 3.5 should work as well; Python 2 does not work) with pip (normally installed alongside it).
Also you have to install all the project requirements stated in `requirements.txt`.
You can do so by executing `pip3 install -r requirements.txt` in your project folder.
Please note that you will use `lxml` later on in the project.
`lxml` is a Python binding for the C libraries `libxml2` and `libxslt`, which allows you to parse XML and HTML files easily.
Therefore you may have to install these libraries on your system (on Linux and Mac; on Windows, pip provides a static build).
Check out [this](https://lxml.de/installation.html) website to see how to install `lxml` on your system (Mac and Linux).
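To get a feeling for `lxml`, here is a short sketch that parses a made-up HTML snippet with `html.fromstring` and queries it with `xpath`. The markup is invented for illustration; the real dataset page may be structured differently, so the XPath expressions are assumptions you will have to adapt.

```python
from lxml import html

# Made-up markup for illustration; the real dataset page may look different.
SAMPLE_PAGE = """
<html><body>
  <div class="entry"><img src="images/cat_1.jpg"><p>A cat.</p></div>
  <div class="entry"><img src="images/dog_2.jpg"><p>A dog.</p></div>
</body></html>
"""

tree = html.fromstring(SAMPLE_PAGE)
# Attribute values are selected with @name ...
sources = tree.xpath('//img/@src')
# ... while element queries return element objects with a .text attribute.
paragraphs = [p.text for p in tree.xpath('//div[@class="entry"]/p')]
print(sources)     # ['images/cat_1.jpg', 'images/dog_2.jpg']
print(paragraphs)  # ['A cat.', 'A dog.']
```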
This project contains a lot of stub code, so you will not have to understand all of the code in it.
For your assignment, a good starting point is the file `app/views_v1.py`, which contains the route definitions.
You should also check out the directory `app/models`, where you can find all of the database models we use.
Because this is not a database lecture, we defined the models for you.
Still, it is probably important for you to understand how foreign keys work and why they exist.
In our case you can navigate the foreign key from a caption by accessing `image`, and from an image through the `backref` `captions`.
Pro tip: We recommend you to use a Python venv (check this out), but it should work "native" as well.
## Run the API
To run the API simply execute `python3 run.py` in the project root.
After a successful start you should see something like this in your terminal:
```
 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
```
You can then access your API, for example with the following command:
```bash
curl http://localhost:5000/v1/tasks
```
### Manual testing
You can access your API with almost every HTTP client on your PC.
For an easy way to test different operations (POST, GET, PUT, PATCH), we recommend using [Postman](https://www.getpostman.com/) (or simply write tests ;) ).
### Executing tests
Try `python3 -m unittest discover -p 'Test*.py'` after setting up your development environment.
If you are using PyCharm you can use the built in test environment as well.
Just add a new run config `Python tests` -> `unittest` pointing to the test directory of this project.
As a pattern please provide `Test*.py` because we do not use the standard pattern for our tests.
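The pattern matters because the default pattern of `unittest` discovery is `test*.py`. The following sketch shows the same discovery done programmatically; it writes a throwaway test file (`TestExample.py`, a made-up name) into a temporary directory and discovers it with the `Test*.py` pattern.

```python
import os
import tempfile
import unittest

# Hypothetical test directory with one file matching the Test*.py pattern.
test_dir = tempfile.mkdtemp()
with open(os.path.join(test_dir, 'TestExample.py'), 'w') as handle:
    handle.write(
        'import unittest\n'
        '\n'
        'class ExampleTest(unittest.TestCase):\n'
        '    def test_addition(self):\n'
        '        self.assertEqual(1 + 1, 2)\n'
    )

suite = unittest.defaultTestLoader.discover(test_dir, pattern='Test*.py')
print(suite.countTestCases())  # 1
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```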
## Writing tests
We do not require, but encourage, you to write tests for your application. You can do so by adding methods beginning with `test_` in `tests/TestBlueprintV1.py`.
We are using the `flask_testing` [framework](https://github.com/jarus/flask-testing) to execute our tests - just in case you want a reference.
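As a rough idea of what such a test can look like, here is a sketch that uses only `unittest` and the test client built into Flask rather than `flask_testing`. The miniature app, the `create_app` helper and the test class are invented for this example; in the project, the app is wired up in `app/__init__.py`. The sketch also shows API versioning by path: the route is only reachable under the `/v1` prefix of its Blueprint.

```python
import json
import unittest

from flask import Blueprint, Flask

# Hypothetical miniature app, not the project's real setup.
V1 = Blueprint('v1', __name__, url_prefix='/v1')


@V1.route('/health', methods=['GET'])
def check_health():
    return json.dumps({'status': 'up'}), 200


def create_app():
    app = Flask(__name__)
    app.register_blueprint(V1)
    return app


class TestHealth(unittest.TestCase):
    def setUp(self):
        self.client = create_app().test_client()

    def test_health_is_versioned(self):
        response = self.client.get('/v1/health')
        self.assertEqual(response.status_code, 200)
        self.assertEqual(json.loads(response.data)['status'], 'up')

    def test_unversioned_path_is_unknown(self):
        self.assertEqual(self.client.get('/health').status_code, 404)
```

Saved in a file matching `Test*.py`, a class like this would be picked up by the discovery command from the section above.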
### Linter
If you are really ambitious you can run `pylint` with our own flavour.
Because you are so ambitious you can pick up the command right from our `.gitlab-ci.yml`, which we are using normally for all our projects.
### GitLab
If you want to use the automated test and linter features, simply push the code to a GitLab instance with a configured runner.
## Docker
The following section is just for your information. You do not have to use Docker or understand it at all (at this moment).
We provide a simple Dockerfile for you, which should have all the features you need.
If you think it would be better to use another Docker base image, you are free to do so as long as you expose your web server on port `8080`.
You can build the Docker image by executing `docker build .` (you should define `--tag=someTag` to use the image later on).
If you are new to Docker, please check out [this](https://docs.docker.com/get-started/) little guide, which explains the core concepts (you only need parts 1 and 2).
You should also read up on how to build a Docker image.
Please make sure you can build the Docker image before handing in your solution.
We will grade the software in the container, and because you want to get credit for this assignment, you should make sure that we can build the image right away.
## Ok, I really do not know what I am supposed to do
If you have any questions left or found a bug in our stub code, please mail us at `www-coding@lists.myhpi.de`.
We will be happy to help you.
from flask_common.flask import Flask
from flask_common.util import register_cors_headers
from peewee_common.Database import Holder
flask = Flask(__name__)
flask.config.from_pyfile("../application.cfg")
database_holder = Holder(flask)
from app.models import Image, Caption
database_holder.database.create_tables([Image, Caption])
from app.views_v1 import BASE
from app.views_v1 import V1
flask.register_blueprint(BASE)
flask.register_blueprint(V1)
register_cors_headers(flask, allowOrigin=lambda _: True, allowCredentials=False,
                      allowedHeaders='Content-Type,cache-control,x-requested-with')
import datetime
from peewee import CharField, DateTimeField, ForeignKeyField
from app import database_holder
from app.models import Image
class Caption(database_holder.Model):
    creation_date = DateTimeField(default=datetime.datetime.now)
    # Anticipating that a caption is not longer than 255 chars
    text = CharField()
    # Be careful: on_delete has no effect in SQLite (which we are using) unless foreign key
    # support is enabled (PRAGMA foreign_keys = ON); it is off by default
    image = ForeignKeyField(Image, backref="captions", on_delete='CASCADE')
import datetime
from peewee import CharField, DateTimeField
from app import database_holder
class Image(database_holder.Model):
    # time received on api
    creation_date = DateTimeField(default=datetime.datetime.now)
    # be cautious when storing large strings in here, they probably get shortened
    src = CharField()
    category = CharField()
from .Image import Image
from .Caption import Caption
# Remove for testing with pylint ;)
# pylint: disable=fixme
# **It took me ? hours to solve this assignment.**
# Solved by (Matrikelnummern): Person 1, Person 2
import json
import re
from lxml import html
import requests
from flask import Blueprint, request, Response
from app import database_holder
from app.models import Image, Caption
BASE = Blueprint('', __name__, url_prefix='')
V1 = Blueprint('v1', __name__, url_prefix='/v1')
BASE_URL_DATASET = 'https://image-annotations.marschke.me/NAACL/'
@V1.route('/health', methods=['GET'])
@BASE.route('/health', methods=['GET'])  # This route exists for the health check of the docker container
def check_health():
    return json.dumps({
        "status": "up",
        "message": "operational",
    }), 200
def get_image_json_object(image):
    caption_list = []
    for caption in image.captions:
        caption_list.append({
            'text': caption.text
        })
    return {
        'id': image.id,
        'category': image.category,
        'captions': caption_list,
    }
@V1.route('/tasks', methods=['GET'])
def get_tasks():
    return json.dumps({
        'taskList': [
            'Answer questions stated in get_answers() (5 Points)',
            'Build API defined in docs/api-definition.yml (precise as possible) (23 Points)',
        ]
    })
# 2 Points code style:
# Linter (each "real" error -0.5)
# Readability (up to -2 if very "obfuscated")
#
# Bonus points for really nice solutions and approaches, which aren't covered by our solution / in our opinion nicer
# than in our solution
@V1.route('/answers', methods=['GET'])
def get_answers():
    return json.dumps({
        'Is a REST API bound to a single exchange format like JSON? How can multiple formats be used? (1 Point)':
            '',
        'Are all operations on this REST APIs idempotent? Explain why! (1 Point)':
            '',
        'Why could it be problematic to work with full URLs as links? Name and explain two reasons (2 Points)': [
            '',
        ],
        'Hand in requirements (1 Point)': [
            'ZIP the complete source code directly from the root directory (so no useless subdirs in ZIP please)',
            'Make an own solution, show us that this is your solution by adding comments where they are needed to '
            'understand your source code',
            'Do not change any file names if not really needed (and then please document).',
            'Do not alter return definitions from functions get_tasks and get_answers.',
            'Write your answers in German or English, in code please write all English.',
            'Please write down your matriculation number at top of this file.',
            'Please add a comment at top of this file how much time you needed for this assignment (please be honest).',
        ]
    })
# TODO implement missing routes
# You can request images from the database by executing something like
# for image in Image.select().limit(limit).offset(offset):
#     do something with the image
#
# To access captions, use image.captions, which yields all caption objects (more precisely, a query you can iterate over)
#
# For counting all images in DB you can use this command: Image.select().count()
# If using pylint please add # pylint: disable=no-value-for-parameter to the end of the line.
#
# You can access GET arguments by request.args.get('key', 'default') where request is a global object defined by Flask
@V1.route('/images/fetch', methods=['POST'])
def update_image_storage():
    """This operation should clear DB if run multiple times on the same DB
    8 Points: 3 for xpath (1 each), 1 for correct resource GET, 1 for correct DB cleaning, 1 for correct DB saving,
    2 for correct regex, 2 for Transaction explanation
    -0.5 for small mistakes (return value...)
    :return: json if successful or not
    """
    # TODO get BASE_URL_DATASET, we would suggest requests for it (already installed) Think about error handling.
    answer = requests.get(BASE_URL_DATASET)
    # Error handling
    # TODO Please explain what this line is doing. Why is it needed? In which case? (directly here as comment)
    with database_holder.database.transaction():
        # Empty databases
        Image.delete().execute()  # pylint: disable=no-value-for-parameter
        Caption.delete().execute()  # pylint: disable=no-value-for-parameter
        # TODO We encourage you to use the html.fromstring method provided by the lxml package (already installed).
        tree = None
        # TODO After parsing the XML tree, please use the xpath method to iterate over all elements
        for pictureTree in tree.xpath(''):
            # TODO get image src by xpath method, you can check lxml documentation or use a debugger to find attributes
            src = None
            # TODO parse category by applying a regex to src, probably check out regex101.com
            # check out re docs of Python3
            category = None
            # save Image in DB, nothing magical here
            imageDb = Image(src=src, category=category)
            imageDb.save()
            # TODO iterate over all captions by using xpath method. Try to make the xpath expression as short as
            # possible
            for captionTree in []:
                caption_text = ''
                Caption(text=caption_text, image=imageDb).save()
    return json.dumps({'status': 'finished'}), 200
DATABASE = {
    'name': "somedb.sqlite",
    'engine': 'peewee.SqliteDatabase',
}
COOKIE_AES_KEY = 'CHANGEME'
openapi: "3.0.0"
info:
  description: "This is the API definition for the Flask Training API."
  version: "1.0.0"
  title: "Flask Training API"
  contact:
    email: "leonard@marschke.me"
servers:
  - url: "https://someserver/v1"
paths:
  /health:
    get:
      summary: Gets service health
      responses:
        200:
          description: Service operational
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthResponse'
        503:
          description: Service not operational
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthResponse'
        default:
          $ref: '#/components/responses/Error'
  /images:
    get:
      summary: Get saved images
      description: To use pagination check out query parameters.
      parameters:
        - name: limit
          in: query
          description: Limits the response size, defaults to 100, minimum 1, maximum 500
          schema:
            type: integer
            minimum: 1
            default: 100
            maximum: 500
            description: Maximal number of images
        - name: offset
          in: query
          description: Skips the beginning of the sorted response
          schema:
            type: integer
            minimum: 0
            default: 0
            description: Number of images to skip
      responses:
        200:
          description: List of images
          content:
            application/json:
              schema:
                type: object
                properties:
                  images:
                    type: array
                    items:
                      $ref: '#/components/schemas/Image'
                  count:
                    type: number
                    format: uint
                    description: Count of all images available
        400:
          $ref: '#/components/responses/InvalidArgument'
        default:
          $ref: '#/components/responses/Error'
  /images/{imageId}:
    get:
      summary: Get metadata of image
      parameters:
        - $ref: '#/components/parameters/ImageId'
      responses:
        200:
          description: Image metadata
          content:
            application/json:
              schema:
                type: object
                properties:
                  image:
                    $ref: '#/components/schemas/Image'
        404:
          $ref: '#/components/responses/NotFound'
        default:
          $ref: '#/components/responses/Error'
  /images/{imageId}/bitmap:
    get:
      summary: Get bitmap of image
      parameters:
        - $ref: '#/components/parameters/ImageId'
      responses:
        200:
          description: Bitmap
          content:
            image:
              schema:
                type: string
                format: binary
                description: Image bitmap
        404:
          $ref: '#/components/responses/NotFound'
        default:
          $ref: '#/components/responses/Error'
  /images/fetch:
    post:
      summary: Fetches image DB
      description: Should fetch images from https://image-annotations.marschke.me/NAACL/. Operation must be idempotent.
      responses:
        200:
          description: Images successfully fetched
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    description: Finish status
                    example: 'finished'
        default:
          $ref: '#/components/responses/Error'
components:
  schemas:
    HealthResponse:
      type: object
      properties:
        status:
          type: string
          description: Current status of the service
        message:
          type: string
          description: Explaining the current status
    Error:
      type: object
      properties:
        message:
          type: string
          description: the error message
    Image:
      type: object
      properties:
        id:
          type: number
          format: uint
          example: 30
        category:
          type: string
          description: Category of image
          example: 'somecategory'
        captions:
          type: array
          description: Captions for this image
          items:
            type: object
            properties:
              text:
                type: string
                description: Caption text
  responses:
    Error:
      description: Unknown Error
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
    InvalidArgument:
      description: Invalid argument passed, see message field for more information.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
    NotFound:
      description: Not found
      content: