FLOSS Project Planets

Continuum Analytics Blog: Anaconda Enters a New Chapter

Planet Python - Wed, 2019-10-09 11:54

Today I am excited to announce that I am stepping into the role of CEO at Anaconda. Although I am a founder of the company and have previously served as president, this marks the first…

The post Anaconda Enters a New Chapter appeared first on Anaconda.

Categories: FLOSS Project Planets

PyCon: Financial Aid Launches for PyCon US 2020!

Planet Python - Wed, 2019-10-09 11:28
PyCon US 2020 is opening applications for Financial Aid today, and we’ll be accepting them through January 31, 2020.

To apply, first set up an account on the site, and then you will be able to fill out the application through your dashboard.

The financial aid program aims to bring many folks to PyCon by limiting the maximum grant amount per person; in that way, we can offer support to more people based on individual need. The financial aid program reimburses direct travel costs including transportation, hotel, and childcare, as well as offering discounted or waived registration tickets. For complete details, see our FAQ, and contact pycon-aid@python.org with further questions.

The Python Software Foundation & PyLadies make Financial Aid possible. This year, the Python Software Foundation is contributing $130,000 USD towards financial aid and PyLadies will contribute as much as they can based on the contributions they get throughout 2019.

For more information, go to the Financial Aid page on the conference website.
The Call for Proposals is also open! Tutorial proposals are due November 22, 2019, while talk, poster, and education summit proposals are due December 20, 2019. For more information, see details here.

For those proposing talks, tutorials, or posters, selecting the “I require a speaker grant if my proposal is accepted” box on your speaker profile serves as your request; you do not need to fill out the financial aid application. Upon acceptance of the proposal, we’ll contact the speakers who checked that box to gather the appropriate information. Accepted speakers and presenters are prioritized for travel grants. Additionally, we do not expose grant requests to reviewers while evaluating proposals. The Program Committee evaluates proposals on the basis of their presentation, and later the Financial Aid team comes in and looks at how we can help our speakers.
Categories: FLOSS Project Planets

TEN7 Blog's Drupal Posts: Joe Shindelar: A Passion for Open Source

Planet Drupal - Wed, 2019-10-09 11:14
Joe Shindelar of Osio Labs chats with Ivan about making interactive sculptures, snowboarding and his long history with open source software.
Categories: FLOSS Project Planets

1xINTERNET blog: Zsofi and Adam join 1xINTERNET

Planet Drupal - Wed, 2019-10-09 10:48

We welcome Zsofi Major and Adam Juran to the 1x team.

 
 

Categories: FLOSS Project Planets

Gary Benson: “Reformat the filesystem to enable support”

GNU Planet! - Wed, 2019-10-09 08:32

Apparently it’s been a while since I ran containers on my office computer—and by a while, I mean, since November 2016—because if your initial install was RHEL or CentOS 7.2 or older then neither Docker nor Podman will work:

# yum -q -y install podman skopeo buildah
# podman pull registry.access.redhat.com/ubi7/ubi
Error: could not get runtime: kernel does not support overlay fs: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior. Reformat the filesystem with ftype=1 to enable d_type support. Running without d_type is not supported.: driver not supported

So… ugh. I didn’t have any disks it’d work on either:

# for i in $(awk '{ if ($3 == "xfs") print $2 }' /etc/mtab); do xfs_info $i; done | grep ftype
naming   =version 2   bsize=4096   ascii-ci=0 ftype=0
naming   =version 2   bsize=4096   ascii-ci=0 ftype=0
naming   =version 2   bsize=4096   ascii-ci=0 ftype=0
naming   =version 2   bsize=4096   ascii-ci=0 ftype=0

I didn’t reformat anything though. podman pull wants overlayFS on /var/run/containers/storage, and buildah bud wants it on /var/lib/containers/storage. I made loopback disks for them both:

  1. Find/make space somewhere, then create a directory to put the images in:
     # mkdir -p /store/containers
  2. Create a big file, whatever size you want, for the disk image. I made mine 20GiB. It took a couple minutes, my disks are slow:
     # dd if=/dev/zero of=/store/containers/var_lib_containers.img bs=1M count=20K
  3. Find a free loop device and associate the file to it:
     # losetup -f
     /dev/loop1
     # losetup /dev/loop1 /store/containers/var_lib_containers.img
  4. Format the “device”, then detach it from the file:
     # mkfs -t xfs -n ftype=1 /dev/loop1
     # losetup -d /dev/loop1
  5. Mount the “disk”, and see if it worked:
     # mount -oloop /store/containers/var_lib_containers.img /var/lib/containers
     # df -h /var/lib/containers
     Filesystem                                 Size  Used Avail Use% Mounted on
     /dev/loop1                                  20G   33M   20G   1% /var/lib/containers
  6. It worked? Make it permanent:
     # echo "/store/containers/var_lib_containers.img /var/lib/containers xfs defaults,loop 1 2" >> /etc/fstab

Rinse and repeat for the other drive it needed. Then try again:

# podman pull registry.access.redhat.com/ubi7/ubi
Trying to pull registry.access.redhat.com/ubi7/ubi...
Getting image source signatures
Copying blob bff3b73cbcc4 done
Copying blob 7b1c937e0f67 done
Copying config 6fecccc91c done
Writing manifest to image destination
Storing signatures
6fecccc91c83e11ae4fede6793e9410841221d4779520c2b9e9fb7f7b3830264

#victorydance

Categories: FLOSS Project Planets

Qt 5.14.0 Beta1 Released

Planet KDE - Wed, 2019-10-09 08:04

I am happy to announce that Qt 5.14.0 Beta1 is released today. We will release updates as Beta N regularly until we are ready for the RC. The current estimate for the RC is 12th November 2019; see the schedule on the Qt 5.14 wiki.

Categories: FLOSS Project Planets

Stack Abuse: Python for NLP: Neural Machine Translation with Seq2Seq in Keras

Planet Python - Wed, 2019-10-09 08:02

This is the 22nd article in my series of articles on Python for NLP. In one of my previous articles on solving sequence problems with Keras, I explained how to solve many-to-many sequence problems where both inputs and outputs are divided over multiple time-steps. The seq2seq architecture is a type of many-to-many sequence modeling and is commonly used for tasks such as text summarization, chatbot development, conversational modeling, and neural machine translation.

In this article, we will see how to create a language translation model, one of the best-known applications of neural machine translation. We will use the seq2seq architecture to create our language translation model using Python's Keras library.

It is assumed that you have good knowledge of recurrent neural networks, particularly LSTM. The code in this article is written in Python with the Keras library. Therefore, it is assumed that you have good knowledge of the Python language, as well as the Keras library. So, without any further ado, let's begin.

Libraries and Configuration Settings

As a first step, we will import the required libraries and will configure values for different parameters that we will be using in the code. Let's first import the required libraries:

import os, sys

from keras.models import Model
from keras.layers import Input, LSTM, GRU, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

import numpy as np
import matplotlib.pyplot as plt

Execute the following script to set values for different parameters:

BATCH_SIZE = 64
EPOCHS = 20
LSTM_NODES = 256
NUM_SENTENCES = 20000
MAX_SENTENCE_LENGTH = 50
MAX_NUM_WORDS = 20000
EMBEDDING_SIZE = 100

The Dataset

The language translation model that we are going to develop in this article will translate English sentences into their French counterparts. To develop such a model, we need a dataset that contains English sentences and their French translations. Luckily, such a dataset is freely available at this link. Download the file fra-eng.zip and extract it. You will then see the fra.txt file. On each line, the text file contains an English sentence and its French translation, separated by a tab. The first 20 lines of the fra.txt file look like this:

Go.     Va !
Hi.     Salut !
Hi.     Salut.
Run!    Cours !
Run!    Courez !
Who?    Qui ?
Wow!    Ça alors !
Fire!   Au feu !
Help!   À l'aide !
Jump.   Saute.
Stop!   Ça suffit !
Stop!   Stop !
Stop!   Arrête-toi !
Wait!   Attends !
Wait!   Attendez !
Go on.  Poursuis.
Go on.  Continuez.
Go on.  Poursuivez.
Hello!  Bonjour !
Hello!  Salut !

The file contains more than 170,000 records, but we will only use the first 20,000 records to train our model. You can use more records if you want.

Data Preprocessing

Neural machine translation models are often based on the seq2seq architecture. The seq2seq architecture is an encoder-decoder architecture which consists of two LSTM networks: the encoder LSTM and the decoder LSTM. The input to the encoder LSTM is the sentence in the original language; the input to the decoder LSTM is the sentence in the translated language with a start-of-sentence token. The output is the actual target sentence with an end-of-sentence token.

In our dataset, we do not need to process the input; however, we need to generate two copies of the translated sentence: one with the start-of-sentence token and the other with the end-of-sentence token. Here is the script which does that:

input_sentences = []
output_sentences = []
output_sentences_inputs = []

count = 0
for line in open(r'/content/drive/My Drive/datasets/fra.txt', encoding="utf-8"):
    count += 1

    if count > NUM_SENTENCES:
        break

    if '\t' not in line:
        continue

    input_sentence, output = line.rstrip().split('\t')

    output_sentence = output + ' <eos>'
    output_sentence_input = '<sos> ' + output

    input_sentences.append(input_sentence)
    output_sentences.append(output_sentence)
    output_sentences_inputs.append(output_sentence_input)

print("num samples input:", len(input_sentences))
print("num samples output:", len(output_sentences))
print("num samples output input:", len(output_sentences_inputs))

Note: You will likely need to change the file path of the fra.txt file on your computer for this to work.

In the script above we create three lists: input_sentences[], output_sentences[], and output_sentences_inputs[]. Next, in the for loop the fra.txt file is read line by line. Each line is split into two substrings at the position where the tab occurs. The left substring (the English sentence) is inserted into the input_sentences[] list. The substring to the right of the tab is the corresponding translated French sentence. The <eos> token, which marks the end of a sentence, is appended to the translated sentence, and the resulting sentence is added to the output_sentences[] list. Similarly, the <sos> token, which stands for "start of sentence", is concatenated at the start of the translated sentence and the result is added to the output_sentences_inputs[] list. The loop terminates once the number of lines read exceeds the NUM_SENTENCES variable, i.e. 20,000, so at most 20,000 sentence pairs are collected.

Finally the number of samples in the three lists are displayed in the output:

num samples input: 20000
num samples output: 20000
num samples output input: 20000

Let's now randomly print a sentence from the input_sentences[], output_sentences[], and output_sentences_inputs[] lists:

print(input_sentences[172])
print(output_sentences[172])
print(output_sentences_inputs[172])

Here is the output:

I'm ill.
Je suis malade. <eos>
<sos> Je suis malade.

You can see the original sentence, i.e. I'm ill; its corresponding translation in the output, i.e. Je suis malade. <eos>. Notice that here we have the <eos> token at the end of the sentence. Similarly, for the input to the decoder, we have <sos> Je suis malade.

Tokenization and Padding

The next step is to tokenize the original and translated sentences and to pad all sentences to a fixed length: for the inputs, this will be the length of the longest input sentence, and for the outputs, the length of the longest output sentence.

For tokenization, the Tokenizer class from the keras.preprocessing.text library can be used. The tokenizer class performs two tasks:

  • It divides a sentence into the corresponding list of words
  • Then it converts the words to integers

This is extremely important since deep learning and machine learning algorithms work with numbers. The following script is used to tokenize the input sentences:

input_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
input_tokenizer.fit_on_texts(input_sentences)
input_integer_seq = input_tokenizer.texts_to_sequences(input_sentences)

word2idx_inputs = input_tokenizer.word_index
print('Total unique words in the input: %s' % len(word2idx_inputs))

max_input_len = max(len(sen) for sen in input_integer_seq)
print("Length of longest sentence in input: %g" % max_input_len)

In addition to tokenization and integer conversion, the word_index attribute of the Tokenizer class returns a word-to-index dictionary where words are the keys and the corresponding integers are the values. The script above also prints the number of unique words in the dictionary and the length of the longest sentence in the input:

Total unique words in the input: 3523
Length of longest sentence in input: 6
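As a quick sanity check of what the tokenizer produces, you can convert a single sentence to its integer sequence; based on the word-to-index values reported later in this article, the result should be the integers for i'm and ill:

print(input_tokenizer.texts_to_sequences(["I'm ill."]))
# Expected, given the indexes shown later in the article: [[6, 539]]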

Similarly, the output sentences can also be tokenized in the same way as shown below:

output_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS, filters='')
output_tokenizer.fit_on_texts(output_sentences + output_sentences_inputs)
output_integer_seq = output_tokenizer.texts_to_sequences(output_sentences)
output_input_integer_seq = output_tokenizer.texts_to_sequences(output_sentences_inputs)

word2idx_outputs = output_tokenizer.word_index
print('Total unique words in the output: %s' % len(word2idx_outputs))

num_words_output = len(word2idx_outputs) + 1
max_out_len = max(len(sen) for sen in output_integer_seq)
print("Length of longest sentence in the output: %g" % max_out_len)

Here is the output:

Total unique words in the output: 9561
Length of longest sentence in the output: 13

Comparing the two outputs, you can see that the English input has a much smaller vocabulary (3,523 unique words versus 9,561) and shorter sentences (a maximum of 6 words versus 13) than the translated French output.

Next, we need to pad the input. The reason for padding the input and the output is that text sentences can be of varying length; however, the LSTM (the model we are going to train) expects input instances of the same length. Therefore, we need to convert our sentences into fixed-length vectors. One way to do this is via padding.

In padding, a certain length is defined for a sentence. In our case the length of the longest sentence in the inputs and outputs will be used for padding the input and output sentences, respectively. The longest sentence in the input contains 6 words. For the sentences that contain less than 6 words, zeros will be added in the empty indexes. The following script applies padding to the input sentences.

encoder_input_sequences = pad_sequences(input_integer_seq, maxlen=max_input_len)
print("encoder_input_sequences.shape:", encoder_input_sequences.shape)
print("encoder_input_sequences[172]:", encoder_input_sequences[172])

The script above prints the shape of the padded input sentences. The padded integer sequence for the sentence at index 172 is also printed. Here is the output:

encoder_input_sequences.shape: (20000, 6)
encoder_input_sequences[172]: [  0   0   0   0   6 539]

Since there are 20,000 sentences in the input and each input sentence is of length 6, the shape of the input is now (20000, 6). If you look at the integer sequence for the sentence at index 172 of the input, you can see that there are four zeros, followed by the values 6 and 539. You may recall that the original sentence at index 172 is I'm ill. The tokenizer divided the sentence into two words, i'm and ill, converted them to integers, and then applied pre-padding by adding four zeros at the start of the corresponding integer sequence for the sentence at index 172 of the input list.

To verify that the integer values for i'm and ill are 6 and 539 respectively, you can pass the words to the word2idx_inputs dictionary, as shown below:

print(word2idx_inputs["i'm"]) print(word2idx_inputs["ill"])

Output:

6
539

In the same way, the decoder outputs and the decoder inputs are padded as follows:

decoder_input_sequences = pad_sequences(output_input_integer_seq, maxlen=max_out_len, padding='post')
print("decoder_input_sequences.shape:", decoder_input_sequences.shape)
print("decoder_input_sequences[172]:", decoder_input_sequences[172])

# The decoder targets are padded the same way; they are used later to build the one-hot outputs.
decoder_output_sequences = pad_sequences(output_integer_seq, maxlen=max_out_len, padding='post')

Output:

decoder_input_sequences.shape: (20000, 13)
decoder_input_sequences[172]: [  2   3   6 188   0   0   0   0   0   0   0   0   0]

The sentence at index 172 of the decoder input is <sos> je suis malade.. If you print the corresponding integers from the word2idx_outputs dictionary, you should see 2, 3, 6, and 188 printed on the console, as shown here:

print(word2idx_outputs["<sos>"]) print(word2idx_outputs["je"]) print(word2idx_outputs["suis"]) print(word2idx_outputs["malade."])

Output:

2
3
6
188

It is further important to mention that in the case of the decoder, the post-padding is applied, which means that zeros are appended at the end of the sentence. In the encoder, zeros were padded at the beginning. The reason behind this approach is that encoder output is based on the words occurring at the end of the sentence, therefore the original words were kept at the end of the sentence and zeros were padded at the beginning. On the other hand, in the case of the decoder, the processing starts from the beginning of a sentence, and therefore post-padding is performed on the decoder inputs and outputs.
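As a minimal illustration of the two padding modes (a toy sequence, not taken from the dataset):

from keras.preprocessing.sequence import pad_sequences

toy_sequence = [[6, 539]]
print(pad_sequences(toy_sequence, maxlen=6))                   # pre-padding (default): [[  0   0   0   0   6 539]]
print(pad_sequences(toy_sequence, maxlen=6, padding='post'))   # post-padding:          [[  6 539   0   0   0   0]]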

Word Embeddings

I have written a detailed article on word embeddings, which you may want to check in order to understand word embeddings in Keras. This section only provides the implementation of word embeddings for neural machine translation. However the basic concept remains the same.

Since deep learning models work with numbers, we need to convert our words into their corresponding numeric vector representations. But we already converted our words into integers. So what's the difference between integer representation and word embeddings?

There are two main differences between single-integer representation and word embeddings. With integer representation, a word is represented only by a single integer. With vector representation a word is represented by a vector of 50, 100, 200, or whatever dimensions you like. Hence, word embeddings capture a lot more information about words. Secondly, the single-integer representation doesn't capture the relationships between different words. On the contrary, word embeddings retain relationships between the words. You can either use custom word embeddings or you can use pretrained word embeddings.

In this article, for English sentences, i.e. the inputs, we will use the GloVe word embeddings. For the translated French sentences in the output, we will use custom word embeddings.

Let's create word embeddings for the inputs first. To do so, we need to load the GloVe word vectors into memory. We will then create a dictionary where words are the keys and the corresponding vectors are values, as shown below:

from numpy import array
from numpy import asarray
from numpy import zeros

embeddings_dictionary = dict()

glove_file = open(r'/content/drive/My Drive/datasets/glove.6B.100d.txt', encoding="utf8")

for line in glove_file:
    records = line.split()
    word = records[0]
    vector_dimensions = asarray(records[1:], dtype='float32')
    embeddings_dictionary[word] = vector_dimensions
glove_file.close()

Recall that we have 3523 unique words in the input. We will create a matrix where the row number will represent the integer value for the word and the columns will correspond to the dimensions of the word. This matrix will contain the word embeddings for the words in our input sentences.

num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)
embedding_matrix = zeros((num_words, EMBEDDING_SIZE))
for word, index in word2idx_inputs.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector

Let's first print the word embeddings for the word ill using the GloVe word embedding dictionary.

print(embeddings_dictionary["ill"])

Output:

[ 0.12648 0.1366 0.22192 -0.025204 -0.7197 0.66147 0.48509 0.057223 0.13829 -0.26375 -0.23647 0.74349 0.46737 -0.462 0.20031 -0.26302 0.093948 -0.61756 -0.28213 0.1353 0.28213 0.21813 0.16418 0.22547 -0.98945 0.29624 -0.62476 -0.29535 0.21534 0.92274 0.38388 0.55744 -0.14628 -0.15674 -0.51941 0.25629 -0.0079678 0.12998 -0.029192 0.20868 -0.55127 0.075353 0.44746 -0.71046 0.75562 0.010378 0.095229 0.16673 0.22073 -0.46562 -0.10199 -0.80386 0.45162 0.45183 0.19869 -1.6571 0.7584 -0.40298 0.82426 -0.386 0.0039546 0.61318 0.02701 -0.3308 -0.095652 -0.082164 0.7858 0.13394 -0.32715 -0.31371 -0.20247 -0.73001 -0.49343 0.56445 0.61038 0.36777 -0.070182 0.44859 -0.61774 -0.18849 0.65592 0.44797 -0.10469 0.62512 -1.9474 -0.60622 0.073874 0.50013 -1.1278 -0.42066 -0.37322 -0.50538 0.59171 0.46534 -0.42482 0.83265 0.081548 -0.44147 -0.084311 -1.2304 ]

In the previous section, we saw that the integer representation for the word ill is 539. Let's now check the 539th index of the word embedding matrix.

print(embedding_matrix[539])

Output:

[ 0.12648 0.1366 0.22192 -0.025204 -0.7197 0.66147 0.48509 0.057223 0.13829 -0.26375 -0.23647 0.74349 0.46737 -0.462 0.20031 -0.26302 0.093948 -0.61756 -0.28213 0.1353 0.28213 0.21813 0.16418 0.22547 -0.98945 0.29624 -0.62476 -0.29535 0.21534 0.92274 0.38388 0.55744 -0.14628 -0.15674 -0.51941 0.25629 -0.0079678 0.12998 -0.029192 0.20868 -0.55127 0.075353 0.44746 -0.71046 0.75562 0.010378 0.095229 0.16673 0.22073 -0.46562 -0.10199 -0.80386 0.45162 0.45183 0.19869 -1.6571 0.7584 -0.40298 0.82426 -0.386 0.0039546 0.61318 0.02701 -0.3308 -0.095652 -0.082164 0.7858 0.13394 -0.32715 -0.31371 -0.20247 -0.73001 -0.49343 0.56445 0.61038 0.36777 -0.070182 0.44859 -0.61774 -0.18849 0.65592 0.44797 -0.10469 0.62512 -1.9474 -0.60622 0.073874 0.50013 -1.1278 -0.42066 -0.37322 -0.50538 0.59171 0.46534 -0.42482 0.83265 0.081548 -0.44147 -0.084311 -1.2304 ]

You can see that the values for the 539th row in the embedding matrix are similar to the vector representation of the word ill in the GloVe dictionary, which confirms that rows in the embedding matrix represent corresponding word embeddings from the GloVe word embedding dictionary. This word embedding matrix will be used to create the embedding layer for our LSTM model.
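As an aside, one way to see that these vectors capture relationships between words is to compare them with cosine similarity. This is a small sketch, not part of the original tutorial, and it assumes both comparison words appear in the GloVe file:

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Related words should score higher than unrelated ones (exact values depend on the GloVe file).
print(cosine_similarity(embeddings_dictionary["ill"], embeddings_dictionary["sick"]))
print(cosine_similarity(embeddings_dictionary["ill"], embeddings_dictionary["table"]))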

The following script creates the embedding layer for the input:

embedding_layer = Embedding(num_words, EMBEDDING_SIZE, weights=[embedding_matrix], input_length=max_input_len)

Creating the Model

Now is the time to develop our model. The first thing we need to do is to define our outputs, as we know that the output will be a sequence of words. Recall that the total number of unique words in the output is 9,561; adding one for the padding index gives num_words_output = 9562, so each word in the output can be any of 9,562 values. The length of an output sentence is 13. And for each input sentence, we need a corresponding output sentence. Therefore, the final shape of the output will be:

(number of inputs, length of the output sentence, the number of words in the output)

The following script creates the empty output array:

decoder_targets_one_hot = np.zeros((
        len(input_sentences),
        max_out_len,
        num_words_output
    ),
    dtype='float32'
)

The following script prints the shape of the decoder targets:

decoder_targets_one_hot.shape

Output:

(20000, 13, 9562)

To make predictions, the final layer of the model will be a dense layer, therefore we need the outputs in the form of one-hot encoded vectors, since we will be using the softmax activation function at the dense layer. To create such one-hot encoded output, the next step is to assign 1 to the column number that corresponds to the integer representation of the word. For instance, the padded integer sequence for <sos> je suis malade. is [ 2 3 6 188 0 0 0 0 0 0 0 0 0]. In the corresponding one-hot array, a 1 will be inserted at column index 2 of the first row, at column index 3 of the second row, and so on.

Look at the following script:

for i, d in enumerate(decoder_output_sequences):
    for t, word in enumerate(d):
        decoder_targets_one_hot[i, t, word] = 1
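A quick way to verify the one-hot targets (a small check, not part of the original script) is to reverse the encoding with argmax; for every sentence it should reproduce the padded integer sequence:

# Recover the integer sequence for sentence 172 from its one-hot encoding.
recovered = np.argmax(decoder_targets_one_hot[172], axis=1)
print(recovered)
print(decoder_output_sequences[172])  # should print the same integers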

Next, we need to create the encoder and decoder. The input to the encoder will be the sentence in English and the output will be the hidden state and cell state of the LSTM.

The following script defines the encoder:

encoder_inputs_placeholder = Input(shape=(max_input_len,))
x = embedding_layer(encoder_inputs_placeholder)
encoder = LSTM(LSTM_NODES, return_state=True)

encoder_outputs, h, c = encoder(x)
encoder_states = [h, c]

The next step is to define the decoder. The decoder will have two inputs: the hidden state and cell state from the encoder and the input sentence, which actually will be the output sentence with an <sos> token appended at the beginning.

The following script creates the decoder LSTM:

decoder_inputs_placeholder = Input(shape=(max_out_len,))

decoder_embedding = Embedding(num_words_output, LSTM_NODES)
decoder_inputs_x = decoder_embedding(decoder_inputs_placeholder)

decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_x, initial_state=encoder_states)

Finally, the output from the decoder LSTM is passed through a dense layer to predict decoder outputs, as shown here:

decoder_dense = Dense(num_words_output, activation='softmax')

decoder_outputs = decoder_dense(decoder_outputs)

The next step is to compile the model:

model = Model([encoder_inputs_placeholder, decoder_inputs_placeholder], decoder_outputs)
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Let's plot our model to see how it looks:

from keras.utils import plot_model
plot_model(model, to_file='model_plot4a.png', show_shapes=True, show_layer_names=True)

Output:

From the output, you can see that we have two types of input. input_1 is the input placeholder for the encoder, which is embedded and passed through the lstm_1 layer, which basically is the encoder LSTM. There are three outputs from the lstm_1 layer: the output, the hidden state and the cell state. However, only the hidden state and the cell state are passed to the decoder.

Here the lstm_2 layer is the decoder LSTM. The input_2 contains the output sentences with <sos> token appended at the start. The input_2 is also passed through an embedding layer and is used as input to the decoder LSTM, lstm_2. Finally, the output from the decoder LSTM is passed through the dense layer to make predictions.

The next step is to train the model using the fit() method:

r = model.fit(
    [encoder_input_sequences, decoder_input_sequences],
    decoder_targets_one_hot,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_split=0.1,
)

The model is trained on 18,000 records and tested on the remaining 2,000 records. The model is trained for 20 epochs; you can modify the number of epochs to see if you can get better results. After 20 epochs, I got a training accuracy of 90.99% and a validation accuracy of 79.11%, which shows that the model is overfitting. To reduce overfitting, you can add dropout or more records. We are only training on 20,000 records, so you can add more records to reduce overfitting.
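For example, one way to experiment with regularization is to pass dropout arguments to the LSTM layers when defining them. This is only a sketch; the values are arbitrary and untuned:

# Hypothetical variant of the encoder and decoder LSTMs with dropout enabled.
encoder = LSTM(LSTM_NODES, return_state=True, dropout=0.2, recurrent_dropout=0.2)
decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True,
                    dropout=0.2, recurrent_dropout=0.2)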

Modifying the Model for Predictions

While training, we know the actual inputs to the decoder for all the output words in the sequence. An example of what happens during training is as follows. Suppose we have a sentence i'm ill. The sentence is translated as follows:

// Inputs on the left of Encoder/Decoder, outputs on the right.
Step 1:
I'm ill -> Encoder -> enc(h1,c1)
enc(h1,c1) + <sos> -> Decoder -> je + dec(h1,c1)

Step 2:
dec(h1,c1) + je -> Decoder -> suis + dec(h2,c2)

Step 3:
dec(h2,c2) + suis -> Decoder -> malade. + dec(h3,c3)

Step 4:
dec(h3,c3) + malade. -> Decoder -> <eos> + dec(h4,c4)

You can see that the inputs to the decoder and the outputs from the decoder are known, and the model is trained on the basis of these inputs and outputs.

However, during predictions the next word is predicted on the basis of the previous word, which in turn was also predicted in the previous time-step. Now you will understand the purpose of the <sos> and <eos> tokens. While making actual predictions, the full output sequence is not available; in fact, that is what we have to predict. During prediction the only word available to us is <sos>, since all the output sentences start with <sos>.

An example of what happens during prediction is as follows. We will again translate the sentence i'm ill:

// Inputs on the left of Encoder/Decoder, outputs on the right.
Step 1:
I'm ill -> Encoder -> enc(h1,c1)
enc(h1,c1) + <sos> -> Decoder -> y1(je) + dec(h1,c1)

Step 2:
dec(h1,c1) + y1 -> Decoder -> y2(suis) + dec(h2,c2)

Step 3:
dec(h2,c2) + y2 -> Decoder -> y3(malade.) + dec(h3,c3)

Step 4:
dec(h3,c3) + y3 -> Decoder -> y4(<eos>) + dec(h4,c4)

You can see that the functionality of the encoder remains the same. The sentence in the original language is passed through the encoder, and the hidden state and cell state are the outputs from the encoder.

In step 1, the hidden state and cell state of the encoder, together with the <sos> token, are used as input to the decoder. The decoder predicts a word y1, which may or may not be correct; based on the validation accuracy, our model gets roughly 79% of predictions right. At step 2, the decoder hidden state and cell state from step 1, along with y1, are used as input to the decoder, which predicts y2. The process continues until the <eos> token is encountered. All the predicted outputs from the decoder are then concatenated to form the final output sentence. Let's modify our model to implement this logic.

The encoder model remains the same:

encoder_model = Model(encoder_inputs_placeholder, encoder_states)

Since now at each step we need the decoder hidden and cell states, we will modify our model to accept the hidden and cell states as shown below:

decoder_state_input_h = Input(shape=(LSTM_NODES,))
decoder_state_input_c = Input(shape=(LSTM_NODES,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

Now, since there will be only a single word in the decoder input at each time step, we need a new single-word input placeholder, passed through the same decoder embedding layer:

decoder_inputs_single = Input(shape=(1,))
decoder_inputs_single_x = decoder_embedding(decoder_inputs_single)

Next, we run the decoder LSTM on the single-word input, using the state placeholders as the initial state:

decoder_outputs, h, c = decoder_lstm(decoder_inputs_single_x, initial_state=decoder_states_inputs)

To make predictions, the decoder output is passed through the dense layer:

decoder_states = [h, c]
decoder_outputs = decoder_dense(decoder_outputs)

The final step is to define the updated decoder model, as shown here:

decoder_model = Model(
    [decoder_inputs_single] + decoder_states_inputs,
    [decoder_outputs] + decoder_states
)

Let's now plot our modified decoder LSTM that makes predictions:

from keras.utils import plot_model
plot_model(decoder_model, to_file='model_plot_dec.png', show_shapes=True, show_layer_names=True)

Output:

In the image above, lstm_2 is the modified decoder LSTM. You can see that it accepts a sentence with one word, as shown in input_5, along with the hidden and cell states from the previous output (input_3 and input_4). You can see that the shape of the input sentence is now (None, 1), since there will be only one word in the decoder input. On the contrary, during training the shape of the input sentence was (None, 6), since the input contained a complete sentence with a maximum length of 6.

Making Predictions

In this step, you will see how to make predictions using English sentences as inputs.

In the tokenization steps, we converted words to integers. The outputs from the decoder will also be integers. However, we want our output to be a sequence of words in the French language. To do so, we need to convert the integers back to words. We will create new dictionaries for both inputs and outputs where the keys will be the integers and the corresponding values will be the words.

idx2word_input = {v:k for k, v in word2idx_inputs.items()}
idx2word_target = {v:k for k, v in word2idx_outputs.items()}
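As a quick sanity check, looking up the integers printed earlier should give back the original words:

print(idx2word_input[6], idx2word_input[539])    # i'm ill
print(idx2word_target[3], idx2word_target[6])    # je suis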

Next we will create a method, translate_sentence(). The method accepts the padded input sequence of an English sentence (in integer form) and returns the translated French sentence. Look at the translate_sentence() method:

def translate_sentence(input_seq):
    states_value = encoder_model.predict(input_seq)
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = word2idx_outputs['<sos>']
    eos = word2idx_outputs['<eos>']
    output_sentence = []

    for _ in range(max_out_len):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        idx = np.argmax(output_tokens[0, 0, :])

        if eos == idx:
            break

        word = ''

        if idx > 0:
            word = idx2word_target[idx]
            output_sentence.append(word)

        target_seq[0, 0] = idx
        states_value = [h, c]

    return ' '.join(output_sentence)

In the script above we pass the input sequence to the encoder_model, which predicts the hidden state and the cell state, which are stored in the states_value variable.

Next, we define a variable target_seq, which is a 1 x 1 matrix of all zeros. The target_seq variable contains the first word to the decoder model, which is <sos>.

After that, the eos variable is initialized, which stores the integer value for the <eos> token. In the next line, the output_sentence list is defined, which will contain the predicted translation.

Next, we execute a for loop. The number of execution cycles for the for loop is equal to the length of the longest sentence in the output. Inside the loop, in the first iteration, the decoder_model predicts the output and the hidden and cell states, using the hidden and cell state of the encoder and the input token, i.e. <sos>. The index of the predicted word is stored in the idx variable. If the value of the predicted index is equal to the <eos> token, the loop terminates. Otherwise, if the predicted index is greater than zero, the corresponding word is retrieved from the idx2word_target dictionary and stored in the word variable, which is then appended to the output_sentence list. The states_value variable is updated with the new hidden and cell state of the decoder, and the index of the predicted word is stored in the target_seq variable. In the next loop cycle, the updated hidden and cell states, along with the index of the previously predicted word, are used to make new predictions. The loop continues until the maximum output sequence length is reached or the <eos> token is encountered.

Finally, the words in the output_sentence list are concatenated using a space and the resulting string is returned to the calling function.

Testing the Model

To test the code, we will randomly choose a sentence from the input_sentences list, retrieve the corresponding padded sequence for the sentence, and will pass it to the translate_sentence() method. The method will return the translated sentence as shown below.

Here is the script to test the functionality of the model:

i = np.random.choice(len(input_sentences))
input_seq = encoder_input_sequences[i:i+1]
translation = translate_sentence(input_seq)
print('-')
print('Input:', input_sentences[i])
print('Response:', translation)

Here is the output:

-
Input: You're not fired.
Response: vous n'êtes pas viré.

Brilliant, isn't it? Our model has successfully translated the sentence You're not fired into French. You can verify that on Google Translate too. Let's try another one.

Note: Since the sentences are selected randomly, you will most probably have a different English sentence translated to French.

Execute the above script once more to see some other English sentence translated into the French language. I got the following results:

-
Input: I'm not a lawyer.
Response: je ne suis pas avocat.

The model has successfully translated another English sentence into French.

Conclusion and Perspective

Neural machine translation is a fairly advanced application of natural language processing and involves a very complex architecture.

This article explains how to perform neural machine translation via the seq2seq architecture, which is in turn based on the encoder-decoder model. The encoder is an LSTM that encodes input sentences, while the decoder decodes the inputs and generates the corresponding outputs. The technique explained in this article can be used to create any machine translation model, as long as the dataset is in a format similar to the one used in this article. You can also use the seq2seq architecture to develop chatbots.

The seq2seq architecture is pretty successful when it comes to mapping inputs to outputs. However, there is one limitation. The vanilla seq2seq architecture explained in this article is not capable of capturing context. It simply learns to map standalone inputs to standalone outputs. Real conversations are based on context, and the dialogue between two or more users depends on whatever was said in the past. Therefore, a simple encoder-decoder-based seq2seq model should not be used if you want to create a fairly advanced chatbot.

Categories: FLOSS Project Planets

qed42.com: QED42 at DrupalCon Amsterdam 2019

Planet Drupal - Wed, 2019-10-09 07:28

Autumn is here! One of the things we’re looking forward to this month is DrupalCon Amsterdam from the 28th - 31st Oct 2019. Autumn is the most magical time to visit Amsterdam! An opportunity to mingle with the Drupal community in Amsterdam, what else could a Drupaler wish for? 

QED42’s support for the Drupal community around the world is unwavering, be it sponsoring DrupalCamps and DrupalCons, organizing Drupal meetups, or contributing to Drupal.org.

And we hope you will join us at DrupalCon Amsterdam, this month! 

| Meet us!

QED42 is proud to be a Silver sponsor this year. If you are a Drupaler, you won't want to miss QED42’s booth. We are known for our exuberant booth vibes, designs, activities, and goodies. It is our tradition to unveil a new Drupal t-shirt design at every DrupalCon. Check out the story behind our Hindi Drupal t-shirt series here - https://www.qed42.com/blog/story-behind-our-hindi-drupal-t-shirt. Our Drupal t-shirt design for #DCA is inspired by the vibrant and spirited culture of Amsterdam. It is accompanied by a couple more goodies that you will absolutely adore!

This year, QED42 will be showcasing a wide range of our capabilities including:

  • Decoupled Drupal
  • Gatsby e-commerce demos
  • Use cases around Drupal, JavaScript, and Design!

Come say Hi to our team at DrupalCon Amsterdam Booth No - 16, we would love to discuss ideas around how Drupal meets the ever-changing needs of the digital world.

| Sessions

Our Drupal experts are presenting at DrupalCon Amsterdam 2019. You can find us at these sessions:

Houdini - New Era of CSS 
  • Date: 28th Oct 2019
  • Time: 15:00 - 15:40
  • Location: G 103
  • Track: Drupal + Frontend
  • Level: Intermediate 
  • Speakers: Vidit Anjaria and Saket Kumar

Here’s a sneak peek of our session - https://www.qed42.com/blog/building-powerful-custom-properties-CSS-houdini

Designing the future of the Drupal Admin UI

| Keynotes

There are some exciting keynotes lined up for you at DrupalCon Amsterdam! 

- Tuesday, October 29 at 9:00 AM | Talk: Driesnote

Speaker: Dries Buytaert - Founder

- Wednesday, October 30 at 1:30 PM | Talk: If I can do it, so can you

Speaker: Sue Black - Professor of Computer Science and Technology Evangelist, UK Government Strategic Advisor, Women’s Equality Party candidate for London Mayor 2020, Professional Speaker, Author

- Tuesday, October 29 at 1:30 PM | Talk: Humanity in tech 

Speaker:  Boris Veldhuijzen Van Zanten - CEO and Co-founder of thenextweb.com

- Monday, October 28 at 1:30 PM | Talk: Drupal core initiative leads keynote

 

| Conclusion

Attending DrupalCon Amsterdam? Don’t forget to flash your badge and spread the word - https://events.drupal.org/amsterdam2019/spread-word. Follow @DrupalConEur for recent updates around the event.

Drop by our Booth 16 and meet the QED42 team! We would love to share our exciting projects and learn more about your experiences and challenges with Drupal.

Categories: FLOSS Project Planets

Enrico Zini: Fixed XSS issue on debtags.debian.org

Planet Debian - Wed, 2019-10-09 04:51

Thanks to Moritz Naumann who found the issues and wrote a very useful report, I fixed a number of Cross Site Scripting vulnerabilities on https://debtags.debian.org.

The core of the issue was code like this in a Django view:

def pkginfo_view(request, name):
    pkg = bmodels.Package.by_name(name)
    if pkg is None:
        return http.HttpResponseNotFound("Package %s was not found" % name)
    # …

The default content-type of HttpResponseNotFound is text/html, and the string passed is the raw HTML with clearly no escaping, so this allows injection of arbitrary HTML/<script> code in the name variable.

I was so used to Django doing proper auto-escaping that I missed this place in which it can't do that.

There are various things that can be improved in that code.

One could introduce escaping (and while one's at it, migrate the old % to format):

from django.utils.html import escape

def pkginfo_view(request, name):
    pkg = bmodels.Package.by_name(name)
    if pkg is None:
        return http.HttpResponseNotFound("Package {} was not found".format(escape(name)))
    # …

Alternatively, set content_type to text/plain:

def pkginfo_view(request, name):
    pkg = bmodels.Package.by_name(name)
    if pkg is None:
        return http.HttpResponseNotFound("Package {} was not found".format(name), content_type="text/plain")
    # …

Even better, raise Http404:

from django.http import Http404

def pkginfo_view(request, name):
    pkg = bmodels.Package.by_name(name)
    if pkg is None:
        raise Http404(f"Package {name} was not found")
    # …

Even better, use standard shortcuts and model functions if possible:

from django.shortcuts import get_object_or_404

def pkginfo_view(request, name):
    pkg = get_object_or_404(bmodels.Package, name=name)
    # …

And finally, though not security related, it's about time to switch to class-based views:

class PkgInfo(TemplateView):
    template_name = "reports/package.html"

    def get_context_data(self, **kw):
        ctx = super().get_context_data(**kw)
        ctx["pkg"] = get_object_or_404(bmodels.Package, name=self.kwargs["name"])
        # …
        return ctx
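For completeness, here is a hedged sketch of a regression test for this class of issue, written with Django's test client. The URL pattern is an assumption for illustration, not the actual debtags.debian.org routing:

from django.test import TestCase

class PkgInfoEscapingTests(TestCase):
    def test_not_found_response_does_not_echo_raw_html(self):
        # Hypothetical URL; adjust to the real pkginfo route.
        payload = "<script>alert(1)</script>"
        response = self.client.get("/pkginfo/%s/" % payload)
        self.assertEqual(response.status_code, 404)
        # The attacker-controlled markup must never appear unescaped in the body.
        self.assertNotIn(b"<script>alert(1)</script>", response.content)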

I proceeded with a review of the other Django sites I maintain, in case I had made the same mistake there too.

Categories: FLOSS Project Planets

OSTraining: How to Manually Update Drupal Core through CPanel

Planet Drupal - Wed, 2019-10-09 00:00

Drupal is a wonderful Content Management System with tons of features that solve many problems for editors and content managers. A developer can take different approaches to achieve the same result, and the Drupal update process is no exception.

It is possible to update Drupal either with Composer, Drush, or manually within the cPanel interface. The first two methods require at least mid-level experience with the command line and secure shell access to your public host. The third method is more visual and is suitable for developers who are just getting started with Drupal.

Keep reading if you want to learn how to update your Drupal site with this method.

Categories: FLOSS Project Planets

IslandT: Find the position of the only odd number within a list with Python

Planet Python - Tue, 2019-10-08 21:50

In this example, we will write a python function that will return the position of the only odd number within the number list. If there is no odd number within that list then the function will return -1 instead.

def odd_one(arr):
    for number in arr:
        if number % 2 != 0:
            return arr.index(number)
    return -1

The method above will loop through the number list to determine the position of the only odd number within that number list. If no odd number has been found then the method above will return -1!
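For example, calling the function on a couple of small lists:

print(odd_one([2, 4, 6, 7, 10]))  # 3, because 7 is the only odd number and sits at index 3
print(odd_one([2, 4, 6]))         # -1, no odd number present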

That is a simple solution; if you have a better idea, leave a comment below.

Announcement:

After a period of rest, I have begun to write again. I am planning to turn this website into a full tech website covering programming, electronic gadgets, and gaming. Feel free to subscribe to any RSS topic that interests you to read the latest articles on that particular topic.

Categories: FLOSS Project Planets

Improving Plasma’s Rendering (Part 1/2)

Planet KDE - Tue, 2019-10-08 21:05
Background:

Many parts of Plasma are powered by QtQuick, an easy to use API to render shapes/text/buttons etc.
QtQuick contains a rendering engine powered by OpenGL making full use of the graphics card keeping our drawing super fast, super lightweight and in general amazing…when things work.

Handling Nvidia context loss events

When the proprietary NVIDIA driver comes out of suspend, or returns from a different virtual terminal, the textures it had stored become corrupted, and we come back to an image like the one below. Worse, because stray video memory can end up on screen, it can even leak data on the lock screen.

When this occurs it might look something like this:

With various text or icons getting distorted seemingly at random.

Fortunately, NVIDIA does have an API to emit a signal when this happens, allowing us to at least do something about it.

This handling has to be done inside every single OpenGL-powered application, which, with the increasing popularity of QtQuick, is a lot of places.

The new state

After over a year of gradual changes all of Plasma now handles this event and recovers gracefully. Some of these changes are in Qt5.13, but some more have only just landed in Qt 5.14 literally this evening.

Some notes for other devs

A QtQuick application

Due to some complications, handling this feature has to be opt-in. Triggering the notification leads to some behavioural changes which, if you’re not prepared for them, will freeze or crash your application.

To opt-in one must explicitly set the surfaceFormat used for the backing store to have:
QSurfaceFormat::setOption(QSurfaceFormat::ResetNotification)

Within KDE code this can be done automagically with the helper function
KQuickAddons::QtQuickSettings::init() early in your main method.
This sets up the default surface format as well as several other QtQuick configurable settings.

Everything else is now handled within Qt: if we detect an event, the scene graph is invalidated, cleaned and recreated.

However, if you create custom QQuickItem’s using the QSG classes, you must be really really sure to handle resources correctly.

As a general rule all GL calls should be tied to the lifespan of QSGNodes and not of the QQuickItem itself. Otherwise items should connect to the window’s sceneGraphInvalidated signals.

Using QtOpenGL outside QtQuick

To detect a context loss, check for myQOpenGLContext->isValid() if a makeCurrent fails.

In the event of a context loss one must discard all known textures, shaders, programs, vertex buffers, everything known to GL and recreate the context.

One especially quirky aspect of this flag is that in the event of a context loss glGetError will not clear itself, but continue to report a context loss error. Code trying to reset all gl errors will get stuck in a loop. This was the biggest battle in the seemingly never-ending series of patches upstream and why it has to be opt-in.

In the case of a shared context a reset notification is sent to all contexts who should recreate independently.

You can read more about the underlying GL spec.

Categories: FLOSS Project Planets

Chris Lamb: Tour d'Orwell: Southwold

Planet Debian - Tue, 2019-10-08 20:29

I recently read that during 1929 George Orwell returned to his family home in the Suffolk town of Southwold, and when I further learned that he had acquired a motorbike during this time to explore the surrounding villages, I could not resist visiting on the same mode of transport myself.

Orwell would end up writing his first novel here ("Burmese Days"), followed by his first passable one ("A Clergyman's Daughter"), but unfortunately the local bookshop only had the former in stock. He moved back to London in 1934 to work in a bookshop in Hampstead, now a «Le Pain Quotidien».

If you are thinking of visiting, Southwold has some lovely quaint beach huts and a brewery. The officially signposted A1120 "Scenic Route" I took on the way out was neither as picturesque nor as fun to ride as the A1066.

Categories: FLOSS Project Planets

Ned Batchelder: Pytest-cov support for who-tests-what

Planet Python - Tue, 2019-10-08 19:14

I’ve added a new option to the pytest-cov coverage plugin for pytest: --cov-context=test will set the dynamic context based on pytest test phases. Each test has a setup, run, and teardown phase. This gives you the best test information in the coverage database:

  • The full test id is used in the context. You have the test file name, and the test class name if you are using class-based tests.
  • Parameterized tests start a new context for each new set of parameter values.
  • Execution is a little faster because coverage.py doesn’t have to poll for test starts.

For example, here is a repo of simple pytest tests in a number of forms: pytest-gallery. I can run the tests with test contexts being recorded:

$ pytest -v --cov=. --cov-context=test
======================== test session starts =========================
platform darwin -- Python 3.6.9, pytest-5.2.1, py-1.8.0, pluggy-0.12.0 -- /usr/local/virtualenvs/pytest-cov/bin/python3.6
cachedir: .pytest_cache
rootdir: /Users/ned/lab/pytest-gallery
plugins: cov-2.8.1
collected 25 items

test_fixtures.py::test_fixture PASSED                          [  4%]
test_fixtures.py::test_two_fixtures PASSED                     [  8%]
test_fixtures.py::test_with_expensive_data PASSED              [ 12%]
test_fixtures.py::test_with_expensive_data2 PASSED             [ 16%]
test_fixtures.py::test_parametrized_fixture[1] PASSED          [ 20%]
test_fixtures.py::test_parametrized_fixture[2] PASSED          [ 24%]
test_fixtures.py::test_parametrized_fixture[3] PASSED          [ 28%]
test_function.py::test_function1 PASSED                        [ 32%]
test_function.py::test_function2 PASSED                        [ 36%]
test_parametrize.py::test_parametrized[1-101] PASSED           [ 40%]
test_parametrize.py::test_parametrized[2-202] PASSED           [ 44%]
test_parametrize.py::test_parametrized_with_id[one] PASSED     [ 48%]
test_parametrize.py::test_parametrized_with_id[two] PASSED     [ 52%]
test_parametrize.py::test_parametrized_twice[3-1] PASSED       [ 56%]
test_parametrize.py::test_parametrized_twice[3-2] PASSED       [ 60%]
test_parametrize.py::test_parametrized_twice[4-1] PASSED       [ 64%]
test_parametrize.py::test_parametrized_twice[4-2] PASSED       [ 68%]
test_skip.py::test_always_run PASSED                           [ 72%]
test_skip.py::test_never_run SKIPPED                           [ 76%]
test_skip.py::test_always_skip SKIPPED                         [ 80%]
test_skip.py::test_always_skip_string SKIPPED                  [ 84%]
test_unittest.py::MyTestCase::test_method1 PASSED              [ 88%]
test_unittest.py::MyTestCase::test_method2 PASSED              [ 92%]
tests.json::hello PASSED                                       [ 96%]
tests.json::goodbye PASSED                                     [100%]

---------- coverage: platform darwin, python 3.6.9-final-0 -----------
Name                  Stmts   Miss  Cover
-----------------------------------------
conftest.py              16      0   100%
test_fixtures.py         19      0   100%
test_function.py          4      0   100%
test_parametrize.py       8      0   100%
test_skip.py             12      3    75%
test_unittest.py         17      0   100%
-----------------------------------------
TOTAL                    76      3    96%


=================== 22 passed, 3 skipped in 0.18s ====================

Then I can see the contexts that were collected:

$ sqlite3 -csv .coverage "select context from context"
context
""
test_fixtures.py::test_fixture|setup
test_fixtures.py::test_fixture|run
test_fixtures.py::test_two_fixtures|setup
test_fixtures.py::test_two_fixtures|run
test_fixtures.py::test_with_expensive_data|setup
test_fixtures.py::test_with_expensive_data|run
test_fixtures.py::test_with_expensive_data2|run
test_fixtures.py::test_parametrized_fixture[1]|setup
test_fixtures.py::test_parametrized_fixture[1]|run
test_fixtures.py::test_parametrized_fixture[2]|setup
test_fixtures.py::test_parametrized_fixture[2]|run
test_fixtures.py::test_parametrized_fixture[3]|setup
test_fixtures.py::test_parametrized_fixture[3]|run
test_function.py::test_function1|run
test_function.py::test_function2|run
test_parametrize.py::test_parametrized[1-101]|run
test_parametrize.py::test_parametrized[2-202]|run
test_parametrize.py::test_parametrized_with_id[one]|run
test_parametrize.py::test_parametrized_with_id[two]|run
test_parametrize.py::test_parametrized_twice[3-1]|run
test_parametrize.py::test_parametrized_twice[3-2]|run
test_parametrize.py::test_parametrized_twice[4-1]|run
test_parametrize.py::test_parametrized_twice[4-2]|run
test_skip.py::test_always_run|run
test_skip.py::test_always_skip|teardown
test_unittest.py::MyTestCase::test_method1|setup
test_unittest.py::MyTestCase::test_method1|run
test_unittest.py::MyTestCase::test_method2|run
test_unittest.py::MyTestCase::test_method2|teardown
tests.json::hello|run
tests.json::goodbye|run

Version 2.8.0 of pytest-cov (and later) has the new feature. Give it a try. BTW, I also snuck another alpha of coverage.py (5.0a8) in at the same time, to get a needed API right.

Still missing from all this is a really useful way to report on the data. Get in touch if you have needs or ideas.
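In the meantime, one way to poke at the data programmatically is through the CoverageData API; this is a rough sketch that assumes the contexts_by_lineno() method available in the coverage.py 5.0 alphas:

import coverage

cov = coverage.Coverage()
cov.load()                      # reads the .coverage SQLite file in the current directory
data = cov.get_data()

# For each measured file, show which test contexts executed each line.
for filename in data.measured_files():
    print(filename)
    for lineno, contexts in sorted(data.contexts_by_lineno(filename).items()):
        print("  %4d: %s" % (lineno, ", ".join(c for c in contexts if c)))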

Categories: FLOSS Project Planets

PyBites: PyCon ES 2019 Alicante Highlights

Planet Python - Tue, 2019-10-08 18:55

Last weekend it was Pycon time again, my 6th one so far. This time closer to home: Alicante.

I had an awesome time, meeting a lot of nice people, watching interesting talks and getting inspired overall to keep learning more Python.

🤩🤩🤩 https://t.co/Xch3HXkPbr

— PyCon España (@PyConES) October 04, 2019

In this post I share 10 highlights, but keep in mind this is a selection only, there are quite a few more talks I want to check out once they appear on Youtube ...

1. Kicking off with @captainsafia's keynote

I did not attend the Friday workshops so Saturday morning I got straight to Safia's keynote which was very inspiring:

"Spread ideas that last" @captainsafia #pycones19 https://t.co/NOwPeuZvP6

— PyCon España (@PyConES) October 05, 2019

It made me realize documentation is actually quite important:

RT @Cecil_gabaxi: "Software are temporary, ideas are forever" & the importance of documenting code to help spread these ideas #pycones19 #i…

— PyCon España (@PyConES) October 05, 2019

She also linked to an interesting paper: Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure I want to check out, a theme that also came back in Sunday's keynote (see towards the end):

Our modern society runs on software. But the tools we use to build software are buckling under increased demand. Nearly all software today relies on free, public code, written and maintained by communities of developers and other talent.

2. Meeting great people

Funny enough I met Antonio Melé, the author of Django 2 by Example which I am currently going through (great book!):

Although I attended quite some talks, the best part of Pycon is always the people / connections:

RT @nubeblog: @txetxuvel @PyConES @bcnswcraft No hay fines de semana suficientes para todos los eventos tecnológicos que se hacen en España…

— PyCon España (@PyConES) October 06, 2019

And remember: all talks are recorded and authors usually upload their slides to github or what not ...

RT @gnuites: En https://t.co/HLoZKnytSd encontraréis toda la información relacionada con las charlas de esta #PyConES19 @PyConES @python_es

— PyCon España (@PyConES) October 08, 2019

3. Django

There were quite a few Django talks:

Genial la charla de @javierabadia hablando sobre cómo conectar una BBDD diferente a las dadas con Django y explican… https://t.co/llVFs6crU0

— Rubén Valseca (@rubnvp) October 06, 2019

4. Testing and exceptions

There were also quite a few talks about testing:

Mario Corchero gave a great talk about exceptions. Unfortunately I sat way back in the room, so I need to look at the slides again. It seems he gave the same talk at PyCon US 2019, so you can watch it here (and in English):

5. Katas!!

Awesome talk about katas by Irene Pérez Encinar. Her talk was funny, practical and of course right up our alley given our platform and learn-by-challenge approach!

@irenuchi es una auténtica Sra. Miyagi de las katas. Me ha encantado la charla y su manera de explicar las cosas. Y… https://t.co/cn4enM9vej

— Eun Young Cho-조은영 (@eunEsPlata) October 06, 2019

Talking about challenges, we released blog code challenge #64 - PyCon ES 2019 Marvel Challenge for the occasion. Submit a PR before the end of this week (Friday 11th of Oct. 2019, 23.59 AoE) and you can win one of our prizes ...

6. PyCamp

Interesting initiative by Manuel Kaufmann to get together a bunch of Pythonistas in Barcelona in spring 2020 to work on a couple of projects, hackathon style. I will definitely keep an eye out for this event to see if we can contribute / collaborate ...

¿Todavía no sabés lo que es el PyCamp? Tomate 3 minutos para ver esta lightning talk en @PyConES #PyConEs19 en dónd… https://t.co/Asy9CF8syh

— PyCamp España (@PyCampES) October 08, 2019

7. Coffee lovers

Katerin Perdom travelled all the way from Colombia to share her interesting graduation project about building an artificial nose to spot defects in the quality of coffee:

Desde 🇨🇴 a @PyConES #PyConEs19 #python #womenintech #WomenWhoCode https://t.co/gJb05037Cq

— Katerin Perdom (@Katerin_Perdom) October 05, 2019

Looking forward to seeing some of the code behind this project. Also another use case for the Raspberry Pi ... a lot of IoT right now! There was another talk about How to warm your house using Python, cool stuff!

8. Data artist

Amazing talk and interesting field:

A data artist (also known as “unicorn”) lives in the intersection of data analysis, decision-making, visualization and wait for it... ART. They are able, not only to use a number of techniques and tools to transform complex data into delightful and memorable visualizations, but to build their own tools and workflows to create visualizations that go beyond the state of the art.

What is a Data Artist explained by @alrocar in #PyConEs19 https://t.co/xHCVWcI9wq

— Eduardo Fernández (@efernandezleon) October 05, 2019

For example look at this beautiful graph expressing global warming (#warmingstripes):

This is what you call a "data artist" https://t.co/wsQT9dMyWY

— alrocar (@alrocar) June 19, 2019

Or check out this NBA graph of 3-pointers scored (I cannot remember the player, but the project is here):

Flipando con @alrocar #DataArtist #PyConEs19 https://t.co/VmgYBBjxbR

— Elena Torró (@BytesAndHumans) October 05, 2019

9. Python is everywhere!

Apart from IoT and data science, one fascinating field is animation (kids' movies). Ilion Animation Studios (one of the sponsors) uses a lot of Python behind the scenes. Can't wait to watch their talk Py2hollywood - usando Python en una producción de películas de animación (using Python in an animated film production) when it becomes available.

Another cool use case for Python is chatbots! I enjoyed Àngel Fernández's talk about ChatOps, which of course hit home given our (Slack) karmabot. There was another talk about creating chatbots using Rasa.

Chatops 101 con opsdroid por @anxodio en #PyConEs19 https://t.co/t2SPog5KOV

— Argentina en Python (@argenpython) October 05, 2019

Opsdroid is an open source ChatOps bot framework with the motto: Automate boring things! - opsdroid project!

Or what about astronomy?! If that's your thing, check out: Making a galaxy with Python.

10. Experience of a Python core dev

Awesome keynote by Pablo Galindo, really inspiring and humbling, knowing it's the hard work of core devs and many contributors that keeps Python in such great shape today!

Absolutely outstanding keynote by @pyblogsal at #PyConES19. It touches me the passion and dedication he puts everyd… https://t.co/2t82BNPb7b

— Mario Corchero (@mariocj89) October 06, 2019

Conclusion

If you can attend a Pycon, be it close or far from home, do it!

You get so much out of just a few days:

  • Ideas and inspiration (stay hungry, stay foolish).

  • See what's moving in the industry.

  • Python / tools / field / tech knowledge.

  • Meet awesome people and opportunities to collaborate.

  • And lastly, be humbled: there is a lot of volunteering, passion and hard work behind it, so give back where you can.

Join our Slack and share your own PyCon experience in our #pycon channel, hope to greet you there ...

Keep Calm and Code in Python!

-- Bob

Categories: FLOSS Project Planets

Promet Source: Supreme Court Marks New Era for Web Accessibility

Planet Drupal - Tue, 2019-10-08 18:10
This week, the U.S. Supreme Court declined to review the Ninth Circuit Court’s decision in Robles v. Domino’s Pizza, LLC,* signaling a long-anticipated answer to an essential question: Does Title III of the 1990 Americans with Disabilities Act, which was written before the current digital landscape was ever envisioned, apply to websites and apps?
Categories: FLOSS Project Planets

Python Engineering at Microsoft: Python in Visual Studio Code – October 2019 Release

Planet Python - Tue, 2019-10-08 17:58

We are pleased to announce that the October 2019 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code. If you already have the Python extension installed, you can also get the latest update by restarting Visual Studio Code. You can learn more about  Python support in Visual Studio Code in the documentation.  

In this release we addressed 97 issues, including native editing of Jupyter Notebooks, a button to run a Python file in the terminal, and linting and import improvements with the Python Language Server. The full list of enhancements is in our changelog.

Native editing of Jupyter Notebooks 

We’re excited to announce the first release of native editing of Jupyter notebooks inside VS Code! The native Jupyter experience brings a new way for data scientists and notebook developers alike to directly edit .ipynb files and get the interactivity of Jupyter notebooks with all of the power of VS Code. You can check out the Native Support for Editing Jupyter Notebooks in VS Code blog post to learn more about this feature and how to get started.

Run Python File in Terminal button 

This release includes a “play” button to run the Run Python File in Terminal command. Now it only takes one click to run Python files with the Python extension!  

The new button is located on the top-right side of the editor, matching the behavior of the Code Runner extension: 

If you’re into key bindings, you can also customize your own keyboard shortcut to run Python files in the terminal, by running the Preferences: Open Keyboard Shortcuts (JSON) command in the command palette (View > Command Palette…) and entering a key binding for the python.execInTerminal command as you prefer. For example, you could have the following definition to run Python files in the terminal with a custom shortcut: 
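Something along these lines would do it; the key combination shown here is just an example, so pick whatever suits you (the entry goes in keybindings.json):

    {
        "key": "ctrl+alt+r",
        "command": "python.execInTerminal"
    }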

If the Code Runner extension is enabled, the Python extension doesn’t display this button in order to avoid possible confusion. 

Linting and import improvements with the Python Language Server 

This release also includes three new linting rules with the Python Language Server, as well as significant improvements to autocompletion for packages such as PyTorch and pandas.

Additionally, there have been large improvements to import resolution. Historically, the Language Server treated only the workspace root as the sys.path entry for user module imports, which led to false-positive unresolved-import warnings when importing modules from a src directory. With this release, if the project contains such a src directory, the Language Server automatically detects it and adds it to its list of search paths. You can refer to the documentation to learn more about configuring search paths for the Language Server.
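As a hypothetical illustration (the names below are made up), in a layout like this a statement such as "from mypackage import util" in the tests used to trigger an unresolved-import warning; the Language Server now picks up the src folder on its own:

    project/
        src/
            mypackage/
                __init__.py
                util.py
        tests/
            test_util.py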

Other Changes and Enhancements 

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. Some notable changes include: 

  • Fix for test discovery issues with pytest 5.1+. (#6990)
  • Fixes for detecting the shell. (#6928)
  • Opt insiders users into the Beta version of the Language Server by default. (#7108)
  • Replaced all occurrences of pep8 with pycodestyle (thanks Marsfan). (#410)

We are continuing to A/B test new features. If you see something different that was not announced by the team, you may be part of the experiment! To see if you are part of an experiment, you can check the first lines in the Python extension output channel. If you wish to opt out of A/B testing, you can open the user settings.json file (View > Command Palette… and run Preferences: Open Settings (JSON)) and set the “python.experiments.enabled” setting to false.
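In settings.json that amounts to a single entry (the setting name is the one quoted above):

    {
        "python.experiments.enabled": false
    }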

Be sure to download the Python extension for Visual Studio Code now to try out the above improvements. If you run into any problems, please file an issue on the Python VS Code GitHub page. 


The post Python in Visual Studio Code – October 2019 Release appeared first on Python.

Categories: FLOSS Project Planets

Python Engineering at Microsoft: Announcing Support for Native Editing of Jupyter Notebooks in VS Code

Planet Python - Tue, 2019-10-08 17:51

With today’s October release of the Python extension, we’re excited to announce the support of native editing of Jupyter notebooks inside Visual Studio Code! You can now directly edit .ipynb files and get the interactivity of Jupyter notebooks with all of the power of VS Code. You can manage source control, open multiple files, and leverage productivity features like IntelliSense, Git integration, and multi-file management, offering a brand-new way for data scientists and developers to experiment and work with data efficiently. You can try out this experience today by downloading the latest version of the Python extension and creating/opening a Jupyter Notebook inside VS Code.

Since the initial release of our data science experience in VS Code, one of the top features that users have requested has been a more notebook-like layout to edit their Jupyter notebooks inside VS Code. In the rest of this post we’ll take a look at the new capabilities this offers.

Getting Started

Here’s how to get started with Jupyter in VS Code.

  • If you don’t already have an existing Jupyter Notebook file, open the VS Code Command Palette with the shortcut CTRL + SHIFT + P (Windows) or Command + SHIFT + P (macOS), and run the “Python: Create Blank New Jupyter Notebook” command.
  • If you already have a Jupyter Notebook file, it’s as simple as just opening that file in VS Code. It will automatically open with the new native Jupyter editor.

Once you have a Jupyter Notebook open, you can add new cells, write code in cells, run cells, and perform other notebook actions.

AI-Assisted Autocompletion

As you write code, IntelliSense will give you intelligent code complete suggestions right inside your code cells. You can further supercharge your editor experience by installing our IntelliCode extension to get AI-powered IntelliSense with smarter auto-complete suggestions based on your current code context.

Variable Explorer

Another benefit of using VS Code is that you can take advantage of the variable explorer and plot viewer by clicking the “Variables” button in the notebook toolbar. The variable explorer will help you keep track of the current state of your notebook variables at a glance, in real-time.

Now you can explore your datasets, filter your data, and even export plots! Gone are the days of having to type df.head() just to view your data.
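If you want something to look at right away, a throwaway cell like this (assuming pandas is installed) gives the variable explorer a DataFrame to show:

    import pandas as pd

    df = pd.DataFrame({"city": ["Alicante", "Madrid"], "temp_c": [27, 21]})
    df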

Connecting To Remote Jupyter Servers

When a Jupyter notebook file is created or opened, VS Code automatically creates a Jupyter server for you locally by default. If you want to use a remote Jupyter server, it’s as simple as using the “Specify Jupyter server URI” command via the VS Code command palette, and entering in the server URI.

Exporting as Python Code

When you’re ready to turn experimentation into production-ready Python code, simply press the “Convert and Save as Python File” button in the top toolbar and let the Python extension do all the work for you. You can then view that Python code in our existing Python Interactive window and keep on working with the awesome features of the Python extension to further make your code production-ready, such as the integrated debugger, refactoring, Visual Studio Live Share, and Git source control.

Debugging

VS Code supports debugging Jupyter Notebooks through using the “Exporting as Python Code” functionality outlined in the previous section. Once you have your code in the Python Interactive window, you can use VS Code’s integrated debugger to debug your code. We are working on bringing cell debugging into the Jupyter editor in a future release so stay tuned!

Try it out today!

You can check out the documentation for the full list of features available in the first release of the native Jupyter experience and learn to get started with Jupyter notebooks in VS Code. Also, if you have any suggestions or run across any issues, please file an issue in the Python extension GitHub page.

We’re excited for everyone to try out this new experience and have the Python extension in VS Code empower your notebook development!

The post Announcing Support for Native Editing of Jupyter Notebooks in VS Code appeared first on Python.

Categories: FLOSS Project Planets

health @ Savannah: GNU Health HMIS 3.6 Release Candidate 1 is out !

GNU Planet! - Tue, 2019-10-08 17:44

Dear community

We are pleased to announce the initial release candidate for the upcoming GNU Health HMIS server!

Please download and test the following files:

  • gnuhealth-3.6RC1.tar.gz: Server with the 45 packages
  • gnuhealth-client-3.6RC1.tar.gz: The GH HMIS GTK client
  • gnuhealth-client-plugins-3.6RC1.tar.gz: The Federation Resource Locator, the GNU Health Camera, and the crypto plugin

Remember that all the components of the 3.6 series run in Python 3.

You can download the RC tarballs from the development dir:

https://www.gnuhealth.org/downloads/development/unstable/

Time to test, report bugs and translate :)

All the best
Luis

Categories: FLOSS Project Planets
