Planet Python

Subscribe to Planet Python feed
Planet Python - http://planetpython.org/
Updated: 20 hours 45 min ago

ListenData: 15 Free Open Source ChatGPT Alternatives (with Code)

Sun, 2024-03-17 20:50

In this article we will explain how Open Source ChatGPT alternatives work and how you can use them to build your own ChatGPT clone for free. By the end of this article you will have a good understanding of these models and will be able to compare and use them.

Benefits of Open Source ChatGPT Alternatives

There are various benefits of using open source large language models which are alternatives to ChatGPT. Some of them are listed below.

  1. Data Privacy: Many companies want to have control over data. It is important for them as they don't want any third-party to have access to their data.
  2. Customization: It allows developers to train large language models with their own data and some filtering on some topics if they want to apply
  3. Affordability: Open source GPT models let you to train sophisticated large language models without worrying about expensive hardware.
  4. Democratizing AI: It opens room for further research which can be used for solving real-world problems.
Table of Contents Llama Introduction : Llama

Llama stands for Large Language Model Meta AI. It includes a range of model sizes from 7 billion to 65 billion parameters. Meta AI researchers focused on scaling the model's performance by increasing the volume of training data, rather than the number of parameters. They claimed the 13 billion parameter model outperformed 175 billion parameters of GPT-3 model. It uses the transformer architecture and was trained on 1.4 trillion tokens extracted by web scraping Wikipedia, GitHub, Stack Exchange, Books from Project Gutenberg, scientific papers on ArXiv.

Python Code : Llama # Install Package pip install llama-cpp-python from llama_cpp import Llama llm = Llama(model_path="./models/7B/ggml-model.bin") output = llm("Q: Name the planets in the solar system? A: ", max_tokens=128, stop=["Q:", "\n"], echo=True) print(output) In the model path, you need to have weights for Llama in GGML format and then store them into the models folder. You can search it on Hugging Face website. See one of them here Llama 2 What's New in Llama 2

Here are some of the key differences between Llama 2 and Llama:

  • Training data: Llama 2 is trained on 40% more tokens than Llama, a total of 2 trillion tokens. This gives it a larger knowledge base and allows it to generate more accurate responses.
  • Model size: Llama 2 is available in three sizes: 7 billion parameters, 13 billion parameters, and 70 billion parameters. Whereas, the maximum size of Llama is 65 billion parameters.
  • Chat optimization: Llama 2-Chat is a specialized version of Llama 2 that is optimized for engaging in two-way conversations. It has been trained on a dataset of human conversations, which allows it to generate more natural and engaging responses.
  • Safety and bias mitigation: Llama 2 has been trained with a focus on safety and bias mitigation. This means that it is less likely to generate toxic or harmful content.
  • Open source: Llama 2 is open source, which means that anyone can use it for research or commercial purposes. Whereas, Llama can't be used for commercial purposes.
Python Code : Llama 2 Llam2: 7 Billion Parameters

To run Llama2 7B model, refer the code below. The following code uses a 4-bit quantization technique that reduces the size of the LLM, which can make it easier to deploy and use on sytems with limited memory.

  Colab: Llama2 7B Model %cd /content !apt-get -y install -qq aria2 !git clone -b v1.3 https://github.com/camenduru/text-generation-webui %cd /content/text-generation-webui !pip install -r requirements.txt !pip install -U gradio==3.28.3 !mkdir /content/text-generation-webui/repositories %cd /content/text-generation-webui/repositories !git clone -b v1.2 https://github.com/camenduru/GPTQ-for-LLaMa.git %cd GPTQ-for-LLaMa !python setup_cuda.py install !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/config.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o config.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/generation_config.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o generation_config.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/special_tokens_map.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o special_tokens_map.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/resolve/main/tokenizer.model -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o tokenizer.model !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/raw/main/tokenizer_config.json -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o tokenizer_config.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ/resolve/main/gptq_model-4bit-128g.safetensors -d /content/text-generation-webui/models/Llama-2-7b-Chat-GPTQ -o gptq_model-4bit-128g.safetensors %cd /content/text-generation-webui !python server.py --share --chat --wbits 4 --groupsize 128 --model_type llama Llam2: 13 Billion Parameters

To run Llama2 13B model, refer the code below.

  Colab: Llama2 13B Model %cd /content !apt-get -y install -qq aria2 !git clone -b v1.8 https://github.com/camenduru/text-generation-webui %cd /content/text-generation-webui !pip install -r requirements.txt !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/model-00001-of-00003.safetensors -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model-00001-of-00003.safetensors !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/model-00002-of-00003.safetensors -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model-00002-of-00003.safetensors !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/model-00003-of-00003.safetensors -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model-00003-of-00003.safetensors !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/model.safetensors.index.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o model.safetensors.index.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/special_tokens_map.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o special_tokens_map.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/resolve/main/tokenizer.model -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o tokenizer.model !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/tokenizer_config.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o tokenizer_config.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/config.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o config.json !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/4bit/Llama-2-13b-chat-hf/raw/main/generation_config.json -d /content/text-generation-webui/models/Llama-2-13b-chat-hf -o generation_config.json %cd /content/text-generation-webui !python server.py --share --chat --load-in-8bit --model /content/text-generation-webui/models/Llama-2-13b-chat-hf Alpaca Introduction : Alpaca

A team of researchers from Stanford University developed an open-source language model called Alpaca. It is based on Meta's large-scale language model Llama. The team used OpenAI's GPT API (text-davinci-003) to fine tune the Llama 7 billion (7B) parameters sized model. The goal of the team is to make AI available for everyone for free so that academicians can do further research without worrying about expensive hardwares to execute these memory-intensive algorithms. Although these open source models are not available for commercial use, small businesses can still utilize it for building their own chatbots.

How does Alpaca work

The Stanford team began their research with the smallest language model among Llama models, which was the Llama 7B model, and pre-trained it with 1 trillion tokens. They started with the 175 human-written instruction-output pairs from the self-instruct seed set. They then used OpenAI API to ask ChatGPT to generate more instructions using the seed set. It is to obtain roughly 52,000 sample conversations, which the team used to further fine-tune the Llama models using Hugging Face's training framework.

To read this article in full, please click hereThis post appeared first on ListenData
Categories: FLOSS Project Planets

Brett Cannon: State of WASI support for CPython: March 2024

Sun, 2024-03-17 18:43

The biggest update since June 2023 is WASI is now a tier 2 platform for CPython! This means that the main branch of CPython should never be broken more than 24 hours for WASI and that a release will be blocked if WASI support is broken. This only applies to Python 3.13 and later, although I have been trying to keep Python 3.11 and 3.12 working with WASI as well.

To help make this support easier, the devguide has build instructions for WASI. There is also now a WASI step in CI to help make things easier for core developers.

Starting in wasmtime 14, a new command line interface was introduced. All the relevant bits of code that call wasmtime have been updated to use the new CLI in Python 3.11, 3.12, and 3.13/main.

Lastly, 3.13/main and 3.12 now support WASI SDK 21 – which is the official name of the project – and 3.11 is one bug fix away in the test suite from also having support.

At this point I think CPython has caught up to what&aposs available in WASI 0.2 and wasi-libc via WASI SDK. The open issues are mostly feature requests or checking if assumptions related to what&aposs supported still hold.

I&aposm on parental leave at this point, so future WASI work from me is on hold until I return to work in June. Another side effect of me becoming a parent soon is I stepped down as the sponsor of Emscripten support in CPython. That means CPython 3.13 does not officially support Emscripten and probably starting in 3.14, I will be removing any code that complicates supporting WASI. The Pyodide project already knows about this and they don&apost expect it to be a major hindrance for them since they are already used to patching CPython source code.

Categories: FLOSS Project Planets

Robin Wilson: How to speed up appending to PostGIS tables with ogr2ogr

Fri, 2024-03-15 13:28

Summary: If appending to a PostGIS table with GDAL/OGR is taking a long time, try setting the PG_USE_COPY config option to YES (eg. adding --config PG_USE_COPY YES to your command line). This should speed it up, but beware that if there are concurrent writes to your table at the same time as OGR is accessing it then there could be issues with unique identifiers.

As with many of my blog posts, I’m writing this in the hope that it will appear in searches when someone else has the same problem that I ran into recently. In the past I’ve found myself Googling problems that I’ve had before and finding a link to my blog with an explanation in a post that I didn’t even remember writing.

Anyway, the problem I’m talking about today is one I ran into when working with a client a few weeks ago.

I was using the ogr2ogr command-line tool (part of the GDAL software suite) to import data from a local Geopackage file into a PostGIS database (ie. a PostgreSQL database with the PostGIS extension).

I had multiple files of data that I wanted to put into one Postgres table. Specifically, I was using the lovely data collated by Alasdair Rae on the resources page of his website. Even more specifically, I was using some of the Local Authority GIS data to get buildings data for various areas of the UK. I downloaded multiple GeoPackage files (for example, for Southampton City Council, Hampshire County Council and Portsmouth City Council) and wanted to import them all to a buildings table.

I originally tested this with a Postgres server running on my local machine, and ran the following ogr2ogr commands:

ogr2ogr --debug ON \ -f PostgreSQL PG:"host=localhost user=postgres password=blah dbname=test_db" \ buildings1.gpkg -nln buildings ogr2ogr -append -update --debug ON \ -f PostgreSQL PG:"host=localhost user=postgres password=blah dbname=test_db" \ buildings2.gpkg -nln buildings

Here I’m using the -f switch and the arguments following it to tell ogr2ogr to export to PostgreSQL and how to connect to the server, giving it the input file of buildings1.gpkg and using the -nln parameter to tell it what layer name (ie. table name) to use as the output. In the second command I do exactly the same with buildings2.gpkg but also add -append and -update to tell it to append to the existing table rather than overwriting it.

This all worked fine. Great!

A few days later I tried the same thing with a Postgres server running on Azure (using Azure Database for PostgreSQL). The first command ran fine, but the second command seemed to hang.

I was expecting that it would be a bit slower when connecting to a remote database, but I left it running for 10 minutes and it still hadn’t finished. I then tried importing the second file to a new table and it completed quickly – therefore suggesting it was some sort of problem with appending the data.

I worked round this for the time being (using the ogrmerge.py script to merge my buildings1.gpkg and buildings2.gpkg into one file and then importing that file), but resolved to get to the bottom of it when I had time.

Recently, I had that time, and posted on the GDAL mailing list about this. The maintainer of GDAL got back to me to tell me about something I’d missed in the documentation. This was that when importing to a brand new table, the Postgres COPY mode is used, but when appending to an existing table individual INSERT statements are used instead, which can be a lot slower.

Let’s look into this in a bit more detail. The PostgreSQL COPY command is a fast way of importing data into Postgres which involves copying a whole file of data into Postgres in one go, rather than dealing with each row of data individually. This can be significantly faster than iterating through each row of the data and running a separate INSERT statement for each row.

So, ogr2ogr hadn’t hung, it was just running extremely slowly, as inserting my buildings layer involved running an INSERT statement separately for each row, and there were hundreds of thousands of rows. Because the server was hosted remotely on Azure, this involved sending the INSERT command from my computer to the server, waiting for the server to process it, and then the server sending back a result to my computer – a full round-trip for each row of the table.

So, I was told, the simple way to speed this up was to use a configuration setting to turn COPY mode on when appending to tables. This can be done by adding --config PG_USE_COPY YES to the ogr2ogr command. This did the job, and the append commands now completed nice and quickly. If you’re using GDAL/OGR from within a programming language, then have a look at the docs for the GDAL bindings for your language – there should be a way to set GDAL configuration options in your code.

The only final part of this was to understand why the COPY method isn’t used all the time, as it’s so much quicker. Even explained that this is because of potential issues with other connections to the database updating the table at the same time as GDAL is accessing it. It is a fairly safe assumption that if you’re creating a brand new table then no-one else will be accessing it yet, but you can’t assume the same for an existing table. The COPY mode can’t deal with making sure unique identifiers are unique when other connections may be accessing the data. whereas individual INSERT statements can cope with this. Therefore it’s safer to default to INSERT statements when there is any risk of data corruption.

As a nice follow-up for this, and on the maintainer’s advice, I submitted a PR to the GDAL docs, which adds a new section explaining this and giving guidance on setting the config option. I’ve copied that section below:

When data is appended to an existing table (for example, using the -append option in ogr2ogr) the driver will, by default, emit an INSERT statement for each row of data to be added. This may be significantly slower than the COPY-based approach taken when creating a new table, but ensures consistency of unique identifiers if multiple connections are accessing the table simultaneously.

If only one connection is accessing the table when data is appended, the COPY-based approach can be chosen by setting the config option PG_USE_COPY to YES, which may significantly speed up the operation.

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #196: Exploring Duck Typing in Python & Dynamics of Monkey Patching

Fri, 2024-03-15 08:00

What are the advantages of determining the type of an object by how it behaves? What coding circumstances are not a good fit for duck typing? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

PyCharm: Pytest vs. Unittest: Which Is Better?

Fri, 2024-03-15 06:47
Python, being a versatile and widely used programming language, offers several testing frameworks to facilitate the testing process. Two prominent choices are pytest and unittest, both of which come with their own sets of features and advantages. In this article, we’ll be covering the following sections: This will help you determine which testing framework is […]
Categories: FLOSS Project Planets

Paolo Melchiorre: Pybites Podcast #155

Thu, 2024-03-14 19:00

Episode 155 of Pybites podcast with the title “Django, Open Source & Pycon Conferences, Paolo Melchiorre’s Developer Odyssey”

Categories: FLOSS Project Planets

Python Does What?!: Wisdom for the ages

Thu, 2024-03-14 18:09
>>>type(type) is type True
Categories: FLOSS Project Planets

Peter Bengtsson: Leibniz formula for π in Python, JavaScript, and Ruby

Thu, 2024-03-14 16:24
Different ways to calculate the value of π using the Leibniz formula
Categories: FLOSS Project Planets

Mike Driscoll: Python 3.13 Allows Disabling of the GIL + subinterpreters

Thu, 2024-03-14 14:53

Python 3.13 adds the ability to remove the Global Interpreter Lock (GIL) per PEP 703. Just this past week, a PR was merged in that allows the disabling of the GIL using a command line flag or an environment variable in free-threaded builds. Note that Python must be built using the Py_GIL_DISABLED flag for this to work.

What is the GIL?

The Global Interpreter Lock makes threading easier in Python and prevents race conditions. It also makes running multiple threads per CPU core impossible. However, you can use the multiprocessing module to at least give you the ability to better utilize the cores on your CPU.

Removing the GIL

Removing the GIL will likely make Python buggy initially, so they are making the GIL-less version a build flag. That means that the GIL will still be on by default in Python 3.13 while they test the GIL-less version and find all the bugs the change causes. Hopefully, by 3.14 or so, they will know all the bugs and have them fixed to some degree or another.

Subinterpreters Are Coming

A related enhancement are subinterpreters, which Eric Snow has worked on for quite some time. PEP 554 and PEP 734 cover what subinterpreters are and how they are being exposed so you can use them.

Here’s the abstract from PEP 734 to give you an idea of what they are:

This PEP proposes to add a new module, interpreters, to support inspecting, creating, and running code in multiple interpreters in the current process. This includes Interpreter objects that represent the underlying interpreters. The module will also provide a basic Queue class for communication between interpreters. Finally, we will add a new concurrent.futures.InterpreterPoolExecutor based on the interpreters module.

Both of these enhancements have lots of potential for making Python code faster, more efficient, or both. Peter Sobot announced he tried out the GIL removal flag for the pedalboard package and got a 17x speed up for certain processes.

The post Python 3.13 Allows Disabling of the GIL + subinterpreters appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Python Does What?!: sequence unpack a dict

Thu, 2024-03-14 08:45
>>> a, b = {"a": 1, "b": 2} >>> a 'a'
Categories: FLOSS Project Planets

Programiz: Python Program to Convert Bytes to a String

Thu, 2024-03-14 07:22
In this example, you will learn to convert bytes to a string.
Categories: FLOSS Project Planets

Programiz: Python Program to Count the Number of Occurrence of a Character in String

Thu, 2024-03-14 07:21
In this example, you will learn to count the number of occurrences of a character in a string.
Categories: FLOSS Project Planets

Programiz: Python Program to Check If Two Strings are Anagram

Thu, 2024-03-14 07:19
In this example, you will learn to check if two strings are anagram.
Categories: FLOSS Project Planets

Programiz: Python Program to Count the Number of Digits Present In a Number

Thu, 2024-03-14 07:19
In this example, you will learn to count the number of digits present in a number.
Categories: FLOSS Project Planets

Mike Driscoll: NEW COURSE: Python 101 Video Course on Udemy and TutorialsPoint

Wed, 2024-03-13 23:45

I recently put my Python 101 Video Course up on Udemy and TutorialsPoint.

There are one thousand free copies of the course available on Udemy by using the following link:

If you prefer TutorialsPoint, you can get a free copy here:

The Python 101 video course is also available on TeachMePython and my Gumroad store.

I am slowly adding quizzes to the Udemy version of the course and plan to add the same or similar quizzes on TeachMePython. Unfortunately, TutorialsPoint doesn’t support quizzes in the same way as they use a video quiz format, so that site won’t have any quizzes.

The post NEW COURSE: Python 101 Video Course on Udemy and TutorialsPoint appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Spyder IDE: Reusable research Birds of a Feather session at Scipy 2023: Solutions and tools

Wed, 2024-03-13 22:30

The Spyder team and collaborators hosted a Birds of a Feather (BoF) session at SciPy 2023, focused on moving beyond just scripts and notebooks toward truly reproducible, reusable research. In Part 1 of this two-part series, we went over our motivation and goals for the session and the challenges that attendees brought up. Now, we’ll review the tips, strategies, tools and platforms (including Spyder!) that participants shared as ways to address these obstacles. We'd again like to thank Juanita Gomez for helping organize the BoF, Hari for his hard work compiling a summary of the outcomes, and everyone for attending and sharing such great ideas and insights!

Making notebooks more reusable

As far as reproducibility is concerned, it was brought up that it can be difficult to easily compare outputs between notebooks created by different researchers. In response, one participant mentioned that VSCode recently made an improvement to the notebook diff viewers to more easily show just the code changes. However, users stressed that it was critical to be able to diff the actual notebook output, not just its contents, and expressed a desire for a tool to cover that aspect.

In response to these concerns, others responded that notebooks should not be considered a unit of reproducible research, which should instead be a complete software project, including notebooks or scripts, an environment/requirements file and a record of commands to run there. They recommended the 8-levels of Reproducibility and Conda Project to help guide and implement this.

Additionally, attendees recommended Papermill, describing it as a very useful tool for parameterizing and executing notebooks programmatically. Others suggested Devcontainers, to allow collaborating with a lab group or team in a shared environment and seeing everything on their screen, as well as Live Share in VSCode.

Participants also expressed frustration that despite notebooks being intended to make programming more literate, this often does not happen in practice. Beginners like the interactivity in notebooks because they don't know how to use more advanced programming tools, but they don't always take advantage of their readability features. To address this, attendees stressed the importance of getting users accustomed to best practices that can also be helpful for reproducibility. A participant mentioned a nbflake8 tool to lint notebooks, though it could not be easily found online, and others wished for a Ruff implementation (which at the time of this writing is now complete).

Migrating notebooks to modules

As one participant put it, "I love notebooks, and also love modules, and love the flow of code from notebooks into modules once it approaches that point." They went on to describe modules as a key unit of documented, tested code, but which doesn't mean a lot on its own, whereas combined with a notebook, it gives them context and meaning. For communities that may be afraid of modules, the participant recommended trying to make creating and transitioning to them easier, so users have fully importable, reusable Python code. For students, notebooks often turn into a fancy scratch pad or script file, and once they get stuff that works, they can move that stuff out into modules, and then the notebooks start to morph into examples and the history of what the work was about that can be interpreted by other researchers.

Other attendees chimed in with similar stories, with a NIST researcher mentioning this is an area they'd been working on for 10 years, with their approach being putting the stuff they want to be modular in a regular Python module, and then have a Jupyter notebook that shows an example using the code, such as in their IPRPy project. To aid this process, participants suggested tools like the Autodocstring extension in VSCode and the docstring generator built into Spyder's editor as great ways to reduce the friction for students when writing documentation, as they just add the triple quotes and the IDE generates a pre-filled docstring for them.

An important reproducibility and reusability tool many cited for this was nbdev, which can allow users to develop their code and let it grow, and then eventually export the parts as modules at the end. According to attendees, its documentation mostly talks about everything as packages, but it can also be used for individual notebooks and modules. Some participants were initially hesitant to show it to their students since they're early Python programmers, but it was actually quite easy for them, only requiring as little as one line of code at the end. (Unfortunately as of this writing, it seems ndbdev development has stalled due to its expected commercial opportunities not materializing.) Others asked for more documentation resources for this, since they were still learning Python themselves and would like to learn more about this and teach it to their students. In addition to this very blog post and guide, one attendee brought up that they did a tutorial on that topic at SciPy, adding that the documentation is pretty intimidating but it would be great to have something more focused on smaller-scale usage.

As additional approaches, attendees mentioned they have their students use Jupytext, which helps the student to convert notebooks to Python files that can be committed to a Git repository. This allows the code to be committed as a Python file, while allowing Jupyter to open it as a notebook and continue working on it. Others brought up nb-convert, a command line tool that can convert notebooks to many different formats including a Python script, which is integrated into IDEs like Spyder, and that there is also a similar VSCode feature.

Enabling reusable Python packages

When it comes to overall workflow, all agreed that going from a script or notebook to a reusable, installable Python package could be a major challenge, especially for students and non-programmers. Attendees from NASA mentioned that for their projects everything has to be documented, and one of the things they've struggled with was converting a notebook to the type of report NASA is typically looking for. Others described their workflow being as simple and "old school" as writing a aaa_readme.txt file where they record a diary of what they were doing on that project so if they have a break working on it, they can go back to those notes and remind themselves.

To help address this, participants recommended a "really cool" tool called "Show Your Work" that comes out of the astrophysics community, which is primarily aimed at producing a paper at the end but also a Python package, and includes all the steps that show users' work along the way. It is built around a tool called Snakemake, which then sets up a template for both the Python package and the paper. Additionally, attendees described it as having a "really helpful" guide for getting started and ensuring all of a user's projects have the same structure. It was brought up that Azel Donath, maintainer of Gammapy and speaker at SciPy 2023, published their Gammapy paper by using this tool.

As a followup, participants asked how this differed from Quarto, to which the response was that Quarto is much more general, whereas Show Your Work was specifically built to allow users to produce a PDF in LaTeX at the end. Others mentioned Duecredit, a related tool for citing open source authors which looks at code and finds the authors (via Git commits) that wrote it.

Additionally, users expressed particular appreciation for the Cookiecutter template that Henry Schreiner III has for packaging. They mentioned that a lot of their workflows are just messing around with their data, and having something like a package structure from the get go helps make it easier to not miss things. As a followup, a nuclear engineer mentioned they often have two week projects leveraging Jupyter at their center, with a cookiecutter template that has Sphinx, and a directory structure, and metadata that looks familiar and has everything set up by default. They described how this particularly helps ensure that different colleagues and team members are on the same page with doing things. Finally, others suggested the data-driven Cookiecutter template, which provides an ordered structure for where things go, what they are named and how they are run.

Next steps

Now that we’ve gathered a wealth of community feedback, ideas and resources, we’re currently working to further translate these insights into an actionable guide (or series of such) on a community platform, to make it easier for everyone to apply them. Keep an eye out for that, and until then, happy Spydering!

Categories: FLOSS Project Planets

Wingware: Wing Python IDE Version 10.0.3 - March 15, 2024

Wed, 2024-03-13 21:00

Wing 10.0.3 adds more control over AI request context, improves keyboard navigability on Windows, fixes folding failures seen in Python files, avoids failure to show debug process output in Debug I/O, improves Diff/Merge for directories and files, fixes hanging while examining some types of debug data, solves several problems with setting up Poetry projects on Windows, and makes about 20 other improvements.

See the change log for details.

Download Wing 10 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products


What's New in Wing 10

AI Assisted Development

Wing Pro 10 takes advantage of recent advances in the capabilities of generative AI to provide powerful AI assisted development, including AI code suggestion, AI driven code refactoring, description-driven development, and AI chat. You can ask Wing to use AI to (1) implement missing code at the current input position, (2) refactor, enhance, or extend existing code by describing the changes that you want to make, (3) write new code from a description of its functionality and design, or (4) chat in order to work through understanding and making changes to code.

Examples of requests you can make include:

"Add a docstring to this method" "Create unit tests for class SearchEngine" "Add a phone number field to the Person class" "Clean up this code" "Convert this into a Python generator" "Create an RPC server that exposes all the public methods in class BuildingManager" "Change this method to wait asynchronously for data and return the result with a callback" "Rewrite this threaded code to instead run asynchronously"

Yes, really!

Your role changes to one of directing an intelligent assistant capable of completing a wide range of programming tasks in relatively short periods of time. Instead of typing out code by hand every step of the way, you are essentially directing someone else to work through the details of manageable steps in the software development process.

Read More

Support for Python 3.12

Wing 10 adds support for Python 3.12, including (1) faster debugging with PEP 669 low impact monitoring API, (2) PEP 695 parameterized classes, functions and methods, (3) PEP 695 type statements, and (4) PEP 701 style f-strings.

Poetry Package Management

Wing Pro 10 adds support for Poetry package management in the New Project dialog and the Packages tool in the Tools menu. Poetry is an easy-to-use cross-platform dependency and package manager for Python, similar to pipenv.

Ruff Code Warnings & Reformatting

Wing Pro 10 adds support for Ruff as an external code checker in the Code Warnings tool, accessed from the Tools menu. Ruff can also be used as a code reformatter in the Source > Reformatting menu group. Ruff is an incredibly fast Python code checker that can replace or supplement flake8, pylint, pep8, and mypy.


Try Wing 10 Now!

Wing 10 is a ground-breaking new release in Wingware's Python IDE product line. Find out how Wing 10 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 9 and earlier, and Migrating from Older Versions for a list of compatibility notes.

Categories: FLOSS Project Planets

Real Python: Visualizing Data in Python With Seaborn

Wed, 2024-03-13 10:00

If you have some experience using Python for data analysis, chances are you’ve produced some data plots to explain your analysis to other people. Most likely you’ll have used a library such as Matplotlib to produce these. If you want to take your statistical visualizations to the next level, you should master the Python seaborn library to produce impressive statistical analysis plots that will display your data.

In this tutorial, you’ll learn how to:

  • Make an informed judgment as to whether or not seaborn meets your data visualization needs
  • Understand the principles of seaborn’s classic Python functional interface
  • Understand the principles of seaborn’s more contemporary Python objects interface
  • Create Python plots using seaborn’s functions
  • Create Python plots using seaborn’s objects

Before you start, you should familiarize yourself with the Jupyter Notebook data analysis tool available in JupyterLab. Although you can follow along with this seaborn tutorial using your favorite Python environment, Jupyter Notebook is preferred. You might also like to learn how a pandas DataFrame stores its data. Knowing the difference between a pandas DataFrame and Series will also prove useful.

So now it’s time for you to dive right in and learn how to use seaborn to produce your Python plots.

Free Bonus: Click here to download the free code that you can experiment with in Python seaborn.

Getting Started With Python seaborn

Before you use seaborn, you must install it. Open a Jupyter Notebook and type !python -m pip install seaborn into a new code cell. When you run the cell, seaborn will install. If you’re working at the command line, use the same command, only without the exclamation point (!). Once seaborn is installed, Matplotlib, pandas, and NumPy will also be available. This is handy because sometimes you need them to enhance your Python seaborn plots.

Before you can create a plot, you do, of course, need data. Later, you’ll create several plots using different publicly available datasets containing real-world data. To begin with, you’ll work with some sample data provided for you by the creators of seaborn. More specifically, you’ll work with their tips dataset. This dataset contains data about each tip that a particular restaurant waiter received over a few months.

Creating a Bar Plot With seaborn

Suppose you wanted to see a bar plot showing the average amount of tips received by the waiter each day. You could write some Python seaborn code to do this:

Python In [1]: import matplotlib.pyplot as plt ...: import seaborn as sns ...: ...: tips = sns.load_dataset("tips") ...: ...: ( ...: sns.barplot( ...: data=tips, x="day", y="tip", ...: estimator="mean", errorbar=None, ...: ) ...: .set(title="Daily Tips ($)") ...: ) ...: ...: plt.show() Copied!

First, you import seaborn into your Python code. By convention, you import it as sns. Although you can use any alias you like, sns is a nod to the fictional character the library was named after.

To work with data in seaborn, you usually load it into a pandas DataFrame, although other data structures can also be used. The usual way of loading data is to use the pandas read_csv() function to read data from a file on disk. You’ll see how to do this later.

To begin with, because you’re working with one of the seaborn sample datasets, seaborn allows you online access to these using its load_dataset() function. You can see a list of the freely available files on their GitHub repository. To obtain the one you want, all you need to do is pass load_dataset() a string telling it the name of the file containing the dataset you’re interested in, and it’ll be loaded into a pandas DataFrame for you to use.

The actual bar plot is created using seaborn’s barplot() function. You’ll learn more about the different plotting functions later, but for now, you’ve specified data=tips as the DataFrame you wish to use and also told the function to plot the day and tip columns from it. These contain the day the tip was received and the tip amount, respectively.

The important point you should notice here is that the seaborn barplot() function, like all seaborn plotting functions, can understand pandas DataFrames instinctively. To specify a column of data for them to use, you pass its column name as a string. There’s no need to write pandas code to identify each Series to be plotted.

The estimator="mean" parameter tells seaborn to plot the mean y values for each category of x. This means your plot will show the average tip for each day. You can quickly customize this to instead use common statistical functions such as sum, max, min, and median, but estimator="mean" is the default. The plot will also show error bars by default. By setting errorbar=None, you can suppress them.

The barplot() function will produce a plot using the parameters you pass to it, and it’ll label each axis using the column name of the data that you want to see. Once barplot() is finished, it returns a matplotlib Axes object containing the plot. To give the plot a title, you need to call the Axes object’s .set() method and pass it the title you want. Notice that this was all done from within seaborn directly, and not Matplotlib.

Note: You may be wondering why the barplot() function is encapsulated within a pair of parentheses (...). This is a coding style often used in seaborn code because it frequently uses method chaining. These extra brackets allow you to horizontally align method calls, starting each with its dot notation. Alternatively, you could use the backslash (\) for line continuation, although that is discouraged.

If you take another look at the code, the alignment of .set() is only possible because of these extra encasing brackets. You’ll see this coding style used throughout this tutorial, as well as when you read the seaborn documentation.

In some environments like IPython and PyCharm, you may need to use Matplotlib’s show() function to display your plot, meaning you must import Matplotlib into Python as well. If you’re using a Jupyter notebook, then using plt.show() isn’t necessary, but using it removes some unwanted text above your plot. Placing a semicolon (;) at the end of barplot() will also do this for you.

When you run the code, the resulting plot will look like this:

As you can see, the waiter’s daily average tips rise slightly on the weekends. It looks as though people tip more when they’re relaxed.

Note: One thing you should be aware of is that load_dataset(), unlike read_csv(), will automatically convert string columns into the pandas Categorical data type for you. You use this where your data contains a limited, fixed number of possible values. In this case, the day column of data will be treated as a Categorical data type containing the days of the week. You can see this by using tips["day"] to view the column:

Python In [2]: tips["day"] Out[2]: 0 Sun 1 Sun 2 Sun 3 Sun 4 Sun ... 239 Sat 240 Sat 241 Sat 242 Sat 243 Thur Name: day, Length: 244, dtype: category Categories (4, object): ['Thur', 'Fri', 'Sat', 'Sun'] Copied!

As you can see, your day column has a data type of category. Note, also, that while your original data starts with Sun, the first entry in the category is Thur. In creating the category, the days have been interpreted for you in the correct order. The read_csv() function doesn’t do this.

Read the full article at https://realpython.com/python-seaborn/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Talk Python to Me: #453: uv - The Next Evolution in Python Packages?

Wed, 2024-03-13 04:00
Have you ever been wait around for pip to do its thing while installing packages or syncing a virtual environment or through some higher level tool such as pip-tools? Then you'll be very excited to hear about the tool just announced from Astral called uv. It's like pip, but 100x faster. Charlie Marsh from Ruff fame and founder of Astral is here to dive in. Let's go.<br/> <br/> <strong>Episode sponsors</strong><br/> <br/> <a href='https://talkpython.fm/neo4j-notes'>Neo4j</a><br> <a href='https://talkpython.fm/training'>Talk Python Courses</a><br/> <br/> <strong>Links from the show</strong><br/> <br/> <div><b>Charlie Marsh on Twitter</b>: <a href="https://twitter.com/charliermarsh" target="_blank" rel="noopener">@charliermarsh</a><br/> <b>Charlie Marsh on Mastodon</b>: <a href="https://hachyderm.io/@charliermarsh" target="_blank" rel="noopener">@charliermarsh</a><br/> <b>Astral</b>: <a href="https://astral.sh" target="_blank" rel="noopener">astral.sh</a><br/> <b>uv</b>: <a href="https://github.com/astral-sh/uv" target="_blank" rel="noopener">github.com</a><br/> <b>Ruff</b>: <a href="https://github.com/astral-sh/ruff" target="_blank" rel="noopener">github.com</a><br/> <b>Ruff Rules</b>: <a href="https://docs.astral.sh/ruff/rules/" target="_blank" rel="noopener">docs.astral.sh</a><br/> <b>When "Everything" Becomes Too Much: The npm Package Chaos of 2024</b>: <a href="https://socket.dev/blog/when-everything-becomes-too-much" target="_blank" rel="noopener">socket.dev</a><br/> <br/> <b>Talk Python's free Audio AI Course</b>: <a href="https://training.talkpython.fm/courses/build-an-audio-ai-app-with-python-and-assemblyai?ref=talkpython" target="_blank" rel="noopener">training.talkpython.fm</a><br/> <b>Watch this episode on YouTube</b>: <a href="https://www.youtube.com/watch?v=g5RWwvzfs0I" target="_blank" rel="noopener">youtube.com</a><br/> <b>Episode transcripts</b>: <a href="https://talkpython.fm/episodes/transcript/453/uv-the-next-evolution-in-python-packages" target="_blank" rel="noopener">talkpython.fm</a><br/> <br/> <b>--- Stay in touch with us ---</b><br/> <b>Subscribe to us on YouTube</b>: <a href="https://talkpython.fm/youtube" target="_blank" rel="noopener">youtube.com</a><br/> <b>Follow Talk Python on Mastodon</b>: <a href="https://fosstodon.org/web/@talkpython" target="_blank" rel="noopener"><i class="fa-brands fa-mastodon"></i>talkpython</a><br/> <b>Follow Michael on Mastodon</b>: <a href="https://fosstodon.org/web/@mkennedy" target="_blank" rel="noopener"><i class="fa-brands fa-mastodon"></i>mkennedy</a><br/></div>
Categories: FLOSS Project Planets

Pages