Feeds

Glyph Lefkowitz: Okay, I’m A Centrist I Guess

Planet Python - Mon, 2024-01-22 12:41

Today I saw a short YouTube video about “cozy games” and started writing a comment, then realized that this was somehow prompting me to write the most succinct summary of my own personal views on politics and economics that I have ever managed. So, here goes.

Apparently all I needed to trim down 50,000 words on my annoyance at how the term “capitalism” is frustratingly both a nexus for useful critique and also reductive thought-terminating clichés was to realize that Animal Crossing: New Horizons is closer to my views on political economy than anything Adam Smith or Karl Marx ever wrote.

Cozy games illustrate that the core mechanics of capitalism are fun and motivating, in a laboratory environment. It’s fun to gather resources, to improve one’s skills, to engage in mutually beneficial exchanges, to collect things, to decorate. It’s tremendously motivating. Even merely pretending to do those things can captivate huge amounts of our time and attention.

In real life, people need to be motivated to do stuff. Not because of some moral deficiency, but because in a large complex civilization it’s hard to tell what needs doing. By the time it’s widely visible to a population-level democratic consensus of non-experts that there is an unmet need — for example, trash piling up on the street everywhere indicating a need for garbage collection — that doesn’t mean “time to pick up some trash”, it means “the sanitation system has collapsed, you’re probably going to get cholera”. We need a system that can identify utility signals more granularly and quickly, towards the edges of the social graph. To allow person A to earn “value credits” of some kind for doing work that others find valuable, then trade those in to person B for labor which they find valuable, even if it is not clearly obvious to anyone else why person A wants that thing. Hence: money.

So, a market can provide an incentive structure that productively steers people towards needs, by aggregating small price signals in a distributed way, via the communication technology of “money”. Authoritarian communist states are famously bad at this, overproducing “necessary” goods in ways that can hold their own with the worst excesses of capitalists, while under-producing “luxury” goods that are politically seen as frivolous.

This is the kernel of truth around which the hardcore capitalist bootstrap grindset ideologues build their fabulist cinematic universe of cruelty. Markets are motivating, they reason, therefore we must worship the market as a god and obey its every whim. Markets can optimize some targets, therefore we must allow markets to optimize every target. Markets efficiently allocate resources, and people need resources to live, therefore anyone unable to secure resources in a market is undeserving of life. Thus we begin at “market economies provide some beneficial efficiencies” and after just a bit of hand-waving over some inconvenient details, we get to “thus, we must make the poor into a blood-sacrifice to Moloch, otherwise nobody will ever work, and we will all die, drowning in our own laziness”. “The cruelty is the point” is a convenient phrase, but among those with this worldview, the prosperity is the point; they just think the cruelty is the only engine that can possibly drive it.

Cozy games are therefore a centrist[1] critique of capitalism. They present a world with the prosperity, but without the cruelty. More importantly though, by virtue of the fact that people actually play them in large numbers, they demonstrate that the cruelty is actually unnecessary.

You don’t need to play a cozy game. Tom Nook is not going to evict you from your real-life house if you don’t give him enough bells when it’s time to make rent. In fact, quite the opposite: you have to take time away from your real-life responsibilities and work, in order to make time for such a game. That is how motivating it is to engage with a market system in the abstract, with almost exclusively positive reinforcement.

What cozy games are showing us is that a world with tons of “free stuff” — universal basic income, universal health care, free education, free housing — will not result in a breakdown of our society because “no one wants to work”. People love to work.

If we can turn the market into a cozy game, with low stakes and a generous safety net, more people will engage with it, not fewer. People are not lazy; laziness does not exist. The motivation that people need from a market economy is not a constant looming threat of homelessness, starvation and death for themselves and their children, but a fun opportunity to get a five-star island rating.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support me on Patreon as well!

  1. Okay, I guess “far left” on the current US political compass, but in a just world socdems would be centrists. 

Categories: FLOSS Project Planets

Chris Lamb: Increasing the Integrity of Software Supply Chains awarded IEEE ‘Best Paper’ award

Planet Debian - Mon, 2024-01-22 12:11

IEEE Software recently announced that a paper I co-authored with Dr. Stefano Zacchiroli has been awarded their ‘Best Paper’ award:

Titled Reproducible Builds: Increasing the Integrity of Software Supply Chains, the abstract reads as follows:


Although it is possible to increase confidence in Free and Open Source Software (FOSS) by reviewing its source code, trusting code is not the same as trusting its executable counterparts. These are typically built and distributed by third-party vendors with severe security consequences if their supply chains are compromised.

In this paper, we present reproducible builds, an approach that can determine whether generated binaries correspond with their original source code. We first define the problem and then provide insight into the challenges of making real-world software build in a "reproducible" manner — that is, when every build generates bit-for-bit identical results. Through the experience of the Reproducible Builds project making the Debian Linux distribution reproducible, we also describe the affinity between reproducibility and quality assurance (QA).

According to Google Scholar, the paper has accumulated almost 40 citations since publication. The full text of the paper can be found in PDF format.

Categories: FLOSS Project Planets

The Drop Times: The DropTimes Carousels and Exciting Events

Planet Drupal - Mon, 2024-01-22 11:09

Have you ever wondered what a media partnership means to us? Simply put, it's like teaming up with some of the most remarkable events to bring their incredible stories directly to the readers through multiple channels, including our social media handles. We are humbled to acknowledge that The DropTimes (TDT) got the opportunity to be a media partner for several upcoming events, such as Florida Drupal Camp, Drupal Mountain Camp, and NERD Summit. We're already in friendly talks with events happening in 2024 for web coverage! We're planning to bring you even more fantastic stories.

Now, let's take a trip down memory lane with captivating carousels. Think of them like visual stories capturing the most exciting moments from events. It's our way of sharing each event's fun, happiness, and success. These carousels are like time machines, taking you back to the best parts of our media partnerships and the lively Drupal community.

The first features highlights from last year's events, including DrupalCon Pittsburgh and DrupalCon Lille 2023. Plus, get an exclusive sneak peek into what's coming up at DrupalCon Portland 2024 and DrupalCon Barcelona 2024.  

But that's not all! Brace yourselves for a visual feast as we proudly present a collection of the best moments from Splash Awards (Germany and Austria), Drupal Developers Day Vienna, and DrupalCamp Costa Rica in 2023.

Moreover, we've compiled The Drop Times 2023 Carousel, a journey back to revisit the year's most noteworthy moments and achievements.

A big shout-out to the fantastic Drupal community for all the support in 2023. Your love and encouragement mean the world to us!

These moments are just the beginning. We're eager to build more partnerships in the future and share even more exciting stories with you. Now, let's shift our focus to the present. Explore some of the latest news stories and articles we covered last week. We've got a mix of engaging content waiting for you.

Elma John conducted a captivating interview with Nneka Hector, the Director of Web Development at DSFederal and a co-lead for Drupal GovCon. Nneka reflected on the community's eagerness for in-person interaction and valuable lessons learned.

Lukas Fischer, Founder of NETNODE AG and one of the developers behind the Content Planner module, shared a customised Dashboard for Drupal websites. Covered by Alka Elizabeth, the latest enhancements promise to make your Drupal experience even more delightful and user-friendly.  

The Event Organizers Working Group (EOWG) election has wrapped up, and we're eagerly awaiting the results. Alka Elizabeth shared insights into the candidates' unique contributions. Stay tuned for the big reveal!

Meet Drupal Droid, a specially crafted AI model designed exclusively for the Drupal Community. Offering assistance with Drupal 9+ site building, development, and coding standards, this innovative tool was introduced by Michael Miles. Alka Elizabeth, sub-editor of The Drop Times, connected with Michael to glean insights into the creation and potential of Drupal Droid.

Now, let's explore what's been happening on the event front: Get a chance to showcase your talent and win a ticket to DrupalCon by submitting your design for the official DrupalCon Portland t-shirt. Enter before February 12! Volunteer as a Trivia Night Coordinator and embrace the opportunity to contribute to the organization of the iconic DrupalCon Trivia Night at Portland 2024. 

Drupal Mountain Camp is leading the charge for diversity and inclusion in the Drupal community with a new initiative. They actively encourage underrepresented voices to participate, promoting a more diverse and enriched community. For more information, click here. 

Explore exclusive sponsorship opportunities for NERD Summit 2024, a prominent mini-conference in web development and technology. Today is the last day for NERD Summit 2024 session submissions, so make sure to propose your sessions or ideas before midnight. Get more details here.

Discover the upcoming Drupal Iberia 2024 event, set to convene in Evora on May 10th and 11th. 

The largest Drupal Conference in Poland, DrupalCamp Poland 2024, calls for session submissions until April 16, 2024. 

Secure your spot at Drupalcamp Rennes 2024! Ticket reservations are now available for the three-day event featuring insightful conferences and contribution opportunities.

Join the Drupal Delhi Meetup Group as they bring back the joy of in-person gatherings on February 24, 2024. Get more information here. 

Missed LocalGov Drupal Week 2023? Don't worry! Dive into the virtual experience on their YouTube channel. Explore 14 sessions over five days, where 530+ participants shared experiences, best practices, and innovative code. 

Join the GitLab Innovation Pitch Competition to showcase your software innovation skills. Compete for a $30,000 prize pool and the opportunity to collaborate with GitLab, focusing on DevOps, Machine Learning/AI, and Social Good projects. Deadline: Feb 27, 2024.

Here is a noteworthy update from the past week: Drupal pioneers innovation with its new credit bounty program, encouraging contributors to align with impactful projects and fostering a purpose-driven community for lasting impact.  

There are more stories out there, but the need to keep this selection focused forces us to stop here.

As always, stay tuned for more exciting stories and updates. Follow us on LinkedIn, Twitter, and Facebook.

Thank you,

Sincerely
Kazima Abbas
Sub-editor, TheDropTimes

Categories: FLOSS Project Planets

Drupal Association blog: Drupal Innovation in 2024: the Contribution Health Dashboards

Planet Drupal - Mon, 2024-01-22 10:32

2023 has been an eventful year, full of ideas, discussions and plans regarding innovation, where Drupal is heading, and, in our case, how the Drupal Association can best support. On top of that, you may have already heard, but innovation is a key goal for the Drupal Association.

Drupal is, at its heart, a big, decentralized community. Before we can even think about how to innovate, we need to understand how contribution actually happens and evolves in our ecosystem. One of the things we agreed on early was that, without numbers, we don’t even know where we are going.

For that reason, in 2024 we want to share part of the work we did during the last part of 2023 to make sure we know where we are coming from, understand where we are going, and can see how the changes we make affect (or don’t affect) the whole contribution ecosystem. I want to introduce you to the Contribution Health Dashboards (CHD).

The CHD should help identify what stops or blocks people from contributing, uncover any friction, and, if problems are found, help us investigate and apply adequate remedies while also measuring the effect of those changes.

One thing to note is that the numbers we are showing next are based on the contribution credit system. The credit system has been very successful in standardizing and measuring contributions to Drupal.  It also provides incentives to contribute to Drupal, and has raised interest from individuals and organizations.

Using the credit system to evaluate the contribution is not 100% perfect, and it could show some flaws and imperfections, but we are committed to review and improve those indicators regularly, and we think it’s the most accurate way to measure the way contribution happens in Drupal.

It must also be noted that the data is hidden deep in the Drupal.org database. Extracting it has proved a tedious task, and there are numbers and statistics that we would love to extract in the near future to further validate the steps we are taking. Future reviews of this work will happen over the coming months while we continue helping contributors to innovate.

You can find the dashboards here, in the Contribution Health Dashboards, but keep reading to understand the numbers better.

Unique individuals and organisations

Jumping to what matters here, the numbers: one of the most important metrics to understand in the Drupal ecosystem is the number of contributions from both individuals and organisations.

As you can see, the number of individuals has stayed relatively stable, while their contribution has become more and more significant over the years (except for a dip in the first year of the pandemic). In a way this is telling us that once a user becomes a contributor, they stay for the long run. And, in my opinion, the numbers say that they actually stay very committed.

The number of organisations, on the other hand, displays a healthy growing trend. This shows that organisations are an important partner for Drupal and the Drupal Association, bringing a lot of value in the form of (but not just) contributors.

It definitely means that we need to continue supporting and listening to them. It’s a symbiotic relationship: these companies support and help move forward not just Drupal but the whole concept of the Open Web. And their involvement doesn’t end there, as their daily role in expanding Drupal’s reach and the number of instances and customers of every size using it is key as well.

In practical terms, in 2023 we met with different companies and organisations, and the plan is to continue listening and finding new ways to meet their needs in 2024 and beyond. One of the things we are releasing soon is the list of priorities and strategic initiatives where your contributions, as individuals as well as organisations, are most meaningful. This is something I have been consistently asked for when meeting with those individuals and organisations, and I think it’s going to make a big difference in unleashing innovation in Drupal. I recommend you have a look at the blog post about the bounty program.

First year contributors

The next value we should be tracking is how first-time users interact with our ecosystem.

The previous numbers are encouraging: we have a healthy ecosystem of companies and a crowd of loyal individuals contributing to the project. But making sure that we onboard new contributors, and that we make it easier and more attractive for new generations to contribute, is the only way to ensure this continues to be the case for many years to come.

That’s why we are looking at first-time contributions, or, said differently, how many users make a first contribution within 12 months of joining the project. During 2024 I would like to look deeper into this data and extend it further in time, for example to 24 and 36 months after joining. For now this will be a good lighthouse that we can use to improve the contribution process.

Although last year's numbers give us a nice feeling of success, we want to be cautious about them and make sure that the slight decline seen in previous years does not continue.

That is why my first priority during the first months of 2024 is to review the registration process and the next steps for new users on their contribution journey: from the form they are presented with, to the documentation we provide, to the messages we send them in the weeks and months after.

The changes we make should also be guided by the next important graph, the Time To First Contribution: in other words, the amount of time a new user takes to make their first contribution to Drupal.

You’ll see that the Contribution Health Dashboards include other data that I have not mentioned in this post. That does not mean it is not equally important, but given that the Drupal Association has a finite amount of resources, we consider this the data we need to track most closely to get a grasp of the health of our contribution system.

For now, have a look at the Contribution Health Dashboards to get a grasp of the rest of the information that we have collected. If you are curious about the numbers and maybe would like to give us a hand, please do not hesitate to send me a message at alex.moreno@association.drupal.org

Categories: FLOSS Project Planets

PyCon: Applications For Booth Space on Startup Row Are Now Open!

Planet Python - Mon, 2024-01-22 10:15

To all the startup founders out there, ‌PyCon US organizers have some awesome news for you! The application window for Startup Row at PyCon US is now open.

You’ve got until March 15th to apply, but don’t delay. (And if you want to skip all this reading and go straight to the application, here’s a link for ya.)

That’s right! Your startup could get the best of what PyCon US has to offer:

  • Coveted Expo Hall booth space
  • Exclusive placement on the PyCon US website
  • Access to the PyCon Jobs Fair (since, after all, there’s no better place to meet and recruit Python professionals)
  • A unique in-person platform to access a fantastically diverse crowd of thousands of engineers, data wranglers, academic researchers, students, and enthusiasts who come to PyCon US.

Corporate sponsors pay thousands of dollars for this level of access, but to support the entrepreneurial community PyCon US organizers are excited to give the PyCon experience to up-and-coming startup companies for free. (Submitting a Startup Row application is completely free. To discourage no-shows at the conference itself, we do require a fully-refundable $400 deposit from companies who are selected for and accept a spot on Startup Row. If you show up, you’ll get your deposit back after the conference.)

Does My Startup Qualify?

The goal of Startup Row is to give seed and early-stage companies access to the Python community. Here are the qualification criteria:

  • Somewhat obviously: Python is used somewhere in your tech or business stack, the more of it the better!
  • Your startup is roughly 2.5 years old or less at the time of applying. (If you had a major pivot or took a while to get a product to market, measure from there.)
  • You have 25 or fewer folks on the team, including founders, employees, and contractors.
  • You or your company will fund travel and accommodation to PyCon US 2024 in Pittsburgh, Pennsylvania. (There’s a helpful page on the PyCon US website with venue and hotel information.)
  • You haven’t already presented on Startup Row or sponsored a previous PyCon US. (If you applied before but weren’t accepted, please do apply again!)

There is a little bit of wiggle room. If your startup is a fuzzy rather than an exact match for these criteria, still consider applying.

How Do I Apply?

Assuming you’ve already created a user account on the PyCon US website, applying for Startup Row is easy. 

  1. Make sure you’re logged in.
  2. Go to the Startup Row application page and submit your application by March 15th. (Note: It might be helpful to draft your answers in a separate document.)
  3. Wait to hear back! Our goal is to notify folks about their application decision toward the end of March.

Again, the application deadline is March 15, 2024 at 11:59 PM Eastern. Applications submitted after that deadline may not be considered.

Can I learn more about Startup Row?

You bet! Check out the Startup Row page for more details and testimonials from prior Startup Row participants. (There’s a link to the application there, too!)

Who do I contact with questions about Startup Row?

First off, if you have questions about PyCon US in general, you can send an email to the PyCon US organizing team at pycon-reg@python.org. We’re always happy to help.

For specific Startup Row-related questions, reach out to co-chair Jason D. Rowley via email at jdr [at] omg [dot] lol, or find some time in his calendar at calendly [dot] com [slash] jdr.

Wait, What’s The Deadline Again?

Again, the application deadline is March 15, 2024 at 11:59PM Eastern.

Good luck! We look forward to reviewing your application!

Categories: FLOSS Project Planets

Paul Tagliamonte: Writing a simulator to check phased array beamforming 🌀

Planet Debian - Mon, 2024-01-22 10:11
Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.

If you're on the Fediverse, I'd very much appreciate boosts on my toot!

While working on hz.tools, I started to move my beamforming code from 2-D (meaning, beamforming to some specific angle on the X-Y plane for waves on the X-Y plane) to 3-D. I’ll have more to say about that once I get around to publishing the code as soon as I’m sure it’s not completely wrong, but in the meantime I decided to write a simple simulator to visually check the beamformer against the textbooks. The results were pretty rad, so I figured I’d throw together a post since it’s interesting all on its own outside of beamforming as a general topic.

I figured I’d write this in Rust, since I’ve been using Rust as my primary language over at zoo, and it’s a good chance to learn the language better.

⚠️ This post has some large GIFs

It may take a little bit to load depending on your internet connection. Sorry about that; I'm not clever enough to do better without doing tons of complex engineering work. They may be choppy while they load or something. I tried to compress and ensmall them, so if they're loaded but fuzzy, click on them to load a slightly larger version.

This post won’t cover the basics of how phased arrays work or the specifics of calculating the phase offsets for each antenna, but I’ll dig into how I wrote a simple “simulator” and how I wound up checking my phase offsets to generate the renders below.

Assumptions

I didn’t want to build a general purpose RF simulator, anything particularly generic, or something that would solve for any more than the things right in front of me. To do this simply and quickly (all this code took about a day to write, including the beamforming math), I had to reduce the amount of work in front of me.

Given that I was concerned with visualizing what the antenna pattern would look like in 3-D given some antenna geometry, operating frequency and configured beam, I made the following assumptions:

All antennas are perfectly isotropic – they receive a signal that is exactly the same strength no matter what direction the signal originates from.

There’s a single point-source isotropic emitter in the far-field (I modeled this as being 1 million meters away – 1000 kilometers) of the antenna system.

There is no noise, multipath, loss or distortion in the signal as it travels through space.

Antennas will never interfere with each other.

2-D Polar Plots

The last time I wrote something like this, I generated 2-D GIFs which show a radiation pattern, not unlike the polar plots you’d see on a microphone.

These are handy because it lets you visualize what the directionality of the antenna looks like, as well as in what direction emissions are captured, and in what directions emissions are nulled out. You can see these plots on spec sheets for antennas in both 2-D and 3-D form.

Now, let’s port the 2-D approach to 3-D and see how well it works out.

Writing the 3-D simulator

As an EM wave travels through free space, the place at which you sample the wave controls the phase you observe at each time-step. This means, assuming perfectly synchronized clocks, a transmitter and receiver exactly one RF wavelength apart will observe a signal in-phase, but a transmitter and receiver a half wavelength apart will observe a signal 180 degrees out of phase.

This means that if we take the distance between our point-source and antenna element, divide it by the wavelength, we can use the fractional part of the resulting number to determine the phase observed. If we multiply that number (in the range of 0 to just under 1) by tau, we can generate a complex number by taking the cos and sin of the multiplied phase (in the range of 0 to tau), assuming the transmitter is emitting a carrier wave at a static amplitude and all clocks are in perfect sync.

let observed_phases: Vec<Complex> = antennas
    .iter()
    .map(|antenna| {
        let distance = (antenna - tx).magnitude();
        let distance = distance - (distance as i64 as f64);
        (distance / wavelength) * TAU
    })
    .map(|phase| Complex(phase.cos(), phase.sin()))
    .collect();

At this point, given some synthetic transmission point and each antenna, we know what the expected complex sample would be at each antenna. From there, we can adjust the phase of each antenna according to the beamforming phase offset configuration, and add up every sample to determine what sample the entire system would collectively produce.

let beamformed_phases: Vec<Complex> = ...;
let magnitude = beamformed_phases
    .iter()
    .zip(observed_phases.iter())
    .map(|(beamformed, observed)| observed * beamformed)
    .reduce(|acc, el| acc + el)
    .unwrap()
    .abs();

Armed with this information, it’s straightforward to generate some number of (Azimuth, Elevation) points to sample, generate a transmission point far away in that direction, resolve what the resulting Complex sample would be, take its magnitude, and use that to create an (x, y, z) point at (azimuth, elevation, magnitude). The color attached to that point is based on its distance from (0, 0, 0). I opted to use the Life Aquatic table for this one.
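
A minimal sketch of that sampling step might look like the following. Note that sample_magnitude() is a hypothetical stand-in for the "place a far-field transmitter, observe it at every antenna, beamform, take the magnitude" pipeline shown above, and the color ramp is just a placeholder rather than the actual Life Aquatic table:

use std::f64::consts::TAU;

// Hypothetical stand-in for: place a far-field transmitter in this direction,
// observe it at each antenna, apply the beamforming weights, and take the
// magnitude of the summed sample.
fn sample_magnitude(_azimuth: f64, _elevation: f64) -> f64 {
    1.0
}

// Convert one (azimuth, elevation, magnitude) sample into a colored 3-D point.
fn to_point(azimuth: f64, elevation: f64, magnitude: f64) -> ([f64; 3], [f64; 3]) {
    // Spherical to Cartesian, using the magnitude as the radius.
    let x = magnitude * elevation.cos() * azimuth.cos();
    let y = magnitude * elevation.cos() * azimuth.sin();
    let z = magnitude * elevation.sin();

    // Placeholder color ramp: keyed off the distance from (0, 0, 0), which is
    // just the magnitude here, fading from blue (weak) to red (strong).
    let t = magnitude.clamp(0.0, 1.0);
    ([x, y, z], [t, 0.2, 1.0 - t])
}

fn main() {
    let mut cloud: Vec<([f64; 3], [f64; 3])> = Vec::new();
    // Sweep azimuth over a full turn and elevation from -90 to +90 degrees.
    for az_step in 0..360 {
        for el_step in 0..=180 {
            let azimuth = (az_step as f64 / 360.0) * TAU;
            let elevation = (el_step as f64 / 180.0 - 0.5) * (TAU / 2.0);
            let magnitude = sample_magnitude(azimuth, elevation);
            cloud.push(to_point(azimuth, elevation, magnitude));
        }
    }
    println!("generated {} points", cloud.len());
}

Each entry of that point cloud then becomes one of the small colored spheres that kiss3d renders into a frame.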

After this process is complete, I have a point cloud of ((x, y, z), (r, g, b)) points. I wrote a small program using kiss3d to render the point cloud using tons of small spheres, and to write out the frames to a set of PNGs, which get compiled into a GIF.

Now for the fun part, let’s take a look at some radiation patterns!

1x4 Phased Array

The first configuration is a phased array where all the elements are in perfect alignment on the y and z axes, and separated by some offset in the x axis. This configuration can sweep 180 degrees (not the full 360), but can’t be steered in elevation at all.

Let’s take a look at what this looks like for a well constructed 1x4 phased array:

And now let’s take a look at the renders as we play with the configuration of this array and make sure things look right. Our initial quarter-wavelength spacing is very effective and has some outstanding performance characteristics. Let’s check to see that everything looks right as a first test.

Nice. Looks perfect. When pointing forward at (0, 0), we’d expect to see a torus, which we do. As we sweep between 0 and 360, astute observers will notice the pattern is mirrored along the axis of the antennas: when the beam is facing forward at 0 degrees, it’ll also receive at 180 degrees just as strongly. There’s a small sidelobe that forms when it’s configured along the array, but it also becomes the most directional, and the sidelobes remain fairly small.

Long compared to the wavelength (1¼ λ)

Let’s try again, but rather than spacing each antenna ¼ of a wavelength apart, let’s see about spacing each antenna 1¼ of a wavelength apart instead.

The main lobe is a lot more narrow (not a bad thing!), but some significant sidelobes have formed (not ideal). This can cause a lot of confusion when doing things that require a lot of directional resolution unless they’re compensated for.

Going from (¼ to 5¼ λ)

The last model begs the question - what do things look like when you separate the antennas from each other but without moving the beam? Let’s simulate moving our antennas but not adjusting the configured beam or operating frequency.

Very cool. As the spacing becomes longer in relation to the operating frequency, we can see the sidelobes start to form out of the end of the antenna system.

2x2 Phased Array

The second configuration I want to try is a phased array where the elements are in perfect alignment on the z axis, and separated from their neighbors by a fixed offset in the x or y axis, forming a square when viewed along the x/y axis.

Let’s take a look at what this looks like for a well constructed 2x2 phased array:

Let’s do the same as above and take a look at the renders as we play with the configuration of this array and see what things look like. This configuration should suppress the sidelobes and give us good performance, and even give us some amount of control in elevation while we’re at it.

Sweet. Heck yeah. The array is quite directional in the configured direction, and can even sweep a little bit in elevation, a definite improvement from the 1x4 above.

Long compared to the wavelength (1¼ λ)

Let’s do the same thing as the 1x4 and take a look at what happens when the distance between elements is long compared to the frequency of operation – say, 1¼ of a wavelength apart? What happens to the sidelobes given this spacing when the frequency of operation is much different than the physical geometry?

Mesmerising. This is my favorite render. The sidelobes are very fun to watch come in and out of existence. It looks absolutely other-worldly.

Going from (¼ to 5¼ λ)

Finally, for completeness' sake, what do things look like when you separate the antennas from each other just as we did with the 1x4? Let’s simulate moving our antennas but not adjusting the configured beam or operating frequency.

Very very cool. The sidelobes wind up turning the very blobby cardioid into an electromagnetic dog toy. I think we’ve proven to ourselves that using a phased array much outside its designed frequency of operation seems like a real bad idea.

Future Work

Now that I have a system to test things out, I’m a bit more confident that my beamforming code is close to right! I’d love to push that code over the line and blog about it, since it’s a really interesting topic on its own. Once I’m sure the code involved isn’t full of lies, I’ll put it up on the hztools org, and post about it here and on mastodon.

Categories: FLOSS Project Planets

Real Python: When to Use a List Comprehension in Python

Planet Python - Mon, 2024-01-22 09:00

One of Python’s most distinctive features is the list comprehension, which you can use to create powerful functionality within a single line of code. However, many developers struggle to fully leverage the more advanced features of list comprehensions in Python. Some programmers even use them too much, which can lead to code that’s less efficient and harder to read.

By the end of this tutorial, you’ll understand the full power of Python list comprehensions and know how to use their features comfortably. You’ll also gain an understanding of the trade-offs that come with using them so that you can determine when other approaches are preferable.

In this tutorial, you’ll learn how to:

  • Rewrite loops and map() calls as list comprehensions in Python
  • Choose between comprehensions, loops, and map() calls
  • Supercharge your comprehensions with conditional logic
  • Use comprehensions to replace filter()
  • Profile your code to resolve performance questions

Get Your Code: Click here to download the free code that shows you how and when to use list comprehensions in Python.

Transforming Lists in Python

There are a few different ways to create and add items to a list in Python. In this section, you’ll explore for loops and the map() function to perform these tasks. Then, you’ll move on to learn how to use list comprehensions and when they can benefit your Python program.

Use for Loops

The most common type of loop is the for loop. You can use a for loop to create a list of elements in three steps:

  1. Instantiate an empty list.
  2. Loop over an iterable or range of elements.
  3. Append each element to the end of the list.

If you want to create a list containing the first ten perfect squares, then you can complete these steps in three lines of code:

>>> squares = []
>>> for number in range(10):
...     squares.append(number * number)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Here, you instantiate an empty list, squares. Then, you use a for loop to iterate over range(10). Finally, you multiply each number by itself and append the result to the end of the list.

Work With map Objects

For an alternative approach that’s based in functional programming, you can use map(). You pass in a function and an iterable, and map() will create an object. This object contains the result that you’d get from running each iterable element through the supplied function.

As an example, consider a situation in which you need to calculate the price after tax for a list of transactions:

>>> prices = [1.09, 23.56, 57.84, 4.56, 6.78]
>>> TAX_RATE = .08
>>> def get_price_with_tax(price):
...     return price * (1 + TAX_RATE)
...
>>> final_prices = map(get_price_with_tax, prices)
>>> final_prices
<map object at 0x7f34da341f90>
>>> list(final_prices)
[1.1772000000000002, 25.4448, 62.467200000000005, 4.9248, 7.322400000000001]

Here, you have an iterable, prices, and a function, get_price_with_tax(). You pass both of these arguments to map() and store the resulting map object in final_prices. Finally, you convert final_prices into a list using list().

Leverage List Comprehensions

List comprehensions are a third way of making or transforming lists. With this elegant approach, you could rewrite the for loop from the first example in just a single line of code:

>>> squares = [number * number for number in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Rather than creating an empty list and adding each element to the end, you simply define the list and its contents at the same time by following this format:

new_list = [expression for member in iterable]

Every list comprehension in Python includes three elements:

  1. expression is the member itself, a call to a method, or any other valid expression that returns a value. In the example above, the expression number * number is the square of the member value.
  2. member is the object or value in the list or iterable. In the example above, the member value is number.
  3. iterable is a list, set, sequence, generator, or any other object that can return its elements one at a time. In the example above, the iterable is range(10).
Read the full article at https://realpython.com/list-comprehension-python/ »

Categories: FLOSS Project Planets

Russell Coker: Storage Trends 2024

Planet Debian - Mon, 2024-01-22 07:57

It has been less than a year since my last post about storage trends [1] and enough has changed to make it worth writing again. My previous analysis was that for <2TB only SSD made sense, for 4TB SSD made sense for business use while hard drives were still a good option for home use, and for 8TB+ hard drives were clearly the best choice for most uses. I will start by looking at MSY prices; they aren't the cheapest (you can get cheaper online) but they are competitive and they make it easy to compare the different options. I'll also compare the cheapest options in each size; there are more expensive options, but usually if you want to pay more then the performance benefits of SSD (both SATA and NVMe) are even more appealing. All prices are in Australian dollars and of parts that are readily available in Australia, but the relative prices of the parts are probably similar in most countries. The main issue here is when to use SSD and when to use hard disks, and then if SSD is chosen which variety to use.

Small Storage

For my last post the cheapest storage devices from MSY were $19 for a 128G SSD, now it’s $24 for a 128G SSD or NVMe device. I don’t think the Australian dollar has dropped much against foreign currencies, so I guess this is partly companies wanting more profits and partly due to the demand for more storage. Items that can’t sell in quantity need higher profit margins if they are to have them in stock. 500G SSDs are around $33 and 500G NVMe devices for $36 so for most use cases it wouldn’t make sense to buy anything smaller than 500G.

The cheapest hard drive is $45 for a 1TB disk. A 1TB SATA SSD costs $61 and a 1TB NVMe costs $79. So 1TB disks aren’t a good option for any use case.

A 2TB hard drive is $89. A 2TB SATA SSD is $118 and a 2TB NVMe is $145. I don’t think the small savings you can get from using hard drives makes them worth using for 2TB.

For most people if you have a system that’s important to you then $145 on storage isn’t a lot to spend. It seems hardly worth buying less than 2TB of storage, even for a laptop. Even if you don’t use all the space larger storage devices tend to support more writes before wearing out so you still gain from it. A 2TB NVMe device you buy for a laptop now could be used in every replacement laptop for the next 10 years. I only have 512G of storage in my laptop because I have a collection of SSD/NVMe devices that have been replaced in larger systems, so the 512G is essentially free for my laptop as I bought a larger device for a server.

For small business use it doesn’t make sense to buy anything smaller than 2TB for any system other than a router. If you buy smaller devices then you will sometimes have to pay people to install bigger ones and when the price is $145 it’s best to just pay that up front and be done with it.

Medium Storage

A 4TB hard drive is $135. A 4TB SATA SSD is $319 and a 4TB NVMe is $299. The prices haven’t changed a lot since last year, but a small increase in hard drive prices and a small decrease in SSD prices makes SSD more appealing for this market segment.

A common size range for home servers and small business servers is 4TB or 8TB of storage. To do that on SSD means about $600 for 4TB of RAID-1 or $900 for 8TB of RAID-5/RAID-Z. That’s quite affordable for that use.

For 8TB of less important storage a 8TB hard drive costs $239 and a 8TB SATA SSD costs $899 so a hard drive clearly wins for the specific case of non-RAID single device storage. Note that the U.2 devices are more competitive for 8TB than SATA but I included them in the next section because they are more difficult to install.

Serious Storage

With 8TB being an uncommon and expensive option for consumer SSDs the cheapest price is for multiple 4TB devices. To have multiple NVMe devices in one PCIe slot you need PCIe bifurcation (treating the PCIe slot as multiple slots). Most of the machines I use don’t support bifurcation and most affordable systems with ECC RAM don’t have it. For cheap NVMe type storage there are U.2 devices (the “enterprise” form of NVMe). Until recently they were too expensive to use for desktop systems but now there are PCIe cards for internal U.2 devices, $14 for a card that takes a single U.2 is a common price on AliExpress and prices below $600 for a 7.68TB U.2 device are common – that’s cheaper on a per-TB basis than SATA SSD and NVMe! There are PCIe cards that take up to 4*U.2 devices (which probably require bifurcation) which means you could have 8+ U.2 devices in one not particularly high end PC for 56TB of RAID-Z NVMe storage. Admittedly $4200 for 56TB is moderately expensive, but it’s in the price range for a small business server or a high end home server. A more common configuration might be 2*7.68TB U.2 on a single PCIe card (or 2 cards if you don’t have bifurcation) for 7.68TB of RAID-1 storage.

For SATA SSD AliExpress has a 6*2.5″ hot-swap device that fits in a 5.25″ bay for $63, so if you have 2*5.25″ bays you could have 12*4TB SSDs for 44TB of RAID-Z storage. That wouldn’t be much cheaper than 8*7.68TB U.2 devices and would be slower and have less space. But it would be a good option if PCIe bifurcation isn’t possible.

16TB SATA hard drives cost $559 which is almost exactly half the price per TB of U.2 storage. That doesn’t seem like a good deal. If you want 16TB of RAID storage then 3*7.68TB U.2 devices only costs about 50% more than 2*16TB SATA disks. In most cases paying 50% more to get NVMe instead of hard disks is a good option. As sizes go above 16TB prices go up in a more than linear manner, I guess they don’t sell much volume of larger drives.

15.36TB U.2 devices are on sale for about $1300, slightly more than twice the price of a 16TB disk. It’s within the price range of small businesses and serious home users. Also it should be noted that the U.2 devices are designed for “enterprise” levels of reliability and the hard disk prices I’m comparing to are the cheapest available. If “NAS” hard disks were compared then the price benefit of hard disks would be smaller.

Probably the biggest problem with U.2 for most people is that it’s an uncommon technology that few people have much experience with or spare parts for testing. Also you can’t buy U.2 gear at your local computer store which might mean that you want to have spare parts on hand which is an extra expense.

For enterprise use I’ve recently been involved in discussions with a vendor that sells multiple petabyte arrays of NVMe. Apparently NVMe is cheap enough that there’s no need to use anything else if you want a well performing file server.

Do Hard Disks Make Sense?

There are specific cases like comparing a 8TB hard disk to a 8TB SATA SSD or a 16TB hard disk to a 15.36TB U.2 device where hard disks have an apparent advantage. But when comparing RAID storage and counting the performance benefits of SSD the savings of using hard disks don’t seem to be that great.

Is now the time that hard disks are going to die in the market? If they can’t get volume sales then prices will go up due to lack of economy of scale in manufacture and increased stock time for retailers. 8TB hard drives are now more expensive than they were 9 months ago when I wrote my previous post. Has a hard drive price death spiral already started?

SSDs are cheaper than hard disks at the smallest sizes, faster (apart from some corner cases with contiguous IO), take less space in a computer, and make less noise. At worst they are a bit over twice the cost per TB. But the most common requirements for storage are small enough and cheap enough that being twice as expensive as hard drives isn’t a problem for most people.

I predict that hard disks will become less popular in future and offer less of a price advantage. The vendors are talking about 50TB hard disks being available in future but right now you can fit more than 50TB of NVMe or U.2 devices in a volume less than that of a 3.5″ hard disk so for storage density SSD can clearly win. Maybe in future hard disks will be used in arrays of 100TB devices for large scale enterprise storage. But for home users and small businesses the current sizes of SSD cover most uses.

At the moment it seems that the one case where hard disks can really compare well is for backup devices. For backups you want large storage, good contiguous write speeds, and low prices so you can buy plenty of them.

Further Issues

The prices I’ve compared for SATA SSD and NVMe devices are all based on the cheapest devices available. I think it’s a bit of a market for lemons [2] as devices often don’t perform as well as expected and the incidence of fake products purporting to be from reputable companies is high on the cheaper sites. So you might as well buy the cheaper devices. An advantage of the U.2 devices is that you know that they will be reliable and perform well.

One thing that concerns me about SSDs is the lack of knowledge of their failure cases. Filesystems like ZFS were specifically designed to cope with common failure cases of hard disks and I don’t think we have that much knowledge about how SSDs fail. But with 3 copies of metadata BTRFS or ZFS should survive unexpected SSD failure modes.

I still have some hard drives in my home server, they keep working well enough and the prices on SSDs keep dropping. But if I was buying new storage for such a server now I’d get U.2.

I wonder if tape will make a comeback for backup.

Does anyone know of other good storage options that I missed?

Related posts:

  1. Storage Trends 2023 It’s been 2 years since my last blog post about...
  2. Storage Trends 2021 The Viability of Small Disks Less than a year ago...
  3. Storage Trends In considering storage trends for the consumer side I’m looking...
Categories: FLOSS Project Planets

LN Webworks: AWS S3 Bucket File Upload In Drupal

Planet Drupal - Mon, 2024-01-22 04:59
1. Creating an AWS Bucket
  1. Log in to AWS Console: Go to the AWS Management Console and log in to your account.
  2. Navigate to S3: In the AWS Console, find and click on the "S3" service.
  3. Create a Bucket: Click the "Create bucket" button, provide a unique and meaningful name for your bucket, and choose the region where you want to create the bucket.
  4. Configure Options: Set the desired configuration options, such as versioning, logging, and tags. Click through the configuration steps, review your settings, and create the bucket.
2. Uploading a Public Image

// S3 credentials and region for the s3fs module; these values are typically
// placed in the site's settings.php.
$settings['s3fs.access_key'] = "YOUR_ACCESS_KEY";
$settings['s3fs.secret_key'] = "YOUR_SECRET_KEY";
$settings['s3fs.region'] = "us-east-1";
// Upload files to the bucket as publicly accessible objects.
$settings['s3fs.upload_as_public'] = TRUE;

Categories: FLOSS Project Planets

ADCI Solutions: How to Upgrade Drupal 7 and 8 to Drupal 10: Step-by-Step Guide

Planet Drupal - Mon, 2024-01-22 04:59

Developers of the ADCI Solutions Studio explain why you need to upgrade your Drupal 7 and 8 websites to Drupal 10 and what makes the migration process different from a routine CMS update.

Categories: FLOSS Project Planets

Zato Blog: How to correctly integrate APIs in Python

Planet Python - Sun, 2024-01-21 23:43
How to correctly integrate APIs in Python 2024-01-22, by Dariusz Suchojad

Understanding how to effectively integrate various systems and APIs is crucial. Yet, without a dedicated integration platform, the result will be brittle point-to-point integrations that never lead to good outcomes.

Read this article about Zato, an open-source integration platform in Python, for an overview of what to avoid and how to do it correctly instead.

More blog posts
Categories: FLOSS Project Planets

Dirk Eddelbuettel: RProtoBuf 0.4.22 on CRAN: Updated Windows Support!

Planet Debian - Sun, 2024-01-21 22:41

A new maintenance release 0.4.22 of RProtoBuf arrived on CRAN earlier today. RProtoBuf provides R with bindings for the Google Protocol Buffers (“ProtoBuf”) data encoding and serialization library used and released by Google, and deployed very widely in numerous projects as a language and operating-system agnostic protocol.

This release matches the recent 0.4.21 release which enabled use of the package with newer ProtoBuf releases. Tomas has been updating the Windows / rtools side of things, and supplied us with a simple PR that will enable building with those updated versions once finalised.

The following section from the NEWS.Rd file has full details.

Changes in RProtoBuf version 0.4.22 (2022-12-13)
  • Apply patch by Tomas Kalibera to support updated rtools to build with newer ProtoBuf releases on windows

Thanks to my CRANberries, there is a diff to the previous release. The RProtoBuf page has copies of the (older) package vignette, the ‘quick’ overview vignette, and the pre-print of our JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Luke Plant: Python packaging must be getting better - a datapoint

Planet Python - Sun, 2024-01-21 15:47

I’m developing some Python software for a client, which in its current early state is desktop software that will need to run on Windows.

So far, however, I have done all development on my normal comfortable Linux machine. I haven’t really used Windows in earnest for more than 15 years – to the point where my wife happily installs Linux on her own machine, knowing that I’ll be hopeless at helping her fix issues if the OS is Windows – and certainly not for development work in that time. So I was expecting a fair amount of pain.

There was certainly a lot of friction getting a development environment set up. RealPython.com have a great guide which got me a long way, but even that had some holes and a lot of inconvenience, mostly due to the fact that, on the machine I needed to use, my main login and my admin login are separate. (I’m very lucky to be granted an admin login at all, so I’m not complaining). And there are lots of ways that Windows just seems to be broken, but that’s another blog post.

When it came to getting my app running, however, I was very pleasantly surprised.

At this stage in development, I just have a rough requirements.txt that I add Python deps to manually. This might be a good thing, as I avoid the pain of some of the additional layers people have added.

So after installing Python and creating a virtual environment on Windows, I ran pip install -r requirements.txt, expecting a world of pain, especially as I already had complex non-Python dependencies, including Qt5 and VTK. I had specified both of these as simple Python deps via the wrappers pyqt5 and vtk in my requirements.txt, and nothing else, with the attitude of “well I may as well dream this is going to work”.

And in fact, it did! Everything just downloaded as binary wheels – rather large ones, but that’s fine. I didn’t need compilers or QMake or header files or anything.

And when I ran my app, apart from a dependency that I’d forgotten to add to requirements.txt, everything worked perfectly first time. This was even more surprising as I had put zero conscious effort into Windows compatibility. In retrospect I realise that use of pathlib, which is automatic for me these days, had helped me because it smooths over some Windows/Unix differences with path handling.

Of course, this is a single datapoint. From other people’s reports there are many, many ways that this experience may not be typical. But that it is possible at all suggests that a lot of progress has been made and we are very much going in the right direction. A lot of people have put a lot of work in to achieve that, for which I’m very grateful!

Categories: FLOSS Project Planets

Debian Brasil: MiniDebConf BH 2024 - sponsorship and crowdfunding

Planet Debian - Sun, 2024-01-21 06:00

Participant registration and the call for activities are already open for MiniDebConf Belo Horizonte 2024, which will take place from April 27 to 30 at the Campus Pampulha of UFMG.

This year we are offering food, lodging, and travel grants for active contributors to the Debian Project.

Sponsorship:

To make the MiniDebConf happen, we are seeking financial sponsorship from companies and other organizations. So if you work at a company or organization (or know someone who does), point them to our sponsorship plan. There you will find the price of each tier and its benefits.

Crowdfunding:

You can also help make the MiniDebConf happen through our crowdfunding campaign!

Make a donation of any amount and have your name published on the event website as a supporter of MiniDebConf Belo Horizonte 2024.

Even if you do not plan to come to Belo Horizonte to attend the event, you can donate and thereby contribute to the most important Debian Project event in Brazil.

Contact

If you have any questions, send an email to contato@debianbrasil.org.br

Organization

Categories: FLOSS Project Planets

TechBeamers Python: LangChain Agent Basics with Sample Agent Code

Planet Python - Sun, 2024-01-21 01:29

LangChain agents are fascinating creatures! They live in the world of text and code, interacting with humans through conversations and completing tasks based on instructions. Think of them as your digital assistants, but powered by artificial intelligence and fueled by language models. Getting Started with Agents in LangChain Imagine a chatty robot friend that gets […]

The post LangChain Agent Basics with Sample Agent Code appeared first on TechBeamers.

Categories: FLOSS Project Planets

TechBeamers Python: LangChain Memory Basics

Planet Python - Sun, 2024-01-21 00:34

Langchain Memory is like a brain for your conversational agents. It remembers past chats, making conversations flow smoothly and feel more personal. Think of it like chatting with a real friend who recalls what you talked about before. This makes the agent seem smarter and more helpful. Getting Started with Memory in LangChain Imagine you’re […]

The post LangChain Memory Basics appeared first on TechBeamers.

Categories: FLOSS Project Planets

TypeThePipe: Data Engineering Bootcamp 2024 (Week 1) Docker & Terraform

Planet Python - Sat, 2024-01-20 19:00

Free Data Engineering Bootcamp 2024 to become a skilled Data Analytics Engineer. Week 1

I’ve just enrolled in the free DataTalks Data Engineering bootcamp. It’s a fantastic initiative that has been running for several years, with a new cohort starting each year.

The course is organized by weeks, with one online session per week. There are optional weekly homework assignments that are reviewed, and the course concludes with a mandatory Data Engineering final project, which is required to earn the certification.

In this series of posts, I will share my course notes and comments, as well as how I’m solving the homework.


1. Dockerized data pipeline (intro, Dockerfile and Docker Compose)

Let’s delve into the essentials of Docker, Dockerfile, and Docker Compose. These three components are crucial in the world of software development, especially when dealing with application deployment and management.

Docker: The Cornerstone of Containerization

Docker stands at the forefront of containerization technology. It allows developers to package applications and their dependencies into containers. A container is an isolated environment, akin to a lightweight, standalone, and secure package of software that includes everything needed to run it: code, runtime, system tools, system libraries, and settings. This technology ensures consistency across multiple development and release cycles, standardizing your environment across different stages.

Dockerfile: Blueprint for Docker images

A Dockerfile is a text document containing all the commands a user could call on the command line to assemble a Docker image. It automates the process of creating Docker images. A Dockerfile defines what goes on in the environment inside your container. It allows you to create a container that meets your specific needs, which can then be run on any Docker-enabled machine.

Docker Compose: Simplifying multi-container applications

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services, networks, and volumes. Then, with a single command, you create and start all the services from your configuration. Docker Compose works in all environments: production, staging, development, testing, as well as CI workflows.

Why do these tools matter?

The combination of Docker, Dockerfile, and Docker Compose streamlines the process of developing, shipping, and running applications. Docker encapsulates your application and its environment, Dockerfile builds the image for this environment, and Docker Compose manages the orchestration of multi-container setups. Together, they provide a robust and efficient way to handle the lifecycle of applications. This ecosystem is integral for developers looking to leverage the benefits of containerization for more reliable, scalable, and collaborative software development.

Get Docker and read the documentation there.


What is Docker and why is it useful for Data Engineering?

Alright, data engineers, gather around! Why should you care about containerization and Docker? Well, it’s like having a Swiss Army knife in your tech toolkit. Here’s why:

  • Local Experimentation: Setting up things locally for experiments becomes a breeze. No more wrestling with conflicting dependencies or environments. Docker keeps it clean and easy.

  • Testing and CI Made Simple: Integration tests and CI/CD pipelines? Docker smoothens these processes. It’s like having a rehearsal space for your code before it hits the big stage.

  • Batch Jobs and Beyond: While Docker plays nice with AWS Batch, Kubernetes jobs, and more (though that’s a story for another day), it’s a glimpse into the world of possibilities with containerization.

  • Spark Joy: If you’re working with Spark, Docker can be a game-changer. It’s like having a consistent and controlled playground for your data processing.

  • Serverless, Stress-less: With the rise of serverless architectures like AWS Lambda, Docker ensures that you’re developing in an environment that mirrors your production setup. No more surprises!

So, there you have it. Containers are everywhere, and Docker is leading the parade. It’s not just a tool; it’s an essential part of the modern software development and deployment process.


Run Postgres and PGAdmin containers

You may need to create a network so that the containers can communicate. Then, run the Postgres database container. Notice that, since Docker containers are stateless and reinitialized on each run, you should mount a Docker volume to persist the Postgres internal files and the ingested data.

docker network create pg-network

docker run -it \
  -e POSTGRES_USER="root" \
  -e POSTGRES_PASSWORD="root" \
  -e POSTGRES_DB="ny_taxi" \
  -v /Users/jobandtalent/data-eng-bootcamp/ny_taxi_pg_data:/var/lib/postgresql/data \
  -p 5437:5432 \
  --name pg-database \
  --network=pg-network \
  postgres:13

Let's create and start the PG Admin container:

docker run -it \
  -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
  -e PGADMIN_DEFAULT_PASSWORD="root" \
  -p 8080:80 \
  --name pgadmin \
  --network=pg-network \
  dpage/pgadmin4


Manage multiple containers with the Docker-compose file

Instead of managing our containers individually, it is much better to manage them from one single source. The docker-compose file allows you to specify the services/containers you want to build and run, from the image to run to the environment variables and volumes.

version: "3.11" services: pg-database: image: postgres:13 environment: - POSTGRES_USER=root - POSTGRES_PASSWORD=root - POSTGRES_DB=ny_taxi volumes: - ./ny_taxi_pg_data:/var/lib/postgresql/data ports: - 5432:5432 pg-admin: image: dpage/pgadmin4 environment: - PGADMIN_DEFAULT_EMAIL=admin@admin.com - PGADMIN_DEFAULT_PASSWORD=root ports: - 8080:80 volumes: - "pgadmin_conn_data:/var/lib/pgadmin:rw" volumes: pgadmin_conn_data:


Create your pipeline script and a Dockerfile

The pipeline script’s objective is to download the data from the US taxi rides dataset and insert it into the Postgres database. The script could be as simple as:

import polars as pl
from pydantic_settings import BaseSettings, SettingsConfigDict
from typing import ClassVar

TRIPS_TABLE_NAME = "green_taxi_trips"
ZONES_TABLE_NAME = "zones"


class PgConn(BaseSettings):
    model_config = SettingsConfigDict(env_prefix='PG_', env_file='.env', env_file_encoding='utf-8')

    user: str
    pwd: str
    host: str
    port: int
    db: str
    connector: ClassVar[str] = "postgresql"

    @property
    def uri(self):
        return f"{self.connector}://{self.user}:{self.pwd}@{self.host}:{self.port}/{self.db}"


df_ny_taxi = pl.read_csv("https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-09.csv.gz")
df_zones = pl.read_csv("https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv")

conn = PgConn()
df_zones.write_database(ZONES_TABLE_NAME, conn.uri)
df_ny_taxi.write_database(TRIPS_TABLE_NAME, conn.uri)

We have used the Polars and Pydantic (v2) libraries. With Polars, we load the data from the CSVs and also write it to the database with the write_database DataFrame method. We use Pydantic to create a Postgres connection object that loads its configuration from the environment. For convenience we are using an .env config file, but it is not mandatory. As the next chunk of code shows, we copy the .env file into the container from the Dockerfile, so the variables remain accessible from inside it and are trivial to load using BaseSettings and SettingsConfigDict.

Now, in order to Dockerize our data pipeline script, which, as we just saw, downloads the data and ingests it into Postgres, we need to create a Dockerfile with the container specification. We are using Poetry as the dependency manager, so we need to include the pyproject.toml (a multi-purpose file, used here to tell Poetry the desired module version constraints) and the poetry.lock (Poetry’s pin of exact package versions, respecting the constraints from pyproject.toml). We also copy the actual ingest_data.py pipeline file into the container.

FROM python:3.11

ARG POETRY_VERSION=1.7.1

WORKDIR /app

COPY ingest_data.py ingest_data.py
COPY .env .env
COPY pyproject.toml pyproject.toml
COPY poetry.lock poetry.lock

RUN pip3 install --no-cache-dir poetry==${POETRY_VERSION} \
    && poetry env use 3.11 \
    && poetry install --no-cache

ENTRYPOINT [ "poetry", "run", "python", "ingest_data.py" ]

Just by building the image in our root folder and running the container, the ingest_data.py script will be executed, and the data will be downloaded and persisted in the Postgres database.
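As a rough sketch, building and running the pipeline could look like the following; the image tag taxi_ingest:v1 is just an example name, and it assumes the .env file points the PG_ settings at the pg-database container on the pg-network created earlier:

# Build the image from the Dockerfile in the current (root) folder
docker build -t taxi_ingest:v1 .

# Run it on the same network as Postgres so the hostname in .env resolves
docker run -it --network=pg-network taxi_ingest:v1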


2. Terraform: Manage your GCP infra. Google Storage and Google BigQuery


Terraform intro and basic commands

Terraform has become a key tool in modern infrastructure management. Terraform, named with a nod to the concept of terraforming planets, applies a similar idea to cloud and local platform infrastructure. It’s about creating and managing the necessary environment for software to run efficiently on platforms like AWS, GCP…

Terraform, developed by HashiCorp, is described as an “infrastructure as code” tool. It allows users to define and manage both cloud-based and on-premises resources in human-readable configuration files. These files can be versioned, reused, and shared, offering a consistent workflow to manage infrastructure throughout its lifecycle. The advantages of using Terraform include simplicity in tracking and modifying infrastructure, ease of collaboration (since configurations are file-based and can be shared on platforms like GitHub), and reproducibility. For instance, an infrastructure set up in a development environment can be replicated in production with minor adjustments. Additionally, Terraform helps in ensuring that resources are properly removed when no longer needed, avoiding unnecessary costs.

So this tool not only simplifies the process of infrastructure management but also ensures consistency and compliance with your infrastructure setup.

However, it’s important to note what Terraform is not. It doesn’t handle the deployment or updating of software on the infrastructure; it’s focused solely on the infrastructure itself. It doesn’t allow modifications to immutable resources without destroying and recreating them. For example, changing the type of a virtual machine would require its recreation. Terraform also only manages resources defined within its configuration files.


Set up Terraform for GCP deploys. From GCP account permissions to the main.tf file

Diving into the world of cloud infrastructure can be a daunting task, but with tools like Terraform, the process becomes more manageable and streamlined. Terraform, an open-source infrastructure as code software tool, allows users to define and provision a datacenter infrastructure using a high-level configuration language. Here’s a guide to setting up Terraform for Google Cloud Platform (GCP).

Creating a Service Account in GCP

Before we start coding with Terraform, it’s essential to establish a method for Terraform on our local machine to communicate with GCP. This involves setting up a service account in GCP – a special type of account used by applications, as opposed to individuals, to interact with the GCP services.

Creating a service account is straightforward. Log into the GCP console, navigate to the “IAM & Admin” section, and create a new service account. This account should be given specific permissions relevant to the resources you plan to manage with Terraform, such as Cloud Storage or BigQuery.

Once the service account is created, the next step is to manage its keys. These keys are crucial as they authenticate and authorize the Terraform script to perform actions in GCP. It’s vital to handle these keys with care, as they can be used to access your GCP resources. You should never expose these credentials publicly.
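If you prefer the command line over the console, the same setup can be sketched with the gcloud CLI; the account name, project ID and roles below are placeholders, and the roles you grant should match what your Terraform code will manage:

# Create the service account (names/IDs are placeholders)
gcloud iam service-accounts create terraform-runner \
  --project my-gcp-project

# Grant it the roles your Terraform config needs, e.g. Storage and BigQuery admin
gcloud projects add-iam-policy-binding my-gcp-project \
  --member "serviceAccount:terraform-runner@my-gcp-project.iam.gserviceaccount.com" \
  --role "roles/storage.admin"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member "serviceAccount:terraform-runner@my-gcp-project.iam.gserviceaccount.com" \
  --role "roles/bigquery.admin"

# Download a JSON key for local use (keep it out of version control!)
gcloud iam service-accounts keys create ./keys/my_creds.json \
  --iam-account terraform-runner@my-gcp-project.iam.gserviceaccount.com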

Setting Up Your Local Environment

After downloading the key as a JSON file, store it securely in your local environment. It’s recommended to create a dedicated directory for these keys to avoid any accidental uploads, especially if you’re using version control like Git.

Remember, you can configure Terraform to use these credentials in several ways. One common method is to set an environment variable pointing to the JSON file, but you can also specify the path directly in your Terraform configuration.
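For example, one common way to do this for the Google provider is the GOOGLE_APPLICATION_CREDENTIALS environment variable; the path below matches the keys directory used later in variables.tf:

# Point Terraform's Google provider at the service-account key for this shell session
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/keys/my_creds.json"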

Writing Terraform Configuration

With the service account set up, you can begin writing your Terraform configuration. This is done in a file typically named main.tf. In this file, you define your provider (in this case, GCP) and the resources you wish to create, update, or delete.

For instance, if you’re setting up a GCP storage bucket, you would define it in your main.tf file. Terraform configurations are declarative, meaning you describe your desired state, and Terraform figures out how to achieve it. You are then ready to run terraform init to start your project.

Planning and Applying Changes

Before applying any changes, it’s good practice to run terraform plan. This command shows what Terraform will do without actually making any changes. It’s a great way to catch errors or unintended actions.

Once you’re satisfied with the plan, run terraform apply to make the changes. Terraform will then reach out to GCP and make the necessary adjustments to match your configuration.

Cleaning Up: Terraform Destroy

When you no longer need the resources, Terraform makes it easy to clean up. Running terraform destroy will remove the resources defined in your Terraform configuration from your GCP account.
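Putting the commands from this section together, a minimal end-to-end workflow from the folder containing main.tf looks roughly like this:

terraform init      # download the provider plugins declared in the configuration
terraform plan      # preview the changes without touching GCP
terraform apply     # create or update the bucket and BigQuery dataset
terraform destroy   # remove everything defined in the configuration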

Lastly, a word on security: If you’re storing your Terraform configuration in a version control system like Git, be mindful of what you commit. Ensure that your service account keys and other sensitive data are not pushed to public repositories. Using a .gitignore file to exclude these sensitive files is a best practice.
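As a small sketch, the sensitive paths could be appended to .gitignore from the shell like this (adjust the names to your own layout, e.g. the keys directory used below):

# Keep service-account keys and Terraform state out of version control
cat >> .gitignore <<'EOF'
keys/
*.tfstate
*.tfstate.backup
.terraform/
EOF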

For instance, our main.tf file for creating a GCP Storage Bucket and a BigQuery dataset looks like this:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "5.12.0"
    }
  }
}

provider "google" {
  credentials = var.credentials
  project     = "concise-quarter-411516"
  region      = "us-central1"
}

resource "google_storage_bucket" "demo_bucket" {
  name          = var.gsc_bucket_name
  location      = var.location
  force_destroy = true

  lifecycle_rule {
    condition {
      age = 1
    }
    action {
      type = "AbortIncompleteMultipartUpload"
    }
  }
}

resource "google_bigquery_dataset" "demo_dataset" {
  dataset_id = var.bq_dataset_name
}

As you may have noticed, some of the values are plain strings/ints/floats, while others are var.* references. In the next section we talk about keeping the Terraform files tidy with the use of variables.


Parametrize files with variables.tf

Terraform variables offer a centralized and reusable way to manage values in infrastructure automation, separate from deployment plans. They are categorized into two main types: input variables for configuring infrastructure and output variables for retrieving information post-deployment. Input variables define values like server configurations and can be strings, lists, maps, or booleans. String variables simplify complex values, lists represent indexed values, maps store key-value pairs, and booleans handle true/false conditions.

Output variables are used to extract details like IP addresses after the infrastructure is deployed. Variables can be predefined in a file or via command-line, enhancing flexibility and readability. They also support overriding at deployment, allowing for customized infrastructure management. Sensitive information can be set as environmental variables, prefixed with TF_VAR_, for enhanced security. Terraform variables are essential for clear, manageable, and secure infrastructure plans.
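For instance, using the variable names defined in the next section, a value can be supplied per run either through a TF_VAR_-prefixed environment variable or with the -var flag (a sketch, not part of the original setup):

# Provide the credentials path through the environment (TF_VAR_ prefix)
export TF_VAR_credentials="./keys/my_creds.json"

# ...or override a single variable just for this run
terraform apply -var="gsc_bucket_name=terraform-demo-20240115-demo-bucket"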

In our case, our variables.tf file looks like this:

variable "credentials" { default = "./keys/my_creds.json" } variable "location" { default = "US" } variable "bq_dataset_name" { description = "BigQuery dataset name" default = "demo_dataset" } variable "gcs_storage_class" { description = "Bucket Storage class" default = "STANDARD" } variable "gsc_bucket_name" { description = "Storage bucket name" default = "terraform-demo-20240115-demo-bucket" }

We are parametrizing here the credentials file, the bucket location, the storage class, the bucket name…

As we’ve discussed, mastering Terraform variables is a key step towards advanced infrastructure automation and efficient code management.

For more information about Terraform variables, you can visit this post.



Stay updated on the Data Engineering and Data Analytics Engineer career paths

This was the content I gathered for the very first week of the DataTalks Data Engineering bootcamp. I’ve definitely enjoyed it and I’m excited to continue with Week 2.

If you want to stay updated and get the homework along with explanations…

Subscribe for Data Eng content and explained homework!
Categories: FLOSS Project Planets

Debug symbols for all!

Planet KDE - Sat, 2024-01-20 19:00

When running Linux software and encountering a crash, and you make a bug report about it (thank you!), you may be asked for backtraces and debug symbols.

And if you're not a developer, you may wonder what in the heck those are.

I wanted to open up this topic a bit, but if you want a more technical, in-depth look into these things, the internet is full of info. :)

This is more of a guide for any ordinary user who encounters this situation, on what they can do to get these mystical backtraces, symbols and magic to the devs.

Backtrace

When developers ask for a backtrace, they're basically asking "what are the steps that caused this crash to happen?" Debugger software can show this really nicely, line by line. However without correct debug symbols, the backtrace can be meaningless.

But first, how do you get a backtrace of something?

On systems with systemd installed, you often have a terminal tool called coredumpctl. This tool can list many crashes you have had with software. When you see something say "segmentation fault, core dumped", this is the tool that can show you those core dumps.

So, here's a few ways to use it!

How to see all my crashes (coredumps)

Just type coredumpctl in the terminal and a list opens. It shows you a lot of information and, last, the app name.

How to open a specific coredump in a debugger

First, check in the plain coredumpctl list for the coredump you want to look at. The easiest way to identify it is by the date and time. In that row there is also something called a PID number, for example 12345. You can close the list by pressing q and then type coredumpctl debug 12345.

This will often open GDB, where you can type bt for it to start printing the backtrace. You can then copy that backtrace. But there's, IMO, an easier way.
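Recapping the manual route as a quick sketch (the PID 12345 is just an example taken from the list):

coredumpctl                  # list all recorded crashes
coredumpctl debug 12345      # open the chosen coredump (by PID) in GDB
# inside GDB: type bt to print the backtrace, then q to quit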

Can I just get the backtrace automatically in a file..?

If you only want the latest coredump of the app that crashed on you, and to print the backtrace into a text file that you can just send to the devs, here's a one-liner to run in the terminal:

coredumpctl debug APP_NAME_HERE -A "-ex bt -ex quit" |& tee backtrace.txt

You can also use the PID shown earlier in place of the app name, if you want some specific coredump.

The above command will open the coredump in a debugger, run the bt command, then quit, and it will write it all down in a file called backtrace.txt that you can share with developers.

As always when using debugging and logging features, check the file for possible personal data! It's very unlikely to contain any personal data, BUT it's still good practice to check!

Here's a small snippet from a backtrace I have for Kate text editor:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
...
#18 0x00007f5653fbcdb9 in parse_file (table=table@entry=0x19d5a60, file=file@entry=0x19c8590, file_name=file_name@entry=0x7f5618001590 "/usr/share/X11/locale/en_US.UTF-8/Compose") at ../src/compose/parser.c:749
#19 0x00007f5653fc5ce0 in xkb_compose_table_new_from_locale (ctx=0x1b0cc80, locale=0x18773d0 "en_IE.UTF-8", flags=<optimized out>) at ../src/compose/table.c:217
#20 0x00007f565138a506 in QtWaylandClient::QWaylandInputContext::ensureInitialized (this=0x36e63c0) at /usr/src/debug/qt6-qtwayland-6.6.0-1.fc39.x86_64/src/client/qwaylandinputcontext.cpp:228
#21 QtWaylandClient::QWaylandInputContext::ensureInitialized (this=0x36e63c0) at /usr/src/debug/qt6-qtwayland-6.6.0-1.fc39.x86_64/src/client/qwaylandinputcontext.cpp:214
#22 QtWaylandClient::QWaylandInputContext::filterEvent (this=0x36e63c0, event=0x7ffd27940c50) at /usr/src/debug/qt6-qtwayland-6.6.0-1.fc39.x86_64/src/client/qwaylandinputcontext.cpp:252
...

The first number is the step where we are. Step #0 is where the app crashed. The last step is where the application started running. Keep in mind, though, that even if the app crashes at #0, that step may just be the computer handling the crash rather than the actual culprit. The culprit for the crash can be anywhere in the backtrace, so you have to do some detective work if you want to figure it out. Often crashes happen when some code execution path goes down an unexpected route that the program is not prepared for.

Remember that you will, however, need proper debug symbols for this to be useful! We'll check that out in the next chapter.

Debug symbols

Debug symbols are something that tells the developer using debugger software, like GDB, what is going on and where. Without debugging symbols the debugger can only show the developer more obfuscated data.

I find this easier to show with an example:

Without debug symbols, this is what the developer sees when reading the backtrace:

0x00007f7e9e29d4e8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /lib64/libQt5Core.so.5

Or, in an even worse scenario where the debugger can't read what's going on and can only see the "mangled" names, it can look like this:

_ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE

Now, those are not very helpful. At least the first example tells which file the error is happening in, but it doesn't really tell where. And the second example is just very difficult to make sense of; you don't even see what file it is.

With correct debug symbols installed however, this is what the developer sees:

QCoreApplication::notifyInternal2(QObject*, QEvent*) (receiver=0x7fe88c001620, event=0x7fe888002c20) at kernel/qcoreapplication.cpp:1064

As you can see, it shows the file and line. This is super helpful, since developers can just open the file at this location and start mulling it over. No need to guess which line it may have happened on; it's right there!

So, where to get the debug symbols?

Every distro has its own way, but the KDE wiki has an excellent list of the most common operating systems and how to get debug symbols on them: https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports

As always, double-check with your distro's official documentation how to proceed. But the above link is a good starting point!

But basically, your package manager should have them. If not, you will have to build the app yourself with debug symbols enabled, which is definitely not ideal. If the above list does not have your distro/OS, you may have to ask its maintainers for help with getting the debug symbols installed.
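To give a rough idea, here are a few distro-specific examples using Kate; the exact package names and repositories vary, so treat these as sketches and check your distro's documentation:

# Fedora: debuginfo packages via the dnf plugin
sudo dnf debuginfo-install kate

# openSUSE: -debuginfo packages (the debug repositories must be enabled)
sudo zypper install kate-debuginfo

# Ubuntu: -dbgsym packages (the debug symbol "ddebs" archive must be enabled)
sudo apt install kate-dbgsym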

Wait, which ones do I download?!

Usually the ones for the app that is crashing. Sometimes you may also need to include the ones for the libraries the app is using.

There is no real direct answer to this, but at the very least, get the debug symbols for the app. If developers need more, they will ask you to install the other ones too.

You can uninstall the debug symbols after you're done, but that's up to you.

Thanks for reading!

I hope this has been useful! I especially hope the terminal "oneliner" command mentioned above for printing backtraces quickly into a file is useful for you!

Happy backtracing! :)

Categories: FLOSS Project Planets

TechBeamers Python: Understanding Python Timestamps: A Simple Guide

Planet Python - Sat, 2024-01-20 14:19

In Python, a timestamp marks a point in time. It’s a number that tells you when something happened. This guide will help you get comfortable with timestamps in Python. We’ll talk about how to make them, change them, and why they’re useful. Getting Started with Timestamp in Python A timestamp is just a way of saying, […]

The post Understanding Python Timestamps: A Simple Guide appeared first on TechBeamers.

Categories: FLOSS Project Planets

Improving the qcolor-from-literal Clazy check

Planet KDE - Sat, 2024-01-20 13:26

For all of you who don't know, Clazy is a clang compiler plugin that adds checks for Qt semantics. I have it as my default compiler, because it gives me useful hints when writing or working with preexisting code. Recently, I decided to give working on the project a try! One bigger contribution of mine was to the qcolor-from-literal check, which is a performance optimization. A QColor object has different constructors; this check is about the string constructor, which may accept standardized colors like “lightblue”, but also color patterns. Those can have different formats, but all provide an RGB value and optionally transparency. Having Qt parse these as strings causes performance overhead compared to the alternatives.
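If you want to try the check on your own code, one rough sketch (assuming a CMake-based project and an installed clazy) is to build with clazy as the compiler and restrict it to this one check via the CLAZY_CHECKS environment variable:

# Configure the project to compile with clazy instead of plain clang
cmake -DCMAKE_CXX_COMPILER=clazy ..

# Enable only this check and rebuild; warnings and fixits appear in the compiler output
export CLAZY_CHECKS="qcolor-from-literal"
make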

Fixits for RGB/RGBA patterns

When using a color pattern like “#123” or “#112233”, you may simply replace the string parameter with an integer providing the same value. Rather than getting a generic warning about using this other constructor, a more specific warning with a replacement text (called fixit) is emitted.

testfile.cpp:92:16: warning: The QColor ctor taking RGB int value is cheaper than one taking string literals [-Wclazy-qcolor-from-literal]
    QColor("#123");
           ^~~~~~
           0x112233
testfile.cpp:93:16: warning: The QColor ctor taking RGB int value is cheaper than one taking string literals [-Wclazy-qcolor-from-literal]
    QColor("#112233");
           ^~~~~~~~~
           0x112233

In case a transparency parameter is specified, the fixit and message are adjusted:

testfile.cpp:92:16: warning: The QColor ctor taking ints is cheaper than one taking string literals [-Wclazy-qcolor-from-literal]
    QColor("#9931363b");
           ^~~~~~~~~~~
           0x31, 0x36, 0x3b, 0x99

Warnings for invalid color patterns

In addition to providing fixits for more optimized code, the check now verifies that the provided pattern is valid in terms of length and contained characters. Without this addition, an invalid pattern would be silently ignored or an improper fixit would be suggested.

.../qcolor-from-literal/main.cpp:21:28: warning: Pattern length does not match any supported one by QColor, check the documentation [-Wclazy-qcolor-from-literal]
    QColor invalidPattern1("#0000011112222");
                           ^
.../qcolor-from-literal/main.cpp:22:28: warning: QColor pattern may only contain hexadecimal digits [-Wclazy-qcolor-from-literal]
    QColor invalidPattern2("#G00011112222");

Fixing a misleading warning for more precise patterns

In case a “#RRRGGGBBB” or “#RRRRGGGGBBBB” pattern is used, the message would previously suggest using the constructor taking ints. This would however result in an invalid QColor, because the range from 0-255 is exceeded. QRgba64 should be used instead, which provides higher precision.

I hope you find this new or rather improved feature of Clazy useful! I utilized the fixits in Kirigami, see https://invent.kde.org/frameworks/kirigami/-/commit/8e4a5fb30cc014cfc7abd9c58bf3b5f27f468168. Doing the change manually in Kirigami would have been way faster, but less fun. Also, we wouldn't have ended up with better tooling :)

Categories: FLOSS Project Planets
