SciForce Blog

Read our blog and carry on

Learn who we are and why we stand out from the rest.

What is GPT-3, How Does It Work, and What Does It Actually Do?
How to tell a fantastic data story

What are the central parts of any data-driven story, and how do you apply the Gestalt laws to your data visualization reports? What role does context play in your data story? We dwell on these questions and provide you with a list of the best books and tools for stunning data visualization. Check it out!

First, let us start with the central part of any good data story: the data-ink ratio. You have probably heard of or read something by Edward Tufte, the father of data visualization, who coined the term. The data-ink ratio is the amount of data-ink divided by the total ink used in your visualization. Every element of your infographic needs a reason to be there. Simply put, Tufte says that you should use only the required details and remove the visual noise. How far can you go? As far as your visualization still communicates the overall idea. Daniel Haight calls this the visualization spectrum: the constant trade-off between clarity and engagement. Daniel proposes measuring clarity by the time needed to understand the visualization, which depends on information density. Engagement, in turn, can be measured by the emotional connection your data story creates (besides shares and mentions on social media).

Take FiveThirtyEight's data story about women of color in the US Congress as an example. The authors use simple elements and do not overload the viewer with needless details, yet still communicate the overall story crystal clear. At the same time, looking at the differently colored timelines, you can see the drastic changes behind them. That is a pretty good compromise between clarity and engagement.

Edward Tufte also coined the term chartjunk, which stands for all the ugly visualizations you can see endlessly online. 11 Reasons Infographics Are Poison And Should Never Be Used On The Internet Again by Walt Hickey, dating back to 2013, is still topical. So it is better to follow principles that keep your infographics off that list.

Gestalt principles are heuristics, mental shortcuts for the brain, that explain how we group small objects into larger ones. For instance, we tend to perceive things located close to each other as a group; this is the principle of proximity. Also check out the post by Becca Selah, rich in quick and easy tips on clear data visualization.

In essence, color conveys information in the way the human brain processes most effectively, but choosing it according to the rules can be frustrating. As a rule of thumb, remember that colors of the same saturation are perceived by the viewer as a group. Also, check out the excellent guide by Lisa Charlotte Rost (Datawrapper); it is the best resource we have seen for beginners looking for color-picking tools. Pro tip: gray does not fight for the viewer's attention and should become your best friend. Andy Kirk tells more about it here.

First, let us make it clear: a data visualization specialist is a data analyst. Your primary task is to separate the signal from the noise, i.e., to find the hidden gems in tons of data. But presenting your findings with good design in mind is not enough if no context is introduced to your audience. This is where storytelling comes in handy. Who is your audience, and how will they see it? This question is relevant in both of the cases below. At the outset of your data-driven story, define your focus: is it a broad or a narrow one?
With a broad focus, you will spend lots of time digging into the data, so it is crucial to formulate your central question. Working with a narrow focus is different: you start with specific prerequisites and harness several datasets to find the answer. It might be easier to tackle one specific inquiry than to hunt for insights across datasets.

Consider placing simple insights at the beginning of your story: you draw the reader in immediately and can then add relevant points to better illustrate the primary idea. But when it comes to "well, it depends on what we are talking about" answers, be careful with your audience and guide them step by step into your story. You can compare elements across different periods or apply analogies. It is also helpful to show individual data points on a small scale before delving into the large-scale story. Besides links to the datasets you worked with while crafting your story, it is worth sharing your methodology, so your savvy audience can relax while looking at the results.

You may read tons of blogs on data visualization, but we believe in the old-fashioned way: build a stable foundation with the classics first. Here is the list we recommend, depending on the tasks you are solving.

Edward Tufte is the father of data visualization, so we recommend starting with his books to master the main ideas: The Visual Display of Quantitative Information, Envisioning Information, Beautiful Evidence, and Visual Explanations. Master them and you are a rock star of data visualization.

Naked Statistics: Stripping the Dread from the Data by Charles Wheelan. Applying statistics to your analysis is crucial, and this book delves into principal concepts like inference, correlation, and regression analysis.

How Not to Be Wrong: The Power of Mathematical Thinking by Jordan Ellenberg. We recommend it for those coming to DataViz from a non-tech background.

Statistics Unplugged by Sally Caldwell. Again, statistics explained in a way you will love.

Interactive Data Visualization for the Web: An Introduction to Designing with D3 by Scott Murray comes in handy for creating online visualizations even if you have no experience with web development.

D3.js in Action: Data Visualization with JavaScript by Elijah Meeks is a guide to creating interactive graphics with D3.

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham helps you brush up your coding skills in R.

Data Visualisation: A Handbook for Data Driven Design by Andy Kirk. This one can help you choose the best visualization for your data, as your insights should ideally be as clear as they are engaging.

Visualization Analysis and Design by Tamara Munzner presents a comprehensive, systematic approach to design for DataViz.

Information Visualization: Perception for Design by Colin Ware is the cherry on the cake for your design skills.

However, if you have data visualization tasks right now and no time for upskilling, we recommend online tools like Flourish or Datawrapper. Mala Deep covers five free data visualization tools pretty clearly; check it out! Also, we would appreciate your suggestions in the comments section!
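As a tiny practical illustration of the data-ink-ratio advice above, here is a minimal, hypothetical matplotlib sketch: the frame, tick marks, and bright colors are stripped away, gray is used for context bars, and the key value is labeled directly. The data and the highlighted category are invented for the example.

```python
# A small, illustrative sketch of "maximize data-ink, minimize chartjunk".
# The data, labels, and color choices are made up for demonstration only.
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [12, 34, 9, 21]
highlight = "B"  # the one bar the story is about

colors = ["#1f77b4" if c == highlight else "#c7c7c7" for c in categories]

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(categories, values, color=colors)

# Remove non-data ink: the frame, tick marks, and y-axis add no information here.
for spine in ("top", "right", "left"):
    ax.spines[spine].set_visible(False)
ax.tick_params(left=False)
ax.set_yticks([])

# Label the highlighted bar directly instead of forcing the reader to a legend.
ax.annotate("34", xy=(1, 34), ha="center", va="bottom")
ax.set_title("Category B leads", loc="left")

plt.tight_layout()
plt.show()
```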

Generative models under a microscope: Comparing VAEs, GANs, and Flow-Based Models

Two years after Generative Adversarial Networks were introduced in a 2014 paper by Ian Goodfellow and other researchers, Yann LeCun, Facebook's AI research director and one of the most influential AI scientists, called adversarial training "the most interesting idea in the last ten years in ML." Interesting and promising as they are, GANs are only a part of the family of generative models, which offer a completely different angle on solving traditional AI problems.

Generative algorithms

When we think of Machine Learning, the first algorithms that come to mind are probably discriminative. Discriminative models, which predict a label or category of some input data based on its features, are the core of all classification and prediction solutions. In contrast to such models, generative algorithms help us tell a story about the data, providing a possible explanation of how the data has been generated. Instead of mapping features to labels, as discriminative algorithms do, generative models attempt to predict features given a label. While discriminative models define the relation between a label y and a feature x, generative models answer the question "how do you get x?" Generative models model P(Observation | Cause) and then use Bayes' theorem to compute P(Cause | Observation). In this way, they can capture p(x|y), the probability of x given y, that is, the probability of features given a label or category. So generative algorithms can be used as classifiers, but they can do much more, since they model the distribution of individual classes. There are many generative algorithms, yet the most popular deep generative models are Variational Autoencoders (VAEs), GANs, and flow-based models.

Variational Autoencoders

A Variational Autoencoder (VAE) is a generative model that "provides probabilistic descriptions of observations in latent spaces." Simply put, this means VAEs store latent attributes as probability distributions. The idea of the Variational Autoencoder (Kingma & Welling, 2014), or VAE, is deeply rooted in variational Bayesian and graphical model methods.

A standard autoencoder comprises two connected networks, an encoder and a decoder. The encoder takes in an input and converts it into a smaller representation, which the decoder can use to convert it back to the original input. However, the latent space they convert their inputs to, where their encoded vectors lie, may not be continuous or allow easy interpolation. For a generative model this becomes a problem, since you want to randomly sample from the latent space or generate variations on an input image from a continuous latent space.

Variational Autoencoders make their latent spaces continuous by design, allowing easy random sampling and interpolation. To achieve this, the encoder does not output a single encoding vector but, rather, two vectors of the same size: a vector of means and a vector of standard deviations. Each latent dimension then acts as its own Gaussian distribution. The two vectors form the parameters of a so-called latent vector of random variables: the i-th elements of the mean and standard deviation vectors are the mean and standard deviation of the i-th random variable. We sample from this distribution to obtain the encoding that is passed to the decoder, so the decoder works with random draws from these probability distributions rather than with fixed input vectors. This process is stochastic generation.
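As a rough illustration of the sampling step just described, here is a minimal NumPy sketch. The vectors and names are invented for the example; the point is only that the encoder's mean and log-variance vectors parameterize a diagonal Gaussian, and each pass draws a different latent code from it.

```python
# A minimal sketch of VAE-style stochastic sampling ("reparameterization"),
# assuming the encoder has already produced a mean vector and a log-variance
# vector for one input. All values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.2, 0.3])        # per-dimension means from the encoder
log_var = np.array([-0.1, 0.2, -0.5])  # per-dimension log-variances from the encoder

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, sigma^2) elementwise: z = mu + sigma * eps, eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Two passes over the same input give different encodings due to sampling.
z1 = sample_latent(mu, log_var, rng)
z2 = sample_latent(mu, log_var, rng)
print(z1, z2)
```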
This stochastic sampling implies that even for the same input, while the mean and standard deviation remain the same, the actual encoding will vary somewhat on every pass simply due to sampling.

The autoencoder's loss minimizes both the reconstruction loss (how similar the autoencoder's output is to its input) and the latent loss (how close its latent distributions are to a normal distribution). The smaller the latent loss, the less information can be encoded, which boosts the reconstruction loss. As a result, the VAE is locked in a trade-off between the latent loss and the reconstruction loss. When the latent loss is small, the generated images will resemble the images seen at training time too much, but they will look bad. If the reconstruction loss is small, the reconstructed images at training time will look good, but novel generated images will be far from the reconstructed ones. Obviously, we want both, so it is important to find a good equilibrium.

VAEs work with remarkably diverse types of data: sequential or non-sequential, continuous or discrete, even labeled or completely unlabeled, which makes them highly powerful generative tools. A major drawback of VAEs is the blurry outputs they generate. As suggested by Dosovitskiy & Brox, VAE models tend to produce unrealistic, blurry samples. This is caused by the way data distributions are recovered and loss functions are calculated. A 2017 paper by Zhao et al. suggested modifying VAEs to avoid the variational Bayes method and thereby improve output quality.

Generative Adversarial Networks

Generative Adversarial Networks, or GANs, are a deep-learning-based generative model able to generate new content. The GAN architecture was first described in the 2014 paper by Ian Goodfellow et al. titled "Generative Adversarial Networks." GANs adopt a supervised learning approach using two sub-models: the generator model, which generates new examples, and the discriminator model, which tries to classify examples as real or fake (generated).

GAN sub-models
Generator: the model used to generate new plausible examples from the problem domain.
Discriminator: the model used to classify examples as real (from the domain) or fake (generated).

The two models compete against each other in a zero-sum game. The generator directly produces samples. Its adversary, the discriminator, attempts to distinguish between samples drawn from the training data and samples drawn from the generator. The game continues until the discriminator model is fooled about half the time, meaning the generator model is generating plausible examples. (Image credit: Thalles Silva.)

"Zero-sum" means that when the discriminator successfully identifies real and fake samples, it is rewarded or its parameters remain the same, while the generator is penalized with large updates to its model parameters, and vice versa. Ideally, the generator generates perfect replicas from the input domain every time, and the discriminator cannot tell the difference, predicting "unsure" (e.g., 50% real) in every case. This is essentially an actor-critic model. It is important to remember that each model can overpower the other. If the discriminator is too good, it will return values so close to 0 or 1 that the generator will struggle to read the gradient. If the generator is too good, it will exploit weaknesses in the discriminator, leading to false negatives. Both neural networks must have a similar "skill level," achieved by tuning their respective learning rates.
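The adversarial game described above can be condensed into a few lines of training code. The following PyTorch sketch is only illustrative: the networks are tiny multilayer perceptrons, the "real" data is a toy Gaussian blob, and all names and hyperparameters are arbitrary assumptions rather than any specific published GAN.

```python
# A minimal, illustrative adversarial training loop (not a production GAN).
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in for samples from the training data distribution.
    return torch.randn(n, data_dim) * 0.5 + 2.0

for step in range(1000):
    # Discriminator update: classify real vs. generated samples.
    real = real_batch()
    fake = G(torch.randn(real.size(0), latent_dim)).detach()
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to make the discriminator label fakes as real.
    fake = G(torch.randn(64, latent_dim))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```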
Generator model

The generator takes a fixed-length random vector as input and generates a sample in the domain. The vector is drawn randomly from a Gaussian distribution. After training, points in this multidimensional vector space will correspond to points in the problem domain, forming a compressed representation of the data distribution. Similar to VAEs, this vector space is called a latent space, a vector space composed of latent variables. In the case of GANs, the generator applies meaning to points in a chosen latent space. New points drawn from the latent space can be provided to the generator model as input and used to generate new and different output examples. After training, the generator model is kept and used to generate new samples.

The discriminator model

The discriminator model takes an example as input (real, from the training dataset, or generated by the generator model) and predicts a binary class label of real or fake (generated). The discriminator is a normal (and well understood) classification model. After the training process, the discriminator is discarded, since we are interested in a robust generator.

GANs can produce viable samples and have stimulated a lot of interesting research and writing. However, there are downsides to using a GAN in its plain version:
Images are generated from some arbitrary noise. When generating a picture with specific features, you cannot determine which initial noise values would produce that picture; you have to search over the entire distribution.
A GAN only distinguishes between "real" and "fake" images. There is no constraint that a picture of a cat has to look like a cat, so a generated image may contain no actual object while its style still resembles the training pictures.
GANs take a long time to train. A GAN might take hours to train on a single GPU, and more than a day on a single CPU.

Flow-based models

Flow-based generative models are exact log-likelihood models with tractable sampling and latent-variable inference. In general terms, flow-based models apply a stack of invertible transformations to a sample from a prior so that the exact log-likelihood of observations can be computed. Unlike the previous two algorithms, the model learns the data distribution explicitly, and therefore the loss function is the negative log-likelihood. As a rule, a flow model f is constructed as an invertible transformation that maps a high-dimensional random variable x to a standard Gaussian latent variable z = f(x), as in nonlinear independent component analysis. The key idea in the design of a flow model is that it can be any bijective function and can be formed by stacking individual simple invertible transformations. Explicitly, the flow model f is constructed by composing a series of invertible flows as f(x) = f1 ∘ ··· ∘ fL(x), with each fi having a tractable inverse and a tractable Jacobian determinant. Flow-based models fall into two large categories: models with normalizing flows and models with autoregressive flows, which try to enhance the performance of the basic model.

Normalizing Flows

Being able to do good density estimation is essential for many machine learning problems, yet it is intrinsically complex: when we need to run backward propagation in deep learning models, the embedded probability distribution needs to be simple enough to calculate the derivative efficiently. The conventional solution is to use a Gaussian distribution in latent variable generative models, even though most real-world distributions are much more complicated.
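To make the "tractable inverse and Jacobian determinant" requirement concrete, the exact log-likelihood such a model maximizes follows from the change-of-variables formula. The intermediate variables h_i below are introduced here only for exposition, indexing the component transformations in the order they are applied to x:

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \sum_{i=1}^{L} \log\left|\det\frac{\partial f_i}{\partial h_{i-1}}\right|,
\qquad h_0 = x,\quad h_i = f_i(h_{i-1}),\quad z = h_L = f(x).
```

Maximizing this quantity is exactly minimizing the negative log-likelihood mentioned above, which is why every component transformation needs a cheap Jacobian determinant.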
Normalizing Flow (NF) models, such as RealNVP or Glow, provide a robust distribution approximation. They transform a simple distribution into a complex one by applying a sequence of invertible transformation functions. Flowing through a series of transformations, we repeatedly substitute the variable for the new one according to the change-of-variables theorem and eventually obtain a probability distribution of the final target variable.

Models with Autoregressive Flows

When the flow transformation in a normalizing flow is framed as an autoregressive model, where every dimension of a vector variable is conditioned on the preceding dimensions, this variation of a flow model is referred to as an autoregressive flow. It is a step forward compared to plain normalizing flow models. The popular autoregressive flow models are PixelCNN for image generation and WaveNet for 1-D audio signals. They both consist of a stack of causal convolutions: convolution operations that respect the ordering, so that the prediction at a specific timestamp consumes only data observed in the past. In PixelCNN, the causal convolution is performed by a masked convolution kernel. WaveNet shifts the output by several timestamps into the future so that the output is aligned with the last input element.

Flow-based models are conceptually attractive for modeling complex distributions but are limited by density-estimation performance issues compared to state-of-the-art autoregressive models. Besides, though flow models might in principle substitute for GANs and produce decent output, there is a significant gap in training cost between them, with flow-based models taking several times as long as GANs to generate images of the same resolution.

Conclusion

Each of the algorithms has its benefits and limitations in terms of accuracy and efficiency. While GANs and flow-based models usually generate better, closer-to-life images than VAEs, VAEs are more time- and parameter-efficient than flow-based models. In short, GANs are parallel and efficient but not reversible; flow models are reversible and parallel but not efficient; and VAEs are reversible and efficient but not parallel. In practice, this implies a constant trade-off between output quality, the training process, and efficiency.

How to Build Ethical AI: Guiding Principles and Actionable Steps

What springs to mind when you hear or read about AI ethics? Timnit Gebru, data biases, automated inequality, discrimination at scale? But have you ever thought about what we actually mean by the term "ethical AI"? Which ideas about ethical AI are shared and agreed upon? And how do you develop actionable steps toward ethical AI practices at your company? Let's dive right in.

There is no universal, conventionally shared practice worldwide, but the need for an ethical AI framework is rising. Meanwhile, some principles are mentioned more often than others when ethical AI policies are drafted, as per the research paper The global landscape of AI ethics guidelines by Anna Jobin et al. According to their findings, AI ethics mainly concentrates on the following five ethical principles: "transparency, justice and fairness, non-maleficence, responsibility and privacy." Skip to the next block if you are eager to learn the actionable steps for implementing ethical AI at your company. Here, we'd like to provide some foundational principles and useful links for further research.

Based on the OECD recommendations, you can think of transparency in AI as facilitating a general understanding of AI systems and providing all stakeholders with awareness of possible outcomes of, and interactions with, AI systems. It is part of a bigger story related to black-box and white-box AI principles. If you are interested in transparency practices in AI, check out our blog post Introduction to the White-Box AI: the Concept of Interpretability for more details.

Justice and fairness, as per the guidelines of the European Commission, relates to "ensuring equal and just distribution of both benefits and costs, and ensuring that individuals and groups are free from unfair bias, discrimination and stigmatisation." To make this principle more tangible, take automated gender recognition in commercial solutions as an example; there is still a lot to change there.

Non-maleficence means that "AI systems and the environments in which they operate must be safe and secure" and defended from malicious use, as the European Commission defines it. One of the interpretations proposed by Anna Jobin et al. implies "avoidance of specific risks or potential harms — for example, intentional misuse via cyberwarfare and malicious hacking — and suggest risk-management strategies."

Privacy is closely related to data governance, at least per the European Commission's guidelines, and relates to the principle of non-maleficence. AI actors should ensure that data collected or provided as an output of an AI system does not harm its users. Moreover, the regulation relates to "responsibility and accountability for AI systems and their outcomes, both before and after their development, deployment and use." Also check out this great list on AI ethics curated by Eirini Malliaraki for further research.

However, developing ethical AI practices is easier said than done. Shipping a bias-free AI-powered product can be tricky, as tech giants have demonstrated multiple times. So how do you transform nebulous theory into actionable steps at your company? First, AI ethics should not exist in a vacuum but be adjusted to the values and principles of your company. You will ensure a sustainable and scalable ethical AI program when you develop ethical AI practices on top of your mission and values. Consider the following steps. AI ethics is a responsibility of the whole company, but it starts at the C-level.
Executives can set the tempo for how employees take these guidelines. Usually, the data governance department deals with compliance and privacy issues, so it could also handle AI ethics challenges. Consider inviting other relevant experts and ethicists to strengthen your ethical AI team.

An internal document that explains your company's ethical standards in clear terms comes in handy in case of emergencies: product owners, data collectors, and managers will know their scenarios when a crisis comes knocking. Do not forget to define precise KPIs and quality assurance practices, and keep your framework updated.

Consider the specifics of your domain. For instance, in retail it is crucial to measure the quality of recommendations, which should be free of biased associations related to particular groups of society. Someone has already faced this challenge and probably figured out how to deal with it. Take the healthcare industry as an example: patients cannot be treated until they express their consent. The same should apply to user data that is collected, used, and shared further. All privacy details should be communicated clearly to ensure that users understand the possible outcomes. Check out our blog post HIPAA vs. GDPR: major acts regulating health data protection for more details.

Ensure that everyone knows about your code of conduct regarding ethical AI. Just a decade ago, companies paid little attention to cybersecurity, but now the average organization faces 22 security breaches every year, as per the State of Cybersecurity Report 2020 by Accenture, which makes cybersecurity a primary concern in every organization. Developing ethical AI practices is going the same way, so it is better to define the crucial ethical infrastructure at the outset. Start building a culture that nurtures an appropriate attitude toward ethical AI. Making everyone involved informed and motivated to respect the principles is an investment in the reputation of your AI-driven product.

It is crucial to constantly monitor changes in the AI world, since it is not possible to foresee all the outcomes. But developing an ethical AI framework, bearing the best practices in mind, fostering an AI-bias-aware culture at your company, and monitoring the changes around ethical AI will make your future safer.

To sum up: ethical AI is not an agreed-upon, conventionalized practice globally, but researchers suggest the following principles: every organization working with AI should consider "transparency, justice and fairness, non-maleficence, responsibility and privacy." In practice, we propose using existing organizational infrastructure, taking the best methods available, facilitating an AI-bias-aware culture, and constantly monitoring changes in this domain.

What’s Next for Generative Adversarial Networks (GANs): Latest Techniques and Applications

Generative Adversarial Networks (GANs) are constantly improving year over year. In October 2021, NVIDIA presented a new model, StyleGAN3, that outperforms StyleGAN2 with its hierarchical refinement. The new model resolves the "sticking" issues of StyleGAN2 and learns to mimic camera motion. Moreover, StyleGAN3 promises to improve models for video and animation generation. That is impressive progress compared to 2014, when GANs entered the picture with low-resolution images. We are also witnessing applications beyond simple image generation. They include, but are not limited to, medical products, training data, scientific simulation development, improvements to the augmented reality (AR) experience, and speech enhancement and generation. Let's delve into the most impressive applications we have so far!

Lots of articles illustrate GANs' abilities when it comes to image generation and editing. You have probably read about functions like face aging, text-to-image translation, frontal face view or pose generation, and so on. As a starter, we recommend 18 Impressive Applications of Generative Adversarial Networks (GANs) and Best Resources for Getting Started With GANs by Jason Brownlee. In this article, we'd like to go deeper into the latest real-life applications of GANs.

Labeling medical datasets is expensive and time-consuming, but it seems GANs have something to offer. Since GANs predominantly belong to the data augmentation techniques, we'd like to dwell on the latest updates in healthcare. In data augmentation, GANs help computer vision professionals who struggle with class imbalance in training datasets, which leads to biased models. GAN-based data augmentation helps fight overfitting and the inability to generalize to novel examples. This is how GANs increase performance for underrepresented classes in chest X-ray classification, as per the 2021 research of Sundaram et al. They showed that GAN-based data augmentation is more effective than standard data augmentation. Meanwhile, the researchers point out that GAN data augmentation was most effective when applied to small, significantly imbalanced datasets and has a limited impact on large datasets.

Researchers from the University of New Hampshire in the US also demonstrated that GAN-based data augmentation is beneficial for neuroimaging. Functional near-infrared spectroscopy (fNIRS) is a neuroimaging technique for mapping the functioning human cortex. fNIRS is used in brain-computer interfaces, so a large amount of new data for training deep learning classifiers is crucial. A conditional generative adversarial network (CGAN) combined with a CNN classifier led to 96.67% task classification accuracy, as per the 2021 research of Sajila D. Wickramaratne and Shaad Mahmud.

Researchers from the University of California, Berkeley and Glidewell Dental Labs presented one of the first real applications for medical product development. With the help of generative models, dental crowns can be designed with the same morphology quality that dental experts achieve. It takes a dental professional years of training to design such crowns, so this paves the way for mass customization of products in the healthcare industry. At the same time, GANs are a good fit for super-resolution medical imaging such as low-dose computed tomography (CT) and low-field magnetic resonance imaging (MRI).
A GAN-based method proposed in 2020, Medical Images Super-Resolution using Generative Adversarial Networks (MedSRGAN), increases radiologists' efficiency: it helps improve the quality of scans and avoid the harmful effects the procedure may bring.

Automatic speech recognition (ASR) is one of our areas of expertise. The speech enhancement GAN (SEGAN) is applied to noisy inputs to refine them and produce high-quality output. This function is crucial, for example, for people with speech impairments, so GANs could enhance their quality of life. Recently, Huy Phan et al. proposed using "multiple generators that are chained to perform a multi-stage enhancement." As the researchers state, the new models, ISEGAN and DSEGAN, perform better than SEGAN.

GANs also come in handy for augmented reality (AR) scenes thanks to their creative generation capabilities. For example, recent use cases include completing environmental maps with lighting, reflections, and shadows. ARShadowGAN, presented in 2020 by Daquan Liu et al., generates shadows for virtual objects in single-light scenes. This technology bridges the real-world environment and the virtual object's shadow without 3D geometric details or any explicit estimation of illumination.

When it comes to advertising, the phrase "time is money" means a lot. One of our use cases involved automated generation of advertising images at scale. For example, it costs a designer time and money to resize images for marketing campaigns, from social networks to Amazon platforms. However, the Super-Resolution Generative Adversarial Network (SRGAN) for single-image super-resolution can deal with this: using SRGAN, it is possible to produce high-quality resized images without any human design work.

"What could be better than data for a data scientist? More data!" This joke became a real application thanks to GANs. As any neural-network-based model is hungry for training data, generative models that can create labeled training data on demand could become a game-changer. For instance, Zhenghao Fei et al. (University of California, Davis) demonstrated how a semantically constrained GAN (CycleGAN plus a task-constrained network) can eliminate the labor-, cost-, and time-consuming process of data labeling, enabling more data-efficient and generalizable fruit detection. Simply put, the semantically constrained GAN can generate realistic day and night images of grapevines from 3D-rendered images while retaining grape position and size. Labeled data generation can also be beneficial in the NLP domain, supporting research on low-resource languages. For instance, Sangramsing Kayte used GANs for text-to-speech for low-resource languages of the Indian subcontinent.

Recent research shows that ML models can leak sensitive information contained in the training samples. For example, the 2021 paper This Person (Probably) Exists. Identity Membership Attacks Against GAN Generated Faces illustrates that many images of faces produced by GANs strongly resemble real faces taken from the training data. Researchers propose differential privacy, which could help networks learn the data distribution while securing the privacy of the training data.

GANs have demonstrated impressive progress compared to 2014, when Ian Goodfellow first introduced them. Although the technology is still in its infancy, we are already witnessing how GANs improve the design of medical products, automate image editing for advertising, and merge with AR technology.
At the same time, the privacy of generated data remains a topical issue, and differential privacy is one of the techniques to consider.

Whitepapers

SciForce is now the European Health Data and Evidence Network (EHDEN) certified partner

We are pleased to announce that SciForce is now the first and only enterprise in Ukraine to become a certified partner of the European Health Data and Evidence Network (EHDEN). In this way, we contribute to the wider Observational Health Data Sciences and Informatics (OHDSI) initiative to take advantage of large-scale health data analytics. This step brings us into a community of 47 European SMEs delivering their expertise in health data, data standardization, and interoperability. Read on for details.

EHDEN, with support from the EU's Horizon 2020 program and the European Federation of Pharmaceutical Industries and Associations (EFPIA), aims to take advantage of so-called Big Health Data. In the case of EHDEN, this data may include any information related to patients in Europe, facilitating research, supporting the conduct of healthcare research, or managing payment or HR data. As the official website states, EHDEN was launched "to support patients, clinicians, payers, regulators, governments, and the industry in understanding wellbeing, disease, treatments, outcomes, and new therapeutics and devices."

Since August 2020, EHDEN has certified European SMEs in standardizing health data to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and setting up the needed technical ecosystem. At the moment, EHDEN has built a community of 47 enterprises taking part in this initiative. Before joining the community of certified SMEs that cooperate with the European data partners, our professionals took training courses within the EHDEN Academy, grasping the technical ecosystem and work standards and following the training scheme required to become a certified Data Partner.

Within EHDEN, we can now provide services related to the OHDSI software and tools ecosystem, technical infrastructure services, OHDSI training, OMOP CDM ETL, and OMOP Standardized Vocabularies. Speaking of healthcare expertise, we do not limit our capabilities to observational research, data mapping, and ETL. Our healthcare professionals also focus on prediction, digitization, telehealth services, and impaired speech recognition. Finally, we are excited to share our expertise with our European partners, moving medical science forward.
