Grouping a variable that has a calculated field

Richard Hom

15 Jan 2019, 22:07

Dear list,
I'm a beginner with Stata.

My situation is as follows:
I have 23,000+ IDs (individuals), each of whom has performed one or more services that generate invoices. Each record is a line item on an invoice, with the interval variable "allowedcharge". A date variable and a transaction number are assigned to each line item to prevent duplication of records.
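For illustration only, a small dataset with the layout described above might look like this. The variable names (ID, date_str, txn_no, allowedcharge) and all values are hypothetical. The sketch also checks the stated design that the transaction number uniquely identifies each line item:

```stata
* Hypothetical toy data: one observation per invoice line item.
* Variable names and values are illustrative, not the poster's real data.
clear
input long ID str10 date_str long txn_no double allowedcharge
101 "2018-03-01" 5001 120.50
101 "2018-04-12" 5002  80.00
102 "2018-02-20" 5003 300.00
103 "2018-05-07" 5004  45.25
103 "2018-06-30" 5005  60.00
end
* Convert the string date to a Stata daily date
generate date = date(date_str, "YMD")
format date %td
drop date_str
* isid exits with an error unless txn_no uniquely identifies observations
isid txn_no
* Report any observations that are exact duplicates on all variables
duplicates report
```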

I want to sort the IDs from the highest to the lowest calculated sum of their invoices.

Thanks
Richard
Tags: calculated field, grouping variable
Clyde Schechter


15 Jan 2019, 22:21

So you want something like this:

Code:

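by ID, sort: egen invoice_total = total(invoice)  // sum of invoice within each ID, copied to all its observations
gsort -invoice_total ID  // largest total first; ID breaks ties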
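If only the ranking itself is needed (one observation per ID rather than every line item), -collapse- is an alternative to -egen-. This is a minimal sketch on hypothetical toy data, assuming the charge variable is called allowedcharge. Note that -collapse- replaces the data in memory, so on real data run it inside -preserve-/-restore- or after saving the line-item file:

```stata
* Hypothetical toy data: one observation per invoice line item
clear
input long ID double allowedcharge
101 120.50
101  80.00
102 300.00
103  45.25
103  60.00
end
* Caution: collapse discards the line items and keeps one observation per ID
collapse (sum) total_charge = allowedcharge, by(ID)
* Highest total first; ID breaks ties
gsort -total_charge ID
* Show the top of the ranking (at most 10 IDs)
list ID total_charge if _n <= 10, noobs
```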

In the future, when asking for help with code, post an example of your data set using the -dataex- command. It is infinitely more helpful than even the most meticulous attempt at describing your data in words. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
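As a quick illustration of -dataex-, here it is run on the auto dataset that ships with Stata; it prints a copy-and-paste-ready block that recreates the selected observations. For data like the ones described in #1, the analogous call would be something like -dataex ID date allowedcharge in 1/20- (those variable names are assumptions taken from the description, not verified):

```stata
* Load the example dataset bundled with Stata
sysuse auto, clear
* Print code that recreates 5 observations of 3 variables, ready to paste into a post
dataex make price mpg in 1/5
```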

Added: "calculated field" is spreadsheet or database terminology. A Stata data set is neither of those things, and there is nothing analogous to a calculated field in Stata. The nearest equivalent is to actually calculate a new variable, as in the code shown above. But I raise the point because many people come to Stata with extensive experience using spreadsheets. The instincts honed in that experience are generally not helpful when approaching problems in Stata and often lead you astray. The approach to data management in Stata is drastically different from the approach needed for spreadsheets. So it is important to keep Stata data sets and spreadsheets as separate in your mind as possible. One thing that is helpful is to use distinct terminology when referring to them. So, for example, when talking about Stata, do not refer to rows and columns: refer to observations and variables.
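To make the distinction concrete: a variable created with -egen- holds values computed at the moment the command runs, and unlike a spreadsheet formula it does not update when its inputs change. A minimal sketch on hypothetical toy data (ID and allowedcharge are illustrative names):

```stata
* Hypothetical toy data
clear
input long ID double allowedcharge
1 100
1  50
2  75
end
* Compute per-ID totals once
bysort ID: egen double total_charge = total(allowedcharge)
* Change an input value
replace allowedcharge = 200 if ID == 2
* total_charge still holds the old total: nothing is recalculated automatically
assert total_charge == 75 if ID == 2
* To refresh it, drop the variable and compute it again
drop total_charge
bysort ID: egen double total_charge = total(allowedcharge)
```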

Last edited by Clyde Schechter; 15 Jan 2019, 22:25.

Joseph Coveney


15 Jan 2019, 22:29

It would be helpful to see what you're looking at, but maybe something like this (with fee standing in for your charge variable):

Code:

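generate long row_nr = _n // (i) keeps the within-ID order the same as originally
tempfile invoices
quietly save `invoices'
bysort ID: generate double fee_sum = sum(fee)  // running sum; the last observation of each ID holds its total
by ID: keep if _n == _N  // one observation per ID, carrying that ID's total
keep ID fee_sum
merge 1:m ID using `invoices', assert(match) nogenerate noreport  // attach the totals to every line item
gsort -fee_sum +ID +row_nr  // (ii) if two or more IDs have the same fee sum, +ID keeps each ID's observations together
drop fee_sum row_nr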
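A shorter variant that should produce the same observation order, using -egen- in place of the -tempfile-/-merge- round trip, is sketched below on hypothetical toy data (fee and ID are placeholder names, as above):

```stata
* Hypothetical toy data; fee stands in for the charge variable
clear
input long ID double fee
2 10
1 40
2 30
3  5
1 20
end
* Record the original order so it can be kept within each ID
generate long row_nr = _n
* Total per ID, attached to every observation of that ID
bysort ID: egen double fee_sum = total(fee)
* Highest total first; ties broken by ID; original order kept within ID
gsort -fee_sum +ID +row_nr
list ID fee fee_sum, noobs sepby(ID)
drop fee_sum row_nr
```

Both versions sort on the same keys; the -egen- version simply avoids writing and re-reading a temporary file.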
